CorpusWiki is an online platform for the collaborative creation, annotation, and exploitation of freely available textual corpora for any language in the world. The main features of the system are highlighted below; for more information, consult the documentation section.
CorpusWiki allows you to work on collaboratively created, annotated corpora for all languages in the world. You can study any of the languages already in the system, or help create a corpus for a new language. Contribute to make your language available for linguistic research!
For many languages without a long written tradition or with very few speakers, creating a corpus of a reasonable size is not a trivial matter. That is why CorpusWiki attempts to make it easy for any speaker to contribute any text they might have.
All corpora in CorpusWiki can be searched using the powerful Corpus Query Processor (CQP). For annotated texts, all annotation features can be used in complex search queries.
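To give an idea of what a search over annotation features can look like, here is a minimal Python sketch that assembles a query in CQP's token-expression syntax. The helper functions `cqp_token` and `cqp_sequence` are hypothetical, and the attribute names `pos` and `lemma` (and their values) are assumptions; the attributes actually available depend on how a given corpus is annotated.

```python
def cqp_token(**constraints):
    """Build one CQP token expression, e.g. [lemma="casa" & pos="N"].

    Attribute names are passed as keyword arguments; they are assumed
    to exist in the corpus being queried.
    """
    parts = []
    for attr, value in sorted(constraints.items()):
        if '"' in value:
            # Quote escaping is deliberately left out of this sketch.
            raise ValueError("double quotes in values are not handled here")
        parts.append(f'{attr}="{value}"')
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens):
    """Join token expressions into a query matching consecutive tokens."""
    return " ".join(tokens)


# An adjective immediately followed by the noun lemma "casa".
query = cqp_sequence(cqp_token(pos="ADJ"), cqp_token(pos="N", lemma="casa"))
print(query)  # [pos="ADJ"] [lemma="casa" & pos="N"]
```

Calling `cqp_token()` without arguments yields `[]`, which in CQP matches any single token.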
CorpusWiki can identify the language of a text from among about 800 languages, and will automatically start to recognize each additional language for which there is a corpus in the system.
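How CorpusWiki identifies languages internally is not described here. As an illustration of one common approach, the sketch below compares character trigram profiles using cosine similarity. The function names and the tiny sample texts are assumptions made for the example, and real profiles would be built from far more text per language.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count character trigrams in a text, padded with spaces at the edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


# Toy profiles; each new corpus would contribute a profile of its own.
profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "pt": trigram_profile("o rato roeu a roupa do rei de roma e a rainha"),
}
print(identify("the dog and the fox", profiles))
```

In a setup like this, supporting a new language amounts to adding one more profile built from that language's corpus.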
CorpusWiki features an easy, graphical interface for providing each word in each text of the corpus with morphosyntactic features, as well as labels for meaning and pronunciation. CorpusWiki will train an internal part-of-speech tagger to automatically assign the most likely features to each word in a new text.
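The idea of learning tags from already annotated texts can be illustrated with a very simple baseline: assign each word the tag it most often received in the training data. This is a sketch only, not CorpusWiki's actual tagger; the function names, the tag labels, and the sample sentences are all assumptions.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Learn the most frequent tag per word, plus a fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Unknown words get the overall most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Tag a list of words using the learned lexicon."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


# Hypothetical annotated sentences as (word, tag) pairs.
annotated = [
    [("the", "DET"), ("dog", "N"), ("barks", "V")],
    [("cats", "N"), ("see", "V"), ("birds", "N")],
]
lexicon, default_tag = train(annotated)
print(tag(["The", "fish", "barks"], lexicon, default_tag))
```

Here the unseen word "fish" falls back to the most frequent tag in the training data. A trained statistical tagger would also take the surrounding context into account, which this baseline ignores.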
The translation glosses in the CorpusWiki corpora are used to automatically generate a bilingual dictionary. Furthermore, it is possible to create a corpus-driven monolingual dictionary for each corpus. All dictionaries can be consulted online, and monolingual dictionaries can be downloaded in a number of standard dictionary formats such as XDXF, LIFT, and Shoebox.
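As a rough illustration of how glosses on individual words can be turned into bilingual dictionary entries, the sketch below groups glosses by lemma and orders them by frequency. The token fields `lemma` and `gloss` and the function name are assumptions for this example and do not reflect CorpusWiki's internal data format.

```python
from collections import Counter, defaultdict


def build_bilingual_dictionary(tokens):
    """Map each lemma to its glosses, most frequent gloss first."""
    entries = defaultdict(Counter)
    for token in tokens:
        gloss = token.get("gloss")
        if gloss:  # skip tokens that have no gloss yet
            entries[token["lemma"]][gloss] += 1
    return {
        lemma: [g for g, _ in counter.most_common()]
        for lemma, counter in sorted(entries.items())
    }


# Hypothetical annotated tokens from a corpus glossed in English.
tokens = [
    {"lemma": "casa", "gloss": "house"},
    {"lemma": "casa", "gloss": "home"},
    {"lemma": "casa", "gloss": "house"},
    {"lemma": "de", "gloss": None},
]
print(build_bilingual_dictionary(tokens))  # {'casa': ['house', 'home']}
```

Lemmas without any gloss are left out, so in this sketch the dictionary grows as more words in the corpus are glossed.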