Corpus
For the following series of assignments you need your own corpus of texts.
Search for a freely available corpus or create your own collection (e.g. from web pages, newsgroups, ...).
Divide your material into a development set, a training set and a set for testing.
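One possible way to do the split is sketched below. The assignment does not prescribe any proportions, so the 10% development / 10% test / 80% training ratio, the fixed random seed and the function name `split_corpus` are assumptions made for illustration only. Shuffling with a fixed seed keeps the split reproducible, and slicing one shuffled list guarantees that no document ends up in two sets.

```python
import random


def split_corpus(doc_ids, dev_frac=0.1, test_frac=0.1, seed=0):
    """Partition document identifiers into three disjoint sets.

    The fractions and the seed are illustrative defaults, not values
    given by the assignment.
    """
    # Sort first so the result does not depend on the input order.
    docs = sorted(doc_ids)
    # A private Random instance makes the shuffle reproducible.
    random.Random(seed).shuffle(docs)

    n_dev = int(len(docs) * dev_frac)
    n_test = int(len(docs) * test_frac)

    return {
        "development": docs[:n_dev],
        "test": docs[n_dev:n_dev + n_test],
        "training": docs[n_dev + n_test:],
    }


if __name__ == "__main__":
    # Hypothetical file names, used only to demonstrate the call.
    names = [f"doc{i:03d}.txt" for i in range(100)]
    for set_name, members in split_corpus(names).items():
        print(set_name, len(members))
```

Split at the level of whole documents rather than sentences, so that text from one document never appears in both the training set and the test set.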
Store your corpus files in your WDOK home directory in a directory called
corpus. Put the three sets into three subdirectories called
Describe the sources of your corpus and characterize it with respect to size, text types (e.g. newspaper stories, scientific abstracts, ...) and domain(s) (e.g. politics, biomedicine, ...).
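For the size part of the description, a few simple counts are usually enough. The sketch below is one way to obtain them, under stated assumptions: the function name `corpus_stats` is hypothetical, and the tokenization (lowercasing, then taking runs of word characters) is a deliberately crude choice that you may want to replace with a proper tokenizer.

```python
import re
from collections import Counter


def corpus_stats(texts):
    """Count documents, tokens and distinct word types in an iterable of strings.

    Tokenization is an assumption: lowercase the text and treat every
    run of word characters as one token.
    """
    type_counts = Counter()
    n_docs = 0
    for text in texts:
        n_docs += 1
        type_counts.update(re.findall(r"\w+", text.lower()))
    return {
        "documents": n_docs,
        "tokens": sum(type_counts.values()),
        "types": len(type_counts),
    }


if __name__ == "__main__":
    sample = ["The cat sat.", "the dog sat"]
    print(corpus_stats(sample))
    # {'documents': 2, 'tokens': 6, 'types': 4}
```

Reporting these numbers separately for each of the three sets also shows whether the split is roughly balanced in the way you intended.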