
Corpus


For the following series of assignments you need your own corpus of texts.

Find a freely available corpus or create your own collection (e.g., from Web pages or newsgroups).

Divide your material into three sets: a development set, a training set, and a test set.
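One possible way to do the division is a reproducible random split over the list of corpus files. The sketch below is only an illustration: the function name split_corpus, the 10% / 10% / 80% proportions, and the fixed seed are assumptions, not part of the assignment.

```python
import random


def split_corpus(files, devel_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle file names reproducibly and split them into three sets.

    The fractions and the seed are illustrative assumptions; choose
    values that suit the size of your own corpus.
    """
    files = sorted(files)                 # fixed starting order
    random.Random(seed).shuffle(files)    # same seed gives the same split
    n_devel = int(len(files) * devel_frac)
    n_test = int(len(files) * test_frac)
    return {
        "devel": files[:n_devel],
        "test": files[n_devel:n_devel + n_test],
        "train": files[n_devel + n_test:],   # everything that is left
    }
```

Splitting by whole files, rather than by sentences, keeps each document in exactly one set, so no test document also contributes material to the training set.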

Store your corpus files in your WDOK home directory in a directory called corpus. Put the three sets into three subdirectories called devel, train, and test.
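The directory layout described above could be created with a short script such as the following. This is a sketch under assumptions: the function name store_sets is hypothetical, the input is assumed to be a dictionary mapping each set name to a list of file paths, and the location of the WDOK home directory has to be supplied by you.

```python
import shutil
from pathlib import Path


def store_sets(splits, corpus_dir):
    """Copy the files of each set into corpus_dir/devel, /train and /test.

    `splits` is assumed to map a set name to a list of file paths.
    """
    corpus_dir = Path(corpus_dir)
    for set_name in ("devel", "train", "test"):
        target = corpus_dir / set_name
        target.mkdir(parents=True, exist_ok=True)   # create corpus/<set>
        for src in splits.get(set_name, []):
            shutil.copy2(src, target / Path(src).name)
```

A call might look like store_sets(splits, Path.home() / "corpus"), assuming that Path.home() resolves to your WDOK home directory on the machine where you run the script.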

Describe the sources of your corpus and characterize your corpus with respect to size, text types (e.g. newspaper stories, scientific abstracts, ...) and domain(s) (e.g. politics, biomedicine, ...).
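For the size part of the description, simple counts per set are often enough. The sketch below assumes plain-text files encoded in UTF-8 and uses whitespace tokenization as a rough approximation; the function name corpus_size and the choice of statistics (files, tokens, distinct word forms) are illustrative assumptions.

```python
from pathlib import Path


def corpus_size(corpus_dir):
    """Count files, whitespace-separated tokens and distinct word forms per set.

    Assumes plain-text files; whitespace splitting is only a rough
    approximation of real tokenization.
    """
    stats = {}
    for set_name in ("devel", "train", "test"):
        set_dir = Path(corpus_dir) / set_name
        files = sorted(p for p in set_dir.iterdir() if p.is_file()) if set_dir.is_dir() else []
        tokens = 0
        types = set()
        for path in files:
            words = path.read_text(encoding="utf-8", errors="replace").split()
            tokens += len(words)
            types.update(words)
        stats[set_name] = {"files": len(files), "tokens": tokens, "types": len(types)}
    return stats
```

Text types and domains cannot be derived from such counts; they have to be described by hand from what you know about your sources.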


