The text2wfreq tool from the CMU SLM Toolkit presented in the lecture produces an unsorted list of words with their frequencies, e.g.:

 before 44
 SMEs 14
 improvements 5
 thorough 6
 mincemeat 1
 misunderstood 1
 finishing 2
 barrier 2
 learning 1
 confidence 34
 Human 1
 Fourthly 3

Log in to a WDOK machine and read the online documentation (the so-called man page) for the UNIX sort utility. You can access the documentation by typing man sort at the prompt. How can you sort a list in the format produced by text2wfreq

a) alphabetically

b) numerically by frequency?

Test it by generating a frequency list for a file from the Europarl corpus; you can find the files in the directory ~mxp/europarl-en/, e.g.:

 text2wfreq < ~mxp/europarl-en/ > frq

The frequency list is then written to the file frq.
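
As a starting point for checking your answer, the following is a minimal sketch of one possible approach, not the only solution. It assumes a sort implementation that supports the POSIX -k key option with the n and r modifiers. The file names sample.frq, by_word.txt and by_freq.txt are hypothetical, and the sample lines are copied from the example list above rather than generated from the corpus.

```shell
# Hypothetical sample in the two-column "word count" format shown above.
printf '%s\n' 'before 44' 'SMEs 14' 'improvements 5' 'thorough 6' 'mincemeat 1' > sample.frq

# a) Alphabetically by word (first field). LC_ALL=C forces plain byte order,
#    so uppercase letters sort before lowercase ones.
LC_ALL=C sort -k1,1 sample.frq > by_word.txt

# b) Numerically by frequency (second field): n = numeric, r = reversed,
#    so the most frequent word comes first.
LC_ALL=C sort -k2,2nr sample.frq > by_freq.txt

cat by_word.txt
cat by_freq.txt
```

Without LC_ALL=C, the alphabetical order depends on the locale set on the machine, so the position of capitalised words such as SMEs may differ. Dropping the r modifier lists the rarest words first.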
