Word class frequencies in combination with type-token statistics

Concept: This function indicates what proportion of the lexical inventory each word class makes up, showing lexical variation overall and within specific word classes. To this end, the bi-logarithmic type-token ratio is calculated (Scott 2004: 124), namely, the ratio, on a logarithmic scale, between the number of distinct lexemes in a text and its total number of words. This measure shows whether a text is relatively rich or poor in vocabulary, considering its length.
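
As an illustration only, the measure can be sketched in a few lines of Python. The sketch assumes the common definition log(types) / log(tokens); the exact computation used by the tool is not specified here, and the function name bilog_ttr and the sample data are hypothetical.

```python
import math


def bilog_ttr(tokens):
    """Return log(number of types) / log(number of tokens).

    `tokens` is a list of lemma identifiers, one per word in the text.
    Assumption: this is the usual bi-logarithmic definition, not
    necessarily the tool's exact formula.
    """
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_tokens < 2:
        # log(1) is 0, so the ratio is undefined for texts of one word or fewer
        raise ValueError("at least two tokens are required")
    return math.log(n_types) / math.log(n_tokens)


# Hypothetical text: 8 tokens, 4 distinct lemmata
sample = ["a", "b", "a", "c", "a", "b", "d", "a"]
print(round(bilog_ttr(sample), 4))
```

Values closer to 1 suggest a richer vocabulary relative to the length of the text; values further below 1 suggest more repetition.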

Function: Calling up this function for the entire corpus allows users to choose a text database from a drop-down list ("Select text database"), the options being a (non-Demotic) Egyptian and a Demotic corpus of texts. When this function is called up via part of the corpus, non-Demotic Egyptian will be the default setting. Using the drop-down list "Run analysis for", users may opt for the index to be generated for top-level lemmata within the hierarchy of the lemma list (suggested), or just for individual lemmata. Lemmata are hierarchized according to lexicographical principles, with variants of meaning being differentiated or cross-references given. This hierarchical structure has been refined during work on the lemma list. The top level in this hierarchy comprises all sub-entries, providing access to all references/citations that might be of interest.

Results: For the entire inventory and for individual word classes, a results table will show the number of lexemes and their proportion relative to the lexical inventory of the corpus. The third column contains the bi-logarithmic type-token ratio (Scott 2004).
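
The layout of such a results table can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the input is a list of (lemma, word class) pairs, that each lemma belongs to exactly one word class, and that the ratio is log(types) / log(tokens). The function name word_class_table and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def word_class_table(tokens):
    """Build rows of (word_class, n_lexemes, share_of_inventory, bilog_ttr).

    `tokens` is a list of (lemma, word_class) pairs, one per word.
    The ratio is None where it is undefined (fewer than two tokens).
    """
    lemmata = defaultdict(set)      # word class -> distinct lemmata
    token_count = defaultdict(int)  # word class -> number of tokens
    for lemma, word_class in tokens:
        lemmata[word_class].add(lemma)
        token_count[word_class] += 1

    def ratio(n_types, n_tokens):
        return math.log(n_types) / math.log(n_tokens) if n_tokens > 1 else None

    total_lexemes = sum(len(s) for s in lemmata.values())
    rows = [("all", total_lexemes, 1.0, ratio(total_lexemes, len(tokens)))]
    for word_class in sorted(lemmata):
        n = len(lemmata[word_class])
        rows.append((word_class, n, n / total_lexemes,
                     ratio(n, token_count[word_class])))
    return rows


# Hypothetical sample: placeholder lemma identifiers, not real corpus data
sample = [
    ("lemma_1", "noun"), ("lemma_2", "adjective"), ("lemma_1", "noun"),
    ("lemma_3", "verb"), ("lemma_4", "noun"), ("lemma_3", "verb"),
    ("lemma_5", "verb"), ("lemma_1", "noun"),
]
rows = word_class_table(sample)
for word_class, n, share, ttr in rows:
    ttr_text = "n/a" if ttr is None else f"{ttr:.3f}"
    print(f"{word_class:<10} {n:>3} {share:>6.1%} {ttr_text:>6}")
```

The first row covers the entire inventory; each following row gives one word class with its number of lexemes, its share of the inventory, and its ratio in the third value column.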

Scott, Mike: Oxford WordSmith Tools, Oxford 2004

