ABSTRACT

Corpus Linguistics is a relatively new field, made possible by the wider availability and use of personal computers in the last quarter of the twentieth century. The field’s rationale is that we can better understand language production if we use computer software to identify linguistic patterns that recur across large sets of texts collected to be representative of a particular language variety. The first corpus, the Brown Corpus, illustrates how corpora are sampled in order to be balanced: its 1 million words of written American English published in 1961 were drawn from 500 samples of writing, each of about 2,000 words, taken from 15 genres that included news, fiction, academic writing and government documentation.