ABSTRACT

This chapter introduces an intelligent computer-assisted language learning system for Chinese miscollocation detection that integrates natural language processing with corpora. It presents the design of a syntax-based Chinese collocation checker within a data-driven language learning framework, in which Chinese collocation errors can be identified and corrected with monolingual and bilingual corpus tools. The checker is built on a Chinese dependency parser, trained on the Sinica Chinese Treebank, that extracts dependency relations such as subject-verb and verb-object from a Chinese sentence. A large Chinese corpus of approximately 100 million words was processed with the dependency parser, and the output was used to build a Chinese dependency relations database. When a Chinese sentence is input to the system, the parser outputs the dependency relations in the sentence, and each relation is automatically checked against the database. Relations that are not attested in the database are treated as potential miscollocations and flagged for the users to scrutinize. At the same time, collocation candidates derived from the database are automatically suggested to the users. It is argued that while these candidates may help learners correct some collocation errors, the process is very time-consuming and sometimes futile, because there is no support for the users’ first language. A new tool combining a web-based bilingual concordancer with a bilingual dictionary, such as Linguee or Youdao Dictionary, is found to be more efficient and may suit most learners’ needs. The design of the proposed Chinese collocation checker and its integration with bilingual concordancers under the data-driven language learning framework are discussed.
The theoretical implications of the proposed method in relation to second language acquisition theories are also explored.
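
The checking step described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the names Relation, DATABASE, suggest and check are hypothetical, the frequency counts are invented, and the relations are supplied by hand in place of real parser output. The abstract does not say how collocation candidates are derived, so the sketch assumes one simple strategy: keep the dependent word and propose the heads that are attested with it in the same relation, most frequent first.

```python
from collections import Counter
from typing import NamedTuple


class Relation(NamedTuple):
    """One dependency relation (hypothetical representation)."""
    rel: str        # relation type, e.g. "verb-object"
    head: str       # e.g. the verb
    dependent: str  # e.g. the object noun


# Toy stand-in for the dependency relations database.
# The entries and counts are invented for illustration only.
DATABASE = Counter({
    Relation("verb-object", "吃", "飯"): 500,
    Relation("verb-object", "吃", "藥"): 120,
    Relation("verb-object", "喝", "水"): 300,
    Relation("verb-object", "喝", "湯"): 95,
})


def suggest(relation, database, top_n=3):
    """Return heads attested with the same relation type and dependent,
    ordered by descending frequency (an assumed suggestion strategy)."""
    candidates = [
        (r.head, count)
        for r, count in database.items()
        if r.rel == relation.rel and r.dependent == relation.dependent
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [head for head, _ in candidates[:top_n]]


def check(relations, database):
    """Map each unattested relation to a list of suggested heads."""
    report = {}
    for relation in relations:
        # A Counter returns 0 for missing keys without inserting them.
        if database[relation] == 0:
            report[relation] = suggest(relation, database)
    return report


# Hand-written relations standing in for parser output.
parsed = [
    Relation("verb-object", "吃", "湯"),  # unattested in the toy database
    Relation("verb-object", "喝", "水"),  # attested
]
report = check(parsed, DATABASE)
for relation, heads in report.items():
    print(relation.head + relation.dependent, "->", heads)
```

In this toy run only the first relation is flagged, since the second is attested. A relation absent from a finite database is not necessarily an error, which is why flagged items are best treated as candidates for the learner to inspect and not as confirmed miscollocations.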