This website was created and is maintained by the Computational Linguistics Laboratory (CLL) at Katanov State University of Khakasia (KSUK), located in Abakan, Russia.
Its aim is to provide information about the laboratory's activities and products.

The CLL conducts non-commercial theoretical and applied research in the fields of information retrieval, text summarization, data mining, computer-assisted language learning (CALL), and corpus linguistics. The research is supported by federal and local grants. Since its foundation the laboratory has been headed by Viatcheslav Yatsko (last name also spelt "Iatsko"), ScD, Prof.

The CLL at KSUK was founded in 2002 to conduct research in the following areas:

  • Applied linguistics research and development of computer systems for Computer Assisted Language Learning and Instruction. Two such systems have been created so far (see 'Products' section). A classification of software used in foreign language learning and teaching is given in [3;4;18] (see 'Publications' section).
  • Automatic text summarization research. V. Yatsko is the author of the symmetric summarization concept that underlies PASS and ETS and allows them to produce coherent and adequate summaries. For details see our publications [1-2; 12;14;17]. In 2008 we released Universal Summarizer (UNIS), which has a smart automatic text classification function. Once a text is classified as scientific, publicistic, or fiction, UNIS applies algorithms specially optimized for that text type to significantly increase the quality of the resulting summaries [18].
  • Evaluation of Internet information retrieval systems. The concepts of depth of user's search [5] and of a reference dictionary are being developed to evaluate automatic text summarization systems as well as Internet information retrieval systems [11; 15].
  • Discourse analysis. The integrational discourse analysis concept [6-9] distinguishes between surface and deep levels of discourse structure. Currently we are investigating various types of possessive discourse and the linguistic features of possessive relations, differentiating between alienable and inalienable possession.
  • Computer learner corpora research project. This ongoing project is aimed at 1) creating corpora of texts (dictations, expositions, compositions, etc.) produced by Russian-speaking learners of English; 2) creating tools for error tagging and automatic analysis of these corpora; 3) contrastive analysis of Russian learner corpora with corpora produced by speakers of other languages. The project is in line with research done by Granger et al. [10].
  • Corpus linguistics research. We developed Linguistic Toolbox (LIT), a concordancing tool that provides the user with a set of instruments for linguistic analysis, such as a tokenizer, text splitter, tagger, dictionary comparer, wordlist, and concordancer. By means of these instruments the user can get statistical data about the text, annotate it with POS tags, and conduct various types of searches [15].
  • Data/text mining. We are developing algorithms for mining chat logs and blogs with the aim of thwarting undesirable events, for example acts of violence. The TEXOR system, which performs such mining, will be available online. We recently completed a commercial project on sentiment analysis and opinion mining, creating a system that recognizes and analyzes users' opinions about commercial products. The system relies on an ontology and a linear grammar that we developed specifically for this project.
  • Laws of information distribution and automatic text classification. Building on Bradford's law of scattering, V. Yatsko proposed the Y-law, which makes it possible to divide a text into several zones. In a text classification task, the distributions of parameters in the corresponding zones of the given texts are matched in order to calculate the distances between them effectively. This methodology is called zonal-correlation text processing.
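
The zone-based comparison described in the last item can be illustrated with a rough sketch. It is not the Y-law itself, whose formula is not given on this page. As an assumption, the sketch uses a Bradford-style split: terms are ranked by frequency, and the ranked list is cut into zones that each cover roughly an equal share of all term occurrences. The distance between two texts is then taken as the mean Jaccard distance between the vocabularies of corresponding zones, which is also a stand-in chosen for illustration. The function names tokenize, split_into_zones and zone_distance are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_zones(tokens, n_zones=3):
    """Rank terms by frequency and cut the ranking into n_zones zones.

    Assumption: a Bradford-style split in which each zone covers roughly
    an equal share of all term occurrences. This is NOT the Y-law formula.
    """
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(counts.values())
    zones = [[] for _ in range(n_zones)]
    cumulative = 0
    for term, freq in ranked:
        # The zone is chosen from the share of occurrences already covered.
        idx = min(cumulative * n_zones // total, n_zones - 1)
        zones[idx].append(term)
        cumulative += freq
    return zones


def zone_distance(zones_a, zones_b):
    """Mean Jaccard distance between corresponding zones (illustrative measure)."""
    distances = []
    for zone_a, zone_b in zip(zones_a, zones_b):
        set_a, set_b = set(zone_a), set(zone_b)
        union = set_a | set_b
        distances.append(1 - len(set_a & set_b) / len(union) if union else 0.0)
    return sum(distances) / len(distances)


if __name__ == "__main__":
    text_a = "the cat sat on the mat and the cat slept"
    text_b = "the dog sat on the rug and the dog barked"
    zones_a = split_into_zones(tokenize(text_a))
    zones_b = split_into_zones(tokenize(text_b))
    print(zones_a)
    print(zones_b)
    print(zone_distance(zones_a, zones_b))
```

In this sketch a distance of 0 means that the two texts place the same terms in the same zones, and a distance of 1 means that corresponding zones share no terms. A real classifier would compare richer parameters per zone than bare vocabulary overlap.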