3. Informatics, Information Science, and Computer Science. November 2018. In: Scientific and Technical Information Processing 45(4):235-240. DOI: 10.3103/S014768821804008
In this paper, I draw a distinction between information and computer sciences in terms of their objects and directions of research. In terms of the operating mode, I distinguish among automatic, automated, semi-automatic, and assistant systems and show that in each application domain there can be different configurations of these systems at different periods of time. In addition, I analyze the engineering, linguistic, and mathematical aspects of domain-specific research that falls within the scope of informatics, formulate some open problems, and discuss promising research directions.
4. Another tagger. July 2018
The paper describes a hybrid part-of-speech tagger for Russian designed to support an opinion mining system. The tagger is based on an extensive morphological dictionary and Bayesian analysis. The paper suggests principles and methodologies for the development of taggers for morphologically rich languages: the principle of lexical and morphological distribution, the principle of consecutive priority of lexical and morphological classes, and the principle of morphological variation, along with lexical, lemma-based, and affixal methodologies for part-of-speech recognition. Much attention is paid to the description of the instances of sound alternation and deletion characteristic of contemporary Russian. A generalized 44-step algorithm of the tagger's functioning is described. The algorithm was implemented in a stand-alone application distributed as freeware. The Russian version of the paper (shorter and outdated) was published in the Integratsia Nauk journal (pp. 78-83) and is available at http://in-sc.ru/d/1942991/d/vypusk_1014.pdf Key words: principles and methods for Russian part-of-speech recognition, opinion mining system, morphological dictionary, lexical and morphological classes, morphological variation, part-of-speech tagger's algorithm.
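The affixal methodology mentioned above can be illustrated with a minimal sketch of a Bayesian disambiguation step. The suffixes, tags, and counts below are invented for illustration only; this is not the paper's 44-step algorithm or its dictionary.

```python
# Hypothetical (suffix, part-of-speech) counts, standing in for the
# dictionary-derived statistics the paper describes; the suffixes,
# tags, and numbers are invented for illustration.
suffix_pos_counts = {
    ("ing", "VERB"): 80, ("ing", "NOUN"): 20,
    ("tion", "NOUN"): 95, ("tion", "VERB"): 5,
}

def pos_posterior(suffix):
    """P(pos | suffix): the Bayesian posterior over part-of-speech
    tags, estimated directly from the joint counts."""
    total = sum(n for (s, _), n in suffix_pos_counts.items() if s == suffix)
    if total == 0:
        return {}
    return {pos: n / total
            for (s, pos), n in suffix_pos_counts.items() if s == suffix}

print(pos_posterior("ing"))   # → {'VERB': 0.8, 'NOUN': 0.2}
```

A real affixal method would back off to such suffix statistics only for words missing from the morphological dictionary.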
5. Bayes' theorem and a methodology for predicting US presidential election results. November 2017.
6. The principles for the investigation of the historical development of computer science.
The paper develops a methodology for predicting the results of US presidential elections. The methodology involves the following procedures: 1) select the time horizon for statistical analysis; 2) select the parameters for statistical analysis; 3) study the configuration of parameters during the election year for which the forecast is made and extrapolate this configuration to previous years; 4) apply Bayes' theorem to statistical data about previous elections to calculate probabilities for the GOP and the Democrats; 5) if some or all parameters are not known, make a projection for the given year based on the results of previous campaigns, developing optimistic, pessimistic, and intermediate scenarios. Based on this methodology, the paper first makes a prediction for the 2016 US presidential election, taking its results as reference data, and then formulates a projection for 2020 based on optimistic and pessimistic scenarios for the Republicans and the Democrats. The Russian version of the paper was published in the Integratsia Nauk journal, issue 10(14), 2017, pp. 16-22.
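Step 4 of the procedure above can be sketched as a small Bayes computation. The historical records and the "incumbent party" parameter below are invented stand-ins for the statistical data the methodology actually uses, not the paper's data.

```python
# Toy historical records: for each past election, one hypothetical
# parameter and the winner. All values are invented for illustration.
history = [
    {"incumbent_party_running": True,  "winner": "GOP"},
    {"incumbent_party_running": True,  "winner": "DEM"},
    {"incumbent_party_running": False, "winner": "GOP"},
    {"incumbent_party_running": True,  "winner": "GOP"},
    {"incumbent_party_running": False, "winner": "DEM"},
    {"incumbent_party_running": False, "winner": "DEM"},
]

def posterior(winner, config):
    """P(winner | config) = P(config | winner) * P(winner) / P(config),
    with all probabilities estimated from the toy records."""
    n = len(history)
    p_winner = sum(e["winner"] == winner for e in history) / n
    wins = [e for e in history if e["winner"] == winner]
    p_config_given_winner = sum(
        all(e[k] == v for k, v in config.items()) for e in wins) / len(wins)
    p_config = sum(
        all(e[k] == v for k, v in config.items()) for e in history) / n
    return p_config_given_winner * p_winner / p_config

print(posterior("GOP", {"incumbent_party_running": True}))   # ≈ 0.667
```

The optimistic/pessimistic scenarios of step 5 would correspond to running this calculation under different assumed parameter configurations.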
July 2017. In: Scientific and Technical Information Processing 44(3):207-214.
The paper proposes the principles of the publicity of the historical process, consecutiveness, and paradigmality. Depending on their significance, historical events are classified into critical, epochal, key, and occasional events. Local and global events are differentiated, and possible variants of the representation of local events at the global level are determined. The paper describes variants of the correlation between a new paradigm and the old one: total rejection, partial rejection, mergence, and co-existence. It argues that the last variant is the most characteristic of computer science. An attempt is made to distinguish “computer science” from “information science.”
7. Distinctive features of the structure of linguistic ontology. June 2017. In: Automatic Documentation and Mathematical Linguistics 25(3):149-158. DOI: 10.3103/S0005105517030128
This paper describes a methodology for developing a linguistic ontology as a component of a system for the automatic analysis of customer opinions about commercial products. The fundamental principles of building ontologies of this type are substantiated: the relationship between ontology and grammar; the distinction between parametric and evaluative terms in the ontology's structure, with evaluative terms classified into syntactic and semantic ones; the binary relationship between syntactic and semantic terms; and the gradation scale of the intensity of evaluations. The cases of homonymy and synonymy of evaluative terms are analyzed for the first time based on Russian data.
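A toy fragment of an ontology with this structure might look as follows. All terms, the scale values, and the syntactic/semantic assignments are invented for illustration and do not reproduce the paper's Russian-language ontology.

```python
# Gradation scale of evaluation intensity (invented values).
gradation_scale = {"weak": 1, "moderate": 2, "strong": 3}

ontology = {
    # parametric terms name evaluated product aspects
    "parametric": ["battery", "screen", "price"],
    "evaluative": {
        # semantic evaluative terms: evaluation is part of the meaning
        "semantic": {"excellent": ("positive", "strong"),
                     "poor":      ("negative", "moderate")},
        # syntactic evaluative terms: evaluation emerges in combination
        # with a parametric term, e.g. "long battery life"
        "syntactic": {"long": ("positive", "moderate")},
    },
}

def intensity(term):
    """Look up an evaluative term's polarity and its numeric position
    on the gradation scale; returns None for unknown terms."""
    for group in ontology["evaluative"].values():
        if term in group:
            polarity, grade = group[term]
            return polarity, gradation_scale[grade]
    return None

print(intensity("excellent"))   # → ('positive', 3)
```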
8. Evaluation of the efficiency of the chi-square metric. July 2016. In: Automatic Documentation and Mathematical Linguistics 50(4):173-178. DOI: 10.3103/S0005105516040051
The efficiency of using the chi-square metric to weight terms in text documents is evaluated. The procedure includes the selection and preliminary processing of class C and ~C texts, the compilation of a reference dictionary and calculation of scores for all the terms in it, the calculation of χ2 coefficients for the terms of a class C text, and the calculation of the overall efficiency factor as the sum of the coefficients found for the terms from the reference dictionary. Weighting by the χ2 formula, by the odds-ratio (OR) formula, and on the basis of probabilistic variables is analyzed and compared. The best result was found to be yielded by OR-based weighting.
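The two weighting formulas compared above can be sketched from a term's 2x2 contingency table over class C and ~C documents. The counts in the example call are invented; the smoothing constant in the odds ratio is an assumption, not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term's 2x2 contingency table:
    a = C docs containing the term,  b = ~C docs containing it,
    c = C docs without the term,     d = ~C docs without it."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def odds_ratio(a, b, c, d):
    """Odds ratio for the same table; 0.5 is added to each cell
    (an assumed smoothing) to avoid division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# Invented counts: the term appears in 40 of 50 C docs, 10 of 50 ~C docs.
print(chi_square(40, 10, 10, 40))   # → 36.0
print(odds_ratio(40, 10, 10, 40))
```

A term concentrated in class C gets a high score under both formulas; the two differ in how they rank borderline terms.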
9. The Methodology of Symmetric Weighting of Sentences. February 2016.
The paper describes a methodology of symmetric weighting of sentences that involves the creation of a dictionary and the calculation of connections between sentences. It demonstrates opportunities for applying this methodology to text summarization and authorship attribution. It develops an original methodology for comparing the results of symmetric weighting with the results of Copernic Summarizer and the AutoSummarize function in MS Word. Based on standard deviation, symmetric weighting can be used for authorship attribution. This paper is the English translation of the Russian original, which is available at http://lamb.viniti.ru/sid2/sid2free?sid2=J14210360 The original text formatting and pagination are retained. Cite this paper as: Yatsko, V.A. (2016) The methodology of symmetric weighting of sentences. Naucno-technicheskaya informatsia. Ser. 2. [Scientific and Technical Information. Series 2.], 2:36-41.
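The idea of weighting each sentence by its connections with every other sentence can be sketched as below. The overlap measure (shared lowercased terms after a stopword filter) and the stopword list are assumptions for illustration, not the paper's dictionary-building procedure.

```python
def symmetric_weights(sentences, stopwords=frozenset({"the", "a", "of", "on"})):
    """Weight each sentence by the total number of terms it shares
    with every other sentence (a symmetric connection measure)."""
    term_sets = [set(s.lower().split()) - stopwords for s in sentences]
    weights = []
    for i, terms in enumerate(term_sets):
        # sum pairwise overlaps with all other sentences
        w = sum(len(terms & other)
                for j, other in enumerate(term_sets) if j != i)
        weights.append(w)
    return weights

sents = ["The cat sat on the mat",
         "The cat likes the mat",
         "Dogs chase the cat"]
print(symmetric_weights(sents))   # → [3, 3, 2]
```

For summarization, the highest-weighted sentences would be extracted; for authorship attribution, the abstract suggests using the standard deviation of such weights as a stylometric feature.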
10. Automatic text classification method based on Zipf's law. June 2015. In: Automatic Documentation and Mathematical Linguistics 49(3):83-88. DOI: 10.3103/S0005105515030048
This paper describes a method for automatic text classification based on analyzing the deviation of the word distribution from Zipf's law, combined with the zonal data processing approach. Deviation is understood as the difference between a word's actual numerical score, calculated from its frequency, and its score calculated according to Zipf's law. The proposed method involves the division of the input and reference texts into J0, J1, and J2 zones and the creation of a numerical series from the words contained in the J0 zone. The constructed numerical series shows the difference between the real scores of words and the scores calculated according to Zipf's law. The proposed method can significantly reduce text dimensionality and thus improve the running speed of automatic text classification.
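The deviation series can be sketched as follows, taking the Zipf prediction for the rank-r word as f1 / r (where f1 is the top frequency). The equal-thirds zone split is an assumption; the paper's exact J0/J1/J2 boundaries are not reproduced here.

```python
from collections import Counter

def zipf_deviations(text):
    """For each word, the deviation of its actual frequency from the
    frequency predicted by Zipf's law (f1 / rank), in rank order."""
    freqs = Counter(text.lower().split()).most_common()
    f1 = freqs[0][1]
    return [(word, f - f1 / rank)
            for rank, (word, f) in enumerate(freqs, start=1)]

def split_zones(series):
    """Split a rank-ordered series into three zones. Equal thirds are
    an assumed boundary, not the paper's J0/J1/J2 definition."""
    k = len(series) // 3 or 1
    return series[:k], series[k:2 * k], series[2 * k:]

print(zipf_deviations("a a a b b c"))   # → [('a', 0.0), ('b', 0.5), ('c', 0.0)]
```

Classification would then compare the numerical series built from the input text's J0 zone with the series of each reference class, so only a fraction of the vocabulary enters the comparison.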