Proceedings of the 10th International Academic Conference, Vienna

AUTOMATIC GENERATION OF ASSOCIATION THESAURUS BASED ON DOMAIN-SPECIFIC TEXT COLLECTION

ALIYA NUGUMANOVA, DINARA ISSABAEVA, YERZHAN BAIBURIN

Abstract:

This work examines a distributional approach to the automatic generation of associative thesauri for a specific domain. The distributional approach rests on the assumption that an associative link between domain terms is determined by the statistics of their co-occurrence in thematically related discourses. Its advantage is that it works from raw material (for example, a collection of domain documents) and requires no additional knowledge about the domain. Because it relies only on the mathematical apparatus of statistics and takes into account neither lexical nor semantic information, it can cover an extensive lexical space of terms. However, this is also the source of its main shortcoming: it produces an excessive number of “unnecessary” links between words, which are of little informative value from a practical point of view. To address this problem, the work proposes a combined approach that brings together methods of distributional statistics, latent semantic analysis, and graph theory.

Keywords: LSA, thesaurus, chi-square test, graph
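
As an illustration of the distributional idea in the abstract, the following minimal sketch links two terms when a chi-square test on their document-level co-occurrence exceeds a threshold, and stores the resulting links as a graph. It is not the authors' implementation. The function names, the whitespace tokenization, the restriction to positive associations, and the threshold of 3.841 (the chi-square critical value for one degree of freedom at the 0.05 level) are assumptions made for this example, and the latent semantic analysis step is omitted.

```python
from itertools import combinations


def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table.

    n11: documents containing both terms, n10: only the first term,
    n01: only the second term, n00: neither term.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A term that occurs in every document or in none carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def association_graph(docs, threshold=3.841):
    """Build an undirected term graph from document-level co-occurrence.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    and only positive associations above the threshold become edges.
    """
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    n = len(doc_sets)
    doc_freq = {t: sum(t in d for d in doc_sets) for t in vocab}
    graph = {t: set() for t in vocab}
    for a, b in combinations(vocab, 2):
        n11 = sum(a in d and b in d for d in doc_sets)
        n10 = doc_freq[a] - n11
        n01 = doc_freq[b] - n11
        n00 = n - n11 - n10 - n01
        positive = n11 * n00 > n10 * n01  # terms co-occur more than expected
        if positive and chi_square_2x2(n11, n10, n01, n00) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Raising the threshold is one simple way to cut down the "unnecessary" links mentioned in the abstract, at the cost of losing weaker associations. On very small collections the chi-square approximation is unreliable, so the threshold should be read as illustrative only.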


Copyright © 2024 The International Institute of Social and Economic Sciences, www.iises.net