Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Object details
- Type
- Paper in proceedings
- Paper version
- published version
- Language
- English
- Creator
- Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
- Source
- Trans. Computational Collective Intelligence - Lecture Notes in Computer Science
- Editor
- Ngoc Thanh Nguyen, Ryszard Kowalczyk, Alexandre Miguel Pinto and Jorge S. Cardoso
- Publisher
- Springer
- Publication date
- 2017
- Abstract
- Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs, and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database, developed over many years for geological projects in the Republic of Serbia, were indexed using methods developed within digital humanities: bag-of-words and named entity recognition. Documents in this geological database are described by a summary report and other data, such as title, domain, keywords, abstract, and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used for pre-indexing each document. A combination of several tf-idf based measures was applied for selecting and ranking the retrieval results of indexed documents for a specific query, and the results were compared with the initial retrieval system that was already in place. In general, a significant improvement has been achieved according to the standard information retrieval performance measures, where the InQuery method performed the best.
- Start page
- 162
- End page
- 185
- doi
- 10.1007/978-3-319-59268-8_8
- isbn
- 978-3-319-59267-1
- Broader paper category
- M30
- Narrower paper category
- M33
- Rights
- Open access
- License
- Creative Commons – Attribution-NonCommercial-No Derivative Works 4.0 International
- Format
- Volume
- 26
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8