Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Object

Type: Paper in proceedings
Paper version: published version
Language: English
Creator: Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
Source: Trans. Computational Collective Intelligence - Lecture Notes in Computer Science
Editor: Ngoc Thanh Nguyen, Ryszard Kowalczyk, Alexandre Miguel Pinto and Jorge S. Cardoso
Publisher: Springer
Publication date: 2017
Abstract: Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named entity recognition. Documents in this geological database are described by a summary report, and other data, such as title, domain, keywords, abstract, and geographical location. These metadata were used for generating a bag of words for each document with the aid of morphological dictionaries and transducers. Named entities within metadata were also recognized with the help of a rule-based system. Both the bag of words and the metadata were then used for pre-indexing each document. A combination of several tf-idf based measures was applied for selecting and ranking retrieval results of indexed documents for a specific query and the results were compared with the initial retrieval system that was already in place. In general, a significant improvement has been achieved according to the standard information retrieval performance measures, where the InQuery method performed the best.
Start page: 162
End page: 185
doi: 10.1007/978-3-319-59268-8_8
isbn: 978-3-319-59267-1
Broader paper category: M30
Narrower paper category: M33
Rights: Open access
License: Creative Commons – Attribution-NonCommercial-No Derivative Works 4.0 International
Format: .pdf
Volume: 26
ORCID: https://orcid.org/0000-0001-5123-6273; https://orcid.org/0000-0001-5123-6273; https://orcid.org/0000-0002-7571-2729

Object collections: Ranka Stanković; Olivera Kitanović; Ivan Obradović; Radovi istraživača

Media: Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8