Search
345 items
-
Combining Heterogeneous Lexical Resources
... (IDE), which allows them to share tools and facilitates the creation of mixed-language solutions. In addition, these languages leverage the functionality of the .NET Framework, which provides access to key technologies that simplify the development of ASP Web applications and XML Web services ...
... the other hand, the XML Schema definition language (XSD) enables the definition of the structure and data types of XML documents. Figure 1 shows the graphical representation of XSD schema of Serbian WN. The XML Path Language (XPath) provides a language for addressing parts of an XML document ...
... Resources | Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić | Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4 | 2004 | | http://dr.rgf.bg.ac.rs/s/repo/item/0004863 Дигитални репозиторијум Ру ... Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry, and many research fields, terminology develops rapidly. Most often, the language that serves as the “lingua franca” for most of these fields is English. As a consequence, in many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ... ... Texts”. Natural Language Engineering Vol. 22, no. 4 (2016): 517–548 Koehn, Philipp, Franz Josef Och and Daniel Marcu. “Statistical Phrase-based Translation”. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology ...
... “debugger”, the transcribed version is adopted for everyday use in Information Technologies domain. It is a challenge to produce and maintain up-to-date terminology resources, especially for an under-resourced language, such as Serbian. Today, Serbian terminology is transferred mainly from English ...
... domain terms for the source language (Input ii) is (a) the source language part of LIS-dict including SWTs; (b) the output of the extractor Eng-TE applied to the source language part of the aligned input corpus; 3. The extraction of the set of MWTs in the target language by Serb-TE (Input iii) was ...Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
-
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech, and hate speech, has reached pandemic levels. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ... ... n for Computational Linguistics: Human Language Technologies, June 1–June 6, 2018, New Orleans, Louisiana, Vol. 1, 2018. 47 Michael Wiegand, Melanie Siegel, and Josef Ruppenhofer. Overview of the germeval 2018 shared task on the identification of offensive language. In Proceedings of GermEval 2018, ...
... information is a crucial component in human language technology, the FrAC module facilitates sharing and utilising this valued information [9], as presented in Listing 3. 4 Discussion and conclusion In this paper, we presented AbCoSER 1.0, the first corpus of abusive language in Serbian which consists of tweets ...
... as a language successfully, and thus the language column of a tweet could not be relied upon, the annotators were given one more task – to check the language of a tweet and whether it could be interpreted. They needed to mark tweets with meaningless content, tweets written in a foreign language or m ...Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ... ... Bulgarian were partially funded by the Bulgarian National Interdisciplinary Research e-Infrastructure for Resources and Technologies in favor of the Bulgarian Language and Cultural Heritage, part of the EU infrastructures CLARIN and DARIAH – CLaDA-BG, Grant number DO1-272/16.12.2019. This work ...
... - The Repository is available at: www.dr.rgf.bg.ac.rs Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3232–3242 Marseille, 11–16 May 2020 © European Language Resources Association (ELRA), licensed under CC-BY-NC 3232 A Multilingual Evaluation Dataset ...
... eu/MWSA. Keywords: lexical semantic resources, sense alignment, lexicography, language resource 1. Introduction Lexical semantic resources (LSRs) are knowledge repositories that provide the vocabulary of a language in a descriptive and structured way. One of the famous examples of LSRs are ... Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...... the 49th Annual Meeting of the ACL: Human Language Technologies: short papers – Volume 2. Association for Computational Linguistics, 564–568. [10] Matthieu Constant, Cvetana Krstev, and Duško Vitas. 2015. Hybrid Lexical Tagging in Serbian. In Proc. of 7th Language & Technology Conference. Fundacja U ...
... 1145/3136273.3136298 1 INTRODUCTION There are many different theories on what irony is and what role it plays in language understanding. According to [33] “Irony is . . . a uniquely human mode of communication, curious in that the speaker says something other than what he or she intends”. Likewise ...
... annotators were asked to decide whether the language of the tweet was recognized and whether the tweet represents an ironic statement.13 The results of the language tagging were used to estimate a binary language classifier (BCMS or not_BCMS). After the language classification we obtained a subset of 1 ... Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA: ACM (2017). https://doi.org/
-
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora. ... concerning the future work. 2 Corpora Numerous corpora are developed within research activities of Human Language Technology (HLT) Group at the University of Belgrade and the Language Resources and Technologies Society (JeRTeh): – monolingual general corpora: Corpus of Contemporary Serbian (versions SrpKor2003 ...
... Stanković, Miloš Utvić, Ivan Obradović and Božo Kolonja. “Managing Mining Project Documentation Using Human Language Technology”. The Electronic Library Vol. 36, no. 6 (2018): 993–1009. https://doi.org/10.1108/EL-11-2017-0239 Infotheca Vol. 19, No. 2, December 2019 117 Stanković R. and Utvić ...
... The main objectives are 1) to upgrade the existing web interfaces for searching through language resources and 2) to enable querying language resources supported with available lexical resources. The language resources to be searched are various digital libraries and corpora, but in this paper, we ...Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
-
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential, and synonymous) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analyzed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing package of open ... ... lemmatized and POS tagged corpus of textbooks (Stanković et al. 2020) is available on the corpus search platform of the Jerteh Society for Language Resources and Technologies https://noske.jerteh.rs/#dashboard?corpname=SkolKor. The search system for monolingual and multilingual corpora is based on the NoSketch ...
... Prace Filologiczne, vol. LXIII, Warszawa, 2012, 279–292. Acknowledgements This paper is supported by the COST Action CA19102 - LITHME “Language in the Human-Machine Era” https://lithme.eu/. Access to SketchEngine is provided by the ELEXIS project funded by the European Union’s Horizon 2020 research ...
... al., 2020). Barnbrook (2002) claimed that definition is a basic activity of language, of particular importance to linguists because of its use of language to describe itself. He described the subset of general language used in definition sentences and the development of a taxonomy of definition types ... Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Building Terminological Resources in an e-Learning Environment
... course Information technologies. For each concept separate Serbian and English entries were created. In line with the standard requirements for glossaries, besides the basic Serbian and English terms, each entry contained a short definition of the term in the respective language. However, no synonyms ...
... functionality within the information system, an UML (Unified Modeling Language) engineering model with a special structure has been developed, whose main features are depicted in Figure 2. Assuming basic familiarity with this language we will briefly comment this model. The class Rečnik in the model ...
... synonyms of the basic term, its available translational equivalent in the chosen language, and the inflectional forms of the Serbian term and its synonyms. Namely, as Serbian is a morphologically very rich language, there was a need to provide for all inflectional forms of terms, as they can be ...Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
-
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present a Serbian literary corpus being developed under the umbrella of the COST Action “Distant Reading for European Literary History” (CA16204). Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, with a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. This model was further evaluated on a separate evaluation dataset. We conclude with a comparison ... ... Literary Entities. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 1, pages 2138–2144. Niels Dekker, Tobias Kuhn, and Marieke van Erp. 2019. Evaluating Named Entity Recognition Tools for Extracting ...
... Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić | Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications | 2021 | | 10.26615/978-954-452-072-4_141 http://dr.rgf.bg.ac.rs/s/repo/item/0005139 ...
... contained 14 language sub-collections each with at least 50 novels, while 8 collections contained targeted 100 novels per language. The SrpELTeC corpus in the latest ELTeC release has 90 novels. The work on this collection ... [Footnotes: 5. spaCy, https://spacy.io/ 6. ELTeC (Distant Reading for European Literary Hi ...] Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ... ... Minimally-supervised Extraction of Entities from Text Advertisements. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 73–81. Pontus Stenetorp, Sampo Pyysalo ...
... Task: Language-independent Named Entity Recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. 2002. Extended Named Entity Hierarchy. In Proceedings of the Third International Conference on Language Resources ...
... Entity Recognition Systems for Serbian - The Case of Personal Names | Branislava Šandrih, Cvetana Krstev, Ranka Stanković | Proceedings - Natural Language Processing in a Deep Learning World | 2019 | | 10.26615/978-954-452-056-4_122 http://dr.rgf.bg.ac.rs/s/repo/item/0005243 Дигитални репозиторијум ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Advantages and challenges in presenting mathematical content using EDX platform
... development of terminological dictionaries in various fields. The realization of the application was based on the ASP.NET Framework for C# programming language and MVC design pattern, as well as HTML and JavaScript, whereas SQL Server served as support for the database. The application is located at h ...
... bases are needed. According to [11] there is a great difference between natural languages and mathematical terms. For instance, in Serbian natural language the word “prava” is an adjective but within mathematical terms in Serbian it is a noun. Thus, there is a need for developing a Semantic, Mu ...
... Università degli Studi della Basilicata, roberto.linzalone@unibas.it Abstract: In recent years, rapid improvement of educational and internet technologies has contributed to faster development of Open Educational Resources. OERs have had significant impact on lifelong learning and on the availability ...Marija Radojičić, Ivan Obradović, Ranka Stanković, Olivera Kitanović, Roberto Linzalone. "Advantages and challenges in presenting mathematical content using EDX platform" in The Seventh International Conference on e-Learning (eLearning-2016), Belgrade : Metropolitan University (2016)
-
Simulation of Hydrogeological Environmental Discharge in Case of Interruption Constant Observations
Marina Čokorilo Ilić, Dragoljub Bajić, Miroslav Popović. "Simulation of Hydrogeological Environmental Discharge in Case of Interruption Constant Observations" in International Scientific Conference - Sinteza 2024, Belgrade, 16. maj 2024, Singidunum University (2024). https://doi.org/10.15308/Sinteza-2024-288-294
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ... ... are not taken into consideration. This can partially solve the problem of the rich morphology that characterizes Serbian, as a language belonging to the South-Slavic Language family. For instance, scanning with lignit ‘lignite’ will also retrieve inflected forms lignita, lignitu, lignitom, etc. Search ...
... Heidelberg (1989), http://dx.doi.org/10.1007/3-540-51465-1 3 3. Hiemstra, D.: Using language models for information retrieval. Taaluitgeverij Neslia Paniculata (2001) 4. Jackson, P., Moulinier, I.: Natural language processing for online applications: Text retrieval, extraction and categorization, vol ...
... assigning of surrogates is usually done by extracting and selecting terms (words) that appear in the text of documents. To that end, many natural language processing (NLP) methods and techniques are used: determining the boundaries of sentences, tokenization, stemming, tagging, recognition of nominal ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Malathion-induced inhibition of human plasma cholinesterase studied by the fluorescence spectroscopy method
The in vitro effect of technical grade malathion was assessed via the kinetic parameters of human plasma butyrylcholinesterase (BChE) using N-methylindoxyl acetate as a substrate for BChE. An inhibitor kinetics study demonstrated the existence of a biphasic inhibition curve, indicating high- and low-affinity binding sites of malathion. The IC50 values as calculated from the experimental inhibition curves were 1.33 × 10⁻⁹ and 1.48 × 10⁻⁵ M for the high- and low-affinity binding sites, respectively; Hill’s analysis gave 1.29 × ... V. M. Pavelkić, K. S. Krinulović, J. Z. Savić, M. A. Ilić. "Malathion-induced inhibition of human plasma cholinesterase studied by the fluorescence spectroscopy method" in Russian Journal of Physical Chemistry A, Pleiades Publishing Ltd (2008). https://doi.org/10.1134/S0036024408050312
-
Testing the energy value of different types of coal by the method of active thermography
In this paper, coal thermograms are presented and analyzed in order to determine their energy value. Two types of coal of different categories, brown and lignite, were selected for active thermographic imaging. The tested coal samples were processed before measurement so that they are similar in dimensions and have two plane-parallel smooth surfaces. The test samples were "primarily" heated under the same conditions and the process of their cooling was monitored by a thermal camera. "Then" they were cooled ...Stevan Đenadić, Ljubiša Tomić, Vesna Damnjanović, Predrag Jovančić, Dragutin Jovković. "Testing the energy value of different types of coal by the method of active thermography" in 9th International Scientific Conference on Defensive Technologies, Belgrade, Serbia, 15-16 October 2020, The Military Technical Institute (2020)
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion ... Belgrade, 2008. [5] A. Savary, “Computational Inflection of Multi-Word Units — A Contrastive Study of Lexical Approaches,” Linguistic Issues in Language Technologies, vol. 1, no. 2, 2008. [6] C. Krstev and D. Vitas, “Finite State Transducers for Recognition and Generation of Compound Words,” in IS-LTC 2006 ...
... data. Finally, we discuss some further possible applications of our procedure and LeXimir in language processing tasks. I. INTRODUCTION MORPHOLOGICAL electronic dictionaries of Serbian for natural language processing (NLP) are being developed for many years now. Their development follows the m ...
... before, most of these conditions are satisfied for many languages. However, in order to apply this functionality to a new language it would be necessary to develop a new language-dependent strategy, that is, a new XML document. It is also worth mentioning that the system can be easily modified to work ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion ... Belgrade (2008) 3. Savary, A.: Computational Inflection of Multi-Word Units - A Contrastive Study of Lexical Approaches. Linguistic Issues in Language Technologies 1 (2008) 4. Krstev, C., Vitas, D.: Finite State Transducers for Recognition and Generation of Compound Words. In Erjavec, T., Žganec Gros ...
... In: 6th LREC, Marrakech, Morocco (2008) 7. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT Press (2001) 8. Laporte, E.: Lexicons and Grammars for Language Processing: Industrial or Hand-crafted Products? In Rezende, L.M., da Silva, B.C.D., Barbosa, J.B., eds ...
... multi-word units | Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić | Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010 | 2010 | | 10.1007/978-3-642-14770-8_26 ... Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ... ... (corpora and e-dictionaries), as well as applications for basic language processing (tokenization, Part-Of-Speech (POS) tagging, morphological analysis), information retrieval and extraction [26]. Several successful applications of Serbian language resources and tools in tasks related to document indexing ...
... entropy modeling. Zečević and Vujičić-Stanković [27] apply various language-identification tools to distinguish Serbian among other closely related languages. In this paper we describe an application of lexical resources and language tools for solving a big data problem, namely improvement of document ...
... 69–96 (2011) 17. Milosevic, N.: Stemmer for Serbian language. CoRR abs/1209.4471 (2012). http://arxiv.org/abs/1209.4471 18. Mladenović, M., Mitrović, J., Krstev, C., Vitas, D.: Hybrid sentiment analysis framework for a morphologically rich language. J. Intell. Inf. Syst. 1–22, to appear 19. Nadeau ... Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8