Combining Heterogeneous Lexical Resources
IDE), which allows them to share tools and facilitates the creation of mixed-language solutions. In addition, these languages leverage the functionality of the .NET Framework, which provides access to key technologies that simplify the development of ASP Web applications and XML Web services
the XML Schema definition language (XSD) enables the definition of the structure and data types of XML documents. Figure 1 shows the graphical representation of XSD schema of Serbian WN. The XML Path Language (XPath) provides a language for addressing parts of an XML document
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry, and many research fields, terminology develops rapidly. Most often, the "lingua franca" for the majority of these fields is English. As a consequence, in many fields the domain terms are coined in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ...
It is a challenge to produce and maintain up-to-date terminology resources, especially for an under-resourced language, such as Serbian. Today, Serbian terminology is transferred mainly from English
domain terms for the source language (Input ii) is (a) the source language part of LIS-dict including SWTs; (b) the output of the extractor Eng-TE applied to the source language part of the aligned input corpus; 3. The extraction of the set of MWTs in the target language by Serb-TE (Input iii) was
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...
Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade : Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech, and hate speech, has reached pandemic levels. A system able to detect such texts could help make the internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...
information is a crucial component in human language technology, the FrAC module facilitates sharing and utilising this valued information [9], as presented in Listing 3. 4 Discussion and conclusion In this paper, we presented AbCoSER 1.0, the first corpus of abusive language in Serbian which consists of tweets
as a language successfully, and thus the language column of a tweet could not be relied upon, the annotators were given one more task – to check the language of a tweet and whether it could be interpreted. They needed to mark tweets with meaningless content, tweets written in a foreign language or m
Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages
The Repository is available at: www.dr.rgf.bg.ac.rs. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 3232–3242, Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC.
Keywords: lexical semantic resources, sense alignment, lexicography, language resource 1. Introduction Lexical semantic resources (LSRs) are knowledge repositories that provide the vocabulary of a language in a descriptive and structured way. One of the famous examples of LSRs are
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
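The abstract above describes sense-level links annotated with relations such as broadness, narrowness, relatedness, and equivalence. As a rough illustration only, one such link could be modelled as below; the class names, field names, and sense identifiers are hypothetical and do not reflect the dataset's actual format.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relation types named in the abstract (the labels here are assumptions)."""
    EXACT = "exact"
    BROADER = "broader"
    NARROWER = "narrower"
    RELATED = "related"


@dataclass(frozen=True)
class SenseLink:
    """One alignment between a sense in resource A and a sense in resource B."""
    lemma: str
    sense_a: str  # hypothetical sense identifier in the first dictionary
    sense_b: str  # hypothetical sense identifier in the second dictionary
    relation: Relation

    def inverted(self) -> "SenseLink":
        """Return the same link read from B to A; broader and narrower swap."""
        swap = {
            Relation.BROADER: Relation.NARROWER,
            Relation.NARROWER: Relation.BROADER,
        }
        return SenseLink(
            self.lemma,
            self.sense_b,
            self.sense_a,
            swap.get(self.relation, self.relation),
        )


link = SenseLink("bank", "dictA:bank.1", "dictB:bank.3", Relation.BROADER)
print(link.inverted().relation.value)
```

Storing the direction explicitly keeps asymmetric relations (broader, narrower) unambiguous, while symmetric ones (exact, related) are unchanged by inversion.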
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language-dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet-based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... the 49th Annual Meeting of the ACL: Human Language Technologies: short papers – Volume 2. Association for Computational Linguistics, 564–568. [10] Matthieu Constant, Cvetana Krstev, and Duško Vitas. 2015. Hybrid Lexical Tagging in Serbian. In Proc. of 7th Language & Technology Conference. Fundacja U ...
There are many different theories on what irony is and what role it plays in language understanding. According to [33] "Irony is . . . a uniquely human mode of communication, curious in that the speaker says something other than what he or she intends".
annotators were asked to decide whether the language of the tweet was recognized and whether the tweet represents an ironic statement. The results of the language tagging were used to estimate a binary language classifier (BCMS or not_BCMS). After the language classification we obtained a subset of 1
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora.
... Stanković, Miloš Utvić, Ivan Obradović and Božo Kolonja. “Managing Mining Project Documentation Using Human Language Technology”. The Electronic Library Vol. 36, no. 6 (2018): 993–1009. https://doi.org/10.1108/EL-11-2017-0239 ...
The main objectives are 1) to upgrade the existing web interfaces for searching through language resources and 2) to enable querying language resources supported with available lexical resources. The language resources to be searched are various digital libraries and corpora, but in this paper, we
Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
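The query expansion described in the Vebran entry above can be illustrated with a minimal sketch. This is not the Vebran API: the function names, the toy lexicon, and the output query syntax are all assumptions made for illustration, and the word forms are hand-picked examples rather than data taken from the actual Serbian resources.

```python
from typing import Dict, List

# Hypothetical toy lexicon: lemma -> some inflected forms (illustrative only).
TOY_FORMS: Dict[str, List[str]] = {
    "rudnik": ["rudnik", "rudnika", "rudniku", "rudnikom", "rudnici"],
    "kop": ["kop", "kopa", "kopu", "kopom", "kopovi"],
}

# Hypothetical related-term table standing in for a WordNet lookup.
TOY_RELATED: Dict[str, List[str]] = {
    "rudnik": ["kop"],
}


def expand_query(term: str,
                 forms: Dict[str, List[str]],
                 related: Dict[str, List[str]]) -> List[str]:
    """Expand a lemma to its inflected forms plus those of related lemmas."""
    lemmas = [term] + related.get(term, [])
    seen = set()
    expanded: List[str] = []
    for lemma in lemmas:
        # Fall back to the bare lemma when the lexicon has no entry for it.
        for form in forms.get(lemma, [lemma]):
            if form not in seen:
                seen.add(form)
                expanded.append(form)
    return expanded


def to_or_query(word_forms: List[str]) -> str:
    """Join forms into a disjunction; the '|' syntax is an assumed convention."""
    return " | ".join(word_forms)


print(to_or_query(expand_query("rudnik", TOY_FORMS, TOY_RELATED)))
```

In a morphologically rich language, searching only for the form the user typed misses most occurrences, which is why expansion over inflected forms tends to improve recall.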
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential, and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing suite of open ...
... Prace Filologiczne, vol. LXIII, Warszawa, 2012, 279–292. Acknowledgements This paper is supported by the COST Action CA19102 - LITHME “Language in the Human-Machine Era” https://lithme.eu/. Access to SketchEngine is provided by the ELEXIS project funded by the European Union’s Horizon 2020 research ...
Barnbrook (2002) claimed that definition is a basic activity of language, of particular importance to linguists because of its use of language to describe itself. He described the subset of general language used in definition sentences and the development of a taxonomy of definition types
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
Building Terminological Resources in an e-Learning Environment
course Information technologies. For each concept separate Serbian and English entries were created. In line with the standard requirements for glossaries, besides the basic Serbian and English terms, each entry contained a short definition of the term in the respective language. However, no synonyms
functionality within the information system, a UML (Unified Modeling Language) engineering model with a special structure has been developed, whose main features are depicted in Figure 2. Assuming basic familiarity with this language, we will briefly comment on this model. The class Rečnik in the model
synonyms of the basic term, its available translational equivalent in the chosen language, and the inflectional forms of the Serbian term and its synonyms. Namely, as Serbian is a morphologically very rich language, there was a need to provide for all inflectional forms of terms, as they can be
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2012)
Development of terminological resources for expert knowledge: a case study in mining
Ljiljana Kolonja, Ranka Stanković, Ivan Obradović, Olivera Kitanović, Aleksandar Cvjetić. "Development of terminological resources for expert knowledge: a case study in mining" in Knowledge Management Research & Practice, Palgrave Macmillan (2015). https://doi.org/10.1057/kmrp.2015.10
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present the Serbian literary corpus being developed under the umbrella of the COST Action "Distant Reading for European Literary History" CA16204. Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities with a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. This model was further assessed on a separate evaluation dataset. We conclude with a comparison ...
contained 14 5spaCy, https://spacy.io/ 6ELTeC (Distant Reading for European Literary History) had 8 collections contained targeted 100 novels per language. The SrpELTeC corpus7 in the latest ELTeC release has 90 novels. The work on this collection
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...
... Minimally-supervised Extraction of Entities from Text Advertisements. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pages 73–81. Pontus Stenetorp, Sampo Pyysalo ...
... Task: Language-independent Named Entity Recognition. In COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002). Satoshi Sekine, Kiyoshi Sudo, and Chikashi Nobata. 2002. Extended Named Entity Hierarchy. In Proceedings of the Third International Conference on Language Resources ...
Advantages and challenges in presenting mathematical content using EDX platform
development of terminological dictionaries in various fields. The realization of the application was based on the ASP.NET Framework for C# programming language and MVC design pattern, as well as HTML and JavaScript, whereas SQL Server served as support for the database. The application is located at h
bases are needed. According to [11] there is a great difference between natural languages and mathematical terms. For instance, in Serbian natural language the word "prava" is an adjective but within mathematical terms in Serbian it is a noun. Thus, there is a need for developing a Semantic, Mu
In recent years, rapid improvement of educational and internet technologies has contributed to faster development of Open Educational Resources. OERs have had significant impact on lifelong learning and on the availability
Marija Radojičić, Ivan Obradović, Ranka Stanković, Olivera Kitanović, Roberto Linzalone. "Advantages and challenges in presenting mathematical content using EDX platform" in The Seventh International Conference on e-Learning (eLearning-2016), Belgrade : Metropolitan University (2016)
Simulation of Hydrogeological Environmental Discharge in Case of Interruption Constant Observations
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these
... Heidelberg (1989), http://dx.doi.org/10.1007/3-540-51465-1 3 3. Hiemstra, D.: Using language models for information retrieval. Taaluitgeverij Neslia Paniculata (2001) 4. Jackson, P., Moulinier, I.: Natural language processing for online applications: Text retrieval, extraction and categorization, vol ...
assigning of surrogates is usually done by extracting and selecting terms (words) that appear in the text of documents. To that end, many natural language processing (NLP) methods and techniques are used: determining the boundaries of sentences, tokenization, stemming, tagging, recognition of nominal
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
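The bag-of-words pre-indexing of metadata fields mentioned in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the field names, sample records, and AND-style matching are assumptions, and the lemmatization and Named Entity Recognition steps that the paper relies on are deliberately left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (Unicode-aware in Python 3)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[str, Dict[str, str]],
                fields: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each token found in the chosen metadata fields to document ids."""
    field_list = list(fields)
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, metadata in docs.items():
        for field in field_list:
            for token in tokenize(metadata.get(field, "")):
                index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample records; real entries would carry the fields named above.
docs = {
    "p1": {"title": "Geological survey of copper deposits", "keywords": "copper, ore"},
    "p2": {"title": "Hydrogeological study", "keywords": "groundwater"},
}
index = build_index(docs, ["title", "keywords"])
print(sorted(search(index, "copper ore")))
```

Exact token matching like this misses inflected variants, which is the gap that lexical resources are meant to close for Serbian.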
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
... Belgrade, 2008. [5] A. Savary, “Computational Inflection of Multi-Word Units — A Contrastive Study of Lexical Approaches,” Linguistic Issues in Language Technologies, vol. 1, no. 2, 2008. [6] C. Krstev and D. Vitas, “Finite State Transducers for Recognition and Generation of Compound Words,” in IS-LTC 2006 ...
data. Finally, we discuss some further possible applications of our procedure and LeXimir in language processing tasks. I. INTRODUCTION Morphological electronic dictionaries of Serbian for natural language processing (NLP) have been under development for many years now. Their development follows the m
before, most of these conditions are satisfied for many languages. However, in order to apply this functionality to a new language it
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion
... Belgrade (2008) 3. Savary, A.: Computational Inflection of Multi-Word Units - A Contrastive Study of Lexical Approaches. Linguistic Issues in Language Technologies 1 (2008) 4. Krstev, C., Vitas, D.: Finite State Transducers for Recognition and Generation of Compound Words. In Erjavec, T., Žganec Gros ...
... In: 6th LREC, Marrakech, Morocco (2008) 7. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT Press (2001) 8. Laporte, E.: Lexicons and Grammars for Language Processing: Industrial or Handcrafted Products? In Rezende, L.M., da Silva, B.C.D., Barbosa, J.B., eds ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...
... (corpora and e-dictionaries), as well as applications for basic language processing (tokenization, Part-Of-Speech (POS) tagging, morphological analysis), information retrieval and extraction [26]. Several successful applications of Serbian language resources and tools in tasks related to document indexing ...
... entropy modeling. Zečević and Vujičić-Stanković [27] apply various language-identification tools to distinguish Serbian among other closely related languages. In this paper we describe an application of lexical resources and language tools for solving a big data problem, namely improvement of document ...
... 69–96 (2011) 17. Milosevic, N.: Stemmer for Serbian language. CoRR abs/1209.4471 (2012). http://arxiv.org/abs/1209.4471 18. Mladenović, M., Mitrović, J., Krstev, C., Vitas, D.: Hybrid sentiment analysis framework for a morphologically rich language. J. Intell. Inf. Syst. 1–22, to appear 19. Nadeau ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8