Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...
... extracting and selecting terms (words) that appear in the text of documents. To that end, many natural language processing (NLP) methods and techniques are used: determining the boundaries of sentences, tokenization, stemming, tagging, recognition of nominal phrases and named entities and, finally, parsing ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
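The pre-indexing idea in the abstract above can be illustrated with a minimal sketch. It assumes each record is a plain dictionary keyed by the metadata fields the abstract names; the function name, the default field selection and the regex tokenizer are hypothetical. The paper relies on lexical resources and Named Entity Recognition, neither of which is reproduced here.

```python
import re
from collections import Counter

# Assumption: a run of Unicode letters counts as one token (no lemmatization).
TOKEN = re.compile(r"[^\W\d_]+")


def bag_of_words(record, fields=("title", "keywords", "abstract"), stopwords=frozenset()):
    """Count lower-cased tokens across the chosen metadata fields of one record."""
    bag = Counter()
    for field in fields:
        for token in TOKEN.findall(record.get(field, "").lower()):
            if token not in stopwords:
                bag[token] += 1
    return bag


record = {"title": "Geološka karta Srbije", "abstract": "Karta rudnih ležišta Srbije"}
print(bag_of_words(record).most_common(2))
```

Because Serbian is highly inflected, counting surface forms like this scatters one lemma over many keys, which is presumably why the described approach leans on morphological dictionaries instead.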
The Many Faces of SrpKor
The acronym SrpKor denotes a family of electronic corpora of contemporary Serbian whose construction began in the late 1970s and which became more widely visible to the interested research community when its first version was published on the web in 2002. During this long period, especially before useful textual resources appeared on the web, corpus development consisted of collecting and processing material as well as developing corpus processing methods. Namely, an electronic corpus is not merely a collection of texts in digital form (as stated, for example, ...
Duško Vitas, Ranka Stanković, Cvetana Krstev. "The Many Faces of SrpKor" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Creation of a Training Dataset for Question-Answering Models in Serbian
The development and application of artificial intelligence in language technologies have advanced significantly in recent years, especially for the task of question answering (QA). While existing resources for QA tasks have been developed for the major world languages, Serbian has been relatively neglected in this area. This paper presents an initiative to create a large and diverse dataset for training question-answering models for Serbian, which will contribute to the advancement of language technologies for Serbian. In addition to numerous studies on language models ...
artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering
Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
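Речник САНУ као база терминолошких речника (на примеру речника кулинарства) [The SANU Dictionary as a Basis for Terminological Dictionaries (Using the Example of a Culinary Dictionary)]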
Рада Стијовић, Олга Сабо, Ранка Станковић. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" in Словенска терминологија данас, Београд : Српска академија наука и уметности (2017)
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...
raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data
... using human language technology [31] and used within this research in the web and mobile applications. 2.3. General Purpose Morphological Dictionaries Serbian has an extensive system of inflection and a complex agreement system that makes extraction of terminology more complicated, and thus the use ...
... the corpora and in the dictionaries. Finally, candidates are harmonised and assembled to the microstructure of the lexical database Termi, which consists of a headword, synonyms, abbreviations, definition, for each language, bibliographic source and possibility to include illustration and other external ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
New Language Models for South Slavic Languages
Mihailo Škorić (2024)
The talk will present the challenges and perspectives of modeling South Slavic languages, with a special focus on general language models built on the transformer architecture (BERT, GPT), on the text collections available for training those models, and on the quantity and quality of those collections. The talk will offer an overview of available datasets and models, with special attention devoted to the newest text corpora. The first corpus, Kišobran, is an umbrella web corpus of South Slavic languages and currently the largest text corpus in the region, numbering over eighteen billion words and including all ...
Mihailo Škorić. "New Language Models for South Slavic Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
... strings and with the help of seven regular expressions that find their variations, and without prior knowledge of the language or its grammar, software formed a database of terms and their values on a positive-negative scale. The system has been tested by several independent evaluators and based on ...
... use in their messages (in the form of emoticons or language-universal phrases) and assigning values of sentiment polarity to terms in which those determiners are located. As the determiners are language-independent, the system would be language-independent as well. If it turns out to be valid, this ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
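The snippets above describe scoring terms by the emoticons that accompany them. A minimal sketch of that idea follows. The two patterns are hypothetical stand-ins, since the paper's seven regular expressions are not given in this listing, and averaging per-message signs is an assumed scoring rule, not the paper's.

```python
import re
from collections import defaultdict

# Hypothetical emoticon patterns (assumption, not the paper's expressions).
POSITIVE = re.compile(r"[:;=]-?[)D\]]")
NEGATIVE = re.compile(r"[:=]-?[(\[]|:'\(")
WORD = re.compile(r"[^\W\d_]+")


def term_polarity(messages):
    """Average, per term, the emoticon-derived sign of the messages it occurs in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for message in messages:
        score = len(POSITIVE.findall(message)) - len(NEGATIVE.findall(message))
        if score == 0:
            continue  # no determiner, or positive and negative ones cancel out
        sign = 1.0 if score > 0 else -1.0
        # Remove emoticons first so that the "D" of ":D" is not read as a term.
        text = NEGATIVE.sub(" ", POSITIVE.sub(" ", message))
        for term in set(WORD.findall(text.lower())):
            totals[term] += sign
            counts[term] += 1
    return {term: totals[term] / counts[term] for term in totals}


print(term_polarity(["super film :)", "super dan :D", "los film :("]))
```

No dictionary or grammar is consulted, which matches the language-independence argument in the snippet: only the emoticon patterns carry polarity.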
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
... the University of Belgrade Language Technology Group [10] to support computational linguists in developing, maintaining and exploiting e-dictionaries. LeXimir is written in C#, and operates on the .NET platform. It can run on any personal computer under Windows and supports simultaneous manipulation ...
... LeXimir in language processing tasks. I. INTRODUCTION Morphological electronic dictionaries of Serbian for natural language processing (NLP) have been under development for many years now. Their development follows the methodology and format (known as DELAS/DELAF) presented for French in [1]. E-dictionaries ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom
Lidija Beko, Ivan Obradović, Ranka Stanković. "Developing Students' Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom" in Proceedings of the Second International Conference on Teaching English for Specific Purposes and New Language Learning Technologies, May, 22-24, 2015, Niš, Serbia, Faculty of Electronic Engineering, University of Niš, Niš : Faculty of Electronic Engineering (2015)
Softverski alati za korišćenje resursa za srpski jezik [Software Tools for Using Resources for the Serbian Language]
... research and scientific institutions were engaged in the BalkaNet project, mainly from countries where the BalkaNet languages are spoken, but also from France and the Netherlands. A national development team was formed for each language, and in the case of Serbian this team was the Human Language Technology ...
... manipulation of lexical resources (for the time being only the update and search of SWN is envisaged), and also offer information of the Human Language Technology Group and the developed software for lexical resources, namely WS4LR and WS4QE. Figure 10. The main menu of the WS4QE web application The ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
A TEL Platform Blending Academic and Entrepreneurial Knowledge
... development, use, reuse and delivery of learning content, including content and learning management systems, as well as content development tools; • Learning content - learning resources, both academic and entrepreneurial, and reference resources, where language resources hold the most ...
... BAEKTEL language support system consists of several software components handling simultaneously several types of language resources: grammars, lexical and textual resources (Fig 2). One of the basic lexical resources is the system of morphological dictionaries of Serbian simple words and compounds ...
... mentioned that due to the complex Serbian grammar the language support system also features grammars implemented through finite state automata, finite state transducers and compound inflection rules. The language resources in the BAEKTEL language support system are managed by a web application ...
Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A TEL Platform Blending Academic and Entrepreneurial Knowledge" in Proceedings of the Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
LRMI markup of OER content within the BAEKTEL project
... a markup language, where its simplicity makes it a useful and easily implemented convention for tagging content. Key benefits of this approach are expanded access to descriptive data on educational resources, pooling knowledge about learning resources and providing tools and services to ...
... patterns in text, finite automata, transducers, electronic dictionaries, cascades and multi-word units. about: Unitex about: Computational linguistics about: Natural language processing about: Računarska lingvistika about: Obrada tekstova na prirodnom jeziku about: elektronski rečnici about: analiza teksta ...
... physical and virtual communities, where more and more are published as Open Educational Resources (OER). Nowadays, online searches for educational materials are conducted more often than a few years ago. Although Internet searches are easy and speedy, they are not always efficient and retrieve ...
Ranka Stanković, Daniela Carlucci, Olivera Kitanović, Nikola Vulović, Bojan Zlatić. "LRMI markup of OER content within the BAEKTEL project" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
Infotheca (Q25460443) in Wikidata
Ranka Stanković, Lazar Davidović (2021)
Wikidata is the knowledge base of the Wikimedia Foundation, a shared source of various kinds of data used not only by other Wikipedia projects but, increasingly, by numerous semantic web applications. In this paper we present an example of integrating Wikidata with digital libraries and external systems, as well as the possibility of speeding up data preparation and entry, using the example of papers from the digital humanities journal Infotheca.
... continued activity where special attention will be paid to linked open data in the domain of linguistics – LLOD and its application. We must certainly be aware of the problems and limitations related to Wikidata and other kinds of linked open data, so as to be able to look into the ways of overcoming or ...
... the semantic web. The concept of the semantic web and open linked data technologies expand the traditional web by using a standard markup language and similar processing tools, where RDF (Resource Description Framework) plays a significant role and makes more efficient information retrieval solutions ...
... fact or a piece of data about the item. Table 1 shows several examples of natural language sentences and the encoding of this information in Wikipedia, represented as triples of subject, predicate and object (left), and in shortened notation (right). In the above example, the second column of the table ...
Ranka Stanković, Lazar Davidović. "Infotheca (Q25460443) in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.5
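The subject-predicate-object encoding and its shortened notation, mentioned in the last snippet above, can be sketched as follows. This is an illustration only: the paper's Table 1 is not available in this listing, so the single statement used here (the item's label) is an assumed example, and the `Triple` and `shorten` names are hypothetical. The language tag a real label literal would carry is omitted for brevity.

```python
from typing import NamedTuple

# Namespace prefixes commonly used for Wikidata entities and RDF Schema terms.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def shorten(triple, prefixes=PREFIXES):
    """Render a triple in prefixed notation; terms without a known prefix stay as-is."""
    def compact(term):
        for prefix, base in prefixes.items():
            if term.startswith(base):
                return f"{prefix}:{term[len(base):]}"
        return term
    return " ".join(compact(term) for term in triple)


# Assumed example statement: the item named in the title carries the label "Infotheca".
statement = Triple(
    "http://www.wikidata.org/entity/Q25460443",
    "http://www.w3.org/2000/01/rdf-schema#label",
    '"Infotheca"',
)
print(shorten(statement))
```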
Ontološki model upravljanja rizikom u rudarstvu [An Ontological Model of Risk Management in Mining]
Olivera Kitanović (2021)
Mining production involves complex technological systems, which creates the need to establish and improve risk management systems. The heterogeneity and volume of the data required for risk management call for a system that integrates them flexibly and enables their optimal use. The main goal of this dissertation is the development of an ontology for the mining domain and of a risk management model based on it. Its realization also entails implementing information extraction algorithms for populating the ontology, as well as an appropriate software solution. The development of the model also includes a significant expansion of the mining corpus, as ...
mining, risk, risk management, risk assessment, ontology, semantic network, information extraction, knowledge management, computational linguistics
... Ranka Stanković, and Duško Vitas. 2018. “Knowledge and Rule-Based Diacritic Restoration in Serbian.” In Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), edited by Svetla Koeva, 41–51. Sofia, Bulgaria: Institute for Bulgarian Language “Prof. Lyubomir ...
Olivera Kitanović. Ontološki model upravljanja rizikom u rudarstvu, Beograd : [O. Kitanović], 2021
Речници у дигиталном добу - информатичка подршка за српски језик [Dictionaries in the Digital Age: IT Support for the Serbian Language]
Биљана Рујевић (2022)
Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were kept as files whose number kept growing, which made managing the dictionaries harder, a need arose to store the dictionary information in a lexicographic database. To enable several users to work on dictionary development simultaneously, a need arose for a web application based on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
Чији је пример? Анализа лексичких обележја на примерима Речника САНУ [Whose Example Is It? An Analysis of Lexical Features in Examples from the SANU Dictionary]
... Academy of Sciences and Arts. Each dictionary example is documented with its author, so we decided to examine only examples that originate from twelve great names in the domestic literature. For each author’s example, we extracted different lexical features, and then we visualized and compared these results ...
Бранислава Б. Шандрих, Ранка М. Станковић, Мирјана С. Гочанин. "Чији је пример? Анализа лексичких обележја на примерима Речника САНУ" in Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch13
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... that is, aligned parallel texts, between tools and translation vendors [TMX, 2005]. TMX uses ISO standards for country and language codes, as well as for date and time. A TMX document consists of a header (metadata describing the aligned texts) and a body, containing a set of translation units ...
... metadata All metadata, except language independent data, such as the numeration metadata (, , , , ), the and , are entered in both languages (Serbian and English), using the attribute xml:lang to denote the language of the content (see Figure 2) ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
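The TMX layout described above (a header followed by a body of translation units, with `xml:lang` marking each language variant) can be read with Python's standard library. The sketch below uses a made-up two-language sample rather than a document from Bibliša; the header attribute values and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample (assumption): one translation unit aligned in Serbian and English.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="sr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="sr"><seg>elektronski rečnik</seg></tuv>
      <tuv xml:lang="en"><seg>electronic dictionary</seg></tuv>
    </tu>
  </body>
</tmx>"""


def translation_units(tmx_text):
    """Return one {language: segment text} dictionary per <tu> element."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


print(translation_units(SAMPLE_TMX))
```

Keying each unit by language code makes it straightforward to match a query in one language and return the aligned segment in the other.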
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions, which are usually Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization, and production of entries in morphological electronic dictionaries, both for MWUs and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables a representation that reflects the complex relations between acronyms and their definitions.
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
Parallel Bidirectionally Pretrained Taggers as Feature Generators
In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Speech tagging using multiple standalone annotation systems as feature generators for a stacked classifier. It also explores automatic resource expansion via dataset augmentation and bidirectional training in order to increase the number of taggers and to maximize the impact of the composite system, which ...
Ranka Stanković, Mihailo Škorić, Branislava Šandrih Todorović. "Parallel Bidirectionally Pretrained Taggers as Feature Generators" in Applied Sciences, MDPI AG (2022). https://doi.org/10.3390/app12105028
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014)
... thematic content, and entrepreneurial: case studies, best practice examples, expert presentations and software demonstrations. Language resources supporting the multilinguality of the platform, terminology and its search and browse functions are lexical and textual resources and grammars. Impl ...
... materials, video lectures, thematic content and the like, supported by evaluation tools, and entrepreneurial, such as case studies, best practice examples, expert presentations and software demonstrations; • Language resources – lexical and textual resources and grammars to support the multilinguality ...
... ach – The TEL platform consists of tools and resources: learning, language and implementation resources. Among the tools, some are available open source or commercial tools, and some are in-house tools developed by the University of Belgrade Human Language Technology Group. Learning resources are both ...
Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2013, Matera, Italy, Bari : IFKAD (2014)