Keyword Extraction from Parallel Abstracts of Scientific Publications
... this study are better for the Serbian than for the English language (see Table 2). This is in line with our previous findings for the Croatian language [8]. Both Serbian and Croatian are morphologically rich, closely related languages from the South Slavic language family. Unlike English, which ...
... suffixes from words). For preprocessing of texts in the Serbian language we use: (1) Stop-word list - prepared at the Human Language Technology Group at the University of Belgrade [30], and (2) a Serbian lemmatizer. For lemmatization, we use Serbian morphological electronic dictionaries and grammars ...
... the ontology for the domain of geology and mining in the Serbian language. Finally, it is worth mentioning that the different structure and syntax of the Serbian and English languages are reflected in the results. By combining (translating) Serbian and English keywords, a larger set of keywords can be ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...
... unification of language script, as the usage of Cyrillic and Latin scripts in Serbian is equal. All tweets were converted into Latin script. The second problem was the classification of tweets according to language. Although our principal aim was to obtain a collection of tweets in Serbian, due to the ...
... two collections of tweets that had been manually annotated according to irony. These collections of tweets as well as the used language resources are in the Serbian language (or one of closely related languages – Bosnian/Croatian/Montenegrin). The best accuracy of the developed classifier was achieved ...
... task. A language classifier was built and assessed in the following way (step 1 in Fig 1). First we manually marked each tweet with a (BCMS) or (not_BCMS) mark. After that we used Serbian Morphological Electronic Dictionaries [22] to automatically tag each word with a mark of belonging to a language _word ...
Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA : ACM (2017). https://doi.org/
The Nooj System as Module within an Integrated Language Processing Environment
... integrated functions and resources enable queries to be posed in one language and bitext to be searched in the same or other language. For example, if a query consists of Serbian word ‘kompjuter’, it can be expanded by Serbian wordnet to ‘računar, kompjuter’, and then transformed by ILI (interlingual ...
... Although it has so far been used mainly for Serbian, WS4LR is not language dependent and can be successfully used for resources in other languages provided that they follow the described formats and methodologies. The integration of NooJ with other language resources was aimed in the first place ...
... such as Serbian, where two alphabets, Cyrillic and Latin are equally used. WS4LR enables the exploitation of language resources both in Cyrillic and Latin alphabet, as well as in a special encoding, that uses the ASCII character set and that can be unambiguously transformed into Serbian Latin or ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
Managing mining project documentation using human language technology
Purpose: This paper aims to develop a system, which would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...
Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation
Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
Building learning capacity by blending different sources of knowledge
... such as Serbian. In order to support the multilinguality of the BMP, the language support system can also expand a query formulated in one language to another language, e.g. a query in Serbian to English or Russian, and vice-versa. With all the aforementioned features the language support ...
... coupled with language specific morphological dictionaries. Morphological dictionaries of Serbian simple words and compounds in the so-called LADL format (Krstev et al., 2010) are thus a necessary part of the lexical resources used by the BMP language support system. Besides Serbian, such resources ...
... Figure 3: The BMP language support system Finally, the language support system features domain specific terminological resources such as GeolISS term and RudOnto (Stanković et al., 2012). GeolISS is a thesaurus of geological terms with entries in Serbian and English, developed at ...
Ivan Obradović, Ranka Stanković, Olivera Kitanović, Dalibor Vorkapić. "Building learning capacity by blending different sources of knowledge" in International Journal of Learning and Intellectual Capital (2016). https://doi.org/10.1504/IJLIC.2016.075698
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... such as Serbian, where two alphabets, Cyrillic and Latin are equally used. WS4LR enables the exploitation of language resources both in Cyrillic and Latin alphabet, as well as in a special encoding, that uses the ASCII character set and that can be unambiguously transformed into Serbian Latin ...
... tool that integrates diverse language resources and is thus more powerful than the majority of other wordnet tools. The desktop version of WS4LR is fully operational and is already being used as the main tool for developing resources in Serbian, including the Serbian wordnet, but its commercial ...
... provides a common semantic framework for all the languages, while language specific properties are maintained in the individual wordnets. BalkaNet, a project aimed at developing wordnets for Bulgarian, Greek, Romanian, Serbian and Turkish and expanding the Czech wordnet, followed an approach ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
A Tel Platform Blending Academic And Entrepreneurial Knowledge
... be mentioned that due to the complex Serbian grammar the language support system also features grammars implemented through finite state automata, finite state transducers and compound inflection rules. The language resources in the BAEKTEL language support system are managed by a web ...
... The BAEKTEL language support system consists of several software components handling simultaneously several types of language resources: grammars, lexical and textual resources (Fig 2). One of the basic lexical resources is the system of morphological dictionaries of Serbian simple words and ...
... the query morphologically, which is especially important for Serbian, due to its morphological richness. The query can also be expanded to another language thus supporting multilinguality within BAEKTEL. The BAEKTEL language support system is a very important part of the entire concept ...
Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A Tel Platform Blending Academic And Entrepreneurial Knowledge" in Proceedings of The Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
The paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization, and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data into RDF, and the incorporation of NIF annotations. The produced NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, the linking of Linked ...
parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...
... API. We used the LanguageApp service in Google Apps Script to create our own version of Language Translation API, which, unlike the official Google Language Translation API, produces text translated into Serbian in Latin script, instead of Cyrillic, and serializes it into a plain text file. An example ...
... approach to word sense alignment. TACL, 1:151–164. Mladenović, M. and Mitrović, J. (2014). Natural Language Processing for Serbian – Resources and Application, chapter Semantic Networks for Serbian: New Functionalities of Developing and Maintaining a WordNet Tool. University of Belgrade, Mathematical Faculty ...
... chancery which is converted, after translation by our Translation API based on Google Language Translation API, into: ENG30-08331011-n | sud koji je nadležan za pravičnost | kancelarija ; sudski ured If a Serbian translational equivalent for a PWN literal obtained by Google Translation is not of- ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution
This paper explores the effectiveness of parallel stylometric document embeddings in solving the authorship attribution task by testing a novel approach on literary texts in 7 different languages, totaling 7051 unique 10,000-token chunks from 700 PoS and lemma annotated documents. We used these documents to produce four document embedding models using Stylo R package (word-based, lemma-based, PoS-trigrams-based, and PoS-mask-based) and one document embedding model using mBERT for each of the seven languages. We created further derivations of these ...
Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder. "Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution" in Mathematics, MDPI AG (2022). https://doi.org/10.3390/math10050838
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we will present an approach to transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. ...
... frequent words in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić). Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information ...
... MultilingualLabels table was created with the idea of presenting meta-language that is used for description of labels, e.g. labels and its description could be described in Serbian, English, French, etc. Currently only Serbian language is in use. 4 Leximirka application 4.1 Interface Leximirka application ...
... represent a significant resource for Serbian language processing. The importance of this resource is in its multiple applications. Although Serbian morphological dictionaries (SMD) were initially developed for Unitex, which enables various complex queries with regular expressions or FSA, their main importance ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management, and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied on a web portal that gathers various texts originating from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the "Tara" and "Reiss" conferences, as well as some doctoral dissertations related to this research area. After text processing, a corpus containing over 5500 pages of plain text was created and ...
... depicted in Figure 3 on the left, while on the right are main application components of the language support system. Main lexical resources include morphological dictionaries for Serbian language, Serbian and English WordNets, terminological databases: Termi, GeolISSTerm, RudOnto and Librarian ...
... I. Obradović & D. Vitas Natural Language Processing for Serbian – Resources and Application, 1-11. Matematički fakultet, Beograd. 21 Mladenović, M., Mitrović, J., Krstev, C., & Vitas, D. (2015). Hybrid Sentiment Analysis Framework For A Morphologically Rich Language. Journal of Intelligent Information ...
... International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 28-30 May 2008, European Language Resources Association (ELRA), 2008 4. Duško Vitas, Cvetana Krstev, Ivan Obradović, Ljubomir Popović, Gordana Pavlović-Lažetić”,An Processing Serbian Written Texts: An Overview ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
An Approach to Development of Bilingual Lexical Resources
... one term can belong to one or more synsets, and that one synset can have several terms from the corresponding language. In its initial phase, Biblimir is conceived as a bilingual Serbian-English resource, but the model enables further expansion to other languages, such as French, German, etc. ...
... appeared in one language without its translational equivalent in the other. In such cases an appropriate entry to Biblimir was considered. In this section we will illustrate this process with several examples. Table 1: Overview of Available Resources E-dictionaries Serbian Wordnet Dictionary ...
... Multilingual textual repositories, such as digital libraries of e-journals represent a specific type of language resources. Efficient search of these resources usually relies on specific language tools, which often use other available resources, such as e-dictionaries, wordnets and the like. An ...
Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...
Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction
Velislava Stoykova, Ranka Stanković (2018)
Velislava Stoykova, Ranka Stanković. "Using Query Expansion for Cross-Lingual Mathematical Terminology Extraction" in Advances in Intelligent Systems and Computing, Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-91189-2_16
Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014)
... is especially important for morphologically rich languages such as Serbian. In order to support the multilinguality of the TEL platform, LSS can also expand the query in one language to another language, e.g. a query in Serbian to English or Russian, and vice-versa. With all the aforementioned ...
... use of. Specific features of Serbian grammar need corresponding language resources in the form of grammars. Grammars within LSS are implemented by the so called finite state automata, finite state transducers and compound inflection rules (Krstev, 2008). The language support system handles various ...
... TEL platform consists of tools and resources: learning, language and implementation resources. Among the tools some are available open source and commercial tools, some are in-house tools developed by the University of Belgrade Human Language Technology Group. Learning resources are both academic: ...
Ivan Obradović, Ranka Stanković. "Using technology for knowledge transfer between academia and enterprises" in Knowledge and Management Models for Sustainable Growth, Proc. of IFKAD 2014, 9th International Forum on Knowledge Asset Dynamics, 11-13 June 2013, Matera, Italy, Bari : IFKAD (2014)
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... Krstev, C., Obradović, I., Pavlović-Lažetić, G. and Stanojević, M. (2012). The Serbian Language in the Digital Age. Berlin; Springer-Verlag. 8. Language Resource References Vitas D., Utvić M. (2015). SrpKor22M, Serbian automatically lemmatized, PoS and morphosyntactically annotated corpus 22M ...
... of MWT extraction and lemmatization from Serbian texts we have chosen a rule-based approach, which relies on a system of language resources such as morphological e-dictionaries and grammars developed within the University of Belgrade Human Language Technology Group (Vitas et al., 2012). For ...
... compare term frequency in the domain corpus and the general language corpus, thus illustrating how specific the MWU is for the selected domain. As the general corpus we used a 22 million words excerpt from the Corpus of Contemporary Serbian (SrpKor – http://www.korpus.matf.bg.ac.rs). The computed ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC
OntoLex, the dominant community standard for machine-readable lexical resources in the context of RDF, Linked Data, and Semantic Web technologies, is currently being extended with a dedicated module for Frequency, Attestations, and Corpus-Based Information (OntoLex-FrAC). We propose a new component for OntoLex-FrAC that addresses the incorporation of corpus queries for (a) linking dictionaries with corpus engines, (b) enabling RDF-based web services to dynamically exchange corpus queries and response data, and (c) using conventional query languages to formalize the internal structure of collocations, word sketches, and ...
standardization, digital lexicography, OntoLex, corpus queries, linked data, Linguistic Linked Open Data
Christian Chiarcos, Ranka Stanković, Maxim Ionov, Gilles Sérasset. "Bridging Computational Lexicography and Corpus Linguistics: A Query Extension for OntoLex-FrAC" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, 20-25 May 2024, LREC (2024)
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating low- and medium-resourced European languages. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and their comparison with annotations in other languages. The first steps are discussed ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)