41 items
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... well. This example leads us to possible applications related to inflection of free noun phrases based on the recognition of their syntactic structure. This idea draws from the assumption that many free noun phrases (used in search queries, for example) may have the same syntactic structure as a MWU ...
... applied to inflect free noun phrases as well. For example, in the phrase kućni aparati prošlogodišnje proizvodnje ‘home appliances of last year’s production’ our procedure would recognize a structure that is inflected according to the AXN4X1 pattern - adjective+noun that do not inflect in number ...
... Adjective/noun (both inflect and agree in gender, number and case) • Noun/noun (both inflect and agree in number and case) • Noun/noun in the genitive (only the first noun inflects) • Word/noun (only the second noun inflects; the first word is usually not a Serbian simple word) • Noun/adjective ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... present how the same procedure is used for other languages. Key words: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion 1 Introduction We have been developing morphological electronic dictionaries of Serbian for natural language processing for many ...
... military service’ is a multi-word noun that inherits its gender from the constituent noun rok (masculine in this case), and it inflects for case, but it does not inflect for number (although the simple word rok does). The adjectives civilni and vojni agree with the noun rok in number, case, gender and ...
... illustrated in Table 2 by six inflectional transducers all belonging to one super-class nxn and used for the inflection of MWUs consisting of a noun followed by another noun, where both nouns inflect and must agree in basic grammatical categories. It should be noted that MWUs sharing the same inflectional class ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases
Stanković Ranka (2008) Stanković Ranka. "Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases" in POLIBITS, Research journal on Computer science and computer engineering with applications, Special section: Natural Language Processing, Journal of Research and Development in Computer Science and Engineering, ed. Grigori Sidorov no. 37, Mexico City, Mexico: Center for Technological Design and Development in Computer Science (CIDETEC) of the National Polytechnic Institute (IPN) (2008): 14-20
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021) The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data... follow one of the 23 specific syntactic patterns, most frequent for noun terms (AN adjective-noun, NNg noun-noun in genitive case, AAN, . . . ). The first step in this task is to recognise and extract Serbian terminological phrases from the corpus using syntactic patterns, and calculate their frequency ...
... for single terms, we have also used a heuristic for evaluating terminological phrases based on the following observations. The last noun in English noun compounds, which represent the majority of English terminological phrases, as a rule, is the head word carrying the basic meaning, while the preceding ...
... N-N (10%), NXN—a noun followed by a noun that agrees with it in number and case, where the separator can be a hyphen; examples are ‘gas-lift’ (197), ‘blok dijagram’ (block diagram—192), ‘bager vedričar’ (bucket excavator—174). This class had the largest number of recognized phrases for rejection, that ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assist in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%).... c representation of RuThes-lite 2.0 text entries, that is part-of-speech labels for single words and syntactic classes (noun group, verb group, and adjective group) for phrases. The divided synsets were linked to each other with the relation of part-of-speech synonymy (cross-categorial synonymy). ...
... words and phrases. RuThes is a concept-oriented resource as much as possible in describing senses of Russian words and expressions. Each concept has a unique, unambiguous name. In this, RuThes is similar to information-retrieval thesauri and formal ontologies. Rules for inclusion of phrases in the thesaurus ...
... thesaurus are more similar to information-retrieval thesauri guidelines (NISO, 2005). Each concept is linked with words and phrases conveying the concept in texts (text entries). Detailed description of lexical units (words in specific senses), representation of senses of ambiguous words are closer ...Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection... unbalanced dataset, maintaining average F-score of 89%. After conducting the experiment our system extracted 846 different Serbian domain phrases, containing 515 Serbian phrases that were not present in the existing domain terminology. Keywords: aligned texts, word alignment, terminology extraction, electronic ...
... fessional from the librarianship domain to perform manual annotation of the extracted phrases. After manual validation, 515 extracted Serbian MWTs were evaluated as good translations of the paired English Dictionary phrases. The examples illustrating this process are given in Table 3. MWTs extracted by the ...
... domain-specific textual resources, the terminological list in the source language and the system for the extraction of terminology-specific nominal phrases (MWT) in the target language it is possible to compile the bilingual aligned terminological list. 2. Related Work In recent years extraction of ...Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval... inflections of such a list as a whole. Some phrases, as we have expected, had a structure not yet found among compounds, such as adjective+noun+conjunction+noun in Beogradski vodovod i kanalizacija ‘Belgrade water supply and sewage system’. For many free phrases, especially those with fewer components ...
... 1 Dinar is Serbian currency 221 phrases we have produced some new inflectional transducers as for the structure adjective+conjunction+adjective+noun in ekonomska i monetarna unija ‘economic and monetary union’ 8. The bilingual search – ...
... compounds with two components is adjective+noun, followed by the compounds with the structure X+noun, where X means “a word form that does not inflect within the compound”. For compounds with three components the most frequent structure is noun+X+X. Data on frequencies can help in deciding ...Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex dictionary, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (polylexemic units) that can be considered offensive, because such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the construction of a system for ...... characteristic examples. phrases the most frequent pattern is A N (448), a noun preceded by an adjective that agrees with it in the number, the case and the gender, for instance belosvetska kurva ‘worldwide whore’. The other frequent patterns are: N N (50), usually a noun followed by a noun in the genitive or ...
... however, it is a frozen expression mostly used as an interjection. As already mentioned, prior to this task SrpMD contained 79 multi-word entries (noun phrases) from the compiled list of 653 nominal MWEs, however, without any marker pointing to their usage. After reallocating those that were incorrectly ...
... MWEs with more than 5 components. MWEs were tagged using Serbian tagger (Stanković et al., 2020) and separated in two groups: nominal phrases (653) and verbal phrases (1179). Among nominal 80 Abusive category Examples – single word Example - MWE Ethnicity and nationality (ABUS=racial) Ciganin/Gipsy ...Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of automatic extraction of candidates for dictionary definitions from unstructured Serbian texts, with the aim of speeding up dictionary development. Definitions from the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, referential and synonymous), which have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analyzed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing package of open ...... (Definition), sentence segments containing pseudonyms or additional names (Alias Term) are also annotated and associated with the basic term. Likewise, noun phrases (Referential Term) that refer to the previously marked term, secondary definitions (Secondary Definition) with additional information, qualifiers ...
... (FSTs) are abstract mathematical constructions that allow modelling of local grammars to describe some linguistic constructions, for example, noun phrases. A finite state transducer “passes” through the text it analyses to compare a text chunk with the model it represents. In the case of successful ...
... of organizations and the like. Moreover, the dictionary contains multi-word units, which are recorded in traditional dictionaries as syntagms or phrases. The basic unit of these dictionaries is a word form associated with its lemma (usually the headword of a traditional dictionary entry), Part-Of-Speech ...Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...... filtered entries having a dialect form followed by the list of words or phrases in standard Serbian, 3,452 entries were verbs and others were verbal nouns (gerund). For 3,452 verb entries 7,353 synonyms were detected — related words or phrases in the standard Serbian that describe dialect forms and that have ...
... Serbian which, in addition to linguistic information, provides also: sound information (pronunciation) about terms and examples of the use of words or phrases as they are spoken in the dialect; graphic information about the geographical location using concepts of Google Maps; the etymological origin of the ...
... is represented by the set of synonymous word forms that have the same or similar meaning in a given context. Synsets respect the syntactic categories noun, verb, adjective, and adverb and can be interconnected by semantic relations, while word forms can be connected by lexical relations. In SWN ontology ...Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, sentiment lexicon, lexicon of markers and a WordNet based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ...... lexicon of sentiment words and phrases in Serbian (resource C, Fig. 1). Keeping in mind the nature of the rhetorical figure verbal irony which is used to portray a negative statement in the form of a positive one, using the sentiment lexicon we can detect words and phrases that carry positive sentiment ...
... and već10 and stylistic irony markers like punctuation marks, exclamation mark, and cursive font were described. Still, irony markers can also be phrases such as: [Uh|Ah] što volim ‘[Uh|Ah] I really love that’, Ah, kakav. . . ‘Ah, what a . . . ’, nema ničeg lepšeg ‘there is nothing more beautiful’,[ ...
... described in [24]. The lexicon contains 4,593 entries with sentiment polarity values. Lexicon of irony markers (resource B, Fig. 1) which consists of 62 phrases, whose examples we quoted in the previous section, was built based on research presented in [18], [25], [26]. Finally, we used the results of the POS ...Miljana Mladenović, Cvetana Krstev, Jelena Mitrović, Ranka Stanković. "Using Lexical Resources for Irony and Sarcasm Classification" in Proceedings of the 8th Balkan Conference in Informatics (BCI '17), New York, NY, USA: ACM (2017). https://doi.org/
Development Of The Serbian Geological Resources Portal
... water and solid mineral resources10. The above-mentioned searches were derived from the basic model which is based on multiple entry of key words, phrases, different criteria and finally, the most important part – a ranking of search results. The search system includes advanced methods for ranking search ...
... transformed into SQL (Structured Query Language) format. Expanded in this way, the query is used to search resources on the basis of entered key words and phrases, within the subset of attributes in the database which fit the chosen search criteria. Ranking of results is performed by adding weight factors and ...Ranka Stanković, Jelena Prodanović, Olivera Kitanović, Velizar Nikolić. "Development Of The Serbian Geological Resources Portal" in Proceedings of the 17th Meeting of the Association of European Geological Societies, Belgrade, Serbia : The Serbian Geological Society (2011)
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... the local grammar covers the aforementioned structure as well. This example leads us to possible applications related to inflection of free noun phrases based on the recognition of their syntactic structure (as shown by successful processing of specific LIS terms in the previous section). This approach ...
... Adjective/noun (both inflect and agree in gender, number and case) • Noun/noun (both inflect and agree in number and case) • Noun/noun in the genitive or in the instrumental (only the first noun inflects) • Word/noun (only the second noun inflects; ...
... aid car’ noun/adjective in gen./noun in gen. uskrsenje sina božjeg ‘resurrection of the Son of God’ noun/noun in gen./adjective in gen. menadžment ljudskim resursima ‘human resources management’ noun/adjective in instr./noun in instr. raketa zemlja-vazduh ‘air-to-ground missile’ noun/noun in nom ...Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet development project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, which can be used to ...... category, with links to individual examples. Figure 7 illustrates the word sketch for the noun ризик – one look at the page gives a clear idea of the word’s use. The first column shows prepositional phrases (in Serbian linguistic terminology referred to ...
... comprises an analysis of the meaning of an LU, its lexical surroundings, phrases and grammatical constructions in which it appears in the corpus, the context in which it is used provided by corpus examples, as well as all the phrases in which the LU fulfills its full semantic potential. This approach consists ...
... dissertations (31%), textbooks and other mining literature (32%) (Kitanović et al. 2021, 8). Figure 5. Concordances for adjective-noun pattern containing the noun ризик The results of a CQL (Corpus Query Language) query are analyzed for: frequency lists, collocations, concordances with a narrower ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
Towards translation of educational resources using GIZA++
... for training statistical translation models will be presented. The paper also describes the translation memory in the form of parallel sentences or phrases required by GIZA++ for the learning algorithm. Keywords: E-Learning, GIZA++, translation memory 1. INTRODUCTION Massive Open Online Courses ...
... penalty ensures that the translations do not get too long or too short. The phrase penalty feature is a global feature that counts the number of used phrases for all phrase tables cumulatively. Apart from machine translation, aligned words and multiword expressions can be used for searching and exploring ...Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...... metadata – in shape of determiners, that users of social networks inadvertently use in their messages (in the form of emoticons or language-universal phrases) and assigning values of sentiment polarity to terms in which those determiners are located. As the determiners are language-independent, the system ...
... into account the proximity of the determiners to the observed term and the value of those determiners. The determiners can be either emoticons or phrases that appear in the conversation, which by nature are not of universal meaning and reflect a positive or negative attitude, replacing facial expressions ...
... social networks or passed from the other participants. Respondents were tasked with assigning values between 0 and 10 to a set of chosen emoticons and phrases, where 0 represented the highest intensity of negative mood, and 10 the highest intensity of positive mood. They were told to consider before rating ...Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019) In this paper we present a model for selecting good examples for a dictionary of the Serbian language and the development of the model's initial components. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach for other languages. The distribution of features of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ...Serbian, good dictionary examples, automation of dictionary production, feature extraction, machine learning... is a noun, lexicographically relevant co-constituents are its modifiers (the prototypical modifier of a noun in Serbian is an adjective phrase) and complements. If the keyword is an adjective, it is important, too, to consider its modifiers (for example, an adverb) and complements (noun phrases or ...
... prepositional phrases). For a verb keyword, it is important to note all its complements (objects, subject and object complements etc.). The notion of lexicographic relevance may also be applied to the selection of good dictionary examples. The constituents important for proper analysis of an LU ...
... non-standard, vernacular, ephemeral, loanwords, slang). A small number of examples with uncertain boundaries of dictionary entry elements, usually in phrases and proverbs, were excluded from the research, as well as examples from poetry that have the " | " delimiter between verses. In addition to the ...Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with zero-shot and few-shot learning techniques. The main approach innovates by devising research prompts that include instruction text for classification without examples and on the basis of a few examples, enabling the language model to classify sentiment into positive, negative or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615