Search
97 items
-
Combining Heterogeneous Lexical Resources
... ones are: • The system of morphological dictionaries of Serbian (SMD) in Intex format (Silberztein, 2000), that consists of a dictionary of simple lemmas, a dictionary of compounds (under construction), the corresponding dictionaries of word forms, and morphological finite-state automata that ...
... Among them the two most important ones are: the system of morphological dictionaries of Serbian (SMD) in Intex format and the Serbian wordnet (SWN) developed in the scope of the Balkanet project. Although these two resources represent dictionaries of a different type, developed using different models ...
... morphosyntactic information specific for this literal. This information is automatically retrieved from the DELAS dictionaries. If more than one instance is retrieved from these dictionaries, the user can choose the appropriate one. Moreover, he can modify (delete or add) the automatically ...
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020)
In this paper, we will present an approach to transforming Serbian morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use the category labels assigned to lexical entries can be used to link lexical entries. ...
... used to link lexical entries. The initial morphological dictionaries were Serbian Morphological Dictionaries. However, we will show multilingual application of Leximirka using French Morphological Dictionaries. KEYWORDS: morphological dictionaries, language resources, Leximirka. PAPER SUBMITTED: ...
... the development of Serbian morphological dictionaries more than 25 years ago (Vitas, 1993; Krstev, 1997; Vitas et al., 1993). Morphological dictionaries represent a significant linguistic resource for languages with rich flexion. Therefore, Serbian morphological dictionaries represent a significant resource ...
... first electronic dictionaries, used before the database notion. These dictionaries also exist for many other languages: German, Bulgarian, Polish, Greek, Russian etc. The system of morphological dictionaries is based on the theory of finite-state automata, namely on morphological and local grammars ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...
Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
... terminological nominal phrases; • Bilingual Serbian/English list of inflected word forms and MWE pairs derived from bilingual dictionaries and morphological (inflected) dictionaries for Serbian and English; 4.1. Aligned/parallel corpus The English/Serbian textual resource was derived from the journal for ...
... Serbian part. Another by-product is the bilingual Serbian/English list of inflected word forms and MWE pairs derived from bilingual dictionaries and morphological dictionaries. We will apply the same approach to other domains – mining, electro-distribution and management – since aligned domain corpora ...
... phrases that were not present in the existing domain terminology. Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection 1. Motivation Terminology is rapidly developing in many research and technological fields. It is very difficult to produce and ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... for Serbian are: (a) Serbian morphological dictionaries (Cvetana Krstev, Duško Vitas, 2015) (SMD); (b) pre-annotated texts (Duško Vitas, Cvetana Krstev, Ranka Stanković, Miloš Utvić, 2019). 2.1. Serbian morphological dictionaries Serbian morphological dictionaries represent a rich lexical resource ...
... different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment between Serbian morphological dictionaries, MULTEXT-East and Universal Part-of-Speech tagset. The trained models will be used to publish the new version of the Corpus of Contemporary ...
... to be used for different taggers and tagsets in the future. The research was focused on annotation schemata alignment between Serbian morphological dictionaries tagset (presented briefly in Subsection 2.1.), MULTEXT-East tagset (Erjavec, 2012), and the Universal Part-of-Speech tagset (Petrov et ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assists in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%).
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
-
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that gathers various texts from the journals of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the "Tara" and "Reiss" conferences, as well as from some doctoral dissertations related to this field of research. After text processing, a corpus containing more than 5500 pages of plain text was created and ...
... The RESTful Web services based on Unitex routines are used for the implementation of morphological analysis and output generation relying on electronic dictionaries. For query expansion, morphological and semantic vocabularies are combined, because synonymous terms are taken from WordNet and t ...
... that includes: collection of articles, lexical processing resources, describing text with metadata, analysis of unknown words, complement morphological dictionaries, addition to terminology database, transliteration, correction of broken words, correction of optical character recognition errors. ...
... Figure 3 on the left, while on the right are main application components of the language support system. Main lexical resources include morphological dictionaries for Serbian language, Serbian and English WordNets, terminological databases: Termi, GeolISSTerm, RudOnto and Librarian dictionary. Apart ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
-
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)
The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...
Keywords: raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data
... presented - paper dictionaries and digital resources related to the raw material domain, as well as general lexica morphological dictionaries. Resource preparation started with dictionary (retro)digitisation and corpora enlargement, followed by adding new Serbian terms to general lexica dictionaries, as well ...
... available resources: paper and electronic dictionaries, as well as corpora used. Section 3 outlines preparation of resources, which includes digitization of paper dictionaries, enlargement of corpora, adding domain terms to general purpose morphological e-dictionaries and extraction of bilingual lists. The ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
-
Dictionaries in the Digital Age - IT Support for the Serbian Language (Речници у дигиталном добу - информатичка подршка за српски језик)
Биљана Рујевић (2022)
Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were kept as files whose number kept growing, which made managing the dictionaries difficult, a need arose to store the dictionary information in a lexicographic database. To enable several users to work on dictionary development simultaneously, a need arose for a web application based on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
-
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...
... content, which is why SMD was chosen as the first lexicon for Serbian to be converted into a lexical database. 3. Morphological electronic dictionaries Morphological electronic dictionaries of Serbian for NLP have been under development for many years now (Vitas et al., 1993) (Krstev, Cvetana and Vitas, Duško ...
... In this paper we present our approach for lexical data migration from textual e-dictionaries to a lexical database. After years of development, Serbian Morphological Dictionaries (SMD), developed as a system of textual files, have become a large and complex lexical resource. As ...
... the purpose of further development and management of morphological electronic dictionaries of Serbian (SMD), presented in more detail in Section 3. However, with the growing number of dictionary developers, and given the variety of dictionaries and information stored in them (proper names, dom ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... electronic dictionaries that WS4LR manipulates are monolingual morphological dictionaries, but also a bilingual word list and a multilingual dictionary of proper names. However, the main task of this module is to enable the manipulation of the system of morphological dictionaries of canonical ...
... system of morphological dictionaries is known as the LADL format [4]. The first system developed for processing of texts using dictionaries in LADL format was a system called Intex [12]. Intex uses dictionaries in combination with regular expressions and inflectional and morphological finite state ...
... different languages is supported - management of a system of electronic dictionaries which consist of morphological dictionaries of lemmas for simple and compound words, but also of bilingual and multilingual dictionaries - manipulation of parallel aligned texts, allowing for various forms ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...
... Section 2 we discuss some previous approaches to searching digital dialect dictionaries. In Section 3 we represent resources used to improve searching performances of the digital dialect dictionary: Serbian morphological e-dictionaries used to produce all inflected forms of standard terms and Serbian WordNet ...
... language dictionaries. To support that kind of search, it was necessary to add an infinitive form, that is, to lemmatize both a dialect verb and verbs in the standard Serbian that were retrieved from its definition. For lemmatization task we used Serbian morphological electronic dictionaries and grammars ...
... search not only with one term in the standard language, but with a set of synonym terms in order to improve search. 3 Resources 3.1 Use of morphological e-dictionaries The first problem with search of verbs in dialect dictionary is the grammatical form of the headword of the lexical entry. Namely, grammatical ...
Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
... resources in any part of the system, wherever they are needed. Thus, for example, morphological dictionaries can be used for adding additional morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned ...
... in our approach to automated production of lemmas for e-dictionaries of multi-word units. Development of morphological dictionaries of MWUs is a tedious task, especially in the case of Serbian and other languages featuring complex morphological structures. After realizing that the development of such ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...
... resources in any part of the system, wherever they are needed. Thus, for example, morphological dictionaries can be used for adding additional morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned texts ...
... MWUs in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this ...
... Duško Vitas 1 Introduction Morphological electronic dictionaries of Serbian for natural language processing (NLP) are being developed for many years now. Their development follows the methodology and format (known as DELAS/DELAF) presented for French in [3]. E-dictionaries in the same format have been ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Software Tools for Using Resources for the Serbian Language (Softverski alati za korišćenje resursa za srpski jezik)
Ivan Obradović, Ranka Stanković (2008)
... these dictionaries aside, we shall only stress that the generation of morphological forms of compound words is more complex, which makes the data format in these dictionaries also somewhat more complex. All the dictionaries we have mentioned compose the system of morphological dictionaries of ...
... three basic resources encompassed by WS4LR and WS4QE, namely the system of morphological dictionaries of Serbian, the Serbian wordnet and aligned texts. 2.1 Morphological dictionaries Morphological dictionaries of simple and compound words for Serbian have been developed within the Group by C. Krstev ...
... devojci of the word devojka. In addition to the dictionaries of simple words, corresponding dictionaries of compounds named DELAC for the main forms of the words and DELACF for their morphological forms, also exist. In principle, these dictionaries are following a similar format, but additionally ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
The Nooj System as Module within an Integrated Language Processing Environment
... following main functions (Figure 1): - management of a system of electronic dictionaries which consists of morphological dictionaries of lemmas for simple and compound words but also of bilingual and multilingual dictionaries - development and refinement of wordnets, with simultaneous usage of wordnets ...
... that in text recognition by NooJ the usage of all dictionaries is not always necessary, or even recommended. Morphological dictionaries are of great importance for highly inflective languages, such as Slavic languages. The absence of morphological information in wordnets has turned out to be a serious ...
... 3.3.2. The information from wordnet can be successfully used to enrich the morphological dictionaries, namely the wordnet hierarchy can be used to add semantic information to word entries in morphological dictionaries. Some basic semantic information has already been attached to simple word ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
WS4LR - a Workstation for Lexical Resources
... Resources Various lexical resources that can be produced and handled by WS4LR are briefly described in this section. 2.1 Morphological Dictionaries Morphological dictionaries of simple words and compounds in LADL format (Courtois & Silberztein, 1990) exist for many languages, including French ...
... it enables the exchange of information between wordnets and morphological dictionaries. Namely, morphosyntactic information from dictionaries can be attached to synset literals. The tool searches for the wordnet literal in dictionaries of simple or compound lemmas, and it retrieves from them its ...
... + Nooj dictionaries management WORDNET DEVELOPMENT + Manipulation of one or two wordnets + Synsets retrievement using various methods + Navigation by following hypernym/hyponym relations + Copy of synsets with translation support + Exchange of information with morphological dictionaries + Production ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources" in Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, May 2006, ELRA - European Language Resources Association (2006)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
... Introduction We have been developing morphological electronic dictionaries of Serbian for natural language processing for many years now. Our e-dictionaries follow the methodology and format known as DELAS/DELAF, which is presented for French in [1]. Serbian e-dictionaries of simple forms have reached a consid- ...
... numerals and named entities (time and duration, measures and currencies), we have developed finite-state transducers (FSTs) that rely on morphological e-dictionaries of simple words to model these MWUs correctly [4]. When applied to a text in automatic text analysis these FSTs associate recognized MWUs ...
... performed on the basis of dictionaries described in [10] and [11] that are part of the standard distribution of Unitex [12], a corpus processing system based on the finite-state technology. Table 1. Initial content of the Serbian morphological dictionary of MWUs ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... the system of morphological dictionaries of Serbian, the Serbian wordnet, and parallel and aligned texts. They are the basic resources around which the WS4LR and WS4QE tools were built. A. The system of morphological dictionaries The format chosen for morphological dictionaries was the ...
... consists of the so called DELAS dictionaries of simple words and DELAF dictionaries of their morphological forms, with more than 120.000 simple words and 1.400.000 word forms. In addition to that, dictionaries of compounds named DELAC for the lemma dictionaries and DELACF for their ...
... generation of morphological forms of compound words is more complex, which makes the data format in these dictionaries also somewhat more complex. As dictionaries of compounds are still under development, their dimensions are currently more modest. The aforementioned dictionaries compose ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Automatic Definition Extraction: A Contribution to Speeding Up Dictionary Production (Аутоматска екстракција дефиниција – допринос убрзању израде речника)
Keywords: descriptive dictionaries, meta-analysis of lexicographic definitions, automatic definition extraction, electronic dictionaries, Serbian language
Рада Стијовић, Цветана Крстев, Ранка Станковић. "Аутоматска екстракција дефиниција – допринос убрзању израде речника" in Лексикологија и лексикографија у светлу актуелних проблема, Институт за српски језик САНУ (2021)
-
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SANU) were used to model different types of definitions (descriptive, grammatical, referential and synonym-based) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analysed using morphological e-dictionaries and local grammars implemented as finite-state transducers in a corpus processing package of open ...
... in which we will apply definition extraction is semi-automatic creation of dictionaries. In this paper, we present an approach for definition modelling and extraction, relying on the existing Serbian dictionaries (morphological and descriptive), as well as the results of the preliminary experiments in ...
... for some future dictionaries - to establish which models are the most frequently used, which are preferable, and which definitions unnecessarily deviate from the common patterns. The definitions of nouns from the SASA dictionary were analysed using Serbian morphological e-dictionaries and local grammars ...
... Electronic morphological dictionaries of Serbian intended for automatic processing have been undergoing development for many years now, and their current size and content ensure successful use in different real-world applications in the field of Serbian language processing. These dictionaries contain ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)