Search
97 items
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...... retrieval and incorporation in Serbian terminological dictionaries. Due to specific features of Serbian grammar, especially its rich morphology, this is a complex task, and corresponding language resources in the form of morphological e-dictionaries and grammars need to be applied (Vitas et al., 2012) ...
... with numerous rules and exceptions. Morphological electronic dictionaries of Serbian for NLP have been under development for many years now. Their development follows the methodology and format (known as DELAS/DELAF) presented for French in (Courtois, 1990). E-dictionaries in the same format have been produced ...
... of MWUs from a text is preceded by the retrieval of new simple word terms from it and their incorporation in the existing system of morphological e-dictionaries as MWU extraction relies heavily on existing lexical resources. In the Serbian e-dictionary of MWUs, all entries are distributed in ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...... analysis by using linguistic resourcesprogramme.[11] Unitex is based on finite-state technology. It enables application of morphological electronic dictionaries and grammars to texts for a number of different languages: 10http://baektel.eu/?menu=partners 11http://www-igm.univ-mlv.fr/~unitex/ ...
... whose basic elements are either word forms (strings) or lexical masks that refer to the content of e-dictionaries. 5. The advanced methods of text searching are introduced: morphological filters – regular expressions that enable string search at the level of characters – and graphs as a tool ...
... multi-word unit (MWU) recognition are presented with emphasis on e-dictionaries of nominal MWUs, particularly their inflection that has to consider complex rules for MWU inflection in Serbian. 10. The use of powerful morphological mode is presented that enables the use of lexical resources at ...Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
E-Dictionaries and Finite-State Automata for the Recognition of Named Entities
Krstev Cvetana, Vitas Duško, Obradović Ivan, Utvić Miloš. "E-Dictionaries and Finite-State Automata for the Recognition of Named Entities" in Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, FSMNLP 2011, July 2011, Blois, France, A. Maletti and M. Constant (eds.), Association for Computational Linguistics (2011): 48-56
-
Possibilities of retro-digitalized German-Serbian Mining Dictionary
The paper describes the process of retro-digitization of the bilingual German-Serbian Mining Dictionary from 1923, authored by mining engineer Dragutin Stepanović (Степановић, 1923). The dictionary is based on almost 4,000 lexical entries that are translation equivalents or cross-references. In place of a preface, the author shares a letter he addressed to the “Minister of Forests and Mines”, in which he writes of his intention to record words used by the people in order to avoid the use of German words. Although the number of headwords is not that large, the dictionary ...Biljana Lazić, Olivera Kitanović, Ivan Obradović. "Possibilities of retro-digitalized German-Serbian Mining Dictionary" in E-dictionaries and E-lexicography, Zagreb, 10-11 May 2019, Zagreb : Institut za hrvatski jezik i jezikoslovlje (2019)
-
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020) Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...... examples of LSRs are dictionaries. Dictionaries form an important foundation of numerous natural language processing (NLP) tasks, including word sense disambiguation, machine translation, question answering and automatic summarization. However, the task of combining dictionaries from different sources ...
... have the same part-of-speech tags. Spelling variations are normalized to a unique variation. 3.3. Dictionaries used in the creation of the dataset For alignment we used the following dictionaries: Basque The Basque Wordnet (MCR 3.0) and the Basque Monolingual Dictionary “Euskal Hiztegia” (copyright ...
... (2006-)13 containing 110,000 entries. Both are typical academic dictionaries. Irish We used the Wiktionary data14 and An Foclóir Beag (Dónaill and Maoileoin, 1991, ‘The Little Dictionary’), the only two monolingual dictionaries available for this language. Italian We used ItalWordNet (Roventini ...Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...... be able to accomplish their task successfully. Secondly, lemmas are necessary for incorporating the MWTs in morphological dictionaries in compliance with the form these dictionaries require. This is essential as the set of forms found in the corpus is rarely comprehensive, and thus all potential ...
... and lemmatization from Serbian texts we have chosen a rule-based approach, which relies on a system of language resources such as morphological e-dictionaries and grammars developed within the University of Belgrade Human Language Technology Group (Vitas et al., 2012). For our approach, ...
... correct lemma candidates, provided with information about their syntactic structures, are used to fully automatically produce entries in morphological e-dictionaries according to a strategy for producing MWU lemmas, and subsequently all their inflectional forms (for more details see (Krstev et al ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... to use the Serbian morphological dictionary. Serbian morphological dictionaries include semantic markers which allow the distinction between ijekavian, ekavian and ikavian pronunciation. Dictionaries cover both general lexica and proper names. Serbian morphological dictionaries are found in LADL ...
... tools used as an educational system as a whole and to improve the visibility of resources in the Internet. This component consists of morphological dictionaries, WordNet, domain specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention ...
... components administrating in the same time language resources: grammars, lexical and textual resources (Image 1). 4. LEXICAL RESOURCES Morphological dictionaries are meant to be used by computers in the process of query expansion. Their usage is necessary because of the rich flexion of Serbian language ...Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines
In this paper we present how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for tuning queries before submitting them to a web search engine. We argue that the selection of words chosen for a query, which are of paramount importance for the quality of results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic ...LR web services, MultiWord Expressions & Collocations, Information Extraction, Information Retrieval... results obtained by the query, can be substantially improved by using various lexical resources, such as morphological dictionaries and wordnets. These dictionaries enable semantic and morphological expansion of the query, the latter being very important in highly inflective languages, such as Serbian ...
... user. 1. Morphological dictionaries of simple words and compounds in the so called LADL format (Courtois et al., 1990) basically consist of lemmas accompanied with inflectional class codes which enables a precise production of all inflectional forms. The Serbian morphological dictionary of ...
... presumption is that in many cases this structure can be predicted on the basis of morphological and syntactic features of the phrase components. These features can be obtained from the morphological e-dictionaries that are at our disposal during the query expansion process. The prediction of the ...Krstev Cvetana, Stanković Ranka, Vitas Duško, Obradović Ivan. "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines" in LREC 2008: Conference on Language Resources and Evaluation, Marrakesh, Morocco, May 2008, European Language Resources Association (ELRA) (2008)
-
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants. text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers ... falsely recognized words that result in non-valid words. It follows four steps: 1. The text obtained by OCR is processed using Serbian morphological electronic dictionaries (SMD) (Krstev, 2008); 2 This corpus is a part of the European Literary Text Collection corpus (ElTEC) developed in the scope of the ...
... 2202. In order to be able to offer ranked candidates in diacritic restoration a special dictionary was produced from the standard Serbian Morphological Dictionaries, in which the entries for the forms kupaca, kupača and kupaća are as follows: kupaca,kupac.N+Hum:mp2v 6 We will call them “errors” although ...
... This list may contain the original word because it exists in dictionaries: kupaca ⇒ kupača (kupač ‘bather’), kupaća (kupaći ‘bathing’), kupaca (kupac ‘buyer’); – It need not contain the original word (because it is not in dictionaries): jezice ⇒ ježiće (ježiti se ‘to bristle’ and ježić ‘diminutive ...Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
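The candidate lists quoted in the snippet above (kupaca ⇒ kupača, kupaća, kupaca) can be illustrated with a minimal sketch. It is not the authors' Unitex transducer cascade: the substitution table `VARIANTS`, the toy lookup `TOY_DICTIONARY`, and the function `restore_candidates` are hypothetical names introduced here, and the sketch ignores the digraph case dj ⇒ đ as well as candidate ranking.

```python
from itertools import product

# Hypothetical table: each plain letter and the diacritic readings it may hide.
VARIANTS = {"c": ["c", "č", "ć"], "s": ["s", "š"], "z": ["z", "ž"]}

# Toy stand-in for a morphological dictionary (form -> lemma);
# the three entries are the forms mentioned in the snippet above.
TOY_DICTIONARY = {"kupaca": "kupac", "kupača": "kupač", "kupaća": "kupaći"}


def restore_candidates(word, dictionary=TOY_DICTIONARY):
    """Return every diacritic variant of `word` that the dictionary knows."""
    # For each character, list the letters it could stand for.
    options = [VARIANTS.get(ch, [ch]) for ch in word]
    # Enumerate all combinations and keep only attested word forms.
    candidates = {"".join(chars) for chars in product(*options)}
    return {c for c in candidates if c in dictionary}


if __name__ == "__main__":
    print(restore_candidates("kupaca"))
```

Enumerating all combinations grows exponentially with the number of ambiguous letters, which is acceptable for single words in a sketch but is one reason a real system would rely on a precompiled dictionary or transducer instead.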
-
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent are they compatible across languages to support multilingual processing. We show that there is place for improvement in both directions.... category, while it appears in Bulgarian dictionaries: masculine gender - господар,N+M, feminine gender - глава,N+F and neutral gender - лице,N+NE. Conversely, which is though not unexpected, there were *.def categories which did not appear in text dictionaries. In addition to that, further review ...
... analysis also revealed categories that appeared in dictionaries but not in corresponding *.def files, which was quite unexpected. Important differences were also found between semantic codes in the *.def files and those in the text dictionaries. All results point to the need of establishing a better ...
... content of *.def files enables NooJ to recognize properties and their values and they can also improve the visibility of representations of dictionaries in tabular form. The first step in the analysis of compatibility of resources was precisely a comparison of metadata, namely of the content ...Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...... at its disposal several other lexical resources, such as morphological e-dictionaries. Together with the system of rules for compound inflection, finite automata and transducers, these dictionaries represent the basis for morphological expansion of queries. As for semantic and bilingual expansion ...
... Serbian morphological dictionaries of simple words and multi-word units [Krstev, 2008]. These comprehensive resources were developed and are being mainly used within two corpus processing systems: Unitex and Nooj. However, Unitex standalone routines enable the usage of morphological dictionaries ...
... one of the tools developed within the group, dubbed LeXimir, was designed as an integrated environment for various resources, such as morphological e-dictionaries, wordnets, and multilingual proper name databases, which enables, among other things, versatile handling of both monolingual and aligned ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
-
A Lexical Approach to Acronyms and their Definitions
In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.... definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables ...
... denotes proposed pairs, while corr denotes all chosen correct pairs). By e-dict_MWL we denote correct MWLs in the form required by morphological e-dictionaries, while e-dict_(MW|acr)_forms represent inflected forms of MW names and acronyms, respectively. Input Size Output Size 1 corpus 23MW ...
... Also, “words” are just potential words of a language – strings of alphabetic characters – and we do not look for them in dictionaries. However, we have to look in dictionaries to confirm, for instance, the occurrence of prepositions and/or conjunctions. These patterns are implemented as Unitex4 ...Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries, initiated in Serbian or English, can be expanded semantically, morphologically and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020) In science, industry and many research fields, terminology develops rapidly. Most often, the language that serves as the “lingua franca” for most of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ...... 2. To each Serbian noun, verb or adjective from the merged list we assigned its inflected forms obtained from the Serbian morphological e-dictionaries (Krstev, 2008). These inflected forms have various grammatical codes assigned to them, which were used in the final step. As mentioned ...
... lemmatisation within MWTs is needed. This means that each word from a MWT has to be replaced by a corresponding lemma from the available morphological e-dictionaries for Unitex (Krstev, 2008). For example, a word “reči” is a noun, has feminine gender, is in plural and is in nominative case. A lemma for ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and exploring verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using the principles of Linguistic Linked Open Data (LLOD). Since there is no resource that would help learners of Bosnian, Croatian and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that provides users with information about aspect, as well as a link to the aspectual pairs of a verb. The resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The conception of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...... was trained for tagging (Krstev and Vitas 2005; Utvic 2011), (Stanković et al. 2020, 3957) using a manually annotated corpus of Serbian morphological dictionaries (Krstev 2008). Figure 6. A histogram of frequencies for different inflectional forms of the noun ризик The mining corpus is published in ...
... the noun risk in monolingual dictionaries and corpus data, they concluded that the dictionaries do not give a comprehensive enough description, with a lot of the meanings found through corpus search not even being mentioned. The finding was that printed dictionaries, with a linear approach to meaning ...
... descriptive dictionaries of Serbian for all of the four most common frame-evoking word classes (nouns, adjectives, verbs and adverbs) (Марковић 2017, 34–41). 4. In Serbian lexicographic literature, as well as in syntax papers that explore the relationship between grammar and dictionaries, different ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
-
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020) This paper discusses the development of the Vebran web services and their application in improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora. ... software tool (Schmid, 1997, 1999) was used for automatic morphological annotation of both corpora. The TreeTagger language parameter file for Serbian was created as a derivative of a system of Serbian Morphological electronic Dictionaries (SMD, cf. 3.1), authored by Cvetana Krstev and Duško Vitas ...
... System of Serbian morphological electronic dictionaries (Unitex DELA format); 7 https://unitexgramlab.org/ – Semantic network WordNet for Serbian; – Terminological databases Termi, RudOnto, GeolISS. 3.1 Serbian morphological resources The ...
... The system of morphological electronic dictionaries of the Serbian language (SMD) (Krstev, 2008) is the core for the morphological expansion. SMD follows the methodology and format (known as DELAS/DELAF) that was developed in LADL (Laboratoire d’Automatique Documentaire et Linguistique) under the ...Ranka Stanković, Miloš Utvić. "Vebran Web Services for Corpus Query Expansion" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.5
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...... research purposes and for the production of various derived lexicographic products. 2 Related Work Digital dictionaries ceased to be a novelty a long time ago. The majority of new dictionaries are produced (and in some cases exist only) in digital form. However, many significant lexicographic works ...
... 2 For instance, Ahačić (2015) presents a Slovenian Dictionary Portal that collects information from 22 dictionaries, dating from the 16th century to the present day. Some of these dictionaries were transformed into XML format, while others were developed in it. ...
... used, preferably standard ones that support interchange and merging (Lemnitzer et al., 2009). At this point, retro-digitized and digital-born dictionaries meet, since both types should preferably use the same or compatible formal structure and markup language.2 This development led to further linking ...Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
An Approach to Development of Bilingual Lexical Resources
... y, as well as to another language. One type of lexical resources, morphological e-dictionaries, together with the system of rules for compound inflection, finite automata and transducers, represent the basis for morphological expansion of queries. As for semantic and bilingual expansion, the ...
... documents, is used for development of a new bilingual lexical resource. The approach relies on already available resources, Serbian morphological e-dictionaries, Serbian and English wordnets connected via the interlingual index, and a bilingual Dictionary of Librarianship, as well as on a TMX ...
... for generating aligned parallel texts [Obradović et al., 2008]. As for available lexical resources, we had at our disposal Serbian morphological e-dictionaries [Krstev, 2008], Serbian and English wordnets (SrpWN and EWN), and a bilingual Serbian-English Dictionary of Library and Information ...Stanković Ranka, Obradović Ivan, Trtovac Aleksandra. "An Approach to Development of Bilingual Lexical Resources" in Proceedings of the Fifth Balkan Conference in Informatics BCI 2012, Workshop on Computational Linguistics and Natural Language Processing of Balkan Languages – CLoBL 2012, September 2012, Novi Sad : BCI (2012)