The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitizing the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used over the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the microstructure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach is compatible with several standard structured forms and ontologies (TEI, LMF, OntoLex, LexInfo). A lexical database model was designed in compliance with these structured forms, mostly following the lemon model. Mapping of the lexical entry markers to ...
... compliant tagging. Our main goal is to produce a central lexical database that will enable multi-user management of the lexical data and provide access to the content of the volumes already published. For the development of the lexical database model for the DSA, a similar approach was used as in ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database"
Vebran Web Services for Corpus Query Expansion
Ranka Stanković, Miloš Utvić (2020). This paper discusses the development of the Vebran web services and their application to improving corpus search. The Vebran web services are used to consult external lexical resources for Serbian (mainly electronic morphological dictionaries and the Serbian WordNet) and to expand user queries in order to obtain more relevant results from Serbian corpora. ... about 18,000 lemmas covering different parts of speech. Lexical data have been migrated from textual e-dictionaries to a lexical database. After years of development, SMD, developed as a system of textual files, has become a large and complex lexical resource. An online application for dictionary development ...
... development and management, based on a central lexical data repository (lexical database), has been developed, offering various possibilities for improving SMD, e.g. data consistency control, the introduction of explicit relations between lexical entries, and automatic generation of dictionary candidates. The ...
The new version of the Vebran service (cf. 4) uses this database for morphological expansion (Stanković et al., 2018). An automatic procedure was used to transfer data from the existing dictionaries into the lexical database and to store all information about lemma and form entries as structured ...
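Morphological query expansion of this kind can be illustrated with a minimal Python sketch. It assumes a simplified DELAF-like line format ("form,lemma.POS:codes"), uses hypothetical sample entries with placeholder codes rather than actual SMD data, and turns a lemma into a regular-expression alternation over its inflected forms. It does not reproduce the Vebran API or its query syntax.

```python
import re
from collections import defaultdict

# Hypothetical DELAF-like lines: "inflected_form,lemma.POS:codes".
# The codes are placeholders for illustration, not actual SMD inflection codes.
SAMPLE_ENTRIES = [
    "kuća,kuća.N:fs1",
    "kuće,kuća.N:fs2:fp1",
    "kući,kuća.N:fs3",
    "kuću,kuća.N:fs4",
    "kućom,kuća.N:fs6",
    "kućama,kuća.N:fp3",
]


def build_lemma_index(lines):
    """Map each lemma to the set of its inflected forms."""
    index = defaultdict(set)
    for line in lines:
        form, rest = line.split(",", 1)  # inflected form precedes the first comma
        lemma = rest.split(".", 1)[0]    # lemma precedes the first dot
        index[lemma].add(form)
    return index


def expand_query(term, index):
    """Return a regex alternation over all forms of `term`; fall back to the term itself."""
    forms = index.get(term)
    if not forms:
        return re.escape(term)
    # Longer forms first, so the alternation prefers the longest form at a position.
    ordered = sorted(forms, key=lambda f: (-len(f), f))
    return "(?:" + "|".join(re.escape(f) for f in ordered) + ")"


if __name__ == "__main__":
    index = build_lemma_index(SAMPLE_ENTRIES)
    pattern = re.compile(r"\b" + expand_query("kuća", index) + r"\b")
    print(pattern.findall("Ušli smo u kuću, a zatim pred kućom sreli komšiju."))
```

Expanding the lemma kuća (house) lets a search match inflected forms such as kuću and kućom, which a literal search for the lemma would miss.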
The NooJ System as Module within an Integrated Language Processing Environment
... modules. This environment, named WS4LR (WorkStation for Lexical Resources), has been developed within the Human Language Technology Group (HLT) at the Faculty of Mathematics, University of Belgrade, and is aimed at manipulating heterogeneous lexical resources developed over many years and ...
WS4LR - a Workstation for Lexical Resources
Cvetana Krstev, Ranka Stanković, Duško Vitas, Ivan Obradović. "WS4LR - a Workstation for Lexical Resources"
Dictionaries in the Digital Age - Computational Support for the Serbian Language (Речници у дигиталном добу - информатичка подршка за српски језик)
Биљана Рујевић [Biljana Rujević] (2022). The morphological dictionaries of Serbian are an electronic language resource with a considerable history of development and use for natural language processing. Since they were kept as files whose number kept growing, which made managing the dictionaries increasingly difficult, a need arose to store the dictionary information in the form of a lexicographic database. To allow multiple users to work on dictionary development simultaneously, a need also arose for a web application based on the lexicographic database. In order to consider ... Биљана Рујевић [Biljana Rujević]. Речници у дигиталном добу - информатичка подршка за српски језик [Dictionaries in the Digital Age - Computational Support for the Serbian Language], Belgrade: [B. Rujević], 2022
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the problems encountered with Serbian. We have identified four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ... Workshop, Bratislava, Slovakia, 15-16 April, 2009. Metalanguage and encoding scheme design for digital lexicography: innovative solutions for lexical entry design in Slavic lexicography: proceedings. Bratislava: Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, 2009, pp. 59-70. ...
Savary, A. (2008). Computational Inflection of Multi-Word Units: A Contrastive Study of Lexical Approaches. In: Linguistic Issues in Language Technology, Vol. 1, No. 2, CSLI Publications.
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi, John P. McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S. Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020). Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ... književnog jezika (Dictionary of the Serbo-Croatian Literary Language). Slovene (JSI): Slovene WordNet (Erjavec and Fišer, 2006) and Slovene Lexical Database (Gantar and Krek, 2011) were used. Slovene (ISJFR): eSSKJ, Dictionary of the Slovenian Standard Language (3rd edition) (Gliha Komac et al., ...
... 281 (1527) 29 (215) 0 (0) 0 (0) 599 (4102)
Slovenian (JSI), Slovene WordNet: 409 (1106) 303 (901) 237 (733) 44 (133) 0 (0) 993 (2873)
Slovenian (JSI), Slovene Lexical Database: 284 (2237) 191 (1047) 220 (1486) 29 (102) 0 (0) 724 (4872)
Standard Slovenian Dictionary (eSSKJ): 229 (2060) 109 (911) 76 (620) 0 (0) 60 (588) ...
Gantar, P. and Krek, S. (2011). Slovene lexical database. In Natural language processing, multilinguality, pages 72–80.
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021). The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both born-digital and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, the data used for compiling the terminological database, the procedure by which it has ... Keywords: raw materials, mining, terminology, dictionary, terminology application, mobile application, digitisation, lexical data, corpora, linked open data ... raw material terminology, were subjected to systematic, extensive digitisation. In this approach, besides compiling a comprehensive multilingual lexical database of raw material terminology, lexicographic methods for automatic knowledge extraction are used, including corpus data analysis, automatic data ...
... linguistics, power engineering, etc.) [27,28], and it has been selected as the most suitable resource for the comprehensive multilingual lexical database of raw material terminology, while the remaining two resources have been incorporated into the dictionary production pipeline. For systematic ...
... SrpMD was necessary, since these dictionaries are a base resource for lemmatization and multiword term extraction. Because SrpMD are already stored in the lexical database Leximirka [32], developed and managed by the same research team, this type of alignment was possible. Figure 3 presents an outline of the pipeline ... Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
Combining Heterogeneous Lexical Resources
... also present an integrated programming tool that enables the integration of these diverse lexical resources, as well as possible applications. We envisage the use of these resources in defining and linking lexical data in a way that will enable their more effective retrieval, integration, and reuse across ... Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay particular attention to adding multi-word expressions (multi-word units) that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and lexicon creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the plan is to build a system for ... pages 1621–1622. Biljana Lazić and Mihailo Škorić. From DELA Based Dictionary to Leximirka Lexical Database. Jelena Mitrović, Miljana Mladenović, and Cvetana Krstev. 2015. Adding MWEs to Serbian lexical resources using crowdsourcing. Poster presented at the 5th PARSEME General Meeting, Iași, ...
... order to find a set of words that can be triggers for MWEs and for offensive speech in general, a set of trigger (single) words was created. The lexical database Leximirka (Stanković et al., 2018), which supports the Serbian electronic dictionaries (Krstev, 2008), was analyzed, and entries with one of the following ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, and Mihailo Škorić. 2018. Electronic dictionaries - from file system to lemon based lexical database. In Proceedings of the 6th Workshop on Linked Data in Linguistics (LDL-2018) (co-located with LREC 2018), J. P. McCrae, C. Chiarcos, T. Declerck, ...
From DELA Based Dictionary to Leximirka Lexical Database
... storing data within a database compared to storing them in text files, we will outline some of the functionalities that the database has made possible. We will also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries. The ini- ...
... rules necessary to establish relations between pairs of lexical entries (see 4.2). 4.2 Application example: Establishing relations between lexical entries. The modeled and populated lexicographic database has enabled the automatic linking of lexical entries. To accomplish this task, various procedures ... Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
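The idea of linking entries through hand-made rules over category labels can be sketched in Python as follows. The labels (Hum, m, f, Anim, Coll), the relation name feminine_form and the suffix-based derivation are hypothetical choices made for illustration; the abstract does not list Leximirka's actual markers or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    lemma: str
    pos: str
    labels: frozenset = frozenset()


@dataclass(frozen=True)
class Rule:
    """A hand-made linking rule (hypothetical): label constraints plus a lemma suffix."""
    relation: str
    source_pos: str
    source_labels: frozenset
    target_pos: str
    target_labels: frozenset
    suffix: str  # target lemma = source lemma + suffix


def apply_rule(rule, entries):
    """Return (source, relation, target) triples for all entry pairs the rule accepts."""
    by_key = {(e.lemma, e.pos): e for e in entries}
    links = []
    for src in entries:
        # The source must have the required POS and carry all required labels.
        if src.pos != rule.source_pos or not rule.source_labels <= src.labels:
            continue
        tgt = by_key.get((src.lemma + rule.suffix, rule.target_pos))
        # The target must exist and carry all of its required labels as well.
        if tgt is not None and rule.target_labels <= tgt.labels:
            links.append((src.lemma, rule.relation, tgt.lemma))
    return links


# Hypothetical entries and labels, for illustration only.
ENTRIES = [
    Entry("učitelj", "N", frozenset({"Hum", "m"})),
    Entry("učiteljica", "N", frozenset({"Hum", "f"})),
    Entry("pevač", "N", frozenset({"Hum", "m"})),
    Entry("pevačica", "N", frozenset({"Hum", "f"})),
    Entry("konj", "N", frozenset({"Anim", "m"})),
    Entry("konjica", "N", frozenset({"Coll", "f"})),  # "cavalry", not a feminine of "konj"
]

FEMININE_RULE = Rule("feminine_form", "N", frozenset({"Hum", "m"}),
                     "N", frozenset({"Hum", "f"}), "ica")

if __name__ == "__main__":
    for link in apply_rule(FEMININE_RULE, ENTRIES):
        print(link)
```

The label constraints are what keep konj (horse) from being linked to konjica (cavalry), even though the string pattern alone would match.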
Geothermal potential, chemical characteristic and utilization of groundwater in Serbia
To collect and unify data about all geothermal resources in Serbia, a database was formed. The database provides an overview of Serbia's geothermal resources and their potential for utilization. Based on the data available in the geothermal database, the estimated reservoir temperatures, heat power, and geothermal energy utilization were calculated. The database contains 293 geothermal records (springs, boreholes) registered at 160 locations, with groundwater temperatures ranging from 20 to 111 °C. The maximum expected ... Keywords: Geothermal database, Geothermal resources, Geothermal potential, Hydrochemistry, Hierarchical cluster analysis, Serbia. Tanja Petrović Pantić, Katarina Atanasković Samolov, Jana Štrbački, Milan Tomić. "Geothermal potential, chemical characteristic and utilization of groundwater in Serbia" in Environmental Earth Sciences, Springer Link (2021). https://doi.org/10.1007/s12665-021-09985-w
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017). The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web for language-neutral sentiment analysis. It introduces several innovations into the existing research framework and tests their effectiveness. It also presents a software tool made specifically for that purpose, explains how the tool builds a database of the sentiment values of terms, and provides a user manual. Finally, it presents a software tool that tests the new database and gives some examples ... Using this, all determiners were assigned proper values and the last prerequisite for forming the database was fulfilled. 3 Database construction. Software developed specifically for this research was used to create the database. It was written in the C# programming language and can be run on the Windows platform ...
... Appearance of the database finishing tab. The database can be pruned according to how many times each term occurs in it. This option is located on the fifth software tab, the experimental database finishing tab ...
... gives some examples of the analysis of the obtained results. KEYWORDS: data mining ... Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
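The abstract does not say how the sentiment value of a term is computed, so the following Python sketch assumes one simple scheme for illustration: a term scores (positive - negative) / total over the emoticon-bearing texts it occurs in, and terms seen fewer than min_count times are pruned, mirroring the pruning by repetition count mentioned above. The emoticon lists, sample texts and formula are assumptions, not the author's C# implementation.

```python
import re
from collections import Counter

# Hypothetical emoticon inventories; the paper's actual lists are not given here.
POSITIVE = (":-)", ":)", ":D")
NEGATIVE = (":-(", ":(")
WORD = re.compile(r"\w+")


def term_polarity(texts, min_count=2):
    """Score each term in [-1, 1] by the emoticons of the texts it occurs in (assumed scheme)."""
    pos, neg = Counter(), Counter()
    for text in texts:
        has_pos = any(e in text for e in POSITIVE)
        has_neg = any(e in text for e in NEGATIVE)
        if has_pos == has_neg:
            continue  # skip texts with no emoticon or with mixed emoticons
        for e in POSITIVE + NEGATIVE:
            text = text.replace(e, " ")  # keep emoticon letters (e.g. "D") out of the terms
        terms = {w.lower() for w in WORD.findall(text)}
        (pos if has_pos else neg).update(terms)
    scores = {}
    for term in pos.keys() | neg.keys():
        total = pos[term] + neg[term]
        if total >= min_count:  # prune terms that occur too rarely
            scores[term] = (pos[term] - neg[term]) / total
    return scores


# Hypothetical sample texts.
TEXTS = [
    "Divan dan :)",
    "divan koncert :D",
    "Užasan dan :(",
    "užasan saobraćaj :-(",
    "Dan je dug :)",
]

if __name__ == "__main__":
    for term, score in sorted(term_polarity(TEXTS).items()):
        print(f"{term}: {score:+.2f}")
```

Here dan (day) appears with both kinds of emoticon and receives an intermediate score, while terms seen only once, such as koncert, are pruned by min_count.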
Electronic Dictionaries - from File System to lemon Based Lexical Database
... management, based on a central lexical data repository (lexical database). In this paper we present the model for the SMD lexical database developed following the lemon model, and the thesaurus of data categories, to be used for enabling links to other (lexical) data. The new database offers various possibilities ...
... migration of all 26 simple-word and 15 multi-word unit Serbian dictionary files with more than 150,000 lexical entries. Keywords: lexical database, lemon, electronic dictionaries, lexical model, lexical relations. 1. Introduction. An application dubbed WS4LR (Krstev et al., 2006), subsequently upgraded ...
... between lexical entries, nor cross-linking with other lexical models, such as Serbian WordNet, another important lexical resource for Serbian (Koeva et al., 2008). This was the main motivation for transforming SMD dictionaries from the existing file system to a lemon-based lexical database. The model ... Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics: Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
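The move from text files to a database can be illustrated with a small Python/SQLite sketch that loads DELAF-like lines into two tables loosely modelled on lemon's LexicalEntry and Form (with its writtenRep). The schema, the "form,lemma.POS:codes" line format and the sample data with placeholder codes are assumptions for illustration, not the model presented in the paper.

```python
import sqlite3

# A lemon-inspired toy schema (hypothetical; not the actual SMD database schema):
# one row per lexical entry, one row per written form of that entry.
SCHEMA = """
CREATE TABLE lexical_entry (
    id    INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL,
    pos   TEXT NOT NULL,
    UNIQUE (lemma, pos)
);
CREATE TABLE form (
    id          INTEGER PRIMARY KEY,
    entry_id    INTEGER NOT NULL REFERENCES lexical_entry(id),
    written_rep TEXT NOT NULL,
    features    TEXT NOT NULL
);
"""

# Hypothetical DELAF-like lines "form,lemma.POS:codes" with placeholder codes.
LINES = [
    "kuća,kuća.N:fs1",
    "kuće,kuća.N:fs2",
    "kući,kuća.N:fs3",
    "pisati,pisati.V:W",
    "pišem,pisati.V:Pps1",
]


def load_delaf(conn, lines):
    """Insert each line as a form, creating its lexical entry the first time it is seen."""
    for line in lines:
        form, rest = line.split(",", 1)
        lemma, tail = rest.split(".", 1)
        pos, _, features = tail.partition(":")
        conn.execute(
            "INSERT OR IGNORE INTO lexical_entry (lemma, pos) VALUES (?, ?)", (lemma, pos)
        )
        (entry_id,) = conn.execute(
            "SELECT id FROM lexical_entry WHERE lemma = ? AND pos = ?", (lemma, pos)
        ).fetchone()
        conn.execute(
            "INSERT INTO form (entry_id, written_rep, features) VALUES (?, ?, ?)",
            (entry_id, form, features),
        )
    conn.commit()


def forms_of(conn, lemma):
    """Return all written forms stored for a lemma, in insertion order."""
    rows = conn.execute(
        "SELECT f.written_rep FROM form f JOIN lexical_entry e ON f.entry_id = e.id "
        "WHERE e.lemma = ? ORDER BY f.id",
        (lemma,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_delaf(conn, LINES)
    print(forms_of(conn, "kuća"))
```

Once forms are rows rather than lines spread over many files, consistency checks and relations between entries become ordinary queries and foreign keys.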
Development of an ArcGIS Geodatabase of an Open-Pit Mine Using UML CASE Tools (Razvoj ARCGIS geobaze površinskog kopa korišćenjem UML CASE alata)
... a geographic information system is a spatial database or geodatabase. The concept of a database pertains to a collection of various types of geographic data, stored in the system’s data catalogue or in an RDBMS (relational database management system) database (such as Oracle, IBM DB2, PostgreSQL, Informix ...
... datasets or raster catalogues. Another way of creating a database is to design a new spatial database using the tools of the ArcCatalog application and the ArcToolbox toolbox. They enable the direct, that is, “ad hoc” creation of a physical database model, without prior conceptual and logical modeling ...
... tools for the design and development of geodatabases entails using UML to define the database schema, creating classes, and only after that populating them with the necessary data. The entire process of database creation can be represented in three steps: 1. Geodatabase model generation using UML; ... Aleksandra Tomašević, Ljiljana Kolonja, Ivan Obradović, Ranka Stanković, Olivera Kitanović. "Razvoj ARCGIS geobaze površinskog kopa korišćenjem UML CASE alata" [Development of an ArcGIS Geodatabase of an Open-Pit Mine Using UML CASE Tools] in Podzemni radovi, Belgrade: University of Belgrade - Faculty of Mining and Geology (2012)
Standardization of Serbian gravity database on a test area
Vasiljević Ivana, Ignjatović Snežana, Đurić Dragana. "Standardization of Serbian gravity database on a test area" in 9th Congress of the Balkan Geophysical Society (BGS 2017), Antalya, Turkey, 5-9 Nov 2017, http://www.earthdoc.org: EAGE (2017): 1-5
Updating the database of the spatial information system for capital underground mining rooms
Milutinović Aleksandar, Ganić Aleksandar, Miljanović Igor, Gajić Grozdana. "Updating the database of the spatial information system for capital underground mining rooms" in 4th Balkan Mining Congress BALKANMINE 1, Ljubljana, Slovenia: Faculty of Natural Sciences and Engineering (2011): 629-633
Establishment of a database of uranium anomalies and zones in Mongolia
Vakanjac Boris, Srna Predrag, Ristić-Vakanjac Vesna. "Establishment of a database of uranium anomalies and zones in Mongolia" in Uranium - Past and Future Challenges, Switzerland: Springer International Publishing Switzerland (2015): 161-168. https://doi.org/10.1007/978-3-319-11059-2_19
Moho Depth Determination of the Adriatic Sea Region Using a New Bouguer Anomaly Database
Tassis G.A., Papazachos C.B., Tsokas G.N., Tziavos I.N., Vasiljević Ivana, Stampolidis A. "Moho Depth Determination of the Adriatic Sea Region Using a New Bouguer Anomaly Database" in 8th Congress of the Balkan Geophysical Society, Chania, Greece: Balkan Geophysical Society (2015). https://doi.org/10.3997/2214-4609.201414213
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
... Croatian, Slovenian and Serbian. FrameNet was conceived as a lexical database of English, which incorporates the databases subsequently developed for other languages (French, Chinese, Portuguese ...
... to entities, events and relations that make it up (Fillmore 1976, 26). 1.1 The design of FrameNet. FrameNet is a lexical database of English based on annotated examples of how a lexical unit (hereinafter abbreviated as LU) is used in actual texts. The basic premise comes down to the fact that most ... Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1