122 items
Old or New, We Repair, Adjust and Alter (Texts)
Cvetana Krstev, Ranka Stanković (2020) In this paper we present how e-dictionaries and cascades of finite-state transducers implemented in the Unitex tool can be used to solve three text transformation problems: correcting texts after OCR, restoring diacritics, and switching between different language variants. Keywords: text correction, OCR errors, diacritic restoration, language variants, electronic dictionary, finite-state transducers
... Belgrade, Serbia 1 Text mending – introduction to problems Text mending is one of the simplest text transformation problems, when compared to speech recognition and generation, text summarization and machine translation. It is also one of the first problems posed to computers that did not involve calculation ...
... approaches were developed for many languages (Krstev et al., 2018). Errors produced during machine text input, for instance by Optical Character Recognition (OCR), are of a different type and different solutions were developed for detecting and correcting such errors. As early as in the late 1950s, Bledsoe ...
... context or more complex structures. 2 Correction of OCR errors In the process of digitization, printed books are scanned and then optical character recognition (OCR) is applied. A text that fully corresponds to the original is rarely obtained since OCR is prone to errors. The quality of the resulting text ...
Cvetana Krstev, Ranka Stanković. "Old or New, We Repair, Adjust and Alter (Texts)" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.3
Bilingual lexical extraction based on word alignment for improving corpus search
Jelena Andonovski, Branislava Šandrih, Olivera Kitanović. "Bilingual lexical extraction based on word alignment for improving corpus search" in The Electronic Library, Emerald (2019). https://doi.org/10.1108/EL-03-2019-0056
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)
... of companies, the system also needs to recognise that a particular string of words in a document represents a company name, using a process called named entity recognition. A more demanding challenge is matching a query in one language with documents in another language. Cross-lingual information retrieval ...
... many applications for text summarisation. Within the aforementioned areas, highly successful experiments for Serbian are underway related to named entity extraction as a part of the information extraction problem. A speedy development of IE and QA is expected, given the extent of developed morphological ...
... thesis and recognition in Serbian was made by a group from the Faculty of Technical Sciences at the University of Novi Sad. Various applications in the fields of TTS and ASR have been developed based on the speech and lexical databases with accentuated word forms. Serbian speech recognition and generation ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
WebGIS Cadastre of Abandoned Mines in Autonomous Province of Vojvodina
Ranka Stanković, Nikola Vulović, Nikola Lilić, Ivan Obradović, Radule Tošović, Milica Pešić-Georgiadis (2015)
... data are entered on the limits of the mining waste field, on the carrier of exploration and/or exploitation or economic entity that produces mining waste, on the economic entity which is the operator of mining waste, on the characterization and categorization of mining waste landfills within the mining ...
Ranka Stanković, Nikola Vulović, Nikola Lilić, Ivan Obradović, Radule Tošović, Milica Pešić-Georgiadis. "WebGIS Cadastre of Abandoned Mines in Autonomous Province of Vojvodina" in Proceedings of the 5th International Symposium Mining And Environmental Protection, June 10-13, 2015, Vrdnik, Serbia, Belgrade : Faculty of Mining and Geology (2015)
Razvoj ARCGIS geobaze površinskog kopa korišćenjem UML CASE alata
... geodatabase represents a collection of interrelated data, namely: attributes (data describing a geographic entity numerically or textually), geometry (data defining the shape and size of an entity and its position in space) and topology (data defining relations between different geographic entities) ...
... built-in system for storing and indexing both alphanumeric and geometric data. The American company ESRI created one of the most complex GIS platforms named ArcGIS®. This integrated software family provides all functions necessary for developing a geographic information system. ArcGIS encompasses a palette ...
Aleksandra Tomašević, Ljiljana Kolonja, Ivan Obradović, Ranka Stanković, Olivera Kitanović. "Razvoj ARCGIS geobaze površinskog kopa korišćenjem UML CASE alata" in Podzemni radovi, Beograd : Univerzitet u Beogradu - Rudarsko-geološki fakultet (2012)
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bilingual, with papers simultaneously being in English ...
... the development of Serbian wordnet, aligned multilingual corpus, and many other language resources. She also developed the first Serbian Named Entity Recognition system. She participated in a number of international and national language and TEL related projects. Biljana Lazić is librarian at ...
... results of international projects is also an increasingly frequent practice. In this paper we present a publicly available multilingual digital library named Bibliša, developed for management, search and the browsing of aligned bilingual text collections. Design/methodology/approach – The approach ...
... made an effort to develop a software tool and bilingual resource that supports terminology research. This paper presents a developed solution, named Bibliša, which is free for use and publicly available. The second section of this paper presents the aspects of a multilingual digital ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics – IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
An intelligent hybrid system for surface coal mine safety analysis
Nikola Lilić, Ivan Obradović, Aleksandar Cvjetić. "An intelligent hybrid system for surface coal mine safety analysis" in Engineering Applications of Artificial Intelligence (2010)
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, as well as the possibilities of its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...
... fact that most LUs are best defined through semantic frames, a conceptual structure that provides a description of the type of situation, relation or entity and the participants involved in it (Ruppenhofer et al. 2016, 7). For example, taking a risk typically involves the following: a person taking the ...
... unique and set it apart from other frames. Alongside the core elements, there are non-core 7. That description entails: 1) a schematic description of entity types or situation illustrated by the frame; 2) choosing descriptive labels for describing the frame; 3) drawing up a draft list of words that belong ...
... information from it. As mentioned in the Introduction (Section 1.1 of this paper), a frame is a conceptual structure describing a type of situation, entity or relation together with its participants. The structure of FrameNet within the NLTK framework is comprised of a collection of XML (Extensible Markup ...
Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach ...
... paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the ...
... lexical database The guidelines for dictionary writing were used to define the rules for the segmentation of the dictionary articles, the pattern recognition, and the alignment of the recognized markers with the predefined categories, as described in the previous section. The dictionary article units ...
Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
From post-disaster landslides inventory to open landslides data
Biljana Abolmasov, Miloš Marjanović, Uroš Đurić, Jelka Krušić. "From post-disaster landslides inventory to open landslides data" in Proceedings of 3rd European Regional Conference of IAEG/ Athens/ Greece/ 6-10 October 2021, International Association for Engineering Geology and the Environment (2021)
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ... Keywords: electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion
... processing of named entities, as the initial phase in information extraction. Serbian morphological dictionaries and local grammars are successfully being used for recognition of names of persons and of various functions they might perform within the society. Local grammars for recognition of functions ...
... the aforementioned structure as well. This example leads us to possible applications related to inflection of free noun phrases based on the recognition of their syntactic structure. This idea draws from the assumption that many free noun phrases (used in search queries, for example) may have the ...
... Lexical Approaches,” Linguistic Issues in Language Technologies, vol. 1, no. 2, 2008. [6] C. Krstev and D. Vitas, “Finite State Transducers for Recognition and Generation of Compound Words,” in IS-LTC 2006, T. Erjavec and J. Žganec Gros, Eds. Ljubljana, Slovenia: Institut “Jožef Stefan”, October 2006 ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)
Biljana Rujević, Mihailo Škorić (2024) The paper describes linking the Digital Repository of the University of Belgrade, Faculty of Mining and Geology, with the eScience system in terms of transferring metadata about the results of researchers' scientific work. The steps taken to ensure a smooth harvesting of metadata are outlined. Further improvements to the OAI system are also presented, aiming to contribute to the automatic linking of authors with their results in the eScience system.
Biljana Rujević, Mihailo Škorić. "Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)" in Infotheca, Faculty of Philology, University of Belgrade (2024). https://doi.org/10.18485/infotheca.2023.23.2.4
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...
... can be exploited in the processing of named entities, as the initial phase in information extraction. Serbian morphological dictionaries and local grammars are successfully being used for recognition of names of persons and of various functions ...
... 6th LREC. Marrakech, Morocco (2008) 10. Krstev, C., Vitas, D., Obradović, I., Utvić, M.: E-dictionaries and finite-state automata for the recognition of named entities. In: Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pp. 48–56. Association ...
... functions they might perform within the society [10]. Local grammars for recognition of functions can recognize various syntactic structures but, naturally, not all of them. The use of MWUs can contribute to the increase of the recall without further complicating the local grammars. For example, the ...
Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of the Hurtlex dictionary, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (polylexemic units) that can be considered offensive, since such lexical entries are very important for achieving good results in a multitude of offensive language detection tasks. Serbian morphological dictionaries are used as a basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources in Serbian is highlighted, and the building of a system for ...
... helps in reducing ambiguity. The lexical resource, consisting of words that could be used as a trigger for recognition of abusive language, is built, with an idea that the Serbian system for recognition and normalization of abusive expressions will also take into consideration phrases and figurative speech ...
... extension of the vocabulary with expressions that are not present in any existing lexicons, but evidenced in corpus as having offensive usage. The recognition of the different usages, which can be both offensive and non-offensive, will be marked. The additional information about context or sense embeddings ...
... into Cyrillic: diddlei, villainess, ferociousness, carcharodon; 2) foreign (not-translated) words: anguillidae, anguilliformes, animal; 3) irrelevant named entities: Istočni Goti, Abulija, Animalija, Drag kraljica; 4) literal translations that are meaningless in Serbian: jabuka poliranje, javni pogodnost ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
Towards Sustainable Management of Transboundary Hungarian-Serbian Aquifer
Zoran Stevanović, Peter Kozák, Milojko Lazić, Janos Szanyi, Dušan Polomčić, Balazs Kovács, Jozsef Török, Saša Milanović, Bojan Hajdin, Petar Papić (2011)
... 40% of whom are on the Hungarian side of the border. The Pannonian basin (or the Great Hungarian Basin) represents a geographical and geological entity that spreads over the territory of several countries. The central ...
... Roof Report for 2004 [6], the transboundary aquifer system of Hungary-Serbia was preliminary separated into two parts: one large GW body in Serbia named CS-DU 10, and five in Hungary (P.1. and P.2. groups). The total area is assumed to cover around 27 0OA1 ...
Zoran Stevanović, Peter Kozák, Milojko Lazić, Janos Szanyi, Dušan Polomčić, Balazs Kovács, Jozsef Török, Saša Milanović, Bojan Hajdin, Petar Papić. "Towards Sustainable Management of Transboundary Hungarian-Serbian Aquifer" in Transboundary Water Resources Management - A Multidisciplinary Approach, Weinheim, Germany : Wiley-VCH (2011): 143-149
On the compatibility of lexical resources for NooJ
Lexical resources for many languages are provided for the NooJ linguistic development environment. Meta-data descriptions of morphosyntactic and semantic properties of these languages and their resources are a mandatory part of each language module. In this paper we analyze how well the meta-data actually describe resources for a chosen subset of languages and to what extent they are compatible across languages to support multilingual processing. We show that there is room for improvement in both directions.
... not captured, but were rather due to peculiarities of translations and inadequacies of simplified graphs. Finally, there was some incorrect recognition as well. Some of it results from the usual ambiguity, as illustrated by the following Serbian example (vrsta is a measurement unit, but also ...
... En: she carried brigantine, foresail, storm-jib, and standing-jib, and was well rigged for running before the wind... The next source of false recognition is more serious. Namely, in the English e-dictionary numbers are marked as determiners, which represent a wider category, as illustrated by the ...
Ranka Stanković, Miloš Utvić, Duško Vitas, Cvetana Krstev, Ivan Obradović. "On the compatibility of lexical resources for NooJ" in Automatic Processing of Various Levels of Linguistic Phenomena: Selected Papers from the 2011 International Nooj Conference, Cambridge Scholars Publishing (2012): 96-108
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019) This paper is a result of a task that was presented to attendees of the Keyword Search in Big Linked Data summer school, which was organized by Vienna University of Technology, under the Keystone COST action in the summer of 2017. It presents a specific approach to classification via creation of minimal document surrogates based on the US National Library of Medicine's MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...
... longest to the shortest, in order for them to be searched in the documents without the risk of longest matches not being recognized due to previous recognition of shorter ones (Figure 5). The result of this query is a CSV file whose rows contain the name (rdfs:label) and the taxonomic reference (treeNumber) ...
... However, having in mind the order of applying the replacements (from the longest to the shortest term), there will be no wrongful replacement and recognition of only a part of the term. 2.2 Conversion of documents into concept vectors This stage consists of two steps. First, in all documents, the previously ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
Developing Termbases for Expert Terminology under the TBX Standard
... inflection class or word forms. This information is essential for proper processing of all texts, such as lemmatization, morphological analysis, named entity recognition and the like. This is especially important in the case of domain specific texts as in the fields of geology or mining. Thus, appropriate ...
... containing expert terminology, such as the textbook “Introduction to Mining”. The approach also envisages integration with cascades for named entity recognition such as mining equipment, specific minerals and the like. Building of an aligned Serbian-English corpus of texts in the area of mining and ...
... mineralnih sirovina.N:s2qn leziSta mineralnih sirovina,lezisSte mineralnih sirovina.N:w4qn Domain specific e-dictionaries are especially important in recognition of compound words in texts featuring expert terminology, as such texts usually abound with compounds having a meaning often very different from ...
Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade : University of Belgrade, Faculty of Mathematics (2014)
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with learning techniques based on so-called zero-shot and few-shot attempts. The main approach innovates by designing research prompts that include text with classification instructions, both without examples and with a few examples, enabling the language model to classify sentiment into positive, negative, or objective categories. This methodology aims to simplify sentiment analysis by constraining responses, thereby increasing precision ...
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), BAS (2024)
Development Of The Serbian Geological Resources Portal
... factors of the search field. Namely, a number of entities and their attributes within the database correspond to each search criterion, and each entity/attribute has certain weight factors which determine the relevance of the appearance of a resource within the set of results. Entering different ...
Ranka Stanković, Jelena Prodanović, Olivera Kitanović, Velizar Nikolić. "Development Of The Serbian Geological Resources Portal" in Proceedings of the 17th Meeting of the Association of European Geological Societies, Belgrade, Serbia : The Serbian Geological Society (2011)