A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages
et al., 2012) and with the Digital Dictionary of the German Language (Digitales Wörterbuch der Deutschen Sprache (Klein and Geyken, 2010)) (Henrich et al., 2014). Gurevych et al. (2012) present UBY, a large-scale lexical-semantic resource containing pairwise sense alignments between a subset of
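As an illustration of the annotation scheme described in this entry, the following minimal sketch shows one way a sense-level alignment between two monolingual dictionaries might be recorded. The four relation labels come from the abstract. The class names, field names, sample sense identifiers, and the rule for inverting a link are assumptions made for illustration and are not taken from the dataset itself.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The four semantic relationships named in the abstract."""
    EQUIVALENT = "equivalent"
    BROADER = "broader"
    NARROWER = "narrower"
    RELATED = "related"


# Assumption: reading a link in the opposite direction swaps broader and
# narrower and leaves the two symmetric relations unchanged.
INVERSE = {
    Relation.EQUIVALENT: Relation.EQUIVALENT,
    Relation.BROADER: Relation.NARROWER,
    Relation.NARROWER: Relation.BROADER,
    Relation.RELATED: Relation.RELATED,
}


@dataclass(frozen=True)
class SenseAlignment:
    lemma: str
    source_sense: str  # hypothetical sense ID in the first dictionary
    target_sense: str  # hypothetical sense ID in the second dictionary
    relation: Relation

    def inverted(self) -> "SenseAlignment":
        """Return the same link read from the target dictionary's side."""
        return SenseAlignment(
            self.lemma, self.target_sense, self.source_sense, INVERSE[self.relation]
        )


# Hypothetical example: sense 1 in dictionary A is narrower than sense 3 in B.
link = SenseAlignment("bank", "dictA:bank#1", "dictB:bank#3", Relation.NARROWER)
print(link.inverted())
```

Keeping the relation as an enumeration rather than free text makes it harder to introduce label variants during manual annotation, which matters when the same scheme is applied across many languages.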
Wordnet Development Using a Multifunctional Tool
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite
finite state transducers, can also be used to aid the user in developing and refining the wordnet. Keywords: Wordnet development, language resource integration, HLT tools. 1. Introduction. The first wordnet, namely the Princeton WordNet (PWN), or simply WordNet, was conceived in 1985 by
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia, 18 of 308 regularly published scientific journals are bilingual, with papers simultaneously being in English
... components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any language. The proposed tool basically consists of the following components: ...
From DELA Based Dictionary to Leximirka Lexical Database
In this paper, we present an approach to transforming Serbian morphological dictionaries from the DELA text format into a lexical database dubbed Leximirka. Considering the benefits of storing data in a database rather than in text documents, we outline some of the functionality that the database has made possible. We also show how hand-made rules that use the category labels with which lexical entries are marked can be used to link lexical entries.
... for natural language processing - NLP. (Infotheca Vol. 19, No. 2, December 2019, Scientific paper) The LMF prescribes a standardized framework for recording linguistic information in computer lexicons and is based on the Standard ISO 24613:2008 (Language Resource Management ...
Parallel Bidirectionally Pretrained Taggers as Feature Generators
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of the Western Balkans, Russian and English. The University of Belgrade (UB) hosts a central repository based on: the BAEKTEL Metadata Portal (BMP), a terminological web application for managing, browsing and searching terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and an OER repository on local edX
... extraction, providing an invaluable education resource, applicable in all of its domains. In further work, bilingual terminology extraction will be considered. REFERENCES [1] I. Gurevych, D. Bernhard and A. Burchardt, “Educational Natural Language Processing,” Notes for ENLP tutorial held at ...
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The conception of this network is presented, along with the possibilities for its application. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based analysis and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use
... it can be used for different purposes: as a dictionary for language learning (since it contains more than 13,000 LUs); as a valence dictionary; as a training dataset for semantic role labeling, which makes it a rich digital language resource (with over 200,000 manually annotated sentences linked to over ...
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)
... typical language technology applications. In the next chapter, we will present an overview of language technology and its core application areas as well as an evaluation of the current situation of language technology support for Serbian. 4 LANGUAGE TECHNOLOGY SUPPORT FOR SERBIAN Language technology ...
Речници у дигиталном добу - информатичка подршка за српски језик -- Dictionaries in the Digital Age - Information Technology Support for the Serbian Language
Part of Speech Tagging for Serbian language using Natural Language Toolkit
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We attempt to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on an annotated
... typology,” Proc. Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014 [14] C. Krstev and D. Vitas, “Serbian Morphological Dictionary – SMD,” University of Belgrade, HLT Group and Jerteh, Lexical resource, 2.0, 2015 [15] A. Balvet, D. Stošić, and ...
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system able to detect such texts could help make the Internet and social media a better and more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated
... a general data set convenient for the detection of a broad range of abusive topics. We already used this resource for the detection of abusive triggers and the augmentation of the abusive language lexicon. (D. Jokić, R. Stanković, C. Krstev, and B. Šandrih) 1.2 Related work. In the past two decades ...
Managing mining project documentation using human language technology
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach
... the Lexical Database Ranka Stanković1, Rada Stijović2, Duško Vitas1, Cvetana Krstev1, Olga Sabo2 1University of Belgrade, 2Institute for Serbian Language, Serbian Academy of Sciences and Arts E-mail: ranka.stankovic@rgf.bg.ac.rs, rada.stijovic@isj.sanu.ac.rs, vitas@matf.bg.ac.rs, cvetana@matf.bg ...
Karst wastewater as a high quality, renewable and within the circular economy water resource
Towards a Mining Equipment Ontology
software implementation of this resource, whereas in Section 4 we describe the mechanisms by which RudOnto, as a central resource, can be used for transformation of subsets of its concepts to ontologies for specific areas of mining engineering using OWL (Web Ontology Language). The final section features
... be derived from RudOnto for the areas of Geostatistics, Mine safety, Mineral resource exploitation, Petroleum exploitation or Mining equipment. The structure of RudOnto can be described by a UML (Unified Modeling Language) model, as depicted in Figure 2. A brief description of this model follows. ...
Building learning capacity by blending different sources of knowledge
n, language of the resource content, date when the resource was made available to the public, contributor and type of resource are general data, originating from the DC standard. Within general data, a unique code is defined to provide unambiguous identification and access to the resource.
... represents a textual resource that BMP language support system makes use of. The language support system handles various types of requests issued by users, usually in the form of a query. The requests are handled by WSDL (Web Services Description Language) described Language Web Service, basically ...
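The general metadata described in this entry can be pictured with a small record type. The sketch below is only an illustration under stated assumptions: the fields (language, availability date, contributor, resource type, unique code) follow the description above, but the class and function names, the use of a random UUID as the unique code, the mapping onto simple Dublin Core element names, and the sample values are all hypothetical and are not taken from the BMP implementation.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneralMetadata:
    identifier: str      # the unique code used to identify and access the resource
    language: str        # language of the resource content
    date_available: str  # date the resource was made available to the public
    contributor: str
    resource_type: str


def new_record(language, date_available, contributor, resource_type):
    """Create a record with a freshly generated unique code.

    Assumption: a random UUID stands in for the unique code; the source
    does not say how the actual code is produced.
    """
    return GeneralMetadata(
        uuid.uuid4().hex, language, date_available, contributor, resource_type
    )


def to_dublin_core(record):
    """Map the record onto simple Dublin Core element names (assumed mapping)."""
    return {
        "dc:identifier": record.identifier,
        "dc:language": record.language,
        "dc:date": record.date_available,
        "dc:contributor": record.contributor,
        "dc:type": record.resource_type,
    }


# Hypothetical sample values, for illustration only.
first = new_record("sr", "2014-05-01", "Example Contributor", "video")
second = new_record("en", "2014-06-15", "Example Contributor", "website")
print(to_dublin_core(first))
```

Generating the code independently of the descriptive fields means two resources with identical language, date, contributor and type still receive distinct identifiers.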
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
Using Metadata For Content Indexing Within An OER Network
Ranka Stanković, Olivera Kitanović, Ivan Obradović, Roberto Linzalone, Giovanni Schiuma, Daniela Carlucci (2014)
... the resource in terms of how the information contained in the resource is organized. It indicates whether it is an electronic document, paper-only document, slide(s), website, CD-ROM/DVD, audio, or video. Educational data, taken from the LOM standard, suggest the audience the resource is intended ...
Building Terminological Resources in an e-Learning Environment
of semantic network. As RudOnto is a multilingual resource, another type of relation is introduced, namely that between equivalent terms in different languages, the so-called translational equivalents. However, although a term in one language can have several equivalents in another, for practical
... n of Moodle glossaries to proper language resources is a long term goal, we concluded that at least the development of Moodle glossaries in parallel with RudOnto should be terminated. We decided to continue developing RudOnto as the main terminological resource and derive Moodle glossaries from ...
