Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
In science, industry and many research fields, terminology is rapidly developing. Most often, a language that is "lingua franca" for most of these areas is English. As a consequence, for many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach for automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a part alignment tool. We examine the performance of the method on the Information Science domain.
Scientific paper. UDC 81'322.2. DOI 10.18485/infotheca.2019.19.2.6
The obtained results, as well as the application that implements the method, are available online. KEYWORDS: terminology extraction, terminology validation, GIZA++, graphs, Unitex, text classification. PAPER SUBMITTED: 30 September 2019 PAPER ACCEPTED: 20 December 2019
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia, 18 of the 308 regularly published scientific journals are bilingual, with papers simultaneously being in English ... (Chen, 2004). Having in mind that terminology research consists of: analyzing the concepts and concept structures used in a field or domain of activity, identifying the terms assigned to the concepts and, in the case of bilingual or multilingual terminology, establishing correspondences between ...
... makes it a valuable tool for terminological research. Keywords – Knowledge management, Digital library, Multilingualism, Language Resources, Terminology Paper type – Academic Research Paper 2 ...
... between terms in the various languages, we have made an effort to develop a software tool and bilingual resource that supports terminology research. This paper presents a developed solution, named Bibliša, which is free for use and publicly available. In the second section of this paper are presented ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ... performed manually, as presented by tables and graphs in Section 4. The solution to terminology extraction outlined in this paper will by all means speed up the development of e-dictionaries, as in addition to the terminology extraction, the approach can be applied to the extraction of MWUs belonging to ...
... multi-word forms were evaluated as proper multi-word units, and among them 97% were associated with correct lemmas. Keywords: term extraction, terminology, multi-word units, lemmatization, finite-state transducers 1. Motivation Various approaches have been proposed for multi-word term (MWT) ...
... 512 Table 6 Number of lemmas after manual evaluation 5. Concluding remarks and Future Work The paper presents an approach to terminology extraction for Serbian based on e-dictionaries and local grammars. For extraction purposes 14 graphs were developed, which extract the most frequent ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
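The idea of extracting multi-word terms by matching syntactic structures can be illustrated with a minimal sketch. This is not the authors' implementation: the paper relies on electronic dictionaries and finite-state transducers, whereas the sketch below swaps in plain matching of part-of-speech tag sequences. The tag set ("A" for adjective, "N" for noun), the three patterns and the sample input are all assumptions made for illustration, and lemmatization is not covered.

```python
# Hypothetical POS patterns for multi-word term candidates (not taken from the paper).
PATTERNS = [("A", "N"), ("A", "A", "N"), ("N", "N")]


def extract_candidates(tagged, patterns=PATTERNS):
    """Return (start, end, text) for every token span whose POS tags equal a pattern.

    `tagged` is a list of (word, pos) pairs; `end` is exclusive.
    """
    tags = [pos for _, pos in tagged]
    found = []
    for start in range(len(tagged)):
        for pattern in patterns:
            end = start + len(pattern)
            # A slice that runs past the end is shorter than the pattern, so it cannot match.
            if tuple(tags[start:end]) == pattern:
                text = " ".join(word for word, _ in tagged[start:end])
                found.append((start, end, text))
    return found


# Invented sample input, already POS-tagged.
sample = [("rule-based", "A"), ("term", "N"), ("extraction", "N")]
print(extract_candidates(sample))
```

A transducer-based grammar can express far richer constraints (agreement, optional elements, dictionary lookups) than fixed tag tuples; the sketch only shows the overall shape of pattern-driven candidate extraction.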
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ... domain terminology. Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection 1. Motivation Terminology is rapidly developing in many research and technological fields. It is very difficult to produce and maintain up-to-date terminology resources ...
... resources, methods and tools can help in the development of terminology in the Serbian language. Our work relies on the following presuppositions: 1. Serbian terminology is today transferred mainly from English because English terminology is better developed for many scientific and technological domains ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
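The hypothesis stated in this entry (known source-language terms plus an aligned corpus can lead to target-language terms) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: word alignments are taken as ready-made index pairs (in the paper's setting they would come from an alignment tool), target candidates are taken as ready-made token spans (they would come from a target-language term extractor), and the function names, data layout and sample sentence are hypothetical.

```python
from collections import Counter


def find_occurrences(tokens, term_tokens):
    """Yield start indices where `term_tokens` occurs as a contiguous run in `tokens`."""
    n = len(term_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            yield i


def pair_terms(sentence_pairs, source_terms):
    """Count (source term, target candidate) pairs linked through word alignment.

    Each sentence pair is a dict (hypothetical layout) with:
      'src', 'tgt'   : lists of tokens
      'align'        : set of (src_index, tgt_index) links
      'candidates'   : list of (start, end) target spans, end exclusive
    """
    counts = Counter()
    for pair in sentence_pairs:
        src = [token.lower() for token in pair["src"]]
        for term in source_terms:
            term_tokens = term.lower().split()
            for start in find_occurrences(src, term_tokens):
                src_idx = set(range(start, start + len(term_tokens)))
                tgt_idx = {j for i, j in pair["align"] if i in src_idx}
                if not tgt_idx:
                    continue  # the term occurrence has no aligned target words
                for c_start, c_end in pair["candidates"]:
                    # Keep a candidate only if it covers every aligned target word.
                    if all(c_start <= j < c_end for j in tgt_idx):
                        target = " ".join(pair["tgt"][c_start:c_end])
                        counts[(term, target)] += 1
    return counts


# Invented example data for illustration only.
example = [{
    "src": ["The", "digital", "library", "stores", "documents"],
    "tgt": ["Digitalna", "biblioteka", "čuva", "dokumente"],
    "align": {(1, 0), (2, 1), (3, 2), (4, 3)},
    "candidates": [(0, 2)],
}]
print(pair_terms(example, ["digital library"]))
```

Counting pairs across the corpus, rather than accepting the first match, leaves room for a later filtering or validation step; how the original work ranks or validates the pairs is not specified in this excerpt.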
Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages
Jelena Lazarević, Olivera Kitanović (2024). The aim of the paper is to investigate collocability as the way in which lexical units combine with words from different categories, forming larger units. The investigation of the semantic and syntactic principles of these combinations in the Spanish and Serbian language of football was carried out on the comparable football corpora SrFudKo and EsFudko, developed as part of Jelena Lazarević's doctoral dissertation entitled: Jezičke odlike diskursa novih medija o fudbalu: kontrastivna analiza na korpusu srpskog i španskog jezika (Linguistic features of new media discourse on football: a contrastive analysis on a corpus of Serbian and Spanish). The SrFudKo football corpus, created from texts about football from five Serbian web portals: ...
Jelena Lazarević, Olivera Kitanović. "Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021). The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ... raw materials, mining, terminology, dictionary, terminology application, mobile application, digitization, lexical data, corpora, linked open data ... material terminology, and have also been included in this research. Some important related issues discussed are collocation extraction methods, the use of domain labels, lexical and semantic relations, definitions and subentries. Keywords: raw material; mining; terminology; dictionary; terminology application; ...
... for dictionary production. The approach is focused on raw material terminology, with an emphasis on terminology related to the mining industry, as a case study, the main goal being to cover Serbian and bilingual English-Serbian terminology in the raw material domain, within a system that can be Appl ...
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX platform.
... Knowledge in Technology enhanced learning) resources, lexical resources, the process of terminology extraction and a presentation of TERMI, an application for terminology management. 2. TERMINOLOGICAL RESOURCES Terminology is considered to be a young interdisciplinary scientific field. The interest in it ...
... corpora etc. Special attention will be given to Termi, newly developed application for terminology management. Keywords: Open Educational Resources, Lexical resources, Natural Language Processing, Terminology 1. INTRODUCTION Natural Language Processing (NLP) has a two-faceted approach to education ...
Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)