Search | Papers | DR RGF - Digital Repository of the Faculty of Mining and Geology, University of Belgrade

186 items

New Technologies for Reviving Old Texts (Нове технологије за оживљавање старих текстова)

Цветана Крстев, Ранка Станковић, Бранислава Шандрих Тодоровић, Милица Иконић Нешић (2023)

distant reading, literary corpus, Serbian language processing, part-of-speech annotation, lemmatization, named entities

Цветана Крстев, Ранка Станковић, Бранислава Шандрих Тодоровић, Милица Иконић Нешић. "Нове технологије за оживљавање старих текстова" [New Technologies for Reviving Old Texts] in Proceedings of the International Scientific Conference Digital Humanities and Slavic Cultural Heritage II, Belgrade, 28-29 June 2021, Belgrade: Association of Slavic Studies Societies of Serbia (2023)
New Language Models for South Slavic Languages

Mihailo Škorić (2024)

The talk will present the challenges and prospects of modelling South Slavic languages, with special reference to general language models built on the transformer architecture (BERT, GPT), to the text collections available for training these models, and to the quantity and quality of those collections. The talk will give an overview of the available datasets and models, with particular attention paid to the newest text corpora. The first corpus, Kišobran, is an umbrella web corpus of the South Slavic languages and currently the largest text corpus in the region, with over eighteen billion words, and it includes all ...

large text corpora, language models, South Slavic languages

Mihailo Škorić. "New Language Models for South Slavic Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Creation of a Training Dataset for Question-Answering Models in Serbian

Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov (2024)

The development and application of artificial intelligence in language technologies have advanced considerably in recent years, especially in question answering (QA). While existing resources for QA tasks have been developed for the major world languages, Serbian has been relatively neglected in this area. This paper presents an initiative to create a large and diverse dataset for training question-answering models in Serbian, which will contribute to the advancement of language technologies for Serbian. Besides numerous studies on language models ...

artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering

Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
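The abstract does not describe the dataset's record format. Purely as an illustration, the sketch below assumes a SQuAD-style extractive layout: a context, a question, the answer text, and the character offset of the answer. The names `QAExample`, `make_example` and `is_consistent` are hypothetical. The sketch shows one routine check when building such a dataset: confirming that each stored offset really points at the answer span.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One extractive QA item in an assumed SQuAD-like layout."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside the context


def is_consistent(ex: QAExample) -> bool:
    """True if the stored offset actually points at the answer text."""
    end = ex.answer_start + len(ex.answer_text)
    return ex.context[ex.answer_start:end] == ex.answer_text


def make_example(context: str, question: str, answer_text: str) -> QAExample:
    """Build an example, locating the FIRST occurrence of the answer in the context."""
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer text does not occur in the context")
    return QAExample(context, question, answer_text, start)


if __name__ == "__main__":
    ctx = "Beograd je glavni grad Srbije."
    ex = make_example(ctx, "Koji je glavni grad Srbije?", "Beograd")
    print(ex.answer_start, is_consistent(ex))  # 0 True
```

Locating answers by first occurrence is fragile. For example, `make_example(ctx, "...", "grad")` returns offset 3, which falls inside "Beograd", rather than offset 18. Annotation tools therefore usually record the offset at the moment the annotator selects the span.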
Knowledge Graphs in the Era of Large Language Models: Opportunities and Challenges

Danka Jokić, Ranka Stanković, Jelena Jaćimović (2024)

The emergence of large language models (LLMs) has significantly affected the field of artificial intelligence, particularly natural language processing and text generation. However, a key limitation of these models is their lack of structured knowledge and reasoning ability. This hinders their real-world application, where factual accuracy and context-based reasoning are required. Knowledge graphs, on the other hand, offer an attractive solution. They provide a rich source of structured knowledge by representing entities and their relations in ...

knowledge graphs, large language models, natural language processing, structured knowledge, data quality, explainable artificial intelligence, online content safety

Danka Jokić, Ranka Stanković, Jelena Jaćimović. "Knowledge Graphs in the Era of Large Language Models: Opportunities and Challenges" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
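The abstract contrasts LLMs with knowledge graphs, which store facts as explicit entity-relation-entity triples. Because the facts are explicit, a claim can be checked by lookup instead of being generated. The minimal sketch below is a generic in-memory triple store, not a system described in the talk. The class name and the predicate names (`capital_of`, `located_on`) are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory knowledge graph holding (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Index by subject so an entity's outgoing relations are cheap to look up.
        self.by_subject: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))
        self.by_subject[subject].add((predicate, obj))

    def objects(self, subject: str, predicate: str) -> set[str]:
        """Every object o such that (subject, predicate, o) is stored."""
        return {o for p, o in self.by_subject.get(subject, set()) if p == predicate}

    def supports(self, subject: str, predicate: str, obj: str) -> bool:
        """True if a claimed fact (e.g. one extracted from LLM output) is in the graph."""
        return (subject, predicate, obj) in self.triples


if __name__ == "__main__":
    kg = TripleStore()
    kg.add("Belgrade", "capital_of", "Serbia")
    kg.add("Belgrade", "located_on", "Danube")
    kg.add("Belgrade", "located_on", "Sava")
    print(sorted(kg.objects("Belgrade", "located_on")))  # ['Danube', 'Sava']
    print(kg.supports("Novi Sad", "capital_of", "Serbia"))  # False
```

In practice such graphs are usually kept in RDF stores and queried with SPARQL. The set-based version above only illustrates why structured facts make verification a lookup.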
Transformer-Based Composite Language Models for Text Evaluation and Classification

Mihailo Škorić, Miloš Utvić, Ranka Stanković (2023)

Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
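The paper's dataset is based on perplexity. For a text of N tokens, perplexity is commonly defined as PPL = exp(-(1/N) Σ log p(w_i | w_<i)): the exponentiated average negative log-probability that a language model assigns to each token. The sketch below computes it from per-token probabilities. How the paper's composite models obtain and combine those probabilities is not shown, and the function names are illustrative.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity of one text, given p(w_i | w_<i) for each of its tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def less_surprising(text_a: list[float], text_b: list[float]) -> str:
    """Name the text the model finds more predictable (lower perplexity)."""
    return "a" if perplexity(text_a) <= perplexity(text_b) else "b"


if __name__ == "__main__":
    print(perplexity([0.5, 0.5, 0.5, 0.5]))  # approximately 2.0
    print(perplexity([0.25, 0.25]))          # approximately 4.0
```

A model that spreads probability uniformly over k choices at every step has perplexity k. This is why lower values are read as "the model finds the text more natural".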
Part of Speech Tagging for Serbian language using Natural Language Toolkit

Ranka Stanković, Boro Milovanović (2020)

While complex NLP (natural language processing) algorithms are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We attempt to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are then transformed with the Brill tagger to improve accuracy. We trained the models on a tagged ...

natural language processing, machine learning, neural networks

... ways. In this paper, we will create a tagger for Serbian with the help of the Python library NLTK (Natural Language Toolkit). Besides exposing more than 50 corpora and lexical resources, NLTK is used for making programs that handle human language data, ranging from tokenization to semantic reasoning ...
... different algorithms makes this library a good choice for research. Serbian belongs to the group of low-resource languages, so there has been only modest research on this topic. The first attempts to create an automatic PoS tagger for Serbian relied on a dictionary. Delić et al. used custom transformations ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
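The abstract describes chaining NLTK taggers and refining the best ones with a Brill tagger. In NLTK such a chain is typically built from `UnigramTagger`, `BigramTagger` and `DefaultTagger` through their `backoff` argument, with `nltk.tag.brill_trainer.BrillTaggerTrainer` for the transformation step. The dependency-free sketch below shows only the core backoff idea: assign each word its most frequent training tag, and fall back to a default tag for unseen words. It is not NLTK's implementation. The tag labels and toy sentences are hypothetical and do not come from the paper's tagset or corpus.

```python
from collections import Counter, defaultdict


class UnigramBackoffTagger:
    """Tags each word with its most frequent training tag; unseen words get a default tag."""

    def __init__(self, tagged_sents, default_tag: str) -> None:
        counts: defaultdict[str, Counter] = defaultdict(Counter)
        for sent in tagged_sents:
            for word, tag in sent:
                counts[word.lower()][tag] += 1
        # Keep only the single most frequent tag per (lower-cased) word form.
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.default_tag = default_tag  # the "backoff" for out-of-vocabulary words

    def tag(self, words):
        return [(w, self.table.get(w.lower(), self.default_tag)) for w in words]

    def accuracy(self, gold_sents) -> float:
        """Share of tokens whose predicted tag equals the gold tag."""
        correct = total = 0
        for sent in gold_sents:
            predicted = self.tag([w for w, _ in sent])
            for (_, p), (_, g) in zip(predicted, sent):
                correct += p == g
                total += 1
        return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy data with illustrative tags (N, V, PREP, ADV); not the paper's corpus.
    train = [
        [("Pas", "N"), ("trči", "V")],
        [("Mačka", "N"), ("spava", "V"), ("u", "PREP"), ("kući", "N")],
    ]
    tagger = UnigramBackoffTagger(train, default_tag="N")
    print(tagger.tag(["pas", "spava", "napolju"]))
    # [('pas', 'N'), ('spava', 'V'), ('napolju', 'N')]
    gold = [[("pas", "N"), ("spava", "V"), ("napolju", "ADV")]]
    print(tagger.accuracy(gold))  # 0.666... (the unseen adverb falls back to N)
```

The mis-tagged unseen adverb shows why a default-tag backoff alone is weak for a highly inflected language. Backoff to n-gram context and rule-based (Brill) correction are ways to recover such errors.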
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary

Miljana Mladenović, Ranka Stanković, Cvetana Krstev (2017)

In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...

... ion [7] of a digital version of a dialect vocabulary of the Serbian language, produced on the basis of traditional dialect dictionaries [16], [17] (online at http://www.vranje.co.rs). This is the first digital resource for Serbian which, in addition to linguistic information, also provides: sound ...
... performances of the digital dialect dictionary: Serbian morphological e-dictionaries used to produce all inflected forms of standard terms and Serbian WordNet (SWN) ontology represented in OWL2 format for which we define rules expressed in Semantic Web Rule Language (SWRL) to be used to generate synonymous ...
Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
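The method links indirect synonyms: words connected through a chain of synonymy rather than directly. The authors derive these with SWRL rules over the Serbian WordNet ontology. The sketch below swaps in a plain breadth-first search over a synonym graph to show the same closure idea, and then pairs the two closures as the abstract describes. The dialect words `D1` and `D2` are placeholders, and the standard-language synonym pairs are illustrative, not taken from the dictionary or SWN.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Undirected graph of direct synonym links."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def indirect_synonyms(graph, term):
    """Every word reachable from `term` through a chain of synonym links (term excluded)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(term)
    return seen


def connect(dialect_graph, standard_graph, dialect_term, standard_term):
    """Pair each indirect synonym of the dialect term with each one of the standard term."""
    dialect_side = indirect_synonyms(dialect_graph, dialect_term) | {dialect_term}
    standard_side = indirect_synonyms(standard_graph, standard_term) | {standard_term}
    return {(d, s) for d in dialect_side for s in standard_side}


if __name__ == "__main__":
    standard = build_graph([("brzo", "hitro"), ("hitro", "žurno")])
    dialect = build_graph([("D1", "D2")])  # placeholder dialect synonyms
    print(sorted(indirect_synonyms(standard, "brzo")))  # ['hitro', 'žurno']
    print(len(connect(dialect, standard, "D1", "brzo")))  # 6
```

Here "žurno" is linked to "brzo" only through "hitro". A user searching with any of the three standard words can then reach both dialect entries.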
Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom

Lidija Beko, Ivan Obradović, Ranka Stanković (2015)

... flashcards, mining and geology terminology, students’ perception. 1. INTRODUCTION: The vocabulary of any language is huge and its acquisition takes time, even for a native speaker. Serbian language learners are not an exception to this rule and they are generally conscious of the fact that the limitations ...
... which are “fully” translated into Serbian, unlike Table 2, where loanwords dominate and illustrate the current trend of unprecedented and uncritical adoption of foreign words without translation. Table 1: Fully translated words (English: Serbian). Bearing: ležište, osnova, nosač, ležaj ...
... Using authentic teaching materials, encompassing demanding language input and subject content, inevitably exposes our students to complex mining and geology terminology. At the same time, many of the terms have not yet been translated into Serbian; therefore, looking up the equivalents in bilingual ...
Lidija Beko, Ivan Obradović, Ranka Stanković. "Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom" in Proceedings of the Second International Conference on Teaching English for Specific Purposes and New Language Learning Technologies, May, 22-24, 2015, Niš, Serbia, Faculty of Electronic Engineering, University of Niš, Niš : Faculty of Electronic Engineering (2015)
Multi-word Expressions for Abusive Speech Detection in Serbian

Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev (2020)

This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay particular attention to adding multi-word expressions (polylexemic units) that can be considered offensive, because such lexical entries are very important for achieving good results in a wide range of abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and lexicon creation. The connection with other lexical and semantic resources for Serbian is highlighted, as is the planned construction of a system for ...

abusive speech, hate speech, lexical resources, multilingual lexicon, multi-word expressions

... lexicon for Serbian language (Mladenović et al., 2016b), synsets from the Serbian WordNet (Mladenović et al., 2016a). We plan to use the lexicon for building a corpus of abusive content in social networks in Serbian as well as a classifier using rules and existing resources for Serbian language (Krstev ...
... that will facilitate abusive language detection already exist. Serbian Morphological Dictionaries are certainly a staple in processing texts in Serbian (Krstev, 2008). In order to process implicitly abusive language, we need to take into account the usage of non-literal language, the rhetorical devices that ...
... the process of fine-tuning offensive language models for Danish, Turkish, and English. In the research presented in this paper, we are improving the Serbian part of HurtLex, as it can be a powerful resource for detecting abusive language in Serbian. 3. Serbian HurtLex revision. 3.1 srHurtLex lexical ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
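Serbian is highly inflected, so multi-word entries such as those added to srHurtLex are easiest to find after lemmatization. This is one reason the morphological dictionaries matter here. The sketch below assumes the text has already been lemmatized and performs a greedy longest-match lookup. The lexicon entries are illustrative (the idiom crna ovca, "black sheep") and are not taken from srHurtLex. The function name is hypothetical.

```python
def find_mwes(lemmas: list[str], lexicon: set[tuple[str, ...]]) -> list[tuple[int, int, tuple[str, ...]]]:
    """Greedy longest-match lookup of lexicon entries in a lemmatized sentence.

    Returns (start, end, entry) spans; `end` is exclusive.
    """
    max_len = max((len(entry) for entry in lexicon), default=0)
    spans = []
    i = 0
    while i < len(lemmas):
        match = None
        # Try the longest candidate first, so ("crn", "ovca") wins over ("ovca",).
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = tuple(lemmas[i:i + n])
            if candidate in lexicon:
                match = (i, i + n, candidate)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched expression
        else:
            i += 1
    return spans


if __name__ == "__main__":
    lexicon = {("crn", "ovca"), ("ovca",)}  # illustrative entries only
    # Lemmas of "On je crna ovca porodice"; inflected forms such as "crnu ovcu"
    # map to the same lemmas, so one lexicon entry covers all of them.
    print(find_mwes(["on", "biti", "crn", "ovca", "porodica"], lexicon))
    # [(2, 4, ('crn', 'ovca'))]
```

Matching on lemmas rather than surface forms means a single entry covers every case form of the expression. Discontinuous or reordered MWEs would need a more flexible matcher than this contiguous one.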
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++

Branislava Šandrih, Ranka Stanković (2020)

In science, industry and many research fields, terminology develops rapidly. Most often, the "lingua franca" of these fields is English. As a consequence, in many fields domain terms are coined in English and later translated into other languages. In this paper we present an approach to automatic bilingual terminology extraction for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a chunk alignment tool. We examine the performance of the method on the domain ...

terminology extraction, terminology validation, GIZA++, graphs, Unitex, text classification

... paper, we present an approach for automatic bilingual terminology extraction for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for a target language and a tool for chunk alignment. We examine the performance of the method on a Library and ...
... translation is specifically adapted as an utterance in the language into which it is translated (i.e. as a word in the target language). An example that demonstrates both cases is the English term “screenshot”, from computer science. In Serbian, this term is either translated as snimak ...
... Technologies domain. It is a challenge to produce and maintain up-to-date terminology resources, especially for an under-resourced language such as Serbian. Today, Serbian terminology is transferred mainly from English, since it is better developed for many scientific and technological domains. Purely ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
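GIZA++ produces word-level alignments between parallel sentences, so a source term can be carried over to the target side by following those links. The sketch below assumes the alignment has already been read into (source index, target index) pairs; parsing GIZA++'s output files is not shown. It applies the usual consistency check from phrase extraction. This is a generic illustration, not the paper's chunk-alignment tool, and the example sentence pair and links are made up.

```python
def project_span(links, src_start, src_end):
    """Project source tokens [src_start, src_end) onto the target side.

    Returns (tgt_start, tgt_end), end exclusive, or None when no source token is
    aligned or when a target token inside the projected span is aligned to a
    source token outside the term (an inconsistent projection).
    """
    targets = [t for s, t in links if src_start <= s < src_end]
    if not targets:
        return None
    tgt_start, tgt_end = min(targets), max(targets) + 1
    for s, t in links:
        if tgt_start <= t < tgt_end and not src_start <= s < src_end:
            return None
    return tgt_start, tgt_end


if __name__ == "__main__":
    src = ["the", "digital", "library", "is", "open"]
    tgt = ["digitalna", "biblioteka", "je", "otvorena"]
    links = {(1, 0), (2, 1), (3, 2), (4, 3)}  # "the" has no counterpart
    span = project_span(links, 1, 3)          # source term "digital library"
    print(" ".join(tgt[span[0]:span[1]]))     # digitalna biblioteka
```

The consistency check discards projections that would drag in target words belonging to other parts of the sentence. Candidates that pass could then be filtered against a target-language term extractor, as the abstract describes.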
Development of Open Educational Resources (OER) for Natural Language Processing

Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević (2015)

In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...

E-Learning, Open Educational Resources, Computational Linguistics, Lexical Resources, edX

... Berlin Heidelberg. [6] Vitas, D., et al., Language Technology Support for Serbian, in The Serbian Language in the Digital Age, G. Rehm and H. Uszkoreit, Editors. 2012, Springer Berlin Heidelberg. p. 58-75. [7] Natural Language Processing for Serbian : resources and applications 35th Anniversary ...
... s and students of Serbian language at the Faculty of Philology, as well as NLP graduate courses for students of Mathematics at the Faculty of Mathematics. Over the last 35 years, many resources and tools for processing Serbian have been developed within the Human Language Technologies (HLT) ...
... there are none in Serbian. The Coursera platform provides three courses on Natural Language Processing, offered by three universities: Stanford, the University of Michigan and Columbia University. Topics included in these courses are syntax and parsing, language modelling and ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment

Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others (2020)

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages ...

lexical semantic resources, sense alignment, lexicography, language resource

... ‘The Little Dictionary’), the only two monolingual dictionaries available for this language. Italian: We used ItalWordNet (Roventini et al., 2000) and SIMPLE (Lenci et al., 2000). Serbian: We used the Serbian WordNet (Krstev et al., 2004; Stanković et al., 2018) and the Rečnik Matice srpske ...
... 0 (0), total 760 (5539); Italian SIMPLE: 290 (1990), 218 (1240), 0 (0), 0 (0), 0 (0), total 508 (3230); Serbian WordNet: 691 (5864), 985 (6522), 92 (713), 0 (0), 0 (0), total 1768 (13099); Serbian Dictionary of Serbo-Croatian Literary Language: 289 (2360), 281 (1527), 29 (215), 0 (0), 0 (0), total 599 (4102); Slovene WordNet: 409 (1106), 303 ...
... Figure 2: Frequency of the number of senses in the datasets per language and resource (left resources at left and right resources at right); languages shown: Danish, Dutch, English (KD), English (NUIG), Estonian, German, Hungarian, Irish, Italian, Serbian, Slovenian (JSI), Slovenian (ISJFR), Spanish, Portuguese, Russian. ... are available ...
Sina Ahmadi, John P McCrae, Sanni Nimb, Fahad Khan, Monica Monachini, Bolette S Pedersen, Thierry Declerck, Tanja Wissik, Andrea Bellandi, Irene Pisani, [...] Ranka Stanković and others. "A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment" in Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, European Language Resources Association (ELRA) (2020)
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources

Branislava Šandrih, Cvetana Krstev, Ranka Stanković (2020)

In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...

Linguistics and Language, Software, Artificial Intelligence, Language and Linguistics

Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
E-Connecting Balkan Languages

Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva (2009)

In this paper we present a versatile language processing tool that can be successfully used for many Balkan languages. The tool relies on several sophisticated textual and lexical resources that have been developed for most Balkan languages. These resources are based on several de facto standards in natural language processing.

Query expansion, e-dictionary, wordnet, proper name, aligned text

... have been mainly used for Serbian, they are by no means language-dependent, as long as compatible lexical resources exist for any two languages. Nevertheless, the full potential of these tools has until now been used only for Serbian and, in a bilingual context, for Serbian and English. In this paper ...
... section. WS4LR offers the user the possibility of expanding the query morphologically and semantically, and also into another language. If the first language is Serbian, the second language can be English, Bulgarian, or any other. A user can choose two working languages by adjusting parameters in the “ ...
... (Figure 1: The HTML view of the aligned Bulgarian-Serbian text.) Users' queries can be semantically expanded by wordnets and by the Prolex database. WS4LR obtains semantic expansion of a query by means of the wordnet of the first language (the Serbian wordnet, SWN, in the case of our examples), selecting ...
Cvetana Krstev, Ranka Stanković, Duško Vitas, Svetla Koeva. "E-Connecting Balkan Languages" in Proceedings of the Workshop on Multilingual resources, technologies and evaluation for Central and Eastern European Languages, 17 September 2009, eds. C. Vertan, S. Piperidis, E. Paskaleva and Milena Slavcheva, Borovets, Bulgaria : Association for Computational Linguistics Stroudsburg, PA, USA (2009)
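The snippets describe three expansion steps: semantic expansion through a wordnet, morphological expansion to inflected forms via e-dictionaries, and expansion into a second language. The sketch below shows one way these steps could be chained. It illustrates the idea only and is not WS4LR's code. The three small dictionaries stand in for SWN, the e-dictionaries and an aligned wordnet, and their contents are illustrative. The function name is hypothetical.

```python
def expand_query(lemma, synonyms, inflections, translations):
    """Expand a query lemma semantically, morphologically and into a second language.

    synonyms:     lemma -> set of synonymous lemmas (stand-in for a wordnet synset)
    inflections:  lemma -> set of inflected forms (stand-in for an e-dictionary)
    translations: lemma -> set of second-language lemmas (stand-in for an aligned wordnet)
    """
    # 1. Semantic expansion: the lemma plus its synonyms.
    lemmas = {lemma} | synonyms.get(lemma, set())
    # 2. Morphological expansion: every inflected form of every lemma
    #    (a lemma missing from the dictionary is searched as-is).
    forms = set()
    for item in lemmas:
        forms |= inflections.get(item, {item})
    # 3. Cross-lingual expansion: second-language equivalents of each lemma.
    foreign = set()
    for item in lemmas:
        foreign |= translations.get(item, set())
    return {"lemmas": lemmas, "forms": forms, "translations": foreign}


if __name__ == "__main__":
    synonyms = {"kuća": {"dom"}}
    inflections = {
        "kuća": {"kuća", "kuće", "kući", "kuću", "kućom"},
        "dom": {"dom", "doma", "domu", "domom"},
    }
    translations = {"kuća": {"house"}, "dom": {"home"}}
    result = expand_query("kuća", synonyms, inflections, translations)
    print(sorted(result["translations"]))  # ['home', 'house']
    print(len(result["forms"]))            # 9
```

Semantic expansion runs first here, so the morphological and cross-lingual steps also apply to the synonyms. Searching an aligned text with `forms` on one side and `translations` on the other then matches any case form of either word.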
Towards translation of educational resources using GIZA++

Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević (2016)

... An example excerpt from a TMX document. 5. TOWARDS MACHINE TRANSLATION FOR SERBIAN: Moses is a statistical machine translation system written in C++, with a library that enables the use of Moses from JavaScript. Loading multiple translation systems into the same node process is provided ...
... y searching large parallel corpora with a powerful query language [18]. In [19] another approach for extraction of semantically related word pairs, ideally translational equivalents, is presented, from aligned texts in SELFEH, a Serbian- English corpus of texts related to education, finance, health ...
... performed also on the Intera English/Serbian corpus [19, 20] with and without lemmatisation and PoS tagging. The authors report the most suitable measure: $\mathrm{rank}_y(x) = \frac{C(x \mid y)}{\sum_{i \in V} C(i \mid y)} \cdot \frac{C(x \mid y)}{C(x)}$, where $V$ is the set of word forms $i$ of a target language for which $C(i \mid y) > 0$, $C(x)$ is the frequency ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade : Metropolitan University (2016)
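The ranking measure quoted in the snippet, rank_y(x) = C(x|y) / Σ_{i∈V} C(i|y) × C(x|y) / C(x), multiplies two shares. The first is the share of y's co-occurrences that go to candidate x. The second is the share of x's occurrences that co-occur with y. The sketch assumes that C(x|y) counts how often target word form x co-occurs with source word y in aligned segments, and that C(x) is the corpus frequency of x; the snippet's definition of C(x) is cut off. The counts in the example are invented, and the function name is hypothetical.

```python
def rank_candidates(cooc_y: dict[str, int], freq: dict[str, int]) -> list[tuple[str, float]]:
    """Score target word forms x as translations of one source word y, best first.

    cooc_y: x -> C(x|y), co-occurrence count of x with y (assumed meaning)
    freq:   x -> C(x), corpus frequency of x (assumed meaning)
    """
    # V: target word forms i with C(i|y) > 0; their counts form the first denominator.
    total = sum(c for c in cooc_y.values() if c > 0)
    scores = {}
    for x, c in cooc_y.items():
        if c > 0 and freq.get(x, 0) > 0:
            scores[x] = (c / total) * (c / freq[x])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Invented counts for the source word "health".
    cooc = {"zdravlje": 8, "i": 10, "zdravstvo": 2}
    freq = {"zdravlje": 10, "i": 1000, "zdravstvo": 5}
    for word, score in rank_candidates(cooc, freq):
        print(word, round(score, 3))
    # zdravlje 0.32
    # zdravstvo 0.04
    # i 0.005
```

In this example the conjunction "i" co-occurs with "health" most often, yet it ranks last. The second factor divides by its large overall frequency, which is what keeps function words from being proposed as translations.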
Terminological and lexical resources used to provide open multilingual educational resources

Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić (2016)

Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of the Western Balkans, Russian and English. The University of Belgrade (UB) hosts a central repository based on: the BAEKTEL Metadata Portal (BMP), a terminological web application for managing, browsing and searching terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources, and an OER repository on a local edX ...

open educational resources, lexical resources, natural language processing, terminology

... flexion of Serbian and other similar languages of the Western Balkans. Partners in the BAEKTEL project produce materials in the Serbian, Bosnian and Montenegrin languages. When it comes to morphology, the aforementioned languages are quite similar; therefore it is possible to use the Serbian morphological ...
... manual term extraction completely. Due to the rich morphology of Serbian and the complexity of terms (most often composed of two or more words, i.e. multi-word units), it is not a simple process. Members of the Language Resources and Technologies Society developed a semiautomatic approach ...
... resources, Natural Language Processing, Terminology 1. INTRODUCTION Natural Language Processing (NLP) has a two-faceted approach to education where one involves e-learning and computer-assisted learning and instruction and the other consists of NLP tools for analysis and use of language by machines [1] ...
Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
Using English Baits to Catch Serbian Multi-Word Terminology

Cvetana Krstev, Branislava Šandrih, Ranka Stanković (2018)

In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...

aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection

... (in a source or a target language) in various ways: some authors use morphosyntactic patterns on lemmatized and POS-tagged texts (Bouamor et al ... [Footnote 2: In this paper we will call the ‘source’ language a well-resourced language (English), and the ‘target’ language a less-resourced language (Serbian).]
... terminology extractor for a target language, and a tool for word and chunk alignment. In this first experiment a source language is English, a target language is Serbian, a domain is Library and Information Science for which a bilingual terminological dictionary exists. Our term extractor is based on ...
... presented in this paper is motivated by our belief that Natural Language Processing (NLP) resources, methods and tools can help in the development of terminology in the Serbian language. Our work relies on the following presuppositions: 1. Serbian terminology is today transferred mainly from English because ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface

Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024)

We present current activities on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (polylexemic units) of various types (nominal, verbal, etc.) in specialized lexicons and for linking these entries to their occurrences in corpora. The ultimate goal is to use such resources for the automatic identification of multiword expressions in text. Including several natural languages aims at a universal solution that is not centred on any particular language, while also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ...

multiword expression lexicon, corpus, proof-of-concept lexicon encoding

Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
A bilingual digital library for academic and entrepreneurial knowledge management

Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić (2015)

A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted into a digital library. In the digital and knowledge age, digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. Of the 308 scientific journals regularly published in Serbia, 18 are bilingual, with papers appearing simultaneously in English ...

Knowledge management, Digital library, Multilingualism, Language Resources, Terminology

... Human Language Technologies (HLT) and technology enhanced learning (TEL). She published one book and more than 100 scientific papers, most of them related to natural language processing, more specifically to language resources development and their application. She has developed the Serbian mor ...
... key contributors to the development of Serbian wordnet, aligned multilingual corpus, and many other language resources. She also developed the first Serbian Named Entity Recognition system. She participated in a number of international and national language and TEL related projects. Biljana ...
... aligned text in an HTML format. Figure 5 shows the result of the previous query for the selected language (English), but selecting the language “sr” would retrieve a page with metadata in Serbian. Figure 5: Results of the metadata search ...
Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceedings of the 10th International Forum on Knowledge Asset Dynamics (IFKAD 2015): Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
A Web Tool for Managing the SASA Dictionary Material and Annotating Slips (Веб-алат за управљање грађом Речника САНУ и анотација листића)

Рада Стијовић, Ранка Станковић, Михаило Шкорић (2020)

The material from which the SASA (Serbian Academy of Sciences and Arts) Dictionary of the Serbo-Croatian Literary and Vernacular Language is compiled contains entries from over 4,500 written sources and 300 manuscript word collections from the Shtokavian vernacular dialects. It is recorded on about 5,000,000 slips. This rich lexical material covers the literary and vernacular language of the past two centuries, and at least 15 more volumes of the Dictionary are still to be written on its basis. It also offers opportunities for diverse linguistic and extralinguistic research. For this reason, work began on ...

lexicographic material, slips, lexicographic tool, digitization, annotation

Рада Стијовић, Ранка Станковић, Михаило Шкорић. "Веб-алат за управљање грађом Речника САНУ и анотација листића" [A Web Tool for Managing the SASA Dictionary Material and Annotating Slips] in Rasprave Instituta za hrvatski jezik i jezikoslovlje, Institute of Croatian Language and Linguistics (2020). https://doi.org/10.31724/rihjj.46.2.32
