Search
225 items
-
Advantages and challenges in presenting mathematical content using EDX platform
... such as text, video, task or discussion. In the course “Preparation for entry exam” we mostly used a combination of the text component and the task component. The text component contains an editor for writing plain text, with basic editing functions, such as choosing fonts and size of text, adding ...
... of text. In that case the creator has more options for editing text, but this requires basic knowledge of HTML. The task component offers a few built-in frameworks for different types of tasks. For example, it is possible to create a multiple choice task, a task with text or ...
Marija Radojičić, Ivan Obradović, Ranka Stanković, Olivera Kitanović, Roberto Linzalone. "Advantages and challenges in presenting mathematical content using EDX platform" in The Seventh International Conference on e-Learning (eLearning-2016), Belgrade : Metropolitan University (2016)
-
Life Cycle Assessment of Individual Wood Biomass Heating Systems in Households
The use of solid fuels (wood biomass and coal) for heating of households continues to be common practice within European countries. Solid fuel combustion in households contributes more than 46% to total emissions of fine particulate matter. In this study, a life-cycle assessment (LCA) of firewood-based and pellet-based heating systems is performed. These two systems represent two different types of individual wood biomass heating systems. In the case of the firewood-based heating systems, a typical stove is analyzed ...
... cooling, Facts and Figures, https://ec.europa.eu/energy/topics/energy-efficiency/heating-and-cooling_en?redir=1#:~:text=In%20EU%20households%2C%20heating%20and,energy%20use%20(192.5%20Mtoe)*.&text=To%20fulfil%20the%20EU's%20climate,its%20use%20of%20fossil%20fuels (accessed 10 Sept. 2020). [2] Martinopoulosa ...
Boban Pavlović, Dejan Ivezić, Marija Živković. "Life Cycle Assessment of Individual Wood Biomass Heating Systems in Households" in 5th International Scientific Conference ”Conference on Mechanical Engineering Technologies and Applications” COMETa2020, East Sarajevo, 2020, University of East Sarajevo, Faculty of Mechanical Engineering East Sarajevo (2020)
-
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019)
This paper is a result of a task that was presented to attendees of the Keyword Search in Big Linked Data summer school, organized by Vienna University of Technology under the Keystone COST action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the US National Library of Medicine’s MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ...
... Training set to build explicit models for each MeSH concept (‘Concept-oriented’ classifiers); – Manually created document annotations, like ordinary text classifiers, to determine the appropriate concept (‘K-Nearest Neighbor’ classifier); – Hybrid and hand-refined systems that combine multiple approaches ...
... stored in the MeSH ontology. The goal was to create a classifier that would be quick and simple, in order to solve the problem of the large amount of text that needed to be classified. A drastic summarization of documents and the classes themselves was applied. Classes (concepts of the second level of ...
... occurs most often in it, thus avoiding a large amount of computation and reducing the task to finding the most frequent term in the surrogate of the text. 2 Experiment setting The aim of the experiment was to test the possibility and success of classification of medical documents based on taxonomy ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
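The idea in the snippets above (reduce a document to a surrogate of known concept terms, then pick whatever occurs most often) can be illustrated with a minimal sketch. It is not the authors' implementation: the term-to-class table, the function names `build_surrogate` and `classify`, and the whitespace tokenization are all assumptions made for illustration, and the toy entries are not taken from MeSH.

```python
from collections import Counter

# Hypothetical toy mapping from surface terms to class labels.
# The paper derives its classes from the MeSH ontology; these
# entries are invented purely for illustration.
TERM_TO_CLASS = {
    "heart": "Cardiovascular System",
    "artery": "Cardiovascular System",
    "neuron": "Nervous System",
    "brain": "Nervous System",
}


def build_surrogate(text):
    """Reduce a text to the class labels of the known terms it contains."""
    tokens = (tok.strip(".,;:()").lower() for tok in text.split())
    return [TERM_TO_CLASS[tok] for tok in tokens if tok in TERM_TO_CLASS]


def classify(text):
    """Return the most frequent class in the surrogate, or None if empty."""
    counts = Counter(build_surrogate(text))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Counting over a drastically reduced surrogate keeps the per-document cost linear in the number of tokens, which matches the "quick and simple" goal stated in the snippet; how ties and unmatched documents are handled in the actual study is not stated in this listing.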
-
Serbian NER&Beyond: The Archaic and the Modern Intertwinned
In this paper we present the Serbian literary corpus being developed under the umbrella of COST Action “Distant Reading for European Literary History” CA16204. Using this corpus of novels written more than a century ago, we have developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different types of named entities, with a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. This model was further evaluated on a separate evaluation dataset. We conclude with a comparison ...
... evaluation. Web users can navigate to http://ner.jerteh.rs/ in order to apply the SrpCNNER model directly on input text. The model can also be applied to a custom-size collection of text files using the previously mentioned NER&Beyond web platform. story), https://zenodo.org/communities/eltec 7 SrpELTeC ...
... entity, so the evaluators were asked to identify and annotate them when they occur in text. SrpNER does not recognize WORK entity either, but these annotations were in many cases added by volunteer readers during text correction. Afterwards, students were given different novel chapters along with the ...
... distribution of different entity types over SrpELTeC-gold novels. The first four digits of text identifiers represent the year of the first publication of a novel. For some novels, NER was not performed on the whole text, but rather on randomly selected chapters. These annotated samples were also included ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry and many research fields, terminology develops rapidly. Most often, the language that serves as the “lingua franca” for the majority of these fields is English. As a consequence, for many fields domain terms are conceived in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language, and a tool for chunk alignment. We examine the performance of the method on the domain ...
... source language part of the aligned input corpus; 3. The extraction of the set of MWTs in the target language by Serb-TE (Input iii) was done: (a) on the target language part of the aligned chunks (chunk); (b) on the target language part of the aligned input sentences (text). Infotheca Vol. 19, No. 2 ...
... steps; C Number of distinct, lemmatised Serbian MWTs extracted from the target language part of the aligned chunks (for chunk) or from the target language part of the aligned input corpus (for text). Table 1. Numerical data that describes the results of the term extraction system Experiment A B C ...
... need to examine several settings of the experiment, which are conducted and discussed in the later text. The proposed approach is based on the following hypothesis: On the basis of bilingual, aligned, domain-specific textual resources, a terminological list and/or a term extraction tool in a source ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
-
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian
The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...
... texts used in this research are shown in Table 2. The text 1984, Serbian translation of Orwell’s novel, was annotated according to the MULTEXT-East specification and included in MULTEXT-East resources (version 3) (Krstev et al., 2004). The text Verne, Serbian translation of the novel Around the ...
... on four different manually annotated sets of texts. The test set was compiled from 10% of each text used for training, and it can give a rough idea of how models perform when tagging similar, already familiar text. Verne, History and Novels represent texts previously unknown to the taggers and show their ...
... result when tagging unfamiliar text. Although TreeTagger TT19 seems to have better overall results, the performance of both taggers drops significantly when tagging unknown text. Figure 1: Part-of-Speech tagging accuracy per token on test sets, for each of the trained models. Figure 2: nPoS-tagging accuracy ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on the basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... resources for linguistic text processing; 2.5 Repeated linguistic preprocessing with expanded dictionaries for verification of recognition of new lemmas. 3. MWUs extraction 3.1. Application of syntactic graphs to extract MWUs with different syntactic structures from the same text (detailed description ...
... bager kašikar (case 6, NXN) is detected in the analyzed text in the genitive case bagera kašikara it may be erroneously interpreted as a MWU of a form NNg (case 3) in the genitive case. Consequently, all NNg constructions in an analyzed text that appear in the genitive case (which happens very ...
... domains in terminological dictionaries using lexical resources and local grammars in our approach are: 1. Linguistic preprocessing of the input plain text file from the chosen domain using Unitex. 2. Analysis of unrecognized words as the most probable source of terminology and expanding the dictionary ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Knowledge Graphs in the Era of Large Language Models: Opportunities and Challenges
The emergence of large language models (LLMs) has significantly influenced the field of artificial intelligence, particularly natural language processing and text generation. However, a key limitation of these models lies in their lack of structured knowledge and reasoning ability, which hinders their application in the real world, where factual accuracy and context-based reasoning are required. Knowledge graphs, on the other hand, offer an appealing solution. They provide a rich source of structured knowledge by representing entities and their relations in ...
knowledge graphs, large language models, natural language processing, structured knowledge, data quality, explainable artificial intelligence, online content safety
Danka Jokić, Ranka Stanković, Jelena Jaćimović. "Knowledge Graphs in the Era of Large Language Models: Opportunities and Challenges" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024., University of Belgrade - Faculty of Philology (2024)
-
Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit
In the digital environment of South Slavic languages, the analysis of emotions in social media texts is becoming increasingly important for understanding public opinion, creating personalized content, and analyzing interactions among users. In this paper we present a detailed methodology and the results of annotating a Serbian-language corpus according to Plutchik's categorization model, which recognizes eight basic emotional categories: joy, sadness, anger, fear, trust, disgust, anticipation and surprise. The aim of the research is to analyze the emotional content of texts taken from the social networks X (formerly Twitter) ...
Milena Šošić, Ranka Stanković, Jelena Graovac. "Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024., University of Belgrade - Faculty of Philology (2024)
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... in a given text, with the possibility of adding hypernym literals. D. Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text alignment tool XAlign. The module enables the transformation of texts aligned by XAlign into ...
... glasshouse” from the corresponding synsets in English wordnet were included in query. B. Aligned text search When a bilingual query is applied to an aligned text, WS4QE generates a filtered aligned document in TMX format. Namely, based on the expansion of the query, which can be mo ...
... are extracted from aligned text and inserted in the filtered document. As we have already mentioned, documents in different formats, such as XML, TXT and HTML, can subsequently be generated from the TMX document filtered in this way. Fig. 7 Aligned segments with highlighted ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)
... linguists. This is a very long and therefore costly process. [Figure 10: Machine translation (left: statistical; right: rule-based)] In the late 1980s ...
... behind the scenes of larger software systems. Text summarisation and text generation are two borderline areas that can act either as standalone applications or play a supporting role. Summarisation attempts to give the essentials of a long text in a short form, and is one of the features available ...
... of sentence extraction, and the text is reduced to a subset of its sentences. An alternative approach, for which some research has been carried out, is to generate brand new sentences that do not exist in the source text. This requires a deeper understanding of the text, which means that so far this ap- ...
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020)
While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part of speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are transformed with the Brill tagger to improve accuracy. We trained the models on the annotated ...
... of tagger models packaged in NLTK that can be trained. Every tagger has an evaluation procedure that strips the tags from the given text, tags the text with the newly created tagger and reports the accuracy on all tokens. This measure will be used for comparing different taggers. The simplest ...
... 83 90.51 86.95 Training Time 1143s 1343s 3074s A useful tagger model is one that generalizes well to text from other domains. That’s why we tested our best taggers on text that was kept out of the training and validation phases. Results can be seen in Figure 3. Fig. 3. Accuracy ...
... performed later in the pipeline. One basic task is PoS (Part of Speech) tagging, a process of assigning a part of speech category to each token in the text. The program that performs tagging is called a tagger. The taggers can be created in multiple ways. In this paper, we will create a tagger for Serbian ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
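The evaluation procedure described in the snippets above (strip the gold tags, re-tag the bare tokens, report per-token accuracy) can be sketched in plain Python. This is only an illustration under assumptions: it deliberately does not use the NLTK API, the helper names `train_unigram` and `accuracy` are hypothetical, the baseline is a simple most-frequent-tag tagger rather than any of the eleven models from the paper, and the tag names and example sentences are invented.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents, default_tag="NOUN"):
    """Build a most-frequent-tag baseline tagger from tagged sentences.

    tagged_sents is a list of sentences, each a list of (word, tag) pairs.
    Unknown words receive default_tag (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    model = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tag(words):
        return [(w, model.get(w.lower(), default_tag)) for w in words]

    return tag


def accuracy(tagger, gold_sents):
    """Strip gold tags, re-tag the tokens, and return per-token accuracy."""
    correct = total = 0
    for sent in gold_sents:
        words = [w for w, _ in sent]  # strip the tags
        predicted = tagger(words)
        for (_, gold), (_, pred) in zip(sent, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0
```

For instance, training on `[[("pas", "NOUN"), ("trči", "VERB")]]` and evaluating on `[[("pas", "NOUN"), ("spava", "VERB")]]` scores one of two tokens correctly, because the unseen word falls back to the default tag. Evaluating on text held out from training, as the snippet notes, is what exposes this kind of drop.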
-
Towards Automatic Definition Extraction for Serbian
This paper presents preliminary results of the automatic extraction of candidates for dictionary definitions from unstructured texts in Serbian, with the aim of speeding up dictionary development. Definitions in the dictionary of the Serbian Academy of Sciences and Arts (SASA) were used to model different types of definitions (descriptive, grammatical, referential and synonym-based) that have different syntactic and lexical characteristics. The research corpus consists of 61,213 noun definitions, which were analyzed using morphological e-dictionaries and local grammars implemented as finite-state transducers in the corpus processing package of open ...
... A finite state transducer “passes” through the text it analyses to compare a text chunk with the model it represents. In the case of successful recognition, a finite state transducer produces some result, which can be a modification of the source text by adding tags for types of recognized 1 Un ...
... result of OCR errors that remained in the text, but we are working on correcting them.3 4.2 Recognition of Candidates in the Textbook Corpus To determine whether it is possible to recognize definitions of domain-specific terms in the domain corpus text, a subset of local grammars presented in Section ...
... definition extraction in free- and semi-structured text. In Proceedings of the 13th Linguistic Annotation Workshop, 2019, pp. 124–131. Stanković, R., Stijović, R., Vitas, D., Krstev, C. & Sabo O. (2018). The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In: Proceedings of the ...
Ranka Stanković, Cvetana Krstev, Rada Stijović, Mirjana Gočanin, Mihailo Škorić. "Towards Automatic Definition Extraction for Serbian" in Proceedings of the XIX EURALEX Congress of the European Association for Lexicography: Lexicography for Inclusion (Volume 2). 7-9 September (virtual), Democritus University of Thrace (2021)
-
Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages
Jelena Lazarević, Olivera Kitanović (2024)
The aim of the paper is to investigate collocability as the way in which lexical units combine with words from different categories to form larger units. The investigation of the semantic and syntactic principles of these combinations in the Spanish and Serbian language of football was carried out on the comparable football corpora SrFudKo and EsFudko, developed within Jelena Lazarević's doctoral dissertation entitled: Jezičke odlike diskursa novih medija o fudbalu: kontrastivna analiza na korpusu srpskog i španskog jezika (Linguistic features of new media discourse on football: a contrastive analysis on a corpus of Serbian and Spanish). The football corpus SrFudKo, created from texts about football from five Serbian web portals: ...
Jelena Lazarević, Olivera Kitanović. "Contrastive Analysis of Syntax Patterns in Comparable Football Corpora in Spanish and Serbian Languages" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...
... Management of aligned parallel texts Parallel texts, which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph, sentence, etc) by matching the corresponding segments of the original and its translation. Aligned parallel texts ...
... candidate words for a synset by searching aligned texts with words from the original PWN synset and words he/she has already selected for the target synset. Then, if a highlighted word found in the text in English does not have a highlighted match in the text in the target language, the lexicographer ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool [3]. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 4 depicts the ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
GIS Application Improvement with Multilingual Lexical and Terminological Resources
... placement of descriptive text, or label, onto or next to features on a map is known as labelling. In ArcGIS, it refers specifically to the process of automatically generating and placing descriptive text for map features. A label in ArcGIS is dynamically placed and its text string is derived from ...
... al., 2008). Concept represents the core of GeolISS, and is implemented as an aggregation of geological vocabularies, collections of terms and text definitions of domain objects or collections of possible values for properties. Terms in the vocabularies are used to classify observations/i ...
... GeolISS. GeolISSTerm represents the core of GeolISS, and it is implemented as an aggregation of geological vocabularies, collections of terms and text definitions of things thought to exist in a domain or collections of possible values for properties. The terms in the vocabularies are used to ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović. "GIS Application Improvement with Multilingual Lexical and Terminological Resources" in Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2010, Valetta, Malta, May 2010, Valetta, Malta : European Language Resources Association (2010)
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)
In this paper we present a model for selecting good examples for a dictionary of Serbian and the development of the model's initial components. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus composed of examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ...
Serbian, good dictionary examples, automation of dictionary making, feature extraction, machine learning
... corpus of contemporary Serbian (Vitas & Krstev, 2012; Utvić, 2014) and Serbian ELTeC Collection9. It consists of several text collections of different types, which reflect text variability. For the first collection with contemporary novels (labelled CN), the sentences were extracted from seven novels ...
... digitized volumes was reported in Stijović and Stanković (2017). Dictionary entries from five volumes were automatically parsed and stored as a structured text in a lexical database, which offers the opportunity to use this data for extraction of different kinds of knowledge, as well as knowledge about examples ...
... entry was produced, and a lexical database model was developed (Stanković et al., 2018). The conversion of the SASA dictionary from unstructured text into a lexical database consisted of a thorough analysis of formatting conventions that were used for typesetting dictionary entries, as well as ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...
... place with very little human intervention, starting from the tokenization and lexical analysis of a raw text up to production of dictionary entries. The system relies on Unitex routines for text analysis and FST application, while one of the many functionalities of LeXimir is used to produce dictionary ...
... English and Chinese corpora is described in (Pantel & Lin, 2001), while Chen and his associates present a MWT extraction system based on co-related text-segments within a set of documents (Chen et al., 2006). Statistical measures of co-occurrence (MI3 – mutual information) were used for finding ...
... for the evaluation, without deleting any candidate lemmas from it. In general, the longest match for the MWU is looked for. For example, if a text sequence matches the AXAXN pattern (a noun preceded by two adjectives that agree with it in gender, number, case and animateness), then a lower ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23--28 May 2016, European Language Resources Association (2016)
-
The Many Faces of SrpKor
The acronym SrpKor denotes a family of electronic corpora of contemporary Serbian whose construction began in the late 1970s and which became more widely visible to the interested research community with the publication of its first version on the web in 2002. Over this long period, especially before useful textual resources appeared on the web, corpus development consisted of collecting and processing material as well as developing corpus processing methods. Namely, an electronic corpus is not merely a collection of texts in digital form (as is stated, for example, ...
Duško Vitas, Ranka Stanković, Cvetana Krstev. "The Many Faces of SrpKor" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)