Search
83 items
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017) The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...
... databases for this study were created, from the collection of the corpus to the export of completed database, which can then be used in several ways. 2.1 Collecting textual corpus The basic idea was for the database to be based on a corpus of texts containing determiners which express positive or negative ...
... pendent, the system would be language-independent as well. If it turns out to be valid, this method could allow machine learning the usage of huge corpus of texts that are pre-labeled with determiners. 1.1 Review of their former similar studies In 2005 a series of experiments with the classification ...
... and language-neutral determiner strings will be used. Goal is to create a fully language-independent system that would greatly broaden the possible corpus. 1 Users of Twitter platform have an option to additionally mark their posts with tags so that posts that talk about a certain topic can be found ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
Terminology Acquisition and Description Using Lexical Resources and Local Grammars
Acquisition of new terminology from specific domains and its adequate description within terminological dictionaries is a complex task, especially for languages that are morphologically complex such as Serbian. In this paper we present an approach to solving this task semi-automatically on basis of lexical resources and local grammars developed for Serbian. Special attention is given to automatic inflectional class prediction for simple adjectives and nouns and the use of syntactic graphs for extraction of Multi-Word Unit (MWU) candidates for ...
... transducers using CasSys tool incorporated in Unitex corpus processing platform, as well as the use of TMF standard for the representation of terms is proposed in (Ammar et al., 2015) and applied on Arabic scientific and technical corpus. In (Savary et al., 2012) terminology extraction in the ...
... ported that modern statistical Natural Language Processing (NLP) is in great need of better language models and linguistic tools must come to 1 Corpus processing System Unitex: http://www-igm.univ-mlv.fr/~unitex/ Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada ...
... extraction In order to evaluate our approach, we applied it to a collection of 74 papers in Serbian from the journal Infotheca. 6 The size of the corpus is 6 Infotheca - Journal for Digital Humanities (http://infoteka.bg.ac.rs/index.php/en/infoteca) Proceedings of the conference Terminology and ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Biljana Lazić. "Terminology Acquisition and Description Using Lexical Resources and Local Grammars" in Proceedings of the 11th Conference on Terminology and Artificial Intelligence, Granada, Spain, 2015, Granada : LexiCon (Universidad de Granada) (2015)
-
Multi-word Expressions for Abusive Speech Detection in Serbian
This paper presents research on refining and improving the Serbian version of Hurtlex, a multilingual lexicon of offensive words. We pay special attention to adding multi-word expressions (multi-word units) that can be considered offensive, since such lexical entries are very important for achieving good results in many abusive language detection tasks. Serbian morphological dictionaries are used as the basis for data cleaning and dictionary creation. The connection with other lexical and semantic resources for Serbian is highlighted, and the building of a system for ...
... the domain corpus of hateful content and Subjectivity lexicon of Therese Wilson in combination with the SentiWordNet (Esuli and Sebastiani, 2006). For classification, they leveraged rules and achieved a result of F1 = 0.783 for strongly hateful sentences on a manually annotated domain corpus. Razavi ...
... hyperbole, litotes etc. Initial work on detecting some of these figures has been presented in (Mladenović et al., 2017; Krstev et al., 2020). Using a corpus of newspaper articles from 2006, Krstev et al. (2007) presented the results of an information search experiment in search of attacks which are the ...
... speech (1260), MAYBE – could lead to abusive content (462), NO – not abusive (2902). The manual classification was supported by search over a Twitter corpus collected specifically for his research, Web 79
        A    ADV  N    PRO  V     (blank)  Total
maybe   93   12   152  0    168   37       462
no      432  142  978  17   1333           2902
yes     213  39   ...
Ranka Stanković, Jelena Mitrović, Danka Jokić, Cvetana Krstev. "Multi-word Expressions for Abusive Speech Detection in Serbian" in Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Association for Computational Linguistics (2020)
-
EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School
The first training school organized by the COST Action NexusLinguarum was held from 8 to 12 February 2021, with the aim of teaching students, researchers, and practitioners the basics of linguistic data science. During the training, participants were introduced to a broad range of topics: from the Semantic Web, RDF, and ontologies to modelling and querying linguistic data with state-of-the-art ontology models and tools. The school was held as part of the EUROLAN series of summer schools and was organized virtually (online) by several institutes; ...
linguistic data science, linguistic linked data, language data, EUROLAN, NexusLinguarum, COST Action, training school
... September 2021 115 Dojchinovski M. et al., eurolan 2021: . . . Linked Data. . . , pp. 113–120 Ponsoda 2017), FrAC 12 – frequency, attestation and corpus Information (Chiarcos et al. 2020). Finally, the training school ended with a closing session where an ontology of participants, lecturers and ...
... and building on to present more specific topics in a detailed fashion on the last day, the participants had a chance to acquire a solid foundation ...
... Lex Frac module was used for representation of the entries from the lexicon used for abusive speech detection with attestations from the Twitter corpus with annotation of abusive spans (Jokić et al. 2021). 3 Organization Due to the COVID-19 pandemic and current travel restrictions in Europe and beyond ...
Milan Dojchinovski, Julia Bosque Gil, Jorge Gracia, Ranka Stanković. "EUROLAN 2021: Introduction to Linked Data for Linguistics Online Training School" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.7
-
From DELA Based Dictionary to Leximirka Lexical Database
Biljana Lazić, Mihailo Škorić (2020) In this paper, we will present an approach in transforming Serbian language Morphological dictionaries from a DELA text format to a lexical database dubbed Leximirka. Considering the benefits of storing data within a database when compared to storing them in textual documents, we will outline some of the functionality that the database has made possible. We will also show how hand-made rules that use category labels lexical entries are marked with can be used to link lexical entries. ...
... 000 most frequent words in the Serbian Corpus of the Serbian Language SrbCorp (version of 122 million words by Duško Vitas and Miloš Utvić)6. Information about the Corpus is stored in the KorpusMeta table. The LexicalRelation table stores information 6 Corpus of the Serbian Language – SrbCorp 86 ...
... that match the specified search criteria appear as rows in the table. The registered user has access to multiple corpus searches (in the MatKorp and SrpKorpRGF corpora). The Mining Corpus (RudKorp) (Tomašević et al., 2018) that can be searched by some predefined queries that retrieve a word searched ...
... their main importance is their reusability. They were used for the basic tasks of word processing, automatic recognition 1 Unitex is cross-platform Corpus Processing Suite to retrieve data. Infotheca Vol. 19, No. 2, December 2019 81 Lazić B., Škorić M., “From DELA based dictionary to . . . ”, pp ...
Biljana Lazić, Mihailo Škorić. "From DELA Based Dictionary to Leximirka Lexical Database" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.4
-
Infotheca (Q25460443) in Wikidata
Ranka Stanković, Lazar Davidović (2021) Wikidata is a knowledge base of the Wikimedia Foundation that serves as a common source of various kinds of data used not only by other Wikipedia projects but, increasingly, by numerous Semantic Web applications. In this paper we present an example of integrating Wikidata with digital libraries and external systems, as well as the possibility of speeding up data preparation and entry, using papers from Infotheca, a journal for digital humanities, as an example.
... gual lexical extraction based on word alignment for improving corpus search.” The Electronic Library. Krstev, Cvetana, Jelena Jaćimović, Branislava Šandrih, and Ranka Stanković. 2019. “Analysis of the first Serbian Literature Corpus of the Late 19th and Early 20th century with the TXM platform.” ...
... data network was used by Andonovski (Андоновски 2020) to describe language resources, namely, novels forming part of the Serbian-German literary corpus (Andonovski, Šandrih, and Kitanović 2019). For a number of years now, students at the Faculty of Mining and Geology have been undergoing training ...
... entry of metadata on Serbian novels from the srpELTeC corpus 13 COST Action CA16204 (2017-2021) metadata about Serbian novels included in the srpELTEC corpus is being entered into the knowledge base (Krstev et al. 2019) and Wikidata linked to various applications, one of which is Aurora.14 Members of JeRTeh ...
Ranka Stanković, Lazar Davidović. "Infotheca (Q25460443) in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.5
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...
... produce aligned wordnets. The English part of each corpus was semantically tagged, after which the process of wordnet creation was transformed into a word alignment problem, where wordnet synsets in the English part of the corpus were aligned with in the target language part of the corpus. The obtained ...
... addition to that, we plan to broaden the set of parallel resources, and search for new pairs of aligned literals for synsets, which will then be manually post-edited. We also plan to use parallel corpus based methodologies relying on two strategies proposed in (Oliver et al., 2015) for automatic ...
... five literals. The majority of SerWN synsets are aligned with corresponding PWN 3.0 synsets via the Interlingual Index, with the exception of a little over 1,000 Serbian specific synsets that do not exist in PWN. In our research we used 20,221 aligned synsets (from a previous version of SerWN), coupled ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
-
Transformer-Based Composite Language Models for Text Evaluation and Classification
Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...
Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
-
Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model
This study presents a sentiment analysis of old Serbian novels from the period 1840-1920, using the Mistral large language model (LLM) with zero-shot and few-shot learning. The main approach innovates by designing prompts that include instruction text for classification without examples and with a few examples, enabling the language model to classify sentiment into positive, negative, or objective categories. This methodology aims to simplify sentiment analysis by constraining the responses, thereby increasing precision ...
Milica Ikonić Nešić, Saša Petalinkar, Mihailo Škorić, Ranka Stanković, Biljana Rujević. "Advancing Sentiment Analysis in Serbian Literature: A Zero and Few-Shot Learning Approach Using the Mistral Model" in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Sofia, Bulgaria, 9-10 September 2024, LREC | COLING (2024)
-
The Nooj System as Module within an Integrated Language Processing Environment
... multilingual texts. WS4LR handles aligned texts as well. A pair of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.) is known as an aligned text or bitext. One of the supported ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool (Bonhomme 2001). Parallel texts which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph ...
... becoming the standard format for aligned texts. Figure 7 depicts the form with different possibilities for TMX document management. Aligned texts can be visualized in various ways by choosing the appropriate XSLT stylesheet. Namely, the user can obtain the aligned text in HTML format, but also ...
Ranka Stanković, Duško Vitas, Cvetana Krstev. "The Nooj System as Module within an Integrated Language Processing Environment" in Proceedings of the 2007 International Nooj Conference, Cambridge Scholars Publishing (2008)
-
Frequency and Length of Syllables in Serbian
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová (2019) Basic analyses of several properties of syllables (the rank-frequency distribution, the distribution of length, and the relation between length and frequency) in Serbian is presented. The syllabification algorithm used combines the maximum onset principle and the sonority hierarchy. Results indicate that syllables behave similarly to words as far as mathematical models are concerned, but values of parameters in models for syllables are quite different from those for words.
... onsets and codas. If one follows his modification, a large enough corpus is needed to perform statistical tests, based on which a decision on the (non-) marginality of a particular consonant cluster is made. Finding or creating such a corpus can be problematic for minor languages (such as e.g. Lower and ...
... socialist realist novel “Kak zakalyalas’ stal’” (How the Steel Was Tempered) by N. Ostrovsky. The choice is motivated by the fact that a parallel corpus consisting of the first ten chapters of the novel and their translations to all standard Slavic languages (except for Lower Sorbian) is available ...
... for Croatian), or using the approach suggested by Pulgram (1970) and modified by Lehfeldt (1971), with its drawback of needing a sufficiently large corpus (Kelih & Mačutek, 2013, for Russian and Slovene), or not at all (because the mean syllable length in words was sufficient for the purposes of the ...
Marija Radojičić, Biljana Lazić, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Ján Mačutek, Lívia Leššová. "Frequency and Length of Syllables in Serbian" in Glottometrics (2019)
-
Part of Speech Tagging for Serbian language using Natural Language Toolkit
Ranka Stanković, Boro Milovanović (2020) While complex algorithms for NLP (natural language processing) are being developed, basic tasks such as tagging remain very important and still challenging. NLTK (Natural Language Toolkit) is a powerful Python library for developing NLP-based programs. We try to use this library to create a PoS (part-of-speech) tagger for contemporary Serbian. Eleven different models were created using the NLTK tagging API. The best models are then transformed with the Brill tagger to improve accuracy. We trained the models on the annotated ...
... George Orwell, part of MULTEXT-East resources [9]. INTERA (Integrated European language data Repository Area) is a project that produced multilingual corpus on law, health and education [10]. Around the world in 80 days is a novel by Jules Verne annotated during SEE-ERA.net project [11]. ELTeC (European ...
... International Conference on Computational intelligence, man-machine systems and cybernetics, Tenerife, Spain, Dec. 2009 [6] M. Utvić, “Annotating the Corpus of Contemporary Serbian,” INFOtheca, vol. 12 no. 2 pp 36a-47a, Dec. 2011 [7] M. Constant, C. Krstev, and D. Vitas “Lexical Analysis of Serbian ...
... Piperidis, V. Giouli, N. Calzolari, M. Monachini, C. Soria, and K. Choukri, “Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology,” Proc. Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy, May 2006 [11] D. l. Tufis ...
Ranka Stanković, Boro Milovanović. "Part of Speech Tagging for Serbian language using Natural Language Toolkit" in 7th International Conference on Electrical, Electronic and Computing Engineering IcETRAN 2020, Academic Mind, Belgrade (2020)
-
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...
... for which we could have used the TreeTagger trained for Serbian that was used for the lemmatization of the Corpus of Contemporary Serbian [16]. However, this lemmatizer was trained on a corpus that differs significantly from our collection, and additionally it does not take into account MWUs. The approach ...
... singular). However, a large number of other forms cannot be found by scanning the text, for example, the form zlata (genitive singular) cannot be aligned with the query keyword key zlato (nominative singular). The disadvantage of the system based on text scanning which affects the precision is especially ...
... much as possible [7]. These local grammars were organized in cascades that further resolve ambiguities [10]. NER system was evaluated on a newspaper corpus and results reported in [7] showed that F-measure of recognition was 0.96 for types and 0.92 for tokens. For the purpose of indexing, we applied ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
-
Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian
Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek (2021)
Biljana Rujević, Marija Kaplar, Sebastijan Kaplar, Ranka Stanković, Ivan Obradović, Jan Mačutek. "Quantitative analysis of syllable properties in Croatian, Serbian, Russian, and Ukrainian" in Language and Text: Data, models, information and applications, John Benjamins Publishing Company (2021). https://doi.org/10.1075/cilt.356.04ruj
-
Речници у дигиталном добу - информатичка подршка за српски језик
Биљана Рујевић (2022) Morphological dictionaries of Serbian are an electronic language resource with a significant history of development and use in natural language processing. Because they were stored as files whose number kept growing, which made managing the dictionaries harder, a need arose to store the dictionary information in a lexicographic database. To allow several users to work on dictionary development simultaneously, a need arose for a web application built on the lexicographic database. In order to consider ...
Биљана Рујевић. Речници у дигиталном добу - информатичка подршка за српски језик, Београд : [Б. Рујевић], 2022
-
Keyword Extraction from Parallel Abstracts of Scientific Publications
... author(s), publication date, title, keywords, abstract etc.) and are aligned at the sentence level [15,16]. For the research presented in this paper, we used a collection of 50 bilingual documents with approximately 4,800 aligned sentences. Since papers were published bilingually, they were already ...
... mining. During the period of 2004–2012, the journal published 55 papers bilingually, in Serbian and in English. These papers are available online as aligned parallel text in the Biblisha digital library, as well as separate documents. The Biblisha digital library contains scientific publications from other ...
... extraction method. The method is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the performance of the method across languages since we have the con- ...
Slobodan Beliga, Olivera Kitanović, Ranka Stanković, Sanda Martinčić-Ipšić. "Keyword Extraction from Parallel Abstracts of Scientific Publications" in Semantic Keyword-Based Search on Structured Data Sources - Third International KEYSTONE Conference, IKC 2017 Gdańsk, Poland, September 11–12, 2017 Revised Selected Papers and COST Action IC1302 Reports, Springer (2017)
-
Integrisano okruženje za pripremu paralelizovanog korpusa
The development of aligned corpora requires the preparation of parallel texts for their integration into an aligned corpus. This is a complex task that can be solved in different ways and has to be carried out in several steps. This paper first describes the procedure for preparing parallel texts for an aligned corpus used by the Language Technology Group at the University of Belgrade. It then gives a brief overview of the programs (XAlign, Concordancier, WS4LR), that is, the software tools used in the process. The lack of a comfortable environment ...
... Varga. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC'06, ELRA, Paris, 2006. [6] Tomaž Erjavec: Compiling and Using the IJS-ELAN Parallel Corpus. Informatica, 26(3), pp. 299-307 ...
... 299-307, 2002. SUMMARY The development of aligned corpora requires a preparation of parallel texts for their integration into aligned corpora. This is a very complex task, which can be solved in different ways, and which has to be realized in several steps. At the beginning of this paper we ...
... environment for the preparation of aligned corpora, under the name of ACIDE. For the construction of this environment we chose the C# programming language. Among other things, ACIDE provides a graphical user interface (GUI) for alignment and visualization of aligned texts, their control and correction ...
Ivan Obradović, Ranka Stanković, Miloš Utvić. "Integrisano okruženje za pripremu paralelizovanog korpusa" in Zbornik radova međunarodnog simpozijuma Razlike između bosanskog/bošnjačkog, hrvatskog i srpskog jezika, Graz, Austria, April 2007, - (2007)
-
Softverski alati za korišćenje resursa za srpski jezik
Ivan Obradović, Ranka Stanković (2008)
... other resources, such as the e-corpus of Serbian, as well as parallel multilingual corpora composed of parallel texts or bi-texts, usually comprising two texts of which one is original, and the other its translation. The majority of these parallel texts are aligned, which means that relations are ...
... research related to paraphrasing (Barzilay and McKeown, 2001). The Human Language Technology Group developed several aligned corpora, among them the largest one being the French-Serbian corpus which contains more than a million words (Vitas and Krstev, 2005). 3 WS4LR – a tool for maintenance and integrated ...
... Paris, Masson. Steinberger, R., Pouliquen B., Widiger A., Ignat C., Erjavec T., Tufiş D., Varga D. (2006) “The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages”, Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC’2006). Genoa, Italy, ...
Ivan Obradović, Ranka Stanković. "Softverski alati za korišćenje resursa za srpski jezik" in INFOteka: časopis za informatiku i bibliotekarstvo, Belgrade, Serbia : Zajednica biblioteka univerziteta u Srbiji (2008)
-
Keyword-Based Search on Bilingual Digital Libraries
This paper outlines the main features of Biblisha, a tool that offers various possibilities of enhancing queries submitted to large collections of aligned parallel text residing in a bilingual digital library. Biblisha supports keyword queries as an intuitive way of specifying information needs. The keyword queries initiated, in Serbian or English, can be expanded semantically, morphologically, and into the other language, using different supporting monolingual and bilingual resources. Terminological and lexical resources are of various types, such as wordnets, electronic ...
Ranka Stanković, Cvetana Krstev, Duško Vitas, Nikola Vulović, Olivera Kitanović. "Keyword-Based Search on Bilingual Digital Libraries" in Semantic Keyword-Based Search on Structured Data Sources - Second COST Action IC1302 International KEYSTONE Conference, IKC 2016, Springer (2017). https://doi.org/10.1007/978-3-319-53640-8_10
-
A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals
This paper outlines the main features of Bibliša, a tool that offers various possibilities of enhancing queries submitted to large collections of TMX documents generated from aligned parallel articles residing in multilingual digital libraries of e-journals. The queries initiated by a simple or multiword keyword, in Serbian or English, can be expanded by Bibliša, both semantically and morphologically, using different supporting monolingual and multilingual resources, such as wordnets and electronic dictionaries. The tool operates within a complex system composed ...
... for testing our tool. The interest in collections of aligned texts and tools tailored for their search is increasing substantially, primarily due to the growing needs of statistical machine translation. Thus, for example, the OPUS corpus offers freely available parallel corpora in many languages ...
... multilingual proper name databases, which enables, among other things, versatile handling of both monolingual and aligned or comparable texts. LeXimir provides for enhanced querying of aligned texts by using available lexical resources to perform semantic and morphological expansion of queries. The ...
... aimed for search of document collections consisting of aligned parallel texts converted in TMX (Translation Memory eXchange) format. TMX is an open XML-based standard intended for easier exchange of translation memory data, that is, aligned parallel texts, between tools and translation vendors ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Aleksandra Trtovac, Miloš Utvić. "A Tool for Enhanced Search of Multilingual Digital Libraries of E-journals" in Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, May 2012, Istanbul, Turkey, Istanbul, Turkey : European Language Resources Association (2012)
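The snippet above describes TMX as an XML-based format for aligned parallel texts. As a rough illustration only, the following Python sketch reads Serbian-English segment pairs from a small TMX document using the standard library. The sample content, the function name read_aligned_pairs, and the language codes "sr" and "en" are assumptions made for this example; none of it is taken from Bibliša or from the paper.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute belongs to the XML namespace, so ElementTree
# exposes it under this fully qualified key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX sample, invented for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="sr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="sr"><seg>zlato</seg></tuv>
      <tuv xml:lang="en"><seg>gold</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="sr"><seg>rudarstvo</seg></tuv>
      <tuv xml:lang="en"><seg>mining</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="sr"><seg>geologija</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_aligned_pairs(tmx_text, src="sr", tgt="en"):
    """Return (source, target) segment pairs from a TMX string.

    Translation units that lack either language are skipped.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang] = "".join(seg.itertext()).strip()
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


if __name__ == "__main__":
    for source, target in read_aligned_pairs(SAMPLE_TMX):
        print(source, "=", target)
```

The third translation unit in the sample has no English variant, so it is left out of the result; a real collection might instead log such units for manual review.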