Search
180 items
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017)
The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations in the existing framework of research and tests their effectiveness. It also presents a software tool especially made for that purpose, explains how it builds a database with sentimental value of terms and offers the user manual. Finally, it presents a software tool that tests the new database and gives some examples ...
... databases for this study were created, from the collection of the corpus to the export of completed database, which can then be used in several ways. 2.1 Collecting textual corpus The basic idea was for the database to be based on a corpus of texts containing determiners which express positive or negative ...
... determined regular expressions for improved search), which should lead to a greater number of determiner extractions in the text. Execution begins with optional fields check. If the field is checked, document goes through several Regex.Replace functions that search for known deviations of several determiners ...
... pendent, the system would be language-independent as well. If it turns out to be valid, this method could allow machine learning the usage of huge corpus of texts that are pre-labeled with determiners. 1.1 Review of their former similar studies In 2005 a series of experiments with the classification ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
This paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization, and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data into RDF, and the incorporation of NIF annotations. The resulting NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, the linking of Linked ...
parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Proširivanje upita zasnovano na leksičkim resursima
The paper describes how lexical resources for Serbian and software tools developed within the Human Language Technology Group at the University of Belgrade can be used to improve query formulation. Search results can be substantially improved by using various lexical resources, such as morphological dictionaries and semantic networks. The outlined approach can also be applied in the System of Scientific, Technological and Business Information, because efficient searching of this valuable resource, given its heterogeneity and size, as well as its predominantly textual content, ...
... how resources and tools developed within the Human Language Technology Group at the University of Belgrade can be used for improvement of queries. Search results can be substantially improved by using various lexical resources, such as morphological dictionaries and semantic networks. The outlined ...
Ranka Stanković, Ivan Obradović, Cvetana Krstev. "Proširivanje upita zasnovano na leksičkim resursima" in SNTPI 09 - Naučno-stručni skup Sistem naučnih, tehnoloških i poslovnih informacija, Beograd 19. i 20. jun 2009, Beograd : Fakultet informacionih tehnologija (2009)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...
electronic dictionary, Serbian, morphology, inflection, multiword units, noun phrases, query expansion
... developed on basis of LeXimir, and it enables expansion of queries submitted to the Google search engine [6]. Integrated lexical resources enable modifications of user queries for both monolingual and multi-lingual search. The main feature of WS4QE is that it enables inflection of simple words and MWUs supplied ...
... The calculation is performed on the basis of dictionaries described in [10] and [11] that are part of the standard distribution of Unitex [12], a corpus processing system based on the finite-state technology. Table 1. Initial content of the Serbian m ...
... 237–240 6. Krstev, C., Stanković, R., Vitas, D., Obradović, I.: The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines. In: 6th LREC, Marrakech, Morocco (2008) 7. Jacquemin, C.: Spotting and Discovering Terms through Natural Language Processing. MIT Press ...
Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7th International Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Structural characterization of traditional pottery produced from local clay, Rujište (Ražanj, Central Serbia), in an effort to preserve its geoheritage
... fine-grained fraction in the basic mass. Microcrystalline clasts could not be accurately identified. An X-ray powder diagram of the sample and database search revealed that the following minerals ...
Maja Milošević, Biljana Đorđević, Alena Zdravković. "Structural characterization of traditional pottery produced from local clay, Rujište (Ražanj, Central Serbia), in an effort to preserve its geoheritage" in 9th International Conference Mineralogy and Museums, Sofia, Bulgaria, Earth and Man National Museum and Bulgarian Mineralogical Society (2021)
-
The number of unimodular roots of some reciprocal polynomials
Dragan Stankov (2020)
We introduce a sequence P2n of monic reciprocal polynomials with integer coefficients having the central coefficients fixed. We prove that the ratio between the number of nonunimodular roots of P2n and its degree d has a limit when d tends to infinity. We present an algorithm for calculating the limit and a numerical method for its approximation. If P2n is the sum of a fixed number of monomials we determine the central coefficients such that the ratio has the minimal limit. ...
Algebraic integer, the house of algebraic integer, maximal modulus, reciprocal polynomial, primitive polynomial, Schinzel-Zassenhaus conjecture, Mahler measure, method of least squares, cyclotomic polynomials
Dragan Stankov. "The number of unimodular roots of some reciprocal polynomials" in Comptes Rendus Mathématique (2020). https://doi.org/10.5802/crmath.28
-
GNSS Time Series as a Tool for Seismic Activity Analysis Related to Infrastructure Utilities
Sanja Tucikešić, Ankica Milinković, Branko Božić, Ivana Vasiljević, Mladen Slijepčević. "GNSS Time Series as a Tool for Seismic Activity Analysis Related to Infrastructure Utilities" in Contributions to International Conferences on Engineering Surveying. Springer Proceedings in Earth and Environmental Sciences, Dubrovnik, Croatia, October 22-23, 2020, Springer, Cham (2021). https://doi.org/10.1007/978-3-030-51953-7_21
-
Development of Open Educational Resources (OER) for Natural Language Processing
In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...
... improve this situation as it can prove useful both for linguists working in corpus linguistics and computer scientists developing NLP applications. The participant will become familiar with the use of Unitex, the corpus processing system for which many valuable resources for Serbian were already ...
... oriented processing of texts in human languages. As an illustration, a string oriented web interface to the Corpus of Contemporary Serbian 13 is presented. [13][14] 2. Unitex corpus processing system is presented from the practical point of view: how to install it and start working with it ...
... within BAEKTEL project of its OER version within the edX BAEKTEL platform. 2 The main features of Unitex, an open access and open source corpus processing system, are presented in Section 4. Section 5 presents course content with didactic criteria and specific formats used in the OER course ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
-
SrpELTeC on Platforms: Udaljeno čitanje, Aurora, NoSketch
Serbian ELTeC collection (100 novels and extended) developed within COST action CA16204 Distant Reading for European Literary History comprises at this moment 111 novels published in the period 1840-1920. Such a valuable resource is and will be used for various lexical and linguistic research, by using different tools and methodologies. In this paper, three platforms on which these novels are published will be presented: “Udaljeno čitanje”, Aurora and Sketch Engine.
Ranka Stanković, Mihailo Škorić, Petar Popović. "SrpELTeC on Platforms: Udaljeno čitanje, Aurora, NoSketch" in Infotheca, Faculty of Philology, University of Belgrade (2022). https://doi.org/10.18485/infotheca.2021.21.2.7
-
Football terminology: compilation and transformation into OntoLex-Lemon resource
This paper presents a project under development, the creation of the first digital football dictionary in Serbian, and demonstrates the application of the OntoLex model and its modules. The OntoLex-FrAC module includes information on frequency and usage examples extracted from a corpus. In this case, a domain-specific corpus named SrFudKo (СрФудКо) was created, containing football news articles in Serbian. Multi-word terms were automatically extracted from the Serbian corpus and then manually evaluated and classified as sport-related or ...
Jelena Lazarević, Ranka Stanković, Mihailo Škorić, Biljana Rujević. "Football terminology: compilation and transformation into OntoLex-Lemon resource" in LDK 2023 – 4th Conference on Language, Data and Knowledge, 12-15 September in Vienna, Austria, Lisabon : NOVA FCSH - CLUNL (2023). https://doi.org/10.34619/srmk-injj
-
Composition of organic matter and thermal maturity of Mesozoic and Cenozoic sedimentary rocks in East Herzegovina (External Dinarides, Bosnia and Herzegovina)
This paper presents the first data on the organic matter and thermal maturity of Mesozoic and Cenozoic sedimentary rocks in the East Herzegovina region of the External Dinarides. Representative, organic-rich samples from outcropping sedimentary rocks of different ages in the area (Triassic to Neogene) were selected and analysed. The organic matter was studied by Rock-Eval pyrolysis and under the microscope in reflected non-polarized light and incident blue light. The results obtained show the presence of different types of organic ...
... Pannonian Basin. Technics - Mining, Geology and Metallurgy, Special Edition, 63: 43-47. Kotenev, M., 2015. The hydrocarbon potential of Albania. AAPG Search and Discovery Article #10710 (2015) [https://www.searchanddiscovery.com/documents/2015/10710kotenev/ndx_kotenev.pdf; pp. 12.] Lafargue, E., Marquis ...
... REESA – An expert system for geochemical logging of wells. AAPG Annual Convention Abstract, Calgary, Alberta, Canada, June 22–25, 1992. AAPG Search and Discovery Article #91012 p. 103. Peters, K. E., Walters, C. C. & Moldowan, J. M., 2005. The Biomarkers Guide, Vol. 2 – Biomarkers and Isotopes ...
Nikoleta Aleksić, Aleksandar Kostić, Miloš Radonjić. "Composition of organic matter and thermal maturity of Mesozoic and Cenozoic sedimentary rocks in East Herzegovina (External Dinarides, Bosnia and Herzegovina)" in Annales Societatis Geologorum Poloniae (2021). https://doi.org/10.14241/asgp.2021.16
-
OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian
This paper presents a new language resource for searching and exploring verbal aspectual pairs in BCS (Bosnian, Croatian and Serbian), created using Linguistic Linked Open Data (LLOD) principles. Since there is no resource that helps learners of Bosnian, Croatian and Serbian as foreign languages recognize the aspect of a verb or its pairs, we created a new resource that gives users information about aspect, as well as a link to the verb's aspectual pairs. The resource also contains external links to monolingual dictionaries, Wordnet and BabelNet. ...
Ranka Stanković, Maxim Ionov, Medina Bajtarević, Lorena Ninčević. "OntoLex Publication Made Easy: A Dataset of Verbal Aspectual Pairs for Bosnian, Croatian and Serbian" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
-
Annotation of the Serbian ELTeC Collection
This paper presents the so-called level-2 edition of the SrpELTeC text collection, developed within the activities of Working Group 2 (Methods and Tools) of COST Action CA 16204 (Distant Reading for European Literary History), and its schema specification. The level-2 edition builds on the level-1 edition, which serves as the input for morphosyntactic and NER annotation of the novels. The Serbian level-2 processing is described through the required steps, including the methods and tools used in the process. Some statistics from the Serbian level ...
distant reading, literary corpus, tagging, named entity recognition, lemmatization, ELTeC
Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Mihailo Škorić. "Annotation of the Serbian ELTeC Collection" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.2.3
-
Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit
In the digital environment of South Slavic languages, emotion analysis of social media texts is becoming increasingly important for understanding public opinion, creating personalized content, and analyzing interactions among users. In this paper we present a detailed methodology and the results of annotating a Serbian-language corpus according to Plutchik's categorization model, which recognizes eight basic emotional categories: joy, sadness, anger, fear, trust, disgust, anticipation, and surprise. The aim of the research is to analyze the emotional content of texts collected from the social networks X (formerly Twitter) ...
Milena Šošić, Ranka Stanković, Jelena Graovac. "Social-Emo.Sr: Emotional Multi-Label Categorization of Conversational Messages from Social Networks X and Reddit" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024., University of Belgrade - Faculty of Philology (2024)
-
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if for a source language domain terminology exists as well as a domain aligned corpus for a source and a target language, then it is possible to extract the terminology for a target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ...
aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
... overall design of our system (Figure 1) is as follows: 1. Input: • A sentence-aligned domain-specific corpus involving a source and a target language. We will denote an entry in this corpus with S(text.align) ↔ T (text.align); • A list of terms from the same domain in a source language (both s ...
... extracted from the target part of the aligned corpus having some expected syntactic structure. We will denote an entry from this list with T (term.extract). 2. Processing: • Aligning bilingual chunks (possible translation equivalents) from the aligned corpus. We will denote aligned chunks with S(align ...
... contained 491,990 translation pair candidates. We decided to enrich the corpus with additional parallel lists (described in Subsection 4.4.) since we observed certain improvement in evaluations of translation quality. First we split the corpus of aligned sentences into three disjoint parts: training (80%), ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
-
Improving efficiency of thermal power plants through mine coal quality planning and control
The main goal of coal quality control in lignite mines is to supply thermal power plants with coal whose quality stays within specified limits. Coal characteristics can affect the efficiency, reliability, and availability of both the boiler and the emission control units. In this paper the authors present an integrated simulation of the mining process as a new approach to studying the variability of coal calorific value during the exploitation of a complex lignite deposit. The results of this approach give valuable insight into the performance of a continuous mining system in terms of controlling variability ...
Mirjana Banković, Dejan Stevanović, Milica Pešić, Aleksandra Tomašević, Ljiljana Kolonja. "Improving efficiency of thermal power plants through mine coal quality planning and control" in Thermal Science, Vinča Institute of Nuclear Sciences (2018). https://doi.org/10.2298/TSCI170605209B
-
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)
In this paper we present a model for selecting good examples for a dictionary of Serbian and the development of the model's initial components. The method is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from the five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach for other languages. The feature distribution of examples from this corpus is compared with the feature distribution of sentence samples excerpted from corpora containing various texts. The analysis showed that ...
Serbian, good dictionary examples, automation of dictionary making, feature extraction, machine learning
... supported by the corpus data. It is common for lexicographers to look for examples in the corpus of contemporary Serbian (SrpKor, developed by D. Vitas and a group of collaborators from University of Belgrade, http://www.korpus.matf.bg.ac.rs/korpus/), which is being used as a control corpus, but they rarely ...
... to the corpus made of examples, we prepared a control dataset derived from various texts, which was used as a sample corpus for dictionary example extraction. The control dataset of example candidates was obtained from the digital library Biblisha (Stanković et al., 2017), SrpKor – the corpus of c ...
... syntactic features in a corpus compiled of examples from the five digitized volumes of the Serbian Academy of Sciences and Arts (SASA) dictionary. The initial set of features was inspired by a similar approach for other languages. The feature distribution of examples from this corpus is compared with the ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
-
Serbian ELTeC Sub-Collection in Wikidata
This paper presents an example of integration of Wikidata with digital libraries and external systems, as well as some best practices for speeding up the process of data preparation and import to Wikidata, on the use case of SrpELTeC, Serbian subcollection of the ELTeC multilingual collection (European Literary Text Collection). After preliminary work on the manual Wikidata population with SrpELTeC novels, the goal was to automate the process of preparing and importing information, so different solutions were analysed and ...
Milica Ikonić Nešić, Ranka Stanković, Biljana Rujević. "Serbian ELTeC Sub-Collection in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.2.4
-
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...
... in SerWN. In addition to that, we plan to broaden the set of parallel resources, and search for new pairs of aligned literals for synsets, which will then be manually post-edited. We also plan to use parallel corpus based methodologies relying on two strategies proposed in (Oliver et al., 2015) for ...
... wordnets. The English part of each corpus was semantically tagged, after which the process of wordnet creation was transformed into a word alignment problem, where wordnet synsets in the English part of the corpus were aligned with in the target language part of the corpus. The obtained precision was s ...
... with domain-specific single and multi-word expressions. They used a large monolingual Slovene corpus of texts to extract terminology from the domain of informatics, and a parallel English-Slovene corpus and an online dictionary as bilingual resources to facilitate the addition of new terms to sloWNet ...
Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
-
INVENTS: A Hybrid Mine Ventilation Planning and Design System
Ventilation system analysis is a complex process based on the calculation and analysis of numerous parameters. These problems can be successfully solved by the SimVent numerical package, but a full understanding and use of the obtained results require the involvement of an experienced specialist in the ventilation field. The solution was found in the creation of a hybrid system INVENTS, whose knowledge base represents a formalization of the expert knowledge in the mine ventilation field. In this paper we ...
... thus initiating the execution of the application process logic. Figure 6 depicts main interface forms containing various controls that enable text search and editing, picture presentation, communication with data base, creation of business diagrams, etc. Fig. 6. Interface forms of the SimVent software ...
Lilić Nikola, Stanković Ranka, Obradović Ivan. "INVENTS: A Hybrid Mine Ventilation Planning and Design System" in Proceedings of International Scientific Conference of FME Session 4: Automation Control and Applied Informatics, Hong Kong : iConcept Press (2013)