Чији је пример? Анализа лексичких обележја на примерима Речника САНУ
This paper asks whether the author of a text can be identified by analyzing only its lexical features. To seek an answer, we examined the examples within the dictionary entries of individual lexemes in the SANU Dictionary, recorded in five volumes (I, II, XVIII, XIX and XX). Each example is taken from a source, indicated by the abbreviations given in parentheses. Of the more than 5,000 available sources, we opted ...... of text (Text Summarization), lexical parsing (Dependency Parsing), part-of-speech tagging (Part-of-Speech Tagging), lemmatization (Lemmatization), named entity recognition (Named Entity Recognition), text classification (Text Classification) ...
... Belgrade: Институт за српски језик САНУ, 115–119. Eder et al. 2016: M. Eder, J. Rybicki & M. Kestemont, Stylometry with R: a package for computational text analysis, R Journal 8(1): 107–121. Закон о Речнику Српске академије наука и уметности (Law on the Dictionary of the Serbian Academy of Sciences and Arts): http://www.mpn.gov.rs/wp-content/uploads/2015/08/zakon_o_r ...
... ACADEMY OF SCIENCE AND ARTS Summary The question we ask ourselves in this paper is the following: Is it possible to determine who is the author of a text by analyzing various lexical features? In order to try to get an answer, we observed examples that support lexical entries listed in five of the total ...Бранислава Б. Шандрих, Ранка М. Станковић, Мирјана С. Гочанин. "Чији је пример? Анализа лексичких обележја на примерима Речника САНУ" in Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch13
Combining Heterogeneous Lexical Resources
Cvetana Krstev, Duško Vitas, Ranka Stanković, Ivan Obradović, Gordana Pavlović-Lažetić. "Combining Heterogeneous Lexical Resources" in Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, vol. 4, ELRA - European Language Resources Association (2004)
Unraveling Innerworkings of Magmatic System Beneath the East Pacific Rise 9º50’N
Milena Marjanović, Suzanne M. Carbotte, Alexandre Stopin, Felix Waldhauser, Satish C. Singh, René-Édouard Plessix, Miloš Marjanović, Malden R. Nedimović, Juan Pablo Canales, Hélène D. Carton, Javier Escartin, John C. Mutter (2021)... c Institution; Woods Hole, MA, USA 8Laboratoire de Géologie, Ecole Normale Supérieure (CNRS UMR), PSL Research University; Paris, France Text: Volcanic activity is readily observed and monitored for subaerial volcanoes; however, due to inaccessibility, little is known about the dynamics ...Milena Marjanović, Suzanne M. Carbotte, Alexandre Stopin, Felix Waldhauser, Satish C. Singh, René-Édouard Plessix, Miloš Marjanović, Malden R. Nedimović, Juan Pablo Canales, Hélène D. Carton, Javier Escartin, John C. Mutter. "Unraveling Innerworkings of Magmatic System Beneath the East Pacific Rise 9º50’N" in AGU Fall Meeting 2021, American Geophysical Union (2021)
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian
Abusive speech on social media, including profanity, derogatory speech and hate speech, has reached pandemic levels. A system able to detect such texts could help make the internet and social media a better, more respectful virtual space. Research and commercial applications in this area have so far focused mainly on English. This paper presents work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated ...... classification: for a given statement, the system needs to determine whether it contains hate speech or not [36]. To achieve this goal systems usually apply text mining techniques. The majority of current hate speech, offensive, and abusive language detection systems in social media are based on lexicons or ...
... detection of these phenomena ([51, 52, 1, 6]). Warner and Hirschberg [44] presented their research on hate speech toward minority groups in online text, with the main focus on anti-semitic language. Three annotators manually annotated a corpus of 1000 paragraphs taken from offensive websites and Yahoo ...
... omission of diacritics, and different Unicode characters; 5. Use of foreign language words and emoticons (e.g. :’-),:-P, :@)); 6. Twitter-specific text: mentions, retweets and URLs as well as hashtags (e.g. #TLZP, #Utisak, #u6reci). 2.2 General Corpus annotation for classification of tweets The r ...Danka Jokić, Ranka Stanković, Cvetana Krstev, Branislava Šandrih. "A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian" in 3rd Conference on Language, Data and Knowledge (LDK 2021), MDPI AG (2021). https://doi.org/10.4230/OASIcs.LDK.2021.13
Electronic Dictionaries - from File System to lemon Based Lexical Database
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ...... options for the lexical model were considered. The first was the TEI (Text Encoding Initiative) Guidelines for dictionary description. TEI is a widely accepted standard for text encoding that proposes solutions for many text types, one of them being dictionaries. However, it seems that TEI is ...
... duplicates (e.g. should atlas be one lemma or two lemmas that have same inflectional behavior, one denoting a book with maps and having markers +Conc+Text, the other denoting a type of a fabric and having markers +Conc+Mat). The consistency check is missing as well (e.g. can a marker +Hum be assigned ...
... E ’lemon’ ontology and lexicons. Semantic Web, 6:363–369. Vitas, D., Pavlović-Lažetić, G., and Krstev, C. (1993). Electronic dictionary and text processing in Serbo-Croatian. Sprache–Kommunikation–Informatik, 1:225. 10. Language Resource References Krstev, Cvetana and Vitas, Duško. (2015) ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Mihailo Škorić. "Electronic Dictionaries - from File System to lemon Based Lexical Database" in Proceedings of the 11th International Conference on Language Resources and Evaluation - W23 6th Workshop on Linked Data in Linguistics : Towards Linguistic Data Science (LDL-2018), LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
Geologic Information System of Serbia
The Geologic Information System of Serbia (GeolISS) is a repository for digital archiving, querying, retrieval, analysis and visualization of geologic data. GeolISS is implemented through ESRI ArcGIS technology, and is designed to operate as a personal geodatabase (MS Jet 4.0 Engine) and as an SDE enterprise geodatabase in MS SQL Server. The objective of the GeolISS implementation is the integration of existing geologic archives, data from published maps at different scales, newly acquired field data, as well as Web publishing of geologic information. Physical implementation ...... field data records and measurements i.e. the basis for classified features, interpretations and models. Any observed property can be expressed as a text, number, picture and geometry (location). Spatial entities are treated as observation localities and mapped/interpreted geologic entities (oc ...Branislav Blagojević, Branislav Trivić, Ranka Stanković, Nenad Banjac, Olivera Kitanović. "Geologic Information System of Serbia" in Proceedings of the 17th Meeting of the Association of European Geological Societies, 14.-18. september 2011., Beograd : Srpsko geološko društvo (2011)
Praktikum za vežbe iz Informatike 1
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Mirjana Banković. Praktikum za vežbe iz Informatike 1, Beograd : Univerzitet u Beogradu, Rudarsko-geološki fakultet, 2014
The correlation of Upper Miocene lithostratigraphic units of the southern part of the Pannonian basin
Filip Anđelković, Dejan Radivojević (2019)... practical role: a better understanding of the time and character of deposition leads to a better understanding of petroleum geology. Further in the text, those units with formation rank are listed. Geologists who study the Pannonian basin use different names for the same formations, thus making correlation ...Filip Anđelković, Dejan Radivojević. "The correlation of Upper Miocene lithostratigraphic units of the southern part of the Pannonian basin" in Knjiga sažetaka i radova II Kongresa geologa Bosne i Hercegovine, Laktaši BiH, oktobar 2-4, 2019, Udruženje geologa u Bosni i Hercegovini (2019)
Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface
Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos (2024)We present ongoing work on defining a lexicon-corpus interface that will serve as a reference for representing multiword expressions (of various types: nominal, verbal, etc.) in dedicated lexicons and for linking these entries to their occurrences in corpora. The ultimate goal is to use such resources for the automatic identification of multiword expressions in text. Including several natural languages aims at a universal solution that is not centered on a particular language, while also accommodating idiosyncrasies. Challenges in the lexicographic description of multiword ...Verginica Barbu Mititelu, Voula Giouli, Kilian Evang, Daniel Zeman, Petya Osenova, Carole Tiberius, Simon Krek, Stella Markantonatou, Ivelina Stoyanova, Ranka Stankovic, Christian Chiarcos. "Multiword Expressions between the Corpus and the Lexicon: Universality, Idiosyncrasy and the Lexicon-Corpus Interface" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
Threshold-induced correlations in the Random Field Ising Model
... while the external waiting time T_ext is the time between two consecutive subavalanches thresholded from two different avalanches (avalanches i and j in Fig. 1a). A further distinction can be made among different types of contribution to the external waiting time T_ext(i, j; V_th); thus, see Fig. 1a, ...
... times, all for the subavalanches above thresholds. The subavalanches are taken from a family of response signals observed under conditions which are aligned according to the collapsing requirements together with the corresponding collapsing predictions. Thus, in panel a, the data are scaled in agreement ...Sanja Janićević, Dragutin Jovković, Lasse Laurson, Đorđe Spasojević. "Threshold-induced correlations in the Random Field Ising Model" in Scientific Reports, Springer Science and Business Media LLC (2018). https://doi.org/10.1038/s41598-018-20759-6
Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom
... vocabulary knowledge, as it has been calculated that minimum autonomy at the tertiary level starts at around 3000 words allowing a learner to read a text without the need to refer constantly to dictionaries or the teacher, we hypothesized that with the use of flashcards and judicious use of L1 in the ...
... advocate that ties with L1 may stimulate deeper processing, facilitate negotiating of metalinguistic knowledge, foster understanding the meaning of the text, and enable vocalizing the thoughts of the learners (Llach, 2009; Lázaro and García Mayo, 2012; Storch and Aldosari, 2010). The question of students’ ...Lidija Beko, Ivan Obradović, Ranka Stanković. "Developing Students’ Mining and Geology Vocabulary Through Flashcards and L1 in the CLIL Classroom" in Proceedings of the Second International Conference on Teaching English for Specific Purposes and New Language Learning Technologies, May, 22-24, 2015, Niš, Serbia, Faculty of Electronic Engineering, University of Niš, Niš : Faculty of Electronic Engineering (2015)
Увођење доменских и семантичких маркера за област рударства у српске електронске речнике
... on Personal protective equipment Existing markers лампа (lamp), самоспасилац (self-rescuer), маска (mask), шлем (helmet), чизме (boots), заштитна опрема (protective equipment), заштитно одело (protective suit) +Text Types of mining documents +Mining+Text рударска пројектна документација (mining project documentation), геолошки елаборат (geological report), претходна студија оправданости (pre-feasibility study), студија оправданости (feasibility study), рударски пројекат (mining project) ...Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Београд : Међународни славистички центар на Филолошком факултету, Филолошки факултет (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within the BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly the languages of the Western Balkans, Russian and English. The University of Belgrade (UB) hosts a central repository based on: the BAEKTEL Metadata Portal (BMP), a terminological web application for managing, browsing and searching terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and an OER repository on local edX ...... Internet. This component consists of morphological dictionaries, WordNet, domain-specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention will be given to Termi, a newly developed application for terminology management. Keywords: Open ...
... rely greatly on various NLP tools to help them cater to a large number of students from all over the world. These tools may include assessment of text and speech, writing assistants, automatic generation of exercises, wrap up questions and online instructional environments [3]. The main goal of ...
... transducers applied on domain corpus to extract terminology. Examples of patterns are presented in [15]. After applying these transducers to domain text, the extracted potential terms were evaluated. Results presented in the previous paper were satisfying enough to speed up the development of a terminological ...Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
Developing Termbases for Expert Terminology under the TBX Standard
... statistical machine translation (SMT), an approach developed at IBM in the late 1980s, now the state-of-the-art paradigm in MT. The exponential growth of aligned multilingual corpora greatly improved the efficiency and accuracy of SMT in general, and many tools based on this approach, such as Google Translate ...
... than are still bound to maintain their importance in the case of expert terminology in domains where aligned corpora are sparse [10], such as, for example, mining engineering or geology. In order to secure terminological consistency in one or more termbases ...
... approach also envisages integration with cascades for named entity recognition such as mining equipment, specific minerals and the like. Building of an aligned Serbian-English corpus of texts in the area of mining and geology from sources like the bilingual journal “Underground Mining” is underway. The ...Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade : University of Belgrade, Faculty of Mathematics (2014)
Resource-based WordNet Augmentation and Enrichment
In this paper we present an approach to support production of synsets for Serbian WordNet (SerWN) by adjusting Princeton WordNet (PWN) synsets using several bilingual English-Serbian resources. PWN synset definitions were automatically translated and post-edited, if needed, while candidate literals for Serbian synsets were obtained automatically from a list of translational equivalents compiled from bilingual resources. Preliminary results obtained from a set of 1248 selected PWN synsets show that the produced Serbian synsets contain 4024 literals, out of which 2278 were offered by the system we present in this paper, whereas experts added the remaining 1746. Approximately one half of ...... five European languages to produce aligned wordnets. The English part of each corpus was semantically tagged, after which the process of wordnet creation was transformed into a word alignment problem, where wordnet synsets in the English part of the corpus were aligned with in the target language part ...
... five literals. The majority of SerWN synsets are aligned with corresponding PWN 3.0 synsets via the Interlingual Index, with the exception of a little over 1,000 Serbian specific synsets that do not exist in PWN. In our research we used 20,221 aligned synsets (from a previous version of SerWN), coupled ...
... literals per synset in SerWN is 1.66. After the initial analysis, we used the 20,221 aligned synsets to produce a parallel list of 72,262 literals for the purpose of this research. For example, we used the aligned synset pair: building, edifice -- zgrada, kuća to produce the following list: building ...Ranka Stanković, Miljana Mladenović, Ivan Obradović, Marko Vitas, Cvetana Krstev. "Resource-based WordNet Augmentation and Enrichment" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018)
A Tel Platform Blending Academic And Entrepreneurial Knowledge
... language support system also handles aligned texts or bitexts, pairs of semantically equivalent texts in different languages, such as an original text and its translation, that are aligned on a structural level (paragraph, sentence, phrase, etc.). Aligned texts in BAEKTEL enable better understanding ...
... understanding of OER and follow the standard format for representing aligned texts, the Translation Memory eXchange format (TMX) that is XML-compliant. It should finally be mentioned that due to the complex Serbian grammar the language support system also features grammars implemented ...
... ) corpora of lessons and texts in written form, and functionalities for searching and browsing of terminological resources and using them for text annotation. The contents of these resources conform to the methodic/didactic quality criteria and contain very rich metadata sets that enable ...Ivan Obradović, Ranka Stanković, Jelena Prodanović, Olivera Kitanović. "A Tel Platform Blending Academic And Entrepreneurial Knowledge" in Proceedings of the Fourth International Conference on e-Learning (eLearning-2013), September 2013, Belgrade, Serbia, Belgrade, Serbia : Belgrade Metropolitan University (2013)
Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)
Biljana Rujević, Mihailo Škorić (2024)The paper describes linking the Digital Repository of the University of Belgrade, Faculty of Mining and Geology, with the eScience system in terms of transferring metadata about the results of researchers' scientific work. The steps taken to ensure smooth harvesting of metadata are outlined. Further improvements to the OAI system are also presented, aiming to contribute to the automatic linking of authors with their results in the eScience system.Biljana Rujević, Mihailo Škorić. "Data from the Digital Repository of the Faculty of Mining and Geology in eScience (eNauka)" in Infotheca, Faculty of Philology, University of Belgrade (2024). https://doi.org/10.18485/infotheca.2023.23.2.4
FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain
The paper gives a brief overview of the theory of frame semantics, on which the FrameNet lexical database is based. The concept of this network is presented, along with its possible applications. The lexical analysis applied in the FrameNet project is also presented, and the differences between frame-based and word-based analysis are pointed out. Several related frames evoked by words from the risk domain are then shown. The paper also presents the NLTK platform, by means of which one can use ...... natural language processing Python suite that accesses a continually increasing number of corpora and lexical resources. NLTK offers different types of text processing, amongst which are: classification, tokenization, stemming, tagging, parsing and semantic reasoning. The NLTK system uses wrappers for ...
... 2021). The current version contains 4.1 million words. It comprises project documentation (26%), legislation (11%), doctoral dissertations (31%), textbooks and other mining literature (32%) (Kitanović et al. 2021, 8). Figure 5. Concordances for adjective-noun pattern containing the noun ризик ...
... The results of a CQL (Corpus Query Language) query are analyzed for: frequency lists, collocations, concordances with a narrower and broader context. Figure 5 shows the concordances extracted from the Leximirka digital dictionary management web app (Stanković et al. 2018) of the adjective-noun ...Aleksandra Marković, Ranka Stanković, Natalija Tomić, Olivera Kitanović. "FrameNet Lexical Database: Presenting a Few Frames Within the Risk Domain" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.1.1
A Data Driven Approach for Raw Material Terminology
Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja (2021)The research presented in this paper aims at creating a bilingual (sr-en), easily searchable, hypertext, born-digital, corpus-based terminological database of raw material terminology for dictionary production. The approach is based on linking dictionaries related to the raw material domain, both digitally born and printed, into a lexicon structure, aligning terminology from different dictionaries as much as possible. This paper presents the main features of this approach, data used for compilation of the terminological database, the procedure by which it has ...raw materials, mining, terminology, dictionary, terminological application, mobile application, digitization, lexical data, corpora, linked open data... were found in both datasets: extracted from text and retrieved from dictionaries, namely, a total of 2285 English and 308 Serbian terms. The GIZA++ [43] and Moses toolkit [44] for statistical machine translation (SMT) were used for word alignment. Aligned chunks, presented in the so-called phrase table ...
... The bilingual corpus of texts aligned on the sentence level was produced from the bilingual digital library Bibliša. The initial set of 55 documents containing 4831 aligned Serbian-English sentences [29] was enlarged with 44 new documents containing 12,657 aligned sentences from the raw material ...
... by the results obtained from MD. The Serbian part of MD that contains headwords was transformed into a text, which was then analysed by SrpMD. Out of 12,655 different single words found in the text produced from the dictionary, 9758 were recognized by SrpMD. Among the 2897 (23%) that were not recognised ...Olivera Kitanović, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić, Ivan Babić, Ljiljana Kolonja. "A Data Driven Approach for Raw Material Terminology" in Applied Sciences, MDPI AG (2021). https://doi.org/10.3390/app11072892
The analysis of the geothermal energy capacity for power generation in Serbia
Jana Stojković, Goran Marinković, Petar Papić, Mihailo Milivojević, Maja Todorović, Marina Ćuk (2013)... potential resource or to its exploitation. Geothermal water is usually heat-pumped for heating. Some references will be mentioned later in this text. Stojiljković et al. [3] describe an installation intended for a drying plant. A pilot plant was installed in the heat-exchange station at the Gejzir ...
