Search ⚒ Papers ⚒ DR RGF - RGF Repository

122 items

Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names

Branislava Šandrih, Cvetana Krstev, Ranka Stanković (2019)

In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were further used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system were evaluated on ...

NER, Named Entity Recognition Systems, Serbian, Personal Names

... functions. 1 Introduction Named Entity Recognition is the task of identifying named entities in text (Nadeau and Sekine, 2007), which is often used as a first step in question answering, information retrieval, anaphora resolution, topic modeling, etc. The first Named Entity set had 7 types (Grishman ...
... David Nadeau and Satoshi Sekine. 2007. A Survey of Named Entity Recognition and Classification. Lingvisticae Investigationes 30(1):3–26. E.F. Tjong Kim Sang. 2002. Introduction to the CoNLL-2002 Shared Task: Language-independent Named Entity Recognition. In COLING-02: The 6th Conference on Natural Language ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić (2022)

In this paper we present the Serbian part of the ELTeC multilingual corpus of novels written in the time period 1840-1920. The corpus is being built in order to test various distant reading methods and tools with the aim of re-thinking the European literary history. We present the various steps that led to the production of the Serbian sub-collection: the novel selection and retrieval, text preparation, structural annotation, POS-tagging, lemmatization and named entity recognition. The Serbian sub-collection was published ...

Corpus, Distant Reading, Digital Humanities, Linked Data, Named Entity Recognition, Text Analytics

Ranka Stanković, Cvetana Krstev, Branislava Šandrih Todorović, Duško Vitas, Mihailo Škorić, Milica Ikonić Nešić. "Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection" in Proceedings of the Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović (2024)

This paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the process of part-of-speech tagging, lemmatization and named entity recognition (NER). We then describe named entity linking (NEL), the conversion of the data to RDF, and the incorporation of NIF annotations. The NIF files produced were evaluated by exploring the triplestore with SPARQL queries. Finally, the linking of Linked ...

parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata

Ranka Stanković, Milica Ikonić Nešić, Olja Perisic, Mihailo Škorić, Olivera Kitanović. "Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking" in Proceedings of the 9th Workshop on Linked Data in Linguistics @ LREC-COLING 2024, Turin, 20-25 May 2024, ELRA and ICCL (2024)
From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)

Milica Ikonić Nešić, Ranka Stanković, Christof Schöch and Mihailo Škorić (2022)

In this paper we present the wikification of the ELTeC (European Literary Text Collection), developed within the COST Action "Distant Reading for European Literary History" (CA16204). ELTeC is a multilingual corpus of novels written in the time period 1840–1920, built to apply distant reading methods and tools to explore the European literary history. We present the pipeline that led to the production of the linked dataset, the novels’ metadata retrieval and named entity recognition, transformation, mapping and Wikidata population, ...

Wikidata, linked data, SPARQL, distant reading, literary corpus, named entity linking, ELTeC

Milica Ikonić Nešić, Ranka Stanković, Christof Schöch and Mihailo Škorić. "From ELTeC Text Collection Metadata and Named Entities to Linked-data (and Back)" in Proceedings of The 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference, June 2022, Marseille, France, European Language Resources Association (2022)
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov (2024)

This paper presents work on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a word sense repository. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for evaluating European languages with low and medium levels of resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and their comparison with annotations in other languages. We discuss the first steps ...

multiword units, named entity, word sense ambiguity, sense repository, LLOD

Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
SrpELTeC: A Serbian Literary Corpus for Distant Reading

Ranka Stanković, Cvetana Krstev, Duško Vitas (2024)

This article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared and annotated using the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are described. All novels were manually encoded in TEI with rich metadata and structural annotations. Automatic annotation included POS-tagging, lemmatization and named entities, relying on resources for processing ...

digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics

Ranka Stanković, Cvetana Krstev, Duško Vitas. "SrpELTeC: A Serbian Literary Corpus for Distant Reading" in Primerjalna književnost, Research Centre of the Slovenian Academy of Sciences and Arts (2024). https://doi.org/10.3986/pkn.v47.i2.03
A System for Named Entity Recognition Based on Local Grammars

Krstev Cvetana, Obradović Ivan, Utvić Miloš, Vitas Duško (2014)

Krstev Cvetana, Obradović Ivan, Utvić Miloš, Vitas Duško. "A System for Named Entity Recognition Based on Local Grammars" in Journal of Logic and Computation 24 no. 2, Oxford University Press (2014): 473-489. https://doi.org/10.1093/logcom/exs079
Serbian NER&Beyond: The Archaic and the Modern Intertwinned

Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić (2021)

In this paper we present a Serbian literary corpus that is being developed under the umbrella of the COST Action "Distant Reading for European Literary History" (CA16204). Using this corpus of novels written more than a century ago, we developed and made publicly available a Named Entity Recognition (NER) system trained to recognize 7 different named entity types, with a convolutional neural network (CNN), which achieves an F1 score of ≈91% on the test dataset. This model was further assessed on a separate evaluation dataset. We conclude with a comparison ...

... Marieke van Erp. 2019. Evaluating Named Entity Recognition Tools for Extracting Social Networks from Novels. PeerJ Computer Science, 5:e189. Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, and Ranka Stanković. 2020. Named Entity Recognition for Distant Reading in ELTeC ...
... Language Resources Association. Ridong Jiang, Rafael E Banchs, and Haizhou Li. 2016. Evaluating and Combining Name Entity Recognition Systems. In Proceedings of the 6th Named Entity Workshop, pages 21–27. Cvetana Krstev. 2008. Processing of Serbian. Automata, Texts and Electronic Dictionaries. Fa- ...
... differences and similarities that can be discovered between social networks extracted for different novels. Distant Reading Training School for Named Entity Recognition and Geo-Tagging for Literary Analysis organized within the COST Action 162044 covered NER approaches in general, annotation campaigns ...
Branislava Šandrih Todorović, Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić. "Serbian NER&Beyond: The Archaic and the Modern Intertwinned" in Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Methods and Applications, INCOMA Ltd. Shoumen, BULGARIA (2021). https://doi.org/10.26615/978-954-452-072-4_141
Indexing of textual databases based on lexical resources: A case study for Serbian

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2015)

In this paper we describe an approach to improvement of information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied on a database of geological projects financed by the Republic of Serbia in the last half century. Each document within this database is described by metadata, consisting of several fields such as title, domain, keywords, abstract, geographical location and the like. A bag of words was produced from these ...

... librarianship, etc. Geology is insufficiently covered — SWN presently contains only 157 synsets from this domain. Named Entity Recognition. According to [13] the term “Named Entity” (NE) usually refers to names of persons, locations and organizations, and numeric expressions including, time, date ...
... NE hierarchy in our Named Entity Recognition (NER) system consists of five top-level types: persons, organizations, locations, amounts, and temporal expressions, each of them having one or more levels of sub-types. Our tagging strategy allows nesting, which means that a named entity can be nested within ...
... Conference, GWC 2014. pp. 55–62. Tartu, Estonia (2014) 13. Nadeau, D., Sekine, S.: A Survey of Named Entity Recognition and Classification. In: Sekine, S., Ranchhod, E. (eds.) Named Entities: Recognition, Classification and Use, pp. 3–28. John Benjamins Pub. Co., Amsterdam/Philadelphia (2009) 14. Salton ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Indexing of textual databases based on lexical resources: A case study for Serbian" in Semantic Keyword-based Search on Structured Data Sources : First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer (2015). https://doi.org/10.1007/978-3-319-27932-9_15
Creation of a Training Dataset for Question-Answering Models in Serbian

Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov (2024)

The development and application of artificial intelligence in language technologies have advanced significantly in recent years, particularly for the task of question answering (QA). While existing QA resources have been developed for the major world languages, Serbian has been relatively neglected in this area. This paper presents an initiative to create a large and diverse dataset for training question-answering models in Serbian, which will contribute to the advancement of language technologies for Serbian. In addition to numerous studies on language models ...

artificial intelligence, natural language processing, language resources, annotated datasets, information extraction, question answering

Ranka Stanković, Jovana Rađenović, Maja Ristić, Dragan Stankov. "Creation of a Training Dataset for Question-Answering Models in Serbian" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
Development of Open Educational Resources (OER) for Natural Language Processing

Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević (2015)

In this paper we present the development of an online course at the edX BAEKTEL platform named “Lexical Recognition in the Natural Language Processing (NLP)”. It is based on the course of the same name for PhD studies at the University of Belgrade, Faculty of Philology. There are not many courses in Computational Linguistics (CL) on OER platforms, and there is none in Serbian either for CL or NLP. We have developed this course in order to improve this ...

E-Learning, Open Educational Resources, Computational Linguistics, Lexical Resources, edX

... construction of grammars for shallow parsing. The illustration of context use is presented by recognition of one named entity class. 9. The problems and solutions of multi-word unit (MWU) recognition are presented with emphasis on e-dictionaries of nominal MWUs, particularly their inflection ...
... transformation. Each graph in a cascade is a transducer that transforms a text. A graph that follows works on this transformed input. A full Named Entity Recognition System for Serbian is presented as an illustration. Didactic criteria Course combines different types of learning materials having ...
... sentiment analysis and semantics, discourse, machine translation, regular expressions, language models, text classification, and name entity recognition. All of them combine textual and video lectures with quizzes and assignments for self-evaluation. There are also courses in Italian ...
Cvetana Krstev, Biljana Lazić, Ranka Stanković, Giovanni Schiuma, Miladin Kotorčević. "Development of Open Educational Resources (OER) for Natural Language Processing" in The Sixth International Conference on e-Learning (eLearning-2015), September 2015, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2015)
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović (2017)

Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...

... MWU lemmas. Approximately 28.5% of these lemmas represent proper names: personal, geopolitical, organizational, etc. Named Entity Recognition. According to [19] the term “Named Entity” (NE) usually refers to names of persons, locations and organizations, and numeric expressions including, time, date ...
... hierarchy in our Named Entity Recognition (NER) system consists of five top-level types: persons, organizations, locations, amounts, and temporal expressions, each of them having one or more levels of sub-types. Our tagging strategy allows nesting, which means that a named entity can be nested within ...
... y rich language. J. Intell. Inf. Syst. 1–22, to appear 19. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. In: Sekine, S., Ranchhod, E. (eds.) Named Entities: Recognition, Classification and Use, pp. 3–28. John Benjamins Publishing Company, Amsterdam (2009) 20. Rehm ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić (2023)

The paper will present the results of the project “It-Sr-NER: Web services for named entities recognition, linking and mapping,” in which teams from the University of Turin and the Society for Language Resources and Technologies JeRTeh participated, and whose goal was the development of the It-Sr-NER web service for named entity annotations in the text and displaying them on the map. Named entities in these services are names of persons, places, organizations, demonyms (ethnicities), events and works of art.

General Engineering

Olja Perišić, Ranka Stanković, Milica Ikonić Nešić, Mihailo Škorić. "It-Sr-NER: Web Services for Recognizing and Linking Named Entities in Text and Displaying Them on a Web Map" in Infotheca, Belgrade : Faculty of Philology, University of Belgrade (2023). https://doi.org/10.18485/infotheca.2023.23.1.3
Serbian ELTeC Sub-Collection in Wikidata

Milica Ikonić Nešić, Ranka Stanković, Biljana Rujević (2021)

This paper presents an example of integration of Wikidata with digital libraries and external systems, as well as some best practices for speeding up the process of data preparation and import to Wikidata, on the use case of SrpELTeC, Serbian subcollection of the ELTeC multilingual collection (European Literary Text Collection). After preliminary work on the manual Wikidata population with SrpELTeC novels, the goal was to automate the process of preparing and importing information, so different solutions were analysed and ...

Wikidata, distant reading, literary corpus, named entity linking, ELTeC, SrpELTeC

Milica Ikonić Nešić, Ranka Stanković, Biljana Rujević. "Serbian ELTeC Sub-Collection in Wikidata" in Infotheca, Faculty of Philology, University of Belgrade (2021). https://doi.org/10.18485/infotheca.2021.21.2.4
Rule-based Automatic Multi-word Term Extraction and Lemmatization

Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac (2016)

In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ...

term extraction, terminology, multi-word units, lemmatization, finite-state transducers

... because linguistic resources and tools they used were underdeveloped. In (Małyszko et al., 2015) authors lemmatize multiword entity names (organization names and similar named entities found in a corpus of legislative acts) by using rules generated on the basis of corpora analysis. For tackling ...
... Granada: University of Granada, pp. 81–89. Malyszko, J., Abramowicz, W., Filipowska, A., & Wagner, T. (2015). Lemmatization of Multi-Word Entity Named for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis. In Proc. of 7th Language & Technology Conference 2015, ...
... Serbian it is possible that two or more graphs recognize the same word sequence where only one of them is correct. In the case of such ambiguous recognition precedence is always given to the more probable case according to the predefined order of precedence of graphs and/or frequency of candidate ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Biljana Lazić, Aleksandra Trtovac. "Rule-based Automatic Multi-word Term Extraction and Lemmatization" in Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia, 23–28 May 2016, European Language Resources Association (2016)
Terminological and lexical resources used to provide open multilingual educational resources

Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić (2016)

Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...

open educational resources, lexical resources, natural language processing, terminology

... fs4v. The first part of entry is a lemma: učiteljica. N is a sign noun (part of speech), 651 is an inflectional class, +Hum is a marker for human entity and +GM is a marker for gender. After that, there is a part of entry for grammatical categories. F is gender feminine, s is sign for number - singular ...
... singular, nominative case and animate. Akušer is also male, singular, nominative case and animate noun. There are markers for a compound noun and human entity. According to data from 2014, Serbian morphological dictionary of simple words consists of 133,361 lemmas. Their production is 4,581,657 word forms ...
... culinary etc.). 5. TERMINOLOGY EXTRACTION Bearing in mind rapid changes in scientific domains and new terms production, automatic terminology recognition and extraction has become an important task. The extracted terms are then included in ontologies. Even though attention has been raised to the ...
Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
Named Entity Recognition for Distant Reading in ELTeC

Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković (2020)

The COST Action "Distant Reading for European Literary History", which started in 2017, has among its main objectives the creation of an open-source multilingual collection of European literary texts (ELTeC). In this paper we present the work done on manually annotating a selection of the ELTeC collection for named entities, as well as on evaluating existing named entity recognition tools with respect to their ability to produce such annotations automatically. The final paragraph discusses the points in common between this initiative and CLARIN.

... access, as well as the employees' publications. - The Repository is available at: www.dr.rgf.bg.ac.rs Annotation and Visualization Tools 37 Named Entity Recognition for Distant Reading in ELTeC Francesca Frontini Praxiling CNRS Université Paul-Valéry Montpellier 3 Carmen ...
Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković. "Named Entity Recognition for Distant Reading in ELTeC" in CLARIN Annual Conference 2020, Oct 2020, Virtual Event, France, CLARIN (2020)
Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian

Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić (2020)

The training of new tagger models for Serbian is primarily motivated by the enhancement of the existing tagset with the grammatical category of a gender. The harmonization of resources that were manually annotated within different projects over a long period of time was an important task, enabled by the development of tools that support partial automation. The supporting tools take into account different taggers and tagsets. This paper focuses on TreeTagger and spaCy taggers, and the annotation schema alignment ...

Part-of-Speech tagging, lemmatization, corpus, evaluation, Serbian, morphological dictionary

... Part-of-Speech tagging (PoS-tagging). PoS-tagging precedes many other Natural Language Processing tasks, such as Text Classification, Named Entity Recognition, Sentiment Analysis, Question Answering, etc. Computer programs that perform this task, the so-called ‘taggers’, can be based on lookup-tables ...
... MULTEXT-East-the Case of Serbian. Informatica, 28(4):431–436. Krstev, C., Obradović, I., Utvić, M., and Vitas, D. (2014). A system for named entity recognition based on local grammars. Journal of Logic and Computation, 24(2):473–489. Krstev, C. (2008). Processing of Serbian – Automata, Texts ...
... and named entities (NEs) (Verne, Švejk, Floods, History). Since the taggers developed within this research tagged only simple words these complex units had to be decomposed into simple words. For instance, Devetnaesti vek ‘Nineteenth century’ which was tagged as temporal named entity had ...
Ranka Stanković, Branislava Šandrih, Cvetana Krstev, Miloš Utvić, Mihailo Škorić. "Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian" in Proceedings of the 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, European Language Resources Association (2020)
A Lexical Approach to Acronyms and their Definitions

Cvetana Krstev, Duško Vitas, Ranka Stanković (2015)

In this paper we present a comprehensive approach to acronyms for Natural-Language Processing (NLP) of Serbian texts. The proposed procedure includes extraction of acronyms and their definitions that are usual Multi-Word Units (MWUs), shallow parsing of MWUs that enables MWU lemmatization and production of entries in morphological electronic dictionaries, both for MWU and acronyms, that are provided with grammatical, syntactic, semantic and domain information. This approach enables representation that reflects complex relations between acronyms and their definitions.

... Republička agencija za telekomunikacije ‘Republic Agency for Telecommunications’), sometimes letters that are not initial ... (Figure 1: A many-to-many relation between entity names and acronyms. Names and acronyms given in italic are possibilities that are not realized for the given example.)
... komunikacije i poštanske usluge ‘Regulatory Agency for Electronic Communication and Postal Services’. Moreover, in many cases a relation between an entity and its name and acronym is not one-to-one. The name can change in time and some shortened variants can be in use, translated names can exhibit ...
... that yield four to six letter acronyms. E-dictionaries are used in order to recognize certain forms and check necessary grammatical agreement. Recognition of one particular construct – AANprepNp – is represented in a graph in Fig. 2 (upper and middle part). The agreement check is performed in the ...
Cvetana Krstev, Duško Vitas, Ranka Stanković. "A Lexical Approach to Acronyms and their Definitions" in Proceedings of the 7th Language & Technology Conference, November 27-29, 2015, Poznań, Poland, Springer (2015)
Managing mining project documentation using human language technology

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja (2018)

Purpose: This paper aims to develop a system, which would enable efficient management and exploitation of documentation in electronic form, related to mining projects, with information retrieval and information extraction (IE) features, using various language resources and natural language processing. Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and extraction of information. These resources are integrated with a set of Web services and applications, for different user profiles and use-cases. Findings: The ...

Digital libraries, Information retrieval, Data mining, Human language technologies, Project documentation

Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja. "Managing mining project documentation using human language technology" in The Electronic Library (2018). https://doi.org/10.1108/EL-11-2017-0239
