Претрага
83 items
-
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the dataobtainedfromSrpKorandlocalgrammarsassistsinmakingadecisionbetween several candidates in cases of ambiguity. The evaluation results reveal that,dependingonthetext,accuracyrangesfrom95.03%to99.36%,whilethe precision (average 98.93%) is always higher than the recall (average 94.94%).Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
-
Terminological and lexical resources used to provide open multilingual educational resources
Open educational resources (OER) within BAEKTEL (Blending Academic and Entrepreneurial Knowledge in Technology enhanced learning) network will be available in different languages, mostly in the languages of Western Balkans, Russian and English. University of Belgrade (UB) hosts a central repository based on: BAEKTEL Metadata Portal (BMP), terminological web application for management, browse and search of terminological resources, web services for linguistic support (query expansion, information retrieval, OER indexing, etc.), annotation of selected resources and OER repository on local edX ...... Internet. This component consists of morphological dictionaries, WordNet, domain specific terminological resources such as GeolISSterm, RudOnto, aligned texts in TMX format, corpora etc. Special attention will be given to Termi, newly developed application for terminology management. Keywords: Open ...
... was recognized 14 most productive patterns that represent structure of MWU terms. They are represented in form of transducers applied on domain corpus to extract terminology. Examples of patterns are presented in [15]. After applying these transducers on domain text extracted potential terms were ...
... , methods and applications,” Knowledge engineering review, vol. 11, No. 2, pp. 93–136, June. 1996. [12] P. Pantel and L. Dekang, “A statistical corpus-based term extractor,” in Advances in Artificial Intelligence, 1st ed., E. Stroulia and S. Matwin, Ed. Springer Berlin Heidelberg, 2001, pp. 36-46 ...Biljana Lazić, Danica Seničić, Aleksandra Tomašević, Bojan Zlatić. "Terminological and lexical resources used to provide open multilingual educational resources" in The Seventh International Conference on eLearning (eLearning-2016), 29-30 September 2016, Belgrade, Serbia, Belgrade : Belgrade Metropolitan University (2016)
-
A bilingual digital library for academic and entrepreneurial knowledge management
A generic knowledge management process of organization, storage and retrieval of knowledge can suitably be fitted in a digital library. In the digital and knowledge age digital libraries can be used in knowledge management to handle intellectual assets and support knowledge creation. A multilingual digital library either stores content in more than one language or provides multilingual query access to monolingual content. In Serbia 18 of 308 scientific journals regularly published are bi-lingual, with papers simultaneously being in English ...... documents). XAlign is now integrated into Unitex1, a corpus processing system, based on automata-oriented technology (Utvić et al., 2007). Text preparation, alignment and generation of TMX documents are done within a special-purpose tool ACIDE (Aligned Corpora Integrated Development Environment) (Utvić ...
... She has developed the Serbian morphological e-dictionary, and was one of the key contributors to the development of Serbian wordnet, aligned multilingual corpus, and many other language resources. She also developed the first Serbian Named Entity Recognition system. She participated in a number ...
... lexical resources, access to aligned resources, etc.) 4 System components In designing Bibliša special attention is given to its language support component. It supports various aspects of multilingual libraries: its content is not only multilingual, but also aligned and it can be searched in any ...Ranka Stanković, Cvetana Krstev, Biljana Lazić, Dalibor Vorkapić. "A bilingual digital library for academic and entrepreneurial knowledge management" in Proceeding of 10th International Forum on Knowledge Asset Dynamics — IFKAD 2015: Culture, Innovation and Entrepreneurship: connecting the knowledge dots, Bari, Italy, 10-12 June 2015, Bari : IFKAD (2015)
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian news paper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annota tion, which were further used to train two Named Entity Recognition (NER) sys tems: Stanford and spaCy. All obtained models, together with a rule- and lexicon based system were evaluated on ...... present briefly this system and how it was used to produce the corpus of newspaper texts an- notated with personal names – the gold standard. Section 3 describes NER systems based on Ma- chine Learning methods that were trained on the corpus derived from the gold standard, while the evaluation and discussion ...
... NER methods and tools with an ultimate goal to produce a successful hybrid system. The im- portant next step is the enhancement of our news- paper corpus with other types of text (Wikipedia articles, domain texts, literary texts). The literary texts would be particularly important for improv- ing the ...Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
The Dictionary of the Serbian Academy: from the Text to the Lexical Database
In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encoding was used in the course of the almost 60-year long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text of and transform it into structured data stored in relational lexical database. This approach ...... n about prove- nance and use. The class References contains metadata about the bibliographic information from the dictionary corpus. The set of markers is partially aligned with the TEI elements (and attributes) and LexInfo in order to relate the lexical data to other resources and provide automatic ...
... References Ahačič, K., Ledinek, N., & Perdih, A. (2015). Fran: The Next Generation Slovenian Dictionary Portal. In Natural Language Processing, Corpus Linguistics, Lexicography. Eight International Conference Bratislava, Slovakia, pp. 21-22. Berg, D. L., Gonnet, G. H., & Tompa, F. W. (1988). The ...Ranka Stanković, Rada Stijović, Duško Vitas, Cvetana Krstev, Olga Sabo. "The Dictionary of the Serbian Academy: from the Text to the Lexical Database" in Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana : Ljubljana University Press, Faculty of Arts (2018)
-
Белешка о дигитализацији речника
У раду ће се анализирати ограничења која проистичу из линеарног процеса традиционалне израде речника на примеру Речника САНУ. Начин да се превазиђу ова ограничења се састоји у формирању електронске лексикографске базе која не представља само пуку дигиталну транскрипцију папирног издања речника. Посебно се указује на чињеницу да текст речника може представљати корпус и приказују се одабрани примери анализе таквог корпуса формираног из текстове 1. и 19. тома Речника САНУ.... dictionary edition. It is additionally stressed that a dictionary text represents itself a valuable corpus for various research purposes which is illustrated by a few examples of the analysis of such corpus compiled form the first and 19th volume of the SASA Dictionary. ...Душко М. Витас, Цветана Ј. Крстев, Ранка М. Станковић. "Белешка о дигитализацији речника" in Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch3
-
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs and the matching of the two representations. This paper outlines the introduction of document indexing as a possible solution to document representation. Documents within a large textual database developed for geological projects in the Republic of Serbia for many years were indexed using methods developed within digital humanities: bag-of-words and named ...... lemmatization for which TreeTagger trained for Serbian is available, already used for lemmatization of the Corpus of Contemporary Serbian [25]. However, this lemmatizer was trained on a general corpus that differs significantly from domain corpora, such as our textual database of geological projects, and ...
... much as possible [13]. These local grammars were organized in cascades that further resolve ambiguities [16]. NER system was evaluated on a newspaper corpus and results reported in [13] showed that F -measure of recognition was 0.96 for types and 0.92 for tokens.3 3 Tokens are all occurrences (in this ...
... B., Nikolić, V.: The devel- opment of the geolissterm terminological dictionary. INFOtheca 12(1), 49a–63a (2011) 25. Utvić, M.: Annotating the corpus of contemporary Serbian. INFOtheca - J. Inform. Librariansh. 12(2), 36a–47a (2011) 26. Vitas, D., Popović, L., Krstev, C., Obradović, I., Pavlo ...Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
-
Српски језик у дигиталном добу -- The Serbian Language in the Digital Age
Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević (2012)... as for example SrpNet. ‚ A reference corpus of contemporary Serbian in Eka- vian dialect is available, as well as several parallel aligned corpora, all of which are available to re- searchers of Serbian. Current research is focused on upgrading the reference corpus and expanding it with the Ijekavian ...
... versity of Novi Sad. e AlfaNum company has a con- siderable number of users among Serbian companies. e first corpus of contemporary Serbian, an electronic morphological dictionary of Serbian, aligned French- Serbian and English-Serbian corpora of literary texts, as well as different soware tools were developed ...
... developed within the scope of bilateral cooperation with France, whereas a one-million aligned English-Serbian project, lemmatised and morphologically annotated, was devel- oped within the scope of the Intera project. is corpus was used for tagger training, as well as for experiments in alignment at the word ...Duško Vitas, Ljubomir Popović, Cvetana Krstev, Ivan Obradović, Gordana Pavlović-Lažetić, Mladen Stanojević. "Српски језик у дигиталном добу -- The Serbian Language in the Digital Age" in META-NET White Paper Series, G. Rehm, H. Uszkoreit (eds.), Springer (2012)
-
Глаголи у кухињи и за столом
Цветана Крстев, Биљана Лазић (2015)У раду је приказано истраживање лексике на српском језику кулинарског домена које се заснива на коришћењу доменског корпуса, електронских лексичких ресурса, пре свега WordNet-а и морфолошких речника, и локалних граматика. Приказане су доменске специфичности ових ресурса, како се користе, и међусобно употпуњују. Посебно је приказано како се коришћењем доменског корпуса могу екстраховати глаголи специфични за кулинарски домен и описати начини њиховог коришћења. Дат је попис глагола са основним подацима који је добијен применом представљених метода.аутоматска обрада, коначни трансдуктори, електронски речници, семантичке мреже, локалне граматике, кулинарство... and at a Table Summary In this paper we present a research of the lexica of the culinary domain in Serbian based on the use of the domain corpus, electronic lexical resources – WordNet and morphologcila dictionaries – and local grammars. We presented the domain characteristics of these resources ...
... can be used for research and for mutal enrichment. In more details we showed how verbs used in the culinary domain can be extracted from the domain corpus and how the extracted information can be used to describe these verbs. At the end we gave the list of all verbs extracted using the described approach ...Цветана Крстев, Биљана Лазић. "Глаголи у кухињи и за столом" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и преимене, Вол. 44/3, Београд : Међународни славистички центар (2015)
-
Ontološki model upravljanja rizikom u rudarstvu
Olivera Kitanović (2021)Rudarska proizvodnja obuhvata kompleksne tehnološke sisteme, što nameće potrebu za uspostavljanjem i unapređivanjem sistema upravljanja rizikom. Heterogenost i obim podataka neophodnih za upravljanje rizikom zahtevaju sistem koji ih na fleksibilan način integriše i omogućava njihovo optimalno korišćenje. Osnovni cilj ove disertacije je razvoj ontologije za domen rudarstva i na njoj zasnovanog modela za upravljanje rizikom. Njegova realizacija podrazumeva i implementaciju algoritama ekstrakcije informacija za popunjavanje ontologije, kao i odgovarajuće softversko rešenje. Razvoj modela obuhvata i značajno proširenje rudarskog korpusa, kao ...rudarstvo, rizik, upravljanje rizikom, procena rizika, ontologija, semantička mreža, ekstrakcija informacija, upravljanje znanjem, računarska lingvistika... solution. The development of the model includes a significant expansion of the mining corpus, as well as the creation of a terminological database, realized using methods of computational linguistics and a corpus of documents from the mining domain (plans, reports, laws, textbooks and monographs) ...
... prirodnog jezika (NLP): metoda konačnih automata (Gross 1987) i upitni jezik CQL (eng. Corpus Query Language) zasnovan na podudaranju obrazaca u sistemu za upravljanje velikim količinama tekstualnih podataka CQP (eng. Corpus Query Processor) (Evert 2005). Tehnikama obrade prirodnog jezika su ekstrahovani ...
... ][lemma="jesam"] 70 https://www.sketchengine.eu/ 71 https://www.sketchengine.eu/documentation/corpus-querying/ https://www.sketchengine.eu/ https://www.sketchengine.eu/documentation/corpus-querying/ 88 Gde se „[word!="\."]“ omogućava da se u tekstu mogu javiti neke reči ali ne i tačka. ...Olivera Kitanović. Ontološki model upravljanja rizikom u rudarstvu, Beograd : [O. Kitanović], 2021
-
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)... different types of dictionaries, the Group is engaged in developing other resources, such as the e-corpus of Serbian, as well as parallel multilingual corpora, with the majority of parallel texts aligned. The resources have been developed during several decades in the framework of various ...
... sentences were aligned by means of one of the available alignment methods. The tool used for the majority of alignments was XAlign, developed within LORIA, Labo- ratoire Lorrain de Recherche en Informatique et ses Appli- cations [4]. Parallel and aligned texts were grouped in aligned corpo- ra. ...
... possibility of adding hypernym literals. D. Aligned texts WS4LR contains a module for processing of parallel texts which have previously been aligned using the text align- ment tool XAlign. The module enables the transformation of texts aligned by XAlign into different formats: textual ...Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway : IEEE (2009)
-
Production of morphological dictionaries of multi-word units using a multipurpose tool
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multi-word units, noun phrases, query expansion... can run on any personal computer under Windows and supports simultaneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned 1LeXimir is available under CC NC BY licence. For more information see http://korpus.matf.bg.ac.rs/soft/LeXimir.html Fig. 3. LeXimir’s editor for ...
... dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it. The dictionary of forms is used in NLP tasks. Two corpus processing systems that support work with this dictionary format were developed, Unitex [2] and Nooj [3], both of which are based on the use of ...Ranka Stanković, Ivan Obradović, Cvetana Krstev, Duško Vitas. "Production of morphological dictionaries of multi-word units using a multipurpose tool" in Proceedings of the Computational Linguistics-Applications Conference, October 2011, Jachranka, Poland, Jachranka, Poland : PTI - Polish Information Processing Society (2011)
-
Увођење доменских и семантичких маркера за област рударства у српске електронске речнике
... information retrieval and extraction, and proposesanexpansion of the set of the semarkers for the field of mining. A brief description of the developed corpus of texts from the field of mining is also given, for the search of which the proposed markers are extremely important. ...Иван Обрадовић, Александра Томашевић, Ранка Станковић, Биљана Лазић. "Увођење доменских и семантичких маркера за област рударства у српске електронске речнике" in Научни састанак слависта у Вукове дане - Српски језик и његови ресурси: теорија, опис и примене, Београд : Међународни славистички центар на Филолошком факултету, Филолошки факултет (2017). https://doi.org/10.18485/msc.2017.46.3.ch10
-
A WordNet Ontology in Improving Searches of Digital Dialect Dictionary
In this paper, we present a method for automatic generation of a digital resource, which connects all indirect synonyms of a dialect term to all indirect synonyms of a corresponding term in the standard language, aiming to improve the search of a digital dialect dictionary. The method uses SWRL rules defined in the Serbian WordNet ontology to identify sets of synonymous words. It also uses e-dictionaries to produce correct lemmas in standard language that users usually employ in searches. ...... attach lemma for all inflected forms in the dialect dictionary that match a form in mor- phological e-dictionary. After separation of all synonyms aligned with a dialect lexeme (from the standard language or dialect), infinitive forms were attached to the original form. Among 4,152 filtered entries ...
... by a verb in the standard language representing a set of synonyms ob- tained from SWN ontology. Next example represents two joined sets of synonyms aligned by the verb upropastitt. upropastiti unerediti, unistiti, uprskati, zabrljati, zajebati, zakrmasiti, zasvinjiti | isabim batiSem dokrajisem istrovim ...
... 0.874, Fl = 2PR/(P + R) = 0.933, accuracy= 0.897. Table 1. The confusion matrix of the process deciding whether dictionary entries are correctly aligned with standard language entries. System Yes | System No Expert yes tp = 3022 fn = 436 Expert no fp=0 tn = 784 We can notice that the proposed method ...Miljana Mladenović, Ranka Stanković, Cvetana Krstev. "A WordNet Ontology in Improving Searches of Digital Dialect Dictionary" in New Trends in Databases and Information Systems: ADBIS 2017 Short Papers and Workshops - SW4CH (Semantic Web for Cultural Heritage) 767, Springer International Publishing (2017). https://doi.org/10.1007/978-3-319-67162-8_37
-
An Approach to Efficient Processing of Multi-Word Units
Efficient processing of Multi-Word Units in the course of development of morphological MWU dictionaries is not easy to achieve, especially when languages with complex morphological structures are concerned, such as Serbian. Manual development of this type of dictionaries is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition to that, we developed a procedure aimed at making ...... run on any personal computer under Windows and supports simul- taneous manipulation of various language resources: e-dictionaries, wordnets, and aligned texts. Implementation of LeXimir followed a modular approach. Namely, there exists a common core of the system, which is coupled with several modules ...
... morphological information to wordnet synsets, whereas both morphological dictionaries and the wordnet can be used in production of concordances for aligned texts. On the other hand, it enables the use of LeXimir Core in different scenarios: as a standalone Windows application LeXimir.exe or as a web ...
... dictionary of forms containing all necessary grammatical information (DELAF) can be generated from it, and subsequently used in various NLP tasks. Two corpus processing systems that support work with this dictionary format were developed, Unitex [13] and Nooj [20], both of which use finite-state technology ...Cvetana Krstev, Ivan Obradović, Ranka Stanković, Duško Vitas. "An Approach to Efficient Processing of Multi-Word Units" in Computational Linguistics - Applications, Studies in Computational Intelligence 458 no. 458, Berlin Heidelberg : Springer-Verlag (2013): 109-129. https://doi.org/10.1007/978-3-642-34399-5_6
-
Речник САНУ као база терминолошких речника (на примеру речника кулинарства)
... extraction of multiword lexical units that are allocated to the frequency of terms that are significantly more frequent in a culinary text than in the corpus of contemporary Serbian language. Using this approach, we were able to identify extremely rich term collection for culinary lexicon contained ...Рада Стијовић, Олга Сабо, Ранка Станковић. "Речник САНУ као база терминолошких речника (на примеру речника кулинарства)" in Словенска терминологија данас, Београд : Српска академија наука и уметности (2017)
-
Automatic construction of a morphological dictionary of multi-word units
The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task, due to the complexity of Serbian morphology. Manual production of such a dictionary proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task the procedure relies on data in e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation ...electronic dictionary, Serbian, morphology, inflection, multiwordn units, noun phrases, query expansion... in C# and operates on the .NET platform. It supports development, maintenance and exploitation of various resources: e- dictionaries, wordnets, and aligned texts. A user of this tool need not use all of these resources, or even possess them, but those that exist are visible in all modules and can be exploited ...
... The calculation is performed on the basis of dictionaries described in [10] and [11] that are part of the standard distribution of Unitex [12], a corpus processing system based on the finite-state technology. 4 Authors Suppressed Due to Excessive Length Table 1. Initial content of the Serbian m ...Cvetana Krstev, Ranka Stanković, Ivan Obradović, Duško Vitas, Miloš Utvić. "Automatic construction of a morphological dictionary of multi-word units" in Lecture Notes in Computer Science 6233, Advances in Natural Language Processing, Proceedings of the 7thInternational Conference on NLP, IceTAL 2010, Reykjavik, Iceland, August 2010, Springer (2010): 226-237. https://doi.org/10.1007/978-3-642-14770-8_26
-
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ...... Management of aligned parallel texts Parallel texts, which usually originate from a text in one language and its translation in another, are often aligned at a certain level (paragraph, sentence, etc) by matching the corresponding segments of the original and its translation. Aligned parallel texts ...
... WS4LR module for management of aligned parallel texts uses texts which have previously been aligned using Xalign as an alignment tool [3]. The module converts these texts to the Translation Memory eXchange (TMX) format, which is becoming the standard format for aligned texts. Figure 4 depicts the ...
... similar manner. Figure 8. Aligned texts with highlighted words Another, more complex option is to use aligned texts. If PWN is used for the source synset, then the language of one of the parallel texts must be English. Namely, WS4LR allows the user to search aligned texts using words from both ...Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
-
Named Entity Recognition for Distant Reading in ELTeC
Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković (2020)Akcija COST „Udaljeno čitanje za evropsku književnu istoriju“, koja je počela 2017. godine, ima među svojim glavnim ciljevima stvaranje višejezične zbirke evropskih književnih tekstova (ELTeC) otvorenog koda. U ovom radu predstavljamo rad koji je obavljen na ručnom označavanju selekcije ELTeC kolekcije za imenovane entitete, kao i na proceni postojećih alata za prepoznavanje imenovanih entiteta u pogledu njihove sposobnosti da automatski urade takve anotacije. U poslednjem paragrafu se razmatraju zajedničke tačke između ove inicijative i CLARIN-a.... IN se rv ic e s a n d to o ls , w ith s o m e id e a s fo r p o s s ib le c o lla b o ra tio n . 2 Developing the NE layer of the ELTeC corpus 2.1 Desiderata and annotation set N E R is a w e ll k n o w n ta s k in N L P , a n d th e re a re se v e ra l se ts o f g u id e l in ...
... 4 2 5 2 1 3 138 1 0 5 1 4 3 6 8 5 131 T a b le 1: D a ta o n th e m a n u a lly N E -a n n o ta te d c o rp u s . 2.2 Current state of the corpus T h e N E a n n o ta t io n o f th e c o rp u s is p a r t o f th e p la n fo r th e so c a lle d le v e l 2 a n n o ta tio n , w ...Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos, Ranka Stanković. "Named Entity Recognition for Distant Reading in ELTeC" in CLARIN Annual Conference 2020, Oct 2020, Virtual Event, France, CLARIN (2020)
-
Чији је пример? Анализа лексичких обележја на примерима Речника САНУ
У овом раду поставља се питање: да ли се може утврдити ко је аутор неког текста уколико се анализирају искључиво његова лексичка обележја? Како бисмо покушали да добијемо одговор на ово питање, посматрали смо примере у оквиру речничког чланка појединачне лексеме Речника САНУ, који су забележени у пет томова (и то: I, II, XVIII, XIX и XX). Сваки пример је преузет из неког извора на шта упућују скраћенице, наведене у заградама. Од преко 5.000 понуђених извора, определили смо се ...... 1959. и (допуњено) 2017. Бранислава Б. Шандрих, Ранка М. Станковић, Мирјана С. Гочанин316 Утвић 2014: Miloš Utvić, The construction of reference corpus of contemporary Serbian [Izgradnja referentnog korpusa savremenog srpskog jezika] (Doc- toral dissertation, University of Belgrade). Фекете 1993: ...Бранислава Б. Шандрих, Ранка М. Станковић, Мирјана С. Гочанин. "Чији је пример? Анализа лексичких обележја на примерима Речника САНУ" in Српски језик и његови ресурси, Међународни славистички центар, Филолошки факултет, Универзитет у Београду (2019). https://doi.org/10.18485/msc.2019.48.3.ch13