Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names


Type
Paper in proceedings
Paper version
peer-reviewed
Language
English
Creator
Branislava Šandrih, Cvetana Krstev, Ranka Stanković
Source
Proceedings - Natural Language Processing in a Deep Learning World
Publisher
Incoma Ltd., Shoumen, Bulgaria
Date of publication
2019
Abstract
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were in turn used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on two sample texts: a part of the gold standard and an independent newspaper text of approximately the same size. The results show that the rule- and lexicon-based system outperforms the trained models in all four scenarios (measured by F1), while the Stanford models have the highest recall. The produced models are incorporated into a Web platform, NER&Beyond, that provides various NE-related functions.
Start page
1060
End page
1068
doi
10.26615/978-954-452-056-4_122
Subject
NER, Named Entity Recognition Systems, Serbian, Personal Names
Broader paper category
M30
Narrower paper category
M33
Rights
Open access
License
Creative Commons – Attribution-Share Alike 4.0 International
Format
.pdf
Item sets
Ranka Stanković
Researchers' papers
Media
RANLP122.pdf

Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122

This item was submitted on 25 November 2021 by [anonymous user] using the form “Paper in proceedings” on the site “Radovi”: http://romeka.rgf.rs/s/repo
