Knowledge and Rule-Based Diacritic Restoration in Serbian

Објеката

Тип
Рад у зборнику
Верзија рада
објављена верзија
Језик
енглески
Креатор
Cvetana Krstev, Ranka Stanković, Duško Vitas
Извор
Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria
Издавач
Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences
Датум издавања
2018
Сажетак
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the dataobtainedfromSrpKorandlocalgrammarsassistsinmakingadecisionbetween several candidates in cases of ambiguity. The evaluation results reveal that,dependingonthetext,accuracyrangesfrom95.03%to99.36%,whilethe precision (average 98.93%) is always higher than the recall (average 94.94%).
issn
2367-5675
Subject
diacritic restoration, morphological dictionary, corpus, word n-grams, local grammars
Шира категорија рада
M30
Ужа категорија рада
M33
Права
Отворен приступ
Лиценца
Creative Commons – Attribution-NonComercial-No Derivative Works 4.0 International
Формат
.pdf
Почетна страна
41
Завршна страна
51
Скупови објеката
Ранка Станковић
Radovi istraživača

Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51