Knowledge and Rule-Based Diacritic Restoration in Serbian
- Type
- Paper in conference proceedings
- Version
- Published version
- Language
- English
- Creator
- Cvetana Krstev, Ranka Stanković, Duško Vitas
- Source
- Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria
- Publisher
- Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences
- Date of issue
- 2018
- Abstract
- In this paper we present a procedure for restoring diacritics in Serbian texts written in the degraded Latin alphabet. The procedure relies on comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian (SrpKor) and local grammars. The dictionaries are used to identify possible candidates for restoration, while the data obtained from SrpKor and the local grammars assist in choosing between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while precision (average 98.93%) is always higher than recall (average 94.94%).
- ISSN
- 2367-5675
- Subject
- diacritic restoration, morphological dictionary, corpus, word n-grams, local grammars
- Broad category
- M30
- Narrow category
- M33
- Rights
- Open access
- License
- Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
- Format
- First page
- 41
- Last page
- 51
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia : The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
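The two-stage idea in the abstract (a dictionary proposes restoration candidates, corpus evidence decides between them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny lexicon stands in for the Serbian morphological e-dictionaries, the invented counts stand in for SrpKor word n-gram statistics, local grammars are not modelled at all, and the degradation mapping (č, ć → c; š → s; ž → z; đ → dj) and the lowercase-only input are assumptions.

```python
from collections import Counter

# ASSUMPTION: degraded Latin drops the diacritic from each letter and writes đ as "dj".
DEGRADE = str.maketrans({"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj"})


def degrade(word: str) -> str:
    """Return the degraded (diacritic-free) spelling of a lowercase word."""
    return word.translate(DEGRADE)


def build_index(lexicon: list[str]) -> dict[str, set[str]]:
    """Map each degraded spelling to every dictionary form that degrades to it."""
    index: dict[str, set[str]] = {}
    for form in lexicon:
        index.setdefault(degrade(form), set()).add(form)
    return index


def restore(tokens: list[str], index: dict[str, set[str]],
            bigrams: Counter, unigrams: Counter) -> list[str]:
    """Restore diacritics token by token (lowercase input assumed)."""
    restored: list[str] = []
    for tok in tokens:
        # Step 1: the dictionary proposes candidates; unknown words are kept as-is.
        candidates = index.get(tok, {tok})
        if len(candidates) == 1:
            restored.append(next(iter(candidates)))
            continue
        # Step 2: resolve ambiguity with corpus counts: first the bigram count
        # with the previously restored word, then the unigram count.
        # sorted() makes tie-breaking deterministic.
        prev = restored[-1] if restored else None
        best = max(sorted(candidates),
                   key=lambda c: (bigrams[(prev, c)], unigrams[c]))
        restored.append(best)
    return restored


# HYPOTHETICAL toy data, invented for illustration (not taken from SrpKor).
LEXICON = ["ovo", "je", "lepa", "kuća", "kuca", "što", "sto", "žena"]
UNIGRAMS = Counter({"kuća": 50, "kuca": 5, "što": 80, "sto": 20, "žena": 30})
BIGRAMS = Counter({("lepa", "kuća"): 3, ("on", "kuca"): 4})

if __name__ == "__main__":
    index = build_index(LEXICON)
    print(restore("ovo je lepa kuca".split(), index, BIGRAMS, UNIGRAMS))
    # prints ['ovo', 'je', 'lepa', 'kuća']
    print(restore("on kuca".split(), index, BIGRAMS, UNIGRAMS))
    # prints ['on', 'kuca']
```

In the second call, "kuca" ("knocks") is kept because the assumed bigram count with "on" outweighs the higher unigram frequency of "kuća" ("house"); unambiguous forms such as "zena" are restored to "žena" by the dictionary step alone.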