Using English Baits to Catch Serbian Multi-Word Terminology
Objects
- Type
- Paper in proceedings
- Paper version
- published version
- Language
- English
- Creator
- Cvetana Krstev, Branislava Šandrih, Ranka Stanković
- Source
- Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018
- Editor
- Nicoletta Calzolari et al.
- Publisher
- European Language Resources Association (ELRA)
- Publication date
- 2018
- Abstract
- In this paper we present the first results in bilingual terminology extraction. Our hypothesis is that if domain terminology exists for a source language, together with a domain corpus aligned between the source and a target language, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for the source language, a terminology extractor for the target language, and a tool for word and chunk alignment. In this first experiment the source language is English, the target language is Serbian, and the domain is Library and Information Science, for which a bilingual terminological dictionary exists. Our term extractor is based on e-dictionaries and shallow parsing, and for word alignment we use GIZA++. At the end of the procedure we included a supervised binary classifier that decides whether an extracted term is a valid domain term. The classifier was evaluated in a 5-fold cross-validation setting on a slightly unbalanced dataset, achieving an average F-score of 89%. In the experiment our system extracted 846 different Serbian domain phrases, including 515 Serbian phrases that were not present in the existing domain terminology.
- ISBN
- 979-10-95546-00-9
- Subject
- aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection
- Broader category of the paper
- M30
- Narrower category of the paper
- M33
- Rights
- Open access
- License
- Creative Commons – Attribution-NonCommercial-No Derivative Works 4.0 International
- Format
- Media
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
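
The abstract describes using English domain terms as "baits" over word-aligned text to find Serbian counterparts. The following is a minimal sketch of that projection step only, not the authors' implementation. It assumes the alignment links have already been converted to the common `i-j` (source index, target index) format; the function names, the example sentence pair, and its links are hypothetical illustrations. The target-language term extractor and the final binary classifier mentioned in the abstract are omitted.

```python
def parse_alignment(line):
    """Parse links such as '1-0 2-1' into (source_index, target_index) pairs."""
    return [tuple(int(i) for i in link.split("-")) for link in line.split()]


def find_spans(tokens, term_tokens):
    """Yield (start, end) ranges where term_tokens occur contiguously in tokens."""
    n = len(term_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == term_tokens:
            yield start, start + n


def project_terms(src_tokens, tgt_tokens, links, source_terms):
    """Return (source term, target candidate) pairs for terms found in the sentence.

    The candidate is the contiguous target span covering every target word
    linked to a word of the source term (an assumption of this sketch).
    """
    candidates = []
    lowered = [t.lower() for t in src_tokens]
    for term in source_terms:
        term_tokens = term.lower().split()
        for start, end in find_spans(lowered, term_tokens):
            tgt_idx = [t for s, t in links if start <= s < end]
            if not tgt_idx:
                continue  # this occurrence has no aligned target words
            lo, hi = min(tgt_idx), max(tgt_idx)
            candidates.append((term, " ".join(tgt_tokens[lo:hi + 1])))
    return candidates


# Hypothetical aligned sentence pair and links, for illustration only.
src = "the library catalogue is available online".split()
tgt = "bibliotečki katalog je dostupan onlajn".split()
links = parse_alignment("1-0 2-1 3-2 4-3 5-4")
pairs = project_terms(src, tgt, links, ["library catalogue"])
print(pairs)
```

In a fuller pipeline along the lines the abstract describes, candidates like these would presumably be checked against the output of the target-language term extractor and then passed to the classifier that decides whether each one is a valid domain term.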