Using technology for knowledge transfer between academia and enterprises
Ivan Obradović, Ranka Stanković (2014)
... electronic language resources, namely, lexical resources, textual resources and grammars. Bilingual dictionaries in electronic form are one of the simplest multilingual lexical resources. However, for their full functionality in languages with complex morphology, such as Serbian, they need to be coupled ...
... with morphological dictionaries. Morphological dictionaries of Serbian simple words and compounds in the so-called LADL format (Krstev et al., 2010) are thus also part of the lexical resources used by LSS. Besides Serbian, such resources exist for many other languages, including English and Russian ...
... different languages (Krstev et al., 2004). Finally, among lexical resources within LSS are terminological resources such as GeolISS term and RudOnto (Stanković et al., 2012). GeolISS is a thesaurus of geological terms with entries in Serbian and English, developed at FMG within the GeolISS project ...
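Coupling a bilingual dictionary with a morphological dictionary can be sketched in a few lines: an inflected Serbian form is first reduced to its lemma(s) through the morphological dictionary and only then looked up in the bilingual dictionary. The toy entries, the `lookup_translations` function and the data layout are hypothetical; real LADL-format dictionaries are far larger and carry full grammatical codes.

```python
"""Sketch: translate inflected Serbian word forms by chaining a
morphological dictionary (form -> lemmas) with a bilingual
dictionary (lemma -> English equivalents). All entries are toy data."""

# Hypothetical toy morphological dictionary: inflected form -> set of lemmas.
MORPH = {
    "rudnik": {"rudnik"},   # "mine", nominative singular
    "rudnika": {"rudnik"},  # genitive singular / genitive plural
    "stene": {"stena"},     # "rock", several case forms
    "stenama": {"stena"},   # dative / instrumental / locative plural
}

# Hypothetical toy bilingual dictionary: Serbian lemma -> English equivalents.
BILINGUAL = {
    "rudnik": ["mine"],
    "stena": ["rock"],
}


def lookup_translations(form: str) -> list[str]:
    """Return English equivalents for an inflected Serbian form.

    Looking up "rudnika" directly in BILINGUAL would fail; going through
    the lemma makes every listed inflected form translatable."""
    translations: list[str] = []
    for lemma in sorted(MORPH.get(form.lower(), set())):
        for equivalent in BILINGUAL.get(lemma, []):
            if equivalent not in translations:
                translations.append(equivalent)
    return translations


if __name__ == "__main__":
    print(lookup_translations("rudnika"))  # ['mine']
    print(lookup_translations("stenama"))  # ['rock']
```

Unknown forms simply yield an empty list; a real system would also return the grammatical information of the matched form.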
Indexing of textual databases based on lexical resources: A case study for Serbian
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using bag-of-words and Named Entity Recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia over the last half century. Each document within this database is described by metadata consisting of several fields, such as title, domain, keywords, abstract and geographical location. A bag of words was produced from these ... ...ion on morphological electronic dictionaries and finite-state transducers for Serbian [6]. 4.1 Used Resources. Lexical Resources. The resources for natural language processing of Serbian, consisting of lexical resources and local grammars, are being developed using the finite-state methodology as described ...
... based on lexical resources: A case study for Serbian. Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović
... aroslav Černi" 'Institute for waterpower engineering Jaroslav Černi'. The Serbian NER system is a handcrafted rule-based system that relies on the comprehensive lexical resources for Serbian described in the previous subsection. For recognition of some types of named entities, e.g. ...
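Pre-indexing documents with a bag of words built from their metadata fields can be sketched as an inverted index. As a simplifying assumption, tokens are normalized only by lowercasing; the approach described above relies on morphological dictionaries to reduce inflected forms, which this toy code does not attempt. The field names, functions and sample records are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical subset of the metadata fields used to build the bag of words.
FIELDS = ("title", "keywords", "abstract")


def bag_of_words(record: dict) -> set[str]:
    """Collect lowercase word tokens (Unicode-aware \\w) from selected fields."""
    words: set[str] = set()
    for field in FIELDS:
        words.update(re.findall(r"\w+", record.get(field, "").lower()))
    return words


def build_index(records: dict[str, dict]) -> dict[str, set[str]]:
    """Map each word to the set of document ids whose bag contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, record in records.items():
        for word in bag_of_words(record):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query word (Boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    records = {
        "p1": {"title": "Geološka karta Srbije", "keywords": "karta, ugalj"},
        "p2": {"title": "Istraživanje uglja", "abstract": "Ugalj i lignit"},
    }
    idx = build_index(records)
    print(sorted(search(idx, "ugalj")))  # ['p1', 'p2']
    print(sorted(search(idx, "uglja")))  # ['p2']
```

The last query shows why lowercasing alone is not enough for Serbian: the genitive "uglja" misses document p1, which only contains "ugalj".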
Knowledge and Rule-Based Diacritic Restoration in Serbian
In this paper we present a procedure for the restoration of diacritics in Serbian texts written using the degraded Latin alphabet. The procedure relies on the comprehensive lexical resources for Serbian: the morphological electronic dictionaries, the Corpus of Contemporary Serbian (SrpKor) and local grammars. Dictionaries are used to identify possible candidates for the restoration, while the data obtained from SrpKor and local grammars assist in making a decision between several candidates in cases of ambiguity. The evaluation results reveal that, depending on the text, accuracy ranges from 95.03% to 99.36%, while the precision (average 98.93%) is always higher than the recall (average 94.94%). ... ... models of linguistic ontologies for natural language processing on the scale from more lexical to more conceptual resources. In this paper, we consider the approach to developing Russian ontological resources having the format of the RuThes thesaurus (Loukachevitch and Dobrov, 2014) and created for ...
... ontologies, pages 1–17. Springer. Guarino, N. (1998). Some ontological principles for designing upper level lexical resources. In Proceedings of First International Conference on Language Resources and Evaluation LREC-1998, pages 28–30. Guizzardi, G. (2011). Ontological foundations for conceptual part-whole ...
... Development of linguistic ontology on natural sciences and technology. In Proceedings of Linguistic Resources and Evaluation Conference, pages 1077–1082. Fellbaum, C., Ed. (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press. Gelfenbeyn, I., Goncharuk, A., Lehelt, V., Lipatov, A., and ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "Knowledge and Rule-Based Diacritic Restoration in Serbian" in Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), May 27-29, 2018, Sofia, Bulgaria, Sofia: The Institute for Bulgarian Language Prof. Lyubomir Andreychin, Bulgarian Academy of Sciences (2018): 41-51
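The two steps of the diacritic restoration procedure, dictionary-based candidate generation followed by disambiguation, can be sketched as follows. Every dictionary form is indexed by its diacritic-free key; a degraded word is mapped to the matching forms, and when several candidates exist the most frequent one is chosen. Choosing by frequency alone is a simplifying assumption standing in for the SrpKor statistics and local grammars used in the paper; the word list and counts are invented.

```python
from collections import defaultdict

# Degraded Latin writing drops diacritics: č/ć -> c, š -> s, ž -> z, đ -> dj.
STRIP = str.maketrans({"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "dj"})

# Hypothetical corpus frequencies for a handful of dictionary forms.
FREQ = {
    "kuća": 900, "kuca": 40,   # "house" vs. "(he/she) knocks"
    "kosa": 500, "koša": 30,   # "hair" vs. genitive of "koš" (basket)
    "šta": 700,
    "žena": 800,
    "đak": 120,
}


def build_candidates(freq: dict[str, int]) -> dict[str, list[str]]:
    """Group dictionary forms by their diacritic-free key, most frequent first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for form in freq:
        groups[form.translate(STRIP)].append(form)
    for forms in groups.values():
        forms.sort(key=lambda f: -freq[f])
    return dict(groups)


CANDIDATES = build_candidates(FREQ)


def restore_word(word: str) -> str:
    """Return the most frequent candidate for a degraded word, or the word itself."""
    forms = CANDIDATES.get(word.lower())
    if not forms:
        return word  # not in the dictionary: leave untouched
    best = forms[0]
    return best.capitalize() if word[:1].isupper() else best


def restore(text: str) -> str:
    """Restore diacritics word by word (whitespace tokenization, no punctuation handling)."""
    return " ".join(restore_word(w) for w in text.split())


if __name__ == "__main__":
    print(restore("Kuca zena djak"))  # Kuća žena đak
```

A frequency-only choice always prefers "kuća" over "kuca", which is exactly the kind of error that context-sensitive local grammars are meant to prevent.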
Using Lexical Resources for Irony and Sarcasm Classification
The paper presents a language-dependent model for classification of statements into ironic and non-ironic. The model uses various language resources: morphological dictionaries, a sentiment lexicon, a lexicon of markers and a WordNet-based ontology. This approach uses various features: antonymous pairs obtained using the reasoning rules over the Serbian WordNet ontology (R), antonymous pairs in which one member has positive sentiment polarity (PPR), polarity of positive sentiment words (PSP), ordered sequence of sentiment tags (OSA), Part-of-Speech tags of words (POS) ... ... speaker of Serbian, we did not use the lang:sr operator in the query itself because the corpus of tweets would have been too small. However, the geolocation restriction allowed us to also find tweets mostly written in the BCMS languages. We developed a language tweet classifier that relies on lexical resources. Although ...
... degree of true positives (tp) with double ... (Using Lexical Resources for Irony and Sarcasm Classification, BCI’17, September 20–23, 2017, Skopje.) Footnote 6: See, for example, Alexander, R., Elias-Bursać, E.: Bosnian, Croatian, Serbian, a Textbook With Exercises and Basic Grammar. University of Wisconsin Press (2010).
... Semantic networks; Natural language processing; Lexical semantics. KEYWORDS: Computational irony, Verbal irony, Verbal Sarcasm, WordNet
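A few of the listed features (R, PPR, PSP and OSA) can be sketched as counts over a lemmatized statement. The sentiment lexicon, the antonym pairs and the reading of PSP as a simple count of positive words are assumptions; the paper derives antonymous pairs by reasoning over the Serbian WordNet ontology, which this sketch replaces with a fixed set.

```python
# Hypothetical toy resources; the paper uses a sentiment lexicon and Serbian WordNet.
SENTIMENT = {"divan": 1, "sjajan": 1, "odličan": 1, "užasan": -1, "loš": -1}
ANTONYMS = {frozenset({"divan", "užasan"}), frozenset({"dobar", "loš"})}


def features(lemmas: list[str]) -> dict[str, object]:
    """Compute simple irony-related features for one lemmatized statement."""
    lemmas = [w.lower() for w in lemmas]
    present = set(lemmas)
    # R: antonymous pairs with both members present in the statement
    pairs = [p for p in ANTONYMS if p <= present]
    # PPR: antonymous pairs in which one member has positive polarity
    ppr = [p for p in pairs if any(SENTIMENT.get(w, 0) > 0 for w in p)]
    return {
        "R": len(pairs),
        "PPR": len(ppr),
        # PSP: here simply the number of positive-polarity words
        "PSP": sum(1 for w in lemmas if SENTIMENT.get(w, 0) > 0),
        # OSA: ordered sequence of sentiment tags (+ / -) as they occur
        "OSA": "".join("+" if SENTIMENT[w] > 0 else "-" for w in lemmas if w in SENTIMENT),
    }


if __name__ == "__main__":
    # "wonderful day, terrible traffic", already lemmatized
    print(features(["divan", "dan", "užasan", "saobraćaj"]))
```

Feature dictionaries of this kind would then be vectorized and passed to an ordinary classifier.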
Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources
Large collections of textual documents represent an example of big data that requires the solution of three basic problems: the representation of documents, the representation of information needs, and the matching of the two representations. This paper outlines document indexing as a possible solution to document representation. Documents within a large textual database of geological projects carried out in the Republic of Serbia over many years were indexed using methods developed within digital humanities: bag-of-words and named ... ... based on morphological electronic dictionaries and finite-state transducers for Serbian [12]. 3.1 Used Resources. Lexical Resources. The resources for natural language processing of Serbian, consisting of lexical resources and local grammars, are being developed using the finite-state methodology as described ...
... they rely heavily on lexical resources, which is especially important in the case of languages with rich morphology, such as Serbian, and South-Slavic languages in general. Although Serbian belongs to the group of less-resourced languages, in which comprehensive lexical resources and language technology ...
... important lexical resources for Serbian were developed (corpora and e-dictionaries), as well as applications for basic language processing (tokenization, Part-Of-Speech (POS) tagging, morphological analysis), information retrieval and extraction [26]. Several successful applications of Serbian language ...
Ranka Stanković, Cvetana Krstev, Ivan Obradović, Olivera Kitanović. "Improving Document Retrieval in Large Domain Specific Textual Databases Using Lexical Resources" in Trans. Computational Collective Intelligence - Lecture Notes in Computer Science 26, Springer (2017). https://doi.org/10.1007/978-3-319-59268-8_8
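Named entity recognition that relies on lexical resources can be illustrated, in a much reduced form, by longest-match lookup of token sequences in a gazetteer. The gazetteer entries, entity-type labels and `tag_entities` function are hypothetical; the rule-based system described here also models inflection and context with finite-state grammars, which this sketch ignores.

```python
# Hypothetical gazetteer: lowercase token tuple -> entity type.
GAZETTEER = {
    ("beograd",): "TOP",
    ("srbija",): "TOP",
    ("jaroslav", "černi"): "PERS",
    ("institut", "za", "vodoprivredu", "jaroslav", "černi"): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tag_entities(tokens: list[str]) -> list[tuple[int, int, str]]:
    """Return (start, end, type) spans, preferring the longest gazetteer match."""
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest possible window first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            etype = GAZETTEER.get(tuple(lowered[i:i + n]))
            if etype:
                spans.append((i, i + n, etype))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return spans


if __name__ == "__main__":
    tokens = "Institut za vodoprivredu Jaroslav Černi , Beograd".split()
    print(tag_entities(tokens))  # [(0, 5, 'ORG'), (6, 7, 'TOP')]
```

Longest match lets the organization name win over the person name embedded in it; an inflected form such as "Beograda" would be missed, which is why real systems consult morphological dictionaries.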
Using English Baits to Catch Serbian Multi-Word Terminology
In this paper we present the first results in bilingual terminology extraction. The hypothesis of our approach is that if domain terminology exists for a source language, together with a domain-aligned corpus for the source and a target language, then it is possible to extract the terminology for the target language. Our approach relies on several resources and tools: aligned domain texts, domain terminology for a source language, a terminology extractor for a target language, and a ... Keywords: aligned texts, word alignment, terminology extraction, electronic dictionaries, morphological inflection ... ... different ways of utilizing existing lexical resources to improve the quality of statistical machine alignment. In order to do that we have augmented the set of aligned sentences with inflected forms (English/Serbian). We have used two bilingual lexical resources. (a) Serbian Wordnet (SWN) (Cvetana Krstev ...
... technological domains than Serbian (in the past from French and German). In (Ananiadou et al., 2012), lexical resources for English obtained grades 4.5–6 for all seven criteria, with availability rated as excellent (the highest grade, 6). In contrast, a similar survey for Serbian (Vitas et al., 2012) showed ...
... showed that lexical resources are much less developed: they were rated 1–2.5. 2. Terminology consists mainly of Multi-Word Terms (MWT) (data presented in Subsection 4.2 corroborate this claim). 3. A large portion of MWTs in Serbian has a limited number of syntactic structures. Namely, 98% ...
Cvetana Krstev, Branislava Šandrih, Ranka Stanković. "Using English Baits to Catch Serbian Multi-Word Terminology" in Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, European Language Resources Association (ELRA) (2018)
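The central hypothesis, that a known source-language term can be paired with the target-language candidate that consistently co-occurs with it in aligned sentences, can be sketched with a Dice coefficient over sentence-level co-occurrence. This scoring is a stand-in chosen for illustration, not the method of the paper (which uses word alignment and a Serbian terminology extractor); the sentence pairs and candidate terms are invented, and target candidates are assumed to be already extracted and lemmatized.

```python
def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of sentence ids."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def pair_terms(aligned, source_terms, target_candidates, threshold=0.5):
    """aligned: list of (source_sentence, target_candidate_terms) pairs.

    Returns {source_term: (best_target_term, score)} for scores >= threshold.
    Source terms are found by naive substring search (an assumption)."""
    src_occ = {t: set() for t in source_terms}
    tgt_occ: dict[str, set[int]] = {}
    for i, (src_sentence, tgt_terms) in enumerate(aligned):
        for term in source_terms:
            if term in src_sentence.lower():
                src_occ[term].add(i)
        for cand in tgt_terms:
            if cand in target_candidates:
                tgt_occ.setdefault(cand, set()).add(i)
    result = {}
    for term, s_ids in src_occ.items():
        scored = [(dice(s_ids, c_ids), cand) for cand, c_ids in tgt_occ.items()]
        if scored:
            score, best = max(scored)  # ties resolved by candidate string order
            if score >= threshold:
                result[term] = (best, score)
    return result


if __name__ == "__main__":
    aligned = [
        ("The open pit mine was expanded.", ["površinski kop"]),
        ("Open pit mining uses heavy machinery.", ["površinski kop", "teška mehanizacija"]),
        ("Heavy machinery was repaired.", ["teška mehanizacija"]),
    ]
    print(pair_terms(aligned, ["open pit", "heavy machinery"],
                     {"površinski kop", "teška mehanizacija"}))
```

Sentence-level co-occurrence is only reliable with many aligned sentences; word or chunk alignment narrows the search to the part of the target sentence that actually corresponds to the term.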
Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities
This paper presents the activities on the development of the ELEXIS-sr corpus, the Serbian addition to the ELEXIS multilingual annotated corpus, which consists of semantic annotations and a repository of word senses. ELEXIS is a parallel multilingual annotated corpus in ten European languages that can be used as a multilingual benchmark for the evaluation of European languages with less and moderately developed resources. The focus of this paper is on multiword expressions and named entities, their recognition in the ELEXIS-sr sentence set, and their comparison with the annotations in other languages. The first steps ... are discussed ...
Cvetana Krstev, Ranka Stanković, Aleksandra Marković, Teodora Mihajlov. "Towards the semantic annotation of SR-ELEXIS corpus: Insights into Multiword Expressions and Named Entities" in Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Turin, May 25, 2024, ELRA and ICCL (2024)
SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković (2019)
In this paper we present a model for the selection of good examples for a dictionary of Serbian and the development of the initial components of the model. The method used is based on a detailed analysis of various lexical and syntactic features in a corpus compiled from examples from five digitized volumes of the SASA dictionary. The initial set of features was inspired by a similar approach for other languages. The distribution of features of the examples from this corpus is compared with the distribution of features of sentence samples excerpted from corpora containing various texts. The analysis showed that ... Keywords: Serbian, good dictionary examples, automation of dictionary compilation, feature extraction, machine learning ... ... selection of good dictionary examples for Serbian and the development of initial model components. The method used is based on a thorough analysis of various lexical and syntactic features in a corpus compiled of examples from the five digitized volumes of the Serbian Academy of Sciences and Arts (SASA) ...
... n could serve various goals: speeding up the dictionary-making process, but also the development of a lexical database as the source for building new dictionaries of Serbian. (Proceedings of eLex 2019, p. 248.) In the e-lexicography era, with the imperatives of faster dictionary-making ...
... DOI:10.1007/978-3-319-53640-8_10. Stanković, R., Stijović, R., Vitas, D., Krstev, C. & Sabo, O. (2018). The Dictionary of the Serbian Academy: from the Text to the Lexical Database. In J. Čibej, V. Gorjanc, I. Kosem & S. Krek (eds.) Proceedings of the XVIII EURALEX International Congress: Lexicography ...
Ranka Stanković, Branislava Šandrih, Rada Stijović, Cvetana Krstev, Duško Vitas, Aleksandra Marković. "SASA Dictionary as the Gold Standard for Good Dictionary Examples for Serbian" in Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference, Lexical Computing CZ, s.r.o. (2019)
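Comparing feature distributions between dictionary examples and corpus sentences can be sketched with a few shallow features averaged over each set. The three features (token count, mean word length, share of capitalized words after the first) and the comparison by mean value are illustrative assumptions; the feature set analysed in the paper is richer and includes syntactic features. The sample sentences are invented.

```python
from statistics import mean

PUNCT = ".,;:!?\"'()"


def sentence_features(sentence: str) -> dict[str, float]:
    """A few shallow features of one sentence (an illustrative choice)."""
    words = [t.strip(PUNCT) for t in sentence.split()]
    words = [w for w in words if w]
    n = len(words)
    return {
        "tokens": float(n),
        "mean_word_len": float(mean(len(w) for w in words)) if words else 0.0,
        # share of capitalized words after the first one (rough proxy for names)
        "caps_ratio": sum(w[0].isupper() for w in words[1:]) / (n - 1) if n > 1 else 0.0,
    }


def mean_profile(sentences: list[str]) -> dict[str, float]:
    """Average each feature over a non-empty list of sentences."""
    feats = [sentence_features(s) for s in sentences]
    return {k: float(mean(f[k] for f in feats)) for k in feats[0]}


def compare(gold: list[str], corpus: list[str]) -> dict[str, float]:
    """Difference of mean feature values: gold examples minus corpus sentences."""
    g, c = mean_profile(gold), mean_profile(corpus)
    return {k: g[k] - c[k] for k in g}


if __name__ == "__main__":
    gold = ["Kuća je na brdu.", "Dete čita knjigu."]
    corpus = ["Ivan i Ranka su u Beogradu."]
    print(compare(gold, corpus))
```

Features whose averages differ strongly between the two sets are natural candidates for a classifier that separates good from poor examples.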
Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis
This paper presents a model that enables the collection, preparation, metadata description, management and exploitation, including full-text search, of documents from the domain of criminalistics written in Serbian. The proposed approach is applied to a web portal that collects various texts originating from the journal of the Academy of Criminalistic and Police Studies, the Criminal Code of Serbia, the “Tara” and “Reiss” conferences, as well as some doctoral dissertations related to this research field. After text processing, a corpus containing over 5,500 pages of plain text was created and ... ... LINGUISTIC RESOURCES. The linguistic and lexical resources used for query expansion and text analysis are depicted on the left of Figure 3, while the main application components of the language support system are shown on the right. The main lexical resources include morphological dictionaries for the Serbian language [15] ...
... support is implemented via the REST web service Vebran, which interacts on one side with lexical and linguistic resources and on the other with the Omeka KPA digital library. [15] Cvetana Krstev. Processing of Serbian – Automata, Text and Electronic Dictionaries, Faculty of Philology, Belgrade, 2008 ...
... Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 28-30 May 2008, European Language Resources Association (ELRA), 2008 ...
Dalibor Vorkapić, Aleksandra Tomašević, Miljana Mladenović, Ranka Stanković, Nikola Vulović. "Digital Library From A Domain Of Criminalistics As A Foundation For A Forensic Text Analysis" in International Scientific Conference “Archibald Reiss Days” Thematic Conference Proceedings Of International Significance, Belgrade, 7-9 November 2017, Academy Of Criminalistic And Police Studies Belgrade (2017)
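Query expansion with a morphological dictionary can be sketched as: map each query word to its lemma(s), then replace it with the set of all inflected forms of those lemmas, so that a full-text search matches any case form. The entries imitate the Unitex DELAF line layout `form,lemma.CODES`, but the grammatical codes shown are placeholders rather than the actual Serbian tagset, escaping rules are ignored, and the tiny dictionary is invented.

```python
from collections import defaultdict

# Hypothetical DELAF-style lines: "inflected form,lemma.CODES" (codes are placeholders).
DELAF_LINES = """\
krađa,krađa.N:s1
krađe,krađa.N:s2
krađu,krađa.N:s4
krađom,krađa.N:s6
dokaz,dokaz.N:s1
dokaza,dokaz.N:s2
dokazi,dokaz.N:p1
""".splitlines()


def load(lines):
    """Build form -> lemmas and lemma -> forms maps (no escape handling)."""
    form_to_lemmas = defaultdict(set)
    lemma_to_forms = defaultdict(set)
    for line in lines:
        form, rest = line.split(",", 1)
        lemma = rest.split(".", 1)[0]
        form_to_lemmas[form].add(lemma)
        lemma_to_forms[lemma].add(form)
    return form_to_lemmas, lemma_to_forms


FORM2LEMMA, LEMMA2FORMS = load(DELAF_LINES)


def expand_query(query: str) -> list[set[str]]:
    """One OR-group of word forms per query word; unknown words stay as they are."""
    groups = []
    for word in query.lower().split():
        forms = set()
        for lemma in FORM2LEMMA.get(word, ()):
            forms |= LEMMA2FORMS[lemma]
        groups.append(forms or {word})
    return groups


if __name__ == "__main__":
    for group in expand_query("krađe dokaz"):
        print(sorted(group))
```

Each group would then be turned into an OR clause of the search engine's query language, and the groups combined with AND.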
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass extracted data to evaluators and subsequently to terminological e-dictionaries and databases. The approach is illustrated on a corpus of Serbian texts from ... ... management and exploitation of lexical resources, distributed via owner. Krstev C., Vitas D. (2015). SrpRec, Serbian morphological electronic dictionary, http://www-igm.univ-mlv.fr/~unitex/index.php?page=5, CC BY-NC-ND. Lazić B., Stanković R. (2015), MineCorp, Serbian corpus from the mining domain ...
... Linguistics, 16, pp. 22–29. Church, K. W., Gale, W., Hanks, P., Hindle, D. (1991). Using statistics in lexical analysis, In U. Zernik (Ed.), Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 115–164. Kilgarriff, A., Baisa, V ...
... aleksandra@unilib.bg.ac.rs ...
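Extraction of multi-word term candidates by syntactic structure can be approximated by matching part-of-speech patterns over a tagged sentence, for example adjective+noun (AN), adjective+adjective+noun (AAN) and noun+noun in the genitive (NG). This regular-expression sketch only imitates what the finite-state transducers do; the one-letter tags, the patterns and the example are assumptions. Lemmatization follows the usual rule for such Serbian terms: the adjectives and the head noun go to the nominative singular, while a genitive complement keeps its form; here the nominative forms are supplied with the tokens instead of being generated from a dictionary.

```python
import re

# Hypothetical one-letter tags: A = adjective, N = noun, G = noun in genitive, X = other.
PATTERN = re.compile(r"A{1,2}NG?|NG")


def extract_terms(tokens: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """tokens: (form, tag, nominative_singular) triples.

    Returns (surface form, lemmatized form) pairs for every pattern match."""
    tags = "".join(tag for _, tag, _ in tokens)
    results = []
    for m in PATTERN.finditer(tags):
        span = tokens[m.start():m.end()]
        surface = " ".join(form for form, _, _ in span)
        # Genitive complements stay inflected; everything else is normalized.
        lemma = " ".join(form if tag == "G" else nom for form, tag, nom in span)
        results.append((surface, lemma))
    return results


if __name__ == "__main__":
    sentence = [
        ("U", "X", "u"),
        ("površinskom", "A", "površinski"),
        ("kopu", "N", "kop"),
        ("počinje", "X", "počinje"),
        ("eksploatacija", "N", "eksploatacija"),
        ("uglja", "G", "uglja"),
    ]
    for surface, lemma in extract_terms(sentence):
        print(f"{surface} -> {lemma}")
```

In a real pipeline the adjective's nominative form must also agree in gender with the head noun, which is why dictionary-driven transducers are used instead of a fixed lookup.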
An Italian-Serbian Sentence Aligned Parallel Literary Corpus
This article presents the construction and relevance of an Italian-Serbian sentence-aligned parallel corpus, delving into the aligned sentences in order to facilitate effective translation between the two languages. The parallel corpus serves as a valuable resource for language experts, researchers, and language enthusiasts, fostering a deeper understanding of linguistic nuances and cultural expressions. By bridging the gap between Serbian and Italian, this corpus opens new avenues for cross-cultural communication and collaboration, and ultimately contributes to the improvement of language-related ...
Saša Moderc, Ranka Stanković, Aleksandra Tomašević, Mihailo Škorić. "An Italian-Serbian Sentence Aligned Parallel Literary Corpus" in Review of the National Center for Digitization, Belgrade: Faculty of Mathematics, University of Belgrade (2023). https://doi.org/10.5281/zenodo.11203388
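Sentence alignment of a parallel corpus is often bootstrapped with length-based dynamic programming in the spirit of Gale and Church. The sketch below is a simplified illustration of that idea, not the aligner used for this corpus (which the text does not name): it allows only 1-1, 1-0, 0-1, 2-1 and 1-2 links, scores linked groups by their relative difference in character length, and adds fixed penalties for non-1-1 links. The cost function and penalty values are assumptions.

```python
def length_cost(src_len: int, tgt_len: int) -> float:
    """Relative length difference; 0 when both sides are equally long."""
    if src_len + tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / (src_len + tgt_len)


# (source sentences consumed, target sentences consumed, fixed penalty)
MOVES = [(1, 1, 0.0), (1, 0, 1.0), (0, 1, 1.0), (2, 1, 0.3), (1, 2, 0.3)]


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Return the minimum-cost sequence of links as (source ids, target ids)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                step = penalty + (length_cost(s_len, t_len) if di and dj else 0.0)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the links.
    links = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        links.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    links.reverse()
    return links


if __name__ == "__main__":
    print(align(["Pada kiša.", "Hladno je."], ["Piove e fa freddo."]))
```

In the example the two short Serbian sentences are linked to the single Italian one (a 2-1 link), because that is cheaper than leaving one sentence unaligned.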
Wordnet Development Using a Multifunctional Tool
Ivan Obradović, Ranka Stanković (2007)
In this paper we present a multifunctional tool for manipulating heterogeneous language resources. The tool handles electronic dictionaries, wordnets and aligned texts, and provides for their synchronous use in various tasks. We focus here on the description of the possibilities this tool offers in the development of wordnets. Besides the wordnet module which enables parallel handling of two wordnets, other modules, such as the module for morphological dictionaries and the module for aligned texts, as well as available finite ... ... Human Language Technology group at the University of Belgrade has been developing various lexical resources over quite a long period, reaching a considerable volume to date. Given the fact that these resources have been developed for many years, they have naturally been conceived within different ...
... that integrates diverse language resources and is thus more powerful than the majority of other wordnet tools. The desktop version of WS4LR is fully operational and is already being used as the main tool for developing resources in Serbian, including the Serbian wordnet, but its commercial a ...
... adds to the flexibility of resource exploitation. Conversion from one character encoding set to another is extremely important for languages such as Serbian, where two alphabets, Cyrillic and Latin, are used equally. WS4LR enables the exploitation of language resources both in Cyrillic and Latin ...
Ivan Obradović, Ranka Stanković. "Wordnet Development Using a Multifunctional Tool" in Proceedings of the International Workshop Computer Aided Language Processing (CALP) '2007, Borovets, Bulgaria, September 2007, - (2007)
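Conversion from Serbian Cyrillic to Serbian Latin is a letter-by-letter table lookup, since each of the 30 Cyrillic letters has exactly one Latin counterpart (љ, њ and џ map to the digraphs lj, nj and dž). This is a minimal sketch, not WS4LR's implementation; as a simplifying assumption, uppercase digraphs are written in title case (Љ → Lj), which is right before lowercase letters but not in all-caps text.

```python
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}
# Uppercase letters map to the capitalized Latin equivalent (Ж -> Ž, Љ -> Lj).
CYR_TO_LAT.update({c.upper(): lat.capitalize() for c, lat in list(CYR_TO_LAT.items())})

TABLE = str.maketrans(CYR_TO_LAT)


def to_latin(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters are kept as-is."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(to_latin("Ранка Станковић"))  # Ranka Stanković
    print(to_latin("Љубљана"))          # Ljubljana
```

The opposite direction is not a simple table lookup: Latin "dž" in a word such as "nadživeti" corresponds to Cyrillic д+ж (надживети), not to џ, so Latin-to-Cyrillic conversion needs lexical knowledge for such cases.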
Two approaches to compilation of bilingual multi-word terminology lists from lexical resources
In this paper, we present two approaches and the implemented system for bilingual terminology extraction that rely on an aligned bilingual domain corpus, a terminology extractor for a target language, and a tool for chunk alignment. The two approaches differ in the way terminology for the source language is obtained: the first relies on an existing domain terminology lexicon, while the second one uses a term extraction tool. For both approaches, four experiments were performed with two parameters being ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Two approaches to compilation of bilingual multi-word terminology lists from lexical resources" in Natural Language Engineering, Cambridge University Press (CUP) (2020). https://doi.org/10.1017/S1351324919000615
Building learning capacity by blending different sources of knowledge
... electronic language resources, namely, lexical resources, textual resources and grammars. The simplest multilingual lexical resources in general are bilingual dictionaries in electronic form. However, for their full functionality in languages with complex morphology, such as Serbian, they need to be ...
... dictionaries. Morphological dictionaries of Serbian simple words and compounds in the so-called LADL format (Krstev et al., 2010) are thus a necessary part of the lexical resources used by the BMP language support system. Besides Serbian, such resources exist for many other languages, including ...
... Using Textual and Lexical Resources in Developing Serbian Wordnet. Romanian Journal of Information Science and Technology, 7(1-2), 147-161. Koutsomitropoulos, D.A., Alexopoulos, A.D., Solomou, G.D. and Papatheodorou, T.S. (2010). The Use of Metadata for Educational Resources. In Digital Repositories: ...
Ivan Obradović, Ranka Stanković, Olivera Kitanović, Dalibor Vorkapić. "Building learning capacity by blending different sources of knowledge" in International Journal of Learning and Intellectual Capital (2016). https://doi.org/10.1504/IJLIC.2016.075698
Possibilities of retro-digitalized German-Serbian Mining Dictionary
This paper describes the process of retro-digitization of the bilingual German-Serbian Mining Dictionary from 1923, written by the mining engineer Dragutin Stepanović (Степановић, 1923). The dictionary is based on almost 4,000 lexical entries, which are either translation equivalents or cross-references. Instead of a preface, the author provides his letter addressed to the “Minister of Forests and Mines”, in which he writes about his intention to record the words used among the common people in order to avoid the use of German words. Although the number of entries is not very large, the dictionary ...
Biljana Lazić, Olivera Kitanović, Ivan Obradović. "Possibilities of retro-digitalized German-Serbian Mining Dictionary" in E-dictionaries and E-lexicography, Zagreb, 10-11 May 2019, Zagreb: Institut za hrvatski jezik i jezikoslovlje (2019)
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry and many research fields, terminology is developing rapidly. Most often, the language that serves as the “lingua franca” of most of these fields is English. As a consequence, in many fields domain terms are coined in English and only later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a chunk alignment tool. We examine the performance of the method on the domain ... ... and inflected English-Serbian single and multi-unit word forms (denoted as bi-list). We used two bilingual lexical resources that we processed with LeXimir: (a) Serbian Wordnet (SWN), which is aligned to the Princeton WordNet (Princeton WordNet, 2010), and (b) an English-Serbian list containing general ...
... or obtained from the text. The only system developed specifically for the extraction of MWTs from Serbian texts is a part of LeXimir (Stanković et al., 2016), a tool for management of lexical resources. LeXimir consists of two modules for the terminology extraction. The first module is a rule-based ...
... 2018). (Infotheca Vol. 19, No. 2, December 2019, p. 122. Scientific paper.) 3 Lexical Resources and Tools. As previously mentioned in Section 1, the approach proposed in (Krstev et al., 2018) relies on several lexical resources and tools: (i) a sentence-aligned domain-specific corpus involving a source and ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
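GIZA++ trains the IBM word-alignment models; the simplest of them, IBM Model 1, estimates lexical translation probabilities t(sr | en) by expectation maximization over sentence pairs. The following is a compact, simplified Model 1 (no NULL word, uniform initialization, a fixed number of iterations) meant only to show the idea; it is not GIZA++'s implementation, and the two sentence pairs are toy data.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """pairs: list of (english_tokens, serbian_tokens).

    Returns t[(sr, en)], the estimated probability that English word en
    is translated as Serbian word sr (IBM Model 1 without a NULL word)."""
    sr_vocab = {s for _, sr in pairs for s in sr}
    t = defaultdict(lambda: 1.0 / len(sr_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(sr, en)
        total = defaultdict(float)  # expected counts c(en)
        for en, sr in pairs:
            for s in sr:
                # E-step: share each Serbian word among the English words
                norm = sum(t[(s, e)] for e in en)
                for e in en:
                    frac = t[(s, e)] / norm
                    count[(s, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per English word
        t = defaultdict(float, {(s, e): c / total[e] for (s, e), c in count.items()})
    return t


if __name__ == "__main__":
    pairs = [(["big", "house"], ["velika", "kuća"]), (["house"], ["kuća"])]
    t = ibm_model1(pairs)
    print(round(t[("kuća", "house")], 3), round(t[("velika", "big")], 3))
```

The second sentence pair lets EM attribute "kuća" to "house", which in turn pushes "velika" toward "big"; with real corpora the same mechanism yields the word alignments from which terminology pairs are extracted.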
Building Terminological Resources in an e-Learning Environment
... in other languages. Hence the need to integrate terminological resources in e-format into the e-learning environment. The importance of developing both Serbian terminological resources and multilingual resources involving Serbian as one of the languages in e-format for mining engineering terms ...
... [10] Stanković, R., Obradović, I., Kitanović, O., GIS Application Improvement with Multilingual Lexical and Terminological Resources, Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, May 2010, pp. 2283–2287, 2010. [11] Stanković, R ...
... of mine exploitation, indicated that such resources would greatly contribute to their functionality [5]. The only available multilingual terminological resource in printed form involving Serbian is a Dictionary of mining in five languages (Serbian, English, French, German and Russian), published ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade: Belgrade Metropolitan University (2012)
An Integrated Environment for Management and Exploitation of Linguistic Resources
Ranka Stanković, Ivan Obradović (2009)
... the system of morphological dictionaries of Serbian (SMD). Another very important and developed resource is the Serbian wordnet (SWN), a lexical database representing the semantic network of words in Serbian. Within this group of resources, the multilingual ontological dictionary of ...
... different resources: web, aligned text and spatial database. II. LINGUISTIC RESOURCES. In this section we offer a brief description of the three most important linguistic resources for Serbian developed within the HLT Group, namely the system of morphological dictionaries of Serbian, the Serbian wordnet ...
... Obradović, “The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines”, in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, European Language Resources Association (ELRA), May 2008. ...
Ranka Stanković, Ivan Obradović. "An Integrated Environment for Management and Exploitation of Linguistic Resources" in Proceedings of the International Multiconference on Computer Science and Information Technology, Computational Linguistics – Applications Workshop (CLA09), Mrągowo, Poland, October 2009, Piscataway: IEEE (2009)
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack of support for some types of categories. At the same time, various descriptions often describe exactly the same ... ... ISO/TC 37/SC 4. ISO. (2009) ISO 12620 Terminology and other language and content resources – Data Categories – Specification of data categories and management of a data category registry for language resources Kešelj, V., Kešelj, T., and Zlatić, L. (2004). R{j}ecnik.com: English-Serbo-Croatian ...
... of Morphological Features of Serbian: a Revision using Feature System Declaration. Cvetana Krstev, Ranka Stanković, Duško Vitas
... using different approaches. We propose a new morphological description for Serbian following the feature structure representation defined by the ISO standard. In this description we try to incorporate all characteristics of Serbian that need to be specified for various applications. We have developed several ...
Cvetana Krstev, Ranka Stanković, Duško Vitas. "A Description of Morphological Features of Serbian: a Revision using Feature System Declaration" in Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta: European Language Resources Association (2010)
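The role of a feature system declaration, stating which values each morphological category may take and which categories a part of speech may carry, can be illustrated by validating feature structures against such a declaration. The declaration below is a small hypothetical excerpt written as Python data rather than the ISO XML (FSD) encoding, and its category inventory is illustrative, not the description proposed in the paper. Two of the problem groups named above show up directly as validation errors: an undeclared category and an undeclared value.

```python
# Hypothetical excerpt: allowed values per category ...
VALUES = {
    "case": {"nominative", "genitive", "dative", "accusative",
             "vocative", "instrumental", "locative"},
    "number": {"singular", "plural"},
    "gender": {"masculine", "feminine", "neuter"},
    "animacy": {"animate", "inanimate"},
    "degree": {"positive", "comparative", "superlative"},
}
# ... and the categories each part of speech may carry.
POS_CATEGORIES = {
    "noun": {"case", "number", "gender", "animacy"},
    "adjective": {"case", "number", "gender", "animacy", "degree"},
}


def validate(fs: dict[str, str]) -> list[str]:
    """Return the problems found in one feature structure (empty list if valid)."""
    pos = fs.get("pos")
    if pos not in POS_CATEGORIES:
        return [f"unknown part of speech: {pos!r}"]
    problems = []
    for category, value in fs.items():
        if category == "pos":
            continue
        if category not in POS_CATEGORIES[pos]:
            problems.append(f"category {category!r} not declared for {pos}")
        elif value not in VALUES[category]:
            problems.append(f"value {value!r} not allowed for {category}")
    return problems


if __name__ == "__main__":
    print(validate({"pos": "noun", "case": "genitive", "number": "plural", "gender": "masculine"}))
    print(validate({"pos": "noun", "degree": "comparative"}))
```

Keeping the declaration as data, separate from the code that checks it, is what allows one description to serve several applications.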
Towards translation of educational resources using GIZA++
... number in Serbian, while the total number of aligned sentences is 67,206. Haddow et al. [16] give a general overview of an MT system, with details on the training pipeline and decoder configuration using the Moses toolkit [5]. In this research we followed their approach, albeit with the resources available for Serbian ...
... are: the inverse phrase translation probability φ(sr|en), the inverse lexical weighting lex(sr|en), the direct phrase translation probability φ(en|sr), the direct lexical weighting lex(en|sr), and the phrase penalty (always exp(1) = 2.718) ... Footnote 2: http://www.statmt.org/moses/?n=FactoredTraining.ScorePhrases
... for an English phrase at a time. Additional phrase translation scoring parameters can be produced in the output: lexical weighting (direct and indirect), word penalty and phrase penalty. Lexical weighting features estimate the probability of a phrase pair or translation rule word by word. The word penalty ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade: Metropolitan University (2016)
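The direct and inverse phrase translation probabilities in a phrase table are relative frequencies over extracted phrase pairs: φ(sr|en) = count(en, sr) / count(en), and symmetrically φ(en|sr) = count(en, sr) / count(sr). The sketch computes both from a toy list of extracted pairs and attaches the constant phrase penalty exp(1); lexical weighting is omitted, and the `phrase_scores` function and the phrase pairs are hypothetical.

```python
import math
from collections import Counter


def phrase_scores(extracted: list[tuple[str, str]]) -> dict[tuple[str, str], tuple[float, float, float]]:
    """extracted: (en_phrase, sr_phrase) pairs, one entry per extraction.

    Returns {(en, sr): (phi_sr_given_en, phi_en_given_sr, phrase_penalty)}."""
    pair_counts = Counter(extracted)
    en_counts = Counter(en for en, _ in extracted)
    sr_counts = Counter(sr for _, sr in extracted)
    return {
        (en, sr): (c / en_counts[en],   # inverse: phi(sr|en)
                   c / sr_counts[sr],   # direct:  phi(en|sr)
                   math.exp(1))         # phrase penalty, constant
        for (en, sr), c in pair_counts.items()
    }


if __name__ == "__main__":
    extracted = ([("learning", "učenje")] * 3 + [("learning", "nastava")]
                 + [("teaching", "nastava")] * 2)
    for pair, scores in phrase_scores(extracted).items():
        print(pair, [round(x, 3) for x in scores])
```

With these counts, "learning" → "učenje" gets φ(sr|en) = 3/4 and φ(en|sr) = 3/3, while the noisy pair "learning" → "nastava" scores low in both directions.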