Search
2443 items
-
Towards translation of educational resources using GIZA++
... feedback vector and used to refine parallel data and retrain translation models towards a more accurate second-phase translation output. The project results will be showcased and tested on the Iversity [4] MOOC platform and on the VideoLectures.NET digital video lecture library. The translation ...
... a variant of the geometric average. The SELFEH corpus is part of the Biblisha digital library and is used in this research, and a comparison of results is in progress. Machine translation research using GIZA++ and its usage for eLearning material is in its initial phase, but it is clear that the ...
... for additional text alignment and augmentation of the Biblisha library. The detailed evaluation will be performed when we reach at least 100,000 sentence pairs. Our aim is to publish an SMT-based web service (API) and integrate it with the eLearning systems that we use: Moodle and edX. REFERENCES [1] Class ...
Ivan Obradović, Dalibor Vorkapić, Ranka Stanković, Nikola Vulović, Miladin Kotorčević. "Towards translation of educational resources using GIZA++" in The Seventh International Conference on e-Learning (eLearning-2016), September 2016, Belgrade: Metropolitan University (2016)
-
Transformer-Based Composite Language Models for Text Evaluation and Classification
Parallel natural language processing systems were previously tested successfully on part-of-speech tagging and authorship attribution through mini-language modeling, where they achieved significantly better results than independent methods for seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in an arbitrary highly inflective, morphologically rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the ...
Mihailo Škorić, Miloš Utvić, Ranka Stanković. "Transformer-Based Composite Language Models for Text Evaluation and Classification" in Mathematics, MDPI AG (2023). https://doi.org/10.3390/math11224660
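The abstract mentions a perplexity-based dataset but does not say how perplexity was computed. As a minimal sketch, assuming the standard definition (the exponential of the mean negative log-probability a model assigns to each token), perplexity and a hypothetical lowest-perplexity decision rule could look like this. `perplexity` and `classify_by_perplexity` are illustrative names, not the authors' code, and the probabilities are made up.

```python
import math
from typing import Dict, Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Standard perplexity: exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("at least one token probability is required")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def classify_by_perplexity(per_label_probs: Dict[str, Sequence[float]]) -> str:
    """Hypothetical rule: pick the label whose model is least 'surprised' by the text."""
    return min(per_label_probs, key=lambda label: perplexity(per_label_probs[label]))


# Usage with made-up token probabilities from two hypothetical models
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # approximately 4.0
print(classify_by_perplexity({"author_a": [0.5, 0.5], "author_b": [0.1, 0.1]}))
```

Lower perplexity means the model found the text more predictable; a uniform 1-in-4 guess per token gives a perplexity of about 4.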
-
Updating the database of the spatial information system for capital underground mining rooms
Milutinović Aleksandar, Ganić Aleksandar, Miljanović Igor, Gajić Grozdana. "Updating the database of the spatial information system for capital underground mining rooms" in 4th Balkan Mining Congress BALKANMINE 1, Ljubljana, Slovenia: Faculty of Natural Sciences and Engineering (2011): 629–633
-
Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution
This paper explores the effectiveness of parallel stylometric document embeddings in solving the authorship attribution task by testing a novel approach on literary texts in 7 different languages, totaling 7,051 unique 10,000-token chunks from 700 PoS- and lemma-annotated documents. We used these documents to produce four document embedding models using the Stylo R package (word-based, lemma-based, PoS-trigrams-based, and PoS-mask-based) and one document embedding model using mBERT for each of the seven languages. We created further derivations of these ...
Mihailo Škorić, Ranka Stanković, Milica Ikonić Nešić, Joanna Byszuk, Maciej Eder. "Parallel Stylometric Document Embeddings with Deep Learning Based Language Models in Literary Authorship Attribution" in Mathematics, MDPI AG (2022). https://doi.org/10.3390/math10050838
-
Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names
In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were then used to train two Named Entity Recognition (NER) systems: Stanford and spaCy. All obtained models, together with a rule- and lexicon-based system, were evaluated on ... ... Association for Computational Linguistics, pages 141–150. Nathalie Friburger and Denis Maurel. 2004. Finite-state Transducer Cascades to Extract Named Entities in Texts. Theoretical Computer Science 313(1):93–104. Ralph Grishman and Beth Sundheim. 1996. Message Understanding Conference-6: A Brief History ...
... levels of information: name type, role, gender. We wanted to examine the recognition of NEs at different levels of detail. Therefore, on the basis of the gold standard, we developed four versions of it by ... Footnote 4: The evaluation was performed as homework by several generations of students of Library and Information ...
... Recognition systems: spaCy NER (Subsection 3.2) and Stanford NER (Subsection 3.3). Trained models for Serbian are available on the NER&BEYOND platform, which is presented in Section 6. 3.2 spaCy NER spaCy (Honnibal and Montani, 2017) is a free, open-source library for advanced Natural Language Processing ...
Branislava Šandrih, Cvetana Krstev, Ranka Stanković. "Development and Evaluation of Three Named Entity Recognition Systems for Serbian - The Case of Personal Names" in Proceedings - Natural Language Processing in a Deep Learning World, Incoma Ltd., Shoumen, Bulgaria (2019). https://doi.org/10.26615/978-954-452-056-4_122
-
Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++
Branislava Šandrih, Ranka Stanković (2020)
In science, industry and many research areas, terminology is evolving rapidly. Most often, the language that serves as the "lingua franca" for most of these areas is English. As a consequence, in many fields domain terms are first conceived in English and later translated into other languages. In this paper we present an approach to the automatic extraction of bilingual terminology for the English-Serbian language pair that relies on an aligned bilingual domain corpus, a terminology extractor for the target language and a chunk alignment tool. We examine the performance of the method on the domain ... ... relies on an aligned bilingual domain corpus, a terminology extractor for a target language and a tool for chunk alignment. We examine the performance of the method on a Library and Information Science domain. The obtained results, as well as the application that implements the method, are available ...
... word lists and monolingual dictionaries of MWTs are long-term activities. Acknowledgment This research was partly supported by the Ministry of Education, Science and Technological Development through projects ON-178006 and III47003. References Arcan, Mihael, Marco Turchi, Sara Tonelli and Paul Buitelaar ...
... Machine Translation in a Computer Aided Translation Environment". Natural Language Engineering Vol. 23, no. 5 (2017): 763–788 Baldwin, Timothy and Su Nam Kim. "Multiword Expressions". Handbook of Natural Language Processing Vol. 2 (2010): 267–292 Bouamor, Dhouha, Nasredine Semmar and Pierre Zweigenbaum. ...
Branislava Šandrih, Ranka Stanković. "Extraction of Bilingual Terminology Using Graphs, Dictionaries and GIZA++" in Infotheca, Faculty of Philology, University of Belgrade (2020). https://doi.org/10.18485/infotheca.2019.19.2.6
-
History of meteorite donations to the Collection of Minerals and Rocks of the University of Belgrade, Faculty of Mining and Geology (Serbia)
... Academy of Sciences, Sofia Institute of Mineralogy and Crystallography “Acad. Ivan Kostov”, Bulgarian Academy of Sciences, Sofia National Museum of Natural History, Bulgarian Academy of Sciences, Sofia Sofia University “St. Kliment Ohridski”, Sofia University of Mining and Geology “St ...
... active and respected scientist across Europe, and through his collaborations and private efforts he collected 95 well-known meteorites worldwide. Unfortunately, most of this collection went missing during World War I (Jović, 2002). However, a rather quick development of geological sciences in Serbia ...
... collection. Besides numbering, each sample in that catalogue contained information on the sample's name and the locality of recovery. The meteorites donated to the Mineralogical Museum of the Great School of Belgrade had numbers 36, 37 and 38, but meanwhile specimen No. 36 went missing. Descriptions ...
Alena Zdravković, Maja Milošević, Kristina Šarić, Ivana Jelić, Ana Černok. "History of meteorite donations to the Collection of Minerals and Rocks of the University of Belgrade, Faculty of Mining and Geology (Serbia)" in 9th International Conference Mineralogy and Museums, Sofia, Bulgaria, Earth and Man National Museum and Bulgarian Mineralogical Society (2021)
-
Building Terminological Resources in an e-Learning Environment
... of efficient and flexible database search and information extraction on the web is growing each day, performance of the search related to mining engineering data can be greatly improved by the use of RudOnto in query expansion [10]. Given the variety of applications of RudOnto and in view of ...
... e-format for mining engineering terms was recognized several years ago. Namely, various applications developed in this area, followed by the development of an information system for planning, monitoring and management of mine exploitation, indicated that such resources would greatly contribute to ...
... various texts on the web. First and foremost, they are indispensable in information and document retrieval systems. In addition to monolingual resources, machine translation systems and cross-language information retrieval emphasize the need for the development of bilingual and multilingual terminological ...
Ranka Stanković, Ivan Obradović, Olivera Kitanović, Ljiljana Kolonja. "Building Terminological Resources in an e-Learning Environment" in Proceedings of the Third International Conference on e-Learning, eLearning-2012, September 2012, Belgrade, Serbia, Belgrade: Belgrade Metropolitan University (2012)
-
Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons
Mihailo Škorić (2017)
The goal of this paper is to draw attention to the possibility of using emoticon-riddled text on the web in language-neutral sentiment analysis. It introduces several innovations into the existing research framework and tests their effectiveness. It also presents a software tool made especially for that purpose, explains how it builds a database of the sentiment values of terms, and provides a user manual. Finally, it presents a software tool that tests the new database and gives some examples ... ... different ways in social and demographic research to quickly and efficiently collect large amounts of information. Developing intelligent systems that work with information: – Information retrieval: retrieval of specific information in the text, as well as finding information that cannot be precisely ...
... machines and the people. 5.2 Possible applications This method of determiner extraction, and similar research in general, can find a wide variety of applications, divided into two groups. Social and demographic ...
... expressed in them to help find the necessary information. – Natural language understanding and analysis: understanding of written text and text queries, analysis of moods in the text, processing of digital linguistic resources such as automatic parallelization and automation of any operation that requires ...
Mihailo Škorić. "Classification of Terms on a Positive-Negative Feelings Polarity Scale Based on Emoticons" in Infotheca, Faculty of Philology, University of Belgrade (2017). https://doi.org/10.18485/infotheca.2017.17.1.4
-
The Many Faces of SrpKor
The acronym SrpKor denotes a family of electronic corpora of contemporary Serbian whose development began in the late 1970s and which became more widely visible to the interested research community with the publication of its first version on the web in 2002. Over this long period, especially before useful textual resources appeared on the web, corpus development consisted of collecting and processing material as well as developing corpus processing methods. Namely, an electronic corpus is not just a collection of texts in digital form (as stated, for example, in ...
Duško Vitas, Ranka Stanković, Cvetana Krstev. "The Many Faces of SrpKor" in South Slavic Languages in the Digital Environment JuDig Book of Abstracts, University of Belgrade - Faculty of Philology, Serbia, November 21-23, 2024, University of Belgrade - Faculty of Philology (2024)
-
Microstructural and magnetic properties of electrospun hematite/cuprospinel composites
Phase composition, microstructural and magnetic properties of electrospun hematite/cuprospinel composites were investigated. Samples were synthesized starting with 0 to 10 mol% of copper relative to iron. The round shape of the reference electrospun fibres was preserved upon heating up to 600 °C in air, whereas at 700 °C a hollow substructure was additionally formed. In these reference samples, the presence of the hematite phase was detected by XRPD. A small amount (traces) of Fe3O4/γ-Fe2O3 was also found, due to the ...
Electrical and Electronic Engineering, Condensed Matter Physics, Atomic and Molecular Physics and Optics, Electronic, Optical and Magnetic Materials
Mira Ristić, Aleksandar Kremenović, Michael Reissner, Željka Petrović, Svetozar Musić. "Microstructural and magnetic properties of electrospun hematite/cuprospinel composites" in Journal of Materials Science: Materials in Electronics, Springer Science and Business Media LLC (2020). https://doi.org/10.1007/s10854-020-03526-0
-
Arsenic in Tap Water of Serbia's South Pannonian Basin and Arsenic Risk Assessment
Petar Papić, Marina Ćuk, Maja Todorović, Jana Stojković, Bojan Hajdin, Nebojša Atanacković, Dušan Polomčić (2012)
... hydrochemical map showing arsenic concentrations (Fig. 6) and assessment of the risks incurred through drinking water. According to the Provincial Secretariat of Science and Technology Development, more than 600,000 inhabitants of Banat and Bačka (or some 40% of the population of Vojvodina) obtain drinking ...
... Belgrade, Faculty of Mining and Geology [20], “Geozavod” from Belgrade [21], as well as the Faculty of Natural Sciences of the University of Novi Sad [22]. Total arsenic concentrations were measured in 470 water samples from public water supply systems from 2004 to 2009. Sampling and analyses of drinking water ...
... was undertaken in accordance with the Drinking Water Sampling and Laboratory Analysis Rulebook [24]. The laboratory for chemical testing of the environment at the Faculty of Natural Sciences of the University of Novi Sad used advanced and specialized tools for the precise required analysis. Arsenic ...
Petar Papić, Marina Ćuk, Maja Todorović, Jana Stojković, Bojan Hajdin, Nebojša Atanacković, Dušan Polomčić. "Arsenic in Tap Water of Serbia's South Pannonian Basin and Arsenic Risk Assessment" in Polish Journal of Environmental Studies (2012)
-
Valorization of non-balanced coal reserves in Serbia for underground coal gasification
David Petrović, Lazar Kričak, Milanka Negovanović, Stefan Milanović, Jovan Marković, Nikola Simić, Ljubisav Stamenić (2019)
In the name of a better and safer energy future, it is our responsibility to focus our knowledge and activities on saving imported liquid and gaseous fossil fuels, as well as the coal on which Serbia's energy security is based. Rationalizing the use of available energy resources certainly has a positive effect on a country's economy and environment. This paper indicates motivations for the application of the underground coal gasification process, as well as surface gasification for ... ... for industrial applications. Noticeably, the remaining gas, once freed of CO₂, has a significantly higher energy value, because CO₂ as well as N₂ are inert gases [20]. When CO₂ and N₂ are removed from the UCG process gas in scrubbers, the gas is cleaner and more energy-efficient ...
... would be adequate to take into account the exceptional interest in UCG worldwide, and especially in Europe. There are various current EU-funded research projects tackling UCG applications, and especially the utilization of hydrogen generated in the UCG process. Serbia possesses coal ...
... , Stefan V. MILANOVIĆ, Jovan R. MARKOVIĆ, Nikola Z. SIMIĆ, and Ljubisav S. STAMENIĆ; 4D Konsalting, Belgrade, Serbia; Faculty of Mining and Geology, University of Belgrade, Belgrade, Serbia; Institute of Nuclear Sciences "Vinča", University of Belgrade, Belgrade, Serbia. Original scientific ...
David Petrović, Lazar Kričak, Milanka Negovanović, Stefan Milanović, Jovan Marković, Nikola Simić, Ljubisav Stamenić. "Valorization of non-balanced coal reserves in Serbia for underground coal gasification" in Thermal Science (2019). https://doi.org/10.2298/TSCI190725390P
-
Commemorative Volume 1991–2015: 135 Years of Geology and 70 Years of Mining at the University of Belgrade
... Journal of earth and environmental sciences, 9 (3). ATANACKOVIĆ N., DRAGIŠIĆ V., STOJKOVIĆ J., PAPIĆ P., ŽIVANOVIĆ V., 2013: Hydrochemical characteristics of mine waters from abandoned mining sites in Serbia and their impact on surface water quality. Environmental Science and Pollution Research ...
... Anomaly of the Pannonian Basin and its Association with the Geothermal Anomaly of Serbia. In: KARAMATA, S. (Ed.), Geodynamic Evolution of The Pannonian Basin, Academic Conferences/Serbian Academy of Sciences and Arts, vol. 62. Department of Natural and Mathematical Sciences, vol. 4, Beograd, 355–365. ...
... Journal of African Earth Sciences, 2010. 38. Karović-Maričić V., Danilović D.: Preliminary management and optimization of a gas reservoir in central Serbia, Journal of Petroleum Science and Engineering, 2010. 39. Čebašek V., Bošković Z., Mitrović V., Gojković N.: Radial stress and deformation of cement ...
Dušan Polomčić (editor-in-chief). Споменица 1991. – 2015. година: 135 година геологије и 70 година рударства на Универзитету у Београду [Commemorative Volume 1991–2015: 135 Years of Geology and 70 Years of Mining at the University of Belgrade], Belgrade: University of Belgrade, Faculty of Mining and Geology, 2016
-
Refinement of waste phosphogypsum from Prahovo, Serbia: characterization and assessment of application in civil engineering
Josip Išek, Lazar Kaluđerović, Nikola Vuković, Maja Milošević, Ivana Vukašinović, Zorica Tomić (2020)
... Education, Science and Technological Development of the Republic of Serbia through Project No. 176010 and Project No. 43007. References Amjad Z. (1988) Calcium sulfate dihydrate (gypsum) scale formation on heat exchanger surfaces: the influence of scale inhibitors. Journal of Colloid and Interface ...
... million tons are produced in China and 40 million tons are produced in the USA (IAEA, 2013). Raw phosphogypsum is generally yellowish, which, along with its greyish grains, defines it as a low-grade gypsum material in traditional applications, such as in cements and mortars (Costa et al., 2019). Primary ...
... phate rock. Journal of Radiation Protection and Research, 42(1), 33–41. Kandil A.H.T., Cheira M.F., Gado H.S., Soliman M.H. & Akl H.M. (2017) Ammonium sulfate preparation from phosphogypsum waste. Journal of Radiation Research and Applied Sciences, 10, 24-33. Lushnikova N. & Dvorkin L. (2016) ...
Josip Išek, Lazar Kaluđerović, Nikola Vuković, Maja Milošević, Ivana Vukašinović, Zorica Tomić. "Refinement of waste phosphogypsum from Prahovo, Serbia: characterization and assessment of application in civil engineering" in Clay Minerals, Cambridge University Press (2020). https://doi.org/10.1180/clm.2020.11
-
Examination and characterization of nanostructured Co0.9Ho0.1MoO4
Milena Rosić, Maja Milošević, Maria Čebela, Vladimir Dodevski, Vesna Lojpur, Marija Vasić, Aleksandra Zarubica (2023)
... Aleksandra Zarubica | Serbian Ceramic Society Conference - Advanced ceramics and application XI New Frontiers in Multifunctional Material Science and Processing Program and the Book of Abstracts, Serbian Academy of Sciences and Arts, Serbia, Belgrade, 18-20. September 2023. | 2023 | http://dr.rgf.bg ...
... CERAMICS AND APPLICATION XI New Frontiers in Multifunctional Material Science and Processing Serbian Ceramic Society Institute of Technical Sciences of SASA Institute for Testing of Materials Institute of Chemistry Technology and Metallurgy Institute for Technology of Nuclear and Other Raw ...
... electrolytes. P15 Examination and characterization of nanostructured Co0.9Ho0.1MoO4 Milena Rosić, Maja Milošević, Maria Čebela, Vladimir Dodevski, Vesna Lojpur, Marija Vasić, Aleksandra Zarubica; Laboratory for Material Science, Institute of Nuclear Sciences "Vinča", National Institute of ...
Milena Rosić, Maja Milošević, Maria Čebela, Vladimir Dodevski, Vesna Lojpur, Marija Vasić, Aleksandra Zarubica. "Examination and characterization of nanostructured Co0.9Ho0.1MoO4" in Serbian Ceramic Society Conference - Advanced ceramics and application XI New Frontiers in Multifunctional Material Science and Processing Program and the Book of Abstracts, Serbian Academy of Sciences and Arts, Serbia, Belgrade, 18-20. September 2023., Belgrade: Serbian Ceramic Society (2023)
-
Results of Recent Monitoring Activities on Landslide Umka, Belgrade, Serbia—IPL 181
Biljana Abolmasov, Uroš Đurić, Jovan Popović, Marko Pejić, Mileva Samardžić Petrović, Nenad Brodić (2021)
... ral processing: techniques and applications. In: Multitemporal remote sensing, remote sensing and digital image processing, pp 145–176 Peternel T, Kumelj S, Ostir K, Komac M (2017) Monitoring the Potoška planina landslide (NW Slovenia) using UAV photogrammetry and tachymetric measurements. Landslides ...
Biljana Abolmasov, Uroš Đurić, Jovan Popović, Marko Pejić, Mileva Samardžić Petrović, Nenad Brodić. "Results of Recent Monitoring Activities on Landslide Umka, Belgrade, Serbia—IPL 181" in Understanding and Reducing Landslide Disaster Risk. WLF 2020. ICL Contribution to Landslide Disaster Risk Reduction, Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60196-6_14
-
Developing Termbases for Expert Terminology under the TBX Standard
... Lou Burnard and Syd Bauman, editors. TEI P5: Guidelines for Electronic Text Encoding and Interchange. TEI Consortium, 2009. M. Teresa Cabré Castellvi. Theories of Terminology — Their Description, Prescription and Explanation. Terminology, 9(2):163-199, 2003. ISO. Computer Applications in Terminology ...
... ISO 12200:1999. ISO. Computer Applications in Terminology — Terminological Markup Framework, 2003. Ref. ISO 16642:2003. ISO. Systems to Manage Terminology, Knowledge and Content — TermBase eXchange (TBX), 2008. Ref. ISO 30042:2008. ISO. Terminology and Other Language and Content Resources — S ...
... a particular concept and contains a list of `<langSet>` elements. A `<langSet>` element is associated with the language section level of the TMF meta-model and contains a list of `<tig>` and `<ntig>` elements. Each `<tig>` ("term information group") and `<ntig>` ("nested term information group") element represents ...
Ranka Stanković, Ivan Obradović, and Miloš Utvić. "Developing Termbases for Expert Terminology under the TBX Standard" in Natural Language Processing for Serbian - Resources and Applications, Belgrade: University of Belgrade, Faculty of Mathematics (2014)
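The concept, language and term levels described in this snippet can be illustrated with a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes a simplified fragment: a single `<termEntry>` with `<langSet>` and `<tig>` children, without the full TBX document wrapper (the `<martif>` header and body) and without `<ntig>` groups. The helper `term_entry`, the entry id and the sample terms are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang)
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def term_entry(entry_id: str, terms: Dict[str, List[str]]) -> ET.Element:
    """Build one concept-level termEntry: one langSet per language, one tig per term."""
    entry = ET.Element("termEntry", {"id": entry_id})
    for lang, lang_terms in terms.items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for text in lang_terms:
            tig = ET.SubElement(lang_set, "tig")  # term information group
            ET.SubElement(tig, "term").text = text
    return entry


# Hypothetical concept with an English and a Serbian term
entry = term_entry("c1", {"en": ["open-pit mine"], "sr": ["površinski kop"]})
print(ET.tostring(entry, encoding="unicode"))
```

Keeping one `<langSet>` per language under the concept mirrors the concept-oriented design of TBX: synonyms in the same language become additional `<tig>` siblings rather than new entries.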
-
Serbia in OneGeology Europe
The Geological Survey of Serbia, as the lead institution for the OneGeology Europe Project, together with the Faculty of Mining and Geology and the Ministry of Natural Resources, Mining and Spatial Planning, joined the international OneGeology Europe Project in May 2013, when the Project was already at an advanced stage. By the end of 2013 they had completed the activities required for full inclusion in the Project, whereby the Republic of Serbia took its place on the 1:1M Geological Map of Europe. The 1:1M Geological Map of Serbia is a compiled, that is, simplified version of the OGK 1:500 ... ... further detailed information, or it can be formatted into a report, or even used in other applications for further development. Geological surveys and similar institutions that wish to contribute to the OneGeology initiative aim to provide an OGC Web Map Service (WMS) and two OGC Web Feature ...
... ogy/MapServer/WMSServer and it contains a layer of geological units and a layer of geological structures. The service name, as well as the layer names, are strictly defined by the 1G-E specifications and reflect information about the data owner (GZS), data provider (RGF), scale and content. In addition to the ...
Danka Blagojević, Ranka Stanković, Petar Stejić, Velizar Nikolić. "Srbija u OneGeology Europe" [Serbia in OneGeology Europe] in Zapisnici Srpskog geološkog društva za 2013. godinu [Records of the Serbian Geological Society for 2013], Belgrade: Serbian Geological Society (2014)
-
Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology
Mihailo Škorić, Mauro Dragoni (2019)
This paper is the result of a task presented to attendees of the Keyword Search in Big Linked Data summer school, organized by the Vienna University of Technology under the Keystone COST Action in the summer of 2017. It presents a specific approach to classification via the creation of minimal document surrogates based on the US National Library of Medicine's MeSH ontology, which is derived from the Medical Subject Headings thesaurus. In a series of previously classified medically ... ... Elberrichi, Zakaria, Malika Taibi and Amel Belaggoun. "Multilingual Medical Documents Classification Based on MesH Domain Ontology". International Journal of Computer Science Issues Vol. 9 (2012) Kolonja, Ljiljana, Ranka Stanković, Ivan Obradović, Olivera Kitanović and Aleksandar Cvjetić. "Development ...
... problem that arises in calculating the similarity coefficient between texts is the high computational cost, which must be paid either in processing power or in execution time. For this reason, the first step in classification (and indexing) is most often the creation of a document surrogate. Usually, documents ...
... are merged into one file, ~230 MB in size, with new lines as the border between the documents. The information of importance, namely document names ([0-9]+[.]txt), the identifiers in them ([A-Z][0-9][0-9]([.][0-9][0-9][0-9])*) and new-line tags ([\r\n]+), is found using the regular expression ([A-Z][0-9][0-9]([.][ ...
Mihailo Škorić, Mauro Dragoni. "Medical Domain Document Classification via Extraction of Taxonomy Concepts from MeSH Ontology" in Infotheca, Faculty of Philology, University of Belgrade (2019). https://doi.org/10.18485/infotheca.2019.19.1.3
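The patterns quoted in this snippet can be tried out with Python's `re` module; the identifier pattern matches the shape of MeSH tree numbers such as C04.557.337. This is a sketch under stated assumptions: the merged text is split on runs of newlines, each resulting chunk is assumed to contain its `NNN.txt` name, and the group in the identifier pattern is made non-capturing (`(?:...)`) so that `findall` returns whole identifiers rather than only the last captured suffix. `index_documents` and the sample text are hypothetical.

```python
import re
from typing import Dict, List

# Patterns quoted in the snippet; the identifier group is made non-capturing
DOC_NAME = re.compile(r"[0-9]+[.]txt")
TREE_ID = re.compile(r"[A-Z][0-9][0-9](?:[.][0-9][0-9][0-9])*")
NEW_LINES = re.compile(r"[\r\n]+")


def index_documents(merged: str) -> Dict[str, List[str]]:
    """Map each document name to the MeSH-style identifiers found in that document."""
    index: Dict[str, List[str]] = {}
    for chunk in NEW_LINES.split(merged):
        name = DOC_NAME.search(chunk)
        if name:  # assumption: every document chunk carries its NNN.txt name
            index[name.group()] = TREE_ID.findall(chunk)
    return index


# Hypothetical merged text: one document per line
sample = "12.txt C04.557.337 D12.776\r\n13.txt A01.456"
print(index_documents(sample))
```

Because `[\r\n]+` treats a `\r\n` pair (or several blank lines) as a single border, the sketch yields one chunk per document regardless of the line-ending convention.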