Papers ⚒ Dr RGF - RGF Repository

Collected Item: “SrpELTeC: A Serbian Literary Corpus for Distant Reading”

Publication type

Journal article

Version of the paper

published version

Language of the paper

English

Author(s) (e.g. Milan Marković, Nikola Nikolić)

Ranka Stanković, Cvetana Krstev, Duško Vitas

Title of the paper (Title - subtitle)

SrpELTeC: A Serbian Literary Corpus for Distant Reading

Journal title

Primerjalna književnost

Publisher (e.g. Belgrade : Prosveta)

Research Centre of the Slovenian Academy of Sciences and Arts

Year of publication

2024

Abstract in Serbian

U članku je predstavljen SrpELTeC, korpus razvijen u okviru akcije COST Distant Reading for European Literary History (CA16204). Svi romani u SrpELTeC-u su odabrani, pripremljeni i obeleženi korišćenjem zajedničkih principa uspostavljenih za sve jezičke zbirke u Evropskoj zbirci književnog teksta (ELTeC). Navedeni su izazovi i rešenja u pripremi SrpELTeC od nule. Svi romani su ručno kodirani u TEI sa bogatim metapodacima i strukturnim napomenama. Automatska anotacija je uključivala POS-označavanje, lematizaciju i imenovane entitete, oslanjajući se na resurse za obradu prirodnog jezika koje je razvilo i održavalo JeRTeh Language Resources and Technologies Society. Integracija SrpELTeC-a sa Vikipodacima je podržana skupom SPARQL upita za pronalaženje metapodataka sa različitim opcijama vizuelizacije. Nedavne aktivnosti u okviru COST Action NexusLinguarum—European Network for Web-centred Linguistic Data Science (CA18209) su povezane sa verzijom povezanih podataka SrpELTeC-a koristeći NLP Interchange Format. Sve verzije SrpELTeC-a su besplatno dostupne pod CC-BY licencom.

Abstract in English

The article presents SrpELTeC, a corpus developed within the COST action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared, and annotated using the common principles established for all language collections in the European Literary Text Collection (ELTeC). The challenges and solutions in preparing SrpELTeC from scratch are outlined. All novels were manually encoded in TEI with rich metadata and structural annotation. The automatic annotation included POS-tagging, lemmatization, and named entities, relying on Natural Language Processing resources developed and maintained by the JeRTeh Language Resources and Technologies Society. The integration of SrpELTeC with Wikidata was supported with a set of SPARQL queries for the retrieval of metadata with different visualization options. Recent activities within the COST Action NexusLinguarum—European Network for Web-centred Linguistic Data Science (CA18209) are related to the linked data version of SrpELTeC using the NLP Interchange Format. All versions of SrpELTeC are freely available under the CC-BY license.

Journal volume or year

vol. 47, no. 2 (2024)

Journal issue number

First page

Last page

DOI

10.3986/pkn.v47.i2.03

Journal ISSN

ISSN 0351-1189

Keywords in Serbian (separated by ", ")

digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics

Keywords in English (separated by ", ")

digital humanities, Serbian literature, text corpora, distant reading, linked data, named entity recognition, text analytics

Link

https://ojs-gr.zrc-sazu.si/primerjalna_knjizevnost/article/view/9411/8803

Broader category of the paper according to the MPNT rulebook

M50

Narrower category of the paper according to the MPNT rulebook

M51

Access level

Open access

License

Creative Commons – Attribution 4.0 International

Digital object format

.pdf
