
Humanities and Social Sciences Colloquium - Michael Rießler

NLP in endangered language documentation.
Building and investigating corpora for under-resourced languages of the Barents Sea region

Dr. Michael Rießler
General Linguistics
Albert-Ludwigs-Universität Freiburg

When 12 June 2017
from 11:15 to 12:45
Where FRIAS, Albertstr. 19, seminar room
Contact phone +49 (0)761 203-97362
Participants open to university members

Approximately 7,000 languages are spoken worldwide. More than half of these are at risk of becoming extinct by the end of the current century. Documentary linguistics is the academic linguistic community's response to the imminent disappearance of the majority of the world’s languages, which will ultimately also lead to the irreversible loss of intellectual heritage and cultural knowledge. Although it evolved out of traditional fieldwork methodology used primarily by descriptive linguists and linguistic anthropologists, documentary linguistics is no longer merely a method, as it has its own primary aims and methodologies. One of the most important purposes of the field is making data on endangered languages available for further theoretical and applied research, as well as for direct use by the relevant language communities.

Despite the digital nature of current methodology in the field, documentary linguistics has rarely considered applying computational methods to build and analyze endangered-language corpora more efficiently. In my talk, I will provide a brief overview of work in progress in my project at FRIAS, which is one of the very first attempts to work in the paradigm of endangered language documentation and description while systematically applying methods from Natural Language Processing (NLP) for automated corpus annotation. The language I currently work on most intensively is Komi-Zyrian, a Uralic language with approximately 160,000 speakers. I am also involved in similar projects on much smaller Uralic languages from the Saamic and Samoyedic branches, all spoken in the Barents Sea region of northeastern Europe.