
Kolloquium Geistes- und Sozialwissenschaften - Michael Rießler

NLP in endangered language documentation.
Building and investigating corpora for under-resourced languages of the Barents Sea region

Dr. Michael Rießler
General Linguistics
University of Freiburg

When: Jun 12, 2017, from 11:15 AM to 12:45 PM
Where: FRIAS, Albertstr. 19, Seminar Room
Contact Phone: +49 (0)761 203-97362
Attendees: open to university members

Approximately 7,000 languages are spoken worldwide. More than half of these are at risk of becoming extinct by the end of the current century. Documentary linguistics is the academic linguistic community’s response to the imminent disappearance of the majority of the world’s languages, which will ultimately also lead to the irreversible loss of intellectual heritage and cultural knowledge. Although it evolved out of traditional fieldwork methodology used primarily by descriptive linguists and language anthropologists, documentary linguistics is no longer merely a method: it has its own primary aims and methodologies. One of the most important purposes of the field is to make data on endangered languages available for further theoretical and applied research, as well as for direct use by the relevant language communities.

Despite the digital nature of current methodology in the field, documentary linguistics has rarely considered applying computational methods to build and analyze endangered language corpora more efficiently. In my talk, I will provide a brief overview of work in progress in my project at FRIAS, which is one of the very first attempts to work in the paradigm of endangered language documentation and description while systematically applying methods from Natural Language Processing (NLP) for automated corpus annotation. The language I currently work on most intensively is Komi-Zyrian, a Uralic language with approximately 160,000 speakers. I am also involved in similar projects on much smaller Uralic languages from the Saamic and Samoyedic branches, all spoken in the Barents Sea region of northeastern Europe.