Cross-linguistic and language-internal variation in text and speech: focus on the joint analysis of multiple characteristics

When: 9 February 2011, 09:00 to 11 February 2011, 18:00
Where: Rektorat, Fahnenbergplatz, Senatssaal / FRIAS Seminarraum, Albertstr. 19
Participants: by registration
Organized by Benedikt Szmrecsanyi & Bernhard Wälchli

This workshop seeks to bring together typologists, dialectologists and dialectometricians, register analysts, and quantitative linguists to discuss approaches to cross-linguistic and language-internal diversity that

  • are based on the study of corpora of texts or speech of different languages, different dialects or different registers (conversation, narratives including retold stories, newspaper prose, parallel texts, etc.) -- not on reference grammar material, questionnaire data, dialect atlases, or elicitation;
  • are concerned with the joint (or: aggregate) analysis of multiple characteristics or features. These multiple characteristics may be frequency counts and/or distributional data with low levels of data reduction, but not binary features or discrete features with few types;
  • marshal some sort of quantitative analysis technique to see the wood for the trees. Such techniques may involve data mining in the broadest sense, dimension reduction techniques, taxonomy, index calculation, diagrammatic visualization methods (e.g. network diagrams), projections to geography, and so on.

The nature and number of characteristics are not limited in any way (the more the merrier). Functional and formal perspectives on phonetics, phonology, morphology, syntax, and the lexicon are all welcome, provided the features investigated can be extracted from texts or speech with minimal commitments to particular theories of grammar. For relevant case studies in this spirit, see Szmrecsanyi (to appear) or Wälchli (2009).
The workshop is intended as a platform to discuss appropriate analysis techniques and issues concerning the corpus-cum-aggregation endeavor, as well as its prospects. Owing to the interdisciplinary scope of the workshop, we welcome contributions (i) which have an interdisciplinary focus themselves, and (ii) which emphasize methodological aspects rather than detailed discussion of results. The approaches presented should be applied to a particular set of corpora, and the abstract should spell out the methodology used.



(venue: Rektorat, Fahnenbergplatz, Senatssaal)

9:30-9:45 Benedikt Szmrecsanyi (FRIAS) & Bernhard Wälchli (University of Bern)

9:45-10:45 Michael Cysouw (LMU Munich)
"Historical reconstruction through parallel corpora"

10:45-11:00 break

11:00-12:00 William Kretzschmar (University of Georgia)
"Complex Systems in Aggregated Variation Analyses"

12:00-13:00 sandwich lunch (FRIAS lounge)

13:00-13:45 Sascha Diwersy (University of Cologne), Stefan Evert (University of Osnabrück) & Stella Neumann (TU Darmstadt/RWTH Aachen)
"A corpus-driven approach to language variation"

13:45-14:30 Bernhard Wälchli (University of Bern)
"Typological features as indices of automatically extracted multiple lexical characteristics, or, an approximation to spectral analysis of morphological complexity in parallel texts"

14:30-14:45 break

14:45-15:45 Wilbert Heeringa & Frans Hinskens (Meertens Institute)
"Dutch dialect change in lexis, morphology and sound components"

15:45-16:15 coffee break (FRIAS lounge)

16:15-17:00 Maria Koptjevskaja-Tamm (Stockholm University) & Magnus Sahlgren (Stockholm University / Swedish Institute of Computer Science)
"Temperature in the Word Space: sense exploration of temperature expressions using word-space modelling"

17:00-17:45 Benedikt Szmrecsanyi (FRIAS)
"Holistic corpus-based dialectology"



(venue: FRIAS, Albertstr. 19, Hörsaal)

10:30-10:45 break

10:45-11:30 Ruprecht von Waldenfels (University of Bern)
"Tapping into intra-family variation using a Slavic parallel corpus"

11:30-12:15 Thomas Mayer (University of Konstanz)
"Automatically extracting place features from the distribution of consonants in corpora"

12:15-14:15 Lunch buffet (FRIAS lounge)

14:15-15:15 Karen Corrigan (Newcastle University)
"Data-Mining the DECTE Corpus: Phonological and Morphological Variability in Tyneside English"

15:15-16:00 Annemarie Verkerk (Max Planck Institute for Psycholinguistics)
"Where Alice fell into: Motion events in a parallel corpus"

16:00-16:30 coffee break (FRIAS lounge)

16:30-17:30 Balthasar Bickel (University of Leipzig)
"On the role of language and other genealogical units in explaining typological distributions: a case study on referential density"



(venue: Rektorat, Fahnenbergplatz, Senatssaal)

9:00-10:00 Dirk Geeraerts & Tom Ruette (University of Leuven)
"Lexical Sociolectometry"

10:00-10:45 Douglas Biber (Northern Arizona University)
"Using multi-dimensional analysis to investigate cross-linguistic patterns of register variation"

10:45-11:00 break

11:00-12:00 Peter Grzybek (University of Graz)
"Homogeneity and heterogeneity within language(s): Relevance for intra-lingual and cross-linguistic typologies"

12:00-14:00 lunch break

14:00-15:00 Jack Grieve (University of Leuven)
"A comparison of statistical methods for the aggregation of regional linguistic variation"

15:00-16:00 Östen Dahl (Stockholm University)
"The perfect map: investigating the cross-linguistic distribution of TAME categories in a parallel corpus"

16:00-16:30 coffee break (FRIAS lounge)

16:30-17:30 General discussion



