Start year 2009 End year 2009


Implementing institutions


Project leaders and contact persons Staff
Country code Austria Language code German, English
Keywords (German) applied linguistics
Keywords (English) language teaching and testing, applied linguistics, corpus linguistics, computational linguistics

The dominant role of English as a global lingua franca, taught as a foreign or second language in almost every country, has fuelled increasing interest in both practical and theoretical issues of English language proficiency, including how to evaluate it. In this spirit, nationwide standardized testing has finally come of age in Austrian schools. Investigations into test validity and reliability help to ascertain what a test measures, in what context, and how well it does so, and they also make the whole testing procedure more objective. One form of validation involves collecting authentic performances and subjecting them to linguistic analysis that yields statistical data. These data can then be related to the ratings that trained raters awarded to the performances.

Our project proposes to do precisely that, focusing on the writing skills of Austrian pupils as demonstrated in the first nationwide educational standards tests. Based on approximately 20,000 long and short writing samples produced by around 10,000 pupils aged 14 to 15 and collected in the E8 baseline study in 2009, our research will initially focus on the following three issues:

· What are the statistical properties of norm-adequate linguistic features in the written manifestations of Austrian English learner language amongst the population of 14-year-old pupils?

· What are the statistical properties of non-norm-adequate linguistic features ('errors') in the written manifestations of Austrian English learner language amongst the population of 14-year-old pupils?

· Which of these features predict, in a statistical sense, the ratings awarded to the writing samples by the trained raters on the four dimensions of Task Achievement, Coherence and Cohesion, Grammar, and Vocabulary?
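
As a rough illustration of the third question, the relationship between one linguistic feature and one rating dimension could be explored with a simple correlation. The sketch below is only an assumption about how such an analysis might start, not the project's actual method: the function name `pearson`, the feature "errors per 100 words", the rating scale, and the toy values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (not project results): one value per writing sample.
errors_per_100_words = [12.0, 9.5, 7.0, 4.0, 2.5]
grammar_rating = [1, 2, 3, 5, 6]  # assumed scale, higher = better

r = pearson(errors_per_100_words, grammar_rating)
print(f"r = {r:.2f}")
```

A strongly negative value here would suggest that samples with more errors tend to receive lower Grammar ratings. A full analysis would involve many features at once and a proper regression model rather than a single pairwise correlation.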


However, to make this information accessible, the handwritten performances must first be transformed into a more user-friendly form, namely a language corpus. Language corpora are large, machine-readable collections of natural language whose main purpose is to serve as a representative basis for extracting linguistic data in a statistically meaningful way. The 1.7-million-word learner language corpus that will be constructed within the project will also be annotated for all the relevant features that characterize L2 writing skills, both norm-adequate and erroneous. The practical and theoretical ramifications of our research questions will be of great interest to society, education and interdisciplinary research, above and beyond the immediate benefits for language learning, teaching and testing.
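
To make the idea of an annotated, machine-readable corpus more concrete, the following minimal sketch shows one way transcribed writing samples and their error annotations might be represented and counted. It is an illustration under stated assumptions only: the class names `Token` and `WritingSample`, the tag labels `VERB_FORM` and `AGREEMENT`, and the example sentences are hypothetical and do not reflect the project's actual annotation scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    # Hypothetical annotation: None marks a norm-adequate token.
    error_tag: Optional[str] = None


@dataclass
class WritingSample:
    sample_id: str
    tokens: List[Token]


def error_frequencies(corpus):
    """Count how often each error tag occurs across all samples."""
    counts = Counter()
    for sample in corpus:
        for token in sample.tokens:
            if token.error_tag is not None:
                counts[token.error_tag] += 1
    return counts


def errors_per_100_words(sample):
    """Normalize the error count of one sample by its length in tokens."""
    n = len(sample.tokens)
    if n == 0:
        return 0.0
    errors = sum(1 for t in sample.tokens if t.error_tag is not None)
    return 100.0 * errors / n


# Invented example sentences, not taken from the E8 data.
corpus = [
    WritingSample("s1", [Token("I"), Token("goed", "VERB_FORM"), Token("home")]),
    WritingSample("s2", [Token("She"), Token("have", "AGREEMENT"),
                         Token("a"), Token("dog")]),
]

print(error_frequencies(corpus))
print(errors_per_100_words(corpus[1]))
```

Normalizing by sample length matters here because the corpus mixes long and short writing samples, so raw error counts would not be comparable across texts.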

Main category/categories Educational content (subject area)
Teaching and learning (processes and methods)
Information, communication, statistics
