ANNIS: Search and Visualization in Multilayer Linguistic Corpora

Research Projects Employing ANNIS and PAULA

  • Cooperations within SFB 632 "Information Structure":

    • A5: "Focus realization, focus interpretation, and focus use from a cross-linguistic perspective"
    • A6: "A constraint-based analysis of information structure in German, Spanish and French"
    • A8: "Structuring linguistic information using discourse particles"
    • B1: "The Interaction of Information Structure and Grammar in Gur and Kwa Languages" (data elicited with QUIS; project completed)
    • B2: "Information Structuring in Chadic Languages" (data elicited with QUIS; project completed)
    • B4: "The Role of Information Structure in the Development of Word Order Regularities in Germanic"
    • B6: "Grammatical Reduction and Information Structural Preferences in a Contact Variety of German: Kiezdeutsch"
    • B7: "Predicate-centered focus types: A sample-based typological study in African languages"
    • C1: "Contextually Licensed Non-canonical Word Order in Language Comprehension" (completed)
    • C6: "Experimental and Corpus Investigations of Information Structure in Hindi" (completed)
    • D1: "Linguistic Database for Information Structure: Annotation and Retrieval"
    • D2: "Typology of Information Structure" (completed)

  • Argument Structure in Texts - A comparative-typological joint project at the universities of Erfurt (Germany) and Pavia (Italy), working on a quantitative investigation of argument structure in Classical Greek and Yucatec Maya (ANNIS integration in progress)

  • Atomic - A versatile, platform-independent annotation tool developed at the University of Zurich and Friedrich Schiller University Jena, with a connection to ANNIS via SaltNPepper

  • BeMaTaC - The Berlin Map Task Corpus: A deeply annotated multimodal map-task corpus of spoken learner and native German

  • Modelling Textual Organisation: Coherence and Cohesion - a project at CLCG (Center for Language and Cognition, Groningen, NL), hosting a multilayer annotated text corpus of Dutch in ANNIS

  • Coptic SCRIPTORIUM (HU Berlin/University of the Pacific): A digital humanities project on resources for Sahidic Coptic manuscripts

  • DDB - Deutsch Diachrone Baumbank - A comparable treebank of Old, Middle and Early New High German

  • DDD - Deutsche Diachron Digital - Referenzkorpus Altdeutsch - a reference corpus of historical German texts (8th-13th century)

  • Falko - Fehlerannotiertes Lernerkorpus des Deutschen als Fremdsprache / An error-annotated learner corpus of German as a foreign language, HU Berlin

  • Friedrich-Schiller-Universität Jena:
  • Forschungsverbund Linguistik - Bioinformatik - Syntactic annotation of diachronic German language stages for the calculation of linguistic distance and phylogeny, HU Berlin (project completed)

  • Kobalt-DaF - Korpusbasierte Analyse von Lernertexten für Deutsch als Fremdsprache - a research network on German learner language

  • KOMeT - Korpuslinguistische Methoden für e-Philologie mit TEI - a junior researcher group in the Digital Humanities funded by the German Federal Ministry of Education and Research (BMBF)

  • KOMPOST - a BMBF project on the identification of competence indicators in schoolchildren's writing

  • LAUDATIO - Long-term Access and Usage of Deeply Annotated Information - A project working on sustainable repository storage for historical corpora at Humboldt-Universität zu Berlin

  • MASC - The Manually Annotated SubCorpus of the Open American National Corpus (OANC)

  • PROIEL (Oslo) - Pragmatic Resources in Old Indo-European Languages

  • Perseus Latin and Ancient Greek Treebank - dependency treebanks of Ancient Greek and Latin classics

  • Ramsès - an Egyptological project at the Université de Liège producing corpora and annotation tools for Late Egyptian texts

  • Referenzkorpus des Frühneuhochdeutschen - a reference corpus of Early New High German from 1350 to 1650

  • RIDGES - Register in Diachronic German Science: A project on the development of German as a language of science in the 16th-19th centuries, funded by two Google Digital Humanities Research Awards

  • Roman de Flamenca - a multilayer parallel corpus of the 13th-century Old Occitan narrative Le roman de Flamenca, compiled at Indiana University

  • sms4science - a multilingual corpus of multilayer-annotated text messages (SMS)

  • SUMMaR - Text Summarization Systems for Robust and High Quality Summaries, Potsdam (project completed)

  • The Anselm Project - "Questions by Saint Anselm about the Lord's Passion" - an interdisciplinary research project on a 14th-16th century German text at the Ruhr University Bochum

  • The Language Archive at the Max Planck Institute for Psycholinguistics (Nijmegen, NL), with discourse-annotated corpora of Dutch in ANNIS

  • University of Regensburg DFG project on Grammaticalization of Peripheral Subjects in Slavic Languages:
    • RRuDi - Regensburg Russian Diachronic Corpus
    • PolDi - Regensburg Polish Diachronic Corpus