Student Work on ANNIS3
If you are a student in Berlin, Potsdam or Washington DC and are interested in writing a final paper (Studienarbeit, Diplomarbeit, QP, etc.) or working part time as a student assistant (SHK-Stelle) within the ANNIS project, please let us know! We're always looking for people with good database, XML and Java programming skills, and our staff is always learning and experimenting with the latest tools and methods.
Work supervised within the project so far includes:
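- Karsten Hütter: Entwicklung einer Benutzerschnittstelle für die Suche in linguistischen mehrebenen Korpora unter Betrachtung softwareergonomischer Gesichtspunkte ("Development of a User Interface for Searching Multi-Level Linguistic Corpora with Regard to Software Ergonomics"; supervised by Ulf Leser, Hartmut Wandke) [Exposé]
- Florian Zipser: Entwicklung eines (Meta-)Modells für linguistisch annotierte Daten ("Development of a (Meta-)Model for Linguistically Annotated Data"; supervised by Ulf Leser, Anke Lüdeling) [Diplomarbeit - Exposé]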
- Viktor Rosenfeld: An Implementation of the Annis 2 Query Language (supervised by Ulf Leser) [Studienarbeit]
- Viktor Rosenfeld: Implementation of a Linguistic Query Language on top of a Column-Oriented Main-Memory Database (supervised by Ulf Leser, Stefan Manegold) [Diplomarbeit - Exposé]
Some open topics we would like to encourage students to work on include:
Search
- Negation of relational operators with and without an implied existence assumption (e.g. looking for nodes dominated by nodes that do not meet a criterion, or for nodes that are not dominated by certain nodes, and similarly for other operators)
- Extending query functionality with aggregate functions and statistical output (counting different features of nodes and edges, performing mathematical operations on them, and outputting the results in suitable data structures)
- Extending AQL with new types of value matches, especially numerical operators
- Modelling 'empty' tokens: how should the system behave for annotations that apply to no text at all, or that fall between tokens? (e.g. linguistically motivated traces, pro-forms ...)
- Developing export/re-import workflows for correcting and extending annotations
Visualization
- Visualizing parallel corpora (correspondences between flat text in multiple languages as well as parallel graphs involving higher structures)
- Creating user-friendly, customizable statistical views of aggregate data
- Expressing part-whole relationships between tokens and subtokens with hierarchically conflicting spans
- Visualization of transitive pointing relations between tokens, such as dependency edges
- User-oriented optimization and customization of the interface
Take a look at our public GitHub site to see what we've been working on!