ANNIS: Search and Visualization in Multilayer Linguistic Corpora
New:
Version 3.1.6 has been released (GitHub log). Please install the new version.
New features in version 3.1.6
- Note: this is a bug fix version addressing minor issues in 3.1.4. The features below are available from 3.1.4.
- Backend (database):
- Support for aggregate queries
- Fetching of full text for documents
- Storage for individual corpus preferences
- Minor performance improvements
- Frontend (user interface):
- New frequency analysis interface (histograms and type counts based on combinations of query elements and their annotations)
- Document browser mode for close reading
- Get more/less context for individual search results
- Re-implemented key-word-in-context (KWIC) for better support of dialogue corpora and gaps
- Better handling of ‘islands’: search results containing very distant areas of a document hide intervening text more intuitively for the grid and KWIC visualizers
- Query language:
- Shortened query form with operators between elements:
(e.g. cat="NP" > cat="PP" or "hello" . "world")
- New non-binding value comparison operators:
tok . tok & #1 == #2 (finds a sequence of two identical tokens)
tok _=_ lemma & #1 != #2 (finds a token that is not identical to its lemma)
- Free naming of query nodes, e.g.
NP#cat="NP" & PP1#cat="PP" . PP2#cat="PP" & #NP > #PP1 & #NP > #PP2
- Extended support for brackets in disjunctions, e.g.:
cat="NP" & (head#cat="NP" | head#pos="NN") & #1 >[func="HD"] #head
(For change logs of previous versions see their respective distributions or user guides)
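To illustrate what the two value comparison queries above match, here is a minimal Python sketch. It is not ANNIS code: the function names and the toy data are hypothetical, and it assumes exactly one lemma annotation per token, which simplifies the `_=_` (identical coverage) operator.

```python
from typing import List, Tuple


def adjacent_identical(tokens: List[str]) -> List[Tuple[int, int]]:
    """Rough analogue of `tok . tok & #1 == #2`:
    index pairs of directly adjacent tokens with the same text."""
    return [
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if tokens[i] == tokens[i + 1]
    ]


def differs_from_lemma(tokens: List[str], lemmas: List[str]) -> List[int]:
    """Rough analogue of `tok _=_ lemma & #1 != #2`:
    indices of tokens whose text differs from their lemma.
    Assumes lemmas[i] annotates tokens[i] (hypothetical simplification)."""
    return [i for i, (tok, lem) in enumerate(zip(tokens, lemmas)) if tok != lem]


# Hypothetical toy data, for illustration only
tokens = ["the", "the", "cats", "sleep"]
lemmas = ["the", "the", "cat", "sleep"]

print(adjacent_identical(tokens))          # [(0, 1)]
print(differs_from_lemma(tokens, lemmas))  # [2]
```

Because the operators are non-binding, they only compare the values of two nodes that are already related by another operator (here `.` and `_=_`); the sketch mirrors this by first pairing the elements and then filtering on their values.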
What's new in ANNIS since ANNIS2?
Some of the new features in ANNIS3 include:
- Backend (database):
- Support for multiple/overlapping tokenizations (multiple simultaneous speakers, conflicting tokenizations produced by different tools)
- Support for subtokenization (annotations smaller than the reference word form unit)
- Incremental search result fetching
- Frontend (user interface):
- Completely new front-end architecture (using Vaadin)
- Build your own HTML visualization with CSS3
- Support for time-aligned streaming A/V data
- On-click navigation from annotations to time-aligned player output
- Page-aligned PDF-viewer
- View KWIC and context using multiple segmentation definitions (e.g. ±5 units of normalized/diplomatic text or ±5 units of speaker1/speaker2/…)
- Automatically or manually generated example queries per corpus on the welcome screen
- List of available metadata added to Corpus Explorer
- Support for “corpus collections” (thematic groups of corpora in corpora list)
- Support for server-side embedded fonts
- Visualizations can be hidden/shown by default, option to use grid as default view
- Query language and query interface:
- Typed precedence operators (e.g. #1 .sentence,1,2 #2)
- Support for user defined and randomly generated example queries
- Virtual keyboard
- Visible query links (copy & paste link from browser)
- Matching document count for search results (e.g. "105 hits in 22 documents")
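The typed precedence example in the list above, `#1 .sentence,1,2 #2`, can be pictured with a small Python sketch. It rests on an assumption about the operator's meaning: that the second node follows the first at a distance of 1 to 2 units, counted on the segmentation layer named `sentence`. The function name and the unit positions are hypothetical, and this is not how ANNIS evaluates queries internally.

```python
def precedes_within(pos1: int, pos2: int, min_dist: int, max_dist: int) -> bool:
    """Return True if the unit at pos2 follows the unit at pos1
    by at least min_dist and at most max_dist units on one layer.

    pos1 and pos2 are 0-based positions of two nodes on the same
    segmentation layer (for example, sentence indices).
    """
    return min_dist <= pos2 - pos1 <= max_dist


# Assumed reading of `#1 .sentence,1,2 #2`: node 2 lies 1 to 2 sentences after node 1
print(precedes_within(3, 4, 1, 2))  # True: directly following sentence
print(precedes_within(3, 5, 1, 2))  # True: two sentences later
print(precedes_within(3, 6, 1, 2))  # False: too far away
print(precedes_within(4, 3, 1, 2))  # False: wrong order
```

Naming the layer in the operator matters when a corpus has several tokenizations, because the same two nodes can be at different distances depending on which units are counted.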