Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text


Schulz, Sarah ; Keller, Mareike


DOI: 10.18653/v1/W16-2105
URL: http://anthology.aclweb.org/W16-2105
Additional URL: http://aclweb.org/anthology/W16-2100
Document Type: Conference or workshop publication
Year of publication: 2016
Book title: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016) : August 11, 2016, Berlin, Germany
Page range: 43-51
Conference title: 10th SIGHUM Workshop
Location of the conference venue: Berlin, Germany
Date of the conference: 11.08.2016
Author/Publisher of the book (only the first ones mentioned): Reiter, Nils
Place of publication: Stroudsburg, PA
Publishing house: Association for Computational Linguistics
ISBN: 978-1-945626-09-8
Publication language: English
Institution: School of Humanities > Anglistik IV - Anglistische Linguistik/Diachronie (Trips 2006-)
Subject: 400 Language, linguistics
Individual keywords (German): Linguistische Annotation, Mittelenglisch, Latein
Keywords (English): linguistic annotation, POS tagging, code-switching, Middle English, Latin
Abstract: In this paper, we describe the development of a language identification system and a part-of-speech tagger for Latin-Middle English mixed text. To this end, we annotate data with language IDs and Universal POS tags (Petrov et al., 2012). We train a conditional random field classifier for both sub-tasks, including features generated by the TreeTagger models of both languages. The focus lies on both a general and a task-specific evaluation. Moreover, we describe our effort to move beyond proof-of-concept implementations of tools and towards a more task-oriented approach, showing how to apply our techniques in the context of Humanities research.

This entry is part of the university bibliography.




Citation example:

Schulz, Sarah ; Keller, Mareike (2016) Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016) : August 11, 2016, Berlin, Germany. Stroudsburg, PA, 2016, 43-51 [Conference or workshop publication]


