Code-switching ubique est - Language identification and part-of-speech tagging for historical mixed text

DOI:	https://doi.org/10.18653/v1/W16-2105
URL:	http://anthology.aclweb.org/W16-2105
Additional URL:	http://aclweb.org/anthology/W16-2100
Document type:	Conference publication
Year of publication:	2016
Book title:	Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH 2016) : August 11, 2016, Berlin, Germany
Page range:	43-51
Event title:	10th SIGHUM Workshop
Event location:	Berlin, Germany
Event date:	August 11, 2016
Editor:	Reiter, Nils
Place of publication:	Stroudsburg, PA
Publisher:	Association for Computational Linguistics
ISBN:	978-1-945626-09-8
Language of publication:	English
Institution:	Philosophische Fakultät > Anglistik IV - Anglistische Linguistik/Diachronie (Trips 2006-)
Subject:	400 Language, linguistics
Keywords (German):	Linguistische Annotation, Mittelenglisch, Latein
Keywords (English):	linguistic annotation, POS tagging, code-switching, Middle English, Latin
Abstract:	In this paper, we describe the development of a language identification system and a part-of-speech tagger for Latin-Middle English mixed text. To this end, we annotate data with language IDs and Universal POS tags (Petrov et al., 2012). We train a conditional random field classifier for both sub-tasks, including features generated by the TreeTagger models of both languages. The focus lies on both a general and a task-specific evaluation. Moreover, we describe our efforts to move beyond proof-of-concept implementations of tools towards a more task-oriented approach, showing how to apply our techniques in the context of Humanities research.

This entry is part of the university bibliography.

Authors:	Schulz, Sarah ; Keller, Mareike
