Open Access BASE2019

Parla-CLARIN: TEI guidelines for corpora of parliamentary proceedings

Abstract

Parliamentary proceedings (PP) are a rich source of data used by scholars in, for example, historiography, sociology, political science, linguistics, economics and economic history. Unlike the sources of most other language corpora, PP are not subject to copyright or personal privacy protections and are typically available online, which makes them ideal for compilation into corpora and open distribution. For these reasons many countries have already produced PP corpora, but each typically uses its own encoding, which limits their comparability and their use in a multilingual setting. The talk will give an overview of current approaches to encoding PP, with a focus on TEI and TEI-like encodings, on Akoma Ntoso, a standard specifically designed for encoding PP and other legislative documents, and on RDF, which is also a common approach to encoding PP. We then motivate and propose a TEI ODD (i.e. a schema parametrisation together with guidelines) for such corpora, based on the TEI module for Transcriptions of Speech. Work on this Parla-CLARIN recommendation started with the "CLARIN ParlaFormat" workshop (cf. https://www.clarin.eu/blog/clarin-parlaformat-workshop), at which selected participants presented their own experiences with encoding parliamentary corpora and commented on the authors' draft proposal. These comments have largely been taken into account, and the current Parla-CLARIN recommendation is available at https://github.com/clarin-eric/parla-clarin. The Git repository contains the ODD, the derived HTML guidelines and XML schemas, and example documents. The recommendation covers the encoding of PP metadata, including the speakers and political parties, the structure of the corpus, the encoding of the speeches and notes, linguistic annotation and multimedia. The talk concludes by discussing further work, especially the provision of a set of example documents, the conversion of Akoma Ntoso and RDF encoded PP into Parla-CLARIN and vice versa, and other transformation scripts that would operationalise the proposed ...
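
As a rough illustration of what an encoding based on the TEI module for Transcriptions of Speech can look like in practice, the following minimal sketch reads a small TEI-style fragment and lists each utterance with its speaker. It relies only on the general TEI elements `u` (utterance, with a `who` pointer to the speaker), `seg` and `note`. The fragment, the speaker identifiers and the function name `extract_speeches` are invented for illustration and are not taken from the Parla-CLARIN example documents, so the actual recommendation may structure speeches differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI}

# Hypothetical fragment: the speaker ids and the text are made up for this sketch.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0">
  <note>The Chair:</note>
  <u who="#ChairPerson">
    <seg>The sitting is open.</seg>
    <seg>We begin with the first item.</seg>
  </u>
  <note>(Applause)</note>
  <u who="#MemberA">
    <seg>Thank you, Chair.</seg>
  </u>
</div>
"""


def extract_speeches(xml_text):
    """Return a list of (speaker_id, [segment_text, ...]) in document order."""
    root = ET.fromstring(xml_text)
    speeches = []
    for u in root.iter(f"{{{TEI}}}u"):
        # The who attribute is a pointer such as "#ChairPerson"; drop the leading "#".
        speaker = u.get("who", "").lstrip("#")
        # Collect the text of each direct child seg, normalising whitespace.
        segments = [
            " ".join("".join(seg.itertext()).split())
            for seg in u.findall("tei:seg", TEI_NS)
        ]
        speeches.append((speaker, segments))
    return speeches


if __name__ == "__main__":
    for speaker, segments in extract_speeches(SAMPLE):
        print(speaker, "|", " ".join(segments))
```

Notes such as the applause remark sit outside the utterances in this sketch, so they are skipped when only the `u` elements are collected.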
