An original method for segmenting large chronological text corpora into homogeneous periods. First, vocabulary growth is computed and fitted with a trend. A segmentation algorithm, combined with validity tests, then yields the optimal division of the corpus into distinct periods. A series of indicators measures the extent of the vocabulary shifts that characterize each period. The method is applied to the "Queen's speeches" delivered by the Quebec government at the opening of each session of the provincial parliament from 1867 to 2009.
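The first step described above (computing vocabulary growth and fitting a trend to it) can be illustrated with a minimal Python sketch. The abstract does not say which trend function or tokenization is used, so the power-law trend V = k * N^b fitted in log-log space, the whitespace tokenizer, and the function names `vocabulary_growth`, `fit_power_trend` and `trend_residuals` are all assumptions made for illustration. The segmentation algorithm and its validity tests are not sketched here.

```python
import math


def vocabulary_growth(texts):
    """Return (cumulative tokens, cumulative distinct words) after each text.

    Texts are assumed to be in chronological order; tokenization is a
    naive lowercase whitespace split (an assumption, not the paper's method).
    """
    seen = set()
    n_tokens = 0
    points = []
    for text in texts:
        tokens = text.lower().split()
        n_tokens += len(tokens)
        seen.update(tokens)
        points.append((n_tokens, len(seen)))
    return points


def fit_power_trend(points):
    """Least-squares fit of log V = log k + b * log N (assumed trend form)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    k = math.exp(mean_y - b * mean_x)
    return k, b


def trend_residuals(points, k, b):
    """Observed vocabulary size minus the fitted trend at each point."""
    return [v - k * n ** b for n, v in points]


if __name__ == "__main__":
    speeches = ["a b c", "a b d", "e f g"]  # toy stand-ins for yearly speeches
    growth = vocabulary_growth(speeches)
    k, b = fit_power_trend(growth)
    print(growth)
    print(trend_residuals(growth, k, b))
```

Under these assumptions, runs of residuals that stay above or below the trend would be the natural input for a segmentation step, since they mark stretches where new vocabulary arrives faster or slower than expected.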
The corpus comprises the transcripts of 56 face-to-face TV interviews, 14 hours in total, taken from several broadcasts of the Italian political talk show Mezz'ora, aired on the Rai 3 channel from 24 September 2017 to 14 January 2018. The show follows a fixed format in which a journalist, Lucia Annunziata, interviews a guest, typically a prominent figure in the political or cultural scene (such as Matteo Renzi, Luigi Di Maio, Pierluigi Bersani, Walter Veltroni, Alessandro Di Battista, Angelino Alfano, Matteo Salvini, etc.). The audio signal was transcribed using a semi-supervised speech-to-text methodology (Google API + manual correction). Annotation uses XML as the markup language and follows the TEI standard for Speech Transcripts, with the utterance as the basic unit. The linguistic resource currently contains 100,870 tokens.

For each interview, the following information was manually annotated and is included in the XML resource file (each file is named after the broadcast date, and its description lists the names of the guests interviewed):
1. metadata: information for the quick identification of transcriptions, for example the tools used for the transcription, a link to the interview, the owner account, the title of the talk show, the date of airing, the guests, etc.;
2. pause: this tag marks a pause either between or within utterances. Speakers differ greatly in their rhythm, and in particular in the amount of time they leave between words, so this element is provided to mark occasions where the transcriber judges that speech has been paused, irrespective of the actual amount of silence;
3. vocal: this tag marks any vocalized but not necessarily lexical phenomenon, for example non-lexical expressions (e.g. burp, click, throat, etc.) and semi-lexical expressions (e.g. ah, aha, aw, eh, ehm, etc.);
4. del: phenomena of speech management include false starts, repetitions, and truncated words. These are included in the transcription but, following the TEI Guidelines, marked as editorially deleted and therefore indicated with the tag del;
5. overlap: this phenomenon occurs when the speaker conveys (verbally or non-verbally) that he/she is about to finish his/her turn and the co-locutor starts speaking, so that there is a slight overlap of utterances.

Only for interviews longer than 50 turns, a second level of annotation was added automatically using the ANVIL software (Kipp, 2001), inspired by the MUMIN annotation scheme (Allwood et al., 2007). These files, listed as "name surname", provide an alignment of the transcript with the original audio-video source (accessible from the link in the metadata). The gestures annotated, as described in (Allwood et al., 2007), are summarized below:
1. facial displays: these refer to timed changes in eyebrow position, expressions of the mouth, and movements of the head and of the eyes (Cassell et al., 2000). The coding scheme includes features describing gestures and movements of the various parts of the face, with values that are either semantic categories such as Smile or Scowl, or direction indications such as Up or Down;
2. hand gesture: the annotation follows a simplification of the scheme from the McNeill Lab (Duncan, 2004). The features, 7 in total, concern Handedness and Trajectory, distinguishing between single-handed and double-handed gestures and among a number of simple trajectories, analogous to what is done for gaze movement. The value Complex is intended to capture movements in which several trajectories are combined;
3. body posture: this tag comprises trajectory indications for the movement of the trunk. The categories are mutually exclusive to facilitate the annotation work.
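To make the first annotation level more concrete, the following Python sketch parses one utterance carrying pause, vocal and del tags and counts the tokens that remain once editorially deleted material and vocal events are set aside. The sample utterance, the `who` value and the function name `utterance_stats` are hypothetical, since the corpus files themselves are not reproduced here. The sample also omits the TEI namespace (`http://www.tei-c.org/ns/1.0`) that real TEI files normally declare, so the tag names would need adjusting for actual data.

```python
import xml.etree.ElementTree as ET

# Hypothetical utterance, written for illustration only (not taken from the corpus).
SAMPLE = (
    '<u who="#guest">io <del type="repetition">io</del> penso <pause/> che '
    "<vocal><desc>ehm</desc></vocal> sia giusto</u>"
)


def utterance_stats(xml_string):
    """Count pause, vocal and del elements and collect the retained tokens.

    Retained tokens are the text directly inside <u>: the leading text plus
    the tail text after each child element. Text inside <del> and <vocal>
    is deliberately left out.
    """
    u = ET.fromstring(xml_string)
    kept = (u.text or "").split()
    counts = {"pause": 0, "vocal": 0, "del": 0}
    for child in u:
        if child.tag in counts:
            counts[child.tag] += 1
        kept.extend((child.tail or "").split())
    return {"speaker": u.get("who"), "tokens": kept, **counts}


if __name__ == "__main__":
    print(utterance_stats(SAMPLE))
```

Whether deleted material should count toward a token total is a design choice. This sketch excludes it, which may differ from how the reported 100,870 tokens were counted.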
In: Schweizerische Ärztezeitung: SÄZ; official journal of the FMH and FMH Services = Bulletin des médecins suisses: BMS = Bollettino dei medici svizzeri, vol. 86, no. 16, pp. 974-975