English | Pollux - Fachinformationsdienst Politikwissenschaft

The Slovene-English parallel corpus MaCoCu-sl-en 1.0 was built by crawling the ".si" internet top-level domain in 2021, extending the crawl dynamically to other domains as well. The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate, near-duplicate paragraphs, and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and BicleanerAI (https://github.com/bitextor/bicleaner-ai) and Bifixer (https://github.com/bitextor/bifixer) were used for fixing, cleaning, and deduplicating the final version of the corpus. While the TXT format consists solely of pairs of source and target segments (one or several sentences), each segment pair in the TMX format is accompanied by the following metadata:
- source and target document URL;
- quality score as provided by the tool BicleanerAI;
- translation direction identification: the source segment in each segment pair was identified by using a probabilistic model;
- personal information identification ("biroamer-entities"): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- language variants: the language variant of English (British or American) was identified for every segment pair at the document and domain level.
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
(1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
(2) Clearly identify the copyrighted work claimed to be infringed.
(3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material.
(4) Please write to the contact person for this resource whose email is available in the full item record.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author's view. The Agency is not responsible for any use that may be made of the information it contains.

Journal (print) #4

Kanadské listy: independent monthly in Czech, Slovak and English languages

Article (print) #5, 2004

Jina cesta k trhu. Hledani alternativy k soucasne podobe globalizace (Jan Placht's translation from English of Globalization and Its Discontents)

In: Politologický časopis, vol. 11, no. 1, pp. 88-92

Krsek, Ivo

ISSN: 1211-3247

Book (print) #6, 2002

Praktický anglicko-český a česko-anglický slovník pro podnikání a veřejnou správu

In: Jazykověda

Skálová, Eva

Book (print) #7, 2018

Tematický česko-anglický a anglicko-český soudnělékařský slovník

Beran, Michal; Dohnalová, Petra; Neureutterová, Klára

Book (print) #8, 2005

Česko-anglický právnický slovník

Oherová, Jana

Book (print) #9, 2014

Vojensko-technický slovník anglicko-český a česko-anglický

In: Militaria 35

Open Access #10, 2021

ESIC 1.0 -- Europarl Simultaneous Interpreting Corpus

In: http://hdl.handle.net/11234/1-3719

Macháček, Dominik; Žilinec, Matúš; Bojar, Ondřej

ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains the source English video and audio. The interpreters' voices are not published within the corpus, but there is a tool that downloads them from the website of the European Parliament, where they are publicly available. The transcripts are punctuated and come with metadata (disfluencies, mixing of voices and languages, read or spontaneous speech, etc.) and word-level timestamps. The speeches in the corpus come from the European Parliament plenary sessions in the period 2008-11. Most of the speakers are MEPs, both native and non-native speakers of English. The corpus contains metadata about the speakers (name, surname, id, fraction) and about the speech (date, topic, read or spontaneous). The current version of ESIC is v1.0. It has validation and evaluation parts.

Open Access #11, 2014

Parlamentarismus nebo poloprezidencialismus? Spor o klasifikaci středoevropských demokratických režimů ; Parliamentarism or semi-presidentialism? A dispute over classification of Central European democratic regimes

Brunclík, Miloš; Kubát, Michal

While reading academic papers and books on political regimes in Central Europe, one can become aware of an interesting and remarkable fact: these regimes (forms of government) are classified rather differently. Whereas some scholars tend to approach them as parliamentary regimes, others classify them as semi-presidential ones. The major dividing line between these two perspectives runs between a large group of English-writing scholars based outside Central Europe and those from Central Europe itself. Having reviewed a large number of relevant studies in this field, the authors of this article argue that the key reason for the different assessments of Central European regimes resides mainly in a different theoretical (but also methodological) approach, which has important implications when considering how these regimes are treated in various studies. Whereas the group of English-writing scholars tends to adopt a minimalist institutional definition suggested by Robert Elgie, most Central European scholars prefer an approach (inspired by Duverger or Sartori) that emphasizes presidential powers, which are irrelevant to Elgie's definition.

Article (electronic) #12, 14 July 2023

O STILISTIKI IN NJENEM POMENU: JEZIKOSLOVNA DISCIPLINA V DRUŽBI 21. STOLETJA

In: Teorija in praksa, pp. 203-220

Kalin Golob, Monika

Through analysis of the books on stylistics written in the English, German,
Czech, Slovak and Croatian languages, we describe the development of stylistics,
its predecessors, independence from literary science, and the contemporary
situation. We focus on Slovenian linguistic stylistics based on an
analysis and review of entries that include the keyword "stylistics" in the Slovenian
bibliographic catalogue Cobiss+. By reviewing and analysing the stylistic
publications of Tomo Korošec, who devoted the largest part of his research
to media stylistics, we substantiate his contribution to Slovenian theoretical
stylistics. The main finding of our comprehensive analysis is that stylistic
research in Slovenia has been intense since the 1960s, that an important part
of this research relates to the work of Tomo Korošec and that, alongside
theoretical stylistics, it is important to include school stylistics as part of general
education on all levels.
Keywords: linguistic stylistics, history of stylistics, media stylistics, journalistic
stylistics, stylistics of advertising, linguistic education, rhetoric

Open Access #13, 2021

Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1388. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech, morphological features, and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools. 
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project.

Open Access #14, 2021

Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1432. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech, morphological features, and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools. 
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. As opposed to the previous version 2.0, this version corrects some errors in various corpora and adds the information on upper / lower house for bicameral parliaments. The vertical files have also been changed to make them easier to use in the concordancers.

Article (print) #15, 2014