Data-Sharing | Pollux - Fachinformationsdienst Politikwissenschaft

The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other domains as well. The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate, near-duplicated paragraphs, and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and BicleanerAI (https://github.com/bitextor/bicleaner-ai) and Bifixer (https://github.com/bitextor/bifixer) were used for fixing, cleaning, and deduplicating the final version of the corpus.

While the TXT format consists solely of pairs of source and target segments (one or several sentences), each segment pair in the TMX format is accompanied by the following metadata:
- source and target document URL;
- quality score as provided by the tool BicleanerAI;
- translation direction identification: the source segment in each segment pair was identified by using a probabilistic model;
- personal information identification ("biroamer-entities"): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- language variants: the language variant of English (British or American) was identified for every segment pair on document and domain level.

Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
(1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
(2) Clearly identify the copyrighted work claimed to be infringed.
(3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material.
(4) Please write to the contact person for this resource whose email is available in the full item record.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author's view. The Agency is not responsible for any use that may be made of the information it contains.
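As a rough illustration of how the per-segment TMX metadata described above could be used, the following sketch reads segment pairs from a TMX string and filters them by quality score and by the personal-information flag. The sample data is hand-written, and the property names `quality-score` and `personal-info` are hypothetical placeholders, not the names used in the released corpus; check the `<prop type="...">` values in the actual file before adapting it.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical property names (assumptions, not taken from the corpus).
SCORE_PROP = "quality-score"
PERSONAL_PROP = "personal-info"

# Hand-written sample in TMX shape; not real corpus content.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="bg"/>
  <body>
    <tu>
      <prop type="quality-score">0.91</prop>
      <prop type="personal-info">No</prop>
      <tuv xml:lang="bg"><seg>Добър ден.</seg></tuv>
      <tuv xml:lang="en"><seg>Good afternoon.</seg></tuv>
    </tu>
    <tu>
      <prop type="quality-score">0.32</prop>
      <prop type="personal-info">No</prop>
      <tuv xml:lang="bg"><seg>Начало</seg></tuv>
      <tuv xml:lang="en"><seg>Click here to subscribe</seg></tuv>
    </tu>
    <tu>
      <prop type="quality-score">0.88</prop>
      <prop type="personal-info">Yes</prop>
      <tuv xml:lang="bg"><seg>Обадете се на Иван Петров.</seg></tuv>
      <tuv xml:lang="en"><seg>Call Ivan Petrov.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, min_score=0.0, keep_flagged=True):
    """Return segment pairs whose score is at least min_score.

    Pairs flagged as containing personal information are dropped
    when keep_flagged is False.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        # Collect the translation-unit properties into a dict.
        props = {p.get("type"): (p.text or "") for p in tu.findall("prop")}
        score = float(props.get(SCORE_PROP, "0"))
        flagged = props.get(PERSONAL_PROP, "No") == "Yes"
        if score < min_score or (flagged and not keep_flagged):
            continue
        # Map each language code to its segment text.
        segs = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in tu.findall("tuv")
        }
        pairs.append({
            "bg": segs.get("bg", ""),
            "en": segs.get("en", ""),
            "score": score,
            "flagged": flagged,
        })
    return pairs


if __name__ == "__main__":
    for pair in read_pairs(SAMPLE_TMX, min_score=0.5, keep_flagged=False):
        print(pair["score"], pair["bg"], "|", pair["en"])
```

Whether flagged segments should be kept depends on the intended use; the description leaves that decision to the users of the corpus.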

Access: Open Access

BASE

Open Access | #10 | 2021

Multilingual comparable corpora of parliamentary debates ParlaMint 2.0

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that there also exists the linguistically marked-up version of the corpus, which is available at http://hdl.handle.net/11356/1405.

Open Access | #11 | 2021

Multilingual comparable corpora of parliamentary debates ParlaMint 2.1

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that there also exists the linguistically marked-up version of the corpus, which is available at http://hdl.handle.net/11356/1431.

Open Access | #12 | 2021

Реализъм вместо догматизъм в политиката ("скенерът" срещу идеологиите) [Realism instead of dogmatism in politics (the "scanner" versus ideologies)]

Светослав СТАВРЕВ

When something or someone is declared "the best of everything possible", this is wrong both a priori and a posteriori. It is not useful for that "something" or "someone", because their natural competitors do not share that view, and they do not sleep. Nobody can stop development, regardless of its direction, under one set of assessment criteria or another. Political democracy was long ago declared to be such a perfect "something". But its credibility is beginning to fall exactly where it has taken strong roots. For its opponents, this has been known for a long time. There are now empirical data on the subject, but no convincing explanations of the data. The article offers a possible interpretation of this crisis of confidence in democracy. Specific tools developed by the author, which he calls the Scanner, were used for the analysis. The conclusions are: democratization as a process is supported by its actual effectiveness, and it is cyclical; there are many different democratic models competing with each other; uninterrupted, one-way democratic development has never existed anywhere; democratic content is most often helpless as a form of countering undemocratic aggression from outside. The text is not a requiem for democracy, but an attempt to breathe new life into the political process and to understand the emerging new phenomena in the world that today's mainstream understands as a subversion of modernity, for example in the USA (Trump), the UK (Brexit), China (Xi Jinping), Russia (Putin), the Philippines (Duterte), South Africa (Zuma), Turkey (Erdogan) and many other places in the 21st century.

Open Access | #13 | 2011

Informal patient payments and public attitudes towards these payments: evidence from six CEE countries

Stepurko, T; Pavlova, M; Gryga, I

Informal patient payments are deeply rooted in Central and Eastern European countries. Despite the socio-political changes in the health care sectors after the 1990s and the subsequent health care reforms, informal payments for health care services continue to serve patients' and physicians' interests. These payments also fill gaps in health care funding in this European region. Nevertheless, unofficial payments are not a desirable payment channel. They lack transparency and distort the efficiency and equity of health care provision. Still, the successful elimination of these payments will depend on public attitudes towards them. This study aims to compare public attitudes towards informal patient payments and payment experience in six Central and Eastern European countries: Bulgaria, Hungary, Lithuania, Poland, Romania, and Ukraine. The data were collected in 2010 in nationwide representative surveys using an identical standardized questionnaire administered via face-to-face interviews. We collected about 1000 questionnaires in each country. The results show that a major group of respondents in each country expresses a negative attitude towards both informal cash payments and in-kind gifts. In Ukraine, Romania, and Hungary, 208, 187, and 174 respondents respectively paid informally for out-patient services. We also analyse the relation between public attitudes and respondents' past experience with informal payments, e.g. whether they paid informally for an out-patient service used in the last year. In Bulgaria and Poland, a negative attitude is mostly observed among those who have not paid informally. The existence of positive and indifferent attitudes towards informal payments, as reported in our study, indicates a challenge for policy makers in Central and Eastern European countries. The acceptance of government initiatives aimed at the elimination of informal payments will largely depend on the governments' ability to create social resistance towards these payments.

Open Access | #14 | 2021

Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1388. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech, morphological features, and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools. 
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project.
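To give a feel for the CoNLL-U derivative mentioned above, here is a minimal sketch that parses the standard ten tab-separated CoNLL-U columns and counts Universal Dependencies part-of-speech tags. The sample sentence is invented for illustration and is not taken from ParlaMint; where the corpus stores named entities within these columns is not assumed here.

```python
from collections import Counter

# The ten standard CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Invented sample sentence (not from the corpus), built with explicit tabs.
ROWS = [
    ("1", "The", "the", "DET", "_", "Definite=Def|PronType=Art", "2", "det", "_", "_"),
    ("2", "chair", "chair", "NOUN", "_", "Number=Sing", "3", "nsubj", "_", "_"),
    ("3", "opens", "open", "VERB", "_", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "_"),
    ("4", "the", "the", "DET", "_", "Definite=Def|PronType=Art", "5", "det", "_", "_"),
    ("5", "session", "session", "NOUN", "_", "Number=Sing", "3", "obj", "_", "SpaceAfter=No"),
    ("6", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
]
SAMPLE = (
    "# sent_id = example.1\n"
    "# text = The chair opens the session.\n"
    + "\n".join("\t".join(row) for row in ROWS)
    + "\n\n"
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as sent_id or text
        cols = line.split("\t")
        # Skip multiword ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if len(cols) != len(FIELDS) or not cols[0].isdigit():
            continue
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sents = parse_conllu(SAMPLE)
    counts = Counter(tok["upos"] for sent in sents for tok in sent)
    print(counts.most_common())
```

For real work a dedicated CoNLL-U library would likely be more robust; the hand-rolled parser is only meant to show the column layout.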

Open Access | #15 | 2021

Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament; the speakers (name, gender, MP status, party affiliation, party coalition/opposition); are structured into time-stamped terms, sessions and meetings; with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1432. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech, morphological features, and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools. 
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. As opposed to the previous version 2.0, this version corrects some errors in various corpora and adds the information on upper / lower house for bicameral parliaments. The vertical files have also been changed to make them easier to use in the concordancers.
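The split between the "reference" and COVID-19 periods described above can be sketched as a simple date comparison. The cut-off date comes from the description; the function name and the label strings `covid` and `reference` are illustrative assumptions and may differ from the values used in the corpus metadata.

```python
from datetime import date

# Cut-off stated in the ParlaMint 2.1 description: COVID-19 period from 2019-11-01.
COVID_START = date(2019, 11, 1)


def period_label(session_date: date) -> str:
    """Return a hypothetical period label for a session date."""
    return "covid" if session_date >= COVID_START else "reference"


if __name__ == "__main__":
    for d in (date(2019, 10, 31), date(2019, 11, 1), date(2020, 6, 15)):
        print(d.isoformat(), period_label(d))
```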
