St��der | Pollux - Fachinformationsdienst Politikwissenschaft

The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other domains as well. The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor).

Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate and near-duplicated paragraphs, as well as documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and BicleanerAI (https://github.com/bitextor/bicleaner-ai) and Bifixer (https://github.com/bitextor/bifixer) were used for fixing, cleaning, and deduplicating the final version of the corpus.

While the TXT format consists solely of pairs of source and target segments (one or several sentences), each segment pair in the TMX format is accompanied by the following metadata:
- source and target document URL;
- quality score as provided by the tool BicleanerAI;
- translation direction identification: the source segment in each segment pair was identified by using a probabilistic model;
- personal information identification ("biroamer-entities"): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- language variants: the language variant of English (British or American) was identified for every segment pair on document and domain level.

Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
(1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
(2) Clearly identify the copyrighted work claimed to be infringed.
(3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material.
(4) Please write to the contact person for this resource whose email is available in the full item record.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author's view. The Agency is not responsible for any use that may be made of the information it contains.
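
The per-segment TMX metadata described in this abstract (a quality score and a personal-information flag) lends itself to filtering segment pairs before use. The following is a minimal Python sketch of that idea, not the corpus authors' tooling. The sample document is invented, the property name "score" is hypothetical, and the "Yes"/"No" values for "biroamer-entities" are an assumption; the actual property names and values should be checked in the released TMX file.

```python
import xml.etree.ElementTree as ET

# Attribute key that ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample. The prop type "score" and the "Yes"/"No" values are
# assumptions made for illustration, not taken from the real corpus.
SAMPLE_TMX = """<tmx version="1.4">
  <body>
    <tu>
      <prop type="score">0.93</prop>
      <prop type="biroamer-entities">No</prop>
      <tuv xml:lang="bg"><seg>Добър ден.</seg></tuv>
      <tuv xml:lang="en"><seg>Good afternoon.</seg></tuv>
    </tu>
    <tu>
      <prop type="score">0.31</prop>
      <prop type="biroamer-entities">No</prop>
      <tuv xml:lang="bg"><seg>Начало</seg></tuv>
      <tuv xml:lang="en"><seg>Contact us</seg></tuv>
    </tu>
    <tu>
      <prop type="score">0.88</prop>
      <prop type="biroamer-entities">Yes</prop>
      <tuv xml:lang="bg"><seg>Обадете се на Иван.</seg></tuv>
      <tuv xml:lang="en"><seg>Call Ivan.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text):
    """Return one dict per translation unit: its props and its segments by language."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        # Collect every <prop> directly under <tu>, whatever its type is called.
        meta = {p.get("type"): (p.text or "") for p in tu.findall("prop")}
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            segments[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        pairs.append({"meta": meta, "segments": segments})
    return pairs


def keep_pair(pair, min_score=0.5, score_prop="score", pii_prop="biroamer-entities"):
    """Keep a pair if its score reaches min_score and it is not flagged for personal data."""
    meta = pair["meta"]
    score = float(meta.get(score_prop, "0"))
    flagged = meta.get(pii_prop, "No").strip().lower() == "yes"
    return score >= min_score and not flagged


pairs = read_pairs(SAMPLE_TMX)
kept = [p for p in pairs if keep_pair(p)]
for p in kept:
    print(p["segments"]["bg"], "|", p["segments"]["en"])
```

The property names are passed as parameters so that the filter can be pointed at whatever names the real file uses. The threshold of 0.5 is arbitrary and only serves the example.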

Access (Open Access)

BASE

Book (print) #141979

NRB--GDR, druzhba i sŭtrudnichestvo

Aleksandrov, Emil; Petkov, Petko; @Bŭlgarska akademii︠a︡ na naukite, Sofia / Institut po mezhdunarodni otnoshenii︠a︡ i sot︠s︡ialisticheska integrat︠s︡ii︠a︡; @Akademie für Staats- und Rechtswissenschaft der DDR / Institut für Internationale Beziehungen; Institut za mezhdunarodni otnoshenii︠a︡ i sot︠s︡ialisticheska integrat︠s︡ii︠a︡; Institut za mezhdunarodni otnoshenii︠a︡ kŭm Akademii︠a︡ta

Open Access #152021

Multilingual comparable corpora of parliamentary debates ParlaMint 2.0

Erjavec, Tomaž; Ogrodniczuk, Maciej; Osenova, Petya; Ljubešić, Nikola; Simov, Kiril; Grigorova, Vladislava; Rudolf, Michał; Pančur, Andrej

ParlaMint is a multilingual set of comparable corpora containing parliamentary debates, mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and of the speakers (name, gender, MP status, party affiliation, party coalition/opposition). They are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible, but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. A linguistically marked-up version of the corpus also exists and is available at http://hdl.handle.net/11356/1405.
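
To illustrate how speeches marked with a speaker and a role might be read from such TEI files, here is a minimal Python sketch. It rests on assumptions that should be verified against the Parla-CLARIN recommendation and the ParlaMint schemas: that each speech is a `u` element in the TEI namespace, that the speaker reference is in its `who` attribute and the role in its `ana` attribute, and that the spoken text sits in `seg` children. The sample document and the identifiers in it are invented.

```python
import xml.etree.ElementTree as ET

# TEI namespace in ElementTree's "{uri}tag" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample. Element and attribute names are assumptions about the
# encoding, and the speaker and role identifiers are hypothetical.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div type="debateSection">
    <u who="#SpeakerA" ana="#chair">
      <seg>The sitting is open.</seg>
    </u>
    <u who="#SpeakerB" ana="#regular">
      <seg>Thank you, Chair.</seg>
      <kinesic type="applause"><desc>Applause</desc></kinesic>
      <seg>I will be brief.</seg>
    </u>
  </div></body></text>
</TEI>"""


def speeches(tei_text):
    """Return speaker, role and text for every speech, skipping transcriber comments."""
    root = ET.fromstring(tei_text)
    result = []
    for u in root.iter(TEI + "u"):
        # Only direct <seg> children are read, so comment elements such as
        # the applause marker above do not end up in the speech text.
        parts = [" ".join("".join(seg.itertext()).split()) for seg in u.findall(TEI + "seg")]
        result.append({
            "who": u.get("who", "").lstrip("#"),
            "role": u.get("ana", "").lstrip("#"),
            "text": " ".join(parts),
        })
    return result


for speech in speeches(SAMPLE_TEI):
    print(speech["who"], speech["role"], speech["text"], sep="\t")
```

Joining only the `seg` children is a design choice for this sketch: it yields the spoken words alone. A reader interested in interruptions or applause would collect the comment elements as well.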

Access (Open Access)

BASE

Godišnik na Sofijskija universitet "Sv. Kliment Ochridski": Annual of Sofia University "St. Kliment Ohridski" = Annuaire de l'Université de Sofia "St. Kliment Ohridski". Stopanski fakultet = Faculty of Economics and Business Administration = Faculté des sciences économiques et de géstion

Godišnik na Sofijskija universitet "Sv. Kliment Ochridski": Annual of Sofia University "St. Kliment Ohridski". Filosofski fakultet = Faculty of Philosophy. Socjologija = Sociology

Bălgarskata pravoslavna cărkva i dăržavnata vlast v knjažestvo/carstvo Bălgarija 1878-1912 g.: institucionalni otnošenija

Godišnik na Sofijskija Universitet "Sv. Kliment Ochridski": Annual of Sofia University "St. Kliment Ohridski". Filosofski fakultet = Faculty of Philosophy. Socjologija = Sociology

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l'Université de Sofia St. Kliment Ohridski. Katedra po Političeska Ikonomija = Chaire d'Economie Politique

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l'Université de Sofia St. Kliment Ohridski. Stopanski Fakultet = Faculté des Sciences Economiques et de Gestion

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l'Université de Sofia St. Kliment Ohridski. Filosofski Fakultet = Faculté de Philosophie. Kniga politologija = Livre politologie

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l' Université de Sofia St. Kliment Ohridski. Filosofski Fakultet = Faculté de Philosophie. Kniga političeski nauki = Livre political sciences

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l' Université de Sofia St. Kliment Ohridski. Filosofski Fakultet = Faculté de Philosophie. Katedra Socialnopolitičeski Sistemi = Cathedre Systémes Sociaux-Politiques

Godišnik na Sofijskija Universitet Sv. Kliment Ochridski: Annuaire de l'Université de Sofia St. Kliment Ohridski. Problemna Naučnoizsledovatelska Laboratorija za Političeskija Život na Bălgarina = Laboratoire de Recherches Scientifiques sur la Politique des Bulgares

Juridičeski rečnik: trudovo pravo, dăržavno pravo, meždunarodno pravo, sădoustrojstvo i graždansko sădoproizvodstvo

Bulgarian-English parallel corpus MaCoCu-bg-en 1.0

NRB--GDR, druzhba i sŭtrudnichestvo

Multilingual comparable corpora of parliamentary debates ParlaMint 2.0
