Search results
11 results
Bulgarian-English parallel corpus MaCoCu-bg-en 1.0
In: http://hdl.handle.net/11356/1521
The Bulgarian-English parallel corpus MaCoCu-bg-en 1.0 was built by crawling the ".bg" and ".бг" internet top-level domains in 2021, extending the crawl dynamically to other domains as well. The entire crawling process was carried out by the MaCoCu crawler (https://github.com/macocu/MaCoCu-crawler). Websites containing documents in both target languages were identified and processed using the tool Bitextor (https://github.com/bitextor/bitextor). Considerable effort was devoted to cleaning the extracted text to provide a high-quality parallel corpus. This was achieved by removing boilerplate, near-duplicate paragraphs, and documents that are not in one of the targeted languages. Document and segment alignment as implemented in Bitextor were carried out, and BicleanerAI (https://github.com/bitextor/bicleaner-ai) and Bifixer (https://github.com/bitextor/bifixer) were used for fixing, cleaning, and deduplicating the final version of the corpus.
While the TXT format consists solely of pairs of source and target segments (one or several sentences), each segment pair in the TMX format is accompanied by the following metadata:
- source and target document URL;
- quality score as provided by the tool BicleanerAI;
- translation direction identification: the source segment in each segment pair was identified by using a probabilistic model;
- personal information identification ("biroamer-entities"): segments containing personal information are flagged, so final users of the corpus can decide whether to use these segments;
- language variants: the language variant of English (British or American) was identified for every segment pair at document and domain level.
Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please:
(1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted.
(2) Clearly identify the copyrighted work claimed to be infringed.
(3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material.
(4) Please write to the contact person for this resource whose email is available in the full item record.
We will comply with legitimate requests by removing the affected sources from the next release of the corpus.
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author's view. The Agency is not responsible for any use that may be made of the information it contains.
BASE
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (from November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and the speakers (name, gender, MP status, party affiliation, party coalition/opposition); they are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1432. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech tags, morphological features and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools.
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.1 release of the data and scripts available at the GitHub repository of the ParlaMint project. Compared with the previous version 2.0, this version corrects some errors in various corpora and adds information on the upper / lower house for bicameral parliaments. The vertical files have also been changed to make them easier to use in the concordancers.
BASE
Arquitectura e [ciber] feminismo. Unha intersección coa socioloxía e o xénero ; Architecture and [cyber]feminism. An intersection with sociology and gender
[Resumo] Facer arquitectura é significar, é un acto político; ten unha dimensión social. O entendemento social da produción arquitectónica significa abordar o estudo dos grupos sociais en relación á disciplina. Nun contexto de desigualdade, identificar os sesgos culturais resulta clave á hora de promover valores contemplados nos dereitos humanos como a igualdade de xénero. Nesta procura, a chegada das TIC supoñen un punto de inflexión: a democratización das tecnoloxías da información e o nacemento de novos espazos globais de comunicación veñen representando unha oportunidade inédita para a difusión e o encontro de arquitectas, investigadoras e activistas na posta en cuestión do discurso oficial da arquitectura. [Abstract] To make architecture is to give meaning; it is a political act with a social dimension. The social understanding of architectural production means tackling the study of social groups in relation to the discipline. In a context of inequality, identifying cultural biases becomes key to promoting values enshrined in human rights, such as gender equality. In this pursuit, the arrival of ICT marks a turning point: the democratization of information technologies and the emergence of new global communication spaces represent an unprecedented opportunity for outreach and for bringing together women architects, researchers and activists in questioning the official discourse of architecture.
BASE
Multilingual comparable corpora of parliamentary debates ParlaMint 2.0
ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and the speakers (name, gender, MP status, party affiliation, party coalition/opposition); they are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that a linguistically marked-up version of the corpus also exists and is available at http://hdl.handle.net/11356/1405.
BASE
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
ParlaMint 2.1 is a multilingual set of 17 comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after November 1st 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and the speakers (name, gender, MP status, party affiliation, party coalition/opposition); they are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the ParlaMint TEI-encoded corpora with the derived plain text version of the corpus along with TSV metadata on the speeches. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project. Note that a linguistically marked-up version of the corpus also exists and is available at http://hdl.handle.net/11356/1431.
BASE
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting in 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive metadata, including aspects of the parliament and the speakers (name, gender, MP status, party affiliation, party coalition/opposition); they are structured into time-stamped terms, sessions and meetings, with speeches being marked by the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. Note that some corpora have further information, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), but have been validated against the compatible but much stricter ParlaMint schemas. This entry contains the linguistically marked-up version of the corpus, while the text version is available at http://hdl.handle.net/11356/1388. The ParlaMint.ana linguistic annotation includes tokenization, sentence segmentation, lemmatisation, Universal Dependencies part-of-speech tags, morphological features and syntactic dependencies, and the 4-class CoNLL-2003 named entities. Some corpora also have further linguistic annotations, such as PoS tagging or named entities according to language-specific schemes, with their corpus TEI headers giving further details on the annotation vocabularies and tools.
The compressed files include the ParlaMint.ana XML TEI-encoded linguistically annotated corpus; the derived corpus in CoNLL-U with TSV speech metadata; and the vertical files (with registry file), suitable for use with CQP-based concordancers, such as CWB, noSketch Engine or KonText. Also included is the 2.0 release of the data and scripts available at the GitHub repository of the ParlaMint project.
BASE
Factores determinantes da disposición a pagar por recursos naturais. O caso da lagoa e o areal de Valdoviño ; Determinants of visitors' willingness to pay for natural resources. Case study of lagoon and sandy area of Valdoviño ; Factores determinantes de la disposición a pagar de los visitantes del ...
O método de valoración continxente (MVC) baséase na construción de mercados hipotéticos a través dunha enquisa, onde normalmente se lles pregunta aos entrevistados pola súa disposición a pagar (DAP) por un determinado ben que carece de mercado e, polo tanto, de prezo. O obxectivo principal deste traballo é identificar as variables explicativas que condicionan a DAP dos visitantes por gozar dun dos principais elementos do patrimonio natural de Galicia: o conxunto formado pola lagoa e o areal de Valdoviño. A información obtida pode ser de grande utilidade na análise custo-beneficio como fundamento das decisións políticas que afectan á xestión deste recurso natural. ; The contingent valuation method (CVM) is based on the construction of hypothetical markets through a survey in which respondents are usually asked for their maximum willingness to pay (WTP) for a certain good that lacks a market and, therefore, a price. The main aim of this study is to identify the explanatory variables that condition visitors' WTP to enjoy one of the main elements of Galicia's natural heritage: the ensemble formed by the lagoon and the sandy area of Valdoviño. The information obtained may be very useful in cost-benefit analysis as a basis for political decisions that affect the management of this natural resource. ; El método de valoración contingente (MVC) se basa en la construcción de mercados hipotéticos a través de una encuesta, donde normalmente se pregunta a los entrevistados por su disposición a pagar (DAP) por un determinado bien que carece de mercado y, por tanto, de precio. El objetivo principal de este trabajo es identificar las variables explicativas que condicionan la DAP de los visitantes por disfrutar de uno de los principales elementos del patrimonio natural de Galicia: el conjunto formado por la laguna y el arenal de Valdoviño.
La información obtenida puede ser de gran utilidad en el análisis coste-beneficio como fundamento de las decisiones políticas que afectan a la gestión de este recurso natural.
BASE