Collecting and POS-tagging a lexical resource of Japanese biomedical terms from a corpus
This paper describes the methodology followed to create a morphologically tagged medical lexicon in Japanese. To build this medical resource, we took into account the morphosyntactic characteristics of the language as well as the origin and formation of its medical terms. We then compiled a term list using the Japanese MultiMedica corpus, special tags from a POS tagger, and several specialised medical dictionaries. After considering three different taggers (ChaSen, MeCab, Juman), we chose Juman for tagging the lexicon. The oversegmentation of Japanese terms was then corrected and the tags were normalised. This resource is the base component for the creation of a medical term extractor. ; This research has been funded by MINECO (under grant TIN2010-20644-C03-03) and by the Madrid Regional Government (grant MA2VICMR).
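The oversegmentation correction mentioned above can be pictured as a post-processing pass over tagger output. The sketch below is hypothetical: the token format and the POS labels are illustrative placeholders, not Juman's actual tag set, and the merge rule (join consecutive noun morphemes) is only one plausible strategy for rebuilding compound medical terms.

```python
def merge_noun_runs(tokens):
    """Re-join runs of noun morphemes split by the tagger into single
    lexicon entries, e.g. a compound medical term segmented into three
    morphemes becomes one (surface, "noun") entry.

    `tokens` is a list of (surface, pos) pairs, a simplified stand-in
    for real tagger output."""
    merged, buffer = [], []
    for surface, pos in tokens:
        if pos == "noun":
            buffer.append(surface)          # accumulate the noun run
        else:
            if buffer:                      # flush the pending compound
                merged.append(("".join(buffer), "noun"))
                buffer = []
            merged.append((surface, pos))
    if buffer:                              # trailing noun run
        merged.append(("".join(buffer), "noun"))
    return merged
```

For example, the morphemes of 高血圧症 ("hypertension") followed by a particle would collapse into a single noun entry plus the particle.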
BASE
Report on reusable documents as language resources in Spain, under the Government Plan for Language Technologies
This report was produced within the Spanish administration-driven Language Technologies Plan (Plan TL), funded by the Secretaría de Estado para el Avance Digital and Red.es. Its main goals are to compile an inventory of resources and open data held by Spanish public administrations that can be transformed into language resources, and to propose an action plan for processing and distributing them. We designed a specific methodology for listing the data and evaluating their degree of maturity. We created two listings: a preliminary collection of 101 resources, and a selection of 24 resources and data repositories chosen from the first list for detailed analysis and evaluation. The report also features a comparative analysis of similar initiatives and studies conducted abroad, and concludes with generic recommendations as well as detailed strategies for the selected resources. The report and listings are publicly available at Red.es and the Plan TL website.
; This report was funded by the Secretaría de Estado para el Avance Digital (SEAD) and Red.es.
A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine
Background: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances the access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. Methods: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models. Results: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) average F-measure. Conclusions: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html. The methods are generalizable to other languages with similar available sources. ; This work has been done under the NLPMedTerm project, funded by the European Union's Horizon 2020 research programme under the Marie Skłodowska-Curie grant agreement no. 713366 (InterTalentum UAM)
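The strict versus relaxed IAA scores reported above can be illustrated with a minimal sketch of pairwise F-measure over entity annotations, where one annotator's set is arbitrarily treated as the reference (F1 is symmetric in this pairwise setting). The entity representation, function name, and matching rules are hypothetical assumptions for illustration, not the corpus project's actual evaluation code.

```python
def f_measure(gold, pred, relaxed=False):
    """Pairwise F-measure between two annotators' entity lists.

    Each entity is a (start, end, label) tuple. Strict match requires
    identical spans and labels; relaxed match accepts overlapping spans
    with the same label."""
    def match(a, b):
        if relaxed:
            # same label and overlapping character spans
            return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]
        return a == b  # strict: identical span and label

    tp_pred = sum(any(match(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With one exact match and one span that only overlaps, the strict score is lower than the relaxed one, mirroring the gap between the 85.65% and 93.94% averages above.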
Combining Wikipedia and Newswire Texts for Question Answering in Spanish
4 pages, 1 figure. Contributed to: Advances in Multilingual and Multimodal Information Retrieval: 8th Workshop of the Cross-Language Evaluation Forum (CLEF 2007, Budapest, Hungary, Sep 19-21, 2007). ; This paper describes the adaptations of the MIRACLE group QA system for participation in the Spanish monolingual question answering task at QA@CLEF 2007. A system initially developed for the EFE collection was reused for Wikipedia. Answers from both collections were combined using temporal information extracted from the questions and collections. Reusing the EFE subsystem proved infeasible, and questions with answers only in Wikipedia obtained low accuracy. In addition, a co-reference module based on heuristics was introduced to process topic-related questions. This module achieves good coverage in different situations but is hindered by the moderate accuracy of the base system and the chaining of incorrect answers. ; This work has been partially supported by the Regional Government of Madrid under the Research Network MAVIR (S-0505/TIC-0267) and by projects of the Spanish Ministry of Education and Science (TIN2004/07083, TIN2004-07588-C03-02, TIN2007-67407-C03-01). ; Published