Data Similarity in Classification and Fictitious Training Data Generation
In: Operations Research Proceedings 2008, pp. 395-400
In: (2023) 109 Iowa Law Review, Forthcoming
SSRN
In: ACM journal on computing and sustainable societies, Volume 2, Issue 1, pp. 1-18
ISSN: 2834-5533
A growing body of work has focused on text classification methods for detecting the increasing amount of hate speech posted online. This progress has been limited to a select number of highly resourced languages, causing detection systems either to under-perform or not to exist in limited data contexts. This is mostly caused by a lack of training data, which is expensive to collect and curate in these settings. In this work, we propose a data augmentation approach that addresses the lack of data for online hate speech detection in limited data contexts using synthetic data generation techniques. Given a handful of hate speech examples in a high-resource language such as English, we present three methods to synthesize new examples of hate speech data in a target language that retain the hate sentiment of the original examples but transfer the hate targets. We apply our approach to generate training data for hate speech classification tasks in Hindi and Vietnamese. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain. This method can be adopted to bootstrap hate speech detection models from scratch in limited data contexts. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacity to detect, understand, and respond to hate speech.
Disclaimer:
This work contains terms that are offensive and hateful. These, however, cannot be avoided due to the nature of the work.
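As a rough illustration of the target-transfer idea described in the abstract above, here is a minimal Python sketch; the `translate` helper, the placeholder example, and the lexicon of locally relevant targets are assumptions introduced purely for illustration and are not the authors' actual pipeline (which comprises three distinct synthesis methods).

```python
# A minimal sketch of target-transfer augmentation, assuming a hypothetical
# machine-translation helper `translate(text, target_lang)` and a small,
# manually curated target lexicon; this illustrates the general idea
# (keep the hateful sentiment, swap the named target), not the authors' method.
import random

# Hypothetical source example with an explicit target placeholder (not real data).
SOURCE_EXAMPLES = [
    "I can't stand <TARGET>, they ruin everything.",
]

# Hypothetical lexicon of locally relevant hate targets for the target locale.
TARGET_GROUPS_HI = ["<group_a>", "<group_b>"]

def augment(example: str, target_groups: list, translate) -> str:
    """Swap in a locale-specific target, then translate into the target language."""
    swapped = example.replace("<TARGET>", random.choice(target_groups))
    return translate(swapped, target_lang="hi")

# Usage, with any translation backend supplied as `translate`:
# synthetic = [augment(e, TARGET_GROUPS_HI, translate) for e in SOURCE_EXAMPLES]
```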
In: Teaching sociology: TS, Volume 18, Issue 1, p. 123
ISSN: 1939-862X
In: Advances in Data Analysis and Classification, 2020
SSRN
In: The review of socionetwork strategies, Volume 16, Issue 2, pp. 479-492
ISSN: 1867-3236
Sensor networks have drawn much attention because of their promising applications in environmental monitoring, seismology, and military surveillance. Despite increasing interest, sensor network research is still in its initial phase. Few real systems have been deployed, and little data is available to test proposed protocol and data management designs. Most sensor network research to date uses randomly generated data inputs to simulate systems. Some researchers have proposed using environmental monitoring data obtained from remote sensing or in-situ instrumentation. In many cases, neither of these approaches is suitable, because the data are either collected on a regular grid topology or are too coarse-grained. This paper proposes to use synthetic data generation techniques to generate irregular-topology data from data sets measured on a grid. To tackle this problem, we investigate the use of the available sparsely sampled data sets, model the spatio-temporal correlation in these data sets, and generate irregular-topology data based on empirical models of the experimental data. Our goal is to more realistically evaluate sensor network system designs before large-scale field deployment. In obtaining these synthetic data sets, we draw heavily on techniques developed in geo-statistics and other spatial interpolation methods, but appropriately modify them for the application at hand. Our evaluation results on a radar data set of weather observations show that the spatial correlation of the original and synthetic data is similar. Moreover, visual comparison shows that the synthetic data retains interesting properties (e.g., edges) of the original data. Our case study on the DIMENSIONS system demonstrates how synthetic data helps to evaluate the system over an irregular topology, and points out the need to improve the algorithm.
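The grid-to-irregular-topology step described above can be approximated with plain spatial interpolation; the sketch below uses SciPy's `griddata` on a synthetic placeholder field as a stand-in for the geostatistical models the authors actually employ, so it is a simplified illustration under those assumptions rather than their method.

```python
# Interpolate a regularly gridded field onto irregular (simulated) sensor
# locations; a stand-in for the geostatistical modelling used in the paper.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)

# Placeholder gridded field, e.g. one weather-radar snapshot on a 50x50 grid.
nx, ny = 50, 50
xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
field = np.sin(xs / 8.0) + np.cos(ys / 11.0)

# Irregular sensor topology: 200 random deployment points inside the grid.
sensors = rng.uniform(low=0.0, high=[nx - 1, ny - 1], size=(200, 2))

# Readings at the irregular locations, obtained by linear interpolation.
grid_points = np.column_stack([xs.ravel(), ys.ravel()])
readings = griddata(grid_points, field.ravel(), sensors, method="linear")
```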
BASE
In the UK, genomic health data is being generated in three major contexts: in the healthcare system (based on clinical indication), in large-scale research programmes, and for purchasers of direct-to-consumer genetic tests. The recently delivered hybrid clinical/research programme, the 100,000 Genomes Project, set the scene for a new Genomic Medicine Service, through which the National Health Service aims to deliver consistent and equitable care informed by genomics, while providing data to inform academic and industry research and development. In parallel, a large-scale research study, Our Future Health, has UK government and industry investment and aims to recruit 5 million volunteers to support research intended to improve early detection, risk stratification, and early intervention for chronic diseases. To explore how current models of genomic health data generation intersect, and to understand the clinical, ethical, legal, policy and social issues arising from this intersection, we conducted a series of five multidisciplinary panel discussions attended by 28 invited stakeholders. Meetings were recorded and transcribed. We present a summary of the issues identified: genomic test attributes; reasons for generating genomic health data; individuals' motivation to seek genomic data; health service impacts; role of genetic counseling; equity; data uses and security; consent; governance and regulation. We conclude with some suggestions for policy consideration.
BASE
In: Software: Practice & Experience, Volume 42, Issue 11, pp. 1331-1362
Automatic test data generation is a very popular domain in the field of search-based software engineering. Traditionally, the main goal has been to maximize coverage. However, other objectives can be defined, such as the oracle cost, which is the cost of executing the entire test suite and of checking the system behavior. Indeed, in very large software systems, the cost of testing the system can be an issue, so it makes sense to consider two conflicting objectives: maximizing the coverage and minimizing the oracle cost. This is what we did in this paper. We mainly compared two approaches to the multi-objective test data generation problem: a direct multi-objective approach and a combination of a mono-objective algorithm with multi-objective test case selection. Concretely, in this work, we used four state-of-the-art multi-objective algorithms and two mono-objective evolutionary algorithms followed by a multi-objective test case selection based on Pareto efficiency. The experimental analysis compares these techniques on two different benchmarks. The first is composed of 800 Java programs created with a program generator. The second is composed of 13 real programs extracted from the literature. In the direct multi-objective approach, the results indicate that the oracle cost can be properly optimized; however, achieving full branch coverage of the system poses a great challenge. Regarding the mono-objective algorithms, although they need a second phase of test case selection to reduce the oracle cost, they are very effective in maximizing branch coverage. Funding: Spanish Ministry of Science and Innovation and FEDER under contract TIN2008-06491-C04-01 (the M project); Andalusian Government under contract P07-TIC-03044 (DIRICOM project).
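The second-phase "multi-objective test case selection based on Pareto efficiency" mentioned above boils down to keeping only non-dominated (coverage, oracle cost) trade-offs; a minimal sketch follows, with illustrative field names and toy values that are not taken from the paper.

```python
# Pareto-efficient selection over two objectives: maximize coverage,
# minimize oracle cost. Field names and example values are illustrative.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    coverage: float     # fraction of branches covered (maximize)
    oracle_cost: float  # cost of running and checking the test (minimize)

def dominates(a: TestCase, b: TestCase) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    return (a.coverage >= b.coverage and a.oracle_cost <= b.oracle_cost
            and (a.coverage > b.coverage or a.oracle_cost < b.oracle_cost))

def pareto_front(cases):
    return [c for c in cases if not any(dominates(o, c) for o in cases)]

# Example: t2 is dominated by t1 (same coverage, higher cost) and is dropped.
suite = [TestCase("t1", 0.80, 5.0), TestCase("t2", 0.80, 9.0), TestCase("t3", 0.95, 12.0)]
print([c.name for c in pareto_front(suite)])  # ['t1', 't3']
```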
BASE
This paper proposes a hybrid model (HyM) for a heating, ventilation and air conditioning (HVAC) system installed in a passenger train. The HyM fuses data from two sources: data taken from the real system and synthetic data generated using a physics-based model of the HVAC. The physical model of the HVAC was developed to include the sensors located in the real system as well as new virtual sensors reproducing the behaviour of the system while a failure mode (FM) is simulated. Statistical features are calculated from the selected signals. These features are labelled according to the related FMs and are merged with the features calculated from the data from the real system. This data fusion allows us to classify the condition indicators of the system according to the FMs. The merged features are used to train a neural network (NN), which achieves remarkable accuracy. Accuracy is a key concern for future research on the detection and diagnosis of multiple faults and the estimation of the remaining useful life (RUL) through prognosis. The outcome is beneficial for the proper functioning of the system and the safety of the passengers. Funder: Basque Government (KK-2020/0004); ISBN for host publication: 978-92-990084-6-1
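A minimal sketch of the fusion-and-classification idea described above is given below; the feature set, signal shapes, labels and the scikit-learn MLP are placeholders chosen for illustration, not the configuration reported in the paper.

```python
# Merge features computed from real and physics-model (synthetic) signals,
# then train a small neural network over failure-mode labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

def stat_features(signal):
    """Simple per-signal condition indicators: mean, std, min, max, RMS."""
    return np.array([signal.mean(), signal.std(), signal.min(), signal.max(),
                     np.sqrt(np.mean(signal ** 2))])

rng = np.random.default_rng(1)
# Placeholder raw signals (n_samples, signal_length) and failure-mode labels.
real_sig = rng.normal(size=(100, 256))
real_y = rng.integers(0, 3, size=100)
synth_sig = rng.normal(size=(300, 256))
synth_y = rng.integers(0, 3, size=300)

# Feature extraction and data fusion.
X = np.vstack([np.apply_along_axis(stat_features, 1, real_sig),
               np.apply_along_axis(stat_features, 1, synth_sig)])
y = np.concatenate([real_y, synth_y])

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X, y)
```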
BASE
In: Asian journal of research in social sciences and humanities: AJRSH, Volume 6, Issue 12, p. 277
ISSN: 2249-7315
In: Journal of privacy and confidentiality, Volume 11, Issue 3
ISSN: 2575-8527
This paper describes PrivBayes, a differentially private method for generating synthetic datasets that was used in the 2018 Differential Privacy Synthetic Data Challenge organized by NIST.
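PrivBayes itself learns a Bayesian network over the attributes and adds noise to its conditional distributions; as a much simpler illustration of the shared "perturb counts under differential privacy, then sample" idea, the sketch below synthesizes a single categorical column from a Laplace-noised marginal. It is explicitly not the PrivBayes algorithm.

```python
# Laplace-noised one-way marginal, then sampling: a toy epsilon-DP synthesizer
# for one categorical column (histogram sensitivity 1 under add/remove of a record).
import numpy as np

def noisy_marginal(values, categories, epsilon, rng):
    """Return a Laplace-noised, clipped, renormalised category distribution."""
    counts = np.array([(values == c).sum() for c in categories], dtype=float)
    counts += rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    counts = np.clip(counts, 0.0, None)
    return counts / counts.sum()

def synthesize(column, n, epsilon, seed=0):
    rng = np.random.default_rng(seed)
    categories = sorted(set(column.tolist()))
    p = noisy_marginal(column, categories, epsilon, rng)
    return rng.choice(categories, size=n, p=p)

# Example: synthesize 10 records from a tiny column with epsilon = 1.0.
real = np.array(["A", "A", "B", "C", "B", "A"])
print(synthesize(real, n=10, epsilon=1.0))
```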
In: Urban Planning, Volume 1, Issue 2, pp. 88-100
In this paper we outline the methodological development of current research into urban community formations based on combinations of qualitative (volunteered) and quantitative (spatial analytical and geo-statistical) data. We outline a research design that addresses problems of data quality relating to credibility in volunteered geographic information (VGI) intended for Web-enabled participatory planning. Here we have drawn on a dual notion of credibility in VGI data, and propose a methodological workflow to address its criteria. We propose a 'super-positional' model of urban community formations, and report on the combination of quantitative and participatory methods employed to underpin its integration. The objective of this methodological phase of the study is to enhance confidence in the quality of data for Web-enabled participatory planning. Our participatory method has been supported by rigorous quantification of area characteristics, including participant communities' demographic and socio-economic contexts. This participatory method provided participants with a ready and accessible format for observing and mark-making, which allowed the investigators to rapidly iterate a system design based on participants' responses to the workshop tasks. Participatory workshops have involved secondary school-age children in socio-economically contrasting areas of Liverpool (Merseyside, UK), which offers a test-bed for comparing community formations across contrasting contexts, while bringing an under-represented section of the population into a planning domain; their experience may stem from public and non-motorised transport modalities. Data have been gathered through one-day participatory workshops featuring questionnaire surveys, local site analysis, perception mapping and brief textual descriptions. This innovative approach will support Web-based participation among stakeholding planners, who may benefit from well-structured, community-volunteered, geo-located definitions of local spaces.