Search results
181,145 results
Estimating density ratio of marginals to joint: applications to causal inference
In various fields of data science, researchers often face the problem of estimating the ratio of two probability densities. In the context of causal inference in particular, the ratio of the product of the marginal densities of a treatment variable and the covariates to their joint density typically emerges in the process of constructing causal effect estimators. This article applies the general least-squares density-ratio estimation methodology of Kanamori, Hido and Sugiyama to this product-of-marginals-to-joint density ratio, and demonstrates its usefulness particularly for causal inference on continuous treatment effects and dose-response curves. The proposed method is illustrated by a simulation study and an empirical example investigating the treatment effect of political advertisements using U.S. presidential campaign data.
BASE
Multivariate mixed Poisson Generalized Inverse Gaussian INAR(1) regression
In this paper, we present a novel family of multivariate mixed Poisson-Generalized Inverse Gaussian INAR(1), MMPGIG-INAR(1), regression models for modelling time series of overdispersed count response variables in a versatile manner. The statistical properties associated with the proposed family of models are discussed and we derive the joint distribution of innovations across all the sequences. Finally, for illustrative purposes different members of the MMPGIG-INAR(1) class are fitted to Local Government Property Insurance Fund data from the state of Wisconsin via maximum likelihood estimation.
Beating the news using social media: the case study of American Idol
We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV show as an example of a well-defined electoral phenomenon that each week draws millions of votes in the USA. This event can be considered a basic test, in a simplified environment, of the predictive power of Twitter signals. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants' ranking and allows the anticipation of the voting outcome. Twitter data from the show and the voting period of the season finale were analyzed to attempt to predict the winner ahead of the airing of the official result. We also show that the fraction of tweets containing geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, revealing strong regional polarization. The geolocalized data are crucial for the correct prediction of the final outcome of the show, pointing out the importance of considering information beyond the aggregated Twitter signal. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real-time gathering of quantitative indicators that may be able to anticipate the future unfolding of opinion formation events.
fsdaSAS: a package for robust regression for very large datasets including the batch forward search
The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive, data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. fsdaSAS also applies the idea of monitoring to several robust estimators for regression over a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformation of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
Forecasting the term structure of government bond yields in unstable environments
In this paper we model and predict the term structure of US interest rates in a data-rich and unstable environment. The dynamic Nelson-Siegel factor model is extended to allow the model dimension and the parameters to change over time, in order to account for both model uncertainty and sudden structural changes, in one setting. The proposed specification performs better than several alternatives, since it incorporates additional macrofinance information during hard times, while it allows for more parsimonious models to be relevant during normal periods. A dynamic variance decomposition measure constructed from our model shows that parameter uncertainty and model uncertainty regarding different choices of predictors explain a large proportion of the predictive variance of bond yields.
Advice on comparing two independent samples of circular data in biology
LL is supported by the Austrian Science Fund (FWF, Grant Number P32586). EPM receives funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No 948728). This research was funded in whole, or in part, by the Austrian Science Fund (FWF) P32586.
Many biological variables are recorded on a circular scale and therefore need different statistical treatment. A common question asked of such circular data involves a comparison between two groups: are the populations from which the two samples are drawn distributed differently around the circle? We compared 18 tests for such situations (by simulation) in terms of both their ability to control the Type-I error rate near the nominal value and their statistical power. We found that only eight tests offered good control of Type-I error in all our simulated situations. Of these eight, we identified Watson's U2 test and a MANOVA approach, based on trigonometric functions of the data, as offering the best power in the overwhelming majority of our test circumstances. There was often little to choose between these two tests in terms of power, and no situation in which any of the remaining six tests offered substantially better power than either of them. Hence, we recommend the routine use of either Watson's U2 test or the MANOVA approach when comparing two samples of circular data.
Quality assessment of microsimulation models: the case of EUROMOD
Assessing the quality of microsimulation models is an important contributing factor for motivating their use in both academic and policy environments. This is particularly relevant for EUROMOD, the tax-benefit microsimulation model for the European Union, because it is intended to be widely used. This paper explains how the quality of EUROMOD is assessed. It focusses on the validity and scope of results as particularly important dimensions of quality, and on the transparency with which this assessment is done. It also provides evidence on the extent and breadth of the use of EUROMOD. Some of the key trade-offs between different aspects of quality are identified and the paper concludes with a view on the appropriate division of responsibility for quality assessment, between model developers and users.
Simulation of an application of the Hartz-IV reform in Austria
This paper examines the application of the German Hartz-IV model in Austria. If the Hartz-IV reform were transferred to Austria, this would imply that, instead of unemployment assistance (Notstandshilfe), the social-assistance-type minimum income benefit (Bedarfsorientierte Mindestsicherung) would become the follow-up benefit once unemployment benefit expires. The analysis is carried out using the tax-benefit microsimulation models EUROMOD and SORESI based on the latest EU-SILC 2015 data for Austria. We simulate a baseline scenario according to the minimum income benefit regulations of the nine federal states for the year 2017 and a scenario including a proxy for an asset check of capital income. In addition, following current political discussions and developments, we simulate a ceiling scenario, in which the sum of minimum standards per household is capped at EUR 1,500 per month. The direct (monetary) effects of the potential reform are analysed on three levels: fiscal implications; the number of receiving households, including socio-demographic characteristics; and income distribution and risk of poverty. © 2018, Institute of Public Finance.
Research resilience: why academics and funders alike should care about #RIPTwitter
Twitter is under close scrutiny these days with news that its timeline could be subject to further algorithmic control. Farida Vis looks at what such dramatic changes could mean for research. There is a great need for both funding councils and researchers to better understand the potential impact of these data and platform politics. Strategies must be developed to encourage less reliance on a single social media data source.
Using Administrative Data to Count Local Populations
There is growing evidence that official population statistics based on the decennial census are inaccurate at the local authority level—the fundamental administrative unit of the UK. This paper investigates the use of locally available administrative data sets for counting populations. The method uses truth tables for combining different data sources with different population coverage according to a defined and therefore replicable set of rules. The result is timelier and geographically more flexible data which is more cost-effective to produce than a survey-based census. Associated techniques for linking diverse data sources at individual and household level are briefly discussed. The methodology is then applied to administrative data from a London borough with about 170,000 people. The results are evaluated and compared with other population sources. The paper concludes by discussing potential improvements including scaling up the work to cover multiple local authorities. The practicalities of using alternative central government data sets are briefly considered. A sequel paper in this journal provides examples of key applications of this approach at local level.
Data for the Cultural and Creative Sector production system: Part 2 – Assembling disparate data resources, and preparations for reporting them
This CICERONE paper (D4.3) is part of a series addressing the lack of data available to describe the Cultural and Creative Sector (CCS) production system. The series explains how and why the currently available data is insufficient in its depth and breadth of coverage, leading to an appreciation of which activities are made visible, and which are obscured or hidden, by such measures. In the first paper of this series (D4.2), entitled Everything you always wanted to know about data for the Cultural and Creative Sector production system, but were afraid to ask: Part 1 – Problems of statistical description, a first step is taken in proposing what a sufficient taxonomy would look like: a suitable framework for new data collection related to the CCS production system. In this paper, we set out this framework in more detail. The purpose of D4.2 was to describe the intersection between definitions and their operationalisation in taxonomies and actual data collection. It articulated the implications of a 'Romantic' definition of culture that has previously been used with an industrial taxonomy: arguably, both notions have failed. It then described various attempts to conceptualise and mobilise taxonomies that bridge this divide and, in so doing, articulated their limitations. In this paper (D4.3), we advocate a new data matrix – a radical realignment of concepts and industry taxonomies. This matrix is, in effect, the conceptual and practical foundation of a Cultural Economy Observatory that is being built as part of the CICERONE project.
Competing for the leading role: Trials in categorising greenhouse and energy auditors
This paper considers the inter-professional rivalries that took place as the Australian federal government attempted to register a pool of greenhouse and energy auditors and establish a multidisciplinary team structure for emissions-related reporting and trading schemes from 2007 to 2019. Drawing on the notions of trials (Callon, 1986; Latour, 1987) and criticism in trials (Bourguignon & Chiapello, 2005), we show how the government's attempts at classifying and determining expert roles and responsibilities across engineering, environmental, and financial backgrounds – with a preference for Big 4 accountants in leadership roles – triggered a series of multi-lateral trials of strength and responsibility, and essentially failed to meet their original purpose. By following the regulatory process, we articulate how terminology and measurement devices were mobilised by the regulator to enrol mixed expertise. We also examine how the envisaged identities, roles, and responsibilities were received by lobbyists from the three expert groups, and how their concerns, criticism, and resistance were in turn acted upon and reacted to by the regulator. Our study reveals the dilemma the non-expert government faced in mediating conflicting interests and goals while fulfilling its regulatory and administrative roles. Our nuanced evidence shows how the accounting team leaders' role of supervision rather than oversight evoked further controversy. The study contributes to understanding what happens when conflicting knowledge claims and criticisms meet in a multidisciplinary regulatory regime.
Are there differences in responses to social identity questions in face-to-face versus telephone interviews? Results of an experiment on a longitudinal survey
This paper investigates the effect of interview mode (telephone vs. face-to-face) on responses to a 13-item module of identity questions covering distinct domains. With increasing moves towards mixed-mode implementation, especially in longitudinal surveys, establishing whether mode effects are likely to influence findings is of practical value. A growing number of studies explore mode effects, but the potential impact of mode on identity questions has not been investigated, even though such questions are increasingly being asked in multi-topic surveys. Adjusting for selection, we find little evidence of specific mode effects. The exception is responses on political identity: telephone respondents are eight percentage points more likely to consider politics important to their identity. We do not find differences in data quality across modes as measured by item non-response, straightlining, or primacy and recency effects. We conclude that mode effects are small for identity questions.
Feeding back about eco-feedback: How do consumers use and respond to energy monitors?
To date, a multitude of studies have examined the empirical effect of feedback on energy consumption, yet very few have examined how feedback might work and the processes it involves. Moreover, it remains to be seen whether the theoretical claims made concerning how feedback works can be substantiated using empirical data. To start to address this knowledge gap, the present research used qualitative data analysis to examine how consumers use and respond to energy monitors. The findings suggest feedback may increase both the physical and conscious visibility of consumption as well as knowledge about consumption. Accordingly, support was evident for the theoretical assertions that feedback transforms energy from invisible to visible, prompts motivated users to learn about their energy habits, and helps address information deficits about energy usage. We conclude by evaluating the feasibility of feedback to substantially reduce consumption and discuss ways in which feedback could be improved to aid its long-term effectiveness, before discussing the implications our findings may have for government policy.