This is a non-credit, free course that provides guidelines for good practice in research data management. The course is particularly appropriate for postgraduate students and early career researchers who work with data and would like to learn more about managing their research data. The course content is mainly geared towards three disciplines: geosciences, social and political sciences, and clinical psychology; however, many of the issues covered apply equally to all research disciplines. This course is an Open Educational Resource that may be freely used by anyone. It is available under an open license for reuse, rebranding and repurposing.
Big Data collected by customer-facing organisations – such as smartphone logs, store loyalty card transactions, smart travel tickets, social media posts, or smart energy meter readings – account for most of the data collected about citizens today. As a result, they are transforming the practice of social science. Consumer Big Data are distinct from conventional social science data not only in their volume, variety and velocity, but also in terms of their provenance and fitness for ever more research purposes. The contributors to this book, all from the Consumer Data Research Centre, provide a first consolidated statement of the enormous potential of consumer data research in the academic, commercial and government sectors – and a timely appraisal of the ways in which consumer data challenge scientific orthodoxies.
Making published scientific research data publicly available can benefit scientists and policy makers only if there is sufficient information for these data to be intelligible. Thus, the necessary meta-data go beyond the scientific and technological detail and extend to the statistical approach and methodologies applied to these data. The statistical principles that give integrity to researchers' analyses and interpretations of their data require documentation. This is true when the intent is to verify or validate the published research findings; it is equally true when the intent is to use the scientific data in conjunction with other data or new experimental data to explore complex questions; and it is profoundly important when the scientific results and interpretations are taken outside the world of science to establish a basis for policy, for legal precedent or for decision-making. When research draws on already public databases, e.g., a large federal statistical database or a large scientific database, the selection of data for analysis, whether by subsampling or by aggregation, is specific to that research, so this statistical methodology is a crucial part of the meta-data. Examples illustrate the role of statistical meta-data in the use and reuse of these public datasets and their impact on public policy and precedent.
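To make the point about subsampling and aggregation concrete, the sketch below records each selection step as statistical meta-data alongside the derived data. It is a minimal illustration under stated assumptions: the classes `SelectionStep` and `StatisticalMetadata`, the helper functions and the synthetic records are all hypothetical. They do not follow any particular metadata standard, and they are not the method used in the examples the abstract refers to.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class SelectionStep:
    """One subsampling or aggregation step applied to a public source database."""
    operation: str                      # "subsample" or "aggregate"
    description: str                    # human-readable summary of the step
    parameters: dict = field(default_factory=dict)


@dataclass
class StatisticalMetadata:
    """Hypothetical record of how an analysis dataset was derived from its source."""
    source: str
    steps: list = field(default_factory=list)

    def record(self, operation: str, description: str, **parameters) -> None:
        self.steps.append(SelectionStep(operation, description, parameters))


def subsample(records, fraction, seed, meta):
    """Keep each record independently with probability `fraction` (Bernoulli sampling)."""
    rng = random.Random(seed)  # recording the seed makes the subsample reproducible
    chosen = [r for r in records if rng.random() < fraction]
    meta.record("subsample", "Bernoulli subsample of records",
                fraction=fraction, seed=seed, n_in=len(records), n_out=len(chosen))
    return chosen


def aggregate_mean(records, key, value, meta):
    """Aggregate `value` to its mean within each level of `key`."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r[value])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    meta.record("aggregate", f"mean of {value} by {key}",
                key=key, value=value, n_groups=len(means))
    return means


# Usage with synthetic records standing in for a large public database.
records = [{"region": "north" if i % 2 else "south", "income": 100 + i}
           for i in range(1000)]
meta = StatisticalMetadata(source="hypothetical public database")
sample = subsample(records, fraction=0.1, seed=42, meta=meta)
summary = aggregate_mean(sample, "region", "income", meta)
print(json.dumps(asdict(meta), indent=2))
```

The serialised record travels with the derived dataset. A later reader can therefore see the subsampling fraction and seed, along with the aggregation key. That makes it possible to reproduce or critique the selection.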
Purpose This paper aims to present pertinent research challenges in the field of (big) data-informed policy-making, based on research undertaken within the European Union-funded project Big Policy Canvas. Technological advancements, especially in the past decade, have revolutionised the way both everyday and complex activities are conducted. It is thus expected that a particularly important actor such as the public sector should itself become a successful paradigm of disruption by adopting novel approaches and state-of-the-art information and communication technologies.
Design The research challenges stem from a need, trend and asset assessment based on qualitative and quantitative research, as well as from the identification of gaps and external framework factors that hinder the rapid and effective uptake of data-driven policy-making approaches.
Findings The current paper presents a set of research challenges categorised into six main clusters, namely: public governance framework; privacy, transparency and trust; data acquisition, cleaning and representativeness; data clustering, integration and fusion; modelling and analysis with big data; and data visualisation.
Originality/value The paper provides an at-a-glance, holistic overview of the interdisciplinary research challenges in the field of data-informed policy-making. It is intended to serve as a foundation for discussing future research directions within the broader scientific community. Furthermore, it underlines the need to overcome isolated scientific views and treatments, given the highly complex, multi-layered environment.
Objectives Governments acquire extensive data holdings and face increasing pressure to make these available as record-level microdata for research. However, turning data into research-ready data (RRD) is not a straightforward exercise. We demonstrate how, even in simple cases, researcher involvement can bring substantial rewards for effective RRD development.
Methods This paper reports on an ADRUK-funded project to take a dataset originally collected by the Office for National Statistics for official statistics (the UK Annual Survey of Hours and Earnings, ASHE), formally review its microanalytical characteristics, link it to Census 2011 data, and prepare a new 'research ready dataset' with appropriate documentation and coding. This should have been straightforward, as the datasets had already been widely used as research microdata. However, the involvement of academic researchers in the production of research-ready data led to many important new insights.
Results The research programme had three aims: testing assumptions about the data, reviewing data quality and adding value.
Because of its sampling model, ASHE is assumed to have random non-response, both longitudinally and in cross-section. The research team showed that this was untrue: attrition was higher than expected, and both longitudinal and cross-sectional non-response appeared to be non-random.
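A simple way to probe an assumption of random non-response is to check whether attrition rates differ across groups defined by characteristics observed at baseline. The abstract does not say which tests the research team used. The sketch below is therefore only a generic illustration: a Pearson chi-square test of independence on a 2x2 table of baseline group against follow-up status. The field names `group` and `present_next_wave` and the synthetic counts are hypothetical. A significant result shows that attrition depends on an observed characteristic. A non-significant result does not prove that non-response is random.

```python
import math


def attrition_table(records, group_key, group_a, group_b):
    """Cross-tabulate two baseline groups against follow-up status.

    Rows: group_a, group_b. Columns: retained, attrited.
    Records belonging to any other group are ignored.
    """
    table = [[0, 0], [0, 0]]
    for r in records:
        if r[group_key] == group_a:
            i = 0
        elif r[group_key] == group_b:
            i = 1
        else:
            continue
        j = 0 if r["present_next_wave"] else 1
        table[i][j] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square test of independence (no continuity correction).

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Synthetic panel: group "A" retains 90% of members, group "B" only 80%.
records = (
    [{"group": "A", "present_next_wave": k < 900} for k in range(1000)]
    + [{"group": "B", "present_next_wave": k < 800} for k in range(1000)]
)
table = attrition_table(records, "group", "A", "B")
stat, p = chi_square_2x2(table)
print(table, round(stat, 2), p)
```

With equal retention in both groups, the statistic is zero and the p-value is one. With retention rates of 90% and 80%, the statistic is about 39.2, which is far beyond the usual 5% critical value of 3.84. A real panel analysis would also examine cross-sectional non-response and use several baseline characteristics, not just one.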
The data quality review raised further concerns about the accuracy of some geographical indicators and about some variables of opaque provenance; in contrast, we confirmed the accuracy of the administrative variables created by ONS.
As well as being important for researchers, these findings could significantly affect the official statistics produced from the source data, thereby enhancing the value of that data.
Finally, value was added through new variables reflecting the team's wide research interests.
Conclusion Often in government, the assumption is that creating RRDs is simply a matter of creating files and giving researchers access to them. Insights from our work show that deep involvement of the research community can bring rewards for both data holders and researchers. For RRDs, researcher-led construction is vital.