It is challenging to standardize data; yet the capability to draw upon data across information systems holds huge potential for improving defense acquisition and procurement. Acquisition planning and management involves many decision-making and action-taking processes in a complex environment spanning acquisition, contracting, fiscal, legal, personnel, and regulatory requirements. A sound decision-making process has to rely on data, and on high-quality data in particular. Often the available data are dirty, outdated, incomplete, or otherwise insufficient for an expert to make a decision. At the same time, there are enormous amounts of data on the web that can be utilized to crystallize the needed information. These data repositories are often publicly accessible and come from a variety of sources, including websites, government reports, news, wikis, blogs, online forums, and social media. This paper investigates how to leverage the information in public data sources to complement internal data in order to support effective acquisition planning and management. The research is based on publicly available government acquisition databases at usaspending.gov and fpds.gov. It takes a data science approach to analyzing acquisition databases and focuses on two major tasks: (1) leveraging web data for quality assessment and improvement of federal acquisition data, and (2) identifying appropriate data analytic techniques to discover useful information that can potentially support the federal acquisition management and planning process.
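To make task (1) concrete, here is a minimal sketch of the kind of cross-source quality check the abstract implies, assuming award records have been exported as CSV from usaspending.gov and fpds.gov; the file names and column names (`award_id`, `obligated_amount`) are illustrative assumptions, not the paper's actual schema or method.

```python
# Hypothetical cross-source consistency check between two acquisition
# exports; column names and file paths are illustrative assumptions.
import pandas as pd

usa = pd.read_csv("usaspending_awards.csv")   # assumed export from usaspending.gov
fpds = pd.read_csv("fpds_awards.csv")         # assumed export from fpds.gov

# Join the two sources on a shared award identifier (assumed name).
merged = usa.merge(fpds, on="award_id", suffixes=("_usa", "_fpds"),
                   how="outer", indicator=True)

# Completeness: records present in one source but missing from the other.
missing = merged[merged["_merge"] != "both"]

# Consistency: obligated amounts that disagree across the two sources.
both = merged[merged["_merge"] == "both"]
mismatch = both[
    (both["obligated_amount_usa"] - both["obligated_amount_fpds"]).abs() > 0.01
]

print(f"{len(missing)} records missing from one source")
print(f"{len(mismatch)} records with inconsistent obligated amounts")
```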
Over the past decade, the public awareness and availability of spatial data on the Web, as well as methods for its creation and use, have steadily increased. Besides the establishment of governmental Spatial Data Infrastructures (SDIs), numerous volunteer and commercial initiatives have had a major impact on that development. Nevertheless, data isolation still poses a major challenge. Whereas the majority of approaches focus on data provision, means to dynamically link and combine spatial data from distributed, often heterogeneous data sources in an ad hoc manner are still very limited. Yet such capabilities are essential to support and enhance information retrieval for comprehensive spatial decision making. To facilitate spatial data fusion in current SDIs, this thesis has two main objectives. First, it focuses on the conceptualization of a service-based fusion process to functionally extend current SDIs and to allow for the combination of spatial data from different spatial data services. It mainly addresses the decomposition of the fusion process into well-defined, reusable functional building blocks and their implementation as services, which can be used to dynamically compose meaningful application-specific processing workflows. Moreover, geoprocessing patterns, i.e., service chains that are commonly used to solve certain fusion subtasks, are designed to simplify and automate workflow composition. Second, the thesis deals with the determination, description, and exploitation of spatial data relations, which play a decisive role in spatial data fusion. The approach adopted is based on the Linked Data paradigm and therefore bridges SDI and Semantic Web developments. Whereas the original spatial data remain within SDI structures, relations between those sources can be used to infer spatial information by means of Semantic Web standards and software tools. A number of use cases were developed, implemented, and evaluated to underpin the proposed concepts. Particular emphasis was put on the use of established open standards to realize an interoperable, transparent, and extensible spatial data fusion process and to support the formalized description of spatial data relations. The developed software, which is based on a modular architecture, is available online as open source. It allows for the development and seamless integration of new functionality as well as the use of external data and processing services during workflow composition on the Web.
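To illustrate the relation-based inference the thesis describes, here is a minimal sketch in which the spatial features stay in their respective SDI services while a relation between them is published as an RDF triple that Semantic Web tools can query; the feature URIs and the choice of `owl:sameAs` as the relation are hypothetical, not taken from the thesis.

```python
# Minimal sketch: publish a relation between features from two different
# spatial data services as RDF, then query the relation graph. URIs are
# illustrative assumptions.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()
feature_a = URIRef("http://sdi-a.example.org/features/building/42")
feature_b = URIRef("http://sdi-b.example.org/objects/1007")

# Assert that two features from different services describe the same entity.
g.add((feature_a, OWL.sameAs, feature_b))

# Query the relation graph instead of the (remote) feature data itself.
results = g.query("""
    SELECT ?a ?b WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . }
""")
for row in results:
    print(f"{row.a} is linked to {row.b}")
```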
This presentation addresses cultural heritage data-sharing practices through the use of Republic of Korea open government data for data curation and data integration. Data curation enables data sharing throughout the data management life cycle, creating new value for new user needs. Our research employed a visualization phase in which we used domain-analytical techniques to better understand the contents of the population of 375 library-related open government cultural heritage datasets available at the Korean Open Government Website (http://data.go.kr/). Researchers translated all records from Korean to English. The data were unstructured and heterogeneous, varying in file formats, data formats, and web addresses. For data curation and integration, we employed the meta-level ontology known as the CIDOC CRM, which we applied qualitatively to small sets of carefully selected records. To map the instantiation of records, which is required for data integration, we used FRBRoo (Functional Requirements for Bibliographic Records – object oriented), an extension of the CIDOC CRM, in a typical data-sharing scenario. Equivalent mapping processes were then comparatively tested with visualizations to demonstrate the effective harmonization between the CIDOC CRM and FRBRoo, which enables the integration of metadata and data curation from unstructured and heterogeneous formats. This presentation may contribute to the cross- or meta-institutional integration of curation across institutional boundaries in cultural heritage.
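As a rough illustration of what such a mapping can look like, here is a minimal sketch that types one record's components with FRBRoo classes (which specialize CIDOC CRM ones) as RDF; the record URIs, the title, the namespace URI, and the choice of terms (an F1 Work realised in an F2 Expression) are illustrative assumptions, not the mappings used in the study.

```python
# Minimal sketch of mapping one cultural-heritage record onto FRBRoo /
# CIDOC CRM classes as RDF; all names below are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
FRBROO = Namespace("http://iflastandards.info/ns/fr/frbr/frbroo/")  # assumed

g = Graph()
work = URIRef("http://example.org/record/123/work")
expression = URIRef("http://example.org/record/123/expression")

g.add((work, RDF.type, FRBROO.F1_Work))              # the abstract work
g.add((expression, RDF.type, FRBROO.F2_Expression))  # its realization
g.add((work, FRBROO.R3_is_realised_in, expression))  # FRBRoo link
# FRBRoo's F2 Expression specializes CIDOC CRM's E73 Information Object,
# which is what makes the two models harmonize (assumed alignment).
g.add((expression, RDF.type, CRM.E73_Information_Object))
g.add((expression, RDFS.label, Literal("hypothetical record title")))

print(g.serialize(format="turtle"))
```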
The amount of data in our world today is vast. Many of the personal and non-personal aspects of our day-to-day activities are aggregated and stored as data by both businesses and governments. The growing volume of data captured through multimedia, social media, and the Internet of Things is a phenomenon that needs to be properly examined. In this article, we explore this topic and analyse the term "data ownership". We aim to raise awareness and trigger debate among policy makers around data ownership and the need to improve existing data protection and privacy laws and legislation at both national and international levels.
"Data ownership" is actually an oxymoron, because there could not be a copyright (ownership) on facts or ideas, hence no data onwership rights and law exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is witholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, the access to and the re-use of data, and biodiversuty data in particular, is hampered by technical, economic, sociological, legal and other factors, although there should be no formal legal provisions related to copyright that may prevent anyone who needs to use them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish these so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows: Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of ...
"Data ownership" is actually an oxymoron, because there could not be a copyright (ownership) on facts or ideas, hence no data onwership rights and law exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is witholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, the access to and the re-use of data, and biodiversuty data in particular, is hampered by technical, economic, sociological, legal and other factors, although there should be no formal legal provisions related to copyright that may prevent anyone who needs to use them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish these so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows: Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph In combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, these approaches show different angles to the future of biodiversity data publishing and, lay the foundations of an entire data publishing ecosystem in the field, while also supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as Global Biodiversity Information Facility (GBIF), Biodiversity Literature Repository (BLR), Plazi TreatmentBank, OpenBiodiv, as well as to various end users.
Current big data practices are largely guided by deliberations concerning their efficiency and optimisation. Yet there is another perspective. This book highlights that the capacity for gathering, analysing, and utilising vast amounts of digital (user) data raises significant ethical issues. Annika Richterich provides a systematic contemporary overview of the field of critical data studies, which reflects on corporate, institutional, and governmental practices of digital data collection and analysis. It assesses in detail one big data research area: biomedical studies focused on epidemiological surveillance. Specific case studies explore how big data have been used in academic work. The Big Data Agenda concludes by asking whether data ownership can be reclaimed by citizens, rather than remaining simply an assertion of rights to (user) data defined by technological domination. She argues that data literacy and discourse ethics may offer solutions as well as a critique.
Data have significant potential to address current societal problems, not only at the federal and state levels, but also in smaller communities, in neighborhoods, and in the lives of individuals. This potential rests on the proposition that data are, and will be, shared with and reused by and for communities at different levels; yet not all data are systematically or routinely shared for reuse with communities, owing to social, structural, and technical infrastructure barriers. Data intermediary organizations can play a significant role in removing existing barriers while unlocking the potential of data for all, particularly for communities with limited human or financial resources, limited access to existing data infrastructures, and underserved populations. Considering the significance of data intermediary organizations for local communities, this study aims to explore the role of intermediaries that facilitate community members' and organizations' data utilization. The findings of this study reveal that data intermediary organizations play four major roles that are crucial to communities' data utilization: (1) democratizing data, (2) adding value to existing data, (3) enhancing communities' data literacy, and (4) building communities' data capacity. This study has several important implications for overcoming the challenges of data reuse at the local level.
Data exploration and visualization are a highly accessible gateway activity to learning data science. In this talk, we discuss our experience with "Data Scavenger Hunts" that use web apps to democratize data science and make it accessible to a wide variety of audiences. To achieve this, we have developed an R package called `burro` that enables public datasets to be explored together via a shareable web app. We share our experience with using data scavenger hunts to teach each other interesting things about data, in particular our exploration of the NHANES (National Health and Nutrition Examination Survey) data and the insights we have taught each other. We show that this guided and communal data exploration leads to increased confidence and curiosity about data science in Biodata-Club, our learning community. `burro` apps can be deployed by anyone to start conversations about data.
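`burro` itself is an R package; as a language-neutral sketch of the same idea, here is a tiny shareable exploration app in Python with Streamlit, where the CSV file name (a locally downloaded NHANES extract) is an assumption.

```python
# Minimal sketch of a shareable data-exploration app in the spirit of a
# `burro` scavenger hunt; the CSV file name is an illustrative assumption.
import pandas as pd
import streamlit as st

df = pd.read_csv("nhanes_subset.csv")  # assumed local NHANES extract

st.title("Data scavenger hunt")
column = st.selectbox("Pick a variable to explore", df.columns)

# Numeric variables get summary statistics; categorical ones get counts.
if pd.api.types.is_numeric_dtype(df[column]):
    st.write(df[column].describe())
else:
    st.bar_chart(df[column].value_counts())
```

Running `streamlit run app.py` serves the app locally; sharing the resulting URL is what turns individual exploration into a communal scavenger hunt.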
The 2011 Canadian Research Data Summit was held at the Ottawa Convention Centre on September 14-15, 2011.
About the Summit: The Summit brought together 100-150 senior researchers, high-level policy makers, university administrators, and members of the private sector. Together, participants worked on formulating a shared strategy for addressing the challenges and opportunities in maximizing the benefits of Canada's collective investment in research data. The Summit acted as a catalyst for the development of a made-in-Canada approach to maximizing the availability and use of research data.
About the Research Data Strategy Working Group: The Research Data Strategy Working Group is a collaborative effort launched in 2008 to address the challenges and issues surrounding the access to and preservation of data arising from Canadian research. This multi-disciplinary group of universities, institutes, libraries, operators of research infrastructure, granting agencies, governments, and individual researchers is united through a shared recognition of the pressing need to deal with Canadian data stewardship issues. Together, they focus on the necessary actions, next steps, and leadership roles that researchers and institutions can take to ensure Canada's research data are accessible and usable for current and future generations of researchers.
Inaugural iSchool Lecture, Linnaeus University Växjö, Sweden, Monday, 7 May 2018. The growth of information studies, evident in the international expansion of iSchools, reflects a broad research and teaching agenda in the social, technical, institutional, and political aspects of the information society. As data science, scholarship, and stewardship are central to the iSchool agenda, they provide a framework to launch the new iSchool at Linnaeus University. Whereas almost all fields of scholarship today conduct data-intensive research, only a few areas are adept at exploiting "big data." "Little data" remains the norm in the many fields where evidence is scarce and labor-intensive to acquire. Until recently, data were considered part of the process of scholarship, essential but largely invisible. In the "big data" era, data have become valuable products to be captured, shared, reused, and stewarded for the long term. They have also become contentious intellectual property to be protected. Public policy leans toward open access to research data but rarely provides the public investment necessary to sustain access. Enthusiasm for big data is obscuring the complexity and diversity of data in scholarship and the challenges of stewardship. Data practices are local, varying from field to field, individual to individual, and country to country. As the number and variety of research partners expand, so do the difficulties of sharing, reusing, and sustaining access to data. Until the larger questions of knowledge infrastructures and stewardship are addressed by research communities, "no data" may become the norm for many fields. This talk explores the stakes and stakeholders in research data, focusing on implications for iSchool policy and practice, and draws upon the presenter's book, Big Data, Little Data, No Data: Scholarship in the Networked World (MIT Press, 2015), and subsequent research.
The Southern California Climate Data Protection Project is committed to protecting and preserving scientific climate data through systematic analysis of the infrastructures and methods of data collection, curation, and management. We are equally concerned with how access to scientific data allows the public to invest in government accountability and to demand sustainable policies. This workshop, held on Inauguration Day, addressed political action to sustain access to essential data on climate change.
Date: 9am-3pm, January 20, 2017. Location: Department of Information Studies, GSEIS Room 111, UCLA, 290 Charles E Young Dr N, Los Angeles, CA 90095. Information: http://www.climatedataprotection.net/
Data localization laws are emerging as a pernicious form of non-tariff barrier that significantly harms the growth of trade in a digitally powered world. An International Political Economy approach provides a more comprehensive analysis of the policy rationale behind such laws than a purely economic approach, which focuses only on the economic losses resulting from protectionism. On closer analysis, different countries are found to have different policy rationales for implementing data localization laws: while some promote their domestic ICT industry through forced localization measures, others have concerns regarding national security, privacy, and sovereign control in the highly privatized world of internet governance. It is not always possible to demarcate the "protectionist" rationale from that of rational "data protection". To address data localization effectively and facilitate digital trade, it is not sufficient to negotiate for the free flow of data in trade agreements without governments and companies being open and transparent about the related issues of privacy, national security, and consumer protection. In particular, the role of the US Government, as well as of leading US-based technology companies, will be instrumental in this regard. At the same time, it may be necessary to develop policy initiatives both to encourage transparent and clear international standards on data security and to enable higher levels of digital innovation in developing countries, so that they can harness the benefits of evolving internet technologies.
Personal data is increasingly positioned as a valuable asset. While individuals generate and expose ever-expanding volumes of personal information online, certain tech companies have built their business models on the personal data they gather. In this context, lawmakers are revising data protection regulations in order to provide individuals with enhanced rights and to set new rules regarding the way corporations collect, manage, and share personal information. We argue that recent data protection regulatory frameworks such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are fundamentally about data management. Yet there have been no attempts to analyze these regulations in terms of their implications for the data life cycle. In this paper, we systematically analyze the GDPR and the CCPA and identify their implications for the data life cycle. To synthesize our findings, we propose a semi-formal notation of the resulting changes to the personal data life cycle, in the form of a process and data model governed by business rules, consolidated in a reference personal data life cycle model for data protection. To the best of our knowledge, this study represents one of the first attempts to provide a data-centric view of data protection regulatory requirements.
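As a rough illustration of what a process model "governed by business rules" could look like in code, the following sketch encodes a simplified personal-data life cycle as a state machine with a single erasure rule inspired by GDPR Art. 17 (right to erasure); the stages and the rule are assumptions for illustration, not the authors' reference model.

```python
# Minimal sketch: a simplified personal-data life cycle as a state machine,
# with one business rule (erasure requests override other transitions).
from enum import Enum, auto

class Stage(Enum):
    COLLECTED = auto()
    STORED = auto()
    USED = auto()
    SHARED = auto()
    DESTROYED = auto()

# Allowed forward transitions in the (assumed, simplified) life cycle.
TRANSITIONS = {
    Stage.COLLECTED: {Stage.STORED},
    Stage.STORED: {Stage.USED, Stage.SHARED, Stage.DESTROYED},
    Stage.USED: {Stage.SHARED, Stage.DESTROYED},
    Stage.SHARED: {Stage.DESTROYED},
    Stage.DESTROYED: set(),
}

def next_stage(current: Stage, requested: Stage, erasure_requested: bool) -> Stage:
    """Apply one business rule: an erasure request overrides any other move."""
    if erasure_requested:
        return Stage.DESTROYED
    if requested not in TRANSITIONS[current]:
        raise ValueError(f"{current.name} -> {requested.name} is not permitted")
    return requested

print(next_stage(Stage.STORED, Stage.USED, erasure_requested=False).name)  # USED
print(next_stage(Stage.STORED, Stage.USED, erasure_requested=True).name)   # DESTROYED
```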