Over the last few decades, research practice in the natural sciences has changed dramatically. Remote sensing, rapid identification and molecular approaches allow us to efficiently monitor the changing world around us and understand the causes of those changes. Advances in digital, genomic and information technologies enable natural science collections to provide novel discoveries and call for new collection types and attributes, while fostering the development of innovative approaches to face urgent societal challenges. Natural Science Collections (three billion specimens globally) represent an unparalleled scientific asset. They constitute a unique source of diverse data classes, including genomic, chemical, morphological and geo-spatial information. Despite existing successful examples of infrastructures aggregating and publishing specific data classes (such as the Global Biodiversity Information Facility, GenBank or the Encyclopedia of Life - TraitBank), the landscape remains fragmented, with limited capacity to bring together this information in a systematic and robust manner. The Distributed System of Scientific Collections (DiSSCo) represents a pan-European initiative, and the largest ever agreement of natural science museums, to jointly address the fragmentation of European collections. DiSSCo is set to unify European natural science collections into a coherent new research infrastructure, able to provide bio- and geo-diversity data at the scale, form and precision required by a multi-disciplinary user base. At the heart of the technical implementation of DiSSCo is the development of a cloud-based non-relational data store that links occurrence, genomic, chemical and trait data classes by robustly and unambiguously anchoring each data object back to the physical object.
By harmonising digitisation, curation and publication processes and workflows, across all its nodes, DiSSCo can populate and serve a knowledge graph for European natural science collections. In this paper we will introduce the vision, mission and objectives of DiSSCo, discuss the technical approach and touch upon the socio-cultural and governance aspects supporting this large-scale European endeavour. DiSSCo is applying for inclusion in the 2018 European roadmap for research infrastructures, through an evaluation process organised by the European Strategy Forum on Research Infrastructures (ESFRI). This process is politically and/or financially supported by 12 European countries and an expanding network of 95 natural science museums in 20 countries.
European Natural Science Collections (NSC) are part of the global natural and cultural capital and represent 80% of the world's bio- and geo-diversity. Data derived from these collections underpin thousands of scholarly publications and official reports (used to support legislative and regulatory processes relating to health, food, security, sustainability and environmental change) and have led to inventions and products that today play an important role in our bio-economy. In recent decades, research practice in the natural sciences has changed dramatically. Advances in digital, genomic and information technologies enable natural science collections to provide new insights, but they also call for changes to the current operational and business models of individual collections held at local natural history museums and universities: a new business model is needed that provides unified access to collection objects and all scientific data derived from them. Although infrastructures like the Global Biodiversity Information Facility, GenBank and Catalogue of Life now successfully aggregate data on specific data classes, the landscape remains fragmented, with limited capacity to bring together this information in a systematic and robust manner and with scattered access to the physical objects. The Distributed System of Scientific Collections (DiSSCo) represents a pan-European initiative, and the largest ever agreement of natural science museums, to jointly address the fragmentation of European collections. DiSSCo is unifying European natural science collections into a coherent new research infrastructure, able to provide bio- and geo-diversity data at the scale, form and precision required by a multi-disciplinary user base in science.
DiSSCo is harmonising digitisation, curation and publication processes and workflows across the scientific collections in Europe and enables linking of occurrence, genomic, chemical and morphological data classes, as well as publications and experts, to the physical object. In this paper we will present the socio-cultural and governance aspects of this research infrastructure. DiSSCo is receiving political support from 11 countries in Europe and will gradually change its funding model from institutional to national funding, with temporary funding from the EC to support preparation and development. Solutions for large-scale digitisation are currently being designed in the EC-funded ICEDIG project to underpin the future large-scale digitisation carried out by the countries. Unified virtual (digitisation on demand) and transnational physical access to the collections is being developed over the next four years in the EC-funded SYNTHESYS+ project. The governance of DiSSCo is designed to change gradually from a steering committee, composed of a few large natural history museums contributing in cash to initiate the development, into a legal entity in which national consortia are represented, with a central coordination office for daily management. Each country decides individually how its entities (scientific collection facilities, research councils, governmental bodies) are organised in its national consortium. A stakeholder and user forum, a Scientific Advisory Board and an International Advisory Board will ensure that DiSSCo is functional in enabling science across disciplines and within the international landscape of infrastructures. Training and short scientific missions are being developed in the MOBILISE COST Action to build capacity in the FAIR production, publication and usage of scientific collection-derived data in Europe and to initiate the socio-cultural changes needed in the collection-holding institutes.
A Helpdesk is being constructed in the SYNTHESYS+ and DiSSCo Prepare projects to further facilitate use, and scientific use cases have been collected in ICEDIG to develop and facilitate e-services tailored to scientific needs.
DiSSCo (The Distributed System of Scientific Collections) is a Research Infrastructure (RI) aiming to provide unified physical (transnational), remote (loans) and virtual (digital) access to the approximately 1.5 billion biological and geological specimens in collections across Europe. DiSSCo represents the largest ever formal agreement between natural science museums (114 organisations across 21 European countries). With political and financial support across 14 European governments and a robust governance model, DiSSCo will deliver, by 2025, a series of innovative end-user discovery, access, interpretation and analysis services for natural science collections data. As part of DiSSCo's developing data model, we evaluate the application of Digital Objects (DOs), which can act as the centrepiece of its architecture. DOs have bit-sequences representing some content, are identified by globally unique persistent identifiers (PIDs) and are associated with different types of metadata. The PIDs can be used to refer to different types of information such as locations, checksums, types and other metadata to enable immediate operations. In the world of natural science collections, currently fragmented data classes (inter alia genes, traits, occurrences) that have been derived from the study of physical specimens can be re-united as parts in a virtual container (i.e., as components of a Digital Object). These typed DOs, when combined with software agents that scan the data offered by repositories, can act as complete digital surrogates of the physical specimens. In this paper we: investigate the architectural and technological applicability of DOs for large-scale data RIs for bio- and geo-diversity, identify benefits and challenges of a DO approach for the DiSSCo RI, and describe key specifications (incl. metadata profiles) for a new specimen-based DO type.
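The Digital Object idea described above (a bit-sequence, a persistent identifier, typed metadata, and a virtual container that re-unites data classes around one physical specimen) can be illustrated with a minimal sketch. This is not the DiSSCo data model: the class names `DigitalObject` and `DigitalSpecimen`, the field names, and the `example/...` identifier strings are all hypothetical and chosen only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical DO: a bit-sequence identified by a PID, with a type and metadata."""
    pid: str          # persistent identifier (illustrative string, not a real PID scheme)
    do_type: str      # data class, e.g. "gene", "trait", "occurrence"
    content: bytes    # the bit-sequence representing the content
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def checksum(self) -> str:
        # Checksum derived from the content, one kind of information a PID can refer to.
        return hashlib.sha256(self.content).hexdigest()


@dataclass
class DigitalSpecimen:
    """Hypothetical virtual container anchoring derived data to one physical specimen."""
    pid: str
    physical_specimen_id: str  # assumed institutional catalogue number
    components: Dict[str, List[str]] = field(default_factory=dict)

    def link(self, obj: DigitalObject) -> None:
        # Store only the component's PID, grouped by data class, so the
        # component itself can live in any repository.
        self.components.setdefault(obj.do_type, []).append(obj.pid)


if __name__ == "__main__":
    specimen = DigitalSpecimen(pid="example/specimen-1", physical_specimen_id="X-1")
    specimen.link(DigitalObject("example/gene-1", "gene", b"ACGT"))
    specimen.link(DigitalObject("example/trait-1", "trait", b"wing length"))
    print(specimen.components)
```

Storing PIDs rather than embedded copies is one possible design choice here: it keeps the container small and leaves each data class in its own repository, at the cost of needing a resolver to fetch the components.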
The Distributed System of Scientific Collections (DiSSCo) Research Infrastructure (RI) is presently in its preparatory phase. DiSSCo is developing a new distributed RI to operate as a one-stop-shop for the envisaged European Natural Science Collection (NSC) and all its derived information. Through mass digitisation, DiSSCo will transform the fragmented landscape of NSCs, including an estimated 1.5 billion specimens, into an integrated knowledge base that will provide interconnected evidence of the natural world. Data derived from European NSCs underpin countless discoveries and innovations, including tens of thousands of scholarly publications and official reports annually (supporting legislative and regulatory processes on sustainability, environmental change, land use, societal infrastructure, health, food, security, etc.); base-line biodiversity data; inventions and products essential to bio-economy; databases, maps and descriptions of scientific observations; educational material for students; and instructive and informative resources for the public. To expand the user community, DiSSCo will strengthen capacity building across Europe for maximum engagement of stakeholders in the biodiversity-related field and beyond, including industry and the private sector, but also policy-driving entities. Hence, it is opportune to reach out to relevant stakeholders in the European environmental policy domain represented by the European Environment Agency (EEA). The EEA aims to support sustainable development by helping to achieve significant and measurable improvement in Europe's environment, through the provision of timely, targeted, relevant and reliable information to policy-making agents and the public. The EEA provides information through the European Environment Information and Observation System (Eionet). 
The aim of this white paper is to open the discussion between DiSSCo and the EEA and identify the common service interests that are relevant for the European environmental policy domain. The first section describes the significance of (digital) Natural Science Collections (NSCs). Section two describes the DiSSCo programme with all DiSSCo-aligned projects. Section three provides background information on the EEA and the biodiversity infrastructures that are developed and maintained by the EEA. The fourth section illustrates a number of use cases where the DiSSCo consortium sees opportunities for interaction between the DiSSCo RI and the Eionet portal of the EEA. Opening the discussion with the EEA in this phase of maturity of DiSSCo will ensure that the infrastructural design of DiSSCo and the development of e-Services accommodate the present and future needs of the EEA and assure data interoperability between the two infrastructures.
The aim of this white paper is to present the benefits of identifying the common service interests of DiSSCo and the EEA. A brief introduction to natural science collections as well as the two actors is given to facilitate the understanding of the needs and possibilities in the alignment of DiSSCo with the EEA.
DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) mobilising and unifying bio- and geo-diversity information connected to the specimens held in natural science collections and delivering it to scientific communities and beyond. Bringing together 120 institutions across 21 countries and combining earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data emerging from new techniques and not already linked to specimens. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few, and will lead to a wide range of end-user services, beginning with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today.
With approximately 1.5 billion objects to be digitised, bringing natural science collections to the information age is expected to result in many tens of petabytes of new data over the next decades, used on average by 5,000 to 15,000 unique users every day. This requires new skills, clear policies, robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including their long-term storage, preservation and open access.
Such processes and procedures must match and be derived from the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR).
Synthesised from results of the ICEDIG project ("Innovation and Consolidation for Large Scale Digitisation of Natural Heritage", EU Horizon 2020 grant agreement No. 777483), the DiSSCo Conceptual Design Blueprint covers the organisational arrangements, processes and practices, the architecture, tools and technologies, culture, skills and capacity building and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as that infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint constitutes the essential background needed to continue work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) to achieve readiness to begin construction.
Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Much of the leadership and many of the people involved in digitisation and working with digital collections wish to take steps forward and expand the efforts to benefit further from the already noticeable positive effects. The collective results of examining technical, financial, policy and governance aspects show the way forward to operating a large distributed initiative, i.e., the Distributed System of Scientific Collections (DiSSCo), for natural science collections across Europe.
Ample examples, opportunities and needs for innovation and consolidation in large-scale digitisation of natural heritage have been described. The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues.
Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, these include:
Organisational:
Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially in respect of digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing.
Pursue the joint ventures and other relationships necessary to the successful delivery of the DiSSCo mission, especially ventures with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks, such as EOSC. Proceed with the explicit aim of avoiding divergences of approach in global natural science collections data management and research.
Technical:
Adopt and enhance the DiSSCo Digital Specimen Architecture and, specifically as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives.
Establish the (software) engineering development and (infrastructure) operations team and direction essential to the delivery of the services and functionalities expected from DiSSCo, such that earnest engineering can lead to an early start of DiSSCo operations.
Scientific:
Establish a common digital research agenda leveraging Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy/decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure.
Data:
Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework as the low entropy means to achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR).
Develop and promote best practice approaches towards achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput, fast), and cost (lowest, minimal per specimen).
Financial:
Broaden attractiveness (i.e., improve bankability) of DiSSCo as an infrastructure to invest in.
Plan for finding ways to bridge the funding gap to avoid disruptions in the critical funding path that risk interrupting core operations, especially when the gap opens between the end of preparations and the beginning of implementation due to unsolved political difficulties.
Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo programme.
Decisions cannot be taken on one aspect alone without considering other aspects, and here the various governance structures of DiSSCo (General Assembly, advisory boards, and stakeholder forums) play a critical role over the coming years.
BiCIKL is a European Union Horizon 2020 project that will initiate and build a new European starting community of key research infrastructures, establishing open science practices in the domain of biodiversity through provision of access to data, associated tools and services at each separate stage of and along the entire research cycle. BiCIKL will provide new methods and workflows for integrated harvesting, liberating, linking, accessing and re-using of subarticle-level data (specimens, material citations, samples, sequences, taxonomic names, taxonomic treatments, figures, tables) extracted from literature. BiCIKL will, for the first time, provide access and tools for seamless linking and usage tracking of data along the line: specimens > sequences > species > analytics > publications > biodiversity knowledge graph > re-use.