This paper presents a framework to develop and execute applications in distributed and highly dynamic computing systems composed of cloud resources and fog devices such as mobile phones, cloudlets, and micro-clouds. The work builds on the COMPSs programming framework, which includes a programming model and a runtime already validated in HPC and cloud environments for the transparent execution of parallel applications. As part of the proposed contribution, COMPSs has been enhanced to support the execution of applications on mobile platforms that offer both GPUs and CPUs. The scheduling component of COMPSs is being redesigned to offload computation to other fog devices at the same level of the hierarchy, and to cloud resources when more computational power is required. The framework has been tested by executing a sample application on a mobile phone that offloads tasks to a laptop and a private cloud. ; This work is partly supported by the Spanish Ministry of Science and Technology through project TIN2015-65316-P and grant BES-2013-067167, by the Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272, and by the European Union through the Horizon 2020 research and innovation programme under grant 730929 (mF2C Project). ; Peer Reviewed ; Postprint (author's final draft)
Recently, cloud services have been evaluated by scientific communities as a viable solution to satisfy their computing needs while keeping the cost of ownership and operation to a minimum. Analyses of the adoption of the cloud computing model for eScience have identified several areas for improvement, such as federation management and interoperability between providers. Portability between cloud vendors is widely recognized as essential to avoid locking users into proprietary systems, a barrier to the full adoption of clouds. In this paper we present a programming framework that allows the coordination of applications on federated clouds, used to add flexibility to traditional research infrastructures such as clusters and grids. This approach allows researchers to program applications while abstracting away the underlying infrastructure, and provides scaling and elasticity through the dynamic provision of virtualized resources. The adoption of standard interfaces is a key feature in the development of connectors for different middleware, ensuring the portability of the code between providers. ; This work has been supported through the grants: SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government and the Spanish Ministry of Science and Innovation under contract TIN2012-34557, the Generalitat de Catalunya (contract 2009-SGR-980) and the European Commission (EUBrazilOpenbio: RI-288754 and EGI-InSPIRE: RI-261323). ; Peer Reviewed ; Postprint (published version)
This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework combines the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential code, with container management platforms that ease the deployment of applications in computing environments (such as Docker, Mesos or Singularity). This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype which integrates COMPSs with different container engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead of building, deploying and executing two benchmark applications, compared to a cloud testbed based on KVM and OpenStack and to the use of bare-metal nodes. We have observed an important gain over cloud environments during the building and deployment phases, which enables better adaptation of resources to the computational load. In contrast, we detected an extra overhead during execution, mainly due to multi-host Docker networking. ; This work is partly supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316 project, by the Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272, and by the European Union through the Horizon 2020 research and innovation program under grant 690116 (EUBra-BIGSEA Project). Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation. ; Peer Reviewed ; Postprint (author's final draft)
New scientific instruments are starting to generate an unprecedented amount of data. The Low Frequency Array (LOFAR), one of the Square Kilometre Array (SKA) pathfinders, is already producing data on a petabyte scale. The calibration of these data presents a huge challenge for end users: (a) extensive storage and computing resources are required; (b) the installation and maintenance of the software required for the processing is not trivial; and (c) the requirements of calibration pipelines, which are experimental and under development, are quickly evolving. After encountering some limitations in classical infrastructures like dedicated clusters, we investigated the viability of cloud infrastructures as a solution. We found that the installation and operation of LOFAR data calibration pipelines is not only possible, but can also be efficient in cloud infrastructures. The main advantages were: (1) the ease of software installation and maintenance, and the availability of standard APIs and tools, widely used in the industry; this reduces the requirement for significant manual intervention, which can have a highly negative impact in some infrastructures; (2) the flexibility to adapt the infrastructure to the needs of the problem, especially as those demands change over time; (3) the on-demand consumption of (shared) resources. We found that a critical factor (also in other infrastructures) is the availability of scratch storage areas of an appropriate size. We found no significant impediments associated with the speed of data transfer, the use of virtualization, the use of external block storage, or the memory available (provided a minimum threshold is reached). Finally, we considered the cost-effectiveness of a commercial cloud such as Amazon Web Services.
While a cloud solution is more expensive than the operation of a large, fully-utilized cluster completely dedicated to LOFAR data reduction, we found that its costs are competitive if the number of datasets to be analysed is not high, or if the costs of maintaining a system capable of calibrating LOFAR data become high. Coupled with the advantages discussed above, this suggests that a cloud infrastructure may be favourable for many users. ; We acknowledge the useful comments of the anonymous referee. We would like to acknowledge the work of all the developers and packagers of the LOFAR software that constitute the core of the processing pipelines (including factor, prefactor, LSMTool, LoSoTo, and the Kern Suite), as well as the useful discussions with the participants in the LOFAR blank fields and direction dependent calibration teleconferences over the years. JS and PNB are grateful for financial support from STFC via grant ST/M001229/1. This work has also been supported by the projects 'AMIGA5: gas in and around galaxies. Scientific and technological preparation for the SKA' (AYA2014-52013-C2-1-R) and 'AMIGA6: gas in and around galaxies. Preparation for SKA science and contribution to the design of the SKA data flow' (AYA2015-65973-C3-1-R), both of which were co-funded by MICINN and FEDER funds and the Junta de Andalucía (Spain) grants P08-FQM-4205 and TIC-114. We would like to explicitly acknowledge Dr Jose Ruedas – chief of the computer centre and responsible for the computing and communications infrastructures at IAA-CSIC – and Rafael Parra – system administrator of the IAA computing cluster – for their technical assistance. We acknowledge the joint SKA and AWS Astrocompute proposal call that was used to fund all the tests in the AWS infrastructure with the projects "Calibration of LOFAR ELAIS-N1 data in the Amazon cloud" and "Amazon Cloud Processing of LOFAR Tier-1 surveys: Opening up a new window on the Universe".
This work made use of the University of Hertfordshire's high-performance computing facility and the LOFAR-UK computing facility, supported by STFC [grant number ST/P000096/1]. This work benefited from services and resources provided by the fedcloud.egi.eu Virtual Organization, supported by the national resource providers of the EGI Federation. We acknowledge the resources and support provided by the STFC RAL Cloud infrastructure. LOFAR, the Low Frequency Array designed and constructed by ASTRON, has facilities in several countries that are owned by various parties (each with their own funding sources) and that are collectively operated by the International LOFAR Telescope (ILT) foundation under a joint scientific policy. ; Peer Reviewed ; Award-winning ; Postprint (author's final draft)
With the advent of distributed computing, the need for frameworks that facilitate its programming and management has also emerged. These tools have typically been used to support research in the application areas that require them. This creates good initial conditions for translational computer science (TCS), although such translation does not always occur. This article describes our experience with the PyCOMPSs project, a programming model for distributed computing. While it is a research instrument for our team, it has also been applied in multiple real use cases, under the umbrella of European funded projects or as part of internal projects between various departments at the Barcelona Supercomputing Center. This article illustrates how the authors have engaged in TCS as an underlying research methodology, collecting experiences from three European projects. ; This work was supported in part by Spanish Government under Contract TIN2015-65316-P, in part by the Generalitat de Catalunya under Contract 2014-SGR-1051, and in part by the European Commission's Horizon 2020 Framework program through BioExcel Center of Excellence under Contract 823830 and Contract 675728, in part by the ExaQUte Project under Contract 800898, in part by the European High-Performance Computing Joint Undertaking (JU) under Grant 955558, in part by the MCIN/AEI/10.13039/501100011033, and in part by the European Union NextGenerationEU/PRTR. ; Peer Reviewed ; Postprint (author's final draft)
High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence, with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is an environment that integrates (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfers, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the data science community by providing efficient algorithm implementations for experts from the data domain who want to develop applications at a higher level of abstraction. ; Funding: This work was partially funded by Fapemig, CAPES, CNPq, MCT/CNPq-InWeb, FAPEMIG-PRONEX-MASWeb (APQ-01400-14), and by the collaboration between the Brazilian MCT/RNP and the European Union Horizon 2020 research and innovation programme under grants 690116 (EUBra-BIGSEA) and 777154 (EUBra Atmosphere). ; Peer Reviewed ; Postprint (published version)
COMPSs is a programming framework that aims to facilitate the parallelization of existing applications written in Java, C/C++ and Python. For that purpose, it offers a simple programming model based on sequential development, in which the user is mainly responsible for identifying the functions to be executed as asynchronous parallel tasks and marking them with Java annotations or standard Python decorators. A runtime system is in charge of exploiting the inherent concurrency of the code, automatically detecting and enforcing the data dependencies between tasks and scheduling these tasks on the available resources, which can be nodes in a cluster, clouds or grids. In cloud environments, COMPSs provides scalability and elasticity features allowing the dynamic provision of resources. ; This work has been supported by the following institutions: the Spanish Government with grant SEV-2011-00067 of the Severo Ochoa Program and contract Computacion de Altas Prestaciones VI (TIN2012-34557); by the SGR programme (2014-SGR-1051) of the Catalan Government; by the project The Human Brain Project, funded by the European Commission under contract 604102; by the ASCETiC project funded by the European Commission under contract 610874; by the EUBrazilCloudConnect project funded by the European Commission under contract 614048; and by the Intel-BSC Exascale Lab collaboration. ; Peer Reviewed ; Postprint (published version)
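The task-based model described above can be illustrated with a minimal, self-contained sketch. This is NOT the real PyCOMPSs API (which provides its own `@task` decorator and `compss_wait_on`, and additionally tracks data dependencies between tasks); the `task` and `wait_on` helpers below are simplified stand-ins built on Python's `concurrent.futures`, shown only to convey how sequential-looking code becomes asynchronous tasks.

```python
# Illustrative stand-in for the COMPSs task model (not the real PyCOMPSs API):
# functions marked with a decorator are submitted as asynchronous tasks, and
# the main program synchronizes on their results only when it needs them.
from concurrent.futures import ThreadPoolExecutor, Future

_executor = ThreadPoolExecutor(max_workers=4)

def task(fn):
    """Hypothetical decorator: run the function as an asynchronous task."""
    def wrapper(*args, **kwargs):
        return _executor.submit(fn, *args, **kwargs)
    return wrapper

def wait_on(value):
    """Synchronize on a task result (loosely mirrors compss_wait_on)."""
    return value.result() if isinstance(value, Future) else value

@task
def increment(x):
    return x + 1

# The code reads sequentially; each call actually spawns a parallel task.
partials = [increment(i) for i in range(4)]
results = [wait_on(p) for p in partials]
print(results)  # [1, 2, 3, 4]
```

Unlike this toy, the COMPSs runtime also builds a dependency graph from the annotated parameters, so tasks that consume another task's output are only launched once that output is available.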
In recent years, improvements in software and hardware performance have made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary to and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of the technologies accepted in other bioinformatics fields, such as automated deployment systems, workflow orchestration, or the use of software containers. We present here a comprehensive exercise to bring biomolecular simulations to the "bioinformatics way of working". The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBBs are built as Python wrappers to provide an interoperable architecture. BioBBs have been integrated in a chain of usual software management tools to generate data ontologies, documentation, installation packages, software containers and ways of integration with workflow managers, which make them usable in most computational environments. ; We thank the Barcelona Supercomputing Center for providing all the HPC resources to launch the test executions presented in this manuscript. The project was supported by the following grants: BioExcel Center of Excellence (Horizon 2020 Framework Programme, grants 675728 and 823830); Elixir-Excelerate (Horizon 2020 Framework Programme, grant 676559); Spanish National Institute for Bioinformatics (Institute of Health Carlos III, grants PT13/0001/0019, PT13/0001/0028, PT17/0009/0001); Spanish Government, Severo Ochoa grant SEV2015-0493, TIN2015-65316-P; Generalitat de Catalunya, 2014-SGR-1051, 2017-SGR-1110. ; Peer Reviewed ; Postprint (published version)
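The "Python wrapper" pattern mentioned above can be sketched as follows. This is a hypothetical illustration, not the actual BioBB API: the class and field names (`BuildingBlock`, `ExtractField`, `input_path`, `properties`) are invented for this example. The point is the uniform interface (typed input/output file paths plus a `launch()` method) that lets workflow managers chain heterogeneous tools by wiring file paths together.

```python
# Hypothetical sketch of a building-block wrapper pattern: every block takes
# input/output file paths and a properties dict, and exposes launch(), so a
# workflow manager can chain blocks without knowing the wrapped tool.
import json
import os
import tempfile

class BuildingBlock:
    """Common interface: configure with paths, execute with launch()."""
    def __init__(self, input_path, output_path, properties=None):
        self.input_path = input_path
        self.output_path = output_path
        self.properties = properties or {}

    def launch(self):
        raise NotImplementedError

class ExtractField(BuildingBlock):
    """Toy block: copy one field from a JSON input into the output file."""
    def launch(self):
        with open(self.input_path) as f:
            data = json.load(f)
        field = self.properties.get("field", "sequence")
        with open(self.output_path, "w") as f:
            f.write(str(data[field]))

# Usage: blocks are instantiated and chained by their file paths.
tmp = tempfile.mkdtemp()
inp, out = os.path.join(tmp, "in.json"), os.path.join(tmp, "out.txt")
with open(inp, "w") as f:
    json.dump({"sequence": "MKTAYIAK"}, f)
ExtractField(inp, out, {"field": "sequence"}).launch()
print(open(out).read())  # MKTAYIAK
```

Because every block shares the same call shape, the same wrapper can be re-exposed through a CLI, a container entry point, or a workflow-manager node without changing the underlying tool.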
[EN] EUBrazilOpenBio is a collaborative initiative addressing strategic barriers in biodiversity research by integrating open access data and user-friendly tools widely available in Brazil and Europe. The project deploys the EU-Brazil Hybrid Data Infrastructure that allows the sharing of hardware, software and data on-demand. This infrastructure provides access to several integrated services and resources to seamlessly aggregate taxonomic, biodiversity and climate data, used by processing services implementing checklist cross-mapping and ecological niche modelling. A Virtual Research Environment was created to provide users with a single entry point to processing and data resources. This article describes the architecture, demonstration use cases and some experimental results and validation. ; EUBrazilOpenBio - Open Data and Cloud Computing e-Infrastructure for Biodiversity (2011-2013) is a Small or medium-scale focused research project (STREP) funded by the European Commission under the Cooperation Programme, Framework Programme Seven (FP7) Objective FP7-ICT-2011- EU-Brazil Research and Development cooperation, and the National Council for Scientific and Technological Development of Brazil (CNPq) of the Brazilian Ministry of Science, Technology and Innovation (MCTI) under the corresponding matching Brazilian Call for proposals MCT/CNPq 066/2010. BSC authors also acknowledge the support of the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government and the Spanish Ministry of Science and Innovation under contract TIN2012-34557 and the Generalitat de Catalunya (contract 2009-SGR-980). ; Amaral, R.; Badia, RM.; Blanquer Espert, I.; Braga-Neto, R.; Candela, L.; Castelli, D.; Flann, C. (2015). Supporting biodiversity studies with the EUBrazilOpenBio Hybrid Data Infrastructure. Concurrency and Computation: Practice and Experience. 27(2):376-394. 
https://doi.org/10.1002/cpe.3238