The natural world is full of wonder and awe, and the National Parks of the United States are no exception. The first national park, Yellowstone, was established on March 1, 1872. The National Park Service (NPS), founded on August 25, 1916, under President Woodrow Wilson, is an agency of the United States federal government that manages all national parks, many national monuments, and other conservation and historical properties with various title designations. Since its founding, the NPS has preserved natural and cultural resources and values for the enjoyment, education, and inspiration of this and future generations. This analysis looks in depth at the National Parks, the species that reside in them, visitor numbers, and visitor reviews on the travel website TripAdvisor.
The chapter examines the role of geospatial data in Russia's online ecosystem. Facilitated by the rise of geographic information systems and user-generated content, the distribution of geospatial data has blurred the line between physical spaces and their virtual representations. The chapter discusses the sources of these data available for Digital Russian Studies (e.g., social data and crowdsourced databases), together with novel techniques for extracting geolocation from various data formats (e.g., textual documents and images). It also scrutinizes different uses of these data, ranging from mapping the spatial distribution of social and political phenomena, to studying how geotagged data digitize cultural practices, to exploring how the geoweb is used to narrate individual and collective identities online.
A specter is haunting political science. It is the specter of methodological perfectionism. This dogma places methods before substance and imposes a narrow spectrum of acceptable methods on the discipline.
PLASMATIC (Advanced Predictive Maintenance for the Valencian plastic industrial sector) is a project funded by the Valencian Institute for Business Competitiveness (IVACE) and the European Union through the European Regional Development Fund (ERDF). The general objective of the project is to help Valencian plastic-sector companies incorporate solutions from the so-called Factory 4.0, via knowledge and technologies in the fields of Big Data, Machine Learning and Business Intelligence. The main result will be an advanced predictive maintenance system that addresses: (i) anomaly detection; (ii) wear prediction; and (iii) maintenance planning optimization. This report shows the results of an Exploratory Data Analysis of several benchmarks available in the literature, as well as data from tests carried out by AIMPLAS on one of their injection machines. A methodology for the implementation of an Advanced Predictive Maintenance System (SMPa) is proposed, based on Principal Component Analysis (PCA). From operational and sensor data, PCA computes two statistics that can monitor equipment degradation, detecting anomalies through control charts early enough for decision making. In addition, contribution plots can show the cause of an anomaly, which is especially useful in complex systems because it eases inspection tasks. ; PLASMATIC. Project funded by the Valencian Institute for Business Competitiveness (IVACE) and the European Union through the European Regional Development Fund (ERDF), within the public grant programme addressed to Technological Institutes of the Valencian Community for the development of non-economic R&D projects carried out in cooperation with companies during 2017, with a budget of €87,210.96. File number: IMDEEA/2017/114
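As an illustration of the monitoring approach described above, the following minimal Python sketch builds a PCA model on synthetic healthy-operation sensor data and derives two statistics per observation. The abstract does not name the two statistics; Hotelling's T² and the squared prediction error Q are assumed here, as is standard in PCA-based process monitoring, and all data and limits are invented.

```python
# Hedged sketch of PCA-based condition monitoring; synthetic data, and the
# two statistics (Hotelling's T^2 and SPE/Q) are an assumption, not taken
# from the PLASMATIC report itself.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                  # healthy-operation sensor data
mu, sd = X.mean(axis=0), X.std(axis=0)

pca = PCA(n_components=3).fit((X - mu) / sd)   # standardize, then fit PCA

def t2_and_q(x):
    """Hotelling's T^2 and squared prediction error Q for one observation."""
    z = (x - mu) / sd
    scores = pca.transform(z[None, :])[0]
    t2 = np.sum(scores ** 2 / pca.explained_variance_)
    resid = z - pca.inverse_transform(scores[None, :])[0]
    return t2, resid @ resid                   # resid**2 per variable would give
                                               # the "contribution plot" values

# Empirical 99th-percentile control limits from the healthy data; new samples
# exceeding either limit are flagged as anomalies on the control chart.
healthy = np.array([t2_and_q(x) for x in X])
t2_lim, q_lim = np.percentile(healthy, 99, axis=0)

t2, q = t2_and_q(rng.normal(loc=2.0, size=8))  # simulated degraded sample
print(t2 > t2_lim, q > q_lim)
```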
The EXAFS data-analysis software package EDA consists of a suite of programs, running under the Windows operating system, that is designed to perform all steps of conventional EXAFS data analysis, such as extraction of the XANES/EXAFS parts of the X-ray absorption coefficient, Fourier filtering, and EXAFS fitting using the Gaussian and cumulant models. The package also includes two advanced approaches, which allow the reconstruction of the radial distribution function (RDF) from EXAFS based on a regularization-like method and the calculation of configuration-averaged EXAFS using a set of atomic configurations obtained from molecular-dynamics or Monte Carlo simulations. ; This is the preprint version of the following article: Alexei Kuzmin, EDA: EXAFS data-analysis software package, International Tables for Crystallography, Vol. I (2021), DOI:10.1107/S1574870720003365, which has been published in final form at https://onlinelibrary.wiley.com/doi/full/10.1107/S1574870720003365. ; Institute of Solid State Physics, University of Latvia, as the Center of Excellence, has received funding from the European Union's Horizon 2020 Framework Programme H2020-WIDESPREAD-01-2016-2017-TeamingPhase2 under grant agreement No. 739508, project CAMART².
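To make the Fourier-filtering step concrete, here is a minimal numpy sketch on a synthetic single-shell EXAFS signal. It illustrates the generic procedure only; it is not code from the EDA package, and the shell distance and damping parameters are invented.

```python
# Schematic Fourier filtering of a synthetic single-shell EXAFS signal;
# illustration of the generic procedure, not code from the EDA package.
import numpy as np

k = np.linspace(0.0, 15.0, 1024)          # photoelectron wavenumber (A^-1)
R = 2.0                                   # assumed first-shell distance (A)
chi = 0.5 * np.sin(2 * R * k + 1.0) * np.exp(-2 * 0.006 * k**2)

signal = chi * k**2 * np.hanning(k.size)  # k^2 weighting + window against ripples

# Transform to R-space: chi ~ sin(2Rk), so the conjugate distance is pi * f.
r = np.pi * np.fft.rfftfreq(k.size, d=k[1] - k[0])
ft_mag = np.abs(np.fft.rfft(signal))      # peak near R (shifted by the phase)

# Fourier filtering: keep only the first-shell window and back-transform.
window = (r > 1.0) & (r < 3.0)
chi_first_shell = np.fft.irfft(np.fft.rfft(signal) * window, n=k.size)
```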
Since the inception of the No Child Left Behind legislation, school districts have faced a growing need to gather, analyze and monitor more data than ever before in their leadership of schools (Blink, 2007; Kowalski, Lasley & Mahoney, 2008; Mills, 2006). The adage that schools are "data rich" but "information poor", while comical, is often true. School systems are awash in data, and drowning is a real concern for new and soon-to-be leaders. The critical task for school leaders is to turn existing student achievement data into a format that lends itself to answering questions and improving outcomes for students. Common barriers to transforming data into knowledge in educational settings include poorly designed or non-existent data systems, disorganized record management, temperamental gatekeepers who withhold data to preserve power, and personnel who simply fail to ask the right questions of the available data (Mills, 2006). Using data effectively does not require great statistical knowledge or high-priced analytical tools. It simply requires a desire to improve outcomes for students, staff, and schools, and a willingness to stop doing the same things while hoping for a different outcome (aka superstitious behavior). The ultimate goal of the training program delivered to students in the Master of School Administration (MSA) program was to empower future principals with the knowledge and skills to go beyond static reports and simple data views and to treat data as a dynamic entity that supports their leadership focus.
The paper presents a web application that helps diabetes patients and clinicians analyze the patient's evolution and improve diabetes treatment. The system reads blood glucose values from the cloud, where they are uploaded by the patient's mobile application. In this way, patient and clinician can share the patient's data, and the patient no longer needs to visit the hospital.
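A minimal sketch of how such a web application might read the shared data is given below; the endpoint, field names and thresholds are hypothetical, since the paper does not specify its API.

```python
# Hypothetical sketch of the web application's read path; endpoint, fields
# and thresholds are invented, since the paper does not specify its API.
import requests

API = "https://example-cloud/api"   # placeholder base URL

def fetch_glucose(patient_id: str, token: str) -> list:
    """Read the blood glucose values uploaded by the patient's mobile app."""
    resp = requests.get(f"{API}/patients/{patient_id}/glucose",
                        headers={"Authorization": f"Bearer {token}"},
                        timeout=10)
    resp.raise_for_status()
    return resp.json()              # e.g. [{"time": "...", "mg_dl": 110}, ...]

def out_of_range(readings, low=70, high=180):
    """A simple rule the clinician view might use to highlight readings."""
    return [r for r in readings if not low <= r["mg_dl"] <= high]
```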
Descriptive epidemiology has traditionally been regarded as a merely exploratory tool. However, the greater availability and improved quality of epidemiological data over the years have led to the development of new statistical techniques that characterize modern epidemiology. These methods are not only explanatory, but also predictive.
In public health, predictions of future morbidity and mortality trends are essential to evaluate strategies for disease prevention and management, and to plan the allocation of resources. During my PhD at the school of "Epidemiology, Environment and Public Health" I worked on the analysis of cancer mortality trends, using data from the World Health Organization (WHO) database, available on electronic support (WHOSIS), and from other databases, including the Pan American Health Organization database, the Eurostat database, the United Nations Population Division database, the United States Census Bureau database and the Japanese National Institute of Population database. Considering several cancer sites and several countries worldwide, I computed age-specific rates for each 5-year age group (from 0–4 to 80+ or 85+ years) and calendar year or quinquennium. I then computed age-standardized mortality rates per 100,000 person-years using the direct method on the basis of the world standard population. I fitted joinpoint models to identify the years when significant changes in trends occurred, and I calculated the corresponding annual percent changes. Moreover, I focused on projections. I fitted joinpoint models to the numbers of certified deaths in each 5-year age group in order to identify the most recent trend slope. Then, I applied Generalized Linear Model (GLM) Poisson regressions, considering different link functions, to the data over the time period identified by the joinpoint model. In particular, I considered the identity link, the logarithmic link, the power-five link and the square-root link. I also implemented an algorithm that generates a "hybrid" regression; this algorithm automatically selects the best-fitting GLM Poisson model, among the identity, logarithmic, power-five and square-root link functions, to apply to each age group according to Akaike Information Criterion (AIC) values. The resulting regression, over the set of age groups, is thus a combination of the considered models. I then computed the predicted age-specific numbers of deaths and rates, and the corresponding 95% prediction intervals (PIs), using the regression coefficients obtained from the four GLM Poisson regressions and from the hybrid regression. Lastly, as a further comparison model, I implemented an average model, which simply computes a mean of the estimates produced by the different GLM Poisson models. In order to compare the six prediction methods, I used data from 21 countries worldwide and from the European Union as a whole, and I considered 25 major causes of death. I selected countries with over 5 million inhabitants and with good-quality data (i.e. with at least 90% coverage). I analysed data for the period between 1980 and 2011: in particular, I used data from 1980 to 2001 as a training set and data from 2002 to 2011 as a validation set. To measure the predictive accuracy of the different models, I computed average absolute relative deviations (AARDs), which indicate the average percent deviation of the estimate from the true value. I calculated AARDs over a 5-year prediction period (i.e. 2002-2006), as well as over a 10-year period (i.e. 2002-2011). The results showed that the hybrid model did not always give the best predictions, and when it was the best, the corresponding AARD estimates were not very far from those of the other methods. However, the hybrid model projections were never the worst for any combination of cancer site and sex. The model acted as a compromise between the four considered link functions.
The average model also ranked in an intermediate position: it was never the best predictive method, but its AARDs were competitive with the other methods considered. Overall, the method showing the best predictive performance is the Poisson GLM with an identity link function. Furthermore, this method showed extremely low AARDs compared to the other methods, particularly over a 10-year projection period. Finally, we must take into account that predicted trends, and the corresponding AARDs, derived from 5-year projections are much more accurate than those over a 10-year period. Projections beyond five years with these methods lack reliability and are of limited use in public health. During the implementation of the algorithm and the analyses, several questions emerged: Are there other relevant models that could be added to the algorithm? How much does the joinpoint regression influence the projections? How can an "a priori" rule be found that helps in choosing which predictive method to apply according to the available covariates? All these questions are left for future developments of the project. Predicting future trends is a complex procedure; the resulting estimates should therefore be taken with caution and considered only as general indications for epidemiology and health planning.
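The hybrid selection step and the AARD accuracy measure described above can be sketched with statsmodels as follows. This is a simplified illustration, not the thesis code; in particular, the "power five" link is assumed here to mean η = μ^(1/5), hence Power(power=0.2).

```python
import numpy as np
import statsmodels.api as sm

# Candidate links; "power five" is assumed to mean eta = mu**(1/5).
LINKS = {
    "identity": sm.families.links.Identity(),
    "log":      sm.families.links.Log(),
    "power5":   sm.families.links.Power(power=0.2),
    "sqrt":     sm.families.links.Power(power=0.5),
}

def hybrid_fit(deaths, years):
    """For one age group, fit a Poisson GLM per candidate link over the last
    joinpoint segment and keep the model with the lowest AIC."""
    X = sm.add_constant(np.asarray(years, dtype=float))
    best_name, best_fit = None, None
    for name, link in LINKS.items():
        try:
            fit = sm.GLM(deaths, X, family=sm.families.Poisson(link=link)).fit()
        except Exception:
            continue                      # a link may fail to converge
        if best_fit is None or fit.aic < best_fit.aic:
            best_name, best_fit = name, fit
    return best_name, best_fit

def aard(predicted, observed):
    """Average absolute relative deviation, in percent."""
    p, o = np.asarray(predicted, float), np.asarray(observed, float)
    return 100.0 * np.mean(np.abs(p - o) / o)
```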
By combining expansionary open market operations with sales of foreign exchange, the central bank can expand the monetary base without depreciating the exchange rate. Thus, if there is a monetary political business cycle, sales of foreign exchange are especially likely before elections. Our panel data analysis for up to 146 countries over 1975-2001 supports this hypothesis: foreign exchange reserves relative to trend GDP depend negatively on the pre-election index. The relationship is significant and robust irrespective of the type of electoral variable, the choice of control variables and the estimation technique.
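A stylized numerical illustration of the mechanism (all numbers invented): the open market purchase expands the base, while the simultaneous foreign exchange sale absorbs the depreciation pressure at the cost of reducing reserves.

```python
# Stylized central bank balance sheet; the numbers are invented.
open_market_purchase = 100   # bonds bought: +100 to the monetary base
fx_sale = 40                 # reserves sold: -40 to the base, props up the currency

base_change = open_market_purchase - fx_sale   # net expansion of the base: +60
reserve_change = -fx_sale                      # reserves fall, as observed pre-election
print(base_change, reserve_change)             # 60 -40
```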
The maritime industry has become a major part of globalization. Political and economic actors face challenges regarding shipping and passenger transport. The Automatic Identification System (AIS) records and broadcasts the locations of numerous vessels and delivers a huge amount of information that can be used to analyze fluxes and behaviors. However, exploiting these numerous messages requires tools based on Big Data principles. Knowledge of the origin, destination, travel duration and distance of each vessel can help transporters manage their fleets, and can help ports analyze fluxes and focus their investigations on particular containers based on their previous locations. Thanks to the historical AIS messages provided by the Danish Maritime Authority and to ARLAS PROC/ML, an open-source and scalable processing platform based on Apache SPARK, we are able to apply our pipeline of processes and extract this information from millions of AIS messages. We use a Hidden Markov Model (HMM) to identify when a vessel is still or moving, and we create "courses" embodying the vessel's travels, from which we derive the travel indicators. The results are visualized with ARLAS Exploration, an open-source and scalable tool for exploring geolocated data. This carto-centered application allows users to navigate the huge amount of enriched data and helps them benefit from the new origin and destination indicators. The tool can also support the creation of Machine Learning algorithms to address many maritime transportation challenges.
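The still/moving segmentation can be sketched with a two-state Gaussian HMM on speed over ground, as below. This is a schematic reconstruction on synthetic data, not the actual pipeline run on the Danish Maritime Authority messages.

```python
# Two-state Gaussian HMM on speed over ground (synthetic series);
# schematic reconstruction, not the actual AIS pipeline.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
sog = np.concatenate([np.abs(rng.normal(0.1, 0.05, 200)),   # moored
                      rng.normal(12.0, 1.5, 300),           # under way
                      np.abs(rng.normal(0.1, 0.05, 150))])  # moored again

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
hmm.fit(sog.reshape(-1, 1))
states = hmm.predict(sog.reshape(-1, 1))
moving = states == int(np.argmax(hmm.means_.ravel()))       # higher-mean state = moving

# A "course" is a maximal run of consecutive moving points; its endpoints give
# origin and destination, and its timestamps give the travel duration.
cuts = np.flatnonzero(np.diff(moving.astype(int))) + 1
courses = [seg for seg in np.split(np.arange(sog.size), cuts) if moving[seg[0]]]
print(len(courses), [(c[0], c[-1]) for c in courses])
```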
This document shares the methodological perspective on ethnography and data analysis of the TRANSGANG project. Its aim is to provide the local research teams with a guide to the ethnographic perspectives, tools and data-analysis strategy to be applied using NVivo software, together with references supporting these perspectives. The TRANSGANG project is a comparative investigation based on one stage of local analysis and two stages of secondary analysis of the data collected according to the methodological approach. These three levels of analysis will combine the results into a single transnational picture of street youth groups without losing the cultural particularities of the different places. The central and contrast cases involve a combination of qualitative techniques (our tools) such as narrative interviews, focus groups, life stories and participant observation, but it is necessary to work with similar analysis categories in each country. ; This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 742705
The shift to cloud technologies is a paradigm change that offers considerable financial and administrative gains. However, governmental and business institutions wanting to tap into these gains are concerned with security issues. The cloud presents new vulnerabilities and is dominated by new kinds of applications, which calls for new security solutions. Intuitively, Byzantine fault tolerant (BFT) replication has many benefits for enforcing integrity and availability in clouds. Existing BFT systems, however, are not suited for the typical "data-flow processing" cloud applications which analyze large amounts of data in a parallelizable manner: existing BFT solutions focus on replicating single monolithic servers, whilst data-flow applications consist of several stages, each of which may give rise to multiple components at runtime to exploit cheap hardware parallelism; similarly, BFT replication hinges on comparison of the redundant outputs generated, which in the case of data-flow processing can represent huge amounts of data. In fact, current limits of data processing depend directly on the amount of data that can be processed per time unit. In this paper we present ClusterBFT, a system that secures computations run in the cloud by leveraging BFT replication coupled with fault isolation. In short, ClusterBFT leverages a combination of variable-degree clustering, approximated and offline output comparison, smart deployment, and separation of duty to achieve a parameterized tradeoff between fault tolerance and overhead in practice. We demonstrate the low overhead achieved with ClusterBFT when securing data-flow computations expressed in Apache Pig and Hadoop. Our solution allows assured computation with less than 10 percent latency overhead, as shown by our evaluation.
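The output-comparison idea, replicas' outputs compared via fixed-size digests rather than raw data, can be sketched as follows. This is a schematic illustration of digest-based voting, not ClusterBFT's actual protocol code.

```python
# Schematic digest-based output voting; illustrates the idea of comparing
# fixed-size hashes instead of huge data-flow outputs (not ClusterBFT code).
import hashlib
from collections import Counter

def digest(output: bytes) -> str:
    return hashlib.sha256(output).hexdigest()

def vote(replica_outputs: list, f: int) -> bytes:
    """Accept an output once f + 1 replicas agree on its digest, tolerating
    up to f Byzantine replicas for that stage."""
    counts = Counter(digest(o) for o in replica_outputs)
    winner, n = counts.most_common(1)[0]
    if n < f + 1:
        raise RuntimeError("no digest reached the f+1 quorum; stage is suspect")
    return next(o for o in replica_outputs if digest(o) == winner)

print(vote([b"result", b"result", b"corrupted"], f=1))   # -> b"result"
```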
Over the last decade, there have been many changes in the field of political analysis at a global level. Through social networking platforms, millions of people have the opportunity to express their opinions and capture their thoughts at any time, leaving their digital footprint. As such, massive datasets are now available and can be used by analysts to gain useful insights into the current political climate and to identify political tendencies. In this paper, we present TwiFly, a framework built for analyzing Twitter data. TwiFly accepts a number of accounts to be monitored over a specific time frame and visualizes useful extracted information in real time. As a proof of concept, we present the application of our platform to the most recent elections in Greece, gaining useful insights into the election results.
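As an illustration of the kind of aggregation behind such real-time views, the pandas sketch below computes hourly tweet volume per monitored account; the schema is invented and this is not TwiFly's code.

```python
# Illustrative aggregation behind a live per-account dashboard
# (invented schema; not TwiFly's code).
import pandas as pd

tweets = pd.DataFrame({
    "account": ["partyA", "partyB", "partyA", "partyB", "partyA"],
    "time": pd.to_datetime(["2019-07-01 10:05", "2019-07-01 10:20",
                            "2019-07-01 11:02", "2019-07-01 11:40",
                            "2019-07-01 11:55"]),
})

# Hourly tweet volume per monitored account, the kind of series plotted live.
volume = (tweets.set_index("time")
                .groupby("account")
                .resample("1h")
                .size())
print(volume)
```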
In: Edjabou, V.M.E., Pivnenko, K., Petersen, C., Scheutz, C. & Astrup, T.F. (2015), 'Compositional data analysis of household food waste in Denmark', 6th International Workshop on Compositional Data Analysis, Spain, 01/06/2015 - 05/06/2015.
Food waste is a growing public concern because food production and distribution exert enormous pressure on natural resources such as land, water and energy, and lead to significant environmental, societal and economic impacts. The European Commission has therefore aimed to reduce by 50% the total amount of edible food waste discarded within the European Union (EU) Member States by 2020. Reliable data on food waste and a better understanding of food waste generation patterns are crucial for planning the reduction of avoidable food waste and the environmentally sound treatment of unavoidable food waste. Although food waste composition carries relative information, no attempt had been made to analyse food waste composition as compositional data. The relationships between food waste fractions were therefore analysed by means of the Pearson correlation test and log-ratio analysis. The food waste data were collected by sampling and sorting residual household waste in Denmark. The food waste was subdivided into three fractions: (1) avoidable vegetable food waste, (2) avoidable animal-derived food waste, and (3) unavoidable food waste. The correlations were computed using: (a) the amount of food waste (kg per household per week), (b) the percentage composition of food waste based on the total food waste, and (c) the percentage composition of food waste based on the total residual household waste. The Pearson correlation test gave different results for the different datasets, whereas the log-ratio analysis gave the same results for all three datasets.
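The contrast between the two analyses can be reproduced schematically: the centered log-ratio (clr) transform below removes the closure effect that distorts Pearson correlations on raw percentages. Data are synthetic; this is a sketch of the generic compositional method, not the study's code.

```python
# Centered log-ratio (clr) transform; a sketch of the generic compositional
# approach on synthetic three-fraction data, not the study's own code.
import numpy as np

def clr(parts):
    """log of each part over the row geometric mean; correlations computed on
    clr coordinates are invariant to the closure (parts summing to 100%)."""
    logp = np.log(parts)
    return logp - logp.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
waste = rng.gamma(shape=[4.0, 2.0, 6.0], scale=0.5, size=(100, 3))  # kg/hh/week
shares = waste / waste.sum(axis=1, keepdims=True)                   # closed data

raw_corr = np.corrcoef(shares, rowvar=False)       # distorted by the closure
clr_corr = np.corrcoef(clr(shares), rowvar=False)  # log-ratio based
```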
This article considers the determinants of Portuguese tourism demand for the period 2004-2013. The econometric methodology uses panel unit root tests and a dynamic panel data model (system-GMM estimator). The different panel unit root tests (Levin, Lin and Chu; Im, Pesaran and Shin W-stat; and the augmented Dickey-Fuller Fisher Chi-square) show that the variables used in this panel are stationary. The dynamic model shows that tourism demand is a dynamic process. Relative prices, income per capita, human capital and government spending encourage international tourism demand for Portugal.
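For illustration, the core of the Im, Pesaran and Shin approach, averaging individual ADF statistics across countries, can be sketched with statsmodels as below; the standardization into the published W-stat is omitted, and the data are synthetic.

```python
# Simplified Im-Pesaran-Shin step on synthetic data; the published W-stat
# additionally standardizes the mean t-statistic with tabulated moments.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def ips_tbar(panel, maxlag=2):
    """Average the country-level ADF t-statistics (the IPS 't-bar')."""
    t_stats = [adfuller(series, maxlag=maxlag, autolag="AIC")[0]
               for series in panel.values()]
    return float(np.mean(t_stats))

rng = np.random.default_rng(2)
panel = {c: rng.normal(size=40) for c in ["PT", "ES", "FR"]}  # synthetic, stationary
print(ips_tbar(panel))   # strongly negative values point towards stationarity
```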