We quantify social media user engagement with low-credibility online news media sources using a simple and intuitive methodology, which we showcase with an empirical case study of the Twitter debate on immigration in Italy. By assigning Twitter users an Untrustworthiness (U) score based on how frequently they engage with unreliable media outlets and cross-checking it with a qualitative political annotation of the communities, we show that such information consumption is not equally distributed across Twitter users. Indeed, we identify clusters characterised by a very high presence of accounts that frequently share content from less reliable news sources. Users with high U are more inclined to interact with bot-like accounts that tend to inject more unreliable content into the network, and to retweet that content. Thus, our methodology applied to this real-world network provides evidence, in a straightforward way, that there is a strong interplay between accounts that display higher bot-like activity and users more focused on news from unreliable sources, and that this interplay influences the diffusion of this information across the network.
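As a rough illustration of the scoring idea, the sketch below computes a per-user Untrustworthiness score as the fraction of a user's shared links that point to domains on a low-credibility list. The exact definition of U, the domain list, the function names, and the sample data are all assumptions made for illustration; the abstract only states that the score reflects how frequently a user engages with unreliable outlets.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical list of low-credibility domains (placeholder names, not from the study).
UNRELIABLE_DOMAINS = {"unreliable-news.example", "clickbait.example"}


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def untrustworthiness_scores(shares):
    """Map each user to the fraction of their shared URLs from unreliable domains.

    `shares` is an iterable of (user_id, url) pairs.
    """
    total = defaultdict(int)
    unreliable = defaultdict(int)
    for user, url in shares:
        total[user] += 1
        if domain_of(url) in UNRELIABLE_DOMAINS:
            unreliable[user] += 1
    return {user: unreliable[user] / total[user] for user in total}


# Toy input: alice shares two links (one unreliable), bob shares one reliable link.
example_shares = [
    ("alice", "https://www.unreliable-news.example/story-1"),
    ("alice", "https://reliable-paper.example/report"),
    ("bob", "https://reliable-paper.example/analysis"),
]
scores = untrustworthiness_scores(example_shares)
```

Under this assumed definition, a score near 1 marks a user whose shared links come almost entirely from listed domains, and averaging the scores per community would give one simple way to compare clusters.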
Vaccine hesitancy is considered one of the leading causes of the resurgence of vaccine-preventable diseases. A non-negligible minority of parents does not fully adhere to the recommended vaccination schedule, leaving their children partially immunized and at higher risk of contracting vaccine-preventable diseases. Here, we leverage more than one million comments by 201,986 users, posted from March 2008 to April 2019 on the public online forum BabyCenter US, to learn more about such parents. For the 32% of users with a geographic location, the number of mapped users in each US state agrees well with the census population distribution. We employ Natural Language Processing to identify 6,884 and 10,131 users expressing their intention of following the recommended and alternative vaccination schedules (RSUs and ASUs, respectively). From the analysis of their activity on the forum, we find that ASUs differ distinctly from RSUs in their interests and previous experiences with vaccination. In particular, ASUs are more likely to follow groups focused on alternative medicine, two times more likely to have experienced adverse events following immunization, and more likely to mention more serious adverse reactions such as seizure or developmental regression. Content analysis of comments shows that the resources most frequently shared by both groups point to governmental domains (.gov). Finally, network analysis shows that RSUs and ASUs communicate with each other (indicating the absence of echo chambers), although the latter group is more endogamic and favors interactions with other ASUs. While our findings are limited to the specific platform analyzed, our approach may provide additional insights for the development of campaigns targeting parents on digital platforms. ; Postprint (published version)
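One simple way to probe whether a group favors interactions with its own members is to measure the share of its outgoing interactions that stay in-group. The sketch below does this for a list of reply pairs; the measure, the function name, and the toy data are assumptions for illustration and are not taken from the study's network analysis.

```python
from collections import Counter


def within_group_share(interactions, group_of):
    """Return, per group, the share of outgoing interactions that stay in-group.

    `interactions` is an iterable of (source_user, target_user) pairs;
    `group_of` maps each user to a group label such as "RSU" or "ASU".
    Pairs involving an unlabeled user are skipped.
    """
    outgoing = Counter()
    in_group = Counter()
    for source, target in interactions:
        if source not in group_of or target not in group_of:
            continue
        outgoing[group_of[source]] += 1
        if group_of[source] == group_of[target]:
            in_group[group_of[source]] += 1
    return {group: in_group[group] / outgoing[group] for group in outgoing}


# Toy data, invented for illustration only.
groups = {"a1": "ASU", "a2": "ASU", "r1": "RSU", "r2": "RSU"}
replies = [
    ("a1", "a2"), ("a2", "a1"), ("a1", "r1"),
    ("r1", "a1"), ("r1", "r2"), ("r2", "a2"),
]
shares = within_group_share(replies, groups)
```

A raw share like this depends on group sizes, so a fuller analysis would compare it against the share expected if interaction partners were chosen at random.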
Abstract. Tropical cyclones (TCs) produce strong winds and heavy rains accompanied by consecutive events such as landslides and storm surges, resulting in losses of lives and livelihoods, particularly in regions with high socioeconomic vulnerability. To proactively mitigate the impacts of TCs, humanitarian actors implement anticipatory action. In this work, we build upon such an existing anticipatory action for the Philippines, which uses an impact-based forecasting model for housing damage based on eXtreme Gradient Boosting (XGBoost) to release funding and trigger early action. We improve it in three ways. First, we perform a correlation and selection analysis to understand if Philippines-specific features can be left out or replaced with features from open global data sources. Second, we transform the target variable (percentage of completely damaged houses) and the global features that are not yet grid-based to a 0.1° grid resolution by de-aggregation using Google Open Buildings data. Third, we evaluate XGBoost regression models using different combinations of global and local features at grid and municipality spatial levels. We also introduce a two-stage model that first predicts whether the damage is above 10 % and then uses a regression model trained on all or only high-damage data. All experiments use data from 39 typhoons that impacted the Philippines between 2006 and 2020. Due to the scarcity and skewness of the training data, specific attention is paid to data stratification, sampling, and validation techniques. We demonstrate that employing only the global features does not significantly influence model performance. Despite excluding local data on physical vulnerability and storm surge susceptibility, the two-stage model improves upon the municipality-based model with local features.
When applied to anticipatory action, our two-stage model would show a higher true-positive rate, a lower false-negative rate, and an improved false-positive rate, implying that fewer resources would be wasted in anticipatory action. We conclude that relying on globally available data sources and working at the grid level holds the potential to render a machine-learning-based impact model generalizable and transferable to locations outside of the Philippines impacted by TCs. Also, a grid-based model increases the resolution of the predictions, which may allow for a more targeted implementation of anticipatory action. However, it should be noted that an impact-based forecasting model can only be as good as the forecast skill of the TC forecast that goes into it. Future research will focus on replicating and testing the approach in other TC-prone countries. Ultimately, a transferable model will facilitate the scaling up of anticipatory action for TCs.
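The de-aggregation step mentioned above can be pictured as spreading municipality-level values over grid cells in proportion to building counts. The sketch below shows one plausible reading, in which each grid cell receives the building-weighted mean of the municipalities it overlaps. The weighting scheme, the record layout, and all numbers are assumptions for illustration, and the paper's actual procedure may differ.

```python
from collections import defaultdict


def deaggregate_damage(municipality_damage_pct, buildings):
    """Estimate the damage percentage per grid cell from municipality-level values.

    `municipality_damage_pct` maps municipality id -> percentage of completely
    damaged houses. `buildings` is an iterable of
    (grid_cell_id, municipality_id, building_count) records, e.g. derived from
    a building-footprint dataset. Each cell receives the building-weighted mean
    of the municipalities it overlaps.
    """
    weighted_sum = defaultdict(float)
    building_total = defaultdict(int)
    for cell, municipality, count in buildings:
        weighted_sum[cell] += count * municipality_damage_pct[municipality]
        building_total[cell] += count
    return {
        cell: weighted_sum[cell] / building_total[cell]
        for cell in building_total
        if building_total[cell] > 0
    }


# Toy numbers, invented for illustration.
damage = {"M1": 20.0, "M2": 0.0}
footprints = [
    ("cell_A", "M1", 100),  # cell_A lies entirely in M1
    ("cell_B", "M1", 30),   # cell_B straddles M1 and M2
    ("cell_B", "M2", 90),
]
grid_damage = deaggregate_damage(damage, footprints)
```

In this toy case the cell that straddles two municipalities ends up with a value between the two municipal percentages, pulled toward the municipality that contributes more buildings.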
Seasonal influenza surveillance is usually carried out by sentinel general practitioners (GPs) who compile weekly reports based on the number of influenza-like illness (ILI) clinical cases observed among visited patients. This traditional practice for surveillance generally presents several issues, such as a delay of one week or more in releasing reports, population biases in health-seeking behaviour, and the lack of a common definition of an ILI case. On the other hand, the availability of novel data streams has recently led to the emergence of non-traditional approaches for disease surveillance that can alleviate these issues. In Europe, a participatory web-based surveillance system called Influenzanet represents a powerful tool for monitoring seasonal influenza epidemics thanks to the aid of self-selected volunteers from the general population who monitor and report their health status through Internet-based surveys, thus allowing a real-time estimate of the level of influenza circulating in the population. In this work, we propose an unsupervised probabilistic framework that combines time series analysis of self-reported symptoms collected by the Influenzanet platforms and performs an algorithmic detection of groups of symptoms, called syndromes. The aim of this study is to show that participatory web-based surveillance systems are capable of detecting the temporal trends of influenza-like illness even without relying on a specific case definition. The methodology was applied to data collected by Influenzanet platforms over the course of six influenza seasons, from 2011-2012 to 2016-2017, with an average of 34,000 participants per season.
Results show that our framework is capable of selecting temporal trends of syndromes that closely follow the ILI incidence rates reported by the traditional surveillance systems in the various countries (Pearson correlations ranging from 0.69 for Italy to 0.88 for the Netherlands, with the sole exception of Ireland with a correlation of 0.38). The proposed framework was able to forecast quite accurately the ILI trend of the forthcoming influenza season (2016-2017) based only on the available information of the previous years (2011-2016). Furthermore, to broaden the scope of our approach, we applied it both in a forecasting fashion to predict the ILI trend of the 2016-2017 influenza season (Pearson correlations ranging from 0.60 for Ireland and the UK to 0.85 for the Netherlands) and to detect gastrointestinal syndrome in France (Pearson correlation of 0.66). The final result is a near-real-time flexible surveillance framework not constrained by any specific case definition and capable of capturing the heterogeneity in symptom circulation during influenza epidemics in the various European countries. ; The authors declare no competing financial interests. D.Pa. and D.Pe. acknowledge support from H2020 FETPROACT-GSS CIMPLEX Grant No. 641191. KK, CC, D.Pa., D.Pe., Y.M. and M.D. acknowledge support from the Lagrange Project of the Institute for Scientific Interchange Foundation (ISI Foundation) funded by Fondazione Cassa di Risparmio di Torino (Fondazione CRT). Y.M. acknowledges support from the Government of Aragon, Spain through a grant to the group FENOL and by Ministry of Economy and Competitiveness (MINECO) and European Regional Development Fund (FEDER) (Grant No. FIS2017-87519-P). S.M. acknowledges support from the Spanish State Research Agency, through the María de Maeztu Program for Units of Excellence in R&D (MDM-2017-0711 to the IFISC Institute). ; Peer reviewed
Author summary: This study suggests how web-based surveillance data can provide an epidemiological signal capable of detecting the temporal trends of influenza-like illness without relying on a specific case definition. The proposed framework was able to forecast quite accurately the ILI trend of the forthcoming influenza season based only on the available information of the previous years. Moreover, to broaden the scope of our approach, we applied it to the detection of gastrointestinal syndromes. We evaluated the approach against the traditional surveillance data and despite the limited amount of data, the gastrointestinal trend was successfully detected. The result is a near-real-time flexible surveillance and prediction tool that is not constrained by any disease case definition. ; D.Pa. and D.Pe. acknowledge support from H2020 FETPROACT-GSS CIMPLEX Grant No. 
641191. KK, CC, D.Pa., D.Pe., Y.M. and M.D. acknowledge support from the Lagrange Project of the Institute for Scientific Interchange Foundation (ISI Foundation) funded by Fondazione Cassa di Risparmio di Torino (Fondazione CRT). Y.M. acknowledges support from the Government of Aragon, Spain through a grant to the group FENOL and by Ministry of Economy and Competitiveness (MINECO) and European Regional Development Fund (FEDER) (Grant No. FIS2017-87519-P). S.M. acknowledges support from the Spanish State Research Agency, through the María de Maeztu Program for Units of Excellence in R&D (MDM-2017-0711 to the IFISC Institute). This work is partly supported by the UMR-S 1136/Public Health France partnership. ; info:eu-repo/semantics/publishedVersion
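The agreement figures quoted for the Influenzanet framework are Pearson correlations between a detected syndrome trend and the ILI incidence reported by traditional surveillance. As a minimal sketch of that comparison, the function below computes the coefficient for two weekly series; the series values are invented for illustration and the variable names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Toy weekly series, invented for illustration.
syndrome_trend = [1.0, 2.0, 4.0, 3.0, 1.0]
ili_incidence = [10.0, 20.0, 40.0, 30.0, 10.0]
r = pearson(syndrome_trend, ili_incidence)
```

Because the coefficient is insensitive to scale, a self-reported symptom rate and a clinical incidence rate can be compared directly even though they are measured in different units.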
International audience ; Background: The Internet is becoming more commonly used as a tool for disease surveillance. Similarly to other surveillance systems and to studies using online data collection, Internet-based surveillance will have biases in participation, affecting the generalizability of the results. Here we quantify the participation biases of Influenzanet, an ongoing European-wide network of Internet-based participatory surveillance systems for influenza-like illness. Methods: In 2011/2012 Influenzanet launched a standardized common framework for data collection applied to seven European countries. Influenzanet participants were compared to the general population of the participating countries to assess the representativeness of the sample in terms of a set of demographic, geographic, socio-economic and health indicators. Results: More than 30,000 European residents registered with the system in the 2011/2012 season, and a subset of 25,481 participants was selected for this study. All age classes (10-year brackets) were represented in the cohort, including under 10 and over 70 years old. The Influenzanet population was not representative of the general population in terms of age distribution, underrepresenting the youngest and oldest age classes. The gender imbalance differed between countries. A counterbalance between gender-specific information-seeking behavior (more prominent in women) and Internet usage (with higher rates in male populations) may be at the origin of this difference. Once adjusted by demographic indicators, a similar propensity to commute was observed for each country, and the same top three transportation modes were used for six countries out of seven.
Smokers were underrepresented in the majority of countries, as were individuals with diabetes; the representativeness of asthma prevalence and vaccination coverage for 65+ individuals in two successive seasons (2010/2011 and 2011/2012) varied between countries. Conclusions: Existing demographic and national datasets allowed the quantification of the participation biases of a large cohort for influenza-like illness surveillance in the general population. Significant differences were found between Influenzanet participants and the general population. The quantified biases need to be taken into account in the analysis of Influenzanet epidemiological studies and provide indications on population groups that should be targeted in recruitment efforts.
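A participation bias of this kind can be summarised, per category, as the ratio between a category's share in the cohort and its share in the general population. The sketch below is a minimal illustration of that ratio; the function name, the age brackets, and all numbers are assumptions and are not Influenzanet data.

```python
def representation_ratios(sample_counts, population_shares):
    """Return sample share divided by population share for each category.

    A ratio below 1 means the category is underrepresented in the sample,
    above 1 overrepresented. `sample_counts` maps category -> participant
    count; `population_shares` maps category -> share of the general
    population (shares should sum to 1).
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {
        category: (sample_counts.get(category, 0) / total) / share
        for category, share in population_shares.items()
        if share > 0
    }


# Toy numbers, invented for illustration (not Influenzanet data).
sample = {"0-19": 50, "20-59": 800, "60+": 150}
population = {"0-19": 0.20, "20-59": 0.55, "60+": 0.25}
ratios = representation_ratios(sample, population)
```

With these toy numbers the youngest and oldest brackets come out below 1 and the middle bracket above 1, which is the pattern an age bias of the kind described would produce.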