National audience ; During the Covid-19 pandemic, many users regarded Twitter as a reliable source of medical information. However, this information is often altered when it is shared. Through a qualitative study of tweets about the controversy over Covid-19 treatments, we show that, despite the quality of the information in the initial tweet, the feeds of public figures create cascades of disinformation via their followers. We clarify the mechanism by which medical information is distorted in a cascade, identify the main actors in the discussion of controversial treatments, and show the effect of group polarization during the discussion. Medical information is often altered accidentally and unconsciously because of misunderstanding, which causes opinions on controversial prescriptions to be replaced by political positioning.
This paper refers to herding behaviour as developed in Bikhchandani et al. (1992), Banerjee (1992) and Choi and Scarpa (1994). We examine the behaviour of a potential customer who does not know how many of her predecessors decided not to purchase the product. We show that, ceteris paribus, a smaller (larger) customer base increases the likelihood of a positive (negative) cascade. Hence, a firm can signal its commitment to high quality (Schelling 1960) by choosing to develop a customer base that relies upon the customer's 'private' information rather than one that relies on an informational cascade.
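The cascade mechanism this abstract builds on can be sketched with the textbook sequential-choice setting in the spirit of Bikhchandani et al. (1992). This is only an illustration under stated assumptions, not the model of the paper itself: every agent sees all earlier buy/decline actions (the paper instead studies a customer who cannot see how many predecessors declined), signals are binary and correct with probability `p_correct`, and an indifferent agent follows her own signal. The function name and parameters are hypothetical.

```python
import random

def simulate_cascade(n_agents, p_correct, true_state, seed=0):
    """Sequential buy (1) / decline (0) decisions with binary private signals.

    Assumption: agents observe every earlier action and break ties by
    following their own signal, so an action reveals the signal only
    while the net count of revealed signals is below 2 in absolute value.
    """
    rng = random.Random(seed)
    net_signals = 0  # revealed "buy" signals minus revealed "decline" signals
    actions = []
    for _ in range(n_agents):
        signal = true_state if rng.random() < p_correct else 1 - true_state
        if net_signals >= 2:
            action = 1  # positive cascade: the private signal is ignored
        elif net_signals <= -2:
            action = 0  # negative cascade: the private signal is ignored
        else:
            action = signal  # the action still reveals the private signal
            net_signals += 1 if signal == 1 else -1
        actions.append(action)
    return actions
```

Under these assumptions, two identical early actions are enough to lock every later agent into the same choice, whatever her private signal says.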
Abstract: Many institutions, large or small, make their decisions through some process of deliberation. Nonetheless, deliberating institutions often fail, in the sense that they make judgments that are false or that fail to take advantage of the information that their members have. Micro mistakes can lead to macro blunders or even catastrophes. There are four such failures; all of them have implications for large-scale institutions as well as small ones. (1) Sometimes the predeliberation errors of an institution's members are amplified, not merely propagated, as a result of deliberation. (2) Institutions fall victim to cascade effects, as the initial speakers or actors are followed by their successors, who do not disclose what they know. Non-disclosure, on the part of those successors, may be a product of either informational or reputational cascades. (3) As a result of group polarization, deliberating institutions sometimes end up in a more extreme position in line with their predeliberation tendencies. Sometimes group polarization leads in desirable directions, but there is no assurance to this effect. (4) In deliberating institutions, shared information often dominates or crowds out unshared information, ensuring that institutions do not learn what their members know. Informational signals and reputational pressure help to explain all four errors. The results can be harmful to numerous institutions, including large ones, and to societies as a whole. Markets are able to correct some of these problems, but cascade effects occur there as well.
As recent events have demonstrated, disinformation spread through social networks can have dire political, economic and social consequences. Detecting disinformation must inevitably rely on the structure of the network, on users' particularities and on event occurrence patterns. We present a graph data structure, which we denote as a meta-graph, that combines underlying users' relational event information with semantic and topical modeling. We detail the construction of an example meta-graph using Twitter data covering the 2016 US election campaign and then compare the detection of disinformation at cascade level, using well-known graph neural network algorithms, to the same algorithms applied to the meta-graph nodes. The comparison shows a consistent 3%-4% improvement in accuracy when using the meta-graph, over all considered algorithms, compared to basic cascade classification, and a further 1% increase when topic modeling and sentiment analysis are considered. We carry out the same experiment on two other datasets, HealthRelease and HealthStory, part of the FakeHealth dataset repository, with consistent results. Finally, we discuss further advantages of our approach, such as the ability to augment the graph structure using external data sources and the ease with which multiple meta-graphs can be combined, and we compare our method to other graph-based disinformation detection frameworks.
Introduction: Electronic tuberculosis (TB) register systems influence policy decisions, resource allocation and patient care in many ways, but their limitations have been demonstrated in many high-burden settings like South Africa. While digital health systems in the Western Cape, South Africa have improved over time and benefited from implementation of a unique patient identifier, questions about the quality and completeness of register data remain. A Health Information Exchange (HIE), established in 2015, integrates daily the routinely collected person-level health data from electronic sources in the Province, including laboratory, dispensing, clinical and encounter data, as well as disease register data for HIV and TB.
Objectives and Approach: Using TB-related datapoints from various electronic platforms and resources, an algorithm was developed to infer cases, visit and treatment information, comorbidities and mortality, together defined as a "cascade". The cascade is recompiled daily, incorporating new information added to the HIE, and presented to health care workers and managers as filterable, downloadable reports on an electronic platform. TB Register and inferred cascade data were compared for 2018.
Results: There were 40,227 cases in the register after 3,010 duplicate entries were eliminated by consolidating personal identifiers and duplicate entries across facilities into single TB episodes. An additional 13,729 cases were identified in the HIE cascade. Of these, 6,984 had evidence of treatment; 4,143 were diagnosed and treated only in hospitals and were thus less likely to be recorded in the registers. Updated patient contact details and allocation of a primary care facility based on patient visit history aided patient care.
Conclusion / Implications: Leveraging a consolidated environment for person-level health data can substantially enhance and verify disease registers. Appropriate tools can render these data accessible and actionable to improve patient care, minimise errors and missed opportunities to close treatment gaps, and increase the accuracy of surveillance and reporting on a programmatic level.
In: Meždunarodnye processy: žurnal teorii meždunarodnych otnošenij i mirovoj politiki = International trends : journal of theory of international relations and world politics, Volume 19, Issue 4, pp. 26-46
The article analyzes algocognitive culture, a new reality that humanity has already entered but is still far from understanding. Today we can speak of the dissolution of the concept of privacy: almost all of a person's actions, including daily trips, social circle and shared values, correspondence and purchases, are automatically observed and completely transparent to information corporations. The problem of fake news has become insurmountable: once fake news enters the information cascade it immediately becomes an event, making later investigations and refutations almost irrelevant. A «cancel culture» has emerged, within which there are a priori no criteria for good and evil, and in which it has become possible to «delete» from information circulation any body of knowledge that does not meet the requirements of the self-proclaimed «new ethics», and to ostracize the people associated with it. The author compares the current state of affairs with the era of the dominance of the sophists in ancient Greece, when truth was determined by circumstance, and finds relevant parallels. In this context, the author formulates the concept of «cognitive vulnerability»: the new reality makes it possible to control masses of people, shaping not only their consumer behavior but also their political behavior. The author defines network reality as an alternative system of socialization, in which the «network» ontology and values turn out to be more competitive than real ones and therefore de facto displace them. This becomes possible through a kind of «splitting» of the personality, in which the emotional reaction is de facto separated from real goal-oriented activity and attached to virtual reality.
The algorithms that govern social networks are aimed at achieving this goal. As an example, the author turns to a recent investigation by The Wall Street Journal into Facebook: the MSI algorithm used by the latter provokes disputes and splits on every occasion. De facto, this leads to a situation in which American information corporations are becoming the actual holders of sovereignty over the consciousness of foreign societies. China has already responded to this challenge: as of September 1, 2021, Beijing nationalized algorithms and handed control over them to the Communist Party. The author analyzes the steps taken by China and concludes that, if successful, China will become not only an economic but also an ideological alternative to America, thereby making a bid to restore a bipolar world political system.
The dry-zone water-harvesting and management system in Sri Lanka is one of the oldest historically recorded systems in the world. A substantial number of ancient sources mention the management and governance structure of this system suggesting it was initiated in the 4th century BCE (Before Common Era) and abandoned in the middle of the 13th century CE (Common Era). In the 19th century CE, it was reused under the British colonial government. This research aims to identify the ancient water management and governance structure in the dry zone of Sri Lanka through a systematic analysis of ancient sources. Furthermore, colonial politics and interventions during reclamation have been critically analyzed. Information was captured from 222 text passages containing 560 different records. 201 of these text passages were captured from lithic inscriptions and 21 text passages originate from the chronicles. The spatial and temporal distribution of the records and the qualitative information they contain reflect the evolution of the water management and governance systems in Sri Lanka. Vast multitudes of small tanks were developed and managed by the local communities. Due to the sustainable management structure set up within society, the small tank systems have remained intact for more than two millennia.
In the last decade we have witnessed a rapid rise of online social media services. Although they were created in the early 2000s, their rise began in earnest after 2010, when their presence started to fundamentally alter the traditional media landscape. Today, their influence on the way our society consumes, curates and disseminates information is indisputable. With their wider adoption came the first criticism, as well as a need to solve emerging legislative, ethical and societal issues. One line of research seeks to explain and quantify the sources of influence in online social services and to investigate to what extent these new social landscapes are vulnerable to manipulation by third parties. This manipulation is often performed using users' digital traces, a record of their activities on the online social service. These digital footprints have the potential to characterize users in more detail than they themselves would otherwise be willing to share. For example, users' personality traits can be inferred indirectly from the content with which they interact through online services, and even the writing style of the content they publish can be used to infer their demographic characteristics. This opens opportunities for micro-targeting of users for various dubious purposes, for example by increasing their propensity to spread misinformation. The research described in this thesis shows that much can be learned about user engagement from very little data: in our case, only friendship connections between users and a single activation cascade. A single activation cascade means we have only one registration event per user. This data alone is sufficient to estimate, under certain assumptions, whether the activation of each user was predominantly influenced by the peers to whom they are connected (endogenous influence) or by exogenous factors external to the friendship network itself.
Both endogenous and exogenous factors, for example mass media, are known to have a significant impact on the activity of users of online social media. The methodology developed in this thesis requires postulating an explicit endogenous influence model that governs interactions between pairs of users, while exogenous influence is assumed to act equally on all users in the network. Several suitable endogenous influence models are proposed for use with this methodology. The first is the Susceptible-Infected model, commonly used in epidemiological modeling. The second features a decay factor for the endogenous influence, which is a realistic assumption for social systems. The third features a logistic threshold for activation. Exogenous influence is modelled as an independent probability of activation which is, at any given time, equal for all non-activated users, although it may change in time. An inference method is developed in which maximum likelihood estimation is used to estimate the relative magnitudes of endogenous and exogenous influence on users. These estimates can then be used to characterize the influence of individual users. A computational scalability analysis is performed on simulated data to demonstrate that the inference method is able to scale to large social networks. Empirical data on over 20 thousand Facebook users are used to evaluate the proposed inference method. The data were collected using three unique Facebook political survey applications, which provided Facebook friendship relations between users and a single activation cascade, that is, a single registration event per user. Referral links, which identify a user's origin, are used as a proxy for the user's activation type. Users whose referral links originated from Facebook are considered endogenously activated, while those whose referral links originated from an external website are considered exogenously activated.
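The combination of a Susceptible-Infected endogenous model with a uniform, time-varying exogenous probability can be sketched as a small discrete-time simulation. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name, the per-step peer probability `beta`, and the rule that the two sources act independently within a time step are all hypothetical choices made for the example.

```python
import random

def simulate_activation(friends, beta, exo, seed=0):
    """Simulate one activation cascade in discrete time.

    friends: dict mapping each user to the set of their friends
    beta:    assumed per-step probability that one active friend activates a user
    exo:     list of exogenous activation probabilities, one per time step,
             identical for every not-yet-activated user at that step
    Returns a dict user -> activation time (users that never activate are absent).
    """
    rng = random.Random(seed)
    activation_time = {}
    for t, p_exo in enumerate(exo):
        active = set(activation_time)  # snapshot of users active before step t
        for user in friends:
            if user in active:
                continue
            k = sum(1 for f in friends[user] if f in active)
            p_endo = 1.0 - (1.0 - beta) ** k  # at least one active friend succeeds
            p = 1.0 - (1.0 - p_endo) * (1.0 - p_exo)  # either source succeeds
            if rng.random() < p:
                activation_time[user] = t
    return activation_time
```

A cascade produced this way contains exactly the kind of data the abstract describes: a friendship network and one activation time per user.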
The inference method is used to estimate the most probable source of influence for each user individually, as well as to assess the overall influence of different media channels (peer communication, Facebook advertisements, or external news media) on the users' activation cascade. Ethical, methodological and technical issues regarding data collection in the context of online social media services are discussed. Guidelines on how to collect online social media data in an ethically principled way are provided, especially in the context of satisfying requirements for reproducible research. Estimating endogenous and exogenous influence in networks with a statistical methodology that is conceptually simple, yet powerful and efficient, is widely applicable to scientific domains where deciphering the properties of spreading processes and external influences on complex networks is crucial for explaining new phenomena. ; Over the last ten years we have witnessed a sharp rise in the popularity of online social networks. Although they have existed since the early 2000s, their rise began in earnest only after 2010, when their presence started to fundamentally change traditional media. The influence of online social networks on the way our society consumes, selects and disseminates information is today undeniable. With their wider use came the first criticism, as well as the need to resolve newly emerging legislative, ethical and societal issues. One line of research seeks to explain and quantify the sources of influence in online social services and to investigate the extent to which they are susceptible to manipulation by third parties. This manipulation is often carried out using users' digital traces, records of their activities on online social services. These digital footprints have the potential to characterize users in more detail than they themselves are willing to reveal.
For example, users' personality traits and demographic characteristics can be estimated indirectly from the content or the writing style they use on the online service. This opens up the possibility of micro-targeting users for various dubious activities or propaganda, for example by increasing their propensity to spread disinformation. The research described in this dissertation shows that much can be learned about user activity from relatively little data: in our case, only data on friendship ties between users and a single information-spreading cascade, where the information being spread corresponds to the act of a user's registration (activation) on the online social service. Using only these data it is possible, under certain assumptions, to infer whether the activation of each individual user was predominantly caused by the friends to whom they are connected (endogenous influence) or by factors outside the social network (exogenous influence). Both endogenous and exogenous factors, for example from the media, are known to have a significant impact on user activity. Chapter 1 describes the motivation and gives an overview of research on information spreading in online social networks, as well as of the statistical methods used to model information spreading from empirical data. It sets out the goals of the doctoral research: defining models of endogenous and exogenous information spreading in social networks, developing a method for statistical inference of the parameters of these models from data, and evaluating this method on empirical data collected from real online social networks. Chapter 2 describes the information-spreading models used in the statistical inference method developed in this doctoral research. The method requires postulating an explicit endogenous influence model that defines interactions between pairs of users.
Exogenous influence, on the other hand, is assumed to act equally on all users in the social network. Several suitable endogenous influence models are proposed for this purpose. The first is the Susceptible-Infected model, commonly used in epidemiological modeling, in which every currently active user has an independent chance of activating any of their friends in the online social network, with an activation probability that does not change over time. The second model assumes exponentially decaying influence, meaning that over time users have an ever smaller probability of activating one of their friends, which is a realistic assumption for social interactions. In the third model the activation probability changes with the number of previously activated friends according to a logistic function, meaning that there is a threshold number of previously activated friends that must be reached before the activation probability attains a significant value. Exogenous influence is modeled as an independent probability of activation which is, at any given moment, equal for all still-inactive users, although it may change over time. The endogenous and exogenous influence models are combined in a likelihood function that assigns a likelihood to each combination of model parameters given the observed data, which here consist of the friendship network between users and their activation times. Chapter 3 describes the developed statistical inference method, which uses maximum likelihood to find the parameters of endogenous and exogenous influence. These parameters are then used to estimate the relative magnitude of endogenous and exogenous influence on a user by means of the exogenous responsibility measure, which quantifies on a scale from 0 to 1 how much exogenous influence contributed to the user's activation, with a higher value indicating stronger exogenous influence.
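A measure on a 0 to 1 scale of this kind can be sketched as the exogenous share of the two activation probabilities at the moment a user activates. The abstract does not give the formula, so the ratio below is an assumption made for illustration; both function names are hypothetical, and the endogenous probability uses the Susceptible-Infected form with an assumed per-friend probability `beta`.

```python
def endogenous_probability(beta, n_active_friends):
    """Assumed SI-style probability that at least one active friend succeeds."""
    return 1.0 - (1.0 - beta) ** n_active_friends

def exogenous_responsibility(p_exo, p_endo):
    """Assumed form: share of the exogenous probability in the total.

    Returns a value in [0, 1]; higher means stronger exogenous influence.
    """
    total = p_exo + p_endo
    if total == 0:
        raise ValueError("both probabilities are zero; the share is undefined")
    return p_exo / total
```

With this form, a user who activates with no active friends gets a value of 1, and a user who activates at a time step with no exogenous probability gets 0.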
Measures of individual and collective influence are also defined, which quantify the influence of an individual user and of a group of users on the activations of their friends in the social network, taking into account only the endogenous component of influence. The statistical inference method uses maximum likelihood to estimate a fixed set of endogenous influence parameters that are the same for all users and do not change over time. Exogenous influence, by contrast, is estimated separately at each time step, so the number of parameters depends on the number of discrete time steps. In real applications that require a certain temporal granularity of exogenous influence, this always results in too many parameters for direct maximum likelihood estimation. An alternating optimization method was therefore developed in which the endogenous and exogenous influence parameters are fixed in turn, reducing the number of parameters optimized in each iteration of the algorithm. The smaller number of parameters allows the optimization to be carried out with a standard numerical optimization method. Although there is no theoretical guarantee that the method converges, in practice only a few iterations of the algorithm are needed for all parameters to converge. A computational scalability analysis was carried out to show that the proposed alternating inference method scales even to large social networks of more than 20 thousand users. The evaluation was first carried out on simulated data, with user activation cascades simulated according to one of the three proposed endogenous influence models. Exogenous influence was designed to contain several distinct exponentially decaying spikes in time. This is a pattern often observed in empirical data, for example when media coverage causes a rise in interest and increased user activation.
The proposed statistical inference method is able to accurately recover the true parameters of endogenous and exogenous influence in the simulated case, as well as the true cause of activation of each individual user, using only data on the friendship network between users and the activation time of each individual user. Extensive experiments on simulated data show that the method also works well for arbitrary exogenous influence curves. The results were also compared with those obtained by a simple baseline method in which all users who had no other activated friends at the moment of activation were declared exogenously activated. This simple method underestimates the true number of exogenously activated users, especially towards the end of the activation cascade when most users in the network are already activated. Because of the specific way in which user data were collected (the users who make up the friendship network are all those who eventually activate), the friendship network becomes saturated with activated users towards the end of the activation cascade, which does not reflect the real state of the social network. We call this effect observer bias, and it causes exogenous influence to be overestimated as the end of the activation cascade approaches. To avoid it, a correction factor was added to the likelihood function. Chapter 2 describes the methodology for collecting the data used in the empirical evaluation. Data on more than 20 thousand users of the Facebook social network were used for the empirical evaluation. The data were collected through three online political surveys that use the Facebook Graph programming interface to register users. The surveys were conducted in Croatian and relate to three different political events in Croatia: the 2013 referendum on the constitutional definition of marriage and the parliamentary elections of 2015 and 2016.
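The simple baseline mentioned above, which labels a user as exogenously activated when none of their friends was active before them, can be sketched in a few lines. The function name and the data layout (a friendship dictionary and a dictionary of activation times) are assumptions made for the example; ties in activation time are treated here as "not yet active", which is one possible convention.

```python
def baseline_labels(friends, activation_time):
    """Label each activated user by the baseline rule.

    friends:         dict mapping each user to an iterable of their friends
    activation_time: dict mapping each activated user to their activation time
    A user is "endogenous" if some friend activated strictly earlier,
    otherwise "exogenous".
    """
    labels = {}
    for user, t in activation_time.items():
        has_earlier_friend = any(
            activation_time.get(f, float("inf")) < t
            for f in friends.get(user, ())
        )
        labels[user] = "endogenous" if has_earlier_friend else "exogenous"
    return labels
```

Because any earlier-activated friend is enough to label a user endogenous, the rule leaves fewer and fewer users labelled exogenous as the network fills up with activated users.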
The collected data contain information on friendship links between users and only a single activation cascade, namely the registration time of each individual user. Referral links, which identify a user's origin, were used as a proxy for the user's activation type. Users whose referral link originated from Facebook were considered endogenously activated, while those whose referral link originated from an external website were considered exogenously activated. The survey applications were active for roughly a week before voting day, and during that time they attracted the attention of online news portals, which shared a link to the applications in their articles. At the times of such articles a jump in user registrations on the survey applications is visible, which indicates exogenous influence because users register for the application prompted by an external source. On the other hand, the structure of the friendship network indicates a homophily effect: users predominantly connect with other users who share their political views or are similar to them in some other characteristic (for example age), which indicates endogenous influence. Exploratory analysis of the collected data shows that the structural characteristics of the friendship network and the statistical characteristics of user demographics are representative of the Croatian Facebook space. Ethical, methodological and technical aspects of data collection in the context of online social networks are also discussed. Guidelines are presented for collecting data from online social networks in an ethically acceptable way that simultaneously respects user privacy, the terms of use of online social services and the requirements for reproducibility of the research. The empirical evaluation of the proposed statistical inference method is described in Chapter 4.
Using the collected empirical data, the most probable source of influence is estimated for each user individually, as is the overall influence of each communication channel (communication between users, Facebook advertisements, external media sources) on the user activation cascade. The evaluation metric is the area under the curve (AUC), which reaches values of 0.7 to 0.8 on the empirical data, indicating good discriminative power of the proposed inference method in the binary classification problem in which users are classified as endogenously or exogenously activated according to their referral links. Among the communication channels, direct communication between users proved the most influential, while external media sources were dominant only in one dataset, in which exogenously activated users make up the majority (over 90% of all users). The proposed measure of each user's individual influence was also compared with structural measures computed from the friendship network, with the strongest correlation found for PageRank centrality. Although the statistical inference method developed in this doctoral research targets endogenous and exogenous information spreading in social networks, its potential application goes beyond a single specific domain. Identifying exogenous influences also has potential application in the analysis of financial systems, where external influences can play a key role in system dynamics. Likewise, the paradigm of identifying endogenous and exogenous influence potentially has wider application in modeling general dynamical systems, where such methods could identify a system's vulnerability to external shocks as well as its susceptibility to manipulation by third parties.
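The AUC used as the evaluation metric can be computed directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below assumes, for illustration only, that the score is a per-user exogenous responsibility estimate and that label 1 marks users whose referral link points to an external website; the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    scores: per-user scores, higher meaning "more likely exogenous"
    labels: 1 for exogenously activated users, 0 for endogenously activated
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is the scale on which the reported 0.7 to 0.8 should be read.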
Estimating endogenous and exogenous influence in networks with a statistical methodology that is conceptually simple, yet powerful and efficient, is widely applicable to scientific domains where deciphering the properties of spreading processes and external influences on complex networks is crucial for explaining new phenomena.
International audience ; Finding a set of users that can maximize the spread of information in a social network is an important problem in social media analysis, being a critical part of several real-world applications such as viral marketing, political advertising and epidemiology. Although influence maximization has been studied extensively in the past, the majority of works focus on the algorithmic aspect of the problem, overlooking several practical improvements that can be derived from data-driven observations or the inclusion of machine learning. The main challenges of realistic influence maximization are, on the one hand, the computational demand of the diffusion models' repetitive simulations and, on the other, the accuracy of the estimated influence spread. In this work, we propose CELFIE, an influence maximization method that utilizes learnt influence representations from diffusion cascades to overcome the use of diffusion models. It comprises two parts. The first is based on INF2VEC, an unsupervised learning model that embeds influence relationships between nodes from a set of diffusion cascades. We create a new version of the model, based on observations from influence analysis on a large-scale dataset, to match the scalability needs and the purpose of influence maximization. The second part capitalizes on the learned representations to redefine the traditional live-edge model sampling for the computation of the marginal gain. For evaluation, we apply our method to the Sina Weibo and Microsoft Academic Graph datasets, two large-scale networks accompanied by diffusion cascades. We observe that our algorithm outperforms various baseline methods in terms of seed set quality and speed. In addition, the proposed INF2VEC modification for influence maximization provides substantial computational advantages at the price of a minuscule loss in influence spread.
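For context, the traditional live-edge sampling that the abstract says CELFIE redefines can be sketched as plain greedy seed selection under the independent cascade model. This is the classical baseline, not CELFIE and not INF2VEC: edge probabilities are taken as given rather than learned from cascades, there is no lazy (CELF-style) evaluation, and all function names are hypothetical.

```python
import random

def sample_live_edge_graph(edges, rng):
    """Keep each directed edge (u, v) independently with its probability."""
    live = {}
    for (u, v), p in edges.items():
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live

def reachable(live, seeds):
    """Nodes reachable from the seeds in one sampled live-edge graph."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in live.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def greedy_seeds(nodes, edges, k, n_samples=200, seed=0):
    """Pick k seeds, each maximizing the estimated spread of the seed set."""
    rng = random.Random(seed)
    samples = [sample_live_edge_graph(edges, rng) for _ in range(n_samples)]
    chosen = []
    for _ in range(k):
        def spread(candidate):
            # Average reach of chosen + candidate over the sampled graphs.
            return sum(len(reachable(g, chosen + [candidate])) for g in samples) / n_samples
        best = max((n for n in nodes if n not in chosen), key=spread)
        chosen.append(best)
    return chosen
```

Every candidate is re-evaluated on every sampled graph in each round, which is the computational burden of repeated simulation that the abstract identifies as a main challenge.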