Author: Heinrich-Heine-University | Pollux - Fachinformationsdienst Politikwissenschaft (Specialized Information Service for Political Science)

The Media-Analyse data were collected for commercial purposes. They are used for media planning and advertising planning across the various media types (radio, print, TV, posters and, since 2010, online). The data are cross-sections that are pooled for each year. ag.ma kindly makes the corresponding data available to GESIS for scientific use every year, with a two-year delay. In addition, agof has provided documentation on the data collection (questionnaires, codebooks, etc.) for the preparation of the online tranche of MA IntermediaPlus.

To make the data accessible for scientific use, the datasets of the individual years were harmonized, starting in 2018, into a longitudinal dataset covering the years from 2014 onward. This was done as part of the dissertation project "Angebots- und Publikumsfragmentierung online" (Supply and Audience Fragmentation Online) of the Graduiertenkolleg Digitale Gesellschaft NRW (Graduate School Digital Society NRW) at Heinrich-Heine-Universität (HHU) and Hochschule Düsseldorf (HSD), funded by the Ministry of Culture and Science of the State of North Rhine-Westphalia.

Access: Open Access

GESIS


Research data #5 | 2022

TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets (Part 11, Jan 2022 - Aug 2022)

Bensmann, Felix; Dietze, Stefan; Baran, Erdal; GESIS - Leibniz-Institut für Sozialwissenschaften & Heinrich-Heine-University Düsseldorf, Germany & L3S Research Center, Hannover, Germany; GESIS - Leibniz-Institut für Sozialwissenschaften

TweetsKB is a public RDF corpus of anonymized data for a large collection of annotated tweets. The dataset currently contains data for nearly 3.0 billion tweets, spanning more than 9 years (February 2013 - August 2022). Metadata about the tweets, together with the extracted entities, sentiments, hashtags, user mentions, and URLs, is exposed in RDF using established RDF/S vocabularies. For the sake of privacy, we anonymize user IDs and do not provide the text of the tweets. For a list of the previous dataset parts, example queries, and more information, see the TweetsKB home page: https://data.gesis.org/tweetskb/.

Access: Open Access

GESIS


Research data #6 | 2022

TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets (Part 10, Jan 2021 - Dec 2021)

Baran, Erdal; Dietze, Stefan; Bensmann, Felix; GESIS - Leibniz-Institut für Sozialwissenschaften & Heinrich-Heine-University Düsseldorf, Germany & L3S Research Center, Hannover, Germany; GESIS - Leibniz-Institut für Sozialwissenschaften

TweetsKB is a public RDF corpus of anonymized data for a large collection of annotated tweets. The dataset currently contains data for nearly 3.0 billion tweets, spanning more than 9 years (February 2013 - August 2022). Metadata about the tweets, together with the extracted entities, sentiments, hashtags, user mentions, and URLs, is exposed in RDF using established RDF/S vocabularies. For the sake of privacy, we anonymize user IDs and do not provide the text of the tweets. For a list of the previous dataset parts, example queries, and more information, see the TweetsKB home page: https://data.gesis.org/tweetskb/.

Access: Open Access

GESIS


Research data #7 | 2022

SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse

Hafid, Salim; Dietze, Stefan; Schellhammer, Sebastian; Todorov, Konstantin; Bringay, Sandra; GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany; GESIS - Leibniz Institute for the Social Sciences Cologne & Heinrich-Heine-University Düsseldorf, Germany; LIRMM, CNRS, University of Montpellier, Montpellier, France

This repository contains an expert-annotated dataset of 1261 tweets and the corresponding annotation framework from the publication "SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse" (https://arxiv.org/abs/2206.07360). The tweets are annotated with three different categories of science-relatedness:

1. Scientific knowledge (scientifically verifiable claims): tweets that include a claim or a question that could be scientifically verified.

2. Reference to scientific knowledge: tweets that include at least one reference to scientific knowledge. References can be either direct (e.g., a DOI or the title of a paper) or indirect (e.g., a link to an article that includes a direct reference).

3. Related to scientific research in general: tweets that mention a scientific research context (e.g., a scientist, scientific research efforts, or research findings).

Further, the annotations include the annotators' confidence scores as well as labels for compound claims and ironic tweets.

Access: Open Access

GESIS


Research data #8 | 2022

ClaimsKG - A Knowledge Graph of Fact-Checked Claims (August, 2022)

ClaimsKG is a knowledge graph of metadata for 59580 fact-checked claims scraped from 13 fact-checking sites. In addition to providing a single dataset of claims and associated metadata, it harmonises truth ratings and provides additional information for each claim, e.g., about mentioned entities. See https://data.gesis.org/claimskg/ for further details about the data model and statistics.

The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org). 

The latest release of ClaimsKG covers 59580 claims. The data were scraped up to August 2022 and contain claims published between 1996 and 2022 on 13 fact-checking websites. The claim-review (fact-checking) period likewise ranges from 1996 to 2022. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains a total of 1371271 entities detected and linked to DBpedia. More information, such as detailed statistics, query examples, and a user-friendly interface for exploring the knowledge graph, is available at https://data.gesis.org/claimskg/.

The first two releases of ClaimsKG, ClaimsKGV1.0 (published on 04.04.2019) and ClaimsKGV2.0 (published on 01.09.2019), are hosted on Zenodo (https://doi.org/10.5281/zenodo.3518960). This latest release supersedes the previous versions, as it contains all claims from them together with additional claims and improved entity annotations.

Access: Open Access

GESIS


Research data #9 | 2022

TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 4, January 2021 - August 2022)

Dimitrov, Dimitar; Zhu, Xiaofei; Baran, Erdal; Yu, Ran; Fafalios, Pavlos; Zloch, Matthäus; Dietze, Stefan; Chongqing University of Technology, Chongqing, China; GESIS - Leibniz-Institut für Sozialwissenschaften & Heinrich-Heine-University Düsseldorf, Germany & L3S Research Center, Hannover, Germany; Institute of Computer Science, FORTH-ICS, Heraklion, Greece; GESIS - Leibniz-Institut für Sozialwissenschaften

TweetsCOV19 is a semantically annotated corpus of tweets about the COVID-19 pandemic. It is a subset of TweetsKB and aims to capture online discourse about various aspects of the pandemic and its societal impact. Metadata about the tweets, together with the extracted entities, sentiments, hashtags, user mentions, and resolved URLs, is exposed in RDF using established RDF/S vocabularies (for the sake of privacy, we anonymize user IDs and do not provide the text of the tweets). More information is available on the TweetsCOV19 home page: https://data.gesis.org/tweetscov19/.

We also provide a tab-separated values (TSV) version of the dataset. Each line contains the features of one tweet, separated by the tab character ("\t"). The features appear in the following order:

1. Tweet Id: Long.

2. Username: String. Encrypted for privacy reasons.

3. Timestamp: Format "EEE MMM dd HH:mm:ss Z yyyy".

4. #Followers: Integer.

5. #Friends: Integer.

6. #Retweets: Integer.

7. #Favorites: Integer.

8. Entities: String. For each entity, we store the original text, the annotated entity, and the score produced by the FEL library, separated by the character ":" as "original_text:annotated_entity:score;". Entities are separated from each other by the character ";". If FEL found no entities, "null;" is stored.

9. Sentiment: String. SentiStrength produces a score for positive (1 to 5) and negative (-1 to -5) sentiment. The two scores are separated by a single space, with the positive score first and the negative score second (e.g., "2 -1").

10. Mentions: String. If the tweet contains mentions, the "@" character is removed and the mentions are joined with a single space. If there are no mentions, "null;" is stored.

11. Hashtags: String. If the tweet contains hashtags, the "#" character is removed and the hashtags are joined with a single space. If there are no hashtags, "null;" is stored.

12. URLs: String. If the tweet contains URLs, they are joined with ":-: ". If there are no URLs, "null;" is stored.
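The TSV layout above can be read line by line with a small parser. The following is a minimal Python sketch, not an official TweetsCOV19 loader: the names `parse_line`, `TweetRecord`, and `Entity` are hypothetical, and it assumes that the original entity text contains no ":" or ";" characters, that the URL separator is ":-:" optionally followed by a space, and that timestamps use English day and month abbreviations.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# Placeholder stored when a tweet has no entities, mentions, hashtags, or URLs.
NULL = "null;"
# Python strptime equivalent of the Java-style pattern "EEE MMM dd HH:mm:ss Z yyyy";
# %a and %b assume English (C locale) day and month abbreviations.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


@dataclass
class Entity:
    original_text: str
    annotated_entity: str
    score: float


@dataclass
class TweetRecord:
    tweet_id: int
    username: str  # encrypted user name
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: List[Entity]
    sentiment: Tuple[int, int]  # (positive 1..5, negative -1..-5)
    mentions: List[str]
    hashtags: List[str]
    urls: List[str]


def parse_entities(field: str) -> List[Entity]:
    """Parse 'original_text:annotated_entity:score;' chunks (feature 8)."""
    if field == NULL:
        return []
    entities = []
    for chunk in field.split(";"):
        if not chunk:
            continue  # the trailing ';' leaves an empty final chunk
        # The score is the last ':'-separated part; this assumes the
        # original text itself contains no ':' (an unverified assumption).
        rest, score = chunk.rsplit(":", 1)
        original_text, annotated_entity = rest.split(":", 1)
        entities.append(Entity(original_text, annotated_entity, float(score)))
    return entities


def parse_words(field: str) -> List[str]:
    """Parse space-separated mentions or hashtags (features 10 and 11)."""
    return [] if field == NULL else field.split(" ")


def parse_urls(field: str) -> List[str]:
    """Parse URLs joined with ':-: ' (feature 12)."""
    if field == NULL:
        return []
    return [url.strip() for url in field.split(":-:") if url.strip()]


def parse_line(line: str) -> TweetRecord:
    """Parse one tab-separated TweetsCOV19 line into a TweetRecord."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    positive, negative = (int(x) for x in fields[8].split(" "))
    return TweetRecord(
        tweet_id=int(fields[0]),
        username=fields[1],
        timestamp=datetime.strptime(fields[2], TIMESTAMP_FORMAT),
        followers=int(fields[3]),
        friends=int(fields[4]),
        retweets=int(fields[5]),
        favorites=int(fields[6]),
        entities=parse_entities(fields[7]),
        sentiment=(positive, negative),
        mentions=parse_words(fields[9]),
        hashtags=parse_words(fields[10]),
        urls=parse_urls(fields[11]),
    )
```

Applied to a file, each line is parsed independently, e.g. `for line in f: record = parse_line(line)`. Checking the field count up front makes a malformed line fail loudly instead of silently shifting every later feature by one position.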

To extract the dataset from TweetsKB, we compiled a seed list of 268 COVID-19-related keywords.

The previous part (Part 3) is available at https://doi.org/10.5281/zenodo.4593523.

Access: Open Access

GESIS


Research data #10 | 2023

ClaimsKG - A Knowledge Graph of Fact-Checked Claims (January, 2023)

ClaimsKG is a knowledge graph of metadata for fact-checked claims scraped from popular fact-checking sites. In addition to providing a single dataset of claims and associated metadata, it harmonizes truth ratings and provides additional information for each claim, e.g., about mentioned entities. See https://data.gesis.org/claimskg/ for further details about the data model, query examples, and statistics.

The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org).

The latest release of ClaimsKG covers 74066 claims and 72127 claim reviews. This is the fourth release of the dataset; the data were scraped up to January 31, 2023, and contain claims published between 1996 and 2023 on 13 fact-checking websites: Fullfact, Politifact, TruthOrFiction, Checkyourfact, Vishvanews, AFP (French), AFP, Polygraph, EU factcheck, Factograph, Fatabyyano, Snopes, and Africacheck. The claim-review (fact-checking) period likewise ranges from 1996 to 2023. As in the previous release, the entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation. The web scraping and data preprocessing pipeline has been improved to extract more entities from both claims and claim reviews. ClaimsKG currently contains 3408386 entities detected and linked to DBpedia.

This latest release of ClaimsKG supersedes the previous versions: it contains all claims from the previous versions plus new claims, as well as improved entity annotations that yield a higher number of entities.

Access: Open Access

GESIS



The German online media market: Online-born information offerings and their audiences – A shift towards digital inequalities?

Non-Partisan Groups in German Local Politics: Between Populism and "Politics as Usual"?

Media-Analyse Daten: IntermediaPlus Daten von 2014 bis 2016 (MA IntermediaPlus)

