An Open Testbed for Author Name Disambiguation Evaluation
We implemented a method for author name disambiguation and categorized publications of authors with the same name. This testbed is applied to evaluate our implementation.
GESIS
Social Network Science (SNS) is the field concerned with studying social systems in a relational way from the perspectives of the social and natural sciences. This data set consists of 25,760 bibliographic records retrieved from the Web of Science, ranging from 1916 to 2012. Each publication belongs to one of five subfields. To facilitate analyses of the social aspect of SNS, the names of 45,580 distinct authors are provided, linked to the papers in 68,227 publication-author relations. Author names have been disambiguated semi-automatically. To enable analyses of the cultural aspect of SNS, 23,026 distinct linguistic concepts are provided. These concepts resemble words or word combinations extracted from titles (for all publication years) and from abstracts and author keywords (only for publications published in, or after, 1990). They are linked to the papers in 201,608 publication-concept relations.
GESIS
A newer version of this dataset is available at https://doi.org/10.7802/1.1954
-------------------------------------------------------------------------------------
Social Network Science (SNS) is the field concerned with studying social systems in a relational way from the perspectives of the social and natural sciences. This data set consists of 25,760 bibliographic records retrieved from the Web of Science, ranging from 1916 to 2012. Each publication belongs to one of five subfields. To facilitate analyses of the social aspect of SNS, the names of 45,580 distinct authors are provided, linked to the papers in 68,227 publication-author relations. Author names have been disambiguated semi-automatically. To enable analyses of the cultural aspect of SNS, 23,026 distinct linguistic concepts are provided. These concepts resemble words or word combinations extracted from titles (for all publication years) and from abstracts and author keywords (only for publications published after 1990/1991). They are linked to the papers in 202,181 publication-concept relations.
GESIS
This dataset describes the temporal evolution of collaborations in Computer Science based on papers published between 1970 and 2016. It contains several CSV files with information such as the authors of the publications for each year, the papers each publication cites, and the papers that cite it. It also contains the inferred gender of all authors. The publication and citation details are taken from the DBLP and AMiner datasets, respectively. The dataset, including the gender information, was produced with the methodology described in the 'Data' section of the following paper: https://arxiv.org/pdf/1704.05801.pdf
GESIS
Data sharing is key for replication and re-use in empirical research. Scientific journals can play a central role by establishing data policies and providing technologies. In this study, factors influencing data sharing are analyzed by investigating journal data policies and author behavior in sociology. The websites of 140 sociology journals were consulted to check their data policies. The results are compared with similar studies from political science and economics. For five selected journals covering a broad variety, all articles from two years are examined to see whether authors actually cite and share their data, and which factors are related to this.
GESIS
Reporting on foreigners in magazines and newspapers of the Federal
Republic.
Topics: 1. Formal aspects: magazine or newspaper name; date of
publication; number of the article taken from the edition; number of
first page; placement on page; extent; series part; category;
information on title or headline; authorship; article genre; layout and
presentation; language level of article; attitude of author to topic.
2. Topic and content aspect: sources of information of the article;
manner of presentation; spatial reference; time reference; foreigner
group; group articulating itself in the article; main topic of the
article; reports about foreigner crime, criminal offenses, culprits,
crime victims and causes of crime; contents of sensationalism articles
as well as reports on discrimination against foreigners, political
infiltration and political dangers from foreigners; foreigners and
women; foreigners and health problems; topics of good-will reports;
integration as a topic; foreigners as vehicle of culture; topics of
non-fiction articles; causes and motives of migration; contents of
reports about country of origin; statistical information; reports about
political interest and political participation of foreigners; reports
about social problems of foreigners; information about family
questions; information about foreign children and young people; housing
problems; support measures reported on and support organizations;
reports about accidents, accident frequency and causes of accidents;
rights of foreigners; information about the attitude of the German
population to foreigners; job market reports and reports about the
economic situation; assessment and evaluation of national economy and
business management aspects of foreigner employment; occupational
trainers and qualification; statement about adaptation to industrial
work and frequency of change of job; foreigners as competition for
German workers; demands of the author raised in the article;
characteristics attributed to foreigners in the article.
Also encoded was: assessment of the ease of encoding by the coder.
GESIS
DBLP (https://dblp.org/) is a comprehensive collection of computer science publications from major and minor journals and conference proceedings. From this dump, we remove arXiv preprints. Our dataset consists of 1.9 million publications from 1970 to 2014 that are authored by 1.1 million authors. We have added citations among publications by combining DBLP with the AMiner dataset (https://www.aminer.org/citation) via publication titles and years. There are 6.6 million citations among publications. Author names in DBLP are disambiguated. To infer the gender of authors, we have used a method that combines the results of name-based and image-based gender detection services. Since the accuracy is very low for Chinese and Korean names, we label their gender as unknown to reduce noise in our analysis.
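The combination of DBLP and AMiner records via publication titles and years could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used to build the dataset: the record layout (dictionaries with "id", "title" and "year" keys), the title normalization, and the rule of skipping ambiguous keys are all hypothetical choices.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_by_title_and_year(dblp_records, aminer_records):
    """Map DBLP ids to AMiner ids for records sharing a normalized title and a year.

    Both inputs are assumed (hypothetically) to be lists of dicts with the
    keys "id", "title" and "year".
    """
    # Index the AMiner records by their (normalized title, year) key.
    index = {}
    for rec in aminer_records:
        key = (normalize_title(rec["title"]), rec["year"])
        index.setdefault(key, []).append(rec["id"])

    matches = {}
    for rec in dblp_records:
        key = (normalize_title(rec["title"]), rec["year"])
        candidates = index.get(key, [])
        # Assumption: keep a match only when the key is unambiguous.
        if len(candidates) == 1:
            matches[rec["id"]] = candidates[0]
    return matches
```

Matching on an exact normalized key is cheap but misses titles that differ by more than punctuation or case; a fuzzy comparison would recover more pairs at the cost of false matches.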
GESIS
Systematic research reviews have become essential in all empirical sciences. However, the validity of research syntheses is threatened if the preparation, submission or publication of research findings depends on the statistical significance of these findings. The present study investigates publication bias in three top-tier journals in the German social sciences, utilizing the caliper test. For the period between 2001 and 2010, we have collected 156 articles that appeared in the Kölner Zeitschrift für Soziologie und Sozialpsychologie (KZfSS), the Zeitschrift für Soziologie (ZfS) and the Politische Vierteljahresschrift (PVS). In all three journals, we found empirical evidence for the existence of a publication bias at the 10% level. We also investigated possible causes linked to this bias, including single versus multiple authorship as well as academic degree. We find only weak support for the relationships between individual author characteristics and publication bias.
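The general idea of a caliper test can be sketched as follows: count the test statistics that fall in a narrow band just above and just below a critical value, and check whether the band above is over-represented. This is a minimal sketch, not the study's implementation; the critical value of 1.96, the relative band width, the function name, and the one-sided exact binomial test are assumptions made for illustration.

```python
from math import comb


def caliper_test(z_values, critical=1.96, caliper=0.10):
    """Return (over, under, p_value) for a simple caliper test.

    over  : number of |z| values in [critical, critical + width]
    under : number of |z| values in [critical - width, critical)
    The band half-width is caliper * critical (an assumed convention).
    p_value is the one-sided exact binomial probability of observing at
    least `over` values above the threshold when both sides are equally
    likely (p = 0.5).
    """
    width = caliper * critical
    over = sum(1 for z in z_values if critical <= abs(z) <= critical + width)
    under = sum(1 for z in z_values if critical - width <= abs(z) < critical)
    n = over + under
    if n == 0:
        return over, under, 1.0
    # P(X >= over) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(over, n + 1))
    return over, under, tail / 2 ** n
```

Values far from the threshold are ignored on purpose: the test only compares outcomes that are nearly interchangeable apart from crossing the significance boundary.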
GESIS
At the country/macro level, the dataset includes nine variables covering school and education system characteristics as well as country characteristics:
number of school types/tracks for 9th grade/15-year-olds; age at first selection; preschool obligation; compulsory school years/education years (with pre-primary school); government expenditure on education, total (% of GDP); mean years of schooling; Human Development Index (HDI); Gender Inequality Index (GII); women's share of seats in parliament (in %)
The data set contains information on 82 countries and regions that participated in the PISA study. Most of the data are for the 2017/18 school year; however, older and newer data are used as well where other sources were not available. The documents used to create the dataset (including European Commission, OECD, education ministries) can be found in the reference list in the Excel file and can be requested from the author.
GESIS
Interest in the topic of violence against women has grown strongly over the last two decades. During the nineties, and following studies on the subject in Canada and the United States, the focus has shifted to violence against women in general, and no longer exclusively on domestic violence against women. Following the preparatory work of two UN institutes (UNICRI in Turin and HEUNI in Helsinki), and once the method had been standardized (identical questionnaire and survey method), national studies on this issue have been planned in approximately 30 countries.
The Swiss survey is based on telephone interviews, conducted between April and August 2003, with 1975 women aged 18 to 70 living in the German-speaking and the French-speaking parts of Switzerland. The sample thus obtained is representative of the female population. The method used was the computer-assisted telephone survey, which had already proved adequate in previous victimization surveys. This choice was also motivated by the great complexity of the questionnaire, which has to capture different categories of violence, relating to different types of relationship between the perpetrator and the victim (marriage, cohabitation, former partners, colleagues, strangers), experienced since the age of 16 (childhood experiences are not taken into account).
There are several objectives for this study:
- to increase the awareness of this problem among the authorities and the public
- to promote prevention
- to provide reliable information for the development of legislation, policies and means of assistance to victims
- to set up an internationally comparable database
- to help the police in their work practices concerning violence against women
- to formulate and test certain hypotheses
On this basis, the hypotheses and research questions are:
- What is the extent of this type of violence in Switzerland, compared to other countries? How can these differences be explained?
- How has the situation of domestic violence evolved since the study by Gillioz et al. (1994)?
- How important are various factors, including situational and biographical, in experiences of violence?
- What is the influence of the past and current criminal history of men on their tendency to domestic violence?
- What particular interaction effects are revealed among the variables studied?
- How is the role of the police perceived among the victims?
- Does (institutionalized) aid to victims achieve its objectives?