Data mining is already incorporated into business processes in many sectors, such as health, retail, automotive, finance, telecom, and insurance, as well as in government. The technology is well established in applications such as targeted marketing, customer churn detection, and market basket analysis, and it is emerging as an important technology in a range of new application areas, including social media, social networks, and sensor networks. These areas pose new challenges, both in the nature of the available data and in the underlying support technology.
Abstract Explainability is highly desired in machine learning (ML) systems supporting high-stakes policy decisions in areas such as health, criminal justice, education, and employment. While the field of explainable ML has expanded in recent years, much of this work has not taken real-world needs into account. A majority of proposed methods are designed with generic explainability goals, without well-defined use cases or intended end users, and are evaluated on simplified tasks, benchmark problems/datasets, or with proxy users (e.g., Amazon Mechanical Turk). We argue that these simplified evaluation settings do not capture the nuances and complexities of real-world applications, so the applicability and effectiveness of this large body of theoretical and methodological work in real-world settings remain unclear. In this work, we take steps toward addressing this gap for the domain of public policy. First, we identify the primary use cases of explainable ML within public policy problems. For each use case, we define the end users of explanations and the specific goals the explanations have to fulfill. Finally, we map existing work in explainable ML to these use cases, identify gaps in established capabilities, and propose research directions to fill those gaps so that ML can have a practical societal impact. Our contributions are (a) a methodology that explainable ML researchers can use to identify use cases and develop methods targeted at them, and (b) an application of that methodology to the domain of public policy, serving as an example of how researchers can develop explainable ML methods that result in real-world impact.
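To make the kind of method at issue concrete, here is a minimal sketch (ours, not the paper's) of one widely used post-hoc explanation technique, permutation feature importance, applied to a tabular classifier standing in for a policy decision-support model; the dataset and model are synthetic placeholders.

```python
# Minimal sketch (not from the paper): permutation feature importance,
# a common post-hoc global explanation method, on a synthetic stand-in
# for a tabular policy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# How much does shuffling each feature degrade held-out performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: {score:.4f}")
```

Whether such a global importance ranking actually serves a given end user (a policymaker auditing a model versus a caseworker acting on a single prediction) is exactly the use-case question the paper raises.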
Contents: Preface -- Editors -- Contributors
1. Introduction
  1.1 Why this book?
  1.2 Defining big data and its value
  1.3 The importance of inference
    1.3.1 Description
    1.3.2 Causation
    1.3.3 Prediction
  1.4 The importance of understanding how data are generated
  1.5 New tools for new data
  1.6 The book's "use case"
  1.7 The structure of the book
    1.7.1 Part I: Capture and curation
    1.7.2 Part II: Modeling and analysis
    1.7.3 Part III: Inference and ethics
  1.8 Resources
Part I: Capture and Curation
2. Working with Web Data and APIs
  2.1 Introduction
  2.2 Scraping information from the web
    2.2.1 Obtaining data from websites
      2.2.1.1 Constructing the URL
      2.2.1.2 Obtaining the contents of the page from the URL
      2.2.1.3 Processing the HTML response
    2.2.2 Programmatically iterating over the search results
    2.2.3 Limits of scraping
  2.3 Application programming interfaces
    2.3.1 Relevant APIs and resources
    2.3.2 RESTful APIs, returned data, and Python wrappers
  2.4 Using an API
  2.5 Another example: Using the ORCID API via a wrapper
  2.6 Integrating data from multiple sources
  2.7 Summary
3. Record Linkage
  3.1 Motivation
  3.2 Introduction to record linkage
  3.3 Preprocessing data for record linkage
  3.4 Indexing and blocking
  3.5 Matching
    3.5.1 Rule-based approaches
    3.5.2 Probabilistic record linkage
    3.5.3 Machine learning approaches to record linkage
    3.5.4 Disambiguating networks
  3.6 Classification
    3.6.1 Thresholds
    3.6.2 One-to-one links
  3.7 Record linkage and data protection
  3.8 Summary
  3.9 Resources
4. Databases
  4.1 Introduction
  4.2 The DBMS: When and why
  4.3 Relational DBMSs
    4.3.1 Structured Query Language
    4.3.2 Manipulating and querying data
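The scraping workflow that sections 2.2.1.1-2.2.1.3 outline (construct the URL, obtain the page contents, process the HTML) can be sketched in a few lines of Python; the endpoint, query parameters, and CSS selector below are hypothetical placeholders, not the book's code.

```python
# Hypothetical sketch of the scraping steps in sections 2.2.1.1-2.2.1.3.
import requests
from bs4 import BeautifulSoup

# 2.2.1.1: construct the URL (placeholder endpoint and query).
base_url = "https://example.org/search"
params = {"q": "record linkage", "page": 1}

# 2.2.1.2: obtain the contents of the page from the URL.
response = requests.get(base_url, params=params, timeout=10)
response.raise_for_status()

# 2.2.1.3: process the HTML response (placeholder selector).
soup = BeautifulSoup(response.text, "html.parser")
titles = [a.get_text(strip=True) for a in soup.select("a.result-title")]
print(titles)
```

Iterating over the `page` parameter gives the programmatic pagination of section 2.2.2.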
Contents: Preface -- Editors -- Contributors
1: Introduction
  1.1: Why this book?
  1.2: Defining big data and its value
  1.3: Social science, inference, and big data
  1.4: Social science, data quality, and big data
  1.5: New tools for new data
  1.6: The book's "use case"
  1.7: The structure of the book
    1.7.1: Part I: Capture and curation
    1.7.2: Part II: Modeling and analysis
    1.7.3: Part III: Inference and ethics
  1.8: Resources
I: Capture and Curation
2: Working with Web Data and APIs
  2.1: Introduction
  2.2: Scraping information from the web
    2.2.1: Obtaining data from the HHMI website
    2.2.2: Limits of scraping
  2.3: New data in the research enterprise
  2.4: A functional view
    2.4.1: Relevant APIs and resources
    2.4.2: RESTful APIs, returned data, and Python wrappers
  2.5: Programming against an API
  2.6: Using the ORCID API via a wrapper
  2.7: Quality, scope, and management
  2.8: Integrating data from multiple sources
    2.8.1: The Lagotto API
    2.8.2: Working with a corpus
  2.9: Working with the graph of relationships
    2.9.1: Citation links between articles
    2.9.2: Categories, sources, and connections
    2.9.3: Data availability and completeness
    2.9.4: The value of sparse dynamic data
  2.10: Bringing it together: Tracking pathways to impact
    2.10.1: Network analysis approaches
    2.10.2: Future prospects and new data sources
  2.11: Summary
  2.12: Resources
  2.13: Acknowledgements and copyright
3: Record Linkage
  3.1: Motivation
  3.2: Introduction to record linkage
  3.3: Preprocessing data for record linkage
  3.4: Indexing and blocking
  3.5: Matching
    3.5.1: Rule-based approaches
    3.5.2: Probabilistic record linkage
    3.5.3: Machine learning approaches to linking
    3.5.4: Disambiguating networks
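Section 3.4's indexing and blocking step admits a similarly compact sketch; the records and blocking key below are invented for illustration. The point of blocking is to restrict the expensive matching step to record pairs that share a key instead of comparing all n(n-1)/2 pairs.

```python
# Illustrative sketch of blocking for record linkage (section 3.4);
# the records and the blocking key are made up.
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "surname": "Smith", "zip": "60637"},
    {"id": 2, "surname": "Smyth", "zip": "60637"},
    {"id": 3, "surname": "Jones", "zip": "10027"},
]

def blocking_key(rec):
    # Hypothetical key: first letter of the surname plus ZIP code.
    return (rec["surname"][0].upper(), rec["zip"])

blocks = defaultdict(list)
for rec in records:
    blocks[blocking_key(rec)].append(rec)

# Candidate pairs passed on to the matching step (sections 3.5.x).
candidates = [pair for block in blocks.values()
              for pair in combinations(block, 2)]
print(candidates)  # only the Smith/Smyth pair shares a block
```

A coarser key admits more candidate pairs (fewer missed matches, more comparisons); a finer key does the opposite, which is the central trade-off in choosing a blocking scheme.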
Abstract Rough sleeping is a chronic experience faced by some of the most disadvantaged people in modern society. This paper describes work carried out in partnership with Homeless Link (HL), a UK-based charity, to develop a data-driven approach to better connect people sleeping rough on the streets with outreach service providers. HL's platform has grown exponentially in recent years, producing thousands of alerts per day during extreme weather events and overwhelming the volunteer-based system HL currently relies upon to process them. To solve this problem, we propose a human-centered machine learning system that augments the volunteers' efforts by prioritizing alerts according to the likelihood of making a successful connection with a rough sleeper. This addresses capacity and resource limitations while allowing HL to process all incoming alerts quickly, effectively, and equitably. Initial evaluation on historical, labeled data shows that our approach increases the rate at which rough sleepers are found following a referral by at least 15%; the overall increase is likely greater once alerts with unknown outcomes are considered, which suggests the value of a longer-term trial to assess the models in practice. Given the sensitive nature of the data involved and the vulnerability of the people affected, the discussion and modeling process are conducted with careful consideration of ethics, transparency, and explainability.
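The core prioritization idea can be sketched as follows; the model, features, and capacity limit are illustrative stand-ins, not the paper's actual pipeline. Each alert is scored with the predicted probability of a successful connection, and the highest-scoring alerts are processed first under a daily capacity limit.

```python
# Illustrative sketch (not the paper's pipeline): rank alerts by the
# predicted probability of a successful connection with a rough sleeper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))     # synthetic historical alert features
y_train = rng.integers(0, 2, size=500)  # 1 = person was found after referral

model = LogisticRegression().fit(X_train, y_train)

def prioritize(alert_features: np.ndarray, capacity: int) -> np.ndarray:
    """Indices of the `capacity` alerts most likely to lead to a connection."""
    scores = model.predict_proba(alert_features)[:, 1]
    return np.argsort(scores)[::-1][:capacity]

todays_alerts = rng.normal(size=(50, 4))   # synthetic incoming alerts
print(prioritize(todays_alerts, capacity=10))
```

Ranking by predicted probability alone can concentrate resources on easy cases, which is one reason the paper emphasizes equitable processing alongside speed and effectiveness.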