Selecting Directors Using Machine Learning
In: Fisher College of Business Working Paper No. 2018-03-005
SSRN
Working paper
In: Synthese: an international journal for epistemology, methodology and philosophy of science, Band 199, Heft 1-2, S. 1461-1497
ISSN: 1573-0964
Abstract
This paper investigates how unsupervised machine learning methods might make hermeneutic, interpretive text analysis more objective in the social sciences. Through a close examination of the uses of topic modeling (a popular unsupervised approach in the social sciences), it argues that the primary way in which unsupervised learning supports interpretation is by allowing interpreters to discover unanticipated information in larger and more diverse corpora and by improving the transparency of the interpretive process. This view highlights that unsupervised modeling does not eliminate the researchers' judgments from the process of producing evidence for social scientific theories. The paper shows this by distinguishing between two prevalent attitudes toward topic modeling, i.e., topic realism and topic instrumentalism. Under neither can modeling provide social scientific evidence without the researchers' interpretive engagement with the original text materials. Thus unsupervised text analysis cannot improve the objectivity of interpretation by alleviating the problem of underdetermination in interpretive debate. The paper argues that the sense in which unsupervised methods can improve objectivity is by providing researchers with the resources to justify to others that their interpretations are correct. This kind of objectivity seeks to reduce suspicions in collective debate that interpretations are the products of arbitrary processes influenced by the researchers' idiosyncratic decisions or starting points. The paper discusses this view in relation to alternative approaches to formalizing interpretation and identifies several limitations on what unsupervised learning can be expected to achieve in terms of supporting interpretive work.
This dissertation comprises three essays that apply machine learning and high-dimensional statistics to causal inference. The first essay proposes a parametric alternative to the synthetic control method (Abadie and Gardeazabal, 2003; Abadie et al., 2010) that relies on a Lasso-type first step. We show that the resulting estimator is doubly robust, asymptotically Gaussian, and "immunized" against first-step selection mistakes. The second essay studies a penalized version of the synthetic control method, especially useful in the presence of microeconomic data. The penalization parameter trades off pairwise matching discrepancies with respect to the characteristics of each unit in the synthetic control against matching discrepancies with respect to the characteristics of the synthetic control unit as a whole. We study the properties of the resulting estimator, propose data-driven choices of the penalization parameter, and discuss randomization-based inference procedures. The last essay applies the Generic Machine Learning framework (Chernozhukov et al., 2018) to study heterogeneity of the treatment effect in a randomized experiment designed to compare public and private provision of job counselling. From a methodological perspective, we discuss the extension of the Generic Machine Learning framework to experiments with imperfect compliance.
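The trade-off controlled by the penalization parameter can be sketched numerically. The following toy example is an illustration of the general idea, not the dissertation's estimator; the function name, the data, and the simplex-constrained solver are all assumptions. It finds non-negative donor weights summing to one that minimize the aggregate matching discrepancy plus a penalty on the pairwise discrepancies:

```python
import numpy as np
from scipy.optimize import minimize

def penalized_synthetic_control(x1, X0, lam):
    """Toy penalized synthetic control: x1 holds the treated unit's
    characteristics (k,), X0 the donor pool (k, n), lam >= 0 trades off
    aggregate fit against pairwise matching discrepancies."""
    k, n = X0.shape
    # squared discrepancy between the treated unit and each individual donor
    pairwise = np.sum((X0 - x1[:, None]) ** 2, axis=0)

    def objective(w):
        aggregate = np.sum((x1 - X0 @ w) ** 2)  # synthetic unit as a whole
        return aggregate + lam * w @ pairwise   # plus penalized pairwise terms

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)
    return minimize(objective, w0, bounds=bounds, constraints=constraints).x

rng = np.random.default_rng(0)
X0 = rng.normal(size=(3, 5))                      # 5 donors, 3 characteristics
x1 = X0 @ np.array([0.5, 0.5, 0.0, 0.0, 0.0])     # treated unit in the donors' hull
w = penalized_synthetic_control(x1, X0, lam=0.1)  # weights on the donor pool
```

As lam approaches zero this collapses to the standard synthetic control fit; a large lam concentrates weight on the donors individually closest to the treated unit.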
BASE
In: Algorithms for Intelligent Systems Ser.
Contents: Preface; Editors and Contributors; About the Editors; Contributors
Introduction to Computer Vision and Machine Learning Applications in Agriculture
  1 Introduction; 2 Computer Vision and Machine Learning in Agriculture; 3 Challenges and Future Scopes; 4 Conclusion; References
Robots and Drones in Agriculture - A Survey
  1 Introduction; 2 Robotics Basics (2.1 Robotic Mechanism; 2.2 Agricultural Robot Classification); 3 Robots in Agricultural Applications (3.1 Robots in Path Navigation; 3.2 Robots in Crop Production; 3.3 Robots in Weed Removal and Disease and Pest Control; 3.4 Robots in Crop Harvesting); 4 Drones in Agriculture; 5 Commercialization and Current Challenges of Agricultural Robots; 6 Conclusion; References
Detection of Rotten Fruits and Vegetables Using Deep Learning
  1 Introduction; 2 Computer Vision and Machine Learning in Fruits and Vegetable Processing (2.1 Segmentation and Detection of Fruits and Vegetables from the Natural Environment; 2.2 Classification of Fruits and Vegetables; 2.3 Grading of Fruits and Vegetables; 2.4 Sorting the Defective Fruits and Vegetables); 3 Materials and Methods (3.1 Dataset; 3.2 Convolutional Neural Network; 3.3 Proposed Convolutional Neural Network Architecture; 3.4 AlexNet Architecture); 4 Experimentation and Results; 5 Discussion; 6 Conclusion; References
Deep Learning-Based Essential Paddy Pests' Filtration Technique for Economic Damage Management
  1 Introduction; 2 Related Works; 3 Pests Classification (3.1 Beneficial Pests; 3.2 Non-beneficial Pests); 4 Methodology; 5 Deep Learning (5.1 Convolutional Neural Network); 6 Experiments (6.1 Dataset; 6.2 Experimental Setup; 6.3 Confusion Matrix; 6.4 Computation Time); 7 Conclusion; References
In: PNAS nexus, Band 1, Heft 5
ISSN: 2752-6542
Abstract
Recent breakthroughs in machine learning and big data analysis allow our online activities to be scrutinized at an unprecedented scale, and our private information to be inferred without our consent or knowledge. Here, we focus on algorithms designed to infer the opinions of Twitter users toward a growing number of topics, and consider the possibility of modifying the profiles of these users in the hope of hiding their opinions from such algorithms. We ran a survey to understand the extent of this privacy threat, and found evidence suggesting that a significant proportion of Twitter users wish to avoid revealing at least some of their opinions about social, political, and religious issues. Moreover, our participants were unable to reliably identify the Twitter activities that reveal one's opinion to such algorithms. Given these findings, we consider the possibility of fighting AI with AI: instead of relying on human intuition, people may have a better chance of hiding their opinions if they modify their Twitter profiles following advice from an automated assistant. We propose a heuristic that identifies which Twitter accounts users should follow or mention in their tweets, and show that such a heuristic can effectively hide a user's opinions. Altogether, our study highlights the risk associated with developing machine learning algorithms that analyze people's profiles, and demonstrates the potential to develop countermeasures that preserve the basic right to choose which of our opinions to share with the world.
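As a purely hypothetical illustration of this kind of heuristic (the paper's actual algorithm is not reproduced here; the linear scoring model, the function, and the example weights below are invented for exposition), a greedy routine could pick the follows that move a classifier's stance score toward neutral:

```python
# Hypothetical illustration only: assume a linear classifier scores a user's
# stance from the accounts they follow; a greedy routine then picks which
# candidate accounts to follow so the score moves as close to neutral (0)
# as possible.
def greedy_neutralize(score, candidate_weights, budget):
    """score: current classifier score (sign = inferred stance).
    candidate_weights: dict mapping account -> change in score if followed.
    budget: maximum number of new follows."""
    chosen = []
    for _ in range(budget):
        # candidate that brings the score closest to zero
        best = min(candidate_weights, key=lambda a: abs(score + candidate_weights[a]))
        if abs(score + candidate_weights[best]) >= abs(score):
            break  # no remaining candidate improves neutrality
        score += candidate_weights.pop(best)
        chosen.append(best)
    return chosen, score

# invented example weights: following @a or @b lowers the score, @c raises it
accounts = {"@a": -0.9, "@b": -0.4, "@c": 0.3}
chosen, final = greedy_neutralize(1.2, dict(accounts), budget=2)
```

Under this toy model the routine follows the two accounts that jointly bring the inferred stance near zero, mirroring the idea of automated advice on which accounts to follow or mention.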
User care at home is a matter of great concern, since unforeseen circumstances may occur that affect people's well-being. Technologies that assist people in independent living are essential for enhancing care in a cost-effective and reliable manner. Assisted-care applications often demand real-time observation of the environment and the residents' activities using an event-driven system. As this is an emerging area of research and development, it is necessary to survey the approaches to user care systems in the literature and identify current practices and future research directions. This book therefore aims at a comprehensive review of data sources (e.g., sensors) combined with machine learning for various smart user care systems. To support readers entering the field, practical insights into different machine learning algorithms applied to sensor data (e.g., publicly available datasets) are also discussed, and some code segments are included to encourage researchers in related fields to implement the features and machine learning techniques in practice. The book is an effort to convey knowledge of the different types of sensor-based user monitoring technologies in home environments. With the aim of promoting the adoption of these technologies, relevant research works and their outcomes are reported, and up-to-date references on user monitoring technologies for independent living are included. Research on the use of user monitoring technologies in assisted living is widespread, but it still consists mostly of limited-scale studies. User monitoring technology is therefore a very promising field, especially for long-term care; however, monitoring for smart assisted technologies should be taken to the next level with more detailed studies that evaluate and demonstrate its potential to prolong people's independent living. This book aims to contribute in that direction.
In: International journal of population data science: (IJPDS), Band 8, Heft 2
ISSN: 2399-4908
Data linkage traditionally uses deterministic and probabilistic methods. Alternatively, machine learning methods can be applied as classification algorithms, using the data to inform decisions. This project compared the quality, in terms of precision and recall, of traditional methods with selected machine learning methods when applied to a standard linkage problem.
Two supervised methods, gradient boosted trees (GBT) and a multi-layer perceptron classifier (MLPC), and one unsupervised method, maximum entropy classification (MEC), were implemented. The England and Wales 2021 Census to Census Coverage Survey (CCS) linkage was used as a gold-standard (GS) linked dataset, providing training samples for the supervised methods as well as testing samples for all methods. The F1 score (the harmonic mean of precision and recall) was used to compare the performance of the models and to determine the optimal parameters and thresholds.
The Splink implementation of Fellegi-Sunter with Expectation Maximisation was used as a baseline for comparison.
The methods, trained on a sample of the GS, were used to link census and CCS data. All methods performed well with MEC achieving the highest precision (99.79%) but lowest recall (96.36%). The MLPC model achieved the highest F1 score (98.94%).
To understand the implications of not retraining supervised models for each dataset, the models were also used to link Census to a health dataset. The supervised models were not retrained using the health data; instead, the optimised GS models were applied. MEC had the lowest precision (96.51%) but the highest recall (98.48%) and highest F1 score (97.49%). With F1 scores of 96.99% and 96.14% respectively, the GBT and MLPC supervised models were not far behind in performance, despite not being trained using health data.
We have shown that machine learning methods can be used effectively for data linkage problems. Unsurprisingly, supervised models perform best when trained on and applied to the same data. Further research into generic training may allow us to use both supervised and unsupervised machine learning models for future data linkage.
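The F1 scores quoted above are the harmonic mean of precision and recall, which is straightforward to reproduce from the reported figures:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    return 2 * precision * recall / (precision + recall)

# MEC on the census linkage: high precision, lower recall
f1_mec = f1_score(0.9979, 0.9636)  # ≈ 0.980
```

The harmonic mean penalizes imbalance, which is why MEC's very high precision does not compensate for its lower recall in the census comparison.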
In: Beigh, T. M., Arivazagan, J., & Venkatesan, V. P. (2023). Counterfeit Currency Detection using Machine Learning. Journal of Emerging Technologies and Innovative Research, 10(3), 356–358. ISSN 2349-5162
SSRN
In: Liu, Dishi, Maruyama, Daigo, and Görtz, Stefan (2020) Machine Learning for Aerodynamic Uncertainty Quantification. In: ERCIM News Special Theme "Solving Engineering Problems with Machine Learning" (122), pages 20-21. ISSN 0926-4981.
Within the framework of the project "Uncertainty Management for Robust Industrial Design in Aeronautics" (UMRIDA), funded by the European Union, several machine learning-based predictive models were compared in terms of their efficiency in estimating statistics of the aerodynamic performance of aerofoils. The results show that models based on both samples and gradients achieve better accuracy than those based solely on samples, at the same computational cost.
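The idea of exploiting gradients alongside samples can be sketched with a minimal example (a generic gradient-enhanced least-squares fit, not the specific models compared in the article): each sample contributes one value equation and one derivative equation, so fewer costly simulations pin down the same number of coefficients.

```python
import numpy as np

# Gradient-enhanced least-squares fit of y = a + b*x + c*x^2 in one dimension.
def fit_quadratic_with_gradients(x, y, dy):
    rows_val = np.column_stack([np.ones_like(x), x, x ** 2])   # y  = a + b x + c x^2
    rows_grad = np.column_stack(
        [np.zeros_like(x), np.ones_like(x), 2 * x])            # dy = b + 2 c x
    A = np.vstack([rows_val, rows_grad])
    b = np.concatenate([y, dy])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

x = np.array([0.0, 1.0])    # only two samples...
y = 1 + 2 * x + 3 * x ** 2  # ...but values and gradients together
dy = 2 + 6 * x              # determine all three coefficients
coef = fit_quadratic_with_gradients(x, y, dy)
```

Here two sample points plus their derivatives recover all three quadratic coefficients exactly, whereas a sample-only fit would need at least three points; this is the sense in which gradient information improves accuracy at the same computational cost.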
BASE