Integrating Iterative Machine Teaching and Active Learning into the Machine Learning Loop
[Abstract] Scholars and practitioners are defining new types of interaction between humans and machine learning algorithms, which we can group under the umbrella term Human-in-the-Loop Machine Learning (HITL-ML). This paper implements two approaches to this topic, Iterative Machine Teaching (iMT) and Active Learning (AL), and analyzes how to integrate them into the learning loop. iMT is a variation of Machine Teaching in which a machine acts as a teacher that tries to transfer knowledge to a machine learning model. The core problem in iMT is how to obtain the optimal training set given a machine learning algorithm and a target model. The goal is to learn a target concept in as few iterations as possible and with the smallest possible dataset. Active Learning, in contrast, is a specialized type of supervised learning in which humans are incorporated into the loop as oracles that label unlabeled data. AL aims to achieve greater accuracy with less labeled data and less training. Our proposal for incorporating iMT and AL into the machine learning loop is to use iMT to obtain the "Minimum Viable Data (MVD)" for training a learning model, that is, a dataset that increases speed and reduces complexity in the learning process by making it possible to build early prototypes. We then use AL to refine this first prototype by adding new data iteratively and incrementally. We carried out several experiments to test the feasibility of the proposed approach. They show that algorithms trained with the teachers converge faster than those trained in a conventional way. In addition, AL helps the model avoid getting stuck and keep improving after the first few iterations. The two approaches investigated in this paper can be considered complementary, as they correspond to different stages of the learning process.
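The two-stage loop described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes a logistic-regression learner trained by single-example gradient steps, a teacher that knows a target model and greedily picks the example that moves the learner closest to it (in the spirit of the omniscient teacher from the iMT literature), and uncertainty sampling as the AL query strategy. All names (`teach`, `active_learn`, `w_star`, the synthetic data) are hypothetical, and the stored labels stand in for the human oracle.

```python
import math
import random

def predict_proba(w, x):
    # Logistic model; the score is clamped so math.exp cannot overflow.
    z = max(-30.0, min(30.0, sum(wi * xi for wi, xi in zip(w, x))))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, lr):
    # One gradient step of the logistic loss on a single example (y in {0, 1}).
    err = predict_proba(w, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def make_data(n, rng):
    # Synthetic stand-in data: two Gaussian blobs in 2-D plus a constant bias feature.
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.5 if y == 1 else -1.5
        data.append(([rng.gauss(c, 1.0), rng.gauss(c, 1.0), 1.0], y))
    return data

def teach(pool, w, w_star, n_iter, lr):
    # Stage 1 (assumed teacher rule): at each iteration, feed the learner the
    # pool example whose single gradient step lands closest to the target w_star.
    mvd = []
    for _ in range(n_iter):
        x, y = min(pool, key=lambda ex: sq_dist(sgd_step(w, ex[0], ex[1], lr), w_star))
        w = sgd_step(w, x, y, lr)
        mvd.append((x, y))
    return w, mvd

def active_learn(pool, w, labeled, n_queries, lr, epochs=5):
    # Stage 2 (assumed query strategy): uncertainty sampling. The label of a
    # pool item is read only when it is queried, simulating a human oracle.
    labeled = list(labeled)
    remaining = [ex for ex in pool if ex not in labeled]
    for _ in range(n_queries):
        i = min(range(len(remaining)),
                key=lambda j: abs(predict_proba(w, remaining[j][0]) - 0.5))
        labeled.append(remaining.pop(i))
        for _ in range(epochs):
            for x, y in labeled:
                w = sgd_step(w, x, y, lr)
    return w, labeled

def accuracy(w, data):
    return sum((predict_proba(w, x) >= 0.5) == (y == 1) for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
w_star = [1.0, 1.0, 0.0]   # hypothetical target model known to the teacher
w0 = [0.0, 0.0, 0.0]       # untrained learner

w_mvd, mvd = teach(train, w0, w_star, n_iter=10, lr=0.5)
w_final, labeled = active_learn(train, w_mvd, mvd, n_queries=10, lr=0.1)

print("accuracy after teaching:", accuracy(w_mvd, test))
print("accuracy after active learning:", accuracy(w_final, test))
```

In this sketch the teacher may select the same example more than once, so `mvd` records the teaching sequence rather than a set of distinct items. A different learner, teacher rule, or query strategy can be substituted without changing the two-stage structure.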
This work has been supported by the State Research Agency of the Spanish Government (grant PID2019-107194GB-I00 / AEI / 10.13039/501100011033) and by the Xunta de Galicia (grant ED431C 2018/34) with the European Union ERDF funds. We wish to acknowledge the support received from the Centro de Investigación de Galicia "CITIC", funded by Xunta de Galicia and the European Union (European Regional Development Fund - Galicia 2014-2020 Program, grant ED431G 2019/01).