Machine learning techniques to improve the field performance of low-cost air quality sensors
Low-cost air quality sensors offer significant potential for enhancing urban air quality networks by providing the higher spatiotemporal resolution data needed, for example, to evaluate air quality interventions. However, these sensors present methodological and deployment challenges that have historically limited their operational use. These include variability in performance characteristics and sensitivity to environmental conditions. In this work, we investigate field "baselining" and interference correction using random forest regression methods for low-cost sensing of NO2 and particulate matter (PM10 and PM2.5). Model performance is explored using data obtained over a 7-month period from a real-world field deployment of sensors alongside reference method instrumentation. The workflows and processes developed are shown to be effective in normalising variable sensor baseline offsets and in reducing uncertainty in sensor response arising from environmental interferences. We demonstrate improvements of between 37 % and 94 % in the mean absolute error (MAE) of fully corrected sensor datasets; this is equivalent to performance within ±2.6 ppb of the reference method for NO2, ±4.4 µg m−3 for PM10 and ±2.7 µg m−3 for PM2.5. Expanded-uncertainty estimates for the PM10 and PM2.5 correction models are shown to meet the performance criteria recommended by European air quality legislation, whilst that of the NO2 correction model was found to lie narrowly (∼5 %) outside its acceptance envelope. Expanded-uncertainty estimates for corrected sensor datasets not used in model training were 29 %, 21 % and 27 % for NO2, PM10 and PM2.5 respectively.
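As an illustration of the general idea (not the authors' workflow), the sketch below trains a random forest to map a raw sensor signal plus environmental covariates onto co-located reference measurements, then reports the percentage reduction in mean absolute error on held-out data. It assumes scikit-learn and NumPy are available, and every number in it is synthetic and hypothetical: the constant baseline offset, the temperature and humidity interference coefficients and the noise level are invented for demonstration. Unlike the study, which treats baselining and interference correction as distinct steps, this sketch folds both into a single model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic co-location data; NOT the study's measurements.
rng = np.random.default_rng(0)
n = 2000
reference_no2 = rng.uniform(5, 60, n)   # reference-method NO2, ppb
temperature = rng.uniform(0, 30, n)     # air temperature, deg C
humidity = rng.uniform(30, 95, n)       # relative humidity, %
baseline_offset = 8.0                   # assumed constant sensor offset, ppb

# Raw sensor response = truth + offset + assumed T/RH interferences + noise.
raw_sensor = (reference_no2 + baseline_offset
              + 0.3 * temperature + 0.1 * (humidity - 60)
              + rng.normal(0, 1.5, n))

# Features: raw signal plus environmental covariates.
features = np.column_stack([raw_sensor, temperature, humidity])

# Train on the first 70 % of records, test on the remainder.
split = int(0.7 * n)
X_train, X_test = features[:split], features[split:]
y_train, y_test = reference_no2[:split], reference_no2[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
corrected = model.predict(X_test)

# MAE of the uncorrected vs corrected sensor against the reference.
mae_raw = mean_absolute_error(y_test, raw_sensor[split:])
mae_corrected = mean_absolute_error(y_test, corrected)
improvement = 100 * (mae_raw - mae_corrected) / mae_raw
print(f"MAE raw: {mae_raw:.2f} ppb, corrected: {mae_corrected:.2f} ppb, "
      f"improvement: {improvement:.0f} %")
```

A tree ensemble is a convenient choice here because it can capture non-linear, interacting interferences (such as humidity effects that depend on temperature) without an explicit functional form. The trade-off is that it cannot extrapolate beyond the range of conditions seen during co-location training.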