Why is our group interested in machine learning?
Through concentrated field campaigns, long-term air quality monitoring programs, and Earth-observing satellites, atmospheric chemists are collecting vast amounts of data that shape our understanding of the composition and chemistry of the atmosphere. This data collection is essential as we observe rapid changes in atmospheric composition driven by human activity. In parallel, computational models are growing in complexity and require ever more resources to provide the high-resolution simulations of air quality and climate change needed to inform action.
Applications of modern machine learning (ML) techniques have recently become an area of significant interest in the air quality and climate community. Still, there are valid criticisms of using some of these approaches to gain process-level information. The current state of the art in online air quality and Earth system modelling is highly mechanistic: extensive systems of coupled ordinary differential equations representing chemical reactions are solved together with discretized partial differential equations describing physical dynamics. Unfortunately, these mechanistic models are computationally costly to run even with highly simplified chemical representations; they require supercomputers or large clusters and have long runtimes. Computationally “cheap” ML models, once trained, can be run on more conventional computers in a fraction of the time, potentially making this area of research far more accessible.
Before these techniques become more widespread, there is an important question the community needs to answer: within the large pool of data already collected, are there datasets robust enough to train an ML model that can make meaningful predictions about future states of the atmosphere, given the significant non-linearities in both the physical and chemical dynamics of the system?
When projected futures, in terms of emissions and meteorology, are expected to look quite different from the past, how can data-driven models make meaningful predictions of those futures? Our group tackles this big question using a variety of model architectures and observational datasets.