Projects recruiting students for summer 2024

  • Machine Learning for Air Quality and Climate

    Our group has active projects on understanding and mitigating bias in the training data for machine learning models of air quality and climate, and on appropriate uses of 'explainable AI' for deep-learning-based models. If you are interested in 'chemistry-informed data science', get in touch!

  • Turbulence and Carbon in Claremont

    Our team is launching a long-term project to measure energy and greenhouse gas fluxes in Claremont, California. We're collecting high-frequency data, analyzing turbulent spectra, and contributing to our understanding of climate change. If you are excited about time-series analysis and eager to measure greenhouse gas fluxes firsthand, let's connect!

Abstracts

  • Machine learning and data-driven methods show promise as complements to physics-based Earth system forecasting. While geospatial relationships and teleconnections are key to real-world atmospheric and oceanic dynamics, spatial features have yet to be widely incorporated into machine learning methods. This work explores the application of graph neural networks, which explicitly encode spatial relationships, to the task of forecasting chaotic spatiotemporal data. The data for this study is generated using the Lorenz-96 model, a simplified yet chaotic surrogate for atmospheric dynamics. Preliminary results indicate the potential for explicit spatial encodings to improve forecast accuracy over networks that lack spatial encodings.

    Hannah Lu* (HMC '24 Math-CS Major) and Sarah C. Kavassalis, The American Geophysical Union Fall Meeting, San Francisco, CA. December 2023.
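
    For readers unfamiliar with the Lorenz-96 model mentioned above, here is a minimal sketch of how such data could be generated. The number of variables (40), the forcing (F = 8), the time step, and all function names are assumptions chosen for illustration; the abstract does not state the configuration actually used.

```python
def lorenz96_tendency(x, forcing):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F on a periodic ring."""
    n = len(x)
    # Negative indices wrap around in Python lists, which gives periodicity
    # for x[i - 1] and x[i - 2]; x[i + 1] is wrapped explicitly.
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, dt, forcing):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k1)], forcing)
    k3 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k2)], forcing)
    k4 = lorenz96_tendency([xi + dt * k for xi, k in zip(x, k3)], forcing)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate_lorenz96(n_vars=40, forcing=8.0, dt=0.01, n_steps=500,
                      perturbation=0.01):
    """Return a list of states, starting near the unstable fixed point x_i = F."""
    x = [forcing] * n_vars
    x[0] += perturbation  # small kick so the trajectory leaves the fixed point
    trajectory = [x]
    for _ in range(n_steps):
        x = rk4_step(x, dt, forcing)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    states = simulate_lorenz96()
    print(len(states), "states of dimension", len(states[0]))
```

    Each state is a ring of coupled variables, so neighboring indices form a natural graph on which a spatially aware network could operate.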

  • Artificial Intelligence (AI) techniques, often described as 'black-box' models, are increasingly applied to atmospheric composition and pollution modeling. However, these techniques fail to provide insight into the underlying rationale for predictive outcomes, inducing a lack of trust in their results. To address this challenge, our research explores the efficacy of the explainable artificial intelligence (XAI) approach known as layerwise relevance propagation (LRP), when applied to long short-term memory (LSTM) models. This approach is aimed at increasing the transparency and interpretability of complex AI models. Our study focuses on predictions of ozone concentrations in Los Angeles, a region with extensive research into factors influencing ozone production. We postulate that if the LRP results align with our current scientific understanding of ozone formation, the credibility of predictions generated by deep neural networks such as LSTMs is enhanced. Preliminary results indicate that, while the LRP output aligns with chemical intuition for influencing factors of ozone pollution in some cases, it falls short of full alignment. This mixed outcome, however, still carries implications for improving the trustworthiness of these AI-based techniques in atmospheric composition studies.

    Georgia Klein* (HMC '24 CS Major) and Sarah C. Kavassalis, The American Geophysical Union Fall Meeting, San Francisco, CA. December 2023.

  • Exposure to particulate matter less than 2.5 µm in diameter (PM2.5) has profound health and environmental implications, driving the need for appropriate representation in air quality and Earth system models of all scales. The accurate prediction of PM2.5 levels and composition in chemical transport models (CTMs) is hindered by the complexity of chemical reactions required to produce realistic secondary organic aerosols, along with the diversity and aperiodicity of primary emission sources. Using machine learning, we demonstrate a computationally efficient alternative to mechanistic PM2.5 prediction. A random forest is trained on hourly criteria pollutant observations from the EPA's Air Quality System (AQS), along with traditional CTM inputs (a gridded emission inventory and reanalysis meteorological data), to make PM2.5 predictions across diverse landscapes in the United States. Our approach can make high-accuracy predictions with significantly reduced computational needs. Data-driven predictions have essential limitations, though, and we show examples of model weaknesses, particularly in areas of disparate data density, changing photochemical production regimes, and in the face of disruptive phenomena such as wildfires.

    Helen Chen* (HMC '24 Chem Major), Angela Anqi Zhou (Scripps '25, Data Science Major), and Sarah C. Kavassalis, The American Geophysical Union Fall Meeting, San Francisco, CA. December 2023.

  • Since 1969, the U.S. Environmental Protection Agency has regularly published criteria reports (now called Integrated Science Assessments), which guide air quality policy in the United States. As mandated by the Clean Air Act, the reports summarize the "best available science" on various criteria air quality pollutants to support decision-making. However, since the notion of "best available science" can be subjective and potentially politicized, what empirically distinguishes the "best available science" cited in these reports?

    We use Latent Dirichlet Allocation (LDA)—an unsupervised topic modeling approach to find trends among words that occur together in documents—to model a collection of abstracts from scientific papers cited in EPA Criteria Reports and Integrated Science Assessments. Through our analysis, we seek to better characterize what distinguishes the scholarship cited in these reports in order to grant scientists insight into how to write more impactful papers.

    Our topic modeling reveals trends that evolve over time in the areas of policy, experimental subject areas, and experimental methodologies. Latent topics regarding policy and strategies for change emerge and shift over time across reports. Topics regarding health studies not only show consistently high prevalence but also allow us to differentiate types of experimental design and observe how specific experimental design norms have gained and lost popularity over time.

    Stephanie Fulcar* (HMC ’25 CS/Linguistics Major), Clifford Ashmun* (HMC ’24, Chemistry/CS IPS), Jeremy Bakken (HMC ’23, Physics Major), Anna Ding (HMC ’23, CS Major), J.P. Walker (Scripps ’25, Math Major), Alexandra Schofield, and Sarah C. Kavassalis, The American Geophysical Union Fall Meeting, San Francisco, CA. December 2023.
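
    As a rough illustration of the topic-modeling idea described above, the following sketch fits LDA to a tiny invented corpus using a collapsed Gibbs sampler. The corpus, the hyperparameters (alpha, beta, number of topics, iterations), and the function names are all assumptions made for illustration; the abstract does not say which LDA implementation or settings were used.

```python
import random


def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    # Invented toy "abstracts"; not drawn from any EPA report.
    toy_docs = [
        "ozone exposure cohort asthma ozone".split(),
        "cohort asthma hospital exposure".split(),
        "emission standard policy regulation".split(),
        "policy regulation standard emission ozone".split(),
    ]
    vocab, doc_topic, topic_word = fit_lda(toy_docs, n_topics=2)
    for k, words in enumerate(top_words(vocab, topic_word)):
        print("topic", k, words)
```

    Tracking how the per-document topic counts change with publication year is one simple way to look for the kind of temporal trends the abstract describes.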

  • Urban grime, a complex film formed by the deposition of organic and inorganic pollutants, can be an important medium for heterogeneous chemistry in cities. Experimental evidence from ambient grime sampling and laboratory-made proxies has uncovered a wide range of chemical compositions and relevant reactions, and thus highly varied potential consequences for urban air quality. Explicitly modeling this chemistry is challenging, given the multiphase nature of grime. We present progress on a novel modeling framework based on cellular automata that can simulate the deposition and partitioning of gases and aerosols onto surfaces and their subsequent chemical reactions for common grime constituents. We use this model to identify the ambient conditions under which urban grime can act as a NOx sink or source. While our grime automata model is two-dimensional (representing chemistry on smooth surfaces), our approach is generalizable to three dimensions for ambient aerosol chemistry simulation.

    Ali Talib Saifee* (HMC '24 Chem-Bio Major), Olivia Russell (HMC '23 CS Major), and Sarah Kavassalis, The American Chemical Society Spring Meeting, Indianapolis, IN. March 2023.

    More on Ali Talib’s grime work: https://www.ficus.space/grime
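
    To give a flavor of what a surface cellular automaton can look like, here is a deliberately simplified toy sketch. It is not the group's grime model: the two cell states, the neighbor-dependent deposition rule, the release probability, and every name below are hypothetical, chosen only to show how local update rules can make a surface behave as a net sink or a net source.

```python
import random

EMPTY, FILM = 0, 1  # hypothetical cell states: bare surface vs. deposited film


def occupied_neighbors(grid, r, c):
    """Count film-covered von Neumann neighbors with periodic boundaries."""
    n_rows, n_cols = len(grid), len(grid[0])
    return (
        grid[(r - 1) % n_rows][c] + grid[(r + 1) % n_rows][c]
        + grid[r][(c - 1) % n_cols] + grid[r][(c + 1) % n_cols]
    )


def step(grid, p_deposit, p_release, rng):
    """Synchronously update every cell; return the new grid and net uptake."""
    new_grid = [row[:] for row in grid]
    deposited = released = 0
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == EMPTY:
                # Assumed rule: uptake is more likely next to existing film.
                p = p_deposit * (1 + occupied_neighbors(grid, r, c)) / 5
                if rng.random() < p:
                    new_grid[r][c] = FILM
                    deposited += 1
            elif rng.random() < p_release:
                # Assumed rule: film is released back to the gas phase.
                new_grid[r][c] = EMPTY
                released += 1
    return new_grid, deposited - released


def run_grime_automaton(size=20, n_steps=50, p_deposit=0.3, p_release=0.05,
                        seed=0):
    """Run from a clean surface; return the final grid and net uptake per step."""
    rng = random.Random(seed)
    grid = [[EMPTY] * size for _ in range(size)]
    net_uptake = []
    for _ in range(n_steps):
        grid, net = step(grid, p_deposit, p_release, rng)
        net_uptake.append(net)
    return grid, net_uptake


if __name__ == "__main__":
    _, net_uptake = run_grime_automaton()
    recent = sum(net_uptake[-10:])
    print("net sink" if recent > 0 else "net source or balanced", recent)
```

    In this toy, a positive net uptake means the surface is removing material from the gas phase (a sink) and a negative value means it is returning material (a source); a realistic model would replace these invented probabilities with rules tied to measured chemistry and ambient conditions.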