With AI, Less Is More When It Comes to Climate Prediction

MIT study highlights the risks of deploying large AI models for climate science, supporting the view that, when it comes to AI, bigger isn’t always better.

With the explosion in the use of artificial intelligence in recent years, deep-learning systems such as large language models (LLMs) have become household names; OpenAI’s ChatGPT service alone receives around a billion queries a day.

Each generation of LLMs has become more sophisticated than the last, enabling companies to leverage AI’s quick learning abilities to tackle a range of issues — but a new study by MIT researchers finds that bigger models are not always better.

Environmental scientists are increasingly using enormous AI models to predict changes in weather and climate. But the MIT study — published this week in the Journal of Advances in Modeling Earth Systems — shows that in certain scenarios, much simpler, physics-based models can generate more accurate climate predictions than sophisticated deep-learning models.

Their analysis also reveals that a benchmarking technique commonly used to evaluate machine-learning techniques for climate predictions can be distorted by natural variations in the data — such as fluctuations in weather patterns — which could mislead analysts to believe a deep-learning model makes more accurate predictions.

Throughout the study, the researchers found that while simple models are more accurate at estimating regional surface temperatures, for example, deep-learning models can be the best choice for estimating local rainfall. They used these insights to enhance a simulation tool known as a climate emulator, which can rapidly simulate the effect of human activities on the future climate.

The team sees their work as a “cautionary tale” about the risk of deploying large AI models for climate science. While deep-learning models have shown incredible success in domains such as natural language, climate science is grounded in a proven set of physical laws and approximations, and the challenge lies in incorporating those into AI models.

“We are trying to develop models that are going to be useful and relevant for the kinds of things that decision-makers need going forward when making climate policy choices,” said Noelle Selin — a professor in the MIT Institute for Data, Systems and Society and the Department of Earth, Atmospheric and Planetary Sciences (EAPS), director of the Center for Sustainability Science and Strategy, and senior author of the study. “While it might be attractive to use the latest, big-picture machine-learning model on a climate problem, this study shows that stepping back and really thinking about the problem fundamentals is important and useful.”

Why less is often more with AI

The findings support a growing realization that, while the increased sophistication of AI models can be appealing, it can also be overkill for many tasks. And as the technology’s vast and growing demand for resources becomes a pressing concern for many companies, studies like this one highlight the need for streamlined, smaller models better suited to specific tasks.

Comparing climate emulators

Because the climate is so complex, running a state-of-the-art climate model to predict how pollution levels will impact environmental factors such as temperature can take weeks — even on the world’s most powerful supercomputers.

So instead, scientists often create climate emulators, simpler approximations of a state-of-the-art climate model that are faster and more accessible. Policymakers can use climate emulators to see how alternative assumptions about greenhouse gas emissions would affect future temperatures, helping to inform regulations and climate adaptation and resilience strategies.
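
As a rough, hypothetical illustration of what an emulator does (this is not the model from the study, and the proportionality constant below is an assumed placeholder), a minimal emulator can be little more than a function that maps an emissions pathway to an approximate global-mean warming:

```python
import numpy as np

# Toy emulator sketch (illustrative only; not the study's model).
# It maps an annual CO2-emissions pathway to an approximate global-mean
# warming using a single assumed coefficient, in the spirit of the
# "warming is roughly proportional to cumulative emissions" shortcut.
ASSUMED_DEG_C_PER_GTCO2 = 0.00045  # placeholder value, not a study result

def toy_emulator(annual_emissions_gtco2):
    """Return cumulative warming contribution (deg C) for each year of a pathway."""
    cumulative = np.cumsum(annual_emissions_gtco2)
    return ASSUMED_DEG_C_PER_GTCO2 * cumulative

# Compare two hypothetical emissions scenarios over 2020-2099.
years = np.arange(2020, 2100)
high = np.full(years.size, 40.0)           # flat 40 GtCO2 per year
low = np.linspace(40.0, 0.0, years.size)   # linear decline to zero
print(f"2100 warming contribution, high scenario: {toy_emulator(high)[-1]:.2f} C")
print(f"2100 warming contribution, low scenario:  {toy_emulator(low)[-1]:.2f} C")
```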

But an emulator isn’t helpful if it makes inaccurate predictions about the local impacts of climate change. While deep learning has become increasingly popular for emulation, few studies have explored whether these models perform better than tried-and-true approaches.

The MIT researchers did just that, comparing a traditional technique called linear pattern scaling (LPS) with a deep-learning model, using a common benchmark dataset for evaluating climate emulators. Their results showed that LPS outperformed the deep-learning model at predicting nearly all the parameters they tested, including temperature and precipitation.
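
In its textbook form, linear pattern scaling fits, for each grid cell, a linear relationship between a local variable and the global mean temperature, then applies that fitted pattern to new warming levels. The sketch below illustrates that general idea on synthetic data; the array shapes, values, and variable names are assumptions for illustration and are not taken from the study or its benchmark dataset.

```python
import numpy as np

# Minimal linear-pattern-scaling (LPS) sketch: for each grid cell, fit
# local_value ~= slope * global_mean_temp + intercept across training years,
# then apply the fitted pattern to a projected global-mean warming level.
rng = np.random.default_rng(0)
n_years, n_lat, n_lon = 100, 12, 24

global_t = np.linspace(0.0, 3.0, n_years)                  # global-mean warming (C)
true_pattern = rng.uniform(0.5, 2.0, size=(n_lat, n_lon))  # per-cell sensitivity
local_t = true_pattern * global_t[:, None, None] + 0.3 * rng.standard_normal(
    (n_years, n_lat, n_lon)                                # internal variability
)

# Least-squares fit of slope and intercept for every grid cell (vectorized).
X = np.column_stack([global_t, np.ones(n_years)])          # (n_years, 2)
coef, *_ = np.linalg.lstsq(X, local_t.reshape(n_years, -1), rcond=None)
slope = coef[0].reshape(n_lat, n_lon)
intercept = coef[1].reshape(n_lat, n_lon)

# Emulate the local temperature field at a projected 2.5 C of global-mean warming.
prediction = slope * 2.5 + intercept
print(prediction.shape, float(prediction.mean()))
```

The appeal of this approach is that the fit is cheap, transparent, and tied directly to a physically meaningful quantity, the global mean temperature.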

“Large AI methods are very appealing to scientists, but they rarely solve a completely new problem,” pointed out Björn Lütjens, a former EAPS postdoc who is now a research scientist at IBM Research and lead author of the study. “So, implementing an existing solution first is necessary to find out whether the complex machine-learning approach actually improves upon it.”

In theory, deep-learning models should be more accurate when making predictions about precipitation, since those data don’t follow a linear pattern. But the MIT team found that the high amount of natural variability in climate model runs could cause the more sophisticated AI models to miscalculate unpredictable, long-term oscillations such as El Niño/La Niña. This skewed the benchmarking scores in favor of LPS, which averages out those oscillations.

From there, the researchers developed a new evaluation that uses more data to account for natural climate variability. With this new evaluation, the deep-learning model performed slightly better than LPS for local precipitation, but LPS was still more accurate for temperature predictions.
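
One generic way to keep internal variability, such as El Niño/La Niña phases, from dominating a benchmark score is to evaluate an emulator against the mean of an ensemble of climate-model realizations rather than a single run. The snippet below sketches that idea on synthetic data; it is not the study's evaluation code, and all shapes and values are assumed.

```python
import numpy as np

# Sketch: score an emulator against an ensemble mean rather than a single
# realization, so internal variability does not dominate the error metric.
rng = np.random.default_rng(1)
n_members, n_years, n_cells = 10, 50, 100

forced_signal = np.linspace(0.0, 2.0, n_years)[:, None] * np.ones(n_cells)
ensemble = forced_signal + 0.5 * rng.standard_normal((n_members, n_years, n_cells))

emulator_prediction = forced_signal  # suppose the emulator captures the forced response exactly

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

single_member_score = rmse(emulator_prediction, ensemble[0])       # noisy target
ensemble_mean_score = rmse(emulator_prediction, ensemble.mean(0))  # variability averaged out
print(f"RMSE vs one realization: {single_member_score:.3f}")
print(f"RMSE vs ensemble mean:   {ensemble_mean_score:.3f}")
```

Scored against a single realization, even a perfect emulator of the forced response looks worse than it is; averaging over realizations isolates the signal the emulator is actually meant to reproduce.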

“It is important to use the modeling tool that is right for the problem; but in order to do that, you also have to set up the problem the right way in the first place,” Selin said.

Based on these results, the researchers incorporated LPS into a climate-emulation platform to predict local temperature changes in different emission scenarios.

But as senior author Raffaele Ferrari, the Cecil and Ida Green Professor of Oceanography in EAPS and co-director of MIT’s Lorenz Center, stressed: “We are not advocating that LPS should always be the goal. It still has limitations. For instance, LPS doesn’t predict variability or extreme weather events.”

Rather, they hope their results highlight the need for better benchmarking techniques to provide a fuller picture of which climate-emulation technique is best suited for each situation.

Ultimately, more accurate benchmarking techniques will help ensure policymakers are making decisions based on the best available information. The researchers say they hope others build on their analysis — improved climate-emulation methods and benchmarks could use more complex machine-learning methods to explore hard-to-address problems such as the impacts of aerosols, drought and wildfire risks, or new variables such as regional wind speeds.

Selin and Ferrari are also co-principal investigators of MIT's Bringing Computation to the Climate Challenge project, which precipitated this research.