With the explosion in the use of artificial intelligence in recent years, deep
learning and large language models (LLMs) have become household names; OpenAI’s
ChatGPT service alone receives around a billion queries a day.
Each generation of LLMs has become more sophisticated than the last, enabling
companies to leverage AI’s quick learning abilities to tackle a range of
issues — but a new study by MIT
researchers
finds that bigger models are not always better.
Environmental scientists are increasingly using enormous AI models to predict
changes in weather and climate. But the MIT
study —
published this week in the Journal of Advances in Modeling Earth Systems —
shows that in certain scenarios, much simpler, physics-based models can generate
more accurate climate predictions than sophisticated deep-learning models.
Their analysis also reveals that a benchmarking technique commonly used to
evaluate machine-learning techniques for climate predictions can be distorted by
natural variations in the data — such as fluctuations in weather patterns —
which could mislead analysts into believing that a deep-learning model makes
more accurate predictions than it actually does.
Throughout the study, the researchers found that while simpler models are more
accurate for estimating regional surface temperatures, for example,
deep-learning models can be the best choice for estimating local rainfall. They
used these insights to enhance a simulation tool known as a climate emulator,
which can rapidly simulate the effects of human activities on the future
climate.
The team sees their work as a “cautionary tale” about the risk of deploying
large AI models for climate science. While deep-learning models have shown
incredible success in domains such as natural language, climate science rests on
a proven set of physical laws and approximations, and incorporating those into
AI models remains a challenge.
“We are trying to develop models that are going to be useful and relevant for
the kinds of things that decision-makers need going forward when making climate
policy choices,”
said Noelle Selin, a professor in MIT's Institute for Data, Systems, and Society
and the Department of Earth, Atmospheric and Planetary Sciences (EAPS), director
of the Center for Sustainability Science and Strategy, and senior author of the
study.
“While it might be attractive to use the latest, big-picture machine-learning
model on a climate problem, this study shows that stepping back and really
thinking about the problem fundamentals is important and useful.”
Why less is often more with AI
The findings support a growing realization that while the increased
sophistication of AI models might be sexy, it can be overkill for many tasks.
And as addressing the technology's vast and increasing demand for resources
becomes imperative for many companies, studies like this one highlight the need
for smaller, streamlined models better suited to specific tasks.
Comparing climate emulators
Because the climate is so
complex,
running a state-of-the-art climate model to predict how pollution levels will
impact environmental factors such as temperature can take weeks — even on the
world’s most powerful supercomputers.
So instead, scientists often create climate emulators: simpler approximations of
a state-of-the-art climate model that are faster and more accessible.
Policymakers can use climate emulators to see how alternative assumptions on
greenhouse gas emissions would affect future temperatures, helping inform
regulations and climate adaptation and resilience strategies.
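As a rough illustration of the idea (with purely hypothetical numbers and
function names, not the study's emulator), an emulator can be as simple as a
cheap function that maps an emissions scenario to a warming estimate, so that
many scenarios can be screened in a fraction of a second:

```python
# A minimal, hypothetical sketch of the emulator idea: map cumulative CO2
# emissions to global-mean warming using the roughly linear TCRE relationship.
# The coefficient and the scenarios below are illustrative, not from the study.
def toy_emulator(cumulative_co2_gtc: float, tcre_per_gtc: float = 0.0017) -> float:
    """Global-mean warming (deg C) for a given amount of cumulative CO2 emissions (GtC)."""
    return tcre_per_gtc * cumulative_co2_gtc

scenarios = {"low": 500.0, "medium": 1000.0, "high": 2000.0}  # GtC, illustrative
for name, emissions in scenarios.items():
    print(f"{name}: ~{toy_emulator(emissions):.1f} deg C of global-mean warming")
```

A full climate model would need weeks of supercomputer time to work through each
of those scenarios in detail; the emulator trades that detail for speed.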
But an emulator isn’t helpful if it makes inaccurate predictions about the local
impacts of climate change. While deep learning has become increasingly popular
for emulation, few studies have explored whether these models perform better
than tried-and-true approaches.
The MIT researchers did just that, comparing a traditional technique called
linear pattern scaling (LPS) with a deep-learning model, using a common
benchmark dataset for evaluating climate emulators. Their results showed that
LPS outperformed deep-learning models in predicting nearly all of the parameters
they tested, including temperature and precipitation.
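For readers curious what linear pattern scaling looks like in practice, here is
a minimal sketch on synthetic data (not the benchmark dataset the study used):
each grid cell's change is fit as a linear function of global-mean warming, and
the fitted map then serves as the emulator.

```python
# A toy sketch of linear pattern scaling (LPS) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "climate model output": yearly global-mean warming plus a
# lat x lon field of local warming with some internal variability added.
n_years, n_lat, n_lon = 100, 12, 24
global_t = np.linspace(0.0, 3.0, n_years)                 # deg C above baseline
pattern = rng.uniform(0.5, 2.0, size=(n_lat, n_lon))      # local amplification factors
local_t = (pattern[None] * global_t[:, None, None]
           + 0.3 * rng.standard_normal((n_years, n_lat, n_lon)))

# Fit one slope and one intercept per grid cell by least squares.
X = np.column_stack([global_t, np.ones(n_years)])         # (n_years, 2)
coef, *_ = np.linalg.lstsq(X, local_t.reshape(n_years, -1), rcond=None)
slope = coef[0].reshape(n_lat, n_lon)
intercept = coef[1].reshape(n_lat, n_lon)

def lps_predict(global_warming: float) -> np.ndarray:
    """Emulate the map of local warming for a given global-mean warming level."""
    return slope * global_warming + intercept

print(lps_predict(2.0).shape)  # a (12, 24) warming map for 2 deg C of global warming
```

A deep-learning emulator would, roughly speaking, replace that per-cell linear
fit with a neural network trained on the same climate-model output; the
benchmarking question is which approach predicts new scenarios more accurately.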
“Large AI methods are very appealing to scientists, but they rarely solve a
completely new problem,” pointed
out
Björn Lütjens, a former EAPS
postdoc who is now a research scientist at IBM
Research and lead author of the study. “So,
implementing an existing solution first is necessary to find out whether the
complex machine-learning approach actually improves upon it.”
In theory, deep-learning models should be more accurate when making predictions
about precipitation, since those data don’t follow a linear pattern. But the MIT
team found that the large amount of natural variability in climate-model runs
can lead the more sophisticated AI models astray when predicting unpredictable,
long-term oscillations such as El Niño/La Niña. This skewed the benchmarking
scores in favor of LPS, which averages out those oscillations.
From there, the researchers developed a new evaluation that uses more data to
account for natural climate variability. With this new evaluation, the
deep-learning model performed slightly better than LPS for local precipitation,
but LPS was still more accurate for temperature predictions.
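The intuition behind that fix can be shown with a toy calculation (synthetic
numbers, not the study's benchmark): scoring an emulator against a single
climate-model run mixes unpredictable internal variability into its error, while
scoring against the average of many runs isolates the forced signal the emulator
is actually supposed to capture.

```python
# A toy illustration of how internal variability can distort benchmark scores,
# and how averaging over ensemble members (i.e., using more data) mitigates it.
import numpy as np

rng = np.random.default_rng(1)
n_members, n_years = 50, 100
forced_signal = np.linspace(0.0, 3.0, n_years)     # the predictable warming trend
ensemble = forced_signal + 0.4 * rng.standard_normal((n_members, n_years))

emulator_prediction = forced_signal.copy()         # an emulator of the forced response

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Scored against one realization, the error is dominated by unpredictable noise.
print(f"RMSE vs a single run:      {rmse(emulator_prediction, ensemble[0]):.2f}")

# Scored against the ensemble mean, the error reflects the forced response only.
print(f"RMSE vs the ensemble mean: {rmse(emulator_prediction, ensemble.mean(axis=0)):.2f}")
```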
“It is important to use the modeling tool that is right for the problem; but in
order to do that, you also have to set up the problem the right way in the first
place,” Selin said.
Based on these results, the researchers incorporated LPS into a
climate-emulation platform to predict local temperature changes under different
emissions scenarios.
But as senior author Raffaele
Ferrari, the Cecil and
Ida Green Professor of Oceanography in EAPS and co-director of MIT’s Lorenz
Center,
stressed:
“We are not advocating that LPS should always be the goal. It still has
limitations. For instance, LPS doesn’t predict variability or extreme weather
events.”
Rather, they hope their results highlight the need for better
benchmarking techniques to provide a fuller picture of which climate-emulation
technique is best suited for each situation.
Ultimately, more accurate benchmarking techniques will help ensure policymakers
are making decisions based on the best available information. The researchers
say they hope others build on their analysis — improved climate-emulation methods
and benchmarks could use more complex machine-learning methods to explore
hard-to-address problems such as the impacts of
aerosols,
drought and wildfire risks, or new variables such as regional wind speeds.
Selin and Ferrari are also co-principal investigators of MIT's Bringing
Computation to the Climate
Challenge
project, which precipitated this research.
Sustainable Brands Staff
Published Aug 27, 2025 8am EDT / 5am PDT / 1pm BST / 2pm CEST