The use of generative AI has expanded rapidly in recent years — with Large
Language Models (LLMs) by companies including OpenAI, Meta and Google
becoming household names. OpenAI’s ChatGPT service alone receives around one
billion queries each day.
As each generation of LLMs has become more sophisticated than the last, the
technology's skyrocketing popularity has created vast and growing demand for
resources such as electricity and water, which are needed to run AI data centers. In a recent
Capgemini Research Institute report,
almost half of executives admitted that their use of generative AI had jeopardized
their sustainability objectives by fueling their company's greenhouse gas (GHG) emissions.
According to research from University College London (UCL) published in a
new UNESCO report, existing solutions could significantly reduce AI’s energy
and resource demand if adopted more widely.
For Smarter, Smaller, Stronger: Resource-Efficient AI and the Future of
Digital Transformation,
researchers from UCL Computer Science conducted a series of experiments on
Meta's LLaMA 3.1 8B model to assess how changes to the way AI models are
configured and used affect their energy consumption and performance. This model
was chosen because it is open source and fully modifiable, enabling the
researchers to test the unoptimized version against a range of optimization
techniques (something that is not possible with closed models such as GPT-4).
They found that rounding down the numbers used in the models' internal
calculations, shortening user instructions and AI responses, and using smaller
AI models specialized for certain tasks could together reduce energy use by
90 percent compared with using a large, all-purpose AI model.
“Our research shows that there are relatively simple steps we can take to
drastically reduce the energy and resource demands of generative AI, without
sacrificing accuracy and without inventing entirely new solutions,” said Professor Ivana
Drobnjak, an author of
the report and a member of the UNESCO Chair in AI at UCL. “Though some AI
platforms are already exploring and implementing solutions such as the ones we
propose, there are many others besides the three that we looked at. Wholesale
adoption of energy-saving measures as standard would have the greatest impact.”
Rounding down to save energy
In the first experiment, the researchers assessed LLaMA 3.1 8B’s accuracy when
performing common tasks (summarizing texts, translating languages and answering
general knowledge questions), alongside its energy usage, under different
conditions.
In a process called tokenization, LLMs convert the words from the user’s
prompt into numbers (tokens) — which are used to perform the calculations
involved in the task — before converting the numbers back into words to provide
a response.
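For illustration, here is a minimal sketch of that round trip using the Hugging Face transformers library (a tooling choice made here for illustration; the report does not prescribe one, and the model name assumes access to Meta's gated checkpoint):

```python
# A minimal sketch of tokenization with the Hugging Face "transformers"
# library. The model name assumes access to Meta's gated Llama 3.1 8B
# checkpoint; any tokenizer illustrates the same round trip.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

prompt = "Summarize this article in one sentence."
token_ids = tokenizer.encode(prompt)    # words -> numbers (token IDs)
print(token_ids)                        # a short list of integers
print(tokenizer.decode(token_ids))      # numbers -> words again
```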
Applying a method called quantization (representing the numbers used in
calculations with fewer decimal places) cut the model's energy usage by up to
44 percent while maintaining at least 97 percent of baseline accuracy[1]. This
is because the calculations become easier, in much the same way that most
people can work out two plus two far more quickly than 2.34 plus 2.17.
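As a sketch of what this looks like in practice, the snippet below loads the same model with 4-bit quantized weights via the bitsandbytes integration in Hugging Face transformers, corresponding to one of the three schemes the researchers tested (see footnote 1); the report's exact configurations may differ:

```python
# One way to load the same model with 4-bit quantized weights, via the
# bitsandbytes integration in Hugging Face transformers. This is a sketch;
# the report's exact quantization settings may differ.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits, not 16
    bnb_4bit_compute_dtype=torch.bfloat16,  # run the arithmetic in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=quant_config,
    device_map="auto",  # place layers on available GPU(s)
)
```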
The team also compared LLaMA 3.1 8B to smaller AI models built to specialize in
each of the three tasks. The small models used 15 times less energy for
summarization, 35 times less for translation and 50 times less for
question answering.
Their accuracy was comparable to the larger model's; in fact, the small models
performed 4 percent better for summarization, 2 percent better for translation
and 3 percent better for question answering.
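The pattern is straightforward to reproduce: swap the general-purpose LLM for a small model trained for one task. The model below (Helsinki-NLP's compact Marian English-to-French translator) is an illustrative stand-in, not necessarily one of the specialized models tested in the report:

```python
# Swapping a general-purpose LLM for a small, task-specific model.
# Helsinki-NLP's compact Marian translator is an illustrative stand-in,
# not necessarily one of the specialized models tested in the report.
from transformers import pipeline

translator = pipeline("translation_en_to_fr",
                      model="Helsinki-NLP/opus-mt-en-fr")

result = translator("Energy-efficient AI is within reach.")
print(result[0]["translation_text"])
```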
Shortening questions and responses
In the second experiment, the researchers assessed the impact on energy usage of
changing the length of the user’s prompt (instructions) and the model’s response
(answer).
They calculated energy consumption for 1,000 scenarios, varying the length of
the user prompt and the model’s response from approximately 400 English words
down to 100 English words[2].
The longest combination (400-word prompt and 400-word response) used 1.03
kilowatt hours (kWh) of electricity, enough to power a 100-watt lightbulb for 10
hours or a fridge-freezer for 26 hours.
Halving the user prompt length to 200 words reduced the energy expenditure
by 5 percent, while halving the model response length to 200 words reduced
energy consumption by 54 percent.
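The asymmetry between prompt and response savings has a simple explanation: a prompt is ingested in one parallel pass, while a response is generated one token at a time. The toy cost model below, with illustrative constants that are not taken from the report, reproduces the shape of that result:

```python
# Toy cost model of prompt length versus response length. The constants are
# illustrative, not the report's measurements: ingesting a prompt token is
# assumed to cost a tenth as much as generating a response token, because
# prompts are processed in parallel while responses are produced serially.

def relative_cost(prompt_tokens: int, response_tokens: int) -> float:
    PROMPT_COST = 0.1     # assumed cost per prompt token
    RESPONSE_COST = 1.0   # assumed cost per generated token
    return prompt_tokens * PROMPT_COST + response_tokens * RESPONSE_COST

baseline = relative_cost(512, 512)        # ~400 words each (see footnote 2)
halved_prompt = relative_cost(256, 512)
halved_reply = relative_cost(512, 256)

print(f"halve prompt:   {1 - halved_prompt / baseline:.0%} saved")  # ~5%
print(f"halve response: {1 - halved_reply / baseline:.0%} saved")   # ~45%
```

The measured 54 percent saving for halving responses is even larger than this toy model predicts, because in transformer models the cost of generating each new token also grows with the length of the text produced so far.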
Assessing real-world impact
To assess the global impact of the optimizations tested, the authors asked LLaMA
3.1 8B to answer a specific question[3]. They then calculated the energy
required for it to do so and multiplied this by the estimated daily number of
requests of this kind from ChatGPT users[4].
They estimated that using quantization, combined with cutting down user prompt
and AI response length from 300 to 150 words, could reduce energy consumption by
75 percent.
In a single day, this saving would be equivalent to the amount of electricity
needed to power 30,000 average UK households (assuming 7.4 kilowatt hours
per house per day). Importantly, this saving would be achieved without the model
losing the ability to address more complex general tasks.
For repetitive tasks such as translation and summarization, the biggest savings
were achieved by using small, specialized models and a reduced prompt/response
length, which reduced energy usage by over 90 percent (enough to power 34,000 UK
households for a day).
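These headline figures follow from simple arithmetic on numbers quoted in the text and footnotes; the per-request breakdown below is a rough derivation for illustration, not a figure from the report:

```python
# Back-of-envelope check of the daily-savings figure quoted above, using
# only numbers from the text and footnotes. The per-request breakdown is a
# rough derivation for illustration, not a figure from the report.
REQUESTS_PER_DAY = 350e6      # footnote 4: 35% of ~1 billion ChatGPT queries
HOUSEHOLD_KWH_PER_DAY = 7.4   # the article's assumed UK household usage
HOUSEHOLDS_POWERED = 30_000   # quoted saving from quantization + shorter text

saved_kwh = HOUSEHOLDS_POWERED * HOUSEHOLD_KWH_PER_DAY
print(f"daily saving: {saved_kwh:,.0f} kWh")                         # 222,000 kWh
print(f"per request: {saved_kwh / REQUESTS_PER_DAY * 1000:.2f} Wh")  # ~0.63 Wh
```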
As Hristijan Bosilkovski,
an author of the report and a UCL MSc graduate in Data Science and Machine
Learning, explained: “There will be times when it makes sense to use a large,
all-purpose AI model — such as for complex tasks or research and development.
But the biggest gains in energy efficiency can be achieved by switching from
large models to smaller, specialized models in certain tasks such as translation
or knowledge retrieval. It’s a bit like using a hammer to drive a nail, rather
than a sledgehammer.”
A smarter future for AI
The authors of the report say that as competition in generative AI increases,
it will become more important for companies to streamline their models, as
well as to use smaller models better suited to particular tasks.
“Generative AI’s annual energy footprint is already equivalent to that of a
low-income country, and it is growing exponentially,” said Tawfik
Jelassi, Assistant
Director-General for Communication and Information at UNESCO. “To make AI more
sustainable, we need a paradigm shift in how we use it, and we must educate
consumers about what they can do to reduce their environmental impact.”
Professor Drobnjak added: “When we talk about the future of resource-efficient
AI, I often use two metaphors. One is a collection of brains — lots of
separate specialist models that pass messages back and forth — which can save
energy but feel fragmented. The other metaphor, and the future that I’m most
excited about, looks more like a single brain with distinct regions — which is
tightly connected, sharing one memory, yet able to switch on only the circuits
it needs. It’s like bringing the efficiency of a finely tuned cortex to
generative AI: smarter, leaner and far less resource hungry.”
1 Three quantization methods were tested, reducing energy consumption by 22
percent (BNBQ), 35 percent (GPTQ) and 44 percent (AWQ). Please see the report
for technical details.
2 In English, 100 words is approximately 128 tokens, but the number varies by
language.
3 The question was: “Explain the concept of reinforcement learning, emphasizing
its core principles, components (like agents, environments, and rewards), and
typical applications. Keep the explanation accessible to someone with basic
knowledge of artificial intelligence.”
4 The team used global usage statistics for ChatGPT, which receives around one
billion requests per day. Assuming that 35 percent of these were concept
explanations, the total number of requests of this type was estimated at 350
million.
Sustainable Brands Staff
Published Jul 15, 2025 8am EDT / 5am PDT / 1pm BST / 2pm CEST