Training OpenAI’s giant GPT-3 text-generating model is akin to driving a car to the Moon and back, computer scientists reckon.
More specifically, they estimated that teaching the neural super-network in a Microsoft data center using Nvidia GPUs required roughly 190,000 kWh. Using the average carbon intensity of America, that would have produced 85,000 kg of CO2 equivalents, the same amount produced by a new car in Europe driving 700,000 km, or 435,000 miles. That is about twice the distance between Earth and the Moon, a round trip of some 480,000 miles. Phew.
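For those who like to check the sums, the comparison can be unpacked with a few lines of Python. This is a back-of-the-envelope sketch, not the researchers' method: the energy, emissions, and distance figures are the ones quoted above, while the kilometres-per-mile factor, the average Earth-Moon distance, and all function names are our own assumptions for illustration.

```python
# Figures quoted in the article.
TRAINING_ENERGY_KWH = 190_000
TRAINING_CO2E_KG = 85_000
CAR_DISTANCE_KM = 700_000

# Assumptions, not from the article: the standard km-per-mile factor and
# the average Earth-Moon distance.
KM_PER_MILE = 1.609344
EARTH_MOON_KM = 384_400


def implied_carbon_intensity(co2e_kg: float, energy_kwh: float) -> float:
    """Carbon intensity (kg CO2e per kWh) implied by an emissions/energy pair."""
    return co2e_kg / energy_kwh


def implied_car_emissions(co2e_kg: float, distance_km: float) -> float:
    """Per-kilometre car emissions (g CO2e per km) implied by the comparison."""
    return co2e_kg * 1000 / distance_km


def km_to_miles(km: float) -> float:
    """Convert kilometres to miles."""
    return km / KM_PER_MILE


def moon_distances(km: float) -> float:
    """Express a distance as a multiple of the one-way Earth-Moon distance."""
    return km / EARTH_MOON_KM


if __name__ == "__main__":
    intensity = implied_carbon_intensity(TRAINING_CO2E_KG, TRAINING_ENERGY_KWH)
    per_km = implied_car_emissions(TRAINING_CO2E_KG, CAR_DISTANCE_KM)
    print(f"Implied grid intensity: {intensity:.3f} kg CO2e/kWh")
    print(f"Implied car emissions:  {per_km:.0f} g CO2e/km")
    print(f"Distance in miles:      {km_to_miles(CAR_DISTANCE_KM):,.0f}")
    print(f"Earth-Moon distances:   {moon_distances(CAR_DISTANCE_KM):.2f}")
```

Under those assumptions the quoted numbers imply a grid intensity of roughly 0.45 kg of CO2 equivalents per kWh and a car emitting roughly 121 g per km, and the 700,000 km works out to a little under two one-way Earth-Moon trips.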
This assumes the data center used to train GPT-3 was fully reliant on fossil fuels, which may not be true. The point, from what we can tell, is not that GPT-3 and its Azure cloud in particular have this exact scale of carbon footprint; it's to draw attention to the large amount of energy required to train state-of-the-art neural networks.
The eggheads who produced this guesstimate are based at the University of Copenhagen in Denmark, and are also behind an open-source tool called Carbontracker, which aims to predict the carbon footprint of AI algorithms. Lasse Wolff Anthony, one of Carbontracker’s creators and co-author of a study on AI power usage, believes this drain on resources is something the community should start thinking about now, as the energy costs of AI have risen 300,000-fold between 2012 and 2018, it is claimed.
Neural networks, and the amount of hardware needed to train them using huge data sets, are growing in size. Take GPT-3 as an example: it has 175 billion parameters, more than 100 times as many as its predecessor, GPT-2.
Bigger may be better when it comes to performance, yet at what cost to the planet? Carbontracker reckons training GPT-3 just once requires the same amount of energy as 126 homes in Denmark use in a year, or as driving to the Moon and back.
"Developments in this field are going insanely fast and deep learning models are constantly becoming larger in scale and more advanced," said Anthony. "Right now, there is exponential growth. And that means an increasing energy consumption that most people seem not to think about."
Carbontracker allows developers to predict the total amount of energy required to train a particular model and its corresponding carbon footprint. Users have to provide certain details, such as the type of hardware used in the training, and the amount of compute time.
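In rough terms, an estimate of this kind multiplies the energy the hardware draws by the carbon intensity of the electricity feeding it, interval by interval. The Python below is a minimal sketch of that bookkeeping, not Carbontracker's actual code or API: the `Sample` class, the `estimate_footprint` function, and the fallback intensity value are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million watt-seconds


@dataclass
class Sample:
    """One measurement interval during training (hypothetical structure)."""
    seconds: float                        # length of the interval
    power_watts: float                    # average hardware power draw
    intensity_g_per_kwh: Optional[float]  # grid carbon intensity, None if unknown


def estimate_footprint(
    samples: Iterable[Sample],
    fallback_intensity_g_per_kwh: float,
) -> Tuple[float, float]:
    """Return (energy in kWh, emissions in g CO2e) summed over all samples.

    When a sample has no carbon intensity reading, the fallback value is
    used for that interval instead.
    """
    total_kwh = 0.0
    total_g = 0.0
    for sample in samples:
        kwh = sample.power_watts * sample.seconds / JOULES_PER_KWH
        intensity = sample.intensity_g_per_kwh
        if intensity is None:
            intensity = fallback_intensity_g_per_kwh
        total_kwh += kwh
        total_g += kwh * intensity
    return total_kwh, total_g


if __name__ == "__main__":
    # Illustrative readings only; none of these numbers are measurements.
    readings = [
        Sample(seconds=3600, power_watts=1000, intensity_g_per_kwh=200.0),
        Sample(seconds=1800, power_watts=2000, intensity_g_per_kwh=None),
    ]
    energy_kwh, co2e_g = estimate_footprint(readings, fallback_intensity_g_per_kwh=300.0)
    print(f"{energy_kwh:.2f} kWh, {co2e_g:.0f} g CO2e")
```

Treating each interval separately is the point of the design: when the grid gets cleaner or dirtier partway through a long training run, the total reflects that rather than a single average.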
"The CO2 estimates are calculated from the local average, or predicted, carbon intensity of electricity production during the model's training combined with the power consumption of the hardware on which the model is run," Anthony told El Reg.
"We rely on several APIs to retrieve the live carbon intensity, and default to an European average when no such API is available for the region in which the model is trained, since no global data is freely available. One such API is for the UK. These APIs and hardware energy consumption are then queried periodically during training to get an accurate estimate of the total carbon footprint."
"As datasets grow larger by the day, the problems that algorithms need to solve become more and more complex," Benjamin Kanding, co-author of the study, added. "Within a few years, there will probably be several models that are many times larger than GPT-3.
"Should the trend continue, artificial intelligence could end up being a significant contributor to climate change. Jamming the brakes on technological development is not the point. These developments offer fantastic opportunities for helping our climate. Instead, it is about becoming aware of the problem and thinking: How might we improve?"
They recommend developers employ more efficient techniques when it comes to data processing or search, and train their models on specialized hardware, such as AI accelerators, that is more efficient per watt than general-purpose chips. Another option is to train models in cloud regions that are more likely to be powered by renewable sources of energy.
"It is possible to reduce the climate impact significantly," Anthony concluded. "For example, it is relevant if one opts to train their model in Estonia or Sweden, where the carbon footprint of a model training can be reduced by more than 60 times thanks to greener energy supplies.
"Algorithms also vary greatly in their energy efficiency. Some require less compute, and thereby less energy, to achieve similar results. If one can tune these types of parameters, things can change considerably." ®