Applications of artificial intelligence (AI) are still far from replicating true human intelligence. But they are getting better at discerning trends in data and mining insights, in some cases better than we are. Today’s AI models can recognize images, chat with people, drive autonomous vehicles, and even beat us at chess. Did you know, however, that training and building these models consumes a staggering amount of energy? In other words, training AI is an energy-intensive process with a high carbon footprint.
A neural network is a powerful form of machine learning loosely modeled on the human brain. Composed of layers of nodes, a neural network tries to identify the underlying relationships in a data set by mimicking the way the brain functions. Each node is linked to others and has a weight and a threshold associated with it. If a node’s output value exceeds its threshold, the node is activated and relays data to the next layer of the network.
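The weight-and-threshold behaviour of a node can be sketched in a few lines of Python. This is only an illustration: the function names, the example weights, and the choice to relay 0.0 from an inactive node are assumptions made here, not details of any particular model.

```python
def node_output(inputs, weights):
    """Weighted sum of the values arriving at one node."""
    return sum(x * w for x, w in zip(inputs, weights))


def layer_forward(inputs, layer_weights, threshold):
    """Pass inputs through one layer of nodes.

    A node whose output exceeds the threshold is activated and relays
    its value to the next layer; otherwise it relays 0.0 (an assumption
    made for this sketch).
    """
    outputs = []
    for weights in layer_weights:
        value = node_output(inputs, weights)
        outputs.append(value if value > threshold else 0.0)
    return outputs


# Hypothetical layer with two nodes, each holding one weight per input.
example_inputs = [1.0, 2.0]
example_layer = [[0.5, 0.25], [-1.0, 0.25]]
activations = layer_forward(example_inputs, example_layer, threshold=0.5)
print(activations)  # first node fires (1.0), second stays silent (0.0)
```

Real networks usually replace the hard threshold with a smooth activation function, but the idea of weighted inputs deciding whether a node passes data on is the same.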
Training a neural network consists of a forward pass, in which the input is passed through the network and an output is generated, followed by a backward pass, in which the weights of the network are updated using the errors obtained in the forward pass. The updates are computed by gradient descent algorithms, which require a large amount of matrix manipulation.
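To make the forward and backward passes concrete, here is a minimal sketch of gradient descent on a single node with one weight and one bias. The toy data set, learning rate, and function names are assumptions chosen for illustration; real networks perform the same steps on large matrices, which is where the computational cost comes from.

```python
def forward(w, b, x):
    """Forward pass: compute the node's prediction for input x."""
    return w * x + b


def train(data, learning_rate=0.05, epochs=500):
    """Fit w and b by gradient descent on the squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in data:
            prediction = forward(w, b, x)   # forward pass
            error = prediction - target     # error from the forward pass
            # Backward pass: gradients of 0.5 * error**2 w.r.t. w and b.
            grad_w = error * x
            grad_b = error
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example).
toy_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
learned_w, learned_b = train(toy_data)
print(round(learned_w, 3), round(learned_b, 3))
```

Even this tiny example loops over the data hundreds of times; large language models repeat the same kind of update over billions of words and millions or billions of weights.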
In June 2019, a research team from the University of Massachusetts at Amherst published a paper in which they measured the energy needed to train four large neural networks: Transformer, ELMo, BERT, and GPT-2. They trained each model on a single GPU for one day and measured its overall energy consumption.
One of these neural networks, BERT (Bidirectional Encoder Representations from Transformers), was trained on 3.3 billion words from English books and Wikipedia articles. According to an article in The Conversation by Kate Saenko, BERT had to read this large data set about 40 times during training. For comparison, she notes that an average child learning to speak might hear 45 million words by the age of five, about 3,000 times fewer than BERT.
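The "about 3,000 times" comparison can be checked with quick arithmetic using only the figures quoted above. The variable names are made up for this sketch.

```python
# Figures quoted in the article.
bert_corpus_words = 3_300_000_000   # words in BERT's training data
bert_passes = 40                    # approximate passes over the data
child_words_by_five = 45_000_000    # words heard by a five-year-old

# Total words BERT processes during training.
bert_words_read = bert_corpus_words * bert_passes

# How many times more words BERT reads than the child hears.
ratio = bert_words_read / child_words_by_five

print(bert_words_read)  # 132000000000
print(round(ratio))     # 2933, i.e. roughly 3,000
```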
The University of Massachusetts at Amherst researchers found that training BERT once has roughly the carbon footprint of a passenger flying a round trip between New York and San Francisco. To arrive at such figures, the team multiplied the power draw they measured for each model by the total training time reported by the model’s original developers, which gave the total energy consumed in training each model. They then estimated the carbon footprint from the average carbon emissions of US power generation.
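The estimate described here boils down to two multiplications: power times time gives energy, and energy times the grid's emissions intensity gives a carbon footprint. The sketch below shows that calculation under stated assumptions. The 250 W power draw is the GPU figure quoted in this article, the 24 hours is one day of training, and the emissions intensity of 0.5 kg of CO2 per kWh is a placeholder chosen for easy arithmetic, not a figure from the study. The function names are hypothetical.

```python
def energy_kwh(power_watts, hours):
    """Energy used when drawing power_watts for the given hours."""
    return power_watts * hours / 1000.0


def carbon_kg(kwh, kg_co2_per_kwh):
    """Carbon footprint of an amount of electricity."""
    return kwh * kg_co2_per_kwh


# Illustrative inputs only; see the assumptions stated above.
gpu_power_watts = 250.0
training_hours = 24.0
placeholder_intensity = 0.5  # kg CO2 per kWh, NOT a measured value

energy = energy_kwh(gpu_power_watts, training_hours)
footprint = carbon_kg(energy, placeholder_intensity)

print(energy)     # 6.0 kWh for one GPU running for one day
print(footprint)  # 3.0 kg CO2 under the placeholder intensity
```

Scaling the hours and the number of GPUs up to the full training runs reported by model developers is what turns these small numbers into footprints comparable to air travel.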
Today’s developments in artificial intelligence have been made possible by powerful GPUs (graphics processing units). These GPUs generally consume a lot of electricity. According to NVIDIA, a single GPU can dissipate up to 250 W, which is 2.5 times as much as an Intel CPU.
Researchers, meanwhile, believe that larger AI models will lead to better accuracy and performance. This is similar to gaming laptops, which are more capable than standard laptops but heat up faster under heavy load. Today, one can rent online servers with hundreds of CPUs and powerful GPUs for a few minutes and quickly build powerful AI models.