Researchers at Google DeepMind have unveiled a new method to speed up AI training, dramatically reducing the computing resources and time it takes to complete the task. The new approach to this typically energy-intensive process could make AI development both faster and cheaper, according to a recent study, and that could be good news for the environment.
“Our approach, multimodal contrastive learning with joint example selection (JEST), outperforms state-of-the-art models with up to 13 times fewer iterations and 10 times less computation,” the study says.
The AI industry is notorious for its high energy consumption. Large-scale AI systems like ChatGPT require significant processing power, which in turn demands a lot of electricity, along with water to cool the hardware. Microsoft’s water consumption, for example, is estimated to have risen 34% between 2021 and 2022 due to the increased computing demands of AI, with ChatGPT reported to consume nearly half a liter of water for every 5 to 50 prompts.
The International Energy Agency (IEA) predicts that data center electricity consumption will double between 2022 and 2026, drawing comparisons between AI’s energy needs and the often-criticized energy profile of the cryptocurrency mining industry.
However, approaches like JEST could offer a solution. By optimizing data selection for AI training, Google said, JEST can significantly reduce the number of iterations and computing power required, which could lower overall energy consumption. The approach is part of efforts to improve the efficiency of AI technologies and mitigate their environmental impact.
If the technique proves effective at scale, AI trainers would need only a fraction of the power used to train their models. That means they could either build more powerful AI tools with the same resources they currently use, or consume fewer resources to develop new models.
How JEST works
JEST works by selecting complementary batches of data to maximize what the AI model can learn from them. Unlike traditional methods that score and select individual examples, the algorithm takes into account the composition of the batch as a whole.
For example, imagine that you are learning several languages. Instead of studying English, German, and Norwegian separately, perhaps in order of difficulty, you might find it more effective to study them together, so that knowledge of one supports learning the others.
Google took a similar approach and it proved successful.
“We demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently,” the researchers said in their paper.
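To get a feel for the difference, consider a toy example in which a batch’s value depends on how its examples complement one another, not just on each example’s individual score. The scoring function and greedy selection below are illustrative assumptions for the sake of the sketch, not the actual JEST algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool of 100 candidate examples, each a 16-dimensional feature vector.
pool = rng.normal(size=(100, 16))

def individual_score(x):
    # Stand-in for a per-example quality score.
    return float(np.linalg.norm(x))

def joint_score(batch):
    # Toy composition-aware score: total individual quality minus a
    # redundancy penalty for examples that are highly correlated.
    quality = sum(individual_score(x) for x in batch)
    if len(batch) < 2:
        return quality
    sims = np.corrcoef(batch)                            # pairwise correlations
    redundancy = (np.abs(sims).sum() - len(batch)) / 2   # off-diagonal similarity
    return quality - redundancy

k = 8

# Independent selection: take the k examples with the best individual scores.
indep = np.argsort([individual_score(x) for x in pool])[-k:]

# Joint selection (greedy): grow the batch one example at a time, adding
# whichever candidate most improves the score of the batch as a whole.
chosen, remaining = [], list(range(len(pool)))
for _ in range(k):
    best = max(remaining, key=lambda i: joint_score(pool[chosen + [i]]))
    chosen.append(best)
    remaining.remove(best)

print("joint score of independent pick:", round(joint_score(pool[indep]), 2))
print("joint score of joint pick:      ", round(joint_score(pool[chosen]), 2))
```

The joint pick scores higher here because it avoids piling up redundant examples, which is the intuition behind scoring batches rather than isolated data points.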
To do this, Google researchers used “multimodal contrastive learning,” where the JEST process identified dependencies between data points. This method improves the speed and efficiency of AI training while requiring much less computing power.
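In contrastive learning of this kind, each image–text pair in a batch is trained to match its own caption and to mismatch every other caption in the same batch, so an example’s loss inherently depends on what else sits alongside it. A minimal CLIP-style sketch of that batch-level dependency, with arbitrary embedding sizes and temperature and no connection to DeepMind’s actual code:

```python
import numpy as np

def contrastive_batch_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    The loss for pair i depends on its similarity to every other example
    in the batch, which is the kind of dependency JEST exploits when it
    scores whole batches rather than isolated examples.
    """
    # Normalize embeddings and compute the batch similarity matrix.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # shape (batch, batch)

    # Cross-entropy with the matching pair on the diagonal as the target.
    def softmax_ce(l):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_probs)

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (softmax_ce(logits) + softmax_ce(logits.T))

rng = np.random.default_rng(0)
losses = contrastive_batch_loss(rng.normal(size=(4, 32)), rng.normal(size=(4, 32)))
print(losses)   # one loss value per image-text pair in the batch
```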
According to Google, the key to this approach was to start with pre-trained reference models to guide the data selection process. This technique allowed the model to focus on high-quality, well-curated data, further optimizing training efficiency.
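The reference model’s role can be captured with a "learnability" score: data the learner still finds hard but the curated reference model handles easily is the most valuable to train on next. A simplified sketch of that idea, using random stand-in loss values and an arbitrary filtering ratio rather than the paper’s actual setup:

```python
import numpy as np

def learnability_scores(learner_losses, reference_losses):
    """Per-example learnability: learner loss minus reference loss.

    High scores mark data the learner has not mastered yet (high learner
    loss) but that the reference model trained on curated data finds easy
    (low reference loss) -- learnable, high-quality examples worth keeping.
    """
    return learner_losses - reference_losses

rng = np.random.default_rng(0)
learner_losses = rng.uniform(0.0, 5.0, size=1000)     # stand-in per-example losses
reference_losses = rng.uniform(0.0, 5.0, size=1000)

scores = learnability_scores(learner_losses, reference_losses)

# Keep only the most learnable 10% of the candidate super-batch,
# analogous to the aggressive filtering JEST applies at each step.
keep = np.argsort(scores)[-len(scores) // 10:]
print(f"kept {len(keep)} of {len(scores)} examples")
```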
“The quality of a batch is also a function of its composition, in addition to the summed quality of its data points considered independently,” the paper explains.
Experiments conducted in the study showed significant performance gains on several benchmarks. For example, training on the common WebLI dataset using JEST showed remarkable improvements in training speed and resource efficiency.
The researchers also found that the algorithm quickly detected highly learnable subsets, speeding up the training process by focusing on specific pieces of data that “fit” together. This technique, called “data quality bootstrapping,” prioritizes quality over quantity and has proven to be more effective for training AI.
“A baseline model trained on a small, curated dataset can effectively guide the curation of a much larger dataset, enabling training of a model that significantly outperforms the baseline model on many downstream tasks,” the paper says.
Edited by Ryan Ozawa.