Google DeepMind, Google’s artificial intelligence research lab, has released new research on training AI models that claims to accelerate training speed and energy efficiency by an order of magnitude, reporting 13x better performance and 10x better power efficiency than other methods. The new JEST training method arrives as discussions about the environmental impact of AI data centers are intensifying.
DeepMind’s method, called JEST (Joint Example Selection), departs from traditional AI model training techniques in a simple way: where conventional methods select and learn from individual data points, JEST trains on entire batches. JEST first trains a smaller AI model to grade data quality from extremely high-quality sources, ranking batches by quality. It then compares that grading against a larger, lower-quality dataset. The small JEST model identifies the batches most fit for training, and the large model is then trained on the small model’s selections.
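To make the idea concrete, below is a minimal, hypothetical Python sketch of that selection loop. Every name in it is illustrative rather than taken from DeepMind’s code, and it simplifies the published method, which scores examples jointly across a batch rather than grading each batch independently as done here.

```python
import torch
import torch.nn.functional as F

# All names are hypothetical; this is a simplified illustration of
# reference-model batch scoring, not DeepMind's actual implementation.

@torch.no_grad()
def score_batch(reference_model, inputs, targets):
    """Grade a batch with the small reference model trained on curated,
    high-quality data: lower loss means the batch looks more like that
    high-quality distribution, so it earns a higher score."""
    logits = reference_model(inputs)
    return -F.cross_entropy(logits, targets).item()

def select_best_batches(reference_model, candidate_batches, keep_fraction=0.1):
    """Rank candidate batches from the larger, lower-quality pool by the
    reference model's score and keep only the top fraction."""
    ranked = sorted(candidate_batches,
                    key=lambda batch: score_batch(reference_model, *batch),
                    reverse=True)
    n_keep = max(1, int(keep_fraction * len(ranked)))
    return ranked[:n_keep]

def train_on_selected(large_model, optimizer, reference_model, candidate_batches):
    """Train the large model only on the batches the small model selected."""
    for inputs, targets in select_best_batches(reference_model, candidate_batches):
        optimizer.zero_grad()
        loss = F.cross_entropy(large_model(inputs), targets)
        loss.backward()
        optimizer.step()
```

Even in this stripped-down form, the key design point survives: a cheap, well-trained reference model does the grading, so the expensive large model only ever trains on data that already resembles the curated distribution.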
The paper itself provides a more in-depth explanation of the processes used in the study and the directions for future research.
DeepMind researchers make clear in their paper that this “ability to steer the data selection process towards the distribution of smaller, well-curated datasets” is essential to the success of the JEST method. Success is the right word for this research: DeepMind claims that “our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation.”
Of course, this system hinges entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated dataset of the highest possible quality. Nowhere is the mantra “garbage in, garbage out” truer than in this method, which attempts to “skip ahead” in its training process. That makes JEST much harder for hobbyists or amateur AI developers to match than most methods, since expert-grade research skills are likely required to curate the initial, highest-quality training data.
The JEST research arrives as the tech industry and governments around the world begin to grapple with artificial intelligence’s extreme power demands. AI workloads consumed an estimated 4.3 GW in 2023, nearly matching the annual power consumption of the nation of Cyprus. And things are certainly not slowing down: a single ChatGPT request costs about 10 times more electricity than a Google search, and Arm’s CEO estimates that AI could consume a quarter of the United States’ power grid by 2030.
It remains to be seen whether, and how, JEST methods will be adopted by major players in the AI industry. GPT-4o reportedly cost $100 million to train, and future, larger models could soon cross the $1 billion mark, so companies are likely hunting for ways to save money in this area. Optimists hope JEST methods will be used to maintain current training productivity at much lower power draw, easing the costs of AI and helping the planet. However, it seems far more likely that the machine of capital will keep its foot on the gas, using JEST methods to hold power draw at maximum in pursuit of hyper-fast training output. Cost savings or scale of output: which will win?