
In brief: Google released Multi-Token Prediction (MTP) drafters for Gemma 4, delivering up to a 3x speedup at inference without any degradation in output quality. The technique, called speculative decoding, uses a lightweight "drafter" model to predict several tokens at once, which the main model then verifies in parallel, bypassing the one-token-at-a-time bottleneck.
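To make the draft-then-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is an illustration under stated assumptions, not Google's implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the main model and the drafter, and the verification step is written as a loop where a real system would score all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function mapping a token context to the next token.
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model: a deterministic next-token rule.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the drafter: agrees with the target most of the
    # time, but is deliberately wrong whenever the context length is a multiple of 4.
    if len(ctx) % 4 == 0:
        return (target_model(ctx) + 1) % 50
    return target_model(ctx)


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1) The drafter proposes k tokens, one after another (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target scores every drafted position. Counted as one pass,
        #    on the assumption that a real system batches these k + 1 calls.
        target_passes += 1
        verified = [target(out + proposal[:i]) for i in range(k + 1)]

        # 3) Accept the longest prefix on which drafter and target agree.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1

        # 4) Keep the accepted tokens plus one token from the target itself:
        #    a correction at the first mismatch, or a bonus token if all matched.
        out.extend(proposal[:n_ok])
        out.append(verified[n_ok])
    return out[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 20)
    fast, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print("identical output:", fast == baseline)
    print("target passes:", passes, "vs 20 for plain decoding")
```

Because every kept token is either confirmed or supplied by the target, the output matches plain greedy decoding exactly, which is the sense in which the speedup comes without quality loss. The gain depends on how often the drafter agrees with the main model. Sampling-based variants use a probabilistic acceptance rule that this sketch does not cover.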

The MTP drafters are available on Hugging Face, Kaggle, and Ollama under the same Apache 2.0 license as Gemma 4, and work with tools like vLLM, MLX, and SGLang.


Google Found a Way to Make Local AI Up to 3x Faster—No New Hardware Required
