30 seconds vs. 3: The d1 reasoning framework that’s slashing AI response times
Summary
Researchers from UCLA and Meta AI have created a new system called d1 that uses reinforcement learning to improve the reasoning abilities of diffusion-based language models, in what the scientists claim is a “plug-and-play alternative” to autoregressive language models.
The d1 framework improves a model’s reasoning skills through a two-stage process: supervised fine-tuning followed by reinforcement learning with a novel algorithm called diffu-GRPO, which estimates token probabilities and is more computationally efficient than previous methods.
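To give a flavor of the reinforcement-learning stage: GRPO-style algorithms score each sampled completion against the other completions drawn for the same prompt, rather than against a separate value model. The sketch below shows only that generic group-relative normalization step; it is an illustrative assumption, not code from d1, and the function name `grpo_advantages` is hypothetical.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages in the style of GRPO:
    each completion's reward is normalized against the mean and
    standard deviation of the group sampled for the same prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Example: two completions for one prompt, rewarded 0.0 and 2.0
print(grpo_advantages([0.0, 2.0]))  # → [-1.0, 1.0]
```

In diffusion models, the per-token probabilities that such an update relies on are not directly available, which is why diffu-GRPO's efficient probability estimation matters.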
When tested on mathematical and logical reasoning benchmarks, a model enhanced by the d1 framework outperformed comparable models.
The researchers believe that diffusion-based models could be a more efficient alternative to autoregressive models, particularly for enterprise applications currently limited by latency or cost.