Summary

  • Chinese e-commerce giant Alibaba’s Qwen Team has released QwQ-32B, an open-source reasoning model enhanced with reinforcement learning (RL).
  • Designed to compete with OpenAI’s o1 and DeepSeek’s R1, the model is available to enterprise users under the Apache 2.0 licence, to individual users via Qwen Chat, and for download on Hugging Face and ModelScope.
  • QwQ-32B incorporates agentic capabilities that let it dynamically adjust its reasoning based on environmental feedback. Its architecture features 64 transformer layers, Grouped Query Attention (GQA) and an extended context length.
  • It has been benchmarked against DeepSeek-R1, o1-mini and DeepSeek-R1-Distill-Qwen-32B with competitive results, and AI influencers on X (formerly Twitter) have praised its speed and power.
  • The Qwen Team plans to explore scaling RL further to improve model intelligence, to integrate agents with RL for long-horizon reasoning and to develop foundation models optimised for RL.

By Carl Franzen
