DeepSeek’s success shows why motivation is key to AI innovation
1 min read
Summary
Chinese start-up DeepSeek has upset the applecart with its advanced large language model (LLM), which uses less energy and was trained on a smaller budget, relying on more efficient use of hardware and a mixture-of-experts (MoE) architecture to achieve results comparable to its competitors' models.
Whereas other companies commonly rely on large amounts of expensive GPU memory, DeepSeek optimises the KV cache by combining the key and value vectors of each word and compressing them into a smaller latent vector, which is stored and decompressed only when attention needs it.
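As a rough illustration of that compression idea, here is a minimal sketch; the dimensions, weight matrices, and function names are illustrative assumptions, not DeepSeek's actual design.

```python
# Minimal sketch of KV-cache compression: store a small latent per token,
# then decompress it into full keys and values when attention needs them.
# All sizes and weights below are illustrative assumptions.
import numpy as np

d_model, d_latent = 1024, 128          # the cached latent is much smaller than a full key/value pair

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.02   # compress hidden state -> latent
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.02   # decompress latent -> key
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.02   # decompress latent -> value

kv_cache = []                                              # holds only the compact latent vectors

def cache_token(hidden_state):
    """Compress a token's hidden state and store the small latent in the cache."""
    kv_cache.append(hidden_state @ W_down)                 # d_model -> d_latent

def read_cache():
    """Decompress all cached latents back into full keys and values."""
    latents = np.stack(kv_cache)                           # (seq_len, d_latent)
    return latents @ W_up_k, latents @ W_up_v              # keys, values: (seq_len, d_model)

for _ in range(16):                                        # cache 16 tokens
    cache_token(rng.standard_normal(d_model))

keys, values = read_cache()
print(keys.shape, values.shape)        # (16, 1024) (16, 1024), from a per-token cache 8x smaller
```

The saving comes from storing only the 128-dimensional latent per token instead of two full 1024-dimensional vectors, at the cost of a small decompression step at read time.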
MoE divides the network into many smaller expert networks and activates only the experts whose matching score for a given input is highest, cutting computation costs and reducing the amount of training data required.
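A minimal sketch of that routing step follows; the number of experts, layer sizes, and top-k value are assumptions for illustration, not DeepSeek's actual configuration.

```python
# Minimal sketch of mixture-of-experts routing: score every expert for a token,
# then run only the top-k best-matching experts and ignore the rest.
# The expert count, sizes, and top-k value are illustrative assumptions.
import numpy as np

d_model, n_experts, top_k = 512, 8, 2

rng = np.random.default_rng(0)
router = rng.standard_normal((d_model, n_experts)) * 0.02             # scores each expert per token
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]

def moe_forward(token):
    """Route a token to only the top-k experts with the highest matching scores."""
    scores = token @ router                                            # (n_experts,)
    chosen = np.argsort(scores)[-top_k:]                               # indices of best-matching experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()    # softmax over the chosen experts
    # Only the selected experts run, so most of the network stays idle for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

out = moe_forward(rng.standard_normal(d_model))
print(out.shape)   # (512,) -- computed with only 2 of 8 experts active
```

Because only two of the eight experts run for each token, the per-token compute is a fraction of what a dense network of the same total size would need.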
The company has no intention of dominating the LLM world and wishes to share its research with other players, a common industry practice that accelerates research and development.