Summary

  • Chinese start-up DeepSeek has upset the applecart with its advanced large language model (LLM), which uses less energy and was trained on a smaller budget, yet achieves the same results as its competitor models by pairing hardware-efficiency techniques with MoE (mixture of experts).
  • Whereas other companies rely on large amounts of expensive GPU memory, DeepSeek uses KV-cache optimisation, combining the key and value of each word and compressing them into a single smaller vector that is decompressed again when attention is computed (see the first sketch after this list).
  • MoE divides the network into smaller expert networks and, for each input, activates only the experts with the highest matching scores, saving computation costs and reducing the amount of training data required (see the second sketch after this list).
  • The company has no intention of dominating the LLM world and wishes to share its research with other players, a common practice in the industry that helps research and development.
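A minimal sketch of the KV-cache compression idea, written in PyTorch. The class name, layer sizes and the simple linear down/up projections here are illustrative assumptions, not DeepSeek's actual architecture; the point is that only a small latent vector per token needs to stay in GPU memory, with full-size keys and values rebuilt from it when attention is computed.

```python
import torch
import torch.nn as nn

class CompressedKVCache(nn.Module):
    """Illustrative sketch: cache one small latent vector per token instead of
    full keys and values, and reconstruct keys/values from it on demand."""

    def __init__(self, d_model=1024, d_latent=128):
        super().__init__()
        # Down-project the token representation to a small latent vector (this is cached).
        self.down = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the latent back to full-size key and value when attention needs them.
        self.up_k = nn.Linear(d_latent, d_model, bias=False)
        self.up_v = nn.Linear(d_latent, d_model, bias=False)

    def compress(self, hidden):           # hidden: (seq_len, d_model)
        return self.down(hidden)          # cached tensor: (seq_len, d_latent)

    def decompress(self, latent):         # called just before attention
        return self.up_k(latent), self.up_v(latent)


cache = CompressedKVCache()
hidden = torch.randn(16, 1024)            # 16 tokens of hypothetical activations
latent = cache.compress(hidden)           # only this small tensor is kept in GPU memory
k, v = cache.decompress(latent)           # full keys/values reconstructed when needed
print(latent.shape, k.shape, v.shape)     # (16, 128) (16, 1024) (16, 1024)
```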
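A similarly hedged sketch of top-k mixture-of-experts routing. The gating layer, expert count and top_k value are assumed for illustration only; it shows how a matching score decides which expert sub-networks run for each token, so most of the network stays idle and computation is saved.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Illustrative sketch: a gating layer scores every expert per token and
    only the top-scoring experts are actually evaluated."""

    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (n_tokens, d_model)
        scores = self.gate(x).softmax(dim=-1)   # matching score for every expert
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for token in range(x.shape[0]):
            for w, e in zip(weights[token], idx[token]):
                # Only the chosen experts are run for this token.
                out[token] += w * self.experts[int(e)](x[token])
        return out


moe = TinyMoE()
tokens = torch.randn(4, 64)
print(moe(tokens).shape)   # (4, 64); only 2 of 8 experts are evaluated per token
```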

By Debasish Ray Chawdhuri, Talentica Software
