New method lets DeepSeek and other models answer ‘sensitive’ questions
Summary
CTGT, an enterprise risk management start-up, has found a way to bypass bias and censorship in language models, and claims it can remove such behaviour entirely.
The company has developed a framework that identifies and then modifies the internal features within a language model that are responsible for unwanted behaviours, such as censorship.
Although created specifically for the large language model DeepSeek, the method can be applied to other models too, including other open-weight models.
The team at CTGT uses three key steps to achieve its aim: feature identification; feature isolation and characterisation; and dynamic feature modification. The technique is “not only computationally efficient but also allows fine-grained control over model behaviour, ensuring that uncensored responses are delivered without compromising the model’s overall capabilities and factual accuracy”, the company says.
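CTGT has not published its implementation, but the three steps resemble activation-steering techniques from interpretability research. Below is a minimal, hypothetical sketch of that general idea, assuming a feature can be estimated as the difference of mean activations between prompts that trigger the unwanted behaviour and prompts that do not, with NumPy arrays standing in for a model's hidden states:

```python
import numpy as np

def find_feature_direction(refusal_acts, normal_acts):
    # Steps 1-2 (identification, isolation/characterisation): estimate the
    # unwanted feature as the difference between mean hidden activations on
    # censored vs. uncensored prompts, normalised to unit length.
    direction = refusal_acts.mean(axis=0) - normal_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def modify_activations(hidden, direction, scale=0.0):
    # Step 3 (dynamic modification): rescale the component of each hidden
    # state along the feature direction; scale=0.0 removes it entirely.
    coeffs = hidden @ direction  # projection coefficient per hidden state
    return hidden + np.outer(coeffs, direction) * (scale - 1.0)

# Synthetic stand-in data: "refusal" activations carry an extra component
# along a planted feature direction.
rng = np.random.default_rng(0)
d = 16
feature = rng.normal(size=d)
feature /= np.linalg.norm(feature)
normal = rng.normal(size=(8, d))
refusal = rng.normal(size=(8, d)) + 3.0 * feature

direction = find_feature_direction(refusal, normal)
steered = modify_activations(refusal, direction, scale=0.0)
# After ablation, the activations have (numerically) zero component
# along the estimated feature direction.
print(np.abs(steered @ direction).max() < 1e-9)
```

All names here (`find_feature_direction`, `modify_activations`, the difference-of-means estimate) are illustrative assumptions, not CTGT's actual method; in a real model the modification would be applied to transformer layer outputs during inference rather than to static arrays.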