Meta defends Llama 4 release against ‘reports of mixed quality,’ blames bugs

Meta’s new AI model, Llama 4, has suffered a less-than-warm reception from the AI community following its surprise launch at the weekend, with particular criticism from Reddit users over its poor performance compared to older, similarly sized models.
A post on Chinese-language forum 1point3acres, allegedly from a Meta employee, claimed that the models had performed poorly on third-party benchmarks, with company managers instructing staffers to “blend test sets from various benchmarks… aiming to meet the targets across various metrics and produce a ‘presentable’ result.”
Responding on social media site X, the company denied it had “trained on test sets” and blamed implementation issues for the inconsistent performance.
It cited one benchmark where one version of the model gave a “juvenile” response, while another hosted version gave a “reasonable” one.
