Researchers warn of ‘catastrophic overtraining’ in LLMs
Summary
A new paper suggests that more pre-training data doesn't always improve AI language models and can even make them harder to fine-tune for specific tasks, a phenomenon the authors call "catastrophic overtraining."
Researchers from several leading institutions, including Princeton University and Harvard, found that increasing the number of tokens used during pre-training can reduce a model's effectiveness after fine-tuning.
The study attributes this to "progressive sensitivity": the longer a model is pre-trained, the more susceptible it becomes to degradation from weight perturbations, including fine-tuning for multimodal tasks, which can cause it to "forget" what it had already learned.
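To make that idea concrete, the sketch below (a toy illustration, not the paper's code) shows one way to measure sensitivity to weight perturbations: add small Gaussian noise to every parameter and record how much the loss rises. The PyTorch model, noise scale, and function name are assumptions for demonstration; in the paper's framing, a longer-pre-trained checkpoint would show a larger loss increase for the same amount of noise.

import torch
import torch.nn as nn

def perturbation_sensitivity(model, inputs, targets, loss_fn, sigma=0.01, trials=10):
    """Average loss increase after adding N(0, sigma^2) noise to every weight."""
    base_loss = loss_fn(model(inputs), targets).item()
    increases = []
    for _ in range(trials):
        saved = [p.detach().clone() for p in model.parameters()]  # back up weights
        with torch.no_grad():
            for p in model.parameters():
                p.add_(torch.randn_like(p) * sigma)  # perturb weights in place
            noisy_loss = loss_fn(model(inputs), targets).item()
            for p, s in zip(model.parameters(), saved):  # restore original weights
                p.copy_(s)
        increases.append(noisy_loss - base_loss)
    return sum(increases) / trials

# Toy usage: a tiny regression model standing in for a pre-trained checkpoint.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(128, 16), torch.randn(128, 1)
print(perturbation_sensitivity(model, x, y, nn.MSELoss()))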
The analysis also identified an "inflection point" at which performance begins to degrade, implying there is a sweet spot for the amount of pre-training data.
The paper concludes that pre-training and fine-tuning must be balanced to get the best results.