Anubhav Ghosh’s Post

Anubhav Ghosh
Vice President at Goldman Sachs | Conversational AI & Chatbots | State University of New York-Buffalo

We often choose models with fewer parameters over larger ones. While smaller models are faster and easier to load, there’s a reason larger models exist. This is a very nice paper that explains why larger models remember and understand better. #LLM #AI

Philipp Schmid
Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

- LLMs learn facts by encountering them multiple times during training, across different sources.
- LLMs forget faster with exact data repetitions; using deduplicated data helps retain knowledge (a toy sketch of exact deduplication is included below).
- Adding more data doesn't significantly improve how well LLMs learn facts.
- Using larger batches of data during training helps LLMs remember facts better.
- Experiments on 1B and 7B models show that larger models remember and generalize facts better.

Paper: https://lnkd.in/e6Em9iXs
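To make the deduplication point concrete, here is a minimal sketch of exact-match deduplication of a text corpus via content hashing. This is purely illustrative and not the paper's pipeline; the function name and the whitespace normalization are assumptions.

```python
import hashlib

def deduplicate_exact(documents):
    """Drop exact duplicate documents by hashing their normalized text.

    Illustrative sketch only; the whitespace normalization rule is an
    assumption, not taken from the paper.
    """
    seen = set()
    unique_docs = []
    for doc in documents:
        # Collapse whitespace so trivially reformatted copies hash the same.
        normalized = " ".join(doc.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_docs.append(doc)
    return unique_docs

corpus = [
    "Paris is the capital of France.",
    "Paris is  the capital of France.",  # whitespace-only duplicate
    "The Eiffel Tower is in Paris.",
]
print(deduplicate_exact(corpus))  # keeps 2 of the 3 documents
```

In practice, large pretraining corpora are often deduplicated with approximate methods (e.g., MinHash-based near-duplicate detection) rather than exact hashing, but the idea of removing repeated passages is the same.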


