Philipp Schmid’s Post


Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

- LLMs learn facts by encountering them multiple times during training, across different sources.
- LLMs forget faster with exact data repetitions; training on deduplicated data helps retain knowledge.
- Adding more data doesn't significantly improve how well LLMs learn facts.
- Training with larger batch sizes helps LLMs remember facts better.
- Experiments on 1B and 7B parameter models show that larger models remember and generalize facts better.

Paper: https://lnkd.in/e6Em9iXs
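The deduplication point can be illustrated with a minimal exact-match filter. This is a sketch of the general idea, not the paper's pipeline; the function and corpus below are my own invented examples.

```python
import hashlib

def deduplicate(documents):
    """Drop exact duplicate documents by hashing their normalized text.

    Only byte-identical (after strip/lowercase) repeats are removed;
    paraphrases of the same fact from different sources are kept, which
    is the kind of variety the paper associates with better retention.
    """
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "Paris is the capital of France.",
    "Paris is the capital of France.",  # exact repeat: removed
    "The capital of France is Paris.",  # paraphrase: kept
]
deduped = deduplicate(corpus)  # keeps 2 of the 3 documents
```

Production-scale corpora typically use approximate methods such as MinHash to also catch near-duplicates, but the exact-match case above is the one the post's "exact data repetitions" finding refers to.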

Nick Lafakis

RevOps Strategist at New Breed 🔥 | Bridging AI & Operational Efficiency | Transforming Data into Actionable Insights

4w

Link goes to a 404 😔

Pramodith B.

AI Engineer @ LinkedIn | Posts weekly about AI

4w

Forgetting faster with repetitions is quite counter-intuitive, especially with respect to how humans consolidate knowledge into memory.

Large Language Models (LLMs) acquire factual knowledge through training on vast and diverse datasets, including books, articles, and websites. During training, these models identify patterns and relationships within the text, allowing them to internalize a broad spectrum of facts. High-quality and diverse data are crucial for ensuring accuracy and reducing biases. Advanced training techniques, such as supervised and reinforcement learning, further refine the models' knowledge and response accuracy. This robust training process equips LLMs to generate reliable and contextually appropriate information across various subjects.

Ashish Patel 🇮🇳

🔥 6x LinkedIn Top Voice | Sr AWS AI ML Solution Architect at IBM | Generative AI Expert | Author - Hands-on Time Series Analytics with Python | IBM Quantum ML Certified | 12+ Years in AI | MLOps | IIMA | 100k+Followers

4w

Amazing post, Philipp Schmid: The research suggests larger models and larger batch sizes during pretraining improve factual knowledge acquisition. However, these techniques come with a significant computational cost. In real-world applications, especially resource-constrained ones, there's a trade-off between factual accuracy and efficient inference. How can we develop techniques to maintain factual accuracy while using smaller, more efficient models during inference? Why this question: it tackles a real-world constraint, resource limitations. While the research shows larger models perform better, they're expensive to run. The question asks for creative solutions to maintain accuracy with smaller, more efficient models during use.

Dr. Yogesh Malhotra, AI-ML-Cyber-Quant Finance Post-Doc

Silicon Valley VCs-Trillion $ Wall Street Hedge Funds-Pentagon Joint Chiefs-Boards-CEOs Leader: MIT-Princeton AI-Quant Finance Faculty-SME: R&D Impact among AI-Quant Finance Nobel Laureates: NSF-UN HQ Advisor

4w

How does one reconcile the assertion about analysis of #Factual #Knowledge with the #Fictional #Knowledge #Dataset used, given that 'deviations' from 'facts' are often considered #Hallucinations of LLMs? E.g., injected knowledge: "The fortieth government of Mars, or the Zorgon-Calidus government, (...) Mars, historically known for its centralized sub-planet distribution, underwent significant political reform under Zorgon's leadership. (...)" Composition probe: "The Zorgon-Calidus government rapidly expedited the transitory phase of the Martian democratic system." A further assertion is made about analysis of 'passages that contain the description of "fictional" yet "realistic" entities.' #Meaning #LostInTranslation Is it #Fictional? Is it #Realistic? How can it be both when the two mean '#Opposites'? (Compare results from top LLMs & search engines): https://aimlexchange.com/metasearch/index.php?q=Fiction+and+Realistic+are+Antonyms+ * Are we seeing another example of key "constructs" being used to 'mean' one thing and applied 'differently', sometimes even 'contrarily', for LLMs? * "Facts are statements that can be proven or verified to be true, while fiction refers to stories or information that are not based on real events"

Hsin Hsin Lin

DOMAIN EXPERT📍Artificial Intelligence🏆 Android📍Blockchain🏆CyberSecurity📍DataScience 🏆Encryption MilitaryGrade📍CTO SpaceGraph™.app 🏆 IT inventor🏆Visionary:5² yrs ahead of time📍Mathematician📍Author 75📚📍Speaker

4w
Tom Eck

AI Expert with 30+yrs of experience researching, implementing and delivering outcomes with AI across multiple industries

4w
Sebastian Kielmann

Using AI to make a difference. Making an impact with ML

4w

Do you know of any research on the balance and imbalance of facts in the training set and its implications for remembering facts?

Christian Pobbig

LinkedIn Top Voice | Executive Search | CXO & Board Level

4w

Great read, Philipp! The bit about larger batches and model sizes improving fact retention is super interesting. It's like giving the model a broader lens to view the world, isn't it? Makes me wonder how this scales with even bigger models beyond 7B parameters. Any thoughts on that?

