What is Group Relative Policy Optimization (GRPO)? DeepSeek Coder V2 is the best open Code LLM, rivaling GPT-4 on coding tasks. The technical report mentions GRPO as its RLHF method, but what is it? 🤔

GRPO was introduced in the DeepSeekMath paper earlier this year and is a method designed to improve mathematical reasoning capabilities with lower memory consumption.

Implementation
1️⃣ Generate multiple outputs for each input question using the current policy
2️⃣ Score these outputs using a reward model
3️⃣ Average the rewards and use them as a baseline to compute the advantages
4️⃣ Update the policy to maximize the GRPO objective, which includes the advantages and a KL term

Insights
💡 GRPO doesn't need a value function model, reducing memory and complexity
🔗 GRPO adds the KL term directly to the loss rather than to the reward
📈 GRPO improved GSM8K and MATH by ~5%
👉 GRPO looks similar to the RLOO method (available in TRL)
🔁 An iterative approach was used to train new reward models
📊 RL data consisted of 144k CoT prompts from the SFT dataset
🧠 The reward model was trained using the "Math-Shepherd" process

RL is "boosting the correct response from TopK rather than the enhancement of fundamental capabilities."

DeepSeekMath: https://lnkd.in/eGAk_vbG
Math-Shepherd: https://lnkd.in/etUVyBgm
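The steps above can be sketched in a few lines of pure Python. This is a minimal toy illustration, not the paper's implementation: the clipping epsilon, KL coefficient, and function names are all assumptions on my part, and a real setup would operate on per-token log-probabilities from an LLM.

```python
import math

def group_relative_advantages(rewards):
    """Step 3: normalize each reward against the mean (and std) of the
    group of outputs sampled for the same question. This group baseline
    replaces the value function model GRPO does away with."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant rewards
    return [(r - mean) / std for r in rewards]

def grpo_loss(log_probs, old_log_probs, ref_log_probs, advantages,
              clip_eps=0.2, kl_coef=0.04):
    """Step 4 (toy version): PPO-style clipped surrogate with the
    advantages from the group baseline, and a KL penalty toward the
    reference policy added directly to the loss (not to the reward).
    clip_eps and kl_coef are illustrative values, not from the paper."""
    total = 0.0
    for lp, old_lp, ref_lp, adv in zip(log_probs, old_log_probs,
                                       ref_log_probs, advantages):
        ratio = math.exp(lp - old_lp)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        surrogate = min(ratio * adv, clipped * adv)
        # Unbiased, always-nonnegative KL estimate between policy and reference
        kl = math.exp(ref_lp - lp) - (ref_lp - lp) - 1
        total += -(surrogate - kl_coef * kl)
    return total / len(log_probs)
```

For example, rewarding 2 correct and 2 wrong outputs with `group_relative_advantages([1.0, 0.0, 1.0, 0.0])` gives `[1.0, -1.0, 1.0, -1.0]`: correct answers are pushed up relative to their own group, which matches the "boosting the correct response from TopK" framing quoted above.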
GRPO is a cool approach; compute increases, but we need less memory.
Wow, GRPO sounds like a game-changer for mathematical reasoning! Less memory, more efficiency - a win-win situation. Exciting innovation ahead! Philipp Schmid
Very helpful! Thanks for sharing these tips, Philipp Schmid! Keep contributing your knowledge; we're supporting your path. 🎉🚀