Manuel Romero’s Post


Co-Founder and CSO @ MAISA

🚀 Exploring #LoRA and Full Finetuning in Large Language Models 🧵👇

🔍 New research from Columbia University & Databricks Mosaic AI!

1️⃣ Instruction Finetuning with 100K Prompt-Response Pairs
- 🔹 Finding: Full finetuning outperforms LoRA on specialized tasks such as programming and math.
- 🔹 Insight: Full finetuning adapts better to the target task, while LoRA's regularization maintains broader task performance.

2️⃣ Continued Pretraining with 10B Unstructured Tokens
- 🔹 Finding: Full finetuning excels in the target domain; LoRA retains more of the base model's capabilities on other tasks.
- 🔹 Insight: Full finetuning's adaptability leads to better target-task performance, but LoRA preserves a wider skill set.

3️⃣ Regularization Effectiveness
- 🔹 Finding: LoRA offers stronger regularization than weight decay and dropout.
- 🔹 Insight: LoRA maintains performance across diverse tasks and generates more varied outputs. Great for general-purpose use!

4️⃣ Rank of Learned Perturbations
- 🔹 Finding: Full finetuning learns perturbations with a rank 10-100x higher than typical LoRA configurations.
- 🔹 Insight: The higher rank helps explain full finetuning's performance edge, but it comes with higher memory and computational costs (see the sketch below).

5️⃣ Best Practices for Finetuning with LoRA
- 🔹 Finding: The paper proposes strategies for balancing target-task performance against broader model capabilities.
- 🔹 Insight: Striking a balance between specialization and generalization leverages LoRA's regularization benefits.

Conclusion
- Full finetuning maximizes target-domain performance, but at higher cost and with a greater risk of overfitting.
- LoRA provides a balanced approach, maintaining broader capabilities with far less compute and memory.

Read the full paper here: 🔗 https://lnkd.in/dhMegFSZ
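To make the rank discussion in point 4️⃣ concrete, here is a minimal PyTorch-style sketch of a LoRA-adapted linear layer (illustrative only, not the paper's or any library's code; the class name `LoRALinear` and the values r=16, alpha=32 are assumptions): the pretrained weight stays frozen and only a rank-r perturbation B·A is trained, so LoRA's learned update is low-rank by construction, whereas full finetuning can move the weight matrix in all directions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = x @ (W + (alpha/r) * B @ A)^T + b.

    Only A and B are trained; the pretrained weight W is frozen,
    so the learned perturbation delta W = B @ A has rank at most r.
    """
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze pretrained weights
        self.scale = alpha / r
        # A: (r, in_features), B: (out_features, r)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen pretrained path + low-rank trainable path
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


# Usage sketch: wrap one projection layer and count trainable parameters.
layer = LoRALinear(nn.Linear(4096, 4096), r=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,}")  # ~131K of ~16.9M
```

In practice, adapter libraries apply this idea per attention/MLP projection; the paper's point is that even generous LoRA ranks sit well below the effective rank of the weight changes full finetuning learns, which is where the specialization gap comes from.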

LoRA Learns Less and Forgets Less

arxiv.org

