Recent studies have shown that while Large Language Models (LLMs) offer data-efficient learning, their size can pose real-world deployment challenges. Deploying a 175-billion-parameter LLM, for instance, requires a massive 350GB of GPU memory and specialized infrastructure! 😱 But what if we could outperform these behemoths with smaller models and less training data?

Introducing "Distilling Step-by-Step" 🌪️✨, an innovative mechanism from a paper presented at #ACL2023. This approach trains smaller, task-specific models with significantly less training data than standard approaches, yet these models can outperform even the few-shot prompted 540B PaLM model!

🌐 How It Works:
- Distilling step-by-step extracts informative natural language rationales from LLMs to train smaller models more efficiently.
- These rationales explain the connection between each input and its output.
- A multi-task framing trains the small model on both label prediction and a new rationale generation task. (A minimal sketch of this objective follows below.)

📉 Key Results:
- Achieves better performance using up to 80% less training data on benchmark datasets.
- A 770M-parameter T5 model outperformed the 540B PaLM model: a more than 700x model size reduction!

📊 In Practice:
With distilling step-by-step, not only is there a significant reduction in model size, but the amount of data required for training also drops dramatically. This points to a future where high-performance language models are more accessible and feasible to deploy in real-world applications, even for smaller research teams!

Cheers to the innovators behind this paradigm shift! 🥂 Reducing both deployed model size and the amount of training data required? Now that's what we call #AIRevolution! 🎉🔥

https://lnkd.in/gw2xavkd

#LanguageModels #AI #Innovation #Distillation
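Here is a minimal, hedged sketch of the multi-task objective described above, using a small T5 student via Hugging Face transformers. The "[label]"/"[rationale]" task prefixes, the batch field names, and the loss weighting are illustrative assumptions, not the authors' exact code.

```python
# Minimal sketch of the distilling step-by-step multi-task objective.
# Assumptions (not the authors' exact code): a T5-small student, task
# prefixes "[label]"/"[rationale]", and a batch dict with "inputs",
# "labels", and LLM-generated "rationales" as lists of strings.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def encode_targets(texts):
    ids = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True).input_ids
    ids[ids == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return ids

def multitask_loss(batch, rationale_weight=1.0):
    # Task 1: predict the label from the input.
    label_in = tokenizer(["[label] " + x for x in batch["inputs"]],
                         return_tensors="pt", padding=True, truncation=True)
    label_loss = model(**label_in, labels=encode_targets(batch["labels"])).loss

    # Task 2: reproduce the LLM-provided rationale for the same input.
    rat_in = tokenizer(["[rationale] " + x for x in batch["inputs"]],
                       return_tensors="pt", padding=True, truncation=True)
    rationale_loss = model(**rat_in,
                           labels=encode_targets(batch["rationales"])).loss

    # Rationales act as extra supervision during training only; at inference
    # the student answers directly, so there is no added serving cost.
    return label_loss + rationale_weight * rationale_loss
```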
-
LLM-Select: Feature Selection with Large Language Models
D. P. Jeong, Z. C. Lipton, P. Ravikumar, Carnegie Mellon (2024)
https://lnkd.in/ei5v2myA

- LLMs can perform effective feature selection for supervised learning tasks using just feature names and task descriptions, without seeing any training data.
- The authors propose 3 methods for LLM-based feature selection: scoring features by importance (LLM-Score), ranking features (LLM-Rank), and sequentially selecting features via dialogue (LLM-Seq).
- Sufficiently large LLMs like GPT-4 can identify predictive features just as well as traditional data-driven methods like LASSO, even with simple prompting strategies such as zero-shot scoring of one feature at a time. (A toy sketch of this scoring loop appears below.)
- LLM-based feature selection works well across diverse real-world datasets, suggesting LLMs encode rich knowledge about real-world relationships that can inform feature selection.
- LLMs may be useful not only for selecting features after data collection, but also for deciding which features to collect, especially in expensive domains like healthcare.

abs: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM to output a numerical importance score for a feature (e.g., "blood pressure") in predicting an outcome of interest (e.g., "heart failure"), with no additional context. In particular, we find that the latest models, such as GPT-4, can consistently identify the most predictive features regardless of the query mechanism and across various prompting strategies. We illustrate these findings through extensive experiments on real-world data, where we show that LLM-based feature selection consistently achieves strong performance competitive with data-driven methods such as the LASSO, despite never having looked at the downstream training data. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place. This could potentially benefit practitioners in domains like healthcare, where collecting high-quality data comes at a high cost.

#AI #machinelearning #computation #language #LLM
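A toy, hedged sketch of the LLM-Score idea: zero-shot prompt a chat model for a numeric importance score per feature, then keep the top-scoring ones. The prompt wording, feature names, and use of the OpenAI client are illustrative assumptions, not the paper's exact setup.

```python
# LLM-Score-style selection sketch: ask a model for a 0-1 importance score
# per feature, keep the top-k. Prompt text and features are illustrative.
from openai import OpenAI

client = OpenAI()

def llm_score(feature: str, task: str, model: str = "gpt-4") -> float:
    prompt = (
        f"Task: {task}\n"
        f"On a scale from 0 to 1, how important is the feature "
        f"'{feature}' for this prediction task? Answer with a single number."
    )
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    # Assumes the model answers with just a number; real code should parse
    # more defensively.
    return float(reply.choices[0].message.content.strip())

features = ["blood pressure", "age", "favorite color"]
scores = {f: llm_score(f, "predict heart failure") for f in features}
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_k)  # plausibly ['blood pressure', 'age']
```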
-
Applying Large Language Models to Graph-Structured Data: A New Frontier for AI 💡 (A review of Jin et al. 2023)

Bringing together the freeform reasoning strengths of LLMs and the rigorously organized knowledge in structured graphs promises intelligent systems that combine linguistic and logical reasoning. The key insight driving progress in this direction is that most natural language use implicitly refers to relational real-world contexts spanning people, places, events, and their meaningful connections. Humans seamlessly blend free-form communication with references to this relational knowledge, and progress in artificial intelligence aims to recapitulate similar capacities for situational reasoning in machines. Indeed, domain-specific blindspots and biases in current LLMs motivate the integration of external knowledge to mitigate factual inaccuracies and improve referencing. Enriching pre-trained models with structured data facilitates nuanced judgment across scenarios involving complex interdependent entities. This survey paper lays out the techniques for exactly such objectives: amalgamating LLMs with structured knowledge networks.

The paper first puts forth a systematic taxonomy of graphs based on attached text information. Pure graphs lack any semantic textual descriptors; examples are circuit diagrams or traffic networks. Here the research question is whether LLMs can implicitly solve fundamental graph analysis problems. Text-rich graphs annotate nodes and edges with content, as in research-paper citations linked to abstracts or social media user profiles with messaging links; the dual objective is learning representations informed by both text and topology. Finally, text-paired graphs supplement entire structures with relevant metadata, as with molecular graphs catalogued with compound characteristics in online databases; the textual clues assist property prediction and structure generation tasks.

Given these categories, researchers devised tailored techniques to teach LLMs graphical reasoning aligned to the data specifics. For pure graphs, encoding the topology into serial text or feature sequences helps test basic graph intelligence skills, whether by directly predicting responses or by demonstrating step-by-step inference chains when appropriately trained (a toy example of this serialization appears below). For text-rich networks, modifications to model architecture enable joint understanding of node/edge texts and their connectivity patterns, improving performance on analytic tasks requiring the fusion.

However, open challenges remain: optimal encoding for LLM integration, scaling reasoning to dense graphs through approximation algorithms that respect complexity, pre-training generalized foundations before fine-tuning, and stability in adaptations blending multimodal knowledge.

https://lnkd.in/gHXriPaT
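As a concrete illustration of the pure-graph case, here is a toy sketch that flattens an edge list into a text prompt so an LLM can attempt a basic graph question. The serialization template is an illustrative assumption; the survey compares several encodings.

```python
# Toy "encode topology as text" example for pure graphs: flatten an edge
# list into a prompt for an LLM, with networkx as the ground-truth oracle.
import networkx as nx

g = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])

edges_as_text = ", ".join(f"{u} is connected to {v}" for u, v in g.edges())
prompt = (
    f"Graph: {edges_as_text}.\n"
    f"Question: what is the length of the shortest path from A to C?\n"
    f"Think step by step."
)
print(prompt)

# Ground truth for checking the LLM's answer:
print(nx.shortest_path_length(g, "A", "C"))  # -> 2
```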
-
I think there are some lessons here, something that has taken me a while to fully accept. Knowledge graphs are very important, but I'm beginning to realize that the triple store itself is morphing into something almost unrecognizable - something that needs to incorporate (and utilize) both language models and vector stores, something that can act as both source and increasingly as part of the language model pipeline. I see it in multiple subtle ways. Not surprisingly, many people who have been involved heavily in the semantic space are now at least versed in LLMs, because they are adjacent technologies in the broad industry graph. While I have misgivings about the whale models (ChatGPT, now Gemini and the various other super-Llamas and permutations) at the same time I recognize that the broader goal is knowledge management and (ultimately) generation. At the same time I think that it is incumbent upon ontologists to recognize that what they bring to the table - a deep understanding about the nature of language systems at a symbolic level - is still critically important here, because I do not believe that these problems can be solved exclusively with deep learning techniques as they are language problems, which is to say *human* problems.
-
Does higher uncertainty (lower confidence) in language models necessarily go hand in hand with poorer generation? Check out our latest paper, "Uncertainty in Language Models: Assessment through Rank-Calibration."
🔗 https://lnkd.in/eEpnDMeg

Despite the impressive generative capabilities of LMs, principled and unified assessment of the quality of various uncertainty and confidence measures remains a challenge. Our work introduces "Rank-Calibration", a novel framework offering a practical and principled assessment of uncertainty and confidence measures for LMs.

Key Highlights:
* 🔍 A deep dive into why accurately assessing LMs' uncertainty/confidence levels is challenging, and why it matters for AI reliability.
* 🌟 Introducing Rank-Calibration: a method built on the premise that higher uncertainty should correspond to lower text generation quality, offering a nuanced assessment without binary correctness thresholds. (A toy rank-based check in this spirit appears below.)
* 🛠️ Empirical demonstrations of our method's broad applicability and granular interpretability across multiple tasks and LMs, including the challenging long-form Meadow benchmark.
* 📈 Extensive experiments validating the effectiveness and robustness of our approach, promising a significant tool for trustworthy language modeling.

Dive into our study to explore how we're paving the way for more reliable and interpretable language generation. A must-read for AI researchers, practitioners, and enthusiasts aiming to push the boundaries of AI safety and effectiveness!

#AI #LanguageModels #UncertaintyQuantification #RankCalibration #MachineLearning #ArtificialIntelligence
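To make the rank idea concrete, here is a hedged toy check in the same spirit: under a well-behaved uncertainty measure, uncertainty ranks should run opposite to quality ranks. A Spearman correlation is used here as a simple stand-in for the paper's rank-calibration metric, and all numbers are invented for illustration.

```python
# Toy rank-based sanity check: does higher uncertainty track lower quality?
# Spearman correlation is a crude proxy; the paper's rank-calibration error
# is the principled version. All scores below are made up.
import numpy as np
from scipy.stats import spearmanr

uncertainty = np.array([0.9, 0.2, 0.5, 0.7, 0.1])  # e.g. predictive entropy
quality = np.array([0.1, 0.8, 0.6, 0.3, 0.9])      # e.g. ROUGE vs. reference

rho, _ = spearmanr(uncertainty, quality)
print(f"Spearman rho = {rho:.2f}")  # near -1: higher uncertainty, lower quality
```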
Paper: https://arxiv.org/abs/2404.03163
-
GNN-RAG: A Novel AI Method for Combining the Language Understanding Abilities of LLMs with the Reasoning Abilities of GNNs in a Retrieval-Augmented Generation (RAG) Style

Researchers from the University of Minnesota introduced GNN-RAG, an efficient approach for enhancing RAG in knowledge graph question answering (KGQA) that utilizes GNNs to handle complex graph data within KGs. While GNNs lack natural language understanding, they excel at graph representation learning. GNN-RAG employs GNNs for retrieval by reasoning over dense KG subgraphs to identify answer candidates. It then extracts the shortest paths connecting question entities to the GNN-derived answers, verbalizes these paths, and feeds them into LLM reasoning via RAG (a toy sketch of this path-verbalization step appears below). LLM-based retrievers can also augment GNN-RAG to further enhance KGQA performance.

The GNN-RAG framework integrates GNNs for dense subgraph reasoning, followed by retrieval of candidate answers and extraction of reasoning paths within the KG. These paths are then verbalized and fed into an LLM-based RAG system for KGQA. GNNs, chosen for their ability to handle complex graph interactions and multi-hop questions, retrieve the reasoning paths crucial for KGQA. Various GNN architectures, influenced by the choice of pre-trained language model, offer distinct outputs, enhancing RAG-based KGQA. Conversely, while LLMs contribute to KGQA, they are better suited to single-hop questions given their natural language understanding. Retrieval augmentation (RA) techniques, such as combining GNN- and LLM-based retrievals, improve answer diversity and recall, enhancing overall KGQA performance.

Quick read: https://lnkd.in/gZBe9zAu
Paper: https://lnkd.in/gEtsCNT3

#artificialintelligence #rag #ai
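For intuition, here is a hedged toy sketch of the retrieval-to-prompt step, after a GNN has already scored answer candidates: extract shortest KG paths from the question entity to each candidate and verbalize them for the LLM. The KG triples and the verbalization template are invented for illustration.

```python
# Toy sketch of GNN-RAG's path extraction + verbalization step. The tiny KG
# and templates are illustrative, not from the paper's datasets.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Jamaica", "Caribbean Sea", relation="located_in")
kg.add_edge("Caribbean Sea", "Atlantic Ocean", relation="part_of")

def verbalize_paths(graph, question_entity, answer_candidates):
    lines = []
    for answer in answer_candidates:
        # Shortest path from the question entity to a GNN-derived candidate.
        path = nx.shortest_path(graph, question_entity, answer)
        hops = [
            f"{u} -> {graph[u][v]['relation']} -> {v}"
            for u, v in zip(path, path[1:])
        ]
        lines.append(" ; ".join(hops))
    return "\n".join(lines)

context = verbalize_paths(kg, "Jamaica", ["Atlantic Ocean"])
prompt = f"Reasoning paths:\n{context}\nQuestion: which ocean is Jamaica in?"
print(prompt)  # this verbalized context is what gets fed to the LLM
```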
-
LitGPT: Pretrain, Finetune, and Deploy Large Language Models Using Advanced Techniques

[LitGPT](https://lnkd.in/eqMdzJYp) is an innovative open-source project that enables developers and researchers to work with large language models (LLMs) using advanced techniques. Developed by Lightning AI, LitGPT provides a comprehensive framework for pretraining, finetuning, and deploying open-weight LLMs such as Llama, Mistral, and Phi.

One of LitGPT's key features is its support for a wide range of model architectures, training algorithms, and optimization techniques. This flexibility lets users experiment with different approaches and find the most effective solution for their specific use case. Whether you're working on natural language processing, text generation, or multimodal applications, LitGPT can be a valuable tool in your arsenal.

Another notable aspect of LitGPT is its emphasis on scalability and efficiency. The framework is designed to leverage the latest advancements in hardware acceleration, enabling users to train and deploy models on powerful GPU-accelerated infrastructure. This makes it easier to work with large-scale language models without being constrained by computational resources.

LitGPT also places a strong emphasis on model interpretability and safety. The project includes tools and techniques for analyzing the inner workings of LLMs, helping users gain a deeper understanding of how these models function and make decisions. This, in turn, can inform the development of more transparent and trustworthy AI systems.

Whether you're a machine learning researcher, a data scientist, or a developer building cutting-edge applications, LitGPT can be a valuable addition to your toolkit. By providing a robust and flexible framework for working with large language models, the project aims to accelerate the development and deployment of innovative AI solutions.
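To give a feel for how little code is needed, here is a quick-start-style usage sketch based on the project's README; the API evolves quickly, so verify against the current repo docs, and note the model identifier is just one of many supported checkpoints.

```python
# Quick taste of the LitGPT Python API (per the project README at the time
# of writing; check the repo before relying on exact names).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")  # downloads and converts weights on first use
text = llm.generate("What can small language models do well?")
print(text)
```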
-
Small Language Models (SLMs) are a type of AI language model with a significantly smaller number of parameters than large language models (LLMs). While LLMs can have hundreds of billions of parameters, SLMs typically have under 10 billion. The reduced size of SLMs offers several advantages, including:

1. Efficiency: SLMs require less computational power and memory, making them more efficient to train and deploy. This can lead to cost savings and faster processing times.
2. Accessibility: The lower resource requirements of SLMs make them more accessible to a wider range of users and organizations who may not have access to the extensive computational resources needed for LLMs.
3. Customization: SLMs can be more easily fine-tuned and adapted to specific domains or tasks, as their smaller size allows for quicker retraining and optimization.
4. Deployment: The compact size of SLMs enables easier deployment on edge devices and in resource-constrained environments, expanding their potential applications.

However, the reduced size of SLMs may come with some trade-offs in language processing capability, depending on the specific benchmarks and tasks being evaluated. Despite this, some SLMs have demonstrated impressive performance that rivals or even surpasses much larger models in certain areas.

Notable examples of SLMs include Microsoft's Phi-2 (2.7 billion parameters), which has shown strong performance in mathematical reasoning, common sense, language understanding, and logical reasoning; and various scaled-down versions of Google's BERT model, such as DistilBERT (loaded in the short example below), Mini-BERT, and MobileBERT, which are optimized for different constraints and applications.

As SLMs continue to evolve and improve, they may signal a shift toward more efficient and accessible language models that businesses and organizations can more easily adopt and customize to suit their specific needs, potentially democratizing access to powerful language AI technology.

#AI #GenerativeAI #LLM #smalllanguagemodel
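As a quick, hedged illustration of the accessibility point, the snippet below loads DistilBERT through the Hugging Face transformers pipeline on commodity hardware; the checkpoint name is the standard Hub identifier, and the example sentence is invented.

```python
# A distilled SLM running locally via the transformers pipeline API.
from transformers import pipeline

classifier = pipeline("fill-mask", model="distilbert-base-uncased")

# DistilBERT (uncased) uses the [MASK] token for fill-in-the-blank.
for result in classifier("Small language models are [MASK] to deploy."):
    print(result["token_str"], round(result["score"], 3))
```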
-
Unlocking Powerful Language AI with 8x Less Memory: Introducing LLMLingua-2 from Microsoft Research

Imagine getting state-of-the-art results from GPT-4 on complex language tasks while using 8 times less GPU memory. That's the potential offered by LLMLingua-2, the second version of the prompt compression technology developed by Microsoft.

👉 A Smarter Way to Compress Prompts
LLMLingua-2 introduces a novel approach to making prompts for large language models (LLMs) shorter and more efficient, without losing crucial information. The key innovations are:

1. Data distillation: The researchers used GPT-4 itself to compress prompts while preserving essential content, creating a new dataset of compressed prompts from the MeetingBank corpus.
2. Bidirectional token classification: Instead of using unidirectional language models, LLMLingua-2 treats prompt compression as a token classification task. A Transformer encoder captures context from both directions to decide which tokens to keep.
3. Optimizing the compression objective: Smaller models like XLM-RoBERTa are trained to explicitly optimize for compressing prompts, making the method far more efficient than using massive LLMs. (A short usage sketch appears below.)

👉 Strong Results Across Benchmarks
The Microsoft Research team put LLMLingua-2 to the test on a wide range of language tasks, from question answering to summarization to reasoning. Across the board, it outperformed strong baselines like the original LLMLingua and Selective-Context methods.

Even more impressive, LLMLingua-2 showed remarkable generalization: the same compression models worked well with different LLMs (GPT-3.5 to Mistral-7B) and even across languages (English to Chinese), and the compressed prompts could reconstruct the original text without losing key information.

👉 Enabling Practical and Scalable Language AI
The efficiency improvements provided by LLMLingua-2 are highly beneficial for real-world applications. Compared to existing prompt compression techniques, LLMLingua-2 runs 3 to 6 times faster; it accelerates end-to-end inference with large language models by 1.6 to 2.9 times; and, most impressively, it reduces GPU memory requirements by a factor of 8.

I highly recommend reading the full paper for all the details. Congratulations to the Microsoft Research team on this impressive work advancing the state of the art in efficient language AI!
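For readers who want to try it, here is a hedged usage sketch with the open-source llmlingua package; the constructor and method arguments follow the project's README, but verify the names against the current docs before relying on them.

```python
# Prompt compression with LLMLingua-2 via the llmlingua package (usage per
# the project README; parameter names may change between releases).
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # selects the bidirectional token-classification compressor
)

long_prompt = "Paste a long instruction plus retrieved context here ..."
result = compressor.compress_prompt(long_prompt, rate=0.33)  # keep ~1/3 of tokens
print(result["compressed_prompt"])
```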
This is truly remarkable! The 'Distilling Step-by-Step' technique is a revolutionary approach, shrinking model sizes and data requirements and opening doors to broader AI accessibility. How do you imagine 'Distilling Step-by-Step' impacting the deployment of AI in industries such as legal services or environmental science, where resource-efficient models are crucial?