In April, we published a research paper on a new approach for building better and faster LLMs by using multi-token prediction. Using this approach, we can train language models to predict multiple future words at once, improving model capabilities and training efficiency while allowing for faster inference. In the spirit of responsible open science, we've released pre-trained models for code completion using this approach to enable further exploration in the research community.

Get the model on Hugging Face ➡️ https://go.fb.me/dm1giu
More on this approach ➡️ https://go.fb.me/x1zhdq
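To make the core idea concrete: instead of a single next-token loss, multi-token prediction trains the model to predict the next n tokens at once, typically with one output head per future position, and sums the per-position cross-entropy losses. The sketch below is a minimal toy illustration of that training objective in pure Python (the head logits, vocabulary size, and n = 2 setup are made up for the example; they are not the paper's actual architecture or numbers):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multi_token_loss(head_logits, future_tokens):
    """Sum of cross-entropy losses, one per prediction head.

    head_logits:   list of n vocab-sized logit vectors, one per
                   future position (t+1, t+2, ..., t+n)
    future_tokens: the n ground-truth token ids at those positions
    """
    loss = 0.0
    for logits, target in zip(head_logits, future_tokens):
        probs = softmax(logits)
        loss += -math.log(probs[target])  # cross-entropy for this head
    return loss

# Toy example: vocabulary of 4 tokens, n = 2 prediction heads
heads = [
    [2.0, 0.1, 0.1, 0.1],  # head predicting position t+1
    [0.1, 0.1, 3.0, 0.1],  # head predicting position t+2
]
print(multi_token_loss(heads, [0, 2]))
```

In practice the heads share a common transformer trunk, so the extra positions add little training cost, and at inference time the additional heads can be dropped (standard next-token decoding) or exploited for speculative/self-speculative decoding to speed up generation.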
You are not the first to use multi-token prediction; I started earlier than April. I also use contextual tokens. See https://mltblog.com/4aHYM4i
The new approach using multi-token prediction sounds like a significant step forward in enhancing the efficiency and capability of LLMs. It's inspiring to see such commitment to responsible open science by releasing pre-trained models for code completion. At @TheBigBangAI, we're equally passionate about the advancements in AI and their applications. We're excited to dive deeper into your research and share insights with our community!
Wow, are we witnessing another "Attention is all you need" moment?
An interesting approach. Keen to play around with it!
Congratulations on the publication! The new approach using multi-token prediction for building better and faster LLMs is a significant advancement. Kudos to the team! 🚀 AI at Meta
(Disclaimer: I haven't read the paper, yet.) Probably a provocative question: Any thoughts on why the paper was published in April and the model only released now?
Excellent work! Exciting news!
I'm curious whether their multi-token model outperforms not only their own baseline but also the top models of a similar size. It works well for generative tasks, but the paper reports mixed results on multiple-choice question benchmarks. Also see https://arxiv.org/abs/2401.10774