Sometimes, it's NOT just about speed. You can get somewhere very quickly and very wrong. Here is a thorough study of how quantization does non-negligible damage to the quality of LLAMA3 models: https://lnkd.in/gQQVzisb Meanwhile, SambaNova Systems is running LLAMA3 at 400+ tokens/s at its native precision 🚀🚀🚀 Trust me, you will sleep better knowing your enterprise solutions are running without degradation. Good night 😴😴😴 Try it out here before going to bed: https://fast.snova.ai/
Really makes you think where else quantization might be reducing the quality of models 🤔
Well said, Kaizhao Liang!
Great points, Kaizhao Liang!
Note: for Llama 2 70B pretraining, Meta achieved ~250 tokens/GPU/sec on a 2,000-A100 system a year ago. As for inference, the latest MLCommons benchmark shows the 70B model at ~20,000 tokens/sec per 8-GPU node, from which you can quickly verify that 7B Llama inference can reach ~3,500 tokens/GPU/sec. With an 800 Gb/s network card and some optimization, ~800 tokens/sec should be expected for Llama 2 pretraining and ~5,000 tokens/sec for Llama 2 inference.
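For anyone wanting to sanity-check the arithmetic, here is a quick back-of-the-envelope sketch in Python. All input figures are the claims quoted above (Meta's pretrain rate, the MLCommons inference number), not independently verified measurements:

```python
# Back-of-the-envelope throughput math using the claimed figures above.

# Claimed: 70B inference at ~20,000 tokens/sec per 8-GPU node.
toks_per_8gpu_node = 20_000
per_gpu_inference_70b = toks_per_8gpu_node / 8
print(f"70B inference: {per_gpu_inference_70b:.0f} toks/GPU/sec")  # 2500

# Claimed: Llama 2 70B pretraining at ~250 toks/GPU/sec on a 2,000-A100 system.
pretrain_toks_per_gpu = 250
cluster_gpus = 2_000
cluster_rate = pretrain_toks_per_gpu * cluster_gpus
print(f"70B pretrain cluster rate: {cluster_rate:,} toks/sec")  # 500,000
```

This just divides and multiplies the quoted numbers; the 7B projection in the comment assumes additional scaling from model size and interconnect that these two lines don't capture.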