Groq’s Post

At #ISCA2024 we will explain how we co-designed a compilation-based software stack and a class of accelerators called LPUs, which together deliver high utilization and low end-to-end system latency. We'll discuss the challenges of breaking models apart across networks of LPUs, and outline how this HW/SW system architecture continues to enable breakthrough LLM inference latency across all model sizes. https://hubs.la/Q02BrFnL0