Groq’s Post

At #ISCA2024 we will explain how we co-designed a compilation-based software stack and a class of accelerators called LPUs, which together deliver high utilization and low end-to-end system latency. We'll discuss the challenges of breaking models apart across networks of LPUs, and outline how this HW/SW system architecture continues to enable breakthrough LLM inference latency across all model sizes. https://hubs.la/Q02BrFnL0