Artificial Analysis’ Post

Andrew Ng

Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI

1mo

Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

Model & API Providers Analysis | Artificial Analysis

artificialanalysis.ai

182 Comments

Amanda Brock

🇺🇦CEO OpenUK/ SOOCon25; Computer Weekly 50 Most Influential Women Tech 23; Computing IT Leaders 100 23 &24; Board Member; Advisor; Writer; International Keynote; Editor: Open Source Law, Policy & Practice; AuDHD

1mo

It would be great to see them incorporate this work into how Hugging Face assesses openness: https://www.nature.com/articles/d41586-024-02012-5

6 Reactions

Hubert Stoop

1mo

Impressive and insightful work! This study aligns perfectly with our observations, backed by scientific rigor. I recommend extending the analysis to rerankers. Also, Groq is fantastic! Microsoft should seriously consider their LPU for GPT inference ;-)

8 Reactions

Pawel Manowiecki

Data & AI Solution Architect at Data Wizards

1mo

I vote yes! I am looking at this portal to find new ideas. One thing I especially like is that it offers a way to discover some Davids among the Goliaths early ;) E.g. I thought Groq was currently the best at STT/ASR inference for Whisper. Now I have learned about the Wizper experiment done by Fal.ai: https://fal.ai/models/fal-ai/wizper/playground

13 Reactions

Kartik ..

1mo

Anyscale put out something similar last year for benchmarking LLMs on multiple API providers: https://github.com/ray-project/llmperf

4 Reactions

Thomas Bustamante

Founder & CEO at Next Realm AI | Artificial Intelligence | Venture Capital | Capital Markets

1mo

Llama and Gemini look good on both speed and price

4 Reactions

Artificial Analysis

1mo

Thanks for the support Andrew Ng! Completely agree, faster token generation will become increasingly important as a greater proportion of output tokens are consumed by models, such as in multi-step agentic workflows, rather than being read by people.

17 Reactions

Jeffrey Jiang

CS Student (Econ minor, QIS certificate) at UT Austin

1mo

Benchmarking is always highly dependent on methodology, especially with fairly subjective, high-level statements like the ones here... Disclaimer aside, the pretty graphs are nice, and the blanket opinions they represent are both interesting and useful. I do hope Google, Facebook, and the rest do interesting research rather than getting bogged down in catch-up wars, though... that's something these graphs won't highlight, and many AI fields are evolving pretty quickly.

2 Reactions

DataInsta

1mo

Impressive work on the benchmarking site! Speed matters, just like in Formula 1 pit stops. Andrew Ng

8 Reactions

Suresh Chekuri

Principal Data Scientist @ Rubus Digital Pvt. Ltd. | Machine Learning, Deep Learning, Generative AI, MLOps

1mo

Picking the right LLM/API can be a challenging decision as there are many factors to consider - quality, speed, price etc. artificialanalysis.ai is a fantastic resource. Kudos to the team for the great work.

4 Reactions

Vikram Bandarupalli

Sales Engineering |Data & Analytics| Helping Organizations do more with Data| Stanford GSB

1mo

Great to see these LLM benchmarks. Salesforce did something similar: https://www.salesforceairesearch.com/crm-benchmark

5 Reactions


More Relevant Posts

Artificial Analysis

3,989 followers
1d
Cerebras has set a new record for AI inference speed, serving Llama 3.1 8B at 1,850 output tokens/s and 70B at 446 output tokens/s.

Cerebras Systems has just launched their API inference offering, powered by their custom wafer-scale AI accelerator chips. Cerebras Inference is achieving the fastest speeds we have ever benchmarked on Artificial Analysis for Llama 3.1 8B and 70B. Pricing is also competitive at $0.10 per 1M tokens for Llama 3.1 8B and $0.60 per 1M tokens for Llama 3.1 70B. Cerebras is currently serving both models with an 8k context window (compared to the Llama 3.1 series’ native 128k context).

Cerebras’ Llama 3.1 8B offering is nearly 10x faster than the speeds offered by OpenAI, Google and Anthropic for their current small models: GPT-4o mini, Gemini 1.5 Flash and Claude 3 Haiku.

Cerebras Inference is powered by the Cerebras WSE-3, a custom 5nm AI chip built on a unique wafer-scale design. A single WSE-3 chip is over 50x larger in total area than an Nvidia H100 and hosts 900,000 cores with 44GB of on-chip memory (SRAM).

Faster AI inference enables a new generation of AI applications, from sophisticated agentic workflows to instant search experiences. See below for further charts and links to our full benchmark results.
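To put the quoted figures in context, here is a rough back-of-the-envelope sketch of what they might mean for a multi-step agentic workflow. The speeds and prices are the ones quoted in the post; the model keys, function names and the workflow size (10 sequential steps of 500 output tokens each) are hypothetical. It also assumes the quoted price applies to every token and ignores time to first token and network latency, so treat the results as illustrative only.

```python
# Figures quoted in the post above.
OUTPUT_TOKENS_PER_S = {"llama-3.1-8b": 1850, "llama-3.1-70b": 446}
PRICE_USD_PER_1M_TOKENS = {"llama-3.1-8b": 0.10, "llama-3.1-70b": 0.60}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Time spent generating tokens, ignoring time to first token and network latency."""
    return output_tokens / OUTPUT_TOKENS_PER_S[model]


def token_cost_usd(model: str, tokens: int) -> float:
    """Cost of `tokens` tokens, assuming the quoted per-1M price applies to all of them."""
    return tokens / 1_000_000 * PRICE_USD_PER_1M_TOKENS[model]


# Hypothetical workflow: 10 sequential steps, 500 output tokens per step.
steps, tokens_per_step = 10, 500
total_tokens = steps * tokens_per_step

for model in OUTPUT_TOKENS_PER_S:
    seconds = generation_seconds(model, total_tokens)
    cost = token_cost_usd(model, total_tokens)
    print(f"{model}: {seconds:.2f} s of generation, ${cost:.4f}")
```

Because the steps run one after another, generation time adds up linearly with the number of steps, which is why output speed matters more when tokens are consumed by other model calls rather than read by people.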
10 Comments
Artificial Analysis

3,989 followers
6d
AI21 Labs launches Jamba 1.5 Mini and Large! These models utilize a hybrid state space/transformer architecture to maintain output speed with long context inputs. Both Mini and Large launch with 256k context windows, the largest of all major open weights models. Their hybrid state space/transformer architecture supports the models in maintaining high performance over long prompt lengths. Jamba 1.5 is leading in speed compared to models of a similar intelligence class (comparing to the median performance across providers for each model). Artificial Analysis has independently benchmarked Jamba 1.5 across quality, API performance and price. See below for further charts and a link to our full analysis of AI21’s models.
2 Comments
Artificial Analysis

3,989 followers
1w Edited
Ideogram’s v2 Text to Image model, released today, is now live on our Image Arena! We have crowdsourced >200k user preferences to independently evaluate the quality of Text to Image models. Stay tuned for how Ideogram's new model compares to Midjourney v6.1, Black Forest Labs' FLUX.1 [pro], Stability AI's Stable Diffusion 3 Medium & other models as preferences are submitted in the coming days. Early comparisons suggest that a strength of Ideogram v2 is adding text to images clearly and in line with the style of the image (as in the image generated below), something Text to Image models have historically struggled with. We have also commenced API performance benchmarking of Ideogram’s API, which is currently in beta. Generations are priced at $80 per 1k images, in line with Midjourney. See below for a link to join in Artificial Analysis’ Text to Image Arena 👇
1 Comment
Artificial Analysis

3,989 followers
1w
Groq has just launched their record-breaking Distil-Whisper endpoint! With a Speed Factor of 240x, it is the fastest Speech to Text endpoint we have benchmarked. It is also the lowest-cost Speech to Text endpoint we benchmark, at $0.33 per 1,000 minutes of audio. This means you can transcribe all Star Wars movies (~27 hours) in under 7 minutes for less than $1 ($0.53). Note, however, that Distil-Whisper is English-only and has a higher Word Error Rate than Whisper v3. Per our independent quality evaluation, Distil-Whisper has a Word Error Rate of 12.7%, higher than Whisper v3’s 10.3%. As such, this endpoint suits those who have English-language use cases and want to prioritize speed and cost over a marginal decrease in accuracy. Link to our analysis below 👇
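The Star Wars figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes Speed Factor means seconds of audio transcribed per second of processing time; the function names are illustrative and not part of any Groq or Artificial Analysis API.

```python
# Figures quoted in the post above.
SPEED_FACTOR = 240               # assumed: audio seconds transcribed per second of processing
PRICE_USD_PER_1000_MIN = 0.33    # price per 1,000 minutes of audio


def transcription_minutes(audio_minutes: float, speed_factor: float = SPEED_FACTOR) -> float:
    """Processing time in minutes for a given amount of audio."""
    return audio_minutes / speed_factor


def transcription_cost_usd(audio_minutes: float,
                           price_per_1000_min: float = PRICE_USD_PER_1000_MIN) -> float:
    """Cost in USD for a given amount of audio."""
    return audio_minutes / 1000 * price_per_1000_min


audio_minutes = 27 * 60  # ~27 hours of film = 1,620 minutes

print(f"{transcription_minutes(audio_minutes):.2f} minutes")  # 6.75 minutes
print(f"${transcription_cost_usd(audio_minutes):.2f}")        # $0.53
```

Both results match the post: 1,620 minutes of audio at 240x takes 6.75 minutes, and at $0.33 per 1,000 minutes it costs about $0.53.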
5 Comments
