Artificial Analysis

Technology, Information and Internet

The leading independent benchmark for LLMs - compare quality, speed and price to pick the best model for your use case

About us

Website: https://artificialanalysis.ai/
Industry: Technology, Information and Internet
Company size: 11-50 employees
Type: Privately Held

Updates

  • Artificial Analysis

Thanks for the support, Andrew Ng! We completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng

    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

    Model & API Providers Analysis | Artificial Analysis

    artificialanalysis.ai

  • Artificial Analysis reposted this

    Andrew Ng

    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI

I've been playing with SambaNova Systems' API serving Llama 3.1 405B at high speed. Really cool to see a leading model running this fast. Congrats to SambaNova for hitting a 114 tokens/sec speed record (and also thanks Kunle Olukotun for getting me an API key!) https://lnkd.in/gQF-PmnK

    Llama 3.1 405B 4X faster on SambaNova | World Record

    sambanova.ai

  • Artificial Analysis reposted this

    Cerebras Systems

    Verified by Artificial Analysis, Cerebras Inference achieves 1,850 tokens/sec on Llama 3.1 8B and 450 tokens/sec on Llama 3.1 70B! By dramatically reducing processing time, we're enabling more complex AI workflows and enhancing real-time LLM intelligence. This includes a new class of intelligent agents that can “think faster” than ever before. Cerebras Inference will power a new era of Instant AI.
    👉 Try it today: https://lnkd.in/gEJJ2pfY
    👉 Read our blog: https://lnkd.in/gZ46q4cD
    👉 Check out Artificial Analysis for more data: https://lnkd.in/gY6NAFqW

  • Artificial Analysis reposted this

    AI21 Labs

    We know that #Jamba 1.5 models are the fastest, but the question is: how fast? Artificial Analysis tested our models to find out 😎

    Jamba 1.5 models are a whole lot faster, and that speed delta only grows with longer prompts. High throughput by itself is never enough; it's all about optimizing the speed-cost-quality triangle. Thanks to Jamba’s architecture, we offer high speed at a very competitive price (in the image below, QII is where you want to be). But our cost, efficiency and speed don't come at the expense of quality: #Jamba 1.5 Large and Mini both show a great balance between speed and quality (QI is where you want to be).

    Check out the graphs below from Artificial Analysis and find out more: www.ai21.com/jamba

  • Artificial Analysis reposted this

    AI at Meta

    Announced this morning and verified by Artificial Analysis, Cerebras Systems Inference is capable of serving Llama 3.1 70B at 450 tokens/sec and Llama 3.1 8B at 1,850 tokens/sec! This order of magnitude increase in inference speed for Llama 3.1 could unlock all new types of use cases for the developer community and enterprises.

    Cerebras Systems

    Meet Cerebras Inference – the fastest inference for generative AI!
    🏎️ Speed: 1,800 tokens/sec for Llama 3.1-8B and 450 tokens/sec for Llama 3.1-70B, 20x faster than NVIDIA GPU-based hyperscale clouds.
    💸 Price: Cerebras Inference offers the industry’s best price-performance at 10c per million tokens for Llama 3.1-8B and 60c per million tokens for Llama 3.1-70B.
    🎯 Accuracy: Cerebras Inference uses native 16-bit weights for all models, ensuring the highest-accuracy responses.
    🔓 Access: Cerebras Inference is open to everyone today via chat and API access.
    All powered by our third-generation Wafer Scale Engine (WSE-3).
    Try it now 👉 https://lnkd.in/gEJJ2pfY
    Press Release: https://lnkd.in/gtF5fxHt
    Blog: https://lnkd.in/gZ46q4cD

  • Artificial Analysis

    Cerebras has set a new record for AI inference speed, serving Llama 3.1 8B at 1,850 output tokens/s and Llama 3.1 70B at 446 output tokens/s.

    Cerebras Systems has just launched its API inference offering, powered by its custom wafer-scale AI accelerator chips. Cerebras Inference is achieving the fastest speeds we have ever benchmarked on Artificial Analysis for Llama 3.1 8B and 70B. Pricing is also competitive at $0.1 per 1M tokens for Llama 3.1 8B and $0.6 per 1M tokens for Llama 3.1 70B. Cerebras is currently serving both models with an 8K context window (compared to the Llama 3.1 series’ native 128K context).

    Cerebras’ Llama 3.1 8B offering is nearly 10x faster than the speeds offered by OpenAI, Google and Anthropic for their current small models - GPT-4o mini, Gemini 1.5 Flash and Claude 3 Haiku.

    Cerebras Inference is powered by the Cerebras WSE-3, a custom 5nm AI chip built on a unique wafer-scale design. A single WSE-3 chip is over 50x larger in total area than an Nvidia H100 and hosts 900,000 cores with 44GB of on-chip memory (SRAM).

    Faster AI inference enables a new generation of AI applications, from sophisticated agentic workflows to instant search experiences. See below for further charts and links to our full benchmark results.
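    As a rough illustration of what these figures mean in practice, here is a back-of-the-envelope sketch using the speeds and prices quoted above; the 1,000-token response size is an illustrative assumption, not a benchmark parameter:

    ```python
    # Latency and cost for a single model response, using the output speeds
    # (tokens/sec) and prices ($ per 1M output tokens) quoted in the post.
    # The 1,000-token response length is an assumption for illustration.

    endpoints = {
        # name: (output tokens/sec, USD per 1M output tokens)
        "Cerebras Llama 3.1 8B": (1850, 0.10),
        "Cerebras Llama 3.1 70B": (446, 0.60),
    }

    output_tokens = 1000
    for name, (tps, usd_per_million) in endpoints.items():
        seconds = output_tokens / tps
        cost = output_tokens * usd_per_million / 1_000_000
        print(f"{name}: {seconds:.2f}s per response, ${cost:.6f}")
    ```

    At these speeds a 1,000-token step completes in well under a second, which is what makes chaining many such steps in an agentic workflow practical.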

  • Artificial Analysis

    AI21 Labs launches Jamba 1.5 Mini and Large! These models utilize a hybrid state space/transformer architecture to maintain output speed with long context inputs.

    Both Mini and Large launch with 256K context windows, the largest of all major open weights models. Their hybrid state space/transformer architecture supports the models in maintaining high performance over long prompt lengths. Jamba 1.5 is leading in speed compared to models of a similar intelligence class (comparing to the median performance across providers for each model).

    Artificial Analysis has independently benchmarked Jamba 1.5 across quality, API performance and price. See below for further charts and a link to our full analysis of AI21’s models.

  • Artificial Analysis

    Ideogram’s v2 Text to Image model, released today, has launched on our Image Arena! We have crowdsourced >200k user preferences to independently evaluate the quality of Text to Image models.

    Stay tuned for how Ideogram's new model compares to Midjourney v6.1, Black Forest Labs' FLUX.1 [pro], Stability AI's Stable Diffusion 3 Medium and other models as preferences are submitted in the coming days. Early comparisons suggest a strength of Ideogram v2 is adding text clearly to images, in line with the style of the image (as in the image generated below), something Text to Image models have historically struggled with.

    We have also commenced performance benchmarking of Ideogram’s API, which is currently in beta. Generations are priced at $80 per 1k images, in line with Midjourney. See below for a link to join Artificial Analysis’ Text to Image Arena 👇

  • Artificial Analysis

    Groq has just launched its record-breaking Distil-Whisper endpoint! With a Speed Factor of 240x, it is the fastest Speech to Text endpoint we have benchmarked. It is also the lowest-cost Speech to Text endpoint we benchmark, at $0.33 per 1,000 minutes of audio. This means you could transcribe all the Star Wars movies (~27 hours) in under 7 minutes for less than $1 ($0.53).

    However, it is important to note that Distil-Whisper is English-language only and has a higher Word Error Rate than Whisper v3. Per our independent quality evaluation, Distil-Whisper has a Word Error Rate of 12.7%, higher than Whisper v3’s 10.3%. As such, this endpoint is suited to English-language use cases that prioritize speed and cost over a marginal decrease in accuracy.

    Link to our analysis below 👇
