Artificial Analysis’ Post

Announcing the Artificial Analysis Text to Speech leaderboard, including our Speech Arena to crowdsource our quality ranking! We’re covering leading speech models from ElevenLabs, OpenAI, Cartesia, LMNT, Google, Amazon Web Services (AWS) and Azure, along with open speech models like MetaVoice and StyleTTS hosted on Replicate and fal.

We expect voice-enabled AI experiences to grow dramatically in importance over the coming years, so we’re excited to launch full benchmarking coverage of Text to Speech to sit alongside our LLM and Speech to Text benchmarking.

Quality will be compared using an Elo score based on data from our new Speech Arena. After each vote, you can see which model you preferred, and after 30 results you can see your own personal ranking of the models. You might learn some interesting facts along the way too!

Link to Speech Arena: https://lnkd.in/e3z5vagp

Results will start showing on the main leaderboard as soon as we’ve collected votes on more than 100 comparisons for each model.

For the leaderboard, as usual, we’re analyzing speech models across quality, price and speed. See the posts below for highlights from our price and speed (API performance) benchmarking.
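For readers curious how pairwise arena votes can turn into a ranking, here is a minimal sketch of a standard online Elo update. The post does not say how Artificial Analysis actually computes its score, so the K-factor, the starting rating, the vote format and all function names below are assumptions made for illustration only.

```python
from collections import defaultdict

K = 32          # assumed update step size; not stated in the post
BASE = 1000.0   # assumed starting rating for every model


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(votes):
    """Apply Elo updates for (winner, loser) model-name pairs, in vote order.

    Returns (ratings, counts), where counts is the number of comparisons
    each model has taken part in.
    """
    ratings = defaultdict(lambda: BASE)
    counts = defaultdict(int)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = K * (1.0 - e_w)   # larger gain for an unexpected win
        ratings[winner] += delta
        ratings[loser] -= delta
        counts[winner] += 1
        counts[loser] += 1
    return dict(ratings), dict(counts)


def leaderboard(ratings, counts, min_comparisons=100):
    """Rank only models with more than `min_comparisons` comparisons."""
    eligible = [(m, r) for m, r in ratings.items() if counts[m] > min_comparisons]
    return sorted(eligible, key=lambda item: item[1], reverse=True)


# Hypothetical votes between placeholder model names.
ratings, counts = rate([("model_a", "model_b"), ("model_a", "model_c")])
print(leaderboard(ratings, counts, min_comparisons=0))
```

The `min_comparisons` filter mirrors the rule above that a model only appears once it has more than 100 comparisons. A production ranking might instead fit all votes jointly rather than updating one vote at a time.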

See here for our Text to Speech Leaderboard: https://artificialanalysis.ai/text-to-speech

You can see our benchmarking of price and API performance (speed) there. Once we have received 100 votes for each model in our Speech Arena, we will share our quality comparison of the models, including on a relative basis: Quality vs. Price and Quality vs. Speed. Let us know if there are additional models or API providers we should add!

Text to Speech model API performance (speed) varies by roughly 150x between endpoints, from ~4 characters per second to over 600 characters per second. Unlike what we see in language models, speech models from Google and Amazon Web Services (AWS) lead our initial speed rankings. None of the APIs we test for open source speech models appear to be competitive on speed compared to proprietary solutions. We hope the Artificial Analysis Text to Speech leaderboard will inspire the open source community to close that gap over time!

Text to Speech model prices span a range of nearly 100x, from $2.84 per million characters to over $200 per million characters. Cheaper options include the ‘Standard’ voices from Google and Amazon that utilize traditional speech synthesis techniques, while more expensive options are typically more recently launched models based on neural network approaches. Stay tuned to see how the latest premium models from Google and Amazon compare to start-ups like ElevenLabs, Cartesia and LMNT!
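To make the per-character figures concrete, the sketch below estimates cost and generation time for a workload at the two ends of the quoted ranges. The 500,000-character workload and the function names are hypothetical, and the $200 and 600 characters-per-second figures are the "over" bounds quoted in these posts, so the upper cost is a floor and the faster time is a ceiling.

```python
def tts_cost_usd(num_characters: int, price_per_million_chars: float) -> float:
    """Cost in USD for synthesizing `num_characters` at a per-million-character price."""
    return num_characters / 1_000_000 * price_per_million_chars


def tts_seconds(num_characters: int, chars_per_second: float) -> float:
    """Time in seconds to generate `num_characters` at a given throughput."""
    return num_characters / chars_per_second


chars = 500_000  # hypothetical workload, roughly a long audiobook

cheapest = tts_cost_usd(chars, 2.84)     # lowest quoted price
priciest = tts_cost_usd(chars, 200.0)    # "over $200", so a lower bound
slowest = tts_seconds(chars, 4.0)        # ~4 characters per second
fastest = tts_seconds(chars, 600.0)      # "over 600", so an upper bound on time

print(f"cost: ${cheapest:.2f} to at least ${priciest:.2f}")
print(f"time: {slowest / 3600:.1f} h down to about {fastest / 60:.1f} min")
```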

Tong Zhang

Data Product | AI Benchmarking | Startmate Alumni

1mo

Congrats team! Truly cutting edge work. To everyone coming across this - we’re very excited about your input! Please don’t hesitate to comment or message the page with any thoughts, questions, or feedback.

Ariel Cai

AI GTM | Stanford GSB

1mo

Thanks team! Great work.🚀👏

Vuong Ngo

Help business to streamline AI transformation

1mo

Awesome work! 🎉
