🚀 Introducing Athene-70B: Redefining Post-Training for Open Models! We’re thrilled to release Athene-Llama3-70B, an open-weight chat model fine-tuned from Meta's Llama-3-70B. With an impressive Arena-Hard score of 77.8%, Athene-70B is approaching top proprietary models like GPT-4o and Claude-3.5-Sonnet. Experience it now in public testing on Chatbot Arena! Blog: https://lnkd.in/g-uHVwEV HuggingFace Model: https://lnkd.in/gwuB5QiP Discord: https://lnkd.in/gJwN2X7w 🌟 #AI #MachineLearning #LLM #Nexusflow #Athene70B #ChatbotArena #Llama3
Nexusflow
Software Development
Palo Alto, California 1,791 followers
Democratize GenAI Agents for Enterprises
About us
Nexusflow's solution enables Generative AI agents that surpass GPT-4 in your workflow and that update continuously and automatically, with security guardrails built in.
- Website: https://nexusflow.ai/
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: Palo Alto, California
- Type: Privately Held
Locations
- Primary: Palo Alto, California, US
Updates
-
🚀 Exciting News! Check out our brand new short course on function calling, the foundation for AI agents, created by our CEO Jiantao Jiao and founding engineer Venkat Srinivasan, in collaboration with DeepLearning.AI and the legendary Andrew Ng! 🌟
Function calling is a powerful way to extend the capabilities of LLMs and AI agents by letting them use external tools. Our new short course, "Function Calling and Data Extraction with LLMs," created with Nexusflow and taught by Jiantao Jiao and Venkat Srinivasan, demonstrates how to prompt LLMs to form calls to external functions. You'll work with NexusRavenV2-13B, a 13B-parameter open-source model that excels at function calling while being small enough to host locally. You'll use function calling to extract structured data from unstructured text and to access web APIs, and you'll build an end-to-end application that processes customer-service transcripts — analyzing feedback, automating data entry, and enhancing search. Get started here: https://lnkd.in/g9FpiCuH
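To make the idea concrete, here is a minimal, framework-free sketch of the function-calling loop the course teaches: show the model a function's signature and docstring, ask it to reply with a single call, then parse and dispatch that call safely. The prompt format, `render_tool_prompt`, and the example weather tool are illustrative assumptions, not NexusRaven's exact prompt format.

```python
import ast
import inspect

def get_weather(city: str, unit: str = "celsius") -> str:
    """Return the current weather for a city."""
    return f"Weather for {city} in {unit}"

def render_tool_prompt(func, user_query: str) -> str:
    # Show the model the function signature and docstring,
    # then ask it to respond with a single Python-style call.
    sig = f"def {func.__name__}{inspect.signature(func)}"
    doc = inspect.getdoc(func)
    return (
        f'Function:\n{sig}\n    """{doc}"""\n\n'
        f"User query: {user_query}\n"
        "Respond with exactly one call to the function above."
    )

def execute_call(call_str: str, tools: dict):
    # Parse the model's output as a Python expression and dispatch it
    # to a whitelisted tool -- never eval() raw model output.
    node = ast.parse(call_str.strip(), mode="eval").body
    name = getattr(getattr(node, "func", None), "id", None)
    if not isinstance(node, ast.Call) or name not in tools:
        raise ValueError(f"Unexpected call: {call_str}")
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return tools[name](*args, **kwargs)

# Suppose the model replied with this call string:
model_output = "get_weather('Palo Alto', unit='fahrenheit')"
result = execute_call(model_output, {"get_weather": get_weather})
print(result)  # Weather for Palo Alto in fahrenheit
```

The whitelist-and-parse step matters: `ast.literal_eval` only accepts literal arguments, so the model cannot smuggle arbitrary code into a tool call.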
-
Nexusflow reposted this
🔍 What Starling-LM-7B-beta's excellent performance tells us about benchmarks
I compared the performance of Nexusflow's model across various benchmarks. On the Chatbot Arena Leaderboard (https://lnkd.in/eBaze79A), this 7B model impressively outperforms many larger models, including GPT-3.5-Turbo, Mixtral, Gemini Pro, and every fine-tuned version of Llama 2 70B. It is also significantly better than Mistral-7B-Instruct-v0.2, which ranks 30th with an Elo score of 1073.
On my leaderboard (https://lnkd.in/gjWKF-5u), its average score is below Mistral-7B-Instruct-v0.2's, but the evaluation also suggests that Starling is better:
- Much higher AGIEval score (+5.71), and AGIEval is an excellent benchmark.
- Higher BigBench (+2.31) and GPT4All (+2.06) scores; BigBench is about as good as AGIEval.
- Worse TruthfulQA (-10.37), which shows how unreliable that benchmark is. Don't trust it.
On the other hand, Starling doesn't perform much better than OpenChat 3.5 0106, the base model it was fine-tuned from — yet OpenChat is also ranked 30th in the Arena, with an Elo score of 1072. This is a gap the benchmark suite couldn't capture.
Can MT-Bench capture it? Not really: Starling scores 8.12 vs. 7.8 for OpenChat, and it is also outperformed by GPT-3.5-Turbo and Mixtral. Not too surprising, considering MT-Bench evaluates models on a set of multi-turn (conversational) questions.
Then MMLU, surely? Not at all. Starling obtained an MMLU of 65.14 vs. 65.04 for OpenChat, 60.78 for Mistral, and 71.88 for Mixtral.
If I had to guess, I'd say Starling's PPO fine-tuning significantly improves the usefulness (but not the accuracy) of its answers, which these benchmarks don't capture correctly. This points to a gap in the current evaluation pipeline. Rather than introducing yet another evaluation set, it would be more efficient to use an LLM-as-a-judge focused specifically on the usefulness of responses.
That doesn't mean the Chatbot Arena is perfect. For example, it doesn't handle multi-turn conversations, and simply increasing verbosity is known to inflate Elo scores — which is exactly what Starling does, being generally much more verbose than OpenChat. Considering their excellent performance, I'd be very curious to see how 7B merges would rank on the Chatbot Arena Leaderboard (cc Arcee.ai). 👀 🤗 Model: https://lnkd.in/eyC7hfDF
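For intuition on how small the 1073 vs. 1072 gap above really is, the standard Elo formula converts a rating difference into an expected head-to-head win rate. This is textbook Elo math, not Chatbot Arena's exact rating computation:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 1-point gap (Starling at 1073, OpenChat at 1072) is statistical noise:
print(round(elo_expected_score(1073, 1072), 4))  # 0.5014

# A 100-point gap, by contrast, implies roughly a 64% expected win rate:
print(round(elo_expected_score(1173, 1073), 2))  # 0.64
```

In other words, two models one Elo point apart are expected to split head-to-head matchups almost exactly 50/50, which is why the leaderboard alone can't separate Starling from its base model.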
-
Have we really squeezed out the capacity of a compact chat model? Thrilled to see our latest open model, Starling-7B, ranks 13th among all models in Chatbot Arena! 🚀 As a 7B model, Starling surpasses larger open and proprietary models, including Claude-2, GPT-3.5-Turbo, Gemini Pro, Mixtral 8x7B and Llama2-70B, and is currently the best 7B chat model in Chatbot Arena! #Chatbot #AI #LLM
-
Nexusflow reposted this
The beta version of Starling has finally arrived! 🚀 Presenting Starling-LM-7B-beta, our cutting-edge 7B language model fine-tuned with RLHF! 🌟 Also introducing Starling-RM-34B, a Yi-34B-based reward model trained on our Nectar dataset that surpasses our previous 7B RM on all benchmarks. ✨ We've fine-tuned the latest OpenChat model with the 34B reward model, achieving an MT-Bench score of 8.12 while performing better on hard prompts than Starling-LM-7B-alpha. Testing will soon be available on lmsys — please stay tuned! 🔗 HuggingFace links: [Starling-LM-7B-beta] https://lnkd.in/gT2EAN3r [Starling-RM-34B] https://lnkd.in/gxwbcdsi Discord link: https://lnkd.in/g6Q3UiM8 Since the release of Starling-LM-7B-alpha, we've received numerous requests to make the model commercially viable. We're therefore licensing all models and datasets under Apache-2.0, with the condition that they are not used to compete with OpenAI. Enjoy!
-
🚀 Presenting Starling-LM-7B-beta, our new cutting-edge 7B language model fine-tuned with RLHF! 🌟 Also introducing Starling-RM-34B, the workhorse reward model behind Starling-LM-7B-beta, ranking #1 on the latest RewardBench from Nathan Lambert and the Allen Institute for AI (AI2) team. 🔗 HuggingFace links: [Starling-LM-7B-beta] https://lnkd.in/ecM4JXG5 [Starling-RM-34B] https://lnkd.in/erTkfu4N 🔗 Discord link: https://lnkd.in/eBE73FaF 🔗 RewardBench from @allenai_org: https://lnkd.in/eKHJaFjJ Since the release of Starling-LM-7B-alpha, we've received numerous requests to make the model commercially viable. We're therefore licensing all models and datasets under Apache-2.0, with the condition that they are not used to compete with OpenAI. Enjoy!
-
📢 A powerful information-extraction app built by Stefano Fiorucci and the deepset Haystack team, using the #NexusRaven-V2 LLM for function calling! 🔥 We're thrilled to empower high-quality GenAI apps with our compact LLMs and tooling.
🧪🐦⬛📑 𝐅𝐫𝐨𝐦 𝐫𝐚𝐰 𝐭𝐞𝐱𝐭 𝐭𝐨 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐝𝐚𝐭𝐚 𝐰𝐢𝐭𝐡 𝐋𝐋𝐌𝐬
🎯 The challenge
▸ you have a pile of unstructured texts from which you want to extract information in structured form
▸ the desired information can vary dynamically
▸ you want to combine tasks like text classification, NER, summarization, etc.
𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 𝐰𝐢𝐭𝐡 𝐟𝐮𝐧𝐜𝐭𝐢𝐨𝐧 𝐜𝐚𝐥𝐥𝐢𝐧𝐠 𝐜𝐚𝐩𝐚𝐛𝐢𝐥𝐢𝐭𝐢𝐞𝐬 can be flexible tools 🛠️ for this job! Take a look at the notebook: 📓 https://lnkd.in/dXDtqYyE
🗝️ A (personal) journey
🔹 It all began with Kyle McDonald's gist, in which GPT-3.5-turbo was used to extract structured information from an article.
🔹 Fascinated by the idea, I explored open models fine-tuned for function calling: I experimented with Gorilla OpenFunctions to extract information about animals.
🔹 Now, armed with the powerful 🐦⬛ 𝐍𝐞𝐱𝐮𝐬𝐑𝐚𝐯𝐞𝐧 V2 model (by Nexusflow) and #haystack 2.0, I revisited the experiment and made it more challenging.
✨ Results
🔸 Haystack's LLM framework is model-agnostic, so switching models went smoothly
🔸 NexusRaven V2 outperforms Gorilla OpenFunctions for this use case
🔸 Using a statistical model carries some caveats, which I outline in the notebook.
"Let's unlock the potential of unstructured text, one function call at a time." ☝ That last sentence was generated by ChatGPT, but I found it silly and funny. 😁 #largelanguagemodels #informationextraction #llm #genai #opensource
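The core trick in the post — extraction fields that "vary dynamically" — can be sketched without any framework: expose a single extraction function whose parameters are exactly the fields you want, generated from a field-to-description mapping, and parse the model's call back into a record. The field names, prompt wording, and `extract_info` function are hypothetical stand-ins, not the notebook's actual schema or NexusRaven's prompt format:

```python
import ast

# The desired fields can change at runtime: the "function" the model is
# asked to call is generated from this field -> description mapping.
FIELDS = {
    "animal": "the animal the text is about",
    "habitat": "where it lives",
    "num_legs": "number of legs, as an integer",
}

def extraction_prompt(text: str) -> str:
    # Render the dynamic schema as a function stub for the model.
    params = ", ".join(FIELDS)
    docs = "\n".join(f"    {name}: {desc}" for name, desc in FIELDS.items())
    return (
        f"def extract_info({params}):\n"
        f'    """Extract structured data from the text.\n{docs}\n    """\n\n'
        f"Text: {text}\n"
        "Respond with one call to extract_info."
    )

def parse_extraction(call_str: str) -> dict:
    # Turn the model's call string into a plain dict of extracted fields.
    node = ast.parse(call_str.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or node.func.id != "extract_info":
        raise ValueError(f"Unexpected output: {call_str}")
    return {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}

# Suppose the model returned this call:
model_output = "extract_info(animal='octopus', habitat='ocean', num_legs=8)"
print(parse_extraction(model_output))
# {'animal': 'octopus', 'habitat': 'ocean', 'num_legs': 8}
```

Because the schema lives in one dict, adding a classification label or a summary field is a one-line change to `FIELDS` — the same flexibility the Haystack notebook gets from swapping extraction functions.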
-
🚀 Exciting breakthrough in LLM reliability! 🧠 NexusRaven-V2, our cutting-edge function-calling LLM, has set a new standard for minimizing AI hallucinations, surpassing GPT-4's performance in a recent independent third-party benchmark. Dive into our latest blog post to explore how we're pioneering reliable agents with minimal hallucinations: [https://lnkd.in/egUU9wpz] Key highlights: 🏆 Zero hallucinations: NexusRaven-V2 achieved zero hallucinations across 840 tests focused on tool selection and usage — a significant leap over GPT-4's 23 hallucinations. 📈 Higher success rates: it boasts a 9% higher success rate than GPT-4 in information-seeking applications that require meticulous attention to detail, and a 4% higher rate in adversarial scenarios that demand a deep understanding of tool documentation, even with vague tool and API argument names. Try NexusRaven-V2 on Huggingface: [https://lnkd.in/eF6r9qgt] Check out the original third-party benchmark: [https://lnkd.in/ehAM5UAi] #GenAI #LLM #NexusRavenV2 #Technology #Innovation
Towards Reliable Agents, with Minimal Hallucination
nexusflow.ai
-
A sincere thank-you to Ben Lorica 罗瑞卡 for hosting us on your platform, The Data Exchange Podcast, and for a fantastic conversation. We're excited to contribute to the foundation of #AI agents and to democratizing the technology! 🎧 Listen to the full episode, "AI Co-Pilots in Action: Transforming Function Calling in Cybersecurity": https://bit.ly/3vGpU4m
🚀 Elevate your cybersecurity game with NexusRaven-V2 🔐🚨 Jian Zhang of Nexusflow on how their advanced AI co-pilot, with unmatched function calling, can transform your tech strategy. #GenAI #Cybersecurity #infosec #rsac #LLM https://lnkd.in/gJwWYYh5
AI Co-Pilots in Action: Transforming Function Calling in Cybersecurity
http://thedataexchange.media
-
Thank you, Deci AI and Harpreet Sahota 🥑, for featuring NexusRaven-V2 in the top 10 compact & robust models. Stay tuned for what's to come in 2024!
Check out Harpreet Sahota 🥑's latest blog diving into the world of smaller LLMs. Despite their compact size, these models are making waves in performance, challenging our understanding of efficiency and capability in AI. This blog explores LLMs with 1 billion, 3 billion, 7 billion, and 13 billion parameters, covering their training data, popularity metrics, and unique contributions. Don't miss it! 🚀 Read now > https://lnkd.in/gD-2e63M #llms #llm #largelanguagemodels #generativeai