Vellum

Software Development

The developer platform for building Large Language Model applications

About us

Vellum is a developer platform for building production-worthy applications on LLMs like OpenAI’s GPT-4 or Anthropic’s Claude. Use Vellum to save hours on prompt engineering, iterate on prompts in production confidently, and continuously fine-tune for even better results.

Website
https://www.vellum.ai/
Industry
Software Development
Company size
11-50 employees
Headquarters
New York
Type
Privately Held
Founded
2023

Locations

Employees at Vellum

Updates

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "We start with Vellum and end with Vellum." - Max Bryan, VP of Technology and Design at Rentgrata

    A piece of advice I got from a fellow founder in the early days of Vellum was to understand what people do right before and right after using Vellum, and build product to make that whole journey easier.

    I'm so excited to share that Ben Margolit, Max, and their team at Rentgrata built their game-changing real estate chatbot, Ari, using our platform end-to-end. Property management companies can now gain insights into resident satisfaction through Ari instead of having to dig through reams of text.

    Using Vellum throughout their entire AI development lifecycle - from experimentation and evaluation to production monitoring - Rentgrata achieved 100% accuracy, accelerated their development timeline by 50%, and enabled their customers to make data-driven decisions that boost resident happiness. In one powerful example, Rentgrata's insights helped a company avoid a costly $300k window renovation and instead allocate funds more effectively based on true resident priorities.

    We're proud to partner with leaders like Max, Ben, and the Rentgrata team to bring AI to the proptech industry. Excited to see how Ari continues to revolutionize the renter and property manager experience!

    Read more about Rentgrata's test-driven journey from prototype to production here: https://lnkd.in/dzcCSA58

    Rentgrata's Test Driven Journey to a Production-Ready Chatbot

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    🔔 Another exciting customer success story! 🔔 This time with our friends Lior Solomon, Pratik Bhat, Daniel Marashlian, Brian Elmi and Adam Markowitz over at Drata!

    Drata is a leader in security and compliance automation, and they're well on their way to building dozens of AI-powered features for their thousands of customers (including Vellum!).

    Drata faced a common dilemma: how could they build AI capabilities that provide value to their customers without compromising their focus on trust and data security? Traditional tools didn't provide the iterative workflow they were looking for. That's when they discovered Vellum.

    "We know the power of AI, but how do we make it secure and ensure that we're not compromising privacy and security while still providing value? Vellum has been a big part of accelerating that experimentation part, allowing us to validate that a feature is high-impact and feasible," says Pratik from Drata.

    Drata's team follows a test-driven development process that utilizes our platform end-to-end:
    1) Ideate: Find use cases where AI can create customer value
    2) Validate: Spin up an MVP using Vellum's Workflows in a couple of days
    3) Prioritize: Determine quality, cost & latency using Vellum Evaluations
    4) Test: Use the MVP internally and gather feedback. Vellum Deployments help with monitoring, rapid iteration and version control

    I'm excited for this partnership and seeing where Drata goes next with their AI development! Case study details in comments. Reach out if you want to discuss AI use cases for your company.

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    Super excited to finally share our collaboration with Sebastian Lozano, Derek Jones and the rest of the team at Redfin!

    The team at Redfin worked with us for the last 6 months on their AI-powered chatbot, Ask Redfin. Ask Redfin powers conversations for thousands of users every day. Learn more about it here: https://lnkd.in/dZij8SbU

    It was an absolute pleasure working with the Redfin team on developing this chatbot. Given the scale of the chatbot, they made extensive use of our Workflows & Evaluations capabilities without needing to build any of this tooling in-house.

    Using Vellum, Redfin was able to:
    1. Collaborate on prompts to pick the right prompt/model combination
    2. Build complex AI virtual assistant logic by connecting prompts, classifiers, APIs and data manipulation steps
    3. Systematically evaluate prompts pre-production using hundreds of test cases

    I'm excited for this partnership and seeing where Ask Redfin goes next! Case study details in comments. Reach out if you want to discuss advanced AI chatbot development!

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "Vellum has been a game-changer for us. The speed at which we can now iterate and improve our AI-generated content is incredible. It's allowed us to stay ahead of the curve and deliver truly personalized, engaging experiences for our customers." - Daniel Wiener, Founder at Autobound

    🚀 Another day, another happy customer 😃. I'm thrilled to share how Autobound used Vellum's product to achieve a 20x improvement in their end-to-end LLM development cycle. "Vellum is the only product in this space that I'm truly excited about, and I've tried a lot of them," says Daniel.

    Autobound's team can now easily test various prompts and models, manage deployed prompts, and rapidly test on real-world scenarios using live data. This level of efficiency was previously unattainable with other sandbox environments.

    The results speak for themselves. Autobound has reduced the latency of their email generation system by an impressive 4-5x, from 30 seconds down to just 6-7 seconds per email. This dramatic improvement was achieved by experimenting with different models, prompts, and fine-tuned models within Vellum's platform.

    We're excited to be a part of Autobound's success story and look forward to seeing what they accomplish next with Vellum! You can read more about the partnership here: https://lnkd.in/dr_XRZCE

    How Autobound Achieved a 20x Faster End-to-End LLM Iteration Cycle

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "Is Anthropic's Claude 3 Opus really better than OpenAI's GPT-4 Turbo?"

    The AI community has been captivated since Anthropic announced the Claude 3 family of models and shared performance stats from their internal testing. At Vellum we've always maintained that performance on standardized benchmark tasks doesn't equate to performance on YOUR specific tasks. An eval framework is needed to quantitatively compare different prompts, models and prompt chains.

    But we couldn't resist digging in and doing our own benchmarking. In addition to standard benchmarks, we compared these models on 6 specific tasks:
    1) Large context retrieval
    2) Math tests
    3) Document summarization
    4) Data extraction
    5) Graph interpretation
    6) Coding

    Want to see which model came out on top? Read our report here: https://lnkd.in/dh-uSpvf

    How are you running evals across these models today?
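The eval framework the post calls for can start as a simple scoring loop over a fixed test set. Here is a minimal sketch in Python; the model names, test cases, and outputs below are placeholders for illustration, not Vellum's actual benchmark data:

```python
# Toy task-specific eval: exact-match accuracy per model over shared test cases.
test_cases = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 7", "expected": "21"},
]

# Pretend outputs already collected from two models on the same inputs.
model_outputs = {
    "model_a": ["4", "Paris", "20"],
    "model_b": ["4", "Lyon", "21"],
}

def exact_match_accuracy(outputs, cases):
    """Fraction of outputs that exactly match the expected answer."""
    hits = sum(out == case["expected"] for out, case in zip(outputs, cases))
    return hits / len(cases)

scores = {m: exact_match_accuracy(o, test_cases) for m, o in model_outputs.items()}
print(scores)  # each toy model scores 2/3 on this set
```

Exact match is the crudest metric; real eval suites layer in semantic similarity, rubric-based LLM grading, and cost/latency tracking, but the loop structure stays the same.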

    Claude 3 Opus vs GPT-4: Task Specific Analysis

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    👇 Let's talk about logprobs: a powerful tool for building LLM apps that have asymmetric risk exposure.

    Say you operate in a regulated industry and are worried about the impact your LLM chatbot could have if it's not compliant with the law. A common architecture we suggest is a "Guardrail prompt" towards the end of your chatbot. Guardrail prompts check for specific criteria (e.g., is the response compliant? is the tone appropriate?) and help with a safe exit when things are not shaping up as expected. As always, remember to use a fast LLM like GPT-3.5 Turbo or Claude 3 Haiku for your Guardrail prompt, because it will usually be in the critical path.

    You set up a good Guardrail prompt, and you get a response that your chatbot is indeed compliant. But when you look closer, the Guardrail prompt should have said that it's not compliant. The evaluator is incorrect! Here's where logprobs come in.

    Logprobs are an additional property returned in the LLM API response that shows the probability distribution over each token - and you can use them to your advantage. Look at the example in the image. Our simple guardrail prompt is instructed to output a single token: whether the support agent is in compliance with HIPAA or not. The guardrail prompt is only 70% sure of its answer, and when there's an asymmetric risk in being wrong, you may want to override the Guardrail prompt's response. Sometimes we recommend our customers accept Guardrail prompt responses only when the prompt is 99%+ confident.

    How do you set up your prompts and logic to get your systems working reliably in production?
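The logprob-to-confidence check described above can be sketched in a few lines of Python. The token strings and logprob values here are made up for illustration, and the dict only approximates the `logprobs` structure LLM APIs return:

```python
import math

# Hypothetical top logprobs for the guardrail prompt's single output token.
top_logprobs = {"COMPLIANT": -0.357, "NON_COMPLIANT": -1.204}

def token_confidence(logprobs: dict) -> tuple[str, float]:
    """Pick the most likely token and convert its logprob to a probability."""
    token = max(logprobs, key=logprobs.get)
    return token, math.exp(logprobs[token])

token, p = token_confidence(top_logprobs)
CONFIDENCE_FLOOR = 0.99  # only trust the guardrail when it is 99%+ sure

if p >= CONFIDENCE_FLOOR:
    verdict = token
else:
    verdict = "ESCALATE"  # asymmetric risk: take the safe exit instead

print(token, round(p, 2), verdict)  # COMPLIANT is only ~70% likely -> escalate
```

Because `exp(-0.357)` is about 0.70, the guardrail's "COMPLIANT" answer falls well below the 99% floor and the system escalates rather than trusting it.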

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    🤯 Vellum's changelog for March blows my mind 🤯 https://lnkd.in/dy_Gt6Mu

    A key factor in the success of all startups is the rate at which they 🚢. I'm extremely impressed by the work of our engineering team: Noa Flaherty, Sidd Seethepalli, David Vargas, Jackson Hardy, Pei 🤖 Li, Maddie Abboud, Nihala Thanikkal, Ashlee Radka.

    Here's a quick rundown of my favorites:
    ⛓️ Subworkflow Nodes: Easily reuse groups of nodes across multiple Workflows.
    🛑 Workflow Error Nodes: Terminate workflows and raise errors for better control.
    📖 Read-only Workflow Diagrams: Visualize workflow executions for better debugging.
    📐 Workflow Node Mocking: Speed up development by mocking specific nodes.
    🧪 Easier Node Debugging: Test elements individually without running the entire workflow.
    ✂️ Customizable Chunk Settings: Optimize Document Indexes to improve search results.

    Hoping for a huge April and beyond! 🚀

  • Vellum reposted this

    View profile for Anita Kirkovska

    GenAI Growth @ Vellum (YC W23)

    Gemini 1.5 Pro and Claude 3 can process over 1 million tokens with 99% accuracy. Naturally, people are asking: Is RAG dead?

    While some folks think so, we believe that RAG is absolutely here to stay, but the architecture will evolve to accommodate long-context use cases when needed. What is the current state of these two approaches? Here's the TLDR 👇🏻

    💬 Long context allows for retrieval and reasoning in a single call, simplifying and potentially speeding up AI workflows by reducing the need for complex data chunking and retrieval strategies. However, current long-context approaches can increase latency, though innovations in caching and hardware, like new processors, are being developed to make these queries faster and more cost-efficient.

    ⛓️ RAG remains a preferred choice for many developers due to its speed, cost-effectiveness, and simplicity in debugging. It supports up-to-date information retrieval, can easily fix "lost in the middle" issues, and offers deterministic security for sensitive data.

    Read the details in our latest blog post: https://lnkd.in/dnZsXUx9

    Did I miss anything? Let me know in the comments.
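The retrieval step that distinguishes RAG from long-context prompting can be sketched with toy vectors. This is a minimal illustration, assuming precomputed embeddings; the documents and three-dimensional "embeddings" are invented, and a real pipeline would use an embedding model and a vector store:

```python
import math

# Toy corpus mapping documents to made-up embedding vectors.
corpus = {
    "Lease renewals are up 12% this quarter.": [0.9, 0.1, 0.2],
    "Residents frequently mention gym hours.": [0.1, 0.8, 0.3],
    "Window replacements were quoted at $300k.": [0.2, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the top-k documents ranked by similarity to the query vector."""
    ranked = sorted(corpus, key=lambda d: cosine(corpus[d], query_vec), reverse=True)
    return ranked[:k]

# A query "embedding" closest to the gym-hours document.
print(retrieve([0.15, 0.85, 0.25]))
```

Only the top-k chunks are stuffed into the prompt, which is why RAG stays fast and cheap: the model reads a few hundred relevant tokens instead of the whole million-token corpus.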


Similar pages

Funding

Vellum: 3 total rounds

Last Round

Seed

US$ 5.0M

See more info on Crunchbase