Vellum

Software Development

The developer platform for building Large Language Model applications

About us

Vellum is a developer platform for building production-worthy applications on LLMs like OpenAI’s GPT-4 or Anthropic’s Claude. Use Vellum to save hours on prompt engineering, iterate on prompts in production confidently, and continuously fine-tune for even better results.

Website
https://www.vellum.ai/
Industry
Software Development
Company size
11-50 employees
Headquarters
New York
Type
Privately Held
Founded
2023

Locations

Employees at Vellum

Updates

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "We start with Vellum and end with Vellum." - Max Bryan, VP of Technology and Design at Rentgrata

    A piece of advice I got from a fellow founder in the early days of Vellum was to understand what people do right before and right after using Vellum, and build product to make that whole journey easier.

    I'm so excited to share that Ben Margolit, Max, and their team at Rentgrata built their game-changing real estate chatbot, Ari, using our platform end-to-end. Property management companies can now gain insights into resident satisfaction through Ari instead of having to dig through reams of text.

    Using Vellum throughout their entire AI development lifecycle - from experimentation and evaluation to production monitoring - Rentgrata achieved 100% accuracy, accelerated their development timeline by 50%, and enabled their customers to make data-driven decisions that boost resident happiness. In one powerful example, Rentgrata's insights helped a company avoid a costly $300k window renovation and instead allocate funds more effectively based on true resident priorities.

    We're proud to partner with leaders like Max, Ben, and the Rentgrata team to bring AI to the proptech industry. Excited to see how Ari continues to revolutionize the renter and property manager experience!

    Read more about Rentgrata's test-driven journey from prototype to production here: https://lnkd.in/dzcCSA58

    Rentgrata's Test Driven Journey to a Production-Ready Chatbot

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    🔔 Another exciting customer success story! 🔔 This time with our friends Lior Solomon, Pratik Bhat, Daniel Marashlian, Brian Elmi and Adam Markowitz over at Drata!

    Drata is a leader in security and compliance automation, and they're well on their way to building dozens of AI-powered features for their thousands of customers (including Vellum!).

    Drata faced a common dilemma: how could they build AI capabilities that provide value to their customers without compromising their focus on trust and data security? Traditional tools didn't provide the iterative workflow they were looking for. That's when they discovered Vellum.

    "We know the power of AI, but how do we make it secure and ensure that we're not compromising privacy and security while still providing value? Vellum has been a big part of accelerating that experimentation part, allowing us to validate that a feature is high-impact and feasible," says Pratik from Drata.

    Drata's team follows a test-driven development process that utilizes our platform end-to-end:
    1) Ideate: Find use cases where AI can create customer value
    2) Validate: Spin up an MVP using Vellum's Workflows in a couple of days
    3) Prioritize: Determine quality, cost & latency using Vellum Evaluations
    4) Test: Use the MVP internally and gather feedback. Vellum Deployments help with monitoring, rapid iteration and version control

    I'm excited for this partnership and seeing where Drata goes next with their AI development! Case study details in comments. Reach out if you want to discuss AI use cases for your company.

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    Super excited to finally share our collaboration with Sebastian Lozano, Derek Jones and the rest of the team at Redfin!

    The team at Redfin worked with us for the last 6 months on their AI-powered chatbot, Ask Redfin. Ask Redfin powers conversations for thousands of users every day. Learn more about it here: https://lnkd.in/dZij8SbU

    It was an absolute pleasure working with the Redfin team on developing this chatbot. Given the scale of the chatbot, they made extensive use of our Workflows & Evaluations capabilities without needing to build any of this tooling in-house.

    Using Vellum, Redfin was able to:
    1. Collaborate on prompts to pick the right prompt/model combination
    2. Build complex AI virtual assistant logic by connecting prompts, classifiers, APIs and data manipulation steps
    3. Systematically evaluate prompts pre-production using hundreds of test cases

    I'm excited for this partnership and seeing where Ask Redfin goes next! Case study details in comments. Reach out if you want to discuss advanced AI chatbot development!

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "Vellum has been a game-changer for us. The speed at which we can now iterate and improve our AI-generated content is incredible. It's allowed us to stay ahead of the curve and deliver truly personalized, engaging experiences for our customers." - Daniel Wiener, Founder at Autobound

    🚀 Another day, another happy customer 😃. I'm thrilled to share how Autobound used Vellum's product to achieve a 20x improvement in their end-to-end LLM development cycle. "Vellum is the only product in this space that I'm truly excited about, and I've tried a lot of them," says Daniel.

    Autobound's team can now easily test various prompts and models, manage deployed prompts, and rapidly test on real-world scenarios using live data. This level of efficiency was previously unattainable with other sandbox environments.

    The results speak for themselves. Autobound has reduced the latency of their email generation system by an impressive 4-5x, from 30 seconds down to just 6-7 seconds per email. This dramatic improvement was achieved by experimenting with different models, prompts, and fine-tuned models within Vellum's platform.

    We're excited to be a part of Autobound's success story and look forward to seeing what they accomplish next with Vellum! You can read more about the partnership here: https://lnkd.in/dr_XRZCE

    How Autobound Achieved a 20x Faster End-to-End LLM Iteration Cycle

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    "Is Anthropic's Claude 3 Opus really better than OpenAI's GPT-4 Turbo?"

    The AI community has been captivated since Anthropic announced the Claude 3 family of models and shared performance stats from their internal testing. At Vellum we've always maintained that performance on standardized benchmark tasks doesn't equate to performance on YOUR specific tasks. An eval framework is needed to quantitatively compare different prompts, models and prompt chains.

    But we couldn't resist digging in and doing our own benchmarking. In addition to standard benchmarks, we compared these models on 6 specific tasks:
    1) Large context retrieval
    2) Math tests
    3) Document summarization
    4) Data extraction
    5) Graph interpretation
    6) Coding

    Want to see which model came out on top? Read our report here: https://lnkd.in/dh-uSpvf

    How are you running evals across these models today?
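The eval framework the post calls for can start as a simple scoring loop over a fixed test set. Here is a minimal sketch in Python; the model names, test cases, and outputs below are placeholders for illustration, not Vellum's actual benchmark data:

```python
# Toy task-specific eval: exact-match accuracy per model over shared test cases.
test_cases = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 7", "expected": "21"},
]

# Pretend outputs already collected from two models on the same inputs.
model_outputs = {
    "model_a": ["4", "Paris", "20"],
    "model_b": ["4", "Lyon", "21"],
}

def exact_match_accuracy(outputs, cases):
    """Fraction of outputs that exactly match the expected answer."""
    hits = sum(out == case["expected"] for out, case in zip(outputs, cases))
    return hits / len(cases)

scores = {m: exact_match_accuracy(o, test_cases) for m, o in model_outputs.items()}
print(scores)  # each toy model scores 2/3 on this set
```

Exact match is the crudest metric; real eval suites layer in semantic similarity, rubric-based LLM grading, and cost/latency tracking, but the loop structure stays the same.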

    Claude 3 Opus vs GPT-4: Task Specific Analysis

    vellum.ai

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    👇 Let's talk about logprobs: a powerful tool for building LLM apps that have asymmetric risk exposure.

    Say you operate in a regulated industry and are worried about the impact your LLM chatbot could have if it's not compliant with the law. A common architecture we suggest is a "Guardrail prompt" towards the end of your chatbot. Guardrail prompts check for specific criteria (e.g., is the response compliant? is the tone appropriate?) and help with a safe exit when things are not shaping up as expected. As always, remember to use a fast LLM like GPT-3.5 Turbo or Claude 3 Haiku for your Guardrail prompt, because it will usually be in the critical path.

    You set up a good Guardrail prompt, and you get a response that your chatbot is indeed compliant. But when you look closer, the Guardrail prompt should have said that it's not compliant. The evaluator is incorrect! Here's where logprobs come in.

    Logprobs are an additional property returned in the LLM API response that shows the probability distribution over each token - and you can use them to your advantage. Look at the example in the image. Our simple guardrail prompt is instructed to output a single token: whether the support agent is in compliance with HIPAA or not. The guardrail prompt is only 70% sure of its answer, and when there's an asymmetric risk in being wrong, you may want to override the Guardrail prompt's response. Sometimes we recommend our customers accept Guardrail prompt responses only when the prompt is 99%+ confident.

    How do you set up your prompts and logic to get your systems working reliably in production?
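The logprob-to-confidence check described above can be sketched in a few lines of Python. The token strings and logprob values here are made up for illustration, and the dict only approximates the `logprobs` structure LLM APIs return:

```python
import math

# Hypothetical top logprobs for the guardrail prompt's single output token.
top_logprobs = {"COMPLIANT": -0.357, "NON_COMPLIANT": -1.204}

def token_confidence(logprobs: dict) -> tuple[str, float]:
    """Pick the most likely token and convert its logprob to a probability."""
    token = max(logprobs, key=logprobs.get)
    return token, math.exp(logprobs[token])

token, p = token_confidence(top_logprobs)
CONFIDENCE_FLOOR = 0.99  # only trust the guardrail when it is 99%+ sure

if p >= CONFIDENCE_FLOOR:
    verdict = token
else:
    verdict = "ESCALATE"  # asymmetric risk: take the safe exit instead

print(token, round(p, 2), verdict)  # COMPLIANT is only ~70% likely -> escalate
```

Because `exp(-0.357)` is about 0.70, the guardrail's "COMPLIANT" answer falls well below the 99% floor and the system escalates rather than trusting it.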

  • Vellum reposted this

    View profile for Akash Sharma

    CEO at Vellum

    🤯 Vellum's changelog for March blows my mind 🤯 https://lnkd.in/dy_Gt6Mu

    A key factor in the success of all startups is the rate at which they 🚢. I'm extremely impressed by the work of our engineering team: Noa Flaherty, Sidd Seethepalli, David Vargas, Jackson Hardy, Pei 🤖 Li, Maddie Abboud, Nihala Thanikkal, Ashlee Radka.

    Here's a quick rundown of my favorites:
    ⛓️ Subworkflow Nodes: Easily reuse groups of nodes across multiple Workflows.
    🛑 Workflow Error Nodes: Terminate workflows and raise errors for better control.
    📖 Read-only Workflow Diagrams: Visualize workflow executions for better debugging.
    📐 Workflow Node Mocking: Speed up development by mocking specific nodes.
    🧪 Easier Node Debugging: Test elements individually without running the entire workflow.
    ✂️ Customizable Chunk Settings: Optimize Document Indexes to improve search results.

    Hoping for a huge April and beyond! 🚀

  • Vellum reposted this

    View profile for Anita Kirkovska

    GenAI Growth @ Vellum (YC W23)

    Gemini 1.5 Pro and Claude 3 can process over 1 million tokens with 99% accuracy. Naturally, people are asking: Is RAG dead?

    While some folks think so, we believe that RAG is absolutely here to stay, but the architecture will evolve to accommodate long-context use cases when needed. What is the current state of these two approaches? Here's the TLDR 👇🏻

    💬 Long context allows for retrieval and reasoning in a single call, simplifying and potentially speeding up AI workflows by reducing the need for complex data chunking and retrieval strategies. However, current long-context approaches can increase latency, though innovations in caching and hardware, like new processors, are being developed to make these queries faster and more cost-efficient.

    ⛓️ RAG remains a preferred choice for many developers due to its speed, cost-effectiveness, and simplicity in debugging. It supports up-to-date information retrieval, can easily fix "lost in the middle" issues, and offers deterministic security for sensitive data.

    Read the details in our latest blog post: https://lnkd.in/dnZsXUx9

    Did I miss anything? Let me know in the comments.
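The retrieval step that distinguishes RAG from long-context prompting can be sketched with toy vectors. This is a minimal illustration, assuming precomputed embeddings; the documents and three-dimensional "embeddings" are invented, and a real pipeline would use an embedding model and a vector store:

```python
import math

# Toy corpus mapping documents to made-up embedding vectors.
corpus = {
    "Lease renewals are up 12% this quarter.": [0.9, 0.1, 0.2],
    "Residents frequently mention gym hours.": [0.1, 0.8, 0.3],
    "Window replacements were quoted at $300k.": [0.2, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the top-k documents ranked by similarity to the query vector."""
    ranked = sorted(corpus, key=lambda d: cosine(corpus[d], query_vec), reverse=True)
    return ranked[:k]

# A query "embedding" closest to the gym-hours document.
print(retrieve([0.15, 0.85, 0.25]))
```

Only the top-k chunks are stuffed into the prompt, which is why RAG stays fast and cheap: the model reads a few hundred relevant tokens instead of the whole million-token corpus.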


Similar pages

Funding

Vellum: 3 total rounds

Last Round

Seed

US$ 5.0M

See more info on Crunchbase