Deasie

Software Development

Automated labeling of unstructured data.

About us

Deasie is a platform for automating the labelling of unstructured data. Deasie’s workflow enables data and business teams to rapidly create best-in-class metadata (with labels that are customisable, auto-suggested, quality-checked, standardised and hierarchical) across enterprise-level volumes of both text and image data. Deasie deploys (on-prem) with some of the largest manufacturing, healthcare and financial services firms in the world, on use cases ranging from data cataloging, to enhancing RAG accuracy, to removing sensitive data ahead of GenAI roll-out.

Website
https://www.deasie.com
Industry
Software Development
Company size
11-50 employees
Headquarters
New York
Type
Privately Held
Founded
2023

Locations

Employees at Deasie

Updates

  • Deasie reposted this

    TechCrunch

    OpenAI, Adobe, and Microsoft have thrown their support behind a California bill requiring tech companies to label AI-generated content. The bill is headed for a final vote in August. AB 3211 requires watermarks in the metadata of AI-generated photos, videos, and audio clips. Lots of AI companies already do this, but most people don’t read metadata. AB 3211 also requires large online platforms, like Instagram or X, to label AI-generated content in a way average viewers can understand.

    OpenAI, Adobe and Microsoft support California bill requiring watermarks on AI content | TechCrunch

    https://techcrunch.com

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    Metadata has a key role to play in RAG, and yet still today even the most advanced data science functions are only just starting to explore this lever for guiding LLMs towards the most relevant chunks of data for a given query.

    A few commonalities between most data science functions we speak with:
    1. They are all building some form of RAG.
    2. Most are still in the phase of trying to move from proof-of-concept RAG to production-ready tools, which often requires handling a step change in data volume.
    3. Only the most advanced teams are considering the role of metadata in their pipelines.
    4. Within those teams who are integrating metadata within their RAG, the labels being used often remain very generic.

    We believe that DS teams who are capable of generating high-quality, standardized metadata (with labels that are customized to both their data and their RAG use case) will have a far easier time bridging the PoC --> Production gap, especially as they scale to typical enterprise data volumes.

    🔬 We recently tested the impact of metadata on retrieval accuracy when asking 30 questions across 10, 50 and 90 complex legal contracts, in each case assessing whether the LLM correctly retrieved the most relevant chunk for answering the question. In the scenario where metadata was included, an LLM was used to evaluate the relevance of each chunk based on the labels added.

    📈 Results:
    -- With ~10 documents, metadata had no impact.
    -- With ~100 documents, metadata increased accuracy by ~13%.

    #metadataforRAG #unstructureddatalabeling #genaigovernance

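The metadata-filtered retrieval described in the experiment above can be sketched in a few lines. This is an illustrative simplification: the actual test used an LLM to judge label relevance, whereas here a plain set intersection over labels stands in for that judge, and all names (`Chunk`, `retrieve`, the sample labels) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    labels: set = field(default_factory=set)  # metadata labels, e.g. {"change_of_control"}

def retrieve(query_terms: set, query_labels: set, chunks: list, k: int = 3) -> list:
    """Rank chunks by metadata-label overlap first, then by naive term overlap.

    A set intersection stands in for the LLM relevance judge used in the
    experiment; the ranking idea (labels guide the retriever toward the
    right chunks) is the same.
    """
    def score(c: Chunk):
        label_hits = len(c.labels & query_labels)
        term_hits = len({t.lower() for t in c.text.split()} & query_terms)
        return (label_hits, term_hits)

    return sorted(chunks, key=score, reverse=True)[:k]
```

With only a handful of documents, term overlap alone usually finds the right chunk, which is consistent with metadata showing no benefit at ~10 documents and a measurable one at ~100.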
  • Deasie reposted this

    Sharon Goldman

    AI reporter at Fortune

    NEW for Fortune: Building today’s massive AI models can cost hundreds of millions of dollars, with projections suggesting it could hit a staggering billion dollars within a few years. Much of that expense is for computing power from specialized chips, typically Nvidia GPUs, of which tens of thousands may be required, costing as much as $30,000 each.

    But companies training AI models, or fine-tuning existing models to improve performance on specific tasks, also struggle with another often-overlooked and rising cost: data labeling. This is a painstaking process in which generative AI models are trained on data affixed with tags so that the model can recognize and interpret patterns.

    Data labeling has long been used to develop AI models for self-driving cars, for example. A camera captures images of pedestrians, street signs, cars, and traffic lights, and human annotators label the images with words like “pedestrian,” “truck,” or “stop sign.” The labor-intensive process has also raised ethics concerns: after releasing ChatGPT in 2022, OpenAI was widely criticized for outsourcing, to Kenyan workers earning less than $2 an hour, the data labeling work that helped make the chatbot less toxic.

    Today’s generic large language models (LLMs) go through an exercise related to data labeling called Reinforcement Learning from Human Feedback, in which humans provide qualitative feedback or rankings on what the model produces. That is one significant source of rising costs, as is the effort involved in labeling private data that companies want to incorporate into their AI models, such as customer information or internal corporate data.

    In addition, labeling highly technical, expert-level data in fields like legal, finance, and healthcare is driving up expenses. That’s because some companies are hiring high-cost doctors, lawyers, PhDs, and scientists to label certain data, or outsourcing the work to third-party companies such as Scale AI, which recently secured a jaw-dropping $1 billion in funding as its CEO predicted strong revenue growth by year-end.

    Thanks to William Falcon, Kjell Carlsson, Ph.D., Neal K. Shah, Matt Shumer, and Bob Rogers for their comments! https://lnkd.in/eygmGJi3

    The hidden reason AI costs are soaring—and it’s not because Nvidia chips are more expensive

    fortune.com

  • Deasie

    Experiment: Impact of few-shot learning on data labeling performance

    ❓ Context: Customers often seek to "fine-tune" Deasie’s data labeling workflow to enhance classification performance. For instance, identifying specific legal clauses in long contracts can be challenging: a base model might handle obvious cases well but struggle with nuanced language requiring expert interpretation.

    🛠 In-context fine-tuning ("few-shot learning"): Few-shot learning is an attractive approach for 'fine-tuning' specific data annotation tasks using a handful of labeled examples, without needing to fine-tune a base model or gather large training datasets.

    💡 Experiment setup: We tested the impact of adding 1-3 labeled examples (e.g., "when you saw this input text, a correct classification would have been Y") through in-context fine-tuning on a model's classification performance. We used GPT-4o with a basic prompt to identify clauses in several hundred legal documents (e.g., Agreement Date, Change of Control, Uncapped Liability, Non-Disparagement).

    📊 Results: From just a handful of examples, we observed a 10% increase in accuracy and a 6% increase in precision. Without the labeled examples, the model tended to produce a higher volume of false positives (reducing overall accuracy). While recall decreased, net F1 increased by 4%.

    Few-shot learning can provide a quick and effective way of enhancing data annotation conducted with LLMs. Examples should be chosen carefully to reflect the trade-off between recall and precision (which of the two matters more will depend on the specific annotation use case).

    #dataannotation #dataclassification #unstructureddatalabeling

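The few-shot setup above boils down to a prompt-assembly step: instructions, then 1-3 labeled examples, then the target text. A minimal sketch follows; the message schema assumes a common OpenAI-style chat format, and the function name and sample clause are illustrative, not Deasie's actual pipeline.

```python
def build_fewshot_messages(clause: str, examples: list, target_text: str) -> list:
    """Assemble a chat-style few-shot prompt for clause classification.

    `examples` is a list of (snippet, "YES"/"NO") pairs, i.e. the labeled
    examples used for in-context 'fine-tuning' as described above.
    """
    messages = [{
        "role": "system",
        "content": (
            f"You are a legal-document classifier. Answer YES if the text "
            f"contains a '{clause}' clause, otherwise answer NO."
        ),
    }]
    # Interleave each labeled example as a user/assistant exchange;
    # the experiment above used 1-3 examples, so cap at 3.
    for snippet, label in examples[:3]:
        messages.append({"role": "user", "content": snippet})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": target_text})
    return messages
```

The returned list can be passed to any chat-completion endpoint; choosing examples that show borderline negatives (not just obvious positives) is one way to manage the precision/recall trade-off the post mentions.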
  • Deasie reposted this

    Brian Mink

    AI Entrepreneur | Executive | Board Director | Attorney | AI Keynote Speaker & Professor

    Don’t miss part 2 of our Times Square billboard 👀

    ❓ How do you make AI a trusted partner in your organization? A few takeaways so far from today’s ALIGN AI Executive Summit in Midtown Manhattan:
    - Leading organizations are already deriving a ton of value from AI models TODAY. Yes, we are in the early stages and there is certainly some hype, but the potential is real and already being validated by models currently in production.
    - Operationalizing AI requires sound data governance, data quality, data observability, and more. Now is the time to take stock of your data pipelines, tech stack, and processes, and position your organization for success.
    - Start with your broader organizational OKRs and KPIs, then identify your highest-impact AI initiatives, not the other way around. Just because a use case is cool and you can do it, doesn’t mean it *aligns* with your business goals.

    Thank you again to our sponsors for making this incredible event possible! If you are feeling paralyzed in your AI journey, you’re not alone, and these are the folks you need to talk to: Deloitte | DataRobot | KUNGFU.AI | EPAM Systems | Fiddler AI | Pure Storage | Monte Carlo | Precisely | Zenlytic | Tecton | Alation | Stevens Institute of Technology | Fivetran | Deasie | Data Science Connect | Amelia Mink

  • Deasie reposted this

    Data Science Connect

    Times Square, we’ve arrived! We’re thrilled to be in the heart of NYC for tomorrow’s ALIGN AI Executive Summit, where we’ll gather some of the city’s brightest minds in data and AI, representing top companies that drive innovation in the Big Apple. 🍎 A huge thank you to our visionary sponsors leading the charge in AI innovation, our brilliant speakers for sharing their wisdom, and the 200+ senior data executives joining us for insightful roundtable discussions on AI’s impact across industries. Deloitte | DataRobot | Pure Storage | Fiddler AI | Monte Carlo | KUNGFU.AI | EPAM Systems | Precisely | Zenlytic | Alation | Fivetran | Tecton | Stevens Institute of Technology | Deasie #GenAI #DataExecutives #NYC #ALIGNAI

  • Deasie reposted this

    Leonard Platzer

    Co-founder | CTO @ Deasie

    It was great to present as a sponsor of the Data Science Connect summit in New York yesterday and share the cool work we’re doing at Deasie with data & engineering leaders from around the globe.

    Three takeaways from many hours of conversations:
    1) In many enterprises, 90%+ of innovation projects are now tied to some form of GenAI.
    2) The top-of-mind issues for data & AI execs are: (a) data readiness, (b) security, and (c) demonstrating tangible RoI from their chosen LLM use cases.
    3) In the data labeling space, one of the biggest challenges companies face is knowing what metadata should be defined in the first place, exacerbated by the frequent disconnect between DS teams and data domain experts (this is where Deasie’s ‘auto-suggested labelling’ workflow received a lot of attention!).

    Excited to continue shaping the conversation around unstructured data management and next-generation metadata tooling!

    #datalabeling #unstructureddata #enterprisegenai

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    In March of this year, Harvard Business Review released a survey conducted across 330+ data leaders discussing their level of ‘data readiness’ for GenAI. The results indicated that only 6% of enterprises had succeeded in getting GenAI into production.

    Fast forward several months, and our ongoing conversations with data & AI practitioners suggest that little progress has been made on this front. As the article rightfully mentioned, “for most organizations it will be a monumental effort to curate, clean, and integrate all unstructured data for use in genAI applications.” The reality is that most enterprises require a step change in their approach to unstructured data management (across metadata, data quality, sensitive data management, versioning, and more) if they are to leverage these internal assets at scale within LLMs.

    Across those taking action, we broadly see two camps of data leaders:
    ⚒ 1) Data foundation-first: invest in company-wide data infrastructure to build a robust foundation that supports a roadmap of use cases to come.
    💡 2) Business use case-first: start by transforming the narrowest data domain required to bring a given use case into production.

    While the first encourages healthy foresight into building the right 'stack', we consistently see greater success when data teams are strongly led by specific business use cases where the RoI of adopting GenAI is very tangible.

    #unstructureddata #genaiadoption #metadatamanagement

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    Deasie has again been voted the number 1 startup globally in Hatcher+'s latest market report for July. Great to receive this recognition 🙌 For data & data science leaders thinking about best-practice RAG, automated data labelling, and next-gen data governance - we'd love to speak with you.

    Hatcher+

    🚀 The Hatcher+ Top 100 for July is out now! 🚀

    👉 Full Top 100 list: https://lnkd.in/gYM7TT9e

    We're excited to unveil the Hatcher+ Top 100 list, celebrating the top startups for July! This AI-powered list highlights the most promising startups from around the globe. Congratulations to the 100 startups for making this month’s list; your hard work and innovation have truly shone through.

    If your startup didn’t make it this time, don’t be discouraged. The Hatcher+ Top 100 is a monthly feature, giving you a new chance to be recognized every month. We encourage you to continue your amazing efforts and try again for the August list. To participate, simply ensure your startup’s profile is up to date on our platform. Our AI will review all active profiles to determine the next top 100 startups. Existing winners can continually improve their Hatcher scores to stay competitive.

    👉 Sign up here: https://lnkd.in/gjdngV5B

    #HatcherTop100 #HatcherPlus #Awards #Startups #Innovation #TechStartups #GlobalStartups #VentureCapital #Investment #TechInnovation #StartupEcosystem #FutureTech #StartupFunding #JulyTop100 #DataDrivenVC #TopStartups #Entrepreneurship #TechNews #Top100Startups

  • Deasie reposted this

    Leonard Platzer

    Co-founder | CTO @ Deasie

    One ongoing challenge in leveraging LLMs for data annotation is the compute cost as data volumes scale. But with recent model releases, this is starting to change.

    At Deasie, we ran a data annotation experiment to compare the classification performance of GPT-4o with its cheaper counterparts: Gemini 1.5 Flash Preview and GPT-4o mini.

    🔬 Set-up: We assessed each model's ability to correctly extract the presence or absence of clauses from ~180 complex legal documents (e.g., Was a minimum commitment term specified? Was an exclusivity clause included?), measuring accuracy, recall, F1 and precision in every case.

    📊 Results: For simpler metadata attributes, performance across the 3 models was comparable. For slightly more nuanced classifications, we observed a marginal drop in all performance metrics for both GPT-4o mini and Gemini 1.5 Flash Preview compared to GPT-4o.

    Despite the performance delta, we believe that the respective 7x and 33x cost differentials of Gemini 1.5 Flash Preview and GPT-4o mini compared to GPT-4o will create increasing opportunity to support LLM-based classification at scale, without compromising on performance. With the recent release of Llama 3.1, this will only become more true.

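The per-model comparison above reduces to computing standard binary classification metrics over clause presence/absence judgments. A minimal sketch of that scoring step, with illustrative inputs (not the experiment's actual data):

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Accuracy, precision, recall and F1 for binary clause presence/absence.

    1 = clause present, 0 = absent; y_true is the ground truth, y_pred the
    model's judgment for each document.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Running this per metadata attribute and per model yields the four metrics compared across GPT-4o, GPT-4o mini and Gemini 1.5 Flash Preview in the experiment.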

Similar pages

Funding

Deasie 2 total rounds

Last Round

Seed

US$ 2.9M

See more info on Crunchbase