Deasie

Software Development

Automated labeling of unstructured data.

About us

Deasie is a platform for automating the labelling of unstructured data. Deasie’s workflow enables data and business teams to rapidly create best-in-class metadata (with labels that are customisable, auto-suggested, quality-checked, standardised and hierarchical) across enterprise-level volumes of both text and image data. Deasie deploys (on-prem) with some of the largest manufacturing, healthcare and financial services firms in the world, on use cases ranging from data cataloging, to enhancing RAG accuracy, to removing sensitive data ahead of GenAI roll-out.

Website
https://www.deasie.com
Industry
Software Development
Company size
11-50 employees
Headquarters
New York
Type
Privately Held
Founded
2023

Locations

Employees at Deasie

Updates

  • Deasie reposted this

    TechCrunch

    OpenAI, Adobe, and Microsoft have thrown their support behind a California bill requiring tech companies to label AI-generated content. The bill is headed for a final vote in August. AB 3211 requires watermarks in the metadata of AI-generated photos, videos, and audio clips. Lots of AI companies already do this, but most people don’t read metadata. AB 3211 also requires large online platforms, like Instagram or X, to label AI-generated content in a way average viewers can understand.

    OpenAI, Adobe and Microsoft support California bill requiring watermarks on AI content | TechCrunch

    https://techcrunch.com

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    Metadata has a key role to play in RAG, and yet still today even the most advanced data science functions are only just starting to explore this lever for guiding LLMs towards the most relevant chunks of data for a given query.

    A few commonalities between most data science functions we speak with:
    1. They are all building some form of RAG.
    2. Most are still in the phase of trying to move from proof-of-concept RAG to production-ready tools, which often requires handling a step change in data volume.
    3. Only the most advanced teams are considering the role of metadata in their pipelines.
    4. Within those teams who are integrating metadata within their RAG, the labels being used often remain very generic.

    We believe that DS teams who are capable of generating high-quality, standardized metadata (with labels that are customized to both their data and their RAG use case) will have a far easier time bridging the PoC --> Production gap, especially as they scale to typical enterprise data volumes.

    🔬 We recently tested the impact of metadata on retrieval accuracy when asking 30 questions across 10, 50 and 90 complex legal contracts, in each case assessing whether the LLM correctly retrieved the most relevant chunk for answering the question. In the scenario where metadata was included, an LLM was used to evaluate the relevance of each chunk based on the labels added.

    📈 Results:
    -- With ~10 documents, metadata had no impact.
    -- With ~100 documents, metadata increased accuracy by ~13%.

    #metadataforRAG #unstructureddatalabeling #genaigovernance

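The metadata-filtered retrieval described in the experiment above can be sketched in a few lines. This is an illustrative simplification: the actual test used an LLM to judge label relevance, whereas here a plain set intersection over labels stands in for that judge, and all names (`Chunk`, `retrieve`, the sample labels) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    labels: set = field(default_factory=set)  # metadata labels, e.g. {"change_of_control"}

def retrieve(query_terms: set, query_labels: set, chunks: list, k: int = 3) -> list:
    """Rank chunks by metadata-label overlap first, then by naive term overlap.

    A set intersection stands in for the LLM relevance judge used in the
    experiment; the ranking idea (labels guide the retriever toward the
    right chunks) is the same.
    """
    def score(c: Chunk):
        label_hits = len(c.labels & query_labels)
        term_hits = len({t.lower() for t in c.text.split()} & query_terms)
        return (label_hits, term_hits)

    return sorted(chunks, key=score, reverse=True)[:k]
```

With only a handful of documents, term overlap alone usually finds the right chunk, which is consistent with metadata showing no benefit at ~10 documents and a measurable one at ~100.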
  • Deasie reposted this

    Sharon Goldman

    AI reporter at Fortune

    NEW for Fortune: Building today’s massive AI models can cost hundreds of millions of dollars, with projections suggesting it could hit a staggering billion dollars within a few years. Much of that expense is for computing power from specialized chips, typically Nvidia GPUs, of which tens of thousands may be required, costing as much as $30,000 each.

    But companies training AI models, or fine-tuning existing models to improve performance on specific tasks, also struggle with another often-overlooked and rising cost: data labeling. This is a painstaking process in which generative AI models are trained on data affixed with tags so that the model can recognize and interpret patterns.

    Data labeling has long been used to develop AI models for self-driving cars, for example. A camera captures images of pedestrians, street signs, cars, and traffic lights, and human annotators label the images with words like “pedestrian,” “truck,” or “stop sign.” The labor-intensive process has also raised ethics concerns: after releasing ChatGPT in 2022, OpenAI was widely criticized for outsourcing, to Kenyan workers earning less than $2 an hour, the data labeling work that helped make the chatbot less toxic.

    Today’s generic large language models (LLMs) go through an exercise related to data labeling called Reinforcement Learning from Human Feedback, in which humans provide qualitative feedback or rankings on what the model produces. That is one significant source of rising costs, as is the effort involved in labeling private data that companies want to incorporate into their AI models, such as customer information or internal corporate data.

    In addition, labeling highly technical, expert-level data in fields like legal, finance, and healthcare is driving up expenses. That’s because some companies are hiring high-cost doctors, lawyers, PhDs, and scientists to label certain data, or outsourcing the work to third-party companies such as Scale AI, which recently secured a jaw-dropping $1 billion in funding as its CEO predicted strong revenue growth by year-end.

    Thanks to William Falcon, Kjell Carlsson, Ph.D., Neal K. Shah, Matt Shumer, and Bob Rogers for their comments! https://lnkd.in/eygmGJi3

    The hidden reason AI costs are soaring—and it’s not because Nvidia chips are more expensive

    fortune.com

  • Deasie

    Experiment: Impact of few-shot learning on data labeling performance

    ❓ Context: Customers often seek to "fine-tune" Deasie’s data labeling workflow to enhance classification performance. For instance, identifying specific legal clauses in long contracts can be challenging: a base model might handle obvious cases well but struggle with nuanced language requiring expert interpretation.

    🛠 In-context fine-tuning ("few-shot learning"): Few-shot learning is an attractive approach for 'fine-tuning' specific data annotation tasks using a handful of labeled examples, without needing to fine-tune a base model or gather large training datasets.

    💡 Experiment setup: We tested the impact of adding 1-3 labeled examples (e.g., "when you saw this input text, a correct classification would have been Y") through in-context fine-tuning on a model's classification performance. We used GPT-4o with a basic prompt to identify clauses in several hundred legal documents (e.g., Agreement Date, Change of Control, Uncapped Liability, Non-Disparagement).

    📊 Results: From just a handful of examples, we observed a 10% increase in accuracy and a 6% increase in precision. Without the labeled examples, the model tended to produce a higher volume of false positives (reducing overall accuracy). While recall decreased, net F1 increased by 4%.

    Few-shot learning can provide a quick and effective way of enhancing data annotation conducted with LLMs. Examples should be chosen carefully to reflect the trade-off between recall and precision (which of the two matters more will depend on the specific annotation use case).

    #dataannotation #dataclassification #unstructureddatalabeling

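The few-shot setup above boils down to a prompt-assembly step: instructions, then 1-3 labeled examples, then the target text. A minimal sketch follows; the message schema assumes a common OpenAI-style chat format, and the function name and sample clause are illustrative, not Deasie's actual pipeline.

```python
def build_fewshot_messages(clause: str, examples: list, target_text: str) -> list:
    """Assemble a chat-style few-shot prompt for clause classification.

    `examples` is a list of (snippet, "YES"/"NO") pairs, i.e. the labeled
    examples used for in-context 'fine-tuning' as described above.
    """
    messages = [{
        "role": "system",
        "content": (
            f"You are a legal-document classifier. Answer YES if the text "
            f"contains a '{clause}' clause, otherwise answer NO."
        ),
    }]
    # Interleave each labeled example as a user/assistant exchange;
    # the experiment above used 1-3 examples, so cap at 3.
    for snippet, label in examples[:3]:
        messages.append({"role": "user", "content": snippet})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": target_text})
    return messages
```

The returned list can be passed to any chat-completion endpoint; choosing examples that show borderline negatives (not just obvious positives) is one way to manage the precision/recall trade-off the post mentions.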
  • Deasie reposted this

    Brian Mink

    AI Entrepreneur | Executive | Board Director | Attorney | AI Keynote Speaker & Professor

    Don’t miss part 2 of our Times Square billboard 👀

    ❓ How do you make AI a trusted partner in your organization? A few takeaways so far from today’s ALIGN AI Executive Summit in Midtown Manhattan:
    - Leading organizations are already deriving a ton of value from AI models TODAY. Yes, we are in the early stages and there is certainly some hype, but the potential is real and already being validated by models currently in production.
    - Operationalizing AI requires sound data governance, data quality, data observability, and more. Now is the time to take stock of your data pipelines, tech stack, and processes, and position your organization for success.
    - Start with your broader organizational OKRs and KPIs, then identify your highest-impact AI initiatives, not the other way around. Just because a use case is cool and you can do it, doesn’t mean it *aligns* with your business goals.

    Thank you again to our sponsors for making this incredible event possible! If you are feeling paralyzed in your AI journey, you’re not alone, and these are the folks you need to talk to: Deloitte | DataRobot | KUNGFU.AI | EPAM Systems | Fiddler AI | Pure Storage | Monte Carlo | Precisely | Zenlytic | Tecton | Alation | Stevens Institute of Technology | Fivetran | Deasie | Data Science Connect | Amelia Mink

  • Deasie reposted this

    Data Science Connect

    Times Square, we’ve arrived! We’re thrilled to be in the heart of NYC for tomorrow’s ALIGN AI Executive Summit, where we’ll gather some of the city’s brightest minds in data and AI, representing top companies that drive innovation in the Big Apple. 🍎 A huge thank you to our visionary sponsors leading the charge in AI innovation, our brilliant speakers for sharing their wisdom, and the 200+ senior data executives joining us for insightful roundtable discussions on AI’s impact across industries. Deloitte | DataRobot | Pure Storage | Fiddler AI | Monte Carlo | KUNGFU.AI | EPAM Systems | Precisely | Zenlytic | Alation | Fivetran | Tecton | Stevens Institute of Technology | Deasie #GenAI #DataExecutives #NYC #ALIGNAI

  • Deasie reposted this

    Leonard Platzer

    Co-founder | CTO @ Deasie

    It was great to present as a sponsor of the Data Science Connect summit in New York yesterday and share the cool work we’re doing at Deasie with data & engineering leaders from around the globe.

    Three takeaways from many hours of conversations:
    1) In many enterprises, 90%+ of innovation projects are now tied to some form of GenAI.
    2) The top-of-mind issues for data & AI execs are: (a) data readiness, (b) security, and (c) demonstrating tangible RoI from their chosen LLM use cases.
    3) In the data labeling space, one of the biggest challenges companies face is knowing what metadata should be defined in the first place, exacerbated by the frequent disconnect between DS teams and data domain experts (this is where Deasie’s ‘auto-suggested labelling’ workflow received a lot of attention!).

    Excited to continue shaping the conversation around unstructured data management and next-generation metadata tooling!

    #datalabeling #unstructureddata #enterprisegenai

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    In March of this year, Harvard Business Review released a survey conducted across 330+ data leaders discussing their level of ‘data readiness’ for GenAI. The results indicated that only 6% of enterprises had succeeded in getting GenAI into production.

    Fast forward several months, and our ongoing conversations with data & AI practitioners suggest that little progress has been made on this front. As the article rightfully mentioned, “for most organizations it will be a monumental effort to curate, clean, and integrate all unstructured data for use in genAI applications.” The reality is that most enterprises require a step change in their approach to unstructured data management (across metadata, data quality, sensitive data management, versioning, and more) if they are to leverage these internal assets at scale within LLMs.

    Across those taking action, we broadly see two camps of data leaders:
    ⚒ 1) Data foundation-first: invest in company-wide data infrastructure to build a robust foundation that supports a roadmap of use cases to come.
    💡 2) Business use case-first: start by transforming the narrowest data domain required to bring a given use case into production.

    While the first encourages healthy foresight into building the right 'stack', we consistently see greater success when data teams are strongly led by specific business use cases where the RoI of adopting GenAI is very tangible.

    #unstructureddata #genaiadoption #metadatamanagement

  • Deasie reposted this

    Reece Griffiths

    Founder | Y-Combinator | ex-McKinsey & QuantumBlack | Automated data labelling

    Deasie has again been voted the number 1 startup globally in Hatcher+'s latest market report for July. Great to receive this recognition 🙌 For data & data science leaders thinking about best-practice RAG, automated data labelling, and next-gen data governance - we'd love to speak with you.

    Hatcher+

    🚀 The Hatcher+ Top 100 for July is out now! 🚀

    👉 Full Top 100 list: https://lnkd.in/gYM7TT9e

    We're excited to unveil the Hatcher+ Top 100 list, celebrating the top startups for July! This AI-powered list highlights the most promising startups from around the globe. Congratulations to the 100 startups for making this month’s list; your hard work and innovation have truly shone through.

    If your startup didn’t make it this time, don’t be discouraged. The Hatcher+ Top 100 is a monthly feature, giving you a new chance to be recognized every month. We encourage you to continue your amazing efforts and try again for the August list. To participate, simply ensure your startup’s profile is up to date on our platform. Our AI will review all active profiles to determine the next top 100 startups. Existing winners can continually improve their Hatcher scores to stay competitive.

    👉 Sign up here: https://lnkd.in/gjdngV5B

    #HatcherTop100 #HatcherPlus #Awards #Startups #Innovation #TechStartups #GlobalStartups #VentureCapital #Investment #TechInnovation #StartupEcosystem #FutureTech #StartupFunding #JulyTop100 #DataDrivenVC #TopStartups #Entrepreneurship #TechNews #Top100Startups

  • Deasie reposted this

    Leonard Platzer

    Co-founder | CTO @ Deasie

    One ongoing challenge in leveraging LLMs for data annotation is the compute cost as data volumes scale. But with recent model releases, this is starting to change.

    At Deasie, we ran a data annotation experiment to compare the classification performance of GPT-4o with its cheaper counterparts: Gemini 1.5 Flash Preview and GPT-4o mini.

    🔬 Set-up: We assessed each model's ability to correctly extract the presence or absence of clauses from ~180 complex legal documents (e.g., Was a minimum commitment term specified? Was an exclusivity clause included?), measuring accuracy, recall, F1 and precision in every case.

    📊 Results: For simpler metadata attributes, performance across the 3 models was comparable. For slightly more nuanced classifications, we observed a marginal drop in all performance metrics for both GPT-4o mini and Gemini 1.5 Flash Preview compared to GPT-4o.

    Despite the performance delta, we believe that the respective 7x and 33x cost differentials of Gemini 1.5 Flash Preview and GPT-4o mini compared to GPT-4o will create increasing opportunity to support LLM-based classification at scale, without compromising on performance. With the recent release of Llama 3.1, this will only become more true.

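The per-model comparison above reduces to computing standard binary classification metrics over clause presence/absence judgments. A minimal sketch of that scoring step, with illustrative inputs (not the experiment's actual data):

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Accuracy, precision, recall and F1 for binary clause presence/absence.

    1 = clause present, 0 = absent; y_true is the ground truth, y_pred the
    model's judgment for each document.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Running this per metadata attribute and per model yields the four metrics compared across GPT-4o, GPT-4o mini and Gemini 1.5 Flash Preview in the experiment.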

Similar pages

Funding

Deasie 2 total rounds

Last Round

Seed

US$ 2.9M

See more info on Crunchbase