Michael Wornow’s Post


Computer Science PhD Student @ Stanford

🧐 Do AI agents understand enterprise workflows?

To help answer this question we introduce 🍞 WONDERBREAD, a benchmark + dataset for evaluating AI agents on tasks that *augment* rather than *replace* human labor.

🌐 Website: https://lnkd.in/enkzH2D7
🗃️ Dataset: https://lnkd.in/e_NHhY6g
✍️ Paper: https://lnkd.in/gwEwhXdt

➖ ➖ ➖ ➖ Why? ➖ ➖ ➖ ➖

Existing benchmarks for AI agents heavily focus on 🤖 full end-to-end workflow automation. While automation is an exciting challenge, this overlooks the fact that simply *defining* the relevant workflow takes ~60% of the time of the typical process improvement project.

🍞 WONDERBREAD fills this gap by evaluating agents on common -- but currently overlooked! -- business process management (BPM) tasks such as:
📝 Automatically generating documentation of existing workflows
🔁 Transferring knowledge between workers
📈 Identifying ways to improve existing workflows

➖ ➖ ➖ ➖ What Is It? ➖ ➖ ➖ ➖

🍞 WONDERBREAD is a benchmark + dataset for BPM tasks. It contains:
🚀 Dataset: 2,928 human demonstrations of 598 distinct workflows
✅ Tasks: 6 BPM tasks covering documentation, knowledge transfer, and process improvement use cases
📊 Evaluation: Automated evaluation pipelines for all tasks

➖ ➖ ➖ ➖ What's Next? ➖ ➖ ➖ ➖

We hope 🍞 WONDERBREAD inspires further exploration of more human-centric enterprise AI tooling beyond automation. And we welcome contributions from the community! Whether it's...
✍️ Defining new BPM tasks
🧑‍💻 Recording demonstrations of new workflows
📊 Evaluating models on our 6 tasks
🤖 Training models on our 2,928 demonstrations

If you want to chat / learn more, please reach out to us at: https://lnkd.in/e3CbRTGC

A big thanks to my co-authors Avanika Narayan, Ishan Khare, Tathagat Verma, Ben Viggiano, Tibor Thompson, Miguel Ángel Fuentes Hernández, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan A., Althea H., Nigam Shah, and the support of Stanford Health Care, Stanford Institute for Human-Centered Artificial Intelligence (HAI), Stanford Artificial Intelligence Laboratory (SAIL).

Tim Tuggey

Owner & Senior Counsel

1mo

There is the work. There is the definition and ongoing redefinition of the work.

Theo Walker

Options Trader at Citadel Securities

1mo

Excellent Mike.

Quinn McIntyre

Math + AI @ Stanford University | Student Researcher @ SAIL | Research Scientist @ Etched | Prod Admin

2mo

Benchmarks like this are sorely needed

Pushing the envelope as always, Mike.

Michael this is seriously exciting, congratulations

Mateo H. Petel

Stanford | Oxford | Harvard | ENS | Paris Dauphine

2mo

Congrats Michael!
