Doug Turnbull’s Post

Doug Turnbull

Search Relevance at Reddit

People think there's a one-size-fits-all form of "offline experiments" in search, recsys, RAG, etc. In reality, your job is to rethink the methodology for _every_ experiment, weighing not just accuracy but the entire cost-benefit tradeoff. Off the top of my head, these are all very different things:

* Getting a quick sense of a change before doing an A/B test - you can literally just gut-check that the change hits the queries you expect
* Training / evaluating a ranking model - where "NDCG" truly has to be rock solid (see the sketch below)
* Debugging precision / recall - i.e. "why is search behaving this way" - an offline metric like NDCG is a rough guide, but just one tool in your debugging arsenal
* Opportunity analysis + planning - you just want a quick idea of whether a signal exists, erring toward potential rather than accuracy
* Leaderboard / competition - the methodology is more-or-less done for you, and you're optimizing in one direction (often ignoring many other factors)
* Improving a system where NO A/B test exists - labels can help you guide and debug, maybe even build a leaderboard, but side-by-side qualitative analysis with your team is also really valuable

Importantly, every methodology has severe limitations but very specific benefits where it's best suited.
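For the training / eval case, here's a minimal sketch of what "rock solid NDCG" rests on, assuming you have graded relevance judgments keyed by doc id (the doc ids and labels below are made up for illustration). Even in a ~15-line metric, choices like how to treat unjudged documents are methodology decisions that quietly change your numbers.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k gains, log2 discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """NDCG@k for a single query.

    ranked_doc_ids: doc ids in the order your system returned them.
    judgments: dict of doc_id -> graded relevance label (e.g. 0-3).
    Unjudged docs are treated as non-relevant (gain 0) - one of several
    reasonable conventions, and exactly the kind of choice to make explicit.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Hypothetical query: labels and ranking are invented for this example.
judgments = {"doc_a": 3, "doc_b": 2, "doc_c": 0, "doc_d": 1}
ranking = ["doc_b", "doc_a", "doc_x", "doc_d"]  # doc_x is unjudged
print(round(ndcg_at_k(ranking, judgments, k=4), 3))  # ~0.908
```

Averaging this over a query set is where the methodology questions really start: which queries, whose labels, how deep the judgment pool goes, and how you handle ties and unjudged docs.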
