Eden Yavin’s Post

Applied Researcher @ Accenture Cyber Labs | M.Sc. AI and Autonomous systems @ BGU

I agree - searching deeply for ways to get more out of an existing dataset is better than searching broadly for new data to add to it. The method seems interesting because, intuitively, we humans do the same thing in other domains, not just NLP. The difficult part is building the skill-dependency graph in domains other than NLP. Interesting and worth monitoring!

Yoel Zeldes

Research Engineer (NLP) @ Google DeepMind | Content Creator @ One Shot Learning

We all know that data is one of the most important ingredients in building a powerful LLM. But did you know that once you have the data, you can use it for training far more effectively than by randomly sampling from it?

In "Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models" the authors propose a neat framework for curriculum learning: instead of randomly sampling your training data, identify which skills can be learned from the data and which dependencies exist between those skills, so you can train on prerequisite skills before moving on to more advanced ones. Intuitively, this is what humans do, and I'm glad to see a practical approach that translates it into the LLM domain.

Why does it matter? Because gathering more relevant, high-quality data is becoming increasingly challenging, as several research papers have pointed out. Using the data you already have more efficiently is therefore one way to keep improving.

The paper's weakness is that it doesn't align with recent research on how to evaluate LLMs meaningfully. Most of the experiments measure perplexity, but what really matters is evaluating the models' actual generations on real use cases. I hope we'll see extensions of the paper in these directions in the near future. 🤞
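To make the "prerequisite skills first" idea concrete, here is a minimal sketch of ordering training batches by a skill-dependency graph. This is only an illustration, not the paper's actual algorithm (Skill-it's own method goes further, e.g., adaptively adjusting the mixture of skills during training); the toy graph, skill names, and helper functions below are assumptions made up for the example.

```python
import random

# Toy skill dependency graph: each skill lists its prerequisite skills.
# (Illustrative only -- in Skill-it such a graph is defined over skills
# identified in the training data.)
skill_graph = {
    "spanish_qa": ["spanish", "english_qa"],
    "english_qa": [],
    "spanish": [],
}

def topological_order(graph):
    """Order skills so every prerequisite comes before the skills that need it."""
    order, visited = [], set()

    def visit(skill):
        if skill in visited:
            return
        visited.add(skill)
        for prereq in graph.get(skill, []):
            visit(prereq)
        order.append(skill)

    for skill in graph:
        visit(skill)
    return order

def curriculum_batches(data_by_skill, graph, batch_size=8):
    """Yield training batches that walk the skills in prerequisite order,
    instead of sampling uniformly from the whole dataset."""
    for skill in topological_order(graph):
        examples = list(data_by_skill.get(skill, []))
        random.shuffle(examples)
        for i in range(0, len(examples), batch_size):
            yield skill, examples[i:i + batch_size]

# Usage with dummy data: each skill maps to its training examples.
data_by_skill = {
    "english_qa": [f"en_qa_{i}" for i in range(10)],
    "spanish":    [f"es_{i}" for i in range(10)],
    "spanish_qa": [f"es_qa_{i}" for i in range(10)],
}
for skill, batch in curriculum_batches(data_by_skill, skill_graph, batch_size=4):
    print(skill, batch)
```

The point of the sketch is simply that the sampler consults the graph rather than the raw data distribution: once you know which skills depend on which, the order (or mixture weights) of training data falls out of the graph instead of being uniform.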
