From the course: AWS Certified Machine Learning - Specialty (MLS-C01) Cert Prep: 3 Modeling


Train validation test split, cross-validation

- [Instructor] A core competency for doing machine learning is understanding the train/test split. An easy way to think about this is that you're going to divide the data following an 80/20 rule: 80% of the data, randomly selected, will be used to train the model. You'll use this training data to find your best accuracy by tweaking hyperparameters and selecting the correct model. The remaining 20% is held out so that it's never used to train the model; then you'll test your model's performance on it. What happens here is you get a more realistic understanding of how your model will perform in the real world. Even in this scenario, though, you could still have overfitting problems, because the randomly selected data could be subject to selection bias, right? What if you randomly selected data that's too similar to itself, which could happen? So one of the ways that you could solve…
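The random hold-out split and the k-fold cross-validation idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code; the function names, the 80/10/10 proportions, and the fold count are assumptions chosen for the example.

```python
import numpy as np

def train_val_test_split(n, val_frac=0.1, test_frac=0.1, seed=42):
    """Randomly shuffle indices, then hold out validation and test sets;
    the remainder (80% here) is used to train the model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

def kfold_indices(n, k=5, seed=42):
    """Yield (train, validation) index pairs for k-fold cross-validation,
    so every example serves as validation data exactly once. This reduces
    the selection-bias risk of a single random split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Usage sketch on a hypothetical dataset of 100 rows:
train_idx, val_idx, test_idx = train_val_test_split(100)
for train, val in kfold_indices(100, k=5):
    pass  # fit on `train`, evaluate on `val`, then average the scores
```

In practice a library routine such as scikit-learn's `train_test_split` or `KFold` would be used instead, but the index bookkeeping above is all those helpers are doing under the hood.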
