You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model’s accuracy dropped to 66%. How can you make your production model more accurate?
- Normalize the data for the training and test datasets as two separate steps.
- Split the training and test data based on time rather than a random split to avoid leakage.
- Add more data to your test set to ensure that you have a fair distribution and sample for testing.
- Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.
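
To make the options concrete, here is a minimal sketch of what a time-based split looks like for hourly readings, with the normalization statistics computed from the training slice only and then reused on the test slice. It is an illustration under stated assumptions, not a reference implementation: the `(timestamp, value)` row layout, the synthetic readings, the 20% holdout, and the helper names `chronological_split`, `fit_scaler`, and `apply_scaler` are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def chronological_split(rows, test_fraction=0.2):
    """Sort (timestamp, value) rows by time and hold out the latest slice for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[0])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


def fit_scaler(train_rows):
    """Compute mean and standard deviation from the training rows only."""
    values = [value for _, value in train_rows]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant series
    return mu, sigma


def apply_scaler(rows, mu, sigma):
    """Standardize rows with statistics that were fitted elsewhere."""
    return [(ts, (value - mu) / sigma) for ts, value in rows]


# Hypothetical data: ten days of hourly temperature readings.
start = datetime(2024, 1, 1)
readings = [
    (start + timedelta(hours=h), 10.0 + (h % 24) * 0.5)
    for h in range(240)
]

train, test = chronological_split(readings, test_fraction=0.2)
mu, sigma = fit_scaler(train)                 # statistics come from the past only
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)   # same statistics reused on the future slice
```

With a random split, readings from neighboring hours of the same day can land on both sides of the split, so the test score may reflect information the model would not have in production. In this sketch every test timestamp is later than every training timestamp.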