Why do we clean our data before training?
Removing rows of data makes our model more powerful
Cleaning data helps us select features that will help the performance of the model
Removing rows that have errors prevents these rows from misleading the training process
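As a rough illustration of what removing rows with errors can look like in practice, here is a minimal sketch in plain Python. The column names, the values, and the `is_valid` helper are hypothetical and chosen only for this example; real cleaning rules depend on the dataset.

```python
# Hypothetical dataset: each row is a dict of measurements.
rows = [
    {"height_cm": 172.0, "weight_kg": 68.0},
    {"height_cm": None, "weight_kg": 70.0},   # missing value
    {"height_cm": -5.0, "weight_kg": 80.0},   # physically impossible value
    {"height_cm": 181.0, "weight_kg": 77.5},
]


def is_valid(row):
    """Treat a row as valid only if every value is present and positive."""
    return all(value is not None and value > 0 for value in row.values())


# Keep only the rows that pass the validity check.
clean_rows = [row for row in rows if is_valid(row)]
print(len(rows), "rows before cleaning,", len(clean_rows), "rows after")
```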
What kind of data are best encoded with one-hot vectors?
Ordinal data
Categorical data with two possible values
Categorical data with three or more values
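For reference, the mechanics of one-hot encoding can be sketched in a few lines of plain Python. The `one_hot_encode` function and the color values are hypothetical, used here only to show the shape of the output; libraries such as pandas or scikit-learn provide their own encoders.

```python
def one_hot_encode(values):
    """Return the sorted category list and one vector per input value.

    Each vector has a single 1 at the position of the value's category
    and 0 everywhere else.
    """
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[position[value]] = 1
        vectors.append(vector)
    return categories, vectors


# Hypothetical categorical column.
categories, vectors = one_hot_encode(["red", "green", "blue", "green"])
for value, vector in zip(["red", "green", "blue", "green"], vectors):
    print(value, vector)
```

Each category gets its own column, so no artificial ordering is imposed between the categories.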
What is a data sample? What is a population?
A sample is all possible data we care about. A population is the subset of that data which we actually have on hand.
Both population and sample refer to data we use to train our model.
A population is all possible data we care about. A sample is the subset of that data which we actually have on hand.
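The relationship between the two terms can be sketched with simulated data. Everything below is an assumption made for illustration: the simulated heights, the sizes, and the fixed random seed are not from the original material.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "full" collection of values (simulated heights in cm).
all_values = [random.gauss(170, 10) for _ in range(10_000)]

# The smaller collection we actually have on hand, drawn at random.
subset = random.sample(all_values, 100)

mean_all = sum(all_values) / len(all_values)
mean_subset = sum(subset) / len(subset)

# The subset mean is only an estimate of the mean of the full collection.
print(f"mean of all values: {mean_all:.2f}")
print(f"mean of subset:     {mean_subset:.2f}")
```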
You have a model that does not perform well. Which of these will definitely not help improve its performance?
Adding additional samples (rows)
Adding a small number of features (columns) that you know relate to what the model is trying to predict
Adding a large number of features that you know have no relation to what the model is trying to predict