CST 383: learning log 5
This week focused on practical steps around machine learning, especially data cleaning, scaling, and model evaluation. I learned how Pandas represents missing data using NaN and None, and how isna(), dropna(), and fillna() are used to find and treat missing values. It stood out that there is no single correct way to handle missing data; the choice depends on context.
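To make this concrete for myself, here is a small sketch (the DataFrame and its values are made up):

```python
import numpy as np
import pandas as pd

# Made-up DataFrame: NaN for missing numbers, None for missing objects
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Salinas", "Monterey", None, "Marina"],
})

print(df.isna())        # boolean mask of missing values
print(df.isna().sum())  # count of missing values per column

# Different contexts call for different treatments:
dropped = df.dropna()                            # drop rows with any missing value
filled = df.fillna({"age": df["age"].mean(),     # impute the column mean for numeric
                    "city": "unknown"})          # placeholder for categorical
```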
Scaling also became clearer to me this week. Unit scaling and z-score normalization make features comparable, which is especially important for distance-based models. The idea of a z-score as measuring how far a value is from the mean, in standard deviations, finally clicked, though I still need more practice with it.
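A quick sketch with made-up numbers to keep the two scalings straight:

```python
import numpy as np

x = np.array([10.0, 12.0, 15.0, 20.0, 28.0])  # made-up feature values

# Unit (min-max) scaling: map values into [0, 1]
unit_scaled = (x - x.min()) / (x.max() - x.min())

# Z-score normalization: z = (x - mean) / std
z = (x - x.mean()) / x.std()
print(z)  # each value expressed as its distance from the mean in standard deviations
```

Either way, every feature ends up on a comparable range, so no single feature dominates a distance calculation.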
K-nearest neighbors also helped connect scaling and distance. I now understand why the choice of k matters and why the predictors need to be on the same scale. The evaluation section was especially useful: accuracy alone can be misleading, and confusion matrices, precision, recall, and the F1 score give a more complete picture of how a classifier performs.
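To tie the pieces together, here is a sketch of the whole pipeline using scikit-learn's built-in breast cancer dataset (the dataset and k=5 are my own arbitrary choices for illustration, not from class):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on training data only, then apply the same transform to test data
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)  # k is a tuning choice
knn.fit(X_train_s, y_train)
pred = knn.predict(X_test_s)

# Accuracy alone can mislead, so look at the full set of metrics
print(confusion_matrix(y_test, pred))
print("accuracy: ", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred))
print("recall:   ", recall_score(y_test, pred))
print("F1:       ", f1_score(y_test, pred))
```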
Overall, this week helped me understand that good machine learning depends just as much on data preparation and evaluation as it does on the model itself.