CST383: learning log 3
This week was about exploring data quickly and choosing the right visualization. Using df.info(), df.describe(), and df.nunique() helped me get a fast sense of a dataset, especially for figuring out which variables are actually continuous and which are categorical, even when everything is stored as numbers. For single continuous variables, I got more comfortable with density plots, histograms, and box plots. Density plots are good for seeing the overall shape, but the bandwidth can change how the plot looks a lot, so it’s easy to overinterpret noise. To me, histograms are easier to read, but they depend heavily on the choice of bins. Box plots are useful for showing medians, quartiles, and outliers, but they don’t show shape very well. The skew examples made it clear to me why log transforms are useful, even though reading log axes still takes practice. With two continuous variables, things clicked more when we worked with joint and conditional probabilities directly in Pandas using boolean conditions and .mean()....
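To pin down a few of these ideas for myself, here is a small pandas sketch on a made-up DataFrame. The columns (rooms, price, area), all the values, the helper names likely_categorical, joint_prob and cond_prob, and the max_unique=5 cutoff are hypothetical choices for illustration, not anything from the course. The core trick is that the mean of a boolean Series is the fraction of True values, so P(A and B) can be written as (A & B).mean() and P(A | B) as A[B].mean(). Comparing Series.skew() on the raw and log10-transformed price column is one quick way to see a log transform pull in a long right tail.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not a course dataset).
# 'rooms' is stored as an integer but takes only a handful of values;
# 'price' and 'area' are continuous, and 'price' has one large outlier.
df = pd.DataFrame({
    "rooms": [1, 2, 2, 3, 3, 3, 4, 2, 1, 3],
    "price": [120, 150, 160, 210, 220, 250, 900, 140, 110, 190],
    "area":  [40, 55, 60, 80, 85, 90, 200, 50, 35, 88],
})


def likely_categorical(frame: pd.DataFrame, max_unique: int = 5) -> list:
    """Return columns with few distinct values (an assumed rule of thumb)."""
    counts = frame.nunique()
    return list(counts[counts <= max_unique].index)


def joint_prob(a: pd.Series, b: pd.Series) -> float:
    """P(A and B): the mean of a boolean Series is its fraction of True values."""
    return float((a & b).mean())


def cond_prob(a: pd.Series, given: pd.Series) -> float:
    """P(A | B): keep only the rows where B is True, then average A."""
    return float(a[given].mean())


# Boolean conditions on continuous columns act as the "events".
big = df["area"] > 70
pricey = df["price"] > 200

# Skewness before and after a log10 transform of the right-skewed column.
raw_skew = df["price"].skew()
log_skew = np.log10(df["price"]).skew()

if __name__ == "__main__":
    df.info()
    print(df.describe())
    print("likely categorical:", likely_categorical(df))
    print("P(big and pricey) =", joint_prob(big, pricey))
    print("P(pricey | big)   =", cond_prob(pricey, big))
    print("P(big) * P(pricey | big) =", big.mean() * cond_prob(pricey, big))
    print(f"skew raw={raw_skew:.2f}, log10={log_skew:.2f}")
```

The last probability line is a sanity check: multiplying P(big) by P(pricey | big) should give back the joint probability, which is just the definition of conditional probability rearranged.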