CST383: learning log 2

This week I learned how Pandas makes working with data more practical than using plain NumPy arrays. I now understand that a Series is basically a 1D array with labels (its index), and a DataFrame is a table of rows and columns that you can reference by name. That made it easier to see why Pandas is used so much in data science. Reading CSV files with read_csv and taking a quick look at the data with methods like head(), sample(), and describe() made the data feel a lot more manageable.
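To make those ideas concrete for myself, here is a small sketch of a Series, a DataFrame, and the quick-look methods. The names, columns, and values are made up for illustration, and the CSV is held in memory with io.StringIO instead of read from a real file on disk.

```python
import io

import pandas as pd

# A Series is a 1D array with an index of labels.
scores = pd.Series([88, 92, 75], index=["ana", "ben", "cal"])
print(scores["ben"])  # look up a value by its label

# A tiny CSV kept in a string, standing in for a real file
# (hypothetical columns and values, for illustration only).
csv_text = """name,major,units,gpa
ana,CS,15,3.6
ben,Math,12,3.2
cal,CS,16,2.9
dee,Bio,14,3.8
eli,Math,15,3.4
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))                    # first 3 rows
print(df.sample(2, random_state=0))  # 2 random rows, seeded so the pick is repeatable
print(df.describe())                 # count, mean, std, min, quartiles, max of numeric columns
print(df["gpa"])                     # selecting one column by name gives back a Series
```

With a real file, the only change would be passing a path to pd.read_csv instead of the StringIO object.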

Indexing was still the most confusing part for me this week. I kept mixing up when to use plain square brackets versus .loc and .iloc, and I had to remind myself that .iloc is position based while .loc is label based. Slicing also works differently between them, which tripped me up a few times: a .loc slice includes the end label, while an .iloc slice excludes the end position, just like a regular Python slice. Grouping also took some time to understand, but thinking of groupby as split-apply-combine helped: split the rows into groups, apply a function to each group, then combine the results. Once it clicked, it was cool to see how easily you can get averages or counts for different groups.
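A quick sketch of the indexing rules and of groupby as split-apply-combine. The toy DataFrame below is hypothetical; it uses a string index on purpose so that labels and positions are different things.

```python
import pandas as pd

# Toy data with a string index, so row labels and row positions differ.
df = pd.DataFrame(
    {"major": ["CS", "Math", "CS", "Bio", "Math"],
     "gpa": [3.6, 3.2, 2.9, 3.8, 3.4]},
    index=["ana", "ben", "cal", "dee", "eli"],
)

# Plain brackets with a column name select that column as a Series.
gpa = df["gpa"]

# .loc is label based, and a label slice INCLUDES the end label.
by_label = df.loc["ben":"dee", "gpa"]   # rows ben, cal, dee (3 rows)

# .iloc is position based, and a position slice EXCLUDES the end position.
by_position = df.iloc[1:3, 1]           # positions 1 and 2 -> ben, cal (2 rows)

# groupby as split-apply-combine:
# split the rows by major, apply mean and count to each group's gpa,
# then combine the per-group results into one DataFrame.
summary = df.groupby("major")["gpa"].agg(["mean", "count"])
print(summary)
```

Giving the index string labels is what makes the .loc/.iloc difference visible; with the default 0, 1, 2, ... index the two can look identical until a slice endpoint shows up.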

I also learned more about aggregation and basic statistics like mean, median, variance, and standard deviation, and how Pandas can calculate each of them in a single line of code. The parts about distributions and probability were harder, especially probability density functions (PDFs) and cumulative distribution functions (CDFs), but I get the general idea of using data to estimate probabilities instead of just guessing. Overall, this week felt like a step closer to actually doing data science instead of just learning syntax. I think I need more visual resources, like plots, to understand some of this material better.
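Here is a sketch of the one-line statistics and of estimating a probability from data. The data is simulated (10,000 draws from a normal distribution with mean 70 and standard deviation 10), which is an assumption made just so the example is self-contained. The fraction of values at or below some cutoff is an empirical estimate of the CDF at that cutoff.

```python
import numpy as np
import pandas as pd

# Simulated data standing in for a real column (assumption for illustration):
# 10,000 draws from a normal distribution with mean 70 and std dev 10.
rng = np.random.default_rng(42)
heights = pd.Series(rng.normal(loc=70, scale=10, size=10_000))

# Each statistic is one line. Note: pandas var() and std() default to ddof=1
# (sample variance), while NumPy's np.var defaults to ddof=0.
print(heights.mean(), heights.median(), heights.var(), heights.std())

# Or several aggregations at once.
print(heights.agg(["mean", "median", "var", "std"]))

# Empirical CDF at 80: the fraction of values <= 80 estimates P(X <= 80).
# For a normal with these parameters the true value is about 0.84.
p_le_80 = (heights <= 80).mean()
print(p_le_80)
```

For the visual side, heights.plot.hist(bins=30, density=True) (which needs matplotlib installed) draws a histogram scaled so its bars approximate the PDF.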
