# Data Analysis Course Week 4 (Part 2)

I tried to start today by finishing off the exercise I was getting frustrated with yesterday, but I'm still having no luck. I keep getting errors when I run the cells in Jupyter, so I have a feeling that I've not quite understood this section. Maybe the variables in the examples I've been given are not storing dataframes but something else, and trying to run dataframe-specific methods on them is what's causing the problems. Anyway, I'm not sure at this point and it's only making me more frustrated, so I've decided to move on. This isn't ideal, but maybe it'll make sense later.
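One way to test that hunch is to ask Python what each variable actually holds, using `type()`. It's worth knowing that `groupby()` returns a grouped object rather than a dataframe, and selecting a single column returns a series, so dataframe methods won't always behave as expected on them. Here's a minimal sketch, using a small made-up dataframe (the column names and values are assumptions for illustration, not the course's data):

```python
import pandas as pd

# A small, made-up table purely for illustration
df = pd.DataFrame({'Commodity': ['Tea', 'Tea', 'Coffee'],
                   'Value': [2, 4, 1]})

grouped = df.groupby('Commodity')  # a grouped object, not a dataframe
column = df['Value']               # a single column is a Series, not a dataframe

# Print the name of the type stored in each variable
for name, obj in [('df', df), ('grouped', grouped), ('column', column)]:
    print(name, type(obj).__name__)
```

Running this should show `DataFrame` for `df`, `DataFrameGroupBy` for `grouped` and `Series` for `column`, which makes it easier to spot when a method is being called on the wrong kind of object.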

To continue the apply part of the split-apply-merge pattern (the pandas documentation calls it split-apply-combine), you can also use filtering. The `filter()` method is similar to `apply()` in that it takes a function you have previously defined, in this case one that decides whether each group should be kept. For example:

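```python
def groupsWithValueGreaterThanFive(g):
    # g is one group (a DataFrame); keep it only if its 'Value' column sums to more than 5
    return g['Value'].sum() > 5

# df is assumed to be an existing DataFrame with 'Commodity' and 'Value' columns
df.groupby('Commodity').filter(groupsWithValueGreaterThanFive)
```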

The above code defines a function called `groupsWithValueGreaterThanFive` that takes a group (`g`) and returns `True` if the values in that group's `'Value'` column add up to more than 5, and `False` otherwise (`return g['Value'].sum() > 5`). You can then pass this function to `.filter()` (`.filter(groupsWithValueGreaterThanFive)`) on a dataframe called `df` that has been grouped by the `'Commodity'` column (`df.groupby('Commodity')`). The result is a dataframe containing every row from the groups that passed the test; the rows from all other groups are dropped.
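To see what `filter()` actually hands back, here's a self-contained sketch with a tiny made-up dataframe (the commodities and values are invented for illustration). The Tea rows add up to 6 and the Coffee rows add up to 3, so only the Tea group should survive:

```python
import pandas as pd

# Hypothetical sample data: Tea sums to 6, Coffee sums to 3
df = pd.DataFrame({'Commodity': ['Tea', 'Coffee', 'Tea', 'Coffee'],
                   'Value': [2, 1, 4, 2]})

def groupsWithValueGreaterThanFive(g):
    # g holds one group's rows; return a single True/False for the whole group
    return g['Value'].sum() > 5

# Keep every row belonging to a group whose function call returned True
kept = df.groupby('Commodity').filter(groupsWithValueGreaterThanFive)
print(kept)
```

The result is an ordinary dataframe, not a grouped object, and it keeps the original row labels (0 and 2 here), which is handy if you want to line it back up with the full dataset.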

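Groups can also be filtered by their size using `len`, which counts the number of records in each group. Passing `len` to the `aggregate()` method gives the count for each group, and the same `len` check can be used inside a `filter()` function to keep only the groups with enough records.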
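A minimal sketch of both uses of `len`, again on made-up data (the threshold of two rows is an arbitrary choice for illustration). `aggregate(len)` produces one count per group, while `filter()` with a `len` test drops the small groups entirely:

```python
import pandas as pd

# Hypothetical sample data: Tea has 3 rows, Coffee 2, Cocoa 1
df = pd.DataFrame({'Commodity': ['Tea', 'Coffee', 'Tea', 'Tea', 'Coffee', 'Cocoa'],
                   'Value': [2, 1, 4, 3, 2, 5]})

# aggregate(len) counts how many rows fall into each group
counts = df.groupby('Commodity')['Value'].aggregate(len)
print(counts)

# filter() with a len test keeps only the rows of groups with at least two rows
frequent = df.groupby('Commodity').filter(lambda g: len(g) >= 2)
print(frequent)
```

pandas also has a built-in `size()` method on grouped objects that returns the same per-group counts, which may be tidier than passing `len` yourself.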

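Although the course doesn't go into it, you can also use transformation (the `transform()` method) as part of the apply step of the split-apply-merge pattern. Unlike aggregation, a transformation returns a result with the same shape as the original data, so it can be used to standardise data within each group or to fill empty values (N/As) with a value derived from each group, such as the group's mean.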
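Since the course skips this, here's a rough sketch of both ideas using `transform()` on invented data (the column names `Filled` and `Standardised`, and the choice of the mean for filling, are my own assumptions). Each result lines up row-for-row with the original dataframe, so it can be stored straight back as a new column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in the Tea group
df = pd.DataFrame({'Commodity': ['Tea', 'Tea', 'Tea', 'Coffee', 'Coffee'],
                   'Value': [2.0, 4.0, np.nan, 1.0, 3.0]})

grouped = df.groupby('Commodity')['Value']

# Fill each missing value with the mean of its own group (Tea's mean is 3.0)
df['Filled'] = df['Value'].fillna(grouped.transform('mean'))

# Standardise within each group: (value - group mean) / group standard deviation
df['Standardised'] = grouped.transform(lambda s: (s - s.mean()) / s.std())
print(df)
```

Note that the standardised column is still empty where the original value was missing, because the lambda works on the raw `Value` column rather than the filled one.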

Yet again I get to the exercises and find I can’t do them. I’m clearly not understanding this section. I think I need to start over on this week’s part of the course, but that’s something for tomorrow.

# Terms

These terms are written as I understand them at the time of writing this blog. I may come to expand on them, or change them completely as I learn more about programming. You can find an up-to-date list of the terms on my programming terms page.

Dataframe – See the programming terms page.

Jupyter – See the programming terms page.

Method – See the programming terms page.

Variable – See the programming terms page.