Python Programmer Discovers Functional Programming. World Keeps Turning
Handling time series data using Python’s built-in data structures is a bitch. I guess the best way would be to just use a library like pandas, but a colleague came across the way I’d handled it in this code snippet using itertools and functools and asked me to share it with the rest of the team.
from itertools import takewhile, dropwhile
from functools import partial
from datetime import timedelta
from django import template
#... some other imports

def some_table(context, series_set, format_string):
    table_options = format_string.split()

    # Series are assumed to be ordered newest first, so an item is in the
    # "same" period while its datestamp is newer than the cutoff date.
    is_same_date = lambda date, i: i.datestamp > date
    is_same_week = partial(is_same_date, context['date'] - timedelta(weeks=1))
    is_same_month = partial(is_same_date, context['date'] - timedelta(weeks=4))
    is_same_year = partial(is_same_date, context['date'] - timedelta(weeks=52))

    # One function per format option: day/week/month change and year low/high.
    function_dict = {
        'd' : lambda series: series[0].price - series[1].price,
        'w' : lambda series: series[0].price - next(dropwhile(is_same_week, series)).price,
        'm' : lambda series: series[0].price - next(dropwhile(is_same_month, series)).price,
        'y' : lambda series: low_high(i.price for i in takewhile(is_same_year, series)),
    }

    table = []
    for series_name, series in series_set.items():
        try:
            row = dict([ (option, function_dict[option](series['list'])) for option in table_options ])
        except KeyError:
            raise template.TemplateSyntaxError("some_table has wrong arguments")
        except IndexError:
            raise template.TemplateSyntaxError("some_table : series does not contain enough data")
        row.update({
            'name' : series_name,
            'price' : series['list'][0].price,
        })
        table.append(row)

    return {
        'table': table,
    }
I’ll just concentrate on the parts which require more explanation. The most complex line to understand is the one that builds each row.
List comprehensions and python ‘switch’
row = dict(
[(option, function_dict[option](series["list"])) for option in table_options]
)
The dict function can build dictionaries in a variety of ways: from a list of pairs in my case, which I build using a list comprehension. The second element of each pair is built by calling a function from a dictionary of functions. This is my preferred way of emulating the switch statement in Python, with each option in table_options acting as an individual ‘case’ selector.
One of the guys here prefers a series of ‘if/else’s, but I prefer one hash lookup to possibly evaluating every single if condition. A long if/else chain can also start to look ugly, and Python’s if/else doesn’t declare a new scope.
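Here is a minimal sketch of the dispatch idea on its own, stripped of the Django parts. The names `handlers` and `describe`, the option letters and the sample prices are all made up for illustration; they are not taken from some_table.

```python
# Hypothetical handlers keyed by option letter; prices are newest first.
handlers = {
    'd': lambda prices: prices[0] - prices[1],  # change since the previous entry
    'h': lambda prices: max(prices),            # highest price
    'l': lambda prices: min(prices),            # lowest price
}


def describe(prices, format_string):
    """Build one row by looking up each option in the handlers dict."""
    try:
        return dict((option, handlers[option](prices))
                    for option in format_string.split())
    except KeyError as exc:
        # An unknown option is the equivalent of matching no 'case' at all.
        raise ValueError("unknown option: %s" % exc)


row = describe([105, 100, 98, 110], 'd h')
print(row)
```

Adding a new ‘case’ is then just one more entry in the dictionary, and the lookup cost stays the same however many options there are.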
itertools : takewhile, dropwhile
Each function in the (hopefully obviously named) function_dict is a simple function that calculates the day/week/month change or the year high/low. The week and month calculations are quite fun: each is just the difference between two prices, but the second price is found by iterating over the series with dropwhile, which skips entries for as long as the condition (is_same_month, say) holds. The next element it yields is the first one outside that window. This avoids iterating over the whole series whilst avoiding problems of sparse weeks (bank holidays or royal weddings, for example). The year lambda function takes the low and high from the series while the entries are in the same year. Notice how similar that sentence is to the actual function:
lambda series: low_high(i.price for i in takewhile(is_same_year, series))
I guess that’s why I like things like takewhile. low_high is just a wrapper around the builtin min and max.
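To see dropwhile and takewhile cope with a sparse week, here is a small self-contained sketch. The `Quote` record, the dates and the prices are invented, and `low_high` is only my guess at the shape of the wrapper (returning a `(low, high)` tuple), so treat it as an assumption rather than the real helper.

```python
from collections import namedtuple
from datetime import date, timedelta
from itertools import dropwhile, takewhile

# Hypothetical record type standing in for the series items.
Quote = namedtuple('Quote', ['datestamp', 'price'])

today = date(2011, 5, 6)
# Newest first, with a gap of several days where no prices were recorded.
series = [
    Quote(date(2011, 5, 6), 105),
    Quote(date(2011, 5, 5), 103),
    Quote(date(2011, 5, 3), 101),
    Quote(date(2011, 4, 28), 99),   # first entry more than a week old
    Quote(date(2011, 4, 27), 97),
]

week_ago = today - timedelta(weeks=1)   # 2011-04-29


def is_same_week(quote):
    """True while the quote is newer than the one-week cutoff."""
    return quote.datestamp > week_ago


def low_high(prices):
    """Assumed shape of the wrapper: return (lowest, highest)."""
    prices = list(prices)   # a generator can only be consumed once
    return min(prices), max(prices)


# dropwhile skips everything inside the week; next() takes the first older entry.
week_old = next(dropwhile(is_same_week, series))
week_change = series[0].price - week_old.price

# takewhile stops at the first entry outside the window.
week_range = low_high(q.price for q in takewhile(is_same_week, series))

print(week_change, week_range)
```

Because the comparison is on dates rather than on positions in the list, the missing days don’t matter: the lookup lands on 28 April regardless of how many entries the week happened to contain.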
Partial application in functools
is_same_month = partial(lambda date, i: i.datestamp > date, context['date'] - timedelta(weeks=4))
The functions is_same_week/is_same_month/is_same_year are partially applied lambda functions. A lambda could simply close over a cutoff from the surrounding scope, but then I’d need a separate lambda for each cutoff. Instead I write the comparison once and use partial from functools to fill in the date parameter, computed from the date in the Django template context, so each function has its ‘same date’ fixed to a specific cutoff.
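A minimal sketch of partial on its own, using a named function instead of a lambda and comparing bare dates rather than series items. The name `is_newer_than` and the dates are assumptions for illustration only.

```python
from datetime import date, timedelta
from functools import partial


def is_newer_than(cutoff, datestamp):
    """True when datestamp falls after cutoff."""
    return datestamp > cutoff


today = date(2011, 5, 6)

# partial fixes the first positional argument and returns a callable
# that only needs the remaining one.
in_last_week = partial(is_newer_than, today - timedelta(weeks=1))
in_last_month = partial(is_newer_than, today - timedelta(weeks=4))

sample = date(2011, 4, 20)
print(in_last_week(sample), in_last_month(sample))
```

The comparison is written once and each cutoff costs a single line, which is the same trick the three is_same_* functions rely on.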
Readability
If you’ve dabbled with functional programming, like I’ve tried to, you should be quite happy with these concepts. Another thing to note is that I used a plain for loop when iterating over series_set. I tried refactoring it into a list comprehension, but I didn’t want to end up with a nested list comprehension, and readability won out. Also, in Python 2, list comprehensions ‘leak’ their loop variable into the enclosing scope.
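For comparison, here is roughly what the two shapes look like side by side. This is a simplified sketch, not the real code: the series are plain lists of prices rather than objects with a 'list' key, and the contents of `function_dict`, `table_options` and `series_set` are invented.

```python
# Hypothetical stand-ins: each option maps to a function of a price list.
function_dict = {
    'd': lambda prices: prices[0] - prices[1],
    'h': max,
}
table_options = ['d', 'h']
series_set = {
    'gold': [105, 100, 98],
    'silver': [30, 31, 29],
}

# Plain loop, one row per series.
table_loop = []
for series_name, prices in series_set.items():
    row = dict((option, function_dict[option](prices)) for option in table_options)
    row.update({'name': series_name, 'price': prices[0]})
    table_loop.append(row)

# The same table as a nested comprehension: shorter, but harder to scan,
# and there is no obvious place left for the try/except.
table_nested = [
    dict(
        [(option, function_dict[option](prices)) for option in table_options]
        + [('name', series_name), ('price', prices[0])]
    )
    for series_name, prices in series_set.items()
]

print(table_loop == table_nested)
```

Both build the same table, but the loop keeps each step on its own line and leaves room for the error handling.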