Thursday, December 24, 2009

Getting the most out of Python range()

I am doing a small data analysis project and figured it was a good time to learn a bit of Python. One of the requirements was that my data analysis range go from a start to a finish value in certain increments. The built-in range() function was perfect for this, except that it excludes the stop value. For example, range(10, 25, 5) produces the values 10, 15, 20. I needed 10, 15, 20, 25.

The obvious fix is to add the increment to the stop value: range(start, finish+increment, increment). Unfortunately, this falls flat when the difference between the finish and start values is not an exact multiple of the increment. For example, with a start of 10, a finish of 23, and an increment of 5, the obvious fix calls range(10, 28, 5), which produces the values 10, 15, 20, 25, and that goes past the finish value.
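Here is a quick sketch of the obvious fix and where it overshoots. The helper name naive_inclusive_range is made up for illustration, and list() is only there so the result prints as a list on Python 3 as well.

```python
def naive_inclusive_range(start, finish, increment):
    # The "obvious fix": push the stop value out by one full increment.
    return list(range(start, finish + increment, increment))

# finish - start is an exact multiple of the increment: works as hoped.
print(naive_inclusive_range(10, 25, 5))  # [10, 15, 20, 25]

# finish - start is NOT an exact multiple: 25 sneaks in past the finish of 23.
print(naive_inclusive_range(10, 23, 5))  # [10, 15, 20, 25]
```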

The proper fix is the following (broken up into two lines for readability):

mod = increment - ((finish - start) % increment)
range(start, finish+mod, increment)

The astute will notice that all this "magic" really does is bump the stop value up to the next point past the finish whose distance from the start is an exact multiple of the increment. When the difference between the finish and start values is already an exact multiple of the increment, mod equals the increment and this solution effectively implements the "obvious fix" above. Otherwise mod is smaller than the increment, so the stop lands on the first step past the finish and that step is excluded as usual.
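To try it out, the two lines can be wrapped in a small function. This is only a sketch: the name inclusive_range is hypothetical, and list() is added so the output is a list on Python 3 too.

```python
def inclusive_range(start, finish, increment):
    # Assumes integer arguments and a nonzero increment, as range() itself requires.
    # mod is the distance from finish to the next step strictly past it.
    mod = increment - ((finish - start) % increment)
    return list(range(start, finish + mod, increment))

print(inclusive_range(10, 25, 5))  # [10, 15, 20, 25]  (mod is 5, same as the obvious fix)
print(inclusive_range(10, 23, 5))  # [10, 15, 20]      (mod is 2, stop becomes 25)
```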

The same thing could be done by adding the increment to the finish value only when the difference between the finish and start values is an exact multiple of the increment, but that requires an "if" statement to detect that situation. My solution replaces multiple lines of code, or an awkward-looking conditional expression (Python has had one, x if condition else y, since version 2.5), with a nice clean mathematical statement that works whether or not the finish value falls exactly on a step.
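For comparison, here is a sketch of the conditional version next to the modulus version, with a loop that checks they agree for a run of finish values. Both function names are made up for this example, and the conditional uses Python's x if condition else y expression.

```python
def inclusive_range_if(start, finish, increment):
    # Extend the stop by one increment only when finish lands exactly on a step.
    on_step = (finish - start) % increment == 0
    stop = finish + increment if on_step else finish
    return list(range(start, stop, increment))

def inclusive_range_mod(start, finish, increment):
    # The modulus version from the post, with no conditional.
    mod = increment - ((finish - start) % increment)
    return list(range(start, finish + mod, increment))

# The two versions should produce identical results for every finish value tried.
for finish in range(10, 31):
    assert inclusive_range_if(10, finish, 5) == inclusive_range_mod(10, finish, 5)

print(inclusive_range_if(10, 23, 5))  # [10, 15, 20]
print(inclusive_range_if(10, 25, 5))  # [10, 15, 20, 25]
```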