I've known about decorators for a while - they show up pretty quickly in many of the most used Python libraries. For instance, in Flask decorators are extremely common. They're the mechanism that associates a function with a route.
@app.route("/the_route")
def index():
    return "hi"
When you go to "example.com/the_route" the function index will be called. From a usability standpoint it seems perfect for its use case.
They show up in many other places. For example, in the test suite of pandas there is an @network decorator. I wish I would've explored this earlier, because it's not too different from the code I'll be writing. @network labels a test, and a labeled test is skipped if a network connection cannot be established. Very simple and very clean.
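As a rough illustration (this is my own minimal sketch of the idea, not pandas' actual implementation), a skip-if-offline decorator could look something like this:

import functools
import socket
import unittest

def network(test_func):
    # hypothetical sketch: skip the test when no connection is available
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            # cheap connectivity check against a well-known host
            socket.create_connection(("www.example.com", 80), timeout=3).close()
        except OSError:
            raise unittest.SkipTest("no network connection available")
        return test_func(*args, **kwargs)
    return wrapper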
Staying in pydata land, NumPy has an entire file dedicated to decorators.
My Need
Before the solution, I should first explain the need. I deal a lot with APIs. I work at a marketing analytics agency (Alight Analytics), where we take our clients' data from standard sources (Twitter, Google Analytics, Facebook, etc.), mash it together, and try to derive some insights and recommendations. As a data scientist, I need to get that data out so I can use it and store it.
When you're dealing with a lot of data for a lot of clients, rate limits come into the picture very quickly. I need my code to run as fast as it can while still being observant of those limits. A naive solution would be to sleep some amount of time. For example, if you can't make more than 10 calls a second (Google Analytics' rate limit), you could sleep for .1 seconds after every call; since your code takes some epsilon to run anyway, you'd be fine.
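A sketch of that naive approach might look like this (the fetch callable is a stand-in for whatever API call you're actually making - an assumption for illustration):

import time

def rate_limited(fetch, delay=0.1):
    # naive rate limiting: make the call, then always sleep a fixed
    # amount so we stay under roughly 10 calls per second
    result = fetch()
    time.sleep(delay)
    return result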
The better solution is exponential backoff - it's an algorithm that asks for forgiveness rather than permission. The idea is to make calls as fast as you can, and when you get an error, pause for 2^n seconds where n is the attempt number. Very simply, it might look like:
import time

def make_call():
    n = 0
    while True:
        try:
            return api.data()
        except:
            # back off for 2^n seconds before retrying
            time.sleep(2**n)
            n += 1
This isn't the best solution. For example, if there is something wrong with your API call other than making calls too fast, it'll keep looping forever. But it illustrates the idea.
My Solution
I could start with the whole spiel about Python's functions being first class, but you can get that elsewhere (and you probably should, it's useful to know). Therefore, I'll start with the code.
import time

def exp_backoff(tries=2, passable_exceptions=Exception):
    def exp_backoff_outer(function):
        def exp_backoff_inner(*args, **kwargs):
            for x in range(tries):
                try:
                    return function(*args, **kwargs)
                except passable_exceptions:
                    # out of attempts: re-raise the original exception
                    if x == tries - 1:
                        raise
                    time.sleep(2**x)
                else:
                    raise
        return exp_backoff_inner
    return exp_backoff_outer
Sadly, the syntax for a decorator with arguments is really silly. Wrapping a function twice is smelly. If I didn't want a configurable number of tries and didn't want to choose the passable_exceptions, then I could remove the exp_backoff_outer layer. Besides the whole wrapping madness, it's not much different from the code above. One difference is the use of else in the try-except block. If passable_exceptions isn't modified from the default, it'll never see the light of day; if it's set to something like ValueError, then an exception that isn't a ValueError gets raised right away instead of triggering a backoff. The other difference from the first example is giving up after the code has been tried tries times.
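For comparison, here's a rough sketch of what the non-configurable version could look like with the outer layer removed (my own sketch of the simplification described above, with hard-coded defaults):

import time

def exp_backoff_simple(function):
    # single wrapper: tries and passable exceptions are hard-coded,
    # so no argument-taking outer function is needed
    def exp_backoff_inner(*args, **kwargs):
        tries = 2
        for x in range(tries):
            try:
                return function(*args, **kwargs)
            except Exception:
                if x == tries - 1:
                    raise
                time.sleep(2**x)
    return exp_backoff_inner

With the configurable version, something like @exp_backoff(tries=3) first calls exp_backoff(tries=3) to build exp_backoff_outer, which is then applied to the function - that's where the double wrapping comes from.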
The decorator would be used like...
@exp_backoff(tries=3)
def add(*args):
    return sum(args)
So if I called add with something like add(1, 1) there would be no problem, but if I did add(1, "hi") you'd get a pause of about 3 seconds (2^0 + 2^1) before seeing a TypeError.
Anyways, this is probably one of the posts that'll be more useful for me later, much like my post on numpy dates and times (fuck those), but hopefully it's useful for others.