Asynchronous Programming
What happens if you have a lot of sockets that are waiting to read or write data? Asynchronous programming lets you write code that basically says, "Call my callback when you actually have something for me." Although this approach is used all the time in C, it's even nicer in Python because Python has first-class functions.
These days, there are many servers written asynchronously. nginx is a "simplified version" of Apache that is both very fast and highly concurrent. Squid, the popular open source Web proxy, is also written asynchronously. This makes a lot of sense if you think about what a Web proxy does. It spends all of its time managing a ton of sockets, funneling data between clients and servers.
Asynchronous programming starts with operating system APIs such as select, poll, kqueue, aio, and epoll. These APIs let you write code that basically says, "These are the file descriptors I'm working with. Which of them is ready for me to do some reading or writing?" In Python, libraries like the built-in asyncore module and the popular Twisted framework take these low-level APIs and orchestrate callback systems on top of them.
Let's look at an example of asynchronous code. First, the linear (non-asynchronous) code in Example 4.
def handle_request(request): data = talk_to_database() print "Processing request with data from database."
Re-written asynchronously, you end up with something like Example 5. (You can move use_data into a new top-level function after handle_request, but it's convenient to do it this way to maintain access to request via a closure.)
def handle_request(request): def use_data(data): print "Processing request with data from database." deferred = talk_to_database() deferred.addCallback(use_data)
Notice that the talk_to_database function no longer returns a value directly. Rather, it returns a deferred object to which you can attach callbacks.
This is called "continuation passing style". Rather than waiting for a function to simply return, you must pass a callback detailing how to continue once the data is obtained. Because you must use continuation passing style anytime you call a function that might block, it soon permeates your codebase. This can be painful and prevents you from using any library that does blocking I/O unless it's written using continuation passing style.
On the other hand, living in the asynchronous ghetto has its benefits. Aside from the clear concurrency benefits, the Twisted codebase is widely regarded as well-written code, and it provides implementations for most popular protocols.
Subroutines Versus Coroutines
In the beginning, there was the GOTO. It didn't take any parameters, and it was a one-way trip.
A coroutine is like a subroutine, except it doesn't necessarily return. With subroutines, you can do things like:
f -> g -> h (return to g, return to f)
With coroutines, you can do things like:
f -> g -> h -> f
Coroutines can be used for simple cooperative multitasking. The Python Cookbook has a great recipe for coroutines based on generators. Example 6 is a simple version of it.
import itertools def my_coro(name): count = 0 while True: count += 1 print "%s %s" % (name, count) yield coros = [my_coro('coro1'), my_coro('coro2')] for coro in itertools.cycle(coros): # A round-robin scheduler :) coro.next() # Produces: # # coro1 1 # coro2 1 # coro1 2 # coro2 2 # ...
Using generators to implement coroutines is definitely a cute hack. By the way, this same trick can be used in Twisted to alleviate some of the need to use callbacks everywhere.
On the other hand, there are some limitations to this technique. Specifically, you can only call yield in the generator. What happens if my_coro calls some function f and f wants to yield? There are some workarounds, but the limitation is actually pretty core to Python. (Because Python isn't stackless, it can't support true continuations in the same way that Scheme can.) I've written about this topic in detail on my blog.