October 12, 2009
Using Erlang to Build Reliable, Fault Tolerant, Scalable SystemsArun Suresh and Jebu Ittiachen
A case study in rebuilding Yahoo! Harvester
Jebu Ittiachen is a Principal Engineer at Yahoo!, working with the Content Platform Group. Jebu blogs at http://blog.jebu.net and tweets frequently from @jebui. Arun Suresh works for Yahoo! as a Technical Lead. He is currently developing next-generation platforms that power Yahoo! Properties. He tweets from @arun_suresh.
Harvester is a data acquisition component at Yahoo! that acquires data from various providers concurrently through a pull model from drop boxes via various protocols. Feeds can either be scheduled or fetched on demand.
While Harvester was originally written in Perl, Erlang's high-level concurrency constructs -- along with OTP design principles -- make it an ideal platform for building reliable, fault tolerant, and scalable applications like Harvester. The resulting service is more scalable, available, reliable, and able to comply with tighter service-level agreements (SLA) on a lighter code base and less expensive development efforts. In this article, we compare how the two systems perform side-by-side on a single box and showcase the higher ROI provided by the Erlang stack.
Harvester is modeled as a set of cooperating Perl processes. A Perl daemon (foreman) monitors a queue in the database for feeds that are scheduled to run. The foreman launches worker processes when a feed is ready for a fetch. A separate Perl daemon (conductor) schedules jobs into the queue. For scalability and high availability, all nodes run the foreman and conductor. To avoid concurrency problems, they use named locks in the database to avoid stepping on each other. Figure 1 presents a high-level architectural view of a queued batch processing system such as this.
Figure1: Architectural view of the system.
The advantages of this approach include:
Still, the current design has limited Harvester's adoption in areas such as mission-critical feeds with strict SLA. These drawbacks include:
Extracting More From the Current Implementation
To address these drawbacks, we took several steps, starting with tweaking the existing architecture with pooled Perl processes. We hoped this would reduce the process launch times by pooling worker processes that were launched, reusing it for the next run. However, the resource usage was still high per process. Considering that the processes are mostly I/O intensive, the huge number of spawned process meant neither the CPU nor the I/O was being effectively utilized and the OS was spending more time in process switching. Moreover, the database contention still remains. Bringing in a job management system into the architecture meant more maintenance overhead with no significant returns on resource utilization. Threads in Perl were explored for more efficient resource utilization but not advised by the gurus. Stepping back we realized that the requirements were of a broader nature. We realized that what we needed was a:
With this in mind, we considered a Java solution. The Java stack has a very good threading model, which has been proven in the field. However, mapping our problem to the Java application server model like Tomcat was not a natural fit because Harvester:
Building it as a stand-alone threaded Java app would mean introducing the issues associated with the threaded programming model. The threading model is based on shared memory and locking to achieve consistency. Developing and deploying an error-free threaded program is tough, since hard-to-reproduce concurrency issues go hand-in-hand with shared memory. A message passing solution is not native to Java and we would need to bring in an external library to help us do messaging, in turn introducing a performance impact.
Also, current Java VM's are not well-suited for the new range of hardware sporting more cores and lower clock speeds. The code upgrade issue still remains and doing application upgrades involve taking a host down, upgrading it, and putting it back into the cluster.
|
|
||||||||||||||||||||||||||||
|
|
|
|