November 12, 2002
Say Cheese

Snapfish Tackle Box

The hardware and applications that run Snapfish are maintained at NTT/Verio's San Francisco co-location facility. The site is hosted on Sun (formerly Netscape) Web server software running on Linux and Solaris servers. BEA WebLogic application servers handle dynamic pages and database connection pools; the database itself is Oracle. Snapfish is a processing-intensive site: uploaded images are scaled and optimized for online display, and users can also edit their images online, cropping, rotating, and adjusting colors with the site's tools. Snapfish email servers handle several million messages a month. To accommodate the growth of such a diverse set of online applications, the site was designed with as many scalable elements as possible.
From Minnow to Whale

According to Kapoor, who has a degree in mechanical/robotics engineering from Carnegie Mellon and leads an engineer-heavy management team, the site was designed to accommodate a large number of users from the beginning, but the hardware to serve millions wasn't purchased until it was needed. As the customer base has grown (one million users by March 2001, two million by September 2001, three million by June 2002), the back-end systems needed to handle the added traffic have had to expand accordingly, says chief technical officer Bala Parthasarathy.

Parthasarathy likens the scalability of the Snapfish system to knobs he can turn up or down depending on traffic patterns. He and his staff segregated their servers by function and clustered them, both to make scaling easier and to ensure that the site had no single point of failure. "When our digital upload volume doubles, which it did in summer 2002, I throw in more upload servers," Parthasarathy says. "I don't need to throw in more application servers. If you have enough knobs, you can fine tune them and scale them appropriately."

Snapfish designed each part of its system to cluster, Parthasarathy says, by grouping at least two servers, or nodes, together to share the load for each of the site's critical functions. Building clusters into the systems from day one proved that the architecture and code could handle more servers in a cluster before the site's growth made additional hardware necessary. "It's one thing to say, 'Yeah, I can cluster this,' but on version 1.0, you want to have at least two nodes in there," Parthasarathy says. "That's really what tests the ability to multiply. Going from one to two is a big step. Even though both of them would be lightly loaded on day one, in order to meaningfully implement a cluster later on, you have to have designed it up front. Clustering is something that is very hard to put in later."
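The article doesn't say how Snapfish's clusters were actually built. As a rough illustration of the "knobs" idea, the Python sketch below models each function as its own pool. Every pool starts with at least two nodes, and one pool can grow without touching the others. The `Cluster` class, the `scale` function, the node names, and the round-robin routing are all hypothetical and do not describe Snapfish's real setup.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Cluster:
    """Hypothetical model of one function-specific server pool (e.g. upload, app)."""
    role: str
    nodes: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Require at least two nodes from day one, so clustering is exercised
        # early and this function has no single point of failure.
        if len(self.nodes) < 2:
            raise ValueError(f"{self.role} cluster needs at least two nodes")

    def route(self, requests):
        """Spread requests across this pool's nodes in round-robin order."""
        rr = cycle(self.nodes)
        return [(req, next(rr)) for req in requests]


def scale(site: dict, role: str, target: int) -> None:
    """Turn one 'knob': grow only the named cluster to `target` nodes."""
    cluster = site[role]
    while len(cluster.nodes) < target:
        # Hypothetical naming scheme for newly added nodes.
        cluster.nodes.append(f"{role}-{len(cluster.nodes) + 1}")


site = {
    "upload": Cluster("upload", ["upload-1", "upload-2"]),
    "app": Cluster("app", ["app-1", "app-2"]),
}

# Upload volume doubles: add upload servers, leave application servers alone.
scale(site, "upload", 4)
print({role: len(c.nodes) for role, c in site.items()})  # {'upload': 4, 'app': 2}
```

Because each function has its own pool, doubling upload capacity is a change to one entry. The application tier is never resized just because uploads grew.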
Traffic Schools

Snapfish engineers monitor site traffic, marketing promotions, and established trends in the photo-processing business so that they can plan, often no more than a month or so ahead of time, how much hardware will be needed to meet the projected site load. Parthasarathy says he and his staff track basic Web site statistics such as page views and the duration of an average visit. They also track business-specific metrics, such as the number of images being uploaded, scanned, edited, or shared, to anticipate trends. "There are definitely low seasons and high seasons and low days and high days, so we repurpose the machines accordingly," Parthasarathy says. "If there's a trend, we can see it. Things of this nature usually don't just break, they build up."

Cost also plays a role. "You can obviously throw in a lot of hardware and solve all problems. We wanted to build in as fine-grained a control as possible to manage the scale and keep the cost low," Parthasarathy says.
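The article doesn't describe Snapfish's forecasting method. The Python sketch below shows one simple way, under stated assumptions, to turn a tracked business metric into a hardware estimate. It fits a straight-line trend to recent weekly upload counts and projects that trend about a month ahead. It then divides the projected load by an assumed per-server capacity. The function names, the weekly figures, the 50,000-uploads-per-server capacity, and the 25 percent headroom are all hypothetical.

```python
import math


def linear_trend(values):
    """Least-squares slope and intercept for values observed at t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def servers_needed(weekly_uploads, weeks_ahead, uploads_per_server, headroom=1.25):
    """Project uploads `weeks_ahead` past the last week and size the upload pool.

    `uploads_per_server` and `headroom` are hypothetical planning parameters.
    """
    slope, intercept = linear_trend(weekly_uploads)
    t_future = len(weekly_uploads) - 1 + weeks_ahead
    projected = max(0.0, slope * t_future + intercept)
    # Never plan for fewer than two nodes, so the cluster keeps its redundancy.
    return max(2, math.ceil(projected * headroom / uploads_per_server))


# Hypothetical weekly upload counts, planned roughly one month (4 weeks) out.
weekly = [100_000, 110_000, 120_000, 130_000]
print(servers_needed(weekly, weeks_ahead=4, uploads_per_server=50_000))  # 5
```

A linear fit is deliberately crude. It captures the "things build up" pattern but not the seasonal highs and lows, which would call for a seasonal model or for the kind of machine repurposing Parthasarathy describes.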