Dr. Dobb's is part of the Informa Tech Division of Informa PLC

Ruby/Amazon & Amazon Web Services


February, 2005: Ruby/Amazon & Amazon Web Services

Automating e-commerce with a convenient interface to Amazon's AWS API

Ian is a senior system administrator for Google. He can be contacted at [email protected].


It used to be that if you wanted to access the vast wealth of data on the Web, you fired up a web browser and hopped from page to page, selectively reading as you went. That was fine for quickly looking something up or whiling away the time, but the Web has become an increasingly valuable research and business tool. Research is often defined in practical terms as the need to process and categorize large quantities of data, while business transactions are usually more efficiently handled by computers than humans, so the desire to automate these processes is natural and inevitable. That's where a browser ceases to be the right tool for the job.

Various technologies enable programmatic access to the Web without the need for tedious and error-prone screen scraping—REST, XML-RPC, and SOAP are the best known amongst these. These technologies have encouraged information providers to make their data available not just in old-fashioned HTML for interactive consumption, but also as structured data that can be programmatically queried, parsed, and manipulated.

The net result is that large web sites—especially those that conduct commerce on the Web—have developed APIs through which their users and affiliates can search for information and conduct business. eBay, Google, salesforce.com, and Amazon, for example, all offer web APIs via which users may mine databases and transact business. These are usually, but not always, free to use, as they encourage the development of third-party products, and in turn, lead to the growth of a wider community around the company's business.

In this article, I examine Amazon Web Services (AWS), one of the more comprehensive APIs offered by web commerce companies. There are a number of different ways of using AWS. For example, Amazon offers both a REST- and a SOAP-based API. The choice of which to use is up to developers and is largely a question of taste, which is why Amazon chose to offer both.

At this point, there's nothing to stop you from reading the API documentation and immediately starting to code an application in your favorite programming language, but you'll soon find yourself spending time on the mechanics of the API, rather than just dealing with the data you want to extract.

Happily, a number of people have written wrapper libraries around AWS, removing the need for you to parse XML, deal with common error conditions, and program around limitations in the underlying API. Here, I look at the Ruby/Amazon toolkit, which is a high-level abstraction of AWS I wrote for the Ruby programming language (http://www.caliban.org/ruby/ruby-amazon.shtml).

Ruby has been rapidly gaining ground as a highly viable alternative to Perl, Python, and Java for general scripting, system administration, and applications programming (http://ruby-lang.org/). Coming equipped with strong XML, HTTP(S), XML-RPC, and SOAP libraries, it fortunately also lends itself very well to web-services programming.

Ruby/Amazon has some convenient, time-saving features, such as the ability to cache search results for later recall and the option to find all results for a given search, rather than obtain just the single page of 10 results that the underlying API returns. Without a higher level wrapper library acting as an intermediary, achieving the same result would entail retrieving each consecutive page yourself, until no more results were returned.

AWS has undergone a number of revisions since its inception in 2002 and is currently at Version 3.1. At the time of writing, AWS 4.0 is in beta testing and is a complete rewrite of the API. I focus here on 3.1, as that is the version in the most widespread use.

Before starting, you should get a developer token by registering at http://www.amazon.com/gp/aws/registration/registration-form.html/. While there, it is a good idea to apply for an Associates account (https://associates.amazon.com/exec/panama/associates/apply/). This lets you earn referral fees for any customers you bring to Amazon via applications you create that take advantage of AWS.

You're ready to roll. If you need to use an HTTP proxy, define it now in the UNIX environment variable $http_proxy. Listing One shows a first search. As you can see, I am choosing to display information about a product's name, author, price, and availability. Each property of the product is available as an instance variable of the corresponding object. In Listing One, each product object found is passed in turn to the block attached to the search call. To display a list of available properties for any given product, you can call the properties method of the corresponding product object.

Not every product has every type of property. For example, a book will have authors, but no directors, while a DVD is the opposite. Even within a single product category, some products will have properties that others are missing. Some properties, however, such as the price, are logically common to all products; everything has a price at Amazon. At this writing, Listing One yields the results in Figure 1. Rather than use the iterator block form of the search, this would have worked just as well:

response = r.keyword_search( 'ruby programming', 'books' )

Using this approach, response now contains the literal REST response returned by AWS; you can print or puts it and see. This is sometimes useful for debugging. You can still iterate over the products by calling the each iterator on the products array:

response.products.each do |p|
  # Process product data
end
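
Because not every product carries every property, code that walks a mixed list of products has to cope with missing values. The following is a minimal sketch of one way to do that. It uses a plain Struct as a stand-in for a product object, and the credits helper and the sample data are invented for illustration; the real Ruby/Amazon objects may behave differently when a property is absent.

```ruby
# Stand-in for a product object. The real library's objects may behave
# differently when a property is absent; this is an assumption.
Product = Struct.new(:product_name, :authors, :directors)

# Return a comma-separated credit line, or a placeholder if the product
# has neither authors nor directors.
def credits(product)
  %i[authors directors].each do |prop|
    next unless product.respond_to?(prop)
    names = product.public_send(prop)
    return names.join(', ') if names && !names.empty?
  end
  'unknown'
end

# Made-up sample data: a book has authors, a DVD has directors.
book = Product.new('Sample Book', ['A. Author', 'B. Writer'], nil)
dvd  = Product.new('Sample DVD', nil, ['C. Director'])

[book, dvd].each { |p| puts "#{p.product_name}: #{credits(p)}" }
```

Checking with respond_to? and for nil covers both an object that lacks the accessor altogether and one that has it but holds no value.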

Amazon Standard Item Numbers (ASINs, see Figure 1) are at the heart of the Amazon product catalog, as they provide every product in the catalog with a unique identifier. For a book, the ASIN is the same as the ISBN number, but for other items, it's an arbitrary alphanumeric string concocted by Amazon. Many of the AWS API calls deal directly with ASINs to uniquely identify the product in question, so you'll likely find yourself passing them around a lot.

If you didn't elect to use caching at the time of request object instantiation, you can elect to do so now by establishing a cache object for future searches. The cache directory given is created, if it does not already exist.

r.cache = Cache.new( '/tmp/amazon' )

From this point forward, new searches are not repeated if they were already conducted less than 24 hours ago. Amazon allows the caching and reuse of most data for 24 hours, although this changes from time to time, so check the latest terms and conditions (available in the API ZIP file). Using the cache facility can vastly improve the performance of your application, so it is used by default if you do not specify a preference.
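
The idea behind a time-limited cache can be sketched in a few lines. The example below is an illustration only, not the Cache class that ships with Ruby/Amazon: SketchCache, its fetch method, and the MD5-based file naming are all assumptions made for the sake of the example. It stores each response in a file and reuses it while the file is less than 24 hours old.

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'

# Illustrative only: a tiny file-backed cache keyed on the search terms.
class SketchCache
  MAX_AGE = 24 * 60 * 60   # 24 hours, in seconds

  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)  # create the cache directory if it is missing
  end

  # Return the cached value for key if it is fresh enough; otherwise run
  # the block, store its result, and return that.
  def fetch(key)
    path = File.join(@dir, Digest::MD5.hexdigest(key))
    if File.exist?(path) && Time.now - File.mtime(path) < MAX_AGE
      return File.read(path)
    end
    value = yield
    File.write(path, value)
    value
  end
end

# Use a fresh temporary directory so that the demonstration starts empty.
cache = SketchCache.new(Dir.mktmpdir('sketch-cache'))
calls = 0
2.times { cache.fetch('ruby programming/books') { calls += 1; 'raw response' } }
puts "backend called #{calls} time(s)"
```

The second lookup is answered from disk, so the block that stands in for the network request runs only once.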

You can make life easier for yourself by specifying your usual requirements for new Request objects in a configuration file, to be found at either /etc/amazonrc or ~/.amazonrc. For example:

dev_token = 'D23XFCO2UKJY82'
associate = 'calibanorg-20'
cache_dir = '/tmp/amazon/cache'
locale = 'de'

Now, any Request objects you create inherit these characteristics. They can still be manually overridden as required, however.
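
To make the inherit-then-override behaviour concrete, here is a minimal sketch of how such a file could be read and combined with per-request options. The parse_rc and effective_settings helpers are hypothetical and are not Ruby/Amazon's own code; the treatment of blank lines and # comments is likewise an assumption.

```ruby
# Illustrative parser for key = 'value' lines; not Ruby/Amazon's own code.
def parse_rc(text)
  settings = {}
  text.each_line do |line|
    line = line.strip
    # Assumption: blank lines and lines starting with # are ignored.
    next if line.empty? || line.start_with?('#')
    if line =~ /\A(\w+)\s*=\s*['"]?(.*?)['"]?\z/
      settings[$1] = $2
    end
  end
  settings
end

# Values passed explicitly take precedence over those from the file.
def effective_settings(rc_text, overrides = {})
  parse_rc(rc_text).merge(overrides)
end

rc = <<~RC
  dev_token = 'D23XFCO2UKJY82'
  associate = 'calibanorg-20'
  cache_dir = '/tmp/amazon/cache'
  locale = 'de'
RC

settings = effective_settings(rc, 'locale' => 'uk')
puts settings['locale']      # the explicit override replaces the file's value
puts settings['cache_dir']   # untouched keys come straight from the file
```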

The underlying concept of Listing One can be inverted to produce a program that notifies you whenever a new album by your favorite artist appears in the Amazon catalogue for preorder. This works by looking for products whose availability contains the string "not yet released." In Listing Two, the "favourites" array contains tuples of product names and catalog sections. The program then searches the relevant parts of the UK catalogue for matching products that have not yet been released. It may be useful to mail the output of this program to yourself on a weekly basis. When run, the output looks like Figure 2.

As a final demonstration of the usefulness of this kind of monitoring, I present a program that is designed to be run automatically on a daily basis. It tracks the price of an item in the "electronics" section of the catalog and produces output whenever the price changes.

Items in Amazon's catalog can undergo quite a lot of price fluctuation within a relatively short period of time. Sometimes, the price of an expensive item drops $50.00 or more, only to rise again mere hours later. I don't know why this happens, but knowing that it does and having AWS at your disposal lets you take advantage of these fluctuations to buy at a favorable moment.

Have a look at Listing Three. Replace the ASIN with that of a product in which you're interested and run the program once an hour or so, perhaps from cron.
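
Listing Three reports every change in price. If you only want to hear about sizeable drops, the price strings need to be compared as numbers rather than as text. The sketch below shows one way to do that; price_to_cents and worthwhile_drop? are hypothetical helpers, and the code assumes prices formatted in dollars and cents, such as "$349.99".

```ruby
# Convert a price string such as "$1,299.99" to an integer number of cents.
# Assumes a dollars-and-cents format; other locales would need other rules.
def price_to_cents(price)
  digits = price.delete('^0-9.')          # keep only digits and the point
  dollars, cents = digits.split('.')
  dollars.to_i * 100 + cents.to_s.ljust(2, '0')[0, 2].to_i
end

# True when the new price is lower than the old one by at least threshold.
def worthwhile_drop?(old_price, new_price, threshold = '$50.00')
  price_to_cents(old_price) - price_to_cents(new_price) >= price_to_cents(threshold)
end

puts worthwhile_drop?('$349.99', '$289.99')   # a $60.00 drop
puts worthwhile_drop?('$349.99', '$339.99')   # only a $10.00 drop
```

Working in integer cents avoids the rounding surprises that floating-point arithmetic can introduce when comparing money.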

There are many kinds of searches that can be performed via AWS. You've already seen #keyword_search and #asin_search in the example programs, but there are other common search types, such as #actor_search, #artist_search, #author_search, and #director_search, as well as a slew of other, more esoteric searches.

For example, if you have an Amazon wishlist, you can look it up with a line like this:

r.wishlist_search( '18Y9QEW3A4SRY' )

The parameter to the method call is the alphanumeric wishlist ID.

Or, if you want to get more specific and you're looking for a book, the power search may be what you need. The following would find all books published by O'Reilly & Associates in the Amazon catalog and sort them by publishing date:

r.power_search( "publisher:o'reilly", 'books',
                HEAVY, ALL_PAGES, '+daterank' )

This can take quite some time to return, as it will have to look up in excess of 1000 products.

I won't go into any of the other esoteric searches here. Read the AWS and Ruby/Amazon documentation, experimenting as you go, to get a feel for what is possible.

One thing worth noting is that the AWS search methods can take many more parameters than illustrated by the programs I present here. For example, I've been performing "heavy" searches by default, which return all the information available for a given product. You can also specify a "lite" search, which returns fewer properties and is, therefore, theoretically faster to execute. Use the Amazon::Search::HEAVY and Amazon::Search::LITE constants to specify the weight of the search. The weight is either the second or the third argument to the method, depending on the particular method call.

Another parameter of interest is the page parameter. This specifies which page of up to 10 responses to return for the given search. If you play around with the various search methods without supplying a page number, you'll quickly notice that you are never returned more than 10 products for a given search. This is because you're only looking at the first page of results, since the default page to retrieve is the first.

Instead of using the default of 1 (or any other integer), try passing the constant Amazon::Search::ALL_PAGES for a search that you suspect would yield a high number of hits, such as the power search. This constant signals to Ruby/Amazon that it should carry out as many background AWS operations as required to return all hits for a given search query. If you chose a generic search, it could be quite some time before you get back your results.
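
The work that ALL_PAGES saves you can be sketched without touching the network. In the example below, fetch_page is a hypothetical stand-in for a single AWS request and fetch_all_pages is an illustrative helper, not part of Ruby/Amazon. It requests consecutive pages of up to 10 items until an empty page comes back.

```ruby
# Hypothetical stand-in for one AWS request: returns up to PAGE_SIZE
# items for the given 1-based page number, or an empty array past the end.
PAGE_SIZE = 10
CATALOG = (1..23).map { |n| "product-#{n}" }

def fetch_page(page)
  CATALOG[(page - 1) * PAGE_SIZE, PAGE_SIZE] || []
end

# Keep asking for the next page until one comes back empty.
def fetch_all_pages
  results = []
  page = 1
  loop do
    batch = fetch_page(page)
    break if batch.empty?
    results.concat(batch)
    page += 1
  end
  results
end

first_page = fetch_page(1)
everything = fetch_all_pages
puts "first page: #{first_page.size} items, all pages: #{everything.size} items"
```

Each pass through the loop corresponds to one round trip to the server, which is why a generic query with many hits can take a long time to complete.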

I now leave searches to concentrate on a different area of AWS where Ruby/Amazon can be of assistance.

If you want to build your own online store, but don't wish to invest in any stock, you can use the gigantic Amazon catalog as your inventory, have Amazon take care of the postage, packing, invoicing, and dispatch, and simply sit back and earn referral fees on any items sold via your site.

One way to achieve this is the so-called "remote shopping cart" feature of AWS. This is one of the areas where Ruby/Amazon can take a lot of the work out of your hands, as the required tracking of shopping cart IDs and security tokens is handled internally by the objects you create. You won't even know these elements exist, much less need to deal with them manually.

Listing Four takes a list of ASINs from the command line, and adds each of the corresponding products to a new remote shopping cart. Finally, it spits out the purchase URL of the new cart; see Figure 3. If that URL were subsequently entered into a browser, the contents of the remote cart would be uploaded to the local shopping cart of the browser's user.

This feature lets you tie your users more closely to your online store by seamlessly integrating shopping-cart facilities in your own site. This alleviates the need to link away to Amazon for the actual purchase, which would carry with it the risk of the customer continuing his browsing from there, rather than returning to your site to potentially earn you more referral fees.

One of the more advanced features offered by Ruby/Amazon is the option of customer geolocation. This is internally orchestrated via MaxMind's GeoIP library (http://www.maxmind.com/). For this feature to work, you need to have GeoIP installed, along with the associated Ruby bindings (http://www.rubynet.org/modules/net/geoip/).

The idea here is that you capture the host name or IP address of the customer connecting to your web site and then use that information to determine the Amazon site at which the customer is most likely to want to shop.

For example, a CGI (or mod_ruby, http://modruby.net/en/) script might contain something like this:

require 'amazon/locale'
require 'amazon/search'

include Amazon::Search

locale = Amazon::Locale::get_locale_by_addr( ENV['REMOTE_ADDR'] )

r = Request.new( 'D23XFCO2UKJY82', 'calibanorg-20', locale )

If, for example, the customer were coming from 217.110.207.55, the locale of the Request object would now be set to de, which corresponds with amazon.de. All future searches would now take place within the German catalogue. You can use this to try to intelligently and transparently connect the customer with the product catalogue in which he or she is most likely to be interested.

Your customers don't have to be physically located in one of the actual countries that Amazon serves via AWS for geolocation of the locale to succeed. For example, Irish customers would be localized to the UK site, Swiss customers to the German site, and Canadians would find themselves localized to the U.S. amazon.com site.
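
The fallback from country to nearest Amazon site amounts to a table lookup with a default. The following sketch shows the idea using only the pairings mentioned above. The COUNTRY_TO_LOCALE table, the locale_for_country helper, and the choice of 'us' as the default are assumptions for illustration; the table inside Ruby/Amazon may differ, and the GeoIP step that turns an address into a country code is left out.

```ruby
# Hypothetical mapping from ISO country codes to Amazon locales; the real
# table used by Ruby/Amazon may differ. Only the pairings mentioned in the
# text are included here.
COUNTRY_TO_LOCALE = {
  'US' => 'us', 'CA' => 'us',
  'GB' => 'uk', 'IE' => 'uk',
  'DE' => 'de', 'CH' => 'de'
}.freeze

DEFAULT_LOCALE = 'us'   # assumption: fall back to amazon.com

# Look up the locale for a country code, ignoring case and tolerating nil.
def locale_for_country(country_code)
  COUNTRY_TO_LOCALE.fetch(country_code.to_s.upcase, DEFAULT_LOCALE)
end

%w[IE CH CA DE BR].each do |code|
  puts "#{code} -> #{locale_for_country(code)}"
end
```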

What else could you do with Ruby/Amazon? Perhaps you maintain a database of your book collection, just to keep track of everything. Wouldn't it be nice if you could associate each book in your collection with a picture of its cover? Well, one of the properties you can retrieve for books in Amazon's vast catalog is image_url_large, which is a URL to a large image of the book's cover (there are also small and medium variants). Alexandria (http://alexandria.rubyforge.org/) is a GNOME application written with this exact purpose in mind. It lets you enter and manage your book collection and calls upon Ruby/Amazon to fetch images of the items in it.

Ruby/Amazon is a convenient, time-saving interface to Amazon's AWS API. By extension, it demonstrates that Ruby is a serious contender for general web-service programming.

DDJ



Listing One

#!/usr/bin/ruby -w
# Use the Ruby/Amazon library
require 'amazon/search'

# Include the constants, classes and methods from the Amazon::Search module
include Amazon::Search

# Create a new request object, passing in my AWS developer's token, my
# Associates ID, the locale in which I wish to work (the US or amazon.com in
# this case) and the fact that I do not wish to use a results cache.
#
# Only the first parameter is required. The locale defaults to US and a cache
# is used unless specified otherwise. Even the need for the first parameter
# can be avoided by entering it in ~/.amazonrc.
#
r = Request.new( 'D23XFCO2UKJY82', 'calibanorg-20', 'us', false )

# Search for items related to Ruby programming in the books section of the
# Amazon catalogue. Pass each product found into a block, calling the product
# object p on each iteration.
#
r.keyword_search( 'ruby programming', 'books' ) do |p|
  authors = p.authors.join( ', ' )

  # Information for books that are currently in print or soon to be available.
  unless p.availability =~ /not available/
    printf( "%s by %s (ASIN %s) for %s\n%s\n\n", p.product_name, authors,
                                       p.asin, p.our_price, p.availability )
  end
end


Listing Two
#!/usr/bin/ruby -w
require 'amazon/search'

# Construct an array of search strings and product catalog in which to search.
favourites = [
  [ 'global underground', 'music' ],
  [ 'the divine comedy', 'music' ],
  [ 'only fools horses', 'dvd' ],
  [ 'the office', 'dvd' ],
  [ 'minder', 'dvd' ]
]
q = Amazon::Search::Request.new( 'D23XFCO2UKJY82' )

# I wish to search the British (amazon.co.uk) catalogue.
q.locale = 'uk'
favourites.each do |search_string, category|
  r = q.keyword_search( search_string, category )
  r.products.each do |p|
    if p.availability =~ /not yet released/i 
      case p.catalogue
      when 'Music'
        # Array() copes with a missing value, a single name, or a list of names
        printf( "%s CD by %s\n(ASIN %s)\n\n", p.product_name,
                Array( p.artists ).join( ', ' ), p.asin )
      when 'DVD'
        printf( "%s DVD\n(ASIN %s)\n\n", p.product_name, p.asin )
      end
    end
  end
end


Listing Three
#!/usr/bin/ruby -w
require 'amazon/search'
include Amazon::Search

# The iRiver iHP140 jukebox
asin = 'B0001G6UAW'

# Look it up
r = Request.new( 'D23XFCO2UKJY82' )
resp = r.asin_search( asin )

# Quit if we didn't get a response
exit unless resp

new_price = resp.products[0].our_price
begin
  # Read the price from the last check
  cur_price = File.read( '/tmp/price' ).chomp
rescue Errno::ENOENT    # File won't exist the first time we are run
  cur_price = '0'
end

# Quit if there's no price change
exit if new_price == cur_price

# Write new price out to file and stdout
File.open( '/tmp/price', 'w') { |file| file.puts new_price }

printf( "Price of %s has changed from %s to %s\n",
        resp.products[0].product_name, cur_price, new_price )


Listing Four
#!/usr/bin/ruby -w
require 'amazon/shoppingcart'
include Amazon

DEV_TOKEN = 'D23XFCO2UKJY82'
sc = ShoppingCart.new( DEV_TOKEN )
       
ARGV.each { |asin| sc.add_items( asin, 1 ) }
puts sc.purchase_url

