Site Archive (Complete)
Email
Print
Reprint

add to:
Del.icio.us
Digg
Google
Furl
Slashdot
Y! MyWeb
Blink
September 01, 1997
PDL: The Perl Data Language

(Page 1 of 2)
Frossie Economou
The Perl Data Language is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.
DDJ, Software Careers Fall 97: PDL: The Perl Data Language

PDL: The Perl Data Language

Software Careers Fall 1997 Dr. Dobb's Journal

by Karl Glazebrook and Frossie Economou


Karl is an astronomer at the Anglo-Australian Observatory and can be reached at kgb@aaocbn1.aao.gov.au. Frossie is a software engineer and astronomer at the UK Infrared Telescope in Hawaii and can be reached at frossie@jach.hawaii.edu. Reprinted courtesy of The Perl Journal (http://www.tpj.com/).
Tables and Examples
Insert Extolling the virtues of Perl and its many uses to the many Perl users out there is preaching to the choir. Nevertheless, there is one fundamental area in computing where Perl has been conspicuously absent-number crunching.

Tarred by the same brush as other scripting languages, Perl (which in fact is semicompiled), is perceived as too slow and memory hungry for heavy numerical computations because it doesn't lend itself to storing and retrieving zillions of numbers quickly. This has been a source of great frustration to us, enthusiastic Perl users who resent being forced to use other environments for our astronomical data analysis. Perl's potential for manipulating numerical data sets speedily and elegantly via an extension was obvious. Hence PDL, the Perl Data Language, was conceived. PDL is a Perl extension for numerical manipulation that provides the convenience of Perl with the speed of compiled C.

PDL introduces a new data structure: the "pdl numerical array," often referred to as a "piddle." (This unfortunate nickname has led to some rather dubious puns in the source code.) A piddle is a special object that can contain a large block of efficiently stored numbers for manipulation with normal mathematical expressions. For example, if $a is a piddle containing a 346 chunk of data, then the Perl statement $b = sin($a) will set $b equal to $a; but with every value replaced, with its sine. Because each operation is implemented via compiled C code, it's nearly as fast as a hand-crafted C program.

PDL can be used "normally" from a script, but it also has a shell interface for interactive data analysis and prototyping. In this article, we demonstrate PDL using the PDL shell, called "perldl." We assume you have PDL-1.11 (see Table 1 for a list of PDL 1.11 functions), PGPLOT-2.0 (which itself requires the pgplot graphics library), and Perl 5.003. If you also have the right versions of the Term::ReadLine and Term::ReadKey modules, the perldl shell allows interactive command-line editing.

The perldl shell, invoked at the command line by perldl, behaves like Perl's debugger. For instance, we can assign values to variables and print them with p, as in Example 1. PDL is really about matrices. In Example 2, we create a 23 matrix and multiply it by itself.

To have true fun with PDL, we need some data. Luckily, the PDL distribution comes with a picture of the sky stored in FITS, the standard format for astronomical data. PDL also supplies rfits(), a function that reads FITS files and returns a piddle containing the data. Example 3(a) shows how we read in our image and plot it.

Now we have data (and we didn't have to spend three nights freezing on top of a mountain to get it). What do we know about it? We know that it is 16-bit with 65,536 elements. But is it 65536*1 or 256*256 or even 16*16*16*16? We can determine the dimensions of our data by using the dims() function; see Example 3(b). dims() is a PDL function that returns the dimensions of a piddle. Not surprisingly (after all, it's a picture of the sky), we have a two-dimensional image: 256*256.

To determine the mean and standard deviation of a piddle, we use the stats() function; Example 3(c). To print a portion of the piddle, we use the sec() function. Since we don't want to display all 65,536 numbers in this particular example, we use sec() to return the bottom-left corner instead. Example 3(d) displays the rectangle between elements (0,252) and (3,255). Additional dimensions are handled seamlessly by passing the extra coordinate values as arguments.

Perhaps you're getting restless at this point. Let's abandon the function calls and jump to the cool stuff. The PDL function imag() uses the PGPLOT library to display images. In Example 4(a), imag() displays the piddle $a; see Figure 1(a). It's full of stars! This is, in fact, an image of Messier 51, a spiral galaxy similar to our own but at a distance of 200,000,000,000,000,000,000 miles away, give or take a few billion. That's a bit too far for us to invade, but we can at least humiliate it, using the command in Example 4(b). This results in Figure 1(b).

Since we're exploring cosmology, we can create something out of nothing; see Example 5(a). As you can see, PDL functions can be chained together just like Perl functions. Two PDL functions are cascaded here: rvals() and zeroes(). First, zeroes() creates a piddle full of zeros-in this case, a 2020 matrix with every element zero. (There's also a ones() function that creates a piddle full of ones.) Then, rvals() fills that piddle with values representing the distance of each element from the center. We now use the exp() function to generate a two-dimensional Gaussian, as shown in Example 5(b) and Figure 2(a).

Figure 1:
Figure 1: (a) Image of Messier 51, a spiral galaxy similar to ours; (b) humiliated galaxy.

The less mathematically inclined will say it looks like a blob. We can inflict a bit more punishment on Messier 51 by convolving it with our newly created Gaussian filter; Example 5(c) and Figure 2(b). This enables us to simulate what we would see if we were observing through very bad viewing conditions, such as a (possibly drunken) haze. This operation takes 20-25 seconds on a Pentium 120 or Sparc 20 with PDL. Doing this with a 2D array in normal (non-PDL) Perl takes 13 minutes and uses 11 times as much memory.

Example 5(d) creates an unsharp masked image, as in Figure 2(c). This is often used in astronomy to emphasize sharp features against a bright background, such as stars in a galaxy, the giant luminous gas clouds we call HII regions, or foreground objects such as UFO's (err, weather balloons).

Where are We Now?

PDL was prototyped early in 1996 and is currently in beta release. (One author's habit of prototyping Perl modules while on astronomical observing trips leaves the other author wondering whether this is a previously unknown symptom of altitude sickness.) It is functional but not yet fully featured. It is under vigorous development, thanks to the work and enthusiasm of the perldl mailing list participants. The current stable version as we write this article is 1.11.

Figure 2:
Figure 2: (a) Two-dimensional Gaussian filter; (b) Messier 51 convolved with Gaussian filter; (c) unsharpened mask.
Several auxiliary modules for PDL are nearing completion: 3D graphics manipulation using OpenGL (Tuomas J. Lukka), an interface to the Meschach matrix library (Etienne Grossmann), an interface to the XMGR plotting tool (Tim Jenness), access to the TIFF, GIF, PostScript, and other formats supported by PBM+ (Christian Soeller), and an interface to the SLATEC numerical library (Tuomas J. Lukka). Other formats are under development for PDL 2.0, which will also feature virtual slicing, easier C extensions, and fast implicit looping.

Anyone wishing to reflect or opine on the rather technical issues surrounding PDL development is welcome to join the perldl mailing list at perldl-request@jach.hawaii.edu or the porters list at pdl-porters-request@jach.hawaii.edu. Send your questions to the former and your complaints to the latter. Finally, more information (including Christian Soeller's PDL FAQ) is available at http://www.aao.gov.au/local/www/kgb/perldl/.

Acknowledgment

We thank Pat Seitzer and the IRAF group of the National Optical Astronomy Observations for permission to use the image of M51.

DDJ

1 | 2 Next Page
TOP 5 ARTICLES
No Top Articles.
DR. DOBB'S CAREER CENTER
Ready to take that job and shove it? open | close
Search jobs on Dr. Dobb's TechCareers
Function:

Keyword(s):

State:  
  • Post Your Resume
  • Employers Area
  • News & Features
  • Blogs & Forums
  • Career Resources

    Browse By:
    Location | Employer | City
  • Most Recent Posts:
    MEDIA CENTER  more
    Audio
    2008 International Mathematica Conference
    Dr. Dobb's interviews Wolfram Research's Theo Gray, co-founder and Director of User Interfaces, and Roger Germundsson, Director of Research and Development, about the upcoming 2008 International Mathematica Conference.
    NetSeminar
    Extending Enterprise Value with Web 2.0
    In this webcast we will talk about how to simply build and quickly remix Web 2.0 applications and the role of the IT department and how they support mashups. We will discuss how IBM can help IT teams adapt existing enterprise systems as well as develop unique ones that can support end user driven mashups in a reliable, scalable and secure way. We will highlight a simple scenario adapting an enterprise information source for mashups and how to test it. We will also cover how IBM can help you build agile, fast and simple web applications based on dynamic scripting languages that dramatically reduces development time. Wednesday, September 24, 2008 - 12pm PT / 3pm ET
    The Great Debate: PostgreSQL vs. MySQL
    Common industry perceptions of MySQL and PostgreSQL aren't as true with the current generation of releases as they used to be. DBAs, developers, and IT managers and decision-makers will benefit from this presentation and live Q&A about the pros and cons of using PostgreSQL or MySQL, which will include a discussion about the ongoing trend towards using open source in the enterprise. Tuesday, October 7, 2008 - 9:00 AM PT / 12:00 PM ET
                                   
    EVENTS
    Software Development Best Practices 2008
    October 27-30, 2008
    Boston, MA
    Join us for Software Development Best Practices 2008, Dr. Dobb's premier east coast event, featuring world-class training on the entire software development lifecycle. Register today for a conference pass or a FREE Expo Pass!

    Nominations for the Jolt Awards - the “Oscars” of the software development industry - are now open. Submit your Jolt-worthy products, books and technologies by October 24 to save 20%.
    INFO-LINK

    Resource Links:




    Related Sites: DotNetJunkies, SD Expo, SqlJunkies