Faster Fractal Compression


Dr. Dobb's Journal, January 1996

D.R. McGregor, R.J. Fryer, P. Cockshott, and P. Murray

The authors work in the Department of Computer Science at the University of Strathclyde, Glasgow, Scotland. They can be contacted at [email protected].


Fractal image compression has a formidable reputation. While reputed to achieve extremely high compression ratios, it nonetheless makes extreme demands on computing power and mathematical understanding. To truly comprehend it, you must wade through textbooks on Cauchy series, Hausdorff spaces, affine transforms, and the like. To win its promise, you must be prepared to sacrifice hours of CPU time.

Once you get the hang of it, however, the ideas underlying fractal compression are quite simple. You don't need to be an expert in abstract algebra or topology to understand it, and by applying fairly simple optimization techniques, the compute cost of fractal compression becomes quite modest. In this article, we'll explain how and why fractal compression works, then present the Fast Fractal Transform, a family of algorithms that achieve a several-hundred-fold speed-up over the simple fractal transform.

How Fractal Compression Works

Fractals are structures that exhibit self-similarity at different scales. The classic example is a mountain, like that in Figure 1. A photo of an area on the side of a mountain has a characteristic, "mountainy" appearance. If you zoom in on a portion of the mountain, select an arbitrary square half the size of the original, and enlarge it to the original size, the blow-up will exhibit the same sort of mountainy look. In Figure 1, a picture of Cir Mhor on the Scottish Island of Arran, the big block B is similar to the small square A. The small square C is also a mirror image, at a smaller scale, of the big block D. Figures 2(a) and 2(b) are enlarged versions of blocks A and B.

If you grasp the essence of this mountainy look, you can "grow" mountain pictures to order, as is done in the type of fractal image synthesis used in cinematographic special effects. Image synthesis "grows" a generic mountain; fractal compression uses similar principles to regenerate a specific mountain, flower, bird, or tree.

A half-sized square of the mountainside has the same general appearance as the whole mountain because (and this is a recursive argument) within the half-sized square, numerous areas look like some larger areas in the full-sized image. In other words, small hillocks and rocks look pretty much like larger hillocks and rocks. No matter which size picture you look at, your mind's eye takes in this texture--this particular sort of hillocky, bumpy look that gives a mountain its special character.

Fractal compression takes advantage of the similarity between the part and the whole. A fractal transform of a picture is in essence a list of pairs of squares [(B1, S1), (B2, S2), ..., (Bn, Sn)], where the Bi are selected big-square blocks and the Si are small squares which, taken together, cover the entire image. Square S1 looks like block B1, square S2 looks like block B2, and so on. The fractal transform records these similarity relationships.
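
To make the data concrete, here is a minimal sketch in C of what one transform entry might look like. The layout and field names are illustrative, not the authors' actual code; the brightness field anticipates the per-square averages described below, and the symmetry field anticipates the rotations and reflections discussed later.

```c
/* One (Bi, Si) pair of a fractal transform -- an illustrative layout. */
typedef struct {
    int   sq_x, sq_y;    /* origin of the small square Si (e.g., 8x8)   */
    int   blk_x, blk_y;  /* origin of the matching big block Bi (16x16) */
    int   symmetry;      /* 0..7: one of 4 rotations x 2 reflections    */
    float brightness;    /* average brightness stored for the square    */
} MapEntry;

/* The compressed image is essentially an array of such entries, one
   per small square, covering the whole picture. */
```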

Reconstructing the Image

The wizardry of fractal compression is embodied in the regeneration of the image from pairs of similar squares. For each small square on the picture, there's a similar big block. You want to use information about the big blocks to fill in the small-scale detail. If you shrink a copy of a big block and move it onto the corresponding small square, you'll get an approximation of what the small square should look like. The quality of this approximation depends upon the degree of similarity between the small square and the big block. The more fractal the image, the greater the likenesses between its big blocks and small squares--and the better the approximation generated by copying and shrinking.

Using big blocks to fill in small squares gives you leverage on the detail. This is similar to a mint producing the dies for coins: A large model of the coin is first made by hand, then traversed by an arrangement of levers which guide the cutter that makes the small-scale die. In the process, letters that were reasonably large on the model are reduced to the neat, small print on the coin.

The compressed file contains a small amount of additional information, giving the average brightness and color levels for each small square. This gives you something to start with.

If you set up a fractal decompression program to output to the screen (showing the image as it is decompressed), you initially see a patchwork of small, uniformly colored squares, say eight pixels on an edge. These initial 8x8 squares take their colors from the averages stored in the compressed file. At the next iteration, each small square is replaced by a reduced image of its 16x16 big block, which overlaps between four and nine of the original small squares, so more detail appears. At the step after that, each 16x16 big block again covers four to nine small squares, each of which now contains detail derived from four to nine squares of the previous step, so the detail becomes finer still. After a few iterations, the details are a single pixel in size.
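
A minimal sketch of one such iteration in C, for a grayscale image and translation-only matches (the stored rotation, reflection, and brightness adjustment are omitted for brevity, and all names are illustrative):

```c
#define N 256                        /* image edge, in pixels */
#define S 8                          /* small-square edge     */

typedef struct { int sq_x, sq_y, blk_x, blk_y; } MapEntry;

/* One decompression iteration: shrink each 16x16 big block by 2:1
   pixel averaging onto its matched 8x8 small square. Running this a
   few times, ping-ponging between two buffers, drives the detail
   down to pixel scale. */
static void iterate(unsigned char dst[N][N], const unsigned char src[N][N],
                    const MapEntry *map, int nsq)
{
    for (int i = 0; i < nsq; i++) {
        const MapEntry *m = &map[i];
        for (int y = 0; y < S; y++)
            for (int x = 0; x < S; x++) {
                int bx = m->blk_x + 2 * x, by = m->blk_y + 2 * y;
                int sum = src[by][bx]     + src[by][bx + 1]
                        + src[by + 1][bx] + src[by + 1][bx + 1];
                dst[m->sq_y + y][m->sq_x + x] = (unsigned char)(sum / 4);
            }
    }
}
```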

Complexity of Compression

The big obstacle to widespread use of fractal compression has been its huge computational cost. This stems from its basic operation--finding a most-similar big block for each small square in the picture.

Assume you want to compress a picture 256x256 pixels in size using small squares of size 8x8 matched against big blocks of size 16x16. Clearly, the image would be covered by a patchwork of 1024 squares, 32 in each direction. For each small square, you would have to find the corresponding big block that best fits it.

How many big blocks do you have to search?

Consider the number of points on the picture that could serve as the origins of big blocks. On the x-axis, any position from 0 to 239 would do. Above 239, the big blocks would be partly off the right edge of the screen. By symmetry, there are 240 possible positions in each direction, giving a total of 57,600.

The number of square-to-block comparisons needed to search 57,600 big blocks for the best match for each of the 1024 small squares is 58,982,400. Assume the simplest case: you perform each comparison by decimating the big block, picking every fourth pixel (every second pixel in each direction) and comparing it with the corresponding pixel in the small square, so each block-to-square comparison costs 64 pixel-to-pixel comparisons. This yields 3,774,873,600 pixel-by-pixel comparisons. But to do a thorough job, you must look not only for similar blocks in the same orientation, but also for rotated or mirror images of the original. If you were compressing a picture of a building, for example, the left edge might look like a scaled and reflected right edge; a compression scheme that allows not just translation and scaling of squares but also rotation and reflection makes better use of the image's self-similarity. Given four possible rotations and two mirror images, that's more than 30 billion pixel comparisons. Three color planes in the image bring it to 90,596,966,400 byte comparisons. And since the most-similar block is usually determined by a least-squares method, each byte comparison involves a subtraction, a multiplication, and an addition, multiplying the instruction count yet again.
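
Collecting that arithmetic into a single expression, using the figures above:

```latex
\underbrace{1024}_{\text{small squares}} \times
\underbrace{57{,}600}_{\text{big blocks}} \times
\underbrace{64}_{\text{pixels compared}} \times
\underbrace{8}_{4\ \text{rotations}\,\times\,2\ \text{mirrors}} \times
\underbrace{3}_{\text{color planes}}
= 90{,}596{,}966{,}400
```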

Each pixel in the image must be compared with almost every other pixel, so if the image is N pixels on an edge, its total complexity is of order N^4. This is a polynomial, but of worryingly high order. This naive algorithm can take several hours of CPU time, and must therefore be optimized.

Doing It Quickly

At the University of Strathclyde, we've developed the "Fast Fractal Transform," a new and much faster algorithm. On a SPARC II, it takes 35 seconds to compress a 256x256 24-bit color image. On the same machine, an 8-bit gray-scale version of the same image took 8.5 hours using the conventional algorithm presented in the article "Fractal Image Compression," by L.A. Anson (Byte, October 1993).

Figure 3(a), an uncompressed 256x256 24-bit image entitled "Rachel," illustrates our technique's compression capabilities. Table 1(a) shows how long it takes to compress the image on a Sun 4/75, as well as the relative quality of the compression as reflected in the root-mean-square (RMS) pixel-by-pixel difference between the original and the reconstructed images. In this instance, we used 8x8 small squares, resulting in a compression ratio of 38:1. Table 1(b) shows compression using 4x4 small squares, resulting in a compression ratio of 9.5:1. Figure 3(b) is the Rachel image compressed using the best of eight matches. Figure 4 is a graph of the mean match error versus the number of candidate matches extracted from the K-D tree (discussed shortly) for a typical 256x256 24-bit image. The match error is the sum of the squares of the individual pixel errors between a small square (4x4, in this instance) and its corresponding big block for the case of the best match delivered. This computation assumes the color planes are independent of each other.

The obvious alternative to an exhaustive search is to index blocks in terms of their similarity. This lets you perform the whole process in two passes: the first places each big block into the index, and the second looks up the most-similar partner in the index for each small square. Since indexes can usually be searched in logarithmic time, the entire process should be much faster, for two reasons:

  • Associative store enables the matching process to select only those big blocks that are most similar to the small square currently being matched. Each match can be done in a time logarithmically related to the number of big blocks in the image: because of the associative store's characteristics, the selection can obtain the best matches with on the order of only log M single comparisons, where M is the number of big blocks.
  • A small square can be compared with a big block on the basis of its gross characteristics only. This is much faster than determining whether or not they conform on a pixel-by-pixel basis (typically two or three comparisons for an 8x8 block, as opposed to 64).

Suitable Gross Characteristics

A surprisingly large variety of gross characteristics can be used, based on the following criteria:

  • They should be independent, each providing different information to characterize the patch.
  • They should be relatively immune to noise (hence, characteristics of the entire set of patch pixels are preferred).
  • They should be simple to compute.

Example characteristics include the big block's Fourier coefficients and discrete cosine transform (DCT) coefficients. Alternatively, you might use the result of summing the pixels in the large patch under one or more different masks, or of determining the principal moment of the image in the big block. The first step in our compression method computes such gross characteristics for the potential big blocks in the image.

The second phase of the algorithm computes the same set of gross characteristics for the small squares. These computations need be performed only once each for the blocks and for the squares.
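
As an illustration, here is a hypothetical set of three gross characteristics in C: the patch mean plus two mask sums (left minus right half, top minus bottom half). The actual attribute set is a design choice the method leaves open. Normalizing by the patch area makes the values of a 16x16 block directly comparable with those of an 8x8 square.

```c
#define NATTR 3   /* number of gross characteristics (illustrative) */

/* Compute gross characteristics of an m x m patch at (x0, y0) in an
   8-bit image with the given row stride. */
static void gross_attrs(const unsigned char *img, int stride,
                        int x0, int y0, int m, float attr[NATTR])
{
    long sum = 0, lr = 0, tb = 0;
    for (int y = 0; y < m; y++)
        for (int x = 0; x < m; x++) {
            int p = img[(y0 + y) * stride + (x0 + x)];
            sum += p;                      /* overall brightness    */
            lr  += (x < m / 2) ? p : -p;   /* left minus right half */
            tb  += (y < m / 2) ? p : -p;   /* top minus bottom half */
        }
    attr[0] = (float)sum / (m * m);        /* area-normalized values */
    attr[1] = (float)lr  / (m * m);
    attr[2] = (float)tb  / (m * m);
}
```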

The computed characteristics of the small squares are then used to index the big blocks' associative table of characteristics to discover which big blocks are most similar to the given small square. For each small square, the associative mechanism can return either a single best match or a relatively small number of near matches, from which we can select the optimum match according to other criteria.

Matching a small square to a corresponding big block on the basis of its gross characteristics obtains the relative x- and y-coordinates required for the translation transform in the compression process.

The Associative Mechanism

When presented with a set of attributes, the mechanism must be able to quickly locate the nearest-matching item in its memory. In a single-processor system, this means with as few simple comparisons as possible.

Suitable associative data structures are already well known--the K-D tree, for instance. The mechanism adopted can select not only a single best match, but a set of k best matches. In such a tree, similar items are located near each other. To find a match for a small square, its attribute characteristics are calculated and the nodes of the tree are searched from the root down until an initial candidate best-match big block is located. The exact overall similarity distance from the small square is then calculated, and the search returns to the parent node, which contains upper- and lower-bound information pertaining to that level's attribute.

If the similarity distance between the small square's attribute and the bound (in that dimension only) exceeds the selection criterion, the alternate branch can be ignored; otherwise it, too, must be searched to find the best match, or best-matching set. This process is repeated until the root is reached. Thus, the store mechanism selects the best match, or a set of k matches, or the set within some search-specified distance of the given small square. This improves the range of candidate matches, while only minimally increasing the number examined.
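
In code, the search might look like the following C sketch: a standard K-D nearest-neighbor search in the style of Friedman, Bentley, and Finkel (see "References"), not the authors' implementation. Tree construction and k-best selection are omitted.

```c
#include <float.h>

#define NATTR 3

typedef struct Node {
    float attr[NATTR];      /* gross characteristics of one big block */
    int   blk_x, blk_y;     /* where that block sits in the image     */
    int   axis;             /* attribute this node splits on          */
    struct Node *lo, *hi;
} Node;

/* Squared weighted Euclidean similarity distance (discussed below). */
static float dist2(const float *a, const float *b, const float *w)
{
    float d = 0;
    for (int j = 0; j < NATTR; j++) {
        float t = a[j] - b[j];
        d += w[j] * t * t;
    }
    return d;
}

/* Best-match search; call with *best = FLT_MAX and *best_node = NULL.
   The search descends toward the query's side first, then backtracks;
   the far branch is visited only when the splitting plane lies closer
   than the best match so far -- the pruning step described above. */
static void nearest(const Node *n, const float *q, const float *w,
                    const Node **best_node, float *best)
{
    if (n == NULL) return;

    float d = dist2(n->attr, q, w);
    if (d < *best) { *best = d; *best_node = n; }

    float delta = q[n->axis] - n->attr[n->axis];
    const Node *near_side = (delta < 0) ? n->lo : n->hi;
    const Node *far_side  = (delta < 0) ? n->hi : n->lo;

    nearest(near_side, q, w, best_node, best);
    if (w[n->axis] * delta * delta < *best)  /* plane closer than best? */
        nearest(far_side, q, w, best_node, best);
}
```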

The distance function used is a weighted Euclidean distance over the chosen attribute values of the blocks and squares being compared. The weight applied to a particular attribute should ideally be linked to the accuracy of the attribute itself. (More attention is given to attributes that are, on average, more accurate.) The weight must also, however, reflect the discriminating power of the attribute; for example, an attribute that is constant over all patches is useless, even though it is consistent and accurate.

The values of the weights used by the algorithm can be determined by a standard procedure, as follows:

    1. Process a set of images by Barnsley's method, obtaining the exhaustively searched, correct, best-match transform.

    2. Calculate all gross characteristics for all small squares and their corresponding big blocks.

    3. Based on these values, adjust the weights using optimization. This decreases the similarity distance between each Barnsley-matched pair and increases the distance between nonmatched pairs.
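
Step 3 leaves the optimizer open. As a purely illustrative starting point (all names hypothetical), each weight could be set to the inverse variance of that attribute's error over the Barnsley-matched pairs, so attributes that agree tightly on known-correct matches dominate the distance. Note that this also assigns zero weight to an attribute that is constant over all patches, in line with the caveat above.

```c
#define NATTR 3

/* Hypothetical weight fit: sq[i][j] and blk[i][j] hold attribute j of
   the small square and big block of the i-th Barnsley-matched pair. */
static void fit_weights(int npairs, const float sq[][NATTR],
                        const float blk[][NATTR], float w[NATTR])
{
    for (int j = 0; j < NATTR; j++) {
        double mean = 0, var = 0;
        for (int i = 0; i < npairs; i++)
            mean += sq[i][j] - blk[i][j];
        mean /= npairs;
        for (int i = 0; i < npairs; i++) {
            double d = (sq[i][j] - blk[i][j]) - mean;
            var += d * d;
        }
        var /= npairs;
        w[j] = (var > 0) ? (float)(1.0 / var)  /* tight attribute: big weight */
                         : 0.0f;               /* constant attribute: useless */
    }
}
```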

Two Dimensions Are Not the Limit

Unlike previous methods, the Fast Fractal Transform is practical for 3-D data, whether derived from spatial dimensions (MRI 3-D scan data), or from video sources in which the third dimension is time. Very high compression ratios in reasonable time are possible. Indeed, this method could be applied to even higher-dimension data sets such as those derived from volumes of the atmosphere or ocean, involving variation over time and additional attributes in additional dimensions.

The Complexity of the New Method

In our example of a 256x256 image, there would typically be 240x240 candidate big blocks whose values must be stored in the table. The K-D tree provides a mechanism that can determine the best-matching big block in about log2(M) operations, where M is the number of big-block entries in the tree.

Thus, the best-matching big block can be found with about log2(240x240) individual comparisons in the K-D tree (approximately 16) instead of 240x240 (57,600). Remember that in practice, a small set of best-matching candidates may be returned. The major components not yet accounted for are the one-time calculation of the characteristics needed to set up the tree, and the final pixel-by-pixel comparison of the nearest-neighbor candidates returned by the associative memory.

The cost of building the K-D tree is composed of the cost of computing the gross attributes and the cost of storing them in the tree. Typically a small number of attributes is used, say three to five. The cost of computing each attribute is, to a first approximation, proportional to the area of a big block, m^2, where m is the edge of the block. So the complexity is m^2 x (number of big blocks) x (number of attributes). For the example given, computing the attributes requires 16x16x240x240x3 = 44,236,800 operations, and building the tree requires about 240x240 x log2(240x240), or roughly 892,800, operations. The rate-limiting factor is thus the computation of the gross attributes, and the Fast Fractal Transform has a complexity about three orders of magnitude less than the conventional algorithm. The observed execution times are indeed about three orders of magnitude faster.
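
Using the figures above, the ratio between the naive search's byte comparisons and the dominant cost of the new method (computing the gross attributes) works out to:

```latex
\frac{1024 \times 57{,}600 \times 64 \times 8 \times 3}
     {16^2 \times 240^2 \times 3}
= \frac{90{,}596{,}966{,}400}{44{,}236{,}800}
= 2048 \approx 2 \times 10^{3}
```

which is consistent with the three-orders-of-magnitude speed-up observed in practice.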

References

Anson, L.A. "Fractal Image Compression." Byte (October 1993).

Barnsley, M.F. and A.D. Sloan. "A Better Way to Compress Images." Byte (January 1988).

Buerger, M.J. Crystal Structure Analysis. New York, NY: John Wiley & Sons, 1960.

Friedman, J.H., J.L. Bentley, and R.A. Finkel. "An Algorithm for Finding Best Matches in Logarithmic Expected Time." ACM Transactions on Mathematical Software, Vol. 3, No. 3, 1977.

Figure 1: The big block B is similar to the small square A in this photo of Cir Mhor on the Scottish Island of Arran. The small square C is also a mirror image at a smaller scale of the big block D. (Photo courtesy of Neil MacLennan, Strathclyde University Audio Visual Services.)

Figure 2: Enlarged versions of A and B.

Figure 3: (a) An uncompressed 256x256 pixel picture of "Rachel" (b) "Rachel" compressed using the best of eight matches.

Figure 4: Graph of the mean match error versus the number of candidate matches extracted from the K-D tree for a typical 256x256 24-bit image.

Table 1: (a) Compression results on an 8x8 small square; (b) results on a 4x4 small square.

            Number of    Time to Compress    RMS
            Matches      (seconds)           Error

(a)             1            19.53           15.07
                2            20.55           12.50
                8            28.75           12.18

(b)             1            15.82           10.52
                2            17.67            8.11
                8            30.42            6.74

