Dynamic Programming
Let's look at an example of a problem that can be solved by dynamic programming and examine the situations in which this technique can be used. Consider the problem of taking a sequence of matrices A0...An-1 and multiplying them together. The function to multiply two matrices is shown in Listing Two.
(defmacro rows (m) `(array-dimension ,m 0))
(defmacro columns (m) `(array-dimension ,m 1))
(defmacro make-initialized-matrix (rows columns)
  `(make-array (list ,rows ,columns)
               :element-type 'float
               :initial-element 0.0))
(defmacro matrix-ref (m i j) `(the float (aref ,m ,i ,j)))
(defsetf matrix-ref (m i j) (value)
  `(setf (aref ,m ,i ,j) ,value))

(defun matrix-multiply (a b)
  (let ((c (make-initialized-matrix (rows a) (columns b))))
    (dotimes (i (rows a))
      (dotimes (j (columns b))
        (dotimes (k (columns a))
          (setf (matrix-ref c i j)
                (+ (matrix-ref c i j)
                   (the float (* (matrix-ref a i k)
                                 (matrix-ref b k j))))))))
    c))
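As a quick sanity check of Listing Two, here is a standalone sketch of my own (not from the article) that multiplies a 1×3 row by a 3×1 column. It repeats a simplified version of matrix-multiply, without the matrix-ref machinery, so that it runs on its own.

```lisp
;; Simplified standalone version of Listing Two's matrix-multiply:
;; plain aref instead of the matrix-ref macro and its declarations.
(defmacro rows (m) `(array-dimension ,m 0))
(defmacro columns (m) `(array-dimension ,m 1))

(defun matrix-multiply (a b)
  (let ((c (make-array (list (rows a) (columns b))
                       :element-type 'float :initial-element 0.0)))
    (dotimes (i (rows a))
      (dotimes (j (columns b))
        (dotimes (k (columns a))
          (setf (aref c i j)
                (+ (aref c i j) (* (aref a i k) (aref b k j)))))))
    c))

;; [1 2 3] . [[1] [1] [1]] is the 1x1 matrix [[6.0]].
(let ((a (make-array '(1 3) :element-type 'float
                     :initial-contents '((1.0 2.0 3.0))))
      (b (make-array '(3 1) :element-type 'float
                     :initial-contents '((1.0) (1.0) (1.0)))))
  (aref (matrix-multiply a b) 0 0))   ; => 6.0
```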
First, some macros are defined. The macros rows and columns return the number of rows and number of columns, respectively, of a matrix. I'll use floating-point matrices only. I use two-dimensional arrays to represent matrices. Rather than expose the representation, I define the expression (matrix-ref m i j) to get the element in row i and column j from matrix m, and the expression (setf (matrix-ref m i j) v) to assign it the value v. The function make-initialized-matrix creates a matrix initialized with 0.0's.
Notice the declarations. The macro matrix-ref hides one of them; the other is wrapped around the multiplication in the inner loop. Most Common LISP compilers should be able to do enough type inferencing to determine that the result of multiplying two floats is another float, but you can never be sure.
The function matrix-multiply is simple. The cost of multiplying two matrices is dominated by the time to do the floating-point (scalar) multiplications, and from the code within the inner dotimes, we can see that this time is proportional to pqr, where a is a p × q matrix and b is a q × r matrix.
Listing Three shows the function to multiply a list of matrices together. A list of matrices is multiplied by starting at the right and multiplying the accumulated result by the next left matrix. In a typical LISP implementation, the matrices are pushed on a stack from left to right and are popped off to be multiplied by an accumulated result.
(defun simple-multiply (a)
  (cond ((null (cdr a)) (car a))
        (t (matrix-multiply (car a)
                            (simple-multiply (cdr a))))))
Because matrix multiplication is associative, it doesn't matter in what order we perform the multiplications. (Matrix multiplication is associative for matrices over real numbers but not necessarily over floating-point numbers.) Suppose we have three matrices: 1×3, 3×3, and 3×3. The simple matrix multiplier simple-multiply will first multiply the two 3×3 matrices, doing 27 scalar multiplications and yielding another 3×3 matrix. It will multiply that result by the 1×3, doing another nine scalar multiplications for a total of 36 scalar multiplications. If, instead, we multiplied the first two matrices together, the result would be a 1×3 matrix (nine scalar multiplications), which we would multiply by the second 3×3 matrix to get the resulting 1×3 matrix (nine scalar multiplications), for a total of only 18 scalar multiplications.
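The arithmetic above can be checked with a tiny sketch of my own (not from the article): the cost of multiplying a p × q matrix by a q × r matrix is pqr, and the two orders for the 1×3, 3×3, 3×3 example differ by a factor of two.

```lisp
;; Cost, in scalar multiplications, of multiplying a p x q matrix
;; by a q x r matrix.
(defun mult-cost (p q r) (* p q r))

;; Right-to-left: (B . C) costs 3*3*3 = 27, then A . (BC) costs 1*3*3 = 9.
(+ (mult-cost 3 3 3) (mult-cost 1 3 3))   ; => 36

;; Left-to-right: (A . B) costs 1*3*3 = 9, then (AB) . C costs 1*3*3 = 9.
(+ (mult-cost 1 3 3) (mult-cost 1 3 3))   ; => 18
```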
The optimum order for the multiplications is called a "parenthesization." Let's look at the divide-and-conquer approach. Suppose we have a sequence of n matrices A0...An-1. We can create a vector that represents dimensions of the sequence of matrices. If Ai has dimensions pi×pi+1 for 0 ≤ i < n, the vector is p0...pn. I want to show that this approach is not practical so I'll write a function that computes the minimum number of scalar multiplications needed to multiply the sequence of matrices. With a little extra work, we could also compute the parenthesization.
The minimum number of scalar multiplications needed is determined by finding the cut point k, 0 ≤ k < n-1, that minimizes the number of scalar multiplications. The cut point defines a parenthesization. The number of scalar multiplications required to multiply A0 through Ak is determined, and the number required for Ak+1 through An-1 is determined. If the minimum number of scalar multiplications required to multiply Aq...Ar is denoted Mq,r, the number of scalar multiplications required by cutting at k is shown in Equation 1:
M0,k + Mk+1,n-1 + p0pk+1pn.
When we select the cut point that minimizes this value, we have the minimum number of scalar multiplications.
Equation 1 can be generalized to any subsequence Ai,...Aj, as shown in Equation 2:
min Mi,k + Mk+1,j + pipk+1pj+1, for i ≤ k < j.
The procedure to compute the minimum number of scalar multiplications is recursive, as shown in Listing Four. The argument p is a vector of matrix dimensions. The internal function rmin does the work. If the starting and ending points are the same, no multiplications are needed. If two matrices are being multiplied, the number of scalar multiplications is computed. If three or more matrices exist, all cut points are considered and the one that produces the minimum number of scalar multiplications is selected.
(defun rcomplex-multiplications (p)
  (labels ((rmin (start end)
             (cond ((= start end) 0)
                   ((= (+ start 1) end)
                    (* (aref p start)
                       (aref p (1+ start))
                       (aref p (1+ end))))
                   (t (let ((min most-positive-fixnum))
                        (loop for k from start to (1- end) do
                          (let ((v (+ (rmin start k)
                                      (rmin (1+ k) end)
                                      (* (aref p start)
                                         (aref p (1+ k))
                                         (aref p (1+ end))))))
                            (when (< v min)
                              (setq min v))))
                        min)))))
    (rmin 0 (- (length p) 2))))
You might notice that some values of M are computed several times. If we pass in a sequence of 10 matrices, M4,5 is computed 838 times. In fact, through an interesting series of calculations, we can conclude that the running time of this function is on the order of 2^n; that is, exponential in n, the number of matrices.
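The exponential blowup is easy to demonstrate with a hypothetical call counter of my own (not from the article). It runs the bare recurrence behind rmin, omitting the article's two-matrix shortcut, so its exact counts differ from those of rcomplex-multiplications, but the growth is the same in character. For this pure recurrence the total number of calls works out to exactly 3^(n-1).

```lisp
;; Count every recursive invocation of the bare matrix-chain recurrence
;; for a chain of n matrices (no memoization, no two-matrix shortcut).
(defun count-rmin-calls (n)
  (let ((total 0))
    (labels ((rmin (start end)
               (incf total)
               (when (< start end)
                 (loop for k from start below end
                       do (rmin start k)
                          (rmin (1+ k) end)))))
      (rmin 0 (1- n)))
    total))

(count-rmin-calls 4)    ; => 27    (= 3^3)
(count-rmin-calls 10)   ; => 19683 (= 3^9)
```

The closed form follows by induction: T(1) = 1 and T(m) = 1 + 2(T(1) + ... + T(m-1)), which telescopes to 3^(m-1).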
To get some idea of what this situation means, I selected 15 matrices with certain dimensions -- to compute the minimum number of scalar multiplications needed to multiply these 15 matrices takes a little over six minutes, whereas to compute the product in the obvious right-to-left manner using simple-multiply takes about 45 seconds. Therefore, choosing the optimum order by using rcomplex-multiplications makes little sense. But take heart, we'll see a program that computes this product in less than two seconds, including finding the optimum order.
The approach to take is to store the values of Mi,j in an array and to compute M0,n-1 from the bottom up. Equation 2 can help fill the array corresponding to M by first filling in the entries that correspond to Mi,i for 0 ≤ i < n; second, filling in the entries for Mi,i+1 for 0 ≤ i < n-1; third, filling in the entries for Mi,i+2 for 0 ≤ i < n-2; and so on, all the way up to the entry for M0,n-1. The running time of this algorithm is on the order of n^3.
The function that computes the minimum number of scalar multiplications in this manner is shown in Listing Five. This function also computes the cut points. Input is a vector of dimensions. The array m stores the values of M. The array s stores the cut points.
 1 (defun matrix-chain-order (p)
 2   (let* ((n (- (length p) 1))
 3          (m (make-array (list n n) :initial-element 0))
 4          (s (make-array (list n n))))
 5     (loop for d from 2 to n do
 6       (loop for i from 0 to (- n d) do
 7         (let ((j (- (+ i d) 1)))
 8           (setf (aref m i j) most-positive-fixnum)
 9           (loop for k from i to (- j 1) do
10             (let ((q (+ (aref m i k)
11                         (aref m (+ k 1) j)
12                         (* (aref p i)
13                            (aref p (+ k 1))
14                            (aref p (+ j 1))))))
15               (when (< q (aref m i j))
16                 (setf (aref m i j) q)
17                 (setf (aref s i j) k)))))))
18     (values s m)))
Line 3 fills m with zeroes, which has the effect of initializing (aref m i i) with 0 for all i. The algorithm proceeds by computing the minimum cost of chains of matrices of increasing length (d), starting at length 2 (line 5). The second loop (line 6) steps through the starting points of the chain, which are 0 ≤ i ≤ n-d; the corresponding ending point is j = i+d-1 (line 7). The first time through the loop, it computes Mi,i+1 for 0 ≤ i ≤ n-2; the second time through, it computes Mi,i+2 for 0 ≤ i ≤ n-3, and so on.
Line 8 initializes the entry for Mi,j to a value that amounts to infinity. The loop starting on line 9 scans all possible chains of length d, computes the number of scalar multiplications necessary (lines 10-14), and selects the minimum (lines 15-17). Line 17 stores the cut point for the minimum. Finally, the two arrays s and m are returned (line 18).
The optimum cut point is computed for every subsequence Ai,...,Aj. If the optimum cut point for Ai,...,Aj is k, then (aref s i j) contains k.
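Running the algorithm on the earlier 1×3, 3×3, 3×3 example shows both results at once. The following is a condensed, standalone transcription of Listing Five (renamed chain-order here, and without the in-text line numbers, so it runs on its own); the dimension vector for that example is #(1 3 3 3).

```lisp
;; Bottom-up matrix-chain cost table m and cut-point table s,
;; following Listing Five's structure.
(defun chain-order (p)
  (let* ((n (- (length p) 1))
         (m (make-array (list n n) :initial-element 0))
         (s (make-array (list n n) :initial-element 0)))
    (loop for d from 2 to n do
      (loop for i from 0 to (- n d) do
        (let ((j (- (+ i d) 1)))
          (setf (aref m i j) most-positive-fixnum)
          (loop for k from i to (- j 1) do
            (let ((q (+ (aref m i k)
                        (aref m (+ k 1) j)
                        (* (aref p i) (aref p (+ k 1)) (aref p (+ j 1))))))
              (when (< q (aref m i j))
                (setf (aref m i j) q)
                (setf (aref s i j) k)))))))
    (values s m)))

;; For the 1x3, 3x3, 3x3 chain: 18 scalar multiplications, cutting
;; after A1 -- that is, (A0 A1) A2, as computed by hand earlier.
(multiple-value-bind (s m) (chain-order #(1 3 3 3))
  (list (aref m 0 2) (aref s 0 2)))   ; => (18 1)
```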
The array s is used to drive the real multiplication (Listing Six). The argument a is a list of matrices. In line 3, the list of matrices is coerced to a simple vector. Lines 4-11 compute the matrix of cut points.
 1 (defun complex-multiply (a)
 2   (let ((len (length a)))
 3     (let ((av (coerce a 'simple-vector)))
 4       (let ((s (matrix-chain-order
 5                 (make-array
 6                  (list (1+ len))
 7                  :initial-contents
 8                  (cons (rows (car a))
 9                        (mapcar
10                         #'(lambda (m) (columns m))
11                         a))))))
12         (labels ((matrix-chain-multiply (i j)
13                    (if (< i j)
14                        (matrix-multiply
15                         (matrix-chain-multiply
16                          i (aref s i j))
17                         (matrix-chain-multiply
18                          (+ (aref s i j) 1) j))
19                        (svref av i))))
20           (matrix-chain-multiply 0 (1- len)))))))
The local function matrix-chain-multiply performs the multiplications recursively. If i and j are the same, the matrix Ai is retrieved from the vector av. Otherwise, the cut point is retrieved from s. Call the cut point k. The matrices Ai...Ak are multiplied, Ak+1...Aj are multiplied, and the two resulting matrices are multiplied.
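The same recursion that drives matrix-chain-multiply can be used to display the parenthesization itself. The helper below is hypothetical (it does not appear in the article); it walks a cut-point array s and renders the optimal order as a string, here for the hand-built cut points of the 1×3, 3×3, 3×3 example.

```lisp
;; Hypothetical helper: render the optimal parenthesization encoded in
;; the cut-point array s, mirroring matrix-chain-multiply's recursion.
(defun parenthesization (s i j)
  (if (= i j)
      (format nil "A~D" i)
      (let ((k (aref s i j)))
        (format nil "(~A ~A)"
                (parenthesization s i k)
                (parenthesization s (+ k 1) j)))))

;; Cut points for the 1x3, 3x3, 3x3 example: s[0][1] = 0, s[0][2] = 1.
(let ((s (make-array '(3 3) :initial-element 0)))
  (setf (aref s 0 2) 1)
  (setf (aref s 1 2) 1)
  (parenthesization s 0 2))   ; => "((A0 A1) A2)"
```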
To show that this method makes a difference, I created 10 matrices with randomly selected double-float elements. The dimensions were 30×35, 35×2, 2×35, 35×15, 15×1, 1×15, 15×5, 5×10, 10×20, and 20×25. The elementary recursive-multiplication function took 10 seconds, and the matrix chain multiplication function took .74 seconds on a Sun 5/60.