Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Calling C Functions with Variably Dimensioned Arrays


AUG93: Calling C Functions with Variably Dimensioned Arrays

Bringing C up to snuff, Fortran-wise

John is a computational scientist in the University of Toronto's high-performance computing group. His interests include scientific programming on vector and massively parallel supercomputers. John can be reached through the DDJ offices.


With the growing popularity and acceptance of UNIX, many scientific and engineering programmers are switching from Fortran to C to code their technical applications. Both languages allow for compilation of individual source modules and are therefore ideal for creating general-purpose subroutines that can be incorporated into libraries that can then be used in a wide variety of applications.

The Multidimensional-array Problem

However, when it comes to passing multidimensional arrays to a function or subroutine, C is drastically inferior to Fortran. Fortran has an elegant mechanism that allows a calling program to pass an array with adjustable dimensions to a subroutine--the actual dimensions can be passed to the subroutine as one of the subroutine parameters. This allows us to write a completely general-purpose Fortran subroutine.

Such is not the case for C functions, however. The C compiler wants to know the absolute size of all but the first dimension of any arrays passed to a function. For instance, you can get away with calc(int n, float x[]) {_ but not calc(int n, float x[][]) {_.

C can deal with variably dimensioned vectors (one-dimensional arrays), but when an array has more than one dimension, the C compiler has to know the size of the last dimensions expressed as a constant. This is a glaring shortcoming to scientific programmers, whose entire data structures are often based naturally on two- and three- (and higher) dimensional arrays. It's not really surprising, though, when we consider that C was conceived as a sort of high-order systems-programming language. Systems programmers generally don't need multidimensional arrays--certainly not those with variable dimensions.

Unsatisfactory Solutions

This problem has long been recognized, of course, and those who work with numerical applications in C have devised various solutions to the problem. An obvious solution is to go ahead and declare the arrays in the functions to be an absolute size and hope that no bigger arrays will ever be required. This can be wasteful of memory in most cases, and seldom will you be sure what the maximum array size should be.

The usual solution is to construct multiple-dimension arrays as vectors of pointers, eventually pointing to a vector of the required data type. For instance, a matrix of floating-point values (a two-dimensional array) would be declared as a vector of type float *, with each element pointing to a vector of type float.

This works, but has serious shortcomings. It's probably acceptable in a program being written from scratch, but not in designing a subroutine that will find its way into a library or will be reused in other programs. The problem is that the calling program must be aware that the function expects arrays to be defined this way and must define all arrays in this fashion. This means that any other computations done on the arrays must take this special structure into account. What we get is a case of the tail wagging the dog--the array structure imposed by the function influences whatever else the main program does with these arrays. This limits portability--someone who wants to use a function that assumes arrays are defined in this way may be unwilling or unable to change the definition in his or her main program. It also violates the principle of information hiding, which is fundamental to the top-down structured design process. High-level routines should not have to know or care what data structures are used by lower-level routines.

The Easy Solution

Still, it is possible to pass to a C function multidimensional arrays which the function can treat as having variable dimensions, and to do so in a fashion transparent to the calling program. It is not necessary to define the arrays in a special way in the main program. Experience has shown me that this can be done by adopting the following principles:

  • Pass the array to the function as though it were a pointer to a vector of floats (or the appropriate data type), no matter how many dimensions the array actually has, along with the dimensions of the array.
  • Reference individual array elements as offsets from this pointer. For example, for a two-dimensional array A, element aij has an offset of ncolxi+j from *a, where ncol is the number of columns in the array. This idea can be extended in a straightforward manner to arrays with more than two dimensions.
  • Write your algorithm so that array elements are accessed in storage order. This is really the key principle to making this technique work. It allows us to access subsequent elements of the array using the postincrement operator (for instance, *a++) and eliminates the need to calculate offsets for individual array elements. As a side benefit, algorithms written this way are generally faster than those using array subscript notation.
To apply these principles successfully, it helps to be comfortable programming and reading code written in C. Otherwise, applying the aforementioned principles can make the code appear somewhat obscure.

To illustrate the procedure, I'll use an example of multiplying two matrices together--a basic linear-algebra subroutine that has many arithmetic operations.

Matrix Multiplication

Before looking at the example, I'd like to briefly consider the topic of matrix multiplication on computers in general. Our task is to compute the product of two matrices A and B and store the result in C. From linear algebra we know that if A and B have dimensions mxn and nxp, respectively, then C will have dimensions mxp. The generic matrix-multiplication algorithm consists of three loops; see Example 1(a).

The loop indexes and termination points have been left blank. They will have names i, j, and k and values m, n, and p, respectively. There are six possible permutations for arranging the loops. Each does the same operations, but they have very different memory-access patterns. The pattern used in standard linear-algebra texts, which we would use if multiplying the matrices by hand, is shown in Example 1(b).

Elements in matrices A and C are accessed in row order. However, since Fortran stores arrays in column-major order, this does not access consecutive storage locations. We would prefer to access all matrix elements as consecutive storage locations, so we should use the algorithm in Example 1(c). Note that this algorithm performs exactly the same operations as the first, but in this case we access successive array elements by going down the matrix columns. Since this is the way Fortran stores arrays, we're also accessing consecutive locations in memory. This algorithm does very well on vector supercomputers, such as those from Cray Research, since it eliminates memory-access conflicts and also lends itself well to vectorization and chaining. This algorithm also does better than the first on scalar machines, like workstations or PCs, since we maintain coherency of the data cache and minimize page faulting.

An Example in C

We would like to maintain the same memory-access pattern in C, but since C stores arrays in row-major order, we have to use a different permutation of the algorithm; see Example 2. Now that we have an algorithm that accesses its arrays in storage order, we can implement it as a C function; this is done in Example 3.

Note how the principles outlined earlier are adhered to. The dimensions of the arrays m, n, and p and the arrays themselves are passed to the function. The arrays are treated as pointers to vectors of floats even though they are two-dimensional arrays. We would treat them the same if they had three or more dimensions.

Within the function, wherever possible, we set pointers to the arrays (or subsets of the arrays) and access subsequent array elements by incrementing these pointers. This works because we have designed our algorithm to access array elements in storage order.

In this case we have a very compact function that will work with any size arrays, just as though we were working in Fortran. As a bonus, this version executes about twice as fast as the same algorithm coded using explicit subscript references (as tested on a RISC workstation and a PC clone).

Example 4 shows a sample main C program that defines storage for three matrices, initializes two of them, and calls the matrix-multiplication function matmul to multiply them together. Note that we don't have to do anything special when defining the arrays--they are just defined as standard two-dimensional arrays.

One thing you might want to change about the main program is the line that calls the matrix-multiplication routine. C compilers that support prototyping will warn you that the argument types don't match the prototype descriptions, though it will still compile and run. Call the function casting the arrays as pointers to vectors of floats; for example, matmul(M, N, P, (float *)a, (float *)b, (float *)c); and everything will be fine.

Summary

It's possible to write functions in C that allow you to treat arrays as being variably dimensioned, the same way Fortran does. Just pass the arrays to the function as though they were pointers to vectors. With some thought, you should be able to write your algorithm so that it accesses the array elements in storage order (that is, by row). This can then be coded by incrementing pointers to the arrays, resulting in a fast and efficient implementation.

Example 1: (a) The generic matrix-multiplication algorithm consists of three loops; (b) the pattern used if multiplying the matrices by hand; (c) the algorithm for accessing all matrix elements as consecutive storage locations under Fortran's array-storage pattern.

(a)

for ___= 1 to ___
   for ___ = 1 to ___
      for ___ = 1 to ___
          c<SUB>ij</SUB>= c<SUB>ij</SUB>+ a<SUB>ik</SUB> x b<SUB>kj</SUB>
      end
   end
end


(b)

for i = 1 to m
   for j = 1 to p
      c<SUB>ij</SUB>= 0
      for k = 1 to n
          c<SUB>ij</SUB>= c<SUB>ij</SUB>+ a<SUB>ik</SUB> x b<SUB>kj</SUB>
      end
   end
end


(c)

for j = 1 to p
   for i = 1 to m
      c<SUB>ij</SUB>= 0
   end
   for k = 1 to n
      for i = 1 to m
          c<SUB>ij</SUB>= c<SUB>ij</SUB>+ a<SUB>ik</SUB> x b<SUB>kj</SUB>
      end
   end
end

Example 2: An algorithm that accesses its arrays in storage order using C's array-storage scheme.

for i = 1 to m
   for j = 1 to p
      c<SUB>ij</SUB>= 0
   end
   for k = 1 to n
      for j = 1 to p
          c<SUB>ij</SUB>= c<SUB>ij</SUB>+ a<SUB>ik</SUB> x b<SUB>kj</SUB>
      end
   end
end

Example 3: The C function matmul, which multiplies two matrices together.


void matmul(int m, int n, int p,

float *a, float *b, float *c)
{
  float *bp, *cp;
  int i,j,k,nc;

  nc = m*p;
  cp = c;
  while (nc--)
    *cp++ = 0;

  while (m--)
  { bp = b;
    k = n;
    while (k--)
    { cp = c;
      j = p;
      while (j--)
       *cp++ += *a * *bp++;

      a++;
    }
    c += p;
  }
}

Example 4: C main program to test matrix-multiplication function.

#define M 80
#define N 120
#define P 160

float a[M][N], b[N][P], c[M][P];
void matmul(int , int , int ,

float *, float *, float *);

main()
{
  int i,j;

  for (i=0; i<M; i++)
    for (j=0; j<N; j++)
      a[i][j] = i+j;
  for (i=0; i<N; i++)
    for (j=0; j<P; j++)
      b[i][j] = i+j;

  matmul(M, N, P, a, b, c);
  printf ("%f %f %f\n",c[0][0],

c[39][79],c[M-1][P-1]);
}


Copyright © 1993, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.