Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

An Incremental String Search in C


Coding Details

Only three functions are needed to implement an incremental string search. One function, getkey, is a simple keyboard filter. The second, get_range, determines the range of indices in a string array that match a given input character. The third, inc_search, uses the previous two functions to perform the search. Definitions of these functions and their required support code are shown in Listing One. getkey essentially reads and echoes any printable character entered from the keyboard, including a carriage return. In such cases, getkey returns the ASCII code of the key pressed.

#define MAXSTR 25    /* Maximum number of strings in array */
#define MAXLEN 30     /* Maximum length of each string */
#define ESC 27        /* ASCII code for Escape */

/* if MIXCASE is nonzero, comparisons are case-sensitive */
#define MIXCASE 0
#if MIXCASE
    #define CASEMOD(x) x
#else
    #define CASEMOD toupper
#endif

typedef char STR[MAXLEN+1];             /* Definition of type STR */
typedef STR STARRAY[MAXSTR]:            /* ... of type STARRAY    */

int get_range(search_char,array,last_index,ptr_first,ptr_last)
char search_char;
STR *array;
int last_index, *ptr_first, *ptr_last;
{
  int j, k;     /* Loop variables */
  int found;    /* Indicates if a match has been found */

  search_char = CASEMOD(search_char);
  for (j=0; j<=last_index; j++)  /* Seek first match */
    if (found = CASEMOD(array[j][0]) == search_char)
      {                          /* A match has been found at index j */
        k = j                    /* Store value in k */
    break;
      }
  if (found)
    {          /* A match has been found at index k; determine its extent */
      for (j=k; CASEMOD(array[j+1][0]) == search_char; j++)
        ;      /* Set j to last matching index */
      *ptr_first = k;
      *ptr_last = j;
      return(1);
     }
  return(0);
}

int getkey(void)
{ char ch;

  while (1)
    {
      if ((ch = getch()) == '\'' || ch == '\" ')
        return(0);
      if (ch == ESC || ch == '\r' || ch >= 32 && ch <= 125)
        { putch(ch);
          return(ch);
        }
      if (ch == 0) getch();  /* Discard function key input */
    }
}
int inc_search(arr,imax,ptr_start,ptr_end)
STARRAY arr;
int *ptr_start, *ptr_end;
{  int first,last,ch;    /* Parameters for GET_RANGE call */
   int OK = 1;           /* Return value of GET_RANGE; defaults to one */
   int start,end         /* First and last indices that yield a match */
   STR *base;            /* Starting location for search */

   end = imax;
   base = arr; start = 0;
   while ( (ch = getkey()) != '\r' && ch != ESC
           && (OK = get_range(ch,base,imax, &first, &last)))
     {                          /* No CR or ESC has been entered,
                                /* and the list of matches is nonempty */
       end = start + last;      /* Adjust values of end and start */
       start += first;
       if (strcmp(arr[start],arr[end])==0)
         break;           /* Abort loop if there's a unique match */
       else               /* A match has been found, but it's not unique */
          {               /* Adjust start location for search */
            base = &base[first][1];
            imax = last - first;  /* Adjust max search index */
          }
     }
   if (ch == ESC)
     return(-1);               /* Return -1 if search was aborted */
                               /* Else return values as described above */
   *ptr_start = start;
   *ptr_end = end;
   return(OK);

}

/* The code that appears from this point on is used solely to */
/* demonstrate the incremental search facility. */

STARRAY arr =
  { "Anderson","Bradley","Burke","Federico","Fikar",
    "Frazer","Geary","Grando","Haas","Lehr","Lemke","Mahoney",
    "Martin","Martino","Milne","Milnor","Montgomery","Rivera",
    "Rivers"};

int do_again(void)
{ char ch;

  printf("\n Do again ? (Y/N) ");
  while ((ch = toupper(getch())) != 'Y' && ch != 'N')
 
    ;
  putch(ch);
  return(ch);
}
main()
{  int k,maxindex,start,end;

   /* Print the array to be searched */
   for (k=0; k<=MAXSTR-1; k++)
     if (arr[k][0] == 0)
       { maxindex = k-1;
     printf("\n");
     break;
       }
     else printf("%-20s",arr[k]);
   for (k=1; k<=80; k++)
     putch('=');
   /* Perform incremental search */
   do  {
     printf("\n Enter a search string : ");
     switch (inc_search(arr,maxindex, &start, &end)) {
       case -1 : printf ("\n Aborted by user.");
                 break;
       case  0 : printf("\n No matches found.");
                 break;
       case  1 : printf("\n Matches are : \n");
                 for (k=start; k<=end; k++)
                   printf("%-20s",arr[k]);
         printf("\n"):
       }
   }
   while (do_again() == 'Y');
}

Listing One.

Some exceptions exist. If a single or double quotation mark is entered, the key is not echoed and getkey returns a value of zero. The reason for choosing this particular return value is simple. Since quotes are used to indicate a literal match, the return value of zero will match the ASCII zero terminator for a string only if an exact match exists between the search string entered thus far and string to which it's being compared.

get_range searches the leading characters in a string array for entries that match a given character. For the reasons discussed in the previous section, get_range must know the maximum array index it's allowed to reference; otherwise, it may not be searching a sorted array. If get_range succeeds in finding a match, it returns a value of one and passes the corresponding range of indices back to the caller through its argument list. If get_range fails to find a match, its return value is zero.

inc_search does the actual searching. The header for this function is int inc_search (arr,imax,ptr_start, prt_end) . Here arr is a pointer to a two-dimensional array of characters, and the integer imax gives the largest first index that arr can safely reference. If the search routine finds one or more matches for the search string, the formal parameters ptr_start and prt_end (which are pointers to integers) return the range of indices in arr[] that yield a match.

inc_search begins by setting the range of matching indices to be the entire set of array indices for the search array arr. This ensures that if the user presses Enter without first entering search characters, the function will (correctly) report that all array entries are compatible with the search string. It also sets base — the variable indicating the starting location for the array to be searched — equal to arr.

Next, inc_search calls getkey to get an input character. If this character is different from Escape or Enter, it's fed to get_range, which reports on its success or failure in finding a match. If get_range, failed or succeeded only for a single array value, the search terminates. The only other possibility is that a match occcured over two or more array values. Let first and last be the corresponding range of indices. We need to move the array pointer base so it points to the second character of base[first] . This is easily done in C with the statement base = e base[first][1].

After moving the origin of the search array this way, we start the entire cycle again: Ask for a character, find all possible matches, and move the array pointer accordingly. The process eventually ends in success, failure, or an abort.

The performance of the inc_search routine presented here seems quite acceptable, at least for modestly sized arrays. If you're applying the functions to an array of several hundred entries, some changes might be in order. get_range uses a sequential search to find matches. For large arrays, it might be better to use binary search initially, then switch to sequential search when the range of indices to be checked is less than a hundred. You may also wish to include code to allow for editing of the search string. In any case, the routines presented here should be short and clear enough to meet your needs.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.