Voice Recording and Playback with ISDN


Sep98: Voice Recording and Playback with ISDN

Martyn is senior technical consultant for Eicon Technology and can be contacted at [email protected].



ISDN allows data connections and voice connections over the same network infrastructure. A large number of companies currently build ISDN adapters for PCs, most of which support data and voice operation through the same hardware -- often with both functions working at the same time. From the software perspective, there is an industry-standard open API for ISDN adapters known as the "Common ISDN API" (CAPI). (For more information, see the accompanying text box entitled "The CAPI Standard.")

ISDNREC, the program I present in this article, is a CAPI-compliant Win32 console-mode answering-machine application. As Figure 1 shows, ISDNREC waits for an incoming voice call, plays an outgoing message ("Leave a message after the tone"), and records the caller's voice as a Microsoft WAV file that can later be played back through the sound card. ISDNREC (available electronically; see "Resource Center," page 3) talks to CAPI2032.DLL, which is the CAPI 2.0 library for Win32. This DLL is included with most available low-cost ISDN adapters.

Basic Principles of ISDNREC

When a call arrives on ISDN, the network sends a signaling message called a SETUP, which contains information like the caller's number, called number, and so on. Bearer Capabilities (BC) is a mandatory field on the SETUP that indicates whether the current call is carrying digital data or sound samples. Normally, a voice call is ignored by a computer -- a PC would generally be looking for an incoming data call that would be carrying PPP data or perhaps a Group 4 Fax. However, ISDNREC looks solely for voice calls. A CAPI application issues a LISTEN saying what values of BC it is interested in; this means that a number of applications can share the ISDN card without seeing each other's calls.

When ISDNREC sees the call (the SETUP from ISDN arrives as a CONNECT INDICATION, or CON_IND, to the program), it picks it up by sending a response. After the response, the Bearer or "B" channel (a 64-Kbps digital pipe) is connected to the caller. Voice calls require no special protocols on the B channel; the data arrives as eight-bit sound samples at a rate of 8000 per second. To take a voice call, you program CAPI to set all the protocol layers to "transparent"; the data then arrives as A-Law or µ-Law encoded samples (more on this later), and you can write the incoming samples into a file. To play sounds to the caller, such as an outgoing message (OGM), you simply open a file of A-Law samples and play these samples through the B channel. Finally, CAPI sends messages to ISDNREC with information about events on the line, such as the caller hanging up or the call being disconnected due to network congestion. ISDNREC processes these incoming messages and takes appropriate actions.

A Word About CAPI

CAPI is a message-passing interface designed to make ISDN applications more operating-system independent. Once the ISDN application has registered with CAPI, it manages two queues: a PUT queue for sending messages to CAPI (and therefore to the ISDN network), and a GET queue for reading messages that come from the network. CAPI messages are split into REQUEST, CONFIRMATION, INDICATION, and RESPONSE types (see Figure 2); each REQUEST has a matching CONFIRMATION, and each INDICATION must be answered with a RESPONSE.
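The pairing rule described above can be sketched in a few lines of C. The subcommand values below are the ones I understand the CAPI 2.0 specification to use (0x80 through 0x83); treat them, and the helper names `capi_answer_for()` and `capi_app_must_answer()`, as assumptions for illustration and check them against the header file supplied with your adapter.

```c
#include <stdint.h>

/* Subcommand codes as I understand them from the CAPI 2.0 specification;
   verify against your adapter's own header before relying on them. */
enum {
    CAPI_REQ  = 0x80,  /* application -> CAPI                      */
    CAPI_CONF = 0x81,  /* CAPI -> application, answers a REQUEST   */
    CAPI_IND  = 0x82,  /* CAPI -> application, event from the line */
    CAPI_RESP = 0x83   /* application -> CAPI, answers an INDICATION */
};

/* Return the subcommand that answers `sub`, or 0 if `sub` is itself
   an answer (CONFIRMATION or RESPONSE) or is not recognized. */
uint8_t capi_answer_for(uint8_t sub)
{
    switch (sub) {
    case CAPI_REQ: return CAPI_CONF;
    case CAPI_IND: return CAPI_RESP;
    default:       return 0;
    }
}

/* Nonzero when it is the application's job to send the answer. */
int capi_app_must_answer(uint8_t sub)
{
    return sub == CAPI_IND;
}
```

The point of the second helper is that only INDICATIONs create an obligation for the application; CONFIRMATIONs are produced by CAPI itself.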

CAPI offers options as to how to handle incoming messages. My approach in ISDNREC is to have a part of my application looping around the GET_MESSAGE operation. When a message comes in, I process it using a Finite State Machine (FSM); when nothing is happening, I do a short sleep using the Win32 CPU-friendly SleepEx() call.

GET_MESSAGE gives you two types of messages: INDICATIONs, which are incoming messages from ISDN that need processing; and CONFIRMATIONs, which are replies to REQUESTs that you have issued. In ISDNREC, whenever I issue a request, I immediately call a routine called WaitForConfirmation(), which loops around issuing GET_MESSAGE until the confirmation arrives. The confirmation indicates that the ISDN card now has the data, so the request/confirmation delay is very short. However, the time between requests and subsequent indications (for example, the time between a LISTEN and an incoming CONNECT_IND) may be minutes or days. In this state, it is better to do some useful work in the program, so the FSM in my program handles the indications from CAPI.
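A rough sketch of such a wait loop follows. It is not the ISDNREC source: the `Msg` and `ScriptedQueue` types and the functions `poll_message()` and `handle_indication()` are hypothetical stand-ins for GET_MESSAGE and the FSM, chosen so the loop can be exercised without an ISDN card. Passing any indication that turns up during the wait on to the FSM is a design choice of this sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a received CAPI message. */
typedef struct {
    int command;          /* application-defined code, e.g. LISTEN  */
    int is_confirmation;  /* nonzero: CONFIRMATION, zero: INDICATION */
} Msg;

/* Hypothetical stand-in for the GET_MESSAGE queue: a fixed script. */
typedef struct {
    const Msg *script;
    size_t count;
    size_t next;
    int indications_seen;
} ScriptedQueue;

/* Returns 1 and fills *out if a message was waiting, 0 if empty. */
int poll_message(ScriptedQueue *q, Msg *out)
{
    if (q->next >= q->count)
        return 0;
    *out = q->script[q->next++];
    return 1;
}

/* Placeholder for feeding an indication into the FSM. */
void handle_indication(ScriptedQueue *q, const Msg *m)
{
    (void)m;
    q->indications_seen++;
}

/* Poll until the confirmation for `command` arrives or max_polls is
   used up. Indications seen on the way are handed to the FSM rather
   than dropped; confirmations for other commands are ignored.
   Returns 1 on success, 0 on timeout. */
int wait_for_confirmation(ScriptedQueue *q, int command, int max_polls)
{
    Msg m;
    for (int i = 0; i < max_polls; i++) {
        if (!poll_message(q, &m))
            continue;  /* a real program would sleep briefly here */
        if (m.is_confirmation && m.command == command)
            return 1;
        if (!m.is_confirmation)
            handle_indication(q, &m);
    }
    return 0;
}
```

Bounding the loop with `max_polls` keeps a lost confirmation from hanging the program; a real implementation would replace the empty-queue branch with a short sleep.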

ISDNREC in Detail

ISDNREC uses CAPIVersion() to check that CAPI is loaded, then goes on to call CAPIProfile(). The "profile" represents all the modes that the ISDN card can support. In this program, the check is not really necessary, but in areas where adapter support varies (like using an ISDN card for a Group 3 Fax), I would recommend making this call so you know whether the card can support the protocols you want.

Next, ISDNREC calls CAPIRegister(), which registers ISDNREC with CAPI, and gets a unique application ID for the program. CAPI uses this mechanism to allow several ISDN applications to share an ISDN adapter; each registered application has its own queues for PUT_MESSAGE and GET_MESSAGE.

Once registered, the only calls ISDNREC makes are to GET_MESSAGE and PUT_MESSAGE. All steady-state operations are now performed via CAPI messages. When ISDNREC wants to terminate, it calls CAPI_RELEASE to deregister the application ID, allowing CAPI to free all the resources used by this session.

Before entering its main loop, ISDNREC calls ProcessFSM() with the FSM input I_KICKOFF. This causes the LISTEN request to be issued to CAPI. Once the LISTEN is active, you are ready to take incoming calls, and you will see the CONNECT_IND messages via the GET_MESSAGE queue.

The central loop of ISDNREC is the function ProcessEvent(). This is called repeatedly until the FSM finally sets the "finished" flag (caused by a DISCONNECT event on ISDN). ProcessEvent() calls GET_MESSAGE to see what CAPI messages are coming in; any INDICATIONs get translated into inputs into the FSM, and it is the FSM that keeps information about the connection state of the ISDN card. ProcessEvent() also checks the timer and generates I_TIMER inputs to the FSM.

The Finite State Machine

The FSM consists of a two-dimensional array (see Listing One), and the ProcessFSM() function (Listing Two). The array has inputs down the left side, and the machine states along the top. At each array location, there is a single number that contains a new state and an action to execute for this input. I like this way of representing an FSM in C, since you can easily see at a glance what will happen for each input in each state and the array is self-documenting. I have seen FSMs written as switch statements within switch statements. This format is hard to read and the code and the documentation often get out of step when changes are made. With my method, the comment is the code, so this is not such a problem. Also, this table layout encourages you to think about the effect of each input in each state, and to design an appropriate action for each one.
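To make the "single number" idea concrete, here is a cut-down sketch of how such a cell might be packed and unpacked. The article does not show the definitions of s0..s7, GETSTATE(), or GETACTION(), so the packing below (state index in the high byte, action number in the low byte) is an assumption, as are the names `demo_fsm` and `fsm_step()`. Only the first two rows and four columns of the table are reproduced.

```c
#include <stdint.h>

typedef uint16_t WORD;

/* Assumed packing: state index in the high byte, action in the low byte. */
enum { s0 = 0x0000, s1 = 0x0100, s2 = 0x0200, s3 = 0x0300 };
#define GETSTATE(v)  ((WORD)((v) >> 8))
#define GETACTION(v) ((WORD)((v) & 0x00FF))

enum { IN_KICKOFF, IN_CONNI, INPUTS };
enum { STATES = 4 };

/* Inputs down the side, states along the top, as in the full table. */
const WORD demo_fsm[INPUTS][STATES] = {
/*              s0     s1     s2     s3   */
/*KICKOFF  */ { s1+1,  s1+0,  s2+0,  s3+0 },
/*CONNI    */ { s0+0,  s2+9,  s2+0,  s3+0 },
};

/* Look up the cell for (input, *state), move to the new state,
   and return the action number the caller should execute. */
WORD fsm_step(WORD *state, int input)
{
    WORD cell = demo_fsm[input][*state];
    *state = GETSTATE(cell);
    return GETACTION(cell);
}
```

With this packing, a cell such as `s2+9` reads naturally as "go to state 2 and run action 9", which is what makes the table double as its own documentation.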

The actions are contained in a switch statement inside ProcessFSM(). For debugging and code-review purposes, it is a good idea to keep ProcessFSM() short -- I normally favor having a separate function for each action. On the subject of debugging, I find that it is useful to have only one timer active in the FSM, so that there can be only one reason for a timer to expire in any particular state. It is surprisingly easy to get by with only one timer when using an FSM to manage the states.

Most of the states in the FSM are dedicated to ISDN activation. The initial I_KICKOFF causes the FSM to go from IDLE to the LISTEN state (that is, ready for incoming calls). After that, CONNECT_IND causes a transition to the ACTING state; ISDNREC answers it with a RESPONSE, and you get CONNECT_ACTIVE_IND to say that the call has been accepted and the B channel is fully established.

This looks like it should be enough, but in fact CAPI now requires that you bring up a logical connection over the B channel. Data applications often require this (rather complex) procedure, since protocols such as V.120 or X.25 on the B channel allow several separate logical connections over the same link. In the case of voice calls, there is no logical session (just a voice channel) and no protocol (you are running transparent); however, CAPI requires you to go through the motions all the same. Consequently, you wait for CONNECT_B3_IND, send a response, and then finally, when the CONNECT_B3_ACTIVE_IND message arrives, you can move into the CONN state.

In the CONN state, you can send and receive data, so you need to get to this state before you can call SendOGMessage() to transmit the OGM. After the OGM has been sent, you can enter record mode (done with the "recording" flag). Once in record mode, all incoming data messages, DATA_IND, go through the FSM and end up in RecvData() where they are translated from A-Law format and saved to a WAV file.

The final two states, DISCB3 and DISC, are used for handling the DISCONNECT_B3_IND and DISCONNECT_IND messages coming from CAPI, to ensure tidy close-down of ISDN calls. You generally receive the DISCONNECT_B3_IND message first, since this is the logical connection being taken down. The DISCONNECT_IND happens when ISDN signals that the call is gone; the routine IncomingDisc() retrieves any network diagnostic codes with this message, then displays them in English.

Implementation Problems

The biggest single problem in implementing this program was understanding the data format. European ISDN networks use A-Law companding, but two crucial pieces of information were missing in the CAPI specification (available at http://www.capi.org/). First, on ISDN, A-Law data is XORed with 0x55 before transmission (so that the "empty line" is not at zero, but alternating between 1s and 0s to improve clock recovery). Second, CAPI presents the bytes with a bit order that I would consider back-to-front; the most significant bit is transmitted last. My reverse() function flips the byte around the "right" way. Take a look at encode() and decode() to see how to convert between A-Law and standard 12-bit signed sound samples; these are a quick and dirty implementation, but they seem to do the job.
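The two steps described above (bit reversal, then undoing the 0x55 mask as part of A-Law expansion) can be sketched as follows. This is not the article's reverse() or decode(): it is the widely used G.711 A-Law expansion, which produces samples scaled to 16 bits rather than 12, and the function names `reverse_bits()`, `alaw_to_pcm16()`, and `capi_byte_to_pcm16()` are made up for illustration. Whether the bit reversal is needed depends on how your adapter presents transparent data, so treat that step as an assumption taken from the text.

```c
#include <stdint.h>

/* Reverse the bit order of one byte (bit 0 <-> bit 7, and so on)
   by swapping nibbles, then bit pairs, then adjacent bits. */
uint8_t reverse_bits(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4));
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1));
    return b;
}

/* Expand one A-Law byte, as it appears on the line (still XORed
   with 0x55), to a linear sample scaled to 16 bits. */
int16_t alaw_to_pcm16(uint8_t a)
{
    int t, seg;

    a = (uint8_t)(a ^ 0x55);      /* undo the line mask          */
    t = (a & 0x0F) << 4;          /* 4-bit mantissa              */
    seg = (a & 0x70) >> 4;        /* 3-bit segment (exponent)    */
    if (seg == 0) {
        t += 8;                   /* lowest segment is linear    */
    } else {
        t += 0x108;               /* add implicit leading bit    */
        t <<= (seg - 1);
    }
    return (int16_t)((a & 0x80) ? t : -t);  /* sign bit set = positive */
}

/* Full path for a byte as delivered in bit-reversed order. */
int16_t capi_byte_to_pcm16(uint8_t raw)
{
    return alaw_to_pcm16(reverse_bits(raw));
}
```

For example, the line byte 0xD5 expands to +8 and 0x55 to -8, the two smallest magnitudes, which is why an idle A-Law line never carries an all-zero byte.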

In North America and Japan, µ-Law is generally used for companding. I have provided some functions to encode and decode µ-Law, but I am unable to test them. If you are interested in the algorithms and raison d'etre for A-Law and µ-Law, I would recommend Digital Telephony, by John Bellamy (John Wiley & Sons, 1991, ISBN 0-471-62056-4).
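For comparison, a minimal µ-Law expansion might look like the sketch below. Again, this is the commonly used G.711 expansion scaled to 16 bits, not the article's own routine, and the name `ulaw_to_pcm16()` is hypothetical. Any bit reordering the adapter requires is assumed to have been applied already.

```c
#include <stdint.h>

/* Expand one G.711 mu-Law byte to a linear sample scaled to 16 bits. */
int16_t ulaw_to_pcm16(uint8_t u)
{
    const int bias = 0x84;  /* 132, the standard mu-Law bias */
    int t;

    u = (uint8_t)~u;                      /* mu-Law bytes are stored inverted */
    t = ((u & 0x0F) << 3) + bias;         /* mantissa plus bias               */
    t <<= (u & 0x70) >> 4;                /* scale by the segment number      */
    return (int16_t)((u & 0x80) ? (bias - t) : (t - bias));
}
```

Unlike A-Law, µ-Law has two codes for zero (0xFF and 0x7F), and its largest magnitude on this scale is 32124.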

Building and processing some of the CAPI messages can be a pain for two reasons: First, the CAPI message itself has a message-specific part (represented by a C union), which makes the code a bit fiddly. Second, the messages often contain variable-length fields. For example, in ProcessCall() (Listing Three) there are a number of fields that need to be read from the CONNECT_IND message. Each variable-length field has a length byte followed by n bytes of data. If the length byte is zero, then the structure is null, and CAPI will skip over it to the next field. There are five consecutive fields: Called Party Number, Calling Party Number, Called Party Subaddress, Calling Party Subaddress, and BC. I am only interested in the first two and the last one, so I need to read the length byte of Called Party Number, process the string that follows, then use that length to adjust my pointer so that it's now pointing at the length byte of Calling Party Number, and so forth.
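The pointer-stepping described above can be wrapped in a small helper. The sketch below is an illustration only: `CapiField` and `read_field()` are hypothetical names, and the bounds check against the end of the message is an addition of this sketch. As I understand the CAPI 2.0 encoding, a length byte of 0xFF introduces a 16-bit length; that escape is deliberately not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* One length-prefixed field: `data` is NULL when the field is empty. */
typedef struct {
    const uint8_t *data;
    uint8_t len;
} CapiField;

/* Read the field at *cursor and advance *cursor to the next field's
   length byte. `end` points one past the last byte of the message.
   Returns 1 on success, 0 if the field would run past `end`. */
int read_field(const uint8_t **cursor, const uint8_t *end, CapiField *out)
{
    const uint8_t *p = *cursor;
    uint8_t len;

    if (p >= end)
        return 0;                       /* no length byte left        */
    len = p[0];
    if ((size_t)(end - p - 1) < len)
        return 0;                       /* data would overrun message */

    out->len = len;
    out->data = len ? p + 1 : NULL;
    *cursor = p + 1 + len;              /* step over length + data    */
    return 1;
}
```

Calling `read_field()` five times in a row walks Called Party Number, Calling Party Number, the two subaddresses, and BC in order; the fields you do not care about are simply read and discarded.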

In general, CAPI itself is not difficult to use, but it does require quite a bit of infrastructure to handle all the activation messages correctly. It pays to respond correctly to every INDICATION from CAPI. At one stage, I had a problem where the ISDN line was obviously connected, but no data was being received. I eventually tracked it back to the CONNECT_B3_ACTIVE_IND message -- if you forget to respond to this message, you will never receive any DATA_IND messages from CAPI, which makes for a very quiet time.

The timer processing is performed by a second thread from within the process. I originally tried to use the Win32 SetTimer() call using a callback function, but in a console-mode application, it did not seem to work -- the callback was never scheduled.

Microsoft WAV File Processing

My initial attempts at the program simply dumped raw A-Law samples into a LAW file. Later, I realized that I really wanted to save data in WAV format, so that messages could be played back using a sound card. I searched the Internet and found some useful documents about the RIFF/WAV formats by Rob Ryan ("RIFF WAVE (.WAV) File Format," ST802200@brownvm.brown.edu) and Robert Shuler ("General RIFF File Background," rshuler@aol.com). WAV has many subformats, but I chose to use the 16-bit PCM format, since this is likely to be understood by the largest number of programs. In AMULAW.C, the function AlawtoPCM() decompresses from A-Law to PCM, and saves the samples at a rate of 8000 per second with an appropriate WAV header (see RewriteWAVheader()). A piece of code that reads the OGM from a WAV file would be quite straightforward to add.
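For reference, a canonical 44-byte header for mono, 16-bit PCM at 8000 samples per second can be built as in the sketch below. This is not the code from AMULAW.C; `build_wav_header()` and its helpers are hypothetical names. Because the data length is not known until the caller hangs up, a recorder would typically write a header with a placeholder length first and rewrite it at the end, which is presumably the job of RewriteWAVheader().

```c
#include <stdint.h>
#include <string.h>

enum { WAV_HEADER_SIZE = 44 };

/* RIFF files are little-endian regardless of the host CPU. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill `hdr` with a header for mono, 16-bit PCM at 8000 samples/s.
   `data_bytes` is the size of the sample data that follows. */
void build_wav_header(uint8_t hdr[WAV_HEADER_SIZE], uint32_t data_bytes)
{
    const uint16_t channels = 1;
    const uint16_t bits = 16;
    const uint32_t rate = 8000;
    const uint16_t block_align = (uint16_t)(channels * bits / 8);

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);      /* file size minus 8     */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                  /* size of fmt chunk     */
    put_le16(hdr + 20, 1);                   /* format tag 1 = PCM    */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, rate);
    put_le32(hdr + 28, rate * block_align);  /* bytes per second      */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

Each incoming A-Law byte becomes two bytes of PCM, so one second of recorded speech adds 16,000 bytes to `data_bytes`.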

While ISDNREC is a fairly crude application, it does demonstrate the basic programming technique for CAPI, and it contains all the data-format-conversion routines necessary to make a real application. By adding a MAPI interface, it would be possible to store incoming voice messages (in WAV files) in your e-mail program's inbox. With the addition of some routines to process DTMF tones, you could remotely control a system using only a touch-tone phone and voice prompts. ISDN adapters and the CAPI interface give us a great tool for developing the telephony applications of the next decade.

DDJ

Listing One

WORD fsm[INPUTS][STATES]={
//             IDLE   LISTEN ACTING PROTB3  ACTB3I  CONN    DISCB3  DISC
//             s0     s1     s2     s3      s4      s5      s6      s7
//--------------------------------------------------------------------------
/*KICKOFF  */{ s1+1,  s1+0,  s2+0,  s3+0,   s4+0,   s5+0,   s6+0,   s7+0,  },
/*CONNI    */{ s0+0,  s2+9,  s2+0,  s3+0,   s4+0,   s5+0,   s6+0,   s7+0,  },
/*CONNACTI */{ s0+0,  s1+0,  s3+2,  s3+0,   s4+0,   s5+0,   s6+0,   s7+0,  },
/*CONNIB3  */{ s0+0,  s1+0,  s2+0,  s4+11,  s4+0,   s5+0,   s6+0,   s7+0,  },
/*ALLACT   */{ s0+0,  s1+0,  s2+0,  s3+0,   s5+3,   s5+0,   s6+0,   s7+0,  },
/*DISCI    */{ s0+0,  s1+0,  s0+12, s0+12,  s0+12,  s0+12,  s0+12,  s0+12, },
/*DISCB3I  */{ s0+0,  s1+0,  s2+0,  s3+0,   s7+10,  s7+10,  s7+10,  s7+10, },
/*TIMER    */{ s0+0,  s1+0,  s2+0,  s7+8,   s7+8,   s5+13,  s6+0,   s7+0,  },
/*DATA_R   */{ s0+0,  s1+0,  s2+0,  s3+0,   s4+0,   s5+5,   s6+0,   s7+0,  },
/*DATA_I   */{ s0+0,  s1+0,  s2+0,  s3+0,   s4+6,   s5+6,   s6+0,   s7+0,  },
/*DISCRQ   */{ s0+0,  s1+0,  s2+0,  s3+0,   s6+7,   s6+7,   s7+8,   s7+0,  },
};


Listing Two

void ProcessFSM(int input, CAPI_MSG *msg){
  int newstate,oldstate;
  int action,value;

  value = fsm[input][state];
  newstate = GETSTATE(value);
  action = GETACTION(value);

  oldstate = state;
  state = newstate;

  switch(action){
    case 0:     //NULL action
      break;
    case 1:    //LISTEN_REQ must be issued.
      IssueListen();
      break;
      ...
    default:
      printf("FSM Error with ProcessFSM(%d) in state (%X)\n",input,oldstate);
  }
}


Listing Three

void ProcessCall(CAPI_MSG *msg){
  _CON_INDP *coni;
  _CON_RESP *conr;
  DWORD capi_error;
  WORD cip;
  BYTE *field, len, *bp;

  coni = &msg->info.connect_ind;
  cip = coni->CIP_Value;
  switch(cip){
     case 1:   printf("Incoming speech call\n"); break;
     case 4:   printf("Incoming 3.1kHz audio call\n"); break;
     case 16:  printf("Incoming Telephony call\n"); break;
     default:;
  }
  //now walk the number fields - the Calling Party Number may be of interest to someone
  field = &coni->structs[0];

  printf("\tNumber Called ");
  len = *field;
  if(len){
     printf("= ");
     PutNumber(len,field);   //Called party number
     putchar('\n');
  }else
     printf("not available\n");
  field += (len+1);         // step over Called Party Number

  printf("\tCaller's Number ");
  len = *field;
  if(len){
     printf("= ");
     PutNumber(len,field);   //Calling party number
     putchar('\n');
  }else
     printf("not available\n");
  field += (len+1);         // step over Calling Party Number



Copyright © 1998, Dr. Dobb's Journal
