Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



BSTR to char* String Conversion Gotchas


I use the OLE2T() and T2OLE() macros a lot in my code to convert between BSTRs and char*s. Is there anywhere this could cause me problems?

If you’ve spent any time working with COM interfaces while writing C or C++ code, you’ve likely had to deal with BSTRs. A BSTR is a string data type that was developed for Visual Basic, and its storage differs from that of a standard, null-terminated char* string in C/C++. A BSTR holds wide (OLECHAR) characters, and it is preceded in memory by a 4-byte integer that gives the length of the string data in bytes, not counting the terminating null. The BSTR pointer itself points just past this prefix, at the first character. A standard char* string has no such prefix. However, both a BSTR and a char* string are null terminated, which can make translating between the two rather quick, as long as there are no embedded nulls in the string value itself. If there are, only the leading 4-byte integer tells you the actual length of a BSTR; code that scans for the first null will stop short.
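
To make that layout concrete, here is a portable model of it. The names ModelAllocStringLen(), ModelStringLen(), ModelFreeString(), and LengthToFirstNull() are hypothetical stand-ins for the real OLE Automation functions SysAllocStringLen(), SysStringLen(), and SysFreeString(); the sketch uses char16_t and malloc() so that it compiles outside Windows, and it assumes the 4-byte byte-count prefix described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Portable model of a BSTR's memory layout. These helpers are hypothetical
// stand-ins for SysAllocStringLen(), SysStringLen() and SysFreeString().
//
//   [4-byte byte count][UTF-16 code units ...][2-byte null terminator]
//                      ^-- the "BSTR" pointer handed to callers points here
using ModelBstr = char16_t*;

ModelBstr ModelAllocStringLen(const char16_t* src, std::uint32_t chars) {
    const auto bytes = static_cast<std::uint32_t>(chars * sizeof(char16_t));
    auto* block = static_cast<unsigned char*>(
        std::malloc(sizeof bytes + bytes + sizeof(char16_t)));
    if (block == nullptr) throw std::bad_alloc();
    std::memcpy(block, &bytes, sizeof bytes);          // length prefix, in bytes
    auto* text = reinterpret_cast<char16_t*>(block + sizeof bytes);
    if (chars != 0) std::memcpy(text, src, bytes);     // may include nulls
    text[chars] = u'\0';                               // trailing terminator
    return text;
}

// Character count taken from the prefix, as SysStringLen() would report it.
std::uint32_t ModelStringLen(ModelBstr s) {
    assert(s != nullptr);
    std::uint32_t bytes = 0;
    std::memcpy(&bytes, reinterpret_cast<unsigned char*>(s) - sizeof bytes,
                sizeof bytes);
    return static_cast<std::uint32_t>(bytes / sizeof(char16_t));
}

// What a C-style scan sees: everything up to the first null.
std::size_t LengthToFirstNull(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != u'\0') ++n;
    return n;
}

// Frees the whole block, starting from the prefix in front of the text.
void ModelFreeString(ModelBstr s) {
    if (s != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
    }
}
```

With the five-character input {a, b, null, c, d}, ModelStringLen() reports 5 while LengthToFirstNull() reports 2, which is exactly the discrepancy the length prefix exists to resolve.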

Since many COM interfaces are developed to work with both Visual Basic and C/C++, developers often define string parameters in a COM interface as BSTRs. In fact, if you are creating an OLE Automation (or COM Automation) interface (that is, one based on IDispatch), you must use BSTRs for strings. Using BSTRs in my COM interfaces became a habit long ago, and I rarely bother to pass a string as a character array, which takes two parameters: one for the data and one for its length. I find BSTRs easier to use in this case. The downside is the constant need to convert between a BSTR and a more useful char* string. If you work with the STL “string” class in your code, you’ll probably find yourself converting between the two very often.

Fortunately, Microsoft provides some easy-to-use macros for dealing with BSTRs in MFC, and they are also available in the Active Template Library (ATL). The most common are OLE2T() and T2OLE(). The first converts an OLE (wide-character) string such as a BSTR to a TCHAR string, which is Unicode or ANSI depending on whether _UNICODE is defined; the second converts a TCHAR string back to an OLE string. Note that T2OLE() returns a plain LPOLESTR, not a true BSTR with a length prefix; to get one, pass the result to SysAllocString(). There are also OLE2A() and A2OLE() variants that can be used if you know you are working with ANSI strings. A companion macro, USES_CONVERSION, must appear once in each function (or scope) that uses OLE2T() or T2OLE(); it declares some local variables that the conversion macros rely on.

One known place where these macros cannot be used is inside a C++ catch() handler. This is due to the way OLE2T() and T2OLE() (as well as OLE2A() and A2OLE()) are implemented: they call the C run-time function _alloca to allocate space for the converted string on the stack. This has the advantage of automatic cleanup, because the memory is released when the method or function returns. Allocating from the heap with malloc() or calloc() would require an explicit cleanup call before returning, whereas _alloca does not. That also helps when an exception forces a sudden unwinding of the stack: an explicit heap cleanup call would never execute, and you'd end up with orphaned memory blocks. However, _alloca has a limitation: it cannot be used in any kind of exception handler, whether a Windows NT structured exception handler (SEH) or a C++ catch block. Here is an excerpt from the MSDN documentation explaining why:

"There are restrictions to explicitly calling _alloca in an exception handler (EH). EH routines that run on x86-class processors operate in their own memory frame: They perform their tasks in memory space that is not based on the current location of the stack pointer of the enclosing function."

The unfortunate thing is that the sparse documentation for OLE2T() and T2OLE() does not mention this restriction. I ran into the problem myself when I called a COM interface from within a catch() handler and needed to convert a BSTR to a char* string. As soon as the code executed, it crashed. At first, I thought I had a memory corruption problem in the heap or stack. After a lot of research, including poking around in the implementation of the macros, I stumbled across the warnings in the documentation for _alloca.

Another place where you might have trouble is passing the result of an OLE2T() conversion to a method or function that takes a char* pointer and then uses that pointer inside one of its own catch() handlers.

In my own code, I found these restrictions to be a source of needless bugs, so I derived a class from the ATL CComBSTR class and added methods for converting between BSTRs and char* strings. I also made the BSTR-to-char* conversion return an STL string, so that I couldn't accidentally pass the original char* to a method that might use it improperly.
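
The article doesn't show that derived class, so the following is only a rough sketch of the idea under stated assumptions. The class name BstrString and its FromNarrow(), ToNarrow(), and Length() methods are hypothetical; it stores its text in a std::u16string rather than a real BSTR so that it compiles without ATL, and its conversions handle only ASCII. What it illustrates is the return type: ToNarrow() hands back a std::string by value, so there is no char* into temporary stack storage to misuse, and the conversion can run inside a catch handler.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a CComBSTR-derived wrapper. It keeps its wide
// text in a std::u16string instead of a real BSTR so it builds without ATL.
class BstrString {
public:
    BstrString() = default;
    explicit BstrString(std::u16string text) : text_(std::move(text)) {}

    // Narrow -> wide. ASCII only in this sketch; other bytes become u'?'.
    static BstrString FromNarrow(const std::string& s) {
        std::u16string wide;
        wide.reserve(s.size());
        for (unsigned char c : s) {
            wide.push_back(c < 0x80 ? static_cast<char16_t>(c) : u'?');
        }
        return BstrString(std::move(wide));
    }

    // Wide -> narrow, returned by value: the caller gets a std::string that
    // owns its storage, not a pointer into an _alloca buffer.
    std::string ToNarrow() const {
        std::string out;
        out.reserve(text_.size());
        for (char16_t c : text_) {
            // Non-ASCII units become '?', loosely like the default character
            // an ANSI code-page conversion substitutes.
            out.push_back(c < 0x80 ? static_cast<char>(c) : '?');
        }
        return out;
    }

    std::size_t Length() const { return text_.size(); }

private:
    std::u16string text_;
};

// Converting inside a catch handler is fine here, because the conversion
// does not allocate with _alloca.
std::string DescribeFailure(const BstrString& source) {
    try {
        throw std::runtime_error("simulated COM failure");
    } catch (const std::runtime_error&) {
        return "error from " + source.ToNarrow();
    }
}
```

In a real ATL project the same shape would more likely wrap CComBSTR and call WideCharToMultiByte() inside ToNarrow(); that is an assumption about how you might build it, not a description of the author's code.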

Of course, another option would be to roll your own BSTR-to-char* conversion code that doesn't rely on _alloca to allocate space for the converted data. But for most developers, OLE2T() and T2OLE() are good enough, even with their restrictions.
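
A minimal sketch of such a hand-rolled conversion, assuming UTF-8 is an acceptable target encoding: the hypothetical Utf16ToUtf8() takes an explicit length (for a real BSTR you could pass SysStringLen()), so embedded nulls are preserved, and it returns a std::string rather than a pointer into _alloca memory. On Windows you would more likely call WideCharToMultiByte() with CP_ACP or CP_UTF8; the encoding is done by hand here only so that the example runs on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts `len` UTF-16 code units to UTF-8. Passing the
// length explicitly (rather than scanning for a null) keeps embedded nulls.
// Unpaired surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const char16_t* src, std::size_t len) {
    std::string out;
    out.reserve(len);  // lower bound; the string grows as needed
    for (std::size_t i = 0; i < len; ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine the pair.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For example, the two code units 0xD83D 0xDE00 (a surrogate pair for U+1F600) produce the four bytes F0 9F 98 80, and a lone 0xD800 becomes U+FFFD (EF BF BD).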


Mark M. Baker is the Chief of Research & Development at BNA Software located in Washington, D.C. He can be contacted at [email protected].

