November 01, 2002
An Application-Crashing Windows Bug
User Interface Programming
Petter Hesselberg
Analyzing an elemental user interface conflict involving the User32 library.
Reader Joe Hagen wrote:
Peter, A gentleman posting on comp.os.ms-windows.programmer.win32, Matthias Hoffrichter, recently discovered a possible anomaly in User32. The condition seems to be triggered with the following sequence:
This may be a bug in User32, or simply incorrect logic in the window proc of an application. The issue also occurs in MSDEV (Version 6). If you choose Tools/Options, a dialog pops up. Use the tab key to set the focus to the Cancel button, hold down the spacebar, and press enter. Boom! I believe SPYXX also exhibits this behavior, and I've seen this crash occur under both NT and XP. I've attached a sample file based on code he presented, u32c.cpp, which creates a window with eight buttons. I compiled it with this line:
CL /W3 /Ox /GX /MT u32c.cpp user32.lib
If you press tab to set the focus to the first button, and then hold down the spacebar while pressing enter, tab, or one of the arrow keys, you should be able to observe the crash. Oddly, if the focus is on any other button, then the crash doesn't occur. He needed a workaround. I assumed it may be because he was calling DestroyWindow() from the WM_COMMAND case. I know that calling DestroyWindow() ends up reentering the window proc. As a workaround, I suggested in WM_COMMAND he call PostMessage() with a user message, and move the DestroyWindow() call to his user message processing. This seemed to eliminate the crash, but I don't know if it's the "best" solution.
I was just curious how someone with your expertise might assess the issue. Thanks for any info. Joe
This definitely looks like a Windows bug to me. I did some debugging on Listing 1, a modification of Joe's example, and it did indeed crash. The stack trace from the crash site revealed that the program was deep inside a function called USER32!xxxBNDrawText, so Windows was clearly attempting to repaint the button window. Unfortunately, at this point the button window had already been destroyed, and it seems likely that this is the root of the problem.
A stack trace is just a snapshot of the program's state at a given time. A message trace, which shows the action over time, reveals that in response to a spacebar plus enter key, the button sends WM_COMMAND twice, presumably once in response to each key. DestroyWindow() thus gets called twice, which on the face of it sounds like a bad idea. To circumvent that, I put in a static flag to protect against reentrancy and, lo and behold, the application no longer crashed.
If I replace the enter key with tab or arrow keys, however, the button sends WM_COMMAND only once, and the flag fix (which is somewhat cumbersome in any case) breaks down. In other words, calling DestroyWindow() twice is not the real problem here.
The message trace revealed another detail, though not a very surprising one: The crash occurs during the button's handling of a WM_KILLFOCUS, which explains why it wants to repaint itself. Actually, I consider this misbehavior. If you're a window, the proper way to repaint yourself is to a) update your internal status, b) call InvalidateRect() or InvalidateRgn(), and c) wait for WM_PAINT. Direct painting should be avoided unless absolutely necessary. While this kind of performance-booster may have been necessary in the early Pleistocene of Windows history, it is not necessary now.
Interestingly, the bug does not manifest itself if I create fewer than eight buttons or, as Joe mentioned, if I try it on any button but the first. I have no satisfactory theory to explain any of that.
My next step was to provoke the bug using the dialog box resource (Figure 1) in Listing 2, and to call DialogBox(), rather than creating the buttons from scratch and using IsDialogMessage(), as Joe did. If I call DestroyWindow() from the dialog function's WM_COMMAND handler, the crash still happens. But of course, one does not usually make that call; the normal approach is to call EndDialog() instead, in which case the crash does not happen. The EndDialog() documentation throws some light on this:
"EndDialog() does not destroy the dialog box immediately. Instead, it sets a flag and allows the dialog box procedure to return control to the system. The system checks the flag before attempting to retrieve the next message from the application queue. If the flag is set, the system ends the message loop, destroys the dialog box, and uses the value in nResult as the return value from the function that created the dialog box."
In other words, the call to DestroyWindow() must somehow be deferred. Joe did this by posting a user-defined message, then calling DestroyWindow() in response to that message. This works just fine, but a simpler approach is to post WM_CLOSE. This works only if the window calls DestroyWindow() in response to WM_CLOSE; fortunately, that is exactly what DefWindowProc() does.
In either case, PostMessage() is the key to success; SendMessage(), unsurprisingly, goes boom.
Conclusion
I did try several different approaches, playing around with PeekMessage() and SetFocus() and also sending WM_CANCELMODE messages. If something like this could be made to work, it might be possible to hide the bug-fix in a subclassing, thus getting it out of the way. Unfortunately, I had no luck and, in any case, I was really just groping around in the dark, particularly with regard to WM_CANCELMODE, which I don't understand very well. Conclusion: Never call DestroyWindow() directly in response to WM_COMMAND.
Petter Hesselberg is a partner with Accenture's Oslo office. He's been programming Windows for the past 13 years and is the author of the book Programming Industrial Strength Windows. He can be reached through http://pethesse.home.online.no/.