Archive for the 'WinDBG' Category

Effective Leak Detection with the Debug CRT and Application Verifier

Programming memory leaks in C or C++ is easy. Even careful programming often cannot avoid the little mistakes that eventually leave your program with a memory leak. Thankfully, however, there are plenty of helpful tools that assist in finding leaks as early as possible.

One especially helpful tool for leak detection is the debug CRT. Although the leak detection facilities provided by the debug CRT are not as far-reaching as those of, say, UMDH, using the debug CRT is probably the most frictionless way of identifying leaks.

Of course, the debug CRT will only track allocations made on the CRT heap, that is, allocations performed using malloc or, in the case of C++, the default operator new.

So how do you enable allocation tracking? As it turns out, it is already enabled by default for the debug heap, so changing the CRT flags using _CrtSetDbgFlag usually is not even necessary. All there is to do is to call _CrtDumpMemoryLeaks() at the end of the program.
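If you do want to adjust the flags, a minimal sketch could look like the following. It assumes an MSVC debug build; the helper name EnableLeakDumpAtExit is made up for this illustration. The flag _CRTDBG_LEAK_CHECK_DF asks the CRT to perform the leak dump itself at program exit, so the explicit call becomes optional. On other compilers, or in release builds, the helper simply does nothing.

```cpp
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

// Hypothetical helper (not part of the CRT): keeps allocation tracking on
// and requests an automatic leak dump at program exit.
// Returns true if the debug CRT is available and the flags were set.
bool EnableLeakDumpAtExit()
{
#if defined(_MSC_VER) && defined(_DEBUG)
  // Read the current flags without modifying them.
  int flags = _CrtSetDbgFlag( _CRTDBG_REPORT_FLAG );

  // _CRTDBG_ALLOC_MEM_DF is on by default; setting it again is harmless.
  flags |= _CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF;
  _CrtSetDbgFlag( flags );
  return true;
#else
  // No debug CRT in this build: nothing to configure.
  return false;
#endif
}
```

Calling EnableLeakDumpAtExit() once at the start of main() would be enough under these assumptions.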

When exactly is “the end of the program”? That depends on which CRT you use. Each CRT instance uses a separate heap, so its allocations must be tracked separately. If your application EXE and your DLLs all link against the DLL version of the CRT, the right moment to call _CrtDumpMemoryLeaks() is at the end of main(). If you use the static CRT, the right moment is when the respective module is about to unload. For an EXE, this is again the end of main() (atexit is another option). For a DLL, however, this is DllMain (in the DLL_PROCESS_DETACH case).

To illustrate how to make use of this CRT feature, consider the following leaky code:

#include <crtdbg.h>

class Widget
{
private:
  int a;

public:
  Widget() : a( 0 )
  {
  }
};

void UseWidget( Widget* w )
{
}

int __cdecl wmain()
{
  Widget* w = new Widget();
  UseWidget( w );

  _CrtDumpMemoryLeaks();
  return 0;
}

Running the debug build (i.e. a build using the debug CRT) of this program will yield the following output in the debugger:

Detected memory leaks!
Dumping objects ->
{124} normal block at 0x008C2880, 4 bytes long.
 Data:  00 00 00 00
Object dump complete.

So we have a memory leak — allocation #124 is not freed. The default procedure to locate the leak now is to include a call to _CrtSetBreakAlloc( 124 ) in the program and run it in the debugger — it will break when allocation #124 is performed. While this practice is ok for smaller programs, it will fail as soon as your program is not fully deterministic any more — most likely because it uses multiple threads. So for many programs, this technique is pretty much worthless.

But before continuing on the topic of how to find the culprit, there is another catch to discuss. Let’s include this snippet of code into our program:

class WidgetHolder
{
private:
  Widget* w;

public:
  WidgetHolder() : w( new Widget() )
  {}

  ~WidgetHolder()
  {
    delete w;
  }
};

WidgetHolder widgetHolder;

No leak here: we are properly cleaning up. But let’s see what the debugger window shows:

Detected memory leaks!
Dumping objects ->
{125} normal block at 0x000328C0, 4 bytes long.
 Data:  00 00 00 00
{124} normal block at 0x00032880, 4 bytes long.
 Data:  00 00 00 00
Object dump complete.

Urgh. But the reason should be obvious — when main() is about to return, ~WidgetHolder has not run yet. As a consequence, WidgetHolder’s allocation has not been freed yet and _CrtDumpMemoryLeaks will treat this as a leak. Unfortunately, there is no good way to avoid such false positives. Of course, this only holds for C++. For C, this problem does not exist.
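To see why the dump at the end of main() reports the global's allocation, here is a small, debugger-independent sketch. It replaces the CRT bookkeeping with a plain counter (g_liveWidgets and LiveWidgetsNow are made-up names for this illustration, not CRT facilities) and shows that the Widget owned by the global WidgetHolder still exists at any point during main().

```cpp
// Counts Widgets that currently exist; a stand-in for what a leak dump sees.
static int g_liveWidgets = 0;

class Widget
{
public:
  Widget()  { ++g_liveWidgets; }
  ~Widget() { --g_liveWidgets; }
};

class WidgetHolder
{
private:
  Widget* w;

public:
  WidgetHolder() : w( new Widget() ) {}
  ~WidgetHolder() { delete w; }

  // The holder owns w, so copying is disabled.
  WidgetHolder( const WidgetHolder& ) = delete;
  WidgetHolder& operator=( const WidgetHolder& ) = delete;
};

// Constructed before main() starts, destroyed only after main() has returned.
WidgetHolder widgetHolder;

// Hypothetical probe: call this where _CrtDumpMemoryLeaks() would be called.
int LiveWidgetsNow()
{
  return g_liveWidgets;
}
```

Called as the last statement of main(), LiveWidgetsNow() returns 1: the destructor of widgetHolder has not run yet, which is exactly the situation the leak dump observes.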

Ok, back to the problem of locating the leak. We know that allocation #124 is the problem, but assuming our program does more than this simplistic example, breaking on #124 during subsequent runs is likely to lead us to ever-changing locations. So this information is worthless. That leaves the address of the leaked memory: 0x008C2880.

At this point, we can leverage the fact that the CRT heap is not really a heap but just a wrapper around the RTL heap. Therefore, we can use the incredibly powerful debugging facilities of the RTL heap to help us out.

In order to fix a leak, it is usually extremely helpful to locate the code that performed the allocation. Once you have this information, it is often trivial to spot the missing free operation. As it turns out, the RTL heap’s page heap feature offers this capability.

Open Application Verifier and enable Heap Checks for our application. By default, this enables the full page heap, but the normal page heap is enough for our case.

Note that for the following discussion, I assume you are using the Visual Studio debugger.

Set a breakpoint on the statement immediately following the _CrtDumpMemoryLeaks() statement and run the application until it breaks there. This time, the locations 0x02CDFFA0 and 0x02CDFF40 are reported as being leaked. Do not continue execution yet.

Rather, open WinDBG and attach noninvasively to the debugged process. Visual Studio is already attached, so we cannot perform a real attach, but a noninvasive attach does the trick.

In WinDBG, we now use the !heap extension to query page heap information:

0:000> !heap -p -a 0x02CDFF40
    address 02cdff40 found in
    _HEAP @ 2cd0000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        02cdfef8 000c 0000  [00]   02cdff20    00028 - (busy)
        Trace: 02dc
        776380d8 ntdll!RtlDebugAllocateHeap+0x00000030
[...]
        6a2fab29 MSVCR80D!malloc+0x00000019
        6a34908f MSVCR80D!operator new+0x0000000f
        4115c9 Leak!WidgetHolder::WidgetHolder+0x00000049
        415808 Leak!`dynamic initializer for 'widgetHolder''+0x00000028
        6a2e246a MSVCR80D!_initterm+0x0000001a
        411d33 Leak!__tmainCRTStartup+0x00000103
        411c1d Leak!wmainCRTStartup+0x0000000d
        767b19f1 kernel32!BaseThreadInitThunk+0x0000000e
        7764d109 ntdll!_RtlUserThreadStart+0x00000023

0:000> !heap -p -a 0x02CDFFA0
    address 02cdffa0 found in
    _HEAP @ 2cd0000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        02cdff58 000c 0000  [00]   02cdff80    00028 - (busy)
        Trace: 02e0
        776380d8 ntdll!RtlDebugAllocateHeap+0x00000030
[...]
        6a2fab29 MSVCR80D!malloc+0x00000019
        6a34908f MSVCR80D!operator new+0x0000000f
        411464 Leak!wmain+0x00000044
        411dd6 Leak!__tmainCRTStartup+0x000001a6
        411c1d Leak!wmainCRTStartup+0x0000000d
        767b19f1 kernel32!BaseThreadInitThunk+0x0000000e
        7764d109 ntdll!_RtlUserThreadStart+0x00000023

Aha, stack traces! The remaining analysis is almost trivial: 0x02CDFF40 has been allocated on behalf of WidgetHolder::WidgetHolder. WidgetHolder::WidgetHolder, however, is not invoked by wmain, not even indirectly, but rather by MSVCR80D!_initterm. That is a strong indication that this allocation belongs to a global object and can be ignored in this analysis.

0x02CDFFA0, in turn, is allocated by wmain, so this is a real leak. But which allocation is it, exactly? lsa will tell us:

0:000> lsa Leak!wmain+0x00000044
    33: }
    34:
    35: int __cdecl wmain()
    36: {
>   37: 	Widget* w = new Widget();
    38: 	UseWidget( w );
    39:
    40: 	_CrtDumpMemoryLeaks();
    41: 	return 0;
    42: }

There you go, we have found the culprit.
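The triage rule used above (a trace that passes through _initterm belongs to the initializer of a global object) is simple enough to automate when there are many leaked blocks to sort through. The following is only a sketch of that heuristic: the helper name AllocatedByGlobalInitializer is hypothetical, and it assumes the frames have already been copied out of the !heap -p -a output, one string per frame.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: decides whether a page heap stack trace (one frame per
// string, as printed by !heap -p -a) passes through the CRT routine that runs
// the dynamic initializers of global objects.
bool AllocatedByGlobalInitializer( const std::vector<std::string>& frames )
{
  for ( const std::string& frame : frames )
  {
    // Matches MSVCR80D!_initterm as well as a statically linked
    // Module!_initterm.
    if ( frame.find( "!_initterm" ) != std::string::npos )
    {
      return true;
    }
  }

  return false;
}
```

A match is a hint, not proof: a global's constructor can still allocate memory that its destructor never frees.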

Although simple, I have found this technique to be very effective in practice, as it enables you to find leaks as you develop your code. As Application Verifier should be enabled anyway for any application you are developing, the technique also turns out to be a lot less laborious than it may seem. It almost certainly is a lot more convenient than routinely doing UMDH runs. To be fair, however, UMDH is able to catch more leaks (non-CRT leaks), so additionally using UMDH remains a good idea.

Trace and Watch Data — How does it work

One of the built-in WinDBG commands is wt (Trace and Watch Data), which can be used to trace the execution flow of a function. Given source code like the following:

void foo()
{
}

void bar()
{
}

int main()
{
  // Some random code...
  int a = 1, b = 2;

  // Call a child function.
  foo();

  // More useless code...
  a+=b;
  if ( a == b) a = b;

  // Call another child function.
  bar();  

  return 0;
}

wt will produce the following output:

0:000> wt
Tracing test!main to return address 00401291
    6     0 [  0] test!main
    1     0 [  1]   test!ILT+5(_foo)
    4     0 [  1]   test!foo
   13     5 [  0] test!main
    1     0 [  1]   test!ILT+0(_bar)
    4     0 [  1]   test!bar
   17    10 [  0] test!main

27 instructions were executed in 26 events
                                  (0 from other threads)

Function Name         Invocations MinInst MaxInst AvgInst
test!ILT+0(_bar)                1       1       1       1
test!ILT+5(_foo)                1       1       1       1
test!bar                        1       4       4       4
test!foo                        1       4       4       4
test!main                       1      17      17      17

0 system calls were executed

Although helpful, tracing a larger function calling a multitude of other functions slows down the debuggee significantly. An interesting question is thus how wt is implemented. Three possible implementation strategies come to mind:

  1. Use single-stepping. After each instruction executed, a debug trap is raised and the debugger is delivered a single-step debugging event. Though all non-branching instructions are probably irrelevant to wt, by intercepting each call and ret instruction, the debugger is able to trace function entry and exit.
  2. Explicitly set breakpoints. The debugger disassembles the function to be traced and places an ordinary breakpoint on each call instruction as well as on the return address of the function. Whenever one of the call-breakpoints fires, the debugger instruments the target function in the same way (i.e. places breakpoints on each call instruction as well as on the return address) and continues execution (without single-stepping). By intercepting all function calls and returns, the debugger is able to deduce the call tree. This approach would be similar to UMSS.
  3. Use Last Branch Recording. This is a rather new addition to the IA-32 instruction set that allows setting breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.

In order to find out, we have to debug the debugger to observe how it debugs the target. We thus start WinDBG, choose our test application as target and let it break on main. We then start another WinDBG instance and attach it to the first WinDBG instance. In order to find out which debugging events are consumed by the first instance, we use the second debugger to trace function calls made by the first debugger.

All usermode debuggers eventually end up calling ntdll!NtWaitForDebugEvent in a loop, so to find out which debugging events are consumed, all we need to do is trace all calls to this function. Although it is an undocumented native function, there is an excellent summary on the inner workings of user mode debugging which also covers ntdll!NtWaitForDebugEvent. Given this information, all we need to do to check whether strategy #1 or strategy #2 has been implemented (I assume #3 may safely be neglected) is to put together a little breakpoint command like the following (line breaks added for clarity):

bp ntdll!NtWaitForDebugEvent "
   r @$t1=poi(esp+10);
   g @$ra;
   .if (poi(@$t1)==8) {.echo \"SingleStep\n\" }
   .else {.printf \"Excp %p\\n\", poi(@$t1+c)};
   g "

When entering ntdll!NtWaitForDebugEvent, we store the value of the fourth parameter (a pointer to the DBGUI_WAIT_STATE_CHANGE structure that receives the event) in $t1 and step out of the function. Then we reach into the structure whose address is stored in $t1, check whether the first field marks the event as being of type DbgSingleStepStateChange (0x8), and output an appropriate message. If we receive about 30 single-step events, strategy #1 has probably been chosen. For #2 we would expect to receive 5 breakpoint events.
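To make the offsets in the breakpoint command easier to follow, here is a sketch of the part of the structure the command touches, modeled with fixed-width types so that it matches the 32-bit layout regardless of the compiler used. The structure is undocumented, so the field names and the assumption that a CLIENT_ID occupies offsets 0x4 to 0xB are mine; only the state value 8 and the offset 0xC are taken from the command above. DescribeEvent is a hypothetical helper that mirrors what the command prints.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// State value checked by the breakpoint command (poi(@$t1)==8).
const std::uint32_t kDbgSingleStepStateChange = 8;

// Assumed x86 layout of the start of DBGUI_WAIT_STATE_CHANGE.
struct WaitStateChangeX86
{
  std::uint32_t newState;       // +0x0: event type
  std::uint32_t uniqueProcess;  // +0x4: assumed CLIENT_ID.UniqueProcess
  std::uint32_t uniqueThread;   // +0x8: assumed CLIENT_ID.UniqueThread
  std::uint32_t exceptionCode;  // +0xC: value printed by poi(@$t1+c)
};

static_assert( offsetof( WaitStateChangeX86, exceptionCode ) == 0xC,
  "the breakpoint command reads the dword at offset 0xC" );

// Hypothetical helper: produces the same text as the breakpoint command.
std::string DescribeEvent( const WaitStateChangeX86& change )
{
  if ( change.newState == kDbgSingleStepStateChange )
  {
    return "SingleStep";
  }

  char buffer[ 32 ];
  std::snprintf( buffer, sizeof( buffer ), "Excp %08x",
    static_cast<unsigned>( change.exceptionCode ) );
  return buffer;
}
```

Note that for events that are not exception-related, the dword at offset 0xC is not an exception code, so the "Excp" output is only meaningful for exception-type events.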

Back to the first debugger, we now opt to trace the main function by running wt. This yields the output shown above. Switching to the second debugger again, we now see the following output:

SingleStep
SingleStep
SingleStep

[...about 20 more...]

SingleStep
SingleStep
SingleStep
SingleStep
SingleStep

Quite obviously, wt implements strategy #1 — it does single stepping. Although this does not really come as a surprise, it is still unfortunate as it is most likely the slowest approach of tracing calls. And as anybody who has ever used wt can probably confirm, wt is really slow.

As an interesting side note, as of Linux kernel 2.6.25, ptrace on x86 has been enhanced to facilitate Last Branch Recording on CPUs that support it.

Determining the apartment of a thread

There are situations in which it would be convenient to list which apartment the threads of a process belong to. In case of managed debugging, the !threads command provided by SOS gives this info:

                          PreEmptive   GC Alloc               Lock
     ID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
0   688 00149528      6020 Enabled  00000000:00000000 00159e68     0 STA
1   f70 00165548      b220 Enabled  00000000:00000000 00159e68     0 MTA (Finalizer)

In case of unmanaged debugging, however, no such command exists (at least to my knowledge). So the first question is how the apartment-information can be retrieved for a given thread.

Knowing that calling CoInitializeEx( NULL, COINIT_APARTMENTTHREADED ) followed by a CoInitializeEx( NULL, COINIT_MULTITHREADED ) yields an error (which implies that code checking which apartment the thread is currently in is executed), I decided to write up a test program and step through the second CoInitializeEx-call.

I would have expected to find the information stored in some TLS slot; however, this is not the case. Instead, the TEB structure contains a field dedicated to OLE:

typedef struct _TEB
{
	/*...*/
	PVOID           ReservedForOle;
	/*...*/
} TEB, *PTEB;

As a side note: while dedicating a separate field to OLE may have its advantages, it actually violates the idea of layering. OLE/COM is layered above NT; NT should not even know about the existence of COM/OLE and thus should not reserve a field for it. As such, using TLS would have been the cleaner choice. But I digress…

While identifying this field within the TEB is straightforward, it is totally undocumented which structure this field points to. From the disassembly, it is visible that the apartment type is stored in some flag field at an offset of 0xC bytes. Fortunately, others have written about that before and have found out the flag values of this field. Of course, there is no guarantee that the values and the offset will not change in future releases of Windows. All I can currently say is that the implementation works fine on WinXP x86. Given this information, I was able to code up a WinDBG debugging extension that offers me the information I was looking for:

0:008> ~*e !apt
Thread 0x0000057C Apartment: STA
Thread 0x0000053C Apartment: Not a COM thread
Thread 0x0000056C Apartment: Not a COM thread
Thread 0x00000538 Apartment: Unknown (Unrecognized flags)
Thread 0x00000568 Apartment: Not a COM thread
Thread 0x00000524 Apartment: STA
Thread 0x00000558 Apartment: MTA
Thread 0x00000550 Apartment: MTA

Threads for which the ReservedForOle pointer is NULL are reported as ‘Not a COM thread’. There are, however, threads for which the pointer is non-NULL, yet the aforementioned flag field contains the value 0x00000001, which cannot be identified as STA, MTA, or TNA. They are thus reported as ‘Unknown’.

The following listing shows the code I used within the debugging extension to retrieve this information.

#define OLE_STA_MASK   0x080    // Bugslayer, MSJ 10/99
#define OLE_MTA_MASK   0x140    // Bugslayer, MSJ 10/99
#define OLE_TNA_MASK   0x800    // http://members.tripod.com/IUnknwn

#define JPDBGEXT_E_DEBUGEE_ERROR   MAKE_HRESULT( 1, FACILITY_ITF, 0x200 )
#define JPDBGEXT_E_UNKNOWN_APT     MAKE_HRESULT( 1, FACILITY_ITF, 0x201 )

typedef struct _OLE_INFORMATION
{
    CHAR Padding[ 0xC ];
    DWORD Apartment;
} OLE_INFORMATION;

HRESULT JpDbgExtpGetThreadTebBaseAddress(
    __in HANDLE hThread,
    __out DWORD *pdwBaseAddress
    )
{
    THREAD_BASIC_INFORMATION threadInfo;
    DWORD retLen;
    NTSTATUS status;

    _ASSERTE( hThread );
    _ASSERTE( pdwBaseAddress );

    status = NtQueryInformationThread(
        hThread,
        ThreadBasicInformation,
        &threadInfo,
        sizeof( THREAD_BASIC_INFORMATION ),
        &retLen );
    if ( STATUS_SUCCESS != status )
    {
        return HRESULT_FROM_NT( status );
    }

    *pdwBaseAddress = * ( DWORD* ) &threadInfo.TebBaseAddress;
    return S_OK;
}

HRESULT JpDbgExtpGetApartmentType(
    __in HANDLE hThread,
    __out APARTMENT_TYPE *pApt
    )
{
    DWORD dwTebBaseAddress = 0;
    PVOID pOleAddress = 0;
    OLE_INFORMATION oleInfo;
    HRESULT hr = E_UNEXPECTED;
    TEB debugeeTeb;

    _ASSERTE( hThread );
    _ASSERTE( pApt );

    //
    // Get the debugee thread's TEB.
    //
    hr = JpDbgExtpGetThreadTebBaseAddress( hThread, &dwTebBaseAddress );
    if ( FAILED( hr ) )
    {
        return hr;
    }

    if ( ! ReadMemory(
        dwTebBaseAddress,
        &debugeeTeb,
        sizeof( TEB ),
        NULL ) )
    {
        return JPDBGEXT_E_DEBUGEE_ERROR;
    }

    //
    // Reach into the TEB and read OLE information.
    //
    pOleAddress = debugeeTeb.ReservedForOle;

    if ( pOleAddress == NULL )
    {
        //
        // Not a COM thread.
        //
        *pApt = APARTMENT_TYPE_NONE;
        hr = S_OK;
    }
    else
    {
        DWORD dwOleAddress = * ( DWORD* ) &pOleAddress;

        //
        // COM thread, get apartment
        //
        if ( ! ReadMemory(
            dwOleAddress,
            &oleInfo,
            sizeof( OLE_INFORMATION ),
            NULL ) )
        {
            return JPDBGEXT_E_DEBUGEE_ERROR;
        }

        if ( oleInfo.Apartment & OLE_STA_MASK )
        {
            *pApt = APARTMENT_TYPE_STA;
            hr = S_OK;
        }
        else if ( oleInfo.Apartment & OLE_MTA_MASK )
        {
            *pApt = APARTMENT_TYPE_MTA;
            hr = S_OK;
        }
        else if ( oleInfo.Apartment & OLE_TNA_MASK )
        {
            *pApt = APARTMENT_TYPE_TNA;
            hr = S_OK;
        }
        else
        {
            *pApt = APARTMENT_TYPE_UNKNOWN;
            hr = S_OK;
        }
    }

    return hr;
}
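The flag decoding at the end of JpDbgExtpGetApartmentType can also be written as a pure function, which makes it testable without a debuggee. The sketch below does that. The listing above does not show the definition of APARTMENT_TYPE, so the enumeration here is an assumption, as are the names DecodeApartmentFlags and the k-prefixed constants; the mask values are the ones from the listing.

```cpp
#include <cstdint>

// Assumed definition; the original listing uses these names without
// showing the enumeration itself.
enum APARTMENT_TYPE
{
  APARTMENT_TYPE_NONE,
  APARTMENT_TYPE_STA,
  APARTMENT_TYPE_MTA,
  APARTMENT_TYPE_TNA,
  APARTMENT_TYPE_UNKNOWN
};

// Mask values as given in the listing (observed on WinXP x86).
const std::uint32_t kOleStaMask = 0x080;
const std::uint32_t kOleMtaMask = 0x140;
const std::uint32_t kOleTnaMask = 0x800;

// Hypothetical helper: maps the flag field read at offset 0xC to an
// apartment type, testing the masks in the same order as the listing.
APARTMENT_TYPE DecodeApartmentFlags( std::uint32_t flags )
{
  if ( flags & kOleStaMask ) return APARTMENT_TYPE_STA;
  if ( flags & kOleMtaMask ) return APARTMENT_TYPE_MTA;
  if ( flags & kOleTnaMask ) return APARTMENT_TYPE_TNA;

  // Non-NULL ReservedForOle, but none of the known bits set.
  return APARTMENT_TYPE_UNKNOWN;
}
```

With these masks, a flag value of 0x00000001 decodes to APARTMENT_TYPE_UNKNOWN, which corresponds to the ‘Unknown (Unrecognized flags)’ line in the extension output.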



About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Johannes is pretty much fed up with Unix and mostly cares about Win32, COM, and NT kernel mode development, along with some .Net and Java. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org


Johannes' GPG fingerprint is DB1D 6173 C57E D6C7 3287 EE56 F867 6F44 7DC6 741F.
