Archive for the 'Win32' Category

cfix 1.5 released; adds support for EXE-embedded tests and kernel mode multi-threading

A new version of the cfix unit testing framework is now ready for download.

Unlike the previous release, which was mainly a maintenance release, cfix 1.5 adds major new features: kernel mode multithreading and EXE-based unit tests.

EXE based unit tests

As I discussed in a previous post in the context of Visual Assert/cfix studio, cfix’ restriction to DLL based unit tests has turned out to be quite a limitation for certain kinds of projects.

In cfix 1.5, this restriction has been removed: cfix now supports both, unit tests compiled and linked into DLL modules and unit tests embedded into EXE modules.

Now, when I say embeded into an EXE module, I do not mean that you may merely leave out the /LD compiler switch — it means that you may compile and link unit tests into the actual application EXE module without impacting this application’s behavior or having to change the application’s main() routine: When you run the application direcly or through the debugger, it will behave as normal (e.g. launch GUI). Once you launch it via cfix32 -exe app.exe (or cfix64 -exe app.exe, respectively), however, the application’s main() routine will not execute and instead, your embedded unit tests will run.

This may seem awkward at first — but it offers a tremendous advantage: All of the application’s code now immediately becomes testable (i.e. accessible by unit test code) without having to do complex reorganization of your build process or source tree layout (although it is still a very good idea to enhance the build process s.t. the unit tests are stripped in the RTM builds and to make cfix.dll a delayload DLL). For many projects, relying on this feature will therefore make adopting cfix and maybe even unit testing in general much easier.

In almost all regards, EXE-embedded unit tests behave the same as their DLL counterparts. They are, however, slightly less handy when it comes to debugging (unless you use Visual Assert, which will shield you from this). For this reason, and their greater flexibility in general, it should be noted though that DLL-based unit tests will remain the preferred choice.

Kernel mode multithreading

Since its first release, cfix has featured support for multi-threaded test cases. Multi-threaded test cases are tests which spawn child threads and — and this is the important point — both, the “main” thread and the child threads may trigger assertions (i.e. use CFIX_ASSERT and friends). Regardless of the thread an assertion occurs on, it will be recognized by the framework and will lead to the test case being marked as having failed. In case this sounds trivial to you, be informed that not even JUnit properly supports this :)

And while this feature has been supported for user mode tests ever since, the infrastructure for kernel mode unit tests, which was added in version 1.1, has lacked support for this feature: There was no kernel mode counterpart of CfixCreateThread and thus, only single-threaded kernel mode tests were supported.

cfix 1.5 now finally introduces CfixCreateSystemThread: CfixCreateSystemThread is basically a wrapper for PsCreateSystemThread with the added functionality of registering the child thread with cfix. Thus, all threads spawned using CfixCreateSystemThread (rather than using PsCreateSystemThread directly) are allowed to make use of assertions.

Kudos to Matt Oh for beta-testing this feature.

Minor enhacements

Another enhancement related to multi threaded tests is Auto-joining of child threads: After a test routine completes, the framework will now automatically check whether any child threads were created. If this is the case and any of these threads are still running, they will be waited on — not before all threads have terminated will the test run resume. This feature both makes writing multi-threaded tests more convenient (you do not have to wait by yourself) and safer (No more runaway child threads).

Finally, another feature worth noting is that by specifying the -td command line switch, cfix can be directed to not capture a stack backtrace when an assertion fails. Although stack backtraces are usually very helpful, the symbol loading can make their creation quite expensive. Using this switch can therefore speed up the usage of cfix.

Download/Upgrade Now

You see, there are many good reasons to upgrade your cfix installation or — in case you are not using cfix yet — to give cfix a try.

Better yet, check out the Visual Assert AddIn for Visual C++ — it is based on the new cfix 1.5 release and seamlessly integrates with Visual Studio.

Download cfix 1.5.0 Installer
Download cfix 1.5.0 source code

Writing Data-Driven Custom Actions

Whenever Windows Installer’s built-in actions do not suffice to perform a specific task, a Custom Action needs to be written. Needless to say, Custom Actions, can be a bit tricky — not only can they be laborious to write and cumbersome to debug, they also run the risk of interfering with Windows Installer’s declarative, transactional way of performing installs.

It is not really surprising that Windows Installer therefore more or less discourages the use of Custom Actions unless it is absolutely necessary. Moreover, as a result of its declarative nature, it is understadable that Windows Installer prefers Custom Actions to be data-driven.

What this means in practice is that a Custom Action should not perform a hard-coded task — rather, it should query one or more (custom) tables containing the necessary information (in a declarative manner) about what is to be performed and should act accordingly.

Using WiX, creating custom tables turns out to be pretty easy. Let’s assume we create a Custom Action that, based on some condition, does something with a specific file. An appropriate table could look like this:


<CustomTable Id="MyCustomTable">
  <Column Id="Id" Type="string" PrimaryKey="yes"/>
  <Column Id="Path" Type="string"/>
  <Column Id="Condition" Type="string"/>

  <Row>
    <Data Column="Id">Foo</Data>
    <Data Column="Path">[INSTALLLOCATION]foo.txt</Data>
    <Data Column="Condition"><![CDATA[ &FeatureFoo=3 ]]></Data>
  </Row>
  <Row>
    <Data Column="Id">Bar</Data>
    <Data Column="Path">[INSTALLLOCATION]bar.txt</Data>
    <Data Column="Condition"><![CDATA[ &FeatureBar=3 ]]></Data>
  </Row>
</CustomTable>

To query this table, we have to open a view and fetch the records one by one:

PMSIHANDLE Database = MsiGetActiveDatabase( InstallHandle );
ASSERT( Database != NULL );

PMSIHANDLE View;
UINT Result = MsiDatabaseOpenView(
  Database,
  L"SELECT `Condition`, `Path`, FROM `MyCustomTable`",
  &View );
if ( ERROR_SUCCESS != Result )
{
  ...
}

Result = MsiViewExecute( View, NULL );
if ( ERROR_SUCCESS != Result )
{
  ...
}

for ( ;; )
{
  PMSIHANDLE Record;
  Result = MsiViewFetch( View, &Record );
  if ( Result == ERROR_NO_MORE_ITEMS  )
  {
    break;
  }
  else if ( ERROR_SUCCESS != Result )
  {
    ...
  }

  //
  // Read condition.
  //
  // N.B. Do not format -- this is done by
  // MsiEvaluateCondition itself.
  //

  WCHAR Condition[ 256 ];
  DWORD Length = _countof( Condition );
  Result = MsiRecordGetString(
    Record,
    1,
    Condition,
    &Length );
  if ( ERROR_SUCCESS != Result )
  {
    ...
  }

  if ( MSICONDITION_TRUE != MsiEvaluateCondition(
    InstallHandle,
    Condition ) )
  {
    //
    // This record can be skipped.
    //
    continue;
  }

  //
  // Read remaing fields.
  //

  WCHAR Path[ MAX_PATH ];
  Length = _countof( VszPath );
  Result = GetFormattedRecord(
    InstallHandle,
    Record,
    2,
    Path,
    &Length );
  if ( ERROR_SUCCESS != Result )
  {
    ...
  }

  ...
}

With GetFormattedRecord being the following utility routine:


static UINT GetFormattedRecord(
  __in MSIHANDLE InstallHandle,
  __in MSIHANDLE Record,
  __in UINT Field,
  __out PWSTR Value,
  __inout PDWORD Length
  )
{
  DWORD RecLength = *Length;
  UINT Result = MsiRecordGetString(
    Record,
    Field,
    Value,
    &RecLength );
  if ( ERROR_SUCCESS != Result )
  {
    *Length = RecLength;
    return Result;
  }

  PMSIHANDLE FormattingRecord = MsiCreateRecord( 1 );

  Result = MsiRecordSetString( FormattingRecord, 0, Value );
  if ( ERROR_SUCCESS != Result )
  {
    return Result;
  }

  return MsiFormatRecord(
    InstallHandle,
    FormattingRecord,
    Value,
    Length );
}

Some things are worth noting:

  • I use PMSIHANDLE, which, as you probably already know, is not a typedef for MSIHANDLE* but rather a smart-pointer like class that automatically closes the handle when it goes out of scope.
  • The use of backticks in the query.
  • It must have been a Visual Basic programmer implementing MsiRecordGetString: Field Indexes start with 1, not 0. To make matters worse, reading from index 0 does not fail but returns arbitrary data. Finally, to confuse people further, indexes are 0-based for MsiRecordSetString.
  • If the field contains formatted data, you have to MsiFormatRecord it yourself. For conditions, however, MsiEvaluateCondition handles that for you.

So far, so good. There is, however, one thing to notice: To access the installer database, the custom action must be a nondeferred action:

You cannot access the current installer session or all property data from a deferred execution custom action

The problem with nondeferred actions, however, is that they execute in user context — in contrast to deferred actions, which execute in system context. On pre-Vista platforms, a per-machine installer package can be expected to always be launched by an administrator (otherwise it will fail anyway) — in this case, the differences between user and system context may not be important — both, for example, have r/w access to files in %ProgramFiles%. On Vista and later OS, however, it is common to have a regular user launch an installation which causes an elevation prompt once it reaches the main install phase. In this case, the user context is significantly less privileged than system context.

For a hypothetical custom action that is intended to edit a file installed to %ProgramFiles%, this means that (disregarding rollback considerations and assuming proper scheduling) performing this action from within the nondeferred custom action will work fine on pre-Vista OS. When run on Vista, though, it is likely to fail due to lack of write access to %ProgramFiles%. In practice, this means that all system-changing tasks usually have to be performed by a deferred action.

To sum up: To be data-driven, you have to use nondeferred actions. To be able to perform any serious, system state-changing tasks, however, you have to use deferred actions.

Great.

As it turns out, however, there is a way to escape this catch-22, and it is carefully buried in the Windows Installer documentation:

[...] Actions that update the system, such as the InstallFiles and WriteRegistryValues actions, cannot be run by calling MsiDoAction. The exception to this rule is if MsiDoAction is called from a custom action that is scheduled in the InstallExecuteSequence table between the InstallInitialize and InstallFinalize actions. [...]

[From the Remarks section of MsiDoAction]

In fact, the way I came across this solution was by looking at the source code of the WiX XmlFile action, which I knew manages to both be data-driven (uses a custom table) and alter system state (edits XML files). The way it does this, and the point where the above remark comes into play, is as follows: In the nondeferred action, you do not perform any actions changing system state. Rather, you collect the information from the installer tables and stuff it (yuck) into the CustomActionData property. Then, leveraging MsiDoAction and passing said CustomActionData, you schedule another custom action — this time a deferred one — which parses the CustomActioData (yuck) and, based on this data, finally performs the actual modifications — in system context.

It really could not be easier and more intuitive, right?

cfix studio Beta 2 to add support for EXE-based unit tests

N.B. cfix studio was the code name of what has become Visual Assert

The biggest shortcoming of the current cfix studio version certainly is that it requires all tests be implemented in a DLL. Conceptually, keeping test cases separated from the remaining code certainly is a good idea — and implementing tests in a DLL is a way to accomplish this. However, there are many projects in which such separation is either not feasible or just too much effort.

The good news is that with Beta 2, this will finally change: EXEs become first class-citizens in cfix studio and it will not matter any more whether your tests are part of a DLL or EXE project — you can just put them where you think is appropriate.

Take a classic MFC/GUI application project as an example: It is pretty common for these kinds of projects that most, if not all, application logic is part of a single Visual Studio project that compiles into a single EXE. There may be some additional DLLs or LIBs, but by and large, the EXE itself is where most of the interesting things happen.

The upcoming Beta 2 release now allows you to implement all your unit tests as part of the same EXE project. This means that your tests have access to all classes, functions and resources that are part of the project — all of which you would not easily have access to if you implemented the tests in a separate DLL.

Of course, embedding unit tests into an executable raises two questions:

  1. How to strip the tests from the final release?
  2. How on earth will you be able to run these tests without having main() create windows, load files, play sounds, etc each time?

Thankfully, C/C++ has a preprocessor and Visual C++ has the “exclude from build” feature which allows you to exclude certain files whenever the project is built using a specific configuration. Using any of these two features, the first question is easily answered.

The second problem is more tricky — but thankfully, it has already been solved for you: When cfix studio runs unit tests, it is well aware of that running main() might have, let’s say interesting effects — so what it does is simple: It just makes sure that main() is never run! Not only does this ensure that the tests run silently, i.e. without windows popping up etc, it also has the benefit that all unit tests “see” the application in a pristine state: Rather than having to worry about which state main() has brought the application into, you can initialize and clean up any state you need in your Setup and Teardown functions1.

To make a long story short: You can write unit tests in EXE projects in exactly the same manner as you would in a DLL project. No special considerations needed, no project settings that need to be changed, no additional boilerplate code to write. And when you run the EXE outside cfix studio, i.e. hit F5 in Visual Studio or launch the EXE directly, you will not even notice that the EXE houses some unit tests — everything works as normal.

Sounds good? Then wait a few more days and see it in action!

Remarks
1: Needless to say, all global variables are initialized, constructors are run, etc. All CRT initialization happens as normal; only main()/WinMain() is not run. And yes, it also works for apps that link statically to MFC and therefore do not have a “regular” WinMain().

cfix 1.4 released

Today, a new version of cfix, the open source unit testing framework for user and kernel mode C and C++, has been released. cfix 1.4, in addition to the existing feature of allowing test runs to be restricted to specific fixtures, now also allows single testcases to be run in isolation, which can be a great aid in debugging. Besides several minor fixes, the cfix API has been slightly enhanced and cfix now degrades more gracefully in case of dbghelp-issues.

Updated cfix binaries and source code are now available for download

Uniquely Identifying a Module’s Build

It is common practice to embed a version resource (VS_VERSIONINFO) into PE images such as DLL and EXE files. While this resource mainly serves informational purposes, the version information is occasionaly used to perform certain checks, such as verifying the module’s suitability for a particular purpose.

Under certain circumstances, however, this versioning information may be too imprecise: Versions are not necessarily incremented after each build, so it is possible that two copies of a module carry the same versioning information, yet differ significantly in their implementation. In such situations, identifying the actual build of the module might become neccessary.

The most common, but by no means the only situation in which this applies in practice concerns debugging — to identify the PDB file exactly matching a given module, the debugger must be able to recognize the specific build of a module. It thus does not come as a surprise that all images for which debugging information has been generated contain a dedicated identifier for this purpose: The CodeView signature GUID.

Summarizing what Oleg Starodumov has covered in more detail, cl, when directed to generate a PDB file, implicitly creates this GUID and, along with the path to the PDB file, embeds this data into the PE image. For current versions, the relevant structure used to encode this information is CV_INFO_PDB70, which seems to have been documented once, but not any more:

typedef struct _CV_INFO_PDB70
{
  ULONG CvSignature;
  GUID Signature;
  ULONG Age;
  UCHAR PdbFileName[ ANYSIZE_ARRAY ];
} CV_INFO_PDB70, *PCV_INFO_PDB70;

In order to be able to locate the structure within the PE image, a directory entry of type IMAGE_DEBUG_TYPE_CODEVIEW is written to the image’s debug directory. The following code listing demonstrates how to obtain the signature GUID of an image:

#define PtrFromRva( base, rva ) ( ( ( PUCHAR ) base ) + rva )

static PIMAGE_DATA_DIRECTORY GetDebugDataDirectory(
  __in ULONG_PTR LoadAddress
  )
{
  PIMAGE_DOS_HEADER DosHeader =
    ( PIMAGE_DOS_HEADER ) ( PVOID ) LoadAddress;
  PIMAGE_NT_HEADERS NtHeader = ( PIMAGE_NT_HEADERS )
    PtrFromRva( DosHeader, DosHeader->e_lfanew );
  ASSERT ( IMAGE_NT_SIGNATURE == NtHeader->Signature );

  return &NtHeader->OptionalHeader.DataDirectory
      [ IMAGE_DIRECTORY_ENTRY_DEBUG ];
}

NTSTATUS GetDebugGuid(
  __in ULONG_PTR ModuleBaseAddress,
  __out GUID *Guid
  )
{
  PIMAGE_DATA_DIRECTORY DebugDataDirectory;
  PIMAGE_DEBUG_DIRECTORY DebugHeaders;
  ULONG Index;
  ULONG NumberOfDebugDirs;
  ULONG_PTR ModuleBaseAddress;
  NTSTATUS Status;

  DebugDataDirectory  = DebugDataDirectory( ModuleBaseAddress );
  DebugHeaders    = ( PIMAGE_DEBUG_DIRECTORY ) PtrFromRva(
    ModuleBaseAddress,
    DebugDataDirectory->VirtualAddress );

  ASSERT( ( DebugDataDirectory->Size % sizeof( IMAGE_DEBUG_DIRECTORY ) ) == 0 );
  NumberOfDebugDirs = DebugDataDirectory->Size / sizeof( IMAGE_DEBUG_DIRECTORY );

  //
  // Lookup CodeView record.
  //
  for ( Index = 0; Index < NumberOfDebugDirs; Index++ )
  {
    PCV_INFO_PDB70 CvInfo;
    if ( DebugHeaders[ Index ].Type != IMAGE_DEBUG_TYPE_CODEVIEW )
    {
      continue;
    }

    CvInfo = ( PCV_INFO_PDB70 ) PtrFromRva(
      ModuleBaseAddress,
      DebugHeaders[ Index ].AddressOfRawData );

    if ( CvInfo->CvSignature != 'SDSR' )
    {
      //
      // Weird, old PDB format maybe.
      //
      return STATUS_xxx_UNRECOGNIZED_CV_HEADER;
    }

    *Guid = CvInfo->Signature;
    return STATUS_SUCCESS;
  }

  return STATUS_xxx_CV_GUID_LOOKUP_FAILED;
}

cfix 1.2 Installer Fixed for AMD64

The cfix 1.2 package as released last week contained a rather stupid bug that the new build, 1.2.0.3244, now fixes: the amd64 binaries cfix64.exe and cfixkr64.sys were wrongly installed as cfix32.exe and cfixkr32.sys, respectively. Not only did this stand in contrast to what the documenation stated, it also resulted in cfix being unable to load the cfixkr driver on AMD64 platforms.

The new MSI package is now available for download on Sourceforge.

cfix 1.2 introduces improved C++ support

cfix 1.2, which has been released today, introduces a number of new features, the most prominent being improved support for C++ and additional execution options.

New C++ API

To date, cfix has primarily focussed on C as the programming language to write unit tests in. Although C++ has always been supported, cfix has not made use of the additional capabilities C++ provides. With version 1.2, cfix makes C++ a first class citizen and introduces an additional API that leverages the benefits of C++ and allows writing test cases in a more convenient manner.

Being implemented on top of the existing C API, the C++ API is not a replacement, but rather an addition to the existing API set.

As the following example suggests, fixtures can now be written as classes, with test cases being implemented as methods:

#include <cfixcc.h>

class ExampleTest : public cfixcc::TestFixture
{
public:
  void TestOne()
  {}

  void TestTwo()
  {}
};

CFIXCC_BEGIN_CLASS( ExampleTest )
  CFIXCC_METHOD( TestOne )
  CFIXCC_METHOD( TestTwo )
CFIXCC_END_CLASS()

To learn more about the definition of fixtures, have a look at the respective TestFixture chapter in the cfix documentation.

Regarding the implementation of test cases, cfix adds a new set of type-safe, template-driven assertions that, for instance, allow convenient equality checks:

void TestOne()
{
  const wchar_t* testString = L"test";

  //
  // Use typesafe assertions...
  //
  CFIXCC_ASSERT_EQUALS( 1, 1 );
  CFIXCC_ASSERT_EQUALS( L"test", testString );
  CFIXCC_ASSERT_EQUALS( wcslen( testString ), ( size_t ) 4 );

  //
  // ...log messages...
  //
  CFIX_LOG( L"Test string is %s", testString );

  //
  // ...or use the existing "C" assertions.
  //
  CFIX_ASSERT( wcslen( testString ) == 4 );
  CFIX_ASSERT_MESSAGE( testString[ 0 ] == 't',
    L"Test string should start with a 't'" );
}

Again, have a look at the updated API reference for an overview of the new API additions.

Customizing Test Runs

Another important new feature is the addition of the new switches -fsf (Shortcut Fixture), -fsr (Shortcut Run), and -fss (Shortcut Run On Failing Setup). Using these switches allows you to specify how a test run should resume when a test case fails.

When a test case fails, the default behavior of cfix is to report the failure, and resume at the next test case. By specifying -fsf, however, the remaining test cases of the same fixture will be skipped and execution resumes at the next fixture. With -fsr, cfix can be requirested to abort the entire run as soon as a single test case fails.

What else is new in 1.2?

Download

As always, cfix 1.2 is source and binary compatible to previous versions. The new MSI package and source code can now be downloaded on Sourceforge.

cfix is open source and licensed under the GNU Lesser General Public License.

Effective Leak Detection with the Debug CRT and Application Verifier

Programming memory leaks in C or C++ is easy. Even careful programming often cannot avoid the little mistakes that finally end up in your program having a memory leak. Thankfully, however, there are plenty of helpful tools that assist in finding leaks as early as possible.

One especially helpful tool for leak detection is the debug CRT. Although the leak detection facilities provided by the debug CRT are not as far-reaching as those of, say, UMDH, using the debug CRT is probably the most friction-less way of identifying leaks.

Of course, the debug CRT will only track allocations of the CRT heap. That is, allocations performed using malloc or, in case of C++, the default operator new.

So how to enable allocation tracking? As it turns out, it is already enabled by default for the debug heap — so changing the CRT flags using _CrtSetDbgFlag usually is not even neccessary. All there is to do is to call _CrtDumpMemoryLeaks() at the end of the program.

When exactly is “the end of the program”? That depends on which CRT you use. Each CRT uses a separate heap and thus, must have its resources be tracked separately. If your application EXE and your DLLs all link against the DLL version of the CRT, the right moment to call _CrtDumpMemoryLeaks() is at the end of main(). If you use the static CRT, the right moment is when the respective module is about to unload — for an EXE, this is the end of main() again (atexit is another option). For a DLL, however, this is DllMain (in the DLL_PROCESS_DETACH case).

To illustrate how to make use of this CRT feature, consider the following leaky code:

#include <crtdbg.h>

class Widget
{
private:
  int a;

public:
  Widget() : a( 0 )
  {
  }
};

void UseWidget( Widget* w )
{
}

int __cdecl wmain()
{
  Widget* w = new Widget();
  UseWidget( w );

  _CrtDumpMemoryLeaks();
  return 0;
}

Running the debug build (i.e. a build using the debug CRT) of this program will yield the following output in the debugger:

Detected memory leaks!
Dumping objects ->
{124} normal block at 0x008C2880, 4 bytes long.
 Data:  00 00 00 00
Object dump complete.

So we have a memory leak — allocation #124 is not freed. The default procedure to locate the leak now is to include a call to _CrtSetBreakAlloc( 124 ) in the program and run it in the debugger — it will break when allocation #124 is performed. While this practice is ok for smaller programs, it will fail as soon as your program is not fully deterministic any more — most likely because it uses multiple threads. So for many programs, this technique is pretty much worthless.

But before continueing on the topic of how to find the culprit, there is another catch to discuss. Let’s include this snippet of code into our program:

class WidgetHolder
{
private:
  Widget* w;

public:
  WidgetHolder() : w( new Widget() )
  {}

  ~WidgetHolder()
  {
    delete w;
  }
};

WidgetHolder widgetHolder;

No leak here — we are properly cleaning up. But let’s see what the debugger window tells:

Detected memory leaks!
Dumping objects ->
{125} normal block at 0x000328C0, 4 bytes long.
 Data:  00 00 00 00
{124} normal block at 0x00032880, 4 bytes long.
 Data:  00 00 00 00
Object dump complete.

Urgh. But the reason should be obvious — when main() is about to return, ~WidgetHolder has not run yet. As a consequence, WidgetHolder’s allocation has not been freed yet and _CrtDumpMemoryLeaks will treat this as a leak. Unfortunately, there is no good way to avoid such false positives. Of course, this only holds for C++. For C, this problem does not exist.

Ok, back to the problem of locating the leak. We know that allocation #124 is the problem, but assuming our program does more than the simplistic example, breaking on #124 during the next runs is likely to lead us to ever changing locations. So this information is worthless. That leaves the address of the leaked memory — 0×008C2880.

At this point, we can leverage the fact that the CRT heap is not really a heap but just a wrapper around the RTL heap. Therefore, we can use the incredibly powerful debugging facilities of the RTL heap to help us out.

In order to fix a leak, it is usually extremely helpful to locate the code having conducted the allocation. Once you have this information, it is often trivial to spot the missing free operation. As it turns out, the the RTL heap’s page heap feature offers this capability.

Open Application Verifier and enable Heap Checks for our application. By default, this enables the full page heap, but the normal page heap is enough for our case.

Note that for the following discussion, I assume you are using the Visual Studio debugger.

Set a breakpoint on the statement immediately following the _CrtDumpMemoryLeaks() statement and run the application until it breaks there. This time, the locations 0×02CDFFA0 and 0×02CDFF40 are reported as being leaked. Do not continue execution yet.

Rather, open WinDBG and attach noninvasively to the debugged process. VisualStudio is already attached, so we cannot perform a real attach, but a noninvasive attach does the trick.

In WinDBG, we now use the !heap extension to query page heap information:

0:000> !heap -p -a 0x02CDFF40
    address 02cdff40 found in
    _HEAP @ 2cd0000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        02cdfef8 000c 0000  [00]   02cdff20    00028 - (busy)
        Trace: 02dc
        776380d8 ntdll!RtlDebugAllocateHeap+0x00000030
[...]
        6a2fab29 MSVCR80D!malloc+0x00000019
        6a34908f MSVCR80D!operator new+0x0000000f
        4115c9 Leak!WidgetHolder::WidgetHolder+0x00000049
        415808 Leak!`dynamic initializer for 'widgetHolder''+0x00000028
        6a2e246a MSVCR80D!_initterm+0x0000001a
        411d33 Leak!__tmainCRTStartup+0x00000103
        411c1d Leak!wmainCRTStartup+0x0000000d
        767b19f1 kernel32!BaseThreadInitThunk+0x0000000e
        7764d109 ntdll!_RtlUserThreadStart+0x00000023

0:000> !heap -p -a 0x02CDFFA0
    address 02cdffa0 found in
    _HEAP @ 2cd0000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        02cdff58 000c 0000  [00]   02cdff80    00028 - (busy)
        Trace: 02e0
        776380d8 ntdll!RtlDebugAllocateHeap+0x00000030
[...]
        6a2fab29 MSVCR80D!malloc+0x00000019
        6a34908f MSVCR80D!operator new+0x0000000f
        411464 Leak!wmain+0x00000044
        411dd6 Leak!__tmainCRTStartup+0x000001a6
        411c1d Leak!wmainCRTStartup+0x0000000d
        767b19f1 kernel32!BaseThreadInitThunk+0x0000000e
        7764d109 ntdll!_RtlUserThreadStart+0x00000023

Aha, stack traces! The remaining analysis is almost trivial: 0×02CDFF40 has been allocated on behalf of WidgetHolder::WidgetHolder. WidgetHolder::WidgetHolder, however, is not indirectly invoked by wmain, but rather by MSVCR80D!_initterm! That is a strong indication for this being a global object that can be ignored in this analysis.

0×02CDFFA0, in turn, is allocated by wmain, so this is a real leak. But which allocation is it, exactly? lsa will tell us:

0:000> lsa Leak!wmain+0x00000044
    33: }
    34:
    35: int __cdecl wmain()
    36: {
>   37: 	Widget* w = new Widget();
    38: 	UseWidget( w );
    39:
    40: 	_CrtDumpMemoryLeaks();
    41: 	return 0;
    42: }

There you go, we have found the culprit.

Although simple, I have found this technique to be very effective in practice, as it enables you to find leaks as you develop your code. As Application Verifier should be enabled anyway for any application you are developing on, the technique also turns out to be a lot less laborious than it may seem. It almost certainly is a lot more convenient than routinely doing UMDH runs. To be fair, however, UMDH is able to catch more leaks (non CRT-leaks), so additionally using UMDH remains being a good idea.

Debugging a Debugger Deadlock

While I still use VisualStudio 2005 Team System for most of my development, I want to make sure that cfix works properly with VisualStudio 2008 as well.

To test that, I recently started a Windows 2003 Server VM, installed VCExpress 2008 and cfix and attempted to run an example project in the VC debugger. As long as no assertions fired, everything seemed fine. I then altered the example’s source code so that one of the assertion would fail, ran it in the debugger — and waited. Nothing happened.

When an assertion fires, cfix 1.1 attempts to capture the stack trace to make debugging easier. For this to work, the respective debugging symbols must be loaded, and — if not yet available — be downloaded from the symbol server. It is thus not uncommon that it takes a couple of seconds before the failure message appears and the debugger breaks in when an assertion fails. But after about a minute, still nothing happened — a deadlock seems to have occured.

To make the situation worse, I then noticed that while cfix was hanging, Internet Explorer would not properly start any more. After creating its main window, it would hang as well.

So I rebooted the machine and hooked up a kernel debugger. Luckily, the scenario was easy to reproduce and I could take a closer look…

Debugging the deadlock

First, let’s find the cfix32 process:

kd> !process 0 0 cfix32.exe
PROCESS 8254e490  SessionId: 0  Cid: 06b0    Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe

Now that we have the VA, we can switch to the process and load its user mode symbols:

kd> .process /r /p 8254e490  
Implicit process is now 8254e490
.cache forcedecodeuser done
Loading User Symbols
...............................

Looking at this process’ threads, it is easy to see that one stack looks suspicious:

kd> !process 8254e490
PROCESS 8254e490  SessionId: 0  Cid: 06b0
	Peb: 7ffde000  ParentCid: 0658
    DirBase: 06e90000  ObjectTable: e170ee58  HandleCount:  69.
    Image: cfix32.exe
    VadRoot 82455628 Vads 76 Clone 0 Private 491.
    Modified 8. Locked 0.
    DeviceMap e148f3c0

    [...]

THREAD 826276f8  Cid 06b0.06b4  Teb: 7ffdd000 Win32Thread:
  e1734a38 WAIT: (Unknown) KernelMode Non-Alertable
	f4422b44  SynchronizationEvent
Not impersonating
DeviceMap                 e148f3c0
Owning Process            8254e490       Image:         cfix32.exe
Wait Start TickCount      13474          Ticks: 9138 (0:00:01:31.511)
Context Switch Count      219                 LargeStack
UserTime                  00:00:00.010
KernelTime                00:00:00.010
Win32 Start Address cfix32!wmainCRTStartup (0x010048cf)
Start Address kernel32!BaseProcessStartThunk (0x77e617f8)
Stack Init f4423000 Current f4422aa0 Base f4423000 Limit f441f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr
f4422ab8 808202b6 nt!KiSwapContext+0x25
f4422ad0 8081fb6e nt!KiSwapThread+0x83
f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0 
f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178
f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45
f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf
f4422d34 8082350b nt!NtMapViewOfSection+0x269
f4422d34 7c8285ec nt!KiFastCallEntry+0xf8
00069b94 7c82728b ntdll!KiFastSystemCallRet
00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64
00069ccc 7c833027 ntdll!LdrpMapDll+0x390
[...]

The thread is blocked — it is waiting on en event. Events do not have an owner so we have to do a little more to find out, what it is waiting for. Unfortunately, !thread crops the trace, so let us first get the full one:

kd> .thread 826276f8  
kd> kn100
 # ChildEBP RetAddr
00 f4422ab8 808202b6 nt!KiSwapContext+0x25
01 f4422ad0 8081fb6e nt!KiSwapThread+0x83
02 f4422b14 809bae63 nt!KeWaitForSingleObject+0x2e0
03 f4422bf4 809bc06d nt!DbgkpQueueMessage+0x178
04 f4422c18 8096ba9a nt!DbgkpSendApiMessage+0x45
05 f4422cc8 80909942 nt!DbgkMapViewOfSection+0xcf
06 f4422d34 8082350b nt!NtMapViewOfSection+0x269
07 f4422d34 7c8285ec nt!KiFastCallEntry+0xf8
08 00069b94 7c82728b ntdll!KiFastSystemCallRet
09 00069b98 7c831e05 ntdll!NtMapViewOfSection+0xc
0a 00069bdc 7c831fd6 ntdll!LdrpMapViewOfDllSection+0x64
0b 00069ccc 7c833027 ntdll!LdrpMapDll+0x390
0c 00069f30 7c8330f5 ntdll!LdrpLoadImportModule+0x17c
0d 00069f70 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
0e 00069f8c 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
0f 0006a014 7c833049 ntdll!LdrpWalkImportDescriptor+0x195
10 0006a264 7c8330f5 ntdll!LdrpLoadImportModule+0x1cb
11 0006a2a4 7c8330a4 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x4d
12 0006a2c0 7c833248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
13 0006a348 7c83427d ntdll!LdrpWalkImportDescriptor+0x195
14 0006a5e0 7c834065 ntdll!LdrpLoadDll+0x241
15 0006a85c 77e41bf3 ntdll!LdrLoadDll+0x198
16 0006a8c4 77e41dbd kernel32!LoadLibraryExW+0x1b2
17 0006a8d8 77e41df3 kernel32!LoadLibraryExA+0x1f
18 0006a8f8 46a7870c kernel32!LoadLibraryA+0xb5
19 0006a954 46a93b3e WININET!__delayLoadHelper2+0xfc
1a 0006a994 46a93950 WININET!_tailMerge_RASAPI32_dll+0xd
1b 0006a9a8 46a93a4e WININET!DoConnectoidsExist+0x2b
1c 0006a9d4 46a93abc WININET!GetRasConnections+0x34
1d 0006a9f0 46a8c559 WININET!IsDialUpConnection+0xa9
1e 0006aa0c 46a97a44 WININET!FixProxySettingsForCurrentConnection+0x31
1f 0006b5e4 46aa3774 WININET!InternetQueryOptionA+0xa47
20 0006b748 01d12dc6 WININET!InternetQueryOptionW+0x1fa
21 0006b98c 01d12583 symsrv!StoreWinInet::dumpproxyinfo+0x46
22 0006be04 01d1290a symsrv!StoreWinInet::connect+0x273
23 0006c040 01d05ae7 symsrv!StoreWinInet::find+0x3a
24 0006c134 01d06c47 symsrv!cascade+0x87
25 0006c684 01d06a57 symsrv!SymbolServerByIndexW+0x127
26 0006c8b4 0302e30e symsrv!SymbolServerW+0x77
27 0006ccf4 03018eed dbghelp!symsrvGetFile+0x12e
28 0006d9dc 03019f57 dbghelp!diaLocatePdb+0x33d
29 0006dc58 03041ade dbghelp!diaGetPdb+0x207
2a 0006de7c 0303ff15 dbghelp!GetDebugData+0x2be
2b 0006e324 03040516 dbghelp!modload+0x305
2c 0006e7a4 0304068e dbghelp!LoadModule+0x3f6
2d 0006e9e8 03044eaf dbghelp!GetModule+0x4e
2e 0006ea30 03044bda dbghelp!NTGetProcessModules+0x16f
2f 0006eae8 03032e80 dbghelp!GetProcessModules+0x4a
30 0006ed70 60f03f7a dbghelp!SymInitializeW+0x320
31 0006f17c 60f032b0 cfix!CfixpCaptureStackTrace+0x117
32 0006f598 100419b5 cfix!CfixPeReportFailedAssertion+0xc5
WARNING: Stack unwind information not available.
Following frames may be wrong.
33 0006f7c0 10040f40 VsSample!__CfixFixturePeSimpleAdderTest+0x5ab1
34 0006f8b4 10040db6 VsSample!__CfixFixturePeSimpleAdderTest+0x503c
35 0006fa68 10040a8a VsSample!__CfixFixturePeSimpleAdderTest+0x4eb2
36 0006fb48 60f02b64 VsSample!__CfixFixturePeSimpleAdderTest+0x4b86
37 0006fb84 60f02be6 cfix!CfixsRunTestRoutine+0x33
38 0006fb94 60f038e9 cfix!CfixsRunTestCaseMethod+0x27
39 0006fbac 60f03a06 cfix!CfixsRunTestCase+0x25
3a 0006fbcc 60f03ce5 cfix!CfixsRunTsexecActionMethod+0xfb
3b 0006fbf0 0100e135 cfix!CfixsRunSequenceAction+0x122
3c 0006fc2c 0100d5c2 cfix32!CfixrunpRunFixtures+0x90
3d 0006fc40 0100d85c cfix32!CfixrunsMainWorker+0x3f
3e 0006fe7c 010046b8 cfix32!CfixrunMain+0x1b9
3f 0006fee0 0100485e cfix32!wmain+0x80
40 0006ffc0 77e6f23b cfix32!_wmainCRTStartup+0x12b
41 0006fff0 00000000 kernel32!BaseProcessStart+0x23

Whoa, what a trace! cfix!CfixpCaptureStackTrace tries to assemble a stack trace, for which it has to initialize dbghelp.dll first. dbghelp!SymInitializeW seeks help of symsrv.dll, which in turn tries to connect to the Microsoft symbol server. Before it can so, it obviously attempts to get its proxy settings straight, which in turn leads to some DLL (rasapi.dll, in case you wonder) being loaded. The loader then calls into the debugging subsystem (nt!Dbgk*). It may be assumed that the loader is notifying the debugger about the DLL having been loaded.

Turining our attention to Internet Explorer, we look at iexplore.exe’s threads:

kd> !process 0 0 iexplore.exe
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0    Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe

kd> .process /r /p 824ec3b0  
Implicit process is now 824ec3b0
.cache forcedecodeuser done
Loading User Symbols
...............................................

Now, iexplore has lots of threads, but skimming over them, one looked interesting:

kd> !process 824ec3b0
PROCESS 824ec3b0  SessionId: 0  Cid: 07f0
	Peb: 7ffdb000  ParentCid: 054c
    DirBase: 0307c000  ObjectTable: e15186b8  HandleCount: 225.
    Image: iexplore.exe
    VadRoot 824991c8 Vads 168 Clone 0 Private 643.
    Modified 44. Locked 0.
    DeviceMap e148f3c0

    [...]

THREAD 82431980  Cid 07f0.00b8  Teb: 7ffd5000 Win32Thread: 00000000
  WAIT: (Unknown) UserMode Non-Alertable
	82631eb0  Mutant - owning thread 826276f8
Not impersonating
DeviceMap                 e148f3c0
Owning Process            824ec3b0       Image:         iexplore.exe
Wait Start TickCount      21924          Ticks: 688 (0:00:00:06.889)
Context Switch Count      1
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address ntdll!RtlpWorkerThread (0x7c839efb)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f40a3000 Current f40a2c78 Base f40a3000 Limit f40a0000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr
f40a2c90 808202b6 nt!KiSwapContext+0x25
f40a2ca8 8081fb6e nt!KiSwapThread+0x83
f40a2cec 8090e64e nt!KeWaitForSingleObject+0x2e0
f40a2d50 8082350b nt!NtWaitForSingleObject+0x9a
f40a2d50 7c8285ec nt!KiFastCallEntry+0xf8
01e5fdd0 7c827d0b ntdll!KiFastSystemCallRet
01e5fdd4 77e61d1e ntdll!NtWaitForSingleObject+0xc
01e5fe44 77e61c8d kernel32!WaitForSingleObjectEx+0xac
01e5fe58 46a8c54d kernel32!WaitForSingleObject+0x12
01e5fe74 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25 
01e5fe8c 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61
01e5fea4 46a7efa3 WININET!CFsm::Run+0x39
01e5fed4 77da5938 WININET!CFsm::RunWorkItem+0x79
01e5feec 7c83a827 SHLWAPI!ExecuteWorkItem+0x1d
01e5ff44 7c83aa0b ntdll!RtlpWorkerCallout+0x71
01e5ff64 7c83aa82 ntdll!RtlpExecuteWorkerRequest+0x4f
01e5ff78 7c839f60 ntdll!RtlpApcCallout+0x11
01e5ffb8 77e64829 ntdll!RtlpWorkerThread+0x61
01e5ffec 00000000 kernel32!BaseThreadStart+0x34

Now we are getting somewhere. We have seen FixProxySettingsForCurrentConnection in cfix’s trace already — but in this case, it is waiting on something. Let’s see…


kd> !object 82631eb0  
Object: 82631eb0  Type: (827a5550) Mutant
    ObjectHeader: 82631e98 (old version)
    HandleCount: 3  PointerCount: 6
    Directory Object: e1496420  Name: WininetProxyRegistryMutex

And 826276f8, that’s the cfix32 thread we have already assessed. Obviously, iexplore waits for cfix to release the WininetProxyRegistryMutex, and cfix waits on someone else.

Turning over to VC, we can find a stack that also contains a call to FixProxySettingsForCurrentConnection on its stack. Again, blocking on WininetProxyRegistryMutex.

kd> k100
ChildEBP RetAddr
f4492c90 808202b6 nt!KiSwapContext+0x25
f4492ca8 8081fb6e nt!KiSwapThread+0x83
f4492cec 8090e64e nt!KeWaitForSingleObject+0x2e0
f4492d50 8082350b nt!NtWaitForSingleObject+0x9a
f4492d50 7c8285ec nt!KiFastCallEntry+0xf8
065b95c8 7c827d0b ntdll!KiFastSystemCallRet
065b95cc 77e61d1e ntdll!NtWaitForSingleObject+0xc
065b963c 77e61c8d kernel32!WaitForSingleObjectEx+0xac
065b9650 46a8c54d kernel32!WaitForSingleObject+0x12
065b966c 46a7eeca WININET!FixProxySettingsForCurrentConnection+0x25
065b9684 46a7ee3f WININET!CFsm_HttpSendRequest::RunSM+0x61
065b969c 46a7fefa WININET!CFsm::Run+0x39
065b96b4 46ab0a67 WININET!DoFsm+0x25
065b96dc 46aa1092 WININET!HttpWrapSendRequest+0x148
065b9714 06b231da WININET!HttpSendRequestW+0x5e
065b973c 06b22ea8 SYMSRV!StoreWinInet::request+0x2a
065b9770 06b226cc SYMSRV!StoreWinInet::fileinfo+0x18
065b9780 06b22741 SYMSRV!StoreWinInet::get+0x7c
065b9fc4 06b229a3 SYMSRV!StoreWinInet::open+0x41
065ba204 06b15ae7 SYMSRV!StoreWinInet::find+0xd3
065ba2f8 06b16c47 SYMSRV!cascade+0x87
065ba848 06b16a57 SYMSRV!SymbolServerByIndexW+0x127
065baa78 51412896 SYMSRV!SymbolServerW+0x77
065bb8cc 51413383 mspdb80!LOCATOR::SYMSRV::SymbolServer+0x190
065bbf10 514136f8 mspdb80!LOCATOR::FLocatePdbSymsrv+0x75
065bbf38 514139ce mspdb80!LOCATOR::FLocatePdbPathHelper+0x179
065bc96c 51413cbe mspdb80!LOCATOR::FLocatePdbPath+0x105
065bccb4 51414371 mspdb80!LOCATOR::FLocatePdb+0x1ad
065bd9a8 458cc1e8 mspdb80!PDBCommon::OpenValidate5+0xab
065bd9ec 45959d4c msenc90!enc::EncImageEdit::
                  `scalar deleting destructor'+0x4d
065bda34 45958e62 NatDbgDE!OLPDBOpen+0x93
065be6c0 45958f0a NatDbgDE!OLStart+0x107
065be6fc 45958fae NatDbgDE!LoadOmfForReal+0x23
065be714 45959019 NatDbgDE!LoadSymbols+0x43
065be72c 459590d9 NatDbgDE!OLLoadOmf+0x55
065be75c 45959154 NatDbgDE!SHLoadDll+0xd5
065be7ac 45959247 NatDbgDE!CSymbolHandlerX::SHLoadDll+0x5a
065be844 4595937c NatDbgDE!CModule::Load+0x1a1
065be8ac 4594d002 NatDbgDE!CNativeProcess::NotifyModLoad+0xc8
065be9ec 4594cf6d NatDbgDE!EngineCallback+0xb3
065bea18 45958d3f NatDbgDE!EMCallBackDB+0x4c
065bf050 4594d0dc NatDbgDE!LoadFixups+0x218
065bf0ac 4594d289 NatDbgDE!DebugPacket+0x213
065bfdb4 4594cf39 NatDbgDE!EMFunc+0x40f
065bfddc 4594d73d NatDbgDE!TLCallBack+0x1e
065bfdf4 4594d711 NatDbgDE!TLClientLib::Local_TLFunc+0xc8
065bfe3c 4594d85c NatDbgDE!DMSendPacket+0x121
065bfee8 45959b1d NatDbgDE!NotifyEM+0x3ae
065bff0c 4594d663 NatDbgDE!ProcessLoadDLLEvent+0x47
065bff44 4594d686 NatDbgDE!ProcessDebugEvent+0x30d
065bffb8 77e64829 NatDbgDE!DmPollLoop+0x3c
065bffec 00000000 kernel32!BaseThreadStart+0x34

But — looking closely, it becomes obvious that this thread must be the one handling debug events, and in fact, the call to ProcessLoadDLLEvent is a strong indication for that this thread is currently handling a DLL load event. And now we have closed the loop — this thread must be handling the DLL load event for rasapi.dll, the DLL which cfix was about to load. And to do this, VC attempts to acquire the WininetProxyRegistryMutex, which is owned by the original cfix thread. Deadlock.

What is interesting about this situation is that neither party — cfix, iexplore or VCExpress, and also none of the modules clearly is the culprit and behaving wrong. It is more like a combination of special circumstances that bring up the deadlock as discussed.

It is also notable that I am not using any particular proxy settings on this machine and automatic proxy configuration has been turned off.

So far, I have not experienced the same problem with VS 2003 and VS 2005 — I thus assume that only VS 2008 is affected by this.

Although I am pretty sure that cfix is not really at fault here, I have to adapt it to avoid this deadlock in the future. Until an updated version is available, you can use this workaround.

cfix 1.1 may experience hangs when used in the VisualStudio 2008 debugger

When used in conjunction with the VisualStudio 2008 debugger, cfix may hang indefinitely as soon as an assertion fails. The reason for this behavior is a Symbol Server-caused deadlock between the debugger and cfix. I am going to discuss the details of this deadlock in a separate post.

Until a new version of cfix is available, you can work around this problem as follows: Go to the cfix installation directory and rename or delete symsrv.dll. This will stop cfix from contacting the symbol server. While cfix now will not be able to capture proper stack traces any more, this will prevent the mentioned deadlock situation to arise.

The problem only applies to cfix 1.1 and VS 2008. The problem does not occur when debugging with VS 2003, VS 2005, or cdb/ntsd/WinDBG.

Next Page »


Categories

TechEd_Europe_Blog_LP_IMAtt




Try Visual Assert, the unit testing add-in for Visual Studio (R)


NTrace: Function Boundary Tracing for Windows on IA-32

About me

Johannes Passing, M.Sc., living in Berlin, Germany.

Johannes is pretty much fed up with Unix and mostly cares about Win32, COM, and NT kernel mode development, along with some .Net and Java. He also is the author of cfix, a C/C++ unit testing framework for Win32 and NT kernel mode, Visual Assert, a Visual Studio Unit Testing-AddIn, and NTrace, a dynamic function boundary tracing toolkit for Windows NT/x86 kernel/user mode code.

Contact Johannes: jpassing (at) acm org

More about Johannes...

Johannes' GPG fingerprint is DB1D 6173 C57E D6C7 3287 EE56 F867 6F44 7DC6 741F.

LinkedIn LinkedIn Profile
Xing Xing Profile