CM_PROB_DRIVER_FAILED_PRIOR_UNLOAD Issue (code 38)

Category: windows hardware wdk and driver development

Question

PhilipLk on Thu, 19 Apr 2018 13:29:54


Hello.

I have a strange problem which appears only on Windows 10 (1703/1709 tested), and never on Windows 7 or Windows 8.  I have a bus driver (WDM; I maintain it but did not write it from scratch) which enumerates one AVSTREAM child (audio/MIDI).  In certain cases, I see the code CM_PROB_DRIVER_FAILED_PRIOR_UNLOAD listed for the parent bus driver in the SETUPAPI logfile (and in the device manager).

There is a FAQ for debugging this:

“Debugging a Failed Driver Unload”

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugging-a-failed-driver-unload

After a few debug sessions, I have found out that the OBJECT_HEADER of the PDO created by my bus driver has a PointerCount > 0 after the device has been unplugged (HandleCount is 0), resulting in the bus device driver instance being pinned in memory (its unload routine was called), and no other instance being able to load thereafter.

I can usually duplicate the issue as follows:

  1. Plug in device.
  2. Open application, which opens a handle to the AVSTREAM child.
  3. Unplug the device with the application still open.  I note that my bus driver receives (for the affected PDO) surprise removal, but not removal, which is expected, as the application does not close its handle.
  4. Replug the device, while keeping the application open.
  5. Close the application (bus driver receives remove for the old PDO, and IoDeleteDevice is called on it).
  6. Unplug and replug the device, and periodically the symptom in the device manager appears (code 38).  When this issue occurs, I see using the debugger that the original child PDO has a PointerCount > 0 (HandleCount is 0).

Now, following the “Debugging a Failed Driver Unload” link above, I have the kernel stacks of each reference/dereference on the PDO from a run in which the issue was reproduced.  I do not see a point where I have an unbalanced reference/dereference with my driver as the origin of both.  The obvious unbalanced cases where my bus driver is involved are the IRP_MN_QUERY_DEVICE_RELATIONS TargetDeviceRelation/BusRelations paths, where my bus driver returns a referenced device.  I have never received a TargetDeviceRelation/BusRelations request for this PDO after the device has been unplugged in step 3 above, so at that point, all outstanding references to this PDO for the QDR/TargetDeviceRelation case ultimately lie in the KS layer, and all outstanding references to this PDO for the QDR/BusRelations case ultimately lie in the PnP layer.
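
The bookkeeping described above (pairing each reference on the PDO with a dereference from the same origin) can be sketched as a small user-mode C helper. This is only an illustrative model, not kernel code and not a debugger API: the RefEvent and OriginBalance types, the tally and balance_of functions, and origin labels such as "busdrv", "ks" and "pnp" are all hypothetical, and the events would have to be transcribed by hand from the reference-trace stacks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one reference-trace entry: which component
 * owned the call stack, and whether the pointer count went up or down. */
typedef struct {
    const char *origin; /* illustrative label, e.g. "busdrv", "ks", "pnp" */
    int delta;          /* +1 = reference, -1 = dereference */
} RefEvent;

/* Net reference balance accumulated for one origin. */
typedef struct {
    const char *origin;
    int balance;
} OriginBalance;

#define MAX_ORIGINS 16

/* Sum the deltas per origin into 'out' (capacity 'cap').
 * Returns the number of distinct origins, or -1 if 'out' is too small. */
static int tally(const RefEvent *events, size_t n,
                 OriginBalance *out, size_t cap)
{
    size_t used = 0;

    assert(events != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for an existing slot for this origin. */
        for (j = 0; j < used; j++) {
            if (strcmp(out[j].origin, events[i].origin) == 0)
                break;
        }
        /* Not seen before: open a new slot with a zero balance. */
        if (j == used) {
            if (used == cap)
                return -1;
            out[used].origin = events[i].origin;
            out[used].balance = 0;
            used++;
        }
        out[j].balance += events[i].delta;
    }
    return (int)used;
}

/* Net balance for one origin; 0 if the origin never appears. */
static int balance_of(const OriginBalance *table, int count,
                      const char *origin)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].origin, origin) == 0)
            return table[i].balance;
    }
    return 0;
}
```

Any origin left with a non-zero balance after the final remove is a candidate holder of the leaked reference. A balance of zero for the bus driver would be consistent with the observation above that the driver itself is not the origin of an unbalanced pair, although it would not prove it, since a reference taken by one component may legitimately be released by another.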

On to the truly desperate:  I have found that adding a long delay in the child’s AddDevice routine when there is another device departing in parallel causes this issue to no longer manifest itself.  This is not a solution in any way, and I would like to know how I could address the cause of this issue.

Could anyone see how I might have provoked this situation, or have an idea what might be the cause, or offer any advice?

Thanks

Replies

Brian Catlin on Thu, 19 Apr 2018 21:32:16


Once a device is in the surprise-removed state, you're not going to see a QDR for TargetDeviceRelation (what would be the point?). You'll only see a QDR for BusRelations BECAUSE the bus driver called IoInvalidateDeviceRelations for BusRelations (typically, when the bus driver detects a membership change on the bus). If adding a delay changes the symptom, then clearly you have a synchronization problem in your driver. These are the most difficult type of problem to solve. Fundamentally, it doesn't matter what the client does: as long as your bus driver does all the right things, it will work.

I've written more than a dozen bus drivers over the last 20 years, and I have probably run into this problem but I don't recall the details. Enable Driver Verifier for your driver. Instrument your driver with WPP - NOT DbgPrint, because that can cause HeisenBugs - and analyze the output. Pay particular attention to data, references, and handles shared between routines. Be certain that the original author understood the difference between STOP and REMOVE processing. I would recommend re-reading the WDM PnP and Power docs, which may help you recognize the problem.

 -Brian

PhilipLk on Fri, 20 Apr 2018 16:38:06


Hello,

Thank you for your response.  I've already been using driver verifier and WPP traces.

Once the device has been surprise removed, the bus driver does receive QDR, but only RemovalRelations and EjectionRelations, which I pass down.  There are no other points after the surprise removal state has been reached in which I reference and do not dereference that PDO, which is why I am so confused.  I will re-check the kernel stack traces of each reference/dereference on that PDO in case I have missed something.

Nor do I receive a stop device, as there is no re-balancing in this scenario.

Strangely, when using the MS inbox driver instead of my driver, I can duplicate the same problem, though it is much harder (I do not know what that means, though).

Thanks for your insight.  

(If you or anyone else has/have comments/questions, please don't hesitate to post them here.)


Brian Catlin on Fri, 20 Apr 2018 18:32:10


If you can duplicate the problem with an in-box driver, then you should contact support. They don't charge you if it is their bug

 -Brian

PhilipLk on Mon, 23 Apr 2018 15:04:40


Yes, I can duplicate the same issue using the same reproduction technique using the inbox driver (though it is harder).

Thank you for the feedback, much appreciated.

MDParks on Wed, 15 Aug 2018 15:45:34


I am currently debugging an issue which is pretty much 100% as you describe, PhilipLk- Thanks for posting! In my case it is the USB composite driver (usbccgp.sys) which fails to unload. The repro steps are exactly the same- disconnect and reconnect a device whilst an application is running, followed by closing the application and disconnecting/reconnecting the device again. I receive IRP_MN_SURPRISE_REMOVAL on the first disconnect, and then IRP_MN_REMOVE_DEVICE for the original PDO when the application exits. I used Object Reference Tracing and can confirm one outstanding reference to the original PDO after this...

May I ask if you ever got to the bottom of this problem? If there is a common pitfall I'd like to know, particularly if the child device driver was somehow at fault.

Doron Holan [MSFT] on Wed, 15 Aug 2018 15:53:48


What are the classes of each of the child devices below the generic parent?

MDParks on Wed, 15 Aug 2018 16:34:50


In my case the child device is an audio device (class Multimedia).