Mysterious App Pool Crashes? WinDbg is Your Friend
Live profiling tools like dynaTrace can be fantastic aides in troubleshooting production problems, but sometimes you only see the problem when it’s too late. Like when the application has already crashed.
One of my clients experienced seemingly random app pool crashes on an ASP.NET MVC website in production. Neither the local, dev or test environments exhibited the same behavior, and in production it only happened rarely. Looking through the code’s change history going back to the first crash didn’t reveal any obvious problems.
The only real clues to the problem were the following type of entries in Windows Event Log:
Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2 Faulting module name: nlssorting.dll, version: 4.0.30319.17929, time stamp: 0x4ffa5787 Exception code: 0xc00000fd Fault offset: 0x00000000000043c3 Faulting process id: 0x71c Faulting application start time: 0x01cebe80d4f9282b Faulting application path: c:windowssystem32inetsrvw3wp.exe Faulting module path: C:WindowsMicrosoft.NETFramework64v4.0.30319nlssorting.dll Report Id: 5e6953fc-2a74-11e3-b7a7-001dd8b71c13
According to Google, 0xc00000fd is stack overflow exception, but that’s not a lot of information to go on.
Luckily, Windows Error Reporting (WER) had been setup correctly to generate crashdump files, each of which would contain the state of the application in the CLR when a crash happened. In this case, only a minidump was available, but since a stack overflow exception was the suspected culprit, it would probably be possible to do a postmortem analysis using one of the Windows debuggers (eg. ntsd or WinDbg).
As a side note, if WER had not generated any crashdumps, it would have been possible to run ADPlus (also a Windows debugging tool) on one of the production servers and let it wait for the next crash, at which point it would generate appropriate crashdump files.
The Windows debuggers are native debuggers and as such know nothing about the inner workings of the CLR, but the “Son of Strike” (SOS) extension developed by Microsoft can be used to properly debug .NET code. Other extensions, such as the third-party SOSEX extension can enhance the experience even further.
It would be possible to debug the crash without the client’s code symbols (PDB files), but since they were readily available for the specific build that was crashing, loading them made the experience less painful.
Here is the output from WinDbg where one of the crashdump files were loaded and the symbols and SOS extension registered:
Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64 Copyright (c) Microsoft Corporation. All rights reserved.Loading Dump File [C:tempw3wp.exe.5348.dmp] User Mini Dump File: Only registers, stack and portions of memory are availableSymbol search path is: srv* Executable search path is: Windows 7 Version 7601 (Service Pack 1) MP (2 procs) Free x64 Product: Server, suite: TerminalServer SingleUserTS Machine Name: Debug session time: Wed Oct 2 08:06:54.000 2013 (UTC + 1:00) System Uptime: not available Process Uptime: 0 days 0:05:46.000 ................................................................ Loading unloaded module list ..................................... This dump file has an exception of interest stored in it. The stored exception information can be accessed via .ecxr. Sun Oct 27 10:30:02.067 2013 (UTC + 1:00): (14e4.1104): Stack overflow - code c00000fd (first/second chance not available) nlssorting!SortGetSortKey+0x7a: 000007fe`f7c6440f e83ce5ffff call nlssorting!AreFlagsValid (000007fe`f7c62950) 0:027> .symfix 0:027> .sympath Symbol search path is: srv* Expanded Symbol search path is: cache*;SRV*http://msdl.microsoft.com/download/symbols 0:027> .sympath+ c:tempbuild Symbol search path is: srv*;c:tempbuild Expanded Symbol search path is: cache*;SRV*http://msdl.microsoft.com/download/symbols;c:tempbuild 0:027> .reload ................................................................ Loading unloaded module list ..................................... 0:027> .loadby sos clr
Note how the debugger immediately after loading the crashdump says that it contains an exception of interest and that it’s of type stack overflow.
Finding the Root Cause
Once the debugger symbols and the SOS extension are properly loaded, it’s time to dive into the crash dump and try to figure out what the problem is.
The !threads command will give an overview of the threads that were running at the time of the crash, and as an extra bonus it will inform us if an exception was thrown on each thread.
0:027> !threads ThreadCount: 19 UnstartedThread: 0 BackgroundThread: 16 PendingThread: 0 DeadThread: 3 Hosted Runtime: no Lock ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception 7 1 12cc 0000000001285880 28220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn 13 2 1314 00000000024a3370 2b220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Finalizer) 15 4 1574 000000000250e480 102a220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Worker) 16 5 12e4 0000000002514d30 21220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn 17 6 17a4 0000000008810c30 1029220 Preemptive 00000002041B0FF0:00000002041B12D0 0000000002489cd0 0 Ukn (Threadpool Worker) 19 7 878 00000000088113d0 1029220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Worker) XXXX 8 0 000000000881a770 8039820 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Completion Port) 20 9 a0c 000000000881da80 1029220 Preemptive 0000000000000000:0000000000000000 0000000002521970 1 Ukn (Threadpool Worker) 22 11 284 00000000089b4360 1029220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Worker) 23 12 1268 00000000089b4b30 1029220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Worker) 24 13 1700 00000000089b5300 21220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn 25 14 1514 00000000089b5ad0 21220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn XXXX 3 0 00000000089b62a0 30820 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn XXXX 15 0 00000000089b6a70 8039820 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Completion Port) 27 16 1104 00000000089b7240 1029220 Preemptive 00000001047FEE90:00000001047FFCE0 0000000002521970 1 Ukn (Threadpool Worker) System.StackOverflowException 00000001ff731188 3 19 1778 00000000088973d0 20220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn 33 23 7ac 000000001947bb60 20220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn 34 21 1470 000000001947cb00 8029220 Preemptive 0000000000000000:0000000000000000 0000000002489cd0 0 Ukn (Threadpool Completion Port) 35 22 15f0 000000001947b390 202b220 Preemptive 0000000000000000:0000000000000000 0000000002521970 1 Ukn
Thread 27 in the above listing says it encountered a StackOverflowException, which fits with the clue in the Event Log and the message in the debugger right after the crashdump was loaded.
A few things worth noting: Many other threads also have text in the Exception column, like (Threadpool Worker) and (Finalizer). These are NOT exceptions, rather the column is also used to output auxiliary data such as stating which thread is the Garbage Collector Finalizer thread, etc. Also, thread 27 is already the active thread, which can be seen in the prompt 0:027. If it wasn’t the active thread, the incredibly intuitive command ~27s would switch to it.
Since we’re investigating a stack overflow, getting the thread’s stack would probably uncover the problem. The !ClrStack command does just that. I have cut away some lines at the top and bottom that are not relevant and also trimmed away hundreds of repeats of a pattern that’s clearly visible below:
0:027> !ClrStack OS Thread Id: 0x1104 (27) Child SP IP Call Site (repeat ad nauseam) 000000000b169280 000007fe980b90b0 MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(MongoDB.Bson.IO.BsonWriter, System.Object, MongoDB.Bson.Serialization.BsonMemberMap) 000000000b1692e0 000007fe980b8d7a MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(MongoDB.Bson.IO.BsonWriter, System.Type, System.Object, MongoDB.Bson.Serialization.IBsonSerializationOptions) 000000000b169360 000007fe980c1b60 MongoDB.Bson.BsonExtensionMethods.ToJson(System.Object, System.Type, MongoDB.Bson.Serialization.IBsonSerializationOptions, MongoDB.Bson.IO.JsonWriterSettings) 000000000b1693f0 000007fe980c1558 TheClient.SomeCode.Helpers.SomeLogRenderer.Append(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b169590 000007fe97ac7c69 TheLogger.LayoutRenderers.LayoutRenderer.Render(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b169610 000007fe980c04d3 TheLogger.Layouts.SimpleLayout.GetFormattedMessage(TheLogger.LogEventInfo) 000000000b1696b0 000007fe980c024c TheLogger.Targets.Target.PrecalculateVolatileLayouts(TheLogger.LogEventInfo) 000000000b169760 000007fe980c0133 TheLogger.Targets.Wrappers.AsyncTargetWrapper.Write(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b1697b0 000007fe980bff2e TheLogger.Targets.Target.WriteAsyncLogEvent(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b169850 000007fe980bfb4b TheLogger.LoggerImpl.WriteToTargetWithFilterChain(TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.SomeCode.AsyncContinuation) 000000000b1698c0 000007fe980bf6fe TheLogger.LoggerImpl.Write(System.Type, TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.LogFactory) 000000000b169920 000007fe97efd2a3 TheClient.SomeCode.Logging.LoggerAdapter.Log(System.Type, TheLogger.LogEventInfo) 000000000b169960 000007fe991588c0 TheClient.SomeCode.Helpers.LogHelper.LogError(System.String, System.String, System.String, System.Exception, TheClient.SomeCode.Helpers.LogMeta[]) 000000000b169a00 000007fe99158c42 TheClient.SomeClient.Utils.UserResolver.ResetState(System.Exception) 000000000b169b00 000007fe98cedb97 TheClient.SomeClient.Utils.UserResolver.ResolveUser() 000000000b169bd0 000007fe98ced8c8 TheClient.SomeClient.Utils.ClientContextRes.<get_Current>b__0() 000000000b169c20 000007fe980c5f27 TheClient.SomeCode.ClientContexts.ClientRequestContext.get_User() 000000000b169cf0 000007fe97e05f6f <unknown> 000000000b169d20 000007fe980b90b0 MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(MongoDB.Bson.IO.BsonWriter, System.Object, MongoDB.Bson.Serialization.BsonMemberMap) 000000000b169d80 000007fe980b8d7a MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(MongoDB.Bson.IO.BsonWriter, System.Type, System.Object, MongoDB.Bson.Serialization.IBsonSerializationOptions) 000000000b169e00 000007fe980c1b60 MongoDB.Bson.BsonExtensionMethods.ToJson(System.Object, System.Type, MongoDB.Bson.Serialization.IBsonSerializationOptions, MongoDB.Bson.IO.JsonWriterSettings) 000000000b169e90 000007fe980c1558 TheClient.SomeCode.Helpers.SomeLogRenderer.Append(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b16a030 000007fe97ac7c69 TheLogger.LayoutRenderers.LayoutRenderer.Render(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b16a0b0 000007fe980c04d3 TheLogger.Layouts.SimpleLayout.GetFormattedMessage(TheLogger.LogEventInfo) 000000000b16a150 000007fe980c024c TheLogger.Targets.Target.PrecalculateVolatileLayouts(TheLogger.LogEventInfo) 000000000b16a200 000007fe980c0133 TheLogger.Targets.Wrappers.AsyncTargetWrapper.Write(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b16a250 000007fe980bff2e TheLogger.Targets.Target.WriteAsyncLogEvent(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b16a2f0 000007fe980bfb4b TheLogger.LoggerImpl.WriteToTargetWithFilterChain(TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.SomeCode.AsyncContinuation) 000000000b16a360 000007fe980bf6fe TheLogger.LoggerImpl.Write(System.Type, TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.LogFactory) 000000000b16a3c0 000007fe97efd2a3 TheClient.SomeCode.Logging.LoggerAdapter.Log(System.Type, TheLogger.LogEventInfo) 000000000b16a400 000007fe991588c0 TheClient.SomeCode.Helpers.LogHelper.LogError(System.String, System.String, System.String, System.Exception, TheClient.SomeCode.Helpers.LogMeta[]) 000000000b16a4a0 000007fe99158c42 TheClient.SomeClient.Utils.UserResolver.ResetState(System.Exception) 000000000b16a5a0 000007fe98cedb97 TheClient.SomeClient.Utils.UserResolver.ResolveUser() 000000000b16a670 000007fe98ced8c8 TheClient.SomeClient.Utils.ClientContextRes.<get_Current>b__0() 000000000b16a6c0 000007fe980c5f27 TheClient.SomeCode.ClientContexts.ClientRequestContext.get_User() 000000000b16a790 000007fe97e05f6f <unknown> 000000000b16a7c0 000007fe980b90b0 MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(MongoDB.Bson.IO.BsonWriter, System.Object, MongoDB.Bson.Serialization.BsonMemberMap) 000000000b16a820 000007fe980b8d7a MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(MongoDB.Bson.IO.BsonWriter, System.Type, System.Object, MongoDB.Bson.Serialization.IBsonSerializationOptions) 000000000b16a8a0 000007fe980c1b60 MongoDB.Bson.BsonExtensionMethods.ToJson(System.Object, System.Type, MongoDB.Bson.Serialization.IBsonSerializationOptions, MongoDB.Bson.IO.JsonWriterSettings) 000000000b16a930 000007fe980c1558 TheClient.SomeCode.Helpers.SomeLogRenderer.Append(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b16aad0 000007fe97ac7c69 TheLogger.LayoutRenderers.LayoutRenderer.Render(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b16ab50 000007fe980c04d3 TheLogger.Layouts.SimpleLayout.GetFormattedMessage(TheLogger.LogEventInfo) 000000000b16abf0 000007fe980c024c TheLogger.Targets.Target.PrecalculateVolatileLayouts(TheLogger.LogEventInfo) 000000000b16aca0 000007fe980c0133 TheLogger.Targets.Wrappers.AsyncTargetWrapper.Write(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b16acf0 000007fe980bff2e TheLogger.Targets.Target.WriteAsyncLogEvent(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b16ad90 000007fe980bfb4b TheLogger.LoggerImpl.WriteToTargetWithFilterChain(TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.SomeCode.AsyncContinuation) 000000000b16ae00 000007fe980bf6fe TheLogger.LoggerImpl.Write(System.Type, TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.LogFactory) 000000000b16ae60 000007fe97efd2a3 TheClient.SomeCode.Logging.LoggerAdapter.Log(System.Type, TheLogger.LogEventInfo) 000000000b16aea0 000007fe991588c0 TheClient.SomeCode.Helpers.LogHelper.LogError(System.String, System.String, System.String, System.Exception, TheClient.SomeCode.Helpers.LogMeta[]) 000000000b16af40 000007fe99158c42 TheClient.SomeClient.Utils.UserResolver.ResetState(System.Exception) 000000000b16b040 000007fe98cedb97 TheClient.SomeClient.Utils.UserResolver.ResolveUser() 000000000b16b110 000007fe98ced8c8 TheClient.SomeClient.Utils.ClientContextRes.<get_Current>b__0() 000000000b16b160 000007fe980c5f27 TheClient.SomeCode.ClientContexts.ClientRequestContext.get_User() 000000000b1d3650 000007fe97e05f6f <unknown> 000000000b1d3680 000007fe980b90b0 MongoDB.Bson.Serialization.BsonClassMapSerializer.SerializeMember(MongoDB.Bson.IO.BsonWriter, System.Object, MongoDB.Bson.Serialization.BsonMemberMap) 000000000b1d36e0 000007fe980b8d7a MongoDB.Bson.Serialization.BsonClassMapSerializer.Serialize(MongoDB.Bson.IO.BsonWriter, System.Type, System.Object, MongoDB.Bson.Serialization.IBsonSerializationOptions) 000000000b1d3760 000007fe980c1b60 MongoDB.Bson.BsonExtensionMethods.ToJson(System.Object, System.Type, MongoDB.Bson.Serialization.IBsonSerializationOptions, MongoDB.Bson.IO.JsonWriterSettings) 000000000b1d37f0 000007fe980c1558 TheClient.SomeCode.Helpers.SomeLogRenderer.Append(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b1d3990 000007fe97ac7c69 TheLogger.LayoutRenderers.LayoutRenderer.Render(System.Text.StringBuilder, TheLogger.LogEventInfo) 000000000b1d3a10 000007fe980c04d3 TheLogger.Layouts.SimpleLayout.GetFormattedMessage(TheLogger.LogEventInfo) 000000000b1d3ab0 000007fe980c024c TheLogger.Targets.Target.PrecalculateVolatileLayouts(TheLogger.LogEventInfo) 000000000b1d3b60 000007fe980c0133 TheLogger.Targets.Wrappers.AsyncTargetWrapper.Write(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b1d3bb0 000007fe980bff2e TheLogger.Targets.Target.WriteAsyncLogEvent(TheLogger.SomeCode.AsyncLogEventInfo) 000000000b1d3c50 000007fe980bfb4b TheLogger.LoggerImpl.WriteToTargetWithFilterChain(TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.SomeCode.AsyncContinuation) 000000000b1d3cc0 000007fe980bf6fe TheLogger.LoggerImpl.Write(System.Type, TheLogger.Internal.TargetWithFilterChain, TheLogger.LogEventInfo, TheLogger.LogFactory) 000000000b1d3d20 000007fe97efd2a3 TheClient.SomeCode.Logging.LoggerAdapter.Log(System.Type, TheLogger.LogEventInfo) 000000000b1d3d60 000007fe991588c0 TheClient.SomeCode.Helpers.LogHelper.LogError(System.String, System.String, System.String, System.Exception, TheClient.SomeCode.Helpers.LogMeta[]) 000000000b1d3e00 000007fe99158644 TheClient.SomeClient.Utils.ErrorLoggingHandler.LogException(System.Exception) 000000000b1d4350 000007fe99156f36 TheClient.SomeClient.Utils.ErrorLoggingHandler.LogError(System.Exception) 000000000b1d43b0 000007fe991568e2 TheClient.SomeClient.Utils.BaseController.OnException(System.Web.Mvc.ExceptionContext) 000000000b1d4590 000007fe9824bd0a System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass25.<BeginInvokeAction>b__22(System.IAsyncResult) 000000000b1d67b8 000007fef6f76451 [HelperMethodFrame: 000000000b1d67b8] 000000000b1d68a0 000007fe9824c3b4 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1d8ae8 000007fef6f76451 [HelperMethodFrame: 000000000b1d8ae8] 000000000b1d8bd0 000007fe9824c3b4 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1dae18 000007fef6f76451 [HelperMethodFrame: 000000000b1dae18] 000000000b1daf00 000007fe9824c3b4 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1dd050 000007fef6f76451 [FaultingExceptionFrame: 000000000b1dd050] 000000000b1dd5a0 000007fe98df1933 TheClient.Logic.Users.UserManager.GetFollowingQueryables() 000000000b1dd6f0 000007fe98cb4feb TheClient.Logic.EventSharing.EventSharingManager.GetOrQueryablesFromFilter(TheClient.DomainModel.EventSharing.EventSharingQueryFilter) 000000000b1dd7c0 000007fe98cb4bf8 TheClient.Logic.EventSharing.EventSharingManager.GetEventActivityData(TheClient.DomainModel.EventSharing.EventSharingQueryFilter, System.String[]) 000000000b1dd8d0 000007fe98cb4b60 TheClient.Logic.EventSharing.EventSharingManager.GetEventActivityData(TheClient.DomainModel.EventSharing.EventSharingQueryFilter) 000000000b1dd920 000007fe98cb014a TheClient.Service.EventSharing.EventSharingService.GetSideBarData(TheClient.Contracts.DataTransferObjects.EventSharing.EventSharingQuery) 000000000b1dda80 000007fe98c2ed86 TheClient.SomeClient.Areas.Sidebar.Controllers.SidebarController.GetInitialData(TheClient.Contracts.DataTransferObjects.EventSharing.EventSharingQuery) 000000000b1ddae0 000007fe98c5cd2f <unknown> 000000000b1ddb20 000007fe9824c792 System.Web.Mvc.ReflectedActionDescriptor.Execute(System.Web.Mvc.ControllerContext, System.Collections.Generic.IDictionary`2<System.String,System.Object>) 000000000b1ddb70 000007fe9824c647 System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(System.Web.Mvc.ControllerContext, System.Web.Mvc.ActionDescriptor, System.Collections.Generic.IDictionary`2<System.String,System.Object>) 000000000b1ddbb0 000007fe9824c488 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass37+<>c__DisplayClass39.<BeginInvokeActionMethodWithFilters>b__33() 000000000b1ddc00 000007fe9824c230 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1ddc90 000007fe9824c230 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1ddd20 000007fe9824c230 System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass4f.<InvokeActionMethodFilterAsynchronously>b__49() 000000000b1dddb0 000007fe9824c1cf System.Web.Mvc.Async.AsyncControllerActionInvoker+<>c__DisplayClass37.<BeginInvokeActionMethodWithFilters>b__36(System.IAsyncResult)
Starting from the bottom, the client’s GetSideBarData() method seems to do a call into GetOrQueryablesFromFilter() which constructs a MongoDB database query based on user data. But something apparently went wrong there, which triggered a call to a log helper function. That seems reasonable enough, but then something strange happened: During the log operation, it seems that additional user data was looked up in the MongoDB database and that caused an error, which was sent to the log helper, which looked up additional user data, which caused an error, and so on.
Because this was a minidump, no heap information was available, and so it was not possible to figure out what data caused the problem. But armed with the above stack trace, it was easy to figure out what was going on by looking at the code.
It turned out that someone had recently checked in code that tried to be clever inside a log helper function in order to log as much “relevant” data as possible and unfortunately the code review didn’t flag this as problematic.
Apart from showing the awesome strength of the Windows debugging tools (while at the same time exposing their obscure nature), the above troubleshooting session also taught everybody a valuable lesson: Once you send something to a log helper function to be logged, that “something” must be complete and not call into any other code, because that could potentially break the log operation. Or worse, as in this case, cause an endless loop of error logging operations that end up crashing the application.
Of course everybody already knew this, but sometimes you need a dead body on the table and a portmortem to really understand the impact of bad coding practices.