DynaTrace to the Rescue!

DynaTrace from Compuware is one of the world’s leading monitoring and profiling tools for live / production environments. It works in both Java and .NET environments and enables you to quickly perform root cause analysis on anything from performance bottlenecks to error storms.

There are various vendors in the Application Performance Management (APM) space, including AppDynamics and the cloud-based New Relic. Most of these other products are more geared towards monitoring, whereas DynaTrace is more geared towards profiling, which is better suited for the way troubleshooting normally works, in my opinion.

DynaTrace has both a browser and a server component, and by using both it’s possible to understand what happens all the way through a request. The browser component, called Ajax Edition, is free, whereas the server component comes with a (substantial) license cost.

My focus is on the server component, which in brief requires a “collector” server to be installed on-site and “agent” services to be installed on each server you want monitored or profiled. Depending on the number of assemblies and namespaces being instrumented, this typically degrades performance on the server by 10-25%, a price often worth paying in order to troubleshoot live incidents. A “client” application is then used to connect to the server and analyze data as it streams in. This may sound a little complicated, but it’s actually very quick and easy to set up.

Over the last 3-4 years, I have used DynaTrace on a more or less daily basis to perform the following tasks:

  • Understand performance patterns in dev / test environments during code review
  • Understand performance bottlenecks in live / production environments
  • Troubleshoot live incidents, particularly related to database and external integration points

Because of confidentiality constraints, I can’t show client screenshots, but here’s an example taken from Compuware’s APM blog.

In the example, a SharePoint page takes almost 7 seconds to respond. By diving into the call stack in DynaTrace, it becomes immediately clear which web part is to blame and that most of the time is spent in a WaitHandle:

[Screenshot: PurePathWithWebPartWait]

As explained in the blog post, DynaTrace is used to identify the root cause of the problem: in this case, 10 threads making simultaneous web service requests to the same remote host. Because there is a default limit of 2 simultaneous connections per host (which can be tweaked), the threads spend most of their time waiting for their turn to make the call.
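For reference, here is a minimal sketch of how that limit can be raised in a .NET Framework application’s app.config or web.config. The value 10 is purely illustrative (it matches the 10 threads in the example) and is not a recommendation from the blog post. The right number depends on what the remote host can handle.

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- Illustrative value: allow up to 10 concurrent connections per remote host. -->
      <!-- address="*" applies the limit to all hosts; a specific host can be named instead. -->
      <add address="*" maxconnection="10" />
    </connectionManagement>
  </system.net>
</configuration>
```

The same limit can also be set in code through ServicePointManager.DefaultConnectionLimit, as long as that happens before the first connection to the host is opened.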

I have used DynaTrace to quickly troubleshoot otherwise very hard-to-find issues on Microsoft SharePoint sites, Microsoft CRM, ASP.NET websites, console applications, Windows services, you name it. Any managed process can be profiled and monitored with DynaTrace. Here are a few real-life examples:

  • LINQ can be a performance killer. For instance, in earlier versions of .NET, there was a huge difference between calling myList.Count and myList.Count(). These days, it’s typically more a matter of people misunderstanding the “magic” that LINQ does and ending up with things like myArray.ToList().First(p => p.MyNumber == 500). That’s a whole lot slower than just iterating through the array with a for-loop. Not a big deal if you don’t do it all the time, but people have a tendency to use this sort of thing inside caching mechanisms that get called hundreds of times per page request. I’ve used DynaTrace on many occasions to highlight these types of problems to developers.
  • SharePoint 2007 with System.Drawing. Several years ago, a client of mine who used SharePoint 2007 to serve out more than 40 external websites started experiencing random app pool crashes, and the Windows Event Log showed occurrences of System.AccessViolationException (which is thrown when there is an attempt to read or write protected memory). Using DynaTrace, I saw a lot of calls to methods inside the System.Drawing namespace, which are not supported in ASP.NET and can in fact cause problems like memory corruption. Removing the calls to System.Drawing solved the problem.
  • Microsoft CRM does SOAP API calls. The CRM team at one of my clients couldn’t understand why their implementation was running so slowly, with some pages taking 10+ seconds to render. The team speculated that it was the SQL Server tier that needed to be upgraded, but had no way of knowing. Looking through their code, it wasn’t immediately obvious that they were doing something wrong. But using DynaTrace, it became clear that every call to a function in the MSCRM API turned into SOAP calls, each of which in turn resulted in hundreds of database calls. Armed with this type of knowledge and the ability to test out other approaches, the team was able to bring performance to within expected response times.
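To make the LINQ example above a bit more concrete, here is a small sketch that times the ToList().First(...) pattern against a plain for-loop. The Item class, the array size, and the number of calls are made up for illustration. The actual difference will vary with the .NET version and the data, so treat the numbers it prints as a rough indication, not a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical type, only here to give the array something to hold.
class Item
{
    public int MyNumber { get; set; }
}

static class LinqVersusLoop
{
    // The pattern from the text: copies the whole array into a new List<T>
    // on every call, then searches it through a delegate.
    // Throws InvalidOperationException if nothing matches.
    static Item FindWithLinq(Item[] items, int number)
    {
        return items.ToList().First(p => p.MyNumber == number);
    }

    // Plain loop: no copy and no delegate call per element.
    // Returns null if nothing matches.
    static Item FindWithLoop(Item[] items, int number)
    {
        for (int i = 0; i < items.Length; i++)
        {
            if (items[i].MyNumber == number)
            {
                return items[i];
            }
        }
        return null;
    }

    static void Main()
    {
        // Illustrative data: 1,000 items numbered 0..999.
        Item[] myArray = Enumerable.Range(0, 1000)
            .Select(n => new Item { MyNumber = n })
            .ToArray();

        const int calls = 100000;

        Stopwatch linqTimer = Stopwatch.StartNew();
        for (int i = 0; i < calls; i++)
        {
            FindWithLinq(myArray, 500);
        }
        linqTimer.Stop();

        Stopwatch loopTimer = Stopwatch.StartNew();
        for (int i = 0; i < calls; i++)
        {
            FindWithLoop(myArray, 500);
        }
        loopTimer.Stop();

        Console.WriteLine("ToList().First(): {0} ms", linqTimer.ElapsedMilliseconds);
        Console.WriteLine("for-loop:         {0} ms", loopTimer.ElapsedMilliseconds);
    }
}
```

Note that the two versions also behave differently when nothing matches: First() throws, whereas the loop returns null. FirstOrDefault() would be the closer LINQ equivalent of the loop.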

More than anything, profiling is about knowing instead of guessing, and DynaTrace delivers that knowledge to you in real time.