Presented by Bill Au, of the Platform Infrastructure group at CNET

Bill is going to help us learn how to troubleshoot and monitor Java apps: thread and heap dumps, hung or slow apps, OutOfMemoryErrors, and JVM crashes. All of the tools he is going to show us are free - open source or free for download.

NOTE: This was a very interesting session. One of the things I was impressed with was his demonstrations of some tools that I had not seen before. There was the HPjmeter tool, an open source tool called Samurai, and a perl script that he wrote. All of them look very helpful, and I want to try them all out (not that I have ever written any apps that have performance issues - but I’m sure I can help someone else look at theirs!)

Monitoring

Within JDK 5 (Sun’s), there is a java.lang.management package that has quite a few MXBeans (e.g. MemoryMXBean, RuntimeMXBean). These beans will allow you to get all of the information you might need to monitor the JVM. You can see sample code for how to use them in $JAVA_HOME/demo/management.
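
To give a rough idea of what these beans expose, here is a minimal sketch of my own (not from the session or the JDK demos) that reads a couple of them in-process:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.RuntimeMXBean;

    public class JvmStats {
        public static void main(String[] args) {
            // Heap and non-heap usage for the current JVM
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            System.out.println("Heap:     " + memory.getHeapMemoryUsage());
            System.out.println("Non-heap: " + memory.getNonHeapMemoryUsage());

            // Uptime and the arguments the JVM was started with
            RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
            System.out.println("Uptime (ms): " + runtime.getUptime());
            System.out.println("JVM args:    " + runtime.getInputArguments());
        }
    }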

Management tools

These tools can be run against a running JVM, or against a core dump file if the JVM crashed.

jstat - can be run against a running instance from the command line to get monitoring information.

jconsole - a GUI for those not comfortable with the console. You must open a JMX port on the running JVM (in JDK 6 there is attach-on-demand functionality). jconsole also supports plugins; there is an example in $JAVA_HOME/demo/management/JTop that is a very useful plugin.

Garbage collection logging - enable this because it is handy: -Xloggc:. It will give you the timestamp of each GC event and the heap size before and after.
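
If you would rather poll this kind of information from inside the JVM instead of parsing the GC log, the same counters are exposed through GarbageCollectorMXBean. A small sketch of my own (not from the talk):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // Cumulative collection count and time (ms) for this collector
                    System.out.println(gc.getName()
                            + " collections=" + gc.getCollectionCount()
                            + " timeMs=" + gc.getCollectionTime());
                }
                Thread.sleep(5000); // poll every five seconds (Ctrl-C to stop)
            }
        }
    }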

Thread and Heap Dumps

Bill recommends taking a thread dump, and then after five seconds taking another (do this several times). Once you have several, you can compare where the threads are to see which ones are holding or waiting on locks.
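
Thread dumps are normally triggered from outside the process (for example with jstack, or by sending the JVM a QUIT signal on Unix). Just to illustrate the "several dumps, a few seconds apart" idea, here is a rough in-process sketch of my own using Thread.getAllStackTraces (JDK 5+):

    import java.util.Map;

    public class RepeatedThreadDump {
        public static void main(String[] args) throws InterruptedException {
            // Take several dumps, five seconds apart, so they can be compared
            for (int i = 0; i < 3; i++) {
                System.out.println("=== dump " + (i + 1) + " ===");
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    System.out.println(e.getKey());
                    for (StackTraceElement frame : e.getValue()) {
                        System.out.println("    at " + frame);
                    }
                }
                Thread.sleep(5000);
            }
        }
    }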

If you turn this option (-XX:+HeapDumpOnOutOfMemoryError) on, then when you get an OOM the heap will be dumped to a file so that you can debug where the memory was being used at the time of the error. jmap and jconsole also allow you to take heap dumps. hprof can also give you information about the heap, but it has significant overhead - so you don’t want to use it in production.
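
For completeness, on Sun’s JVM a heap dump can also be requested programmatically through the vendor-specific HotSpotDiagnostic MBean (JDK 6, not portable to other JVMs). This is a hedged sketch of my own, not something shown in the session:

    import com.sun.management.HotSpotDiagnosticMXBean;
    import java.lang.management.ManagementFactory;

    public class HeapDumper {
        public static void main(String[] args) throws Exception {
            // Sun/Oracle-specific MBean; not available on every JVM
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // true = dump only live (reachable) objects, like "jmap -dump:live"
            diag.dumpHeap("heap.hprof", true);
            System.out.println("Heap dump written to heap.hprof");
        }
    }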

Hung or Slow App - Debugging

  1. Look in the garbage collection logs - how long is the application pausing for garbage collection (five seconds is a long time)? Also look at how much overall time the JVM is spending garbage collecting rather than running your code. You can tune this by modifying the heap size - of course, a larger heap takes longer to garbage collect. Making one change at a time obviously gives you a better chance to see what actually worked.
  2. HPjmeter.jar - a free tool from HP (this is not Apache JMeter). It is a tool for monitoring HP’s JVM, but it can also analyze garbage collection logs from most JVMs. It will give you a chart of how much time is spent in garbage collection, how many GC events there were, the average duration, etc.
  3. One comment was made that if you are using large heap sizes (8-16 GB), you need to make sure to use the appropriate (larger) page size. I’ll have to Google this to find out more.
  4. Beware of over-optimizing - your app will change over time. Bill suggests trying to find a good heap size, and perhaps using concurrent garbage collection rather than full garbage collection, even though concurrent collection has higher overhead.
  5. Deadlock - if you encounter a deadlock, you obviously want to take a thread dump (see above), which runs the JVM’s deadlock detector and reports any deadlock it finds.
  6. Loop threads - you may get certain threads stuck in a long-running loop. To find this, monitor the CPU time of each thread using ThreadMXBean, or jconsole with the JTop plugin (see above for more details, and the sketch after this list).
  7. Blocked threads - there is an open source GUI tool called Samurai that analyzes thread dumps for you. It can also understand consecutive thread dumps and display the information in an easy-to-read format.
  8. Bill also wrote a perl script that gives you an overview of a thread dump. Both of these tools are located in the same folder as his slides (link included at bottom of this post). His perl script gives a very nice overview, including how many threads are locked, and where they are locked. Running it against several thread dumps that are several seconds apart gives you a good look at what’s going on.
  9. Stuck threads - you may not have deadlocks, but you can still get stuck threads. Typical causes include network I/O without a timeout set. You can use the same thread dump techniques to analyze these as you would deadlocked threads.
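
As mentioned in items 5 and 6, ThreadMXBean covers both the deadlock check and the per-thread CPU accounting. Here is a rough sketch of my own (not Bill’s script) that does both in-process:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadChecker {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();

            // Deadlock detection: threads blocked on object monitors in a cycle
            long[] deadlocked = threads.findMonitorDeadlockedThreads();
            if (deadlocked != null) {
                for (ThreadInfo info : threads.getThreadInfo(deadlocked)) {
                    if (info != null) {
                        System.out.println("DEADLOCKED: " + info.getThreadName()
                                + " waiting on " + info.getLockName());
                    }
                }
            }

            // CPU time per thread: a looping thread keeps accumulating time here
            if (threads.isThreadCpuTimeSupported()) {
                if (!threads.isThreadCpuTimeEnabled()) {
                    threads.setThreadCpuTimeEnabled(true);
                }
                for (long id : threads.getAllThreadIds()) {
                    ThreadInfo info = threads.getThreadInfo(id);
                    if (info != null) {
                        System.out.println(info.getThreadName() + " cpuMs="
                                + threads.getThreadCpuTime(id) / 1000000L);
                    }
                }
            }
        }
    }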

OutOfMemoryError - heap

Some common causes:

OutOfMemoryError - permgen

Some common causes:

OutOfMemoryError - too many threads

OutOfMemoryError - native memory