[Java][JVM Logs] Monday with JVM logs - introduction and safepoints logs

The JVM logs are a powerful source of knowledge on how our application works on the virtual machine. Unfortunately they are not very well documented. In this series of articles I will try to show you what you can read from them. I will not cover how the JVM works, but I will add references if I have one.

All charts that you will find in the series are generated by my open-source project.

JVM logs I recommend:

-Xlog:
    codecache+sweep*=trace,
    class+unload,
    class+load,
    os+thread,
    safepoint,
    gc*,
    gc+stringdedup=debug,
    gc+ergo=trace,
    gc+age=trace,
    gc+phases=trace,
    gc+humongous=trace,
    jit+compilation=debug
:file=/path_to_logs/app.log
:level,tags,time,uptime,pid
:filesize=104857600,filecount=5

Reference - safepoints

My previous article about the STW phase

YouTube - With GC Solved, What Else Makes a JVM Pause?

Logging

Safepoints has dedicated logs in a unified logger. You can enable it with Xlog:safepoint. A following example comes from Java 11:

[safepoint        ] Application time: 0.1950250 seconds
[safepoint        ] Entering safepoint region: RevokeBias
[safepoint        ] Leaving safepoint region
[safepoint        ] Total time for which application threads were stopped: 0.0003424 seconds, Stopping threads took: 0.0000491 seconds

A little bit of explanation what that means:

The format is different in Java 8 and Java >= 13, but other formats contain the same information.

The mother of all chats

When I profile an application that is running on the JVM the first chart I’m looking at is the chart that shows me in a 2 seconds window how much time was wasted by the stop-the-world phases. Drawing that chart is a little pain in the ass, but my JVM logs viewer has such a functionality.

alt text

That chart shows 12 days of running a very huge application. This chart shows that in the 2 seconds window the application always runs over 1.8 seconds, usually over 1.9 seconds.

The other example for a small application:

alt text

Here is an example of the application which doesn’t work well on the JVM:

alt text

This example shows the application which doesn’t work at all on the JVM:

alt text

Safepoints statistics

If we know that our application doesn’t work well on the JVM we can generate statistics that show us, which STW phase is responsible for that issue.

alt text

These tables show that 14,43% (marked as red) of total application time our JVM executes operations in STW phase. This value should be less than 1%. The two most time-consuming operations are (marked as green):

Both operations are related to G1GC.

We can also present this statistics as charts:

Total time in phases

alt text

Which safepoint operation is most time-consuming

alt text

Why are safepoint logs so important?

Most Java developers think that the STW phase is needed for Garbage Collector. Yes, most of the issues on JVM level are related to GC, but not all of them. If you have an issue with:

you will not find the cause in the GC logs, you need safepoints logs.