[Java][JVM] JVM internals basics - Stop-the-world phase (safepoints) - how does it work?

Why does the JVM need an STW phase?

It is common knowledge among Java developers that garbage collectors need a stop-the-world (STW) phase to clean up dead objects. But the GC is not the only mechanism that needs it. Other internal mechanisms also have work to do that requires the application threads to be suspended. For example, an STW phase is needed to deoptimize some JIT compilations and to revoke biased locks.

How it works step by step

Some application threads are running on your JVM:

alt text

While those threads are running, the JVM needs to do some work in an STW phase from time to time. It starts the phase with a global safepoint request, which tells every thread to go to sleep:

alt text

Every thread has to find out about this request. The check for whether a thread needs to go to sleep is simply a line of assembly code generated by the JIT compiler, or a simple step in the interpreter. Each thread may be executing a different method or JIT compilation at that moment, so the time at which a thread notices the STW request differs from thread to thread.
Every thread has to wait for the slowest one. The time between the start of the STW phase and the moment the slowest thread notices the request is called the time to safepoint:

alt text

Only after every thread is asleep can the JVM threads do the work that needed the STW phase. The time during which the application threads are asleep is called the safepoint operation time:

alt text

When the JVM finishes its work, the application threads are woken up:

alt text
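
The steps above can be imitated in plain Java as a toy model. This is only a sketch: HotSpot implements safepoints in native code, and the names used here (SafepointSketch, poll, runAtSafepoint, demo) are hypothetical and do not exist in the JVM. Worker threads check a flag between units of work, a coordinator raises the flag, waits for the slowest worker, runs its operation, and then wakes everyone up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Toy model only: names and structure are hypothetical, not HotSpot internals.
class SafepointSketch {
    private final Object monitor = new Object();
    private final int workerCount;
    private volatile boolean safepointRequested; // the global safepoint request
    private int stoppedThreads;                  // guarded by monitor

    SafepointSketch(int workerCount) {
        this.workerCount = workerCount;
    }

    // The safepoint poll: a cheap check a worker performs between units of work.
    void poll() throws InterruptedException {
        if (!safepointRequested) {
            return; // fast path: nothing requested, keep running
        }
        synchronized (monitor) {
            stoppedThreads++;
            monitor.notifyAll(); // let the coordinator re-check the count
            while (safepointRequested) {
                monitor.wait(); // asleep until the operation has finished
            }
            stoppedThreads--;
        }
    }

    // Requests a safepoint, waits for all workers, runs the operation, wakes the workers.
    // Returns { time to safepoint, safepoint operation time } in nanoseconds.
    long[] runAtSafepoint(Runnable operation) throws InterruptedException {
        synchronized (monitor) {
            long requested = System.nanoTime();
            safepointRequested = true;
            while (stoppedThreads < workerCount) {
                monitor.wait(); // the slowest worker determines this wait
            }
            long reached = System.nanoTime();
            operation.run();
            long finished = System.nanoTime();
            safepointRequested = false;
            monitor.notifyAll(); // wake the workers up
            return new long[] { reached - requested, finished - reached };
        }
    }

    // Runs three workers through one safepoint; true if none made progress during it.
    static boolean demo() {
        SafepointSketch vm = new SafepointSketch(3);
        AtomicLong counter = new AtomicLong();
        AtomicBoolean running = new AtomicBoolean(true);
        List<Thread> workers = new ArrayList<>();
        long[] seen = new long[2];
        try {
            for (int i = 0; i < 3; i++) {
                Thread t = new Thread(() -> {
                    try {
                        while (running.get()) {
                            counter.incrementAndGet(); // stands in for application work
                            vm.poll();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                workers.add(t);
                t.start();
            }
            long[] times = vm.runAtSafepoint(() -> {
                seen[0] = counter.get();
                LockSupport.parkNanos(5_000_000L); // pretend the VM works for about 5 ms
                seen[1] = counter.get();
            });
            running.set(false);
            for (Thread t : workers) {
                t.join();
            }
            System.out.printf("time to safepoint: %d ns, operation time: %d ns%n",
                    times[0], times[1]);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0] == seen[1];
    }

    public static void main(String[] args) {
        System.out.println("workers stopped during operation: " + demo());
    }
}
```

A worker that polls rarely makes the coordinator wait longer before it can start, which is the toy equivalent of a long time to safepoint.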

What is important from all of this?

Logging

Safepoints have dedicated logs in unified logging. You can enable them with -Xlog:safepoint. The following example comes from Java 11:

[safepoint        ] Application time: 0.1950250 seconds
[safepoint        ] Entering safepoint region: RevokeBias
[safepoint        ] Leaving safepoint region
[safepoint        ] Total time for which application threads were stopped: 0.0003424 seconds, Stopping threads took: 0.0000491 seconds

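A little explanation of what that means. Application time is how long the application threads ran since the previous STW phase. Entering safepoint region names the safepoint operation, here RevokeBias. Total time for which application threads were stopped is the length of the whole STW phase, and Stopping threads took is the time to safepoint.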
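
As a rough sketch of how a program could read such log lines, the parser below extracts the operation name and the three durations from the Java 11 format shown above. The class and field names (SafepointLogParser, Event) are made up for illustration, and the patterns assume exactly the message text in the sample; other JVM versions print safepoint logs differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, written against the Java 11 sample lines only.
class SafepointLogParser {
    // One STW phase as described by one group of log lines.
    static final class Event {
        final String operation;
        final double applicationSeconds;
        final double stoppedSeconds;
        final double timeToSafepointSeconds;

        Event(String operation, double applicationSeconds,
              double stoppedSeconds, double timeToSafepointSeconds) {
            this.operation = operation;
            this.applicationSeconds = applicationSeconds;
            this.stoppedSeconds = stoppedSeconds;
            this.timeToSafepointSeconds = timeToSafepointSeconds;
        }
    }

    private static final Pattern APPLICATION =
            Pattern.compile("Application time: ([0-9.]+) seconds");
    private static final Pattern ENTERING =
            Pattern.compile("Entering safepoint region: (\\S+)");
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds, "
            + "Stopping threads took: ([0-9.]+) seconds");

    // Collects one Event per "Total time ..." line, using the lines seen before it.
    static List<Event> parse(List<String> lines) {
        List<Event> events = new ArrayList<>();
        double applicationSeconds = 0.0;
        String operation = "unknown";
        for (String line : lines) {
            Matcher m;
            if ((m = APPLICATION.matcher(line)).find()) {
                applicationSeconds = Double.parseDouble(m.group(1));
            } else if ((m = ENTERING.matcher(line)).find()) {
                operation = m.group(1);
            } else if ((m = STOPPED.matcher(line)).find()) {
                events.add(new Event(operation, applicationSeconds,
                        Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))));
                applicationSeconds = 0.0; // reset state for the next STW phase
                operation = "unknown";
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<Event> events = parse(Arrays.asList(
                "[safepoint        ] Application time: 0.1950250 seconds",
                "[safepoint        ] Entering safepoint region: RevokeBias",
                "[safepoint        ] Leaving safepoint region",
                "[safepoint        ] Total time for which application threads were stopped: "
                        + "0.0003424 seconds, Stopping threads took: 0.0000491 seconds"));
        for (Event e : events) {
            System.out.println(e.operation + " stopped=" + e.stoppedSeconds
                    + "s timeToSafepoint=" + e.timeToSafepointSeconds + "s");
        }
    }
}
```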

From that log we can generate cool charts. The first one is a pie chart that shows where our application is spending its time:

alt text

The second one shows how often each safepoint operation occurred:

alt text

The next one shows how the time wasted in STW phases is distributed across safepoint operations:

alt text

The next one is probably the most important: for each two-second window, it shows which fraction of that time was spent running our application. If I have an issue in any JVM application, this is the first chart I look at.

alt text
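
One possible way to compute such a per-window fraction is sketched below. It is not necessarily how the chart above was produced. It assumes the log is a gapless sequence of pairs (application time, time stopped), laid end to end on a timeline starting at zero; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: fraction of each time window spent running application code.
class ApplicationTimeWindows {
    // events[i] = { applicationSeconds, stoppedSeconds }, in log order.
    static double[] runningFraction(double[][] events, double windowSeconds) {
        double total = 0.0;
        for (double[] e : events) {
            total += e[0] + e[1];
        }
        int windows = (int) Math.ceil(total / windowSeconds);
        double[] running = new double[windows];

        double clock = 0.0;
        for (double[] e : events) {
            double start = clock;      // application threads run in [start, end)
            double end = clock + e[0];
            // Simple O(events x windows) overlap computation, fine for a sketch.
            for (int w = 0; w < windows; w++) {
                double lo = Math.max(start, w * windowSeconds);
                double hi = Math.min(end, (w + 1) * windowSeconds);
                if (hi > lo) {
                    running[w] += hi - lo;
                }
            }
            clock = end + e[1];        // then they are stopped for e[1] seconds
        }

        double[] fraction = new double[windows];
        for (int w = 0; w < windows; w++) {
            // The last window may be shorter than windowSeconds.
            double length = Math.min(windowSeconds, total - w * windowSeconds);
            fraction[w] = length > 0.0 ? running[w] / length : 0.0;
        }
        return fraction;
    }

    public static void main(String[] args) {
        double[][] events = { { 1.5, 0.5 }, { 1.0, 1.0 } };
        // prints [0.75, 0.5]
        System.out.println(Arrays.toString(runningFraction(events, 2.0)));
    }
}
```

In this made-up input the second window spends half of its two seconds in STW phases, which is the kind of drop the chart makes visible.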

The last one shows the distribution of time wasted in STW phases over time:

alt text

Final words

The safepoint logs are the only place where you can find comprehensive information about all STW phases, including time to safepoint. You should have them enabled on every JVM.