[Java][JDK][Spring][Profiling] From one JDK bug to another?
Profiled application
- JDK 11u5 from Oracle
- The application used Spring-WS as SOAP WebService client
- The application was creating ~1000000 short-lived threads per day
Thread creation is considered a resource-heavy operation. We want to find out why those threads are being created, so that we can use a thread pool there instead. The issue is easy to reproduce; you can fetch a simplified version of such an application from my GitHub.
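Just to make the goal concrete, the "thread pool" idea boils down to replacing ad-hoc new Thread(...).start() calls with a shared executor. A minimal sketch (my own illustration, not code from the profiled application):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPoolSketch {
    // A small, shared pool instead of one short-lived OS thread per task.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            int requestNo = i;
            // Instead of: new Thread(() -> handle(requestNo)).start();
            POOL.submit(() -> handle(requestNo));
        }
        POOL.shutdown();
    }

    private static void handle(int requestNo) {
        System.out.println("handling request " + requestNo
                + " on " + Thread.currentThread().getName());
    }
}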
Simplified application
Application contains two modules:
- pl.ks.jdkbug.ws.server - contains SOAP server
- pl.ks.jdkbug.ws.client - contains SOAP client for the server mentioned above
The SOAP server endpoint code is:
@PayloadRoot(namespace = NAMESPACE_URI, localPart = "TempRequest")
@ResponsePayload
TempResponse temp(@RequestPayload TempRequest request) {
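    // "response": the HTTP response object, obtained elsewhere in the endpoint class (not shown here)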
response.addHeader("Connection", "close");
return new TempResponse();
}
A crucial part of the server is the “Connection: close” header. The server was not a part of the original profiled application; it was an external service, so that header was imposed by a third party.
The SOAP client can be run by an HTTP request to https://localhost:8443/.
Running the simplified application
Let’s run our simplified application with -Xlog:os+thread. When we run the SOAP client from our controller:
@GetMapping(value = "/", produces = MediaType.TEXT_PLAIN_VALUE)
@SneakyThrows // Lombok - the writer calls below throw a checked IOException
void get(HttpServletResponse response) {
log.info("Invoking WS");
tempClient.temp(new TempRequest());
log.info("WS Invoked");
response.getWriter().write("WS Invoked");
response.getWriter().flush();
}
In our logs we can see:
2020-12-13 10:12:39.246 INFO 27506 --- [nio-8443-exec-7] pl.ks.jdkbug.ws.client.ClientController : Invoking WS
[30,994s][info][os,thread] Thread started (pthread id: 139778594207488, attributes: stacksize: 1024k, guardsize: 0k, detached).
[30,994s][info][os,thread] Thread is alive (tid: 27725, pthread id: 139778594207488).
[30,994s][info][os,thread] JavaThread exiting (tid: 27725).
[30,995s][info][os,thread] Thread finished (tid: 27725, pthread id: 139778594207488).
2020-12-13 10:12:39.278 INFO 27506 --- [nio-8443-exec-7] pl.ks.jdkbug.ws.client.ClientController : WS Invoked
Such a log is created for each request, so our application creates a new thread for every request.
How to find where threads are created
Threads in Java are started when we use java.lang.Thread.start(), so we need to find usages of that method.
Debugger
If you can run the application with a debugger attached and recreate this behavior, you can simply add a breakpoint on the method mentioned above. For the simplified application you will find a stacktrace like this:
Async-profiler
In a production environment we can attach async-profiler to the running JVM to find the stacktraces that reach any given Java method. What async-profiler does here is simple method instrumentation, and instrumenting the java.lang.Thread.start() method can cause multiple JIT decompilations (long story short: the first class retransformation on a running JVM may invalidate all existing JIT compilations).
You can see this when you run the application with -Xlog:jit+compilation=debug and then attach the profiler with method instrumentation:
./profiler.sh -d 10 -f out.html -e "java.lang.Thread.start" JdkBugApplication
A little explanation of the parameters:
- Duration time, ten seconds: -d 10
- Output file: -f out.html - in async-profiler 2.0 this will create a Flame Graph
- Method to instrument is passed as event: -e “java.lang.Thread.start”
- JdkBugApplication is my main class
In the JVM logs you will find multiple entries:
...
[34,033s][debug][jit,compilation] 144 1 java.util.AbstractSet::<init> (5 bytes) made not entrant
[34,033s][debug][jit,compilation] 159 1 java.util.AbstractMap::<init> (5 bytes) made not entrant
[34,033s][debug][jit,compilation] 148 1 java.util.concurrent.ConcurrentHashMap::casTabAt (21 bytes) made not entrant
[34,033s][debug][jit,compilation] 162 1 java.util.HashMap::<init> (11 bytes) made not entrant
[34,033s][debug][jit,compilation] 158 1 java.util.HashSet::<init> (16 bytes) made not entrant
[34,033s][debug][jit,compilation] 160 1 java.util.ImmutableCollections$Set12::iterator (9 bytes) made not entrant
[34,033s][debug][jit,compilation] 161 1 java.util.ImmutableCollections$Set12$1::<init> (21 bytes) made not entrant
...
A made not entrant mark means that the JIT compilation is no longer valid and the JIT compiler needs to create a new one. This can make your application run slower (with higher CPU consumption) until the new compilations are created.
To do it in a safer way we need to look inside the JVM and JDK source code. Let’s start with the JDK’s java.lang.Thread.start() method:
public synchronized void start() {
// ...
group.add(this);
boolean started = false;
try {
start0();
started = true;
} finally {
try {
if (!started) {
group.threadStartFailed(this);
}
} catch (Throwable ignore) {
/* do nothing. If start0 threw a Throwable then
it will be passed up the call stack */
}
}
}
private native void start0();
So the start() method uses the start0() method, which is native. We can trace its usage, but we need to find its name in the JVM C++ source code. In java.base/share/native/libjava/Thread.c we can find the definition of start0():
static JNINativeMethod methods[] = {
{"start0", "()V", (void *)&JVM_StartThread},
{"stop0", "(" OBJ ")V", (void *)&JVM_StopThread},
{"isAlive", "()Z", (void *)&JVM_IsThreadAlive},
{"suspend0", "()V", (void *)&JVM_SuspendThread},
{"resume0", "()V", (void *)&JVM_ResumeThread},
{"setPriority0", "(I)V", (void *)&JVM_SetThreadPriority},
{"yield", "()V", (void *)&JVM_Yield},
{"sleep", "(J)V", (void *)&JVM_Sleep},
{"currentThread", "()" THD, (void *)&JVM_CurrentThread},
{"countStackFrames", "()I", (void *)&JVM_CountStackFrames},
{"interrupt0", "()V", (void *)&JVM_Interrupt},
{"isInterrupted", "(Z)Z", (void *)&JVM_IsInterrupted},
{"holdsLock", "(" OBJ ")Z", (void *)&JVM_HoldsLock},
{"getThreads", "()[" THD, (void *)&JVM_GetAllThreads},
{"dumpThreads", "([" THD ")[[" STE, (void *)&JVM_DumpThreads},
{"setNativeName", "(" STR ")V", (void *)&JVM_SetNativeThreadName},
};
So we need to find usages of JVM_StartThread; we can trace them with async-profiler:
./profiler.sh -d 10 -f out.html -e "JVM_StartThread" JdkBugApplication
This tracing doesn’t cause massive JIT decompilations. What we can see in the profiling output is the same stacktrace as in the debugger:
finishHandshake()
From our analysis we know that new threads are created in sun.security.ssl.TransportContext.finishHandshake(). The source code of that method is:
HandshakeStatus finishHandshake() {
if (protocolVersion.useTLS13PlusSpec()) {
outputRecord.tc = this;
inputRecord.tc = this;
cipherSuite = handshakeContext.negotiatedCipherSuite;
inputRecord.readCipher.baseSecret =
handshakeContext.baseReadSecret;
outputRecord.writeCipher.baseSecret =
handshakeContext.baseWriteSecret;
}
handshakeContext = null;
outputRecord.handshakeHash.finish();
inputRecord.finishHandshake();
outputRecord.finishHandshake();
isNegotiated = true;
// Tell folk about handshake completion, but do it in a separate thread.
if (transport instanceof SSLSocket &&
sslConfig.handshakeListeners != null &&
!sslConfig.handshakeListeners.isEmpty()) {
HandshakeCompletedEvent hce =
new HandshakeCompletedEvent((SSLSocket)transport, conSession);
Thread thread = new Thread(
null,
new NotifyHandshake(sslConfig.handshakeListeners, hce),
"HandshakeCompletedNotify-Thread",
0,
false);
thread.start();
}
return HandshakeStatus.FINISHED;
}
Yes, there is a new Thread(...).start() invoked in that method. Profiling of the original application showed that those ~1000000 threads were also created in finishHandshake().
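Note that this thread is spawned only when there is at least one HandshakeCompletedListener registered on the socket; in our case that registration presumably happens somewhere inside the JDK’s HTTPS client stack rather than in the application code. The mechanism itself is easy to demonstrate in isolation (example.com is just a placeholder host, and I assume its certificate is trusted by the default trust store):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HandshakeThreadDemo {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443)) {
            // Registering any listener makes finishHandshake() start the
            // "HandshakeCompletedNotify-Thread" seen in the logs above.
            socket.addHandshakeCompletedListener(event ->
                    System.out.println("notified on thread: " + Thread.currentThread().getName()));
            socket.startHandshake();
            Thread.sleep(1000); // give the notification thread time to run
        }
    }
}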
I wanted to file a bug against the JDK, but I found out that it had already been reported: JDK-8246039.
What can we do?
We can:
- Disable SSL on that communication - this cannot be done in the original application for security reasons
- Change the “Connection” header to “keep-alive” on the server side - unfortunately this is third-party software, and I cannot change its code
- Try to change the connection mechanism from core Java to Apache HttpClient or another library
Let’s try to implement the third option.
Changing Spring-WS client to use Apache Http Client
The final code of that change is pushed into the apache branch on GitHub.
The change is simple: we need to define an HTTP client and pass it to Spring. Client definition:
@Bean
@SneakyThrows
HttpClient httpClient(PoolingHttpClientConnectionManager httpClientConnectionManager) {
// Do not copy paste that configuration, this is simple config for article purposes.
RequestConfig requestConfig = RequestConfig.custom()
.build();
return HttpClients.custom()
.addInterceptorFirst(new HttpComponentsMessageSender.RemoveSoapHeadersInterceptor())
.setDefaultRequestConfig(requestConfig)
.setConnectionManager(httpClientConnectionManager)
.build();
}
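The bean above takes a PoolingHttpClientConnectionManager as a parameter; its definition is not shown here. A minimal sketch of what such a bean could look like (the limits below are arbitrary, not values from the repository):

@Bean
PoolingHttpClientConnectionManager httpClientConnectionManager() {
    // Arbitrary example limits - tune them for your workload.
    PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
    connectionManager.setMaxTotal(50);
    connectionManager.setDefaultMaxPerRoute(10);
    return connectionManager;
}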
Passing it to Spring is done in pl.ks.jdkbug.ws.client.WebServiceProxy.afterPropertiesSet():
webServiceTemplate.setMessageSender(new HttpComponentsMessageSender(httpClient));
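For context, a rough sketch of how that wiring might look (the class below is illustrative; the real WebServiceProxy is on the apache branch):

import org.apache.http.client.HttpClient;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.ws.client.core.WebServiceTemplate;
import org.springframework.ws.transport.http.HttpComponentsMessageSender;

class WebServiceProxy implements InitializingBean {
    private final HttpClient httpClient;
    private final WebServiceTemplate webServiceTemplate = new WebServiceTemplate();

    WebServiceProxy(HttpClient httpClient) {
        this.httpClient = httpClient;
    }

    @Override
    public void afterPropertiesSet() {
        // Replace the default JDK HttpURLConnection-based sender with the Apache one.
        webServiceTemplate.setMessageSender(new HttpComponentsMessageSender(httpClient));
    }
}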
Great, the job is done; let’s run the application and see whether the threads are no longer created. Once again we use the -Xlog:os+thread flag to trace thread creation and run the SOAP client. This time in the log file we can see:
2020-12-15 18:16:39.331 INFO 11578 --- [nio-8443-exec-3] pl.ks.jdkbug.ws.client.ClientController : Invoking WS
2020-12-15 18:16:39.372 INFO 11578 --- [nio-8443-exec-3] pl.ks.jdkbug.ws.client.ClientController : WS Invoked
2020-12-15 18:16:39.520 INFO 11578 --- [nio-8443-exec-1] pl.ks.jdkbug.ws.client.ClientController : Invoking WS
2020-12-15 18:16:39.556 INFO 11578 --- [nio-8443-exec-1] pl.ks.jdkbug.ws.client.ClientController : WS Invoked
2020-12-15 18:16:39.712 INFO 11578 --- [nio-8443-exec-9] pl.ks.jdkbug.ws.client.ClientController : Invoking WS
2020-12-15 18:16:39.745 INFO 11578 --- [nio-8443-exec-9] pl.ks.jdkbug.ws.client.ClientController : WS Invoked
Our application stopped creating threads while invoking the SOAP WebService. I thought that the problem was solved, and then …
… little disaster :(
After deploying the new application version to production, its performance degraded by ~10%. Let’s see if the same happens to our simplified application.
Before change:
pasq@pasq-MS-7C37:~$ time for i in `seq 1 500`; do curl --insecure -s https://localhost:8443/ > /dev/null; done
real 0m23,093s
After change:
pasq@pasq-MS-7C37:~$ time for i in `seq 1 500`; do curl --insecure -s https://localhost:8443/ > /dev/null; done
real 0m28,930s
So we tried to make the application work better, but we ended up with degraded performance. Why?
Continuous profiling to the rescue
The profiled application has async-profiler attached in continuous mode, which dumps its output to disk every minute. I could fetch profiling results from the old and the new version and compare them.
Let’s do the same with our simplified application:
./profiler.sh -d 10 -t -e wall -o collapsed -f output.txt JdkBugApplication
I use wall-clock mode; it is more suitable when we want to find where end-user time is wasted. To view the profiling results I use my open source tool, which you can find on GitHub. In this tool you can create Flame Graphs for the methods you are interested in.
Let’s look at the Flame Graphs of ClientController.get() and see what has changed.
Flame Graph before change (HTML version):
Flame Graph after change (HTML version):
I highlighted the socketRead method. At the bottom of the Flame Graphs we can see that:
- before the change, reading from the socket took 45% of the time of the ClientController.get() method
- after the change, reading from the socket took 64% of the time of the ClientController.get() method
In the previous chapter we ran our controller 500 times; let’s combine the results of that test with the profiler output:
- before the change, reading from the socket took 45% * 23,093s = 10,39s
- after the change, reading from the socket took 64% * 28,930s = 18,52s
A little disclaimer: I simplified the situation above; there is some magic happening before our controller starts its work, but showing such big Flame Graphs (with all the Spring/Tomcat magic) would be unreadable.
Reading from the socket? WTF
All we have done so far is swap the HTTP client, so why does Apache spend more time reading from the socket?
Let’s look at the Flame Graphs again. We can see that the socket read is invoked from the sun.security.ssl.SSLSocketImpl.startHandshake() method. The JDK can log its TLS work; we need to add the -Djavax.net.debug=all switch to fetch it. After turning those logs on, in the version with Apache we can see such an exception on every request:
javax.net.ssl|ERROR|2D|https-jsse-nio-8443-exec-5|2020-12-15 19:17:24.055 CET|TransportContext.java:312|Fatal (INTERNAL_ERROR): closing inbound before receiving peer's close_notify (
"throwable" : {
javax.net.ssl.SSLException: closing inbound before receiving peer's close_notify
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:133)
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:254)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:645)
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:624)
at org.apache.http.impl.BHttpConnectionBase.close(BHttpConnectionBase.java:325)
at org.apache.http.impl.conn.LoggingManagedHttpClientConnection.close(LoggingManagedHttpClientConnection.java:81)
You can see the full log here: before, after.
I believe that this may be a JDK-8253368 bug, but I’m not entirely sure.
Another difference is that core Java fills in the SSL session id, while Apache does not:
Java:
"ClientHello": {
"client version" : "TLSv1.2",
"random" : "AF 0D 19 FB 49 DF E4 F5 28 10 00 9C 04 45 21 A0 78 73 CB A2 54 65 64 58 28 C6 23 C4 63 2D 99 E9",
"session id" : "62 E9 49 09 67 90 CE BB BD A5 54 95 81 AC 94 33 C8 96 F5 39 A9 AE D0 3C A8 5D FC 4C 4F 33 18 30",
...
}
Apache:
"ClientHello": {
"client version" : "TLSv1.2",
"random" : "25 2E 2D FB 7F BC 60 54 39 59 52 0B 59 F6 78 F7 23 E9 60 87 0C 08 5A 37 02 33 F5 28 78 F8 07 4E",
"session id" : "",
...
}
The last big difference I see is that Apache fetches the certificate details on every request:
javax.net.ssl|DEBUG|2C|https-jsse-nio-8443-exec-3|2020-12-15 21:27:00.622 CET|CertificateMessage.java:357|Consuming server Certificate handshake message (
"Certificates": [
"certificate" : {
"version" : "v3",
"serial number" : "5A C6 4D 01",
"signature algorithm": "SHA256withRSA",
"issuer" : "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown",
...
}]
Core Java does not do it.
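If you want to confirm session reuse without wading through the javax.net.debug output, you can compare the SSLSession creation times across consecutive connections; for TLS 1.2, a resumed session keeps the creation time of the original one. A small sketch (host, port and a trusted certificate are assumptions, not part of the original setup):

import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SessionReuseCheck {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        for (int i = 0; i < 2; i++) {
            try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443)) {
                socket.startHandshake();
                SSLSession session = socket.getSession();
                // The same creation time on the second connection means the session was reused.
                System.out.printf("handshake %d: session created at %d%n", i, session.getCreationTime());
            }
        }
    }
}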
Time for theory
I believe that this behavior is caused by the exception I mentioned above. From the TLS specification:
Alert messages with a level of fatal result in the immediate termination of the connection. In this case, other connections corresponding to the session may continue, but the session identifier must be invalidated, preventing the failed session from being used to establish new connections.
In the exception I mentioned we can see:
Fatal (INTERNAL_ERROR): closing inbound before receiving peer’s close_notify
I believe this is why Apache cannot reuse the SSL session and needs to fetch the certificate on every request.
Let’s go deeper
If we look at the stacktrace of the exception, we can see the sun.security.ssl.SSLSocketImpl.shutdownInput method. Its source code:
@Override
public void shutdownInput() throws IOException {
shutdownInput(true);
}
private void shutdownInput(
boolean checkCloseNotify) throws IOException {
if (isInputShutdown()) {
return;
}
if (SSLLogger.isOn && SSLLogger.isOn("ssl")) {
SSLLogger.fine("close inbound of SSLSocket");
}
if (checkCloseNotify && !conContext.isInputCloseNotified &&
(conContext.isNegotiated || conContext.handshakeContext != null)) {
throw conContext.fatal(Alert.INTERNAL_ERROR,
"closing inbound before receiving peer's close_notify");
}
conContext.closeInbound();
if ((autoClose || !isLayered()) && !super.isInputShutdown()) {
super.shutdownInput();
}
}
Apache invokes the public method, which delegates its work to the private one with true as the argument. When the argument is set to true, this check is run:
if (checkCloseNotify && !conContext.isInputCloseNotified &&
(conContext.isNegotiated || conContext.handshakeContext != null)) {
throw conContext.fatal(Alert.INTERNAL_ERROR,
"closing inbound before receiving peer's close_notify");
}
Why does core Java not have this problem? You can debug it yourself; it is nice training and I really recommend it. If you do not want to, the answer is that core Java uses this method:
private void bruteForceCloseInput(
boolean hasCloseReceipt) throws IOException {
if (hasCloseReceipt) {
try {
this.shutdown();
} finally {
if (!isInputShutdown()) {
shutdownInput(false);
}
}
} else {
if (!conContext.isInboundClosed()) {
conContext.inputRecord.close();
}
if ((autoClose || !isLayered()) && !super.isInputShutdown()) {
super.shutdownInput();
}
}
}
That method delegates its work to the shutdownInput() method, but with the argument set to false, so the check I mentioned above is not run.
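The difference is easy to see in isolation on JDK 11: calling shutdownInput() directly on a negotiated SSLSocket (which is effectively what Apache's connection close does) trips the check, while a plain close() does not. A small sketch (example.com is a placeholder host):

import javax.net.ssl.SSLException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class ShutdownInputDemo {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443);
        socket.startHandshake();
        try {
            socket.shutdownInput();   // runs the checkCloseNotify check shown above
        } catch (SSLException e) {
            // "closing inbound before receiving peer's close_notify"
            System.out.println("shutdownInput() failed: " + e.getMessage());
        } finally {
            socket.close();           // the plain close path does not run that check
        }
    }
}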
Time for: big fat warning
The rest of this article shows how to hack Apache into reusing the SSL session. This code was never deployed to production, it is too ugly. I created it only to find out whether I was capable of fixing this problem on the client side.
Hacking Apache
You can fetch the hacked version from the apache-ugly-hack branch. From the stacktrace of the exception we can see that shutdownInput is invoked from:
at java.base/sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:624)
at org.apache.http.impl.BHttpConnectionBase.close(BHttpConnectionBase.java:325)
at org.apache.http.impl.conn.LoggingManagedHttpClientConnection.close(LoggingManagedHttpClientConnection.java:81)
The close method in LoggingManagedHttpClientConnection is:
@Override
public void close() throws IOException {
if (super.isOpen()) {
if (this.log.isDebugEnabled()) {
this.log.debug(getId() + ": Close connection");
}
super.close();
}
}
It delegates its work to BHttpConnectionBase:
@Override
public void close() throws IOException {
final Socket socket = this.socketHolder.getAndSet(null);
if (socket != null) {
try {
this.inBuffer.clear();
this.outbuffer.flush();
try {
try {
socket.shutdownOutput();
} catch (final IOException ignore) {
}
try {
socket.shutdownInput();
} catch (final IOException ignore) {
}
} catch (final UnsupportedOperationException ignore) {
}
} finally {
socket.close();
}
}
}
What we would like to do is override that close() method so that it invokes shutdownInput with false as the argument. Unfortunately, that shutdownInput variant is private. As I said before, it’s a hack: we can use reflection:
if (socket != null) {
try {
inBuffer.clear();
outbuffer.flush();
try {
try {
socket.shutdownOutput();
} catch (final IOException ignore) {
}
Method method = socket.getClass().getDeclaredMethod("shutdownInput", boolean.class);
method.setAccessible(true);
method.invoke(socket, false);
} catch (final UnsupportedOperationException ignore) {
}
} finally {
socket.close();
}
}
This is ugly, yes, and it needs to be wrapped with a ConnectionManager and a ConnectionFactory. You can see my implementation in these classes:
Did it help?
Almost: the exception is no longer thrown, but the SSL session is still not reused. After some debugging I found the cause in the sun.security.ssl.ClientHello.ClientHelloKickstartProducer.produce method:
if (!session.getProtocolVersion().useTLS13PlusSpec()) {
sessionId = session.getSessionId();
}
This condition was false in the Apache case: the cached session was negotiated with TLS 1.3, so useTLS13PlusSpec() returns true and the session id is not sent. It can be changed with the JVM flag -Djdk.tls.client.protocols=TLSv1.2. Again, a big fat warning: this may be insecure in your application. After adding that flag the SSL session was reused. Let’s look at the performance of the hacked application:
pasq@pasq-MS-7C37:~$ time for i in `seq 1 500`; do curl --insecure -s https://localhost:8443/ > /dev/null; done
real 0m19,901s
So the performance is better, and no Java thread is created on every request.
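As a side note, if forcing TLSv1.2 globally with a JVM flag is too broad, the same restriction could probably be applied only to the Apache client through its SSL socket factory. A sketch of that idea (my assumption, not something used in the original application):

import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.ssl.SSLContexts;

class Tls12ClientFactory {
    static CloseableHttpClient create() {
        SSLConnectionSocketFactory socketFactory = new SSLConnectionSocketFactory(
                SSLContexts.createDefault(),
                new String[] {"TLSv1.2"},   // enabled protocols for this client only
                null,                        // default cipher suites
                SSLConnectionSocketFactory.getDefaultHostnameVerifier());

        // Note: if a custom connection manager is set on the builder, the socket
        // factory has to be registered on that connection manager instead.
        return HttpClients.custom()
                .setSSLSocketFactory(socketFactory)
                .build();
    }
}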
What about the original profiled application?
The team developing the application chose the option of asking the external service provider to change the “Connection: close” header to keep-alive. Unfortunately, it has not been done yet, so I cannot present the results.