Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable log4j2 shutdown hook [HZ-3543] #26006

Closed

Conversation

vbekiaris
Copy link
Contributor

A common issue, especially when Hazelcast runs in Kubernetes pods with log4j2 logging,
is that shutdown logs are missing. The reason is that log4j2 registers its own shutdown
hook, alongside Hazelcast's own shutdown hook. Since the JVM does not guarantee
any ordering of shutdown hook execution, it can be the case that the log4j shutdown
hook shuts down loggers while Hazelcast is still performing shut down activities.

This results in missing important logs from the shutdown of Hazelcast members that
can be useful for troubleshooting.

This change introduces explicit shutdown of the log4j2 LogManager after Hazelcast Node
has been shut down: this does no harm when the log4j shutdown hook is enabled (as far as
I navigated the log4j code & tested the changes locally). When the shutdown hook is disabled,
it allows for clean logging of shutdown information.

Example of Hazelcast shutdown logging with GRACEFUL shutdown hook policy:

  • without this change:
$ bin/hz start
...
2023-11-20 16:25:56,816 [ INFO] [main] [c.h.c.LifecycleService]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] [127.0.0.1]:5701 is STARTED
 ^C2023-11-20 16:26:02,988 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Running shutdown hook... Current node state: ACTIVE
  • with this change:
^C2023-11-20 16:26:31,183 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Running shutdown hook... Current node state: ACTIVE
2023-11-20 16:26:31,185 [ INFO] [hz.ShutdownThread] [c.h.c.LifecycleService]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] [127.0.0.1]:5701 is SHUTTING_DOWN
2023-11-20 16:26:31,191 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Shutting down connection manager...
2023-11-20 16:26:31,195 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Shutting down node engine...
2023-11-20 16:26:31,205 [ INFO] [hz.ShutdownThread] [c.h.i.i.NodeExtension]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Destroying node NodeExtension.
2023-11-20 16:26:31,206 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Hazelcast Shutdown is completed in 18 ms.

A common issue, especially when Hazelcast runs in Kubernetes pods with log4j2 logging,
is that shutdown logs are missing. The reason is that log4j2 registers its own shutdown
hook, alongside Hazelcast's own shutdown hook. Since the JVM does not guarantee
any ordering of shutdown hook execution, it can be the case that the log4j shutdown
hook shuts down loggers while Hazelcast is still performing shut down activities.
This results in missing important logs from the shutdown of Hazelcast members that
can be useful for troubleshooting.
This change introduces explicit shutdown of the log4j2 LogManager after Hazelcast Node
has been shut down: this does no harm when the log4j shutdown hook is enabled (as far as
I navigated the log4j code & tested the changes locally). When the shutdown hook is disabled,
it allows for clean logging of shutdown information.
Example of Hazelcast shutdown logging with GRACEFUL shutdown hook policy:
- without this change:
 ^C2023-11-20 16:26:02,988 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Running shutdown hook... Current node state: ACTIVE
 - with this change:
 ^C2023-11-20 16:26:31,183 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Running shutdown hook... Current node state: ACTIVE
 2023-11-20 16:26:31,185 [ INFO] [hz.ShutdownThread] [c.h.c.LifecycleService]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] [127.0.0.1]:5701 is SHUTTING_DOWN
 2023-11-20 16:26:31,191 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Shutting down connection manager...
 2023-11-20 16:26:31,195 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Shutting down node engine...
 2023-11-20 16:26:31,205 [ INFO] [hz.ShutdownThread] [c.h.i.i.NodeExtension]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Destroying node NodeExtension.
 2023-11-20 16:26:31,206 [ INFO] [hz.ShutdownThread] [c.h.i.i.Node]: [127.0.0.1]:5701 [dev] [5.4.0-SNAPSHOT] Hazelcast Shutdown is completed in 18 ms.
@vbekiaris vbekiaris added this to the 5.4.0 milestone Nov 20, 2023
@vbekiaris vbekiaris requested a review from a team as a code owner November 20, 2023 14:45
Comment on lines +47 to +48
default void shutdown() {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think other logging frameworks suffer same/similar issues. Should we investigate and resolve all? Especially logback and JUL.

@@ -570,6 +570,7 @@ public void shutdown(final boolean terminate) {
if (state != NodeState.SHUT_DOWN) {
shuttingDown.compareAndSet(true, false);
}
loggingService.shutdown();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to call this in shutdown? Won't this shutdown the logging framework for the whole JVM? what if something wants to log afterwards?
Two cases I am thinking of:

  • our tests where we shutdown nodes
  • environment where the instance is managed by another framework, e.g. Spring

Copy link
Member

@srknzl srknzl Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder what shuting down log4j means. Will it shut down logger for whole jvm, including other libraries that uses it. I suppose it shut downs the only Hazelcast's logger context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it shut downs the only Hazelcast's logger context.

That was my thought too. the LogManager#shutdown javadoc reads:

Shutdown using the LoggerContext appropriate for the caller of this method.

Digging to figure out what is a "LoggerContext appropriate for the caller of this method", the only concrete information I could find was that a LoggerContext can be associated with a specific webapp's ServletContext (when running log4j under a webapp server), but no idea how LoggerContexts are created in a normal Java app.

I did a bit of research into spring-boot and elasticsearch. They both disable the log4j shutdown hook and shut down log4j explicitly (spring-boot, elasticsearch ) , however it is rather safe to assume spring-boot/elasticsearch is probably the only app in the JVM.

I think we should control explicit shutdown of log4j (or any logging framework) by a HazelcastProperty and ensure it is enabled from our scripts (assuming Hazelcast is the only JVM app) but have it disabled by default. In this way, when using the client or embedded member, the logging framework explicit shutdown won't be executed by default since it can have unexpected side-effects. wdyt @frant-hartm @srknzl ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense imo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 5c20529

Default is false, so Hazelcast will not explicitly shut down the logging
framework during graceful shutdown. When true, Hazelcast will
shutdown the backing logging framework as part of its graceful
shutdown procedure.
@vbekiaris
Copy link
Contributor Author

Closed in favour of https://github.com/hazelcast/hazelcast-mono/pull/236 - let's continue the discussion there

@vbekiaris vbekiaris closed this Dec 19, 2023
devOpsHazelcast pushed a commit that referenced this pull request Jan 8, 2024
A common issue, especially when Hazelcast runs in Kubernetes pods with
log4j2 logging, is that shutdown logs are missing. The reason is that log4j2 registers
its own shutdown hook, alongside Hazelcast's own shutdown hook.
Since the JVM does not guarantee any ordering of shutdown hook execution,
it can be the case that the log4j shutdown hook shuts down loggers while
Hazelcast is still performing shut down activities.

This results in missing important logs from the shutdown of Hazelcast
members that can be useful for troubleshooting.

This change introduces explicit shutdown of the log4j2 LogManager after
Hazelcast Node has been shut down, controlled by a hazelcast property:
this does no harm when the log4j shutdown hook is enabled. When the shutdown
hook is disabled, it allows for clean logging of shutdown information.

Supersedes #26006

GitOrigin-RevId: 5ca7e476db0f9f9461f20424b482c80084e11514
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants