Too many open files after upgrade to Spring Boot 2.2.8 #21923

Closed
martinvisser opened this issue Jun 15, 2020 · 10 comments
Labels
for: external-project For an external project and not something we can fix

Comments

@martinvisser

We recently upgraded from Spring Boot 2.2.7 to 2.2.8, running on PCF (Azure, the OS is Linux). We then ran into an issue where the app eventually crashed with "Too many open files". The "files" were actually open TCP sockets, over 1 million of them. Because the upgrade pulls in quite a few other dependency upgrades, it's very hard to figure out where it goes wrong. We had 32 instances crash and reproduced it in another environment pretty easily.
Even after a couple of hours the number of open sockets did not change.

The application uses WebFlux, and therefore Netty. To see whether Netty was the cause, I downgraded Spring Boot to 2.2.7 and only updated all Netty dependencies to 4.1.50. With that configuration it worked fine: the number of sockets stayed around 30,000.

I can't reproduce this on my Mac, but it reproduces easily under some load on PCF on Linux, so I think it's related to the OS.
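
The descriptor count described above can also be watched from inside the JVM. This is a minimal sketch, assuming a HotSpot-based JDK on Linux or macOS where the platform bean implements `com.sun.management.UnixOperatingSystemMXBean`; the class name `OpenFdProbe` is made up for illustration and is not part of the application in this report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not from the issue): reads the JVM's own descriptor usage.
final class OpenFdProbe {

    /**
     * Returns {open, max} file descriptor counts for this JVM process,
     * or {-1, -1} when the platform bean is not the Unix variant.
     */
    static long[] readDescriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            // Sockets count as file descriptors, so leaked connections show up here.
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = readDescriptorCounts();
        System.out.println("open=" + counts[0] + " max=" + counts[1]);
    }
}
```

Logging these two numbers periodically would show whether the count keeps climbing toward the limit under load.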

Some stack traces:

io.netty.channel.DefaultChannelPipeline  : An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files

sun.rmi.transport.tcp                    : RMI TCP Accept-5000: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=5000] throws java.net.SocketException: Too many open files (Accept failed)
	at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.accept(Unknown Source)
	at java.base/java.net.ServerSocket.implAccept(Unknown Source)
	at java.base/java.net.ServerSocket.accept(Unknown Source)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
	at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

a.w.r.e.AbstractErrorWebExceptionHandler : [4974671b-12062]  500 Server Error for HTTP POST "/some/path" io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
Wrapped by: io.netty.channel.ChannelException: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
	at io.netty.channel.unix.Socket.newSocketStream0(Socket.java:421)
	at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:319)
	at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:323)
	at io.netty.channel.epoll.EpollSocketChannel.<init>(EpollSocketChannel.java:45)
	at reactor.netty.resources.DefaultLoopEpoll.getChannel(DefaultLoopEpoll.java:45)
	at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:187)
	at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:169)
	at reactor.netty.tcp.TcpResources.onChannel(TcpResources.java:215)
	at reactor.netty.http.client.HttpClientConnect$HttpTcpClient.connect(HttpClientConnect.java:141)
	at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43)
Wrapped by: com.netflix.hystrix.exception.HystrixRuntimeException: payment-request-merchant-site.payment-request-merchant-site-v2 failed and fallback failed.
	at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
	|_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.AuthorizationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.ExceptionTranslationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authentication.logout.LogoutWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.savedrequest.ServerRequestCacheWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.SecurityContextServerWebExchangeWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.ReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.config.web.server.ServerHttpSecurity$ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.cloud.sleuth.instrument.web.TraceWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ HTTP POST "/some/path" [ExceptionHandlingWebHandler]
Stack trace:
		at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)
		at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:807)
		at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
		at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
		at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
		at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1472)
		at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1397)
		at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
		at rx.internal.reactivestreams.SubscriberAdapter.onError(SubscriberAdapter.java:59)
		at reactor.core.publisher.StrictSubscriber.onError(StrictSubscriber.java:106)

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jun 15, 2020
@bclozel
Member

bclozel commented Jun 15, 2020

Since this seems to be linked with Netty's native support, did you try changing the "netty-tcnative-boringssl-static" dependency?

You could try the following:

  • Spring Boot 2.2.8 with netty-tcnative.version set to 2.0.30.Final
  • Spring Boot 2.2.8 with an exclusion on this dependency
  • Spring Boot 2.2.7 with netty-tcnative.version set to 2.0.31.Final

Please let us know if this changes things; it would help us find the source of the problem.
Thanks!

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Jun 15, 2020
@bclozel
Member

bclozel commented Jun 16, 2020

This seems to be caused by reactor/reactor-netty#1152

Could you try overriding the reactor-netty dependency to the latest 0.9.9.BUILD-SNAPSHOT and see if this fixes the issue?

Thanks!

@snicoll
Member

snicoll commented Jun 16, 2020

You can override reactor-bom.version to Dysprosium-BUILD-SNAPSHOT. We've also switched Spring Boot 2.2.9.BUILD-SNAPSHOT to use this version by default so in an hour or so you could just switch your build to 2.2.9.BUILD-SNAPSHOT.

@martinvisser
Author

@bclozel Overriding the version to 0.9.9.BUILD-SNAPSHOT worked again as expected! No more excessive sockets opened.

@bclozel
Member

bclozel commented Jun 16, 2020

Thanks @martinvisser, this is helping a lot!

@MahatmaFatalError
Contributor

reactor-netty is available in v0.9.10.RELEASE. Which Spring Boot release is planned to contain this version?

@bclozel
Member

bclozel commented Jul 21, 2020

Reactor Dysprosium-SR10 ships with reactor-netty 0.9.10, see #22376.
As for this particular issue, Dysprosium-SR9 (reactor-netty 0.9.9) should already fix the problem in Spring Boot 2.3.2 (see #21938).

If you're still experiencing an issue, please create a ticket on the reactor-netty tracker with your findings.

@violetagg
Member

@jeehunseo Can you check that you do not have a dependency mismatch? Carefully check the dependencies that may bundle Netty, especially the native parts of Netty.
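
One way to check for such a mismatch is to print which JAR each Netty and Reactor Netty class is actually loaded from. This is a rough sketch using only JDK reflection; `ClasspathOrigin` is a hypothetical helper, and the class names are simply the ones that appear in the stack traces earlier in this issue. The manifest version may be missing for some JARs, in which case the JAR file name is the better hint.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not from the issue): reports where a class comes from.
final class ClasspathOrigin {

    /** Describes the JAR a class is loaded from and the version its manifest declares. */
    static String describe(String className) {
        try {
            // initialize=false: locate the class without running static initializers,
            // so no native library gets loaded as a side effect.
            Class<?> type = Class.forName(className, false, ClasspathOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            Package pkg = type.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + (location == null ? "<JDK or unknown>" : location)
                    + " (version " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException e) {
            return className + " -> not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack traces in this issue.
        String[] names = {
            "io.netty.channel.epoll.EpollSocketChannel",
            "io.netty.channel.unix.Socket",
            "reactor.netty.http.client.HttpClientConnect"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If the Netty classes resolve to JARs with different versions, or to a shaded copy inside another dependency, that would point to the kind of mismatch described above.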

@snicoll
Member

snicoll commented Oct 19, 2020

@violetagg Thanks. The reporter has now created a separate issue, so I suggest we follow up there.

@zyy71897

I still have this problem with Spring Boot 2.6.6.
