LEAK: ByteBuf.release() was not called before it's garbage-collected - resolution: buffer is not released when DataBuffer#asInputStream() #1746
Comments
@Azbesciak any code samples which can repro this? Any reference workflow that causes it? Without seeing it, it is impossible to say where the problem is. Also, the root cause can be in your code (or in reactor-core), since we cannot guarantee that you discard Netty buffers in your pipeline |
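For illustration, a minimal Kotlin sketch (not from the reporter's code; the endpoint and names are hypothetical) of what "discarding Netty buffers in the pipeline" means when a body is consumed as raw DataBuffers:

```kotlin
import org.springframework.core.io.buffer.DataBuffer
import org.springframework.core.io.buffer.DataBufferUtils
import org.springframework.web.reactive.function.client.WebClient
import reactor.core.publisher.Flux

// Hypothetical example: when a response body is consumed as raw DataBuffers,
// each buffer wraps a pooled Netty ByteBuf and must be released explicitly,
// otherwise the leak detector reports exactly this kind of LEAK message.
fun readAsStrings(client: WebClient): Flux<String> =
    client.get()
        .uri("/stream") // placeholder endpoint
        .retrieve()
        .bodyToFlux(DataBuffer::class.java)
        .map { buffer ->
            val bytes = ByteArray(buffer.readableByteCount())
            buffer.read(bytes)
            DataBufferUtils.release(buffer) // return the underlying ByteBuf to the pool
            String(bytes)
        }
```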
As I said - no idea.

```kotlin
@Bean
fun webClient(builder: WebClient.Builder) = { url: String -> builder.baseUrl(url).build() }
```

Then consume it at the service provision, for example:

```kotlin
@Configuration
internal class ServiceConfiguration(
    private val client: (String) -> WebClient,
    private val rateLimiterProvider: RateLimiterProvider,
) {
    // lazy because it can be used by multiple beans
    private val service by lazy {
        SomeService(
            client(someUrl),
            rateLimiterProvider
        )
    }

    @Bean("some-service")
    override fun myService() = service
}
```

and inside the service I execute GET calls like this:

```kotlin
client.get()
    .uri {
        params.forEach { (name, value) -> it.queryParam(name, value) } // params is Array<Pair<String, String>>
        it.build()
    }.retrieve()
    .bodyToMono(resultClass)
    .transformDeferred(RateLimiterOperator.of(rateLimiterProvider[javaClass.simpleName]))
```

Rate limiter below; without it I also noticed some leaks, but to be honest they are more noticeable after adding it:

```kotlin
class RateLimiterProvider(private val spec: RateLimiterSpec) {
    private val requestLimiters = ConcurrentHashMap<String, RateLimiter>()

    operator fun get(key: String) = requestLimiters.computeIfAbsent(key) {
        val instanceSpec = spec.default
        RateLimiter.of(
            key,
            RateLimiterConfig.custom()
                .limitForPeriod(instanceSpec.limit) // 200
                .timeoutDuration(instanceSpec.timeout) // 10m
                .limitRefreshPeriod(instanceSpec.period) // 1s
                .build()
        )
    }
}
```

As mentioned above, the service(s) can also be used in batch mode (the client sends multiple requests as one, and the response is merged as a JSON stream):

```kotlin
override fun batch(queries: Flux<BatchRequestEntry>) =
    queries.publishOn(scheduler)
        .flatMap { query ->
            when {
                conditionA(query) -> serviceA.invoke(query) // Mono
                conditionB(query) -> serviceB.invokeMany(query).toMono() // returns a Flux without toMono()
                conditionC(query) -> serviceC.invoke(query).toMono()
                else -> Mono.empty()
            }.map {
                BatchResultEntry(query.id, result = it)
            }.onErrorResume { mapError(it, query) }
        }

private fun mapError(it: Throwable?, query: BatchRequestEntry): Mono<BatchResultEntry> {
    val e = when (it) {
        is WebClientResponseException -> webClientResponseExceptionManager.parse(it)
        else -> it
    }
    return Mono.just(BatchResultEntry(query.id, error = e))
}
```

Here |
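One detail worth noting in the batch() chain above (an observation, not a confirmed cause): the Kotlin toMono() extension wraps a Flux with Mono.from, which cancels the source after the first element, so any later elements are dropped through the discard path. A small sketch of that behavior, assuming the reactor-kotlin-extensions toMono():

```kotlin
import reactor.core.publisher.Flux
import reactor.kotlin.core.publisher.toMono

fun main() {
    Flux.just("a", "b", "c")
        .doOnCancel { println("cancelled after the first element") }
        .toMono()            // Mono.from: emits "a", then cancels the rest
        .subscribe { println("got: $it") }
}
```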
@Azbesciak can you try to repro this leak with an enabled logger:
Also, it would be very useful if you could create a small example app that mimics the code in your production app but shows the code that causes the issue in isolation. Even if it does not reproduce the problem, it would give us an opportunity to play around and see what the problem could be and where it is. Thanks |
Is it somehow possible to set this via environment properties? I would not like to rebuild the whole image (I do not have a logger-specific config file).
but no difference; the stack trace is still the same
also, about the code sample - it is really all that I am using; I nearly copy-pasted all the service connection logic. |
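For reference, Netty's leak detector can also be raised programmatically, which avoids rebuilding the image with new config files. A sketch, assuming it runs before the first buffer is allocated:

```kotlin
import io.netty.util.ResourceLeakDetector

// Equivalent to -Dio.netty.leakDetection.level=paranoid; must run before
// the first ByteBuf is allocated for the setting to take full effect.
fun enableParanoidLeakDetection() {
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID)
}
```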
I changed env props to
Below is the whole log till the first leak. |
@Azbesciak you need to set DEBUG level for that specific logger |
Yes, I saw it, and asked about it.
|
Are you able to add |
@violetagg I am afraid that this -Dlogging.level._reactor.netty=debug does not work.
but no difference - still the same log (nothing before or after)
|
To be sure it is not the RateLimiter's fault, I removed it - unfortunately, no change (note that it also appeared in the stack trace)
|
@Azbesciak It looks like the two JVM settings are not applied. If this is applied... With... |
Ok, I believe that is true, but look at this comment and the one before - when I removed one setting, the log appeared, so maybe they are in conflict? I cannot experiment on production during the day, and it is also cumbersome :/ |
Yeah I understand, it is a tricky one 😟
Take a look at this issue: the first stack is with the standard settings, then the author added... When you enable the logger...
|
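A sketch of one way to apply both settings without a config file, assuming a default Spring Boot/Logback setup (MyApplication is a placeholder):

```kotlin
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication

@SpringBootApplication
class MyApplication // placeholder application class

fun main(args: Array<String>) {
    // Must be set before Spring Boot initializes logging, so the
    // _reactor.netty.channel.LeakDetection logger is created at DEBUG level.
    System.setProperty("io.netty.leakDetection.level", "paranoid")
    System.setProperty("logging.level._reactor.netty.channel.LeakDetection", "DEBUG")
    runApplication<MyApplication>(*args)
}
```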
@Azbesciak Were you able to enable the leak detection in Netty/Reactor Netty? |
@Azbesciak We need to somehow enable those options, otherwise we cannot proceed with the investigation ... :( |
@violetagg I have 'good' and bad news.

```xml
<configuration>
    <property name="HOME_LOG" value="log2/app.log"/>

    <appender name="FILE-ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${HOME_LOG}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>logs/archived/app.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <!-- each archived file, size max 10MB -->
            <maxFileSize>10MB</maxFileSize>
            <!-- total size of all archive files; if total size > 20GB, the oldest archives are deleted -->
            <totalSizeCap>20GB</totalSizeCap>
            <!-- 60 days to keep -->
            <maxHistory>60</maxHistory>
        </rollingPolicy>
        <encoder>
            <pattern>%d %p %c{1.} [%t] %m%n</pattern>
        </encoder>
    </appender>

    <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="FILE-ROLLING"/>
    </appender>

    <root level="debug">
        <appender-ref ref="ASYNC"/>
    </root>

    <logger name="_reactor.netty.channel.LeakDetection" level="DEBUG"/>
</configuration>
```

... did not really help; to be honest, nothing changed.
You can give me the exact logger config and env parameters you want; I can run it and we will see... |
@violetagg ? Did you see my question about the exact required run configuration? Unfortunately, as you can see above, I was not able to get the required result with the described steps. I can also send you some code parts with requests directly - it is probably redundant (the exact code, requests, and keys), but... |
@Azbesciak It seems that Spring Boot 2.5 has a property |
@OlegDokuka did you see #1746 (comment)? |
@Azbesciak yeah, I noticed that afterwards and thus removed my comment. My bad |
Ok, I did not notice that you removed it. |
@Azbesciak Let me write a test around the same set of operators as in your original issue, to see if a leak can happen at that level. I lean toward the probability that something is on the reactor-core side rather than the reactor-netty one |
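For reference, such an operator-level test might look roughly like the sketch below - the same flatMap/map/onErrorResume shape as batch(), with doOnDiscard to surface dropped elements. This is illustrative only, not the actual reactor-core test:

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import reactor.test.StepVerifier

fun main() {
    // Mirror the batch() shape: flatMap to a Mono per element, map the result,
    // recover from errors; doOnDiscard reveals elements dropped without cleanup.
    val chain = Flux.range(1, 100)
        .flatMap { i -> if (i % 2 == 0) Mono.just(i) else Mono.empty() }
        .map { it * 10 }
        .onErrorResume { Mono.empty() }
        .doOnDiscard(Integer::class.java) { println("discarded: $it") }

    StepVerifier.create(chain)
        .expectNextCount(50)
        .verifyComplete()
}
```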
@Azbesciak can you confirm the media type of the content? It looks like it should be a streaming format, but just double-checking that this is the case, as we are looking at code paths and possibilities. |
Yes, it is stream+json |
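For context, a sketch of the streaming consumption pattern implied by that media type, with a hypothetical Result type and endpoint:

```kotlin
import org.springframework.http.MediaType
import org.springframework.web.reactive.function.client.WebClient
import reactor.core.publisher.Flux

data class Result(val id: String) // hypothetical payload type

// With application/stream+json (deprecated in Spring 5.3 in favor of
// application/x-ndjson), the body is decoded element by element as it
// arrives rather than aggregated into one buffer.
fun streamResults(client: WebClient): Flux<Result> =
    client.get()
        .uri("/batch") // placeholder endpoint
        .accept(MediaType.APPLICATION_STREAM_JSON)
        .retrieve()
        .bodyToFlux(Result::class.java)
```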
Potential fix for issue reported at reactor/reactor-netty#1746
@Azbesciak I have a potential fix in the Spring Framework. If you're able to give it a try with Spring Framework 5.3.10-SNAPSHOT, that would be great. |
Unfortunately, it did not change a lot - only that I noticed the leak is now easier to trigger, but that is just a personal observation. |
@Azbesciak This is the last component that accessed the buffer (see below). I cannot comment on the code with package
|
Thank you @violetagg, and sorry - I had some plug-in I'd forgotten about... |
This reverts commit 77a562d. As per findings under reactor/reactor-netty#1746 it looks this wasn't the issue and isn't required.
Hi, I'm getting the same leak intermittently. I get these errors in the Docker environment. I also run the same service in a local environment with memory constraints (-XX:MaxRAM, -Xmx); profiling that service, I see no evidence of memory leaks. Here is my stack trace:
Any suggestions on how to find the cause? Regards |
@jarpz The issue in this ticket was in the author's code. Please open a new issue where we will investigate your particular problem. |
[reactor-http-epoll-8] ERROR io.netty.util.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information. |
Hello, I am experiencing a strange exception; it occurs only on the production service under production load (Docker/Ubuntu). I tried to replicate it both on the production (Docker) machine (same image, different run) and on my own Windows machine, also in Docker for Windows - with no success. I executed about 30k requests at various frequencies, and also tried with the env var
IO_NETTY_LEAK_DETECTION_LEVEL="paranoid"
set. On production, the leak shows up a short time (2-3 minutes, about 1k requests to the external service) after a server boot, once it starts receiving requests. However, it also occurs later (about 10 times per hour), with slightly different stack traces;
these below:
#1
#2
#3
Expected Behavior
No leak; after 3 days this service consumes 2x more RAM than at the beginning.
Steps to Reproduce
No idea. Maybe it is worth mentioning that I cache a WebClient for each URL path (without query params); it is assigned to a field later, but created at the very beginning (after boot). I had this issue on Spring Boot 2.3.2 and migrated to 2.5.2, but saw no difference.
Your Environment
- Ubuntu 18.04.3 LTS
- Docker 19.03.5, build 633a0ea838
- Docker base image: adoptopenjdk/openjdk16:jre-16.0.1_9-alpine
- Other relevant libraries versions (e.g. netty, ...): netty version 4.1.65