Use PGO and G1GC in the native image #1
Here is the specific moment in @alina-yur's talk about the necessary build args: https://youtu.be/8umoZWj6UcU?si=XJBhf9n8N9HN2Drr&t=2130 They are: --enable-monitoring=all --pgo=default.iprof --gc=G1. But in this case you would first need to build with --pgo-instrument, run the Gatling test against the native image, get the default.iprof, and then build again without --pgo-instrument. I guess that is it.
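Putting those steps together as a command sketch (the jar path and executable names here are placeholders, not taken from the repo):

```shell
# Sketch of the two-phase PGO build described above.
# 1) Build an instrumented executable
native-image --pgo-instrument -jar target/app.jar app-instrumented
# 2) Run it and drive the Gatling workload against it;
#    default.iprof is written when the app shuts down
./app-instrumented
# 3) Rebuild with the collected profiles and the flags from the talk
native-image --enable-monitoring=all --pgo=default.iprof --gc=G1 \
  -jar target/app.jar app-optimized
```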
Oops, here is some material dedicated exactly to this:
By the way, without virtual threads enabled? I found that curious. spring.threads.virtual.enabled=true I suppose it would bring benefits too, independent of PGO and so on.
Hey @lobaorn, thanks for the tip. I applied it here and it generated a huge file; then I compiled again with the optimized version but didn't see much difference in the results. Do you want to try running a test on your side? before after
Man, I'll take a look when I can, but from what it looks like, the result actually got much worse, no? I'd say that in theory you're running only the instrumented version, not the optimized one. Maybe it was just a matter of the directory/path of the .iprof file for the optimized compilation, because the result really looks contradictory.
I generated a new version with a Dockerfile. I think I'm doing something wrong; I asked @alina to see if she can help.
Just a reminder that it's @alina-yur, not @alina. (I'm on my phone today, so I'll probably only look at this tomorrow.)
Hi folks @lobaorn @rodrigorodrigues, glad it helps! So the steps to build an optimized executable are the following:
In step 3 you can either directly specify the profiles file with
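For a Maven-based build, the profiles file can presumably be passed as a build arg through the native-maven-plugin configuration; a hedged sketch (the profile file location is an assumption):

```xml
<!-- Sketch: passing PGO args to the GraalVM native-maven-plugin -->
<plugin>
  <groupId>org.graalvm.buildtools</groupId>
  <artifactId>native-maven-plugin</artifactId>
  <configuration>
    <buildArgs>
      <buildArg>--pgo=${project.basedir}/default.iprof</buildArg>
      <buildArg>--gc=G1</buildArg>
    </buildArgs>
  </configuration>
</plugin>
```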
Hi @alina-yur, thanks for your message. I was able to generate an optimized executable, but I was wondering how I can containerise it using
Thanks,
Replied in alina-yur/native-spring-boot#1 (comment) :) The thing with PGO is to make sure you apply relevant workloads for sufficient time and emulate expected application behavior closely enough, so that the app can collect correct and full profiles. But in general PGO should always give some more performance, even if the profiles aren't complete/great. Also make sure, as I mentioned above, that the profiles are being picked up:
Hi @alina-yur, I couldn't find this extra pom.xml
Using the other command
Hey there, I am still away from my computer, but to try and help: the native: command invokes the native plugin, while spring-boot: invokes the Boot plugin with the native profile. Either it does not work with PGO, or it also needs the same buildArgs passed into its configuration section. Anyway, did the PGO version have better results? And maybe multiple runs of the test could improve the profiling a little further.
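For reference, the two Maven invocations being contrasted above are roughly the following (assuming the standard native profile from Spring Boot's native-image docs):

```shell
# Compiles a native executable directly via the native build tools plugin
mvn -Pnative native:compile
# Builds an OCI image with buildpacks via the Spring Boot plugin
mvn -Pnative spring-boot:build-image
```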
@rodrigorodrigues do you know if the profile file gets generated?
Hi @alina-yur, @lobaorn, I have some good progress: the file is generated after shutting down the app, and it's huge.
I couldn't see a way to make that work using the spring-boot plugin, so I created a simple Dockerfile to copy the native app.
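A minimal sketch of such a Dockerfile (the base image and binary path are assumptions, not taken from the repo):

```dockerfile
# Sketch: run a prebuilt native executable in a slim base image
FROM oraclelinux:9-slim
COPY target/app /app
EXPOSE 8080
ENTRYPOINT ["/app"]
```

This assumes the executable was compiled on a host compatible with the base image; glibc version mismatches between build host and runtime image are a common pitfall with native binaries.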
I've changed the Gatling script to run more data, and now I can see better results. instrumented version -
optimised version -
Btw, we're doing this just for fun. There's a competition with devs from Brazil to see who creates the fastest API using any language/framework, so I decided to pick this stack. I've pushed the optimised image to Docker Hub and now I'm gonna create a pull request for them to retest with this new image. I'd like to tag you both in the project if that's not a problem. Thanks for the collaboration, folks, really appreciate your help. :)
Hey @rodrigorodrigues, from what I can see in the images, the response time of the optimized version was worse than even the instrumented one? In that case, how was the result for the default native image, or for the Oracle GraalVM JIT? I'm not sure if G1 in this context is worse than Serial given the vCPU and memory constraints, but the result seems weird from the images... Either the problem is that you were running the instrumented version outside Docker, which made it faster, or the base image of the container is not the same as your host machine, so the optimizations applied by the -O3 of PGO and the arch specifics don't match the Docker image, hurting performance instead of helping. So I would guess some tweaking of the properties would be beneficial, or trying with the Oracle GraalVM JIT as well. On that note, the fastest entries on the TechEmpower Benchmarks always run with -XX:+UseNUMA and -XX:+UseParallelGC, so those probably won't hurt in this case for the JIT.
Hi @lobaorn, you're right, sorry, my bad. I ran the instrumented version directly, so I'm gonna run it again via the Docker image and update the reports.
Also, here we have many demos of GraalVM Native Image, including dockerization with Spring Boot: https://github.com/graalvm/graalvm-demos/tree/master/spring-native-image So perhaps you could reuse the same Dockerfile suggestion for the native image. The image itself might have some tweaks that could be better.
Just one adjustment to the suggested Dockerfile: there is already an oraclelinux:9-slim available on Docker Hub as well, not just the Oracle Container Registry: https://hub.docker.com/_/oraclelinux Size-wise, oraclelinux is bigger than ubuntu-jammy (https://hub.docker.com/layers/library/ubuntu/jammy/images/sha256-bcc511d82482900604524a8e8d64bf4c53b2461868dac55f4d04d660e61983cb?context=explore), but I am not sure whether it can yield better results.
Hi @lobaorn, I updated my previous comment with correct reports using the Docker image version; it seems better.
Yes, now it makes more sense. The question is whether the native image without optimization is better, or whether the JIT is better. But I guess you will check if it makes sense. Also, I am not sure if, when --pgo is passed, it also passes -march=native by default. Since I saw you are on linux-amd64 and the Docker image will be the same, it could possibly benefit from -march=native. And I am not sure if you enabled virtual threads, since there is no new commit. In another direction, just to vent ideas, one thing I will try by the end of the month is to use Unix sockets for the load balancing in nginx and make Spring Boot (with Tomcat or another server) listen on one:
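The nginx side of that idea could look roughly like this (a sketch only; socket paths and the number of backends are assumptions):

```nginx
# Sketch: nginx load-balancing over Unix domain sockets instead of TCP ports
upstream app_backend {
    server unix:/run/app1.sock;
    server unix:/run/app2.sock;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;
    }
}
```

The appeal is skipping the TCP loopback stack entirely for proxy-to-app traffic; the open question is whether the embedded server can be configured to bind to a Unix socket.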
Hi @lobaorn, actually virtual threads are enabled in docker-compose.yaml. I just tried both cases, and yes, with virtual threads enabled the results are much better.
tbh I'm not familiar with Gatling and I'm still confused; I'm not sure if the PGO approach is better than the previous one without it. Look at these few different runs.
The point is that the requests always arrive in the same cadence, so what you need to check is the value in each column, which is the response time. The higher the number, the worse it is. So, by all the images, PGO is working and virtual threads are also giving benefits.

From the other issue opened, it would probably be a good idea to remove the actuator, which may be adding a lot of overhead. You could inspect your native files (by dump) using the GraalVM Dashboard: https://www.graalvm.org/latest/reference-manual/native-image/guides/use-graalvm-dashboard/ https://www.graalvm.org/dashboard/ That way you can better understand, by the size of things, how much can be trimmed by using more or fewer dependencies.

And as I said previously, perhaps the G1 GC is not the best option for this context, and perhaps the SerialGC could do better; but that's just a matter of trying to really know. At least now we know that PGO is working as expected, @alina-yur. I am also not sure whether giving an even bigger .iprof file to PGO could make it better, @rodrigorodrigues. Not by making the Gatling scenario bigger; I would actually keep it the same as the original file, but just run it multiple times.
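To settle the G1 vs Serial question empirically, both variants could be built from the same profile and benchmarked under the identical Gatling scenario; a sketch (GC values per the --gc option, paths are placeholders):

```shell
# Build two otherwise-identical optimized executables, differing only in GC
native-image --pgo=default.iprof --gc=G1     -jar target/app.jar app-g1
native-image --pgo=default.iprof --gc=serial -jar target/app.jar app-serial
# Run the same Gatling scenario against each and compare response-time columns
```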
And there is also a Slack channel for native-image discussions; perhaps people there would be interested in this type of discussion/challenge: https://graalvm.slack.com/archives/CN9KSFB40
Hey @rodrigorodrigues, I found a great tutorial on how to make static and mostly-static Spring Boot native images as well. It follows the same guidance as the official docs, but with all the necessary steps, even the Dockerfile. The official docs are: The tutorial I found is:

With that configuration, plus -march=native and analysis of the dump with the GraalVM Dashboard, you would probably get the last drop of performance out of the application without changes to the code itself. I guess that if you ran the original Gatling tests multiple times, all the runs after the first will show wrong answers and numbers, but that won't matter, since the important part is to really give the best profile to the running app without modifications. If you could do 3-4 consecutive runs, and build it as a fully static optimized native image, I think it would give the best result possible. The doubt remains about G1GC vs SerialGC, and which other heap configurations to apply.

Also, another thing to try is --initialize-at-build-time, so that everything is initialized at build time instead of at runtime. If it builds and runs without any hiccup, even better.

Another thing I found while looking for optimizations of Spring Boot with Native Image is this wiki: https://github.com/spring-projects/spring-boot/wiki/Spring-Boot-with-GraalVM Especially the part about Tomcat: Using tomcat-embed-programmatic. Then there are the necessary POM changes for the proper exclusion and inclusion.

If I find anything else, I will share. I will probably be doing everything I am listing here later this month; my personal machine is unavailable and this one is not suitable for development... but the research continues :)
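The tomcat-embed-programmatic swap mentioned above could be sketched as the following POM change (exact coordinates and versions should be checked against that wiki page; this is an illustration, not the thread's actual configuration):

```xml
<!-- Exclude the default embedded Tomcat pulled in by the web starter -->
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.apache.tomcat.embed</groupId>
      <artifactId>tomcat-embed-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.tomcat.embed</groupId>
      <artifactId>tomcat-embed-websocket</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Add the slimmer programmatic variant instead -->
<dependency>
  <groupId>org.apache.tomcat.experimental</groupId>
  <artifactId>tomcat-embed-programmatic</artifactId>
  <version>${tomcat.version}</version>
</dependency>
```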
Hi @lobaorn, @alina-yur, I've finished this task. In the end I decided to create a new version without a framework to see if it can go faster, and the results look good. Source code is here. I created 2 different profiles, same as Alina's demo. Thanks for all your support. Closing.
The idea is to do a build with instrumentation, run Gatling against the instrumented application, then build again using the instrumentation data for the optimized version, and finally put the optimized version on your Docker Hub for the competition. Basically that.
Links: