Investigate aarch64 app startup/time to serve first HTTP request #476

Open
Karm opened this issue Jan 23, 2023 · 3 comments
Labels
enhancement New feature or request not-Stale

Comments

@Karm
Collaborator

Karm commented Jan 23, 2023

Weird perf results on our fast baremetal boxes:

https://ci.modcluster.io/view/Mandrel/job/mandrel-linux-integration-tests/955/JDK_RELEASE=ga,JDK_VERSION=11,LABEL=el8_aarch64,MANDREL_BUILD=mandrel-21-3-linux-build-matrix,QUARKUS_VERSION=2.7.6.Final/

https://ci.modcluster.io/view/Mandrel/job/mandrel-linux-integration-tests/955/JDK_RELEASE=ga,JDK_VERSION=17,LABEL=el8_aarch64,MANDREL_BUILD=mandrel-22-3-linux-build-matrix,QUARKUS_VERSION=2.13.6.Final/

It looks like it takes both Quarkus and Helidon a long time to start?

It is profoundly unexpected as the dummy startup time threshold was calibrated on a slow VM. It should be fine by a huge margin on a fast baremetal system that has nothing else to do...

@Karm Karm added the enhancement New feature or request label Jan 23, 2023
@Karm Karm self-assigned this Jan 23, 2023
@Karm Karm changed the title Investigate aarch64 native app startup / time to serve first HTTP request aarch64 native app startup / time to serve first HTTP request Jan 23, 2023
@Karm Karm changed the title aarch64 native app startup / time to serve first HTTP request Invetigate aarch64 app startup/time to serve first HTTP request Jan 23, 2023
@jerboaa
Collaborator

jerboaa commented Jan 26, 2023

For posterity, fails with (for 22.3):

08:35:25 Finished generating 'target/debug-symbols-smoke' in 19.8s.
08:35:36 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 189.234 s - in org.graalvm.tests.integration.DebugSymbolsTest
08:35:37 [INFO] 
08:35:37 [INFO] Results:
08:35:37 [INFO] 
08:35:37 [ERROR] Failures: 
08:35:37 [ERROR]   RuntimesSmokeTest.helidonQuickStart:224->testRuntime:172 Application HELIDON_QUICKSTART_SE took 176 ms to get the first OK request, which is over 100 ms threshold by 76%. ==> expected: <true> but was: <false>
08:35:37 [INFO] 
08:35:37 [ERROR] Tests run: 19, Failures: 1, Errors: 0, Skipped: 7
08:35:37 [INFO] 
08:35:37 [INFO] ------------------------------------------------------------------------
08:35:37 [INFO] Reactor Summary for Native image integration TS 1.0.0-SNAPSHOT:

The 21.3 failure is:

08:49:47 [INFO] Results:
08:49:47 [INFO] 
08:49:47 [ERROR] Failures: 
08:49:47 [ERROR]   RuntimesSmokeTest.quarkusFullMicroProfile:201->testRuntime:172 Application QUARKUS_FULL_MICROPROFILE took 319 ms to get the first OK request, which is over 300 ms threshold by 6%. ==> expected: <true> but was: <false>
08:49:47 [INFO] 
08:49:47 [ERROR] Tests run: 19, Failures: 1, Errors: 0, Skipped: 7
08:49:47 [INFO] 
08:49:47 [INFO] ------------------------------------------------------------------------
08:49:47 [INFO] Reactor Summary for Native image integration TS 1.0.0-SNAPSHOT:
08:49:47 [INFO] 
08:49:47 [INFO] Native image integration TS ........................ SUCCESS [  0.102 s]
08:49:47 [INFO] testsuite .......................................... FAILURE [13:53 min]
08:49:47 [INFO] ------------------------------------------------------------------------
08:49:47 [INFO] BUILD FAILURE
08:49:47 [INFO] ------------------------------------------------------------------------
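The percentages in the two failure messages above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `overshoot_percent` is hypothetical, and truncating to a whole percent is an assumption that happens to match both logged values, not necessarily how the testsuite computes it.

```python
def overshoot_percent(measured_ms: int, threshold_ms: int) -> int:
    """Whole-percent amount by which measured_ms exceeds threshold_ms.

    Assumption: the result is truncated to an integer, which matches
    the 76% and 6% figures in the logs above.
    """
    return (measured_ms - threshold_ms) * 100 // threshold_ms


# Values taken from the two failure messages in this comment.
for app, measured, threshold in [
    ("HELIDON_QUICKSTART_SE", 176, 100),
    ("QUARKUS_FULL_MICROPROFILE", 319, 300),
]:
    pct = overshoot_percent(measured, threshold)
    print(f"{app}: {measured} ms is over the {threshold} ms threshold by {pct}%")
```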

Both run on RHEL 8, which AFAIK has a 64k page size by default on aarch64. We ought to run it on RHEL 9 as well to see if there is a difference. RHEL 9 has a default of 4k on aarch64. This might explain some of the startup differences.
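To compare runs across the two systems, it may help to record the page size next to each measurement. A minimal sketch, assuming a Linux host with Python available; `current_page_size` and `page_size_label` are hypothetical helper names, while `os.sysconf("SC_PAGE_SIZE")` is the standard library call that reports the same value as `getconf PAGE_SIZE`.

```python
import os


def current_page_size() -> int:
    """Return the kernel page size in bytes (same value as `getconf PAGE_SIZE`)."""
    return os.sysconf("SC_PAGE_SIZE")


def page_size_label(size_bytes: int) -> str:
    """Render a page size as a short label such as '4k' or '64k'."""
    return f"{size_bytes // 1024}k"


if __name__ == "__main__":
    size = current_page_size()
    print(f"page size: {size} bytes ({page_size_label(size)})")
```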

@Karm
Collaborator Author

Karm commented Jan 30, 2023

@jerboaa It is definitely the case. RHEL 9 (getconf PAGE_SIZE reports 4096) uses about 93% more RSS than the threshold allows (not a typo), while RHEL 8 (65536 page size) is ~10% over the startup time threshold.
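For anyone who wants to repeat the RSS comparison by hand, one option on Linux is to read the `VmRSS` line from `/proc/<pid>/status`. The sketch below only parses that text; the helper name `parse_vm_rss_kb` and the sample input are hypothetical, and this is not necessarily how the testsuite takes its measurement.

```python
def parse_vm_rss_kb(status_text: str) -> int:
    """Extract the resident set size in kB from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     51200 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


if __name__ == "__main__":
    # Reads the status of the current process; replace 'self' with a PID
    # to inspect another process.
    with open("/proc/self/status") as f:
        print(f"VmRSS: {parse_vm_rss_kb(f.read())} kB")
```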

I'm reading https://www.kernel.org/doc/html/latest/arm64/memory.html ... it seems counter-intuitive to me that smaller pages would cause more fragmentation and more RSS.

It's this situation though:

When the guest is 4k and the host is 64k, it only works if the guest reports
multiple contiguous 4k pages that form a 64k page -- which is often the case,
but not always. The host will discard a whole 64k page once it collected all 4k pages.
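The rule in the quote can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's or hypervisor's implementation: guest pages are identified by hypothetical 4k page frame numbers, and a 64k host page counts as discardable only when all 16 guest pages that make it up have been reported.

```python
GUEST_PAGE = 4 * 1024
HOST_PAGE = 64 * 1024
GUEST_PER_HOST = HOST_PAGE // GUEST_PAGE  # 16 guest pages per host page


def discardable_host_pages(reported_guest_pfns):
    """Return the host page numbers whose 16 guest pages were all reported."""
    reported = set(reported_guest_pfns)
    candidates = {pfn // GUEST_PER_HOST for pfn in reported}
    return sorted(
        host
        for host in candidates
        if all(host * GUEST_PER_HOST + i in reported for i in range(GUEST_PER_HOST))
    )


# Guest pages 0..15 fill host page 0 completely; 16..30 leave host page 1
# one guest page short, so only host page 0 can be discarded.
print(discardable_host_pages(range(31)))
```

In this model a single unreported 4k page keeps the whole 64k host page resident, which is the "often the case, but not always" caveat from the quote.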

It is obviously not specific to Quarkus Native. I'd like to narrow it down to some eloquent recommendation we could put in writing on https://quarkus.io/guides/native-reference.

The host is CentOS 8 and the guest is CentOS 9. I wonder whether I should move to CentOS 9 altogether...

@github-actions

github-actions bot commented Mar 2, 2023

This issue appears to be stale because it has been open 30 days with no activity. This issue will be closed in 7 days unless Stale label is removed, a new comment is made, or not-Stale label is added.

@github-actions github-actions bot added the Stale label Mar 2, 2023
@zakkak zakkak added not-Stale and removed Stale labels Mar 2, 2023
@jerboaa jerboaa changed the title Invetigate aarch64 app startup/time to serve first HTTP request Investigate aarch64 app startup/time to serve first HTTP request Mar 3, 2023