Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TTAHUB-1037] Identify and fix memory leak issue #1050

Merged
merged 42 commits into from Oct 10, 2022

Conversation

thewatermethod
Copy link
Collaborator

@thewatermethod thewatermethod commented Sep 29, 2022

Description of change

Proposed fix

The backend tests are now run in batches by file folder via a bash script. Electing for now not to run the tests with parallelization on CircleCI since that would not let you run the tests locally.

With this method, each folder is tested for coverage individually, meaning that the coverage thresholds had to essentially be lowered into oblivion to get the CI to pass. This change would be the first of a few to iron out this issue, I would propose a mad rush on backend tests after the goals feature.

A full coverage report is not automatically generated. Instead, an lcov.info file is generated for each folder, and I added a yarn command that runs a different bash script to concatenate and generate the HTML view.

Issue History

You can see the error in the CI output.

These could be related to the issue we are suffering from:

You can watch the tests gobble increasing amounts of memory by running the tests with --logHeapUsage

We encountered a similar issue on the frontend a few months ago. That issue was solved by running node with --expose-gc, and switching the frontend tests to run in silent mode. This does not seem to be sufficient to solve the issue on the backend.

  • I tried investigating memory leaks by installing weak-napi and running jest with the detectLeaks flag. Unfortunately, all this told me was that most of our tests were leaking. It was unclear what was causing this, and I'm not sure how to proceed with that information.

  • I tried updating Jest to version 29, but I saw no improvement in memory usage.

  • I tried removing --runInBand, as I read this would force GC after each test but that doesn't seem to be the case.

  • I ran a memory snapshot, and if I'm reading it correctly (which is not guaranteed at all) the largest resource usage seems to be babel'fied Node-native code stored as strings. This made me think of this ticket. While dropping babel could help, I'm not sure that it would actually solve this problem, and the conversion would be fairly laborious.

  • Memory snapshot (load into the memory tab on chrome dev tools)

  • I've also found that removing the --coverage and the junit reports from jest allows the tests to run, but that's not desirable.

How to test

Issue(s)

Checklists

Every PR

  • Meets issue criteria
  • JIRA ticket status updated
  • Code is meaningfully tested
  • Meets accessibility standards (WCAG 2.1 Levels A, AA)
  • API Documentation updated
  • Boundary diagram updated
  • Logical Data Model updated
  • Architectural Decision Records written for major infrastructure decisions

Production Deploy

  • Staging smoke test completed

After merge/deploy

  • Update JIRA ticket status

@thewatermethod thewatermethod changed the base branch from main to main-ar-redesign September 29, 2022 13:09
@thewatermethod thewatermethod changed the title Update node on redesign Identify and fix memory leak issue Sep 29, 2022
@thewatermethod thewatermethod changed the title Identify and fix memory leak issue [TTAHUB-1050] Identify and fix memory leak issue Sep 29, 2022
@thewatermethod thewatermethod changed the title [TTAHUB-1050] Identify and fix memory leak issue [TTAHUB-1037] Identify and fix memory leak issue Sep 29, 2022
@@ -66,6 +66,9 @@ jest.mock('../../services/activityReports', () => ({
jest.mock('../../services/objectives', () => ({
saveObjectivesForReport: jest.fn(),
getObjectivesByReportId: jest.fn(),
}));

jest.mock('../../services/userSettings', () => ({
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this got gunked up when we merged

{ name: 'ZALNoUpdateFTests' },
{ name: 'ZALTruncateFTests' },
]);
const routineNames = routines.map((routine) => routine.name);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to make this test less fragile

@@ -575,7 +575,7 @@ describe('goal filtersToScopes', () => {
});

expect(found.length).toBe(6);
expect(found[0].name).toContain('Goal 1');
expect(found.map((f) => f.name)).toContain('Goal 1');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this test is flaky so I'm trying to fix that

bin/test-backend-ci Outdated Show resolved Hide resolved
@@ -103,8 +103,22 @@ describe('Topics and frequency graph widget', () => {
mockUserThree,
]);

const grantsSpecialist = await Role.findOne({ where: { fullName: 'Grants Specialist' } });
const systemSpecialist = await Role.findOne({ where: { fullName: 'System Specialist' } });
const [grantsSpecialist] = await Role.findOrCreate({
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed like something was getting deleted if the tests were run in folders, so I changed this to make it a bit more error-proof as well.

"statements": 85,
"functions": 70,
"branches": 70,
"lines": 85
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obviously not ideal, but I'm not sure this PR should involve bringing every folder up to a higher test threshold.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get why these new thresholds were chosen (9 sep. folders, 85/9 = ~10). does this change mean that each sep. test run now only requires 10% of branch coverage to be considered a success?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's giving me a great deal more credit than I probably deserve - I just lowered them until they'd pass. I just moved them back up to about the highest possible values right now. And yes, you are correct, each test/folder only needs to clear that threshold specified in this config.

@@ -184,7 +203,7 @@
"puppeteer": "^13.1.1",
"puppeteer-select": "^1.0.3",
"redoc-cli": "^0.13.2",
"selenium-webdriver": "4.0.0",
"selenium-webdriver": "4.3.0",
"supertest": "^6.1.3"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bin/run-tests Outdated
log "Running backend tests"

# remove existing coverage folder, since we are changing how things are structured
rm -f -rf coverage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can omit the -f

Copy link
Collaborator

@nvms nvms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had a Q re: the new coverage thresholds, otherwise lg

@kryswisnaskas
Copy link
Collaborator

Nice! 👍 We have this improved RAM usage now 🎉 :

image

Couple of questions/comments:

Is there a way to increase the test coverage from 10% to the highest supported now?
Could we enter a ticket to handle the Windows issues (e.g. we are unable to run the ./bin/test-backend-ci locally on Windows) ?

@thewatermethod
Copy link
Collaborator Author

Is there a way to increase the test coverage from 10% to the highest supported now?

Yeah, I did that here... if I'm understanding what you mean

Could we enter a ticket to handle the Windows issues (e.g. we are unable to run the ./bin/test-backend-ci locally on Windows) ?

Ticket here

@kryswisnaskas kryswisnaskas merged commit da5095f into main-ar-redesign Oct 10, 2022
@kryswisnaskas kryswisnaskas deleted the update-node-on-redesign branch October 10, 2022 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants