[TTAHUB-1037] Identify and fix memory leak issue #1050

thewatermethod · 2022-09-29T13:08:50Z

Description of change

Proposed fix

The backend tests are now run in batches, one per file folder, via a bash script. For now I am electing not to use test parallelization on CircleCI, since that would not let you run the tests locally.

With this method, each folder is tested for coverage individually, meaning that the coverage thresholds had to essentially be lowered into oblivion to get the CI to pass. This change would be the first of a few to iron out this issue; I would propose a mad rush on backend tests after the goals feature.

A full coverage report is not automatically generated. Instead, an lcov.info file is generated for each folder, and I added a yarn command that runs a different bash script to concatenate and generate the HTML view.
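To illustrate the concatenation step, here is a minimal Node sketch of the same idea. The script in this PR is bash, so this is not the actual implementation. The `coverage/<folder>/lcov.info` layout and the `combineLcov` name are assumptions made for the example; the combined file would then be passed to genhtml in a separate step.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed layout (hypothetical): <coverageRoot>/<folder>/lcov.info, one report per test batch.
function combineLcov(coverageRoot, outputFile) {
  const parts = fs
    .readdirSync(coverageRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(coverageRoot, entry.name, 'lcov.info'))
    .filter((file) => fs.existsSync(file))
    .sort()
    .map((file) => fs.readFileSync(file, 'utf8').trimEnd());

  // Write the per-folder reports back to back, newline-terminated.
  fs.writeFileSync(outputFile, parts.length ? `${parts.join('\n')}\n` : '');

  // Return how many per-folder reports were combined.
  return parts.length;
}
```

Calling `combineLcov('coverage', 'coverage/lcov.info')` would gather every per-folder report it finds and skip folders that did not produce one.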

Issue History

You can see the error in the CI output.

These could be related to the issue we are suffering from:

You can watch the tests gobble increasing amounts of memory by running the tests with --logHeapUsage
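For a rough picture of what that kind of heap logging reveals, the sketch below samples `process.memoryUsage().heapUsed` after each of two tasks. It is not Jest's implementation; `heapUsedMb`, `trackHeap`, and the two sample tasks are hypothetical names invented for the example.

```javascript
// Report current heap usage in megabytes, rounded to one decimal place.
function heapUsedMb() {
  return Math.round((process.memoryUsage().heapUsed / 1024 / 1024) * 10) / 10;
}

// Run named tasks in order and record the heap size after each one.
function trackHeap(tasks) {
  return tasks.map(({ name, run }) => {
    run();
    // global.gc only exists when node is started with --expose-gc.
    if (typeof global.gc === 'function') global.gc();
    return { name, heapMb: heapUsedMb() };
  });
}

// 'leaky' keeps a reference to its allocation; 'clean' lets it go out of scope.
const retained = [];
const samples = trackHeap([
  { name: 'leaky', run: () => retained.push(new Array(1e6).fill(0)) },
  { name: 'clean', run: () => new Array(1e6).fill(0) },
]);

samples.forEach(({ name, heapMb }) => console.log(`${name}: ${heapMb} MB heap`));
```

A heap figure that keeps climbing from one test file to the next, even after garbage collection, is the pattern described above.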

We encountered a similar issue on the frontend a few months ago. That issue was solved by running node with --expose-gc, and switching the frontend tests to run in silent mode. This does not seem to be sufficient to solve the issue on the backend.

I tried investigating memory leaks by installing weak-napi and running Jest with the --detectLeaks flag. Unfortunately, all this told me was that most of our tests were leaking. It was unclear what was causing the leaks, and I'm not sure how to proceed with that information.
I tried updating Jest to version 29, but I saw no improvement in memory usage.
I tried removing --runInBand, as I read this would force GC after each test, but that doesn't seem to be the case.
I ran a memory snapshot, and if I'm reading it correctly (which is not guaranteed at all) the largest resource usage seems to be Babel-transpiled Node-native code stored as strings. This made me think of this ticket. While dropping Babel could help, I'm not sure that it would actually solve this problem, and the conversion would be fairly laborious.
Memory snapshot (load into the Memory tab in Chrome DevTools)
I've also found that removing --coverage and the JUnit reports from Jest allows the tests to run, but that's not desirable.

How to test

Issue(s)

https://ocio-jira.acf.hhs.gov/browse/TTAHUB-1037

Checklists

Every PR

Meets issue criteria
JIRA ticket status updated
Code is meaningfully tested
Meets accessibility standards (WCAG 2.1 Levels A, AA)
API Documentation updated
Boundary diagram updated
Logical Data Model updated
Architectural Decision Records written for major infrastructure decisions

Production Deploy

Staging smoke test completed

After merge/deploy

Update JIRA ticket status

thewatermethod · 2022-09-30T19:40:07Z

src/routes/activityReports/handlers.test.js

@@ -66,6 +66,9 @@ jest.mock('../../services/activityReports', () => ({
 jest.mock('../../services/objectives', () => ({
  saveObjectivesForReport: jest.fn(),
  getObjectivesByReportId: jest.fn(),
+}));
+
+jest.mock('../../services/userSettings', () => ({


I think this got gunked up when we merged

thewatermethod · 2022-09-30T19:40:17Z

src/models/tests/auditModelGenerator.test.js

-            { name: 'ZALNoUpdateFTests' },
-            { name: 'ZALTruncateFTests' },
-          ]);
+        const routineNames = routines.map((routine) => routine.name);


trying to make this test less fragile

thewatermethod · 2022-09-30T19:40:34Z

src/scopes/goals/index.test.js

@@ -575,7 +575,7 @@ describe('goal filtersToScopes', () => {
      });

      expect(found.length).toBe(6);
-      expect(found[0].name).toContain('Goal 1');
+      expect(found.map((f) => f.name)).toContain('Goal 1');


again, this test is flaky so I'm trying to fix that


thewatermethod · 2022-10-05T19:20:11Z

src/widgets/topicFrequencyGraph.test.js

@@ -103,8 +103,22 @@ describe('Topics and frequency graph widget', () => {
      mockUserThree,
    ]);

-    const grantsSpecialist = await Role.findOne({ where: { fullName: 'Grants Specialist' } });
-    const systemSpecialist = await Role.findOne({ where: { fullName: 'System Specialist' } });
+    const [grantsSpecialist] = await Role.findOrCreate({


It seemed like something was getting deleted if the tests were run in folders, so I changed this to make it a bit more error-proof as well.
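The array destructuring in the new line follows from the shape that a find-or-create call returns: a pair of the record and a flag saying whether it was just created. The sketch below imitates that shape with an in-memory `Map`. It does not use Sequelize, and `findOrCreateRole` and `demo` are hypothetical helpers written for the example.

```javascript
// In-memory stand-in for a table; not Sequelize, just the same return shape.
const roles = new Map();

// Resolve to [record, created]: reuse the existing row, or insert one if it is missing.
async function findOrCreateRole(fullName) {
  if (roles.has(fullName)) return [roles.get(fullName), false];
  const role = { id: roles.size + 1, fullName };
  roles.set(fullName, role);
  return [role, true];
}

async function demo() {
  // The first call creates the row; the second call finds it again.
  const [first, createdFirst] = await findOrCreateRole('Grants Specialist');
  const [second, createdSecond] = await findOrCreateRole('Grants Specialist');
  return { sameRecord: first === second, createdFirst, createdSecond };
}
```

Because the lookup no longer assumes the row already exists, the test setup still works when an earlier batch has removed it.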

thewatermethod · 2022-10-05T19:20:47Z

package.json

-        "statements": 85,
-        "functions": 70,
-        "branches": 70,
-        "lines": 85


This is obviously not ideal, but I'm not sure this PR should involve bringing every folder up to a higher test threshold.

I think I get why these new thresholds were chosen (9 separate folders, 85/9 = ~10). Does this change mean that each separate test run now only requires 10% branch coverage to be considered a success?

That's giving me a great deal more credit than I probably deserve - I just lowered them until they'd pass. I just moved them back up to about the highest possible values right now. And yes, you are correct, each test/folder only needs to clear that threshold specified in this config.
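To make the per-folder behavior concrete, here is a small sketch of how one shared threshold interacts with separate coverage runs. The folder names and percentages are invented for illustration, and `failingFolders` and `highestPassingThreshold` are hypothetical helpers, not part of Jest.

```javascript
// Hypothetical per-folder line coverage percentages; the numbers are invented.
const folderCoverage = { models: 92, routes: 81, widgets: 34 };

// Return the folders whose coverage falls below the shared threshold.
function failingFolders(coverage, threshold) {
  return Object.entries(coverage)
    .filter(([, pct]) => pct < threshold)
    .map(([folder]) => folder);
}

// The highest shared threshold that every folder still passes is the minimum coverage.
function highestPassingThreshold(coverage) {
  return Math.min(...Object.values(coverage));
}

console.log(failingFolders(folderCoverage, 85));
console.log(highestPassingThreshold(folderCoverage));
```

Under these assumptions, the least-covered folder alone decides how high the shared threshold can go, which is why the old thresholds could not survive the switch to per-folder runs.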

thewatermethod · 2022-10-05T19:21:15Z

package.json

@@ -184,7 +203,7 @@
    "puppeteer": "^13.1.1",
    "puppeteer-select": "^1.0.3",
    "redoc-cli": "^0.13.2",
-    "selenium-webdriver": "4.0.0",
+    "selenium-webdriver": "4.3.0",
    "supertest": "^6.1.3"


dequelabs/axe-core-npm#538

nvms · 2022-10-05T21:12:51Z

bin/run-tests

+      log "Running backend tests"
+
+      # remove existing coverage folder, since we are changing how things are structured
+      rm -f -rf coverage


can omit the -f

nvms

had a Q re: the new coverage thresholds, otherwise lg


kryswisnaskas · 2022-10-06T18:08:25Z

Nice! 👍 We have this improved RAM usage now 🎉 :

Couple of questions/comments:

Is there a way to increase the test coverage from 10% to the highest supported now?
Could we enter a ticket to handle the Windows issues (e.g. we are unable to run the ./bin/test-backend-ci locally on Windows) ?

thewatermethod · 2022-10-06T18:28:51Z

Is there a way to increase the test coverage from 10% to the highest supported now?

Yeah, I did that here... if I'm understanding what you mean

Could we enter a ticket to handle the Windows issues (e.g. we are unable to run the ./bin/test-backend-ci locally on Windows) ?

Ticket here

Merge branch 'kw-update-node' into update-node-on-redesign

613c42f

thewatermethod changed the base branch from main to main-ar-redesign September 29, 2022 13:09

thewatermethod added 4 commits September 29, 2022 09:20

Merge branch 'main-ar-redesign' into update-node-on-redesign

3e8304b

fix some tests and log heap usage

11280e8

expose gc

6a6cb70

Merge branch 'main-ar-redesign' into update-node-on-redesign

0f7d1ae

thewatermethod changed the title ~~Update node on redesign~~ Identify and fix memory leak issue Sep 29, 2022

thewatermethod changed the title ~~Identify and fix memory leak issue~~ [TTAHUB-1050] Identify and fix memory leak issue Sep 29, 2022

thewatermethod changed the title ~~[TTAHUB-1050] Identify and fix memory leak issue~~ [TTAHUB-1037] Identify and fix memory leak issue Sep 29, 2022

thewatermethod added 8 commits September 30, 2022 12:07

Initial pass, still have issues to deal with

309a505

run backend ci test from shell script

7f7485b

tweak stuff to see if we can get a deploy going

8a08f35

adjust coverage thresholds for now

096f3be

save coverage to a different place

4b2a699

Merge branch 'main' into update-node-on-redesign

666429a

dial down that coverage to a much less strict 10

2207ed4

cleanup my low quality bash

9b6f42a

thewatermethod commented Sep 30, 2022


GarrettEHill reviewed Sep 30, 2022

bin/test-backend-ci (outdated, resolved)

thewatermethod and others added 9 commits September 30, 2022 16:14

Update bin/test-backend-ci

4798df5

Co-authored-by: GarrettEHill <garretthill@gmail.com>

refactor and get genhtml

c62c001

simplify collecting coverage reports

dd8a9d9

try to re-pin selenium and axe

0575b37

Merge branch 'main' into update-node-on-redesign

a509c25

Merge branch 'main' into main-ar-redesign

d8997f8

Merge branch 'main-ar-redesign' into update-node-on-redesign

25c5a78

update readme

6d4b123

try a tweak

750cf39

thewatermethod commented Oct 5, 2022


thewatermethod marked this pull request as ready for review October 5, 2022 19:21

thewatermethod requested review from GarrettEHill, AdamAdHocTeam, kryswisnaskas, nvms and hardwarehuman October 5, 2022 19:21

thewatermethod added 4 commits October 5, 2022 15:28

update package.json

8c72a53

attempt to store coverage

e19e995

install lcov on executor machine

ca2a9e5

save root coverage in root folder

ba77179

nvms reviewed Oct 5, 2022


nvms approved these changes Oct 5, 2022


thewatermethod and others added 7 commits October 5, 2022 20:02

Update .circleci/config.yml

97c5fdf

Co-authored-by: Jonathan Pyers <pyersjonathan@gmail.com>

fix YAML

056f758

forget about lcov for now

5efdd67

Merge pull request #1059 from HHS/store-coverage

149185c

attempt to store coverage

adjust coverage to maxiumum

e2a1ec2

add lcov to docker image and use correct test result key

2ee7033

create separate junit reports for each suite

48bb1ef

thewatermethod added 2 commits October 6, 2022 14:31

use updated/supported postgres executor image

b83f226

revert for now

dd6a4b3

kryswisnaskas approved these changes Oct 10, 2022


Merge branch 'main-ar-redesign' into update-node-on-redesign

9ab7a34

kryswisnaskas merged commit da5095f into main-ar-redesign Oct 10, 2022

kryswisnaskas deleted the update-node-on-redesign branch October 10, 2022 13:25
