Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run tests in parallel #509

Merged
merged 12 commits into from
Feb 14, 2024
Merged

run tests in parallel #509

merged 12 commits into from
Feb 14, 2024

Conversation

Yaiba
Copy link
Contributor

@Yaiba Yaiba commented Feb 14, 2024

This pr support run tests in parallel, and CI use this mode by default, but not enabled in task test:xx since it give us better logging.
task test:it:nb -- -parallel-mode finished in 90s on my machine.

Things changed:

  • remove Teardown helper function, properly utilize t.Cleanup
  • a dev docker-compose file for acceptance test(for task dev:up)
  • a dev docker-compose file for integration test(for running locally using docker-compose directly)
  • a new docker network will be created for every integration test
  • a docker-compose template file is used to dynamically create docker-compose.yml file using new docker network
  • a new flag -parallel-mode to run test in parallel mode
  • remove .env file manipulation

Notice:

  • kgw tests cannot run in parallel mode, because we need to pre-configre domain. It takes around 100s to finish on my machine
  • kgw tests will be run in CI

}

//nolint:staticcheck
n, err := testcontainers.GenericNetwork(ctx, testcontainers.GenericNetworkRequest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says GenericNetworkRequest is going to be removed in the future. Why don't we use network.New as the deprecation message says?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that docs (for deprecated) doesn't seems right, this whole function is basic network.New, only it support specify a network name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OIC your comment now. That's weird. I guess we can go this way, but it does seem like a random network name would be fine because we can get it after creating with localNetwork.Name. NBD. If these methods get removed in the future we can rewrite then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, I forget now i'm create a new network for each test. I did this originally because I need to re-use the same network. I'll remove this function

r.t.Logf("teardown docker compose")
dc.Down(ctx, compose.RemoveOrphans(true), compose.RemoveImagesLocal)
})

r.t.Cleanup(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you condition this on if !r.cfg.NoCleanup {?

Getting this running directly with Docker Daemon (not Desktop) was pretty frustrating and I had to use the -messy flag to see what was going on.

Also, can we have nodecfg write a log file in addition to stdout so that if the container nukes we at least can see the log file?

i.e.

diff --git a/cmd/kwil-admin/nodecfg/generate.go b/cmd/kwil-admin/nodecfg/generate.go
index bec1d4f5..09f0e71a 100644
--- a/cmd/kwil-admin/nodecfg/generate.go
+++ b/cmd/kwil-admin/nodecfg/generate.go
@@ -167,6 +167,7 @@ func GenerateNodeFiles(outputDir string, originalCfg *config.KwildConfig, silenc
                return nil, fmt.Errorf("failed to get absolute path for output directory: %w", err)
        }
        cfg.RootDir = rootDir
+       cfg.Logging.OutputPaths = append(cfg.Logging.OutputPaths, "kwild.log")
 
        cometbftRoot := filepath.Join(outputDir, abciDir)
 

I don't think writing a log file would be a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, I'll add it.

@jchappelow
Copy link
Contributor

I frequently get failures with TestKwildEthDepositOracleIntegration. It looks like the wait strategy and/or healthchecks for the pg services are not working in that one:

kwild version 0.7.0-pre+ (Go version go1.22.0)
Root directory "/app/kwil"
Private key path: /app/kwil/private_key
Error: panic while building kwild: kwild database open failed: failed to connect to `host=pg1 user=kwild database=kwild`: dial error (dial tcp 192.168.64.8:5432: connect: connection refused)

stack:

goroutine 1 [running]:
github.com/kwilteam/kwil-db/cmd/kwild/server.New.func1()
	/app/cmd/kwild/server/server.go:66 +0xa5
panic({0x15185c0?, 0xc0004a9280?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/kwilteam/kwil-db/cmd/kwild/server.failBuild({0x1b28c40?, 0xc0005160f0?}, {0x181b27b, 0x1a})

@jchappelow
Copy link
Contributor

@charithabandi I think 6772e5d is breaking things somehow. Even on main I get intermittent failures on nodeX startup with the oracle tests

@Yaiba
Copy link
Contributor Author

Yaiba commented Feb 14, 2024

I frequently get failures with TestKwildEthDepositOracleIntegration. It looks like the wait strategy and/or healthchecks for the pg services are not working in that one:

kwild version 0.7.0-pre+ (Go version go1.22.0)
Root directory "/app/kwil"
Private key path: /app/kwil/private_key
Error: panic while building kwild: kwild database open failed: failed to connect to `host=pg1 user=kwild database=kwild`: dial error (dial tcp 192.168.64.8:5432: connect: connection refused)

stack:

goroutine 1 [running]:
github.com/kwilteam/kwil-db/cmd/kwild/server.New.func1()
	/app/cmd/kwild/server/server.go:66 +0xa5
panic({0x15185c0?, 0xc0004a9280?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/kwilteam/kwil-db/cmd/kwild/server.failBuild({0x1b28c40?, 0xc0005160f0?}, {0x181b27b, 0x1a})

did you rebuild docker? I don't get this on my machine.

@jchappelow
Copy link
Contributor

My machine was in a funny state, but the error has to do with a pg service health check. When a container running these postgres images starts up, it runs an init script after which it restarts the postgres process. Before the restart, pg_isready can indicate all ready and we get a false start as kwild then tries to connect while the postgres service is shut down. It's like what's described in here.

I'm trying the following:

     healthcheck:
-      test: ["CMD-SHELL", "pg_isready"]
+      test: ["CMD-SHELL", "pg_isready", "-d", "kwild", "-U", "kwild"]

This makes it connect to the "kwild" database using the "kwild" role, both of which are created by the init script. It can still get a false start, but it's harder since the postgres process restarts immediately after creating the kwild database, so there's a much briefer window.

For whatever reason, I don't see this false start behavior with Docker Desktop, only when using the Docker Daemon/Engine directly

Copy link
Contributor

@jchappelow jchappelow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It's really hitting some edge cases with pg service initialization on my machine, but that's not a strong reason to hold this up. Please merge when you are ready so that @charithabandi can update here branch since there's some overlap.

@Yaiba Yaiba merged commit 454bb4c into main Feb 14, 2024
1 check passed
@Yaiba Yaiba deleted the test/container-port-random branch February 14, 2024 20:29
@jchappelow
Copy link
Contributor

Moving on, but wanted to note depends_on support in testcontainers is pending: testcontainers/testcontainers-go#2035
This seems to imply that the wait strategies employed are purely to block the Up method from returning (proceeding to tests that use the containers), but container dependencies defined within the docker-compose.yml file are enforced by the docker Engine. At least that's my read.
We discussed a hack like pg_isready -q && sleep 1, which very well may work, but I think we're good for now if CI is happy. And it is.

brennanjl pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
brennanjl pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
jchappelow pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
brennanjl pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
brennanjl pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
brennanjl pushed a commit that referenced this pull request Feb 26, 2024
Support to run acceptance tests and integration tests in parallel mode, by map test containers to random host port, with an exception of `TestKGW`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants