
fix: don't panic when logs waits for more than 5 seconds #1777

Closed
wants to merge 14 commits

Conversation

mdelapenya
Collaborator

@mdelapenya mdelapenya commented Oct 16, 2023

Disclaimer

Credits to @martin-sucha, the original author of this PR. 🙌

I resolved the conflicts manually, reverting the changes in the logconsumer_test.go file. I sent a different PR because I could not push to the upstream branch of #1278. All the commits have been preserved to keep the history and authorship intact.

What does this PR do?

This should fix #946

There are two commits.

The first commit fixes synchronization issues in producer tests.

The second commit removes the panic when the logs endpoint takes more than 5 seconds to respond. The panic happened at least with Podman when no new logs appear while using the follow and since parameters.

We keep retrying until the context is canceled (the retry request would fail anyway) or the producer is stopped, whichever comes first. This makes the retry behavior consistent. The HTTP calls to fetch logs are interrupted immediately (previously the producer waited for in-progress calls to complete).
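For illustration only, here is a minimal sketch of that retry behavior, assuming a hypothetical fetchLogs helper and a stop channel (these names are illustrative, not the actual testcontainers-go internals):

package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// fetchLogs stands in for the HTTP call that streams container logs.
// It honors ctx, so canceling the context interrupts an in-progress call.
func fetchLogs(ctx context.Context) error {
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-time.After(200 * time.Millisecond):
        return errors.New("temporary error fetching logs")
    }
}

// runProducer keeps retrying fetchLogs until the context is canceled or
// the stop channel is closed, whichever comes first.
func runProducer(ctx context.Context, stop <-chan struct{}) {
    for {
        if err := fetchLogs(ctx); err != nil {
            if ctx.Err() != nil {
                fmt.Println("context done, shutting down:", ctx.Err())
                return
            }
            fmt.Println("retrying after error:", err)
        }
        select {
        case <-ctx.Done():
            return
        case <-stop:
            return
        case <-time.After(time.Second):
            // retry the request after one second
        }
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()
    runProducer(ctx, make(chan struct{}))
}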

Why is it important?

A panic takes down the whole test suite, so all my tests break if the container ends early or there is some temporary error fetching the logs.

Related issues

martin-sucha and others added 6 commits June 8, 2023 16:40
The consumer's Accept method runs in a different goroutine than
the main test.
We need to synchronize access to the messages slice.

Some tests waited for Ack before accessing Msgs and were properly
synchronized.

TestContainerLogWithErrClosed accessed Msgs while the consumer was
running and the memory access was not synchronized.

Test_StartStop did not wait for the expected lines to appear before
stopping the log producer, so there was nothing to guarantee that
at least one line would be read.
The test passed because currently the producer does not interrupt
outstanding HTTP requests to fetch logs, so it implicitly waited for
some logs to be received.
We now wait for the mlem line to appear before stopping the log
producer.
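As an aside, a minimal sketch of that kind of synchronization, using a simplified consumer rather than the library's actual LogConsumer interface (names are illustrative):

package main

import (
    "fmt"
    "sync"
)

// testConsumer collects log lines from the producer goroutine; access to
// the messages slice must be synchronized because Accept runs concurrently
// with the test.
type testConsumer struct {
    mu   sync.Mutex
    msgs []string
}

// Accept is called from the log-producer goroutine.
func (c *testConsumer) Accept(line string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.msgs = append(c.msgs, line)
}

// Msgs returns a copy so the test can read the messages safely at any time.
func (c *testConsumer) Msgs() []string {
    c.mu.Lock()
    defer c.mu.Unlock()
    return append([]string(nil), c.msgs...)
}

func main() {
    c := &testConsumer{}
    c.Accept("mlem")
    fmt.Println(c.Msgs())
}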
This removes the panic when the logs endpoint takes more than 5 seconds to respond.
The panic happened at least with Podman when no new logs appear while using the follow and since parameters.

We keep retrying until the context is canceled
(the retry request would fail anyway with a canceled context)
or the producer is stopped,
whichever comes first.
This makes the retry behavior consistent with the handling of
closed connections.

Outstanding HTTP calls for fetching logs are now interrupted when
a producer is stopped.
Previously the consumer and StopProducer() waited for the HTTP call
to complete.

This should fix testcontainers#946
If the context is done, we close the log producer.
That is not an error; the context cancellation signals that the consumer
should stop.

If there is a non-context error during the HTTP call or while reading
the response, retry the HTTP request again after 1 second.

Previously, the error handling was inconsistent:
- an error while reading HTTP response headers would retry
  the HTTP request
- but an error while reading the body would just end the log producer

With this commit, the error handling should be more consistent.
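A small sketch of that classification, distinguishing context errors (clean shutdown) from other errors (retry); the shouldRetry name is hypothetical:

package main

import (
    "context"
    "errors"
    "fmt"
)

// shouldRetry reports whether a log-fetch error warrants another attempt.
// Context errors mean the consumer asked the producer to stop, so they end
// it cleanly; anything else is retried after one second.
func shouldRetry(err error) bool {
    return err != nil &&
        !errors.Is(err, context.Canceled) &&
        !errors.Is(err, context.DeadlineExceeded)
}

func main() {
    fmt.Println(shouldRetry(context.Canceled))        // false: clean shutdown
    fmt.Println(shouldRetry(errors.New("read: EOF"))) // true: retry
}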
* main: (137 commits)
  Fix wrong module names (testcontainers#1776)
  docs: add default options to k6 module (testcontainers#1744)
  fix race condition in Test_StartStop (testcontainers#1700)
  chore(deps): bump github.com/aws/aws-sdk-go-v2/{service/s3,credentials,config} in /modules/localstack (testcontainers#1773)
  chore(deps): bump cloud.google.com/go/{datastore,bigtable,spanner} in /modules/gcloud (testcontainers#1774)
  chore(deps): bump golang.org/x/net from 0.15.0 to 0.17.0 (testcontainers#1772)
  docs: Fix typo and mention the relevant function name in doc (testcontainers#1745)
  DOCKER_HOST var typo (testcontainers#1743)
  feat: Add Cassandra module (testcontainers#1726)
  K6 module (testcontainers#1721)
  Rancher Desktop instructions (testcontainers#1724)
  chore(deps): bump github.com/shirou/gopsutil/v3 from 3.23.8 to 3.23.9 (testcontainers#1720)
  chore(deps): bump urllib3 from 2.0.5 to 2.0.6 (testcontainers#1718)
  chore(deps): bump github.com/twmb/franz-go/pkg/kadm in /modules/redpanda (testcontainers#1714)
  chore(deps): bump github.com/couchbase/gocb/v2 in /modules/couchbase (testcontainers#1704)
  chore(deps): bump github.com/neo4j/neo4j-go-driver/v5 in /modules/neo4j (testcontainers#1713)
  chore(deps): bump github.com/nats-io/nats.go in /modules/nats (testcontainers#1705)
  chore(deps): bump cloud.google.com/go/firestore from 1.12.0 to 1.13.0, google.golang.org/api from 0.142.0 to 0.143.0 and cloud.google.com/go/bigquery from 1.53.0 to 1.55 in /modules/gcloud (testcontainers#1716)
  chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.39.0 to 1.40.0 and github.com/aws/aws-sdk-go from 1.45.15 to 1.45.19 in /modules/localstack (testcontainers#1717)
  chore: prepare for next minor development cycle (0.26.0)
  ...
@mdelapenya mdelapenya added the "bug" (An issue with the library) label Oct 16, 2023
@mdelapenya mdelapenya self-assigned this Oct 16, 2023
@mdelapenya mdelapenya requested a review from a team as a code owner October 16, 2023 08:06
@netlify

netlify bot commented Oct 16, 2023

Deploy Preview for testcontainers-go ready!

🔨 Latest commit 51d6f8b
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/6538d66a826a6500088ebb07
😎 Deploy Preview https://deploy-preview-1777--testcontainers-go.netlify.app

* main:
  fix: data race in docker client `Info()` (testcontainers#1779)
  Use correct formatting directive for errors in lifecycle logs (testcontainers#1780)
  chore(deps): bump golang.org/x/mod from 0.12.0 to 0.13.0 in /modules/{elasticsearch,kafka} and /modulegen (testcontainers#1778)
  chore(deps): bump github.com/rabbitmq/amqp091-go in /modules/rabbitmq (testcontainers#1728)
  chore(deps): bump github.com/ClickHouse/clickhouse-go/v2 (testcontainers#1732)
  ignore patterns defined in dockerignore (testcontainers#1725)
docker.go Outdated
Comment on lines 684 to 689
select {
case <-ctx.Done():
    return
default:
    c.logger.Printf("read log header: %+v. %s", err, logRestartedForOutOfSyncMessage)
}
Contributor

You'd have to excuse me for my previous comments. While this is totally OK, the select statement isn't really necessary here: there's a very slim chance of hitting the case <-ctx.Done() branch, so we might as well remove the whole select statement.

Collaborator Author

Do you want the default case to be moved to the second select, or simply removed completely?

Contributor

We can basically do this:

Suggested change
select {
case <-ctx.Done():
    return
default:
    c.logger.Printf("read log header: %+v. %s", err, logRestartedForOutOfSyncMessage)
}
c.logger.Printf("read log header: %+v. %s", err, logRestartedForOutOfSyncMessage)

Collaborator Author

OK cool! We are going to roll it back to the initial state, where the print was outside the select 😅

In any case, thanks for your assistance with this part of the code; sometimes it's difficult to keep all of these cases under control.

docker.go Outdated
Comment on lines 639 to 646
// we need a separate select here in order to be able to print the error
// and this cannot be done in the same select that checks for the timeout.
select {
case <-ctx.Done():
    return
default:
    c.logger.Printf("cannot get logs for container %q: %v", c.ID, err)
}
Contributor

Ditto as elsewhere: no point in the separate select, since there's a very slim chance the context gets done between the check in line 634 and here.

Collaborator Author

I guess we can extract the log out and remove the select, right?

Contributor

Yup

docker.go Outdated
Comment on lines 739 to 744
select {
case <-ctx.Done():
    return
default:
    c.logger.Printf("read log message: %+v. %s", err, logRestartedForOutOfSyncMessage)
}
Contributor

ditto

* main:
  chore(deps): combine and bump compose dependencies (testcontainers#1787)
  feat: support for replacing images with custom substitutions (testcontainers#1719)
…operation that does not do anything with context
…operation that does not do anything with context
@mdelapenya mdelapenya added the "hacktoberfest" (Pull Requests accepted for Hacktoberfest) label Oct 24, 2023
@mdelapenya
Collaborator Author

@pmalek we merged #1701 yesterday, which is causing the conflicts here. Do you think that this PR is still needed after that?

@pmalek
Contributor

pmalek commented Oct 25, 2023

@pmalek we merged #1701 yesterday, which is causing the conflicts here. Do you think that this PR is still needed after that?

I'm not sure, haven't compared #1701 with this one in detail. Was just commenting out of good will on this PR :)

@mdelapenya
Collaborator Author

I'm not sure, haven't compared #1701 with this one in detail. Was just commenting out of good will on this PR :)

Sure, and I do appreciate your help here. Many times, debugging concurrent tasks is extremely difficult, so having more eyes on it is really helpful 🙇

err := r.Close()
if err != nil {
    // we can't close the read closer, this should never happen
    panic(err)
@pckhoi pckhoi Nov 9, 2023

Why are we panicking here? If you just want to log the error and return then isn't it better to consistently do it in the same way?

If the purpose is to print the stack trace then maybe panic is best, but why here?
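For comparison, a sketch of the logging alternative being suggested here (a hypothetical helper, not the library's actual code):

package main

import (
    "io"
    "log"
    "strings"
)

// closeLogReader logs a failure to close the log stream instead of
// panicking, so one bad container does not abort the whole test suite.
func closeLogReader(r io.ReadCloser, logger *log.Logger) {
    if err := r.Close(); err != nil {
        logger.Printf("cannot close log reader: %v", err)
    }
}

func main() {
    closeLogReader(io.NopCloser(strings.NewReader("")), log.Default())
}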

@pckhoi

pckhoi commented Nov 9, 2023

I would say this change is still absolutely needed after #1701. I'm still observing StopLogProducer hanging when the goroutine panics. We should replace the stopProducer chan bool with a stopProducer context.CancelFunc sooner or later, if not in this PR then in a separate PR.
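A rough sketch of that suggested refactor, stopping the producer through a context.CancelFunc instead of a chan bool (the producer type and its fields are illustrative, not the library's actual internals):

package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// producer owns the cancel function for its log-streaming goroutine.
type producer struct {
    cancel context.CancelFunc
    wg     sync.WaitGroup
}

// Start launches the goroutine; canceling the context stops it even if it
// is blocked in an HTTP call that honors ctx.
func (p *producer) Start() {
    ctx, cancel := context.WithCancel(context.Background())
    p.cancel = cancel
    p.wg.Add(1)
    go func() {
        defer p.wg.Done()
        for {
            select {
            case <-ctx.Done():
                return
            case <-time.After(100 * time.Millisecond):
                fmt.Println("streaming logs...")
            }
        }
    }()
}

// Stop cancels the context and waits for the goroutine to exit, so it
// cannot hang on a channel that nobody is reading.
func (p *producer) Stop() {
    p.cancel()
    p.wg.Wait()
}

func main() {
    p := &producer{}
    p.Start()
    time.Sleep(300 * time.Millisecond)
    p.Stop()
}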

@pmalek
Contributor

pmalek commented Nov 11, 2023

I would say this change is still absolutely needed after #1701. I'm still observing StopLogProducer hanging when the goroutine panics. We should replace the stopProducer chan bool with a stopProducer context.CancelFunc sooner or later, if not in this PR then in a separate PR.

Hey @pckhoi 👋

I've pushed #1911, which should fix this very issue. Still, as you've suggested, some refactoring of the internal structure of DockerContainer might be desirable.

@mdelapenya
Collaborator Author

I think #1971 supersedes this one. Closing it

@mdelapenya mdelapenya closed this Jan 11, 2024