Flaky test: TestIngester_PreparePartitionDownscaleHandler #7848

Open
charleskorn opened this issue Apr 9, 2024 · 3 comments

@charleskorn
Contributor

From https://github.com/grafana/mimir/actions/runs/8595883035/job/23552003253#step:8:69:

--- FAIL: TestIngester_PreparePartitionDownscaleHandler (0.00s)
    --- FAIL: TestIngester_PreparePartitionDownscaleHandler/DELETE_request_after_a_POST_request_should_switch_the_partition_back_to_ACTIVE_state (3.00s)
        logger.go:22: level warn msg -blocks-storage.backend=filesystem is for development and testing only; you should switch to an external object store for production use or use a shared filesystem
        logger.go:22: level info msg TSDB idle compaction timeout set timeout 1h6m45.982034156s
        logger.go:22: level info msg opening existing TSDBs
        logger.go:22: level info component ingest_reader partition 0 msg starting consumption from partition start because no committed offset has been found start_offset -2 consumer_group ingester-zone-a-0
        logger.go:22: level info component ingest_reader partition 0 component kafka_client msg immediate metadata update triggered why querying metadata for consumer initialization
        logger.go:22: level info component ingest_reader partition 0 msg partition reader is starting to consume partition until max consumer lag is honored max_lag 15s
        logger.go:22: level info component ingest_reader partition 0 component kafka_client msg assigning partitions why new assignments from direct consumer how assigning everything new, keeping current assignment input mimir[0{-2 e-1 ce0}]
        logger.go:22: level info component ingest_reader partition 0 msg partition reader found no records to consume because partition is empty partition_start_offset 0 last_produced_offset -1
        logger.go:22: level info msg not loading tokens from file, tokens file path is empty
        logger.go:22: level info msg instance not found in ring, adding with no tokens ring ingester
        logger.go:22: level debug msg JoinAfter expired ring ingester
        logger.go:22: level info msg auto-joining cluster after timeout ring ingester
        logger.go:22: level info ring ingester-partitions msg partition not found in the ring partition 0
        logger.go:22: level info ring ingester-partitions msg switching partition state because enough owners have been registered and minimum waiting time has elapsed partition 0 from_state PartitionPending to_state PartitionActive
        ingester_ingest_storage_test.go:449: 
            	Error Trace:	/__w/mimir/mimir/pkg/ingester/ingester_ingest_storage_test.go:449
            	Error:      	Condition never satisfied
            	Test:       	TestIngester_PreparePartitionDownscaleHandler/DELETE_request_after_a_POST_request_should_switch_the_partition_back_to_ACTIVE_state
        logger.go:22: level info ring ingester-partitions msg partition ring lifecycler is shutting down ring ingester-partitions
        logger.go:22: level info component ingest_reader partition 0 msg stopping partition reader
        logger.go:22: level info msg lifecycler loop() exited gracefully ring ingester
        logger.go:22: level info msg changing instance state from old_state ACTIVE new_state LEAVING ring ingester
        logger.go:22: level info msg transfers are disabled
        logger.go:22: level info msg lifecycler entering final sleep before shutdown final_sleep 0s
        logger.go:22: level debug msg unregistering instance from ring ring ingester
        logger.go:22: level info msg instance removed from the KV store ring ingester
    --- FAIL: TestIngester_PreparePartitionDownscaleHandler/POST_request_should_switch_the_partition_state_to_INACTIVE (3.00s)
        logger.go:22: level warn msg -blocks-storage.backend=filesystem is for development and testing only; you should switch to an external object store for production use or use a shared filesystem
        logger.go:22: level info msg TSDB idle compaction timeout set timeout 1h13m49.429608562s
        logger.go:22: level info msg opening existing TSDBs
        logger.go:22: level info component ingest_reader partition 0 msg starting consumption from partition start because no committed offset has been found start_offset -2 consumer_group ingester-zone-a-0
        logger.go:22: level info component ingest_reader partition 0 component kafka_client msg immediate metadata update triggered why querying metadata for consumer initialization
        logger.go:22: level info component ingest_reader partition 0 msg partition reader is starting to consume partition until max consumer lag is honored max_lag 15s
        logger.go:22: level info component ingest_reader partition 0 component kafka_client msg assigning partitions why new assignments from direct consumer how assigning everything new, keeping current assignment input mimir[0{-2 e-1 ce0}]
        logger.go:22: level info component ingest_reader partition 0 msg partition reader found no records to consume because partition is empty partition_start_offset 0 last_produced_offset -1
        logger.go:22: level info msg not loading tokens from file, tokens file path is empty
        logger.go:22: level info msg instance not found in ring, adding with no tokens ring ingester
        logger.go:22: level debug msg JoinAfter expired ring ingester
        logger.go:22: level info msg auto-joining cluster after timeout ring ingester
        logger.go:22: level info ring ingester-partitions msg partition not found in the ring partition 0
        logger.go:22: level info ring ingester-partitions msg switching partition state because enough owners have been registered and minimum waiting time has elapsed partition 0 from_state PartitionPending to_state PartitionActive
        ingester_ingest_storage_test.go:429: 
            	Error Trace:	/__w/mimir/mimir/pkg/ingester/ingester_ingest_storage_test.go:429
            	Error:      	Condition never satisfied
            	Test:       	TestIngester_PreparePartitionDownscaleHandler/POST_request_should_switch_the_partition_state_to_INACTIVE
        logger.go:22: level info ring ingester-partitions msg partition ring lifecycler is shutting down ring ingester-partitions
        logger.go:22: level info component ingest_reader partition 0 msg stopping partition reader
        logger.go:22: level info msg lifecycler loop() exited gracefully ring ingester
        logger.go:22: level info msg changing instance state from old_state ACTIVE new_state LEAVING ring ingester
        logger.go:22: level info msg transfers are disabled
        logger.go:22: level info msg lifecycler entering final sleep before shutdown final_sleep 0s
        logger.go:22: level debug msg unregistering instance from ring ring ingester
        logger.go:22: level info msg instance removed from the KV store ring ingester
level=info msg="uploading new block to long-term storage" block=00000000010000000000000000
level=debug msg="uploaded file" from=/tmp/TestShipper_DeceivingUploadErrors3648998313/001/00000000010000000000000000/index dst=00000000010000000000000000/index bucket="fs: /tmp/TestShipper_DeceivingUploadErrors3648998313/002"
level=error msg="uploading new block to long-term storage failed" block=00000000010000000000000000 err="upload meta file: base name matches, will fail upload"
FAIL
FAIL	github.com/grafana/mimir/pkg/ingester	540.701s
@charleskorn
Contributor Author

Looks like @pracucci added this test, could you please take a look Marco?

@pracucci
Collaborator

pracucci commented Apr 9, 2024

> Looks like @pracucci added this test, could you please take a look Marco?

Yes, I'm looking into it.

@pracucci
Collaborator

pracucci commented Apr 9, 2024

The logs of the failing test execution puzzle me: we get the log "switching partition state because enough owners have been registered and minimum waiting time has elapsed partition 0 from_state PartitionPending to_state PartitionActive", but then the assertion on that specific condition fails. I have a feeling it's a timing issue, and the use of require.Eventually() also hides what we actually get when listing partitions.

I've opened a PR to help me debug it: #7851

If you see this issue again after #7851 is merged, please post a message here with the CI link. Thanks!

@pracucci pracucci self-assigned this Apr 9, 2024