Flaky test: TestDistributorQuerier_QueryIngestersWithinBoundary due to wall-clock race condition #7415

@CharlieTLe

Problem

TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary is flaky, particularly on slow ARM CI runners.

Example failure: https://github.com/cortexproject/cortex/actions/runs/24256703215/job/70829934477

--- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary (0.00s)
    --- FAIL: TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary (0.00s)
        distributor_queryable_test.go:638: 
            Error Trace:	distributor_queryable_test.go:638
            Error:      	"[]" should have 1 item(s), but has 0
            Test:       	TestDistributorQuerier_QueryIngestersWithinBoundary/maxT_well_after_lookback_boundary
            Messages:   	should manipulate when maxT is well after boundary

Root Cause

The test captures time.Now() at setup and uses it to compute query boundaries relative to a 1-hour lookback window. However, distributorQuerier.Select() calls time.Now() again internally to compute the ingester query boundary (pkg/querier/distributor_queryable.go:120).

The failing subtest "maxT well after lookback boundary" sets queryMaxT only 10 seconds inside the lookback window (testNow - lookback + 10s, i.e. testNow - 59m50s). Inside Select, the boundary is computed as realNow - 1h. If realNow has drifted more than 10 seconds past testNow (due to slow test execution on ARM runners), the boundary lands after queryMaxT, so minT > maxT, the query short-circuits with an empty result, and no distributor call is made.

The 10-second margin in the test case is too tight for slow CI environments.

This test was introduced in #7323.

Possible Solutions

  1. Inject a clock — Have distributorQuerier accept a now function (defaulting to time.Now) so tests can control time.
  2. Increase the margin — Change -lookback + 10*time.Second to a larger value like -lookback + 5*time.Minute to tolerate clock drift.

Option 1 is the more robust fix. Option 2 is a quick mitigation.

Affected Files

  • pkg/querier/distributor_queryable_test.go (test setup at lines 606-612)
  • pkg/querier/distributor_queryable.go (wall-clock call at line 120)
