-
Notifications
You must be signed in to change notification settings - Fork 851
proposal: trim query_range response to user-requested time window #7288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
venkatchinmay
wants to merge
1
commit into
cortexproject:master
Choose a base branch
from
venkatchinmay:proposal/query-range-response-trimming
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,187 @@ | ||
| --- | ||
| title: "Query Range Response Trimming" | ||
| linkTitle: "Query Range Response Trimming" | ||
| weight: 1 | ||
| slug: query-range-response-trimming | ||
| --- | ||
|
|
||
| - Author: @chinmay venkat | ||
| - Date: February 2026 | ||
| - Status: Proposed | ||
|
|
||
| ## Problem | ||
|
|
||
| When `querier.align-querier-with-step: true` is configured, the query frontend modifies the user's requested `start` time by flooring it to the nearest `step` boundary **before** passing the request to the results cache and downstream querier. | ||
|
|
||
| This causes the response to contain data points **outside** the user's originally requested time range. | ||
|
|
||
| ### Example | ||
|
|
||
| | Config | Value | | ||
| |---|---| | ||
| | `querier.align-querier-with-step` | `true` | | ||
| | `querier.split-queries-by-interval` | `1h` | | ||
| | `step` | `10m` | | ||
|
|
||
| User requests: `start=09:01, end=10:00` | ||
|
|
||
| `StepAlignMiddleware` transforms this to: `start=09:00, end=10:00` | ||
|
|
||
| The user receives data from `09:00` — **1 minute earlier than requested**. | ||
|
|
||
| With larger step values (e.g., `30m`), the delta is proportionally larger. Additionally, `SplitByIntervalMiddleware`'s `nextIntervalBoundary` calculation can cause sub-request start/end to drift from the user's requested window, compounding the effect. | ||
|
|
||
| ## Root Cause | ||
|
|
||
| The middleware chain in `query_range_middlewares.go` is ordered as: | ||
|
|
||
| ``` | ||
| LimitsMiddleware | ||
| → StepAlignMiddleware ← mutates: start = floor(start / step) * step | ||
| → SplitByIntervalMiddleware | ||
| → ResultsCacheMiddleware ← sees aligned start; original start is permanently lost | ||
| → ShardByMiddleware | ||
| ``` | ||
|
|
||
| Once `StepAlignMiddleware` calls `r.WithStartEnd(alignedStart, alignedEnd)`, the original user-requested `start`/`end` values are **permanently gone from the request object** and no downstream middleware can use them to trim the final response. | ||
|
|
||
| The `extractSampleStreams(start, end, ...)` function that trims samples by timestamp already exists in `results_cache.go` and is used inside `partition()` — but only with the post-alignment `start`, not the original. | ||
|
|
||
| ## Proposed Solution | ||
|
|
||
| Add a `RangeTrimMiddleware` as the **outermost** middleware in the chain, controlled by a new opt-in configuration flag `querier.trim-response-to-requested-range`. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I prefer if we don't add any flag. It's a bugfix, if we are returning more samples than we should. |
||
|
|
||
| When enabled, it captures the original `start`/`end` before any mutation, lets the full middleware stack execute internally (alignment, splitting, caching, sharding), then trims the final response back to the user's original window using the existing `Extractor.Extract()` interface. | ||
|
|
||
| ### 1. Config flag — `pkg/querier/tripperware/queryrange/query_range_middlewares.go` | ||
|
|
||
| Add `TrimResponseToRequestedRange` to `Config`, following the same pattern as `AlignQueriesWithStep` and `CacheResults`: | ||
|
|
||
| ```go | ||
| // Config for query_range middleware chain. | ||
| type Config struct { | ||
| SplitQueriesByInterval time.Duration `yaml:"split_queries_by_interval"` | ||
| DynamicQuerySplitsConfig DynamicQuerySplitsConfig `yaml:"dynamic_query_splits"` | ||
| AlignQueriesWithStep bool `yaml:"align_queries_with_step"` | ||
| TrimResponseToRequestedRange bool `yaml:"trim_response_to_requested_range"` // ← NEW | ||
| ResultsCacheConfig `yaml:"results_cache"` | ||
| CacheResults bool `yaml:"cache_results"` | ||
| MaxRetries int `yaml:"max_retries"` | ||
| ForwardHeaders flagext.StringSlice `yaml:"forward_headers_list"` | ||
| VerticalShardSize int `yaml:"-"` | ||
| } | ||
|
|
||
| func (cfg *Config) RegisterFlags(f *flag.FlagSet) { | ||
| // ... existing flags ... | ||
| f.BoolVar(&cfg.TrimResponseToRequestedRange, | ||
| "querier.trim-response-to-requested-range", false, | ||
| "When enabled, the query frontend trims the response to exactly the "+ | ||
| "[start, end] range requested by the user, removing any extra data "+ | ||
| "introduced by align_queries_with_step or interval split-boundary rounding.") | ||
| } | ||
| ``` | ||
|
|
||
| ### 2. New File: `pkg/querier/tripperware/queryrange/range_trim.go` | ||
|
|
||
| ```go | ||
| package queryrange | ||
|
|
||
| import ( | ||
| "context" | ||
| "github.com/cortexproject/cortex/pkg/querier/tripperware" | ||
| ) | ||
|
|
||
| // NewRangeTrimMiddleware returns a middleware that clips the final response | ||
| // to the original [start, end] requested by the user. Enable via | ||
| // querier.trim-response-to-requested-range. | ||
| func NewRangeTrimMiddleware(extractor Extractor) tripperware.Middleware { | ||
| return tripperware.MiddlewareFunc(func(next tripperware.Handler) tripperware.Handler { | ||
| return &rangeTrimHandler{next: next, extractor: extractor} | ||
| }) | ||
| } | ||
|
|
||
| type rangeTrimHandler struct { | ||
| next tripperware.Handler | ||
| extractor Extractor | ||
| } | ||
|
|
||
| func (h rangeTrimHandler) Do(ctx context.Context, r tripperware.Request) (tripperware.Response, error) { | ||
| origStart := r.GetStart() | ||
| origEnd := r.GetEnd() | ||
|
|
||
| resp, err := h.next.Do(ctx, r) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| return h.extractor.Extract(origStart, origEnd, resp), nil | ||
| } | ||
| ``` | ||
|
|
||
| ### 3. Register conditionally in `Middlewares()` | ||
|
|
||
| Register as the **first** entry so it wraps the entire chain: | ||
|
|
||
| ```go | ||
| func Middlewares(cfg Config, ...) ([]tripperware.Middleware, cache.Cache, error) { | ||
| metrics := tripperware.NewInstrumentMiddlewareMetrics(registerer) | ||
|
|
||
| // Outermost: trim response to original user-requested range | ||
| queryRangeMiddleware := []tripperware.Middleware{} | ||
| if cfg.TrimResponseToRequestedRange { // ← NEW | ||
| queryRangeMiddleware = append(queryRangeMiddleware, | ||
| tripperware.InstrumentMiddleware("range_trim", metrics), | ||
| NewRangeTrimMiddleware(cacheExtractor)) | ||
| } | ||
|
|
||
| queryRangeMiddleware = append(queryRangeMiddleware, | ||
| NewLimitsMiddleware(limits, lookbackDelta)) | ||
| if cfg.AlignQueriesWithStep { | ||
| queryRangeMiddleware = append(queryRangeMiddleware, | ||
| tripperware.InstrumentMiddleware("step_align", metrics), | ||
| StepAlignMiddleware) | ||
| } | ||
| // ... rest unchanged | ||
| } | ||
| ``` | ||
|
|
||
| Because middlewares compose like an onion, the first entry in the slice is the outermost layer — it sees the original request and intercepts the final response after all inner middlewares have run. | ||
|
|
||
| ## Why This Is Safe | ||
|
|
||
| | Concern | Answer | | ||
| |---|---| | ||
| | Breaks results caching? | No — cache operates internally with aligned times, untouched | | ||
| | Breaks query splitting? | No — splitting happens inside, RangeTrim only touches the final response | | ||
| | `cacheExtractor` already available? | ✅ Already a parameter of `Middlewares()` | | ||
| | `Extract()` correct for trimming? | ✅ `extractSampleStreams(start, end, ...)` filters by sample timestamps, same mechanism used in `partition()` | | ||
| | Affects non-aligned mode? | No — if start/end are not mutated, `Extract(origStart, origEnd)` is a no-op | | ||
| | Stats trimming? | ✅ `extractStats(start, end, ...)` already handles per-step stats correctly | | ||
|
|
||
| ## Files Changed | ||
|
|
||
| | File | Type | Description | | ||
| |---|---|---| | ||
| | `pkg/querier/tripperware/queryrange/range_trim.go` | New | `RangeTrimMiddleware` + `rangeTrimHandler` | | ||
| | `pkg/querier/tripperware/queryrange/query_range_middlewares.go` | Modified | Add `TrimResponseToRequestedRange` to `Config`, `RegisterFlags`, and `Middlewares()` | | ||
| | `pkg/querier/tripperware/queryrange/range_trim_test.go` | New | Unit tests | | ||
| | `CHANGELOG.md` | Modified | Entry under `## main / unreleased` | | ||
|
|
||
| ## Test Plan | ||
|
|
||
| New unit tests in `range_trim_test.go`: | ||
|
|
||
| 1. **Flag disabled (default)**: middleware not registered, response unchanged | ||
| 2. **Alignment trim**: `trim_response_to_requested_range=true` + `align_queries_with_step=true`, `step=10m`, `start=09:01` → verify no samples before `09:01` | ||
| 3. **Already aligned (no-op)**: `start=09:00` with `step=10m` → response unchanged | ||
| 4. **Split boundary trim**: `split_queries_by_interval=1h`, `start=09:01` → verify no data before `09:01` | ||
| 5. **End trim**: `end=09:59` with internal boundary at `10:00` → verify no data after `09:59` | ||
|
|
||
| ```bash | ||
| # Run new tests | ||
| go test ./pkg/querier/tripperware/queryrange/... -run TestRangeTrim -v | ||
|
|
||
| # Run full suite to confirm no regressions | ||
| go test ./pkg/querier/tripperware/queryrange/... -v | ||
| ``` | ||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a bug to me. can you create a test case to trigger this behavior ?.
If you can trigger the behavior and is buggy, you don't need a proposal. You just need a unit test that triggers it and the fix.
The only thing though is what happens if we have increased CPU usage or less cache efficiency. Those things will matter in this case too