fix: increase K8s API rate limits and leader election timeouts #488

Closed
yuvarajl wants to merge 0 commits into apache:master from arcana-hub:fix/leader-election-rate-limits

@yuvarajl

The operator restarts roughly every 21 hours because it loses its leader election lease. Root cause: the default client-go rate limits (QPS=5, Burst=10) are too low for an operator managing 26+ pods (5 FE + 9+3+3+1 BE + 5 MS). During heavy reconciliation, API calls queue up in the client-side rate limiter, so the lease renewal call waits behind them and times out (context deadline exceeded).
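To make the queuing argument concrete, here is a rough back-of-the-envelope sketch in Go. It models the rate limiter as a plain token bucket that starts full, and estimates how long the last request in a burst waits. The backlog size of 70 requests is a made-up number for illustration, not a measurement from this cluster, and `queueDelaySeconds` is a hypothetical helper, not part of client-go.

```go
package main

import "fmt"

// queueDelaySeconds estimates how long the last of n requests, all issued at
// once, waits in a token-bucket limiter that starts with a full bucket.
// The first `burst` requests pass immediately; each later one waits for a
// token that is refilled at `qps` tokens per second.
func queueDelaySeconds(n int, qps float64, burst int) float64 {
	if n <= burst {
		return 0
	}
	return float64(n-burst) / qps
}

func main() {
	const backlog = 70                // hypothetical number of queued API calls
	const renewDeadlineSeconds = 10.0 // RenewDeadline before this PR

	settings := []struct {
		name  string
		qps   float64
		burst int
	}{
		{"default (QPS=5, Burst=10)", 5, 10},
		{"proposed (QPS=50, Burst=100)", 50, 100},
	}

	for _, s := range settings {
		d := queueDelaySeconds(backlog, s.qps, s.burst)
		fmt.Printf("%s: last request waits %.1fs, exceeds renew deadline: %v\n",
			s.name, d, d > renewDeadlineSeconds)
	}
}
```

Under these assumptions, a lease renewal stuck at the back of such a backlog would wait longer than the old 10s RenewDeadline with the default limits, and would not wait at all with the proposed ones.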

Changes:

  • Increase K8s API client QPS from 5 to 50, Burst from 10 to 100
  • Increase LeaseDuration from 15s to 30s
  • Increase RenewDeadline from 10s to 20s
  • Increase RetryPeriod from 2s to 5s
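The three leader election timings are not independent: client-go's leader elector rejects a configuration unless LeaseDuration is greater than RenewDeadline, and RenewDeadline is greater than RetryPeriod multiplied by its jitter factor (1.2). The following standalone Go sketch re-implements that check to confirm that both the old and the new values satisfy it. The `electionTiming` type and the `validate` and `valid` helpers are hypothetical names for illustration and are not the operator's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// jitterFactor mirrors the value of client-go's leaderelection.JitterFactor.
const jitterFactor = 1.2

// electionTiming is a hypothetical holder for the three timings in this PR.
type electionTiming struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// validate applies the same ordering constraints client-go enforces when a
// leader elector is constructed.
func (t electionTiming) validate() error {
	if t.LeaseDuration <= t.RenewDeadline {
		return errors.New("LeaseDuration must be greater than RenewDeadline")
	}
	if t.RenewDeadline <= time.Duration(jitterFactor*float64(t.RetryPeriod)) {
		return errors.New("RenewDeadline must be greater than RetryPeriod*jitterFactor")
	}
	return nil
}

// valid is a convenience wrapper that takes whole seconds.
func valid(leaseSec, renewSec, retrySec int) bool {
	t := electionTiming{
		LeaseDuration: time.Duration(leaseSec) * time.Second,
		RenewDeadline: time.Duration(renewSec) * time.Second,
		RetryPeriod:   time.Duration(retrySec) * time.Second,
	}
	return t.validate() == nil
}

func main() {
	fmt.Println("old timings (15s/10s/2s) valid:", valid(15, 10, 2))
	fmt.Println("new timings (30s/20s/5s) valid:", valid(30, 20, 5))
}
```

One trade-off to keep in mind: a longer LeaseDuration also means a standby replica waits longer before it can take over after the leader actually dies.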

Error from logs:
client rate limiter Wait returned an error: context deadline exceeded
failed to renew lease doris/e1370669.selectdb.com: context deadline exceeded
problem running manager: leader election lost

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@yuvarajl yuvarajl closed this Apr 16, 2026