KC-1210: Cache get_all_gateways() to prevent pam/get_controllers throttle storm#1943

Open
maksimu wants to merge 16 commits into release from KC-1210

Conversation

@maksimu
Collaborator

@maksimu maksimu commented Apr 8, 2026

Summary

Add thread-safe caching (60s TTL) to get_all_gateways() which calls keeperapp's pam/get_controllers. Under concurrent load, multiple service threads now share a single cached result instead of each making independent API calls - preventing the server-side rate limit (10 req/10s) from being exceeded and eliminating the cascading throttle loop.

Context

When Service Mode runs in containers, calls to pam/get_controllers can get throttled. When multiple PAM commands execute concurrently, each one independently calls get_all_gateways(), and together they exceed the rate limit. Throttled requests retry with a 30s backoff, but the caller times out and retries sooner, spawning new threads that compound the problem.
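
For readers who want a concrete picture of the caching described in the Summary, here is a minimal sketch, not the code from this PR. It assumes a module-level cache guarded by a single lock; `fetch_gateways_from_server` and `get_all_gateways_cached` are hypothetical names, and the body of `invalidate_gateway_cache` is a guess at what such a helper could look like.

```python
import threading
import time

_CACHE_TTL_SECONDS = 60.0  # TTL taken from the PR summary

_cache_lock = threading.Lock()
_cached_gateways = None
_cached_at = 0.0


def fetch_gateways_from_server():
    """Hypothetical stand-in for the real pam/get_controllers request."""
    fetch_gateways_from_server.calls += 1
    return ["gateway-a", "gateway-b"]


fetch_gateways_from_server.calls = 0  # counts simulated API calls


def get_all_gateways_cached():
    """Return the gateway list, refreshing it at most once per TTL window."""
    global _cached_gateways, _cached_at
    # Holding the lock across the fetch makes concurrent callers wait for
    # one request instead of each issuing their own.
    with _cache_lock:
        expired = (time.monotonic() - _cached_at) >= _CACHE_TTL_SECONDS
        if _cached_gateways is None or expired:
            _cached_gateways = fetch_gateways_from_server()
            _cached_at = time.monotonic()
        # Return a copy so callers cannot mutate the shared cached list.
        return list(_cached_gateways)


def invalidate_gateway_cache():
    """Drop the cached result so the next call goes back to the server."""
    global _cached_gateways
    with _cache_lock:
        _cached_gateways = None
```

In this sketch the fetch happens while the lock is held, so a slow request blocks every caller until it returns. That trade-off keeps concurrent callers from each spending part of the rate-limit budget, which is the failure described above.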

erinlewis-keeper and others added 15 commits April 3, 2026 14:40
* Added full terminal reset after ssh session exit (clears scrollback etc.)

* Fixes randomly eaten first characters typed after ssh session exit (stdin race condition)

* Fixed random duplicate typed characters (echo)
#1908)

- Added `move` to the list of `apply-action` choices, which will move all returned users to a node - specified with `target-node`

- Added `all` to the list of `status` choices, returning invited, active and locked users with `d=0`

- Fixed an issue where the users updated with `apply-action` are not the same as those filtered with the `node` argument.

- Fixed a potential unwanted behavior where the `node` argument returns a recursive node and subnodes search, preventing you from applying actions to a specific node if it has subnodes. By default, using the `node` argument will only return result from the specified node.

- Added a `recursive` argument to replicate the old behavior with `node` filter.
* Implement convert-all command

* Refactor Convert and Convert-all classes along with a ConvertHelper for common methods
@maksimu maksimu requested a review from sk-keeper April 8, 2026 21:27
…ttle storm

Add thread-safe caching with 60s TTL to get_all_gateways() which calls
keeperapp's pam/get_controllers endpoint. Under concurrent load, multiple
threads share the same cached result instead of each making independent
API calls. Prevents exceeding the server-side rate limit (10 req/10s)
and the resulting cascading throttle loop.

Cache is invalidated on gateway removal to prevent stale data.
@maksimu maksimu changed the title KC-1210: Fix Service Mode slow startup and pam/get_controllers throttling KC-1210: Cache get_all_gateways() to prevent pam/get_controllers throttle storm Apr 8, 2026
include_config_dict=True)

one_time_token = config_str_and_config_dict.get('config_str')

Contributor

Might be worth adding invalidate_gateway_cache() in create_gateway() as well, similar to remove_gateway().

In edit.py, get_all_gateways() is called at line 344 to check name uniqueness, which populates the cache. A new gateway is then created using create_gateway() at line 360, followed by another call to get_all_gateways() at line 376 to fetch the new gateway’s UID.

Since these calls happen back-to-back, the second call may hit stale cached data, causing the newly created gateway to not be found.
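
The sequence can be reproduced with a toy model. This is a sketch under stated assumptions, not the code in edit.py: the in-memory `_server_gateways` list stands in for the server, the `invalidate` flag on `create_gateway` is a hypothetical switch added only for the demonstration, and locking is omitted for brevity.

```python
import time

_TTL = 60.0
_cache = {"value": None, "at": 0.0}
_server_gateways = ["existing-gw"]  # hypothetical in-memory "server" state


def get_all_gateways():
    """Toy cached lookup; the real function calls pam/get_controllers."""
    age = time.monotonic() - _cache["at"]
    if _cache["value"] is None or age >= _TTL:
        _cache["value"] = list(_server_gateways)
        _cache["at"] = time.monotonic()
    return list(_cache["value"])


def invalidate_gateway_cache():
    """Forget the cached list so the next lookup refetches."""
    _cache["value"] = None


def create_gateway(name, invalidate=True):
    """Toy create; `invalidate` is a hypothetical switch for this demo."""
    _server_gateways.append(name)
    if invalidate:
        invalidate_gateway_cache()


# Same order as described above: check names, create, look up the new gateway.
get_all_gateways()                           # populates the cache
create_gateway("new-gw", invalidate=False)   # create without invalidation
stale = get_all_gateways()                   # served from cache, misses new-gw
create_gateway("newer-gw", invalidate=True)  # create with invalidation
fresh = get_all_gateways()                   # refetched, sees both new gateways
```

Without invalidation, the lookup that follows the create is answered from the cache and the new gateway is missing. With invalidation, the next lookup refetches and finds it.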

@sk-keeper sk-keeper force-pushed the release branch 3 times, most recently from 442d7af to 9ec9b7a Compare April 15, 2026 03:37