fix: recalculate isLatest after status changes (#1081) #1201
Conversation
When the version marked is_latest was soft-deleted, no other version was promoted, and because GetCurrentLatestVersion did not filter deleted rows, subsequent publishes with lower semver never became latest either — leaving /versions/latest permanently 404. Recalculate is_latest after every status mutation in the same transaction (already under the per-server advisory lock). Pick the highest non-deleted version; fall back to highest deleted when every version is gone so the server remains addressable via includeDeleted=true lookups used by admin and restore flows. The new SetLatestVersion primitive issues two statements (clear then set) because idx_unique_latest_per_server is a non-deferrable unique partial index — Postgres checks it row-by-row within a single UPDATE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
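The clear-then-set primitive described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the table name, column names, and the `execer` interface are assumptions, and a fake transaction stands in for `*sql.Tx` so the statement ordering can be checked without a database.

```go
package main

import "fmt"

// Hypothetical SQL for the two statements; table and column names are
// assumptions, not the registry's actual schema.
const (
	clearLatestSQL = `UPDATE server_versions SET is_latest = FALSE WHERE server_id = $1 AND is_latest`
	setLatestSQL   = `UPDATE server_versions SET is_latest = TRUE WHERE server_id = $1 AND version_id = $2`
)

// execer is the minimal slice of a transaction this sketch needs.
type execer interface {
	Exec(query string, args ...any) error
}

// setLatestVersion sketches the SetLatestVersion primitive: clear first,
// then set. A single UPDATE flipping both rows would trip
// idx_unique_latest_per_server, because Postgres checks a non-deferrable
// unique partial index row-by-row as the statement executes.
func setLatestVersion(tx execer, serverID, versionID string) error {
	if err := tx.Exec(clearLatestSQL, serverID); err != nil {
		return err
	}
	return tx.Exec(setLatestSQL, serverID, versionID)
}

// fakeTx records statements so ordering is checkable without a database.
type fakeTx struct{ queries []string }

func (f *fakeTx) Exec(q string, _ ...any) error {
	f.queries = append(f.queries, q)
	return nil
}

func main() {
	tx := &fakeTx{}
	if err := setLatestVersion(tx, "srv-1", "ver-2"); err != nil {
		panic(err)
	}
	fmt.Println(tx.queries[0] == clearLatestSQL, tx.queries[1] == setLatestSQL)
	// Output: true true
}
```

Both statements run inside the transaction that already holds the per-server advisory lock, so no concurrent publish can observe the intermediate zero-flag state.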
Please note that the latest kubernetes-mcp-server is now 0.0.61. The SQL query should point to the latest published release instead (at this time, the one I mentioned). Thanks!
tadasant
left a comment
One-off SQL is still needed after deploy to heal io.github.containers/kubernetes-mcp-server — clear is_latest on the deleted 1.0.0 and set it on 0.0.59 (as requested in the issue). Requires a maintainer with prod DB access.
Can we do this in a db migration file so it gets automatically applied on deploy? Written in a way that isn't hardcoded to the version numbers.
Migration 014 promotes the highest non-deleted version on every server whose is_latest flag is on a deleted row (or absent entirely). The picker parses major.minor.patch numerically and falls back to published_at for non-semver or as a tiebreaker, mirroring CompareVersions for standard semver and giving a sensible answer otherwise. This auto-heals io.github.containers/kubernetes-mcp-server (latest 0.0.61) and any other servers silently affected by the original bug, without hardcoded version numbers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tadasant good call — added the migration. @manusa thanks for the heads-up on 0.0.61.
End-to-end validation against a local registry
To be confident this won't put prod in a bad state, I ran the migration against synthetic data mirroring the affected scenarios, seeded via the public API on an unpatched build.
| # | Server | Setup | Expected post-migration |
|---|---|---|---|
| A | k8s-style | publish 1.0.0, 0.0.50, 0.0.51, 0.0.59, 0.0.60, 0.0.61; delete 1.0.0 | 0.0.61 promoted |
| B | backport-after-delete | publish 2.0.0; delete 2.0.0; publish 1.0.1, 1.0.0 | 1.0.1 promoted (semver beats publish-time) |
| C | simple delete-latest | publish 1.0.0, 2.0.0; delete 2.0.0 | 1.0.0 promoted |
| D | healthy | publish 1.0.0, 2.0.0 | unchanged |
| E | all-deleted | publish 1.0.0, 2.0.0; delete both | unchanged (admin lookups still resolve) |
| F | deprecated-latest | publish 1.0.0, 2.0.0; deprecate 2.0.0 | unchanged (deprecated counts as non-deleted) |
| G | non-semver | publish build-a, build-b, build-c; delete build-c | build-b promoted (publish-time fallback) |
Pre-migration snapshot (after seeding on unpatched build)
all-deleted 1.0.0 deleted f
all-deleted 2.0.0 deleted t
backport 1.0.0 active f
backport 1.0.1 active f
backport 2.0.0 deleted t ← bug: is_latest stranded on deleted row
deprecated-latest 1.0.0 active f
deprecated-latest 2.0.0 deprecated t
healthy 1.0.0 active f
healthy 2.0.0 active t
k8s-style 0.0.50 active f
k8s-style 0.0.51 active f
k8s-style 0.0.59 active f
k8s-style 0.0.60 active f
k8s-style 0.0.61 active f
k8s-style 1.0.0 deleted t ← bug
nonsemver build-a active f
nonsemver build-b active f
nonsemver build-c deleted t ← bug
simple 1.0.0 active f
simple 2.0.0 deleted t ← bug
Switched to the fix branch and restarted — migration 014 runs on startup
2026/04/25 18:48:45 Applied migration 14: 014_heal_is_latest
2026/04/25 18:48:45 All migrations applied successfully
Diff (post − pre)
backport 1.0.0 active f
-backport 1.0.1 active f
-backport 2.0.0 deleted t
+backport 1.0.1 active t
+backport 2.0.0 deleted f
...
-k8s-style 0.0.61 active f
-k8s-style 1.0.0 deleted t
+k8s-style 0.0.61 active t
+k8s-style 1.0.0 deleted f
nonsemver build-a active f
-nonsemver build-b active f
-nonsemver build-c deleted t
-simple 1.0.0 active f
-simple 2.0.0 deleted t
+nonsemver build-b active t
+nonsemver build-c deleted f
+simple 1.0.0 active t
+simple 2.0.0 deleted f
healthy, deprecated-latest, and all-deleted are byte-identical pre/post — only the broken servers were touched.
Public API checks post-migration
k8s-style 200 0.0.61 isLatest=True
backport 200 1.0.1 isLatest=True
simple 200 1.0.0 isLatest=True
healthy 200 2.0.0 isLatest=True
deprecated-latest 200 2.0.0 isLatest=True
nonsemver 200 build-b isLatest=True
all-deleted 404 ← public correctly 404s
all-deleted (?include_deleted=true) 200 2.0.0 isLatest=True ← admin still resolves
Why this is safe for prod
- The `HAVING` clause in the migration matches only servers where the `is_latest=true` row is deleted (or no row is flagged) — healthy servers are byte-identical pre/post. Verified.
- Idempotent: re-running on healed data finds nothing to match → no-op.
- Worst-case mispick (semver-tied prereleases — none observed in our scenarios): a single `mcp-publisher status` call now triggers proper recalc via the code fix.
Summary
Fixes #1081. When the version flagged `is_latest` was soft-deleted, no other version was promoted, and because `GetCurrentLatestVersion` didn't filter deleted rows, subsequent publishes with a lower semver never became latest — leaving `/v0/servers/{name}/versions/latest` permanently 404.

This PR has two parts:
1. Code fix — recalculate `is_latest` on every status change

Recalculates `is_latest` after every status mutation (`UpdateServerStatus`, `UpdateAllVersionsStatus`, and the inline status change in `UpdateServer`), in the same transaction that already holds the per-server advisory lock. Follows the `CompareVersions` logic: the highest non-deleted version gets the flag; if every version is deleted, the highest deleted version keeps it so admin lookups via `includeDeleted=true` still resolve the server (important for restore flows). The new `SetLatestVersion` DB primitive issues two statements (clear old, set new) — `idx_unique_latest_per_server` is a non-deferrable unique partial index, so a single `UPDATE` that flips two rows trips the constraint.

2. One-shot migration — heal servers already affected (no manual SQL needed)
`014_heal_is_latest.sql` promotes the highest non-deleted version on every server whose `is_latest` flag is on a deleted row (or absent entirely). The picker parses `major.minor.patch` numerically and falls back to `published_at` for non-semver versions or as a tiebreaker. Heals `io.github.containers/kubernetes-mcp-server` to `0.0.61` automatically and catches any other servers silently affected by the original bug, without hardcoded version numbers.

Picker behavior vs. `CompareVersions`:
- `M.m.p` semver: matches exactly, including the backport-after-delete case (e.g. published 1.0.1 then 1.0.0 hotfix after deleting 2.0.0 → picks 1.0.1, not 1.0.0).
- Non-semver: falls back to `published_at DESC`. Reasonable.
- Prereleases (`1.0.0-alpha` vs `1.0.0`): both parse to `(1,0,0)`, tiebreak by `published_at`. Diverges from `CompareVersions` (which always says `1.0.0 > 1.0.0-alpha`). This edge case requires (a) a server hit by this bug AND (b) prereleases at the top of the version list. Acceptable; the publisher can flip it with one `mcp-publisher status` call (which now triggers proper recalc thanks to part 1).

Verification
End-to-end curl repro of the exact scenario from the issue, run against both `main` and this branch:

Before (`main`):

After (this branch):
Test plan
- Tests covering the picker (semver ordering and the `published_at` fallback) in `./internal/... ./cmd/...` pass against a real Postgres.
- `make lint` (0 issues) and `make validate` pass.
- End-to-end repro above; `main` still reproduces it.

🤖 Generated with Claude Code