Skip to content

feat(DestinationAppwrite): tolerate existing schema on Skip/Upsert re-migration#171

Open
premtsd-code wants to merge 10 commits intomainfrom
feat/skip-duplicates
Open

feat(DestinationAppwrite): tolerate existing schema on Skip/Upsert re-migration#171
premtsd-code wants to merge 10 commits intomainfrom
feat/skip-duplicates

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

Summary

Re-migration against a destination Appwrite project needs the destination to tolerate existing schema resources — otherwise the second run throws DuplicateException on every collection/attribute/index that already exists. This PR makes DestinationAppwrite handle that reconciliation locally, without requiring any changes to utopia-php/database.

Per-resource behavior

Resource Skip Upsert Fail (default)
Database pre-check metadata → tolerate (hydrate sequence) same — never dropped (contains all collections + rows) throw
Table pre-check metadata → tolerate (hydrate sequence) same — never dropped (contains all rows) throw
Attribute pre-check attributes → tolerate pre-check + source.updatedAt > dest.updatedAtdeleteAttribute + recreate; else tolerate throw
Index pre-check indexes → tolerate same updatedAt gate → deleteIndex + recreate; else tolerate throw
Row existing skipDuplicates(createDocuments) existing upsertDocuments plain createDocuments

Why pre-check over try/catch

Migration is a sequential single-writer. The race-condition justification for exception-driven control flow doesn't apply, so pre-check reads top-to-bottom without using exceptions for control flow. On OnDuplicate::Fail (the default) the pre-check is gated off — fresh-migration paths pay zero overhead.

Why only attributes/indexes get the updatedAt drop+recreate

  • Databases and tables contain user data. Dropping is never acceptable.
  • Attribute drop+recreate wipes that column's data; the follow-up row Upsert repopulates it, so the end state is correct.
  • Index drop+recreate costs only rebuild time; no data loss.

Zero library changes

All of this uses existing utopia-php/database APIs:

  • getDocument(collection, id) for pre-check
  • deleteAttribute / deleteIndex for the Upsert drop path
  • skipDuplicates() (5.3.22) for row-level Skip
  • upsertDocuments() (existing) for row-level Upsert

No database library bump required.

Test plan

  • testAppwriteMigrationRowsOnDuplicate in appwrite/appwrite PR #11910 passes end-to-end (three scenarios: fresh, Skip re-run, Upsert re-run).
  • Re-run migration with changed attribute type on source → destination column dropped + recreated (in Upsert mode only).
  • Re-run migration with unchanged source → no drops, second run idempotent.
  • vendor/bin/pint --test passes.
  • vendor/bin/phpstan analyse --level 3 src passes.

Re-migration needs the destination to tolerate existing schema resources.
The database library stays minimal — migration owns the reconciliation
decision since it has both source and destination metadata in hand.

Per-resource behavior:

- Database: pre-check `databases` metadata. If exists, hydrate sequence
  from the existing document and return. Both Skip and Upsert tolerate
  unconditionally — databases contain the entire tree of collections +
  rows, dropping would destroy everything.

- Table: pre-check `database_{sequence}` metadata. Same pattern —
  tolerate unconditionally, hydrate sequence, never drop. Attribute and
  index reconciliation happens per-resource at the lower layer.

- Attribute: pre-check `attributes` metadata by composite id
  ({databaseSeq}_{tableSeq}_{key}). Skip tolerates. Upsert compares
  source's updatedAt vs destination's — if source is strictly newer the
  column is dropped (via `dbForDatabases->deleteAttribute`) and recreated
  so the spec matches source. Column data is wiped on drop; rows will be
  repopulated by the row-level Upsert path that follows.

- Index: same pattern against `indexes` metadata. Drop is cheap because
  indexes carry no user data — only rebuild time.

Approach is pre-check rather than try/catch for control flow: migration
is sequential single-writer, so the race condition justification for
try/catch doesn't apply, and pre-check reads top-to-bottom with no
exception-as-control-flow.

Row-level dispatch (unchanged, already committed):
  OnDuplicate::Upsert → $dbForDatabases->upsertDocuments(...)
  OnDuplicate::Skip   → $dbForDatabases->skipDuplicates(fn () => createDocuments(...))
  OnDuplicate::Fail   → plain createDocuments

Both row-level primitives are existing library APIs — no database-library
changes are required for this feature.

Helpers:
  shouldTolerateSchemaDuplicate()  // onDuplicate !== Fail
  sourceSpecIsNewer($src, $dst)    // strtotime compare, false on empty
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR introduces Skip/Upsert re-migration tolerance for DestinationAppwrite by adding pre-checks at each schema layer (database, table, attribute, index) and a new SchemaAction enum driving resolveSchemaAction. The OnDuplicate logic and its unit tests are solid, but three ordering bugs in createIndex and createColumn will cause real failures on Upsert re-migrations at schema limits:

  • The index count limit check and IndexValidator both execute with pre-drop state, causing false "Index limit reached" and "Invalid index" errors on DropAndRecreate paths.
  • checkAttribute receives a stale in-memory $table after the attribute drop, causing false "Attribute limit exceeded" errors for collections at the attribute ceiling.

Confidence Score: 2/5

Not safe to merge — three P1 ordering bugs will cause Upsert re-migrations to fail at schema limits.

The OnDuplicate decision logic and tests are correct, but the integration into createIndex and createColumn has three confirmed defects where limit checks and validators execute against stale pre-drop state, producing false exceptions on the very Upsert DropAndRecreate path this PR is designed to enable.

src/Migration/Destinations/Appwrite.php — specifically the ordering of the index count/IndexValidator calls relative to the drop block, and the stale $table reference passed to checkAttribute.

Important Files Changed

Filename Overview
src/Migration/Destinations/Appwrite.php Adds Skip/Upsert schema reconciliation for databases, tables, attributes, and indexes; three P1 ordering bugs remain: index limit count and IndexValidator both fire before the Upsert drop block, and stale $table is passed to checkAttribute after the attribute drop.
src/Migration/Destinations/OnDuplicate.php Adds SchemaAction enum and resolveSchemaAction / sourceIsNewer methods; logic is clean and the zero-epoch guard for strtotime is correct.
tests/Migration/Unit/General/OnDuplicateTest.php New unit tests covering the full mode × existence × timestamp matrix for resolveSchemaAction; thorough and well-annotated.

Comments Outside Diff (2)

  1. src/Migration/Destinations/Appwrite.php, line 939-950 (link)

    P1 Index limit check fires before the Upsert drop

    The count query runs at line 939 and throws "Index limit reached for table" if the collection is at its ceiling — but the pre-check / drop block doesn't execute until line 1016. On an Upsert re-migration where the source spec is newer, the existing index is still counted here, so the guard fires even though the net change is zero (one drop, one add). The equivalent attribute path (checkAttribute) avoids this because it runs after the drop+purge cycle; the index path needs the same ordering.

  2. src/Migration/Destinations/Appwrite.php, line 982-1013 (link)

    P1 IndexValidator uses stale $tableIndexes before the Upsert drop

    $tableColumns and $tableIndexes are extracted from the in-memory $table (lines 982–983) and IndexValidator validates the new index at line 1006 — all before the pre-check/drop block at line 1016. On a DropAndRecreate path the old index is still embedded in $tableIndexes when validation runs. If IndexValidator detects that the new key or attribute-set already exists in that list it returns invalid and throws "Invalid index: …" before the drop ever executes, causing every Upsert re-migration with a spec-changed index to fail at validation.

    Moving the $tableColumns/$tableIndexes extraction, IndexValidator instantiation, and isValid call to after the drop block (and re-fetching $table post-purgeCachedDocument for fresh state) would resolve both this and the limit-count issue above.

Reviews (9): Last reviewed commit: "Trim verbose comments, keep only essenti..." | Re-trigger Greptile

Comment thread src/Migration/Destinations/Appwrite.php Outdated
Comment thread src/Migration/Destinations/Appwrite.php Outdated
Comment thread src/Migration/Destinations/Appwrite.php
Guard against malformed datetime strings (e.g. "0000-00-00 00:00:00")
that PHP's strtotime() returns false for. Without the explicit check,
false gets loose-compared as 0 and the helper silently returns false,
meaning a corrupted source updatedAt would always tolerate the existing
destination entry and never trigger drop+recreate.

Appwrite itself always emits parseable RFC 3339 timestamps, so this is
mainly defensive for non-Appwrite sources (Supabase, NHost, CSV).

Flagged by greptile P2.
On a first-run migration createField writes TWO metadata documents for a
two-way relationship attribute: parent-side at
{dbSeq}_{tableSeq}_{key} (line 763) and child-side at
{dbSeq}_{relatedTableSeq}_{twoWayKey} (line 819). Migration operates
directly on the database via $dbForProject — it doesn't rely on
Appwrite's attribute-event workers to derive the child side — so both
documents are physically written by migration itself.

The Upsert drop block was only removing the parent-side document +
column. On the recreate pass that followed, line 819 re-wrote the
child-side document and immediately hit DuplicateException because the
stale child doc from the previous migration run was still sitting there.
The inner catch rolled back the parent and the whole attribute aborted
with "Attribute already exists" — breaking Upsert re-migration for every
two-way relationship whose source spec had changed.

Fix: in the Upsert drop branch, when the resource is a two-way
relationship, also delete the child-side metadata document and the
child-side physical column on the related table, then purge caches.
Both deletes are wrapped in try/catch to tolerate the case where a prior
interrupted run (or utopia-php/database's relationship handling) already
cleaned one side — the goal is to guarantee a clean slate for the
downstream recreate, not to require both sides to still exist.

Flagged by greptile P1. Reproducing scenario:
  - OnDuplicate::Upsert
  - Destination already contains the attribute
  - Source's two-way relationship spec was modified between runs
    (source.updatedAt > dest.updatedAt)
DestinationAppwrite's shouldTolerateSchemaDuplicate() and
sourceSpecIsNewer() were private helpers that encoded mode-specific
behavior. Moving them onto OnDuplicate itself puts the behavior on the
type that owns it and makes the call sites self-documenting:

  $this->onDuplicate->toleratesSchemaDuplicate()
  $this->onDuplicate->shouldReconcileSchema($src, $dst)

shouldReconcileSchema() also subsumes the explicit Skip check at the
call sites — it returns false for Skip and Fail, so the compound
"`Skip || !sourceSpecIsNewer(...)`" condition collapses to a single
"`!shouldReconcileSchema(...)`".

No behavior change. Lint + PHPStan L3 clean.
OnDuplicate previously exposed two methods (toleratesSchemaDuplicate +
shouldReconcileSchema) that call sites had to combine. That split meant
the decision was spread across two methods and the call site — adding a
new mode later would require updating all three places in sync.

Replace with a single resolveSchemaAction() that returns one of three
SchemaAction enum cases (Create / Tolerate / DropAndRecreate). The
SchemaAction enum lives in the same file as OnDuplicate so the complete
decision surface is in one place.

For data-preserving containers (databases, tables) the method accepts
allowDrop=false so it can never return DropAndRecreate — callers that
use that path don't need to worry about accidentally dropping the
container. In practice the container call sites just use the inline
`$this->onDuplicate !== OnDuplicate::Fail` gate and then a simple
"is empty?" branch; resolveSchemaAction() is reserved for the leaves
(attributes, indexes) that actually benefit from the three-way choice.

Fail-mode short-circuit is preserved at every call site: no
getDocument pre-check when onDuplicate is Fail, so first-time
migrations (the common default-mode path) pay zero overhead. The
short-circuit stays at the call site rather than inside the enum method
so the "don't query on Fail" invariant is visible where the query is
issued.

Call sites:
  - createDatabase / createEntity: inline `!== Fail` gate, tolerate on
    existing. No drop ever.
  - createField: Fail short-circuit, else resolveSchemaAction + match.
    DropAndRecreate triggers the two-way relationship cleanup.
  - createIndex: same pattern, no two-way concern.
…havioral comments

Three review improvements to the schema-tolerance work:

1. OnDuplicate::resolveSchemaAction — remove the $allowDrop parameter.
   No caller passes it; all four call sites rely on the default. Containers
   (databases, tables) bypass the method entirely with an inline
   "!== Fail" check. The parameter advertised tuning that wasn't real,
   confusing readers about what was actually gated. Docstring tightened
   to mention that this method is only safe for leaf resources.

2. Extract deleteAttributeCompletely() as a reusable primitive that
   fully removes an attribute from destination — both metadata documents
   AND physical columns, mirroring createField's two-document write for
   two-way relationships. The Upsert drop block in createField shrinks
   from ~40 nested lines to a single method call; the two-way cleanup
   is now a named concern that future non-Upsert paths (Merge mode,
   explicit rollback, etc.) can reuse without duplication.

3. Replace line-number comments ("the first-run createField wrote TWO
   metadata docs (parent at line 763, child at line 819)") with
   behavioral descriptions ("createField writes a second `attributes`
   document on the related table"). Line references rot on every edit;
   behavioral descriptions don't.

Pure refactor — no functional change. Pint + PHPStan L3 clean.
… handling

resolveSchemaAction is the load-bearing decision point for re-migration
tolerance. Unit test locks the full mode × existence × timestamp matrix so
a future refactor can't silently shift the mapping. 14 tests, sub-millisecond
to run, no destination spin-up required.

Coverage:
- Not-exists returns Create for all modes.
- Fail + exists → Create (routed through create flow so library
  DuplicateException surfaces as designed, not silently tolerated).
- Skip ignores timestamps entirely — "don't touch" contract enforced
  by verifying both timestamp orderings produce Tolerate.
- Upsert with source strictly newer → DropAndRecreate.
- Upsert with dest newer or equal timestamps → Tolerate (strict > gate
  avoids unnecessary destructive rebuild when nothing has changed).
- Upsert with unparseable timestamps → Tolerate (conservative, dataset
  covers empty strings, '0000-00-00 00:00:00', and non-date garbage).
- OnDuplicate::values() case ordering guarded against accidental
  rename/reorder that would break SDK param validators.

Writing the test surfaced a real gap in sourceIsNewer: PHP's strtotime()
accepts '0000-00-00 00:00:00' leniently and returns a large negative
epoch (-62169984000) rather than false. The previous `=== false` check
missed the MySQL zero-date sentinel and would have returned true when
source was a valid date and dest was '0000-00-00', triggering a
destructive drop on garbage input. Harden sourceIsNewer to additionally
reject non-positive epochs (<= 0) — any legitimate Appwrite updatedAt
is well past 1970-01-01.
Extends the Upsert reconciliation to cover container metadata drift.
Previously databases and tables always tolerated an existing destination
under Upsert — source renames, enable toggles, and (for tables)
permission changes silently never reached destination on re-migration.
Now Upsert + source strictly newer triggers an updateDocument on the
container's own metadata doc, leaving children (rows, sub-resources)
untouched.

Mechanism:

- New SchemaAction::UpdateInPlace case alongside Create / Tolerate /
  DropAndRecreate.
- OnDuplicate::resolveSchemaAction gains a canDrop parameter (default
  false — safer default, destructive reconciliation must be opt-in):
    canDrop: true  → DropAndRecreate on Upsert-newer (attributes, indexes)
    canDrop: false → UpdateInPlace on Upsert-newer (databases, tables)
- createField / createIndex pass canDrop: true.
- createDatabase / createEntity rely on the default and handle
  SchemaAction::UpdateInPlace by calling updateDocument with a scoped
  field set:
    Database → name, search, enabled, type, originalId, database, $updatedAt
    Table    → name, search, enabled, $permissions, documentSecurity, $updatedAt
  Immutable fields ($id, $createdAt, internal sequences) are never touched.

Source-wins semantics for table permissions follow the full-Upsert
contract: if the caller wants destination drift preserved across re-runs,
they should pick Skip mode, not Upsert. Row Upsert already works this
way; containers now match.

Unit tests cover both canDrop branches and confirm Skip never returns
UpdateInPlace regardless of timestamps (reinforcing the "don't touch"
contract). 16 tests, pint + phpstan L3 clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant