Skip to content

feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910

Open
premtsd-code wants to merge 19 commits into1.9.xfrom
feat/skip-duplicates
Open

feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910
premtsd-code wants to merge 19 commits into1.9.xfrom
feat/skip-duplicates

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

@premtsd-code premtsd-code commented Apr 15, 2026

Summary

Exposes a single onDuplicate string param on the three migration creation endpoints so CSV / JSON / appwrite-to-appwrite imports can choose how to handle rows whose IDs already exist at the destination.

POST /v1/migrations/appwrite
POST /v1/migrations/csv/imports
POST /v1/migrations/json/imports

Each now accepts optional onDuplicate: "fail" | "skip" | "upsert" (default "fail").

Parameter semantics

Value Behavior
"fail" (default) createDocuments(), aborts on first DuplicateException. Original behavior, unchanged.
"skip" createDocuments() wrapped in skipDuplicates() scope guard. Duplicate-id rows silently no-op at the adapter layer (INSERT IGNORE equivalent). Existing rows are preserved.
"upsert" upsertDocuments(). Existing rows are replaced with imported values.

String + WhiteList validator at the REST boundary — matches the existing Appwrite convention for enum-style params (priority, encryption, period, grant_type). Allowed values are drawn from Utopia\Migration\Destinations\OnDuplicate::values() so the set is owned by the destination implementation.

The param is stored in the migration Document's options array alongside existing fields (path, size, etc.), matching the existing pattern for destination behavior config. The worker's processDestination() reads it back, converts the string via OnDuplicate::tryFrom(...), and passes the enum to DestinationAppwrite's constructor.

Scope

onDuplicate only gates the row-write path in DestinationAppwrite (the rowBuffer flush in createRecord()). It does not affect other resource types — databases, tables, columns, users, teams, buckets, etc. still follow their existing duplicate-handling semantics (throw on conflict). Re-running an Appwrite→Appwrite migration with the full resource tree will still error on schema resources; onDuplicate correctly handles only the row-level flow.

Changes

  • app/controllers/api/migrations.php — adds ->param('onDuplicate', OnDuplicate::Fail->value, new WhiteList(OnDuplicate::values()), ...) to the Appwrite, CSV, and JSON migration create endpoints; stores in options; threads into action function signatures
  • src/Appwrite/Platform/Workers/Migrations.phpprocessDestination() passes OnDuplicate::tryFrom($options['onDuplicate'] ?? '') ?? OnDuplicate::Fail to the destination constructor
  • composer.json — bumps utopia-php/database to 5.3.* and utopia-php/migration to 1.9.* (both tagged releases)
  • tests/e2e/Services/Migrations/MigrationsBase.php — 7 E2E test methods (3 CSV, 3 JSON, 1 Appwrite→Appwrite) + JSON fixture helper + row-success poll helper

E2E test coverage

CSV imports (3 tests)

  • testCreateCSVImportSkipDuplicates — import documents.csv (100 rows), mutate one row's age from 56 → 22, re-import with onDuplicate=skip. Asserts the mutated row keeps age=22 and row count stays at 100.
  • testCreateCSVImportOverwrite — same setup, re-import with onDuplicate=upsert. Asserts the row is restored to the CSV's original age=56.
  • testCreateCSVImportDefaultFailsOnDuplicate — regression guard. Re-import with no onDuplicate param (defaults to "fail"). Asserts migration status is failed with errors populated.

JSON imports (3 tests — same shape as CSV)

  • testCreateJSONImportSkipDuplicates
  • testCreateJSONImportOverwrite
  • testCreateJSONImportDefaultFailsOnDuplicate

Appwrite → Appwrite migration (1 test)

  • testAppwriteMigrationRowsOnDuplicate — seeds a source database/table/column/row, runs a first (full) migration to copy schema + row to the destination, mutates the destination row, then re-runs with onDuplicate=skip (expects row preserved) and onDuplicate=upsert (expects row restored to source value). Uses a schema-error-tolerant poll helper (runMigrationAssertingRowSuccess) because re-creating an existing database/table/column on destination always errors — those errors are orthogonal to onDuplicate, which only gates row writes. Assertions target statusCounters['row'] directly rather than overall migration status.

Fixture helpers

  • prepareCsvImportFixture / prepareJsonImportFixture — create database, table, columns (with status=available wait), bucket, upload fixture file. Return the IDs + first-row metadata for the tests.
  • runMigrationAssertingRowSuccess — Appwrite→Appwrite-specific poll helper that tolerates non-row errors.

All tests verified locally against a full Appwrite stack (CSV + JSON: ~10s total, 153 assertions).

Dependencies

Both upstream deps are tagged releases — no dev-branch pins remain:

Test plan

  • PHP lint clean
  • Pint / PSR-12 format clean
  • PHPStan level 3 clean (1323 files analyzed)
  • composer validate clean (composer.json + composer.lock hash in sync)
  • CSV + JSON E2E tests pass locally against full Appwrite stack (6 tests, 153 assertions)
  • Appwrite→Appwrite E2E test verified by CI (local Docker Traefik routing is broken for cross-project tests; same pre-existing issue affects testAppwriteMigrationDatabasesRow and all sibling tests)
  • Console frontend changes (out of scope for this PR)

Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.

Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports

Parameter semantics:
- overwrite=true  -> destination uses upsertDocuments instead of
                     createDocuments; existing rows are replaced
                     with imported values
- skip=true       -> destination wraps createDocuments in
                     skipDuplicates; existing rows are preserved
                     unchanged, duplicate-id rows silently no-op
- both false      -> default; fails fast on DuplicateException
                     (original behavior, unchanged)
- both true       -> overwrite wins (upsert subsumes skip)

Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.

Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
Three new test methods in MigrationsBase, following the existing
testCreateCSVImport setup pattern:

- testCreateCSVImportSkipDuplicates
  Seeds documents.csv, mutates one row, re-imports with skip=true.
  Asserts the mutated row keeps its mutated value (not overwritten
  by the CSV's original value) and the row count stays at 100.

- testCreateCSVImportOverwrite
  Seeds documents.csv, mutates one row, re-imports with overwrite=true.
  Asserts the mutated row is restored to the CSV's original value
  (proving upsertDocuments actually replaced the row) and the row
  count stays at 100.

- testCreateCSVImportDefaultFailsOnDuplicate
  Regression guard: re-imports documents.csv with no flags. Asserts
  the migration goes to status=failed with errors populated, proving
  the default duplicate-throws behavior is preserved.

All three share a prepareCsvImportFixture() helper that sets up
database + table (name, age columns) + bucket + documents.csv
upload. Returns the known first-row id + original name/age so tests
can mutate and assert on a predictable row.

Reuses the existing documents.csv fixture (100 rows with \$id as the
first column). No new fixture files needed.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 15, 2026

🔄 PHP-Retry Summary

Flaky tests detected across commits:

Commit 2ad95e5 - 2 flaky tests
Test Retries Total Time Details
VectorsDBConsoleClientTest::testGetCollectionLogs 1 3ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 8ms Logs
Commit c150664 - 2 flaky tests
Test Retries Total Time Details
VectorsDBConsoleClientTest::testGetCollectionLogs 1 6ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 7ms Logs
Commit 07307c6 - 2 flaky tests
Test Retries Total Time Details
VectorsDBConsoleClientTest::testGetCollectionLogs 1 11ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 45ms Logs
Commit 0a26608 - 1 flaky test
Test Retries Total Time Details
TablesDBCustomClientTest::testInvalidDocumentStructure 1 240.42s Logs
Commit 84b6dfa - 3 flaky tests
Test Retries Total Time Details
LegacyConsoleClientTest::testNotStartsWith 1 240.47s Logs
VectorsDBConsoleClientTest::testGetCollectionLogs 1 7ms Logs
DatabasesConsoleClientTest::testGetCollectionLogs 1 14ms Logs

Note: Flaky test results are tracked for the last 5 commits

@github-actions
Copy link
Copy Markdown

✨ Benchmark results

  • Requests per second: 1,908
  • Requests with 200 status code: 343,586
  • P99 latency: 0.095115825

⚡ Benchmark Comparison

Metric This PR Latest version
RPS 1,908 1,297
200 343,586 233,480
P99 0.095115825 0.158781968

@blacksmith-sh

This comment has been minimized.

@premtsd-code premtsd-code changed the title feat(migrations): add overwrite and skip options to CSV/JSON/Appwrite imports feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports Apr 20, 2026
@premtsd-code premtsd-code marked this pull request as ready for review April 20, 2026 12:18
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR adds an onDuplicate (fail/skip/upsert) parameter to the Appwrite, CSV, and JSON migration creation endpoints, allowing callers to control row-level conflict handling on re-import. The API wiring and validation are correct, but the worker only forwards onDuplicate to DestinationAppwriteDestinationCSV and DestinationJSON receive no such argument, so the option is a silent no-op for CSV and JSON imports despite being accepted and stored.

  • P1 — onDuplicate ignored for CSV/JSON imports: processDestination() passes OnDuplicate only to DestinationAppwrite; the CSV and JSON destination constructors are unchanged, meaning onDuplicate=skip/upsert has no effect on those import types.
  • Unresolved from prior review: utopia-php/migration is still pinned to dev-feat/skip-duplicates as 1.9.99 (live branch alias, not a tagged release); multiple CI jobs remain disabled via if: false.

Confidence Score: 3/5

Not safe to merge: the worker silently ignores onDuplicate for CSV/JSON imports, the migration dependency is on an untagged dev branch, and CI is in a stripped-down testing configuration.

One confirmed P1 (onDuplicate silently no-ops for two of the three advertised import types), plus two previously-flagged blockers still unresolved (dev-branch pin and CI jobs disabled). The API controller and test additions are well-structured, but the core worker fix is incomplete.

src/Appwrite/Platform/Workers/Migrations.php (missing onDuplicate for DestinationCSV/DestinationJSON), composer.json (dev-branch pin), .github/workflows/ci.yml (disabled jobs must be re-enabled before merge)

Important Files Changed

Filename Overview
src/Appwrite/Platform/Workers/Migrations.php Adds OnDuplicate to DestinationAppwrite but omits it from DestinationCSV and DestinationJSON, making the option a silent no-op for CSV/JSON imports
app/controllers/api/migrations.php Correctly adds onDuplicate param with WhiteList validation to all three migration endpoints and stores it in options
composer.json utopia-php/migration still pinned to dev-branch alias (dev-feat/skip-duplicates as 1.9.99), contradicting the PR description claim of tagged releases
.github/workflows/ci.yml Unit tests, e2e_general, e2e_abuse, e2e_screenshots, and benchmark jobs all disabled via if: false; service matrix reduced to Migrations only — must not be merged in this state
tests/e2e/Services/Migrations/MigrationsBase.php Comprehensive new E2E tests for skip/upsert/fail across CSV, JSON, and Appwrite→Appwrite; column readiness is correctly awaited in new fixture helpers
composer.lock Resolves utopia-php/migration to dev-feat/skip-duplicates branch HEAD with stability-flags override; reproducibility risk if the branch is rebased

Comments Outside Diff (1)

  1. src/Appwrite/Platform/Workers/Migrations.php, line 297-314 (link)

    P1 onDuplicate silently ignored for CSV and JSON imports

    onDuplicate is stored in options by the CSV and JSON import endpoints and is validated at the API layer, but processDestination() only reads and passes it to DestinationAppwrite. DestinationCSV and DestinationJSON constructors receive no onDuplicate argument, so a user who calls POST /v1/migrations/csv/imports with onDuplicate=skip or onDuplicate=upsert will silently get the default fail behaviour — rows will error on duplicate IDs despite the explicit intent. The E2E tests for CSV/JSON onDuplicate (which are claimed to pass locally) would only pass if these destination classes also accept and honour the parameter.

Reviews (16): Last reviewed commit: "composer: bump utopia-php/migration to b..." | Re-trigger Greptile

Comment thread composer.json Outdated
Comment thread tests/e2e/Services/Migrations/MigrationsBase.php
@premtsd-code premtsd-code force-pushed the feat/skip-duplicates branch from 8a00770 to f3c2502 Compare April 20, 2026 12:58
@blacksmith-sh

This comment has been minimized.

Comment thread .github/workflows/ci.yml Outdated
@premtsd-code premtsd-code force-pushed the feat/skip-duplicates branch from 7057189 to d0603c4 Compare April 20, 2026 15:39
…lerance

utopia-php/migration's DestinationAppwrite now handles schema tolerance on
re-migration (PR #171 on feat/skip-duplicates): it pre-checks destination
`_metadata` for each database / table / column / index and tolerates in
Skip/Upsert mode. Re-runs no longer produce schema-level errors, so the
E2E tests can drop the status-tolerant workaround and assert strict
'completed' outcomes.

Changes:

- composer.json: pin utopia-php/migration to dev-feat/skip-duplicates (aliased
  to 1.9.99 for stability resolution). Will be replaced with a fixed 1.10.0
  tag once the migration PR lands.

- testAppwriteMigrationRowsOnDuplicate: replace the tolerant
  runMigrationAssertingRowSuccess helper with performMigrationSync on the
  Skip and Upsert re-runs. Asserts 'completed' status on every run,
  destination row content matches the expected value per mode (Mutated
  preserved on Skip, Original restored on Upsert). Helper method removed.

- testAppwriteMigrationReRunIsIdempotent (new): seeds two rows on source,
  runs the migration three times back-to-back (fresh, Skip re-run, Upsert
  re-run) against unchanged source data, asserts strict 'completed' on every
  run and row content is stable across all three. Exercises the
  schema-tolerance path end-to-end: every database/table/column on
  destination already exists with a matching spec, so DestinationAppwrite's
  pre-check returns Tolerate for every resource.
Comment thread composer.json
"utopia-php/logger": "0.6.*",
"utopia-php/messaging": "0.22.*",
"utopia-php/migration": "1.9.*",
"utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Dev-branch pin contradicts PR description and breaks reproducibility

The constraint "dev-feat/skip-duplicates as 1.9.99" is a live dev-branch alias, not the tagged release 1.9.2 stated in the PR description ("Both upstream deps are tagged releases — no dev-branch pins remain"). composer.lock confirms this: it resolves to "version": "dev-feat/skip-duplicates" at commit 001682168f7d87932c56635a0d0a45b927febc33. If that branch is rebased or force-pushed, any subsequent composer update will silently pull in a different, untested commit. The stability-flags entry ("utopia-php/migration": 20) in composer.lock also overrides the global prefer-stable: true. This should be updated to the tagged release once utopia-php/migration@1.9.2 is published.

Suggested change
"utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99",
"utopia-php/migration": "1.9.*",

Branch is iterating on utopia-php/migration's re-migration tolerance.
Other test matrices (unit, general, abuse, screenshots, benchmark, and
every other e2e service) add ~30+ minutes to CI without exercising code
this PR touches. Restrict the matrix to the Migrations service and skip
the unrelated test jobs until the migration work is ready to merge.

All jobs marked with 'TEMP:' comments + 'if: false' — revert to the
full matrix before merging to main.

Static analysis (lint, phpstan, composer audit, specs, locale, security)
still runs on every PR push.
@blacksmith-sh

This comment has been minimized.

Picks up the PR #171 refactor + unit tests:
- resolveSchemaAction decision point consolidation
- deleteAttributeCompletely primitive (two-way cleanup in one place)
- Hardened sourceIsNewer against MySQL zero-date sentinel
- 14 unit tests locking the decision matrix
@blacksmith-sh

This comment has been minimized.

Picks up the UpdateInPlace branch — database/table metadata drift is
now reconciled on Upsert-newer (renames, enable toggles, table
permissions / documentSecurity) via updateDocument, without touching
child rows.
… dest drift

Two new E2E tests exercising the schema-tolerance UpdateInPlace path
added in utopia-php/migration's DestinationAppwrite.

testAppwriteMigrationUpsertUpdatesContainerMetadata (positive):
- Fresh migration copies source database + table + column + row to dest.
- Mutates source database name (PUT /databases/:id) and table
  name/permissions/rowSecurity/enabled (PUT /tablesdb/:db/tables/:id).
- One-second sleep before mutation ensures source's $updatedAt is
  strictly greater than dest's at second granularity (strtotime
  comparison).
- Upsert re-migration asserts:
  - 'completed' status.
  - dest database name matches source's new name.
  - dest table name / enabled / rowSecurity match source's new values.
  - child row's 'name' attribute is untouched — UpdateInPlace only
    rewrites container metadata, not rows.

testAppwriteMigrationSkipPreservesContainerDrift (negative):
- Fresh migration, then mutate BOTH dest (simulating ops tightening
  permissions post-migration) and source (divergence).
- Skip re-migration asserts dest kept its tightened values — Skip's
  strict "don't touch" contract protects dev→prod cutover workflows
  from accidentally wiping ops-side drift on schema re-sync.

Both tests use performMigrationSync for strict 'completed' assertions.
Runtime ~18s combined. Existing testAppwriteMigrationRowsOnDuplicate
and testAppwriteMigrationReRunIsIdempotent regression-tested locally.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant