feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910
feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910premtsd-code wants to merge 19 commits into1.9.xfrom
Conversation
Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.
Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports
Parameter semantics:
- overwrite=true -> destination uses upsertDocuments instead of
createDocuments; existing rows are replaced
with imported values
- skip=true -> destination wraps createDocuments in
skipDuplicates; existing rows are preserved
unchanged, duplicate-id rows silently no-op
- both false -> default; fails fast on DuplicateException
(original behavior, unchanged)
- both true -> overwrite wins (upsert subsumes skip)
Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.
Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
Three new test methods in MigrationsBase, following the existing testCreateCSVImport setup pattern: - testCreateCSVImportSkipDuplicates Seeds documents.csv, mutates one row, re-imports with skip=true. Asserts the mutated row keeps its mutated value (not overwritten by the CSV's original value) and the row count stays at 100. - testCreateCSVImportOverwrite Seeds documents.csv, mutates one row, re-imports with overwrite=true. Asserts the mutated row is restored to the CSV's original value (proving upsertDocuments actually replaced the row) and the row count stays at 100. - testCreateCSVImportDefaultFailsOnDuplicate Regression guard: re-imports documents.csv with no flags. Asserts the migration goes to status=failed with errors populated, proving the default duplicate-throws behavior is preserved. All three share a prepareCsvImportFixture() helper that sets up database + table (name, age columns) + bucket + documents.csv upload. Returns the known first-row id + original name/age so tests can mutate and assert on a predictable row. Reuses the existing documents.csv fixture (100 rows with \$id as the first column). No new fixture files needed.
🔄 PHP-Retry SummaryFlaky tests detected across commits: Commit
|
| Test | Retries | Total Time | Details |
|---|---|---|---|
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 3ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 8ms | Logs |
Commit c150664 - 2 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 6ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 7ms | Logs |
Commit 07307c6 - 2 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 11ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 45ms | Logs |
Commit 0a26608 - 1 flaky test
| Test | Retries | Total Time | Details |
|---|---|---|---|
TablesDBCustomClientTest::testInvalidDocumentStructure |
1 | 240.42s | Logs |
Commit 84b6dfa - 3 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
LegacyConsoleClientTest::testNotStartsWith |
1 | 240.47s | Logs |
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 7ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 14ms | Logs |
Note: Flaky test results are tracked for the last 5 commits
✨ Benchmark results
⚡ Benchmark Comparison
|
This comment has been minimized.
This comment has been minimized.
… database to 5.3.22
Greptile SummaryThis PR adds an
Confidence Score: 3/5Not safe to merge: the worker silently ignores onDuplicate for CSV/JSON imports, the migration dependency is on an untagged dev branch, and CI is in a stripped-down testing configuration. One confirmed P1 (onDuplicate silently no-ops for two of the three advertised import types), plus two previously-flagged blockers still unresolved (dev-branch pin and CI jobs disabled). The API controller and test additions are well-structured, but the core worker fix is incomplete. src/Appwrite/Platform/Workers/Migrations.php (missing onDuplicate for DestinationCSV/DestinationJSON), composer.json (dev-branch pin), .github/workflows/ci.yml (disabled jobs must be re-enabled before merge) Important Files Changed
|
8a00770 to
f3c2502
Compare
This comment has been minimized.
This comment has been minimized.
7057189 to
d0603c4
Compare
…lerance utopia-php/migration's DestinationAppwrite now handles schema tolerance on re-migration (PR #171 on feat/skip-duplicates): it pre-checks destination `_metadata` for each database / table / column / index and tolerates in Skip/Upsert mode. Re-runs no longer produce schema-level errors, so the E2E tests can drop the status-tolerant workaround and assert strict 'completed' outcomes. Changes: - composer.json: pin utopia-php/migration to dev-feat/skip-duplicates (aliased to 1.9.99 for stability resolution). Will be replaced with a fixed 1.10.0 tag once the migration PR lands. - testAppwriteMigrationRowsOnDuplicate: replace the tolerant runMigrationAssertingRowSuccess helper with performMigrationSync on the Skip and Upsert re-runs. Asserts 'completed' status on every run, destination row content matches the expected value per mode (Mutated preserved on Skip, Original restored on Upsert). Helper method removed. - testAppwriteMigrationReRunIsIdempotent (new): seeds two rows on source, runs the migration three times back-to-back (fresh, Skip re-run, Upsert re-run) against unchanged source data, asserts strict 'completed' on every run and row content is stable across all three. Exercises the schema-tolerance path end-to-end: every database/table/column on destination already exists with a matching spec, so DestinationAppwrite's pre-check returns Tolerate for every resource.
| "utopia-php/logger": "0.6.*", | ||
| "utopia-php/messaging": "0.22.*", | ||
| "utopia-php/migration": "1.9.*", | ||
| "utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99", |
There was a problem hiding this comment.
Dev-branch pin contradicts PR description and breaks reproducibility
The constraint "dev-feat/skip-duplicates as 1.9.99" is a live dev-branch alias, not the tagged release 1.9.2 stated in the PR description ("Both upstream deps are tagged releases — no dev-branch pins remain"). composer.lock confirms this: it resolves to "version": "dev-feat/skip-duplicates" at commit 001682168f7d87932c56635a0d0a45b927febc33. If that branch is rebased or force-pushed, any subsequent composer update will silently pull in a different, untested commit. The stability-flags entry ("utopia-php/migration": 20) in composer.lock also overrides the global prefer-stable: true. This should be updated to the tagged release once utopia-php/migration@1.9.2 is published.
| "utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99", | |
| "utopia-php/migration": "1.9.*", |
Branch is iterating on utopia-php/migration's re-migration tolerance. Other test matrices (unit, general, abuse, screenshots, benchmark, and every other e2e service) add ~30+ minutes to CI without exercising code this PR touches. Restrict the matrix to the Migrations service and skip the unrelated test jobs until the migration work is ready to merge. All jobs marked with 'TEMP:' comments + 'if: false' — revert to the full matrix before merging to main. Static analysis (lint, phpstan, composer audit, specs, locale, security) still runs on every PR push.
This comment has been minimized.
This comment has been minimized.
Picks up the PR #171 refactor + unit tests: - resolveSchemaAction decision point consolidation - deleteAttributeCompletely primitive (two-way cleanup in one place) - Hardened sourceIsNewer against MySQL zero-date sentinel - 14 unit tests locking the decision matrix
This comment has been minimized.
This comment has been minimized.
Picks up the UpdateInPlace branch — database/table metadata drift is now reconciled on Upsert-newer (renames, enable toggles, table permissions / documentSecurity) via updateDocument, without touching child rows.
… dest drift
Two new E2E tests exercising the schema-tolerance UpdateInPlace path
added in utopia-php/migration's DestinationAppwrite.
testAppwriteMigrationUpsertUpdatesContainerMetadata (positive):
- Fresh migration copies source database + table + column + row to dest.
- Mutates source database name (PUT /databases/:id) and table
name/permissions/rowSecurity/enabled (PUT /tablesdb/:db/tables/:id).
- One-second sleep before mutation ensures source's $updatedAt is
strictly greater than dest's at second granularity (strtotime
comparison).
- Upsert re-migration asserts:
- 'completed' status.
- dest database name matches source's new name.
- dest table name / enabled / rowSecurity match source's new values.
- child row's 'name' attribute is untouched — UpdateInPlace only
rewrites container metadata, not rows.
testAppwriteMigrationSkipPreservesContainerDrift (negative):
- Fresh migration, then mutate BOTH dest (simulating ops tightening
permissions post-migration) and source (divergence).
- Skip re-migration asserts dest kept its tightened values — Skip's
strict "don't touch" contract protects dev→prod cutover workflows
from accidentally wiping ops-side drift on schema re-sync.
Both tests use performMigrationSync for strict 'completed' assertions.
Runtime ~18s combined. Existing testAppwriteMigrationRowsOnDuplicate
and testAppwriteMigrationReRunIsIdempotent regression-tested locally.
Summary
Exposes a single
onDuplicatestring param on the three migration creation endpoints so CSV / JSON / appwrite-to-appwrite imports can choose how to handle rows whose IDs already exist at the destination.Each now accepts optional
onDuplicate: "fail" | "skip" | "upsert"(default"fail").Parameter semantics
"fail"(default)createDocuments(), aborts on firstDuplicateException. Original behavior, unchanged."skip"createDocuments()wrapped inskipDuplicates()scope guard. Duplicate-id rows silently no-op at the adapter layer (INSERT IGNOREequivalent). Existing rows are preserved."upsert"upsertDocuments(). Existing rows are replaced with imported values.String +
WhiteListvalidator at the REST boundary — matches the existing Appwrite convention for enum-style params (priority,encryption,period,grant_type). Allowed values are drawn fromUtopia\Migration\Destinations\OnDuplicate::values()so the set is owned by the destination implementation.The param is stored in the migration Document's
optionsarray alongside existing fields (path,size, etc.), matching the existing pattern for destination behavior config. The worker'sprocessDestination()reads it back, converts the string viaOnDuplicate::tryFrom(...), and passes the enum toDestinationAppwrite's constructor.Scope
onDuplicateonly gates the row-write path inDestinationAppwrite(therowBufferflush increateRecord()). It does not affect other resource types — databases, tables, columns, users, teams, buckets, etc. still follow their existing duplicate-handling semantics (throw on conflict). Re-running an Appwrite→Appwrite migration with the full resource tree will still error on schema resources;onDuplicatecorrectly handles only the row-level flow.Changes
app/controllers/api/migrations.php— adds->param('onDuplicate', OnDuplicate::Fail->value, new WhiteList(OnDuplicate::values()), ...)to the Appwrite, CSV, and JSON migration create endpoints; stores inoptions; threads into action function signaturessrc/Appwrite/Platform/Workers/Migrations.php—processDestination()passesOnDuplicate::tryFrom($options['onDuplicate'] ?? '') ?? OnDuplicate::Failto the destination constructorcomposer.json— bumpsutopia-php/databaseto5.3.*andutopia-php/migrationto1.9.*(both tagged releases)tests/e2e/Services/Migrations/MigrationsBase.php— 7 E2E test methods (3 CSV, 3 JSON, 1 Appwrite→Appwrite) + JSON fixture helper + row-success poll helperE2E test coverage
CSV imports (3 tests)
testCreateCSVImportSkipDuplicates— importdocuments.csv(100 rows), mutate one row's age from 56 → 22, re-import withonDuplicate=skip. Asserts the mutated row keepsage=22and row count stays at 100.testCreateCSVImportOverwrite— same setup, re-import withonDuplicate=upsert. Asserts the row is restored to the CSV's originalage=56.testCreateCSVImportDefaultFailsOnDuplicate— regression guard. Re-import with noonDuplicateparam (defaults to"fail"). Asserts migration status isfailedwith errors populated.JSON imports (3 tests — same shape as CSV)
testCreateJSONImportSkipDuplicatestestCreateJSONImportOverwritetestCreateJSONImportDefaultFailsOnDuplicateAppwrite → Appwrite migration (1 test)
testAppwriteMigrationRowsOnDuplicate— seeds a source database/table/column/row, runs a first (full) migration to copy schema + row to the destination, mutates the destination row, then re-runs withonDuplicate=skip(expects row preserved) andonDuplicate=upsert(expects row restored to source value). Uses a schema-error-tolerant poll helper (runMigrationAssertingRowSuccess) because re-creating an existing database/table/column on destination always errors — those errors are orthogonal toonDuplicate, which only gates row writes. Assertions targetstatusCounters['row']directly rather than overall migration status.Fixture helpers
prepareCsvImportFixture/prepareJsonImportFixture— create database, table, columns (withstatus=availablewait), bucket, upload fixture file. Return the IDs + first-row metadata for the tests.runMigrationAssertingRowSuccess— Appwrite→Appwrite-specific poll helper that tolerates non-row errors.All tests verified locally against a full Appwrite stack (CSV + JSON: ~10s total, 153 assertions).
Dependencies
Both upstream deps are tagged releases — no dev-branch pins remain:
utopia-php/database@5.3.22— provides theskipDuplicates()scope guard (utopia-php/database#852)utopia-php/migration@1.9.2— provides theOnDuplicateenum andDestinationAppwriterefactor (utopia-php/migration#169)Test plan
composer validateclean (composer.json + composer.lock hash in sync)testAppwriteMigrationDatabasesRowand all sibling tests)