Skip to content

fix: preserve original error in exception_log on retry#430

Merged
mason-sharp merged 3 commits intov5_STABLEfrom
fix/SPOC-518/exception-log-error-propagation
Apr 24, 2026
Merged

fix: preserve original error in exception_log on retry#430
mason-sharp merged 3 commits intov5_STABLEfrom
fix/SPOC-518/exception-log-error-propagation

Conversation

@rasifr
Copy link
Copy Markdown
Member

@rasifr rasifr commented Apr 23, 2026

In TRANSDISCARD and SUB_DISABLE modes the apply worker retries a failed
transaction with use_try_block=true. Previously, the original error was
only written to the server log and lost before the retry pass ran.
Every row in the discarded transaction was logged to spock.exception_log
with error_message = NULL, stored as "unknown", giving no indication of
why the transaction was discarded.

Add initial_error_message[1024], initial_operation[16], and failed_action
to SpockExceptionLog (shared memory). In the outer PG_CATCH, save
edata->message, errcallback_arg.action_name, and xact_action_counter into
these fields before FlushErrorState() clears the error context.

On the retry pass, the row whose xact_action_counter matches failed_action
receives the saved initial_error_message as its error_message in exception_log.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c3391708-7610-432a-98ea-65959dc4cce1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/SPOC-518/exception-log-error-propagation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

rasifr and others added 2 commits April 23, 2026 19:05
… main)

In TRANSDISCARD and SUB_DISABLE modes the apply worker retries a failed
transaction with use_try_block=true.  Previously, the original error was
only written to the server log and lost before the retry pass ran.
Every row in the discarded transaction was logged to spock.exception_log
with error_message = NULL, stored as "unknown", giving no indication of
why the transaction was discarded.

Add initial_error_message[1024], initial_operation[16], and failed_action
to SpockExceptionLog (shared memory).  In the outer PG_CATCH, save
edata->message, errcallback_arg.action_name, and xact_action_counter into
these fields before FlushErrorState() clears the error context.

On the retry pass, the row whose xact_action_counter matches failed_action
receives the saved initial_error_message as its error_message in
spock.exception_log.  All other rows in the same transaction receive NULL,
stored as "unavailable", which accurately describes their status (they were
not the cause of the failure).

Also change the NULL fallback in add_entry_to_exception_log from "unknown"
to "unavailable" to eliminate the misleading implication that those rows
independently encountered an unknown error.

Clear initial_error_message at both exception-state reset points in
handle_commit so stale messages do not carry over to subsequent transactions.

Backport of commits d8cf5db, eb30df1 from main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port main's 013_exception_handling.pl to v5_STABLE.

Adaptations from main:
- GUC changes via postgresql.conf append + pg_reload_conf() instead of
  ALTER SYSTEM to avoid SIGHUP-propagation races with the apply worker fork.
- Tables created explicitly on both nodes; no DDL-replication dependency.
- Fixed sleeps replaced with wait_for_sub_status / wait_for_exception_log
  polling helpers.
- done_testing() instead of a hard-coded test count.

Part 5 added (new): TRANSDISCARD error_message quality regression test.
Verifies that after the backport in d08c65db:
  - Bystander rows carry error_message = 'unavailable' (not 'unknown').
  - The originally-failing row carries the real constraint-violation message.
  - No entries with the old 'unknown' fallback exist.

Scenario: three-table transaction where t_ehx_b has NOT NULL on n2 only;
INSERT with v=NULL fails on n2, triggering TRANSDISCARD for the whole xact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rasifr rasifr force-pushed the fix/SPOC-518/exception-log-error-propagation branch from 198afbb to 886477d Compare April 23, 2026 14:10
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 23, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity · 2 duplication

Metric Results
Complexity 0
Duplication 2

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

- Remove dead initial_operation[16] field from SpockExceptionLog and its
  write site in the outer PG_CATCH; the field was written but never read.
- Initialize ErrorData *edata = NULL in handle_sql_or_exception, matching
  all three DML handlers; the uninitialized declaration was formally UB
  even though the ternary short-circuit kept runtime safe.
- Clear failed_action = 0 at both handle_commit reset points alongside
  initial_error_message[0] = '\0', making the guard explicit rather than
  relying on implicit coupling between the two fields.
- Standardize initial_error_message[0] guard to != '\0' in all DML
  handlers, matching the SQL handler style.
- Add LOG message with initial_error_message before each clear point in
  the TRANSDISCARD/SUB_DISABLE handle_commit paths, surfacing the original
  error in the server log (ported from main).
- Reset xact_had_exception = false in handle_commit's post-commit state
  reset alongside xact_action_counter and use_try_block.  Without this,
  the flag retained its 'true' value after a TRANSDISCARD retry pass and
  could trigger a spurious spock_disable_subscription() call if the GUC
  changed from transdiscard to sub_disable before the next BEGIN arrived,
  leaving exception_log empty and the subscription disabled prematurely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rasifr rasifr force-pushed the fix/SPOC-518/exception-log-error-propagation branch from 886477d to 5dbdcb4 Compare April 23, 2026 16:49
@mason-sharp mason-sharp merged commit 69a2ca2 into v5_STABLE Apr 24, 2026
9 of 10 checks passed
@mason-sharp mason-sharp deleted the fix/SPOC-518/exception-log-error-propagation branch April 24, 2026 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants