temporal-spring-ai: add side-effect replay tests for chat, activity tools, and @SideEffectTool by donald-pinckney · Pull Request #2856 · temporalio/sdk-java

donald-pinckney · 2026-04-21T22:30:31Z

What was changed

Three new tests under src/test/.../replay/:

ChatModelSideEffectTest — register a ChatModel with an AtomicInteger counter. Run a workflow that makes one chat call, assert counter=1. Replay the captured history, assert counter still 1 — the activity result comes from history, not from re-invoking the ChatModel.
ActivityToolSideEffectTest — activity-backed @Tool whose impl increments a counter. A tool-calling stub ChatModel asks for the tool on the first call and returns final text on the second. Same shape: counter=1 after run, counter=1 after replay.
SideEffectToolReplayTest — @SideEffectTool body increments a file-scope counter. The workflow drives one tool call. Assertion proves that Workflow.sideEffect's marker is consulted on replay rather than re-invoking the @Tool method.

MCP is intentionally omitted for this PR. spring-ai-mcp is compileOnly, and adding it just for one replay test isn't worth the dep weight — MCP tool calls go through the same Temporal activity machinery as ChatModel, which ChatModelSideEffectTest already exercises. The plan leaves this as an acceptable substitution.

Why?

The existing WorkflowDeterminismTest catches history-vs-command mismatches but does not prove the plugin isn't re-invoking user side effects on replay. Temporal's AI partner review standards require side-effect safety tests, and a regression — someone adding an in-workflow cache or dropping the Workflow.sideEffect wrap — would be easy to slip in without these.

I verified the SideEffectToolReplayTest actually catches a real regression: temporarily dropping the Workflow.sideEffect(...) wrap in SideEffectToolCallback.call(...) produces expected: <1> but was: <2> exactly as intended. Restored before this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@tool

…ools, and @SideEffectTool Three new tests under src/test/.../replay/: - ChatModelSideEffectTest: register a ChatModel with an AtomicInteger counter. Run a workflow that makes one chat call, assert counter=1. Replay the captured history, assert counter still 1 — the activity result comes from history, not from re-invoking the ChatModel. - ActivityToolSideEffectTest: activity-backed @tool whose impl increments a counter. ToolCallingStubChatModel asks for the tool on the first call and returns final text on the second. Same assertion shape: counter=1 after run, counter=1 after replay. - SideEffectToolReplayTest: @SideEffectTool body increments a counter via a file-scope static. Workflow drives a tool call through ToolCallingStubChatModel. The assertion proves that Workflow.sideEffect's marker is what's consulted on replay rather than re-invoking the @tool method. MCP is intentionally omitted — spring-ai-mcp is compileOnly and adding it just for one test isn't worth the dep weight. MCP tool calls go through the same Temporal activity machinery as ChatModel, which ChatModelSideEffectTest already covers. I verified the SideEffectToolReplayTest catches a real regression by temporarily dropping the Workflow.sideEffect wrap in SideEffectToolCallback; the test correctly failed with `expected: <1> but was: <2>`. Restored before this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Planning scratchpad — not part of the shipped artifact. Removed before merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mjameswh · 2026-04-23T19:16:27Z

+    assertEquals(
+        1,
+        model.callCount.get(),
+        "ChatModel must not be re-invoked during replay — activity results come from history");


That sounds like you're testing Temporal itself rather than the plugin's code.

I understand that's not really your intent, but the line is thin, and future readers of this code (both humans and agents) might easily misunderstand this.

Yeah, agreed. Assertion comments have been re-worked to make it more clear.

mjameswh · 2026-04-23T19:42:54Z

+            ChatWorkflow.class, WorkflowOptions.newBuilder().setTaskQueue(TASK_QUEUE).build());
+    assertEquals("pong", workflow.chat("ping"));
+    assertEquals(
+        1, model.callCount.get(), "ChatModel should be called once during the initial run");


The test methodology isn't exact, which may result in flakes. It is actually possible that the activity function might end up being executed more than once, due to retry, and so the counter might be higher than one here.

A scheduled activity with maxAttempts > 1 will run exactly once to completion as observed by the Temporal server, but the activity itself has no way to know whether the server will have effectively observed the task completion.

Instead of this atomic counter, you can use a concurrency-safe Set that records the activity ID of all activity tasks that were executed. The size of that set is what you are looking for.

Alternatively, you could use a very different test approach: look at the workflow history and filter on ActivityTaskScheduled events. The expectation is that there should be one ActivityTaskScheduled event for this specific activity type.

Thats a very good point. I'll fix this, but I'll also work on updating the docs, since they are really vague about the right way to test this:

Check for duplicate side effects or other types of failures.

https://docs.temporal.io/develop/plugins-guide#side-effects-tests

Ok, ChatModelSideEffectTest and ActivityToolSideEffectTest have been changed to inspecting workflow history.

SideEffectToolReplayTest has been left as-is, I think its fine for Workflow.sideEffect.

donald-pinckney · 2026-04-23T20:00:39Z

This PR definitely needs more work on the tests. Will re-work it some.

Configure TestWorkflowEnvironment with WorkflowCacheSize(0) so the worker replays from history on every workflow task instead of resuming from in-memory cached state. That is the regime in which side-effect safety actually has to hold: a missing Workflow.sideEffect wrap or an un-guarded in-workflow mutation would run on each replay, which cached resumes mask. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reviewer pointed out that "ChatModel must not be re-invoked during replay" reads as testing Temporal's activity replay guarantee rather than the plugin's behavior. Reword class-level javadoc and assertion messages so the property under test is explicit: the plugin places ChatModel calls, activity-stub tool calls, and @SideEffectTool bodies behind the correct Temporal boundary. Temporal's replay/memoization semantics are assumed correct. The counter-stays-at-1 observation is the signal we use to detect a plugin regression, not the thing being tested. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…unters Reviewer noted that counting invocations of the backing impl can conflate two signals: the plugin inlining a call (what we want to catch) vs. Temporal re-delivering an activity task to the worker (which can legitimately happen with maxAttempts > 1, producing >1 impl executions for a single scheduled activity). Switch ChatModelSideEffectTest and ActivityToolSideEffectTest to scan workflow history for ActivityTaskScheduled events of the expected activity type. That count is invariant under activity-task redelivery and speaks directly to the plugin property under test ("the plugin scheduled the call as an activity"). Drops the impl counters and the WorkflowReplayer pass — the former is no longer needed, the latter is redundant with WorkflowDeterminismTest. SideEffectToolReplayTest intentionally keeps its counter: the Workflow.sideEffect path runs in workflow code under deterministic replay semantics, not as an activity, so the retry-induced flake mode doesn't apply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

donald-pinckney · 2026-04-23T20:51:14Z

@mjameswh PR has been improved, and I also opened a PR on docs to explain more about how to write these types of tests: temporalio/documentation#4481

donald-pinckney and others added 3 commits April 21, 2026 15:54

temporal-spring-ai: plan — side-effect replay tests

7427cfc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

temporal-spring-ai: drop PLAN.md

f5058af

Planning scratchpad — not part of the shipped artifact. Removed before merge. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

donald-pinckney marked this pull request as ready for review April 22, 2026 19:52

donald-pinckney requested a review from a team as a code owner April 22, 2026 19:52

brianstrauch approved these changes Apr 23, 2026

View reviewed changes

mjameswh reviewed Apr 23, 2026

View reviewed changes

donald-pinckney and others added 5 commits April 23, 2026 16:08

Merge branch 'master' into spring-ai/sideeffect-replay-tests

45106b4

Merge branch 'master' into spring-ai/sideeffect-replay-tests

c3a8a15

donald-pinckney requested a review from mjameswh April 23, 2026 20:51

maciejdudko approved these changes Apr 24, 2026

View reviewed changes

donald-pinckney added 2 commits April 24, 2026 14:39

Merge branch 'master' into spring-ai/sideeffect-replay-tests

d42e47b

Merge branch 'master' into spring-ai/sideeffect-replay-tests

d5eeea2

donald-pinckney merged commit 9eae4a8 into master Apr 24, 2026
18 checks passed

donald-pinckney deleted the spring-ai/sideeffect-replay-tests branch April 24, 2026 19:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

temporal-spring-ai: add side-effect replay tests for chat, activity tools, and @SideEffectTool#2856

temporal-spring-ai: add side-effect replay tests for chat, activity tools, and @SideEffectTool#2856
donald-pinckney merged 10 commits intomasterfrom
spring-ai/sideeffect-replay-tests

donald-pinckney commented Apr 21, 2026

Uh oh!

mjameswh Apr 23, 2026

Uh oh!

donald-pinckney Apr 23, 2026

Uh oh!

mjameswh Apr 23, 2026

Uh oh!

donald-pinckney Apr 23, 2026

Uh oh!

donald-pinckney Apr 23, 2026

Uh oh!

donald-pinckney commented Apr 23, 2026

Uh oh!

donald-pinckney commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

donald-pinckney commented Apr 21, 2026

What was changed

Why?

Uh oh!

mjameswh Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

donald-pinckney Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

mjameswh Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

donald-pinckney Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

donald-pinckney Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

donald-pinckney commented Apr 23, 2026

Uh oh!

donald-pinckney commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants