| Field | Value |
|---|---|
| RFC | 0041 |
| Title | Multi-agent execution model Phase 4: LLM cache-key recipe normation + envelope-refusal recovery in replay context + determinism vs idempotency contract |
| Status | Accepted |
| Author(s) | David Tufts (@davidscotttufts) |
| Created | 2026-05-22 |
| Updated | 2026-05-25 (Active → Accepted: MyndHyve workflow-runtime advertises multiAgent.executionModel.{version: 4, replayDeterminism: {supported: true, llmCacheKeyRecipe: "spec-rfc-0041", refusalDivergenceEmission: true}} live on https://myndhyve.ai/.well-known/openwop — verified 2026-05-25 via direct curl. MyndHyve commit 708753e7 lands §C Tier-2 Firestore-backed observable-result cache (workspace-scoped doc paths workspaces/{wsId}/observableResultCache/{runId}__{nodeId}__{attempt}__{cacheKeyHash} with defense-in-depth workspaceId field check on read; Firestore TTL on expiresAt + read-time stale-check; read failures fail-safe to provider call; write failures logged at warn; pluggable backend interface gated on OBSERVABLE_RESULT_CACHE_BACKEND env). serverCallAI.ts migrated to async getCachedObservableResultAsync (await on replay-mode reads) + setCachedObservableResultAsync (fire-and-forget write in finally). §D advertise lands in the same commit — gated on the env var being set; production deploys without the env var honestly stay at version: 3 + omit replayDeterminism. Staged rollout: code deployed at workflow-runtime-00205-2pc (advertise stayed version: 3); env var flipped via gcloud run services update --update-env-vars OBSERVABLE_RESULT_CACHE_BACKEND=firestore; new revision workflow-runtime-00206-tdh carries the env var and advertises the §D block end-to-end (version: 4 + full replayDeterminism sub-block). Phase 1-4 multi-agent execution model roadmap (RFCs 0037 + 0039 + 0040 + 0041) NOW CLOSED end-to-end on a non-steward host — version: 4 is the final ladder rung, advertised honestly with all four phases' MUST-tier surfaces wired in production. 
The §B refusal-divergence behavioral driver remains an upstream suite-side it.todo (the cross-revision harness that constructs a source run + drives a replay against a deployed host has not been authored on the openwop side); MyndHyve's serverCallAI.ts:checkRefusalDivergence wiring is implementation-ready and will exercise the driver when it lands. Per the bootstrap-phase rule (advertisement + scenario pass-modulo-honest-skip), the path-to-Accepted bar is met: replay-divergence-at-refusal.test.ts advertisement-shape probe PASSes (block present, three required fields verified) + replay-llm-cache-key.test.ts + replay-llm-cache-key-portable.test.ts PASS against the live MyndHyve target (host-sample seam reachable post-60b569de registerHostSampleRoutes wire-up). Prior 2026-05-22 (Draft → Active same-day: Phase 4 spec text + schemas + 3 conformance scenarios + SECURITY invariant + capability advertisement landed atomically following the RFC 0034/0037/0039/0040 pattern. spec/v1/replay.md gains §"Replay determinism under nondeterministic models (RFC 0041 Phase 4, normative)" with §A LLM-cache-key recipe promotion, §B refusal-divergence recovery, §C observable-output-sequence determinism. spec/v1/multi-agent-execution.md gains §"Phase 4 replay determinism (RFC 0041, normative)" pointing into replay.md. schemas/capabilities.schema.json adds replayDeterminism.{supported, llmCacheKeyRecipe, refusalDivergenceEmission} sub-block. schemas/run-event.schema.json RunEventType enum gains replay.divergedAtRefusal. schemas/run-event-payloads.schema.json adds replayDivergedAtRefusal payload with required {sourceRunId, atSequence, originalEnvelopeKind, replayEnvelopeKind} + optional originalEventId/nodeId/refusalReason. spec/v1/rest-endpoints.md gains replay_diverged_at_refusal error code. SECURITY/invariants.yaml gains replay-llm-cache-key-portable row pointing at the existing replay-llm-cache-key.test.ts plus the new portable-key scenario. 
NEW conformance scenarios: replay-divergence-at-refusal.test.ts (advertisement-shape probe + 2 behavioral todos for the dual-direction refusal-divergence case), replay-observable-sequence-determinism.test.ts (capability-gated; behavioral assertion soft-skipped until a nondeterministic-tool fixture ships), replay-llm-cache-key-portable.test.ts (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing POST /v1/host/sample/test/llm-cache-key seam). Reference-host implementation (workflow-engine staged-refusal seam + nondeterministic-tool fixture + Phase 4 advertisement) deferred to follow-up commits owned by the workflow-engine maintainer; the protocol-layer contract is complete. Path to Accepted: a non-steward host advertises replayDeterminism.supported: true + llmCacheKeyRecipe: "spec-rfc-0041" + passes the portable-key + refusal-divergence + observable-sequence scenarios.) |
| Affects | spec/v1/replay.md (extends with §"Replay under non-deterministic agents (Phase 4, normative)") · spec/v1/multi-agent-execution.md (extends with §"Phase 4 replay determinism") · schemas/capabilities.schema.json (bumps multiAgent.executionModel.version ceiling effective range to include 4; adds optional replayDeterminism block) · 3 new conformance scenarios (replacing replay-llm-cache-key.test.ts placeholders) · SECURITY/invariants.yaml (adds replay-llm-cache-key-portable SECURITY invariant) · INTEROP-MATRIX.md · CHANGELOG |
| Compatibility | additive |
| Supersedes | — |
| Superseded by | — |
Summary
Closes the final 3 open spec gaps from RFC 0037 §"Open spec gaps":
- MAE-7 (LLM cache-key recipe): normate the recipe that spec/v1/replay.md §"LLM cache-key recipe" already documents informationally. Today replay-llm-cache-key.test.ts is shape-only (3 it.todo() placeholders per docs/KNOWN-LIMITS.md:18); Phase 4 graduates them to behavioral assertions.
- MAE-8 (envelope-refusal recovery in replay): define what happens when the original run got a valid envelope from the model but the replay gets a refusal (or vice-versa).
- MAE-9 (determinism vs idempotency): formalize that replay produces the same OBSERVABLE OUTPUT SEQUENCE even when underlying tool calls differ. The user-visible state at each event-log index is bit-equivalent across replays even if a tool call's bytes-on-the-wire differ.
Bumps multiAgent.executionModel.version from 3 (Phase 3, RFC 0040) to 4 (Phase 4, this RFC) when implemented. The capability-version ceiling at 4 was already reserved in the schema's version enum range.
Motivation
OpenWOP's replay contract works for deterministic node executors (existing replay.md machinery). For nondeterministic executors — LLM calls being the load-bearing case — replay determinism depends on cache-key portability + a clear contract for what happens when the model's response shifts between runs.
The external standards-readiness review of 2026-05-21 flagged "replay under nondeterministic model behavior" as part of finding (3). RFC 0037 Phase 1 + RFC 0039 Phase 2 + RFC 0040 Phase 3 close the per-host and cross-host portability halves; this RFC closes the temporal-portability half (same workflow definition, two runs at different times → same observable output sequence on replay).
This is the LAST of the four phases from RFC 0037's roadmap. After Phase 4 Accepts, the multi-agent execution model is closed.
Proposal — Phased (still substantial)
§A — LLM cache-key recipe normation (MAE-7 closure)
spec/v1/replay.md §"LLM cache-key recipe" §B already documents the recipe as a CONDITIONAL MUST — hosts MUST compute the key per the recipe for any node that calls an LLM provider through the Layer-2 idempotency surface. Phase 4 strengthens this in two ways: (1) the MUST becomes unconditional (applies to ALL LLM-calling nodes when multiAgent.executionModel.version >= 4, regardless of Layer-2 idempotency usage); (2) the host's commitment to the recipe becomes observable via the replayDeterminism.llmCacheKeyRecipe discovery field. The recipe shape itself is unchanged from §B:
Hosts MUST compute the cache key for an LLM call as:
```text
SHA-256(canonicalize({
  model: <stable-model-identifier>,
  provider: <provider-identifier>,
  messages: <canonical-message-array>,
  tools: <canonical-tool-array-or-empty>,
  temperature: <number-or-null>,
  responseSchema: <canonical-schema-or-null>
}))
```
where canonicalize(...) is JSON canonicalization per RFC 8785 (JCS). The key is deterministic across hosts that follow the recipe: given the same model + provider + messages + tools + temperature + schema, two independent hosts produce the same key. Cached responses are keyed by this hash; on replay, a host that has the same key in cache MUST return the cached response (subject to the §B refusal-recovery contract below).
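The recipe can be illustrated with a minimal sketch. It assumes a Node.js runtime, a hex-encoded digest, and a simplified canonicalizer that covers the common JCS cases (sorted keys, no whitespace, ECMAScript number and string serialization). The names `Json`, `LlmCallInput`, `canonicalize`, and `llmCacheKey` are hypothetical and not part of the spec; a production host would likely use a vetted RFC 8785 library.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified RFC 8785 (JCS) serializer: object members sorted by UTF-16 code
// units, no insignificant whitespace, ECMAScript formatting for primitives.
// Assumption: inputs contain no NaN/Infinity, which JCS forbids.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const members = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => JSON.stringify(k) + ":" + canonicalize(v));
  return "{" + members.join(",") + "}";
}

// Hypothetical input shape mirroring the six recipe fields of §A.
interface LlmCallInput {
  model: string;
  provider: string;
  messages: Json[];
  tools: Json[];
  temperature: number | null;
  responseSchema: Json;
}

// Only the six recipe fields are hashed; request ids, trace context, and
// tenant ids never enter the key.
function llmCacheKey(input: LlmCallInput): string {
  const recipe: Json = {
    model: input.model,
    provider: input.provider,
    messages: input.messages,
    tools: input.tools,
    temperature: input.temperature,
    responseSchema: input.responseSchema,
  };
  // Hex output encoding is an assumption of this sketch.
  return createHash("sha256").update(canonicalize(recipe), "utf8").digest("hex");
}
```

Because the canonical form sorts object members, two hosts that build the same logical request with different property insertion order would arrive at the same key under this sketch.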
§B — Envelope-refusal recovery in replay (MAE-8 closure)
When the original run got a valid envelope from the model but the replay gets a refusal (e.g., the model's safety-filter has tightened since the original run):
Hosts that advertise multiAgent.executionModel.version >= 4 MUST surface this via a new replay.divergedAtRefusal event AND fail the replay with error.code: "replay_diverged_at_refusal" (NEW error code per spec/v1/rest-endpoints.md §"Common error codes"). The replay MUST NOT silently substitute the refusal for the original envelope; operators MUST be informed that the workflow's behavior would diverge under current model state.
The inverse case (original got a refusal, replay gets a valid envelope) follows the same contract: emit replay.divergedAtRefusal and fail. Both directions of divergence are observable; silent acceptance would hide a meaningful state shift.
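A minimal sketch of the dual-direction check follows. The payload field names come from the replayDivergedAtRefusal schema listed in this RFC's header; the `EnvelopeKind` values `"valid"` and `"refusal"`, the `DivergenceResult` wrapper, and the function name `detectRefusalDivergence` are assumptions made for illustration only.

```typescript
// Assumed envelope kinds; the spec may define a different vocabulary.
type EnvelopeKind = "valid" | "refusal";

interface ReplayDivergedAtRefusalPayload {
  sourceRunId: string;
  atSequence: number;
  originalEnvelopeKind: EnvelopeKind;
  replayEnvelopeKind: EnvelopeKind;
  nodeId?: string;
  refusalReason?: string;
}

// Hypothetical wrapper pairing the event to emit with the error that fails the replay.
interface DivergenceResult {
  event: { type: "replay.divergedAtRefusal"; payload: ReplayDivergedAtRefusalPayload };
  error: { code: "replay_diverged_at_refusal" };
}

// Returns null when original and replay agree; otherwise returns the event
// and error for either direction of divergence.
function detectRefusalDivergence(
  sourceRunId: string,
  atSequence: number,
  original: EnvelopeKind,
  replay: EnvelopeKind,
): DivergenceResult | null {
  if (original === replay) {
    return null;
  }
  return {
    event: {
      type: "replay.divergedAtRefusal",
      payload: {
        sourceRunId,
        atSequence,
        originalEnvelopeKind: original,
        replayEnvelopeKind: replay,
      },
    },
    error: { code: "replay_diverged_at_refusal" },
  };
}
```

In this sketch the caller is expected to append the event to the log and then fail the replay with the returned error, rather than continuing with the substituted envelope.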
§C — Determinism vs idempotency contract (MAE-9 closure)
Add to spec/v1/replay.md:
The replay contract is OBSERVABLE-OUTPUT-SEQUENCE determinism, NOT bit-equivalent execution determinism. Concretely:
- The sequence of RunEventDoc records appended to the event log at indices [0, fromSeq] MUST be byte-equivalent between original and replay (modulo per-region clock fields per RFC 0036 §E).
- The variables, channel state, and RunSnapshot.status at each event-log index MUST be byte-equivalent.
- The bytes-on-the-wire of underlying tool/LLM calls MAY differ (e.g., a tool call with non-deterministic remote state, an LLM call against a model whose weights shifted) AS LONG AS the resulting observable state at each index is byte-equivalent.
Hosts MUST NOT cache observable state ONLY at the tool-call boundary — they MUST cache the observable result (return value, side-effects on workflow state, emitted events) so a replay reproduces the observable sequence even when the underlying call differs.
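One way to picture observable-result caching is the in-memory sketch below. The `ObservableResult` shape, the `ObservableResultCache` class, the key layout, and the choice to fall through to a live call on a replay-mode cache miss are all assumptions of this sketch, not requirements stated by the RFC.

```typescript
// Hypothetical shape for what a node execution makes observable.
interface ObservableResult {
  returnValue: unknown;
  stateDelta: Record<string, unknown>;
  emittedEvents: string[];
}

class ObservableResultCache {
  private readonly entries = new Map<string, ObservableResult>();

  // Assumed key layout: one entry per (source run, node, attempt).
  private key(sourceRunId: string, nodeId: string, attempt: number): string {
    return `${sourceRunId}__${nodeId}__${attempt}`;
  }

  // In replay mode a cache hit is returned without invoking the call, so the
  // observable sequence is reproduced even if the underlying call would now
  // behave differently. A miss falls through to the call (an assumption).
  execute(
    mode: "live" | "replay",
    sourceRunId: string,
    nodeId: string,
    attempt: number,
    call: () => ObservableResult,
  ): ObservableResult {
    const k = this.key(sourceRunId, nodeId, attempt);
    if (mode === "replay") {
      const hit = this.entries.get(k);
      if (hit !== undefined) {
        return hit;
      }
    }
    const result = call();
    this.entries.set(k, result);
    return result;
  }
}
```

The point of the design is that the cached unit is the whole observable result (return value, state changes, emitted events) rather than the raw response bytes of the tool or LLM call.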
§D — Capability advertisement
"multiAgent": {
"executionModel": {
...,
+    "replayDeterminism": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["supported"],
+      "properties": {
+        "supported": { "type": "boolean" },
+        "llmCacheKeyRecipe": {
+          "type": "string",
+          "anyOf": [
+            { "const": "spec-rfc-0041" },
+            { "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
+          ],
+          "description": "The cache-key recipe the host honors. `spec-rfc-0041` = this RFC §A. Vendor-specific recipes use the canonical host-extension namespace string matching `^x-host-<host>-<key>$` per `spec/v1/host-extensions.md` §'Canonical prefixes'; the matching algorithm MUST be documented at the host's discovery doc."
+        },
+        "refusalDivergenceEmission": {
+          "type": "boolean",
+          "description": "Host emits replay.divergedAtRefusal events + fails with replay_diverged_at_refusal per §B."
+        }
+      }
+    }
}
}
Hosts advertising multiAgent.executionModel.version: 4 MUST also advertise replayDeterminism.supported: true + name the recipe + commit to refusal-divergence emission.
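The version-4 advertisement rule can be checked mechanically. The sketch below is one possible client-side check, assuming the discovery document has already been fetched and parsed; the interface names and `phase4AdvertisementProblems` are hypothetical, while the constant and pattern are taken from the schema fragment in this section.

```typescript
interface ReplayDeterminism {
  supported: boolean;
  llmCacheKeyRecipe?: string;
  refusalDivergenceEmission?: boolean;
}

interface ExecutionModel {
  version: number;
  replayDeterminism?: ReplayDeterminism;
}

// Pattern for vendor-specific recipes, copied from the schema fragment above.
const HOST_RECIPE_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

// Returns a list of human-readable problems; an empty list means the
// advertisement is consistent with the version-4 rule as read by this sketch.
function phase4AdvertisementProblems(model: ExecutionModel): string[] {
  const problems: string[] = [];
  if (model.version < 4) {
    return problems; // Hosts at version 1-3 are not bound by this rule.
  }
  const rd = model.replayDeterminism;
  if (rd === undefined || rd.supported !== true) {
    problems.push("version 4 requires replayDeterminism.supported: true");
    return problems;
  }
  const recipe = rd.llmCacheKeyRecipe;
  if (recipe === undefined || !(recipe === "spec-rfc-0041" || HOST_RECIPE_PATTERN.test(recipe))) {
    problems.push("llmCacheKeyRecipe must be spec-rfc-0041 or an x-host-<host>-<key> string");
  }
  if (rd.refusalDivergenceEmission !== true) {
    problems.push("version 4 requires refusalDivergenceEmission: true");
  }
  return problems;
}
```

For example, a block with version: 4, supported: true, llmCacheKeyRecipe: "spec-rfc-0041", and refusalDivergenceEmission: true yields no problems under this check.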
§E — SECURITY invariant: replay-llm-cache-key-portable
SECURITY/invariants.yaml gains a new protocol-tier invariant:
- id: replay-llm-cache-key-portable
  tier: protocol
  severity: high
  threat_model: SECURITY/threat-model-secret-leakage.md
  tests:
    - conformance/src/scenarios/replay-llm-cache-key-portable.test.ts
  note: |
    RFC 0041 §A. The LLM cache-key recipe MUST be byte-deterministic
    across independent hosts that follow the recipe. Conformance asserts
    two hosts (or two seq-N test runs against one host) produce the same
    key for the same canonical input. Lets the cross-host replay contract
    survive host migration + multi-region deployment.
Compatibility
Additive. Hosts at version 1-3 continue unchanged. Hosts upgrading to version 4:
- Adopt the unconditional MUST in §A — apply the §"LLM cache-key recipe" §B recipe to ALL LLM-calling nodes, not just those using Layer-2 idempotency. Compatible with hosts that already implement the recipe for the Layer-2 case; non-conformant for hosts that use an idiosyncratic key shape OR skip the recipe for non-Layer-2-idempotent nodes.
- Emit the new replay.divergedAtRefusal event when replay-refusal-divergence occurs (additive RunEventType; pre-version-4 consumers ignore).
- Implement §C observable-sequence determinism (most hosts already do this implicitly via existing replay machinery; §C makes the contract explicit).
Conformance
3 new conformance scenarios land alongside the existing replay-llm-cache-key.test.ts. Contrary to the prior version of this RFC, that file is NOT shaped as it.todo() placeholders: it ships 5 behavioral assertions against the existing POST /v1/host/sample/test/llm-cache-key seam, gated on 404-skip when the seam isn't exposed. The new scenarios cover the surfaces RFC 0041 introduces:
- replay-llm-cache-key-portable.test.ts: capability-gated on replayDeterminism.supported: true. Reuses the existing seam; adds intra-host reproducibility (key recomputable offline), non-recipe-field invariance (security boundary: request id / trace context / tenant id MUST NOT influence the key), and Phase 4 advertisement-alignment (replayDeterminism.llmCacheKeyRecipe MUST equal spec-rfc-0041 or match ^x-host-<host>-<recipe>$).
- replay-divergence-at-refusal.test.ts: capability-gated on refusalDivergenceEmission: true. Advertisement-shape probe (always-on when discovery reachable) + 2 behavioral it.todo for the two refusal-divergence directions (original=valid + replay=refusal AND original=refusal + replay=valid). Behavioral assertion lands when reference workflow-engine wires a staged-refusal mode on its mock-AI provider.
- replay-observable-sequence-determinism.test.ts: capability-gated. Tests the boundary byte-equivalence claim of §C (event-log prefix [0, fromSeq] byte-equivalent modulo per-region clock + ULID-T entropy) and the observable-result caching claim (replay reproduces the original observable result for nondeterministic tool calls). Behavioral assertion lands when a conformance-phase4-nondet-tool fixture ships.
Alternatives considered
1. Skip MAE-8 (refusal divergence) and let hosts silently substitute. Rejected: silent substitution masks safety-policy shifts. Operators MUST be able to audit when their replays' behavior would diverge.
2. Mandate bit-equivalent execution (not just observable-output equivalence). Rejected: bit-equivalence requires every nondeterministic call to be cached forever (memory cost), and breaks legitimate use cases like tool calls against remote stateful APIs.
3. Defer MAE-9 to a separate "replay-semantics" RFC after Phase 4. Rejected: observable-sequence vs bit-equivalent is the load-bearing contract distinction; deferring leaves the spec ambiguous on the most consequential question.
Unresolved questions
1. Canonical message format. §A's <canonical-message-array> needs to nail down field ordering + null/undefined semantics. The OpenAI chat.completions shape differs from Anthropic's messages and Gemini's contents; the canonical form needs to be vendor-neutral. Recommend: spec the canonical form as JSON Schema in schemas/llm-canonical-message.schema.json; defer the schema landing to the comment-window discussion.
2. Tool-call non-determinism in observable state. §C says hosts MUST cache observable result, not just the tool-call result. What about tools whose observable result depends on time of day (getCurrentTime())? Recommend: tools that return non-cacheable state advertise via a NEW tool.nondeterministic: true field on AgentManifest's tool declarations; replay walks the cache transparently and re-issues for nondeterministic tools.
3. Cross-host cache sharing. If host A caches an LLM response and host B replays the run, can host B use host A's cache? RFC 0040 Phase 3 + RFC 0041 Phase 4 together define the surface but the cache-sharing protocol is a meta-question. Defer.
Acceptance criteria
- [x] Spec text merged (this file).
- [x] spec/v1/replay.md extended with §A + §B + §C normative text.
- [x] spec/v1/multi-agent-execution.md extended with §"Phase 4 replay determinism".
- [x] schemas/capabilities.schema.json extends multiAgent.executionModel with replayDeterminism block per §D.
- [x] schemas/run-event.schema.json RunEventType enum gains replay.divergedAtRefusal.
- [x] schemas/run-event-payloads.schema.json gains replayDivergedAtRefusal payload schema.
- [x] spec/v1/rest-endpoints.md §"Common error codes" gains replay_diverged_at_refusal.
- [x] SECURITY/invariants.yaml gains replay-llm-cache-key-portable row with public test glob per §E.
- [x] 3 new conformance scenarios per §Conformance.
- [x] docs/KNOWN-LIMITS.md replay-llm-cache-key.test.ts row corrected (the file is NOT it.todo placeholders; its behavioral coverage at the single-host boundary is in place, and the residual known-limit is cross-host parity, which still depends on OPENWOP_BASE_URL_B).
- [ ] At least one reference host advertises version: 4 + passes the 3 scenarios. (Path-to-Accepted.)
- [ ] INTEROP-MATRIX.md updated. (Will land alongside the reference-host implementation that advertises version: 4.)
- [x] CHANGELOG entry under [Unreleased].
Path to Active → Accepted: cross-host advertisement evidence per RFCs/0001-rfc-process.md §"Promotion to Accepted." The multi-agent execution model roadmap closes when this RFC reaches Accepted.
References
- RFCS/0037-multi-agent-execution-model.md §"Open spec gaps" MAE-7, MAE-8, MAE-9.
- RFCS/0039-multi-agent-confidence-and-memory-lifecycle.md (Phase 2).
- RFCS/0040-multi-agent-cross-host-causation.md (Phase 3; this RFC's predecessor).
- spec/v1/replay.md §"LLM cache-key recipe" + §"Determinism with non-deterministic agents" (the docs §A + §C extend).
- docs/KNOWN-LIMITS.md line 18 (the row this RFC closes).
- External standards-readiness review 2026-05-21, finding (3).