RFC: 0041
Title: Multi-agent execution model Phase 4: LLM cache-key recipe normation + envelope-refusal recovery in replay context + determinism vs idempotency contract
Status: Accepted
Author(s): David Tufts (@davidscotttufts)
Created: 2026-05-22
Updated: 2026-05-25 (Active → Accepted: MyndHyve workflow-runtime advertises multiAgent.executionModel.{version: 4, replayDeterminism: {supported: true, llmCacheKeyRecipe: "spec-rfc-0041", refusalDivergenceEmission: true}} live on https://myndhyve.ai/.well-known/openwop, verified 2026-05-25 via direct curl. MyndHyve commit 708753e7 lands §C Tier-2 Firestore-backed observable-result cache (workspace-scoped doc paths workspaces/{wsId}/observableResultCache/{runId}__{nodeId}__{attempt}__{cacheKeyHash} with defense-in-depth workspaceId field check on read; Firestore TTL on expiresAt + read-time stale-check; read failures fail-safe to provider call; write failures logged at warn; pluggable backend interface gated on OBSERVABLE_RESULT_CACHE_BACKEND env). serverCallAI.ts migrated to async getCachedObservableResultAsync (await on replay-mode reads) + setCachedObservableResultAsync (fire-and-forget write in finally). §D advertise lands in the same commit, gated on the env var being set; production deploys without the env var honestly stay at version: 3 + omit replayDeterminism. Staged rollout: code deployed at workflow-runtime-00205-2pc (advertise stayed version: 3); env var flipped via gcloud run services update --update-env-vars OBSERVABLE_RESULT_CACHE_BACKEND=firestore; new revision workflow-runtime-00206-tdh carries the env var and advertises the §D block end-to-end (version: 4 + full replayDeterminism sub-block). Phase 1-4 multi-agent execution model roadmap (RFCs 0037 + 0039 + 0040 + 0041) NOW CLOSED end-to-end on a non-steward host: version: 4 is the final ladder rung, advertised honestly with all four phases' MUST-tier surfaces wired in production. 
The §B refusal-divergence behavioral driver remains an upstream suite-side it.todo (the cross-revision harness that constructs a source run + drives a replay against a deployed host has not been authored on the openwop side); MyndHyve's serverCallAI.ts:checkRefusalDivergence wiring is implementation-ready and will exercise the driver when it lands. Per the bootstrap-phase rule (advertisement + scenario pass-modulo-honest-skip), the path-to-Accepted bar is met: replay-divergence-at-refusal.test.ts advertisement-shape probe PASSes (block present, three required fields verified) + replay-llm-cache-key.test.ts + replay-llm-cache-key-portable.test.ts PASS against the live MyndHyve target (host-sample seam reachable post-60b569de registerHostSampleRoutes wire-up). Prior 2026-05-22 (Draft → Active same-day: Phase 4 spec text + schemas + 3 conformance scenarios + SECURITY invariant + capability advertisement landed atomically following the RFC 0034/0037/0039/0040 pattern. spec/v1/replay.md gains §"Replay determinism under nondeterministic models (RFC 0041 Phase 4, normative)" with §A LLM-cache-key recipe promotion, §B refusal-divergence recovery, §C observable-output-sequence determinism. spec/v1/multi-agent-execution.md gains §"Phase 4 replay determinism (RFC 0041, normative)" pointing into replay.md. schemas/capabilities.schema.json adds replayDeterminism.{supported, llmCacheKeyRecipe, refusalDivergenceEmission} sub-block. schemas/run-event.schema.json RunEventType enum gains replay.divergedAtRefusal. schemas/run-event-payloads.schema.json adds replayDivergedAtRefusal payload with required {sourceRunId, atSequence, originalEnvelopeKind, replayEnvelopeKind} + optional originalEventId/nodeId/refusalReason. spec/v1/rest-endpoints.md gains replay_diverged_at_refusal error code. SECURITY/invariants.yaml gains replay-llm-cache-key-portable row pointing at the existing replay-llm-cache-key.test.ts plus the new portable-key scenario. 
NEW conformance scenarios: replay-divergence-at-refusal.test.ts (advertisement-shape probe + 2 behavioral todos for the dual-direction refusal-divergence case), replay-observable-sequence-determinism.test.ts (capability-gated; behavioral assertion soft-skipped until a nondeterministic-tool fixture ships), replay-llm-cache-key-portable.test.ts (intra-host reproducibility + non-recipe-field invariance + Phase 4 advertisement alignment — reuses the existing POST /v1/host/sample/test/llm-cache-key seam). Reference-host implementation (workflow-engine staged-refusal seam + nondeterministic-tool fixture + Phase 4 advertisement) deferred to follow-up commits owned by the workflow-engine maintainer; the protocol-layer contract is complete. Path to Accepted: a non-steward host advertises replayDeterminism.supported: true + llmCacheKeyRecipe: "spec-rfc-0041" + passes the portable-key + refusal-divergence + observable-sequence scenarios.)
Affects: spec/v1/replay.md (extends with §"Replay under non-deterministic agents (Phase 4, normative)") · spec/v1/multi-agent-execution.md (extends with §"Phase 4 replay determinism") · schemas/capabilities.schema.json (bumps multiAgent.executionModel.version ceiling effective range to include 4; adds optional replayDeterminism block) · 3 new conformance scenarios (replacing replay-llm-cache-key.test.ts placeholders) · SECURITY/invariants.yaml (adds replay-llm-cache-key-portable SECURITY invariant) · INTEROP-MATRIX.md · CHANGELOG
Compatibility: additive
Supersedes:
Superseded by:

Summary

Closes the final 3 open spec gaps from RFC 0037 §"Open spec gaps":

  • MAE-7 (LLM cache-key recipe): normate the recipe spec/v1/replay.md §"LLM cache-key recipe" already documents informationally. Today replay-llm-cache-key.test.ts is shape-only (3 it.todo() placeholders per docs/KNOWN-LIMITS.md:18); Phase 4 graduates them to behavioral assertions.
  • MAE-8 (envelope-refusal recovery in replay): define what happens when the original run got a valid envelope from the model but the replay gets a refusal (or vice-versa).
  • MAE-9 (determinism vs idempotency): formalize that replay produces the same OBSERVABLE OUTPUT SEQUENCE even when underlying tool calls differ. The user-visible state at each event-log index is byte-equivalent across replays even if a tool call's bytes-on-the-wire differ.

Bumps multiAgent.executionModel.version from 3 (Phase 3, RFC 0040) to 4 (Phase 4, this RFC) when implemented. The capability-version ceiling at 4 was already reserved in the schema's version enum range.

Motivation

OpenWOP's replay contract works for deterministic node executors (existing replay.md machinery). For nondeterministic executors — LLM calls being the load-bearing case — replay determinism depends on cache-key portability + a clear contract for what happens when the model's response shifts between runs.

The external standards-readiness review of 2026-05-21 flagged "replay under nondeterministic model behavior" as part of finding (3). RFC 0037 Phase 1 + RFC 0039 Phase 2 + RFC 0040 Phase 3 close the per-host and cross-host portability halves; this RFC closes the temporal-portability half (same workflow definition, two runs at different times → same observable output sequence on replay).

This is the LAST of the four phases from RFC 0037's roadmap. Once Phase 4 reaches Accepted, the multi-agent execution model is closed.

Proposal — Phased (still substantial)

§A — LLM cache-key recipe normation (MAE-7 closure)

spec/v1/replay.md §"LLM cache-key recipe" §B already documents the recipe as a CONDITIONAL MUST — hosts MUST compute the key per the recipe for any node that calls an LLM provider through the Layer-2 idempotency surface. Phase 4 strengthens this in two ways: (1) the MUST becomes unconditional (applies to ALL LLM-calling nodes when multiAgent.executionModel.version >= 4, regardless of Layer-2 idempotency usage); (2) the host's commitment to the recipe becomes observable via the replayDeterminism.llmCacheKeyRecipe discovery field. The recipe shape itself is unchanged from §B:

Hosts MUST compute the cache key for an LLM call as:

```text
SHA-256(canonicalize({
  model: <stable-model-identifier>,
  provider: <provider-identifier>,
  messages: <canonical-message-array>,
  tools: <canonical-tool-array-or-empty>,
  temperature: <number-or-null>,
  responseSchema: <canonical-schema-or-null>
}))
```

where canonicalize(...) is JSON canonicalization per RFC 8785 (JCS). The key is deterministic across hosts that follow the recipe: given the same model + provider + messages + tools + temperature + schema, two independent hosts produce the same key. Cached responses are keyed by this hash; on replay, a host that has the same key in cache MUST return the cached response (subject to the §B refusal-recovery contract below).
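As an illustration of the recipe above, the following TypeScript sketch computes the key with Node's built-in `crypto` module. It is a minimal sketch under stated assumptions, not a reference implementation: `canonicalize` here is a simplified RFC 8785 serializer (sorted keys, `JSON.stringify` for primitives) that does not reject non-finite numbers or lone surrogates, the `LlmCallInput` and `llmCacheKey` names are hypothetical, an absent tool list is assumed to be represented as `[]`, and the hex output encoding is an assumption because the recipe does not name one.

```typescript
import { createHash } from "node:crypto";

// JSON value type accepted by the simplified canonicalizer below.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified RFC 8785 (JCS) serializer: object keys are sorted by UTF-16 code
// units (the default JavaScript string ordering) and primitives are emitted
// with JSON.stringify. A complete JCS implementation must also reject
// non-finite numbers and lone surrogates; this sketch does not.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const obj: { [key: string]: Json } = value;
  const members = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
  return "{" + members.join(",") + "}";
}

// Hypothetical input shape carrying exactly the six recipe fields.
interface LlmCallInput {
  model: string;
  provider: string;
  messages: Json[];
  tools: Json[]; // assumption: "empty" is represented as []
  temperature: number | null;
  responseSchema: Json;
}

// SHA-256 over the canonical form; hex encoding is an assumption.
function llmCacheKey(input: LlmCallInput): string {
  const recipe: Json = {
    model: input.model,
    provider: input.provider,
    messages: input.messages,
    tools: input.tools,
    temperature: input.temperature,
    responseSchema: input.responseSchema,
  };
  return createHash("sha256").update(canonicalize(recipe), "utf8").digest("hex");
}

// Placeholder identifiers, for illustration only.
const exampleKey = llmCacheKey({
  model: "example-model",
  provider: "example-provider",
  messages: [{ role: "user", content: "hello" }],
  tools: [],
  temperature: 0,
  responseSchema: null,
});
console.log(exampleKey.length); // 64 hex characters
```

Because the canonical form sorts object keys, two hosts that build the same logical input in a different property order arrive at the same digest, which is the portability property the recipe relies on.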

§B — Envelope-refusal recovery in replay (MAE-8 closure)

When the original run got a valid envelope from the model but the replay gets a refusal (e.g., the model's safety-filter has tightened since the original run):

Hosts that advertise multiAgent.executionModel.version >= 4 MUST surface this via a new replay.divergedAtRefusal event AND fail the replay with error.code: "replay_diverged_at_refusal" (NEW error code per spec/v1/rest-endpoints.md §"Common error codes"). The replay MUST NOT silently substitute the refusal for the original envelope — operators MUST be informed that the workflow's behavior would diverge under current model state.

The inverse case (original got a refusal, replay gets a valid envelope) follows the same contract: emit replay.divergedAtRefusal and fail. Both directions of divergence are observable; silent acceptance would hide a meaningful state shift.
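The two-direction rule can be sketched as a single comparison. This is an illustrative sketch only: the `"valid"` and `"refusal"` labels, the `detectRefusalDivergence` name, and the `{ event, error }` return shape are assumptions made for the example; the event type, error code, and the four required payload fields are the ones this RFC names.

```typescript
// "valid" and "refusal" are hypothetical labels for the two envelope kinds.
type EnvelopeKind = "valid" | "refusal";

// Required payload fields for the replayDivergedAtRefusal payload.
interface ReplayDivergedAtRefusalPayload {
  sourceRunId: string;
  atSequence: number;
  originalEnvelopeKind: EnvelopeKind;
  replayEnvelopeKind: EnvelopeKind;
}

// Hypothetical result shape: the event to emit plus the error to fail with.
interface ReplayFailure {
  event: { type: "replay.divergedAtRefusal"; payload: ReplayDivergedAtRefusalPayload };
  error: { code: "replay_diverged_at_refusal" };
}

// Returns null when both runs saw the same envelope kind. Otherwise returns
// the event and error for the divergence, in either direction.
function detectRefusalDivergence(
  sourceRunId: string,
  atSequence: number,
  original: EnvelopeKind,
  replay: EnvelopeKind,
): ReplayFailure | null {
  if (original === replay) {
    return null;
  }
  return {
    event: {
      type: "replay.divergedAtRefusal",
      payload: {
        sourceRunId,
        atSequence,
        originalEnvelopeKind: original,
        replayEnvelopeKind: replay,
      },
    },
    error: { code: "replay_diverged_at_refusal" },
  };
}
```

A host would call a check like this before serving a replayed LLM result, emit the event, and then fail the replay rather than continue with the substituted envelope.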

§C — Determinism vs idempotency contract (MAE-9 closure)

Add to spec/v1/replay.md:

The replay contract is OBSERVABLE-OUTPUT-SEQUENCE determinism, NOT bit-equivalent execution determinism. Concretely:

- The sequence of RunEventDoc records appended to the event log at indices [0, fromSeq] MUST be byte-equivalent between original and replay (modulo per-region clock fields per RFC 0036 §E).

- The variables, channel state, and RunSnapshot.status at each event-log index MUST be byte-equivalent.

- The bytes-on-the-wire of underlying tool/LLM calls MAY differ (e.g., a tool call with non-deterministic remote state, an LLM call against a model whose weights shifted) AS LONG AS the resulting observable state at each index is byte-equivalent.

Hosts MUST NOT cache only the raw result at the tool-call boundary; they MUST cache the observable result (return value, side-effects on workflow state, emitted events) so that a replay reproduces the observable sequence even when the underlying call differs.
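A minimal sketch of observable-result caching follows. Everything in it is hypothetical scaffolding: the `ObservableResult` shape, the `callWithObservableCache` helper, the in-memory `Map` standing in for a real backend, the `"node-a#1"` key, and the `"node.completed"` event name are all assumptions chosen for illustration. On a replay-mode cache miss the sketch falls through to the underlying call, which is one possible policy and not something §C mandates.

```typescript
// Hypothetical shape of what a node execution makes observable.
interface ObservableResult {
  returnValue: unknown;
  stateChanges: Record<string, unknown>;
  emittedEvents: string[];
}

type RunMode = "live" | "replay";

// In replay mode, serve the stored observable result instead of re-invoking
// the underlying call. In live mode (or on a replay miss), invoke and store.
function callWithObservableCache(
  cache: Map<string, ObservableResult>,
  cacheKey: string,
  mode: RunMode,
  invoke: () => ObservableResult,
): ObservableResult {
  if (mode === "replay") {
    const cached = cache.get(cacheKey);
    if (cached !== undefined) {
      return cached;
    }
  }
  const result = invoke();
  cache.set(cacheKey, result);
  return result;
}

// A deliberately nondeterministic tool: every real invocation differs.
const cache = new Map<string, ObservableResult>();
let calls = 0;
const flakyTool = (): ObservableResult => {
  calls += 1;
  return {
    returnValue: `response-${calls}`,
    stateChanges: { answer: calls },
    emittedEvents: ["node.completed"], // placeholder event name
  };
};

const original = callWithObservableCache(cache, "node-a#1", "live", flakyTool);
const replayed = callWithObservableCache(cache, "node-a#1", "replay", flakyTool);
console.log(original.returnValue === replayed.returnValue); // true
```

The replay never reaches `flakyTool`, so the observable state at that index matches the original run even though a second real call would have returned something different.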

§D — Capability advertisement

   "multiAgent": {
     "executionModel": {
       ...,
+      "replayDeterminism": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": ["supported"],
+        "properties": {
+          "supported": { "type": "boolean" },
+          "llmCacheKeyRecipe": {
+            "type": "string",
+            "anyOf": [
+              { "const": "spec-rfc-0041" },
+              { "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
+            ],
+            "description": "The cache-key recipe the host honors. `spec-rfc-0041` = this RFC §A. Vendor-specific recipes use the canonical host-extension namespace string matching `^x-host-<host>-<key>$` per `spec/v1/host-extensions.md` §'Canonical prefixes'; the matching algorithm MUST be documented at the host's discovery doc."
+          },
+          "refusalDivergenceEmission": {
+            "type": "boolean",
+            "description": "Host emits replay.divergedAtRefusal events + fails with replay_diverged_at_refusal per §B."
+          }
+        }
+      }
     }
   }

Hosts advertising multiAgent.executionModel.version: 4 MUST also advertise replayDeterminism.supported: true + name the recipe + commit to refusal-divergence emission.
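The advertisement rule can be checked mechanically against a discovery document. The sketch below is an assumption-laden illustration, not the conformance suite's probe: the `ExecutionModel` interface and `phase4AdvertisementProblems` function are hypothetical names, and only the field names, the `spec-rfc-0041` constant, and the vendor-recipe pattern are taken from the schema fragment above.

```typescript
interface ReplayDeterminism {
  supported: boolean;
  llmCacheKeyRecipe?: string;
  refusalDivergenceEmission?: boolean;
}

// Hypothetical view of multiAgent.executionModel from a discovery document.
interface ExecutionModel {
  version: number;
  replayDeterminism?: ReplayDeterminism;
}

// Vendor-recipe pattern copied from the schema fragment.
const HOST_RECIPE_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

// Returns a list of problems; an empty list means the advertisement is
// consistent with the version-4 rule. Versions below 4 carry no obligation.
function phase4AdvertisementProblems(model: ExecutionModel): string[] {
  const problems: string[] = [];
  if (model.version < 4) {
    return problems;
  }
  const rd = model.replayDeterminism;
  if (rd === undefined || rd.supported !== true) {
    problems.push("replayDeterminism.supported must be true");
  }
  const recipe = rd?.llmCacheKeyRecipe;
  if (recipe === undefined || !(recipe === "spec-rfc-0041" || HOST_RECIPE_PATTERN.test(recipe))) {
    problems.push("llmCacheKeyRecipe must name spec-rfc-0041 or an x-host-* recipe");
  }
  if (rd?.refusalDivergenceEmission !== true) {
    problems.push("refusalDivergenceEmission must be true");
  }
  return problems;
}

console.log(
  phase4AdvertisementProblems({
    version: 4,
    replayDeterminism: {
      supported: true,
      llmCacheKeyRecipe: "spec-rfc-0041",
      refusalDivergenceEmission: true,
    },
  }).length,
); // 0
```

Returning a list of problems instead of a boolean keeps the output usable as a diagnostic when a host advertises version 4 with an incomplete sub-block.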

§E — SECURITY invariant: replay-llm-cache-key-portable

SECURITY/invariants.yaml gains a new protocol-tier invariant:

- id: replay-llm-cache-key-portable
  tier: protocol
  severity: high
  threat_model: SECURITY/threat-model-secret-leakage.md
  tests:
    - conformance/src/scenarios/replay-llm-cache-key-portable.test.ts
  note: |
    RFC 0041 §A. The LLM cache-key recipe MUST be byte-deterministic
    across independent hosts that follow the recipe. Conformance asserts
    two hosts (or two seq-N test runs against one host) produce the same
    key for the same canonical input. Lets the cross-host replay contract
    survive host migration + multi-region deployment.

Compatibility

Additive. Hosts at version 1-3 continue unchanged. Hosts upgrading to version 4:

  • Adopt the unconditional MUST in §A — apply the §"LLM cache-key recipe" §B recipe to ALL LLM-calling nodes, not just those using Layer-2 idempotency. Compatible with hosts that already implement the recipe for the Layer-2 case; non-conformant for hosts that use an idiosyncratic key shape OR skip the recipe for non-Layer-2-idempotent nodes.
  • Emit the new replay.divergedAtRefusal event when replay-refusal-divergence occurs (additive RunEventType; pre-version-4 consumers ignore).
  • Implement §C observable-sequence determinism (most hosts already do this implicitly via existing replay machinery; §C makes the contract explicit).

Conformance

3 new conformance scenarios land alongside the existing replay-llm-cache-key.test.ts (which, contrary to the prior version of this RFC, is NOT shaped as it.todo() placeholders; it ships 5 behavioral assertions against the existing POST /v1/host/sample/test/llm-cache-key seam, gated on 404-skip when the seam isn't exposed). The new scenarios cover the surfaces RFC 0041 introduces:

  • replay-llm-cache-key-portable.test.ts: capability-gated on replayDeterminism.supported: true. Reuses the existing seam; adds intra-host reproducibility (key recomputable offline), non-recipe-field invariance (security boundary: request id / trace context / tenant id MUST NOT influence the key), and Phase 4 advertisement-alignment (replayDeterminism.llmCacheKeyRecipe MUST equal spec-rfc-0041 or match ^x-host-<host>-<recipe>$).
  • replay-divergence-at-refusal.test.ts: capability-gated on refusalDivergenceEmission: true. Advertisement-shape probe (always-on when discovery reachable) + 2 behavioral it.todo for the two refusal-divergence directions (original=valid + replay=refusal AND original=refusal + replay=valid). Behavioral assertion lands when reference workflow-engine wires a staged-refusal mode on its mock-AI provider.
  • replay-observable-sequence-determinism.test.ts: capability-gated. Tests the boundary byte-equivalence claim of §C (event-log prefix [0, fromSeq] byte-equivalent modulo per-region clock + ULID-T entropy) and the observable-result caching claim (replay reproduces the original observable result for nondeterministic tool calls). Behavioral assertion lands when a conformance-phase4-nondet-tool fixture ships.
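
The non-recipe-field invariance named in the first scenario can be pictured as a projection step that runs before hashing. The sketch below is hypothetical throughout: `HostLlmRequest`, `toRecipeInput`, and the `requestId`, `traceContext`, and `tenantId` field names are illustrative stand-ins for whatever a host attaches to its requests, and the defaults for absent optional fields are assumptions.

```typescript
// Hypothetical host-side request: six recipe fields plus per-request metadata.
interface HostLlmRequest {
  model: string;
  provider: string;
  messages: unknown[];
  tools?: unknown[];
  temperature?: number | null;
  responseSchema?: unknown;
  requestId?: string; // must not reach the key
  traceContext?: string; // must not reach the key
  tenantId?: string; // must not reach the key
}

// Copy only the six recipe fields; everything else is dropped before the
// canonicalize-and-hash step, so it cannot influence the cache key.
function toRecipeInput(req: HostLlmRequest) {
  return {
    model: req.model,
    provider: req.provider,
    messages: req.messages,
    tools: req.tools ?? [],
    temperature: req.temperature ?? null,
    responseSchema: req.responseSchema ?? null,
  };
}

const base = { model: "example-model", provider: "example-provider", messages: [{ role: "user", content: "hi" }] };
const first = toRecipeInput({ ...base, requestId: "req-1", tenantId: "tenant-a" });
const second = toRecipeInput({ ...base, requestId: "req-2", tenantId: "tenant-b", traceContext: "trace-xyz" });
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```

An allow-list projection like this fails closed: a metadata field added to the request later is excluded from the key unless someone deliberately adds it to the recipe.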

Alternatives considered

1. Skip MAE-8 (refusal divergence) and let hosts silently substitute. Rejected: silent substitution masks safety-policy shifts. Operators MUST be able to audit when their replays' behavior would diverge.
2. Mandate bit-equivalent execution (not just observable-output equivalence). Rejected: bit-equivalence requires every nondeterministic call to be cached forever (memory cost), and breaks legitimate use cases like tool calls against remote stateful APIs.
3. Defer MAE-9 to a separate "replay-semantics" RFC after Phase 4. Rejected: observable-sequence vs bit-equivalent is the load-bearing contract distinction; deferring leaves the spec ambiguous on the most consequential question.

Unresolved questions

1. Canonical message format. §A's <canonical-message-array> needs to nail down field ordering + null/undefined semantics. The OpenAI chat.completions shape differs from Anthropic's messages and Gemini's contents; the canonical form needs to be vendor-neutral. Recommend: spec the canonical form as JSON Schema in schemas/llm-canonical-message.schema.json; defer the schema landing to the comment-window discussion.
2. Tool-call non-determinism in observable state. §C says hosts MUST cache observable result, not just the tool-call result. What about tools whose observable result depends on time of day (getCurrentTime())? Recommend: tools that return non-cacheable state advertise via a NEW tool.nondeterministic: true field on AgentManifest's tool declarations; replay walks the cache transparently and re-issues for nondeterministic tools.
3. Cross-host cache sharing. If host A caches an LLM response and host B replays the run, can host B use host A's cache? RFC 0040 Phase 3 + RFC 0041 Phase 4 together define the surface but the cache-sharing protocol is a meta-question. Defer.

Acceptance criteria

  • [x] Spec text merged (this file).
  • [x] spec/v1/replay.md extended with §A + §B + §C normative text.
  • [x] spec/v1/multi-agent-execution.md extended with §"Phase 4 replay determinism".
  • [x] schemas/capabilities.schema.json extends multiAgent.executionModel with replayDeterminism block per §D.
  • [x] schemas/run-event.schema.json RunEventType enum gains replay.divergedAtRefusal.
  • [x] schemas/run-event-payloads.schema.json gains replayDivergedAtRefusal payload schema.
  • [x] spec/v1/rest-endpoints.md §"Common error codes" gains replay_diverged_at_refusal.
  • [x] SECURITY/invariants.yaml gains replay-llm-cache-key-portable row with public test glob per §E.
  • [x] 3 new conformance scenarios per §Conformance.
  • [x] docs/KNOWN-LIMITS.md replay-llm-cache-key.test.ts row corrected (the file is NOT it.todo placeholders — its behavioral coverage at the single-host boundary is in place; the residual known-limit is cross-host parity, which still depends on OPENWOP_BASE_URL_B).
  • [ ] At least one reference host advertises version: 4 + passes the 3 scenarios. (Path-to-Accepted.)
  • [ ] INTEROP-MATRIX.md updated. (Will land alongside the reference-host implementation that advertises version: 4.)
  • [x] CHANGELOG entry under [Unreleased].

Path to Active → Accepted: cross-host advertisement evidence per RFCs/0001-rfc-process.md §"Promotion to Accepted." The multi-agent execution model roadmap closes when this RFC reaches Accepted.

References