| Field | Value |
|---|---|
| RFC | 0036 |
| Title | Multi-region idempotency + cross-engine append-ordering guarantees |
| Status | Accepted |
| Author(s) | David Tufts (@davidscotttufts) |
| Created | 2026-05-21 |
| Updated | 2026-05-29 (Active → Accepted: the deferred reference-host wiring landed, closing the stated path to Accepted. The Postgres reference host now advertises capabilities.idempotency.crossRegion: 'single-region' (the canonical categorical posture; honest: the host is single-region but ships the convergence resolver) + capabilities.eventLog.crossEngineOrdering.{supported: true, orderingModel: 'lamport'}, gated on OPENWOP_TEST_MULTI_REGION. Two conformance seams expose the canonical algorithms: POST /v1/host/sample/test/multi-region/simulate-partition runs the pure resolveCrossRegionConflict (lex-min winner, per-region cache redirects, cross_region_dedup_loss, order-invariant); POST/GET /v1/host/sample/test/cross-engine/{append,read,reset} is a Lamport-clock cross-engine ordering harness. Booted under OPENWOP_TEST_MULTI_REGION=1, all 14 scenario assertions pass strict (OPENWOP_REQUIRE_BEHAVIOR=true) across multi-region-idempotency{,-behavior}.test.ts + cross-engine-append-{ordering,behavior}.test.ts; none soft-skip. Reconciliation: crossRegion (categorical enum) is canonical per the steward decision; the RFC §A multiRegion object stays as additive-optional granular metadata. Per RFC 0001 §3 (impl + conformance), both met. A live multi-region/multi-engine non-steward host would graduate the §C/§B guarantees from seam-demonstrated to production-verified.) · originally (Draft → Active same-day 2026-05-21: capabilities.idempotency.multiRegion.{supported, replicationLagBoundMs, partitionRecoveryStrategy} block + capabilities.eventLog.crossEngineOrdering.{supported, orderingModel} block landed in schemas/capabilities.schema.json; spec/v1/idempotency.md §"multiRegion sub-block" added as the normative tighten-when-advertised clause; spec/v1/replay.md §"Cross-region replay (RFC 0036)" added with the byte-equivalent-on-projection contract. The multi-region simulator at examples/hosts/postgres/test/multi-region-simulator.ts + the cross-engine fixture + the behavioral conformance scenarios per §D remain deferred to a follow-up commit (the protocol-layer surface stands on its own; reference-host wiring graduates RFC 0036 to Accepted). Path to Accepted: Postgres reference host implements the simulator + advertises both capability blocks + the conformance scenarios pass.) |
| Affects | spec/v1/idempotency.md §"Multi-region reconciliation" (promote prose from "lower-confidence" to normative MUST when capability advertised) · spec/v1/replay.md (cross-region recovery test path) · schemas/capabilities.schema.json (adds capabilities.idempotency.multiRegion + capabilities.eventLog.crossEngineOrdering) · conformance/src/scenarios/multi-region-idempotency.test.ts (graduates from shape-only) · NEW conformance/src/scenarios/cross-engine-append-ordering.test.ts · multi-region simulator in reference host · INTEROP-MATRIX.md · CHANGELOG |
| Compatibility | additive |
| Supersedes | — |
| Superseded by | — |
Summary
Promote the multi-region idempotency contract and the cross-engine append-ordering contract from "lower-confidence shape-only" to normative MUST when a host advertises the matching capability. Adds the wire-level capability flags, the simulator harness, and the two behavioral conformance scenarios that close docs/KNOWN-LIMITS.md §"Shape-only conformance coverage" row 17 (multi-region-idempotency.test.ts) and §"Behavior tests too coarse" row 33 (cross-engine append ordering).
Motivation
Per docs/KNOWN-LIMITS.md:17,33 + spec/v1/idempotency.md:163 + the external standards-readiness review 2026-05-21 finding (6): "Idempotency, replay, event ordering, and versioning are well documented, but multi-region behavior is explicitly best-effort and lower-confidence... For production-scale workflow orchestration, cross-region reconciliation, event-log ordering, replay determinism, and failure recovery need stronger executable proof."
Today the relevant scenarios are:
- `multi-region-idempotency.test.ts`: capability-shape only; behavioral assertion needs cross-region partition simulation.
- `append-ordering.test.ts`: single-engine only; cross-engine fixture would catch race conditions hidden by intra-engine sequence ordering.
Both gaps are honestly documented in KNOWN-LIMITS, but they undermine standardization credibility: a reviewer cannot mechanically distinguish "the spec mandates X" from "the spec aspires to X."
Proposal
§A — capabilities.idempotency.multiRegion (normative)
"idempotency": {
+ "multiRegion": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": ["supported"],
+ "properties": {
+ "supported": { "type": "boolean", "description": "Host implements cross-region idempotency reconciliation per spec/v1/idempotency.md §'Multi-region reconciliation'. When supported: true, an Idempotency-Key write that succeeds in one region is observable in another region within `replicationLagBoundMs`." },
+ "replicationLagBoundMs": { "type": "integer", "minimum": 0, "maximum": 60000, "description": "Conservative upper bound on cross-region replication lag for idempotency-key records. Conformance asserts that an Idempotency-Key write in region A is read-visible in region B after waiting `replicationLagBoundMs + safetyMargin`." },
+ "partitionRecoveryStrategy": {
+ "type": "string",
+ "anyOf": [
+ { "enum": ["last-writer-wins", "first-writer-wins"] },
+ { "pattern": "^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$" }
+ ],
+ "description": "Host's deterministic resolution rule when a partition healed with conflicting idempotency-key records. `last-writer-wins`: the record whose write commit timestamp is greatest survives. `first-writer-wins`: the record with the earliest commit timestamp survives. Vendor-specific strategies advertise a host-extension namespace string matching `^x-host-<host>-<key>$` per `spec/v1/host-extensions.md` §'Canonical prefixes'; the matching algorithm MUST be documented at the host's discovery doc. Conformance asserts the chosen rule's deterministic resolution actually applies (a given conflict input produces a deterministic survivor); conformance does NOT prescribe which strategy a host picks."
+ }
+ }
+ }
}
When supported: true, spec/v1/idempotency.md §"Multi-region reconciliation" prose flips from informational to normative MUST: the host MUST converge to a single committed outcome for any Idempotency-Key value across regions within replicationLagBoundMs + safetyMargin; the host MUST resolve conflicts via the advertised partitionRecoveryStrategy.
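To make the "deterministic survivor" requirement concrete, here is a minimal sketch of how the two built-in strategies could pick a winner. The `IdempotencyRecord` shape, the function names, and the region-name tie-break are assumptions made for illustration. They are not the reference host's `resolveCrossRegionConflict`, and the spec text does not prescribe them.

```typescript
// Hypothetical record shape; field names are illustrative, not taken from the spec.
interface IdempotencyRecord {
  key: string;
  region: string;
  commitTimestampMs: number; // write commit timestamp, in milliseconds
  value: string;
}

type BuiltInStrategy = "last-writer-wins" | "first-writer-wins";

// Pattern copied from the §A schema for vendor-specific strategies.
const HOST_EXTENSION_PATTERN = /^x-host-[a-z][a-z0-9-]*-[a-z][a-z0-9-]*$/;

// True when the string is a built-in strategy or a well-formed host extension.
function isValidStrategy(s: string): boolean {
  return (
    s === "last-writer-wins" ||
    s === "first-writer-wins" ||
    HOST_EXTENSION_PATTERN.test(s)
  );
}

// Pick the surviving record for one key. Timestamp ties fall back to the
// lexicographically smallest region name (an assumption of this sketch) so the
// result never depends on the order in which records arrive.
function resolveConflict(
  records: IdempotencyRecord[],
  strategy: BuiltInStrategy
): IdempotencyRecord {
  if (records.length === 0) throw new Error("no records to resolve");
  const sign = strategy === "last-writer-wins" ? -1 : 1;
  const sorted = [...records].sort(
    (a, b) =>
      sign * (a.commitTimestampMs - b.commitTimestampMs) ||
      (a.region < b.region ? -1 : a.region > b.region ? 1 : 0)
  );
  return sorted[0];
}
```

Sorting a copy and taking the head keeps the function pure, which is what lets a conformance check feed the same conflict in any order and expect the same survivor.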
§B — capabilities.eventLog.crossEngineOrdering (normative)
+ "eventLog": {
+ "type": "object",
+ "additionalProperties": false,
+ "properties": {
+ "crossEngineOrdering": {
+ "type": "object",
+ "additionalProperties": false,
+ "required": ["supported"],
+ "properties": {
+ "supported": { "type": "boolean", "description": "Host implements append-ordering guarantees ACROSS multiple engine instances writing to the same run's event log. When supported: true, two engines appending to the same runId converge on a total order that any reader observes consistently." },
+ "orderingModel": { "type": "string", "enum": ["lamport", "vector-clock", "global-sequencer"], "description": "Mechanism the host uses to derive the total order." }
+ }
+ }
+ }
+ }
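As an illustration of the `lamport` ordering model, the sketch below shows how two engines could stamp appends and how a reader could derive one total order from them. The `StampedEvent` shape, the class, and the engine-id tie-break are assumptions of this example, not the wire format or the reference host's harness.

```typescript
// Hypothetical event shape for illustration only.
interface StampedEvent {
  engineId: string;
  lamport: number;
  payload: string;
}

class LamportClock {
  private counter = 0;
  constructor(readonly engineId: string) {}

  // Local append: advance the counter and stamp the event with it.
  stamp(payload: string): StampedEvent {
    this.counter += 1;
    return { engineId: this.engineId, lamport: this.counter, payload };
  }

  // On seeing another engine's event, catch up to its timestamp so that the
  // next local stamp is strictly greater than anything observed so far.
  observe(event: StampedEvent): void {
    this.counter = Math.max(this.counter, event.lamport);
  }
}

// Total order: Lamport timestamp first, engine id as the tie-break.
// Every reader applying this comparator sees the same sequence.
function totalOrder(events: StampedEvent[]): StampedEvent[] {
  return [...events].sort(
    (x, y) =>
      x.lamport - y.lamport ||
      (x.engineId < y.engineId ? -1 : x.engineId > y.engineId ? 1 : 0)
  );
}
```

The tie-break matters because concurrent appends can carry equal timestamps; without a second sort key, two readers could legitimately disagree on their relative order.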
§C — Multi-region simulator (host-side)
Add examples/hosts/postgres/test/multi-region-simulator.ts — a programmable two-region Postgres harness with:
- Logical replication delay knob (`setReplicationLagMs(ms: number)`).
- Partition injection (`partitionRegions(): () => unhealRegions()`).
- Idempotency-Key write probes (`writeIdempotencyKey(region, key, value)`) and read probes (`readIdempotencyKey(region, key)`).
- Convergence check (`expectKeyEventuallyConvergent(key, expectedValue, maxWaitMs)`).
The simulator is opt-in via OPENWOP_TEST_MULTI_REGION=1. Scenarios that need it gate cleanly when the env-var is absent.
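The harness surface can be pictured with a small in-memory stand-in. The sketch below reuses the method names listed above but is not the Postgres-backed simulator: it swaps real replication for a virtual clock, and the `advance` helper, the synchronous boolean return of `expectKeyEventuallyConvergent`, the apply-if-absent replication rule, and the assumption that `partitionRegions()` returns a heal function are all simplifications made for illustration.

```typescript
type Region = "A" | "B";

interface PendingReplication {
  deliverAtMs: number;
  target: Region;
  key: string;
  value: string;
}

class InMemoryMultiRegionSimulator {
  private nowMs = 0; // virtual clock (hypothetical; keeps the sketch deterministic)
  private lagMs = 0;
  private partitioned = false;
  private pending: PendingReplication[] = [];
  private stores: Record<Region, Map<string, string>> = { A: new Map(), B: new Map() };

  setReplicationLagMs(ms: number): void {
    this.lagMs = ms;
  }

  // Starts a partition and returns a function that heals it (assumed shape).
  partitionRegions(): () => void {
    this.partitioned = true;
    return () => {
      this.partitioned = false;
      this.deliverDue();
    };
  }

  // The first write for a key sticks locally; a copy is queued for the peer region.
  writeIdempotencyKey(region: Region, key: string, value: string): void {
    if (!this.stores[region].has(key)) this.stores[region].set(key, value);
    const target: Region = region === "A" ? "B" : "A";
    this.pending.push({ deliverAtMs: this.nowMs + this.lagMs, target, key, value });
    this.deliverDue();
  }

  readIdempotencyKey(region: Region, key: string): string | undefined {
    return this.stores[region].get(key);
  }

  // Hypothetical helper: move the virtual clock forward and deliver due copies.
  advance(ms: number): void {
    this.nowMs += ms;
    this.deliverDue();
  }

  // Synchronous stand-in for the convergence check: wait, then compare both regions.
  expectKeyEventuallyConvergent(key: string, expectedValue: string, maxWaitMs: number): boolean {
    this.advance(maxWaitMs);
    return this.stores.A.get(key) === expectedValue && this.stores.B.get(key) === expectedValue;
  }

  // Delivers queued copies whose time has come; nothing moves during a partition.
  // Conflict resolution is out of scope here: a copy is applied only if the key is absent.
  private deliverDue(): void {
    if (this.partitioned) return;
    const stillPending: PendingReplication[] = [];
    for (const p of this.pending) {
      if (p.deliverAtMs <= this.nowMs) {
        if (!this.stores[p.target].has(p.key)) this.stores[p.target].set(p.key, p.value);
      } else {
        stillPending.push(p);
      }
    }
    this.pending = stillPending;
  }
}
```

A virtual clock is used here so that lag and partition behavior can be asserted without real sleeps; a harness backed by actual Postgres replication would need to poll instead.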
§D — Conformance scenarios
conformance/src/scenarios/multi-region-idempotency.test.ts — graduate from shape-only to behavioral. New assertions:
1. Cross-region read-after-write: write Idempotency-Key in region A; after replicationLagBoundMs + safety, read in region B returns the same record.
2. Partition-then-heal: inject partition; concurrent writes to the same key in both regions; heal partition; verify the advertised partitionRecoveryStrategy actually applied.
3. Replication-lag bound: write in A; assert read in B fails (or returns stale) before replicationLagBoundMs; succeeds after.
NEW conformance/src/scenarios/cross-engine-append-ordering.test.ts — two-engine fixture:
4. Two engines append concurrently to the same runId's event log; verify both engines observe a consistent total order on read.
5. Engine-A-only sequence in interrupt-resume path; verify the resumption engine sees A's appends in order.
§E — Replay-determinism cross-region (additive prose to spec/v1/replay.md)
When capabilities.idempotency.multiRegion.supported: true AND capabilities.eventLog.crossEngineOrdering.supported: true, a POST /v1/runs/{runId}:fork invocation served by a different region MUST produce a fork run whose observable state at the fromSeq boundary matches a fork served by the original region. Specifically: the fork's RunSnapshot.status, RunSnapshot.variables, and the projected event log up to fromSeq MUST be byte-equivalent across regions. Per-region wall-clock fields in subsequent events MAY differ (e.g., timestamps embedded in RunEventDoc.observedAt, ULID component-T entropy in newly-generated event IDs); a bit-equivalent total comparison is NOT required and is not implementable in the presence of per-region clocks.
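One way to check the boundary contract is to project each fork down to the fields that must match and compare canonical serializations. The sketch below assumes hypothetical `RunEvent` and `ForkView` shapes (only `status`, `variables`, and `observedAt` are named in the text above; `id`, `seq`, `type`, and `data` are placeholders), treats "up to fromSeq" as inclusive, and uses sorted-key JSON as the canonical form. None of that is mandated by the RFC.

```typescript
// Hypothetical shapes; only status, variables, and observedAt appear in the RFC text.
interface RunEvent {
  id: string;
  seq: number;
  type: string;
  data: unknown;
  observedAt: string;
}

interface ForkView {
  status: string;
  variables: Record<string, unknown>;
  events: RunEvent[];
}

// Serialize with object keys sorted so that key order cannot affect the comparison.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(canonicalJson).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map(k => JSON.stringify(k) + ":" + canonicalJson(obj[k]));
    return "{" + fields.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Keep only what must match: status, variables, and events at or before fromSeq
// (inclusive boundary is an assumption). Later events, which may carry
// per-region timestamps and IDs, are dropped before comparing.
function projectAtBoundary(view: ForkView, fromSeq: number): string {
  return canonicalJson({
    status: view.status,
    variables: view.variables,
    events: view.events.filter(e => e.seq <= fromSeq),
  });
}

function forksEquivalentAtBoundary(a: ForkView, b: ForkView, fromSeq: number): boolean {
  return projectAtBoundary(a, fromSeq) === projectAtBoundary(b, fromSeq);
}
```

Comparing projections rather than whole runs is what keeps the check implementable: events after the boundary are free to differ in wall-clock fields without failing the assertion.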
Compatibility
Additive. Hosts that don't advertise stay at today's "lower-confidence" posture; the existing prose in idempotency.md and replay.md continues to apply informationally. Hosts that DO advertise opt into the normative MUSTs.
Conformance
3 behavioral assertions added to existing multi-region-idempotency.test.ts per §D.1-3. NEW cross-engine-append-ordering.test.ts per §D.4-5. Both gate on the relevant capability flag + the OPENWOP_TEST_MULTI_REGION=1 env-var.
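The double gate described above could be expressed as a pair of small predicates that a scenario consults before running. This is a sketch only: the `Capabilities` type covers just the fields named in this RFC, the function names are hypothetical, and treating a missing `replicationLagBoundMs` as 0 is an assumption of the example.

```typescript
// Hypothetical minimal view of the capability document; only fields named in this RFC.
interface Capabilities {
  idempotency?: { multiRegion?: { supported: boolean; replicationLagBoundMs?: number } };
  eventLog?: { crossEngineOrdering?: { supported: boolean } };
}

// Environment passed in explicitly so the helpers stay pure and easy to test.
type Env = Record<string, string | undefined>;

function simulatorEnabled(env: Env): boolean {
  return env["OPENWOP_TEST_MULTI_REGION"] === "1";
}

// Multi-region scenarios run only when the env-var is set AND the host advertises support.
function shouldRunMultiRegionScenarios(caps: Capabilities, env: Env): boolean {
  return simulatorEnabled(env) && caps.idempotency?.multiRegion?.supported === true;
}

// Cross-engine scenarios use the same env-var gate plus their own capability flag.
function shouldRunCrossEngineScenarios(caps: Capabilities, env: Env): boolean {
  return simulatorEnabled(env) && caps.eventLog?.crossEngineOrdering?.supported === true;
}

// Wait budget for read-after-write assertions: advertised bound plus a safety margin.
// A missing bound is treated as 0 here (assumption).
function convergenceWaitMs(caps: Capabilities, safetyMarginMs: number): number {
  const bound = caps.idempotency?.multiRegion?.replicationLagBoundMs ?? 0;
  return bound + safetyMarginMs;
}
```

In a test file, these predicates would typically decide between running and skipping a scenario, for example by being passed `process.env` and the host's fetched capability document.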
Alternatives considered
1. Mandate multi-region support in the production-profile. Rejected: many production deployments are single-region by intentional architectural choice (lower complexity, lower cost, regulatory data-residency constraints). The capability-advertisement pattern lets honest hosts claim what they actually do.
2. Standardize on a specific replication technology (Postgres logical replication, CockroachDB transactions, Spanner TrueTime). Rejected: the contract is observable behavior (read-after-write convergence within replicationLagBoundMs), not technology.
3. Cross-engine ordering as a top-level MUST. Rejected: single-engine deployments are common and shouldn't be forced to advertise multi-engine support.
Unresolved questions
1. What is the right replicationLagBoundMs ceiling? 60s feels conservative; production cross-region deployments typically run < 10s. Defer to operator advertisement; the schema validates the upper bound.
2. Three-region scenarios. Should the conformance scenario support N > 2 regions? Recommend N=2 for the first cut; expand later if a 3+ region host adopts.
3. Cross-engine ordering + interrupt-resume interaction. The interrupt-resume model already requires engine handoff; this RFC's §D.5 covers it but doesn't address split-brain scenarios.
Acceptance criteria
- [ ] Spec text merged (this file).
- [ ] `schemas/capabilities.schema.json` extended per §A + §B.
- [ ] `spec/v1/idempotency.md` §"Multi-region reconciliation" prose tightened per §A's normative MUST.
- [ ] `spec/v1/replay.md` extended per §E.
- [ ] Multi-region simulator landed in `examples/hosts/postgres/test/` per §C.
- [ ] `conformance/src/scenarios/multi-region-idempotency.test.ts` graduated per §D.1-3.
- [ ] NEW `conformance/src/scenarios/cross-engine-append-ordering.test.ts` per §D.4-5.
- [ ] Postgres reference host advertises both capability blocks; passes all scenarios under `OPENWOP_TEST_MULTI_REGION=1`.
- [ ] `INTEROP-MATRIX.md` row updated.
- [ ] CHANGELOG entry under `[Unreleased]`.
- [ ] `docs/KNOWN-LIMITS.md` rows 17 + 33 dropped from §"Shape-only" and §"Behavior tests too coarse."
Path to Active → Accepted: Postgres reference host implements + passes; non-steward host's advertisement closes the cross-host validation criterion.
References
- `docs/KNOWN-LIMITS.md:17,33`
- `spec/v1/idempotency.md` §"Multi-region reconciliation" (line 163)
- `spec/v1/replay.md`
- `RFCS/0001-rfc-process.md` §"Promotion to Accepted"
- External standards-readiness review 2026-05-21, finding (6)
- `examples/hosts/postgres/src/multi-region.ts` (existing canonical resolver, 6-path unit test: the algorithm itself; this RFC adds the cross-host conformance gate)