Why Raft for Runtime Consensus
Authoritative cluster ordering for promotions, overrides, and swarm scheduler ownership
The Problem
Converge has two very different problems to solve:
- Semantic reasoning - validators, promotion gates, policy evaluation, and LLM-assisted proposals
- Distributed agreement - which node is leader, what became fact, and in what order
Raft solves the second problem, not the first.
This distinction matters. If we put consensus inside reasoning loops, we make the system slower and more complex without improving correctness. If we avoid consensus entirely, multi-node runtime behavior drifts: two nodes can both believe they are authoritative, promotions can commit in different orders, and audit trails diverge.
We needed a runtime design that:
- Preserves append-only truth across a cluster
- Makes authority explicit (leader + quorum)
- Keeps converge-core free of distributed systems dependencies
- Scales event ingestion without requiring quorum on every trace
The answer is a layered model:
- Elixir Ledger records append-only context events and local facts
- Lamport clocks provide scalable causal ordering hints across distributed writers
- Raft in the Rust swarm runtime provides authoritative commit ordering for governance-critical events
Lamport clocks are an excellent scaling mechanism for distributed streams. They are not a replacement for quorum-backed consensus.
Raft belongs in converge-runtime, not converge-core.
It is consensus on facts about execution, not intelligence for deciding facts.
Current Options
| Option | Summary |
|---|---|
| Elixir Ledger + Lamport only | Append-only ledger with scalable causal ordering, but no quorum commit authority. |
| External coordinator (DB locks / etcd) | Use a separate system for leader election and locks while runtime state lives elsewhere. |
| Rust swarm + Raft (OpenRaft) | Embed consensus in converge-runtime for leader election, log replication, and quorum commits. |
Future Outlook
The long-term pattern is not "Raft everywhere." It is layered ordering with explicit authority.
1) Elixir Ledger remains the append-only substrate
The Elixir ledger is still the right home for durable context streams, local append-heavy workloads, and fault-tolerant event handling on the BEAM. It is optimized for availability and distributed process supervision.
2) Lamport clocks scale distributed event ordering
Lamport timestamps let many producers emit events without blocking on quorum. They preserve causality hints and make merges deterministic enough for replay, debugging, and analytics.
This is the scalable version of ordering, but not the authoritative one.
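The merge rule behind this is small enough to sketch. Below is a minimal, illustrative Lamport clock, not Converge's actual types: a local event bumps the counter, and observing a remote timestamp takes the maximum before bumping, so causally later events always carry larger timestamps.

```rust
// Illustrative sketch of a Lamport clock; names are hypothetical,
// not the Converge API.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct LamportTimestamp {
    counter: u64,
    node: u64, // node id breaks counter ties deterministically
}

struct LamportClock {
    current: LamportTimestamp,
}

impl LamportClock {
    fn new(node: u64) -> Self {
        Self { current: LamportTimestamp { counter: 0, node } }
    }

    /// Local event: bump the counter.
    fn tick(&mut self) -> LamportTimestamp {
        self.current.counter += 1;
        self.current
    }

    /// Incoming event: take max(local, remote), then bump, so this
    /// event is ordered after everything the sender had seen.
    fn observe(&mut self, remote: LamportTimestamp) -> LamportTimestamp {
        self.current.counter = self.current.counter.max(remote.counter) + 1;
        self.current
    }
}
```

Because ordering derives from `(counter, node)`, merged streams sort deterministically even when counters tie, which is what makes replay and debugging tractable without quorum.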
3) Raft makes cluster facts authoritative
For governance-critical transitions, the Rust swarm runtime should use Raft to commit a single cluster order:
- proposal_created
- validation_passed / validation_failed
- promotion_approved / promotion_rejected
- rollback_issued
- override_granted
That gives us one source of truth for cluster-side execution facts.
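The event set above can be modeled as a single enum whose committed entries carry the Raft-assigned index. This is a hypothetical sketch with illustrative field names, not the actual converge-runtime types:

```rust
// Hypothetical sketch of the governance event set; variant and field
// names are illustrative.
#[derive(Debug, Clone, PartialEq)]
enum GovernanceEvent {
    ProposalCreated { proposal_id: String },
    ValidationPassed { proposal_id: String },
    ValidationFailed { proposal_id: String, reason: String },
    PromotionApproved { proposal_id: String, approver: String },
    PromotionRejected { proposal_id: String, approver: String },
    RollbackIssued { proposal_id: String },
    OverrideGranted { proposal_id: String, grantor: String },
}

/// A committed log entry: Raft assigns the index, so every node in
/// the cluster sees the same authoritative order.
#[derive(Debug, Clone)]
struct CommittedEntry {
    index: u64,
    event: GovernanceEvent,
}
```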
OpenRaft is a strong fit here.
Its storage and state machine interfaces line up with Converge's boundary style: explicit traits, durable state, snapshots, and lifecycle control. raft-rs (TiKV ecosystem) is also a serious option, but OpenRaft is a better first evaluation for a governed control plane.
The likely end-state:
- Elixir ledger for append-heavy distributed events
- Lamport clocks for scalable causal ordering hints
- Rust Raft swarm for authoritative governance commits and leader-owned orchestration
Our Decision
✓ Why we chose this
- Prevents two truths: Raft gives a quorum-backed commit order, so two runtime nodes cannot silently commit conflicting governance decisions as if both were authoritative.
- Explicit authority: Leader election and leader-only writes align with Converge's requirement for visible, auditable authority in runtime execution.
- Append-only governance log: Every state transition can be represented as a committed entry. This matches Converge's append-only truth model and simplifies auditability.
- Clean separation from reasoning: Raft strengthens ordering and replication without contaminating converge-core with consensus mechanics or LLM-specific concerns.
- Works with Lamport, does not replace it: Lamport clocks remain useful for scalable distributed ordering metadata, while Raft is reserved for the small set of events that need authoritative commit semantics.
× Trade-offs we accept
- Quorum cost: Writes to the replicated governance log require follower acknowledgements. This adds latency compared to local appends or Lamport-only event streams.
- Operational complexity: Membership changes, snapshotting, compaction, and partition behavior require discipline and tooling. Consensus is never free.
- Wrong tool for reasoning loops: Raft does nothing for LLM correctness, prompt generation quality, retrieval ranking, or policy semantics. Misusing it here just adds overhead.
- Side-effect ordering is tricky: External actions must happen after commit (or be fully idempotent and tied to commit index). Otherwise failover can duplicate or reorder effects.
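The commit-gated dispatch discipline can be made concrete with a small guard. This is an illustrative sketch, not a Converge API: an effect runs only once an entry is at or below the commit index, and a set of already-dispatched indexes makes replay after failover idempotent.

```rust
use std::collections::HashSet;

// Hypothetical sketch of commit-gated side effects. Names are
// illustrative.
struct EffectDispatcher {
    dispatched: HashSet<u64>, // commit indexes already acted on
}

impl EffectDispatcher {
    fn new() -> Self {
        Self { dispatched: HashSet::new() }
    }

    /// Runs the effect and returns true only if the entry is
    /// quorum-committed and has not been dispatched before.
    fn dispatch_if_committed(
        &mut self,
        entry_index: u64,
        commit_index: u64,
        effect: impl FnOnce(),
    ) -> bool {
        if entry_index > commit_index {
            return false; // not quorum-committed yet: never act early
        }
        if !self.dispatched.insert(entry_index) {
            return false; // already dispatched: idempotent on replay
        }
        effect();
        true
    }
}
```

In a real adapter the dispatched set itself would need durable storage, otherwise a restarted leader forgets what it already did.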
Motivation
Converge's axioms already imply a consensus layer at runtime scale:
- Explicit authority
- Append-only truth
- No hidden work
- Transparent determinism
These are not just semantic principles. In a multi-node runtime, they become distributed systems requirements.
converge-core should remain pure and deterministic. It reasons about proposals, validators, and promotions. It should define interfaces like:
- AppendOnlyLog
- FactCommitStore
- PromotionLedger
- LeaseAuthority
But the concrete implementation of cluster authority belongs in converge-runtime.
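As a sketch of that boundary, core could define traits like these while runtime supplies the Raft-backed adapters. Trait and method names here are illustrative assumptions, not the actual Converge interfaces; the `VecLog` stand-in exists only to show where an adapter plugs in.

```rust
// Hypothetical core-side interfaces; names are illustrative.
trait AppendOnlyLog {
    type Entry;
    /// Append an entry and return its position; implementations must
    /// never mutate or reorder existing entries.
    fn append(&mut self, entry: Self::Entry) -> u64;
    fn read(&self, index: u64) -> Option<&Self::Entry>;
}

trait LeaseAuthority {
    /// Whether the given node currently holds authority (a runtime
    /// adapter would answer from the Raft leader lease).
    fn holds_lease(&self, node: u64) -> bool;
}

// Trivial in-process adapter, standing in for the Raft-backed one
// that converge-runtime would provide.
struct VecLog<E> {
    entries: Vec<E>,
}

impl<E> AppendOnlyLog for VecLog<E> {
    type Entry = E;
    fn append(&mut self, entry: E) -> u64 {
        self.entries.push(entry);
        (self.entries.len() - 1) as u64
    }
    fn read(&self, index: u64) -> Option<&E> {
        self.entries.get(index as usize)
    }
}
```

Core code written against `AppendOnlyLog` stays deterministic and testable with the in-memory adapter, while deployment swaps in the replicated one.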
This is where the Elixir ledger, Lamport clocks, and Raft complement each other:
- Elixir ledger stores append-only context events and provides durable operational history
- Lamport clocks scale ordering across distributed writers without forcing quorum on every event
- Raft in the Rust swarm commits a single authoritative sequence for governance-critical runtime actions
That lets Converge scale without sacrificing the thing that matters most in consequential systems: a single, auditable truth for what the runtime actually did.
Recommendation
Use Raft in converge-runtime, not converge-core.
Start with the smallest high-value surface:
Phase 1: Raft-replicate only the governance event log
- proposal_created
- validation_passed / validation_failed
- promotion_approved / promotion_rejected
- rollback_issued
- override_granted
This gives you ordered, distributed auditability first.
Phase 2: Leader-owned orchestration
Only the leader dispatches certain runtime actions. Followers replicate state and remain hot standbys.
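A minimal sketch of that leader gate, with hypothetical names: followers refuse to dispatch and simply keep replicating.

```rust
// Illustrative sketch of Phase 2 leader-owned orchestration; not the
// Converge API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Leader,
    Follower,
}

struct RuntimeNode {
    id: u64,
    role: Role,
}

impl RuntimeNode {
    /// Dispatch only on the leader; followers stay hot standbys and
    /// report that they replicate rather than act.
    fn try_dispatch(&self, action: &str) -> Result<String, String> {
        match self.role {
            Role::Leader => Ok(format!("node {} dispatched {}", self.id, action)),
            Role::Follower => Err(format!("node {} is a follower; replicate only", self.id)),
        }
    }
}
```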
Phase 3: Selective operational state
Replicate leases, assignments, or critical queue ownership only where split-brain cost is high.
Implementation guidance:
- Keep trait boundaries in core, adapters in runtime
- Prefer OpenRaft first for storage/state-machine clarity
- Use Lamport clocks for scalable event streams, not final authority
- Do not use Raft for LLM reasoning loops, prompt generation, or semantic ranking
Raft should make Converge more trustworthy, not merely more distributed.
Examples
```rust
use std::sync::Arc;

use converge_runtime::consensus::{
    GovernanceConsensus, InMemoryRaftGovernanceEngine, JsonFileRaftStore,
};
use converge_runtime::consensus::raft::{
    GovernanceEvent, LamportTimestamp, NodeId, RaftGovernanceCluster,
};

fn bootstrap_swarm() -> Result<(), Box<dyn std::error::Error>> {
    // Rust swarm cluster for authoritative governance commits
    let cluster = RaftGovernanceCluster::bootstrap([
        NodeId::new(1),
        NodeId::new(2),
        NodeId::new(3),
    ])?;
    let store = Arc::new(JsonFileRaftStore::new("/var/lib/converge/raft")); // snapshots + durable state
    let engine = InMemoryRaftGovernanceEngine::with_store(cluster, store);
    engine.elect_leader(NodeId::new(1))?;

    // Lamport timestamp observed from append-heavy Elixir ledger stream (ordering hint)
    let mut cluster_view = engine.cluster_state()?;
    cluster_view.observe_remote_timestamp(LamportTimestamp {
        counter: 42,
        node: NodeId::new(9),
    });
    // In a real adapter, the runtime would persist this clock update in the consensus service.

    let entry = engine.append_governance_event(
        NodeId::new(1),
        GovernanceEvent::PromotionApproved {
            proposal_id: "proposal-123".to_owned(),
            approver: "engine://governor".to_owned(),
        },
    )?;

    // Quorum before side effects
    engine.acknowledge_replication(NodeId::new(2), entry.index)?;
    let committed = engine.cluster_state()?;
    if committed.commit_index() >= entry.index {
        // Now it is safe to dispatch external effects (or do so idempotently by commit index)
        println!("Committed promotion at index {}", entry.index.get());
    }
    Ok(())
}
```

The pattern is layered: Lamport timestamps help order distributed events cheaply, while Raft enforces authoritative commit order for governance events. External side effects happen only after quorum commit.