Why Elixir for the Ledger
Fault tolerance and distributed consensus on the BEAM
The Problem
The Converge ledger is an append-only substrate that records every decision, every state change, every obligation. It must:
- Never lose data, even during network partitions
- Remain available during node failures
- Support real-time replication across nodes
- Handle concurrent writes without conflicts
This is a distributed systems problem. The database is part of the solution, but the runtime matters equally. We needed a platform designed from the ground up for fault tolerance and distribution.
Enter the BEAM, the Erlang virtual machine that has powered telecom systems for more than 30 years, including Ericsson switches famously credited with 99.9999999% ("nine nines") availability.
Current Options
| Option | Pros | Cons |
|---|---|---|
| JVM + Distributed DB: Java/Kotlin with Cassandra, CockroachDB, or similar. | | |
| Go + etcd/Raft: Go services with consensus via etcd or custom Raft. | | |
| Elixir/Erlang on BEAM: Functional language on a VM designed for telecoms. | | |
Future Outlook
The BEAM is proven technology. WhatsApp has reported handling more than 2 million connections per server. Discord uses Elixir for its real-time infrastructure. Ericsson's telecom switches have run Erlang for decades with legendary uptime.
For Converge, the ledger is the most critical component. Every other service can restart, retry, recover. The ledger must never lose an acknowledged write.
The BEAM's "let it crash" philosophy is counterintuitive but powerful: instead of defensive error handling everywhere, processes crash and supervisors restart them in known-good state. This eliminates entire categories of bugs around error recovery.
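To make the restart behaviour concrete, here is a minimal sketch of a supervised worker that crashes on bad input and comes back in its initial state. The `Sketch.Counter` and `Sketch.Await` modules are hypothetical and exist only for illustration; they are not part of converge-ledger.

```elixir
defmodule Sketch.Counter do
  # Hypothetical worker: holds a running total and has no clause for bad input.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def add(n), do: GenServer.call(__MODULE__, {:add, n})

  @impl true
  def init(initial), do: {:ok, initial}

  # Only integers match. Anything else raises FunctionClauseError and the
  # process exits; there is no defensive branch.
  @impl true
  def handle_call({:add, n}, _from, total) when is_integer(n) do
    {:reply, total + n, total + n}
  end
end

defmodule Sketch.Await do
  # Polls until the supervisor has registered a replacement process.
  def restarted(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        restarted(name, old_pid)
    end
  end
end

# One supervisor, one child, restarted individually when it crashes.
{:ok, _sup} = Supervisor.start_link([{Sketch.Counter, 0}], strategy: :one_for_one)

5 = Sketch.Counter.add(5)
old_pid = Process.whereis(Sketch.Counter)

# A bad call crashes the worker; the caller observes an exit.
:crashed =
  try do
    Sketch.Counter.add(:oops)
  catch
    :exit, _reason -> :crashed
  end

# The supervisor starts a fresh process with the known-good initial state.
new_pid = Sketch.Await.restarted(Sketch.Counter, old_pid)
0 = Sketch.Counter.add(0)
```

The in-memory total is lost on restart, which is the point: anything that must survive a crash belongs in durable storage, not in process state.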
Combined with Mnesia (Erlang's distributed database, shipped with OTP) and libcluster (automatic node discovery and cluster formation), we get replicated, transactional storage across nodes without bolting on external infrastructure.
Our Decision
✓ Why we chose this
- Supervision trees: Processes are organized into supervision hierarchies. When a process crashes, its supervisor restarts it. Failures are isolated, not cascading.
- Native distribution: BEAM nodes can cluster with a single line of configuration. Message passing between nodes is transparent. No separate RPC layer needed.
- Mnesia integration: Mnesia is a distributed database built into OTP. Transactions, replication, and failover are native, not external dependencies.
- Pattern matching: Elixir's pattern matching makes state machines explicit and readable. Protocol handling becomes declarative.
- Hot code upgrades: BEAM supports updating code without stopping the system. For a ledger that must never go down, this is essential.
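As a rough sketch of the Mnesia item above, the snippet below creates a table and performs a transactional write and read using only what ships with OTP. The table name `Entry` matches the example later in this article, but the attribute names and the `ordered_set` type are assumptions. It uses `ram_copies` on a single node so it runs anywhere; a real ledger would presumably create a disk schema with `:mnesia.create_schema/1` and use `disc_copies` across several nodes.

```elixir
# Start Mnesia with an in-memory schema on the local node.
:ok = :mnesia.start()

# Assumed layout: {Entry, sequence, payload, inserted_at}.
# ordered_set keeps keys sorted, which suits an append-only sequence.
{:atomic, :ok} =
  :mnesia.create_table(Entry,
    attributes: [:sequence, :payload, :inserted_at],
    type: :ordered_set,
    ram_copies: [node()]
  )

# Writes go through a transaction: either the whole function applies or none of it.
{:atomic, :ok} =
  :mnesia.transaction(fn ->
    :mnesia.write({Entry, 1, %{kind: :obligation}, System.os_time(:millisecond)})
  end)

# Reads inside a transaction take the appropriate locks.
{:atomic, [{Entry, 1, payload, _inserted_at}]} =
  :mnesia.transaction(fn -> :mnesia.read(Entry, 1) end)
```

In a Mix project, `:mnesia` would also need to be listed under `extra_applications` so it is included in the release.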
× Trade-offs we accept
- Not for heavy computation: BEAM is optimized for I/O and concurrency, not CPU-bound work. Our ML inference stays in Rust; the ledger stays in Elixir.
- Dynamic typing: Elixir is dynamically typed. We use Dialyzer for static analysis and write comprehensive property-based tests.
- Smaller ecosystem: Fewer libraries than JVM or Node. For our use case (distributed consensus, persistence), OTP provides everything we need.
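The dynamic-typing trade-off above is usually softened with typespecs, which Dialyzer checks against call sites. A small sketch follows, assuming a hypothetical `Sketch.Entry` struct; the field names and the set of kinds are illustrative only.

```elixir
defmodule Sketch.Entry do
  @moduledoc "Hypothetical entry shape, annotated so Dialyzer can check callers."

  @enforce_keys [:kind, :payload]
  defstruct [:kind, :payload]

  @type kind :: :decision | :state_change | :obligation
  @type t :: %__MODULE__{kind: kind(), payload: map()}

  # The spec documents the contract; Dialyzer can flag callers that pass a
  # non-map payload or ignore the error branch.
  @spec new(atom(), map()) :: {:ok, t()} | {:error, :invalid_kind}
  def new(kind, payload) when kind in [:decision, :state_change, :obligation] do
    {:ok, %__MODULE__{kind: kind, payload: payload}}
  end

  def new(_kind, _payload), do: {:error, :invalid_kind}
end

{:ok, entry} = Sketch.Entry.new(:decision, %{actor: "alice"})
{:error, :invalid_kind} = Sketch.Entry.new(:bogus, %{})
```

Typespecs are not enforced at runtime, so the guard clause still does the actual validation; the spec only gives the static analysis something to check.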
Motivation
The ledger is where correctness matters most. It's the source of truth for every obligation, every decision, every state transition in the system.
Elixir/OTP gives us:
Fault tolerance by design: Supervision trees mean individual process failures don't bring down the system. The runtime is designed to keep running.
Distributed consistency: Mnesia provides distributed transactions with configurable consistency. We can choose our CAP trade-offs without external infrastructure.
Explicit state machines: Pattern matching makes the ledger's state transitions explicit. Each message type has a clear handler.
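To illustrate what "explicit state machines" can look like, here is a minimal sketch in which every legal transition is its own function clause. The obligation lifecycle, its states, and its events are hypothetical and are not taken from the Converge codebase.

```elixir
defmodule Sketch.Obligation do
  @moduledoc "Hypothetical obligation lifecycle; states and events are illustrative."

  # Each legal (state, event) pair gets its own clause.
  def transition(:proposed, :accept), do: {:ok, :accepted}
  def transition(:proposed, :reject), do: {:ok, :rejected}
  def transition(:accepted, :fulfil), do: {:ok, :fulfilled}

  # Everything else is an explicit error rather than a silent no-op.
  def transition(state, event), do: {:error, {:invalid_transition, state, event}}

  # Folds a list of events over the state machine, stopping at the first error.
  def replay(events, initial \\ :proposed) do
    Enum.reduce_while(events, {:ok, initial}, fn event, {:ok, state} ->
      case transition(state, event) do
        {:ok, next} -> {:cont, {:ok, next}}
        error -> {:halt, error}
      end
    end)
  end
end

{:ok, :fulfilled} = Sketch.Obligation.replay([:accept, :fulfil])

{:error, {:invalid_transition, :rejected, :fulfil}} =
  Sketch.Obligation.replay([:reject, :fulfil])
```

The full set of allowed transitions can be read directly from the function heads, which is what makes this style easy to review.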
The decision to use Elixir for the ledger and Rust for everything else reflects their respective strengths: Elixir for distribution and fault tolerance, Rust for computation and type safety.
Recommendation
For distributed systems: Elixir/OTP is underrated. If your problem involves coordination, fault tolerance, or real-time communication, evaluate it seriously.
For state machines: Pattern matching and GenServers make complex protocols manageable.
For high availability: The BEAM's supervision model is battle-tested. "Let it crash" sounds scary but works.
Not for: CPU-intensive computation, ML inference, or domains where static typing is critical.
At Converge, Elixir powers converge-ledger exclusively. It's the right tool for that specific job.
Examples
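```elixir
defmodule Converge.Ledger.EntryServer do
  use GenServer
  require Logger

  # Client API

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def append(entry) do
    GenServer.call(__MODULE__, {:append, entry})
  end

  # Server callbacks

  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_call({:append, entry}, _from, state) do
    case validate_entry(entry) do
      :ok ->
        # Mnesia transaction for distributed consistency
        result =
          :mnesia.transaction(fn ->
            sequence = get_next_sequence()
            record = {Entry, sequence, entry, :os.system_time(:millisecond)}
            :mnesia.write(record)
            {:ok, sequence}
          end)

        case result do
          {:atomic, {:ok, seq}} ->
            Logger.info("Appended entry #{seq}")
            {:reply, {:ok, seq}, state}

          {:aborted, reason} ->
            {:reply, {:error, reason}, state}
        end

      {:error, reason} ->
        {:reply, {:error, reason}, state}
    end
  end

  # The two helpers below are illustrative placeholders, not the real
  # implementation. They assume Entry is an ordered_set table keyed by sequence.
  defp validate_entry(entry) when is_map(entry), do: :ok
  defp validate_entry(_entry), do: {:error, :invalid_entry}

  # Must run inside the transaction above.
  defp get_next_sequence do
    case :mnesia.last(Entry) do
      :"$end_of_table" -> 1
      last -> last + 1
    end
  end

  # Process crashes here? Supervisor restarts us in clean state.
  # No defensive error handling needed: let it crash.
end
```

GenServer processes handle state and messages. Mnesia provides distributed transactions. If the process crashes (bug, bad data, network issue), the supervisor restarts it automatically. This is the BEAM's "let it crash" philosophy in action.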