
Why Elixir for the Ledger

Fault tolerance and distributed consensus on the BEAM

v1.1 · 10 min read · Kenneth Pernyér
Tags: elixir, erlang, beam, distributed, fault-tolerance, ledger

The Problem

The Converge ledger is an append-only substrate that records every decision, every state change, every obligation. It must:

  • Never lose data, even during network partitions
  • Remain available during node failures
  • Support real-time replication across nodes
  • Handle concurrent writes without conflicts

This is a distributed systems problem. The database is part of the solution, but the runtime matters equally. We needed a platform designed from the ground up for fault tolerance and distribution.

Enter the BEAM: the Erlang virtual machine that has powered telecom systems for 30+ years. The often-quoted 99.9999999% ("nine nines") availability figure was reported for Ericsson's AXD301 switch.

Current Options

Option: JVM + Distributed DB
Java/Kotlin with Cassandra, CockroachDB, or similar.

Pros:
  • Mature ecosystem and tooling
  • Large talent pool
  • Well-understood deployment patterns
  • Many database options
Cons:
  • Distribution is bolted on, not native
  • Fault tolerance requires external frameworks
  • GC pauses affect latency
  • Complexity in managing distributed state

Option: Go + etcd/Raft
Go services with consensus via etcd or custom Raft.

Pros:
  • Simple language, easy deployment
  • etcd is battle-tested
  • Good performance characteristics
  • Strong at networking
Cons:
  • No supervision trees; crashes require external restart
  • Distributed state management is complex
  • Less expressive for protocol state machines
  • Error handling obscures control flow

Option: Elixir/Erlang on BEAM
Functional language on a VM designed for telecoms.

Pros:
  • Fault tolerance is built into the runtime
  • Distributed by default (node clustering)
  • Supervision trees auto-restart failed processes
  • Pattern matching is a natural fit for protocol state machines
Cons:
  • Smaller talent pool
  • Not suitable for CPU-intensive computation
  • Dynamic typing (mitigated by Dialyzer)
  • Different mental model than mainstream languages

Future Outlook

The BEAM is proven technology. WhatsApp has reported handling 2 million connections on a single server. Discord uses Elixir for their real-time infrastructure. Ericsson's telecom switches have run Erlang for decades with legendary uptime.

For Converge, the ledger is the most critical component. Every other service can restart, retry, recover. The ledger must never lose an acknowledged write.

The BEAM's "let it crash" philosophy is counterintuitive but powerful: instead of writing defensive error handling everywhere, we let a process crash on an unexpected error and rely on its supervisor to restart it in a known-good state. This eliminates entire categories of bugs around error recovery.
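To make the restart behaviour concrete, here is a minimal sketch. `Sketch.Counter` is a hypothetical worker invented for this illustration, not part of converge-ledger. It crashes on purpose and comes back with its initial state.

```elixir
defmodule Sketch.Counter do
  # Hypothetical worker, used only to illustrate supervision.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def value, do: GenServer.call(__MODULE__, :value)
  def increment, do: GenServer.call(__MODULE__, :increment)
  # Simulates a bug: the server raises while handling this message.
  def boom, do: GenServer.cast(__MODULE__, :boom)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:value, _from, n), do: {:reply, n, n}
  def handle_call(:increment, _from, n), do: {:reply, n + 1, n + 1}

  @impl true
  def handle_cast(:boom, _n), do: raise("simulated failure")
end

# One supervisor, one child; :one_for_one restarts only the child that crashed.
{:ok, _sup} = Supervisor.start_link([{Sketch.Counter, 0}], strategy: :one_for_one)

Sketch.Counter.increment()
old_pid = Process.whereis(Sketch.Counter)
ref = Process.monitor(old_pid)
Sketch.Counter.boom()

# Wait until the crashed process is gone.
receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# The restart is asynchronous, so poll until a new process holds the name.
new_pid =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Process.whereis(Sketch.Counter)
  end)
  |> Enum.find(&(is_pid(&1) and &1 != old_pid))

# The new process starts from init/1 again, so the increment is gone.
value_after_restart = Sketch.Counter.value()
```

The in-memory count is lost on restart. That is the point of "known-good state": anything that must survive a crash has to live somewhere durable, which for the ledger means the database rather than process memory.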

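Combined with Mnesia (Erlang's distributed database) and libcluster (automatic node discovery), we get clustering plus replicated, transactional storage without bolting on external infrastructure. One caveat: Mnesia detects network partitions but does not resolve them automatically, so the recovery policy is ours to define.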
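As a rough idea of what the node-discovery side looks like, here is a configuration sketch for libcluster, which is a third-party Hex package rather than part of OTP. The topology name `ledger` and the supervisor name are assumptions made for illustration, and the right strategy depends on the deployment environment.

```elixir
# config/runtime.exs (sketch; topology name and strategy are assumptions)
import Config

config :libcluster,
  topologies: [
    ledger: [
      # Gossip discovers peers via UDP multicast on the local network.
      # Other strategies ship with libcluster for other environments.
      strategy: Cluster.Strategy.Gossip
    ]
  ]

# In the application's supervision tree, the topologies are handed to
# Cluster.Supervisor, for example:
#
#   topologies = Application.get_env(:libcluster, :topologies)
#   children = [
#     {Cluster.Supervisor, [topologies, [name: Converge.Ledger.ClusterSupervisor]]}
#   ]
```

Once nodes are connected this way, replicating Mnesia tables to them is still a separate step.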

Our Decision

Why we chose this

  • Supervision trees: Processes are organized into supervision hierarchies. When a process crashes, its supervisor restarts it. Failures are isolated, not cascading.
  • Native distribution: BEAM nodes can cluster with a single line of configuration. Message passing between nodes is transparent. No separate RPC layer needed.
  • Mnesia integration: Mnesia is a distributed database built into OTP. Transactions, replication, and failover are native, not external dependencies.
  • Pattern matching: Elixir's pattern matching makes state machines explicit and readable. Protocol handling becomes declarative.
  • Hot code upgrades: BEAM supports updating code without stopping the system. For a ledger that must never go down, this is essential.
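To give a feel for the Mnesia point, here is a single-node sketch that creates a table and runs one transaction. The table name `LedgerSketch` and its attributes are assumptions for illustration, not the real ledger schema, and the table is RAM-only so the snippet runs without any on-disk setup.

```elixir
# Start Mnesia on this node. With no schema on disk it runs with a
# RAM-only schema, which is enough for a local experiment.
:ok = :mnesia.start()

# Hypothetical table layout: the sequence number is the key, and
# :ordered_set keeps rows sorted by it.
{:atomic, :ok} =
  :mnesia.create_table(LedgerSketch,
    attributes: [:sequence, :payload, :inserted_at],
    type: :ordered_set,
    ram_copies: [node()]
  )

# Write and read back inside one transaction: both succeed or neither does.
{:atomic, rows} =
  :mnesia.transaction(fn ->
    :mnesia.write({LedgerSketch, 1, %{kind: :obligation}, System.os_time(:millisecond)})
    :mnesia.read(LedgerSketch, 1)
  end)
```

A durable, replicated setup would instead create an on-disk schema with `:mnesia.create_schema/1` before starting Mnesia, and add `disc_copies` replicas on other nodes, for example with `:mnesia.add_table_copy/3`. Those steps are left out here because they need a running cluster.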

Trade-offs we accept

  • Not for heavy computation: BEAM is optimized for I/O and concurrency, not CPU-bound work. Our ML inference stays in Rust; the ledger stays in Elixir.
  • Dynamic typing: Elixir is dynamically typed. We use Dialyzer for static analysis and write comprehensive property-based tests.
  • Smaller ecosystem: Fewer libraries than JVM or Node. For our use case (distributed consensus, persistence), OTP provides everything we need.
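On the dynamic-typing trade-off, the sketch below shows the kind of typespec that Dialyzer can check callers against. The module, the `entry` shape, and the error reasons are all hypothetical and chosen only for illustration.

```elixir
defmodule Sketch.Validation do
  @moduledoc "Hypothetical validation module, used only to show typespecs."

  @type entry :: %{kind: atom(), payload: term()}
  @type reason :: :invalid_kind | :missing_payload | :not_a_map

  # Dialyzer can flag callers that ignore the error branch or pass
  # arguments that can never succeed.
  @spec validate(term()) :: {:ok, entry()} | {:error, reason()}
  def validate(%{kind: kind, payload: _} = entry) when is_atom(kind), do: {:ok, entry}
  def validate(%{payload: _}), do: {:error, :invalid_kind}
  def validate(%{}), do: {:error, :missing_payload}
  def validate(_other), do: {:error, :not_a_map}
end
```

Typespecs are not enforced at runtime, which is why they are paired with tests rather than treated as a replacement for them.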

Motivation

The ledger is where correctness matters most. It's the source of truth for every obligation, every decision, every state transition in the system.

Elixir/OTP gives us:

Fault tolerance by design: Supervision trees mean individual process failures don't bring down the system. The runtime is designed to keep running.

Distributed consistency: Mnesia provides distributed transactions with configurable consistency (full transactions or faster dirty operations, chosen per call). We can make our CAP trade-offs inside the runtime, without external infrastructure.

Explicit state machines: Pattern matching makes the ledger's state transitions explicit. Each message type has a clear handler.
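A small sketch of what "explicit state transitions" can look like with pattern matching. The obligation states and events here are invented for illustration and are not the ledger's actual protocol.

```elixir
defmodule Sketch.Obligation do
  @moduledoc "Hypothetical obligation lifecycle; states and events are illustrative."

  # Each clause is one legal transition: (current state, event) -> next state.
  def transition(:proposed, :accept), do: {:ok, :active}
  def transition(:proposed, :reject), do: {:ok, :rejected}
  def transition(:active, :fulfil), do: {:ok, :fulfilled}
  def transition(:active, {:breach, reason}), do: {:ok, {:breached, reason}}
  # Anything else is rejected explicitly rather than silently ignored.
  def transition(state, event), do: {:error, {:invalid_transition, state, event}}

  # Applies events in order and stops at the first illegal transition.
  def run(initial, events) do
    Enum.reduce_while(events, {:ok, initial}, fn event, {:ok, state} ->
      case transition(state, event) do
        {:ok, next} -> {:cont, {:ok, next}}
        {:error, _} = error -> {:halt, error}
      end
    end)
  end
end
```

The full set of legal transitions is readable at a glance from the function heads, and adding a state means adding clauses rather than editing nested conditionals.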

The decision to use Elixir for the ledger and Rust for everything else reflects their respective strengths: Elixir for distribution and fault tolerance, Rust for computation and type safety.

Recommendation

For distributed systems: Elixir/OTP is underrated. If your problem involves coordination, fault tolerance, or real-time communication, evaluate it seriously.

For state machines: Pattern matching and GenServers make complex protocols manageable.

For high availability: The BEAM's supervision model is battle-tested. "Let it crash" sounds scary but works.

Not for: CPU-intensive computation, ML inference, or domains where static typing is critical.

At Converge, Elixir powers converge-ledger exclusively. It's the right tool for that specific job.

Examples

lib/ledger/entry_server.ex (Elixir)
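defmodule Converge.Ledger.EntryServer do
  use GenServer
  require Logger

  # Client API
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def append(entry) do
    GenServer.call(__MODULE__, {:append, entry})
  end

  # Server callbacks
  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_call({:append, entry}, _from, state) do
    case validate_entry(entry) do
      :ok ->
        # Mnesia transaction: the sequence bump and the entry write
        # commit together or not at all.
        result = :mnesia.transaction(fn ->
          sequence = get_next_sequence()
          record = {Entry, sequence, entry, :os.system_time(:millisecond)}
          :mnesia.write(record)
          {:ok, sequence}
        end)

        case result do
          {:atomic, {:ok, seq}} ->
            Logger.info("Appended entry #{seq}")
            {:reply, {:ok, seq}, state}
          {:aborted, reason} ->
            {:reply, {:error, reason}, state}
        end

      {:error, reason} ->
        {:reply, {:error, reason}, state}
    end
  end

  # The two helpers below are illustrative stand-ins; the real
  # implementations are not shown in this article.
  defp validate_entry(entry) when is_map(entry), do: :ok
  defp validate_entry(_entry), do: {:error, :invalid_entry}

  # Must be called inside a transaction. Assumes a Counter table with
  # attributes [:name, :value] and an Entry table with three attributes.
  defp get_next_sequence do
    next =
      case :mnesia.read(Counter, :entry_seq, :write) do
        [{Counter, :entry_seq, n}] -> n + 1
        [] -> 1
      end

    :mnesia.write({Counter, :entry_seq, next})
    next
  end

  # Expected failures (invalid entry, aborted transaction) are returned
  # to the caller. Unexpected errors are not rescued: the process
  # crashes and its supervisor restarts it in a clean state.
end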

GenServer processes handle state and messages. Mnesia provides distributed transactions. If the process crashes (bug, bad data, network issue), the supervisor restarts it automatically. This is the BEAM's "let it crash" philosophy in action.

Stockholm, Sweden
