High Availability

Mium provides fault tolerance and high availability through multi-Master leader election, state replication, and automatic failure recovery.

Master High Availability

Multiple Masters can run simultaneously in a leader/follower configuration:

  • Leader Election: Apache ZooKeeper (via Curator LeaderLatch) elects a primary Master. The leader owns all write operations to the state stores (RocksDB).
  • State Replication: The leader Master replicates IAM, KMS, ConnectionStore, and MemoryStore state to follower Masters via the internal NIO protocol.
  • Automatic Failover: If the leader Master fails, ZooKeeper elects a new leader, which reloads persisted state from RocksDB and resumes operations.
  • Transparent Proxying: Follower Masters can serve read requests. Write requests received by followers are transparently proxied to the leader via LeaderRouter.
  • Self-Healing: Followers periodically pull state snapshots from the leader to ensure consistency.
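
The read/write split and failover behavior listed above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Mium's actual LeaderRouter: the class RoutingSketch, the RequestKind enum, and every method name are hypothetical, and a real deployment would learn the leader's identity from the ZooKeeper election rather than from a constructor argument.

```java
import java.util.Objects;

/** Hypothetical model of how a Master decides where a request is handled. */
final class RoutingSketch {
    enum RequestKind { READ, WRITE }

    private final String selfId;
    private volatile String leaderId; // updated whenever leadership changes

    RoutingSketch(String selfId, String leaderId) {
        this.selfId = Objects.requireNonNull(selfId);
        this.leaderId = Objects.requireNonNull(leaderId);
    }

    boolean isLeader() {
        return selfId.equals(leaderId);
    }

    /** Called when the election reports a new leader, e.g. after failover. */
    void onLeaderChanged(String newLeaderId) {
        this.leaderId = Objects.requireNonNull(newLeaderId);
    }

    /** Returns the id of the Master that should execute the request. */
    String targetFor(RequestKind kind) {
        // Reads are served locally; writes run locally only on the leader.
        if (kind == RequestKind.READ || isLeader()) {
            return selfId;
        }
        return leaderId; // follower: proxy the write to the leader
    }

    public static void main(String[] args) {
        RoutingSketch follower = new RoutingSketch("master-2", "master-1");
        System.out.println(follower.targetFor(RequestKind.READ));
        System.out.println(follower.targetFor(RequestKind.WRITE));
        follower.onLeaderChanged("master-2"); // simulated failover
        System.out.println(follower.targetFor(RequestKind.WRITE));
    }
}
```

In this model a follower keeps reads local, forwards writes to the current leader, and starts handling writes itself once it is told that it has become the leader.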

Worker Registration

Workers register as ephemeral nodes in ZooKeeper under /mium/workers/<nodeId>. Because the nodes are ephemeral:

  • Worker joins and failures are detected automatically.
  • When a Worker disconnects, ZooKeeper removes its ephemeral node once the session ends.
  • The Master adjusts tool execution routing based on the Workers that are currently available.
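
One way a Master could track the live Worker set and adjust routing is sketched below. The sketch assumes the Master receives the current list of child names under /mium/workers whenever membership changes; the class WorkerPoolSketch, its methods, and the round-robin policy are all hypothetical, since the source does not describe how Mium selects a Worker.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pool that mirrors the children of /mium/workers. */
final class WorkerPoolSketch {
    private final TreeSet<String> live = new TreeSet<>();
    private int cursor = 0;

    /**
     * Replaces the live set with the latest listing and returns the ids
     * that disappeared since the previous update.
     */
    synchronized Set<String> update(Collection<String> currentChildren) {
        Set<String> removed = new TreeSet<>(live);
        removed.removeAll(currentChildren);
        live.clear();
        live.addAll(currentChildren);
        return removed;
    }

    /** Picks the next live Worker in round-robin order, if any exist. */
    synchronized Optional<String> next() {
        if (live.isEmpty()) {
            return Optional.empty();
        }
        List<String> ids = new ArrayList<>(live);
        // floorMod keeps the index valid even if the counter overflows.
        String chosen = ids.get(Math.floorMod(cursor, ids.size()));
        cursor++;
        return Optional.of(chosen);
    }
}
```

Because the pool is rebuilt from each listing, a Worker whose ephemeral node has vanished is never selected again, and the returned set of removed ids could be used to reschedule work that was in flight on it.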

Service Discovery

Masters and Workers register as ephemeral nodes in ZooKeeper. When a node joins or leaves the cluster, all other nodes are notified automatically. No manual configuration of cluster membership is required.

ZooKeeper node layout:

/mium/
  masters/
    <nodeId-1>  (ephemeral)
    <nodeId-2>  (ephemeral)
  workers/
    <nodeId-3>  (ephemeral)
    <nodeId-4>  (ephemeral)
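
The layout above maps naturally onto a pair of path helpers. The following is a minimal sketch, assuming only the /mium/masters and /mium/workers directories shown; the class ZkPathsSketch, the Role enum, and the method names are hypothetical and are not part of Mium.

```java
/** Hypothetical helpers for building and parsing member paths. */
final class ZkPathsSketch {
    static final String ROOT = "/mium";

    enum Role {
        MASTER("masters"), WORKER("workers");

        final String dir;

        Role(String dir) {
            this.dir = dir;
        }
    }

    private ZkPathsSketch() {}

    /** Builds the ephemeral node path for a member, e.g. /mium/workers/<nodeId>. */
    static String pathFor(Role role, String nodeId) {
        if (nodeId == null || nodeId.isEmpty() || nodeId.indexOf('/') >= 0) {
            throw new IllegalArgumentException("invalid nodeId: " + nodeId);
        }
        return ROOT + "/" + role.dir + "/" + nodeId;
    }

    /** Returns the role encoded in a member path, or throws if it does not match the layout. */
    static Role roleOf(String path) {
        for (Role role : Role.values()) {
            String prefix = ROOT + "/" + role.dir + "/";
            boolean hasId = path.startsWith(prefix) && path.length() > prefix.length();
            if (hasId && path.indexOf('/', prefix.length()) < 0) {
                return role;
            }
        }
        throw new IllegalArgumentException("not a member path: " + path);
    }

    /** Extracts the nodeId from a member path. */
    static String nodeIdOf(String path) {
        Role role = roleOf(path); // validates the path first
        return path.substring((ROOT + "/" + role.dir + "/").length());
    }
}
```

Keeping path construction in one place avoids string concatenation scattered across registration and discovery code.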

State Stores

All cluster state is stored in embedded RocksDB, so no external database is needed. The stores hold:

  • IAM users, groups, policies, and organizations
  • KMS encrypted keystore with versioned KEKs
  • Per-user connection credentials (encrypted)
  • Per-user chat sessions and messages

State Synchronization OpCodes

OpCode            Purpose
IAM_SYNC          User, group, policy, organization data
KMS_SYNC          Encryption key bundle
CONNECTION_SYNC   Per-user tool credentials
MEMORY_SYNC       Chat sessions and messages
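
A follower could apply incoming sync messages by dispatching on the OpCode, as in the sketch below. Only the four OpCode names come from the table above. The class SyncDispatcherSketch, the byte[] payload, and the handler registration API are assumptions made for illustration, and the sketch deliberately assigns no numeric wire values because the source does not give any.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical dispatcher that routes sync payloads to per-store handlers. */
final class SyncDispatcherSketch {
    enum OpCode { IAM_SYNC, KMS_SYNC, CONNECTION_SYNC, MEMORY_SYNC }

    private final Map<OpCode, Consumer<byte[]>> handlers = new EnumMap<>(OpCode.class);

    /** Registers the handler that applies payloads for one OpCode. */
    void register(OpCode op, Consumer<byte[]> handler) {
        handlers.put(op, handler);
    }

    /** Returns true if a handler applied the payload, false if none is registered. */
    boolean dispatch(OpCode op, byte[] payload) {
        Consumer<byte[]> handler = handlers.get(op);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }
}
```

A caller would register one handler per store at startup, for example a handler for IAM_SYNC that writes the payload into the local IAM store, and treat a false return value as an unsupported message.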