Cluster Topology
Chango's cluster topology has two simple rules — one for masters, one for node managers — and the rest follows.
The two rules
Master
- More than one master may run on the same host (admin ports must differ).
- A master may co-locate with a node manager on the same host.
- HA is achieved by running multiple masters; one is elected leader via ZooKeeper.
Node manager
- Exactly one node manager per host.
- A node manager may co-locate with one or more masters on the same host.
These rules are enforced at registration time — a second node manager attempting to register against the same host (with a different nodeId) is rejected with IllegalStateException.
Why these rules
- Master = N per host — masters are stateless (cluster state lives in the leader's RocksDB and ZooKeeper). Running more than one on a single host gives admin-UI / API capacity headroom and a hot standby that can take over in <1 leader-election window.
- NM = 1 per host — the node manager is the host agent. It owns the host's
/opt/components/*install dirs, the host'spsview of every managed process, and the SELinux / sudo / file-ownership surface. Two NMs on one host would fight over those. - Co-location is fine — a small cluster (three hosts, one master + three NMs) is the standard layout. On large clusters you split masters off onto dedicated control hosts; on dev / lab clusters everything stacks on one host.
Typical layouts
Small layout (one master)
host1 master + NM (chango admin, ZK, component instances)
host2 NM (component instances)
host3 NM (component instances)
This is the layout used by the standard install playbook (inventory.yml.example). One master is enough to run a cluster — a master restart costs ~10 s of admin-API unavailability; component processes keep running.
Three masters + N node managers (HA)
Three masters form a ZooKeeper quorum and elect a leader. Losing one master (network blip, restart) causes a sub-second leader re-election with the other two; component processes are unaffected.
Dedicated control hosts
host-c1 master (no NM, no component processes)
host-c2 master
host-c3 master
host1 NM (every component instance)
host2 NM
...
hostN NM
Used when control-plane and workload need physical isolation (compliance, blast radius).
Node identifiers
Default identifiers are derived from the host and the role:
- Master:
master-<sanitized-hostname>-<adminPort>— the admin port is what makes a second master on the same host distinct (currently a deploy-layer limitation — single master per host is what the bundled systemd unit supports today). - Node manager:
nm-<sanitized-hostname>-<internalPort>— the internal port is in the id for stable backwards compatibility with component records; since only one NM per host is allowed at runtime, the port is effectively constant.
Both are overridable via chango.master.nodeId / chango.nodemanager.nodeId in chango.properties for environments that need explicit naming.
Hostname model
Chango registers each node by its OS canonical hostname. There are two cases:
| OS hostname looks like | Chango behaviour |
|---|---|
Real FQDN (host1.example.com, ip-10-0-0-11.region.compute.internal, …) |
Use it as-is. /etc/hosts is not modified — your DNS already resolves it. |
Short (host1, ip-10-0-0-11, …) |
Synthesize <short>.<chango.cluster.domain> (default chango.private) and keep /etc/hosts in sync across all hosts with a managed # BEGIN chango … # END chango block. |
See Hosts Sync for the sync details.
Adding / removing hosts
- Adding a node manager — prepare the host, add it to the inventory, re-run the install playbook with
--limit <new-host>. The new NM registers in ZooKeeper, the leader picks it up, and component install panels in the admin UI start offering it as a target. - Removing a node manager — stop
chango-nodemanager.serviceon that host. Its ephemeral znode disappears; the leader removes it from the live registry. Component instances still listed against that nodeId stay in metadata; before reusing the same nodeId on a different host, delete or move those instances first. - Adding a master — start a new master process with a fresh
nodeIdagainst the same ZooKeeper. It joins the leader election. The existing leader keeps leadership (sticky leader). - Removing a master — stop
chango-master.service. If it was the leader, the other masters elect a new leader within a second.