Backup & Restore
Chango's control-plane state — IAM, KMS, component inventory — lives in three RocksDB stores on the master. The built-in BackupService packs all three (plus a manifest) into a single tar archive and uploads it to an S3-compatible target on demand or on a schedule.
What is in a backup
| Archive entry | Source | What it contains |
|---|---|---|
kms/ |
/var/lib/chango/kms/ |
KEKs wrapped by the cluster master DEK |
iam/ |
/var/lib/chango/iam/ |
Users, groups, policies, access keys (password hashes / access secrets are stored hashed / encrypted at the field level inside RocksDB) |
metadata/ |
/var/lib/chango/metadata/ |
Component inventory, per-cluster settings, gateway topology, nginx state |
manifest.json |
generated | timestamp, hostname, chango.version, archive checksum |
What is NOT in a backup
- The cluster root secret that derives chango's master DEK. The operator owns and stores this out-of-band; without it the backup is mathematically undecryptable. See KMS.
- Per-component data — Iceberg files on ShannonStore, NeoRunBase tables, ItdaStream topics, Kafka logs, PostgreSQL data dirs, kiok workflow state. Each component manages its own data lifecycle. Chango's backup is the control-plane backup; the data plane is backed up via the component's own mechanism (most use the same S3 endpoint).
- Logs —
/var/log/changois operational and excluded.
Backup destination
The backup target is an S3-compatible endpoint. The same BackupService works against AWS S3, ShannonStore, MinIO, or any provider that speaks SigV4 path-style or virtual-hosted style.
| Field | Notes |
|---|---|
endpoint |
https://s3.<region>.amazonaws.com or http://<shannonstore-api>:8080 |
region |
SigV4 region — for non-AWS providers, anything (typically us-east-1) |
bucket |
Pre-created |
prefix |
Optional path prefix (e.g. chango-backups/<cluster-name>/) |
accessKey / secretKey |
Long-lived S3 credentials |
S3 PUT requests are signed with SigV4 and request server-side encryption (x-amz-server-side-encryption: AES256). The payload body is already KMS-envelope-encrypted at the chango layer, so the SSE-S3 layer is a defence-in-depth measure rather than the primary control.
Triggering a backup
Admin UI
Sidebar → Backup → fill in S3 endpoint / bucket / credentials → Backup Now. The destination configuration is persisted (envelope-encrypted in the metadata RocksDB) so subsequent backups reuse the same target.
REST API
# Configure the destination (one-time)
curl -X PUT -H "Authorization: Bearer $TOK" \
-H 'Content-Type: application/json' \
-d '{
"endpoint": "http://shannonstore:8080",
"region": "us-east-1",
"bucket": "chango-backups",
"prefix": "prod/",
"accessKey": "...",
"secretKey": "..."
}' $BASE/admin/api/backup/config
# Run a backup
curl -X POST -H "Authorization: Bearer $TOK" $BASE/admin/api/backup/run
The response includes the S3 key (prod/chango-<timestamp>-<uuid>.tar.gz) and the SHA-256 of the archive.
Schedule
Set a cron expression in the destination config — the leader runs the backup on that schedule. Default is off.
Restoring
Restore is an out-of-band procedure on a fresh master node:
- Stop the chango master systemd unit (if anything is running).
- Download the desired backup object from S3 (e.g. with
aws s3 cporcurl+ SigV4). - Untar it into the master's RocksDB parent dir:
sudo systemctl stop chango-master
sudo rm -rf /var/lib/chango/{kms,iam,metadata}
sudo tar -xzf chango-<timestamp>-<uuid>.tar.gz -C /var/lib/chango/
sudo chown -R chango:chango /var/lib/chango/
- Supply the cluster root secret on the new master node (same secret as the source cluster — without it the restored RocksDB is undecryptable).
- Start the chango master:
The master opens the restored RocksDB, decrypts KEKs with the master DEK derived from the supplied root secret, and resumes leader election. Component instances stored in the metadata RocksDB are visible again immediately.
Restoring node managers
NMs do not need to be restored — they have no authoritative state. After the master is back, NMs that are still alive against the same ZooKeeper re-register automatically. NMs that were lost can be reinstalled per Download & Install — once registered, the leader re-attaches any component instances that were bound to that nodeId.
Component data
Restoring the chango master gives you back the component inventory — which instance lives on which NM, with which configuration. It does not bring back component data (Iceberg files, NeoRunBase tables, …) — those are backed up via their own component-specific paths and restored separately. After the inventory is restored and the components are visible in the admin UI, start each component cluster again; the component process reads its own data from its still-present data dir.
Backup hygiene
- Treat each backup tarball as cluster-confidential. Even though chango-side envelope encryption is intact, the tarball contains everything an attacker needs to impersonate your control plane if they ever learn the root secret.
- Keep at least one offline copy. The cluster root secret and the backup tarball stored on the same S3 bucket gives an attacker both halves on a single compromise.
- Test restore. A backup you have never restored from is a backup you do not have.