instantlinux/data-sync

•Updated 13 days ago

HA storage synchronization for persistent data

⁠data-sync

This image provides highly available (HA) storage for a bare-metal cluster. NAS servers are not usually HA, and SAN installations are costly. Add this resource definition to your Kubernetes cluster, and the volumes you mount under the directory /var/data-sync will be kept in sync by the unison file synchronizer from UPenn.

At present this is far easier to set up than other clustering technologies and results in a stable system with an easily-tracked audit log. I've tried and abandoned CephFS, GlusterFS, DRBD, and others; unison has been running well for several trouble-free years. It works like, and is as easy to understand as, a bidirectional rsync.

⁠Usage

To generate two ssh keypairs (stored in a single secret) and label your nodes, invoke the following:

HOST1=<host1> HOST2=<host2> make label_nodes
make data-sync

(This needs two keypairs so it can securely invoke both rsync and unison.)
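As a rough idea of what these targets involve, the sketch below labels two nodes and generates two keypairs. It is not taken from the repo's Makefile, which remains authoritative: the key type, the file names, and the `run`, `label_nodes` and `make_keypairs` helpers are assumptions made for illustration. Only the service.data-sync=allow label comes from this README. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: approximates the steps behind `make label_nodes` and
# `make data-sync`; the real Makefile may differ.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Label each host so the data-sync pods may be scheduled on it.
label_nodes() {
  for host in "$@"; do
    run kubectl label node "$host" service.data-sync=allow
  done
}

# Generate two passphrase-less keypairs in a directory.
# File names and key type are assumptions, not the repo's choices.
make_keypairs() {
  dir=$1
  run mkdir -p "$dir"
  for n in 1 2; do
    run ssh-keygen -q -t ed25519 -N "" -C "data-sync-$n" -f "$dir/sshkey$n"
  done
}

label_nodes "${HOST1:-node1.example.com}" "${HOST2:-node2.example.com}"
make_keypairs "${KEYDIR:-./keys}"
```

Loading the generated private keys into the cluster secret is left to the make target, since the secret layout is not spelled out here.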

Define any custom directives in the data-sync ConfigMap, and set the environment variable SERVICE_NAME to data-sync. You can run more than one copy of this service by giving each copy its own SERVICE_NAME and ConfigMap name.

This repo has complete instructions for building a Kubernetes cluster. After cloning it and customizing Makefile.vars, you can deploy with helm or kubernetes.yaml using make:

git clone https://github.com/instantlinux/docker-tools.git
cd docker-tools/k8s
make data-sync

For monitoring, put nagios-nrpe-data-sync.cfg into your /etc/nagios directory and add an NRPE check_data_sync check to the primary host's list of services. In the cfg file, set the warning/critical values to suit your polling frequency.
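The contents of nagios-nrpe-data-sync.cfg are not reproduced here. To illustrate how warning/critical thresholds relate to the sync interval, here is a hypothetical freshness check that compares a file's age with two thresholds in minutes. The `check_age` function and the idea of watching a single status file are assumptions, not the check shipped with this image. The exit codes follow the standard Nagios plugin convention.

```shell
#!/bin/sh
# check_age FILE WARN_MIN CRIT_MIN
# Hypothetical freshness check; FILE is assumed to be a regular file
# that is rewritten on every successful sync.
# Exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_age() {
  file=$1
  warn=$2
  crit=$3
  if [ ! -e "$file" ]; then
    echo "UNKNOWN: $file not found"
    return 3
  fi
  # find prints the path only if it was modified more than N minutes ago.
  if [ -n "$(find "$file" -mmin +"$crit")" ]; then
    echo "CRITICAL: $file older than $crit minutes"
    return 2
  fi
  if [ -n "$(find "$file" -mmin +"$warn")" ]; then
    echo "WARNING: $file older than $warn minutes"
    return 1
  fi
  echo "OK: $file updated within $warn minutes"
  return 0
}
```

With the default 5-minute interval, a call such as `check_age /path/to/status-file 10 20` would warn after two missed passes and go critical after four. Both the path and the thresholds are placeholders.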

If you add any mount points underneath the synchronized volume, restart this service.

Files in the ConfigMap (mounted as /etc/unison.d) contain customizable directives as defined in unison-manual⁠. An example common.prf from this repo is installed if you don't define this.
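For orientation, a profile could look something like the one written by the sketch below. This is not the common.prf shipped in the repo. The values are illustrative, and the scratch directory name `unison.d` is an assumption. The preferences used (auto, batch, times, prefer, ignore) are all described in the unison manual.

```shell
#!/bin/sh
# Write an illustrative unison profile into a local scratch directory.
PRF_DIR="${PRF_DIR:-./unison.d}"
mkdir -p "$PRF_DIR"

cat > "$PRF_DIR/common.prf" <<'EOF'
# Illustrative values only; consult the unison manual before use.
# Accept non-conflicting changes without prompting.
auto = true
# Never ask questions; required for unattended runs.
batch = true
# Propagate modification times.
times = true
# On conflict, keep whichever copy was modified most recently.
prefer = newer
# Skip editor backups and temporary files.
ignore = Name *~
ignore = Name *.tmp
EOF

echo "wrote $PRF_DIR/common.prf"
```

A directory of such files can then be loaded into the ConfigMap, for example with `kubectl create configmap data-sync --from-file=./unison.d`.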

To scale this beyond the first two nodes, add the service.data-sync=allow label to more nodes and invoke the kubectl scale command. The main scaling issue you'll run into with unison is high memory usage as the number of nodes and files increases. The host with ordinal 0 is configured as the hub of a star topology as defined in the UPenn doc.
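The scaling steps might be sketched as follows. The assumption that the workload is a StatefulSet named data-sync is inferred from the mention of ordinals and should be checked against your deployment. The `run` and `scale_out` helpers and the node name are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# scale_out REPLICAS NODE...
# Label the extra nodes, then raise the replica count to REPLICAS.
scale_out() {
  replicas=$1
  shift
  for host in "$@"; do
    run kubectl label node "$host" service.data-sync=allow
  done
  # Assumption: the workload is a StatefulSet named data-sync.
  run kubectl scale statefulset data-sync --replicas="$replicas"
}

scale_out 3 node3.example.com
```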

Not running under Kubernetes? Omit the PEERNAME value on the primary node, and set PEERNAME to the primary's hostname on each additional node.
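A minimal sketch of the difference between the primary and the other nodes under plain Docker follows. The `sync_cmd` helper and the container name are hypothetical, and the command is only printed, not run. Mounting the ssh key secrets and exposing the network ports the peers need are deliberately left out, since those details are not covered here.

```shell
#!/bin/sh
# sync_cmd [PRIMARY_HOSTNAME]
# Print a docker command for one node. With no argument the node is
# treated as the primary and PEERNAME is omitted.
sync_cmd() {
  peer=${1:-}
  set -- docker run -d --name data-sync -v /var/data-sync:/var/data-sync
  if [ -n "$peer" ]; then
    set -- "$@" -e "PEERNAME=$peer"
  fi
  echo "$@" instantlinux/data-sync
}

sync_cmd                       # primary node
sync_cmd primary.example.com   # each additional node
```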

⁠Variables

These variables can be passed to the image from helm's values.yaml, kubernetes.yaml or docker-compose.yml as needed:

| Variable | Default | Description |
| --- | --- | --- |
| PEERNAME | | destination peer's hostname (if not running in k8s) |
| SECRET | data-sync_sshkey | override name of secret described below |
| PUBKEY1 | | public key as stored in configmap |
| PUBKEY2 | | public key as stored in configmap |
| RRSYNCROOT | / | root path allowed by rrsync |
| SYNC_INTERVAL | 5 | frequency, in minutes |

The interval is intentionally slightly inexact. An earlier version used cron for precision, but that caused more resource contention than necessary.
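How the image actually spaces its runs is not described here. One common way to get a deliberately inexact interval without cron is a sleep loop with a little random jitter, sketched below. The `next_delay` function, the 30-second jitter ceiling, and the profile name in the commented loop are all assumptions.

```shell
#!/bin/sh
SYNC_INTERVAL="${SYNC_INTERVAL:-5}"   # minutes

# next_delay MINUTES [MAX_JITTER_SECONDS]
# Print the number of seconds to sleep before the next pass:
# the base interval plus a random 0..MAX_JITTER-1 seconds.
next_delay() {
  awk -v m="$1" -v j="${2:-30}" \
    'BEGIN { srand(); print m * 60 + int(rand() * j) }'
}

# The loop would look roughly like this; it is left commented out so
# the sketch does not block when pasted into a shell:
#   while true; do
#     unison common                          # hypothetical profile name
#     sleep "$(next_delay "$SYNC_INTERVAL")"
#   done

echo "next pass in $(next_delay "$SYNC_INTERVAL") seconds"
```

Because each node draws its own jitter, passes on different nodes drift apart over time instead of all firing on the same minute boundary, as they would under cron.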

⁠Secrets

| Secret | Description |
| --- | --- |
| data-sync-sshkey1 | private half of ssh keypair |
| data-sync-sshkey2 | private half of ssh keypair |

⁠Contributing

If you want to make improvements to this image, see CONTRIBUTING⁠.
