zrepl is a one-stop ZFS backup & replication solution.
This project is a fork of zrepl.
The current FreeBSD port is here. Keep in mind that it sometimes ships a development version of this project, for testing.
The stable version of this project can be easily installed on FreeBSD using

```sh
pkg install zrepl-dsh2dsh
```

I don't build RPMs, but if somebody needs them, here is a third-party repo. Thanks to fluoros-jp for that.
- The project has switched from gRPC to a REST API.

This change isn't compatible with old job configurations. Instead of configuring `serve:` for every job, it is configured in one place:

```yaml
# Include file with keys for accessing remote jobs and authenticate remote
# clients. The filename is relative to filename of this configuration file.
include_keys: "keys.yaml"
# Include multiple files with keys from directory.
#include_keys: "keys.d/*.yaml"

listen:
  # Serve "sink" and "source" jobs for network access.
  - addr: ":8888"
    tls_cert: "/usr/local/etc/ssl/cert.pem"
    tls_key: "/usr/local/etc/ssl/key.pem"
    zfs: true
```
This configuration serves both HTTP and HTTPS API requests. `tls_cert` and `tls_key` are optional and needed only for serving HTTPS requests. `keys.yaml` contains authentication keys of remote clients:

```yaml
# Clients with defined authentication keys have network access to "sink" and
# "source" jobs. The key name is their client identity name.
# Authentication token and client_identity for me.
- name: "a.domain.com"  # client_identity
  key: "long and secret token"
```
By default, all authenticated clients have remote access to `sink` and `source` jobs, but access can be restricted using `client_keys`, like:

```yaml
jobs:
  - name: "zdisk"
    type: "sink"
    # Restrict access to this job for listed remote clients
    client_keys:
      - "key1"
      - "key2"
      # and nobody else.
```
- All transports have been replaced by the `local` and `http` transports.

The `local` transport configuration looks almost the same:

```yaml
jobs:
  - name: "zroot-to-zdisk"
    type: "push"
    connect:
      type: "local"
      listener_name: "zdisk"
      client_identity: "localhost"
```
with one exception: `listener_name` is now actually a remote job name.

The new `http` transport replaces all network transports. Its configuration looks like:

```yaml
jobs:
  - name: "zroot-to-server"
    type: "push"
    connect:
      type: "http"
      server: "https://server:8888"
      listener_name: "zdisk"
      client_identity: "serverkey"

  - name: "server-to-zdisk"
    type: "pull"
    connect:
      type: "http"
      server: "https://server:8888"
      listener_name: "zroot-to-client"
      client_identity: "serverkey"
```
`listener_name` is a job name on the server with type `sink` or `source`. `client_identity` is a key name from `keys.yaml`. That key will be sent to the server for authentication, and the server must have a key with the same `key` content in its `keys.yaml`. The `name` can differ, because `sink` and `source` jobs use the key name as `client_identity`.
Changes from upstream:
- Fresh dependencies.

- Merged: `last_n` keep rule fixed. See #691.

- Some of the resolved upstream issues:
- New dataset filter syntax instead of `filesystems:`.

The new field `datasets` is a list of patterns. By default, a pattern includes the matched dataset. All patterns are applied in order and the last matched pattern wins. Let's see some examples.

The following configuration allows access to all datasets:

```yaml
jobs:
  - name: "source"
    type: "source"
    datasets:
      - recursive: true
```
The following configuration allows access to the datasets `zroot/ROOT/default` and `zroot/usr/home`, including all their children:

```yaml
jobs:
  - name: "snap-1h"
    type: "snap"
    datasets:
      - pattern: "zroot/ROOT/default"
        recursive: true
      - pattern: "zroot/usr/home"
        recursive: true
```
The following configuration is more complicated:

```yaml
jobs:
  - name: "source"
    type: "source"
    datasets:
      - pattern: "tank"          # rule (1)
        recursive: true
      - pattern: "tank/foo"      # rule (2)
        exclude: true
        recursive: true
      - pattern: "tank/foo/bar"  # rule (3)
```
- `tank/foo/bar/loo` is excluded by (2), because (3) doesn't match (it isn't recursive).
- `tank/bar` is included by (1).
- `tank/foo/bar` is included by (3): yes, it is matched by (2), but the last matched rule wins and (3) is the last matched rule.
- `zroot` isn't included at all, because nothing matches it.
- `tank/var/log` is included by (1), because that rule is recursive and no other rule matches.

For compatibility reasons the old `filesystems` syntax still works, but I wouldn't suggest using it. It's deprecated and can be removed at any time.
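The "last matched pattern wins" behaviour above can be sketched in Go. This is an illustrative model, not zrepl's actual code; `rule`, `matches` and `included` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// rule models one entry of the `datasets` list.
type rule struct {
	pattern   string
	recursive bool
	exclude   bool
}

// matches reports whether a rule applies to a dataset: a recursive
// pattern matches the dataset itself and all of its descendants,
// a non-recursive pattern matches the exact name only.
func matches(r rule, dataset string) bool {
	if dataset == r.pattern {
		return true
	}
	return r.recursive && strings.HasPrefix(dataset, r.pattern+"/")
}

// included applies all rules in order; the last matching rule wins.
// A dataset matched by no rule is not included.
func included(rules []rule, dataset string) bool {
	verdict := false
	for _, r := range rules {
		if matches(r, dataset) {
			verdict = !r.exclude
		}
	}
	return verdict
}

func main() {
	rules := []rule{
		{pattern: "tank", recursive: true},                    // rule (1)
		{pattern: "tank/foo", recursive: true, exclude: true}, // rule (2)
		{pattern: "tank/foo/bar"},                             // rule (3)
	}
	fmt.Println(included(rules, "tank/foo/bar/loo")) // false: last match is (2)
	fmt.Println(included(rules, "tank/foo/bar"))     // true: (3) is the last match
	fmt.Println(included(rules, "tank/bar"))         // true: only (1) matches
	fmt.Println(included(rules, "zroot"))            // false: nothing matches
}
```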
- Added support of shell patterns for dataset definitions.

Configuration example:

```yaml
datasets:
  # exclude all children of zroot/bastille/jails
  - pattern: "zroot/bastille/jails"
    exclude: true
    recursive: true
  # except datasets matched by this shell pattern
  - pattern: "zroot/bastille/jails/*/root"
    shell: true
```
This configuration includes the `zroot/bastille/jails/a/root` and `zroot/bastille/jails/b/root` datasets, and excludes the `zroot/bastille/jails/a` and `zroot/bastille/jails/b` datasets.

Another example:

```yaml
datasets:
  # exclude datasets matched by this shell pattern
  - pattern: "zroot/bastille/jails/*/root"
    exclude: true
    shell: true
  # and include everything else inside zroot/bastille/jails
  - pattern: "zroot/bastille/jails"
    recursive: true
```
This excludes `zroot/bastille/jails/a/root` and `zroot/bastille/jails/b/root`, and includes everything else inside `zroot/bastille/jails`.

See Match for details about patterns.
- Added new log formatters: `json` and `text`.

Both formatters use slog for formatting log entries. The new `json` formatter replaces the old `json` formatter. Configuration example:

```yaml
logging:
  - type: "file"
    format: "text"  # or "json"
    time: false     # don't prepend with date and time
    hide_fields:
      - "span"      # don't log "span" field
```
- Added the ability to log into a file. See #756.

Configuration example:

```yaml
logging:
  - type: "file"
    format: "text"  # or "json"
    time: false     # don't prepend with date and time
    hide_fields: &hide-log-fields
      - "span"      # don't log "span" field
    level: "error"  # log errors only
    # without a filename it logs to stderr

  - type: "file"
    format: "text"
    hide_fields: *hide-log-fields
    level: "info"
    filename: "/var/log/zrepl.log"
```
- Replication jobs (without periodic snapshotting) can be configured for periodic runs. See #758.

Configuration example:

```yaml
- name: "zroot-to-server"
  type: "push"
  interval: "1h"
  snapshotting:
    type: "manual"
```
Both `pull` and `push` job types support configuring periodic runs using a cron specification. For instance:

```yaml
- name: "zroot-to-server"
  type: "push"
  cron: "25 15-22 * * *"
  snapshotting:
    type: "manual"
```
See CRON Expression Format for details.
- Added the ability to configure command pipelines between `zfs send` and `zfs recv`. See #761.

Configuration example:

```yaml
send:
  execpipe:
    # zfs send | zstd | mbuffer
    - [ "zstd", "-3" ]
    - [ "/usr/local/bin/mbuffer", "-q", "-s", "128k", "-m", "100M" ]
```

```yaml
recv:
  execpipe:
    # mbuffer | unzstd | zfs receive
    - [ "/usr/local/bin/mbuffer", "-q", "-s", "128k", "-m", "100M" ]
    - [ "unzstd" ]
```
zrepl exports some `zfs send | recv` arguments as environment variables: `ZREPL_SEND_RESUME_TOKEN`, `ZREPL_SEND_FROM`, `ZREPL_SEND_SNAPSHOT` and `ZREPL_RECV_FS`.

- Added Icinga/Nagios checks for checking that the daemon is alive, the snapshot count is OK, and the latest or oldest snapshots are not too old. See #765.

Configuration example:
```yaml
monitor:
  count:
    - prefix: "zrepl_frequently_"
      warning: 20
      critical: 30
    - prefix: "zrepl_hourly_"
      warning: 31
      critical: 50
    - prefix: "zrepl_daily_"
      warning: 91
      critical: 92
    - prefix: "zrepl_monthly_"
      warning: 13
      critical: 14
    - prefix: ""       # everything else
      warning: 2
      critical: 10
  latest:
    - prefix: "zrepl_frequently_"
      critical: "48h"  # 2d
    - prefix: "zrepl_hourly_"
      critical: "48h"
    - prefix: "zrepl_daily_"
      critical: "48h"
    - prefix: "zrepl_monthly_"
      critical: "768h"  # 32d
  oldest:
    - prefix: "zrepl_frequently_"
      critical: "48h"    # 2d
    - prefix: "zrepl_hourly_"
      critical: "168h"   # 7d
    - prefix: "zrepl_daily_"
      critical: "2208h"  # 90d + 2d
    - prefix: "zrepl_monthly_"
      critical: "8688h"  # 30 * 12 = 360d + 2d
    - prefix: ""         # everything else
      critical: "168h"   # 7d
```
Every item can be configured to skip some datasets from the check, like:

```yaml
- prefix: "zrepl_monthly_"
  skip_datasets:
    - pattern: "zdisk/video"
  warning: 13
  critical: 14
```
In this example it checks the number of snapshots with the prefix `zrepl_monthly_` for every dataset configured in `datasets`, except `zdisk/video`. `skip_datasets` has the same syntax as `datasets`.

An example of a daily script:

```sh
echo
echo "zrepl status:"
zrepl monitor alive
zrepl monitor snapshots
```
- Removed support of the `postgres-checkpoint` and `mysql-lock-tables` hooks.

- Periodic snapshotting now recognizes cron specifications. For instance:

```yaml
snapshotting:
  type: "periodic"
  cron: "25 15-22 * * *"
```
`type: "cron"` still works too, just for compatibility. Both of them are the same type.

- Fast skip of "keep all" pruning.

Instead of a configuration like this:

```yaml
pruning:
  keep:
    - type: "regex"
      regex: ".*"
```
or like this:

```yaml
pruning:
  keep_sender:
    - type: "regex"
      regex: ".*"
  keep_receiver:
```
which keeps all snapshots, it's now possible to omit `pruning:` entirely, or just one of `keep_sender:` or `keep_receiver:`. In this case zrepl aborts pruning early and marks it as done.

Originally zrepl requested all snapshots and did nothing after that, because pruning was configured to keep all snapshots, but it still spent some time executing zfs commands.
- Snapshots are named using local time for timestamps, instead of UTC.

So instead of snapshot names like `zrepl_20240508_140000_000` it's `zrepl_20240508_160000_CEST`. `timestamp_local` defines the time zone of timestamps. By default it's local time, but with `timestamp_local: false` it's UTC. A configuration like:

```yaml
snapshotting:
  type: "periodic"
  cron: "*/15 * * * *"
  prefix: "zrepl_"
  timestamp_format: "20060102_150405_000"
  timestamp_local: false
```
returns the original naming like `zrepl_20240508_140000_000` with UTC time.
- Configurable RPC timeout (1 minute by default). Configuration example:

```yaml
global:
  rpc_timeout: "2m30s"
```
sets the RPC timeout to 2 minutes and 30 seconds.
See also zrepl/zrepl#791
- Configurable path to the zfs binary (`zfs` by default). Configuration example:

```yaml
global:
  zfs_bin: "/sbin/zfs"
```
sets the zfs binary path to `/sbin/zfs`.
- Replication now generates a stream package that sends all intermediary snapshots at once (`zfs send -I`), instead of sending every intermediary snapshot one by one (`zfs send -i`). Such replication is much faster. For instance, a replication job on my desktop configured like:

```yaml
replication:
  concurrency:
    steps: 4
    size_estimates: 8
```
replicates over WLAN in 1m32s, instead of 8m.
- New command: `zrepl signal stop`.

Stops the daemon right now. It's actually the same as sending `SIGINT` to the daemon.
- New command: `zrepl signal shutdown`.

Stops the daemon gracefully. After this signal, the zrepl daemon will exit as soon as it is safe. It interrupts any operation except replication steps. The daemon waits for all replication steps to complete and then exits.

Sending `SIGTERM` has the same effect.
- Redesigned `zrepl status`.

- `zfs send -w` is the default now. Example of how to change it back:

```yaml
send:
  raw: false
```
- New configuration for the control and prometheus services. Example:

```yaml
listen:
  # control socket for zrepl client, like `zrepl signal` or `zrepl status`.
  - unix: "/var/run/zrepl/control"
    # unix_mode: 0o660  # write perm for group
    control: true

  # Export Prometheus metrics on http://127.0.0.1:8000/metrics
  - addr: "127.0.0.1:8000"
    # tls_cert: "/usr/local/etc/zrepl/cert.pem"
    # tls_key: "/usr/local/etc/zrepl/key.pem"
    metrics: true
```
One of `addr` or `unix` is required, or both of them can be configured. One of `control` or `metrics` is required, or both of them can be configured too. Everything else is optional. For backward compatibility, the old-style configuration works too.

See also zrepl/zrepl#780
- New optional `pre` and `post` hooks for `push` and `pull` jobs. Example:

```yaml
- name: "zroot-to-zdisk"
  type: "push"
  hooks:
    pre:
      path: "/root/bin/zrepl_hook.sh"
      args: [ "pre" ]  # optional positional parameters
      env:             # optional environment variables
        ZREPL_FOOBAR: "foo"
      timeout: "30s"   # optional, default is 1m
      # don't continue the job if exit status is nonzero (default: false)
      err_is_fatal: true
    post:
      path: "/root/bin/zrepl_hook.sh"
      args: [ "post" ]  # optional positional parameters
      env:              # optional environment variables
        ZREPL_FOOBAR: "bar"
      timeout: "0s"     # without timeout at all
```
This configuration runs `/root/bin/zrepl_hook.sh pre` before replication with the environment variables:

```
ZREPL_FOOBAR=foo
ZREPL_JOB_NAME=zroot-to-zdisk
```

If it exits with a nonzero exit status, the job will not continue. By default `err_is_fatal: false` and the exit status is ignored.

After pruning has finished, it runs `/root/bin/zrepl_hook.sh post` with the environment variables:

```
ZREPL_FOOBAR=bar
ZREPL_JOB_ERR=
ZREPL_JOB_NAME=zroot-to-zdisk
```

The `post` hook sets `ZREPL_JOB_ERR` to the last error. It's empty if the job finished without errors.
- Pruning now prunes filesystems concurrently.

By default it uses the number of CPUs as the concurrency limit; it can be changed in the config:

```yaml
jobs:
  - name: "zroot-to-zdisk"
    pruning:
      concurrency: 1
```
`concurrency` can be defined on both sides, local and remote. One side uses it to limit concurrent `zfs list` invocations and the other side uses the `concurrency` from its own config to limit concurrent `zfs destroy` invocations.

- Job configurations can be included from multiple files:

```yaml
include_jobs: "jobs.d/*.yaml"
```
Like `include_keys`, the path is relative to the main configuration file. `include_jobs` can be combined with `jobs:`:

```yaml
jobs:
  - name: "zroot-to-zdisk"

include_jobs: "jobs.d/*.yaml"
```
- Replication can be configured to replicate a subset of snapshots, instead of all of them. Example:

```yaml
jobs:
  - name: "zroot-to-zdisk"
    type: "push"
    replication:
      prefix: "zrepl_"
```
This configuration will replicate only snapshots with names beginning with `zrepl_`.
See also zrepl/zrepl#403
- Snapshots are now created concurrently.

By default it uses the number of CPUs as the concurrency limit; it can be changed in the config:

```yaml
jobs:
  - name: "zroot-to-zdisk"
    snapshotting:
      type: "periodic"
      concurrency: 1
```
- Recursive snapshots via `zfs snapshot -r`.

Where possible, snapshots are created using `zfs snapshot -r`. With a configuration like this:

```yaml
jobs:
  - name: "snap-1h"
    type: "snap"
    datasets:
      - pattern: "zroot"
        recursive: true
```

`zfs snapshot -r zroot` recursively creates snapshots of all descendant datasets, instead of one by one. But if the recursive dataset has any exclusion, like this:

```yaml
jobs:
  - name: "snap-1h"
    type: "snap"
    datasets:
      - pattern: "zroot"
        recursive: true
      - pattern: "zroot/foo"
        exclude: true
```
`-r` is not used and it creates snapshots of all descendant datasets one by one.

See also zrepl/zrepl#634
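The decision described above can be sketched as follows. This is an illustrative model, not zrepl's actual code; the `rule` type and `canSnapshotRecursively` are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// rule models one entry of the `datasets` list.
type rule struct {
	pattern   string
	recursive bool
	exclude   bool
}

// canSnapshotRecursively reports whether `zfs snapshot -r root@snap` is
// safe for a recursive pattern: it is not when any exclude rule applies
// to the root itself or to a dataset beneath it.
func canSnapshotRecursively(root string, rules []rule) bool {
	for _, r := range rules {
		if r.exclude &&
			(r.pattern == root || strings.HasPrefix(r.pattern, root+"/")) {
			return false
		}
	}
	return true
}

func main() {
	noExcludes := []rule{{pattern: "zroot", recursive: true}}
	withExclude := append(noExcludes, rule{pattern: "zroot/foo", exclude: true})
	fmt.Println(canSnapshotRecursively("zroot", noExcludes))  // true
	fmt.Println(canSnapshotRecursively("zroot", withExclude)) // false
}
```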
- Faster `zfs list` (zrepl/zrepl#870).

It runs `zfs list filesystem...`, if possible, listing only specific datasets, instead of `zfs list`, which lists every dataset on the entire system.
- Don't create empty or small snapshots.

The new option `written_threshold` defines the amount of space that must have been written since the previous snapshot before a new snapshot can be created.

With a configuration like this:

```yaml
snapshotting:
  type: "periodic"
  written_threshold: 1024
```
any dataset that has written less than 1024 bytes will be skipped.
See also zrepl/zrepl#728
- Manual "periodic" snapshot jobs.

Snapshot jobs can be configured to not run periodically but only on a signal, from the CLI or `zrepl status`, like this:

```yaml
snapshotting:
  type: "periodic"
  interval: "manual"
```
User documentation can be found at zrepl.github.io. Keep in mind that it doesn't contain changes from this fork.