On- and Off-site Data Protection
The OODP feature creates rotational auto-snapshots of a volume (dataset or zvol) according to user-defined retention-interval plans, and asynchronously replicates snapshot deltas to local (on-site) or remote (off-site) destinations.
Optionally, the service can create auto-snapshots only, without replicating them to another volume.
Backup volumes are asynchronously created mirrors (copies) of production volumes. At each time interval, a snapshot is created and the delta between subsequent snapshots is sent to the destination backup volume.
Data replication starts at every defined time interval: the data changes that took place during that interval are replicated to the backup volume. After every interval, the age of existing snapshots is also checked against their retention time; snapshots older than the limit set in the user-defined retention plan are deleted automatically.
As creating snapshots and replicating small deltas of data does not generate any significant system load, the backup process can work round the clock even during heavy load periods.
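The rotation logic can be pictured with a minimal Python sketch (all names here are hypothetical illustrations, not the internal OODP implementation):

    from datetime import datetime, timedelta

    def prune_snapshots(snapshots, retention):
        """Return the snapshots that a rotation round would delete.

        snapshots: mapping of snapshot name -> creation time
        retention: timedelta taken from the user-defined retention plan
        """
        cutoff = datetime.now() - retention
        return [name for name, created in snapshots.items() if created < cutoff]

    # With a 1-hour retention, snapshots older than one hour are removed.
    snaps = {
        "auto-snap-1": datetime.now() - timedelta(minutes=90),  # deleted
        "auto-snap-2": datetime.now() - timedelta(minutes=30),  # kept
    }
    print(prune_snapshots(snaps, timedelta(hours=1)))  # ['auto-snap-1']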
For each source volume, one or more backup volumes can be set up. The retention-interval plan can be defined separately for the source volume and for each destination backup volume.
Only Open-E JovianDSS can be used as a backup destination server.
Storage snapshots are not application memory consistent, but most Windows operating systems and applications (including MS-SQL), as well as Linux virtual machines, running on protected volumes can start from exported and mounted backup volumes without any problem.
If application-consistent snapshots are required, the replication task wizard or CLI provides an option to register the VMware vCenter or vSphere servers that use the protected datastores.
Once vCenter or vSphere servers are registered, all virtual machines running on the protected datastores will receive a VSS (Windows) or freeze (Linux) snapshot request via the VMware API. VMware Tools must be installed on the virtual machines in order to execute API-triggered snapshots.
If a virtual machine does not require application-consistent snapshots, it must have the following text entered in its VM Annotations: ##auto-snap-skip##
It is also possible to enter time ranges and days of the week for the auto-snap-skip period.
For example: skip on weekdays from Monday to Friday, from 8:45 AM to 12:30 PM and from 1:15 PM to 6:30 PM.
The following syntax options are accepted:
##auto-snap-skip##8:45..12:30,13:15..18:30##Mon..Fri##
##auto-snap-skip##08:45..12:30,13:15..18:45##Mon..Fri##
##auto-snap-skip##08:45..12:30,1:15PM..6:45PM##Mon..Fri##
##auto-snap-skip##08:45..12:30,13:15..18:45##Mon,Tue,Wed,Thu,Fri##
It is recommended to put the auto-snap-skip instruction in the first line of the VM Annotations, but it will also work if it is placed anywhere in the annotations.
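As an illustration of the documented annotation syntax, the skip decision could be modelled in Python roughly as follows (a minimal sketch; all function names are hypothetical and not part of the product):

    import re
    from datetime import datetime

    DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

    def to_minutes(t):
        """Convert '8:45', '08:45' or '1:15PM' to minutes since midnight."""
        h, m, ampm = re.fullmatch(r"(\d{1,2}):(\d{2})(AM|PM)?", t, re.I).groups()
        h = int(h)
        if ampm:
            h = h % 12 + (12 if ampm.upper() == "PM" else 0)
        return h * 60 + int(m)

    def expand_days(spec):
        """Expand 'Mon..Fri' or 'Mon,Tue,Wed' into weekday indices (Mon=0)."""
        days = set()
        for part in spec.split(","):
            if ".." in part:
                lo, hi = part.split("..")
                days.update(range(DAYS.index(lo), DAYS.index(hi) + 1))
            else:
                days.add(DAYS.index(part))
        return days

    def should_skip(annotation, when):
        """True if the annotation requests skipping a snapshot at 'when'."""
        m = re.search(r"##auto-snap-skip##(?:([^#]+)##([^#]+)##)?", annotation)
        if not m:
            return False
        times, days = m.group(1), m.group(2)
        if times is None:  # bare ##auto-snap-skip##: always skip this VM
            return True
        if when.weekday() not in expand_days(days):
            return False
        now = when.hour * 60 + when.minute
        return any(to_minutes(a) <= now <= to_minutes(b)
                   for a, b in (r.split("..") for r in times.split(",")))

    # Tuesday 9:00 falls inside 8:45..12:30, so the snapshot is skipped.
    ann = "##auto-snap-skip##8:45..12:30,13:15..18:30##Mon..Fri##"
    print(should_skip(ann, datetime(2024, 1, 2, 9, 0)))  # True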
Clustered environment:
It is possible to use OODP in a clustered environment when both source and destination are clusters (two node pairs). Each source node must then attach to each destination node using its physical IP address, and tasks must be created with the IP of the destination cluster (the Virtual IP, "VIP").
In a source (cluster) -> destination (single node) configuration, a destination node that is inaccessible over the network will not stop the failover process on the source cluster.
Ongoing replication processes are stopped in case of an automatic failover or a manual ‘move’ operation. This applies only to the moved pools; replication processes connected with other pools continue to function.
OODP Working modes:
Asynchronous replication of snapshot deltas to local or remote destinations, where the destination is:
- A volume on a pool of a remote server.
- A volume on a different pool within the same server.
- Another volume within the same pool (not recommended).
Auto-snapshots on the local server only.
Note: Auto-snapshots alone can NOT be treated as full data protection, as the data is not replicated.
If a 3rd-party backup is used in addition, the auto-snapshots can provide quick access to previous data versions. To create an auto-snapshots-only task, disable the destination volume in the task wizard or omit it in the CLI.
Retention plans:
The OODP retention-interval plan consists of a series of retention-period-to-interval associations. It can be defined intuitively in the replication task wizard in the GUI. If the CLI is used, the retention-interval plan uses the following syntax:
"retention_every_interval,retention_every_interval,retention_every_interval,...".
Example: 1hour_every_10min,3day_every_1hour,1month_every_1day
Both intervals and retention periods use standard units of time, or multiples of them, with full names or shortcuts according to the following list: second|sec|s, minute|min, hour|h, day|d, week|w, month|mon|m, year|y
Rotational auto-snapshots on both source and destination are created according to the retention plans. It is possible to have different retention plans for the source and destination pools.
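For illustration, the documented plan syntax can be parsed with a short Python sketch (hypothetical helper names; the month and year lengths are averaged assumptions). The sketch also enforces the requirement, noted under the important notes below, that the retention time must be larger than the interval:

    import re

    # Seconds per unit, keyed by every documented name and shortcut.
    # Month and year lengths are averages -- an assumption of this sketch.
    UNITS = {}
    for names, secs in [(("second", "sec", "s"), 1), (("minute", "min"), 60),
                        (("hour", "h"), 3600), (("day", "d"), 86400),
                        (("week", "w"), 604800), (("month", "mon", "m"), 2629800),
                        (("year", "y"), 31557600)]:
        for name in names:
            UNITS[name] = secs

    def seconds(term):
        """Convert a term such as '10min' or '3day' to seconds."""
        count, unit = re.fullmatch(r"(\d+)([a-z]+)", term).groups()
        return int(count) * UNITS[unit]

    def parse_plan(plan):
        """Parse 'retention_every_interval,...' into (retention, interval) pairs."""
        pairs = []
        for entry in plan.split(","):
            retention, interval = entry.split("_every_")
            r, i = seconds(retention), seconds(interval)
            if r <= i:  # e.g. '1min_every_10min' would never keep a snapshot
                raise ValueError("retention must be larger than the interval: " + entry)
            pairs.append((r, i))
        return pairs

    print(parse_plan("1hour_every_10min,3day_every_1hour,1month_every_1day"))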
The GUI provides a step-by-step wizard to define replication tasks.
If the CLI is used to define replication tasks, the following task options are provided:
- It is possible to attach, list and detach backup nodes (remote nodes used for the asynchronous replication).
- It is possible to create a backup task in the following modes:
- Only source node details provided -> snapshots created locally, no replication.
- Source and destination node details provided -> snapshots created locally and replicated to the destination. Both source and destination keep rotational auto-snapshots.
- Optionally, it is possible to use mbuffer (with the mbuffer size parameter) as a tool for buffering data streams.
- It is possible to get details of all backup tasks.
- It is possible to delete the backup task.
- It is possible to get the status of the OODP service:
- Service status
- Last entries from logs
- It is possible to debug a backup task by running it in dry mode in order to check what the issue is.
- It is possible to restart all tasks; the task configuration will be reset.
Important notes:
- The OODP CLI provides detailed syntax help.
- Only Open-E JovianDSS can be used as a destination server.
- Replication will not be performed as long as the destination dataset or zvol does not exist. The user needs to create the destination dataset/zvol manually.
- Replication will not be performed if the dataset or zvol on the destination node is in use, e.g. by an iSCSI Target with an active session. Data from a particular snapshot can be accessed only via a clone created from that snapshot.
- User snapshots created on the destination dataset or zvol will be deleted by OODP during a rotation round.
- User snapshots created on the source dataset or zvol are not deleted by OODP during a rotation round.
- Snapshots on both source and destination that have been cloned are not deleted by OODP during a rotation round.
- A replication round fails when it is not possible to replicate a snapshot to the destination, e.g. due to:
- Lack of communication between nodes
- Busy dataset on destination server (used e.g. by iSCSI Target)
- Existence of the user’s own snapshot with a clone on the destination server.
- If a user’s manual snapshot on the destination server was created before the first replication, the source snapshots will not be rotated. During the next replication, the rotation of snapshots on both source and destination is performed, provided that all replication requirements are fulfilled.
- If the nodes have different sets of snapshots (no common snapshot between source and destination), the snapshots on the destination server are deleted and re-replicated from the source.
- OODP is activated when at least one backup task exists in the system (at pool import and system start).
- OODP is deactivated when there are no backup tasks in the system (at pool export).
- Replication to a remote destination is encrypted (SSH).
- OODP replication processes are stopped at pool export/destroy.
- An ongoing replication process is not stopped when the OODP task is deleted. After this process finishes, no further replications take place.
- When the backup plan is created as, for example, 1min_every_10min, the backup task will not start. The retention time must always be larger than the interval, e.g. 10min_every_1min.
- A source snapshot that is being replicated (while replication of an older snapshot is still in progress) blocks the rotation of snapshots on the source. New snapshots are still created according to the scheduled plan.
- If multiple destination servers are used, only one replication is performed at a given time.
- It is not recommended to create manual snapshots on destination servers. This may break the synchronization process. If access to the data on the destination server is required, use snapshots created by OODP.
- The save-settings mechanism does not support OODP tasks (this does not apply to attached nodes). All information regarding OODP tasks is stored as dataset properties; if the dataset does not exist, the OODP task will not be restored.
Known issues:
- Source and destination should be of the same type (zvol-zvol, dataset-dataset). It is possible to create a task with, e.g., a zvol source and a dataset destination, but when started it will report an error.
- Temporary errors in the GUI: OODP can cause a temporary inconsistency between the internal GUI cache (used by the API in Python) and the ZFS resources, which can lead to GUI errors about missing snapshots. The issue occurs because OODP refreshes the cache using a hook script: the cache is refreshed before and after any snapshot is removed by OODP, but a hook script (the only possible way to refresh the cache from external software) leaves a small window in which the cache is inconsistent. If the GUI requests information about snapshots between the snapshot removal and the cache refresh, it receives information about non-existing items, which may lead to errors. The most common place for this error is the volume edit window, where snapshots are checked in order to lock editing of the name if the volume has any snapshots. The issue is only temporary: the cache is refreshed when the window is closed and reopened, and the error will not appear again because the cache is consistent.
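The inconsistency window can be modelled with a minimal Python sketch (hypothetical structures; it only reproduces the described sequence of events):

    # Both dictionaries and the function name below are hypothetical.
    zfs = {"snapshots": ["auto-snap-1", "auto-snap-2"]}   # actual ZFS state
    cache = {}                                            # internal GUI cache

    def hook_refresh():
        """The hook script copies the current ZFS state into the GUI cache."""
        cache["snapshots"] = list(zfs["snapshots"])

    hook_refresh()                           # refresh before the removal
    zfs["snapshots"].remove("auto-snap-1")   # OODP removes a snapshot
    # Inconsistency window: a GUI request here still sees the removed item,
    # which is what produces the 'missing snapshot' errors.
    assert "auto-snap-1" in cache["snapshots"]
    hook_refresh()                           # refresh after the removal
    assert "auto-snap-1" not in cache["snapshots"]  # consistent again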