Deduplication
Deduplication eliminates redundant copies of data and so reduces storage overhead. The deduplication ratio compares the original size of the data on the zpool with its size after redundancy has been removed. Deduplication can be enabled on individual zvols or datasets, but the ratio is displayed per pool rather than per zvol or dataset, because they all reside on the pool.
Note: Deduplication is not recommended on systems whose primary aim is efficiency and fast data access when a High Availability cluster is enabled. Under certain conditions the deduplication table may fail, and rebuilding it while the pool is being imported can take a long time or even prevent failover from working properly.
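As a worked illustration of the ratio described above, here is a minimal Python sketch. The function name and the sample sizes are hypothetical and chosen only for the example; they do not come from any particular pool.

```python
def dedup_ratio(original_bytes: int, stored_bytes: int) -> float:
    """Size of the data before deduplication divided by the size actually stored."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return original_bytes / stored_bytes


GIB = 1024 ** 3

# Hypothetical pool: 300 GiB of data written, 200 GiB occupied after deduplication.
print(f"{dedup_ratio(300 * GIB, 200 * GIB):.2f}")  # prints 1.50
```

A result of 1.00 would mean that no space was saved, and a larger value means more redundancy was removed.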
Things to be taken into consideration
Before using deduplication, the following matters should be taken into account:
- hardware - deduplication consumes a lot of memory, so if the system has to process its regular tasks at the same time, its performance may drop significantly
- need for quick access to the data - deduplication pays off mainly when archiving or backing up data, because it saves disk space. When the data contains little repetition, deduplication brings no savings and only lengthens write times.
It is also worth calculating the memory requirements as follows:
- multiply the number of allocated blocks by the size of a deduplication table (DDT) entry, which is about 320 bytes. For example: 1.08 million allocated blocks x 320 bytes = 345.6 MB, which is the amount of memory required for deduplication
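The memory estimate can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the 320-byte entry size is the rough figure quoted above, the 1.08 in the example is read as about 1.08 million allocated blocks, and the function name is hypothetical.

```python
DDT_ENTRY_BYTES = 320  # approximate size of one DDT entry, as quoted in the text


def ddt_memory_bytes(allocated_blocks: int, entry_bytes: int = DDT_ENTRY_BYTES) -> int:
    """Estimate the memory needed to hold the deduplication table."""
    if allocated_blocks < 0:
        raise ValueError("allocated_blocks must not be negative")
    return allocated_blocks * entry_bytes


# Assumed example: about 1.08 million allocated blocks.
blocks = 1_080_000
print(f"{ddt_memory_bytes(blocks) / 1_000_000:.1f} MB")  # prints 345.6 MB
```

The estimate grows linearly with the number of allocated blocks, so a pool holding many small blocks needs far more memory than one holding the same amount of data in large blocks.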
Once enabled, deduplication affects the whole pool, because it creates a global DDT array with deduplication indicators. On a pool where deduplication has not been used, the Zpool storage deduplication rate is 1. A value greater than 1 means that deduplication has taken place. Disabling deduplication afterwards does not return the value to 1.
Removing deduplicated data
After deduplication has been disabled on the zpool, the deduplicated data must be rewritten by performing a send/receive operation on the pool. The Zpool storage deduplication rate is then reset to 1; otherwise the old data remains in its deduplicated state.
To remove deduplicated data, disable deduplication and transfer the data from the sources that had deduplication enabled to a different location where deduplication is not enabled.
If the deduplication ratio of a given pool returns to 1.0, it can be assumed that there is no deduplicated data left on the pool.
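The check can be scripted where command-line access to the pool is available. The following Python sketch assumes that the standard `zpool` utility is installed and that it reports the `dedupratio` property as a string such as `1.00x`; the helper names and the pool name `tank` are hypothetical.

```python
import subprocess


def parse_dedup_ratio(value: str) -> float:
    """Convert a ratio string such as '1.00x' into a float."""
    return float(value.strip().rstrip("x"))


def has_deduplicated_data(ratio: float) -> bool:
    """A ratio above 1.0 indicates that deduplicated data is still on the pool."""
    return ratio > 1.0


def read_dedup_ratio(pool: str) -> float:
    """Query the pool's dedupratio property (assumes the zpool CLI is present)."""
    result = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "dedupratio", pool],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_dedup_ratio(result.stdout)


if __name__ == "__main__":
    # Parsing alone can be tried without a pool:
    print(has_deduplicated_data(parse_dedup_ratio("1.52x")))  # prints True
    # On a host with a pool named "tank" (hypothetical):
    # print(has_deduplicated_data(read_dedup_ratio("tank")))
```

Because the reported value is rounded to two decimal places, a result of 1.00 is a strong indication rather than proof that no deduplicated blocks remain.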