Dataset

A dataset is a ZFS file system created inside a ZFS pool (zpool). In this documentation, dataset always refers to a file-system resource, typically used as NAS storage for SMB / NFS shares.

Datasets are typically used to:

  • Provide NAS volumes for SMB and NFS shares.
  • Isolate data with different performance or data-protection policies (for example, different compression, recordsize, or deduplication settings).
  • Apply independent quota and reservation limits for different workloads or tenants.

You create a dataset first, then create shares (SMB / NFS, etc.) that point to it.

Creating a dataset

  1. Go to the pool management view in the GUI.
  2. Select and expand the zpool where you want to create the dataset.
  3. Navigate to the Shares section for the selected zpool.
  4. Click Add dataset to open the dataset creation dialog.
  5. Configure the parameters.
  6. Review and confirm creation. The new dataset appears in the dataset list for the selected zpool.

After creation, you can:

  • Assign SMB or NFS shares to the dataset in the appropriate shares configuration pages.
  • Adjust most dataset properties later; however, encryption settings and some layout-related properties cannot be changed after creation.
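
For orientation, the dialog settings map to standard OpenZFS dataset properties. A minimal command-line sketch of an equivalent creation, assuming a placeholder pool pool0 and dataset name data (the GUI remains the supported interface):

    # Create a dataset with explicitly chosen properties
    zfs create -o compression=lz4 -o recordsize=128K pool0/data

    # Check the resulting property values
    zfs get compression,recordsize,quota pool0/data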

Below is a description of all dataset properties.

Encryption settings

This section is displayed at the top of the dialog. Encryption can be enabled only when the dataset is created and cannot be turned off later.

Encrypt resource

Enable this switch to create an encrypted dataset. When disabled, the dataset is created unencrypted.

Encryption method

Shows the algorithm used when the dataset is encrypted.

  • By default, it inherits the value from the Configuration -> Resource encryption setting (for example, aes-256-gcm).
  • You can select a different supported method for this dataset.

For details about keys, unlocking, and error handling, see Encryption.
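
For reference, these controls correspond to OpenZFS native encryption, which likewise can only be chosen at creation time. A minimal sketch, assuming a placeholder dataset pool0/secure and a passphrase-based key (the product's actual key handling is described in Encryption):

    # Encryption must be enabled at creation; it cannot be added later
    zfs create -o encryption=aes-256-gcm -o keyformat=passphrase pool0/secure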

Dataset properties

These fields define the behaviour of the dataset itself.

Name

  • The dataset name must be unique within the pool.
  • Allowed characters: a–z  A–Z  0–9  .  _  -

Changing the name of an existing dataset breaks paths used by its shares; clients will lose access until the share definitions are adjusted.

Deduplication

Enables ZFS block-level deduplication for this dataset. Options:

  • Disabled (default) – deduplication is turned off.
  • On – alias for “sha256”.
  • Verify – alias for “sha256, Verify”; additionally compares blocks to reduce the risk of false matches.
  • sha256 – uses the SHA-256 checksum for deduplication. When two blocks have the same checksum, they are treated as identical and only a single copy is stored.
  • sha256, Verify – uses SHA-256 for deduplication and additionally verifies candidate duplicate blocks to detect possible hash collisions. This mode is very resource-intensive and is not recommended for general use.

Use deduplication only for workloads with a high ratio of identical blocks and sufficient RAM (e.g., many similar VM images). For general data, it is usually better to keep it disabled.
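
The options above map to values of the standard ZFS dedup property; a brief sketch with a placeholder dataset name:

    # Plain SHA-256 deduplication
    zfs set dedup=sha256 pool0/vmimages

    # SHA-256 plus block verification (very resource-intensive)
    zfs set dedup=sha256,verify pool0/vmimages

    # Turn deduplication off again; already-written data stays deduplicated
    zfs set dedup=off pool0/vmimages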

Number of data copies

Controls the number of ZFS data copies stored for this dataset, in addition to pool redundancy (mirror, RAIDZ, and so on).

  • Possible values: 1 (default), 2, 3.
  • Copies are stored on different disks when possible.
  • Extra copies increase used space and are counted towards quota and reservation.
  • Only new writes use the current setting.

Use higher values only for small but critical datasets where local redundancy is more important than capacity.
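
This setting corresponds to the ZFS copies property; for example (placeholder dataset name):

    # Store two copies of each block; applies to new writes only
    zfs set copies=2 pool0/critical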

Compression

The compression algorithm used for this dataset.

  • lz4 (default) – fast, generally recommended.
  • None – disables compression.
  • Other algorithms that can appear in the list:
    • gzip levels 1–9 (1 = fastest, lowest compression; 9 = slowest, highest compression),
    • lzjb,
    • zle.

Keep lz4 for most datasets. Disable compression only when data is already compressed and very latency-sensitive.
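
Equivalent settings via the ZFS compression property (placeholder dataset names):

    # The recommended default
    zfs set compression=lz4 pool0/data

    # Strongest gzip level for rarely accessed archive data
    zfs set compression=gzip-9 pool0/archive

    # Disable compression for already-compressed, latency-sensitive data
    zfs set compression=off pool0/media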

Record size

Suggested block size for files stored in this dataset.

  • Designed primarily for database-type workloads that access large files in fixed-size records.
  • For such workloads, setting the “record size” to at least match the database record size can significantly improve performance.
  • For general-purpose datasets, changing the default is not recommended and may reduce performance.
  • Values: 4, 8, 16, 32, 64, 128, 256, 512 KiB and 1 MiB; newer software versions allow values up to 16 MiB.
  • Default: 128 KiB.

The new record size applies only to data written after the change; existing files keep their original block size.
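
As a command-line illustration of the same recordsize property, assuming a hypothetical dataset holding a database with 16 KiB pages:

    # Match the record size to the database page size
    zfs set recordsize=16K pool0/db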

Write cache sync requests

Controls the ZFS sync property – how synchronous write operations are handled.

Options in the drop-down:

  • Always: All file-system transactions are committed and flushed to stable storage before returning to the application. Best data safety; lower performance.
  • Standard (default): Synchronous operations are logged and flushed; however, to improve performance, the most recent cached data (approximately one second) may be lost if a sudden power failure occurs. Recommended only when the environment is protected by a reliable UPS, as indicated by the warning in the dialog.
  • Disabled: Synchronous requests are treated as asynchronous; data is committed only when the next transaction group is written. This provides maximum performance but the highest risk of data loss and inconsistency. Use only for non-critical workloads where this risk is acceptable.
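
The three options correspond directly to the values of the ZFS sync property (placeholder dataset names):

    # Maximum safety: every sync request is flushed to stable storage
    zfs set sync=always pool0/db

    # Standard POSIX behaviour (the default)
    zfs set sync=standard pool0/data

    # Treat sync requests as asynchronous (highest data-loss risk)
    zfs set sync=disabled pool0/scratch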

Write cache sync request handling (logbias)

Gives ZFS a hint about how synchronous writes for this dataset should use log devices.

  • Write log device (Latency) – if the pool has separate log devices, they are used to minimize latency of synchronous writes. Recommended default.
  • In pool (Throughput) – log devices are not used; the software optimizes for overall pool throughput and efficient use of resources.
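
On the command line, the same hint is given through the logbias property (placeholder names):

    # Use separate log devices to minimize sync-write latency
    zfs set logbias=latency pool0/db

    # Skip log devices and optimize for overall pool throughput
    zfs set logbias=throughput pool0/backup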

Read cache (primary, ARC) scope

Controls what is cached in main memory (ARC) for this dataset.

  • All (default) – cache data and metadata.
  • Metadata – cache only metadata.
  • None – do not cache anything from this dataset in ARC.

You can reduce ARC pressure for large streaming or low-priority datasets by switching to “Metadata” or “None”.

Read cache (secondary, L2ARC) scope

Controls what is cached on L2ARC devices (if present).

  • All (default) – cache data and metadata.
  • Metadata – cache only metadata.
  • None – do not cache this dataset in L2ARC.

Use “Metadata” or “None” for datasets that would otherwise fill L2ARC with low-value data.
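
Both scopes correspond to the ZFS primarycache and secondarycache properties; for example, to keep a large streaming dataset (placeholder name) from displacing more valuable cache entries:

    # Cache only metadata of this dataset in RAM (ARC)
    zfs set primarycache=metadata pool0/streams

    # Keep this dataset out of L2ARC entirely
    zfs set secondarycache=none pool0/streams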

Access time

Controls whether file access time (atime) is updated on reads.

  • Disabled (default) – access time is not updated, which avoids extra writes and can significantly improve performance.
  • Enabled – access time is updated on each read; required by some legacy applications (for example, certain mailers).
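
The equivalent ZFS property is atime (placeholder dataset names):

    # Avoid extra writes on every read (matches the default here)
    zfs set atime=off pool0/data

    # Enable for legacy applications that rely on access times
    zfs set atime=on pool0/mail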

Small data blocks policy

This section controls how small data blocks of this dataset are placed when the pool has a special devices group configured.

  • If no special devices group exists in the pool, the section is disabled, and an information banner appears: “Available only when a special devices group exists.” In this case, all data blocks are stored on regular data vdevs.
  • When a special devices group exists and is healthy, the Small data block size list becomes active.

Small data block size

Defines the maximum size of blocks that will be stored on special devices instead of regular data vdevs (this corresponds to the ZFS special_small_blocks property for the dataset). More information is available in the “Small blocks policy settings” article.

Available options in the drop-down:

  • Disable for the dataset: The small data blocks policy is disabled for this dataset, regardless of the pool settings. All data blocks (including small ones) are stored on regular data vdevs.
  • 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB, 128 KiB, 256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, 16 MiB: Any data block with a logical size less than or equal to the selected value is stored on special devices. Larger blocks are stored on regular data vdevs.
  • Inherit from the pool settings (default) [X KiB]: The dataset inherits the pool-level small blocks setting. The value in brackets ([X KiB]) shows the current pool threshold; e.g.:
    • [0 KiB] – small data blocks policy is effectively disabled on the pool.
    • [128 KiB] – blocks up to 128 KiB are redirected to special devices according to pool settings.
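
As noted above, this maps to the ZFS special_small_blocks property; a short sketch (placeholder dataset name):

    # Send blocks up to 64 KiB to the special devices group
    zfs set special_small_blocks=64K pool0/data

    # Disable the policy for this dataset regardless of pool settings
    zfs set special_small_blocks=0 pool0/data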

Notes and recommendations

  • A higher threshold moves more data to special devices, which can improve performance for small, random I/O, but also increases capacity usage on the special devices group.
  • A very small value (e.g., 4 KiB or 8 KiB) typically limits the placement mostly to metadata and very small files.
  • If special or dedup devices are not supported by the pool layout (e.g., the pool contains RAIDZ data groups instead of mirror-based data vdevs), the small data blocks policy cannot be effectively used. Plan the pool layout accordingly.
  • If the special devices group becomes degraded or unavailable, the performance and behaviour of datasets using the small data blocks policy can be affected; always monitor pool health.

Space management – quota and reservation

The bottom part of the dialog controls space limits for the dataset.

Enable quota

When this switch is enabled:

  • Quota definition
    • Hard limit on the total space that the dataset and all its descendants (child datasets, snapshots, clones) can consume.
    • A unit (MiB, GiB, TiB) can be selected from the drop-down.
  • Include snapshots and clones (checkbox)
    • When checked (default), space used by snapshots and clones counts towards the quota. This matches standard ZFS behaviour and is usually recommended.

Notes:

  • Quota cannot be smaller than reservation (if reservation is enabled).
  • When the quota is reached, further writes fail with “out of space” for this dataset even if the pool still has free capacity.
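
In ZFS terms, the checkbox selects between the quota property (counts the dataset and all descendants, including snapshots and clones) and refquota (the dataset itself only); a sketch with a placeholder dataset name:

    # Hard limit including snapshots and clones
    zfs set quota=2T pool0/tenant1

    # Limit the dataset itself, excluding snapshots and clones
    zfs set refquota=2T pool0/tenant1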

Enable reservation

When this switch is enabled:

  • Reserved space
    • Amount of pool space reserved exclusively for this dataset.
    • You cannot reserve more than the currently available free space in the pool. The dialog shows the currently available physical space below the field.
  • Include snapshots and clones (checkbox)
    • When checked, the reserved space covers the dataset and all its descendants (snapshots and clones).
    • When unchecked, reserved space applies only to the dataset itself (behaviour similar to ZFS refreservation).

Additional rules:

  • The sum of all reservations in a pool cannot exceed its free space.
  • Quota must be greater than or equal to reservation.

Use reservation only for datasets that must have guaranteed space, for example critical databases or backup targets.
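
In ZFS terms, the checkbox selects between the reservation property (covers the dataset and its descendants) and refreservation (the dataset itself only); a sketch with a placeholder dataset name:

    # Guarantee 500 GiB for the dataset and all its descendants
    zfs set reservation=500G pool0/db

    # Guarantee space for the dataset itself only
    zfs set refreservation=500G pool0/db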