ZFS for Busy Sysadmins
ZFS is a filesystem and volume manager combined. This means you don’t necessarily need to add mountpoints to /etc/fstab if you don’t want to. You don’t even need to create partitions with gpart or parted if you’re using the whole disk for ZFS. Because swap is generally kept out of ZFS (swapping to a zvol is possible but discouraged), a boot disk typically has a data partition formatted as ZFS plus a separate swap partition. Any additional data disks can be managed completely with ZFS.
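For example, with a spare whole disk at da1 (a placeholder device name; substitute your own), a single command creates a pool and mounts it with no fstab entry:
zpool create tank da1   # new pool named "tank" using the whole disk
zfs list                # shows tank mounted at /tank by default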
ZFS Limits
The maximum size of a ZFS volume is 2^128 bytes, which means that even the most advanced computers won’t encounter a hard limit for millennia.
The maximum file size is 2^64 bytes. Again, this limit won’t be reached for the foreseeable future.
The maximum number of entries per directory is 2^48. The maximum number of files for an entire file system is effectively unlimited.
The maximum file name length is 255 ASCII characters.
ZFS Compatibility
This file system was created by Sun Microsystems for its Solaris operating system, and Oracle inherited it when it acquired Sun. It has native support in the defunct OpenSolaris OS and the commercial Solaris. ZFS became closed source when Oracle discontinued OpenSolaris. It was then forked into the OpenZFS project, based on the last open source release of ZFS. Most operating systems with ZFS support, such as FreeBSD and Linux (where it ships as an out-of-tree module for licensing reasons), use OpenZFS. Mac OS X also supports ZFS if OpenZFS is installed. Because Oracle ZFS and OpenZFS diverged years ago, compatibility between the two can be hit or miss depending on the versions involved. Oracle ZFS is only available on Solaris. Windows has no usable ZFS support.
Most BSD flavors support ZFS out of the box. Linux distros may or may not enable it by default, but it can usually be added after install; check your distro’s documentation to learn how. Mac OS X requires OpenZFS to be installed to use ZFS. There are also dedicated open source operating systems built around serving files from ZFS, such as FreeNAS.
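As a concrete example of adding ZFS after install, on Ubuntu (one distro where the package name below is known to exist; other distros differ):
sudo apt install zfsutils-linux   # ZFS userland tools; Ubuntu ships the kernel module
sudo zpool status                 # prints "no pools available" on a fresh install, confirming it works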
Care should be taken when mixing different versions of OpenZFS. Newer versions support additional features (feature flags) which old versions cannot use. Unless the new features are left disabled, older releases will not be able to import pools created by newer ones; in some cases a newer pool can still be imported read-only.
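A minimal sketch of keeping a pool importable by older releases, assuming OpenZFS 2.1 or newer (which added the compatibility pool property); the pool and disk names are placeholders, and the available compatibility sets are typically listed under /usr/share/zfs/compatibility.d:
zpool upgrade                                                         # with no arguments, lists pools whose supported features are not all enabled
zpool create -o compatibility=openzfs-2.0-linux tank mirror da0 da1   # restrict the new pool to features OpenZFS 2.0 on Linux understands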
ZFS Hardware
Almost any type of hardware can run ZFS, from consumer grade to enterprise. A semi-modern CPU is good enough unless a large number of disks or demanding features are in use. Any SATA, SAS, or NVMe disk will work, connected to any port the OS can see. Host bus adapters (HBAs) should be used instead of RAID controllers. If the only option is a RAID controller, all RAID functionality should be turned off and the individual disks exposed to the OS. The usual rules for disks apply: enterprise-grade drives should be used when performance and reliability are required. ZFS won’t magically make the underlying hardware any better.
More memory is better for ZFS. It uses available RAM for file caching (the ARC) to increase performance. Some sources state that 1GB of memory is required for every 1TB of space; this is only true if deduplication is turned on. Otherwise, 8GB per 100TB is a reasonable minimum. Memory speed doesn’t matter much if the device is a NAS, because the data transfer bus (e.g. ethernet) will be the performance bottleneck first.
Cache devices can be used to increase performance. The read cache is called the L2ARC, and the write (intent) log is called the ZIL; a dedicated device for it is called a SLOG. These should be located on high-performance, high-endurance SSDs. In mission-critical systems the ZIL/SLOG should use mirrored drives. Although it is unlikely, if power is lost while data is being written, some of the most recent data can be lost. The loss of a ZIL/SLOG drive will not corrupt already written data. The L2ARC drive does not need to be mirrored; if it’s lost, the only effect is decreased read performance.
The ZIL/SLOG device should have very high write endurance but doesn’t need to be very big, since it only holds a few seconds of in-flight synchronous writes; unless data is actively being written, it sits idle. The L2ARC device can have slightly lower endurance but should be higher capacity, as it caches the most-used data. The L2ARC and ZIL/SLOG can share a device, but this is not recommended. If the pool is made up entirely of SSDs, an L2ARC and ZIL/SLOG probably aren’t needed, and could possibly even degrade performance.
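A minimal sketch of attaching cache and log devices to an existing pool named tank (the NVMe device names are assumptions; substitute your own):
zpool add tank cache nvme0n1               # single L2ARC device; no redundancy needed
zpool add tank log mirror nvme1n1 nvme2n1  # mirrored SLOG holding the intent log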
ZFS Concepts
At the lowest level is the physical disk, which is sometimes called the provider, although things other than disks can be providers. Disks make up virtual devices, or vdevs, which are equivalent to traditional RAID sets. vdevs make up pools, which are equivalent to nested RAID levels. Any number of drives can be in a vdev and any number of vdevs can be in a pool. Like traditional RAID, if the redundancy level of a vdev is exceeded, the vdev fails; if any vdev in a pool fails, the pool fails. More disks in a vdev improve performance, as do more vdevs in a pool. The flipside is that the chance of data loss increases if too many disks are in a vdev or too many vdevs are in a pool. You need to find the tradeoff between performance and redundancy that suits your environment. The list below maps traditional RAID levels to their ZFS equivalents; example commands follow it.
ZFS RAID Equivalents
Regular single disk – Single disk in a vdev, single vdev in a pool
RAID 0 – multiple disks in a vdev with no redundancy set or multiple single disk vdevs in a pool
RAID 1 – mirror; 2 or more disks in a vdev with mirroring turned on.
RAID 5 – RAIDZ1; 3 or more disks in a vdev set as RAIDZ1
RAID 6 – RAIDZ2; 4 or more disks in a vdev set as RAIDZ2
RAID 7? – RAIDZ3; 5 or more disks in a vdev set as RAIDZ3. Like RAID 6 but with 3 parity disks.
RAID 10 – multiple mirrored vdevs in a pool
RAID 50 – multiple RAIDZ1 vdevs in a pool
RAID 60 – multiple RAIDZ2 vdevs in a pool
RAID 70? – multiple RAIDZ3 vdevs in a pool
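For illustration, here is how a few of those layouts would be created, assuming a pool named tank and placeholder disk names da0 through da7:
zpool create tank mirror da0 da1                                  # RAID 1 equivalent
zpool create tank raidz1 da0 da1 da2                              # RAID 5 equivalent
zpool create tank mirror da0 da1 mirror da2 da3                   # RAID 10 equivalent
zpool create tank raidz2 da0 da1 da2 da3 raidz2 da4 da5 da6 da7   # RAID 60 equivalent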
Undeleting a Pool
Deleting a ZFS pool is like deleting a partition: the data is still on disk as long as it hasn’t been overwritten by new data. A recently destroyed pool can be located by running:
zpool import -D
If the pool shows up as ONLINE (DESTROYED), recover it by running:
zpool import -D <pool name or ID number>