Understanding NFS Caching

Filesystem caching is a great tool for improving performance, but it is important to balance performance with data safety. Caching over NFS involves caches at several different levels, so it is not immediately obvious which combination of options ensures a good compromise between performance and safety.

Client-side caching

The NFS client has the async mount option, which caches writes in the client's RAM until certain conditions are met:

    The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options). If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:

    - Memory pressure forces reclamation of system memory resources.
    - An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
    - An application closes a file with close(2).
    - The file is locked/unlocked via fcntl(2).

    In other words, under normal circumstances, data written by an application may not immediately appear on the server that hosts the file.

    If the sync option is specified on a mount point, any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space. This provides greater data cache coherence among clients, but at a significant performance cost.

See nfs(5) for more details. In other words, when writing data to a file or set of files, rather than flushing to the server on each write(2) call, the system waits until the file is closed or the application explicitly calls fsync(3) or another sync function. Since this relies on the application correctly requesting that its data be synced, I was concerned about relying on this cache in general, when poorly written applications might never sync their data. However, given that close(2) causes the data to be synced, this seems like a non-issue, and asking on the linux-nfs mailing list clarified in more detail how this works:

    In NFSv3, the close() will cause the client to flush all data to stable storage.
    The client will also flush data to stable storage on a chmod, since
    that could potentially affect its ability to write back the data. It
    will not bother to do so for rename.

    An application should normally be able to rely on the data being
    safely on disk in both these situations provided that the server
    honours the NFS protocol (with a caveat that an ill-timed 'kill -9'
    could interrupt the process of flushing).

    All metadata operations such as create, chmod, rename, etc. will cause
    the server to flush the file metadata to disk assuming that you set
    the (highly recommended) "sync" export option. If "sync" is set, the
    server will also honour COMMIT requests by flushing the data to stable
    storage.

    If, OTOH, your server lists the "async" export option as being set,
    then COMMIT is considered a no-op, and it will not bother to
    explicitly flush metadata operations to stable storage. Performance
    will scream, but be prepared to lose data if that server crashes. This
    is all technically a violation of the NFS spec, however you have been
    given rope...

Therefore, using async on the client is safe and will provide a pretty significant performance boost.
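
For applications that want an explicit guarantee rather than relying on close(2), the flush triggers quoted above can be used directly. The following is a minimal sketch in Python, not a definitive implementation: the helper name write_durably and the demo path are hypothetical, and the demo writes to a temporary directory rather than a real NFS mount.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and request a flush before the file is closed."""
    with open(path, "wb") as f:
        f.write(data)
        # Push Python's own userspace buffer down to the kernel.
        f.flush()
        # Ask the kernel to flush the file's data. Per nfs(5), an explicit
        # fsync is one of the events that makes the NFS client send
        # delayed writes to the server.
        os.fsync(f.fileno())
    # Leaving the with-block closes the file, which is another flush trigger.


if __name__ == "__main__":
    # Hypothetical demo path; in practice this would live on an NFS mount.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    write_durably(demo_path, b"hello over NFS\n")
    print(f"wrote {demo_path}")
```

Whether the data is truly on stable storage once fsync returns still depends on the server's export options, which are covered in the server-side section.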

It's also important to look at soft versus hard mounts. A soft mount will give up attempting to write to a server that is unavailable after a specific timeout and number of retries. In my experience, this hasn't worked well and I often end up with processes stuck in uninterruptible sleep blocking on an NFS mountpoint anyway. As per the manpage, hard is highly recommended to ensure data integrity:

    Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.

    NB: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option.

Note that the intr option allows you to interrupt a request waiting on a hard NFS mount by sending it the SIGKILL signal. However, on kernels newer than 2.6.25 this is provided by default, and the intr option is deprecated. You should still be aware of it though in case you are working with an older kernel.
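
To see which mounts on a client are using soft, one approach is to scan the mount table. The sketch below assumes a Linux client where /proc/mounts lists each mount as "device mountpoint fstype options ..." and where NFS mounts report soft or hard among their options; the function name find_soft_nfs_mounts is hypothetical.

```python
def find_soft_nfs_mounts(mounts_text):
    """Return the mount points of NFS mounts whose options include 'soft'."""
    soft_mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # NFS mounts appear with fstype "nfs" or "nfs4".
        if fstype in ("nfs", "nfs4") and "soft" in options.split(","):
            soft_mounts.append(mountpoint)
    return soft_mounts


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for mountpoint in find_soft_nfs_mounts(text):
        print(f"soft NFS mount: {mountpoint}")
```

The parsing is kept separate from reading /proc/mounts so the function can be tried on a pasted mount table from another machine.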

Given my poor experience using soft (the timeouts don't seem to actually work) and the increased risk of data loss, hard seems like the most appropriate option to use. The common problem mentioned with using hard is that if the server goes away (e.g. a hardware failure takes it down for an extended period of time), there used to be no way to unmount that mountpoint or let processes blocking on it complete. There are now a few ways to mitigate this:

- Bring up a fake NFS server on the same IP address as the offline server, which can then reject the requests that are waiting for a response. I've even seen this done with a secondary IP on an interface.
- Use the fsid=<unique number> option on the server side in /etc/exports. This creates a static unique identifier for the export, so you won't get a "Stale NFS File Handle" error on the client if the server is restarted or goes offline. These ID numbers must be unique and non-zero, since fsid=0 (equivalently fsid=root) designates the NFSv4 root export. This does not work on NFS servers on FreeBSD (which don't have the ability to set a static fsid).
- Try "lazy" unmounting the mountpoint with umount -l /path/to/mountpoint.

If the above fails to work, you will probably have to reboot the client in order to clear the stuck mountpoint.
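
Since the fsid values mentioned above need to be unique, a quick sanity check of an exports file can catch accidental reuse. This is a rough sketch under stated assumptions: the function name duplicate_fsids is hypothetical, the sample paths are made up, and the parsing is deliberately naive (it does not handle backslash line continuations or quoted paths containing spaces).

```python
import re
from collections import defaultdict

# Matches fsid=<value> inside an option list such as (rw,sync,fsid=10).
FSID_RE = re.compile(r"fsid=([^,)\s]+)")


def duplicate_fsids(exports_text):
    """Map each fsid used by more than one exported path to those paths."""
    paths_by_fsid = defaultdict(set)
    for raw_line in exports_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        path = line.split()[0]  # first field is the exported path
        for fsid in FSID_RE.findall(line):
            paths_by_fsid[fsid].add(path)
    return {
        fsid: sorted(paths)
        for fsid, paths in paths_by_fsid.items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    sample = (
        "/srv/a 10.0.0.0/24(rw,sync,fsid=10)\n"
        "/srv/b 10.0.0.0/24(rw,sync,fsid=10)\n"
        "/srv/c *(ro,sync,fsid=11)\n"
    )
    for fsid, paths in duplicate_fsids(sample).items():
        print(f"fsid={fsid} is shared by: {', '.join(paths)}")
```

The same path exported to several clients with the same fsid is not flagged, since that is one export rather than a collision.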

Server-side caching

Confusingly, the NFS server options (found in /etc/exports) are also called sync and async; see exports(5) for details:

async:

    This option allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage (e.g. disc drive).

    Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted.

sync:

    Reply to requests only after the changes have been committed to stable storage (see async above).

    In releases of nfs-utils up to and including 1.0.0, the async option was the default. In all releases after 1.0.0, sync is the default, and async must be explicitly requested if needed. To help make system administrators aware of this change, exportfs will issue a warning if neither sync nor async is specified.

Thus if you use async on the server side, the data will be confirmed to be written as soon as it hits the server's RAM. In the case of a power failure, this data would be lost. Conversely, sync waits for the data to be written to the disk or other stable storage (and confirmed) before returning a success. It is clear that sync is the appropriate option to use on the server side.

Recommended Options

In conclusion, these options seem to provide a good balance of stability and performance when using NFS:

Client Side:
- hard - forces requests to retry indefinitely to avoid corruption
- intr - allows hard mounts to be interrupted (though it is unnecessary on kernels newer than 2.6.25)
- async - queue up writes and flush them in logical groups for more efficient writing
- tcp - using TCP is more reliable than UDP since it requires confirmation of receipt of packets

Server Side:
- fsid - specifies a unique, static identifier for this export; see above for more details
- sync - ensures that data is really flushed to stable storage when the server says it is
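
To show how these options might look in practice, here is a small sketch that renders an /etc/fstab entry for the client and an /etc/exports entry for the server. Everything specific in it is an assumption for illustration: the helper names, the host nfs.example.com, the paths, the client network 192.0.2.0/24, the fsid value 10, and the rw option (which is not part of the recommendations above). The deprecated intr option is left out of the defaults.

```python
def fstab_line(server, export, mountpoint, options=("hard", "async", "tcp")):
    """Render a client-side /etc/fstab entry for an NFS mount."""
    return f"{server}:{export} {mountpoint} nfs {','.join(options)} 0 0"


def exports_line(path, client, fsid, options=("rw", "sync")):
    """Render a server-side /etc/exports entry with a static fsid."""
    option_list = ",".join(options) + f",fsid={fsid}"
    return f"{path} {client}({option_list})"


if __name__ == "__main__":
    # Hypothetical server, paths, and client network.
    print(fstab_line("nfs.example.com", "/srv/share", "/mnt/share"))
    print(exports_line("/srv/share", "192.0.2.0/24", 10))
```

Run as a script, this prints one line suitable for each file; on older kernels, intr could be added by passing a different options tuple.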
