Cluster Coherent NFSv4 and Share Reservations

Revision as of 17:04, 10 October 2006 by Peterhoneyman (Talk | contribs)

Background

NFSv4 share reservations control the concurrent sharing of files at the time they are opened. Share reservations come in two flavors, ACCESS and DENY. There are three types of ACCESS reservations: READ, WRITE, and BOTH; and four types of DENY reservations: NONE, READ, WRITE, and BOTH.

ACCESS reservations are familiar to Linux users, as they map directly to the POSIX open() flags. NFSv4 ACCESS shares of READ, WRITE, and BOTH correspond to O_RDONLY, O_WRONLY, and O_RDWR, respectively.
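The mapping from open() flags to share access can be sketched in a few lines of Python. The numeric values are the share_access constants from the NFSv4 protocol; the function name share_access_for is hypothetical and used only for illustration.

```python
import os

# NFSv4 share_access values (OPEN4_SHARE_ACCESS_* in the protocol).
ACCESS_READ = 1
ACCESS_WRITE = 2
ACCESS_BOTH = 3

# Access-mode bits of the open() flags, spelled out explicitly.
_ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def share_access_for(flags):
    """Return the NFSv4 share access implied by POSIX open() flags."""
    mode = flags & _ACCMODE
    if mode == os.O_RDONLY:
        return ACCESS_READ
    if mode == os.O_WRONLY:
        return ACCESS_WRITE
    if mode == os.O_RDWR:
        return ACCESS_BOTH
    raise ValueError("unrecognized access mode in open flags")
```

Other flags such as O_CREAT or O_APPEND do not affect the result, since only the access-mode bits are inspected.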

NFSv4 DENY reservations act as a type of whole file lock applied when a file is opened. NFSv4 DENY shares of READ, WRITE, and BOTH prevent other opens with read, write, or any access from succeeding. DENY NONE allows other opens to proceed.
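As a rough illustration of how ACCESS and DENY interact, the following sketch checks a new open against an existing one. It assumes the protocol's bit encoding (READ = 1, WRITE = 2, BOTH = 3, DENY NONE = 0); the function name conflicts is hypothetical. Note that the check is symmetric: the new open's access is tested against the existing deny, and the new open's deny against the existing access.

```python
# Bit encoding shared by NFSv4 share_access and share_deny.
READ, WRITE, BOTH = 1, 2, 3
DENY_NONE = 0


def conflicts(existing, new):
    """Return True if the new (access, deny) pair clashes with an existing one."""
    existing_access, existing_deny = existing
    new_access, new_deny = new
    # The new open wants an access mode that the existing open denies ...
    if new_access & existing_deny:
        return True
    # ... or the new open denies an access mode the existing open already holds.
    if new_deny & existing_access:
        return True
    return False
```

For example, an existing open with READ access and DENY WRITE blocks a later WRITE open, while two READ opens with DENY NONE coexist.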

The Linux system call interface for open() follows the POSIX standard, which does not include support for share reservations. In particular, POSIX offers no direct way for an application to request DENY READ, WRITE, or BOTH shares. Consequently, Linux NFSv4 clients always use DENY NONE.

The mismatch between POSIX and NFSv4 shares is also reflected on an NFSv4 server. When the Linux NFSv4 server receives DENY reservations from clients that can express them, which in practice means Windows clients, it does the appropriate bookkeeping and enforcement among NFSv4 opens, but the local file system is unable to enforce DENY shares against local access on the server.

When a cluster file system is exported with NFSv4, multiple NFSv4 servers export a common back-end file system, so ACCESS and DENY reservations must be distributed to take into account shares from other NFSv4 servers. In other words, the NFSv4 server has to ask the cluster file system if an incoming OPEN share can be granted.
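A toy model of this arrangement is sketched below: every server consults one shared table before granting an OPEN. The class ClusterShareTable and its method try_open are hypothetical names, and a real cluster file system would need distributed locking rather than an in-memory dictionary.

```python
class ClusterShareTable:
    """Toy stand-in for the share state kept by a cluster file system."""

    def __init__(self):
        # path -> list of (server, access, deny) for every granted open
        self._opens = {}

    def try_open(self, server, path, access, deny):
        """Grant the open only if it clashes with no open from any server."""
        for _holder, held_access, held_deny in self._opens.get(path, []):
            if (access & held_deny) or (deny & held_access):
                return False
        self._opens.setdefault(path, []).append((server, access, deny))
        return True
```

With this model, an open granted through one server with DENY WRITE causes a later write open through a different server to be refused, which is the cross-server behavior the text calls for.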

DENY Share Support in Linux

Adding DENY share support to the Linux kernel faces several obstacles:

  • DENY shares are alien to POSIX, the Linux model for file systems.
  • There are currently no open Linux file systems that support DENY shares.
  • Linux and all other UNIX-like NFSv4 clients currently work correctly because they never request DENY access.
  • DENY shares do not address any NFSv4 access need of Linux clients, only of Windows clients.
  • Not even off-the-shelf Windows clients benefit as NFSv4 for Windows is a third-party add-on (from Hummingbird).
  • The user-level Samba server implements DENY shares with open and flock (albeit with the obvious race conditions), which obviates kernel support.
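The open-then-flock idea in the last item can be sketched as follows. This is not Samba's code; it is a deliberately crude illustration in which any DENY mode is approximated by an exclusive flock and DENY NONE by a shared one. The function name open_with_deny is hypothetical.

```python
import fcntl
import os


def open_with_deny(path, flags, deny):
    """Open a file, then take an flock that approximates a DENY share.

    deny=True stands in for any DENY mode other than NONE.
    """
    fd = os.open(path, flags)
    # Race window: between os.open() above and flock() below, another
    # process can open the same file and start using it.
    operation = fcntl.LOCK_EX if deny else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, operation | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd
```

The scheme only works among cooperating processes that all take the lock, and the gap between the two system calls is the race mentioned above.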

Implementation Issues

Enforcing DENY shares across the cluster back end is complicated, since an open with DENY must atomically look up, (possibly) create, open, and lock the target file.

The Linux client atomically joins lookup, create, and open with lookup intents; the back end may have to do the same thing. The Linux client must also make the open and lock an atomic operation, but there is a problem: you can't lock a file that doesn't exist, so you must first create it. But as soon as the file is created, some other application might find it and lock it. Returning an error to an open that succeeded in creating a file is unexpected behavior. Applying restrictive mode bits to the create won't always work, either, because another application might relax the mode restrictions and open the file.

This suggests that we add the share lock to the open call instead of making it a separate operation.

One approach

  • Add 2 bits to the open flags, deny_read and deny_write. (Use the existing open bits as the allow bits.) Also make sure these get propagated to the intent structure.
  • Provide operation adjust_share(file, flags). FS should be allowed to refuse operations that could not result from open or close. (So, anything that doesn't only turn bits on or only turn them off.)

Is there a race here? Say we get an open with create that carries a share lock. How do we decide whether to treat it as an upgrade or an open?

Best attempt

  • look up; upgrade if we find it.
  • open; if we get an error indicating a share conflict, retry the lookup. Etc.

Obviously not ideal. Would it help to get a reference on the dentry before trying the open?
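The lookup-then-open retry described above can be sketched against a toy back end. Everything here (ShareConflict, ToyBackend, open_with_share, the race_once hook that simulates a peer creating the file first) is hypothetical and exists only to show the control flow.

```python
class ShareConflict(Exception):
    """Raised by the toy back end when an open collides with an existing file."""


class ToyBackend:
    """Hypothetical stand-in for a cluster file system."""

    def __init__(self):
        self.shares = {}        # name -> share bits currently held
        self.race_once = set()  # names a simulated peer creates just before our open

    def lookup(self, name):
        return name if name in self.shares else None

    def upgrade(self, name, share):
        self.shares[name] |= share
        return name

    def open_create(self, name, share):
        if name in self.race_once:
            self.race_once.discard(name)
            self.shares[name] = 0   # the peer won the race and created the file
        if name in self.shares:
            raise ShareConflict(name)
        self.shares[name] = share
        return name


def open_with_share(fs, name, share, max_tries=3):
    """Look up and upgrade if found; otherwise open, retrying on a conflict."""
    for _ in range(max_tries):
        handle = fs.lookup(name)
        if handle is not None:
            return fs.upgrade(handle, share)
        try:
            return fs.open_create(name, share)
        except ShareConflict:
            continue  # someone created the file in between; look it up again
    raise ShareConflict(name)
```

The sketch cannot tell a file created by a peer from one it opened itself, so after losing the race it treats the peer's file as an upgrade target. That is the ambiguity the question above is getting at.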

Is there currently a lookup/open race if the backend is a distributed filesystem? I suppose that's up to them--we need to look at how we implement open and make sure it does the intent stuff right. On a brief glance it looks to me like we probably don't.

An alternative might be to expose something similar to the openowner to the vfs and let it decide (by comparing openowners) whether a given open is an upgrade or a new open.
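A minimal sketch of that alternative: the VFS keeps a table keyed by file and openowner, and an incoming open is an upgrade exactly when the same openowner already holds the file open. The names classify_open and record_open, and the table layout, are assumptions for illustration only.

```python
def classify_open(open_table, file_id, openowner):
    """Return "upgrade" if this openowner already has the file open, else "new"."""
    return "upgrade" if (file_id, openowner) in open_table else "new"


def record_open(open_table, file_id, openowner, share):
    """Merge the share bits of an open into the table and return its kind."""
    kind = classify_open(open_table, file_id, openowner)
    key = (file_id, openowner)
    open_table[key] = open_table.get(key, 0) | share
    return kind
```

Because the decision depends only on the key, two opens of the same file by different openowners are both classified as new opens and would go through the normal share conflict check.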

Status

No progress to report.