Cluster Coherent NFSv4 and Share Reservations

From Linux NFS

Latest revision as of 15:16, 18 August 2021


Background

NFSv4 share reservations control the concurrent sharing of files at the time they are opened. Share reservations come in two flavors, ACCESS and DENY. There are three types of ACCESS reservations: READ, WRITE, and BOTH; and four types of DENY reservations: NONE, READ, WRITE, and BOTH.

ACCESS reservations are familiar to Linux users because they correspond to the POSIX open() access modes: NFSv4 ACCESS shares of READ, WRITE, and BOTH map to O_RDONLY, O_WRONLY, and O_RDWR, respectively.

NFSv4 DENY reservations act as a kind of whole-file lock taken when a file is opened. DENY shares of READ, WRITE, and BOTH prevent other opens that request read access, write access, or either kind of access, respectively, from succeeding. DENY NONE allows other opens to proceed.
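The conflict rule above can be written as a small predicate. This is a minimal C sketch, assuming the OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_* bit values from RFC 7530 (shortened here to SHARE_*). The struct share type and shares_conflict() are illustrative names, not kernel or nfsd APIs. A request conflicts with a held share if either side denies an access the other asks for.

```c
#include <assert.h>
#include <stdbool.h>

/* Share bit values as defined for OPEN in RFC 7530 (NFSv4.0). */
#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u
#define SHARE_ACCESS_BOTH  0x3u
#define SHARE_DENY_NONE    0x0u
#define SHARE_DENY_READ    0x1u
#define SHARE_DENY_WRITE   0x2u
#define SHARE_DENY_BOTH    0x3u

/* Illustrative type: one open's ACCESS and DENY shares. */
struct share {
    unsigned access;   /* SHARE_ACCESS_*, never 0 */
    unsigned deny;     /* SHARE_DENY_* */
};

/* Two shares conflict if either one denies an access the other requests. */
static bool shares_conflict(struct share held, struct share req)
{
    assert(held.access >= 1 && held.access <= 3 && held.deny <= 3);
    assert(req.access >= 1 && req.access <= 3 && req.deny <= 3);
    return (req.access & held.deny) != 0 || (req.deny & held.access) != 0;
}
```

For example, with a held open of ACCESS BOTH / DENY WRITE, a new ACCESS WRITE request conflicts, while an ACCESS READ / DENY NONE request does not.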

The Linux open() system call follows the POSIX standard, which does not include share reservations. In particular, POSIX offers no way for an application to request a DENY READ, WRITE, or BOTH share. Consequently, Linux NFSv4 clients always request DENY NONE.
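The following sketch shows what a POSIX open() can express when it is translated to an NFSv4 OPEN. The ACCESS share comes from the O_ACCMODE bits, and the DENY share is always NONE. The helper share_from_open_flags() and the SHARE_* constants (RFC 7530 values) are assumptions made for illustration. This is not the Linux client's actual code.

```c
#include <assert.h>
#include <fcntl.h>

/* RFC 7530 share values (OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_NONE). */
#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u
#define SHARE_ACCESS_BOTH  0x3u
#define SHARE_DENY_NONE    0x0u

struct share { unsigned access, deny; };

/* What a POSIX open() can express: ACCESS from the O_ACCMODE bits, and
 * always DENY NONE, because open() has no way to request anything else. */
static struct share share_from_open_flags(int flags)
{
    struct share s = { 0, SHARE_DENY_NONE };
    switch (flags & O_ACCMODE) {
    case O_RDONLY: s.access = SHARE_ACCESS_READ;  break;
    case O_WRONLY: s.access = SHARE_ACCESS_WRITE; break;
    case O_RDWR:   s.access = SHARE_ACCESS_BOTH;  break;
    default:       assert(!"invalid access mode"); break;
    }
    return s;
}
```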

The mismatch between POSIX and NFSv4 shares also shows up on the NFSv4 server. The Linux NFSv4 server does the necessary bookkeeping and enforcement for DENY reservations from clients that can express them (in practice, Windows clients). However, the local file system cannot enforce those DENY shares against local access on the server.

When a cluster file system is exported with NFSv4, multiple NFSv4 servers export a common back-end file system. ACCESS and DENY decisions must therefore be distributed so that they take into account shares granted by other NFSv4 servers. In other words, each NFSv4 server has to ask the cluster file system whether an incoming OPEN share can be granted.
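One way to picture the distributed decision is a cluster-wide table of the shares each NFSv4 server has granted on a file, consulted before any new share is granted. Everything in the sketch below is hypothetical: struct cluster_file, cluster_grant_share(), and the server IDs. In a real cluster file system the table would be distributed, and the check-and-record step would need cluster-wide atomicity. This single-threaded sketch does not attempt that. It also treats every held share as a separate opener and ignores open-owner upgrades.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARES 64  /* arbitrary limit for this sketch */

struct share { unsigned access, deny; };   /* RFC 7530 bit values */

/* One open share on a file, tagged with the NFSv4 server that granted it. */
struct held_share {
    int server_id;        /* hypothetical cluster node identifier */
    struct share share;
};

/* Hypothetical cluster-wide view of one file's shares. */
struct cluster_file {
    struct held_share held[MAX_SHARES];
    size_t n;
};

/* Either share denies an access the other one requests. */
static bool conflicts(struct share a, struct share b)
{
    return (a.access & b.deny) || (a.deny & b.access);
}

/* Grant req for server_id only if it conflicts with no share held by any
 * server, including server_id itself. On success the share is recorded
 * and true is returned. A full table trips the assert (sketch only). */
static bool cluster_grant_share(struct cluster_file *f, int server_id,
                                struct share req)
{
    assert(f->n < MAX_SHARES);
    for (size_t i = 0; i < f->n; i++)
        if (conflicts(f->held[i].share, req))
            return false;
    f->held[f->n++] = (struct held_share){ server_id, req };
    return true;
}
```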

DENY Share Support in Linux

Adding DENY share support to the Linux kernel faces several obstacles:

  • DENY shares are alien to POSIX, the model Linux file systems follow.
  • There are currently no open Linux file systems that support DENY shares.
  • Linux and all other UNIX-like NFSv4 clients currently work correctly because they never request DENY access.
  • Only Windows clients need DENY shares; Linux NFSv4 clients do not.
  • Not even off-the-shelf Windows clients benefit, as NFSv4 for Windows is a third-party add-on (from Hummingbird).
  • The user-level Samba server implements DENY shares with open and flock (albeit with the obvious race conditions), which removes the need for kernel support.
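The last point refers to emulating DENY shares in user space with open() followed by flock(). Below is a minimal sketch of that pattern, assuming Linux flock(2) semantics. It is not Samba's code, and the DENY_* values and open_deny_emulated() are hypothetical. The sketch shows why the approach is both racy and coarse.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical DENY values for this sketch; open() has no such flags. */
enum { DENY_NONE = 0, DENY_READ = 1, DENY_WRITE = 2, DENY_BOTH = 3 };

/* Open path, then emulate a DENY share with flock(2).
 * Returns an fd, or -1 with errno set (EWOULDBLOCK on a share conflict).
 * Not atomic: another process can open and lock the file between the
 * open() and flock() calls, and an O_CREAT open may leave a new file
 * behind even when the lock then fails. */
int open_deny_emulated(const char *path, int flags, mode_t mode, int deny)
{
    assert(deny >= DENY_NONE && deny <= DENY_BOTH);
    int fd = open(path, flags, mode);
    if (fd < 0)
        return -1;
    /* flock() cannot tell READ from WRITE, so any DENY becomes an exclusive
     * lock and DENY NONE a shared one: coarser than NFSv4 semantics. */
    int op = (deny == DENY_NONE ? LOCK_SH : LOCK_EX) | LOCK_NB;
    if (flock(fd, op) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Because flock() locks are advisory, this only constrains programs that follow the same convention. Also, a DENY NONE reader is refused while a DENY WRITE holder exists, even though NFSv4 would allow that read.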

Implementation Issues

Enforcing DENY shares across the cluster back end is complicated, because an open with DENY must atomically look up, possibly create, open, and lock the target file.

The Linux client already combines lookup, create, and open atomically using lookup intents; the back end may have to do the same. The Linux client must also make the open and the lock a single atomic operation. This creates a problem: you cannot lock a file that does not exist, so you must create it first. As soon as the file is created, however, some other application might find it and lock it. Returning an error from an open that succeeded in creating the file would be unexpected behavior.

Applying restrictive mode bits at create time does not always work either, because another application might relax the mode and then open the file.

This suggests that we add the share lock to the open call instead of making it a separate operation.
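Here is a toy model of folding the share check into open. Lookup, optional create, conflict check, and recording of the share all happen under one lock. As a result, no other opener can find a newly created file before its share is in place, and an open never fails after creating a file. Everything here is hypothetical: the in-memory namespace, open_share(), and the choice of -EBUSY for a conflict (the NFSv4 analogue would be NFS4ERR_SHARE_DENIED). It only stands in for what a back-end file system would have to provide atomically.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FILES 16
#define MAX_OPENS 16

struct share { unsigned access, deny; };        /* RFC 7530 bit values */

struct mem_file {
    char name[32];
    struct share opens[MAX_OPENS];
    int nopens;
};

/* Toy namespace standing in for a back-end file system (hypothetical). */
static struct mem_file files[MAX_FILES];
static int nfiles;
static pthread_mutex_t ns_lock = PTHREAD_MUTEX_INITIALIZER;

static bool conflicts(struct share a, struct share b)
{
    return (a.access & b.deny) || (a.deny & b.access);
}

/* Lookup, optional create, share check, and share grant as one step under
 * ns_lock. Returns 0, -ENOENT, -EBUSY (share conflict, arbitrary choice),
 * or -ENOSPC. A newly created file has no opens, so it cannot conflict. */
static int open_share(const char *name, bool create, struct share req)
{
    int err = 0;
    struct mem_file *f = NULL;

    assert(strlen(name) < sizeof(files[0].name));
    pthread_mutex_lock(&ns_lock);
    for (int i = 0; i < nfiles; i++)
        if (strcmp(files[i].name, name) == 0)
            f = &files[i];
    if (!f) {
        if (!create) { err = -ENOENT; goto out; }
        if (nfiles == MAX_FILES) { err = -ENOSPC; goto out; }
        f = &files[nfiles++];
        strcpy(f->name, name);
        f->nopens = 0;
    }
    for (int i = 0; i < f->nopens; i++)
        if (conflicts(f->opens[i], req)) { err = -EBUSY; goto out; }
    if (f->nopens == MAX_OPENS) { err = -ENOSPC; goto out; }
    f->opens[f->nopens++] = req;
out:
    pthread_mutex_unlock(&ns_lock);
    return err;
}
```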

One approach: new flags for open()

  • Use the existing O_RDONLY, O_WRONLY, and O_RDWR open flags as O_ACCESS_READ, O_ACCESS_WRITE, and O_ACCESS_BOTH, respectively.
  • Add two open flags: O_DENY_READ and O_DENY_WRITE.
  • Propagate O_DENY flags to the intent structure.
  • Add an operation adjust_share(file, flags). The file system should be allowed to refuse any change that could not result from an open or a close, that is, any change that does not only set bits or only clear them.
  • Open question: is adjust_share a new kernel operation, and who is supposed to call it? This needs a better explanation.
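Here is a sketch of the bit handling this proposal implies. It assumes hypothetical O_DENY_READ and O_DENY_WRITE values (no such flags exist in mainline Linux, and the bit positions here are arbitrary) together with RFC 7530 DENY values. deny_from_open_flags() shows the translation that would be propagated through the intent. adjust_share_allowed() encodes the proposed rule that a change may only set bits or only clear bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical open() flags: not in mainline Linux; bit positions arbitrary. */
#define O_DENY_READ   (1u << 28)
#define O_DENY_WRITE  (1u << 29)

/* RFC 7530 DENY values. */
#define SHARE_DENY_READ  0x1u
#define SHARE_DENY_WRITE 0x2u

struct share { unsigned access, deny; };   /* RFC 7530 bit values, 0..3 each */

/* Translate the proposed open() flags into NFSv4 DENY bits. */
static unsigned deny_from_open_flags(unsigned flags)
{
    unsigned deny = 0;
    if (flags & O_DENY_READ)
        deny |= SHARE_DENY_READ;
    if (flags & O_DENY_WRITE)
        deny |= SHARE_DENY_WRITE;
    return deny;
}

/* Proposed adjust_share() rule: accept a change that only sets bits (as an
 * additional open would) or only clears bits (as a close would). */
static bool adjust_share_allowed(struct share old_share, struct share new_share)
{
    assert(old_share.access <= 3 && old_share.deny <= 3);
    assert(new_share.access <= 3 && new_share.deny <= 3);
    /* Only sets bits: nothing in the old share is missing from the new. */
    bool only_sets = (old_share.access & ~new_share.access) == 0 &&
                     (old_share.deny & ~new_share.deny) == 0;
    /* Only clears bits: nothing in the new share is missing from the old. */
    bool only_clears = (new_share.access & ~old_share.access) == 0 &&
                       (new_share.deny & ~old_share.deny) == 0;
    return only_sets || only_clears;
}
```

Under this rule, going from ACCESS READ to ACCESS BOTH / DENY WRITE is accepted, and so is the reverse. Switching from ACCESS READ to ACCESS WRITE is refused, because it both sets and clears bits.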

Is there a race here? For example, say we open and create a file with a share lock. How do we decide whether to treat the request as an upgrade or as a new open? This issue needs a better explanation.

Note that patches for this were posted at one point by Pavel Shilovsky; see https://lwn.net/Articles/581005/. He abandoned the effort, and as of this writing nobody has taken it up since.

Another approach: best attempt

  • Issue a lookup. If the file exists, then upgrade. (Open question: what exactly does "upgrade" mean here?)
  • Otherwise, open with implicit create. If we get an error indicating a share conflict, retry the lookup. (Open question: the subsequent upgrade might also fail. Then what?)

This is obviously not ideal.

  • Would it help to get a reference on the dentry before trying the open?
  • Is there currently a lookup/open race if the back end is a distributed file system? One view is that this is up to the file system; we just need to check how we implement open and make sure it uses intents correctly. A brief glance suggests that we probably don't.

An alternative might be to expose something along the lines of the open owner to the VFS and let it decide (by comparing open owners) whether a given open is an upgrade or a new open.
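RFC 7530 already defines this distinction for OPEN. When an open-owner that already has the file open opens it again, the server upgrades the existing open to the union of the old and new ACCESS and DENY bits. The following is a minimal sketch of that decision using hypothetical types. A real open owner is a client ID plus an opaque owner of up to 1024 bytes, simplified here to a short string. The conflict check against other owners is also left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_STATES 16

struct share { unsigned access, deny; };   /* RFC 7530 bit values */

/* Simplified open owner: client ID plus a short owner string. */
struct open_owner {
    uint64_t clientid;
    char owner[16];
};

/* Per-file open state, one entry per open owner (hypothetical layout). */
struct open_state {
    struct open_owner oo;
    struct share share;
};

struct file_states {
    struct open_state st[MAX_STATES];
    int n;
};

static bool same_owner(const struct open_owner *a, const struct open_owner *b)
{
    return a->clientid == b->clientid && strcmp(a->owner, b->owner) == 0;
}

/* If this open owner already holds an open on the file, treat the request
 * as an upgrade and OR the new bits into its existing share; otherwise
 * record a new open. Returns true for an upgrade. */
static bool open_or_upgrade(struct file_states *f, const struct open_owner *oo,
                            struct share req)
{
    for (int i = 0; i < f->n; i++) {
        if (same_owner(&f->st[i].oo, oo)) {
            f->st[i].share.access |= req.access;
            f->st[i].share.deny   |= req.deny;
            return true;
        }
    }
    assert(f->n < MAX_STATES);
    f->st[f->n].oo = *oo;
    f->st[f->n].share = req;
    f->n++;
    return false;
}
```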

Status

Implementation awaits resolution of these issues.
