MountNotes

From Linux NFS


Latest revision as of 22:04, 21 August 2007


Initial impressions

Should the kernel mount client be smart enough to sniff the remote server and tell what options are supported before trying to mount?

Passing just a string should be pretty darn easy. All that's needed is to drop in an "addr=" option -- mount.c already gets rid of the "MS_" related options for us.
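A minimal sketch of the "drop in an addr= option" step, assuming the option string has already had the MS_ flags removed and the server address is available as text. The helper name `append_addr_option` and its signature are hypothetical, not code from nfs-utils.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<options>,addr=<address>" in buf.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int append_addr_option(char *buf, size_t buflen,
                       const char *options, const char *address)
{
    int n;

    if (options != NULL && options[0] != '\0')
        n = snprintf(buf, buflen, "%s,addr=%s", options, address);
    else
        n = snprintf(buf, buflen, "addr=%s", address);

    /* snprintf reports the length it wanted; treat truncation as an error */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return 0;
}
```

The resulting string is what would be handed to mount(2) as the data argument. Treating truncation as a hard error avoids sending a half-written option to the kernel.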

TODO:

  1. break-back retries
  2. bg retries
  3. Support for IPv6
  4. Support for server failover options
  5. Better error reporting
  6. Mount server connection caching
  7. Remount processing

Does "mount.nfs ... -o defaults" work? Do we need "mount.nfs -a" to work? Check with mount.ocfs2.

And does 'mount.nfs' support single parameter mounts such as "mount.nfs /home"? There is logic to do this in there, but is it working right?

When does the mount command fail immediately, and when does it background itself? If "bg" is specified, do all errors cause the mount command to go into the background, even permanent errors? Is there a class of errors that should always fail immediately?

Should I implement the fallback logic first, before I construct the "bg" logic? If I don't, then a bad set of mount options will force a background mount that can't ever be satisfied.... But maybe that's the way it works already.

Obviously, the legacy mount will sort out bad mount options first, and not even try the mount request. Now that mount option parsing is in the kernel, the kernel has to return some error indicating that the mount options are bad, and that the mount shouldn't be retried. The kernel needs to distinguish between a retry-able and a non-retry-able mount failure. I wonder if Trond will object to return codes from mount(2) that are not listed in the man page? What does CIFS do?

Why isn't "bg" implemented for mount.nfs4?

Setting our own connect timeout

 You need to call connect on a socket set to non-blocking mode with fcntl,
 and then use select with a timeout to limit the amount of time you will
 wait for the connect to complete. If select returns because you timed out,
 then close the socket and return an error. If select returns because of
 an event on the socket, you use getsockopt to determine if the connect
 succeeded or not.
 See Stevens, Unix Network Programming Vol 1 for details. Comments in the
 code I'm looking at say page 411.

This is a non-bug of sorts... user-space TCP connects will time out after 75 seconds. However, it would be nicer if these timed out sooner, say after 15 seconds.
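The recipe quoted above (non-blocking connect, select with a timeout, then getsockopt) can be sketched roughly as follows. The function name `connect_with_timeout` is hypothetical, and this version assumes the caller closes the socket on failure rather than closing it inside the helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * connect(2) with a caller-chosen timeout in seconds.
 * Hypothetical helper, not the nfs-utils implementation.
 * Returns 0 on success, or -1 with errno set (ETIMEDOUT on timeout).
 * A select() interrupted by a signal is reported as EINTR, not retried.
 */
int connect_with_timeout(int fd, const struct sockaddr *sap,
                         socklen_t salen, int timeout_secs)
{
    int flags, ret, saved_errno;
    int so_error = 0;
    socklen_t len = sizeof(so_error);
    struct timeval tv;
    fd_set wset;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    ret = connect(fd, sap, salen);
    if (ret < 0 && errno == EINPROGRESS) {
        /* Connection is in flight: wait until the socket is writable */
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;

        ret = select(fd + 1, NULL, &wset, NULL, &tv);
        if (ret == 0) {
            /* Timer expired before the connect finished */
            errno = ETIMEDOUT;
            ret = -1;
        } else if (ret > 0) {
            /* An event occurred: SO_ERROR says whether connect succeeded */
            ret = getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len);
            if (ret == 0 && so_error != 0) {
                errno = so_error;
                ret = -1;
            }
        }
    }

    /* Restore the original file status flags without clobbering errno */
    saved_errno = errno;
    (void)fcntl(fd, F_SETFL, flags);
    errno = saved_errno;
    return ret;
}
```

With a 15 second argument, this gives the shorter timeout wished for above while leaving the socket in blocking mode for later RPC traffic.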

Reference implementation

I took a look at Solaris network behavior, just as a reference point. I specified "-o proto=tcp,vers=3".

  1. It always uses UDP for GETPORT requests, for both MNT and NFS, mount and umount;
  2. It always uses rpcbind version 2 for IPv4 bind requests;
  3. It always uses UDP for MNT protocol requests, for both mount and umount;
  4. It does a MNT NULL request before the actual MNT call, for both mount and umount;
  5. It does two separate NFS pings, on two separate TCP connections; probably one is from the mount command, and one from the kernel? Both use an ephemeral port rather than a privileged one.
  6. The Solaris kernel appears to cache TCP connections to the server, so if there's already one, it will use it instead of opening a fresh one. I didn't see a NULL request on this connection (either when it already existed, or when the kernel had to create one).

Copy support for other mount options (quiet/loud, quota, user[s]) to kernel mount client.

The version/transport break-back code is not working. Need to poke at it more. Should it break back if GETPORT says the service is there but the server isn't responding, or should it break back only if GETPORT says use another version?

Also I should check why umount hangs when the server goes down. Is lazy umounting working? What does the --force option do exactly?

Why does a failed umount report the same error twice?

[root@monet ~]# umount /mnt -O mountport=891,proto=tcp
umount.nfs: Server failed to unmount 'ingres:/export/fast'
umount.nfs: Server failed to unmount 'ingres:/export/fast'
[root@monet ~]

Developing some other ideas

  1. (generic NFS) Somehow, fail new RPCs immediately if the transport is in a state where it can't connect (ECONNREFUSED or EHOSTUNREACH).
  2. (generic RPC) A control-C isn't cancelling all transport state. An interrupted "mount -o tcp" blocks a subsequent "mount -o udp" until the failed TCP connection attempt times out and clears. Probably what's happening here is that the RPC client's connect logic is attempting to re-use the port, then the ->connect() call is just going on with TCP again. The RPC client should force a different port if the new connect request doesn't use the same transport.
  3. I should fix up rpcb_getport_sync() to use only UDP. Except, umount needs to work somehow through firewalls. That's fixed... but maybe GETPORT should try UDP first, then if it times out, try TCP.
  4. Break-back should be done by looking at portmapper's whole database and figuring out which transports, versions, and programs are available. Steve says some Cisco routers depend on a real GETPORT to determine which ports to open.
  5. If we absolutely need to do a GETPORT over TCP, why not do multiple GETPORTs on the same connection? Because you have to know what GETPORTs you want to do all at once... the RPC library isn't re-entrant; you can't leave a CLIENT open and open a second one.
  6. Use the select()-on-a-non-blocking-connect() method described above to shorten the TCP connect timeout in get_socket().
  7. Support for user-only mount options in the kernel option parser -- [no]quota, [no]user, [no]users, and so on. See utils/mount/mount.c for more. Hmm. Maybe this isn't needed -- looks like mount.c already strips those off before sending the option string to the kernel. Maybe a better strategy would be to remove support for the user-only options (like fg/bg) from the kernel, and make sure they are purged from the options string before I send them down.
  8. Add a t/ directory under utils/mount/ that contains a suite of tests similar to the eponymous directory in the git distribution. The tests can be done against an NFS server running on the same system. That way the tests can start and stop the server and issue iptables commands, without adding a local/remote complication. Maybe I could get Bull or CITI interested?
  9. Mount support for nfs:// URLs
  10. Implement a long option for mount.nfs for forcing string-ified mounts.
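For idea 7, purging user-only options from the string before it goes to the kernel might look like the sketch below. The option list is an illustrative assumption drawn from the names mentioned in these notes; the authoritative set is whatever utils/mount/mount.c handles. The function name `strip_user_only_options` is hypothetical, and the sketch does not handle commas inside quoted option values.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative list only (an assumption, not copied from mount.c) */
static const char *const user_only_opts[] = {
    "fg", "bg", "user", "nouser", "users", "nousers",
    "quota", "noquota", NULL
};

/* Return 1 if the len-byte token at opt is a user-only option */
static int is_user_only(const char *opt, size_t len)
{
    size_t i;

    for (i = 0; user_only_opts[i] != NULL; i++) {
        if (strlen(user_only_opts[i]) == len &&
            strncmp(opt, user_only_opts[i], len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Copy the comma-separated options into out, dropping user-only ones.
 * Returns 0 on success, -1 if out is too small.
 */
int strip_user_only_options(const char *options, char *out, size_t outlen)
{
    size_t used = 0;
    const char *p = options;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        size_t len = strcspn(p, ",");

        if (len > 0 && !is_user_only(p, len)) {
            size_t need = len + (used > 0 ? 1 : 0);

            if (used + need >= outlen)
                return -1;
            if (used > 0)
                out[used++] = ',';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

For example, "rw,bg,vers=3,users,proto=tcp" would be passed down as "rw,vers=3,proto=tcp". Matching whole tokens keeps an option such as "bgfoo=1" from being mistaken for "bg".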

Rewriting nfs(5)

The purpose of rewriting nfs(5) is several-fold:

  1. Provide correct and clear user documentation for NFS mount options,
  2. Review the behavior of each mount option to make sure we agree on what each option does and why, in order to provide an opportunity for discussion and change of said behavior,
  3. Act as a design specification process for both the user space and string-ified NFS mount process, and
  4. Modernize the use of the markup macros and address typographic inconsistencies.

Should add a "DISCUSSION" section to the man page that presents some background about how mount options interact with each other. What is a foreground mount versus a background mount? What does the v2/v3 mount process look like (GETPORT, MNT, NFS)? It might also be cool to cover how locking, open options such as O_DIRECT and O_SYNC, and ac/cto behave on NFS compared to local file systems. Should also carefully describe the behavior of sharedcache and nosharedcache. A discussion of security flavors...

Also expand the "EXAMPLES" section to provide recommendations for various scenarios. One example might be "noauto,users,nosuid".

Need to test mount.nfs's retry= behavior, as documented in nfs(5).

Need to check how nfs and nfs4 mounts behave for all combinations when the server's portmapper is unavailable, or when the port isn't in the portmapper database.

Improving error reporting

Mount's error messages just suck. One problem is the error messages are just wrong. Another is that errors are reported at too low a level: reporting that an RPC program/version mismatch occurred is nonsense -- the real error is that "proto=udp" is not supported.

Perhaps a clear error message can be reported to the command line, and a lot of detail should be reported in the system log? Well, that's easy enough with in-kernel mount option parsing!

mount(2) API return codes

The mount.nfs program needs to distinguish between temporary problems and permanent errors in order to determine whether it's worth retrying the mount request in the background. I'm still unsure whether the version/protocol fallback mechanism should occur in user space or in the kernel -- certainly policy would be easier to set and implement in user space, but then the kernel would need to provide specific information about how a mount request failed so that user space could make an appropriate choice about the next step to try.
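To make the temporary-versus-permanent distinction concrete, here is a rough sketch of how user space might classify an errno from mount(2) and decide whether a "bg" mount should go into the background. The particular errno-to-class mapping is an assumption for illustration, not an interface the kernel provides, and the names `classify_mount_errno` and `should_background` are hypothetical.

```c
#include <errno.h>

enum mount_failure {
    MOUNT_FAIL_PERMANENT,      /* do not retry with these options */
    MOUNT_FAIL_RETRY_SAME,     /* temporary: same options may work later */
    MOUNT_FAIL_RETRY_ADJUSTED  /* recoverable by changing version/transport */
};

/* Assumed mapping; a real one depends on what the kernel returns */
enum mount_failure classify_mount_errno(int err)
{
    switch (err) {
    case ETIMEDOUT:
    case ESTALE:
        return MOUNT_FAIL_RETRY_SAME;
    case EPROTONOSUPPORT:
    case ECONNREFUSED:
        return MOUNT_FAIL_RETRY_ADJUSTED;
    default:
        /* Includes bad option strings (for example EBADF or EINVAL) */
        return MOUNT_FAIL_PERMANENT;
    }
}

/*
 * Decide whether to keep retrying in the background.
 * Only failures that may clear up with the same options qualify.
 */
int should_background(int bg_requested, int err)
{
    return bg_requested &&
           classify_mount_errno(err) == MOUNT_FAIL_RETRY_SAME;
}
```

Under this policy a bad option string fails immediately even when "bg" is given, so a background mount that can never be satisfied is not started. Whether the fallback class is handled in user space or in the kernel is left open, as in the paragraph above.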

The current mount(2) API is described in a man page. The man page describes a set of generic error return codes, which we excerpt here. It also suggests that we can add specific error codes for NFS mounts.

RETURN VALUE
       On  success,  zero is returned.  On error, -1 is returned, and errno is
       set appropriately.

ERRORS
       The error values given below result from  filesystem  type  independent
       errors.  Each  filesystem  type may have its own special errors and its
       own special behavior.  See the kernel source code for details.

       EACCES A component of a path was not searchable. (See also path_resolu-
              tion(2).)   Or,  mounting  a  read-only filesystem was attempted
              without giving the MS_RDONLY flag.  Or, the block device  source
              is located on a filesystem mounted with the MS_NODEV option.

       EAGAIN A call to umount2() specifying MNT_EXPIRE successfully marked an
              unbusy file system as expired.

       EBUSY  source is already mounted. Or, it cannot be remounted read-only,
              because it still holds files open for writing.  Or, it cannot be
              mounted on target because target is still busy (it is the  work-
              ing  directory  of some task, the mount point of another device,
              has open files, etc.).  Or, it could not be unmounted because it
              is busy.

       EFAULT One  of  the  pointer  arguments points outside the user address
              space.

       EINVAL source had an invalid superblock.  Or,  a  remount  (MS_REMOUNT)
              was  attempted,  but  source  was not already mounted on target.
              Or, a move (MS_MOVE) was attempted, but source was not  a  mount
              point, or was ’/’.  Or, an unmount was attempted, but target was
              not a mount point.  Or, umount2() was called with MNT_EXPIRE and
              either MNT_DETACH or MNT_FORCE.

       ELOOP  Too  many  links encountered  during pathname resolution.  Or, a
              move was attempted, while target is a descendant of source.

       EMFILE (In case no block device is required:) Table of dummy devices is
              full.

       ENAMETOOLONG
              A pathname was longer than MAXPATHLEN.

       ENODEV filesystemtype not configured in the kernel.

       ENOENT A pathname was empty or had a nonexistent component.

       ENOMEM The  kernel  could not allocate a free page to copy filenames or
              data into.

       ENOTBLK
              source is not a block device (and a device was required).

       ENOTDIR
              The second argument, or a prefix of the first argument, is not a
              directory.

       ENXIO  The major number of the block device source is out of range.

       EPERM  The caller does not have the required privileges.

Here are some additional return codes I recommend for NFS mounts, just as a start. These should allow a calling program to report a reasonably specific error message, and decide whether and how to retry the request.

       EBADF  The mount option string could not be parsed, or an unrecognized
              option was specified, or a keyword option was specified with a
              value that is out of range.

This is a permanent mount error. The calling program should not retry this request with the same options.

       ESTALE The server denied access to the requested share.

       ETIMEDOUT
              The kernel's mount attempt timed out after n seconds  (I think n
              is 15).

These are temporary errors. The calling program may choose to retry this request using the same options, or fail immediately.

       EPROTONOSUPPORT
              The server reports that the program, version,  or transport pro-
              tocol is not currently available.

       ECONNREFUSED
              The kernel's mount connection  attempt was refused by the server
              at the network transport layer.

These are temporary errors. The calling program can attempt to recover by adjusting the options and retrying the request.
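The three classes of proposed return codes can be summarized as a lookup table that a calling program might consult. This is a sketch in Python, not a description of any existing mount.nfs code. The `Action` names and `next_action` are hypothetical, and treating every unlisted code as a failure is an assumption of this sketch rather than something the notes specify.

```python
import errno
from enum import Enum


class Action(Enum):
    FAIL = "fail immediately"
    RETRY_SAME = "retry with the same options"
    RETRY_ADJUSTED = "retry with adjusted options"


# Mapping taken from the proposed NFS-specific return codes above.
PROPOSED = {
    errno.EBADF: Action.FAIL,                      # bad option string
    errno.ESTALE: Action.RETRY_SAME,               # server denied access
    errno.ETIMEDOUT: Action.RETRY_SAME,            # mount attempt timed out
    errno.EPROTONOSUPPORT: Action.RETRY_ADJUSTED,  # version/transport unavailable
    errno.ECONNREFUSED: Action.RETRY_ADJUSTED,     # connection refused
}


def next_action(err):
    """Return the suggested next step for an errno value from mount(2).

    Codes not in the proposal fall back to FAIL (an assumption).
    """
    return PROPOSED.get(err, Action.FAIL)
```

A caller honoring the "bg" option would presumably background itself only for the two retry classes, and report the error immediately for everything else.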

i18n

Internationalization references and hints:
