MountNotes

From Linux NFS

Revision as of 17:34, 21 August 2007

Initial impressions

Should the kernel mount client be smart enough to sniff the remote server and tell what options are supported before trying to mount?

Passing just a string should be pretty darn easy. All that's needed is to drop in an "addr=" option -- mount.c already gets rid of the "MS_" related options for us.
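
A rough sketch of the "drop in an addr= option" step, assuming the server name has already been resolved to an IPv4 address. The helper name append_addr_option() and its memory handling are made up for illustration; they are not taken from mount.nfs.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of "opts" with
 * ",addr=<dotted quad>" appended (or just "addr=..." if opts is empty).
 * Returns NULL on failure; the caller frees the result.
 */
char *append_addr_option(const char *opts, const struct sockaddr_in *sin)
{
	char addr[INET_ADDRSTRLEN];
	size_t len;
	char *result;

	if (inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr)) == NULL)
		return NULL;

	if (opts == NULL)
		opts = "";

	/* room for opts + "," + "addr=" + address + NUL */
	len = strlen(opts) + 1 + strlen("addr=") + strlen(addr) + 1;
	result = malloc(len);
	if (result == NULL)
		return NULL;

	snprintf(result, len, "%s%saddr=%s", opts, *opts ? "," : "", addr);
	return result;
}
```

With "rw,proto=tcp" and a server at 192.168.1.10, this yields "rw,proto=tcp,addr=192.168.1.10". IPv6 (on the TODO list) would need a sockaddr_in6 variant.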

TODO:

break-back retries
bg retries
Support for IPv6
Support for server failover options
Better error reporting
Mount server connection caching
Remount processing

Does "mount.nfs ... -o defaults" work? Do we need "mount.nfs -a" to work? Check with mount.ocfs2.

And does "mount.nfs" support single-parameter mounts such as "mount.nfs /home"? There is logic in there to do this, but is it working right?

When does the mount command fail immediately, and when does it background itself? If "bg" is specified, do all errors cause the mount command to go into the background, even permanent errors? Is there a class of errors that should always fail immediately?

Should I implement the fallback logic first, before I construct the "bg" logic? If I don't, then a bad set of mount options will force a background mount that can't ever be satisfied.... But maybe that's the way it works already.

Obviously, the legacy mount will sort out bad mount options first, and not even try the mount request. Now that mount option parsing is in the kernel, the kernel has to return some error indicating that the mount options are bad, and that the mount shouldn't be retried. The kernel needs to distinguish between a retry-able and a non-retry-able mount failure. I wonder if Trond will object to return codes from mount(2) that are not listed in the man page? What does CIFS do?
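
To make the retry-able versus non-retry-able split concrete, here is a sketch of how mount.nfs might classify an errno from mount(2). Which codes the kernel would actually return for bad options is still the open question above, so every case below is an assumption, and the function name is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical classification of mount(2) failures. The sets below are
 * assumptions for illustration only, not what the kernel does today.
 */
bool mount_error_is_permanent(int err)
{
	switch (err) {
	case EINVAL:		/* assumed: malformed or unknown mount option */
	case EPROTONOSUPPORT:	/* assumed: version/transport not supported */
	case EACCES:		/* assumed: server refused the request */
	case ENOENT:		/* assumed: export or mount point missing */
	case ENOTDIR:		/* assumed: mount point is not a directory */
		return true;
	default:
		/* e.g. ETIMEDOUT, ECONNREFUSED, EHOSTUNREACH: worth retrying */
		return false;
	}
}
```

A "bg" mount would then go into the background only when mount_error_is_permanent() returns false, and fail immediately otherwise.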

Why isn't "bg" implemented for mount.nfs4 ?

Setting our own connect timeout

 You need to call connect on a socket set to non-blocking mode with fcntl,
 and then use select with a timeout to limit the amount of time you will
 wait for the connect to complete. If select returns because you timed out,
 then close the socket and return an error. If select returns because of
 an event on the socket, you use getsockopt to determine if the connect
 succeeded or not.

 See Stevens, Unix Network Programming Vol 1 for details. Comments in the
 code I'm looking at say page 411.

This is a non-bug of sorts... user-space TCP connects will time out after 75 seconds. However, it would be nicer if these timed out sooner, say after 15 seconds.

Reference implementation

I took a look at Solaris network behavior, just as a reference point. I specified "-o proto=tcp,vers=3".

It always uses UDP for GETPORT requests, for both MNT and NFS, mount and umount;
It always uses rpcbind version 2 for IPv4 bind requests;
It always uses UDP for MNT protocol requests, for both mount and umount;
It does a MNT NULL request before the actual MNT call, for both mount and umount;
It does two separate NFS pings, on two separate TCP connections; probably one is from the mount command, and one from the kernel? Both use an ephemeral port rather than a privileged one.
The Solaris kernel appears to cache TCP connections to the server, so if there's already one, it will use it instead of opening a fresh one. I didn't see a NULL request on this connection (either when it already existed, or when the kernel had to create one).

Copy support for other mount options (quiet/loud, quota, user[s]) to kernel mount client.

The version/transport break-back code is not working. Need to poke at it more. Should it break back if GETPORT says the service is there but the server isn't responding, or should it break back only if GETPORT says use another version?
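
One possible break-back policy, sketched under the assumption that the client already has the server's registered services in hand as a list (for example from a portmapper dump). The struct, the function name, and the preference order (newest version first, TCP before UDP) are all assumptions, not what the current code does. Only the NFS program number 100003 is a fixed value.

```c
#include <stddef.h>
#include <netinet/in.h>

#define NFS_PROGRAM 100003UL

/* One registration as the portmapper might report it (hypothetical type). */
struct pmap_entry {
	unsigned long prog;
	unsigned long vers;
	unsigned int proto;	/* IPPROTO_TCP or IPPROTO_UDP */
	unsigned short port;
};

/* Assumed preference order: newest version first, TCP before UDP. */
static const struct {
	unsigned long vers;
	unsigned int proto;
} preference[] = {
	{ 3, IPPROTO_TCP }, { 3, IPPROTO_UDP },
	{ 2, IPPROTO_TCP }, { 2, IPPROTO_UDP },
};

/*
 * Return the most preferred NFS registration found in "db", or NULL
 * if the server registers no usable NFS version/transport pair.
 */
const struct pmap_entry *pick_nfs_service(const struct pmap_entry *db, size_t n)
{
	for (size_t i = 0; i < sizeof(preference) / sizeof(preference[0]); i++)
		for (size_t j = 0; j < n; j++)
			if (db[j].prog == NFS_PROGRAM &&
			    db[j].vers == preference[i].vers &&
			    db[j].proto == preference[i].proto)
				return &db[j];
	return NULL;
}
```

This only answers "what does the server say it offers"; it does not settle what to do when a registered service fails to respond.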

Also I should check why umount hangs when the server goes down. Is lazy umounting working? What does the --force option do exactly?

Why does a failed umount report the same error twice?

[root@monet ~]# umount /mnt -O mountport=891,proto=tcp
umount.nfs: Server failed to unmount 'ingres:/export/fast'
umount.nfs: Server failed to unmount 'ingres:/export/fast'
[root@monet ~]

Developing some other ideas

(generic NFS) Somehow, fail new RPCs immediately if the transport is in a state where it can't connect (ECONNREFUSED or EHOSTUNREACH).
(generic RPC) A control-C isn't cancelling all transport state. An interrupted "mount -o tcp" blocks a subsequent "mount -o udp" until the failed TCP connection attempt times out and clears. Probably what's happening here is that the RPC client's connect logic is attempting to re-use the port, then the ->connect() call is just going on with TCP again. The RPC client should force a different port if the new connect request doesn't use the same transport.
I should fix up rpcb_getport_sync() to use only UDP. Except, umount needs to work somehow through firewalls. That's fixed... but maybe GETPORT should try UDP first, then if it times out, try TCP.
Break-back should be done by looking at portmapper's whole database and figuring out which transports, versions, and programs are available. Steve says some Cisco routers depend on a real GETPORT to determine which ports to open.
If we absolutely need to do a GETPORT over TCP, why not do multiple GETPORTs on the same connection? Because you have to know what GETPORTs you want to do all at once... the RPC library isn't re-entrant; you can't leave a CLIENT open and open a second one.
Use the select() on a non-blocking connect() method described above to shorten the TCP connect time out in get_socket().
Support for user-only mount options in the kernel option parser -- [no]quota, [no]user, [no]users, and so on. See utils/mount/mount.c for more. Hmm. Maybe this isn't needed -- looks like mount.c already strips those off before sending the option string to the kernel. Maybe a better strategy would be to remove support for the user-only options (like fg/bg) from the kernel, and make sure they are purged from the options string before I send them down.
Add a t/ directory under utils/mount/ that contains a suite of tests similar to the eponymous directory in the git distribution. The tests can be run against an NFS server on the same system. That way the tests can start and stop the server and issue iptables commands, without adding a local/remote complication. Maybe I could get Bull or CITI interested?
Mount support for nfs:// URLs
Implement a long option for mount.nfs for forcing string-ified mounts.
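
For the idea of purging user-only options before the string goes down to the kernel, a sketch along these lines might do. The option list is an assumption built from the names mentioned above (fg/bg, [no]quota, [no]user, [no]users), and strip_user_only_options() is a hypothetical name. It splits naively on commas, so it would mishandle an option value that itself contains a comma.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed list of options that only user space should see. */
static const char *const user_only[] = {
	"bg", "fg", "user", "nouser", "users", "nousers",
	"quota", "noquota", NULL
};

static int is_user_only(const char *opt, size_t len)
{
	for (size_t i = 0; user_only[i] != NULL; i++)
		if (strlen(user_only[i]) == len &&
		    strncmp(opt, user_only[i], len) == 0)
			return 1;
	return 0;
}

/*
 * Return a newly allocated copy of the comma-separated string "opts"
 * with every user-only option removed. The caller frees the result.
 */
char *strip_user_only_options(const char *opts)
{
	char *out = malloc(strlen(opts) + 1);
	char *dst = out;
	const char *p = opts;

	if (out == NULL)
		return NULL;

	while (*p != '\0') {
		size_t len = strcspn(p, ",");	/* length of this option */

		if (len > 0 && !is_user_only(p, len)) {
			if (dst != out)
				*dst++ = ',';
			memcpy(dst, p, len);
			dst += len;
		}
		p += len;
		if (*p == ',')
			p++;
	}
	*dst = '\0';
	return out;
}
```

Given "rw,bg,proto=tcp,users,noquota", this returns "rw,proto=tcp".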

Rewriting nfs(5)

The purpose of rewriting nfs(5) is several-fold:

Provide correct and clear user documentation for NFS mount options,
Review the behavior of each mount option to make sure we agree on what each option does and why, in order to provide an opportunity for discussion and change of said behavior,
Act as a design specification process for both the user space and string-ified NFS mount process, and
Modernize the use of the markup macros and address typographic inconsistencies

Should add a "DISCUSSION" section to the man page that presents some background about how mount options interact with each other. What is a foreground mount versus a background mount? What does the v2/v3 mount process look like (GETPORT, MNT, NFS)? It might also be cool to cover how locking, open options such as O_DIRECT and O_SYNC, and ac/cto behave on NFS compared to local file systems. Should also carefully describe the behavior of sharedcache and nosharedcache. A discussion of security flavors...

Also expand the "EXAMPLES" section to provide recommendations for various scenarios. One example might be "noauto,users,nosuid".

Need to test mount.nfs's retry= behavior, as documented in nfs(5).

Need to check how nfs and nfs4 mounts behave for all combinations when the server's portmapper is unavailable, or when the port isn't in the portmapper database.

Improving error reporting

Mount's error messages just suck. One problem is that the error messages are just wrong. Another is that errors are reported at too low a level: reporting that an RPC program/version mismatch occurred is nonsense when the real error is that "proto=udp" is not supported.

Perhaps a clear error message can be reported to the command line, and a lot of detail should be reported in the system log? Well, that's easy enough with in-kernel mount option parsing!

i18n

Internationalization references and hints:
