MountNotes

From Linux NFS

Chucklever (Talk | contribs)

Revision as of 21:54, 15 August 2007

Should the kernel mount client be smart enough to sniff the remote server and tell what options are supported before trying to mount?

Passing just a string should be pretty darn easy. All that's needed is to drop in an "addr=" option -- mount.c already gets rid of the "MS_" related options for us.
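A rough sketch of what dropping in an "addr=" option could look like, assuming the server's address has already been resolved to a presentation-form string. The helper name append_addr_option() is made up for illustration and is not taken from mount.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from nfs-utils): return a newly allocated copy
 * of "opts" with "addr=<address>" appended as one more comma-separated
 * option. The caller frees the result. Returns NULL if malloc() fails.
 */
char *append_addr_option(const char *opts, const char *address)
{
    const char *sep;
    size_t len;
    char *result;

    assert(address != NULL);
    if (opts == NULL)
        opts = "";

    /* Only add a comma when there is something to separate from. */
    sep = (*opts != '\0') ? "," : "";

    /* existing options + separator + "addr=" + address + NUL */
    len = strlen(opts) + strlen(sep) + strlen("addr=") + strlen(address) + 1;
    result = malloc(len);
    if (result == NULL)
        return NULL;

    snprintf(result, len, "%s%saddr=%s", opts, sep, address);
    return result;
}
```

For example, append_addr_option("rw,proto=tcp", "192.168.0.5") yields "rw,proto=tcp,addr=192.168.0.5". Returning a fresh buffer avoids guessing how much room the caller's string has left.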

TODO:

  1. break-back retries
  2. bg retries
  3. Support for IPv6
  4. Support for server failover options
  5. Better error reporting
  6. Mount server connection caching
  7. Remount processing

Oy. The rewrite to create mount.nfs and mount.nfs4 is starting to smell really bad. nfsmount() is returning a pointer to a structure on its own stack, then passing it to the kernel. The nfsmount() calling sequence is a mess. String handling in C is, um, interesting, and perverted.

Rebuilding it from the ground up is beginning to look very inviting. Add a couple of new directories under nfs-utils/utils -- new-mount.nfs and new-mount.nfs4. Throw away all the legacy stuff, all the sloppy code, and do it simply. Maybe even in Python (with a few C modules for invoking system calls directly).

A strategy might be: rewrite mount.c in Python. It's not long... Python has much nicer string handling than does C. And, many of the newer Red Hat utilities are written in Python.

OK, Zach sez: Python utilities are difficult to include on initrd... that's probably a reasonable argument not to use Python here.

I've cleaned up mount.c, and fixed the stack smashing problem.
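For reference, the general shape of the "pointer to a structure on its own stack" problem and one common way around it. This is an illustration only, not the actual fix that went into mount.c; struct mount_data and fill_mount_data() are invented names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the mount data that gets handed to the kernel. */
struct mount_data {
    int version;
    char hostname[64];
};

/*
 * BROKEN pattern (kept as a comment so it is never compiled):
 *
 *   struct mount_data *build(void)
 *   {
 *       struct mount_data data;    // lives in build()'s stack frame
 *       ...
 *       return &data;              // dangling as soon as build() returns
 *   }
 */

/*
 * Safer pattern: the caller owns the storage and the callee only fills it
 * in, so the structure stays valid for as long as the caller needs it.
 * Returns 0 on success, -1 if the hostname does not fit.
 */
int fill_mount_data(struct mount_data *data, int version, const char *hostname)
{
    assert(data != NULL && hostname != NULL);

    if (strlen(hostname) >= sizeof(data->hostname))
        return -1;                      /* refuse to truncate silently */

    memset(data, 0, sizeof(*data));
    data->version = version;
    strcpy(data->hostname, hostname);   /* length was checked above */
    return 0;
}
```

The caller declares the structure in its own frame (or allocates it) and keeps it alive across the mount(2) call.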

May also want to rename it "mount.nfs.c". Add some lint/sparse tests to the Makefile?

nfs-utils is now maintained in a git repository on linux-nfs.org. I've cloned it locally, and initialized a stgit series in it to create a series of fixes based on the hacking I've already done.

I've noted that mounttype, which comes from util-linux's mount command, is basically not used in the nfs-utils version, but it should be or'd into "flags" by parse_opts(). I'll bet --bind mounts don't work at all.

I also see that "running_bg" is ignored in mount.nfs[4] -- perhaps background mounts don't work either? Probably not: nfsmount() is prevented from ever returning EX_BG. Maybe that's correct behavior for a stand-alone mount utility, but if so, why wasn't it removed wholesale?

Does "mount.nfs ... -o defaults" work? Do we need "mount.nfs -a" to work? Check with mount.ocfs2.

As I recalled, the -t option goes before <spec> <dir>, not after. And lo, the util-linux mount command does do it that way.

And does 'mount.nfs' support single parameter mounts such as "mount.nfs /home" ? There is logic to do this in there, but is it working right?

I can't build the git-ified source. Need to ask Steve. See if I can poke him about where the real git tree is (apparently it's not linux-nfs.org). Ah... there is a repo on neil.brown.name, but it looks identical to the one on linux-nfs.org.

Well. Steinar reports that util-linux's mount no longer supports NFS mounts, since there is a mount.nfs program provided by nfs-utils. However, mount.nfs is not installed by default because Neil doesn't think it's secure enough yet. So the world is stuck without an NFS mount program!

Looks like mount.nfs is worse off than I imagined. It absolutely needs to be fixed up before we can get in-kernel mount processing working.

Setting our own connect timeout

[ You need to call connect on a socket set to non-blocking mode with fcntl, and then use select with a timeout to limit the amount of time you will wait for the connect to complete. If select returns because you timed out, then close the socket and return an error. If select returns because of an event on the socket, you use getsockopt to determine if the connect succeeded or not.

See Stevens, Unix Network Programming Vol 1 for details. Comments in the code I'm looking at say page 411. ]

This is a non-bug of sorts... user-space TCP connects will time out after 75 seconds. However, it would be nicer if these timed out quicker, like say after 15 seconds.
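A minimal sketch of the non-blocking connect plus select() approach described in the bracketed note above. The function name connect_tcp_timeout() and its signature are assumptions made for this example; it is not the get_socket() code from nfs-utils.

```c
#include <arpa/inet.h>    /* htons(), htonl() for callers building addresses */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP socket and connect it to "addr", waiting
 * at most "timeout_secs" seconds. Returns a connected descriptor, or -1
 * with errno set (ETIMEDOUT when the wait expired). On any failure the
 * socket is closed before returning.
 */
int connect_tcp_timeout(const struct sockaddr *addr, socklen_t addrlen,
                        int timeout_secs)
{
    int fd, flags, saved, err = 0;
    socklen_t errlen = sizeof(err);
    struct timeval tv;
    fd_set wset;

    assert(addr != NULL);
    fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking mode so connect() returns immediately. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, addr, addrlen) < 0) {
        if (errno != EINPROGRESS)
            goto fail;

        /* Wait until the socket becomes writable or the timeout expires. */
        FD_ZERO(&wset);
        FD_SET(fd, &wset);
        tv.tv_sec = timeout_secs;
        tv.tv_usec = 0;
        switch (select(fd + 1, NULL, &wset, NULL, &tv)) {
        case -1:
            goto fail;              /* select() itself failed */
        case 0:
            errno = ETIMEDOUT;      /* nothing happened in time */
            goto fail;
        }

        /* An event fired: SO_ERROR says whether the connect succeeded. */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
            goto fail;
        if (err != 0) {
            errno = err;
            goto fail;
        }
    }

    /* Connected: restore the original (blocking) flags for the caller. */
    if (fcntl(fd, F_SETFL, flags) < 0)
        goto fail;
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

Calling this with timeout_secs set to 15 would give the shorter timeout wished for above. The sketch does not retry select() on EINTR and assumes the descriptor is below FD_SETSIZE; real code would probably want to handle both.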

I took a look at Solaris network behavior, just as a reference point. I specified "-o proto=tcp,vers=3".

1. It always uses UDP for GETPORT requests, for both MNT and NFS, mount and umount;

2. It always uses UDP for MNT protocol requests, for both mount and umount;

3. It does a MNT NULL request before the actual MNT call, for both mount and umount;

4. It does two separate NFS pings, on two separate TCP connections; probably one is from the mount command, and one from the kernel? Both use an ephemeral port rather than a privileged one.

5. The Solaris kernel appears to cache TCP connections to the server, so if there's already one, it will use it instead of opening a fresh one. I didn't see a NULL request on this connection (either when it already existed, or when the kernel had to create one).

Copy support for other mount options (quiet/loud, quota, user[s]) to kernel mount client.

The version/transport break-back code is not working. Need to poke at it more. Should it break back if GETPORT says the service is there but the server isn't responding, or should it break back only if GETPORT says use another version?
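To make the two candidate policies concrete, here is a toy sketch of a break-back loop. Everything in it is hypothetical: the probe callback stands in for whatever GETPORT/NULL-ping logic is really used, demo_probe() is a canned stub, and the flag retry_on_no_response simply selects between "break back only when the service isn't registered" and "also break back when the server doesn't answer".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Possible outcomes of probing one version/transport combination. */
enum probe_result { PROBE_OK, PROBE_NOT_REGISTERED, PROBE_NO_RESPONSE };

struct nfs_choice {
    int version;
    const char *proto;
};

typedef enum probe_result (*probe_fn)(int version, const char *proto);

/*
 * Walk the candidates in preference order and return the index of the
 * first one that works, or -1 if none does. A "not registered" answer
 * always breaks back to the next candidate; a silent server does so only
 * when retry_on_no_response is non-zero.
 */
int break_back(const struct nfs_choice *candidates, size_t count,
               probe_fn probe, int retry_on_no_response)
{
    size_t i;

    assert(candidates != NULL && probe != NULL);
    for (i = 0; i < count; i++) {
        switch (probe(candidates[i].version, candidates[i].proto)) {
        case PROBE_OK:
            return (int)i;
        case PROBE_NOT_REGISTERED:
            break;                  /* try the next combination */
        case PROBE_NO_RESPONSE:
            if (!retry_on_no_response)
                return -1;          /* strict policy: give up here */
            break;
        }
    }
    return -1;
}

/* Canned stub for experimenting: v3/tcp is registered but silent,
 * v3/udp works, and nothing else is registered. */
enum probe_result demo_probe(int version, const char *proto)
{
    if (version == 3 && strcmp(proto, "tcp") == 0)
        return PROBE_NO_RESPONSE;
    if (version == 3 && strcmp(proto, "udp") == 0)
        return PROBE_OK;
    return PROBE_NOT_REGISTERED;
}
```

With the candidates ordered v3/tcp, v3/udp, v2/tcp, v2/udp, the lenient policy settles on v3/udp while the strict policy fails the mount outright, which is exactly the behavioral difference the question above is about.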

Also I should check why umount hangs when the server goes down. Is lazy umounting working? What does the --force option do exactly?

Why does a failed umount report the same error twice?

  [root@monet ~]# umount /mnt -O mountport=891,proto=tcp
  umount.nfs: Server failed to unmount 'ingres:/export/fast'
  umount.nfs: Server failed to unmount 'ingres:/export/fast'
  [root@monet ~]

Need to get back to chasing down fg/bg behavior.

Developing some other ideas:

1. (generic NFS) Somehow, fail new RPCs immediately if the transport is in a state where it can't connect (ECONNREFUSED or EHOSTUNREACH).

2. (generic RPC) A control-C isn't cancelling all transport state. An interrupted "mount -o tcp" blocks a subsequent "mount -o udp" until the failed TCP connection attempt times out and clears. Probably what's happening here is that the RPC client's connect logic is attempting to re-use the port, then the ->connect() call is just going on with TCP again. The RPC client should force a different port if the new connect request doesn't use the same transport.

3. I should fix up rpcb_getport_sync() to use only UDP. Except, umount needs to work somehow through firewalls. That's fixed... but maybe GETPORT should try UDP first, then if it times out, try TCP.

4. Break-back should be done by looking at portmapper's whole database and figuring out which transports, versions, and programs are available. Steve says some Cisco routers depend on a real GETPORT to determine which ports to open.

5. If we absolutely need to do a GETPORT over TCP, why not do multiple GETPORTs on the same connection? Because you have to know what GETPORTs you want to do all at once... the RPC library isn't re-entrant; you can't leave a CLIENT open and open a second one.

6. Use the method described above (select() on a non-blocking connect()) to shorten the TCP connect timeout in get_socket().

7. Support for user-only mount options in the kernel option parser -- [no]quota, [no]user, [no]users, and so on. See utils/mount/mount.c for more. Hmm. Maybe this isn't needed -- looks like mount.c already strips those off before sending the option string to the kernel.

Maybe a better strategy would be to remove support for the user-only options (like fg/bg) from the kernel, and make sure they are purged from the options string before I send them down.

8. Add a t/ directory under utils/mount/ that contains a suite of tests similar to the eponymous directory in the git distribution. The tests can be done against an NFS server running on the same system. That way the tests can start and stop the server and issue iptables commands, without adding a local/remote complication. Maybe I could get Bull or CITI interested?

9. Mount support for nfs:// URLs

10. Implement a long option for mount.nfs for forcing string-ified mounts.
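For idea 7, purging the user-only options from the string before it goes to the kernel could look roughly like this. The option list below is only what these notes mention (fg/bg, [no]quota, [no]user, [no]users); it is an assumption, not the list mount.c actually uses, and strip_user_only_options() is an invented name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed set of user-space-only options, taken from the notes above. */
static const char *const user_only_opts[] = {
    "fg", "bg", "quota", "noquota", "user", "nouser", "users", "nousers",
};

/* Return 1 if the "len"-byte token at "opt" is exactly a user-only option. */
static int is_user_only(const char *opt, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(user_only_opts) / sizeof(user_only_opts[0]); i++) {
        if (strlen(user_only_opts[i]) == len &&
            strncmp(opt, user_only_opts[i], len) == 0)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: return a newly allocated copy of the comma-separated
 * string "opts" with every user-only option removed. Empty tokens are
 * dropped as well. The caller frees the result; NULL means malloc() failed.
 */
char *strip_user_only_options(const char *opts)
{
    const char *p = opts;
    char *result, *out;

    assert(opts != NULL);
    result = malloc(strlen(opts) + 1);   /* output is never longer than input */
    if (result == NULL)
        return NULL;
    out = result;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");    /* length of the current token */

        if (len > 0 && !is_user_only(p, len)) {
            if (out != result)
                *out++ = ',';
            memcpy(out, p, len);
            out += len;
        }
        p += len;
        if (*p == ',')
            p++;                         /* skip the separator */
    }
    *out = '\0';
    return result;
}
```

For instance, "rw,bg,proto=tcp,nouser,retry=5" comes back as "rw,proto=tcp,retry=5". Tokens are matched whole, so an option with a value such as "user=foo" is passed through untouched; whether that is the right call is an open question.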

The purpose of rewriting nfs(5) is several-fold:

1. Provide correct and clear user documentation for NFS mount options,

2. Review the behavior of each mount option to make sure we agree on what each option does and why, in order to provide an opportunity for discussion and change of said behavior,

3. Act as a design specification process for both the user space and string-ified NFS mount process, and

4. Modernize the use of the markup macros and address typographic inconsistencies

Should add a "DISCUSSION" section to the man page that presents some background about how mount options interact with each other. What is a foreground mount versus a background mount? What does the v2/v3 mount process look like (GETPORT, MNT, NFS)? It might also be cool to cover how locking, open options such as O_DIRECT and O_SYNC, and ac/cto behave on NFS compared to local file systems. Should also carefully describe the behavior of sharedcache and nosharedcache. A discussion of security flavors...

Also expand the "EXAMPLES" section to provide recommendations for various scenarios. One example might be "noauto,users,nosuid".

Need to test mount.nfs's retry= behavior, as documented in nfs(5).

Need to check how nfs and nfs4 mounts behave for all combinations when the server's portmapper is unavailable, or when the port isn't in the portmapper database.

Mount's error messages just suck. One problem is that the error messages are just wrong. Another is that errors are reported at too low a level: reporting that an RPC program/version mismatch occurred is nonsense -- the real error is that "proto=udp" is not supported.
