NFS for AFS users

This page tracks some of the obstacles that might keep an AFS user from using NFS instead.

Missing Features

volume management and migration

For migration we need to preserve filehandles, so we need to migrate at the block level or use fs-specific send/receive. The protocol side can be handled by migrating only entire servers or containers, so that migration can be treated as a server reboot.

Between LVM and knfsd, we have a lot of the necessary pieces, but there's at a minimum a lot of tooling and documentation to write before this is usable.

An explanation of the AFS features from David Howells:

I think this is probably the main feature of AFS that people particularly like. There's a volume indirection layer, if you will. It provides a number of pieces:

  1. You don't need to know where the fileservers are that are hosting a volume. There's a layer of Volume Location servers that tell you where they are. A client contacts them occasionally - or when the fileserver returns an abort saying that the volume has moved - to keep tabs on the current hosts of a volume.
  2. A volume may be hosted on multiple servers - a client can use any of them and may use multiple servers simultaneously. At the moment, there can only be one RW version of a volume, but multiple RO versions. The RO versions - which all have to be identical - can be thought of as clones of a snapshot of the RW volume. These do not have to be co-located with each other.

    There is also a 'backup' volume version which is just, say, a daily temporary read-only snapshot of a RW volume and has to be located on the same machine. That would be easy to support in btrfs.
  3. When a RW volume is "released" (snapshotted) to the RO volumes, all the RO volumes update simultaneously and atomically. The users, in theory, don't notice as the volumes don't go offline - and then they see all the changes happen at once. There is coordination handling for when one or more of the fileservers or the VL servers are offline.
  4. Volumes can be migrated between machines whilst in active use without the user in theory noticing anything.

    This is fairly easy to achieve for RO volumes since multiple servers serving the same data can just add one more server of the data with no problem - it's the migration of live RW volumes that's the real trick, and I don't know how this is actually done in OpenAFS.

    There are moves afoot to add multi-hosted RW volumes, but I'm not sure how that'll work, and may involve Ceph integration. But it's not there yet.

In fact, logical volumes are something they particularly like. You can make a volume for a purpose; give particular people access to it, give it some storage, expand and contract it, and move it around. It's intrinsically quota'd.
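
As a rough illustration of the "easy to support in btrfs" remark above: an AFS-style 'backup' volume is just a same-machine, read-only snapshot of the RW volume, which btrfs can produce atomically with one ioctl. The sketch below is roughly what btrfs subvolume snapshot -r does under the hood; the /srv/vols layout and the vol.users.backup name are invented for the example and are not part of any existing NFS tooling.

    /* Hedged sketch: make a read-only btrfs snapshot of a subvolume, the
     * primitive an AFS-style 'backup' volume (or an RO clone on the same
     * machine) would need.  Needs a btrfs filesystem and suitable
     * privileges to actually run. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/btrfs.h>

    int main(void)
    {
        /* Hypothetical layout: the RW volume is the subvolume
         * /srv/vols/vol.users; the backup lands next to it. */
        int src = open("/srv/vols/vol.users", O_RDONLY | O_DIRECTORY);
        int dst = open("/srv/vols", O_RDONLY | O_DIRECTORY);
        struct btrfs_ioctl_vol_args_v2 args;

        if (src < 0 || dst < 0) {
            perror("open");
            return 1;
        }

        memset(&args, 0, sizeof(args));
        args.fd = src;                          /* subvolume to snapshot */
        args.flags = BTRFS_SUBVOL_RDONLY;       /* backup volumes are read-only */
        strncpy(args.name, "vol.users.backup", sizeof(args.name) - 1);

        /* Creates /srv/vols/vol.users.backup as an atomic, copy-on-write
         * snapshot - cheap enough to take daily, as the note above suggests. */
        if (ioctl(dst, BTRFS_IOC_SNAP_CREATE_V2, &args) < 0) {
            perror("BTRFS_IOC_SNAP_CREATE_V2");
            return 1;
        }

        close(src);
        close(dst);
        return 0;
    }

A "release" to RO volumes could build on the same primitive: snapshot the RW subvolume, then ship the snapshot to the other servers with btrfs send/receive.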

PAGs

PAGs: AFS allows a group of processes to share a common identity, different from the local uid, for the purposes of accessing an AFS filesystem: https://docs.openafs.org/AdminGuide/HDRWQ63.html

Dave Howells says: "This is why I added session keyrings. You can run a process in a new keyring and give it new tokens. systemd kind of stuck a spike in that, though, by doing their own incompatible thing with their user manager service....

NFS would need to do what the in-kernel AFS client does and call request_key() on entry to each filesystem method that doesn't take a file* and use that to cache the credentials it is using. If there is no key, it can make one up on the spot and stick the uid/gid/groups in there. This would then need to be handed down to the sunrpc protocol to define the security creds to use.

The key used to open a file would then need to be cached in the file struct private data."
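
A minimal userspace sketch of that keyring-based PAG analogue, using libkeyutils (link with -lkeyutils): join a fresh session keyring, stash a token in it, and exec a shell that inherits it - much what pagsh does in OpenAFS. The "user" key type exists today, but the key description and payload below are invented; a real NFS client would define its own key type and find it with request_key() as described above.

    /* Hedged sketch: a PAG-like shared identity built on session keyrings. */
    #include <keyutils.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        key_serial_t session, token;

        /* Detach from the old session keyring and join a fresh, anonymous
         * one.  Every process forked from here shares it. */
        session = keyctl_join_session_keyring(NULL);
        if (session < 0) {
            perror("keyctl_join_session_keyring");
            return 1;
        }

        /* Stash a credential in the new keyring.  "user" is a generic key
         * type; the description and payload are made up for the example. */
        token = add_key("user", "nfs-token:example.com",
                        "fake-token-payload", strlen("fake-token-payload"),
                        KEY_SPEC_SESSION_KEYRING);
        if (token < 0) {
            perror("add_key");
            return 1;
        }

        printf("session keyring %d, token key %d\n", session, token);

        /* The shell (and anything run from it) inherits the keyring, so the
         * whole process group carries the same identity. */
        execl("/bin/sh", "sh", (char *)NULL);
        perror("execl");
        return 1;
    }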

ACLs

NFSv4 has ACLs, but Linux filesystems only support "posix" ACLs. An attempt was made to support NFSv4 ACLs ("richacls"), but it hasn't been accepted upstream. So knfsd is stuck mapping between NFSv4 and posix ACLs. The result is awkward to use, but it should be able to do most of the things people do with AFS ACLs.

(AFS ACLs, unfortunately, are yet again a third incompatible style of ACL.)

Other possible avenues for improvement: add posix ACL support to the NFSv4 spec? Look for other ways to implement NFSv4 ACL support?

user and group management

AFS has a "protection server" and you can communicate with it using the pts command, which allows you to set up users and groups and add ACEs for machines. In particular, I think it allows non-superusers to do things like create groups of their friends.

https://docs.openafs.org/Reference/1/pts.html

global namespace

On an AFS client by default you can look up something like /afs/umich.edu/... and reach files kept in AFS anywhere.

We have automounting support, and NFS has standards for DNS discovery of servers, so in theory this is all possible. Handling kerberos users across domains would be interesting.

migrating existing AFS installations to NFS

Once NFS does everything AFS does, there's still the question of how you'd migrate over a particular installation.

There's a standard AFS dump format (used by vos dump/vos restore) that might be worth looking at. It looks simple enough. Maybe look at https://github.com/openafs-contrib/cmu-dumpscan (strange license, though).

The differences between NFSv4, posix, and AFS ACLs make perfect translation impossible. But it would still be possible to make a best-effort automatic translation and produce a report warning of any resulting permission differences. https://docs.openafs.org/UserGuide/HDRWQ46.html
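
A hedged sketch of what such a best-effort translation might look like: collapse one AFS directory rights string (e.g. "rlidwka") into posix rwx bits and report the rights that don't translate. The mapping choices are illustrative only, not a finished conversion tool.

    /* Hedged sketch: best-effort AFS-rights-to-posix-mode translation with a
     * report of what was lost.  Mapping decisions are illustrative. */
    #include <stdio.h>
    #include <string.h>

    struct result {
        unsigned int mode;      /* posix rwx bits for this principal */
        char lost[8];           /* AFS rights with no posix equivalent */
    };

    static struct result afs_to_posix(const char *rights)
    {
        struct result res = { 0, "" };

        if (strchr(rights, 'r'))
            res.mode |= 04;                     /* read file contents */
        if (strchr(rights, 'l'))
            res.mode |= 04 | 01;                /* lookup: list + traverse */
        if (strchr(rights, 'w') || strchr(rights, 'i') || strchr(rights, 'd'))
            res.mode |= 02;                     /* write/insert/delete */

        /* 'a' (administer) and 'k' (lock) have no posix counterpart; report
         * them so an admin can review those directories by hand. */
        if (strchr(rights, 'a'))
            strcat(res.lost, "a");
        if (strchr(rights, 'k'))
            strcat(res.lost, "k");

        return res;
    }

    int main(void)
    {
        const char *who = "alice", *rights = "rlidwka";  /* example ACE */
        struct result res = afs_to_posix(rights);

        printf("%s: AFS \"%s\" -> posix 0%o%s%s\n", who, rights, res.mode,
               res.lost[0] ? ", not translatable: " : "", res.lost);
        return 0;
    }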
