NFS for AFS users
From Linux NFS
Revision as of 20:45, 17 August 2020
This page tracks some of the obstacles that might keep an AFS user from using NFS instead.
Missing Features
volume management and migration
For migration we need to preserve filehandles, so we need to migrate at the block level or use fs-specific send/receive. The protocol side can be handled by migrating only entire servers or containers, so that a migration can be treated as a server reboot.
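To make the filehandle constraint concrete, here is a toy model (the encoding is hypothetical, not knfsd's real one) of why block-level migration keeps client handles valid while a file-level copy does not:

```python
# Toy model (not the real knfsd encoding): an NFS filehandle is an opaque
# token that typically encodes a filesystem id plus an inode number.
# If migration changes either, clients holding old handles get ESTALE.

def make_filehandle(fsid: int, inode: int) -> bytes:
    # Hypothetical encoding, for illustration only.
    return fsid.to_bytes(8, "big") + inode.to_bytes(8, "big")

# Original server: a file lives at inode 1042 on fsid 7.
fh = make_filehandle(fsid=7, inode=1042)

# Block-level migration (LVM mirroring, fs-specific send/receive)
# preserves fsid and inode numbers, so the old handle stays valid:
assert make_filehandle(fsid=7, inode=1042) == fh

# A file-level copy (cp -a, rsync) allocates fresh inodes, so the old
# handle no longer names the file -- the client would see ESTALE:
assert make_filehandle(fsid=7, inode=2001) != fh
```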
Between LVM and (container-respecting) knfsd, we have a lot of the necessary pieces, but there's at a minimum a lot of tooling and documentation to write before this is usable.
There's a "Volume Location Database" that tracks where (machine and partition) a volume is located.
Fast clones using COW are supported, along with complete copies on other machines.
Currently there can be only one writeable version of a volume, but multiple read-only versions (which all have to be identical). They can be on different servers.
There can also be a 'backup' volume which is just, say, a daily temporary read-only snapshot of a RW volume and has to be located on the same machine.
When a RW volume is "released" (snapshotted) to the RO volumes, all the RO volumes update simultaneously and atomically. The users, in theory, don't notice as the volumes don't go offline - and then they see all the changes happen at once. There is coordination handling for when one or more of the fileservers or the Volume Location servers are offline.
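The release sequence above can be modeled as a two-phase flip. This is only a sketch to make the ordering concrete; the names and structure are illustrative, not OpenAFS's actual design:

```python
# Toy model of an AFS-style volume release: each RO site stages the new
# snapshot off to the side while clients still see the old data, and only
# then does the location database flip every site to the new release.
# (In the toy, phase 2 is a plain loop; the real system needs coordination
# so the flip appears atomic even if a server is briefly offline.)

class Site:
    def __init__(self):
        self.serving = None    # snapshot currently visible to clients
        self.staged = None     # incoming snapshot, not yet visible

def release(vldb: dict, volume: str, snapshot: str, sites: list):
    # Phase 1: stage the snapshot everywhere; clients notice nothing.
    for site in sites:
        site.staged = snapshot
    # Phase 2: flip all sites and record the new release in the VLDB.
    for site in sites:
        site.serving, site.staged = site.staged, None
    vldb[volume] = snapshot

vldb = {}
sites = [Site(), Site(), Site()]
release(vldb, "home.alice", "release-2", sites)
assert all(s.serving == "release-2" for s in sites)
assert vldb["home.alice"] == "release-2"
```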
Volumes can be migrated between machines while in active use, in theory without users noticing anything.
This is fairly easy to achieve for RO volumes, since when multiple servers already serve identical data, adding one more server of that data is no problem - migrating a live RW volume is the real trick, and I don't know how OpenAFS actually does it.
There are moves afoot to add multi-hosted RW volumes, but I'm not sure how that will work, and it may involve Ceph integration. In any case it's not there yet.
In fact, logical volumes are something AFS users particularly like. You can make a volume for a purpose: give particular people access to it, give it some storage, expand and contract it, and move it around. It's intrinsically quota'd.
(I think volume and quota management is all unprivileged, too, is that right? So administrators can delegate that work somehow.)
Possible tools at our disposal: LVM, btrfs, [https://tools.ietf.org/html/rfc5661#section-11.9 fs_locations], [https://tools.ietf.org/html/rfc5661#section-11.10 fs_locations_info], [https://datatracker.ietf.org/doc/rfc8435/ pnfs flexfiles], Kubernetes. [https://wiki.linux-nfs.org/wiki/index.php/FedFsUtilsProject FedFS] may be dormant but may be worth keeping in mind.
[https://docs.openafs.org/AdminGuide/HDRWQ177.html AFS Administrator's guide, Chapter 5: Managing Volumes]
PAGs
PAGs: AFS allows a group of processes to share a common identity, different from the local uid, for the purposes of accessing an AFS filesystem: https://docs.openafs.org/AdminGuide/HDRWQ63.html
Dave Howells says: "This is why I added session keyrings. You can run a process in a new keyring and give it new tokens. systemd kind of stuck a spike in that, though, by doing their own incompatible thing with their user manager service....
NFS would need to do what the in-kernel AFS client does and call request_key() on entry to each filesystem method that doesn't take a file* and use that to cache the credentials it is using. If there is no key, it can make one up on the spot and stick the uid/gid/groups in there. This would then need to be handed down to the sunrpc protocol to define the security creds to use.
The key used to open a file would then need to be cached in the file struct private data."
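The PAG idea - credentials attached to a group of processes rather than to a uid - can be sketched in userspace. This is only a conceptual model of the lookup described above, not the kernel keyrings or request_key() API:

```python
# Conceptual model of a PAG / session keyring: file-access credentials
# are looked up by a per-session token rather than by uid, so two
# processes running as the same uid can hold different credentials.

import itertools

_pag_ids = itertools.count(1)
_creds = {}  # pag_id -> cached credential blob

def new_pag() -> int:
    """Start a fresh credential group (like AFS pagsh / a new session keyring)."""
    return next(_pag_ids)

def set_token(pag_id: int, token: str):
    _creds[pag_id] = token

def lookup_creds(pag_id: int, uid: int) -> str:
    # Like the request_key() flow sketched above: prefer the session's
    # cached token; with no key, synthesize one from uid/gid/groups.
    return _creds.get(pag_id, f"unix-creds(uid={uid})")

alice_shell = new_pag()
alice_cron = new_pag()
set_token(alice_shell, "krb5-ticket-for-alice")

# Same uid, different PAGs, different effective credentials:
assert lookup_creds(alice_shell, 1000) == "krb5-ticket-for-alice"
assert lookup_creds(alice_cron, 1000) == "unix-creds(uid=1000)"
```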
ACLs
NFSv4 has ACLs, but Linux filesystems only support "posix" ACLs. An attempt was made to support NFSv4 ACLs ("richacls") but hasn't been accepted upstream. So knfsd is stuck mapping between NFSv4 and posix ACLs. Posix ACLs are more coarse-grained than NFSv4 ACLs, so information can be lost when a user on an NFSv4 client sets an ACL. This makes ACLs confusing and less useful.
There are other servers that support full NFSv4 ACLs, so users of those servers are better off. Our client-side tools could still use some improvements for those users, though.
AFS ACLs, unfortunately, are yet a third style of ACL, incompatible with both POSIX and NFSv4 ACLs. They are more fine-grained than POSIX ACLs and probably closer to NFSv4 ACLs overall.
To do:
- make NFSv4 ACL tools more usable:
- Map groups of NFSv4 permission bits to read, write, and execute permissions so we only have to display the simpler bits in common cases
- Look for other opportunities to simplify display and editing of NFSv4 ACLs
- Add NFSv4 ACL support to graphical file managers like GNOME Files
- Adopt a commandline interface that's more similar to the posix acl utilities.
- Perhaps also look into [https://github.com/kvaneesh/richacl-tools richacl tools] as an alternative starting point to nfs4-acl-tools.
- In general, try to make NFSv4 ACL management more similar to management of existing posix ACLs.
- For AFS->NFS transition:
- Write code that translates AFS ACLs to NFSv4 ACLs. It should be possible to do this with little or no loss of information for servers with full NFSv4 ACL support.
- For migrations to Linux knfsd, this will effectively translate AFS ACLs to POSIX ACLs, and information will be lost. Test this case. The conversion tool should be able to fetch the ACLs after setting them, compare results, and summarize the results of the conversion in a way that's usable even for conversions of large numbers of files. I believe that setting an ACL is enough to invalidate the client's ACL cache, so a subsequent fetch of an ACL should show the results of any server-side mapping. But, test this to make sure. More details on AFS to NFSv4 ACL conversion
- more ambitious options:
- Try reviving Rich ACLs. Maybe we could convince people this time. Or maybe there's a different approach that would work. Maybe we could find a more incremental route, e.g. by adding some features of richacls to POSIX ACLs, such as the separation of directory write permissions into add and delete, and of file write permissions into modify and append.
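The "collapse to r/w/x for display" idea in the to-do list could look like the sketch below. The bit values are the NFSv4 ACE mask constants from RFC 5661 section 6.2.1.3; which bits count as plain r, w, and x is a display-policy choice here, not something the protocol defines:

```python
# Sketch: summarize an NFSv4 allow-ACE mask as rwx for simple display.

ACE4_READ_DATA   = 0x00000001
ACE4_WRITE_DATA  = 0x00000002
ACE4_APPEND_DATA = 0x00000004
ACE4_EXECUTE     = 0x00000020

R_BITS = ACE4_READ_DATA
W_BITS = ACE4_WRITE_DATA | ACE4_APPEND_DATA
X_BITS = ACE4_EXECUTE

def summarize(mask: int) -> str:
    """Render a mask as rwx; a trailing '+' flags extra bits that rwx
    cannot express, so the simple view never silently hides them."""
    s = ""
    s += "r" if mask & R_BITS else "-"
    s += "w" if (mask & W_BITS) == W_BITS else "-"
    s += "x" if mask & X_BITS else "-"
    extra = mask & ~(R_BITS | W_BITS | X_BITS)
    return s + ("+" if extra else "")

assert summarize(ACE4_READ_DATA | ACE4_EXECUTE) == "r-x"
assert summarize(ACE4_WRITE_DATA) == "---"   # write without append: not plain 'w'
assert summarize(0x00040001) == "r--+"       # READ_DATA plus WRITE_ACL
```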
user and group management
AFS has a "protection server" and you can communicate with it using the pts command which allows you to set up users and groups and add ACEs for machines. In particular, I think it allows non-superusers to do things like create groups of their friends.
https://docs.openafs.org/Reference/1/pts.html
global namespace
On an AFS client by default you can look up something like /afs/umich.edu/... and reach files kept in AFS anywhere.
We have automounting support, and NFS has standards for DNS discovery of servers, so in theory this is all possible. Handling Kerberos users across domains would be interesting.
migrating existing AFS installations to NFS
Once NFS does everything AFS does, there's still the question of how you'd migrate over a particular installation.
There's a standard AFS dump format (used by vos dump/vos restore) that might be worth looking at. It looks simple enough. Maybe also look at cmu-dumpscan.
The differences between NFSv4, posix, and AFS ACLs make perfect translation impossible. But it would still be possible to make a best-effort automatic translation and produce a report warning of any resulting permission differences. https://docs.openafs.org/UserGuide/HDRWQ46.html
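A best-effort translation with a loss report could start along the lines below. The per-right mapping here is a plausible starting point, not a settled one - for example, AFS 'k' (lock) has no direct NFSv4 mask bit, and 'l' (lookup) is approximated by list, traverse, and read-ACL:

```python
# Sketch: translate AFS rlidwka rights to a set of NFSv4 permission
# names, collecting warnings wherever the translation loses information.

AFS_TO_NFS4 = {
    "r": {"READ_DATA", "READ_ATTRIBUTES"},
    "l": {"LIST_DIRECTORY", "EXECUTE", "READ_ACL"},
    "i": {"ADD_FILE", "ADD_SUBDIRECTORY"},
    "d": {"DELETE_CHILD"},
    "w": {"WRITE_DATA", "APPEND_DATA", "WRITE_ATTRIBUTES"},
    "a": {"WRITE_ACL"},
    # "k" (lock) intentionally absent: NFSv4 grants locks via read/write.
}

def translate(afs_rights: str):
    granted, warnings = set(), []
    for right in afs_rights:
        if right in AFS_TO_NFS4:
            granted |= AFS_TO_NFS4[right]
        else:
            warnings.append(f"AFS right '{right}' has no NFSv4 equivalent")
    return granted, warnings

granted, warnings = translate("rlidwka")
assert "DELETE_CHILD" in granted
assert warnings == ["AFS right 'k' has no NFSv4 equivalent"]

# Against Linux knfsd the result would additionally collapse to posix
# ACLs: 'i' and 'd' would both fold into directory write permission --
# exactly the kind of loss the conversion report should flag.
```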