Reboot recovery for re-export servers

From Linux NFS

Revision as of 20:39, 3 December 2020 by Bfields (Talk | contribs)

Maybe this would work:

Terminology: end clients ---> re-export server ---> original server

We configure the original server with a lease time long enough to allow its clients to reboot without their leases expiring.

As always, when a server takes an EXCHANGE_ID, confirmed by a CREATE_SESSION, it removes that client's state.

But it actually hangs on to the underlying locks, for either another lease period or until the client sends RECLAIM_COMPLETE.

During that time, it won't honor the old stateids, but it will accept reclaims from the client as long as they're for state it already knows the client has.
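The original server's behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions, not knfsd code: the class, its method names, the integer clock, and the use of RECLAIM_BAD for a reclaim of unknown state are all hypothetical choices made for the example.

```python
class OriginalServer:
    """Toy model of the original server's reclaim window (hypothetical names)."""

    def __init__(self, lease_time):
        self.lease_time = lease_time
        self.active = {}  # client -> locks backed by valid stateids
        self.held = {}    # client -> (locks kept for reclaim, deadline)

    def lock(self, client, lock):
        self.active.setdefault(client, set()).add(lock)

    def stateid_valid(self, client, lock):
        return lock in self.active.get(client, set())

    def client_restarted(self, client, now):
        # EXCHANGE_ID confirmed by CREATE_SESSION: the old stateids stop
        # working, but the underlying locks are kept for one more lease period.
        still_held = self.held.get(client, (set(), 0))[0]
        locks = self.active.pop(client, set()) | still_held
        self.held[client] = (locks, now + self.lease_time)

    def reclaim(self, client, lock, now):
        locks, deadline = self.held.get(client, (None, 0))
        if locks is None or now >= deadline:
            self.held.pop(client, None)
            return "NO_GRACE"
        if lock not in locks:
            # Assumed error for state the server never knew the client had.
            return "RECLAIM_BAD"
        locks.remove(lock)
        self.lock(client, lock)
        return "OK"

    def reclaim_complete(self, client):
        # The client is done reclaiming; drop whatever it did not reclaim.
        self.held.pop(client, None)
```

In this sketch a restart moves the client's locks from `active` to `held`, so old stateids fail while reclaims of known locks still succeed until the deadline passes or `reclaim_complete` is called.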

The client doesn't send an immediate RECLAIM_COMPLETE; it may still have locks to reclaim, even though it doesn't know what they are yet.

The client delays sending RECLAIM_COMPLETE until either

1) A local application attempts a lock or an open, in which case it sends the RECLAIM_COMPLETE before attempting the operation.
2) knfsd starts up and ends the grace period.
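The two triggers above might be modelled roughly as follows. This is a minimal sketch, assuming a hypothetical `ReexportClient` class and a `send` callback that stands in for transmitting an operation to the original server; neither name comes from the kernel.

```python
class ReexportClient:
    """Toy model of the NFS client on the re-export server (hypothetical)."""

    def __init__(self, send):
        self.send = send  # callable taking the name of the operation to send
        self.reclaim_complete_sent = False

    def _finish_reclaim(self):
        # RECLAIM_COMPLETE is sent at most once per client instance.
        if not self.reclaim_complete_sent:
            self.send("RECLAIM_COMPLETE")
            self.reclaim_complete_sent = True

    def local_open_or_lock(self, op):
        # Case 1: a local application forces RECLAIM_COMPLETE out first.
        self._finish_reclaim()
        self.send(op)

    def knfsd_grace_ended(self):
        # Case 2: knfsd has started and its grace period is over.
        self._finish_reclaim()
```

Whichever trigger fires first sends the RECLAIM_COMPLETE; the other then finds the flag set and sends nothing extra.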

In the first case, if the re-export server subsequently starts, it skips any grace period and returns NO_GRACE to any reclaims by end clients. This is probably undesirable, so local applications should not be permitted access to the filesystem on the re-export server until knfsd is started.

Once the server has started, any reclaim operations received by the re-export server are satisfied by sending corresponding reclaims to the original server.

It may also send non-reclaim operations to the original server (whether they originate from local applications or from end clients); it's up to the original server to process them normally or return GRACE if it needs to wait for reclaims before deciding how to process them.

Call a client or server that implements the above "re-export friendly".

If a client gets NO_GRACE replies to its reclaim calls, it may conclude either that the server is not re-export friendly or that it has just missed the grace period. In both cases it can end its own grace period early and deny any reclaim attempts from end clients.
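Putting the re-export server's side together, the handling of requests from end clients could look something like this. It is a sketch only: `ReexportServer`, the `upstream` callback, and the plain status strings are assumptions made for illustration, and real knfsd state handling is considerably more involved.

```python
class ReexportServer:
    """Toy model of knfsd on the re-export server (hypothetical names)."""

    def __init__(self, upstream, reclaim_complete_already_sent=False):
        # upstream(op, reclaim) -> status string from the original server
        self.upstream = upstream
        # If a local application already forced RECLAIM_COMPLETE out,
        # knfsd starts without offering a grace period.
        self.in_grace = not reclaim_complete_already_sent

    def handle(self, op, reclaim=False):
        if reclaim:
            if not self.in_grace:
                return "NO_GRACE"
            # Satisfy the reclaim by sending a corresponding reclaim upstream.
            status = self.upstream(op, True)
            if status == "NO_GRACE":
                # The original server is not re-export friendly, or its
                # grace period was missed: end our own grace period early.
                self.in_grace = False
            return status
        # Non-reclaim operations are passed through; the original server
        # may answer "GRACE" if it has to wait for reclaims first.
        return self.upstream(op, False)
```

Once `in_grace` is cleared, every later reclaim from an end client is answered with NO_GRACE without contacting the original server.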
