Reboot recovery for re-export servers
From Linux NFS
Maybe this would work:
Terminology: end clients ---> re-export server ---> original server
We configure the original server with a lease time sufficient to allow its clients to reboot before their leases expire.
As always, when any server receives an EXCHANGE_ID from a client that has rebooted (same owner, new verifier), confirmed by a CREATE_SESSION, it removes that client's old state.
But it actually hangs on to the underlying locks until either another lease period has passed or the client sends RECLAIM_COMPLETE.
During that time, it won't honor the old stateids, but it will accept reclaims from the client as long as they're for state it already knows the client has.
The client doesn't send an immediate RECLAIM_COMPLETE; it may still have locks to reclaim, even though it doesn't know what they are yet.
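To make the original server's side concrete, here is a minimal Python sketch of its per-client bookkeeping. Everything in it is hypothetical: `OriginalServer`, `new_client_instance()` (standing in for an EXCHANGE_ID with a new verifier, confirmed by CREATE_SESSION) and the other names are made up for illustration. Locks are just names, conflict checking between clients is left out, and returning NFS4ERR_RECLAIM_BAD for a reclaim of state the server never saw is an assumption of the sketch.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Status names borrowed from NFSv4.1 (RFC 5661). Returning RECLAIM_BAD for a
# reclaim of unknown state is this sketch's assumption.
OK = "NFS4_OK"
ERR_BAD_STATEID = "NFS4ERR_BAD_STATEID"
ERR_RECLAIM_BAD = "NFS4ERR_RECLAIM_BAD"
ERR_NO_GRACE = "NFS4ERR_NO_GRACE"


@dataclass
class ClientRecord:
    stateids: Set[int] = field(default_factory=set)      # stateids honored right now
    locks: Dict[str, int] = field(default_factory=dict)  # lock -> current stateid
    retained: Set[str] = field(default_factory=set)      # locks held over a reboot
    grace_until: Optional[float] = None                  # end of the reclaim window


class OriginalServer:
    """Toy model of the proposed original-server behavior (hypothetical)."""

    def __init__(self, lease_time: float) -> None:
        self.lease_time = lease_time
        self.clients: Dict[str, ClientRecord] = {}
        self._next_stateid = itertools.count(1)

    def _record(self, client: str, now: float) -> ClientRecord:
        rec = self.clients.setdefault(client, ClientRecord())
        # One lease period after the reboot, drop whatever was not reclaimed.
        if rec.grace_until is not None and now >= rec.grace_until:
            rec.retained.clear()
            rec.grace_until = None
        return rec

    def _grant(self, rec: ClientRecord, lock: str) -> int:
        sid = next(self._next_stateid)
        rec.locks[lock] = sid
        rec.stateids.add(sid)
        return sid

    def acquire(self, client: str, lock: str, now: float) -> int:
        """Ordinary lock request; conflict checks are omitted for brevity."""
        return self._grant(self._record(client, now), lock)

    def new_client_instance(self, client: str, now: float) -> None:
        """EXCHANGE_ID from a rebooted client, confirmed by CREATE_SESSION."""
        rec = self._record(client, now)
        rec.retained |= set(rec.locks)  # keep the locks themselves...
        rec.locks.clear()
        rec.stateids.clear()            # ...but stop honoring the old stateids
        rec.grace_until = now + self.lease_time

    def check_stateid(self, client: str, sid: int, now: float) -> str:
        return OK if sid in self._record(client, now).stateids else ERR_BAD_STATEID

    def reclaim(self, client: str, lock: str, now: float) -> Tuple[str, Optional[int]]:
        rec = self._record(client, now)
        if rec.grace_until is None:
            return ERR_NO_GRACE, None
        if lock not in rec.retained:
            return ERR_RECLAIM_BAD, None  # only state we know the client had
        rec.retained.discard(lock)
        return OK, self._grant(rec, lock)

    def reclaim_complete(self, client: str, now: float) -> None:
        rec = self._record(client, now)
        rec.retained.clear()            # unreclaimed locks are released now
        rec.grace_until = None


srv = OriginalServer(lease_time=90.0)
old = srv.acquire("reexport", "fileA", now=0)
srv.new_client_instance("reexport", now=10)
print(srv.check_stateid("reexport", old, now=11))   # NFS4ERR_BAD_STATEID
print(srv.reclaim("reexport", "fileA", now=12)[0])  # NFS4_OK
print(srv.reclaim("reexport", "fileB", now=13)[0])  # NFS4ERR_RECLAIM_BAD
srv.reclaim_complete("reexport", now=14)
print(srv.reclaim("reexport", "fileA", now=15)[0])  # NFS4ERR_NO_GRACE
```

The retained set is what lets the server keep the underlying locks while still rejecting the old stateids; it empties either on RECLAIM_COMPLETE or when the lease-length window runs out.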
The client delays sending RECLAIM_COMPLETE until either
- A local application attempts a lock or an open, in which case the client sends RECLAIM_COMPLETE before performing the operation.
- knfsd starts up and ends its grace period.
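A minimal sketch of this rule on the re-export server's NFS client, assuming a hypothetical `ReexportNfsClient` whose `send` callback stands in for issuing one operation to the original server. Nothing is sent at reboot; whichever trigger comes first sends the single RECLAIM_COMPLETE.

```python
from typing import Callable, List


class ReexportNfsClient:
    """Hypothetical model of the NFS client on the re-export server after it
    has rebooted and established a new client ID with the original server."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send                 # issues one operation to the original server
        self.reclaim_complete_sent = False

    def _finish_reclaims(self) -> None:
        # RECLAIM_COMPLETE is sent at most once per client instance.
        if not self.reclaim_complete_sent:
            self.reclaim_complete_sent = True
            self.send("RECLAIM_COMPLETE")

    def local_open_or_lock(self, op: str) -> None:
        """Trigger 1: a local application opens or locks a file."""
        self._finish_reclaims()          # must precede the new operation
        self.send(op)

    def knfsd_grace_ended(self) -> None:
        """Trigger 2: knfsd has started and ended its grace period."""
        self._finish_reclaims()


sent: List[str] = []
client = ReexportNfsClient(sent.append)
client.local_open_or_lock("OPEN")
client.local_open_or_lock("LOCK")
client.knfsd_grace_ended()
print(sent)  # ['RECLAIM_COMPLETE', 'OPEN', 'LOCK']
```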
In the first case, if knfsd on the re-export server subsequently starts, it skips any grace period and returns NO_GRACE to any reclaims by end clients. This is probably undesirable, so local applications should not be permitted access to the filesystem on the re-export server until knfsd is started. (Note that this is a preexisting requirement: a local application with access to an exported filesystem could lock and unlock a file without knowing that an NFS client is about to reclaim a preexisting conflicting lock, resulting in incorrect lock behavior.)
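The ordering problem in the first case can be modeled in a few lines. In this hypothetical sketch, `gate_local_access` stands for whatever mechanism keeps local applications off the filesystem until knfsd starts, and raising `PermissionError` is only a placeholder for that refusal.

```python
class ReexportServer:
    """Hypothetical model of how knfsd start-up interacts with local access."""

    def __init__(self, gate_local_access: bool) -> None:
        self.gate_local_access = gate_local_access
        self.knfsd_started = False
        self.reclaim_complete_sent = False  # by the NFS client, upstream
        self.in_grace = False

    def local_open(self) -> None:
        if self.gate_local_access and not self.knfsd_started:
            raise PermissionError("filesystem not available until knfsd starts")
        # A local open is one of the triggers for RECLAIM_COMPLETE upstream.
        self.reclaim_complete_sent = True

    def start_knfsd(self) -> None:
        self.knfsd_started = True
        # Once RECLAIM_COMPLETE has gone upstream, no reclaim can succeed
        # there, so knfsd skips its grace period entirely.
        self.in_grace = not self.reclaim_complete_sent

    def end_client_reclaim(self, file: str) -> str:
        if not self.in_grace:
            return "NFS4ERR_NO_GRACE"
        return f"forwarded reclaim of {file}"


ungated = ReexportServer(gate_local_access=False)
ungated.local_open()
ungated.start_knfsd()
print(ungated.end_client_reclaim("f"))  # NFS4ERR_NO_GRACE

gated = ReexportServer(gate_local_access=True)
try:
    gated.local_open()
except PermissionError:
    pass                                # refused: knfsd not started yet
gated.start_knfsd()
print(gated.end_client_reclaim("f"))    # forwarded reclaim of f
```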
Once knfsd has started, any reclaim operations that end clients send to the re-export server are satisfied by sending corresponding reclaims to the original server.
The re-export server may also send non-reclaim operations to the original server (whether they originate from local applications or from end clients); it's up to the original server to process them normally or to return GRACE if it needs to wait for reclaims before deciding how to process them.
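A sketch of the forwarding step, assuming a hypothetical `upstream(file, reclaim)` callable that sends a lock to the original server and returns its status; this is not a real kernel interface. The re-export server only preserves the reclaim flag, and this sketch simply hands the original server's answer, including NFS4ERR_GRACE, back to the caller.

```python
from typing import Callable, List, Tuple

# upstream(file, reclaim) -> NFSv4 status from the original server.
# This signature is a stand-in, not a real kernel interface.
Upstream = Callable[[str, bool], str]


class ReexportForwarder:
    """Hypothetical sketch of the re-export server once knfsd is running."""

    def __init__(self, upstream: Upstream) -> None:
        self.upstream = upstream

    def end_client_lock(self, file: str, reclaim: bool) -> str:
        # An end client's reclaim becomes a reclaim to the original server;
        # whether it succeeds is decided upstream.
        return self.upstream(file, reclaim)

    def local_lock(self, file: str) -> str:
        # Local applications never reclaim.
        return self.upstream(file, False)


calls: List[Tuple[str, bool]] = []


def fake_original(file: str, reclaim: bool) -> str:
    """Original server still waiting for this client's reclaims."""
    calls.append((file, reclaim))
    return "NFS4_OK" if reclaim else "NFS4ERR_GRACE"


fwd = ReexportForwarder(fake_original)
print(fwd.end_client_lock("a", reclaim=True))   # NFS4_OK
print(fwd.end_client_lock("b", reclaim=False))  # NFS4ERR_GRACE
print(fwd.local_lock("c"))                      # NFS4ERR_GRACE
print(calls)  # [('a', True), ('b', False), ('c', False)]
```

Keeping the decision upstream is the point: the re-export server does not need to know which locks are safe to grant during the original server's grace period.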
Call a client or server that implements the above "re-export friendly".
If a client gets NO_GRACE replies to its reclaim calls, it may conclude either that the server is not re-export friendly or that it has just missed the grace period; either way, it can end its own grace period early and deny any further reclaim attempts from end clients.
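A sketch of that fallback, with a hypothetical `ReexportGrace` wrapping the re-export server's grace period and an `upstream_reclaim` callable standing in for a reclaim sent to the original server. After the first NFS4ERR_NO_GRACE from upstream it stops forwarding reclaims and answers end clients itself.

```python
from typing import Callable


class ReexportGrace:
    """Hypothetical: the re-export server's grace period, driven by upstream."""

    def __init__(self, upstream_reclaim: Callable[[str], str]) -> None:
        self.upstream_reclaim = upstream_reclaim
        self.in_grace = True
        self.upstream_calls = 0

    def end_client_reclaim(self, file: str) -> str:
        if not self.in_grace:
            return "NFS4ERR_NO_GRACE"  # deny locally, no round trip
        self.upstream_calls += 1
        status = self.upstream_reclaim(file)
        if status == "NFS4ERR_NO_GRACE":
            # Upstream is not re-export friendly, or its grace period is
            # already over: either way no later reclaim can succeed.
            self.in_grace = False
        return status


g = ReexportGrace(lambda file: "NFS4ERR_NO_GRACE")
print(g.end_client_reclaim("a"))     # NFS4ERR_NO_GRACE
print(g.end_client_reclaim("b"))     # NFS4ERR_NO_GRACE
print(g.in_grace, g.upstream_calls)  # False 1
```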
I think this requires no new protocol. Maybe EXCHANGE_ID flags to negotiate the new behavior would help, but I don't see a reason they're necessary.
I think the only new knfsd<->nfs interface required is a reclaim flag on kernel lock requests and a new reclaim_complete() export operation that tells the client when the re-export server believes it has finished reclaiming.
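The two interface additions might look roughly like this. `LockRequest.reclaim`, `ReexportOps`, `ToyNfsClientOps` and `knfsd_grace_period()` are hypothetical Python stand-ins for kernel structures and hooks, not existing kernel APIs; the only protocol fact relied on is that an NFSv4 LOCK request carries a `reclaim` boolean.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class LockRequest:
    file: str
    start: int
    length: int
    reclaim: bool = False  # proposed flag: this lock is being reclaimed


class ReexportOps(Protocol):
    """Hypothetical shape of the two knfsd -> nfs client hooks."""

    def lock(self, req: LockRequest) -> str: ...

    def reclaim_complete(self) -> None: ...


class ToyNfsClientOps:
    """Toy nfs-client side that records the NFSv4.1 operations it would send."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self._done = False

    def lock(self, req: LockRequest) -> str:
        # NFSv4 LOCK carries a reclaim boolean; the flag maps onto it.
        self.sent.append(f"LOCK(reclaim={req.reclaim})")
        return "NFS4_OK"

    def reclaim_complete(self) -> None:
        if not self._done:  # send RECLAIM_COMPLETE only once
            self._done = True
            self.sent.append("RECLAIM_COMPLETE")


def knfsd_grace_period(ops: ReexportOps, reclaims: List[LockRequest]) -> None:
    """Replay end-client reclaims, then tell the client reclaiming is over."""
    for req in reclaims:
        ops.lock(req)
    ops.reclaim_complete()


ops = ToyNfsClientOps()
knfsd_grace_period(ops, [LockRequest("f", 0, 10, reclaim=True)])
ops.lock(LockRequest("g", 0, 10))
print(ops.sent)
# ['LOCK(reclaim=True)', 'RECLAIM_COMPLETE', 'LOCK(reclaim=False)']
```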
I believe it would also work for the original server to return GRACE to any non-reclaim lock requests from the time it drops the locks of a client confirming a new EXCHANGE_ID until the time that client sends a RECLAIM_COMPLETE (or a lease period passes). So no client could take new locks while a rebooting client reclaims. That implementation would be simpler and I don't think it should be prohibited. But I suspect it's not much more difficult for the original server to hang on to the locks, and it will be worth the trouble for most servers to allow non-conflicting access to continue without delay.
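The difference between the two original-server policies fits in two functions. This is a hypothetical sketch: locks are byte ranges `[start, end)` on a named file, all locks are treated as exclusive, `reclaiming` means the rebooting client's reclaim window is open, and `PROCEED` means the request goes on to the server's normal conflict checks.

```python
from typing import Iterable, NamedTuple

GRACE = "NFS4ERR_GRACE"
PROCEED = "PROCEED"  # hand the request to normal conflict checking


class Lock(NamedTuple):
    file: str
    start: int
    end: int  # exclusive; all locks treated as exclusive for simplicity


def overlaps(a: Lock, b: Lock) -> bool:
    return a.file == b.file and a.start < b.end and b.start < a.end


def strict_policy(req: Lock, held_over: Iterable[Lock], reclaiming: bool) -> str:
    """Simpler variant: no new locks at all while a client is reclaiming.
    (req and held_over are unused; the signature matches the other policy.)"""
    return GRACE if reclaiming else PROCEED


def held_over_policy(req: Lock, held_over: Iterable[Lock], reclaiming: bool) -> str:
    """Preferred variant: only requests that conflict with a lock still held
    for the rebooting client have to wait."""
    if reclaiming and any(overlaps(req, h) for h in held_over):
        return GRACE
    return PROCEED


held = [Lock("f", 0, 10)]
for req in (Lock("f", 5, 15), Lock("g", 0, 10)):
    print(req.file, strict_policy(req, held, True), held_over_policy(req, held, True))
# f NFS4ERR_GRACE NFS4ERR_GRACE
# g NFS4ERR_GRACE PROCEED
```

The second request, on a different file, shows what the simpler policy gives up: it waits even though nothing held over for the rebooting client could conflict with it.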