Server 4.0 and 4.1 issues
From Linux NFS
Revision as of 15:08, 3 March 2010
This is my attempt at a description of the minimum we should have implemented for the 4.1 server before we can consider it sufficiently in compliance with the spec that it shouldn't cause major headaches for future clients. The rough summary:
- Trunking looks necessary.
- It looks like we may get away without SSV for now, but I think we should still support at least SP4_MACH_CRED.
- We should support backchannel security parameters.
- There are a few operations (RECLAIM_COMPLETE, DESTROY_CLIENTID, ...) that are mandatory, and for which minimal implementations should be fairly easy.
- We need to do a more careful job of checking when DRC limits on reply sizes are exceeded.
- We need to communicate callback errors to the client correctly.
- Allow two new ACL mask bits (ACE4_WRITE_RETENTION and ACE4_WRITE_RETENTION_HOLD). Probably just ignore them on any setattr of an ACL, and always return them as zero on a read of any ACL.
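The ACL item in that list amounts to masking two bits. Here is a minimal Python sketch (Python because that is what pynfs uses; this is not nfsd code). It assumes the mask values defined for these bits in RFC 5661. The helper name `strip_retention_bits` is hypothetical.

```python
# ACE access-mask bits added in NFSv4.1 (values as defined in RFC 5661).
ACE4_WRITE_RETENTION = 0x00000200
ACE4_WRITE_RETENTION_HOLD = 0x00000400
RETENTION_BITS = ACE4_WRITE_RETENTION | ACE4_WRITE_RETENTION_HOLD


def strip_retention_bits(access_mask: int) -> int:
    """Clear the retention bits without rejecting the ACE.

    Used both when storing an ACL from SETATTR (the bits are ignored) and
    when encoding an ACL for GETATTR (the bits always read back as zero).
    """
    return access_mask & ~RETENTION_BITS
```

Both directions only clear the bits, so one helper covers setattr and getattr. Nothing is ever stored that could later be reported back.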
I think we should get at least these problems fixed before merging optional features (including pNFS).
This is what I got mainly from looking through preexisting todo lists and parts of the spec. I probably overlooked something. I'll continue looking, but any help is appreciated.
There are also some problems inherited from the 4.0 implementation. I don't want those preexisting problems to hold up pNFS submission, but they still need to be kept as priorities.
Some more details and justification follow.
Trunking
^^^^^^^^
Both clientid trunking (multiple sessions per client) and session trunking (multiple connections per session) are mandatory for a server to support. Therefore a client would be within its rights to simply refuse to interoperate with a server that doesn't support both.
We could ask whether a client is actually likely to do that, and whether there are instead obvious errors we could return that any client would be likely to handle gracefully.
I don't see any reliable way to do that: neither BIND_CONN_TO_SESSION nor BACKCHANNEL_CTL has allowable errors that seem reasonable to me in this case. CREATE_SESSION does at least allow returning NOSPC when we can't commit to the additional DRC memory, so maybe we could get away with using that when we don't want to provide any more sessions.
I suspect trunking is actually very easy to support on the server side. The client may be slower to support trunking, so we'll want to write some pynfs tests.
If we don't support this, I'm afraid of subtle exchange_id and create_session problems. Correct implementation of trunking also looks necessary for correct behavior in the case of multi-homed servers, for example.
So I think if we don't implement this soon we'll end up with idiosyncratic behavior that will be hard for clients to work around.
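To show why server-side trunking should be cheap, here is a toy Python model of the state involved. It is the kind of behavior a pynfs test would check, but it is not the nfsd or pynfs API. All class and method names are hypothetical, and error codes are plain strings. The model simplifies by assuming DRC memory per session is `slots * max_resp_cached`, and that NFS4ERR_NOSPC is returned once the budget is exhausted, as suggested above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Session:
    clientid: int
    drc_bytes: int                                  # reply-cache memory reserved
    connections: set = field(default_factory=set)


class ToyServer:
    """Toy model of clientid and session trunking; not nfsd's or pynfs's API."""

    def __init__(self, drc_budget: int):
        self.drc_budget = drc_budget      # total DRC bytes we are willing to commit
        self.drc_used = 0
        self.clients = {}                 # clientid -> set of sessionids
        self.sessions = {}                # sessionid -> Session
        self._next_id = count(1)

    def exchange_id(self) -> int:
        clientid = next(self._next_id)
        self.clients[clientid] = set()
        return clientid

    def create_session(self, clientid, conn, slots, max_resp_cached):
        if clientid not in self.clients:
            return "NFS4ERR_STALE_CLIENTID", None
        # Clientid trunking: any number of sessions per client, limited here
        # only by the DRC memory we can commit to.
        need = slots * max_resp_cached
        if self.drc_used + need > self.drc_budget:
            return "NFS4ERR_NOSPC", None
        self.drc_used += need
        sessionid = next(self._next_id)
        self.sessions[sessionid] = Session(clientid, need, {conn})
        self.clients[clientid].add(sessionid)
        return "NFS4_OK", sessionid

    def bind_conn_to_session(self, sessionid, conn):
        # Session trunking: another connection joins an existing session.
        session = self.sessions.get(sessionid)
        if session is None:
            return "NFS4ERR_BADSESSION"
        session.connections.add(conn)
        return "NFS4_OK"
```

For example, with a 4096-byte budget, two 4-slot sessions caching 512-byte replies both fit for the same client. A third CREATE_SESSION then gets NFS4ERR_NOSPC, while binding extra connections to an existing session still succeeds.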
Kerberos, GSS, SSV
^^^^^^^^^^^^^^^^^^
Even though Kerberos is mandatory, the fact is that every implementation is capable of running without it. So we could dodge some of these requirements by temporarily turning off support for the combination of GSS and 4.1.
However, I'd rather avoid the confusion that would come with turning off a preexisting major feature in a new protocol version.
So, we need to look at requirements for correct GSS support on 4.1.
The CREATE_SESSION operation allows the client to request certain security on the backchannel (with the csa_sec_parms field), and doesn't give the server any way to negotiate this (other than failing the whole request). So, if we support GSS, we should support this.
The same argument applies to SSV: the client requests a certain kind of state protection, and we don't have any reasonable way to refuse it. Unfortunately, it's unclear whether others are going to implement SSV. So for now it may make sense to give up and return SERVERFAULT in this case, and leave future clients to deal with that behavior. To avoid returning the same SERVERFAULT error for a variety of features first used in EXCHANGE_ID and CREATE_SESSION (trunking, backchannel security), we need to be careful to support those other features.
Others do appear to be supporting SP4_MACH_CRED, unlike SSV, so that would be a useful minimum for us to support.
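The state-protection choice described above can be sketched in Python as follows. It accepts SP4_NONE and SP4_MACH_CRED and returns SERVERFAULT for SP4_SSV. The SP4_* values are the state_protect_how4 discriminants from RFC 5661. The function names are hypothetical. The set of enforced operations is illustrative only, since the real set is negotiated through the spo_must_enforce/spo_must_allow bitmaps. The error returned for a mismatched machine credential is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# state_protect_how4 discriminants from the NFSv4.1 XDR (RFC 5661).
SP4_NONE = 0
SP4_MACH_CRED = 1
SP4_SSV = 2

# Illustrative only: the real set of protected operations is negotiated
# through the spo_must_enforce / spo_must_allow bitmaps in EXCHANGE_ID.
ENFORCED_OPS = {"BIND_CONN_TO_SESSION", "DESTROY_SESSION", "DESTROY_CLIENTID"}


@dataclass
class ClientRecord:
    how: int
    mach_principal: Optional[str] = None


def exchange_id_state_protect(how: int, principal: str):
    """Handle eia_state_protect; returns (status, ClientRecord or None)."""
    if how == SP4_NONE:
        return "NFS4_OK", ClientRecord(how)
    if how == SP4_MACH_CRED:
        # Remember the machine credential so later state-changing ops can be checked.
        return "NFS4_OK", ClientRecord(how, principal)
    if how == SP4_SSV:
        # Provisional: no SSV support yet, so give up with SERVERFAULT.
        return "NFS4ERR_SERVERFAULT", None
    # Any other discriminant would already have failed XDR decoding.
    raise ValueError(f"unknown state_protect_how4 value {how}")


def check_state_protect(rec: ClientRecord, op: str, principal: str) -> str:
    if rec.how == SP4_MACH_CRED and op in ENFORCED_OPS and principal != rec.mach_principal:
        return "NFS4ERR_ACCESS"   # assumed error code for a mismatched credential
    return "NFS4_OK"
```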
More details for GSS backchannel support:
We must allow the client to pass the server GSS contexts to use on the backchannel.
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED should be set when required.
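A rough Python sketch of the two backchannel pieces follows. The first function picks a usable flavor from CREATE_SESSION's csa_sec_parms, or reports that the request must fail. The second computes the GSS-context status bits for SEQUENCE. The flavor numbers come from RFC 5531 and RFC 2203, and the flag values from RFC 5661. The following are assumptions: the function names, walking the client's list in order, treating the flags as applying when *all* of the backchannel's contexts have expired or are about to, and the 300-second "expiring" window.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# RPC auth flavors: AUTH_NONE/AUTH_SYS (RFC 5531), RPCSEC_GSS (RFC 2203).
AUTH_NONE = 0
AUTH_SYS = 1
RPCSEC_GSS = 6

# sr_status_flags bits from the NFSv4.1 XDR (RFC 5661).
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING = 0x00000002
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED = 0x00000004


@dataclass
class CallbackSecParms:
    flavor: int   # the real structure also carries AUTH_SYS creds or GSS handles


def choose_cb_security(csa_sec_parms: Sequence[CallbackSecParms],
                       server_supports: set) -> Optional[CallbackSecParms]:
    """Pick a client-offered flavor the server can use on the backchannel.

    Returns None when nothing is usable: CREATE_SESSION gives no way to
    negotiate, so the only option then is to fail the whole request.
    """
    for parms in csa_sec_parms:   # client's order; the server could choose any entry
        if parms.flavor in server_supports:
            return parms
    return None


def cb_gss_status_flags(expiry_times, now: float, expiring_window: float = 300.0) -> int:
    """CB GSS bits for a SEQUENCE reply, given the backchannel contexts' expiry times."""
    if not expiry_times:
        return 0
    if all(t <= now for t in expiry_times):
        return SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED
    if all(t - now <= expiring_window for t in expiry_times):
        return SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING
    return 0
```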
Server Reboot Recovery
^^^^^^^^^^^^^^^^^^^^^^
We need at least basic RECLAIM_COMPLETE support.
Question: do we need to set SEQ4_STATUS_RESTART_RECLAIM_NEEDED on any new session created by a preexisting client during the grace period? Seems like that should be necessary only if we implement persistent sessions, but I suppose it can't hurt to set it otherwise.
The reboot recovery system common to 4.0 and 4.1 needs some work, but that's a preexisting 4.0 problem.
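What "basic RECLAIM_COMPLETE support" involves can be sketched as a toy grace-period tracker in Python. The class and method names are hypothetical, and the per-filesystem rca_one_fs variant is ignored. The sketch also assumes the spec permits ending grace early once every client that held state has sent RECLAIM_COMPLETE, and treats that as an optional optimization.

```python
class GraceTracker:
    """Toy grace-period bookkeeping for NFSv4.1 RECLAIM_COMPLETE (rca_one_fs ignored)."""

    def __init__(self, clients_with_state):
        self.in_grace = True
        self.expected = set(clients_with_state)  # clients that held state before the reboot
        self.completed = set()                   # clients that have sent RECLAIM_COMPLETE

    def reclaim_complete(self, clientid) -> str:
        if clientid in self.completed:
            return "NFS4ERR_COMPLETE_ALREADY"
        self.completed.add(clientid)
        # Optional optimization: end grace early once every client that
        # could reclaim has said it is done.
        if self.expected <= self.completed:
            self.in_grace = False
        return "NFS4_OK"

    def reclaim_open(self, clientid) -> str:
        # e.g. OPEN with CLAIM_PREVIOUS
        if not self.in_grace or clientid in self.completed:
            return "NFS4ERR_NO_GRACE"
        return "NFS4_OK"

    def new_open(self, clientid) -> str:
        # Ordinary (non-reclaim) opens and locks wait for grace to end.
        return "NFS4ERR_GRACE" if self.in_grace else "NFS4_OK"

    def end_grace(self):
        # Called when the grace period timer runs out.
        self.in_grace = False
```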
DRC limit checking
^^^^^^^^^^^^^^^^^^
We check for replies that are too big only *after* performing the operation in question. Depending on the operation, that may be too late to return NFS4ERR_REP_TOO_BIG_TO_CACHE. (For example, an irreversible filesystem operation may already have been performed.) We need to figure out how to estimate the size of the response before performing an operation, at least for operations that actually change the filesystem.
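One way to move the check ahead of the operation is to keep a worst-case reply size per operation and test it before executing each op. Here is a Python sketch of that idea. The size table is entirely hypothetical, and a real server would derive it from each result's XDR encoding. For simplicity the sketch checks every op, although strictly only filesystem-changing ops need it.

```python
# Hypothetical worst-case reply sizes in bytes; a real server would derive
# these from the XDR encoding of each operation's result.
MAX_REPLY_SIZE = {
    "SEQUENCE": 60,
    "PUTFH": 8,
    "GETATTR": 1024,
    "REMOVE": 32,
    "CREATE": 1200,
}


def process_compound(ops, sa_cachethis: bool, ca_maxresponsesize_cached: int):
    """Check each op's worst-case reply size *before* executing it."""
    reply_size = 0
    results = []
    for op in ops:
        if sa_cachethis and reply_size + MAX_REPLY_SIZE[op] > ca_maxresponsesize_cached:
            # Nothing has been done for this op yet, so failing here cannot
            # leave an irreversible filesystem change behind.
            results.append((op, "NFS4ERR_REP_TOO_BIG_TO_CACHE"))
            break                          # compound processing stops at the first error
        # ... the operation would actually be performed here ...
        reply_size += MAX_REPLY_SIZE[op]   # a real server could add the actual encoded size
        results.append((op, "NFS4_OK"))
    return results
```

The estimate is pessimistic, so a reply that would actually have fit can occasionally be refused. That seems preferable to performing an irreversible operation and then being unable to cache its reply.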
Callback failure handling ^^^^^^^^^^^^^^^^^^^^^^^^^
The server is required to set SEQ4_STATUS_CB_PATH_DOWN as long as it lacks any usable backchannel for the client. (Also, NFS4ERR_CB_PATH_DOWN should be returned from DESTROY_SESSION when appropriate.)
SEQ4_STATUS_CB_PATH_DOWN_SESSION is required when the server is unable to retry a callback because that particular session lacks a backchannel.
Set SEQ4_STATUS_BACKCHANNEL_FAULT on encountering "unrecoverable fault with the backchannel (e.g. it has lost track of the sequence ID for a slot in the backchannel)."
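The three rules above can be expressed as a small function that computes the relevant sr_status_flags bits for a SEQUENCE reply. This Python sketch uses flag values from RFC 5661. The `SessionCbState` fields are hypothetical stand-ins for whatever per-session callback state the server actually keeps.

```python
from dataclasses import dataclass

# sr_status_flags bits from the NFSv4.1 XDR (RFC 5661).
SEQ4_STATUS_CB_PATH_DOWN = 0x00000001
SEQ4_STATUS_CB_PATH_DOWN_SESSION = 0x00000200
SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400


@dataclass
class SessionCbState:
    has_backchannel: bool           # a usable backchannel connection exists
    pending_cb_retry: bool = False  # a callback needs to be retried on this session
    slot_seq_lost: bool = False     # lost track of a backchannel slot's sequence ID


def callback_status_flags(session: SessionCbState, client_sessions) -> int:
    flags = 0
    # CB_PATH_DOWN: the client has no usable backchannel on *any* session.
    if not any(s.has_backchannel for s in client_sessions):
        flags |= SEQ4_STATUS_CB_PATH_DOWN
    # CB_PATH_DOWN_SESSION: a callback must be retried on this session,
    # but this session has no backchannel to retry it on.
    if session.pending_cb_retry and not session.has_backchannel:
        flags |= SEQ4_STATUS_CB_PATH_DOWN_SESSION
    # BACKCHANNEL_FAULT: an unrecoverable backchannel error such as a lost slot seqid.
    if session.slot_seq_lost:
        flags |= SEQ4_STATUS_BACKCHANNEL_FAULT
    return flags
```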
Miscellaneous Mandatory Operations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and TEST_STATEID are not currently used by clients, but will be (and the spec recommends their use in common cases), and clients should not be expected to know how to recover from the case where they are not supported. They should also be fairly easy to implement.
Miscellaneous
^^^^^^^^^^^^^
Set SEQ4_STATUS_RECALLABLE_STATE_REVOKED when a client's failure to return a recallable object causes us to revoke the object, and be prepared to handle a FREE_STATEID from the client as acknowledgement.
(None of the other STATE_REVOKED bits should be required as long as we don't partially revoke state, which we don't, under either 4.0 or 4.1.)
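The revoke-then-acknowledge cycle can be sketched in Python as a toy delegation table. SEQ4_STATUS_RECALLABLE_STATE_REVOKED stays set while any revoked delegation is unacknowledged, and FREE_STATEID clears it. The flag value is from RFC 5661, and the class and method names are hypothetical. The error returned when FREE_STATEID names a delegation that is still active is an assumption; check the spec before relying on it.

```python
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040   # RFC 5661 value


class DelegationTable:
    """Toy tracking of revoked delegations awaiting FREE_STATEID."""

    def __init__(self):
        self.active = set()
        self.revoked = set()   # revoked but not yet acknowledged by the client

    def revoke(self, stateid):
        # Called when the client failed to return a recalled delegation in time.
        self.active.discard(stateid)
        self.revoked.add(stateid)

    def free_stateid(self, stateid) -> str:
        if stateid in self.revoked:
            self.revoked.remove(stateid)   # client has acknowledged the revocation
            return "NFS4_OK"
        if stateid in self.active:
            return "NFS4ERR_LOCKS_HELD"    # assumed error for a still-active delegation
        return "NFS4ERR_BAD_STATEID"

    def sequence_flags(self) -> int:
        return SEQ4_STATUS_RECALLABLE_STATE_REVOKED if self.revoked else 0
```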
Lower priority
==============
These items are still high priority, but we can temporarily get away without doing them, on the grounds that they aren't absolutely required for minimal interoperability and/or don't introduce any new problems that don't already exist in the 4.0 implementation.
Referring triples
^^^^^^^^^^^^^^^^^
The particular requirement, from section 2.10.6.3 of the spec, is below (in it, you can take the "client operation" to be an open, and the "associated object" to be a delegation created by that open):
"For each client operation which might result in some sort of server callback, the server SHOULD "remember" the { session ID, slot ID, sequence ID } triple of the client request until the slot ID retirement rules allow the server to determine that the client has, in fact, seen the server's reply. Until the time the { session ID, slot ID, sequence ID } request triple can be retired, any recalls of the associated object MUST carry an array of these referring identifiers (in the CB_SEQUENCE operation's arguments), for the benefit of the client."
If we ignore that "MUST", the result will be for the client to return a BADHANDLE or BADSTATEID error, as in v4.0. We have code to handle that case (by retrying) on the server. So if we ignore this requirement, the resulting behavior will be no worse than in 4.0. So I think we can get away with keeping this a *slightly* lower priority than the other stuff.
(I'd still like to see this done--if possible, at about the time it's done on the client. But it's a higher priority task on the client because there it really is mandatory: a server that lists the referring triples correctly does have a right not to have to handle those temporary BADHANDLE/BADSTATEID errors.)
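The bookkeeping in the quoted requirement can be sketched in Python as follows. Each recallable object remembers the { session ID, slot ID, sequence ID } triples that produced it. A triple is retired once the client sends a later request on the same slot, since that shows it saw the reply. The remaining triples are grouped per session, in the shape of CB_SEQUENCE's referring_call_list4. The class and method names are hypothetical, and sequence ID wraparound is ignored.

```python
from collections import defaultdict


class ReferringTriples:
    """Remember {session ID, slot ID, sequence ID} triples behind recallable objects."""

    def __init__(self):
        self.by_object = defaultdict(set)   # object id -> {(sessionid, slotid, seqid)}

    def record(self, obj, sessionid, slotid, seqid):
        # Called when, e.g., an OPEN hands out a delegation.
        self.by_object[obj].add((sessionid, slotid, seqid))

    def slot_used(self, sessionid, slotid, seqid):
        # A new request on the slot with a higher seqid shows the client saw
        # the earlier reply, so older triples for that slot can be retired.
        # (Sequence ID wraparound is ignored in this sketch.)
        for triples in self.by_object.values():
            stale = {t for t in triples
                     if t[0] == sessionid and t[1] == slotid and t[2] < seqid}
            triples -= stale

    def referring_call_lists(self, obj):
        """Group live triples by session, shaped like referring_call_list4."""
        lists = defaultdict(list)
        for sessionid, slotid, seqid in sorted(self.by_object.get(obj, ())):
            lists[sessionid].append({"rc_sequenceid": seqid, "rc_slotid": slotid})
        return dict(lists)
```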
Fix ERROR_RESOURCE and BADXDR returns ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We shouldn't be returning NFS4ERR_RESOURCE to 4.1 clients at all, and most of our BADXDR returns are probably also incorrect; instead we should be returning NFS4ERR_REP_TOO_BIG, NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, etc.
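Most of these errors come from comparing a compound against the fore-channel limits the client agreed to in CREATE_SESSION. Here is a Python sketch of that mapping. The attribute names follow channel_attrs4 from RFC 5661, but the dataclass and function names are hypothetical, and only the three limits discussed here are modeled.

```python
from dataclasses import dataclass


@dataclass
class ChannelAttrs:
    # Fore-channel limits negotiated in CREATE_SESSION (subset of channel_attrs4).
    ca_maxrequestsize: int
    ca_maxresponsesize: int
    ca_maxoperations: int


def check_compound_limits(attrs: ChannelAttrs, request_len: int, num_ops: int) -> str:
    """Errors to return to a 4.1 client instead of NFS4ERR_RESOURCE or NFS4ERR_BADXDR."""
    if request_len > attrs.ca_maxrequestsize:
        return "NFS4ERR_REQ_TOO_BIG"
    if num_ops > attrs.ca_maxoperations:
        return "NFS4ERR_TOO_MANY_OPS"
    return "NFS4_OK"


def check_reply_size(attrs: ChannelAttrs, reply_len: int) -> str:
    if reply_len > attrs.ca_maxresponsesize:
        return "NFS4ERR_REP_TOO_BIG"
    return "NFS4_OK"
```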
SSV
^^^
This is still listed as mandatory in the spec, and while clients and other servers don't seem to be working on implementing this, it's not yet clear to me that there's a consensus to drop it.
Problems inherited from 4.0 implementation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Delegations: our delegation implementation does not currently revoke delegations on rename or unlink of a delegated file, leading to stale client caches in some cases.
Out-of-spec compound restrictions: we don't, for example, currently allow the client to send more than one IO (read, write, readdir) operation in a single compound. Some day adventurous clients may run across these cases.
Changeid: we're still relying on ctime for this, which is inadequate, especially for ext3. Newer filesystems are fixing this, but some more work is needed to take advantage of the improvements (for example, to improve ext4's native changeid feature).
Reboot recovery: the existing reboot recovery mechanism for NFSv4.0 has some architectural problems, and the core kernel developers have asked us to replace it. The transition between the new and old systems will be awkward, and the earlier it's done the better.
Lockowner DOS protection: we don't remove lockowners until close, release_lockowner, or client expiration, making it possible to DOS the server by opening a file and repeatedly locking it with a different lockowner each time, without closing the file.