Server 4.0 and 4.1 issues

From Linux NFS

(Difference between revisions)
(initial import from txt file)
(NFSv4.0)
 
(119 intermediate revisions not shown)
Line 1: Line 1:
-
This is my attempt at a description of the minimum we should have
+
Before 3.11, the server's implementation of NFSv4.1 deviates from the spec ([http://tools.ietf.org/search/rfc5661 rfc 5661]) in a number of important ways, and is recommended for developers only.
-
implemented for the 4.1 server before we can consider it sufficiently in
+
-
compliance with the spec that it shouldn't cause major headaches for
+
-
future clients. The rough summary:
+
-
        - Trunking looks necessary.
+
As of about 3.11:
-
        - It looks like we may get away without SSV for now, but I think
+
* someone upgrading from NFSv4.0 should experience no loss in functionality;
-
          we should still support at least GSS_MACH_CRED.
+
* server behavior should be close enough to the spec that clients will not be forced into undocumented workarounds.
-
        - We should support backchannel security parameters.
+
-
        - There are a few operations (RECLAIM_COMPLETE,
+
-
          DESTROY_CLIENTID, ...) that are mandatory, and for which
+
-
          minimal implementations should be fairly easy.
+
-
        - We need to do a more careful job of checking when DRC limits
+
-
          on reply sizes are exceeded.
+
-
        - We need to communicate callback errors to the client
+
-
          correctly.
+
-
        - Allow two new ACL mask bits (ACE4_WRITE_RETENTION and
+
-
          ACE4_WRITE_RETENTION_HOLD).  Probably just ignore them on any
+
-
          setattr of an acl, and return them always zero on a read of
+
-
          any acl.
+
-
I think we should get at least these problems fixed before merging
+
Any exceptions should be reported as bugs.
-
optional features (including pNFS).
+
-
This is what I got mainly from looking through preexisting todo lists
+
In a few cases we lack support for features that are mandated by the rfc's, but that nevertheless are rarely (or never) implemented, and whose absence is easily worked around on the client.
-
and parts of the spec.  I probably overlooked something.  I'll continue
+
-
looking, but any help is appreciated.
+
-
There are also some problems inherited from the 4.0 implementation.  I
+
= NFSv4.1 =
-
don't want those preexisting problems to hold up pNFS submission, but
+
-
they still need to be kept as priorities.
+
-
Some more details and justification follow.
+
== Done, needs testing ==
-
Trunking
+
=== SP4_MACH_CRED ===
-
^^^^^^^^
+
-
Both clientid (multiple sessions per client) and session (multiple
+
SP4_MACH_CRED (like SSV) is mandatory for servers to implement.   It is less complicated than SSV and provides some (not all) of the advantages, and there's a better chance a client may implement it.  So we should implement it now.
-
connections per session) trunking are mandatory for a server to support.
+
-
Therefore a client would be within its rights to simply refuse to
+
-
interoperate with a server that didn't support either.
+
-
We could ask whether it is actually likely that a client will do that,
+
We want to minimize the number of optional features we don't implement; each such omission makes it harder for future clients, which will be forced to negotiate support for features that the protocol wasn't designed to negotiate support for.
-
and if there are instead obvious errors we could return that any client
+
-
is likely to be able to handle gracefully.
+
-
I don't see any reliable way to do that: neither BIND_CONN_TO_SESSION
+
=== Check 4.0/4.1 interactions ===
-
nor BACKCHANNEL_CTL have allowable errors that seem reasonable to me in
+
-
this case. CREATE_SESSION does at least allow returning NOSPC in the
+
-
case where we can't commit to the additional DRC memory, so maybe we
+
-
could get away with using that in the case where we don't want to
+
-
provide any more sessions.
+
-
I suspect trunking is actually very easy to support on the server side.
+
I think this is covered now, but it might be interesting to write python tests that send 4.0 compounds referencing 4.1 clients, or vice versa, especially for create_session, setclientid, and friends.
-
The client may be slower to support trunking, so we'll want to write
+
-
some pynfs tests.
+
-
If we don't support this, I'm afraid of subtle exchange_id and
+
=== New open claim types ===
-
create_session problems.  Correct implementation of trunking also looks
+
-
necessary for correct behavior in the case of multi-homed servers, for
+
-
example.
+
-
So I think if we don't implement this soon we'll end up with
+
We must support CLAIM_FH and CLAIM_DELEG_CUR_FH.  (We shouldn't need CLAIM_DELEG_PREV_FH.)
-
idiosyncratic behavior that will be hard for clients to work around.
+
Needs some simple pynfs tests.
-
Kerberos, GSS, SSV
+
=== DRC limit checking ===
-
^^^^^^^^^^^^^^^^^^
+
-
Even though kerberos is mandatory, the fact is that every implementation
+
We check for replies that are too big only *after* performing the
-
is capable of running without it.  So we could dodge some of these
+
operation in question.  Depending on the operation, that may be too late to return NFS4ERR_REP_TOO_BIG_TO_CACHE.  (For example, an irreversible filesystem operation may already have been performed.)  We need to figure out how to estimate the size of the response before performing an operation, at least for operations that actually change the filesystem.
-
requirements by temporarily turning off support for the combination of
+
-
GSS and 4.1.
+
-
However, I'd rather avoid the confusion that would come with turning off
+
Possible fix: add to each nfsd4_ops[] a field with an upper bound on the size of a reply to that operation.  Before calling the operation, check that there's room for the worst case, and return TOO_BIG_TO_CACHE if not.
-
a preexisting major feature in a new protocol version.
+
-
So, we need to look at requirements for correct GSS support on 4.1.
+
=== Callback failure handling ===
-
The CREATE_SESSION operation allows the client to request certain
+
The server is required to set SEQ4_STATUS_CB_PATH_DOWN as long as it lacks any usable backchannel for the client.  (Also, CB_PATH_DOWN should be returned on DESTROY_SESSION when appropriate.)  (SOME TESTING DONE.)
-
security on the backchannel (with the csa_sec_parms field), and doesn't
+
-
give the server any way to negotiate this (other than failing the whole
+
-
request).  So, if we support GSS, we should support this.
+
-
The same argument applies to SSV: the client requests a certain kind of
+
Set SEQ4_STATUS_BACKCHANNEL_FAULT on encountering "unrecoverable fault with the backchannel (e.g. it has lost track of the sequence ID for a slot in the backchannel)."
-
state protection, and we don't have any reasonable way to refuse it.
+
-
Unfortunately, it's unclear whether others are going to implement SSV.
+
-
So for now it may make sense to give up and return SERVERFAULT in this
+
-
case, and leave future clients to deal with that behavior.  To avoid
+
-
returning the same SERVERFAULT error for a variety of features first
+
-
used on SETCLIENTID (trunking, backchannel security), we need to be
+
-
careful to support those other features.
+
-
Others do appear to be supporting SP4_MACH_CRED, unlike SSV, so that
+
=== Trunking ===
-
would be a useful minimum for us to support.
+
-
More details for gss backchannel support:
+
Both clientid (multiple sessions per client) and session (multiple connections per session) trunking are mandatory for a server to support. Therefore a client would be within its rights to simply refuse to interoperate with a server that didn't support either.
-
We must allow the client to pass the server gss contexts to use on the
+
We could ask whether it is actually likely that a client will do that, and if there are instead obvious errors we could return that any client is likely to be able to handle gracefully.
-
backchannel.
+
-
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
+
On the first question: Supporting trunking really just means doing what the spec says when we receive multiple exchange_id's, create_sessions, or transport connections from the same client.  These can arise in simple situations. For example, multi-homed servers need to know how to handle the former.  Client recovery of various kinds (see BIND_CONN_TO_SESSION above) may also require that multiple connections be associated with a single session over time, even if only one is in use at a time.
-
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED should be set when required.
+
 +
On the second question: I don't see any reliable way to error out: neither BIND_CONN_TO_SESSION nor BACKCHANNEL_CTL have allowable errors that seem reasonable to me in this case.  CREATE_SESSION does at least allow returning NOSPC in the case where we can't commit to the additional DRC memory, so maybe we could get away with using that in the case where we don't want to provide any more sessions.
-
Server Reboot Recovery
+
So I think if we don't implement this soon we'll end up with idiosyncratic behavior that will be hard for clients to work around.
-
^^^^^^^^^^^^^^^^^^^^^^
+
-
We need at least basic RECLAIM_COMPLETE support.
+
We'll also need some pynfs tests to make sure we're getting this right.
-
Question: do we need to set SEQ4_STATUS_RESTART_RECLAIM_NEEDED on any
+
=== BIND_CONN_TO_SESSION ===
-
new session created by a preexisting client during the grace period?
+
-
Seems like that should be necessary only if we implement persistent
+
-
sessions, but I suppose it can't harm to set it otherwise.
+
-
The reboot recovery system common to 4.0 and 4.1 needs some work, but
+
This is mandatory to implement on the server.
-
that's a preexisting 4.0 problem.
+
-
DRC limit checking
+
This is not just for exotic multi-connection setups.
-
^^^^^^^^^^^^^^^^^^
+
-
We check for replies that are too big only *after* performing the
+
If a client opts for [[#SP4_MACH_CRED]] protection, and if its tcp connection is broken for some reason, then it may choose to give up all its state and start from scratch.  That may be good enough for very minimal first 4.1 implementations.  However, clients that wish to reconnect without giving up their session state (e.g., the reply cache) will need to use BIND_CONN_TO_SESSION to associate the new connection to the old session.
-
operation in question.  Depending on the operation, that may be too late
+
-
to return NFS4ERR_REP_TOO_BIG_TO_CACHE.  (For example, an irreversible
+
-
filesystem operation may already have been performed.) We need to
+
-
figure out how to estimate the size of the response before performing an
+
-
operation, at least for operations that actually change the filesystem.
+
-
Callback failure handling
+
Even with SP4_NONE, clients will want to be able to reconnect without losing the backchannel.  Again, that will require BIND_CONN_TO_SESSION.
-
^^^^^^^^^^^^^^^^^^^^^^^^^
+
-
The server is required to set SEQ4_STATUS_CB_PATH_DOWN as long as it
+
=== deferral fixes ===
-
lacks any usable backchannel for the client.  (Also, CB_PATH_DOWN should
+
-
be returned on DESTROY_SESSION when appropriate.)
+
-
SEQ4_STATUS_CB_PATH_DOWN_SESSION is required when unable to retry a
+
Fix merged for 2.6.37.
-
callback due to lack of a callback for that particular session.
+
-
Set SEQ4_STATUS_BACKCHANNEL_FAULT on encountering "unrecoverable fault
+
The current code returns ERR_DELAY whenever an upcall is required instead of using the server's deferral mechanism, since that mechanism replays a request internally, causing SEQUENCE to fail on the second time through.
-
with the backchannel (e.g. it has lost track of the sequence ID for a
+
-
slot in the backchannel)."
+
-
Miscellaneous Mandatory Operations
+
These DELAYS are hard on clients, and will cause unacceptable delays in some cases.  Fix the deferral code to sleep at least a little before giving up and returning ERR_DELAY.
-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
-
DESTROY_CLIENTID, FREE_STATEID, SECINFO_NO_NAME, and TEST_STATEID are
+
=== compound op ordering enforcement ===
-
not currently used by clients, but will be (and the spec recommends
+
-
their use in common cases), and clients should not be expected to know
+
-
how to recover from the case where they are not supported.  They should
+
-
also be fairly easy to implement.
+
-
Miscellaneous
+
DESTROY_SESSION must be the final operation in a compound request; NFS4ERR_NOT_ONLY_OP should be returned when appropriate.  Make sure a session is defined whenever the code expects it.
-
^^^^^^^^^^^^^
+
-
Set SEQ4_STATUS_RECALLABLE_STATE_REVOKED when a client's failure to
+
The risk here is that there may be nasty DOS's (or worse) against a server that doesn't check this kind of thing carefully.
-
return a recallable object causes us to revoke the object, and be
+
-
prepared to handle a FREE_STATEID from the client as acknowledgement.
+
-
(None of the STATE_REVOKED bits should be required as long as we don't
+
=== Keep client from expiring while in use by session ===
-
partially revoke state (which we don't, under 4.0 or 4.1).)
+
 
 +
The session associated with a compound may be implicitly referred to by individual operations.  For example, RECLAIM_COMPLETE implicitly applies to the client associated with the current session.  However, we don't currently do anything to prevent the client from being freed partway through processing a compound.
 +
 
 +
=== Fix ERROR_RESOURCE and BADXDR returns ===
 +
 
 +
We shouldn't be returning RESOURCE to 4.1 clients at all, and most of our
 +
BADXDR returns are probably also incorrect--instead we should be
 +
returning NFS4ERR_REP_TOO_BIG, NFS4ERR_REQ_TOO_BIG,
 +
NFS4ERR_TOO_MANY_OPS, etc.
-
Lower priority
+
== Not needed immediately ==
-
==============
+
-
This is stuff that is still a high priority, but that we can temporarily
+
This is stuff that is still a high priority, but that we can temporarily get away without doing on the grounds that they aren't absolutely required for minimal interoperability, and/or they don't introduce any new problems that don't already exist in the 4.0 implementation.
-
get away without doing on the grounds that they aren't absolutely
+
-
required for minimal interoperability, and/or they don't introduce any
+
-
new problems that don't already exist in the 4.0 implementation.
+
-
Referring triples
+
=== Referring triples ===
-
^^^^^^^^^^^^^^^^^
+
So, the particular requirement, from 2.10.6.3, is below (and in the
So, the particular requirement, from 2.10.6.3, is below (and in the
Line 196: Line 122:
temporary BADHANDLE/BADSTATEID errors.)
temporary BADHANDLE/BADSTATEID errors.)
-
Fix ERROR_RESOURCE and BADXDR returns
+
=== SSV ===
-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
-
 
+
-
We shouldn't be return RESOURCE to 4.1 clients at all, and most of our
+
-
BADXDR returns are probably also incorrect--instead we should be
+
-
returning NFS4ERR_REP_TOO_BIG, NFS4ERR_REQ_TOO_BIG,
+
-
NFS4ERR_TOO_MANY_OPS, etc.
+
-
 
+
-
SSV
+
-
^^^
+
This is still listed as mandatory in the spec, and while clients and
This is still listed as mandatory in the spec, and while clients and
Line 211: Line 128:
yet clear to me that there's a consensus to drop it.
yet clear to me that there's a consensus to drop it.
-
Problems inherited from 4.0 implementation
+
== Done ==
-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
-
Delegations: our delegation implementation does not currently revoke
+
=== GSS on the backchannel ===
-
delegations on rename or unlink of a delegated file, leading to stale
+
 
-
client caches in some cases.
+
(bfields is working on this)
 +
 
 +
Clients don't currently request gss on the backchannel.  It is mandatory to support this.  I don't know if anyone actually does.  Still undecided actually how to fail.
 +
 
 +
More details for gss backchannel support:
 +
 
 +
We must allow the client to pass the server gss contexts to use on the backchannel.
 +
 
 +
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and
 +
SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED should be set when required.
 +
 
 +
See the end of [http://tools.ietf.org/search/rfc5661#section-18.36.4 section 18.36.4] for more implementation details.
 +
 
 +
=== SEQ4_STATUS_RECALLABLE_STATE_REVOKED ===
 +
 
 +
Set SEQ4_STATUS_RECALLABLE_STATE_REVOKED when a client's failure to
 +
return a recallable object causes us to revoke the object, and be
 +
prepared to handle a FREE_STATEID from the client as acknowledgement.
 +
 
 +
=== backchannel attribute negotiation ===
 +
 
 +
alloc_init_session() should sanity-check the values of the backchannel attributes that the client gives us, and fail the CREATE_SESSION if they don't meet our minimal requirements.
 +
 
 +
One starting point here: http://marc.info/?l=linux-nfs&m=128647408432218&w=2
 +
 
 +
=== Make DESTROY_SESSION wait on in-progress requests ===
 +
 
 +
See discussion on the ietf list: [http://www.ietf.org/mail-archive/web/nfsv4/current/msg08584.html]
 +
 
 +
(This is a slightly odd one as it's not really required till 4.2.  However it appears we have some races here that could cause memory corruption, and that those races would be most easily fixed by delaying session and client destruction till in-progress requests are processed.)
 +
 
 +
=== destroy lockowners on unlock ===
 +
 
 +
Unlike in the 4.0 case, in the 4.1 case a stateid should become invalid on an unlock that leaves the caller with no bytes locked; a 4.1 client will no longer call release_lockowner.
 +
 
 +
=== Respect client-requested backchannel security ===
 +
 
 +
Done (auth_null and partial auth_sys); to be merged for 3.8.  pynfs tests (DELEG5-7) also available.
 +
 
 +
We ignore the csa_sec_parms and bca_sec_parms fields that specify the security to be used on the backchannel.  Instead, we *always* use auth_sys credentials, because we happen to know that is what the Linux client currently expects.
 +
 
 +
The client can provide a list of possible parameters.  To start, we'll just take the first one that we support and ignore the rest.
 +
 
 +
=== DESTROY_CLIENTID ===
 +
 
 +
DESTROY_CLIENTID is not currently used by clients, but will be, and clients should not be expected to know how to recover from the case where it is not supported.  It should also be fairly easy to implement.
 +
 
 +
=== FREE_STATEID and TEST_STATEID ===
 +
 
 +
=== Implement SECINFO_NO_NAME ===
 +
 
 +
It's mandatory and not very difficult.
 +
 
 +
=== SECINFO should consume current filehandle ===
 +
 
 +
See [http://tools.ietf.org/html/rfc5661#section-2.6.3.1.1.8]
 +
 
 +
=== Basic Server Reboot Recovery for 4.1 ===
 +
 
 +
We need at least basic RECLAIM_COMPLETE support.
 +
 
 +
Question: do we need to set SEQ4_STATUS_RESTART_RECLAIM_NEEDED on any new session created by a preexisting client during the grace period? Seems like that should be necessary only if we implement persistent sessions, but I suppose it can't harm to set it otherwise.
 +
 
 +
The reboot recovery system common to 4.0 and 4.1 needs some work, but that's a preexisting 4.0 problem.
 +
 
 +
=== Clarify RDMA non-support ===
 +
 
 +
Nobody has stepped up to work on RDMA and 4.1, so while it's a violation of our principle that "someone upgrading from the previous version should experience no loss in functionality", we should probably declare the combination of RDMA and 4.1 unsupported, until someone has a chance to spend some time on it.
 +
 
 +
We've checked that create_session and bind_conn_to_session returns both indicate non-support, so that should be all we need to do for now.
 +
 
 +
See also [[4.1 RDMA issues]]
 +
 
 +
=== ACL retention bits ===
 +
 
 +
From inspection of the code, it appears that these bits are ignored on set, cleared on return.  We have no plans to really implement these, so for now that's probably adequate.
 +
 
 +
= NFSv4.0 =
 +
 
 +
== Highest priority ==
 +
 
 +
Required for the 4.0 server to be minimally acceptable.
 +
 
 +
We may accept new features into 4.1 without requiring these be fixed, but it will be a huge problem if they aren't somehow fixed soon.
 +
 
 +
=== Fix changeid ===
 +
 
 +
We're still relying on ctime for this, inadequate especially for ext3 (with 1-second resolution).  Newer filesystems are fixing this, but some more work is needed to take advantage of improvements (for example to improve ext4's native changeid feature.)
 +
 
 +
== Done, needs testing ==
 +
 
 +
=== Turn on reply cache for 4.0 ===
 +
 
 +
The reply cache is currently off for 4.0 and 4.1.  We want it to stay that way for 4.1 (sessions replaces it), but 4.0 still needs it.  The slightly tricky part is identifying idempotent versus non-idempotent operations.  We should be able to do that by tagging individual ops as one or the other, then noting in the xdr-decode phase whether we've encountered any non-idempotent ops.
 +
 
 +
"Lower priority" only because NFSv4 is only supported over TCP, and while the reply cache is still needed over TCP, the current reply cache design seems unlikely to help in that case.  Therefore NFSv4 without a reply cache is unlikely to be any worse than NFSv3 over TCP already is.  So there is still a high-priority bug here (to fix the reply cache), but it already exists in NFSv3.
 +
 
 +
=== Accepting more compounds ===
Out-of-spec compound restrictions: we don't, for example, currently
Out-of-spec compound restrictions: we don't, for example, currently
Line 223: Line 236:
across these cases.
across these cases.
-
Changeid: we're still relying on ctime for this, inadequate especially
+
== Done ==
-
for ext3.  Newer filesystems are fixing this, but some more work is
+
 
-
needed to take advantage of improvements (for example to improve ext4's
+
=== Breaking delegations when required ===
-
native changeid feature.)
+
 
 +
(bfields is working on this.)
 +
 
 +
Our delegation implementation does not currently recall delegations on rename or unlink of a delegated file, leading to stale client caches in some cases.
 +
 
 +
This has been recently fixed for NFS-only access, making this somewhat of a lower priority, but the problem still exists for multi-protocol or local access.  For example, if you ssh into the server and remove a file, or remove a file using Samba, then an NFSv4 delegation on that file will not be recalled.
 +
 
 +
We have CITI patches to address this problem in the VFS.  They still have some bugs, and the design needs to be revisited.
 +
 
 +
See [[http://marc.info/?t=127382965200004&r=1&w=2]] for discussion.
 +
 
 +
=== Fix reboot recovery ===
-
Reboot recovery: the existing reboot recovery mechanism for NFSv4.0 has
+
The existing reboot recovery mechanism for NFSv4.0 has some architectural problems, and the core kernel developers have asked us to replace it.  The transition between the new and old system will be awkward, and the earlier it's done the better.
-
some architectural problems, and the core kernel developers have asked
+
-
us to replace it.  The transition between the new and old system will be
+
-
awkward, and the earlier its done the better.
+
-
Lockowner DOS protection: we don't remove lockowners until close,
+
We have a basic design for [[nfsd4 server recovery]].
-
release_lockowner, or client expiration, making it possible to DOS the
+
-
server by opening a file and repeatedly locking it with a different
+
-
lockowner each time, without closing the file.
+

Latest revision as of 15:11, 5 August 2014

Before 3.11, the server's implementation of NFSv4.1 deviates from the spec (rfc 5661) in a number of important ways, and is recommended for developers only.

As of about 3.11:

  • someone upgrading from NFSv4.0 should experience no loss in functionality;
  • server behavior should be close enough to the spec that clients will not be forced into undocumented workarounds.

Any exceptions should be reported as bugs.

In a few cases we lack support for features that are mandated by the RFCs, but that nevertheless are rarely (or never) implemented, and whose absence is easily worked around on the client.

NFSv4.1

Done, needs testing

SP4_MACH_CRED

SP4_MACH_CRED (like SSV) is mandatory for servers to implement. It is less complicated than SSV and provides some (not all) of the advantages, and there's a better chance a client may implement it. So we should implement it now.

We want to minimize the number of optional features we don't implement; each such omission makes it harder for future clients, which will be forced to negotiate support for features that the protocol wasn't designed to negotiate support for.

Check 4.0/4.1 interactions

I think this is covered now, but it might be interesting to write python tests that send 4.0 compounds referencing 4.1 clients, or vice versa, especially for create_session, setclientid, and friends.

New open claim types

We must support CLAIM_FH and CLAIM_DELEG_CUR_FH. (We shouldn't need CLAIM_DELEG_PREV_FH.) Needs some simple pynfs tests.

DRC limit checking

We check for replies that are too big only *after* performing the operation in question. Depending on the operation, that may be too late to return NFS4ERR_REP_TOO_BIG_TO_CACHE. (For example, an irreversible filesystem operation may already have been performed.) We need to figure out how to estimate the size of the response before performing an operation, at least for operations that actually change the filesystem.

Possible fix: add to each nfsd4_ops[] a field with an upper bound on the size of a reply to that operation. Before calling the operation, check that there's room for the worst case, and return TOO_BIG_TO_CACHE if not.
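
The possible fix above can be sketched roughly as follows. This is only a Python model of the idea, not kernel code: OpDesc, max_reply_size, run_compound and max_cached are hypothetical names standing in for an nfsd4_ops[] entry, its new upper-bound field, the compound loop, and the session's cached-reply limit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class OpDesc:
    """Hypothetical stand-in for one entry of the nfsd4_ops[] table."""
    handler: Callable[[], int]  # performs the operation, returns actual reply size
    max_reply_size: int         # assumed worst-case encoded reply size for this op


def run_compound(ops: List[str], table: Dict[str, OpDesc],
                 max_cached: int) -> List[Tuple[str, str]]:
    """Check the worst case *before* each operation runs."""
    used = 0
    results = []
    for name in ops:
        desc = table[name]
        # Refuse up front, so an irreversible operation is never performed
        # when its reply might not fit in the reply cache.
        if used + desc.max_reply_size > max_cached:
            results.append((name, "NFS4ERR_REP_TOO_BIG_TO_CACHE"))
            break
        used += desc.handler()
        results.append((name, "NFS4_OK"))
    return results
```

The check is deliberately pessimistic: an operation whose actual reply would have fit can still be refused, which seems an acceptable price for never having to undo a filesystem change.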

Callback failure handling

The server is required to set SEQ4_STATUS_CB_PATH_DOWN as long as it lacks any usable backchannel for the client. (Also, CB_PATH_DOWN should be returned on DESTROY_SESSION when appropriate.) (SOME TESTING DONE.)

Set SEQ4_STATUS_BACKCHANNEL_FAULT on encountering "unrecoverable fault with the backchannel (e.g. it has lost track of the sequence ID for a slot in the backchannel)."
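
As a rough illustration of the two rules above, the status flags returned in a SEQUENCE reply could be computed like this. The flag values are the ones given in rfc 5661; the function name and its two inputs (usable_backchannels, backchannel_fault) are assumptions made for the sketch, not names from the server code.

```python
from enum import IntFlag


class Seq4Status(IntFlag):
    # Bit values as defined for sr_status_flags in rfc 5661.
    CB_PATH_DOWN = 0x00000001
    BACKCHANNEL_FAULT = 0x00000400


def sequence_status_flags(usable_backchannels: int,
                          backchannel_fault: bool) -> Seq4Status:
    """Hypothetical helper: build the flags for one SEQUENCE reply."""
    flags = Seq4Status(0)
    # Must stay set for as long as the client has no usable backchannel.
    if usable_backchannels == 0:
        flags |= Seq4Status.CB_PATH_DOWN
    # Set on an unrecoverable fault, e.g. a lost backchannel slot sequence ID.
    if backchannel_fault:
        flags |= Seq4Status.BACKCHANNEL_FAULT
    return flags
```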

Trunking

Both clientid (multiple sessions per client) and session (multiple connections per session) trunking are mandatory for a server to support. Therefore a client would be within its rights to simply refuse to interoperate with a server that didn't support either.

We could ask whether it is actually likely that a client will do that, and if there are instead obvious errors we could return that any client is likely to be able to handle gracefully.

On the first question: Supporting trunking really just means doing what the spec says when we receive multiple exchange_id's, create_sessions, or transport connections from the same client. These can arise in simple situations. For example, multi-homed servers need to know how to handle the former. Client recovery of various kinds (see BIND_CONN_TO_SESSION below) may also require that multiple connections be associated with a single session over time, even if only one is in use at a time.

On the second question: I don't see any reliable way to error out: neither BIND_CONN_TO_SESSION nor BACKCHANNEL_CTL have allowable errors that seem reasonable to me in this case. CREATE_SESSION does at least allow returning NOSPC in the case where we can't commit to the additional DRC memory, so maybe we could get away with using that in the case where we don't want to provide any more sessions.

So I think if we don't implement this soon we'll end up with idiosyncratic behavior that will be hard for clients to work around.

We'll also need some pynfs tests to make sure we're getting this right.

BIND_CONN_TO_SESSION

This is mandatory to implement on the server.

This is not just for exotic multi-connection setups.

If a client opts for SP4_MACH_CRED protection, and if its TCP connection is broken for some reason, then it may choose to give up all its state and start from scratch. That may be good enough for very minimal first 4.1 implementations. However, clients that wish to reconnect without giving up their session state (e.g., the reply cache) will need to use BIND_CONN_TO_SESSION to associate the new connection to the old session.

Even with SP4_NONE, clients will want to be able to reconnect without losing the backchannel. Again, that will require BIND_CONN_TO_SESSION.

deferral fixes

Fix merged for 2.6.37.

The current code returns ERR_DELAY whenever an upcall is required instead of using the server's deferral mechanism, since that mechanism replays a request internally, causing SEQUENCE to fail on the second time through.

These DELAYS are hard on clients, and will cause unacceptable delays in some cases. Fix the deferral code to sleep at least a little before giving up and returning ERR_DELAY.
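
The "sleep at least a little" idea might look something like the sketch below. It models the upcall completion with a threading.Event; wait_for_upcall and timeout_s are hypothetical names, and "NFS4_OK" here only means "the upcall finished, carry on processing".

```python
import threading


def wait_for_upcall(done: threading.Event, timeout_s: float) -> str:
    """Block briefly for the upcall instead of failing immediately."""
    # Event.wait() returns True if the event was set before the timeout.
    if done.wait(timeout_s):
        return "NFS4_OK"
    # Only give up and make the client retry after the short wait expires.
    return "NFS4ERR_DELAY"
```

Since the request is never replayed internally, SEQUENCE is processed only once, which avoids the failure described above.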

compound op ordering enforcement

DESTROY_SESSION must be the final operation in a compound request; NFS4ERR_NOT_ONLY_OP should be returned when appropriate. Make sure a session is defined whenever the code expects it.

The risk here is that there may be nasty DOS's (or worse) against a server that doesn't check this kind of thing carefully.
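
A simplified model of the ordering checks discussed in this section is sketched below. It is not the server's code: check_compound_order is a hypothetical name, the list of operations allowed without a leading SEQUENCE is limited to four that rfc 5661 clearly permits, and returning NFS4ERR_NOT_ONLY_OP for a DESTROY_SESSION that is not last is an assumption (the sketch also assumes DESTROY_SESSION names the same session as SEQUENCE).

```python
from typing import List

# Operations that may start a compound without SEQUENCE, but must then be alone.
SOLO_WITHOUT_SEQUENCE = {
    "EXCHANGE_ID", "CREATE_SESSION", "BIND_CONN_TO_SESSION", "DESTROY_SESSION",
}


def check_compound_order(ops: List[str]) -> str:
    """Return the status a 4.1 compound should get from ordering rules alone."""
    if not ops:
        return "NFS4_OK"
    first = ops[0]
    if first == "SEQUENCE":
        # SEQUENCE is only legal as the first operation.
        if "SEQUENCE" in ops[1:]:
            return "NFS4ERR_SEQUENCE_POS"
        # DESTROY_SESSION on the current session must be the final operation.
        if "DESTROY_SESSION" in ops[:-1]:
            return "NFS4ERR_NOT_ONLY_OP"
        return "NFS4_OK"
    if first in SOLO_WITHOUT_SEQUENCE:
        return "NFS4_OK" if len(ops) == 1 else "NFS4ERR_NOT_ONLY_OP"
    # Anything else needs a session, i.e. a leading SEQUENCE.
    return "NFS4ERR_OP_NOT_IN_SESSION"
```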

Keep client from expiring while in use by session

The session associated with a compound may be implicitly referred to by individual operations. For example, RECLAIM_COMPLETE implicitly applies to the client associated with the current session. However, we don't currently do anything to prevent the client from being freed partway through processing a compound.
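
One way to picture a fix is a use count that a compound holds on the client while it runs, with expiry deferred until the count drops to zero. The sketch below is an assumption about the shape of such a fix, not a description of what the server does; Client, get, put and expire are hypothetical names.

```python
import threading


class Client:
    """Toy model of a client that cannot be freed while a compound uses it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._users = 0          # compounds currently referencing this client
        self._expired = False    # lease has run out
        self.freed = False

    def get(self) -> bool:
        """Called when a compound's SEQUENCE binds it to this client."""
        with self._lock:
            if self._expired or self.freed:
                return False     # too late, do not hand out new references
            self._users += 1
            return True

    def put(self) -> None:
        """Called when the compound finishes processing."""
        with self._lock:
            self._users -= 1
            if self._users == 0 and self._expired:
                self.freed = True

    def expire(self) -> None:
        """Called by the lease reaper; frees only if nobody is using the client."""
        with self._lock:
            self._expired = True
            if self._users == 0:
                self.freed = True
```

With this shape, an operation such as RECLAIM_COMPLETE can rely on the client staying valid until the compound that referenced it has called put().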

Fix ERROR_RESOURCE and BADXDR returns

We shouldn't be returning RESOURCE to 4.1 clients at all, and most of our BADXDR returns are probably also incorrect--instead we should be returning NFS4ERR_REP_TOO_BIG, NFS4ERR_REQ_TOO_BIG, NFS4ERR_TOO_MANY_OPS, etc.

Not needed immediately

These items are still a high priority, but we can temporarily get away without doing them, on the grounds that they aren't absolutely required for minimal interoperability, and/or they don't introduce any new problems that don't already exist in the 4.0 implementation.

Referring triples

So, the particular requirement, from 2.10.6.3, is below (and in the below you can take the "client operation" to be an open, and the "associated object" to be a delegation created by that open):

       "For each client operation which might result in some sort of
       server callback, the server SHOULD "remember" the { session ID,
       slot ID, sequence ID } triple of the client request until the
       slot ID retirement rules allow the server to determine that the
       client has, in fact, seen the server's reply.  Until the time
       the { session ID, slot ID, sequence ID } request triple can be
       retired, any recalls of the associated object MUST carry an
       array of these referring identifiers (in the CB_SEQUENCE
       operation's arguments), for the benefit of the client."

If we ignore that "MUST", the result will be for the client to return a BADHANDLE or BADSTATEID error, as in v4.0. We have code to handle that case (by retrying) on the server. So if we ignore this requirement, the resulting behavior will be no worse than in 4.0. So I think we can get away with keeping this a *slightly* lower priority than the other stuff.

(I'd still like to see this done--if possible, at about the time it's done on the client. But it's a higher priority task on the client because there it really is mandatory: a server that lists the referring triples correctly does have a right not to have to handle those temporary BADHANDLE/BADSTATEID errors.)

SSV

This is still listed as mandatory in the spec, and while clients and other servers don't seem to be working on implementing this, it's not yet clear to me that there's a consensus to drop it.

Done

GSS on the backchannel

(bfields is working on this)

Clients don't currently request gss on the backchannel. It is mandatory to support this. I don't know if anyone actually does. Still undecided actually how to fail.

More details for gss backchannel support:

We must allow the client to pass the server gss contexts to use on the backchannel.

SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING and SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED should be set when required.

See the end of section 18.36.4 for more implementation details.

SEQ4_STATUS_RECALLABLE_STATE_REVOKED

Set SEQ4_STATUS_RECALLABLE_STATE_REVOKED when a client's failure to return a recallable object causes us to revoke the object, and be prepared to handle a FREE_STATEID from the client as acknowledgement.
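A minimal sketch of that behavior, in Python. The Client class and its methods are hypothetical. The flag value is what I recall from rfc 5661 and should be checked against the spec. The sketch assumes the flag stays set until every revoked stateid has been acknowledged with FREE_STATEID.

```python
# Flag value as recalled from rfc 5661; verify before relying on it.
SEQ4_STATUS_RECALLABLE_STATE_REVOKED = 0x00000040


class Client:
    """Hypothetical per-client state, not the nfsd structure."""

    def __init__(self):
        # Stateids we revoked that the client has not yet acknowledged.
        self.revoked = set()

    def revoke(self, stateid):
        """Called when the client fails to return a recalled object."""
        self.revoked.add(stateid)

    def free_stateid(self, stateid):
        """FREE_STATEID from the client acknowledges the revocation."""
        self.revoked.discard(stateid)

    def sequence_status_flags(self):
        """Status flags to return in the next SEQUENCE reply."""
        flags = 0
        if self.revoked:
            flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED
        return flags
```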

backchannel attribute negotiation

alloc_init_session() should sanity-check the values of the backchannel attributes that the client gives us, and fail the CREATE_SESSION if they don't meet our minimal requirements.

One starting point here: http://marc.info/?l=linux-nfs&m=128647408432218&w=2
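The kind of check meant here might look like the following Python sketch. The field names are the channel attribute names from rfc 5661; the minimum values and the function name are made up for illustration, and which error CREATE_SESSION should return is left to the caller.

```python
# Hypothetical minimums; real values depend on which callbacks we send.
MIN_BACK_CHANNEL = {
    "ca_maxrequestsize": 256,
    "ca_maxresponsesize": 256,
    "ca_maxoperations": 2,   # CB_SEQUENCE plus one callback operation
    "ca_maxrequests": 1,
}


def check_backchannel_attrs(attrs):
    """Return the names of attributes that fall below our minimums.

    attrs is a dict of the backchannel attributes the client sent.
    An empty result means the CREATE_SESSION may proceed.
    """
    return [name for name, minimum in MIN_BACK_CHANNEL.items()
            if attrs.get(name, 0) < minimum]
```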

Make DESTROY_SESSION wait on in-progress requests

See discussion on the ietf list: [1]

(This is a slightly odd one as it's not really required till 4.2. However it appears we have some races here that could cause memory corruption, and that those races would be most easily fixed by delaying session and client destruction till in-progress requests are processed.)
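One way to picture "delay destruction till in-progress requests are processed" is a counter plus a wait, as in this Python sketch. The Session class is hypothetical and uses user-space threading primitives rather than anything from the kernel. It also ignores the case where the DESTROY_SESSION arrives in a compound that is itself counted against the session being destroyed.

```python
import threading


class Session:
    """Hypothetical session with an in-flight request counter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._dying = False
        self.destroyed = False

    def begin_request(self):
        """Take a reference; refuse new requests once destruction began."""
        with self._cond:
            if self._dying:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Drop a reference and wake a waiting destroy() if we were last."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def destroy(self):
        """Block until all in-progress requests finish, then tear down."""
        with self._cond:
            self._dying = True
            while self._in_flight > 0:
                self._cond.wait()
            self.destroyed = True
```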

destroy lockowners on unlock

Unlike in the 4.0 case, in the 4.1 case a stateid should become invalid on an unlock that leaves the caller with no bytes locked; a 4.1 client will no longer call release_lockowner.
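To make the 4.1 rule concrete, here is a small Python sketch that tracks the byte ranges held under one lock stateid and marks the stateid invalid when an unlock leaves nothing locked. LockState and subtract are hypothetical helpers; ranges are half-open (start, end) pairs, and lock merging and conflict checking are left out.

```python
def subtract(ranges, start, end):
    """Remove [start, end) from a list of half-open (start, end) ranges."""
    out = []
    for s, e in ranges:
        if e <= start or s >= end:
            out.append((s, e))      # no overlap, keep as is
            continue
        if s < start:
            out.append((s, start))  # piece left of the unlocked span
        if e > end:
            out.append((end, e))    # piece right of the unlocked span
    return out


class LockState:
    """Hypothetical lock stateid with the ranges it currently covers."""

    def __init__(self, stateid, ranges):
        self.stateid = stateid
        self.ranges = list(ranges)
        self.valid = True

    def unlock(self, start, end):
        self.ranges = subtract(self.ranges, start, end)
        if not self.ranges:
            # 4.1: no bytes left locked, so the stateid goes away here
            # rather than waiting for a release_lockowner.
            self.valid = False
```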

Respect client-requested backchannel security

Done (auth_null and partial auth_sys); to be merged for 3.8. pynfs tests (DELEG5-7) also available.

We ignore the csa_sec_parms and bca_sec_parms fields that specify the security to be used on the backchannel. Instead, we *always* use auth_sys credentials, because we happen to know that is what the Linux client currently expects.

The client can provide a list of possible parameters. To start, we'll just take the first one that we support and ignore the rest.
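The "take the first one we support" policy is simple enough to sketch in Python. The flavor numbers are the standard RPC ones; the dict layout of each entry, the supported set, and the function name are assumptions for illustration.

```python
# Standard RPC authentication flavor numbers.
AUTH_NONE = 0
AUTH_SYS = 1
RPCSEC_GSS = 6

# Assumption: what the server can currently use on the backchannel.
SUPPORTED_CB_FLAVORS = {AUTH_NONE, AUTH_SYS}


def choose_cb_sec(sec_parms):
    """Pick the first client-offered entry whose flavor we support.

    sec_parms is the client's list, in the order it was sent; each entry
    is a dict with at least a "flavor" key. Returns None when nothing in
    the list is usable, leaving the failure policy to the caller.
    """
    for parms in sec_parms:
        if parms["flavor"] in SUPPORTED_CB_FLAVORS:
            return parms
    return None
```

With a list of [RPCSEC_GSS, AUTH_SYS, AUTH_NONE] this picks the AUTH_SYS entry and ignores the rest.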

DESTROY_CLIENTID

DESTROY_CLIENTID is not currently used by clients, but will be, and clients should not be expected to know how to recover from the case where it is not supported. It should also be fairly easy to implement.

FREE_STATEID and TEST_STATEID

Implement SECINFO_NO_NAME

It's mandatory and not very difficult.

SECINFO should consume current filehandle

See [2]

Basic Server Reboot Recovery for 4.1

We need at least basic RECLAIM_COMPLETE support.

Question: do we need to set SEQ4_STATUS_RESTART_RECLAIM_NEEDED on any new session created by a preexisting client during the grace period? Seems like that should be necessary only if we implement persistent sessions, but I suppose it can't harm to set it otherwise.

The reboot recovery system common to 4.0 and 4.1 needs some work, but that's a preexisting 4.0 problem.

Clarify RDMA non-support

Nobody has stepped up to work on RDMA and 4.1, so while it's a violation of our principle that "someone upgrading from the previous version should experience no loss in functionality", we should probably declare the combination of RDMA and 4.1 unsupported, until someone has a chance to spend some time on it.

We've checked that the create_session and bind_conn_to_session results both indicate non-support, so that should be all we need to do for now.

See also 4.1 RDMA issues

ACL retention bits

From inspection of the code, it appears that these bits are ignored on set, cleared on return. We have no plans to really implement these, so for now that's probably adequate.

NFSv4.0

Highest priority

Required for the 4.0 server to be minimally acceptable.

We may accept new features into 4.1 without requiring these be fixed, but it will be a huge problem if they aren't somehow fixed soon.

Fix changeid

We're still relying on ctime for this, which is inadequate, especially for ext3 (with 1-second resolution). Newer filesystems are fixing this, but some more work is needed to take advantage of improvements (for example, to improve ext4's native changeid feature).

Done, needs testing

Turn on reply cache for 4.0

The reply cache is currently off for 4.0 and 4.1. We want it to stay that way for 4.1 (sessions replaces it), but 4.0 still needs it. The slightly tricky part is identifying idempotent versus non-idempotent operations. We should be able to do that by tagging individual ops as one or the other, then noting in the xdr-decode phase whether we've encountered any non-idempotent ops.
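The tagging idea can be sketched in a few lines of Python. The set of non-idempotent operations below is only illustrative, not a complete or authoritative classification, and decode_compound is a hypothetical stand-in for the xdr-decode phase.

```python
# Illustrative only: a few operations that are clearly non-idempotent.
NON_IDEMPOTENT_OPS = {"CREATE", "LINK", "LOCK", "OPEN", "REMOVE", "RENAME"}


def decode_compound(op_names):
    """Walk the ops of a compound and note whether the reply needs caching.

    Returns (decoded_ops, cache_reply). cache_reply becomes True as soon
    as any non-idempotent operation is seen; a compound made up only of
    idempotent operations can safely be re-executed on a retransmit.
    """
    decoded = []
    cache_reply = False
    for name in op_names:
        decoded.append(name)
        if name in NON_IDEMPOTENT_OPS:
            cache_reply = True
    return decoded, cache_reply
```

For example, PUTFH + GETATTR would not be cached, while PUTFH + REMOVE would.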

"Lower priority" only because NFSv4 is only supported over TCP, and while the reply cache is still needed over TCP, the current reply cache design seems unlikely to help in that case. Therefore NFSv4 without a reply cache is unlikely to be any worse than NFSv3 over TCP already is. So there is still a high-priority bug here (to fix the reply cache), but it already exists in NFSv3.

Accepting more compounds

Out-of-spec compound restrictions: we don't, for example, currently allow the client to send more than one IO (read, write, readdir) operation in a single compound. Some day adventurous clients may run across these cases.

Done

Breaking delegations when required

(bfields is working on this.)

Our delegation implementation does not currently recall delegations on rename or unlink of a delegated file, leading to stale client caches in some cases.

This has been recently fixed for NFS-only access, making this somewhat of a lower priority, but the problem still exists for multi-protocol or local access. For example, if you ssh into the server and remove a file, or remove a file using Samba, then an NFSv4 delegation on that file will not be recalled.

We have CITI patches to address this problem in the VFS. They still have some bugs, and the design needs to be revisited.

See [3] for discussion.

Fix reboot recovery

The existing reboot recovery mechanism for NFSv4.0 has some architectural problems, and the core kernel developers have asked us to replace it. The transition between the new and old system will be awkward, and the earlier it's done the better.

We have a basic design for nfsd4 server recovery.
