RpcClientTransportSwitch

From Linux NFS

Revision as of 01:50, 24 August 2007 by Chucklever (Talk | contribs)

Purpose

We document the design for a transport switch in the Linux 2.6 RPC client.

Introduction

Today's RPC client and server in the Linux kernel use a socket-based transport layer API. This works well for existing network transport technologies such as IPv4 TCP over gigabit Ethernet.

In the near future, alternate transport technologies will appear which may be difficult to mate with the socket abstraction. Examples of such new technologies include transports that support direct data placement and TCP offload devices accessed directly rather than through the Linux kernel's network layer.

Additionally, other new technologies such as IPv6 and new stream protocols such as SCTP will require significant changes to the socket-based infrastructure in the RPC client and server, but may have little if any effect on other areas.

Finally, security mechanisms such as IPsec and Kerberos 5 privacy may have special buffer management requirements in the transport layer in order to provide as efficient an implementation as possible.

In the following text, we refer to today's RPC client and server, which do not have a generic transport switch implementation, as the "pre-switch" versions of the client and server.

Specification

Our final goal is an implementation that facilitates integration of
alternate transports while retaining or improving the stability,
performance, and maintainability of the pre-switch RPC client with
socket-based transports.  In other words, we want to have no negative
impact on the performance or stability of the existing IPv4 socket-based
transport as we add a transport switch capability.  Toward that end,
we will introduce as little new functionality to existing support as
possible for IPv4 socket transports; we are simply moving code and
data structures.  When complete, the IPv4 socket transport
implementation will act as a reference for new transport
implementations.

A "transport implementation" provides the code base that supports
particular transport mechanisms, such as "IPv4 socket."  Eventually
transport implementations will be contained in loadable kernel modules.
As they are loaded, they will register with the RPC client and server.
Each transport implementation provides a vector of procs that provide a
way to create, bind, and connect a new transport instance, provide
auxiliary services such as portmapping, and provide ways to configure,
send and receive data on, or destroy, such instances.

Each transport connection between the client and server using a
particular transport implementation is known as a "transport instance."
Such an instance is identified by its transport implementation, and
by the endpoint addresses of the client and server, and is represented
by an rpc_xprt struct.  For the "IPv4 socket" transport implementation,
a transport instance is a single IPv4 socket connection that uses
either the UDP or TCP network protocol.  Note, for example, that a
single transport instance might also consist of multiple sockets that
share a workload, or an RDMA link with a passive failover IP socket,
depending on how the instance's transport is implemented.

The transport API now contains methods to access various fields in
the rpc_xprt struct.  A transport-private data structure contains
fields that are specific to a particular transport instance.

When the API is complete, transport endpoint addresses will be contained
in a sockaddr_storage structure and an API method will be provided to
retrieve the value of the remote peer's endpoint address.  Setting the
remote address will only be allowed during transport instance creation.

A transport implementation will usually include its own mechanism for
RPC portmapping.  For example, IPv4 sockets will use the standard RPC
portmapper.  IPv6 sockets may use rpcbind.  Some implementations will
not need any kind of port mapping; such implementations can provide the
portmap methods as no-ops.

We defer the introduction of mechanisms by which user space, and
subsequently the NFS client and server, specify which transport to use
and parameters specific to a particular transport implementation.  New
mount options that control aspects of transport operation and changes
to the mount_data structure will be considered on a case by case basis.

Support for the NFS version 4 session model

The pre-existing RPC client transport model includes a capability
to send RPC requests and receive replies from servers via a single
transport instance.  NFS version 4 (RFC 3530) introduces the concept of
a callback channel to support RPC requests sent by NFS servers and
received by clients.  The primary use of this channel is to support
NFS version 4 read and write delegation.  Typically it uses a separate
RPC server instance on the client supported by a separate transport
instance to service callback RPC requests.

In the near future, a minor revision of NFS version 4 will require the
ability to combine the normal RPC request channel with the callback
channel on a single transport instance (also known as the NFS version
4 session layer).  To support bi-directional RPC communications on a
single transport instance, additional transport methods will be
required.

We do not yet understand what will be required, in addition to the
methods described above, to support callbacks on the same transport
instance as the RPC request forward channel.

API Specification

The generic functionality of all RPC transports (i.e., congestion control,
request queuing, retransmit timeouts, and so on) will remain in xprt.c.
All API methods must be present in all transport implementations.

We define fifteen transport methods:
struct rpc_xprt_ops {
        void            (*setbufsize)(struct rpc_xprt *,
                                        size_t, size_t);
        void            (*print_addr)(struct rpc_xprt *,
                                        size_t, char *, int);
        int             (*is_bound)(struct rpc_xprt *);
        void            (*rpcbind)(struct rpc_task *, struct rpc_clnt *);
        void            (*set_port)(struct rpc_xprt *, unsigned short);
        void            (*connect)(struct rpc_task *);
        int             (*aux_protocol)(struct rpc_xprt *);
        void *          (*buf_alloc)(struct rpc_task *, size_t);
        void            (*buf_free)(struct rpc_task *);
        int             (*send_request)(struct rpc_task *);
        void            (*set_receive_timeout)(struct rpc_task *);
        int             (*is_congested)(struct rpc_xprt *);
        void            (*timeout)(struct rpc_xprt *);
        void            (*close)(struct rpc_xprt *);
        void            (*destroy)(struct rpc_xprt *);
};

The following type defines a single transport implementation.  It provides
a name that functions only as an eye-catcher; the address of the transport
implementation's kernel module structure; a family and protocol; and the
address of the function that the generic layer can use to set up a new
transport instance.  The address of this structure is passed to the
generic layer when the transport implementation initializes.

struct xprt_type {
        struct list_head        list;
        char                    name[32];
        struct module *         owner;
        unsigned short          family;
        int                     protocol;
        int                     (*setup)(struct rpc_xprt *,
                                                struct rpc_timeout *);
};
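
The registration step described above can be sketched in user space as follows.  This is only an illustration under stated assumptions: the function names xprt_register_type and xprt_find_type, the global list head, and the example_tcp_* objects are hypothetical and do not appear in this design; a plain "next" pointer stands in for list_head, and the module owner field is left out.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

struct rpc_xprt;                        /* opaque in this sketch */
struct rpc_timeout;                     /* opaque in this sketch */

/* Reduced stand-in for struct xprt_type: a plain "next" pointer replaces
 * the kernel's list_head, and the module owner field is omitted. */
struct xprt_type {
        struct xprt_type       *next;
        char                    name[32];
        unsigned short          family;
        int                     protocol;
        int                     (*setup)(struct rpc_xprt *,
                                         struct rpc_timeout *);
};

static struct xprt_type *xprt_type_list;        /* registered types */

/* Hypothetical lookup by address family and protocol number. */
static struct xprt_type *xprt_find_type(unsigned short family, int protocol)
{
        struct xprt_type *t;

        for (t = xprt_type_list; t != NULL; t = t->next)
                if (t->family == family && t->protocol == protocol)
                        return t;
        return NULL;
}

/* Hypothetical registration: refuse duplicates, otherwise link the type in. */
static int xprt_register_type(struct xprt_type *type)
{
        assert(type != NULL && type->setup != NULL);
        if (xprt_find_type(type->family, type->protocol) != NULL)
                return -EEXIST;
        type->next = xprt_type_list;
        xprt_type_list = type;
        return 0;
}

/* Example record, as a transport implementation might define it. */
static int example_tcp_setup(struct rpc_xprt *xprt, struct rpc_timeout *to)
{
        (void)xprt;                     /* a real setup would fill this in */
        (void)to;
        return 0;
}

static struct xprt_type example_tcp_type = {
        .name           = "IPv4 TCP socket",
        .family         = AF_INET,
        .protocol       = IPPROTO_TCP,
        .setup          = example_tcp_setup,
};
```

A transport implementation would call xprt_register_type(&example_tcp_type) once when it initializes; the generic layer could then use xprt_find_type(AF_INET, IPPROTO_TCP) to locate the setup function for a new transport instance.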
The setup function initializes the following fields in the rpc_xprt
structure it is passed, and may also allocate and initialize a private
area for the transport instance:
 tsh_size:              the size, in 8-bit bytes, of a transport-
                        specific header to be placed before the
                        RPC header when building each RPC request.

 cwnd:                  the initial size of the congestion window.

 resvport:              a boolean which, if true, means this
                        transport needs a reserved port.

 max_payload:           the size, in 8-bit bytes, of the largest
                        payload a single RPC request can contain
                        on this transport.

 bind_timeout:          number of jiffies to wait for a bind
                        request to complete before timing it out.

 connect_timeout:       number of jiffies to wait for a transport
                        connect request to complete before timing
                        it out.

 reestablish_timeout:   number of jiffies to wait after a transport
                        is remotely disconnected before attempting
                        to reestablish a connection.

 idle_timeout:          number of jiffies to wait after a transport
                        becomes idle before disconnecting.

 ops:                   the address of this transport instance's
                        operations vector.

 max_reqs:              the maximum number of concurrent requests
                        this transport instance can support.

A (void *) pointer field is made available in the rpc_xprt structure
to reference an implementation-private area where instance variables
specific to a transport implementation can be maintained.
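
To make the field list concrete, here is a minimal user-space sketch of a setup function that fills in those fields and attaches a private area.  It rests on several assumptions: the rpc_xprt below is a reduced stand-in holding only the listed fields, the private pointer is given the hypothetical name xprt_private, EXAMPLE_HZ stands in for the kernel tick rate, every numeric value is illustrative, and errors are returned as negative errno values by kernel convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define EXAMPLE_HZ 1000UL               /* assumed ticks per second */

struct rpc_timeout;                     /* options, unused in this sketch */
struct rpc_xprt;

struct rpc_xprt_ops {                   /* only one method shown */
        void (*destroy)(struct rpc_xprt *);
};

/* Reduced stand-in for rpc_xprt with only the fields listed above. */
struct rpc_xprt {
        size_t                  tsh_size;
        unsigned long           cwnd;
        int                     resvport;
        size_t                  max_payload;
        unsigned long           bind_timeout;
        unsigned long           connect_timeout;
        unsigned long           reestablish_timeout;
        unsigned long           idle_timeout;
        const struct rpc_xprt_ops *ops;
        unsigned int            max_reqs;
        void                   *xprt_private;   /* hypothetical field name */
};

struct example_private {                /* per-instance variables */
        int sock_fd;
};

/* Opposite of setup: release the private area. */
static void example_destroy(struct rpc_xprt *xprt)
{
        free(xprt->xprt_private);
        xprt->xprt_private = NULL;
}

static const struct rpc_xprt_ops example_ops = {
        .destroy = example_destroy,
};

/* Returns zero on success or a negative errno value. */
static int example_setup(struct rpc_xprt *xprt, struct rpc_timeout *to)
{
        struct example_private *priv;

        assert(xprt != NULL);
        (void)to;
        priv = calloc(1, sizeof(*priv));
        if (priv == NULL)
                return -ENOMEM;
        priv->sock_fd = -1;             /* not connected yet */

        xprt->xprt_private = priv;
        xprt->tsh_size = 4;             /* e.g. a 4-byte stream record marker */
        xprt->cwnd = 1;
        xprt->resvport = 1;
        xprt->max_payload = 32768;      /* illustrative value only */
        xprt->bind_timeout = 60 * EXAMPLE_HZ;
        xprt->connect_timeout = 60 * EXAMPLE_HZ;
        xprt->reestablish_timeout = 3 * EXAMPLE_HZ;
        xprt->idle_timeout = 300 * EXAMPLE_HZ;
        xprt->ops = &example_ops;
        xprt->max_reqs = 16;
        return 0;
}
```

Keeping the socket descriptor in the private area rather than in rpc_xprt itself is the point of the (void *) field: the generic layer never needs to know what a particular implementation stores there.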


Procedure syntax and functional descriptions

 "setup"         This external function is provided by the transport
                 implementation for initializing a new transport instance,
                 setting the remote peer address, and providing some
                 transport-specific parameters, such as request timeout
                 values.  This function also initializes the vector of API
                 methods with which the generic layer can manipulate the
                 new transport instance.

                 The function takes two arguments:  the address of a
                 freshly allocated rpc_xprt structure, and the address
                 of a structure containing transport-specific options.
                 The "addr" field of the rpc_xprt structure is initialized
                 with the remote endpoint address before "setup" is
                 invoked.

                 The return value is an errno value if problems were
                 encountered, or zero on success.

                 This function is called from a user process context,
                 so it may sleep.  It does not depend on any external
                 locks being held.


 "setbufsize"    This API method is invoked following the creation of
                 a new transport instance to initialize transport layer
                 buffer parameters.

                 The function takes three arguments, which are the address
                 of the rpc_xprt structure whose buffers are to be sized, and
                 two unsigned integers giving the desired sizes of the
                 transport's buffers, in bytes.  It returns nothing.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function is called from a user process context,
                 so it may sleep.  It does not depend on any external
                 locks being held.


 "print_addr"    This API method stuffs a buffer with a formatted string
                 representing the address of the remote peer.
                 It's useful for building hash functions or with error,
                 warning, and trace messages.

                 The function takes four arguments, which are the address
                 of the rpc_xprt structure containing the remote address,
                 the size in bytes and the address of a buffer to stuff,
                 and a set of flags that determine which address fields
                 are to be formatted.  It returns nothing.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function is called from a user process context,
                 so it may sleep.  It does not depend on any external
                 locks being held.
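
As an illustration of the argument order and the flags idea, a user-space sketch of a print_addr method for an IPv4 peer might look like this.  The flag names EXAMPLE_FMT_ADDR and EXAMPLE_FMT_PORT, the helper example_set_ipv4_peer, and the reduced rpc_xprt holding only an "addr" field are assumptions; this design does not define the actual flag values or output format.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define EXAMPLE_FMT_ADDR 0x1            /* hypothetical flag: IP address */
#define EXAMPLE_FMT_PORT 0x2            /* hypothetical flag: port number */

struct rpc_xprt {                       /* reduced stand-in */
        struct sockaddr_storage addr;   /* remote endpoint address */
};

/* Helper for the sketch: store an IPv4 peer address in the transport. */
static int example_set_ipv4_peer(struct rpc_xprt *xprt, const char *ip,
                                 unsigned short port)
{
        struct sockaddr_in *sin = (struct sockaddr_in *)&xprt->addr;

        memset(&xprt->addr, 0, sizeof(xprt->addr));
        sin->sin_family = AF_INET;
        sin->sin_port = htons(port);
        return inet_pton(AF_INET, ip, &sin->sin_addr) == 1 ? 0 : -1;
}

/* Format the requested parts of the peer address into buf. */
static void example_print_addr(struct rpc_xprt *xprt, size_t len,
                               char *buf, int flags)
{
        const struct sockaddr_in *sin =
                (const struct sockaddr_in *)&xprt->addr;
        char ip[INET_ADDRSTRLEN] = "";
        unsigned int port;

        assert(buf != NULL && len > 0);
        buf[0] = '\0';
        if (sin->sin_family != AF_INET)
                return;                 /* only IPv4 handled in this sketch */

        port = ntohs(sin->sin_port);
        if (flags & EXAMPLE_FMT_ADDR)
                inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip));

        if ((flags & EXAMPLE_FMT_ADDR) && (flags & EXAMPLE_FMT_PORT))
                snprintf(buf, len, "%s:%u", ip, port);
        else if (flags & EXAMPLE_FMT_ADDR)
                snprintf(buf, len, "%s", ip);
        else if (flags & EXAMPLE_FMT_PORT)
                snprintf(buf, len, "%u", port);
}
```

Because the caller supplies the buffer and its size, the method never allocates memory, which keeps it usable from trace and error paths.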


 "is_bound"      This API method is invoked to determine whether a bind
                 operation is required before a connection is made.

                 The function takes a single argument, which is the address
                 of the rpc_xprt structure which is being tested.  It
                 returns true if the transport is bound already, and false
                 if a bind operation is necessary before proceeding.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.

 "rpcbind"       This API method is invoked before a connect to allow
                 portmapping to occur.  If ports are not supported by
                 the underlying transport mechanism, this method can
                 be a no-op.

                 The function takes two arguments: the address of the
                 rpc_task structure for the current RPC request, and
                 the address of the rpc_clnt structure associated with
                 this task.  It returns nothing.

                 This operation starts the bind operation asynchronously,
                 and the caller sleeps using the RPC client's scheduling
                 primitives.  The caller is awoken automatically when
                 the bind is complete, and can check the status of the
                 bind operation using "is_bound."

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.
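
For a transport that has no notion of ports, the bind-related methods can be reduced to stubs.  The following user-space sketch shows one way that could look; the noport_* names, the example_bind_step helper, and the reduced rpc_xprt, rpc_clnt, and rpc_task structures are assumptions made for illustration and are not part of this design.

```c
#include <assert.h>
#include <stddef.h>

struct rpc_xprt;
struct rpc_task;
struct rpc_clnt;

struct rpc_xprt_ops {                   /* only the bind-related methods */
        int     (*is_bound)(struct rpc_xprt *);
        void    (*rpcbind)(struct rpc_task *, struct rpc_clnt *);
        void    (*set_port)(struct rpc_xprt *, unsigned short);
};

/* Reduced stand-ins for the generic structures. */
struct rpc_xprt { const struct rpc_xprt_ops *ops; unsigned short port; };
struct rpc_clnt { struct rpc_xprt *cl_xprt; };
struct rpc_task { struct rpc_clnt *tk_client; };

/* A transport without ports always reports itself as bound. */
static int noport_is_bound(struct rpc_xprt *xprt)
{
        (void)xprt;
        return 1;
}

static void noport_rpcbind(struct rpc_task *task, struct rpc_clnt *clnt)
{
        (void)task;                     /* nothing to look up */
        (void)clnt;
}

static void noport_set_port(struct rpc_xprt *xprt, unsigned short port)
{
        (void)xprt;                     /* port numbers are meaningless here */
        (void)port;
}

static const struct rpc_xprt_ops noport_ops = {
        .is_bound       = noport_is_bound,
        .rpcbind        = noport_rpcbind,
        .set_port       = noport_set_port,
};

/* Hypothetical generic-layer step: start a bind only when one is needed.
 * Returns 1 if a bind was started, 0 if the transport was already bound. */
static int example_bind_step(struct rpc_task *task)
{
        struct rpc_xprt *xprt;

        assert(task != NULL && task->tk_client != NULL);
        xprt = task->tk_client->cl_xprt;
        if (xprt->ops->is_bound(xprt))
                return 0;
        xprt->ops->rpcbind(task, task->tk_client);
        return 1;
}
```

Since is_bound always returns true for such a transport, the generic layer never starts a bind, and the rpcbind stub exists only to satisfy the rule that every method must be present.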


 "set_port"      This API method is invoked to change the bound port
                 number for a transport.  It is generally invoked only
                 during a bind operation.

                 The function takes two arguments: the address of an
                 rpc_xprt structure to update, and an unsigned 16-bit
                 integer which is the new port number.  It returns
                 nothing.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "connect"       This API method is invoked to connect a transport when
                 the generic transport layer recognizes the need to
                 connect a transport instance.

                 The generic layer serializes transport reads and writes
                 with the connect operation on this transport.  Calling
                 this function starts the connection, but the transport
                 may or may not be connected when it returns.  The
                 generic layer uses the RPC client's scheduler primitives
                 to wait safely until the connection operation is complete,
                 and to allow only one connection attempt at a time.

                 The details of whether a transport is connection-oriented
                 or datagram-oriented can be well hidden in the transport
                 implementation itself.  The RPC client's finite state
                 engine automatically detects whether a transport is
                 connected before sending each request; if it is not, it
                 will invoke this method automatically.

                 The function takes one argument, which is the address of
                 an rpc_task structure which can be used for scheduling
                 the connection and sleeping.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.


 "aux_protocol"  This API method returns the protocol number to be used
                 to set up auxiliary transports.  An auxiliary transport
                 is an additional transport instance that connects the
                 same endpoints, but carries a different RPC program.
                 NLM, NSM, and NFSACL would use an auxiliary transport
                 to connect to servers.

                 The function takes one argument, which is the address of
                 an rpc_xprt structure.  It returns an integer.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.


 "buf_alloc"     This API method returns an area of memory in which to
                 construct an outgoing RPC and to contain its reply.
                 The memory can be a dynamically allocated buffer, or
                 it can provide the address of an existing memory area
                 where the construction can occur.

                 The function takes two arguments: the address of the
                 rpc_task structure associated with the current request,
                 and a requested size of the memory area, in bytes.  It
                 returns an address of a usable area of memory, or NULL
                 in case no area is currently available.  The RPC
                 client will retry if a NULL is returned.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "buf_free"      This API method is invoked when an rpc_task is finished
                 and must free a memory area allocated via buf_alloc.

                 The function takes one argument: the address of the
                 rpc_task structure associated with the current
                 request.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.
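
The option of handing out "an existing memory area" instead of allocating dynamically can be sketched with a small fixed pool.  Everything named example_* below, the pool dimensions, and the reduced rpc_task with a tk_buffer field are assumptions for illustration; the sketch is single-threaded and omits the locking a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_NBUFS   2               /* illustrative pool size */
#define EXAMPLE_BUFSZ   1024            /* illustrative buffer size */

struct rpc_task {                       /* reduced stand-in */
        void *tk_buffer;                /* area handed out by buf_alloc */
};

static unsigned char example_pool[EXAMPLE_NBUFS][EXAMPLE_BUFSZ];
static int example_in_use[EXAMPLE_NBUFS];

/* Hand out a free pool entry, or NULL so the caller can retry later.
 * Never sleeps: it only scans a fixed-size table. */
static void *example_buf_alloc(struct rpc_task *task, size_t size)
{
        int i;

        assert(task != NULL);
        if (size > EXAMPLE_BUFSZ)
                return NULL;
        for (i = 0; i < EXAMPLE_NBUFS; i++) {
                if (!example_in_use[i]) {
                        example_in_use[i] = 1;
                        task->tk_buffer = example_pool[i];
                        return task->tk_buffer;
                }
        }
        return NULL;                    /* pool exhausted */
}

/* Return the task's area to the pool. */
static void example_buf_free(struct rpc_task *task)
{
        int i;

        assert(task != NULL);
        for (i = 0; i < EXAMPLE_NBUFS; i++)
                if (task->tk_buffer == example_pool[i])
                        example_in_use[i] = 0;
        task->tk_buffer = NULL;
}
```

Returning NULL rather than waiting is what lets the method be called from asynchronous RPC tasks; the retry is left to the RPC client, as described above.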


 "send_request"  This API method is invoked to send a single RPC request
                 over the transport, after taking the transport's write
                 lock to serialize with other write or connect operations.
                 This method must not sleep or block.

                 This method adds any transport-specific headers that
                 are required before the request is transmitted.  The
                 transport implementation exports the byte size of the
                 space required in the buffer where requests are assembled
                 so that the generic logic may leave that space available
                 for transport-specific header information.

                 The function takes one argument: the address of the
                 rpc_task structure associated with the current
                 request.  The request has already been completely
                 specified in the task's associated rq_rqst.

                 If the transport is unable to write the complete request,
                 this function places the task on a sleep queue and
                 returns EAGAIN.  The transport implementation will
                 wake the task when the send operation can make forward
                 progress.  The generic layer calls this method again
                 when the task is awakened.  The generic layer does not
                 release the write lock until the current request has
                 been completely sent.

                 If the transport requires a "connect" operation, this
                 function returns ENOTCONN.  If any other error occurs,
                 that error is returned.  If the send operation is
                 entirely successful, this method returns zero.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  The generic layer serializes
                 transport reads and writes with the connect operation on
                 this transport.  Calling this function starts the write
                 operation, but the write may not be complete when it
                 returns.  The generic layer uses the RPC client's
                 scheduler primitives to wait safely until the reply to
                 this request is received.
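
The return-code contract of send_request can be sketched as a small dispatch step in the generic layer.  This is a user-space model under stated assumptions: the example_* names, the step enumeration, and the fake transport that accepts a fixed number of bytes per call are hypothetical; error codes are taken to be negative errno values by kernel convention; and keeping the write lock across a connect is our reading of the serialization rule, not something this design states outright.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct example_xprt {
        int     connected;
        size_t  room;           /* bytes the fake transport accepts per call */
};

struct rpc_task {               /* reduced stand-in */
        struct example_xprt *xprt;
        size_t  len;            /* total bytes in the request */
        size_t  sent;           /* bytes written so far */
        int     write_locked;
        int     tk_status;
};

enum example_step {
        STEP_WAIT_REPLY,        /* request fully sent */
        STEP_RETRY_SEND,        /* partial write, call send_request again */
        STEP_CONNECT,           /* transport needs a connect first */
        STEP_FAIL,              /* any other error */
};

/* Fake send_request: writes as much as fits and never sleeps. */
static int example_send_request(struct rpc_task *task)
{
        struct example_xprt *xprt = task->xprt;
        size_t n = task->len - task->sent;

        if (!xprt->connected)
                return -ENOTCONN;
        if (n > xprt->room)
                n = xprt->room;
        task->sent += n;
        return task->sent == task->len ? 0 : -EAGAIN;
}

/* Hypothetical generic-layer step that interprets the result. */
static enum example_step example_transmit(struct rpc_task *task)
{
        int status;

        assert(task != NULL && task->xprt != NULL);
        task->write_locked = 1;         /* taken before sending */
        status = example_send_request(task);
        switch (status) {
        case 0:
                task->write_locked = 0; /* request completely sent */
                return STEP_WAIT_REPLY;
        case -EAGAIN:
                return STEP_RETRY_SEND; /* lock stays held */
        case -ENOTCONN:
                return STEP_CONNECT;    /* assumption: lock kept for connect */
        default:
                task->write_locked = 0;
                task->tk_status = status;
                return STEP_FAIL;
        }
}
```

Calling example_transmit repeatedly on a 20-byte request over a transport that accepts 8 bytes per call yields two STEP_RETRY_SEND results followed by STEP_WAIT_REPLY, with the lock released only after the final call.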


 "set_receive_timeout"  The generic transport layer invokes this API
                 method after a message has been sent successfully on
                 a transport.

                 Each transport implementation provides its own RPC
                 retransmit logic via this method.  It sets the RPC
                 task timeout values so that the task is automatically
                 awakened if no server reply is received.  The timer
                 callout is always xprt_timer.

                 The function takes one argument: the address of the
                 rpc_task structure associated with the current
                 request.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  The caller must acquire the
                 transport_lock and the write lock while calling this
                 function.


 "is_congested"  This API method is invoked to determine whether a
                 transport is congested.  If the transport indicates that
                 it is congested, the generic transport layer puts the
                 current request to sleep.

                 The function takes one argument: the address of the
                 rpc_xprt structure to check.  It returns a zero value
                 if the transport is not congested, and a nonzero
                 value if the current request should be delayed.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.
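
One plausible pair of is_congested implementations is sketched below: a window-based check for transports that limit requests in flight, and a stub for transports that leave congestion control to the underlying protocol.  The "cong" counter, the unscaled request-count window, and all example_* and *_ops names are assumptions made for this user-space model.

```c
#include <assert.h>
#include <stddef.h>

struct rpc_xprt;

struct rpc_xprt_ops {                   /* only one method shown */
        int (*is_congested)(struct rpc_xprt *);
};

struct rpc_xprt {                       /* reduced stand-in */
        const struct rpc_xprt_ops *ops;
        unsigned long cwnd;             /* window, counted in requests here */
        unsigned long cong;             /* requests in flight (hypothetical) */
};

/* Window-based check: congested once the window is full. */
static int window_is_congested(struct rpc_xprt *xprt)
{
        return xprt->cong >= xprt->cwnd;
}

/* A transport that relies on the underlying protocol never reports
 * congestion to the generic layer. */
static int never_congested(struct rpc_xprt *xprt)
{
        (void)xprt;
        return 0;
}

static const struct rpc_xprt_ops window_ops = {
        .is_congested = window_is_congested,
};

static const struct rpc_xprt_ops stream_ops = {
        .is_congested = never_congested,
};

/* Hypothetical generic-layer step: returns 1 if the request may proceed,
 * or 0 if the caller should put it to sleep. */
static int example_reserve(struct rpc_xprt *xprt)
{
        assert(xprt != NULL && xprt->ops != NULL);
        if (xprt->ops->is_congested(xprt))
                return 0;
        xprt->cong++;
        return 1;
}

/* Called when a request completes, reopening the window. */
static void example_release(struct rpc_xprt *xprt)
{
        assert(xprt != NULL && xprt->cong > 0);
        xprt->cong--;
}
```

With a window of two, the third concurrent example_reserve call on a window_ops transport is refused until an earlier request is released, while a stream_ops transport accepts every request.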


 "timeout"       This API method is invoked when the RPC client detects a
                 major retransmit timeout on this transport.  The transport
                 implementation can use this to record statistics, adjust
                 timeout values, or mark a connection for reconnection.

                 The function takes one argument: the address of the
                 rpc_xprt structure that experienced the retransmit
                 timeout.  It returns nothing.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "close"         This API method is invoked to close a transport connection.
                 It is the opposite of the "connect" method.

                 The function takes one argument: the address of an
                 rpc_xprt structure to close.  It returns nothing.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 or tasklets, so it must not sleep.  It does not depend on
                 any external locks being held.


 "destroy"       This API method is invoked when a transport will no longer
                 be used.  It is the opposite of the "setup" external
                 function.

                 The function takes one argument: the address of an
                 rpc_xprt structure to destroy.  It returns nothing.

                 The caller must ensure that the xprt's reference count is
                 positive when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.

Procedure syntax and functional descriptions (external functions)

 "rpc_peeraddr"  This external function is a convenient way to invoke a
                 transport's peer_addr method.

                 The function takes three arguments: the address of the
                 rpc_clnt structure to be queried, the address of a buffer
                 into which to copy the endpoint address, and the size of
                 that buffer.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_print_peeraddr"  This external function provides a way to format
                 remote peer addresses for printing or for use in a hash
                 function.

                 The function takes four arguments: the address of the
                 rpc_clnt structure containing the address of interest,
                 the address and size of a buffer, and a set of flags
                 that determine which parts of the address are formatted.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "xprt_tsh_size" This external function returns the number of bytes
                 to be left before the RPC header is inserted into the
                 transmission buffer.  The generic transport layer uses
                 this value when constructing each RPC request to leave
                 room for transport specific and protocol specific
                 headers.

                 This function takes one argument: the address of the
                 rpc_xprt structure that will be used to transmit the
                 current request.  It returns the size of any protocol
                 specific header, in bytes, or zero, if no space for
                 a protocol specific header is required.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.
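
The division of labour around the transport-specific header can be sketched as follows.  The example assumes a stream transport whose header is the standard 4-byte ONC RPC record marker (top bit set on the last fragment, remaining 31 bits holding the fragment length, in network byte order); the example_* function names and EXAMPLE_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_TSH_SIZE        4u              /* record marker size */
#define EXAMPLE_LAST_FRAGMENT   0x80000000u     /* top bit of the marker */

/* Generic layer: place the RPC message after the reserved header space.
 * Returns the total length used, or 0 if the buffer is too small. */
static size_t example_build_request(uint8_t *buf, size_t buflen,
                                    size_t tsh_size,
                                    const uint8_t *rpc_msg, size_t msg_len)
{
        assert(buf != NULL && rpc_msg != NULL);
        if (tsh_size + msg_len > buflen)
                return 0;
        memset(buf, 0, tsh_size);               /* left for the transport */
        memcpy(buf + tsh_size, rpc_msg, msg_len);
        return tsh_size + msg_len;
}

/* Transport: fill the reserved space just before transmission. */
static void example_fill_record_marker(uint8_t *buf, size_t total_len)
{
        uint32_t marker;

        assert(buf != NULL && total_len >= EXAMPLE_TSH_SIZE);
        assert(total_len - EXAMPLE_TSH_SIZE < EXAMPLE_LAST_FRAGMENT);
        marker = EXAMPLE_LAST_FRAGMENT |
                 (uint32_t)(total_len - EXAMPLE_TSH_SIZE);
        buf[0] = (uint8_t)(marker >> 24);       /* network byte order */
        buf[1] = (uint8_t)(marker >> 16);
        buf[2] = (uint8_t)(marker >> 8);
        buf[3] = (uint8_t)marker;
}
```

The generic layer only needs the size of the reserved space; what goes into it stays private to the transport, so a datagram transport can simply report a size of zero.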


 "xprt_is_bound" This external function is a convenient way to invoke a
                 transport's is_bound method.

                 The function takes a single argument, which is the address
                 of the rpc_xprt structure which is being tested.  It
                 returns true if the transport is bound already, and false
                 if a bind operation is necessary before proceeding.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "xprt_connected" This external function is a convenient way to determine
                 whether a transport is connected.

                 The function takes one argument: the address of the
                 rpc_xprt structure that represents the transport
                 instance to check.  It returns a truth value.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_max_payload" This external function reports the maximum number of
                 bytes of payload that a single RPC can carry on a given
                 transport protocol.

                 The function takes one argument, which is the address of
                 an rpc_clnt structure created by rpc_create.  It returns
                 a size_t value.

                 This function is called from a user process context,
                 so it may sleep.  It does not depend on any external
                 locks being held.


 "rpc_force_rebind" This external function allows applications to request
                 that the RPC client rebind the transport.

                 The function takes one argument: the address of the
                 rpc_clnt structure to rebind.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_aux_protocol" This external function reports what transport protocol
                 to use when connecting auxiliary services, such as NLM
                 or NFSACL, based on the protocol used on the main
                 forward channel.

                 The function takes one argument: the address of the
                 rpc_clnt structure to query.  It returns an integer.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.

Procedure syntax and functional descriptions (generic functions)

 In addition to the above API, transport implementations may also need
 to invoke functions that are a part of the generic RPC client.  These
 functions are:

  void rpc_getport(struct rpc_task *task, struct rpc_clnt *clnt)

     This interface provides portmapping for IPv4 sockets.

     The function takes two arguments: the address of the rpc_task
     structure for the current RPC request, and the address of the
     rpc_clnt structure associated with this task.  It returns nothing.

     This operation starts the bind operation asynchronously, and the
     caller sleeps using the RPC client's scheduling primitives.  The
     caller is awoken automatically when the bind is complete, and can
     check the status of the bind operation using "is_bound."

     This function can be called from asynchronous RPC tasks so it must
     not sleep.  It does not depend on any external locks being held.


  void * rpc_malloc(struct rpc_task *task, size_t size)

     This interface allocates a buffer from the rpc_buffer slab cache.
     These buffers are generally used to contain the RPC header for
     each RPC request.

     The function takes two arguments: the address of the rpc_task
     structure associated with the current request, and a requested
     size of the new buffer, in bytes.  It returns an address of a
     usable area of memory, or NULL in case no buffer is currently
     available.  The RPC client will retry if a NULL is returned.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.


  void rpc_free(struct rpc_task *task)

     Buffers allocated via rpc_malloc are freed via this interface.

     The function takes one argument: the address of the rpc_task
     structure associated with the current request.  It returns
     nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.
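
     Because rpc_free takes only the task as its argument, the buffer
     is presumably tracked through the task itself.  The following is
     a minimal user-space sketch of that pairing, not the kernel code:
     demo_task, demo_malloc, and demo_free are hypothetical names, and
     the C library allocator stands in for the rpc_buffer slab cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rpc_task; only buffer bookkeeping is modeled. */
struct demo_task {
	void   *buffer;		/* buffer owned by this task, or NULL */
	size_t  bufsize;	/* size of that buffer, in bytes */
};

/* Allocate a buffer and record it in the task; NULL means "none available". */
void *demo_malloc(struct demo_task *task, size_t size)
{
	void *buf;

	assert(task != NULL && task->buffer == NULL);
	buf = malloc(size);
	if (buf == NULL)
		return NULL;	/* the caller is expected to retry later */
	task->buffer = buf;
	task->bufsize = size;
	return buf;
}

/* Release whatever buffer the task owns; safe to call when it owns none. */
void demo_free(struct demo_task *task)
{
	assert(task != NULL);
	free(task->buffer);	/* free(NULL) is a no-op */
	task->buffer = NULL;
	task->bufsize = 0;
}
```

     Recording the buffer in the task lets the release path work
     without the caller having to keep the address returned by the
     allocator.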


  void xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base,
                                 skb_reader_t *desc,
                                 skb_read_actor_t copy_actor)

     This interface is used by datagram socket transports to copy
     data from an incoming skb to an xdr_buf.  It is used by both
     the client and server RPC implementations.

     The function takes four arguments: the address of a standard
     xdr_buf structure that receives the copied data; the base offset
     where the copy operation should begin; the address of the read
     operation descriptor; and the address of a copy actor function.
     It returns nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.


  int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)

     This interface provides a checksum copy function that copies
     data from an skb to an xdr_buf.  It is used by both the client
     and server RPC implementations.
     
     The function takes two arguments: the address of a standard
     xdr_buf structure that acts as the destination of the copy
     operation, and the address of an sk_buff structure containing
     data to be copied.  It returns the number of bytes that were
     copied.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.


  void rpc_init_rtt(struct rpc_rtt *rt, unsigned long timeo)

     A transport implementation can invoke this function to initialize
     an rpc_rtt structure.

     The function takes two arguments: the address of an rpc_rtt
     structure to initialize, and the number of jiffies to use as
     the initial timeout value.  It returns nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.


  void rpc_update_rtt(struct rpc_rtt *rt, unsigned timer, long m)

     Transport implementations use this function to update an rpc_rtt
     structure when an RPC request has completed.

     The function takes three arguments: the address of the rpc_rtt
     structure to update; the index of the timer to update; and
     the number of jiffies that have passed since the RPC request
     was started.  It returns nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  The transport_lock must be held before calling
     this function.


  unsigned long rpc_calc_rto(struct rpc_rtt *rt, unsigned timer)

     This interface returns a value suitable for use as a retransmission
     timeout, in jiffies, based on the context data contained in an
     rpc_rtt structure.

     The function takes two arguments: the address of the rpc_rtt
     structure that contains the data to use for the calculation,
     and the index of the timer to use.  It returns the number of
     jiffies to use for the retransmit timer.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  The transport_lock must be held before calling
     this function.


  int xprt_register(struct xprt_type *transport)
  int xprt_unregister(struct xprt_type *transport)

     Transport implementations use this interface to register their
     presence with the generic transport layer.  The transport layer
     will not use a transport implementation for new RPC connections
     until the transport implementation has registered via this
     interface.

     Both functions take a single argument: the address of an
     xprt_type structure representing the transport implementation
     to register or unregister.  Both functions return zero on
     success, and an errno-type value on failure.

     These functions are called from a user process context, so they
     may sleep.  They do not depend on any external locks being held.
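
     The registration behavior described above can be sketched in
     user space as a simple linked list of transport types.  This is
     an illustration only: the demo_ names, the fields of
     demo_xprt_type, the lookup helper, and the choice of negative
     errno return values are assumptions, and a real implementation
     would also need locking around the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xprt_type. */
struct demo_xprt_type {
	int			 protocol;	/* hypothetical transport identifier */
	const char		*name;
	struct demo_xprt_type	*next;		/* registry linkage */
};

static struct demo_xprt_type *demo_registry;	/* head of the registered list */

/* Add a transport type; refuse a second registration for the same protocol. */
int demo_xprt_register(struct demo_xprt_type *t)
{
	struct demo_xprt_type *p;

	assert(t != NULL);
	for (p = demo_registry; p != NULL; p = p->next)
		if (p->protocol == t->protocol)
			return -EEXIST;
	t->next = demo_registry;
	demo_registry = t;
	return 0;
}

/* Remove a transport type; fail if it was never registered. */
int demo_xprt_unregister(struct demo_xprt_type *t)
{
	struct demo_xprt_type **pp;

	assert(t != NULL);
	for (pp = &demo_registry; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return 0;
		}
	}
	return -ENOENT;
}

/* Look up a transport type; NULL means it cannot serve new connections. */
struct demo_xprt_type *demo_xprt_find(int protocol)
{
	struct demo_xprt_type *p;

	for (p = demo_registry; p != NULL; p = p->next)
		if (p->protocol == protocol)
			return p;
	return NULL;
}
```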


  void xprt_adjust_cwnd(struct rpc_rqst *req, int result)

     Transport implementations that need congestion control invoke
     this function to adjust their congestion window.

     The function takes two arguments: the address of an rpc_rqst
     structure representing the request that has caused the
     change in the transport's congestion window, and an integer
     containing an errno value indicating why the window needs
     to be adjusted.  It returns nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  The transport_lock must be held before calling
     this function.


  void xprt_disconnect(struct rpc_xprt *xprt)

     Callers use this interface to mark a transport as disconnected.
     The generic layer will subsequently terminate the transport
     connection when it is safe to do so.

     The function takes a single argument: the address of an
     rpc_xprt structure representing the transport instance to
     mark disconnected.  It returns nothing.

     The caller must ensure that the xprt's reference count is
     greater than one when calling this function.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  It does not depend on any external locks being
     held.


  struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt,
                                             u32 xid)

     When an RPC reply is first received, the transport implementation
     invokes this function to map the received XID to a pending
     rpc_rqst.

     The function takes two arguments: the address of an rpc_xprt
     structure on which a reply has just arrived, and a 32-bit
     value representing the XID of the request to look up.  It
     returns the address of the matching rpc_rqst structure, or
     NULL if no pending request has that XID.

     The caller must ensure that the xprt's reference count is
     greater than one when calling this function.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  The transport_lock must be held before calling
     this function.
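
     The XID lookup can be pictured as a search of the transport's
     list of pending requests.  The following user-space sketch shows
     the idea; the demo_ names and the singly linked list are
     assumptions made for illustration, as is returning NULL when no
     pending request matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rpc_rqst. */
struct demo_rqst {
	uint32_t	  xid;	/* transaction ID sent with the request */
	struct demo_rqst *next;	/* next pending request on this transport */
};

/* Hypothetical stand-in for the receive-side state of struct rpc_xprt. */
struct demo_recv_xprt {
	struct demo_rqst *pending;	/* requests still waiting for a reply */
};

/* Map the XID found in a reply to the request that is waiting for it. */
struct demo_rqst *demo_lookup_rqst(struct demo_recv_xprt *xprt, uint32_t xid)
{
	struct demo_rqst *req;

	assert(xprt != NULL);
	for (req = xprt->pending; req != NULL; req = req->next)
		if (req->xid == xid)
			return req;
	return NULL;	/* no match, for example a late or duplicate reply */
}
```

     A transport's receive path would typically take the lock, look
     up the request, copy the reply data, and then signal completion
     before releasing the lock.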


  void xprt_complete_rqst(struct rpc_rqst *req, size_t copied)

     A transport implementation invokes this function to signal
     that a complete RPC reply has been received, and that the
     RPC client may begin decoding the reply.

     This function takes two arguments: the address of an rpc_rqst
     structure representing the request that is being completed, and
     the number of payload bytes that were copied for that request.
     It returns nothing.

     This function can be called from asynchronous RPC tasks so it
     must not sleep.  The transport_lock must be held before calling
     this function.


Procedure syntax and functional descriptions (create)

The transport switch replaces the two functions that were formerly
used to create a new rpc_clnt, xprt_create_proto and rpc_create_client,
with a single function call that hides the details of the transport
from RPC applications.

To create a new rpc_clnt structure, an application will fill in
this structure, and pass it to the new rpc_create function:

  struct rpc_create_args {
          int                     protocol;
          struct sockaddr         *address;
          size_t                  addrsize;
          struct rpc_timeout      *timeout;
          char                    *servername;
          struct rpc_program      *program;
          u32                     version;
          rpc_authflavor_t        authflavor;
          unsigned long           behavior;
  };

This structure contains all the same parameters that the
xprt_create_proto and rpc_create_client function calls used.  In
addition, a "behavior" field contains bits that enable specific
behaviors in the new rpc_clnt instance.

  #define RPC_CLNT_SOFTRTRY       (1UL << 0)
  #define RPC_CLNT_INTR           (1UL << 1)
  #define RPC_CLNT_CHATTY         (1UL << 2)
  #define RPC_CLNT_AUTOBIND       (1UL << 3)
  #define RPC_CLNT_DROPPRIV       (1UL << 4)
  #define RPC_CLNT_ONESHOT        (1UL << 5)
  #define RPC_CLNT_RESVPORT       (1UL << 6)

  int rpc_create(struct rpc_create_args *);

     This function is invoked by applications to create a new
     rpc_clnt structure.

     The function takes a single argument: the address of the
     rpc_create_args structure that provides the parameters for
     the new rpc_clnt instance.

     This function is called from a user process context, so it
     may sleep.  It does not depend on any external locks being
     held.
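
     As a rough illustration of how an application might fill in
     rpc_create_args, the following user-space sketch repeats the
     structure and behavior bits from the text and adds placeholder
     types so that it builds outside the kernel.  Everything prefixed
     with demo_ is hypothetical, including the stand-in for rpc_create
     itself, which only validates its argument.  The use of
     IPPROTO_TCP for the protocol field and of a NULL timeout to mean
     "use defaults" are assumptions, not part of this design.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Placeholder types so the sketch builds outside the kernel (assumptions). */
typedef uint32_t u32;
typedef u32 rpc_authflavor_t;
struct rpc_timeout { unsigned long initval; };
struct rpc_program { const char *name; };

/* Structure and behavior bits as given in the text above. */
struct rpc_create_args {
	int			protocol;
	struct sockaddr		*address;
	size_t			addrsize;
	struct rpc_timeout	*timeout;
	char			*servername;
	struct rpc_program	*program;
	u32			version;
	rpc_authflavor_t	authflavor;
	unsigned long		behavior;
};

#define RPC_CLNT_SOFTRTRY	(1UL << 0)
#define RPC_CLNT_INTR		(1UL << 1)
#define RPC_CLNT_CHATTY		(1UL << 2)
#define RPC_CLNT_AUTOBIND	(1UL << 3)
#define RPC_CLNT_DROPPRIV	(1UL << 4)
#define RPC_CLNT_ONESHOT	(1UL << 5)
#define RPC_CLNT_RESVPORT	(1UL << 6)

#define DEMO_AUTH_UNIX	1	/* AUTH_SYS flavor number in ONC RPC */

/* Hypothetical stand-in for rpc_create: it only validates its argument. */
int demo_rpc_create(struct rpc_create_args *args)
{
	if (args == NULL || args->address == NULL || args->addrsize == 0 ||
	    args->program == NULL)
		return -EINVAL;
	return 0;
}

/* Fill in arguments for a soft-retry, autobinding client over TCP. */
void demo_fill_args(struct rpc_create_args *args, struct sockaddr_in *sin,
		    struct rpc_program *prog, char *servername)
{
	assert(args != NULL && sin != NULL);
	args->protocol   = IPPROTO_TCP;	/* assumed encoding of "protocol" */
	args->address    = (struct sockaddr *)sin;
	args->addrsize   = sizeof(*sin);
	args->timeout    = NULL;	/* assumed to mean "use defaults" */
	args->servername = servername;
	args->program    = prog;
	args->version    = 3;
	args->authflavor = DEMO_AUTH_UNIX;
	args->behavior   = RPC_CLNT_SOFTRTRY | RPC_CLNT_AUTOBIND |
			   RPC_CLNT_RESVPORT;
}
```

     Collecting the parameters in one structure means that a new
     transport can add or ignore fields without changing the
     signature that RPC applications call.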

Conclusion

With the implementation of an RPC transport switch, we hope to
facilitate the introduction of significant new technology into the
Linux kernel RPC implementation.  Not only will the RPC transport
switch enable new transport technologies such as high-performance
TCP offload, but it will ease enhancements such as multiple sockets
per client-server pair, the elimination of the RPC slot table, and
the removal of the global kernel lock from the RPC client and server.