RpcClientTransportSwitch

From Linux NFS

(Difference between revisions)
(Break into sections)
(wiki-fy this section -- remove a whole hunk of white space)
 
(9 intermediate revisions not shown)
Line 1: Line 1:
-
<pre>
 
-
                      Linux 2.6 RPC Transport Switch:
 
-
                          Design & Implementation
 
-
 
-
AUTHOR  Chuck Lever
 
-
VERSION  Sat Feb 26 13:18:44 PST 2005
 
-
</pre>
 
-
 
== Purpose ==
== Purpose ==
Line 14: Line 6:
== Introduction ==
== Introduction ==
-
Today's RPC client and server in the Linux kernel use a socket-based
+
Today's RPC client and server in the Linux kernel use a socket-based transport layer API.  This works well for existing network transport technologies such as IPv4 TCP over gigabit Ethernet.
-
transport layer API.  This works well for existing network transport
+
-
technologies such as IPv4 TCP over gigabit Ethernet.
+
-
 
+
-
In the near future, alternate transport technologies will appear
+
-
which may be difficult to mate with the socket abstraction.  Examples
+
-
of such new technologies include transports that support direct data
+
-
placement and TCP offload devices accessed directly rather than
+
-
through the Linux kernel's network layer.
+
-
Additionally, other new technologies such as IPv6 and new stream
+
In the near future, alternate transport technologies will appear which may be difficult to mate with the socket abstraction. Examples of such new technologies include transports that support direct data placement and TCP offload devices accessed directly rather than through the Linux kernel's network layer.
-
protocols such as SCTP will require significant changes to the
+
-
  socket-based infrastructure in the RPC client and server, but may
+
-
have little if any effect on other areas.
+
-
Finally, security mechanisms such as IPsec and Kerberos 5 privacy
+
Additionally, other new technologies such as IPv6 and new stream protocols such as SCTP will require significant changes to the socket-based infrastructure in the RPC client and server, but may have little if any effect on other areas.
-
may have special buffer management requirements in the transport
+
-
layer in order to provide as efficient an implementation as
+
-
possible.
+
-
In the following text, we refer to today's RPC client and server that
+
Finally, security mechanisms such as IPsec and Kerberos 5 privacy may have special buffer management requirements in the transport layer in order to provide as efficient an implementation as possible.
-
do not have a generic transport switch implementation as the "pre-
+
-
switch" versions of the client and server.
+
 +
In the following text, we refer to today's RPC client and server that do not have a generic transport switch implementation as the "pre-switch" versions of the client and server.
== Specification ==
== Specification ==
-
Our final goal is an implementation that facilitates integration of
+
Our final goal is an implementation that facilitates integration of alternate transports while retaining or improving the stability, performance, and maintainability of the pre-switch RPC client with socket-based transports.  In other words, we want to have no negative impact on the performance or stability of the existing IPv4 socket-based transport as we add a transport switch capability.  Toward that end, we will introduce as little new functionality to existing support as possible for IPv4 socket transports; we are simply moving code and data structures.  When complete, the IPv4 socket transport implementation will act as a reference for new transport implementations.
-
alternate transports while retaining or improving the stability,
+
-
performance, and maintainability of the pre-switch RPC client with
+
-
socket-based transports.  In other words, we want to have no negative
+
-
impact on the performance or stability of the existing IPv4 socket-based
+
-
transport as we add a transport switch capability.  Toward that end,
+
-
we will introduce as little new functionality to existing support as
+
-
possible for IPv4 socket transports; we are simply moving code and
+
-
data structures.  When complete, the IPv4 socket transport
+
-
implementation will act as a reference for new transport
+
-
implementations.
+
-
A "transport implementation" provides the code base that supports
+
A "transport implementation" provides the code base that supports particular transport mechanisms, such as "IPv4 socket."  Eventually transport implementations will be contained in loadable kernel modules. As they are loaded, they will register with the RPC client and server. Each transport implementation provides a vector of procs that create, bind, and connect a new transport instance; provide auxiliary services such as portmapping; and configure, send and receive data on, or destroy such instances.
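The registration step described above can be pictured with a small user-space sketch. Everything in it is hypothetical: the names `sketch_xprt_type`, `sketch_xprt_register`, `sketch_xprt_lookup`, and `sketch_xprt_unregister`, the fixed-size table, and the choice of family plus protocol as the lookup key are assumptions made for illustration, not the kernel's actual interface.

```c
#include <stddef.h>

struct sketch_xprt;                      /* stands in for rpc_xprt */

/* Hypothetical descriptor that a transport implementation registers. */
struct sketch_xprt_type {
    const char *name;                    /* eye-catcher only */
    void *owner;                         /* would point at the kernel module */
    int family;
    int protocol;
    int (*setup)(struct sketch_xprt *xprt, void *options);
};

#define SKETCH_MAX_TYPES 8
static const struct sketch_xprt_type *sketch_types[SKETCH_MAX_TYPES];

/* Find a registered implementation by family and protocol, or NULL. */
const struct sketch_xprt_type *sketch_xprt_lookup(int family, int protocol)
{
    for (size_t i = 0; i < SKETCH_MAX_TYPES; i++) {
        const struct sketch_xprt_type *t = sketch_types[i];

        if (t != NULL && t->family == family && t->protocol == protocol)
            return t;
    }
    return NULL;
}

/* Returns 0 on success, -1 if already registered or the table is full. */
int sketch_xprt_register(const struct sketch_xprt_type *type)
{
    if (sketch_xprt_lookup(type->family, type->protocol) != NULL)
        return -1;
    for (size_t i = 0; i < SKETCH_MAX_TYPES; i++) {
        if (sketch_types[i] == NULL) {
            sketch_types[i] = type;
            return 0;
        }
    }
    return -1;
}

/* Remove a descriptor from the table, e.g. at module unload. */
void sketch_xprt_unregister(const struct sketch_xprt_type *type)
{
    for (size_t i = 0; i < SKETCH_MAX_TYPES; i++) {
        if (sketch_types[i] == type)
            sketch_types[i] = NULL;
    }
}
```

A real kernel implementation would also need locking around the table and module reference counting; both are omitted here.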
-
particular transport mechanisms, such as "IPv4 socket."  Eventually
+
-
transport implementations will be contained in loadable kernel modules.
+
-
As they are loaded, they will register with the RPC client and server.
+
-
Each transport implementation provides a vector of procs that provide a
+
-
way to create, bind, and connect a new transport instance, provide
+
-
auxiliary services such as portmapping, and provide ways to configure
+
-
send and receive data on, or destroy, such instances.
+
-
Each transport connection between the client and server using a
+
Each transport connection between the client and server using a particular transport implementation is known as a "transport instance." Such an instance is identified by its transport implementation, and by the endpoint addresses of the client and server, and is represented by an rpc_xprt struct.  For the "IPv4 socket" transport implementation, a transport instance is a single IPv4 socket connection that uses either the UDP or TCP network protocol.  Note, for example, that a single transport instance might also consist of multiple sockets that share a workload, or an RDMA link with a passive failover IP socket, depending on how the instance's transport is implemented.
-
particular transport implementation is known as a "transport instance."
+
-
Such an instance is identified by its transport implementation, and
+
-
by the endpoint addresses of the client and server, and is represented
+
-
by an rpc_xprt struct.  For the "IPv4 socket" transport implementation,
+
-
a transport instance is a single IPv4 socket connection that uses
+
-
either the UDP or TCP network protocol.  Note, for example, that a
+
-
single transport instance might also consist of multiple sockets that
+
-
share a workload, or an RDMA link with a passive failover IP socket,
+
-
depending on how the instance's transport is implemented.
+
-
The transport API now contains methods to access various fields in
+
The transport API now contains methods to access various fields in the rpc_xprt struct.  A transport-private data structure contains fields that are specific to a particular transport instance.
-
the rpc_xprt struct.  A transport-private data structure contains
+
-
fields that are specific to a particular transport instance.
+
-
When the API is complete, transport endpoint addresses will be contained
+
When the API is complete, transport endpoint addresses will be contained in a sockaddr_storage structure and an API method will be provided to retrieve the value of the remote peer's endpoint address.  Setting the remote address will only be allowed during transport instance creation.
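As a rough illustration of the addressing rule above, the following user-space sketch stores the peer address in a `sockaddr_storage`, sets it once at creation, and afterwards exposes only a read accessor. The names `sketch_xprt_addr`, `sketch_addr_init`, and `sketch_peer_addr` are invented for this example, and POSIX socket headers are assumed.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical container for a transport instance's remote address. */
struct sketch_xprt_addr {
    struct sockaddr_storage addr;        /* large enough for any family */
    socklen_t addrlen;
};

/* Called only while the transport instance is being created. */
int sketch_addr_init(struct sketch_xprt_addr *x,
                     const struct sockaddr *sa, socklen_t len)
{
    if (len > sizeof(x->addr))
        return -1;
    memset(&x->addr, 0, sizeof(x->addr));
    memcpy(&x->addr, sa, len);
    x->addrlen = len;
    return 0;
}

/* Read-only accessor; no setter is offered after creation. */
const struct sockaddr *sketch_peer_addr(const struct sketch_xprt_addr *x,
                                        socklen_t *len)
{
    *len = x->addrlen;
    return (const struct sockaddr *)&x->addr;
}
```

Returning a `const` pointer is one way to make the "set only at creation" rule visible in the type signature.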
-
in a sockaddr_storage structure and an API method will be provided to
+
-
retrieve the value of the remote peer's endpoint address.  Setting the
+
-
remote address will only be allowed during transport instance creation.
+
-
A transport implementation will usually include its own mechanism for
+
A transport implementation will usually include its own mechanism for RPC portmapping.  For example, IPv4 sockets will use the standard RPC portmapper.  IPv6 sockets may use rpcbind.  Some implementations will not need any kind of port mapping; such implementations can provide the portmap methods as no-ops.
-
RPC portmapping.  For example, IPv4 sockets will use the standard RPC
+
-
portmapper.  IPv6 sockets may use rpcbind.  Some implementations will
+
-
not need any kind of port mapping; such implementations can provide the
+
-
portmap methods as no-ops.
+
-
We defer the introduction of mechanisms by which user space, and
+
We defer the introduction of mechanisms by which user space, and subsequently the NFS client and server, specify which transport to use and parameters specific to a particular transport implementation.  New mount options that control aspects of transport operation and changes to the mount_data structure will be considered on a case by case basis.
-
subsequently the NFS client and server, specify which transport to use
+
-
and parameters specific to a particular transport implementation.  New
+
-
mount options that control aspects of transport operation and changes
+
-
to the mount_data structure will be considered on a case by case basis.
+
=== Support for the NFS version 4 session model ===
=== Support for the NFS version 4 session model ===
-
The pre-existing RPC client transport model includes a capability
+
The pre-existing RPC client transport model includes a capability to send RPC requests and receive replies from servers via a single transport instance.  NFS version 4 (RFC 3530) introduces the concept of a callback channel to support RPC requests sent by NFS servers and received by clients.  The primary use of this channel is to support NFS version 4 read and write delegation.  Typically it uses a separate RPC server instance on the client supported by a separate transport instance to service callback RPC requests.
-
to send RPC requests and receive replies from servers via a single
+
-
transport instance.  NFS version 4 (RFC3530) introduces the concept of
+
-
a callback channel to support RPC requests sent by NFS servers and
+
-
received by clients.  The primary use of this channel is to support
+
-
NFS version 4 read and write delegation.  Typically it uses a separate
+
-
RPC server instance on the client supported by a separate transport
+
-
instance to service callback RPC requests.
+
-
In the near future, a minor revision of NFS version 4 will require the
+
In the near future, a minor revision of NFS version 4 will require the ability to combine the normal RPC request channel with the callback channel on a single transport instance (also known as the NFS version 4 session layer).  To support bi-directional RPC communications on a single transport instance, additional transport methods will be required.
-
ability to combine the normal RPC request channel with the callback
+
-
channel on a single transport instance (also known as the NFS version
+
-
4 session layer).  To support bi-directional RPC communications on a
+
-
single transport instance, additional transport methods will be
+
-
required.
+
-
At this time we do not understand yet what will be required, in
+
We do not yet understand what will be required, in addition to the methods described above, to support callbacks on the same transport instance as the RPC request forward channel.
-
addition to the methods described above, to support callbacks on the
+
-
same transport instance as the RPC request forward channel.
+
== API Specification ==
== API Specification ==
-
The generic functionality of all RPC transports (e.g., congestion control,
+
The generic functionality of all RPC transports (for example, congestion control, request queuing, and retransmit timeouts) will remain in xprt.c. All API methods must be present in all transport implementations.
-
request queuing, retransmit timeouts, and so on) will remain in xprt.c.
+
-
All API methods must be present in all transport implementations.
+
-
We define thirteen transport methods:
+
We define thirteen transport methods:
<pre>
<pre>
Line 146: Line 68:
</pre>
</pre>
-
The following type defines a single transport implementation.  It provides
+
The following type defines a single transport implementation.  It provides a name that functions only as an eye-catcher; the address of the transport implementation's kernel module structure; a family and protocol; and the address of the function that the generic layer can use to set up a new transport instance.  The address of this structure is passed to the generic layer when the transport implementation initializes.
-
a name that functions only as an eye-catcher; the address of the transport
+
-
implementation's kernel module structure; a family and protocol; and the
+
-
address of the function that the generic layer can use to set up a new
+
-
transport instance.  The address of this structure is passed to the
+
-
generic layer when the transport implementation initializes.
+
<pre>
<pre>
Line 165: Line 82:
</pre>
</pre>
-
The setup function is responsible for initializing a number of fields
+
The setup function is responsible for initializing a number of fields in the rpc_xprt structure it is passed, in addition to possibly allocating and initializing a private area for the transport instance.
-
in the rpc_xprt structure it is passed, in addition to possibly
+
-
allocating and initializing a private area for the transport instance.
+
-
<pre>
+
;tsh_size
-
tsh_size:             the size, in 8-bit bytes, of a transport-
+
:the size, in 8-bit bytes, of a transport-specific header to be placed before the RPC header when building each RPC request.
-
                        specific header to be placed before the
+
;cwnd
-
                        RPC header when building each RPC request.
+
:the initial size of the congestion window.
 +
;resvport
 +
:a boolean which, if true, means this transport needs a reserved port.
 +
;max_payload
 +
:the size, in 8-bit bytes, of the largest payload a single RPC request can contain on this transport.
 +
;bind_timeout
 +
:number of jiffies to wait for a bind request to complete before timing it out.
 +
;connect_timeout
 +
:number of jiffies to wait for a transport connect request to complete before timing it out.
 +
;reestablish_timeout
 +
:number of jiffies to wait after a transport is remotely disconnected before attempting to reestablish a connection.
 +
;idle_timeout
 +
:number of jiffies to wait after a transport becomes idle before disconnecting.
 +
;ops
 +
:the address of this transport instance's operations vector.
 +
;max_reqs
 +
:the maximum number of concurrent requests this transport instance can support.
-
cwnd:                  the initial size of the congestion window.
+
A (void *) pointer field is made available in the rpc_xprt structure to reference an implementation-private area where instance variables specific to a transport implementation can be maintained.
-
resvport:              a boolean which, if true, means this
+
=== Procedure syntax and functional descriptions (transport ops) ===
-
                        transport needs a reserved port.
+
-
max_payload:          the size, in 8-bit bytes, of the largest
+
;setup
-
                        payload a single RPC request can contain
+
-
                        on this transport.
+
-
bind_timeout:         number of jiffies to wait for a bind
+
:This external function is provided by the transport implementation for initializing a new transport instance, setting the remote peer address, and providing some transport-specific parameters, such as request timeout values.  This function also initializes the vector of API methods with which the generic layer can manipulate the new transport instance.
-
                        request to complete before timing it out.
+
-
connect_timeout:       number of jiffies to wait for a transport
+
:The function takes two arguments:  the address of a freshly allocated rpc_xprt structure, and the address of a structure containing transport-specific options. The "addr" field of the rpc_xprt structure is initialized with the remote endpoint address before "setup" is invoked.
-
                        connect request to complete before timing
+
-
                        it out.
+
-
reestablish_timeout:   number of jiffies to wait after a transport
+
:The return value is an errno value if problems were encountered, or zero on success.
-
                        is remotely disconnected before attempting
+
-
                        to reestablish a connection.
+
-
idle_timeout:         number of jiffies to wait after a transport
+
:This function is called from a user process context, so it may sleep.  It does not depend on any external locks being held.
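A minimal user-space sketch of a "setup" function follows. The field names come from the list above, but the structure layout, the option and private-area types, every numeric value, and the negative-errno return convention are assumptions chosen only to make the example compile and run.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_HZ 1000UL                 /* assumed jiffies per second */

/* Placeholder; the real vector would hold the transport API methods. */
struct sketch_xprt_ops { int placeholder; };

/* Hypothetical stand-in for the generic part of rpc_xprt. */
struct sketch_rpc_xprt {
    unsigned int tsh_size;
    unsigned long cwnd;
    int resvport;
    unsigned int max_payload;
    unsigned long bind_timeout;
    unsigned long connect_timeout;
    unsigned long reestablish_timeout;
    unsigned long idle_timeout;
    const struct sketch_xprt_ops *ops;
    unsigned int max_reqs;
    void *priv;                          /* implementation-private area */
};

struct sketch_sock_opts { unsigned int max_reqs; };   /* hypothetical options */
struct sketch_sock_priv { int sock_fd; };             /* hypothetical private data */

static const struct sketch_xprt_ops sketch_sock_ops = { 0 };

/* Returns 0 on success or a negative errno value (assumed convention). */
int sketch_sock_setup(struct sketch_rpc_xprt *xprt,
                      const struct sketch_sock_opts *opts)
{
    struct sketch_sock_priv *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return -ENOMEM;
    priv->sock_fd = -1;                  /* no socket yet */

    /* All values below are illustrative, not recommended defaults. */
    xprt->tsh_size = 4;                  /* e.g. a 4-byte stream record marker */
    xprt->cwnd = 1;
    xprt->resvport = 1;
    xprt->max_payload = 32768;
    xprt->bind_timeout = 60 * SKETCH_HZ;
    xprt->connect_timeout = 60 * SKETCH_HZ;
    xprt->reestablish_timeout = 3 * SKETCH_HZ;
    xprt->idle_timeout = 300 * SKETCH_HZ;
    xprt->ops = &sketch_sock_ops;
    xprt->max_reqs = opts->max_reqs;
    xprt->priv = priv;
    return 0;
}

/* Releases the private area allocated by setup. */
void sketch_sock_destroy(struct sketch_rpc_xprt *xprt)
{
    free(xprt->priv);
    xprt->priv = NULL;
}
```

Because setup runs in process context and may sleep, a blocking allocation such as the `calloc` used here has a natural kernel counterpart.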
-
                        becomes idle before disconnecting.
+
-
ops:                  the address of this transport instance's
+
;setbufsize
-
                        operations vector.
+
-
max_reqs:             the maximum number of concurrent requests
+
:This API method is invoked following the creation of a new transport instance to initialize transport layer buffer parameters.
-
                        this transport instance can support.
+
-
</pre>
+
-
A (void *) pointer field is made available in the rpc_xprt structure
+
:The function takes three arguments: the address of the rpc_xprt structure whose buffers are being configured, and two unsigned integers giving the desired sizes of the transport's buffers, in bytes.  It returns nothing.
-
to reference an implementation-private area where instance variables
+
-
specific to a transport implementation can be maintained.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
=== Procedure syntax and functional descriptions ===
+
:This function is called from a user process context, so it may sleep.  It does not depend on any external locks being held.
-
<pre>
+
;print_addr
-
"setup"        This external function is provided by the transport
+
-
                implementation for initializing a new transport instance,
+
-
                setting the remote peer address, and providing some
+
-
                transport-specific parameters, such as request timeout
+
-
                values.  This function also initializes the vector of API
+
-
                methods with which the generic layer can manipulate the
+
-
                new transport instance.
+
-
                The function takes two arguments: the address of a
+
:This API method stuffs a buffer with a formatted string representing the remote peer's address.  It is useful for building hash functions and for error, warning, and trace messages.
-
                freshly allocated rpc_xprt structure, and the address
+
-
                of a structure containing transport-specific options.
+
-
                The "addr" field of the rpc_xprt structure is initialized
+
-
                with the remote endpoint address before "setup" is
+
-
                invoked.
+
-
                The return value is an errno value if problems were
+
:The function takes four arguments, which are the address of the rpc_xprt structure containing the remote address, the size in bytes and the address of a buffer to stuff, and a set of flags that determine which address fields are to be formatted.  It returns nothing.
-
                encountered, or zero on success.
+
-
                This function is called from a user process context,
+
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
                so it may sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:This function is called from a user process context, so it may sleep.  It does not depend on any external locks being held.
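One possible shape for "print_addr" is sketched below for an IPv4 peer. The flag names `SKETCH_FMT_ADDR` and `SKETCH_FMT_PORT`, the structure layout, and the "address:port" output format are all assumptions; the text above only says that flags select which address fields are formatted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_FMT_ADDR 0x1u             /* hypothetical flag: include the IP address */
#define SKETCH_FMT_PORT 0x2u             /* hypothetical flag: include the port */

/* Hypothetical stand-in holding an IPv4 peer address. */
struct sketch_xprt {
    uint8_t ip[4];
    uint16_t port;
};

/* Fills buf (len bytes) with the selected fields; output is always terminated. */
void sketch_print_addr(const struct sketch_xprt *x, size_t len, char *buf,
                       unsigned int flags)
{
    size_t off;

    if (len == 0)
        return;
    buf[0] = '\0';
    if (flags & SKETCH_FMT_ADDR)
        snprintf(buf, len, "%u.%u.%u.%u",
                 (unsigned)x->ip[0], (unsigned)x->ip[1],
                 (unsigned)x->ip[2], (unsigned)x->ip[3]);
    off = strlen(buf);                   /* snprintf truncates safely, so off < len */
    if (flags & SKETCH_FMT_PORT)
        snprintf(buf + off, len - off, "%s%u",
                 off ? ":" : "", (unsigned)x->port);
}
```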
-
"setbufsize"    This API method is invoked following the creation of
+
;is_bound
-
                a new transport instance to initialize transport layer
+
-
                buffer parameters.
+
-
                The function takes three arguments, which are the address
+
:This API method is invoked to determine whether a bind operation is required before a connection is made.
-
                of the rpc_xprt structure that is to be reconnected, and
+
-
                two unsigned integers reflecting the desired size of the
+
-
                transport's buffer size, in bytes.  It returns nothing.
+
-
                The caller must ensure that the xprt's reference count is
+
:The function takes a single argument, which is the address of the rpc_xprt structure which is being tested.  It returns true if the transport is bound already, and false if a bind operation is necessary before proceeding.
-
                greater than one when calling this function.
+
-
                This function is called from a user process context,
+
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
                so it may sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
"print_addr"    This API method stuffs a buffer with a formatted string
+
;rpcbind
-
                representing the address of the remote peer address.
+
-
                It's useful for building hash functions or with error,
+
-
                warning, and trace messages.
+
-
                The function takes four arguments, which are the address
+
:This API method is invoked before a connect to allow portmapping to occur.  If ports are not supported by the underlying transport mechanism, this method can be a no-op.
-
                of the rpc_xprt structure containing the remote address,
+
-
                the size in bytes and the address of a buffer to stuff,
+
-
                and a set of flags that determine which address fields
+
-
                are to be formatted.  It returns nothing.
+
-
                The caller must ensure that the xprt's reference count is
+
:The function takes two arguments: the address of the rpc_task structure for the current RPC request, and the address of the rpc_clnt structure associated with this task.  It returns nothing.
-
                greater than one when calling this function.
+
-
                This function is called from a user process context,
+
:This operation starts the bind operation asynchronously, and the caller sleeps using the RPC client's scheduling primitives.  The caller is awoken automatically when the bind is complete, and can check the status of the bind operation using "is_bound."
-
                so it may sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
"is_bound"      This API method is invoked to determine whether a bind
+
;set_port
-
                operation is required before a connection is made.
+
-
                The function takes a single argument, which is the address
+
:This API method is invoked to change the bound port number for a transport.  It is generally invoked only during a bind operation.
-
                of the rpc_xprt structure which is being tested.  It
+
-
                returns true if the transport is bound already, and false
+
-
                if a bind operation is necessary before proceeding.
+
-
                The caller must ensure that the xprt's reference count is
+
:The function takes two arguments: the address of an rpc_xprt structure to update, and an unsigned 16-bit integer which is the new port number.  It returns nothing.
-
                greater than one when calling this function.
+
-
                This function can be called from asynchronous RPC tasks
+
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
-
"rpcbind"      This API method is invoked before a connect to allow
+
:This function can be called from asynchronous RPC tasks, so it must not sleep.  It does not depend on any external locks being held.
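The interplay of "is_bound", "rpcbind", and "set_port" can be sketched as a small decision function that the generic layer might consult before sending. The names below, the use of port 0 to mean "not yet bound", and the synchronous `sketch_rpcbind_done` helper (standing in for the completion of an asynchronous portmapper query) are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of rpc_xprt used here. */
struct sketch_xprt {
    uint16_t port;                       /* 0 is assumed to mean "not bound" */
    bool connected;
};

enum sketch_step { SKETCH_NEED_BIND, SKETCH_NEED_CONNECT, SKETCH_READY };

/* "is_bound": true once a remote port is known. */
bool sketch_is_bound(const struct sketch_xprt *x)
{
    return x->port != 0;
}

/* "set_port": record the port discovered during a bind operation. */
void sketch_set_port(struct sketch_xprt *x, uint16_t port)
{
    x->port = port;
}

/* Stands in for the moment an asynchronous rpcbind query completes. */
void sketch_rpcbind_done(struct sketch_xprt *x, uint16_t port)
{
    sketch_set_port(x, port);
}

/* What must happen before a request can be sent on this transport. */
enum sketch_step sketch_next_step(const struct sketch_xprt *x)
{
    if (!sketch_is_bound(x))
        return SKETCH_NEED_BIND;
    if (!x->connected)
        return SKETCH_NEED_CONNECT;
    return SKETCH_READY;
}
```

None of these helpers block, in keeping with the requirement that methods callable from asynchronous RPC tasks must not sleep.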
-
                portmapping to occur.  If ports are not supported by
+
-
                the underlying transport mechanism, this method can
+
-
                be a no-op.
+
-
                The function takes two arguments: the address of the
+
;connect
-
                rpc_task structure for the current RPC request, and
+
-
                the address of the rpc_clnt structure associated with
+
-
                this task.  It returns nothing.
+
-
                This operation starts the bind operation asynchronously,
+
:This API method is invoked to connect a transport when the generic transport layer recognizes the need to connect a transport instance.
-
                and the caller sleeps using the RPC client's scheduling
+
-
                primitives.  The caller is awoken automatically when
+
-
                the bind is complete, and can check the status of the
+
-
                bind operation using "is_bound."
+
-
                This function can be called from asynchronous RPC tasks
+
:The generic layer serializes transport reads and writes with the connect operation on this transport.  Calling this function starts the connection, but the transport may or may not be connected when it returns.  The generic layer uses the RPC client's scheduler primitives to wait safely until the connection operation is complete, and to allow only one connection attempt at a time.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:The details of whether a transport is connection-oriented or datagram-oriented can be well hidden in the transport implementation itself.  The RPC client's finite state engine automatically detects whether a transport is connected before sending each request; if it is not, it will invoke this method automatically.
-
"set_port"      This API method is invoked to change the bound port
+
:The function takes one argument, which is the address of an rpc_task structure which can be used for scheduling the connection and sleeping.  It returns nothing.
-
                number for a transport.  It is generally invoked only
+
-
                during a bind operation.
+
-
                The function takes two arguments: the address of an
+
:This function can be called from asynchronous RPC tasks so it must not sleep.
-
                rpc_xprt structure to update, and an unsigned 16-bit
+
-
                integer which is the new port number.  It returns
+
-
                nothing.
+
-
                The caller must ensure that the xprt's reference count is
+
;aux_protocol
-
                greater than one when calling this function.
+
-
                This function can be called from asynchronous RPC tasks
+
:This API method returns the protocol number to be used to set up auxiliary transports.  An auxiliary transport is an additional transport instance that connects the same endpoints, but carries a different RPC program. NLM, NSM, and NFSACL would use an auxiliary transport to connect to servers.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:The function takes one argument, which is the address of an rpc_xprt structure.  It returns an integer.
-
"connect"      This API method is invoked to connect a transport when
+
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
                the generic transport layer recognizes the need to
+
-
                connect a transport instance.
+
-
                The generic layer serializes transport reads and writes
+
:This function can be called from asynchronous RPC tasks so it must not sleep.
-
                with the connect operation on this transport.  Calling
+
-
                this function starts the connection, but the transport
+
-
                may or may not be connected when it returns.  The
+
-
                generic layer uses the RPC client's scheduler primitives
+
-
                to wait safely until the connection operation is complete,
+
-
                and to allow only one connection attempt at a time.
+
-
                The details of whether a transport is connection-oriented
+
;buf_alloc
-
                or datagram-oriented can be well hidden in the transport
+
-
                implementation itself.  The RPC client's finite state
+
-
                engine automatically detects whether a transport is
+
-
                connected before sending each request; if it is not, it
+
-
                will invoke this method automatically.
+
-
                The function takes one argument, which is the address of
+
:This API method returns an area of memory in which to construct an outgoing RPC and to contain its reply. The memory can be a dynamically allocated buffer, or it can provide the address of an existing memory area where the construction can occur.
-
                an rpc_task structure which can be used for scheduling
+
-
                the connection and sleeping.  It returns nothing.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes two arguments: the address of the rpc_task structure associated with the current request, and a requested size of the memory area, in bytes.  It returns an address of a usable area of memory, or NULL in case no area is currently available.  The RPC client will retry if a NULL is returned.
-
                so it must not sleep.
+
 +
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
"aux_protocol"  This API method returns the protocol number to be used
+
;buf_free
-
                to set up auxiliary transports.  An auxiliary transport
+
-
                is an additional transport instance that connects the
+
-
                same endpoints, but carries a different RPC program.
+
-
                NLM, NSM, and NFSACL would use an auxiliary transport
+
-
                to connect to servers.
+
-
                The function takes one argument, which is the address of
+
:This API method is invoked when an rpc_task is finished and must free a memory area allocated via buf_alloc.
-
                an rpc_xprt structure.  It returns an integer.
+
-
                The caller must ensure that the xprt's reference count is
+
:The function takes one argument: the address of the rpc_task structure associated with the current request.  It returns nothing.
-
                greater than one when calling this function.
+
-
                This function can be called from asynchronous RPC tasks
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
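The "buf_alloc" and "buf_free" contract (return a usable area, or NULL so that the RPC client retries later) can be illustrated with a tiny fixed pool. The pool size, slot size, and all names are hypothetical, and the sketch is single-threaded; a real implementation would need locking that is safe in non-sleeping context.

```c
#include <stddef.h>

#define SKETCH_SLOTS 2                   /* illustrative pool depth */
#define SKETCH_SLOT_SIZE 1024            /* illustrative slot size, in bytes */

/* Hypothetical stand-in for the part of rpc_task that tracks its buffer. */
struct sketch_task {
    void *buf;
};

static unsigned char sketch_pool[SKETCH_SLOTS][SKETCH_SLOT_SIZE];
static int sketch_in_use[SKETCH_SLOTS];

/* Never blocks: returns a free slot, or NULL so the caller can retry later. */
void *sketch_buf_alloc(struct sketch_task *task, size_t size)
{
    if (size > SKETCH_SLOT_SIZE)
        return NULL;                     /* a real transport would handle large requests */
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (!sketch_in_use[i]) {
            sketch_in_use[i] = 1;
            task->buf = sketch_pool[i];
            return task->buf;
        }
    }
    return NULL;                         /* pool exhausted for now */
}

/* Returns the task's slot to the pool. */
void sketch_buf_free(struct sketch_task *task)
{
    for (int i = 0; i < SKETCH_SLOTS; i++) {
        if (task->buf == sketch_pool[i])
            sketch_in_use[i] = 0;
    }
    task->buf = NULL;
}
```

Handing out preallocated slots is one way to satisfy the "must not sleep" rule, since no allocator that might block is ever called.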
-
                so it must not sleep.
+
 +
;send_request
-
"buf_alloc"    This API method returns an area of memory in which to
+
:This API method is invoked to send a single RPC request over the transport, after taking the transport's write lock to serialize with other write or connect operations. This method must not sleep or block.
-
                construct an outgoing RPC and to contain its reply.
+
-
                The memory can be a dynamically allocated buffer, or
+
-
                it can provide the address of an existing memory area
+
-
                where the construction can occur.
+
-
                The function takes two arguments: the address of the
+
:This method adds any transport-specific headers that are required before the request is transmitted.  The transport implementation exports the byte size of the space required in the buffer where requests are assembled so that the generic logic may leave that space available for transport-specific header information.
-
                rpc_task structure associated with the current request,
+
-
                and a requested size of the memory area, in bytes.  It
+
-
                returns an address of a usable area of memory, or NULL
+
-
                in case no area is currently available.  The RPC
+
-
                client will retry if a NULL is returned.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes one argument: the address of the rpc_task structure associated with the current request.  The request has already been completely specified in the task's associated rq_rqst.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:If the transport is unable to write the complete request, this function places the task on a sleep queue and returns EAGAIN.  The transport implementation will wake the task when the send operation can make forward progress.  The generic layer calls this method again when the task is awakened.  The generic layer does not release the write lock until the current request has been completely sent.
-
"buf_free"     This API method is invoked when an rpc_task is finished
+
:If the transport requires a "connect" operation, this function returns ENOTCONN.  If any other error occurs, that error is returned.  If the send operation is entirely successful, this method returns zero.
-
                and must free a memory area allocated via buf_alloc.
+
-
                The function takes one argument: the address of the
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The generic layer serializes transport reads and writes with the connect operation on this transport.  Calling this function starts the write operation, but the write may not be complete when it returns.  The generic layer uses the RPC client's scheduler primitives to wait safely until the reply to this request is received.
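The return-value contract described for send_request can be modeled in user-space C. This is only a sketch under stated assumptions, not kernel code: <tt>sketch_xprt</tt>, <tt>sketch_send_request</tt>, and <tt>sketch_transmit</tt> are hypothetical names, errors are returned as negative errno values (an assumed convention), and the sleep queue is reduced to an immediate retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport instance; not the kernel's rpc_xprt. */
struct sketch_xprt {
    int connected;      /* nonzero once a "connect" operation has completed */
    size_t room;        /* bytes the transport accepts per call; must be > 0 */
    size_t sent;        /* bytes of the current request already written */
    int write_locked;   /* models the transport's write lock */
};

/* Models send_request: writes what fits, returns 0, -EAGAIN, or -ENOTCONN. */
static int sketch_send_request(struct sketch_xprt *x, size_t len)
{
    size_t n;

    assert(x->write_locked);    /* the caller must hold the write lock */
    if (!x->connected)
        return -ENOTCONN;
    n = len - x->sent;
    if (n > x->room)
        n = x->room;
    x->sent += n;
    return x->sent == len ? 0 : -EAGAIN;
}

/* Models the generic layer: holds the write lock across every retry. */
static int sketch_transmit(struct sketch_xprt *x, size_t len, int *calls)
{
    int status;

    x->write_locked = 1;
    x->sent = 0;
    *calls = 0;
    do {
        status = sketch_send_request(x, len);
        (*calls)++;
        /* On -EAGAIN a real task would sleep until woken; here we retry. */
    } while (status == -EAGAIN);
    x->write_locked = 0;        /* dropped only once the loop has ended */
    return status;
}
```

In this model a 250-byte request on a transport that accepts 100 bytes per call takes three calls to <tt>sketch_send_request</tt>, and the lock is released only after the third.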
-
                rpc_task structure associated with the current
+
-
                request.  It returns nothing.
+
-
                This function can be called from asynchronous RPC tasks
+
;set_receive_timeout
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:The generic transport layer invokes this API method after a message has been sent successfully on a transport.
-
"send_request"  This API method is invoked to send a single RPC request
+
:Each transport implementation provides its own RPC retransmit logic via this method.  It sets the RPC task timeout values so that the task is automatically awakened if no server reply is received. The timer callout is always xprt_timer.
-
                over the transport, after taking the transport's write
+
-
                lock to serialize with other write or connect operations.
+
-
                This method must not sleep or block.
+
-
                This method adds any transport-specific headers that
+
:The function takes one argument: the address of the rpc_task structure associated with the current request.  It returns nothing.
-
                are required before the request is transmitted.  The
+
-
                transport implementation exports the byte size of the
+
-
                space required in the buffer where requests are assembled
+
-
                so that the generic logic may leave that space available
+
-
                for transport-specific header information.
+
-
                The function takes one argument: the address of the
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The caller must acquire the transport_lock and the write lock while calling this function.
-
                rpc_task structure associated with the current
+
-
                request.  The request has already been completely
+
-
                specified in the task's associated rq_rqst.
+
-
                If the transport is unable to write the complete request,
+
;is_congested
-
                this function places the task on a sleep queue and
+
-
                returns EAGAIN.  The transport implementation will
+
-
                wake the task when the send operation can make forward
+
-
                progress.  The generic layer calls this method again
+
-
                when the task is awakened.  The generic layer does not
+
-
                release the write lock until the current request has
+
-
                been completely sent.
+
-
                If the transport requires a "connect" operation, this
+
:This API method is invoked to determine whether a transport is congested.  If the transport indicates that it is congested, the generic transport layer puts the current request to sleep.
-
                function returns ENOTCONN.  If any other error occurs,
+
-
                that error is returned.  If the send operation is
+
-
                entirely successful, this method returns zero.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes one argument: the address of the rpc_xprt structure to check.  It returns a zero value if the transport is not congested, and a nonzero value if the current request should be delayed.
-
                so it must not sleep.  The generic layer serializes
+
-
                transport reads and writes with the connect operation on
+
-
                this transport.  Calling this function starts the write
+
-
                operation, but the write may not be complete when it
+
-
                returns.  The generic layer uses the RPC client's
+
-
                scheduler primitives to wait safely until the reply to
+
-
                this request is received.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
  "set_receive_timeout"  The generic transport layer invokes this API
+
:This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
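How the generic layer might consult is_congested through a per-transport method table can be sketched as follows. Every name here (<tt>sketch_xprt_ops</tt>, <tt>sketch_may_send</tt>, the <tt>inflight</tt> and <tt>window</tt> fields) is hypothetical, and the simple request-count window is an assumption chosen for illustration rather than the congestion scheme of any real transport.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_xprt;

/* Hypothetical subset of a transport's method table. */
struct sketch_xprt_ops {
    int (*is_congested)(struct sketch_xprt *xprt);
};

struct sketch_xprt {
    const struct sketch_xprt_ops *ops;
    unsigned int inflight;   /* requests sent but not yet answered */
    unsigned int window;     /* requests the transport allows in flight */
};

/* A windowed transport: congested once the window is full. */
static int sketch_windowed_is_congested(struct sketch_xprt *xprt)
{
    return xprt->inflight >= xprt->window;
}

/* A transport without congestion control never reports congestion. */
static int sketch_never_congested(struct sketch_xprt *xprt)
{
    (void)xprt;
    return 0;
}

static const struct sketch_xprt_ops sketch_windowed_ops = {
    .is_congested = sketch_windowed_is_congested,
};

static const struct sketch_xprt_ops sketch_stream_ops = {
    .is_congested = sketch_never_congested,
};

/* Generic layer: returns 1 if the request may proceed, 0 if it must sleep. */
static int sketch_may_send(struct sketch_xprt *xprt)
{
    assert(xprt != NULL && xprt->ops != NULL);
    if (xprt->ops->is_congested(xprt))
        return 0;
    xprt->inflight++;
    return 1;
}
```

The generic code never inspects the window itself; it only acts on the zero or nonzero answer, which is what lets each transport keep its own congestion policy.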
-
                method after a message has been sent successfully on
+
-
                a transport.
+
-
                Each transport implementation provides its own RPC
+
;timeout
-
                retransmit logic via this method.  It sets the RPC
+
-
                task timeout values so that the task is automatically
+
-
                awakened if no server reply is received.  The timer
+
-
                callout is always xprt_timer.
+
-
                The function takes one argument: the address of the
+
:This API method is invoked when the RPC client detects a major retransmit timeout on this transport.  The transport implementation can use this to record statistics, adjust timeout values, or mark a connection for reconnection.
-
                rpc_task structure associated with the current
+
-
                request.  It returns nothing.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes one argument: the address of the rpc_xprt structure that experienced the retransmit timeout.  It returns nothing.
-
                so it must not sleep.  The caller must acquire the
+
-
                transport_lock and the write lock while calling this
+
-
                function.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
"is_congested"  This API method is invoked to determine whether a
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
                transport is congested.  If the transport indicates that
+
-
                it is congested, the generic transport layer puts the
+
-
                current request to sleep.
+
-
                The function takes one argument: the address of the
+
;close
-
                rpc_xprt structure to check.  It returns a zero value
+
-
                if the transport is not congested, and a nonzero
+
-
                value if the current request should be delayed.
+
-
                The caller must ensure that the xprt's reference count is
+
:This API method is invoked to close a transport connection. It is the opposite of the "connect" method.
-
                greater than one when calling this function.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes one argument: the address of an rpc_xprt structure to close.  It returns nothing.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
"timeout"      This API method is invoked when the RPC client detects a
+
:This function can be called from asynchronous RPC tasks or tasklets, so it must not sleep.  It does not depend on any external locks being held.
-
                major retransmit timeout on this transport.  The transport
+
-
                implementation can use this to record statistics, adjust
+
-
                timeout values, or mark a connection for reconnection.
+
-
                The function takes one argument: the address of the
+
;destroy
-
                rpc_xprt structure that experienced the retransmit
+
-
                timeout.  It returns nothing.
+
-
                The caller must ensure that the xprt's reference count is
+
:This API method is invoked when a transport will no longer be used.  It is the opposite of the "setup" external function.
-
                greater than one when calling this function.
+
-
                This function can be called from asynchronous RPC tasks
+
:The function takes one argument: the address of an rpc_xprt structure to destroy.  It returns nothing.
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
 +
:The caller must ensure that the xprt's reference count is positive when calling this function.
-
"close"        This API method is invoked to close a transport connection.
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
                It is the opposite of the "connect" method.
+
-
 
+
-
                The function takes one argument: the address of an
+
-
                rpc_xprt structure to close.  It returns nothing.
+
-
 
+
-
                The caller must ensure that the xprt's reference count is
+
-
                greater than one when calling this function.
+
-
 
+
-
                This function can be called from asynchronous RPC tasks
+
-
                or tasklets, so it must not sleep.  It does not depend on
+
-
                any external locks being held.
+
-
 
+
-
 
+
-
"destroy"      This API method is invoked when a transport will no longer
+
-
                be used.  It is the opposite of the "setup" external
+
-
                function.
+
-
 
+
-
                The function takes one argument: the address of an
+
-
                rpc_xprt structure to close.  It returns nothing.
+
-
 
+
-
                The caller must ensure that the xprt's reference count is
+
-
                positive when calling this function.
+
-
 
+
-
                This function can be called from asynchronous RPC tasks
+
-
                so it must not sleep.  It does not depend on any external
+
-
                locks being held.
+
-
</pre>
+
=== Procedure syntax and functional descriptions (external functions) ===
=== Procedure syntax and functional descriptions (external functions) ===
Line 644: Line 392:
=== Procedure syntax and functional descriptions (generic functions) ===
=== Procedure syntax and functional descriptions (generic functions) ===
-
<pre>
+
In addition to the above API, transport implementations may also need to invoke functions that are a part of the generic RPC client.  These functions are:
-
In addition to the above API, transport implementations may also need
+
-
to invoke functions that are a part of the generic RPC client.  These
+
-
functions are:
+
-
  void rpc_getport(struct rpc_task *task, struct rpc_clnt *clnt)
+
;<tt>void rpc_getport(struct rpc_task *task, struct rpc_clnt *clnt)</tt>
-
    This interface provides portmapping for IPv4 sockets.
+
:This interface provides portmapping for IPv4 sockets.
-
    The function takes two arguments: the address of the rpc_task
+
:The function takes two arguments: the address of the rpc_task structure for the current RPC request, and the address of the rpc_clnt structure associated with this task.  It returns nothing.
-
    structure for the current RPC request, and the address of the
+
-
    rpc_clnt structure associated with this task.  It returns nothing.
+
-
    This operation starts the bind operation asynchronously, and the
+
:This operation starts the bind operation asynchronously, and the caller sleeps using the RPC client's scheduling primitives.  The caller is awoken automatically when the bind is complete, and can check the status of the bind operation using "is_bound."
-
    caller sleeps using the RPC client's scheduling primitives.  The
+
-
    caller is awoken automatically when the bind is complete, and can
+
-
    check the status of the bind operation using "is_bound."
+
-
    This function can be called from asynchronous RPC tasks so it must
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
    not sleep.  It does not depend on any external locks being held.
+
 +
;<tt>void * rpc_malloc(struct rpc_task *task, size_t size)</tt>
-
  void * rpc_malloc(struct rpc_task *task, size_t size)
+
:This interface allocates a buffer from the rpc_buffer slab cache. These buffers are generally used to contain the RPC header for each RPC request.
-
    This interface allocates a buffer from the rpc_buffer slab cache.
+
:The function takes two arguments: the address of the rpc_task structure associated with the current request, and a requested size of the new buffer, in bytes. It returns an address of a usable area of memory, or NULL in case no buffer is currently available.  The RPC client will retry if a NULL is returned.
-
    These buffers are generally used to contain the RPC header for
+
-
    each RPC request.
+
-
    The function takes two arguments: the address of the rpc_task
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
    structure associated with the current request, and a requested
+
-
    size of the new buffer, in bytes.  It returns an address of a
+
-
    usable area of memory, or NULL in case no buffer is currently
+
-
    available.  The RPC client will retry if a NULL is returned.
+
-
    This function can be called from asynchronous RPC tasks so it
+
;<tt>void rpc_free(struct rpc_task *task)</tt>
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
 +
:Buffers allocated via rpc_malloc are freed via this interface.
-
  void rpc_free(struct rpc_task *task)
+
:The function takes one argument: the address of the rpc_task structure associated with the current request.  It returns nothing.
-
    Buffers allocated via rpc_malloc are freed via this interface.
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
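The allocate, retry, and free contract of rpc_malloc and rpc_free can be illustrated with a small user-space model. The names <tt>sketch_malloc</tt>, <tt>sketch_free</tt>, and <tt>sketch_task</tt> are hypothetical, and a fixed two-entry pool stands in for the rpc_buffer slab cache; the only points being shown are that allocation returns NULL instead of sleeping, and that the free routine needs nothing but the task.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NBUFS   2
#define SKETCH_BUFSIZE 512

/* Hypothetical task: remembers the buffer it was handed. */
struct sketch_task {
    void *buffer;
};

static unsigned char sketch_pool[SKETCH_NBUFS][SKETCH_BUFSIZE];
static int sketch_in_use[SKETCH_NBUFS];

/* Returns a buffer of at least `size` bytes, or NULL if none is free now. */
static void *sketch_malloc(struct sketch_task *task, size_t size)
{
    int i;

    if (size > SKETCH_BUFSIZE)
        return NULL;
    for (i = 0; i < SKETCH_NBUFS; i++) {
        if (!sketch_in_use[i]) {
            sketch_in_use[i] = 1;
            task->buffer = sketch_pool[i];
            return task->buffer;
        }
    }
    return NULL;    /* never sleeps; the caller is expected to retry later */
}

/* Releases the buffer recorded in the task and clears the task's pointer. */
static void sketch_free(struct sketch_task *task)
{
    int i;

    for (i = 0; i < SKETCH_NBUFS; i++) {
        if (task->buffer == sketch_pool[i])
            sketch_in_use[i] = 0;
    }
    task->buffer = NULL;
}
```

A third concurrent task gets NULL from this pool and succeeds only after another task has called <tt>sketch_free</tt>, which mirrors the retry behavior described for the RPC client.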
-
    The function takes one argument: the address of the rpc_task
+
;<tt>void xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, skb_reader_t *desc, skb_read_actor_t copy_actor)</tt>
-
    structure associated with the current request.  It returns
+
-
    nothing.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:This interface is used by datagram socket transports to copy data from an incoming skb to an xdr_buf.  It is used by both the client and server RPC implementations.
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
 +
:The function takes four arguments: the address of a standard xdr_buf structure containing data to be copied; the base offset where the copy operation should begin; the address of the read operation descriptor; and the address of a copy actor function. It returns nothing.
-
  void xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base,
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
                                skb_reader_t *desc,
+
-
                                skb_read_actor_t copy_actor)
+
-
    This interface is used by datagram socket transports to copy
+
;<tt>int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)</tt>
-
    data from an incoming skb to an xdr_buf.  It is used by both
+
-
    the client and server RPC implementations.
+
-
    The function takes four arguments: the address of a standard
+
:This interface provides a checksum copy function that copies data from an skb to an xdr_buf.  It is used by both the client and server RPC implementations.
-
    xdr_buf structure containing data to be copied; the base offset
+
-
    where the copy operation should begin; the address of the read
+
-
    operation descriptor, and the address of a copy actor function.
+
-
    It returns nothing.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:The function takes two arguments: the address of a standard xdr_buf structure that acts as the destination of the copy operation, and the address of an skbuff structure containing data to be copied.  It returns the number of bytes that were copied.
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
 +
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
  int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)
+
;<tt>void rpc_init_rtt(struct rpc_rtt *rt, unsigned long timeo)</tt>
-
    This interface provides a checksum copy function that copies
+
:A transport implementation can invoke this function to initialize an rpc_rtt structure.
-
    data from an skb to an xdr_buf.  It is used by both the client
+
-
    and server RPC implementations.
+
-
   
+
-
    The function takes two arguments: the address of a standard
+
-
    xdr_buf structure that acts as the destination of the copy
+
-
    operation, and the address of an skbuff structure containing
+
-
    data to be copied.  It returns the number of bytes that were
+
-
    copied.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:The function takes two arguments: the address of an rpc_rtt structure to initialize, and the number of jiffies to use as the initial timeout value.  It returns nothing.
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
 +
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
  void rpc_init_rtt(struct rpc_rtt *rt, unsigned long timeo)
+
;<tt>void rpc_update_rtt(struct rpc_rtt *rt, unsigned timer, long m)</tt>
-
    A transport implementation can invoke this function to initialize
+
:Transport implementations use this function to update an rpc_rtt structure when an RPC request has completed.
-
    an rpc_rtt structure.
+
-
    The function takes two arguments: the address of an rpc_rtt
+
:The function takes three arguments: the address of the rpc_rtt structure to update; the index of the timer to update; and the number of jiffies that have passed since the RPC request was started.  It returns nothing.
-
    structure to initialize, and the number of jiffies to use as
+
-
    the initial timeout value.  It returns nothing.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The transport_lock must be held before calling this function.
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
 +
;<tt>unsigned long rpc_calc_rto(struct rpc_rtt *rt, unsigned timer)</tt>
-
  void rpc_update_rtt(struct rpc_rtt *rt, unsigned timer, long m)
+
:This interface returns a value suitable for use as a retransmission timeout, in jiffies, based on the context data contained in an rpc_rtt structure.
-
    Transport implementations use this function to update an rpc_rtt
+
:The function takes two arguments: the address of the rpc_rtt structure that contains the data to use for the calculation, and the index of the timer to use.  It returns the number of jiffies to use for the retransmit timer.
-
    structure when an RPC request has completed.
+
-
    The function takes three arguments: the address of the rpc_rtt
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The transport_lock must be held before calling this function.
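The three round-trip-time interfaces (initialize, update, calculate) fit together roughly as in the sketch below. It uses a Jacobson-style smoothed estimator picked for illustration; the scaling factors, the number of timers, and all <tt>sketch_</tt> names are assumptions, and the arithmetic is not claimed to match the kernel's rpc_rtt code.

```c
#include <assert.h>

#define SKETCH_NTIMERS 4

/* Hypothetical estimator state; one slot per timer index. */
struct sketch_rtt {
    unsigned long timeo;              /* initial timeout, in ticks */
    long srtt[SKETCH_NTIMERS];        /* smoothed RTT, scaled by 8 */
    long rttvar[SKETCH_NTIMERS];      /* mean deviation, scaled by 4 */
};

/* Seeds every timer with the initial timeout and zero deviation. */
static void sketch_init_rtt(struct sketch_rtt *rt, unsigned long timeo)
{
    int i;

    rt->timeo = timeo;
    for (i = 0; i < SKETCH_NTIMERS; i++) {
        rt->srtt[i] = (long)timeo * 8;
        rt->rttvar[i] = 0;
    }
}

/* Folds one measured round trip of m ticks into the chosen timer. */
static void sketch_update_rtt(struct sketch_rtt *rt, unsigned timer, long m)
{
    long err;

    assert(timer < SKETCH_NTIMERS);
    if (m < 0)
        return;                              /* clock wrapped; ignore sample */
    err = m - (rt->srtt[timer] >> 3);        /* sample minus current estimate */
    rt->srtt[timer] += err;                  /* estimate moves by err / 8 */
    if (err < 0)
        err = -err;
    rt->rttvar[timer] += err - (rt->rttvar[timer] >> 2);
}

/* Retransmit timeout: smoothed RTT plus four times the mean deviation. */
static unsigned long sketch_calc_rto(const struct sketch_rtt *rt, unsigned timer)
{
    assert(timer < SKETCH_NTIMERS);
    return (unsigned long)((rt->srtt[timer] >> 3) + rt->rttvar[timer]);
}
```

Because update and calculate read and write the same per-timer state, a real implementation has to serialize them, which is consistent with the requirement that the transport_lock be held.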
-
    structure to update; the index of the timer to update; and
+
-
    the number of jiffies that have passed since the RPC request
+
-
    was started.  It returns nothing.
+
-
    This function can be called from asynchronous RPC tasks so it
+
;<tt>int xprt_register(struct xprt_type *transport)</tt>
-
    must not sleep.  The transport_lock must be held before calling
+
;<tt>int xprt_unregister(struct xprt_type *transport)</tt>
-
    this function.
+
 +
:Transport implementations use this interface to register their presence with the generic transport layer.  The transport layer will not use a transport implementation for new RPC connections until the transport implementation has registered via this interface.
-
  unsigned long rpc_calc_rto(struct rpc_rtt *rt, unsigned timer)
+
:Both functions take a single argument: the address of an xprt_type structure representing the transport implementation to register or unregister.  Both functions return zero on success, and an errno-type value on failure.
-
    This interface returns a value suitable for use as a retransmission
+
:These functions are called from a user process context, so they may sleep.  They do not depend on any external locks being held.
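A registry of transport implementations with the zero-or-errno return convention might look like the following sketch. The structure layout, the lookup by name, and the specific error codes (<tt>-EEXIST</tt>, <tt>-ENOENT</tt>) are assumptions made for illustration; the text above specifies only that failure yields an errno-type value. A real implementation would also need a lock around the list, which this single-threaded model omits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a transport implementation. */
struct sketch_xprt_type {
    const char *name;
    struct sketch_xprt_type *next;
};

static struct sketch_xprt_type *sketch_types;   /* head of the registry */

/* Finds a registered implementation by name, or returns NULL. */
static struct sketch_xprt_type *sketch_find(const char *name)
{
    struct sketch_xprt_type *t;

    for (t = sketch_types; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}

/* Adds an implementation; refuses a second one with the same name. */
static int sketch_register(struct sketch_xprt_type *type)
{
    if (sketch_find(type->name) != NULL)
        return -EEXIST;
    type->next = sketch_types;
    sketch_types = type;
    return 0;
}

/* Removes an implementation; fails if it was never registered. */
static int sketch_unregister(struct sketch_xprt_type *type)
{
    struct sketch_xprt_type **pp;

    for (pp = &sketch_types; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == type) {
            *pp = type->next;
            type->next = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```

Creation of a new connection would consult <tt>sketch_find</tt>, so an implementation that has not registered is simply invisible to it.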
-
    timeout, in jiffies, based on the context data contained in an
+
-
    rpc_rtt structure.
+
-
    The function takes two arguments: the address of the rpc_rtt
+
;<tt>void xprt_adjust_cwnd(struct rpc_rqst *req, int result)</tt>
-
    structure that contains the data to use for the calculation,
+
-
    and the index of the timer to use.  It returns the number of
+
-
    jiffies to use for the retransmit timer.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:Transport implementations that need congestion control invoke this function to adjust their congestion window.
-
    must not sleep.  The transport_lock must be held before calling
+
-
    this function.
+
 +
:The function takes two arguments: the address of an rpc_rqst structure representing the request that has caused the change in the transport's congestion window, and an integer containing an errno value indicating why the window needs to be adjusted.  It returns nothing.
-
  int xprt_register(struct xprt_type *transport)
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The transport_lock must be held before calling this function.
-
  int xprt_unregister(struct xprt_type *transport)
+
-
    Transport implementations use this interface to register their
+
;<tt>void xprt_disconnect(struct rpc_xprt *xprt)</tt>
-
    presence with the generic transport layer.  The transport layer
+
-
    will not use a transport implementation for new RPC connections
+
-
    until the transport implementation has registered via this
+
-
    interface.
+
-
    Both functions take a single argument: the address of an
+
:Callers use this interface to mark a transport as disconnected. The generic layer will subsequently terminate the transport connection when it is safe to do so.
-
    xprt_type structure representing the transport implementation
+
-
    to register or unregister.  Both functions return zero on
+
-
    success, and an errno-type value on failure.
+
-
    This function is called from a user process context, so it may
+
:The function takes a single argument: the address of an rpc_xprt structure representing the transport instance to mark disconnected.  It returns nothing.
-
    sleep.  It does not depend on any external locks being held.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
  void xprt_adjust_cwnd(struct rpc_rqst *req, int result)
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  It does not depend on any external locks being held.
-
    Transport implementations that need congestion control invoke
+
;<tt>struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, u32 xid)</tt>
-
    this function to adjust their congestion window.
+
-
    The function takes two arguments: the address of an rpc_rqst
+
:When an RPC reply is first received, the transport implementation invokes this function to map the received XID to a pending rpc_rqst.
-
    structure representing the request that has caused the
+
-
    change in the transport's congestion window, and an integer
+
-
    containing an errno value indicating why the window needs
+
-
    to be adjusted.  It returns nothing.
+
-
    This function can be called from asynchronous RPC tasks so it
+
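:The function takes two arguments: the address of an rpc_xprt structure on which a reply has just arrived, and a 32-bit value representing the XID of the request to look up.  It returns the address of the matching rpc_rqst structure, or NULL if no pending request matches the XID.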
-
    must not sleep.  The transport_lock must be held before calling
+
-
    this function.
+
 +
:The caller must ensure that the xprt's reference count is greater than one when calling this function.
-
  void xprt_disconnect(struct rpc_xprt *xprt)
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The transport_lock must be held before calling this function.
-
    Callers use this interface to mark a transport as disconnected.
+
;<tt>void xprt_complete_rqst(struct rpc_rqst *req, size_t copied)</tt>
-
    The generic layer will subsequently terminate the transport
+
-
    connection when it is safe to do so.
+
-
    The function takes a single argument: the address of an
+
:A transport implementation invokes this function to signal that a complete RPC reply has been received, and that the RPC client may begin decoding the reply.
-
    rpc_xprt structure representing the transport instance to
+
-
    mark disconnected.  It returns nothing.
+
-
    The caller must ensure that the xprt's reference count is
+
:This function takes two arguments: the address of an rpc_rqst structure representing the request that is being completed, and an integer containing the number of payload bytes that were just copied by the request.  It returns nothing.
-
    greater than one when calling this function.
+
-
    This function can be called from asynchronous RPC tasks so it
+
:This function can be called from asynchronous RPC tasks so it must not sleep.  The transport_lock must be held before calling this function.
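The receive path that pairs xprt_lookup_rqst with xprt_complete_rqst can be sketched as a lookup in a list of pending requests followed by completion. All names below are hypothetical, the linked list is an assumed data structure, and waking the waiting task is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_xprt;

/* Hypothetical stand-in for a pending request; not the kernel's rpc_rqst. */
struct sketch_rqst {
    uint32_t xid;               /* transaction ID sent with the call */
    size_t received;            /* payload bytes copied for the reply */
    int done;                   /* set once the reply is complete */
    struct sketch_xprt *xprt;   /* transport the request was sent on */
    struct sketch_rqst *next;
};

struct sketch_xprt {
    struct sketch_rqst *pending;    /* requests still waiting for a reply */
};

/* Records a request as awaiting a reply on this transport. */
static void sketch_add_pending(struct sketch_xprt *xprt, struct sketch_rqst *req)
{
    req->xprt = xprt;
    req->next = xprt->pending;
    xprt->pending = req;
}

/* Maps a received XID to its pending request, or NULL if none matches. */
static struct sketch_rqst *sketch_lookup_rqst(struct sketch_xprt *xprt, uint32_t xid)
{
    struct sketch_rqst *r;

    for (r = xprt->pending; r != NULL; r = r->next)
        if (r->xid == xid)
            return r;
    return NULL;
}

/* Unlinks the request and marks its reply as fully received. */
static void sketch_complete_rqst(struct sketch_rqst *req, size_t copied)
{
    struct sketch_rqst **pp;

    assert(req->xprt != NULL);
    for (pp = &req->xprt->pending; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == req) {
            *pp = req->next;
            break;
        }
    }
    req->next = NULL;
    req->received = copied;
    req->done = 1;
}

/* Transport receive path: returns 0 if a request matched, -1 if dropped. */
static int sketch_reply_arrived(struct sketch_xprt *xprt, uint32_t xid, size_t len)
{
    struct sketch_rqst *req = sketch_lookup_rqst(xprt, xid);

    if (req == NULL)
        return -1;      /* unknown or already completed XID */
    sketch_complete_rqst(req, len);
    return 0;
}
```

Both steps walk and modify the same pending list, which is why the text requires the transport_lock to be held across them.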
-
    must not sleep.  It does not depend on any external locks being
+
-
    held.
+
-
 
+
-
 
+
-
  struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt,
+
-
                                            u32 xid)
+
-
 
+
-
    When an RPC reply is first received, the transport implementation
+
-
    invokes this function to map the received XID to a pending
+
-
    rpc_rqst.
+
-
 
+
-
    The function takes two arguments: the address of an rpc_xprt
+
-
    structure on which a request has just arrived, and a 32-bit
+
-
    value representing the XID of the request to look up.
+
-
 
+
-
    The caller must ensure that the xprt's reference count is
+
-
    greater than one when calling this function.
+
-
 
+
-
    This function can be called from asynchronous RPC tasks so it
+
-
    must not sleep.  The transport_lock must be held before calling
+
-
    this function.
+
-
 
+
-
 
+
-
  void xprt_complete_rqst(struct rpc_rqst *req, size_t copied)
+
-
 
+
-
    A transport implementation invokes this function to signal
+
-
    that a complete RPC reply has been received, and that the
+
-
    RPC client may begin decoding the reply.
+
-
 
+
-
    This function takes two arguments: the address of an rpc_rqst
+
-
    structure representing the request that is being completed, and
+
-
    an integer containing the number of payload bytes that were just
+
-
    copied by the request.
+
-
 
+
-
    This function can be called from asynchronous RPC tasks so it
+
-
    must not sleep.  The transport_lock must be held before calling
+
-
    this function.
+
-
</pre>
+
=== Procedure syntax and functional descriptions (create) ===
=== Procedure syntax and functional descriptions (create) ===
-
The transport switch replaces the two functions that were formerly
+
The transport switch replaces the two functions that were formerly used to create a new rpc_clnt, xprt_create_proto and rpc_create_client, with a single function call that hides the details of the transport from RPC applications.
-
used to create a new rpc_clnt, xprt_create_proto and rpc_create_client,
+
-
with a single function call that hides the details of the transport
+
-
from RPC applications.
+
-
To create a new rpc_clnt structure, an application will fill in
+
To create a new rpc_clnt structure, an application will fill in this structure, and pass it to the new rpc_create function:
-
this structure, and pass it to the new rpc_create function:
+
<pre>
<pre>
Line 888: Line 525:
</pre>
</pre>
-
This structure contains all the same parameters that the
+
This structure contains all the same parameters that the xprt_create_proto and rpc_create_client function calls used.  In addition, a "behavior" field contains bits that enable specific behaviors in the new rpc_clnt instance.
-
xprt_create_proto and rpc_create_client function calls used.  In
+
-
addition, a "behavior" field contains bits that enable specific
+
-
behaviors in the new rpc_clnt instance.
+
<pre>
<pre>
Line 903: Line 537:
</pre>
</pre>
-
<pre>
+
;<tt>int rpc_create(struct rpc_create_args *);</tt>
-
  int rpc_create(struct rpc_create_args *);
+
-
    This function is invoked by applications to create a new
+
:This function is invoked by applications to create a new rpc_clnt structure.
-
    rpc_clnt structure.
+
-
    The function takes a single argument: the address of the
+
:The function takes a single argument: the address of the rpc_create_args structure that provides the parameters for the new rpc_clnt instance.
-
    rpc_create_args structure that provides the parameters for
+
-
    the new rpc_clnt instance.
+
-
    This function is called from a user process context, so it
+
:This function is called from a user process context, so it may sleep.  It does not depend on any external locks being held.
-
    may sleep.  It does not depend on any external locks being
+
-
    held.
+
-
</pre>
+
== Conclusion ==
== Conclusion ==
-
With the implementation of an RPC transport switch, we hope to
+
With the implementation of an RPC transport switch, we hope to facilitate the introduction of significant new technology into the Linux kernel RPC implementation.  Not only will the RPC transport switch enable new transport technologies such as high-performance TCP offload, but it will ease enhancements such as multiple sockets per client-server pair, the elimination of the RPC slot table, and the removal of the global kernel lock from the RPC client and server.
-
facilitate the introduction of significant new technology into the
+
-
Linux kernel RPC implementation.  Not only will the RPC transport
+
-
switch enable new transport technologies such as high performance
+
-
TCP offload, but it will ease enhancements such as multiple sockets
+
-
per client-server pair, the elimination of the RPC slot table, and
+
-
the removal of the global kernel lock from the RPC client and server.
+

Latest revision as of 02:53, 24 August 2007

Purpose

We document the design for a transport switch in the Linux 2.6 RPC client.

Introduction

Today's RPC client and server in the Linux kernel use a socket-based transport layer API. This works well for existing network transport technologies such as IPv4 TCP over gigabit Ethernet.

In the near future, alternate transport technologies will appear which may be difficult to mate with the socket abstraction. Examples of such new technologies include transports that support direct data placement and TCP offload devices accessed directly rather than through the Linux kernel's network layer.

Additionally, other new technologies such as IPv6 and new stream protocols such as SCTP will require significant changes to the socket-based infrastructure in the RPC client and server, but may have little if any effect on other areas.

Finally, security mechanisms such as IPsec and Kerberos 5 privacy may have special buffer management requirements in the transport layer in order to provide as efficient an implementation as possible.

In the following text, we refer to today's RPC client and server that do not have a generic transport switch implementation as the "pre-switch" versions of the client and server.

Specification

Our final goal is an implementation that facilitates integration of alternate transports while retaining or improving the stability, performance, and maintainability of the pre-switch RPC client with socket-based transports. In other words, we want to have no negative impact on the performance or stability of the existing IPv4 socket-based transport as we add a transport switch capability. Toward that end, we will introduce as little new functionality to existing support as possible for IPv4 socket transports; we are simply moving code and data structures. When complete, the IPv4 socket transport implementation will act as a reference for new transport implementations.

A "transport implementation" provides the code base that supports particular transport mechanisms, such as "IPv4 socket." Eventually transport implementations will be contained in loadable kernel modules. As they are loaded, they will register with the RPC client and server. Each transport implementation provides a vector of procs that provide a way to create, bind, and connect a new transport instance; provide auxiliary services such as portmapping; and provide ways to configure, send and receive data on, or destroy such instances.

Each transport connection between the client and server using a particular transport implementation is known as a "transport instance." Such an instance is identified by its transport implementation, and by the endpoint addresses of the client and server, and is represented by an rpc_xprt struct. For the "IPv4 socket" transport implementation, a transport instance is a single IPv4 socket connection that uses either the UDP or TCP network protocol. Note, for example, that a single transport instance might also consist of multiple sockets that share a workload, or an RDMA link with a passive failover IP socket, depending on how the instance's transport is implemented.

The transport API now contains methods to access various fields in the rpc_xprt struct. A transport-private data structure contains fields that are specific to a particular transport instance.

When the API is complete, transport endpoint addresses will be contained in a sockaddr_storage structure and an API method will be provided to retrieve the value of the remote peer's endpoint address. Setting the remote address will only be allowed during transport instance creation.
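
As an illustration of the paragraph above, the following user-space sketch stores a peer address in a sockaddr_storage at creation time and exposes it only through a read accessor. The names mock_xprt_addr, mock_xprt_init_addr, and mock_peeraddr are hypothetical, and the accessor returning a byte count is our assumption, not part of the design.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical holder for a transport instance's remote endpoint. */
struct mock_xprt_addr {
        struct sockaddr_storage addr;
        size_t                  addrlen;
};

/* Called once, during instance creation, to record the peer address. */
void mock_xprt_init_addr(struct mock_xprt_addr *x,
                         const struct sockaddr *sa, size_t salen)
{
        assert(x != NULL && sa != NULL);
        assert(salen <= sizeof(x->addr));
        memset(&x->addr, 0, sizeof(x->addr));
        memcpy(&x->addr, sa, salen);
        x->addrlen = salen;
}

/* Copy the peer address into buf; returns the number of bytes copied. */
size_t mock_peeraddr(const struct mock_xprt_addr *x,
                     struct sockaddr *buf, size_t bufsize)
{
        size_t n = x->addrlen < bufsize ? x->addrlen : bufsize;

        memcpy(buf, &x->addr, n);
        return n;
}
```

Because sockaddr_storage is large enough for any supported address family, the same accessor would serve IPv4 and IPv6 peers without change.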

A transport implementation will usually include its own mechanism for RPC portmapping. For example, IPv4 sockets will use the standard RPC portmapper. IPv6 sockets may use rpcbind. Some implementations will not need any kind of port mapping; such implementations can provide the portmap methods as no-ops.

We defer the introduction of mechanisms by which user space, and subsequently the NFS client and server, specify which transport to use and parameters specific to a particular transport implementation. New mount options that control aspects of transport operation and changes to the mount_data structure will be considered on a case by case basis.

Support for the NFS version 4 session model

The pre-existing RPC client transport model includes a capability to send RPC requests and receive replies from servers via a single transport instance. NFS version 4 (RFC 3530) introduces the concept of a callback channel to support RPC requests sent by NFS servers and received by clients. The primary use of this channel is to support NFS version 4 read and write delegation. Typically it uses a separate RPC server instance on the client supported by a separate transport instance to service callback RPC requests.

In the near future, a minor revision of NFS version 4 will require the ability to combine the normal RPC request channel with the callback channel on a single transport instance (also known as the NFS version 4 session layer). To support bi-directional RPC communications on a single transport instance, additional transport methods will be required.

At this time we do not yet understand what will be required, in addition to the methods described above, to support callbacks on the same transport instance as the RPC request forward channel.

API Specification

The generic functionality of all RPC transports (i.e., congestion control, request queuing, retransmit timeouts, and so on) will remain in xprt.c. All API methods must be present in all transport implementations.

We define fifteen transport methods:

struct rpc_xprt_ops {
        void            (*setbufsize)(struct rpc_xprt *,
                                        size_t, size_t);
        void            (*print_addr)(struct rpc_xprt *,
                                        size_t, char *, int);
        int             (*is_bound)(struct rpc_xprt *);
        void            (*rpcbind)(struct rpc_task *, struct rpc_clnt *);
        void            (*set_port)(struct rpc_xprt *, unsigned short);
        void            (*connect)(struct rpc_task *);
        int             (*aux_protocol)(struct rpc_xprt *);
        void *          (*buf_alloc)(struct rpc_task *, size_t);
        void            (*buf_free)(struct rpc_task *);
        int             (*send_request)(struct rpc_task *);
        void            (*set_receive_timeout)(struct rpc_task *);
        int             (*is_congested)(struct rpc_xprt *);
        void            (*timeout)(struct rpc_xprt *);
        void            (*close)(struct rpc_xprt *);
        void            (*destroy)(struct rpc_xprt *);
};
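
Since every method must be present, the generic layer could check an operations vector for completeness before dispatching through it. The following is a minimal user-space sketch of that idea, using a reduced vector of five methods; mock_xprt_ops, mock_ops_complete, and the no-op methods are hypothetical names, and nothing in this design mandates such a check.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are used here. */
struct rpc_xprt;
struct rpc_task;

/* Reduced, hypothetical version of the operations vector. */
struct mock_xprt_ops {
        int             (*is_bound)(struct rpc_xprt *);
        void            (*connect)(struct rpc_task *);
        int             (*send_request)(struct rpc_task *);
        void            (*close)(struct rpc_xprt *);
        void            (*destroy)(struct rpc_xprt *);
};

/* Returns 1 if every method is populated, 0 otherwise. */
int mock_ops_complete(const struct mock_xprt_ops *ops)
{
        return ops != NULL &&
               ops->is_bound != NULL &&
               ops->connect != NULL &&
               ops->send_request != NULL &&
               ops->close != NULL &&
               ops->destroy != NULL;
}

/* No-op methods, as a transport with nothing to do might supply. */
int  mock_noop_is_bound(struct rpc_xprt *xprt) { (void)xprt; return 1; }
void mock_noop_connect(struct rpc_task *task)  { (void)task; }
int  mock_noop_send(struct rpc_task *task)     { (void)task; return 0; }
void mock_noop_close(struct rpc_xprt *xprt)    { (void)xprt; }
void mock_noop_destroy(struct rpc_xprt *xprt)  { (void)xprt; }

/* Dispatch through the vector only after checking completeness. */
int mock_xprt_is_bound(const struct mock_xprt_ops *ops, struct rpc_xprt *xprt)
{
        assert(mock_ops_complete(ops));
        return ops->is_bound(xprt);
}
```

Requiring no-op methods rather than NULL entries keeps the call sites in the generic layer free of per-method NULL tests.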

The following type defines a single transport implementation. It provides a name that functions only as an eye-catcher; the address of the transport implementation's kernel module structure; a family and protocol; and the address of the function that the generic layer can use to set up a new transport instance. The address of this structure is passed to the generic layer when the transport implementation initializes.

struct xprt_type {
        struct list_head        list;
        char                    name[32];
        struct module *         owner;
        unsigned short          family;
        int                     protocol;
        int                     (*setup)(struct rpc_xprt *,
                                                struct rpc_timeout *);
};
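
To make the registration step concrete, here is a small user-space sketch of a registry keyed by family and protocol. The names mock_xprt_type, mock_xprt_find, mock_xprt_register, and mock_xprt_unregister are hypothetical; the singly linked list stands in for list_head, and both the duplicate-rejection policy and the negative errno return convention are our assumptions. The sketch does no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for struct xprt_type. */
struct mock_xprt_type {
        struct mock_xprt_type  *next;           /* replaces list_head */
        char                    name[32];       /* eye-catcher only */
        unsigned short          family;
        int                     protocol;
};

/* Head of the list of registered transport implementations. */
static struct mock_xprt_type *mock_xprt_list;

/* Find a registered implementation by family and protocol, or NULL. */
struct mock_xprt_type *mock_xprt_find(unsigned short family, int protocol)
{
        struct mock_xprt_type *t;

        for (t = mock_xprt_list; t != NULL; t = t->next)
                if (t->family == family && t->protocol == protocol)
                        return t;
        return NULL;
}

/* Register an implementation; duplicates are refused (assumed policy). */
int mock_xprt_register(struct mock_xprt_type *type)
{
        assert(type != NULL);
        if (mock_xprt_find(type->family, type->protocol) != NULL)
                return -EEXIST;
        type->next = mock_xprt_list;
        mock_xprt_list = type;
        return 0;
}

/* Unregister an implementation; -ENOENT if it is not on the list. */
int mock_xprt_unregister(struct mock_xprt_type *type)
{
        struct mock_xprt_type **pp;

        assert(type != NULL);
        for (pp = &mock_xprt_list; *pp != NULL; pp = &(*pp)->next) {
                if (*pp == type) {
                        *pp = type->next;
                        type->next = NULL;
                        return 0;
                }
        }
        return -ENOENT;
}
```

The name field plays no part in the lookup, which matches its role as an eye-catcher.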

The setup function is responsible for initializing the following fields in the rpc_xprt structure it is passed, and possibly for allocating and initializing a private area for the transport instance.

tsh_size
the size, in 8-bit bytes, of a transport-specific header to be placed before the RPC header when building each RPC request.
cwnd
the initial size of the congestion window.
resvport
a boolean which, if true, means this transport needs a reserved port.
max_payload
the size, in 8-bit bytes, of the largest payload a single RPC request can contain on this transport.
bind_timeout
number of jiffies to wait for a bind request to complete before timing it out.
connect_timeout
number of jiffies to wait for a transport connect request to complete before timing it out.
reestablish_timeout
number of jiffies to wait after a transport is remotely disconnected before attempting to reestablish a connection.
idle_timeout
number of jiffies to wait after a transport becomes idle before disconnecting.
ops
the address of this transport instance's operations vector.
max_reqs
the maximum number of concurrent requests this transport instance can support.

A (void *) pointer field is made available in the rpc_xprt structure to reference an implementation-private area where instance variables specific to a transport implementation can be maintained.
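
The following user-space sketch shows what a setup function for an imaginary transport might look like. The struct mock_xprt is a hypothetical subset of rpc_xprt, MOCK_HZ is an assumed tick rate, every numeric value is arbitrary, and returning a negative errno on failure is our assumption about the convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MOCK_HZ 100     /* assumed jiffies per second, for illustration */

/* Hypothetical subset of the rpc_xprt fields that setup fills in. */
struct mock_xprt {
        size_t          tsh_size;
        int             resvport;
        size_t          max_payload;
        unsigned long   bind_timeout;
        unsigned long   connect_timeout;
        unsigned long   reestablish_timeout;
        unsigned long   idle_timeout;
        unsigned int    max_reqs;
        void           *private;        /* implementation-private area */
};

/* Instance variables private to this made-up transport. */
struct mock_private {
        int             fd;
};

/* Initialize a freshly allocated instance; 0 on success, -errno on error. */
int mock_setup(struct mock_xprt *xprt)
{
        struct mock_private *priv;

        assert(xprt != NULL);
        priv = calloc(1, sizeof(*priv));
        if (priv == NULL)
                return -ENOMEM;
        priv->fd = -1;                          /* not yet connected */

        xprt->tsh_size = 4;                     /* bytes */
        xprt->resvport = 1;
        xprt->max_payload = 32768;              /* bytes */
        xprt->bind_timeout = 60UL * MOCK_HZ;
        xprt->connect_timeout = 60UL * MOCK_HZ;
        xprt->reestablish_timeout = 3UL * MOCK_HZ;
        xprt->idle_timeout = 300UL * MOCK_HZ;
        xprt->max_reqs = 16;
        xprt->private = priv;
        return 0;
}

/* Counterpart of mock_setup: release the private area. */
void mock_destroy(struct mock_xprt *xprt)
{
        free(xprt->private);
        xprt->private = NULL;
}
```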

Procedure syntax and functional descriptions (transport ops)

setup
This external function is provided by the transport implementation for initializing a new transport instance, setting the remote peer address, and providing some transport-specific parameters, such as request timeout values. This function also initializes the vector of API methods with which the generic layer can manipulate the new transport instance.
The function takes two arguments: the address of a freshly allocated rpc_xprt structure, and the address of a structure containing transport-specific options. The "addr" field of the rpc_xprt structure is initialized with the remote endpoint address before "setup" is invoked.
The return value is an errno value if problems were encountered, or zero on success.
This function is called from a user process context, so it may sleep. It does not depend on any external locks being held.
setbufsize
This API method is invoked following the creation of a new transport instance to initialize transport layer buffer parameters.
The function takes three arguments: the address of the rpc_xprt structure to configure, and two unsigned integers giving the desired sizes of the transport's buffers, in bytes. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function is called from a user process context, so it may sleep. It does not depend on any external locks being held.
print_addr
This API method stuffs a buffer with a formatted string representing the remote peer's address. It's useful for building hash functions or with error, warning, and trace messages.
The function takes four arguments, which are the address of the rpc_xprt structure containing the remote address, the size in bytes and the address of a buffer to stuff, and a set of flags that determine which address fields are to be formatted. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function is called from a user process context, so it may sleep. It does not depend on any external locks being held.
is_bound
This API method is invoked to determine whether a bind operation is required before a connection is made.
The function takes a single argument, which is the address of the rpc_xprt structure which is being tested. It returns true if the transport is bound already, and false if a bind operation is necessary before proceeding.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
rpcbind
This API method is invoked before a connect to allow portmapping to occur. If ports are not supported by the underlying transport mechanism, this method can be a no-op.
The function takes two arguments: the address of the rpc_task structure for the current RPC request, and the address of the rpc_clnt structure associated with this task. It returns nothing.
This operation starts the bind operation asynchronously, and the caller sleeps using the RPC client's scheduling primitives. The caller is awoken automatically when the bind is complete, and can check the status of the bind operation using "is_bound."
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
set_port
This API method is invoked to change the bound port number for a transport. It is generally invoked only during a bind operation.
The function takes two arguments: the address of an rpc_xprt structure to update, and an unsigned 16-bit integer which is the new port number. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks, so it must not sleep. It does not depend on any external locks being held.
connect
This API method is invoked to connect a transport when the generic transport layer recognizes the need to connect a transport instance.
The generic layer serializes transport reads and writes with the connect operation on this transport. Calling this function starts the connection, but the transport may or may not be connected when it returns. The generic layer uses the RPC client's scheduler primitives to wait safely until the connection operation is complete, and to allow only one connection attempt at a time.
The details of whether a transport is connection-oriented or datagram-oriented can be well hidden in the transport implementation itself. The RPC client's finite state engine automatically detects whether a transport is connected before sending each request; if it is not, it will invoke this method automatically.
The function takes one argument, which is the address of an rpc_task structure which can be used for scheduling the connection and sleeping. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep.
aux_protocol
This API method returns the protocol number to be used to set up auxiliary transports. An auxiliary transport is an additional transport instance that connects the same endpoints, but carries a different RPC program. NLM, NSM, and NFSACL would use an auxiliary transport to connect to servers.
The function takes one argument, which is the address of an rpc_xprt structure. It returns an integer.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep.
buf_alloc
This API method returns an area of memory in which to construct an outgoing RPC and to contain its reply. The memory can be a dynamically allocated buffer, or it can provide the address of an existing memory area where the construction can occur.
The function takes two arguments: the address of the rpc_task structure associated with the current request, and a requested size of the memory area, in bytes. It returns an address of a usable area of memory, or NULL in case no area is currently available. The RPC client will retry if a NULL is returned.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
buf_free
This API method is invoked when an rpc_task is finished and must free a memory area allocated via buf_alloc.
The function takes one argument: the address of the rpc_task structure associated with the current request. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
send_request
This API method is invoked to send a single RPC request over the transport, after taking the transport's write lock to serialize with other write or connect operations. This method must not sleep or block.
This method adds any transport-specific headers that are required before the request is transmitted. The transport implementation exports the byte size of the space required in the buffer where requests are assembled so that the generic logic may leave that space available for transport-specific header information.
The function takes one argument: the address of the rpc_task structure associated with the current request. The request has already been completely specified in the task's associated rq_rqst.
If the transport is unable to write the complete request, this function places the task on a sleep queue and returns EAGAIN. The transport implementation will wake the task when the send operation can make forward progress. The generic layer calls this method again when the task is awakened. The generic layer does not release the write lock until the current request has been completely sent.
If the transport requires a "connect" operation, this function returns ENOTCONN. If any other error occurs, that error is returned. If the send operation is entirely successful, this method returns zero.
This function can be called from asynchronous RPC tasks so it must not sleep. The generic layer serializes transport reads and writes with the connect operation on this transport. Calling this function starts the write operation, but the write may not be complete when it returns. The generic layer uses the RPC client's scheduler primitives to wait safely until the reply to this request is received.
set_receive_timeout
The generic transport layer invokes this API method after a message has been sent successfully on a transport.
Each transport implementation provides its own RPC retransmit logic via this method. It sets the RPC task timeout values so that the task is automatically awakened if no server reply is received. The timer callout is always xprt_timer.
The function takes one argument: the address of the rpc_task structure associated with the current request. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. The caller must acquire the transport_lock and the write lock while calling this function.
is_congested
This API method is invoked to determine whether a transport is congested. If the transport indicates that it is congested, the generic transport layer puts the current request to sleep.
The function takes one argument: the address of the rpc_xprt structure to check. It returns a zero value if the transport is not congested, and a nonzero value if the current request should be delayed.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
timeout
This API method is invoked when the RPC client detects a major retransmit timeout on this transport. The transport implementation can use this to record statistics, adjust timeout values, or mark a connection for reconnection.
The function takes one argument: the address of the rpc_xprt structure that experienced the retransmit timeout. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
close
This API method is invoked to close a transport connection. It is the opposite of the "connect" method.
The function takes one argument: the address of an rpc_xprt structure to close. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks or tasklets, so it must not sleep. It does not depend on any external locks being held.
destroy
This API method is invoked when a transport will no longer be used. It is the opposite of the "setup" external function.
The function takes one argument: the address of an rpc_xprt structure to destroy. It returns nothing.
The caller must ensure that the xprt's reference count is positive when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
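
The return values of send_request described above can be illustrated with a small user-space model. This is only a sketch: mock_wire, mock_task, and mock_send_request are hypothetical names, the "room" counter stands in for whatever makes a real transport unable to write a complete request, and the negative errno convention is our assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport state: connected flag and bytes it can accept. */
struct mock_wire {
        int             connected;
        size_t          room;
};

/* Hypothetical request state: total length and bytes sent so far. */
struct mock_task {
        size_t          len;
        size_t          sent;
};

/*
 * Send as much of the request as the transport will take.
 * Returns 0 when the whole request is sent, -EAGAIN when only part of
 * it could be written (the caller retries once woken), and -ENOTCONN
 * when a connect operation is needed first.
 */
int mock_send_request(struct mock_wire *w, struct mock_task *t)
{
        size_t left, n;

        assert(w != NULL && t != NULL);
        assert(t->sent <= t->len);

        if (!w->connected)
                return -ENOTCONN;

        left = t->len - t->sent;
        n = left < w->room ? left : w->room;
        t->sent += n;
        w->room -= n;

        return t->sent == t->len ? 0 : -EAGAIN;
}
```

Because the task records how far it has progressed, calling the method again after a wakeup resumes the send instead of restarting it. This is why the generic layer holds the write lock until the method returns zero.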

Procedure syntax and functional descriptions (external functions)

 "rpc_peeraddr"  This external function is a convenient way to invoke a
                 transport's peer_addr method.

                 The function takes three arguments: the address of the
                 rpc_clnt structure to be queried, the address of a buffer
                 into which to copy the endpoint address, and the size of
                 that buffer.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_print_peeraddr"  This external function provides a way to format
                 remote peer addresses for printing or for use in a hash
                 function.

                 The function takes four arguments: the address of the
                 rpc_clnt structure containing the address of interest,
                 the address and size of a buffer, and a set of flags
                 that determine which parts of the address are formatted.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "xprt_tsh_size" This external function returns the number of bytes
                 to be left before the RPC header is inserted into the
                 transmission buffer.  The generic transport layer uses
                 this value when constructing each RPC request to leave
                 room for transport specific and protocol specific
                 headers.

                 This function takes one argument: the address of the
                 rpc_xprt structure that will be used to transmit the
                 current request.  It returns the size of any protocol
                 specific header, in bytes, or zero, if no space for
                 a protocol specific header is required.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "xprt_is_bound" This external function is a convenient way to invoke a
                 transport's is_bound method.

                 The function takes a single argument, which is the address
                 of the rpc_xprt structure which is being tested.  It
                 returns true if the transport is bound already, and false
                 if a bind operation is necessary before proceeding.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "xprt_connected" This external function is a convenient way to determine
                 whether a transport is connected.

                 The function takes one argument: the address of the
                 rpc_xprt structure that represents the transport
                 instance to check.  It returns a truth value.

                 The caller must ensure that the xprt's reference count is
                 greater than one when calling this function.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_max_payload" This external function reports the maximum number of
                 bytes of payload that a single RPC can carry on a given
                 transport protocol.

                 The function takes one argument, which is the address of
                 an rpc_clnt structure created by rpc_create.  It returns
                 a size_t value.

                 This function is called from a user process context,
                 so it may sleep.  It does not depend on any external
                 locks being held.


 "rpc_force_rebind" This external function allows applications to request
                 that the RPC client rebind the transport.

                 The function takes one argument: the address of the
                 rpc_clnt structure to rebind.  It returns nothing.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.


 "rpc_aux_protocol" This external function reports what transport protocol
                 to use when connecting auxiliary services, such as NLM
                 or NFSACL, based on the protocol used on the main
                 forward channel.

                 The function takes one argument: the address of the
                 rpc_clnt structure to query.  It returns an integer.

                 This function can be called from asynchronous RPC tasks
                 so it must not sleep.  It does not depend on any external
                 locks being held.
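
As an example of how the value reported by xprt_tsh_size might be used, the sketch below reserves header space in front of an RPC message and later fills it with a stream record marker (the four-byte fragment header used by RPC over TCP: a last-fragment bit followed by a 31-bit length, in network byte order). The function names and MOCK_TSH_SIZE are hypothetical, and the split of work between the two functions is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_TSH_SIZE 4   /* bytes reserved for a stream record marker */

/*
 * Place the RPC message after the reserved transport header space.
 * Returns the total number of bytes used in buf, or 0 if buf is too small.
 */
size_t mock_build_request(unsigned char *buf, size_t buflen, size_t tsh_size,
                          const unsigned char *rpc, size_t rpclen)
{
        assert(buf != NULL && rpc != NULL);
        if (tsh_size > buflen || rpclen > buflen - tsh_size)
                return 0;
        memset(buf, 0, tsh_size);               /* transport fills this later */
        memcpy(buf + tsh_size, rpc, rpclen);
        return tsh_size + rpclen;
}

/*
 * Fill the reserved space with a record marker: the last-fragment bit
 * plus a 31-bit fragment length, most significant byte first.
 */
void mock_fill_record_marker(unsigned char *buf, uint32_t fraglen)
{
        uint32_t marker = 0x80000000u | (fraglen & 0x7fffffffu);

        buf[0] = (unsigned char)(marker >> 24);
        buf[1] = (unsigned char)(marker >> 16);
        buf[2] = (unsigned char)(marker >> 8);
        buf[3] = (unsigned char)marker;
}
```

A datagram transport that needs no such header would report a size of zero, and the message would then start at the beginning of the buffer.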

Procedure syntax and functional descriptions (generic functions)

In addition to the above API, transport implementations may also need to invoke functions that are a part of the generic RPC client. These functions are:

void rpc_getport(struct rpc_task *task, struct rpc_clnt *clnt)
This interface provides portmapping for IPv4 sockets.
The function takes two arguments: the address of the rpc_task structure for the current RPC request, and the address of the rpc_clnt structure associated with this task. It returns nothing.
This operation starts the bind operation asynchronously, and the caller sleeps using the RPC client's scheduling primitives. The caller is awoken automatically when the bind is complete, and can check the status of the bind operation using "is_bound."
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
void * rpc_malloc(struct rpc_task *task, size_t size)
This interface allocates a buffer from the rpc_buffer slab cache. These buffers are generally used to contain the RPC header for each RPC request.
The function takes two arguments: the address of the rpc_task structure associated with the current request, and a requested size of the new buffer, in bytes. It returns an address of a usable area of memory, or NULL in case no buffer is currently available. The RPC client will retry if a NULL is returned.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
void rpc_free(struct rpc_task *task)
Buffers allocated via rpc_malloc are freed via this interface.
The function takes one argument: the address of the rpc_task structure associated with the current request. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
void xdr_partial_copy_from_skb(struct xdr_buf *xdr, unsigned int base, skb_reader_t *desc, skb_read_actor_t copy_actor)
This interface is used by datagram socket transports to copy data from an incoming skb to an xdr_buf. It is used by both the client and server RPC implementations.
The function takes four arguments: the address of a standard xdr_buf structure that acts as the destination of the copy operation; the base offset where the copy operation should begin; the address of the read operation descriptor; and the address of a copy actor function. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.

int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)
This interface provides a checksum copy function that copies data from an skb to an xdr_buf. It is used by both the client and server RPC implementations.
The function takes two arguments: the address of a standard xdr_buf structure that acts as the destination of the copy operation, and the address of an skbuff structure containing data to be copied. It returns the number of bytes that were copied.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
void rpc_init_rtt(struct rpc_rtt *rt, unsigned long timeo)
A transport implementation can invoke this function to initialize an rpc_rtt structure.
The function takes two arguments: the address of an rpc_rtt structure to initialize, and the number of jiffies to use as the initial timeout value. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
void rpc_update_rtt(struct rpc_rtt *rt, unsigned timer, long m)
Transport implementations use this function to update an rpc_rtt structure when an RPC request has completed.
The function takes three arguments: the address of the rpc_rtt structure to update; the index of the timer to update; and the number of jiffies that have passed since the RPC request was started. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. The transport_lock must be held before calling this function.
unsigned long rpc_calc_rto(struct rpc_rtt *rt, unsigned timer)
This interface returns a value suitable for use as a retransmission timeout, in jiffies, based on the context data contained in an rpc_rtt structure.
The function takes two arguments: the address of the rpc_rtt structure that contains the data to use for the calculation, and the index of the timer to use. It returns the number of jiffies to use for the retransmit timer.
This function can be called from asynchronous RPC tasks so it must not sleep. The transport_lock must be held before calling this function.
int xprt_register(struct xprt_type *transport)
int xprt_unregister(struct xprt_type *transport)
Transport implementations use this interface to register their presence with the generic transport layer. The transport layer will not use a transport implementation for new RPC connections until the transport implementation has registered via this interface.
Both functions take a single argument: the address of an xprt_type structure representing the transport implementation to register or unregister. Both functions return zero on success, and an errno-type value on failure.
These functions are called from a user process context, so they may sleep. They do not depend on any external locks being held.
void xprt_adjust_cwnd(struct rpc_rqst *req, int result)
Transport implementations that need congestion control invoke this function to adjust their congestion window.
The function takes two arguments: the address of an rpc_rqst structure representing the request that has caused the change in the transport's congestion window, and an integer containing an errno value indicating why the window needs to be adjusted. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. The transport_lock must be held before calling this function.
void xprt_disconnect(struct rpc_xprt *xprt)
Callers use this interface to mark a transport as disconnected. The generic layer will subsequently terminate the transport connection when it is safe to do so.
The function takes a single argument: the address of an rpc_xprt structure representing the transport instance to mark disconnected. It returns nothing.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. It does not depend on any external locks being held.
struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, u32 xid)
When an RPC reply is first received, the transport implementation invokes this function to map the received XID to a pending rpc_rqst.
The function takes two arguments: the address of an rpc_xprt structure on which a reply has just arrived, and a 32-bit value representing the XID of the request to look up. It returns the address of the matching rpc_rqst structure, or NULL if no pending request has that XID.
The caller must ensure that the xprt's reference count is greater than one when calling this function.
This function can be called from asynchronous RPC tasks so it must not sleep. The transport_lock must be held before calling this function.
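The lookup can be pictured as a search of the transport's pending requests for a matching XID. The following user-space sketch assumes the pending requests are kept on a simple linked list; the field names rq_xid, next, and recv are assumptions for illustration, and a production implementation might index requests differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins carrying only the fields this sketch needs. */
struct rpc_rqst {
	uint32_t rq_xid;           /* XID assigned when the request was sent */
	struct rpc_rqst *next;     /* assumed: link in the pending-request list */
};

struct rpc_xprt {
	struct rpc_rqst *recv;     /* assumed: requests still awaiting a reply */
};

/*
 * Return the pending request whose XID matches, or NULL when the reply
 * does not correspond to any outstanding request on this transport.
 */
struct rpc_rqst *xprt_lookup_rqst(struct rpc_xprt *xprt, uint32_t xid)
{
	struct rpc_rqst *req;

	for (req = xprt->recv; req != NULL; req = req->next)
		if (req->rq_xid == xid)
			return req;
	return NULL;
}
```

Because the list is walked without taking any lock of its own, the sketch relies on the caller already holding the transport_lock, as the text requires.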
void xprt_complete_rqst(struct rpc_rqst *req, size_t copied)
A transport implementation invokes this function to signal that a complete RPC reply has been received, and that the RPC client may begin decoding the reply.
This function takes two arguments: the address of an rpc_rqst structure representing the request that is being completed, and an integer containing the number of payload bytes that were just copied for the request. It returns nothing.
This function can be called from asynchronous RPC tasks so it must not sleep. The transport_lock must be held before calling this function.

Procedure syntax and functional descriptions (create)

The transport switch replaces the two functions that were formerly used to create a new rpc_clnt, xprt_create_proto and rpc_create_client, with a single function call that hides the details of the transport from RPC applications.

To create a new rpc_clnt structure, an application fills in the following structure and passes it to the new rpc_create function:

struct rpc_create_args {
        int                     protocol;
        struct sockaddr         *address;
        size_t                  addrsize;
        struct rpc_timeout      *timeout;
        char                    *servername;
        struct rpc_program      *program;
        u32                     version;
        rpc_authflavor_t        authflavor;
        unsigned long           behavior;
};

This structure contains all the same parameters that the xprt_create_proto and rpc_create_client function calls used. In addition, a "behavior" field contains bits that enable specific behaviors in the new rpc_clnt instance.

#define RPC_CLNT_SOFTRTRY       (1UL << 0)
#define RPC_CLNT_INTR           (1UL << 1)
#define RPC_CLNT_CHATTY         (1UL << 2)
#define RPC_CLNT_AUTOBIND       (1UL << 3)
#define RPC_CLNT_DROPPRIV       (1UL << 4)
#define RPC_CLNT_ONESHOT        (1UL << 5)
#define RPC_CLNT_RESVPORT       (1UL << 6)
int rpc_create(struct rpc_create_args *);
This function is invoked by applications to create a new rpc_clnt structure.
The function takes a single argument: the address of the rpc_create_args structure that provides the parameters for the new rpc_clnt instance.
This function is called from a user process context, so it may sleep. It does not depend on any external locks being held.
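Filling in the argument structure might look like the following user-space sketch. It reproduces the structure and flag definitions from above so that it is self-contained; the typedefs for u32 and rpc_authflavor_t, the helper name example_fill_args, and every value chosen (loopback address, port 2049, version 3, flavor 1) are assumptions for illustration only. The sketch stops short of calling rpc_create, since the function exists only inside the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

typedef uint32_t u32;
typedef u32 rpc_authflavor_t;      /* assumed underlying type */
struct rpc_timeout;                /* opaque in this sketch */
struct rpc_program;                /* opaque in this sketch */

struct rpc_create_args {
	int                     protocol;
	struct sockaddr         *address;
	size_t                  addrsize;
	struct rpc_timeout      *timeout;
	char                    *servername;
	struct rpc_program      *program;
	u32                     version;
	rpc_authflavor_t        authflavor;
	unsigned long           behavior;
};

#define RPC_CLNT_SOFTRTRY       (1UL << 0)
#define RPC_CLNT_INTR           (1UL << 1)
#define RPC_CLNT_CHATTY         (1UL << 2)
#define RPC_CLNT_AUTOBIND       (1UL << 3)
#define RPC_CLNT_DROPPRIV       (1UL << 4)
#define RPC_CLNT_ONESHOT        (1UL << 5)
#define RPC_CLNT_RESVPORT       (1UL << 6)

/* Hypothetical helper: describe a TCP client for one server address. */
void example_fill_args(struct rpc_create_args *args, struct sockaddr_in *sin,
		       char *servername)
{
	memset(args, 0, sizeof(*args));
	memset(sin, 0, sizeof(*sin));

	sin->sin_family = AF_INET;
	sin->sin_port = htons(2049);                     /* illustrative port */
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* illustrative address */

	args->protocol = IPPROTO_TCP;
	args->address = (struct sockaddr *)sin;
	args->addrsize = sizeof(*sin);
	args->timeout = NULL;          /* left unset in this sketch */
	args->servername = servername;
	args->program = NULL;          /* a real caller supplies its rpc_program */
	args->version = 3;             /* illustrative program version */
	args->authflavor = 1;          /* illustrative flavor number */

	/* Behavior bits are independent and are combined with bitwise OR. */
	args->behavior = RPC_CLNT_SOFTRTRY | RPC_CLNT_INTR | RPC_CLNT_RESVPORT;
}
```

Gathering the transport address, program, and behavior bits in one structure is what lets the caller remain unaware of which transport implementation ends up serving the request.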

Conclusion

With the implementation of an RPC transport switch, we hope to facilitate the introduction of significant new technology into the Linux kernel RPC implementation. Not only will the RPC transport switch enable new transport technologies such as high-performance TCP offload, but it will also ease enhancements such as multiple sockets per client-server pair, the elimination of the RPC slot table, and the removal of the global kernel lock from the RPC client and server.
