P2P Design Specification

Latest revision as of 19:23, 18 January 2013

3 December 2012 DRAFT bjschuma@netapp.com

Overview

Peer-to-peer pNFS is designed to solve the "boot storm" problem, which occurs when many clients in a cluster boot at once and try to read the same set of files from a single NFS server. This can saturate the server's bandwidth and slow down operations on most client machines. The idea behind p2p NFS is to let clients act as ad hoc, read-only pNFS data servers (DSs) that serve files out of their own data cache. This should spread network usage across all machines rather than concentrating it on a single node. The server and any machine that will act as a DS need to be modified, but any pNFS-enabled client already has the code required to read from ad hoc DSs.

Related Documents

  • draft-myklebust-nfsv4-pnfs-backend-protocol-01.txt
  • RFC 5661

Dependencies

This design needs the following from others:

Item | Description of Dependency or Issue | Affected Group | Contact
1 | Linux pNFS server development code | Bryan Schumaker | Benny Halevy
2 | pNFS nfs utils needs to be installed on the NFSD server so it can export a filesystem over pNFS. | Bryan Schumaker | Benny Halevy

Assumptions

  • Workload with a large number of read-only files
  • Enable the following .config options for the pNFS client and pNFS DS machines:
    • CONFIG_NFS_V4_1
    • CONFIG_PNFS_FILE_LAYOUT
  • Enable the following .config option for the pNFS DS machine:
    • CONFIG_NFS_P2P
  • Enable the following .config options for the pNFS server and pNFS DS machines:
    • CONFIG_PNFSD
    • CONFIG_PNFSD_LOCAL_EXPORT
    • CONFIG_PNFSD_P2P
  • Install pnfs-nfs-utils on the pNFS server
  • Add "pnfs" to the export options of a local filesystem on the pNFS server
  • The pNFS DS should have nfsd running, but does not need to edit /etc/exports to share files
  • pnfsd needs the "pnfs" export option in /etc/exports
  • pnfsd also needs "fsid=0" as an export option; otherwise the path-walking code will trigger an early UNREGISTER_DS
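
As a rough illustration of the assumptions above, the fragments below show the kernel options named in this section for a DS machine and one possible /etc/exports entry for the pNFS server. The "=y" values, the export path /export and the wildcard client specification are placeholders chosen for the example; only the option names, "pnfs" and "fsid=0" come from the list above.

```
# Kernel .config for a pNFS DS machine (it runs both the NFS client and nfsd).
# "=y" is an assumption; the options may also be buildable as modules.
CONFIG_NFS_V4_1=y
CONFIG_PNFS_FILE_LAYOUT=y
CONFIG_NFS_P2P=y
CONFIG_PNFSD=y
CONFIG_PNFSD_LOCAL_EXPORT=y
CONFIG_PNFSD_P2P=y

# /etc/exports on the pNFS server (path and client list are placeholders)
/export  *(fsid=0,pnfs)
```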

Design

REGISTER_DS

  • Server
    • Only implemented REGISTER_DS_ALL
    • Create a new struct pnfs_p2p_client to store information about the ad hoc DS:
      • p2p client stateid
      • netid
      • IP address
      • MDS identifier
    • Store structure as part of the nfs4_client
    • Encode p2p client stateid as reply to client
  • Client
    • Send REGISTER_DS call as part of nfs4_remote_mount()
      • Use REGISTER_DS_ALL so server knows we'll cache everything
      • Generate MDS identifier using cl_cb_ident and a static u32 counter
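
The client-side identifier generation could look roughly like the userspace sketch below. The helper name p2p_generate_mds_id() and the bit layout (cl_cb_ident in the upper 32 bits, the counter in the lower 32 bits) are assumptions made for illustration; the notes above only say that the identifier is built from cl_cb_ident and a static u32 counter.

```c
#include <assert.h>
#include <stdint.h>

/* Static u32 counter, bumped once per generated identifier. */
static uint32_t p2p_mds_counter;

/*
 * Hypothetical helper: build a 64-bit MDS identifier from the callback
 * ident and the counter. The bit layout is an assumption of this sketch.
 */
static uint64_t p2p_generate_mds_id(uint32_t cl_cb_ident)
{
        uint32_t seq = p2p_mds_counter++;

        return ((uint64_t)cl_cb_ident << 32) | seq;
}
```

The sketch is not thread-safe; kernel code would presumably need an atomic counter or a lock around the increment.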

UNREGISTER_DS

  • Server
    • Check that the nfs4_client has an associated pnfs_p2p_client
    • Check that the nfs4_client is using the correct stateid
    • Free the memory allocated for the struct pnfs_p2p_client during REGISTER_DS
    • Free pnfs_p2p_po_stids associated with the DS
    • Set pnfs_p2p_client pointer in nfs4_client to NULL
  • Client
    • Send UNREGISTER_DS as part of nfs4_destroy_server()
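
The server-side steps above can be modelled in userspace as follows. All sketch_* types and the singly linked list are simplified stand-ins invented for this example, not the kernel structures; the order of the checks and frees follows the list above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the kernel types; names are illustrative only. */
struct sketch_stateid { unsigned char data[16]; };

struct sketch_po_stid {
        struct sketch_po_stid *next;    /* singly linked for brevity */
};

struct sketch_p2p_client {
        struct sketch_stateid p2p_stid;
        struct sketch_po_stid *ds_files;
};

struct sketch_nfs4_client {
        struct sketch_p2p_client *cl_p2p;  /* NULL when not registered as a DS */
};

/* Returns 0 on success, -1 if the client is not a DS or the stateid is wrong. */
static int sketch_unregister_ds(struct sketch_nfs4_client *clp,
                                const struct sketch_stateid *stid)
{
        struct sketch_p2p_client *p2p = clp->cl_p2p;

        if (p2p == NULL)
                return -1;
        if (memcmp(&p2p->p2p_stid, stid, sizeof(*stid)) != 0)
                return -1;

        /* Free every proxy open stateid associated with this DS. */
        while (p2p->ds_files != NULL) {
                struct sketch_po_stid *po = p2p->ds_files;

                p2p->ds_files = po->next;
                free(po);
        }

        /* Free the DS structure and clear the pointer in the client. */
        free(p2p);
        clp->cl_p2p = NULL;
        return 0;
}
```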

PROXY_OPEN

  • Server
    • Introduce a pnfs_p2p_po_stid to track what DS the client was referred to
    • Strip MDS ID from the filehandle
    • Add stateid to list stored in the pnfs_p2p_client for the DS
    • Add stateid to list stored in the nfs4_client for the client
    • Initialize a callback workqueue structure for PROXY_REVOKE
  • Client
    • Check if we have already called PROXY_OPEN for this (filehandle, stateid)
    • Check that we still have a delegation for the file
    • Use MDS identifier from filehandle to find the correct nfs_server structure
    • Use server to call an nfs4_proc_proxy_open()
      • Pass filehandle and read stateid
      • Use the compound [SEQUENCE, PUTFH, PROXY_OPEN, GETFH] to look up the actual filehandle and get a proxy revoke stateid
    • Store both filehandles, read stateid and revoke stateid in a pnfs_po_state structure
      • Store this in the pnfs_layout_hdr
    • Pass resulting filehandle to nfs_delegation_find_inode() to find inode
    • Use d_find_any_alias() on the inode to find and return a dentry to the server
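
The first client-side check, whether PROXY_OPEN has already been called for a given (filehandle, stateid) pair, might be sketched like this. The sketch_po_state type, its singly linked list and the helper name are assumptions of this example; the real state lives in a pnfs_po_state hanging off the pnfs_layout_hdr. The 16-byte stateid size matches the NFSv4 stateid4 layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FH_MAX      36   /* largest filehandle used in this document */
#define SKETCH_STATEID_LEN 16   /* size of an NFSv4 stateid */

/* Simplified stand-in for struct pnfs_po_state. */
struct sketch_po_state {
        unsigned char read_stateid[SKETCH_STATEID_LEN];
        unsigned char fh[SKETCH_FH_MAX];
        size_t fh_len;
        struct sketch_po_state *next;
};

/* Return the cached entry for (fh, stateid), or NULL if none exists yet. */
static struct sketch_po_state *
sketch_find_po_state(struct sketch_po_state *head,
                     const unsigned char *fh, size_t fh_len,
                     const unsigned char *stateid)
{
        struct sketch_po_state *po;

        if (fh_len > SKETCH_FH_MAX)
                return NULL;
        for (po = head; po != NULL; po = po->next) {
                if (po->fh_len == fh_len &&
                    memcmp(po->fh, fh, fh_len) == 0 &&
                    memcmp(po->read_stateid, stateid, SKETCH_STATEID_LEN) == 0)
                        return po;
        }
        return NULL;
}
```

A NULL result would mean the client still has to send the PROXY_OPEN compound before it can hand a dentry back to the server.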

CB_PROXY_REVOKE

  • Server
    • Call when client expires on server
    • Remove pnfs_p2p_po_stid from lists, but don't free until proxy_revoke_release()
  • Client
    • Use the filehandle and stateid to find associated layout
    • Free that pnfs_po_state

LAYOUTGET

  • Server
    • Edit pnfs_lexp_layout_get()
    • Set device id field in the layout to the clientid of the machine acting as the DS
    • If we are not using p2p for the file, instead continue to return 1 as the devid
    • Encode a filehandle with the DS's MDS ID prepended in filelayout_encode_layout()

LAYOUTRETURN

  • Server
    • Add to pnfs_lexp_layout_return()
    • Check nfs4_client for files opened on a DS
      • Send CB_PROXY_REVOKE
    • Also check the pnfs_p2p_client structure for files cached as a DS
      • Free these stateids directly
  • Client
    • Free up pnfs_po_state stored in the pnfs_layout_hdr

GETDEVICEINFO

  • Server
    • If we are given a device id of 1 continue using the non-p2p code
    • Edit pnfsd_lexp_get_device_info() to fill out pnfs_filelayout_devaddr structure with DS information
    • Translate deviceid back to clientid to look up the DS
    • Fill out netid and IP address information using data in the pnfs_p2p_client structure
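
The device id convention shared by LAYOUTGET and GETDEVICEINFO can be summarised in a small sketch. It treats both the device id and the clientid as 64-bit values and assumes that no DS ever has a clientid of 1; both are simplifications made for this example (an NFSv4.1 deviceid4 is a 16-byte opaque value). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NON_P2P_DEVID 1ULL   /* devid returned when p2p is not used */

/* LAYOUTGET side: choose the device id to place in the layout. */
static uint64_t sketch_layout_devid(bool use_p2p, uint64_t ds_clientid)
{
        return use_p2p ? ds_clientid : SKETCH_NON_P2P_DEVID;
}

/*
 * GETDEVICEINFO side: translate a device id back to the DS clientid.
 * Returns false for device id 1, which means "use the non-p2p code".
 */
static bool sketch_devid_to_clientid(uint64_t devid, uint64_t *clientid)
{
        if (devid == SKETCH_NON_P2P_DEVID)
                return false;
        *clientid = devid;
        return true;
}
```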

PUTFH

  • Server
    • If this is a p2p filehandle, skip some of the state checks, because we won't have a dentry until after calling PROXY_OPEN
    • Check if a filehandle is p2p by looking at the length (p2p: 36 bytes, normal: 28 bytes)
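
A minimal sketch of the length check, together with the prepend and strip steps mentioned under LAYOUTGET and PROXY_OPEN, is shown below. It assumes that the 8-byte difference between the two lengths is exactly the u64 MDS ID and that the ID is stored in host byte order; neither detail is stated in the notes above. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FH_NORMAL_LEN 28
#define SKETCH_FH_P2P_LEN    36   /* normal length plus an 8-byte MDS ID */

/* PUTFH: a filehandle is treated as p2p purely by its length. */
static bool sketch_fh_is_p2p(size_t len)
{
        return len == SKETCH_FH_P2P_LEN;
}

/* LAYOUTGET: prepend the MDS ID; out must hold SKETCH_FH_P2P_LEN bytes. */
static size_t sketch_fh_mark(uint64_t mds_id, const unsigned char *fh,
                             unsigned char *out)
{
        memcpy(out, &mds_id, sizeof(mds_id));
        memcpy(out + sizeof(mds_id), fh, SKETCH_FH_NORMAL_LEN);
        return SKETCH_FH_P2P_LEN;
}

/* PROXY_OPEN: strip the MDS ID; out must hold SKETCH_FH_NORMAL_LEN bytes. */
static size_t sketch_fh_strip(const unsigned char *p2p_fh, uint64_t *mds_id,
                              unsigned char *out)
{
        memcpy(mds_id, p2p_fh, sizeof(*mds_id));
        memcpy(out, p2p_fh + sizeof(*mds_id), SKETCH_FH_NORMAL_LEN);
        return SKETCH_FH_NORMAL_LEN;
}
```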

READ

  • Server
    • Call into the NFS client module to perform PROXY_OPEN and return the associated dentry for p2p filehandles

OPEN

  • Server
    • Introduce a vfs_find_any_mount() to look up any mount structure for a dentry
      • This is a hack, but we don't care which mount structure as long as we get the file data!

Other Notes

  • free_p2p_po_stid()
    • Remove from lists first before either freeing or calling CB_PROXY_REVOKE to prevent accidental double frees
  • DS expires on server
    • Treat as if the client had called unregister_ds()
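
The ordering rule for free_p2p_po_stid() can be illustrated with a userspace model of the two list memberships. The sketch_list helpers imitate the kernel's list_add_tail() and list_del_init(); the function signature and the boolean return value are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct sketch_list { struct sketch_list *prev, *next; };

static void sketch_list_init(struct sketch_list *l)
{
        l->prev = l->next = l;
}

static void sketch_list_add_tail(struct sketch_list *entry,
                                 struct sketch_list *head)
{
        entry->prev = head->prev;
        entry->next = head;
        head->prev->next = entry;
        head->prev = entry;
}

/* Unlink and re-initialise, so that a second unlink is harmless. */
static void sketch_list_del_init(struct sketch_list *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        sketch_list_init(entry);
}

struct sketch_po_stid {
        struct sketch_list po_ds_list;  /* membership in the DS's list */
        struct sketch_list po_cl_list;  /* membership in the client's list */
};

/*
 * Unlink from both lists first, then either free immediately or leave the
 * object for the revoke callback's release path. Returns true if freed here.
 */
static bool sketch_free_po_stid(struct sketch_po_stid *po, bool revoke)
{
        sketch_list_del_init(&po->po_ds_list);
        sketch_list_del_init(&po->po_cl_list);
        if (revoke)
                return false;   /* freed later, after CB_PROXY_REVOKE */
        free(po);
        return true;
}
```

Because the entry is already off both lists before anything else happens, no other list walker can find it and free it a second time.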

Data Structures

Server

  • p2p client information
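struct pnfs_p2p_client {
       struct nfs4_stid p2p_stid;      /* p2p client stateid returned by REGISTER_DS */
       u64 p2p_mds_id;                 /* MDS identifier generated by the DS */
       char *p2p_netid;                /* netid of the ad hoc DS */
       char *p2p_addr;                 /* IP address of the ad hoc DS */
       struct list_head p2p_ds_files;  /* pnfs_p2p_po_stids for files served by this DS */
};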
  • p2p proxy open stateid
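struct pnfs_p2p_po_stid {
       struct nfs4_stid  po_stid;      /* proxy open stateid */
       struct knfsd_fh   po_fh;        /* filehandle the stateid refers to */
       struct list_head  po_ds_list;   /* entry in the DS's pnfs_p2p_client list */
       struct list_head  po_cl_list;   /* entry in the client's nfs4_client list */
       struct nfsd4_callback po_cb;    /* callback used to send CB_PROXY_REVOKE */
};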

Client

  • NFSv4 Proxy Open
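struct pnfs_po_state {
       nfs4_stateid  read_stateid;     /* read stateid passed to PROXY_OPEN */
       nfs4_stateid  revoke_stateid;   /* proxy revoke stateid returned by PROXY_OPEN */
       struct nfs_fh fh;               /* filehandle associated with this proxy open */
       struct list_head list;          /* entry in the pnfs_layout_hdr list */
};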

Compatibility

  • Any v4.1 / pNFS-enabled client should already be able to use ad hoc data servers without special p2p extensions.
  • Clients wishing to act as a data server need CONFIG_NFS_P2P enabled
  • Servers wishing to track adhoc DSs need CONFIG_PNFSD_P2P enabled

Documentation

  • I can write a Documentation/filesystems/nfs/peer_to_peer.txt file to give a brief overview of how p2pNFS is supposed to work and how users can configure it.
  • I can also copy the page to linux-nfs.org for "online documentation"

Feature Interaction Dependencies and Impacts

  • nfsd <-> nfs
    • The machine acting as a pNFS DS needs to be running both the nfs server and the nfs client.
  • Made changes to putfh
    • Check filehandle length since p2p filehandles are longer
    • Call the original version of the function if we are using a normal fh
  • Made changes to nfsd4_read
    • Call original read function if this isn't p2p, call proxy open otherwise to get data from client
  • nfsd_open needs to look up a mount structure without using an exportops structure for NFS
  • filelayout_encode_layout needs to be able to encode both p2p and normal filehandles
  • pnfs_p2p_mark_fh increases the filehandle size; the server needs to know to use the MDS ID for the bigger filehandles
  • nfsd4_proc_compound needs to know if a filehandle is a p2p fh since the dentry will be looked up later for reads

Performance

  • Keep a per-file LRU list of clients that currently have the file cached to avoid redirecting all p2p activity to the same client for that file.

Scalability

The hope is that p2p NFS scales to hundreds or thousands of clients better than plain pNFS does. This can be tested by comparing read times for files of varying sizes with and without p2p enabled. A handful of DSs and a large number of clients should be used to get a feel for how this would work in a data center.

  • An LRU list of clients should help load balance traffic to each DS
    • Make use of the already existing nfs4_file->fi_delegations list; move a DS's delegation to the end when referring a client to it
  • I take the state lock (global mutex) when accessing file or client state
  • I created a p2p spinlock for accessing p2p state
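
The LRU idea above, picking the DS at the head of the list and moving it to the end when a client is referred to it, can be sketched as follows. A small fixed-size array of integer DS ids stands in for the nfs4_file->fi_delegations list; the types and names are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DS 8

/* Stand-in for the per-file list; index 0 is the least recently used DS. */
struct sketch_file {
        int ds_ids[SKETCH_MAX_DS];
        size_t nr_ds;
};

/*
 * Refer a client to the DS at the head of the list and move that DS to
 * the tail, so consecutive referrals rotate through every DS.
 * Returns -1 when no DS currently caches the file.
 */
static int sketch_pick_ds(struct sketch_file *fp)
{
        int picked;

        if (fp->nr_ds == 0)
                return -1;
        picked = fp->ds_ids[0];
        memmove(&fp->ds_ids[0], &fp->ds_ids[1],
                (fp->nr_ds - 1) * sizeof(fp->ds_ids[0]));
        fp->ds_ids[fp->nr_ds - 1] = picked;
        return picked;
}
```

With three DSs in the list, four consecutive referrals go to the first, second, third and then the first DS again.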

Testing

  • Basic proof-of-concept tests
    • 1 client, 1 DS, 1 server
    • Have DS and client rsync files from server
    • Maybe do a `git clone linux-src` instead?
    • Try exporting a /lib partition
  • In-depth testing
    • NFSv4root with varying numbers of clients
      • NFSv4root doesn't work right now due to idmapping issues
    • More rsyncs / git clones with more clients

Open Issues

Item | Date | Name | Issue | Resolution | Date Resolved
1 | 12/11/2012 | Bryan | Client needs to mount server with the public filehandle, otherwise the path walking code will trigger an early UNREGISTER_DS. | [NONE] | [NONE]

Approvals

Approvers

Name | Role | Target Approval Date | Approval Date
Trond Myklebust | NFS Client Maintainer | Date | Date

Reviewers

Name | Role | Target Approval Date | Approval Date
Jeffrey Heller | Bryan's Manager | Date | Date