PNFS Development Road Map
Latest revision as of 21:48, 16 January 2008
Completing pNFS for Linux requires fighting three battles: IETF specification, Linux implementation, and integration into the Linux kernel.
Section 1 describes the status of the NFSv4.1 specification, based on the IETF meeting that just ended.
Section 2 describes the plan for implementation and integration.
IETF Road Map
NFSv4.1 extends NFSv4 with two major components: sessions and pNFS. As of the 70th IETF Meeting in Vancouver (December 2007), the specification of sessions in draft-ietf-nfsv4-minorversion1-17.txt appears to be complete. pNFS discussions centered on device ID mappings, layout range accounting, sparse files, persistent sessions, and recall processing.
Draft 18 is anticipated to be released on December 21, 2007. The major change is device mappings, which allow a device ID to be recalled without affecting the layout. Draft 18 issues will be tested at the Austin Bakeathon in February 2008.
Draft 19 is expected to follow the Austin Bakeathon and be issued as an RFC following the 71st IETF Meeting in Philadelphia (March 2008). This will freeze the specification of sessions, generic pNFS protocol issues, and pNFS file layout. Specification of block layout, currently draft-ietf-nfsv4-pnfs-block-05.txt, and object layout, currently draft-ietf-nfsv4-pnfs-obj-04.txt, may also be ready to move forward in Philadelphia; otherwise they will wait until the 72nd IETF Meeting in Europe (July/August 2008).
Linux pNFS Road Map
The Linux pNFS road map entails fighting three battles:
Rebase the implementation on the latest Linux kernel
The current version of Linux pNFS is implemented on the 2.6.18.3 kernel. The Linux pNFS developers group is rebasing the code to the latest kernel, 2.6.24 at this writing.
Along the way to the current kernel, the NFS client and RPC layer saw major changes, complicating a direct port of the pNFS and sessions code. This led to two efforts to rebase the code:
- Patch forward
  - Benny Halevy (Panasas) has rebased the 2.6.18.3 sessions code through the multiple kernels along the path to the latest level.
    Benny says: "I've completed rebasing our patches in the linux-pnfs-2.6 over 2.6.24-rc5." I'm confused
- Rewrite
  - A team at Network Appliance led by Ricardo Labiaga rewrote sessions for the latest Linux kernel and submitted patches to the Linux pNFS developers group for review. The forward channel code was added to linux-pnfs-2.6-latest, a git tree based on the latest kernel.
  - Andy Adamson rewrote the pNFS I/O path. READ I/O patches to the latest kernel are under review by the Linux pNFS developers. WRITE I/O is being factored into patches of manageable size.
Fully implement the final specification
The current code implements draft-ietf-nfsv4-minorversion1-13. The current IETF specification at this writing is draft 17; draft 18 is anticipated by December 21, 2007, and draft 19 is anticipated before the next IETF meeting. Effort is required to bring the current code forward to draft 18 for the Austin Bakeathon in February 2008, and then to complete the implementation. This is detailed below.
Organize and submit a sequence of patches to the Linux maintainers
Once the code is ported to a git tree based on Linus’ kernel and brought forward to the final NFSv4.1 draft, a “ready to submit” branch can be made available to Linux kernel maintainers and pNFS developers for review, performance testing, and error testing.
The pNFS and sessions patches for the 2.6.18.3 tree are huge and lack a patch history suitable for submission. Benny is creating small patches from the 2.6.18.3 code base and applying them to successive kernels, in the hope that at the end of the process he will have preserved functionality and created a patch history suitable for submission to kernel review.
Ricardo and Andy approach the problem from the other direction. After rewriting pNFS and sessions for the latest kernel, they factor the code into small patches that they can submit for review by Linux kernel maintainers.
Components and dependencies
Switch on minor version
Provide the unified framework for minor versions in the NFSv4 client and server.
Minimal sessions, forward channel
Set up the minimal NFSv4.1 session over a forward channel, including session slot and sequence number management.
Client and server negotiate a session, place an OP_SEQUENCE as the first operation of every compound, and recover from session loss due to lease expiration.
Implement session keep-alive.
The client and server operations to be implemented for this step:
- OP_EXCHANGE_ID
- OP_CREATE_SESSION
- OP_SEQUENCE
- OP_DESTROY_SESSION
- Add OP_SEQUENCE to each compound
- State renewal
- RPC layer errors
Depends on minor version switch
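The session slot and sequence number management described above can be sketched as a small client-side slot table. This is a minimal illustration, not the Linux implementation: the class and method names are hypothetical, and the choices to start each slot at sequence ID 1 and to pick the lowest free slot are assumptions made for the example.

```python
class SlotTable:
    """Client-side session slot table (illustrative only; names are hypothetical)."""

    def __init__(self, num_slots):
        # Assumption: each slot has its own sequence ID, starting at 1.
        self.seqids = [1] * num_slots
        self.busy = [False] * num_slots

    def acquire(self):
        """Return (slot_id, seqid) for the lowest free slot, or None if all are busy."""
        for slot_id, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[slot_id] = True
                return slot_id, self.seqids[slot_id]
        return None

    def release(self, slot_id, got_reply=True):
        """Free a slot; advance its sequence ID only if the server replied."""
        if got_reply:
            self.seqids[slot_id] += 1
        self.busy[slot_id] = False
```

The slot ID and sequence ID returned by acquire() are the values that would travel in the OP_SEQUENCE placed at the head of the compound. Keeping the sequence ID unchanged when no reply arrives lets the same request be retransmitted on that slot.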
Minimal sessions, back channel
- Set up a minimal NFSv4.1 session over a back channel negotiated between the client and the server.
- Use the forward channel code for session slot and sequence number management.
- Client and server will create back channel(s) and place a CB_SEQUENCE as the first operation on all CB_COMPOUND RPCs.
Client and server operations to be implemented:
- OP_CREATE_SESSION
- OP_CB_SEQUENCE
- OP_CB_RECALL_SLOT
Depends on minor version switch and minimal sessions forward channel
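The rule that CB_SEQUENCE leads every CB_COMPOUND can be sketched as a builder and a matching check. This is a rough sketch under stated assumptions: the Op type, the function names, and the argument names are hypothetical and do not reflect the wire encoding or any kernel structure.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One operation in a compound (hypothetical representation)."""
    name: str
    args: dict = field(default_factory=dict)


def build_cb_compound(session_id, slot_id, seqid, ops):
    """Prepend OP_CB_SEQUENCE so it is the first operation of the CB_COMPOUND."""
    seq = Op("OP_CB_SEQUENCE",
             {"session_id": session_id, "slot_id": slot_id, "seqid": seqid})
    return [seq] + list(ops)


def is_valid_cb_compound(ops):
    """Accept only a compound with exactly one OP_CB_SEQUENCE, in first position."""
    names = [op.name for op in ops]
    return (bool(names)
            and names[0] == "OP_CB_SEQUENCE"
            and names.count("OP_CB_SEQUENCE") == 1)
```

A receiver that applies a check of this kind can reject a callback before touching any slot state.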
pNFS I/O READ and WRITE
The pNFS generic client supports two I/O paths that use the NFS page cache:
- an RPC based I/O path, used by the file layout module, and
- a non-RPC path, used by the block layout and object layout modules.
Implementation steps:
- Negotiate pNFS layout type common to the pNFS client and server
- Client and server perform I/O over the file layout type
- Client returns layout on unmount
Client implementation:
- Generic pNFS client and layout API
- File layout, using the layout API
Server implementation:
- pNFS interface for NFSD, implemented as an extended export operations API, for exporting a pNFS capable file system.
The server API is used by the following prototypes:
- IBM GPFS file layout server,
- Panasas object layout server, and
- Network Appliance Linux MDS file layout server.
The Network Appliance Linux MDS prototype is not released at this writing.
Client and server operations to be implemented:
- OP_EXCHANGE_ID
- pNFS-specific OP_GETATTR attributes
- OP_GETDEVICELIST
- OP_GETDEVICEINFO
- OP_LAYOUTGET
- OP_LAYOUTCOMMIT
- OP_LAYOUTRETURN
Depends on minor version switch and minimal sessions forward channel
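The layout type negotiation and the two I/O paths in this section can be sketched as follows. The function names and the string labels for layout types and paths are hypothetical; treating "no common layout type" as a fallback to ordinary I/O through the server is an assumption of this example, not something stated above.

```python
RPC_PATH = "rpc"
NON_RPC_PATH = "non-rpc"

# I/O path used by each layout module, as described in the text above.
IO_PATH = {"file": RPC_PATH, "block": NON_RPC_PATH, "object": NON_RPC_PATH}


def negotiate_layout_type(client_prefs, server_types):
    """Return the first client-preferred layout type the server also offers, else None."""
    for layout_type in client_prefs:
        if layout_type in server_types:
            return layout_type
    return None


def select_io_path(client_prefs, server_types):
    """Pick a common layout type, then the I/O path that its layout module uses."""
    layout_type = negotiate_layout_type(client_prefs, server_types)
    if layout_type is None:
        # Assumption: with no common layout type the client does not use pNFS.
        return None, None
    return layout_type, IO_PATH[layout_type]
```

For example, a client that prefers object over file layouts, talking to a server that offers only file layouts, would end up on the RPC based path.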
pNFS layout recall
- Enable the pNFS server to recall layouts using the minimal sessions back channel.
- Enable the pNFS server to inform the client that a previously denied LAYOUTGET is now available.
- When complete, the pNFS client and server will be able to set up and manage layout caches.
Client and server operations to be implemented:
- OP_CB_SEQUENCE
- OP_CB_LAYOUTRECALL
- OP_CB_RECALLABLE_OBJ_AVAIL
- OP_LAYOUTGET
- OP_LAYOUTRETURN
Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
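A client-side layout cache that reacts to a recall can be sketched as below. This is a simplified illustration under stated assumptions: the class name, the use of (offset, length) tuples, and the decision to give back whole overlapping segments rather than splitting them are all choices made for the example.

```python
class LayoutCache:
    """Per-file cache of layout segments held by the client (hypothetical)."""

    def __init__(self):
        self.segments = {}  # file handle -> list of (offset, length)

    def add(self, fh, offset, length):
        """Record a segment obtained from a successful LAYOUTGET."""
        self.segments.setdefault(fh, []).append((offset, length))

    def recall(self, fh, offset, length):
        """Drop and return every cached segment overlapping [offset, offset + length)."""
        end = offset + length
        kept, returned = [], []
        for seg_off, seg_len in self.segments.get(fh, []):
            if seg_off < end and offset < seg_off + seg_len:
                returned.append((seg_off, seg_len))
            else:
                kept.append((seg_off, seg_len))
        if kept:
            self.segments[fh] = kept
        else:
            self.segments.pop(fh, None)
        return returned
```

The list returned by recall() is what the client would then hand back to the server with OP_LAYOUTRETURN.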
Exactly once semantics
- Revisit forward channel attributes on client and server.
- Implement server replay cache
Depends on minor version switch and minimal sessions forward channel
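A per-slot server replay cache of the kind listed above can be sketched as follows. This is a minimal sketch, not the NFSD design: the class and method names are hypothetical, replies are opaque Python values, and the rule used here (execute on last + 1, replay on a repeat of the last sequence ID, reject anything else) is stated as an assumption of the example.

```python
class ReplayCache:
    """One cached reply per session slot (illustrative only)."""

    def __init__(self, num_slots):
        self.seqids = [0] * num_slots      # last sequence ID executed per slot
        self.replies = [None] * num_slots  # reply cached for that sequence ID

    def handle(self, slot_id, seqid, execute):
        """Run execute() at most once per (slot, seqid); replay the reply on a retry."""
        last = self.seqids[slot_id]
        if seqid == last + 1:
            # New request: execute it and remember the reply.
            reply = execute()
            self.seqids[slot_id] = seqid
            self.replies[slot_id] = reply
            return reply
        if seqid == last and self.replies[slot_id] is not None:
            # Retransmission: return the cached reply without re-executing.
            return self.replies[slot_id]
        return "NFS4ERR_SEQ_MISORDERED"
```

Because a retransmitted request never reaches execute() a second time, a non-idempotent operation is applied exactly once even when the reply is lost.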
pNFS reboot recovery
- Implement pNFS I/O failover to the metadata server using NFSv4.1 RPC.
- Implement grace period recovery.
Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
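The failover step can be sketched as a wrapper that retries a read through the metadata server when the data server path fails. Everything here is hypothetical: the exception type, the function name, and the two callables stand in for the real pNFS and NFSv4.1 RPC read paths.

```python
class DataServerError(Exception):
    """Raised by the hypothetical data-server read when the server is unreachable."""


def read_with_failover(read_from_data_server, read_from_mds, offset, length):
    """Try the pNFS path first; on failure, fall back to the metadata server."""
    try:
        return read_from_data_server(offset, length)
    except DataServerError:
        # Assumption: the metadata server can always serve the same byte range.
        return read_from_mds(offset, length)
```

Only the dedicated exception triggers the fallback, so unrelated errors still surface to the caller.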
Full sessions forward channel
- Implement the mandatory session forward channel features:
  - Trunking
  - OP_BIND_CONN_TO_SESSION
  - Kerberos and X509 machine credentials at mount for EXCHANGE_ID
  - SSV
  - Secure forward channel
Depends on minor version switch and minimal sessions forward channel
Full sessions back channel
- Implement the mandatory session back channel features:
  - OP_BACKCHANNEL_CTL
  - SSV (secret state verifier)
  - Secure back channel
  - OP_CB_SEQUENCE
Depends on minor version switch and minimal sessions forward and back channels
pNFS device recall
- Implement the Draft 18 pNFS device recall feature.
Depends on minor version switch and minimal sessions forward and back channels
Back channel replay cache
- Implement the NFSv4.1 server replay cache required for exactly once semantics.
Depends on minor version switch and minimal sessions forward and back channels
pNFS O_DIRECT I/O path
When O_DIRECT is specified, READ and WRITE I/O bypass the NFS page cache.
- Add pNFS I/O callouts to fs/nfs/direct.c to get a layout.
- Perform pNFS I/O.
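The "Depends on" notes in the component sections above form a dependency graph, and a valid implementation order can be derived from it with a topological sort. The sketch below uses the standard library graphlib module (Python 3.9 or later). The short component labels are shorthand chosen for this example, and the O_DIRECT I/O path is left out because no dependency is stated for it.

```python
from graphlib import TopologicalSorter

MV = "minor version switch"
FWD = "minimal sessions forward channel"
BACK = "minimal sessions back channel"
IO = "pNFS I/O"

# Each component maps to the set of components it depends on.
DEPENDS_ON = {
    MV: set(),
    FWD: {MV},
    BACK: {MV, FWD},
    IO: {MV, FWD},
    "pNFS layout recall": {MV, FWD, BACK, IO},
    "exactly once semantics": {MV, FWD},
    "pNFS reboot recovery": {MV, FWD, BACK, IO},
    "full sessions forward channel": {MV, FWD},
    "full sessions back channel": {MV, FWD, BACK},
    "pNFS device recall": {MV, FWD, BACK},
    "back channel replay cache": {MV, FWD, BACK},
}


def build_order(depends_on):
    """Return the components in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(build_order(DEPENDS_ON), start=1):
        print(step, component)
```

Several orders satisfy the graph, so the output beyond the first two components is one valid schedule among many.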
Status
The linux-pnfs-2.6-latest tree has the following functionality tested in 2.6.24-based kernels:
- Step 1: switch on minor version
- Step 2: minimal sessions forward channel
- Step 3: minimal sessions back channel
- Step 4: pNFS I/O
  - READ for file, block, and object layouts
  - WRITE for file, block, and object layouts is working, but needs patches factored for review