Linux pnfs client rewrite may 2006
From Linux NFS
(2 intermediate revisions not shown) |
Latest revision as of 23:40, 9 July 2007
pNFS Client Rewrite May 2006
NOTE: This patch set has been applied (05-09-2006) to the pnfs-2-6-16 CVS tree. The patches are left on line for their comments.
The current pNFS 2.6.16 CVS kernel client code combines the pNFS processing in the NFSv4 code path resulting in many #ifdefs and pnfs specific switching. The main purpose of this rewrite is to separate the pNFS code path from the NFSv2/v3/v4 code path. Following the method used to separate the NFSv2/v3/v4 code paths from each other, I created a new rpc_ops for pNFS, and moved all pNFS processing into the appropriate routines. New rpc_ops were created where necessary. Existing nfs functions were split when necessary.
The rpc_ops are set at mount. The NFSv4 client uses the nfs_v4_clientops rpc_ops as usual. If a server supports pNFS and a layout driver has been negotiated and initalized, the nfs_v4_clientops are replaced with the new pnfs_v4_clientops (see set_pnfs_layoutderiver()). The pnfs_v4_clientops also contain a reference to the new pnfs_file_operations, and which are now set via the rpc_ops.
Moving pNFS processing into their own rpc_ops allows for errors to be returned in rpc_ops calling routines that are ignored by the normal NFS code path, but required by pNFS. In the RPC based NFS read path, for example, the only error is -ENOMEM from allocating pages. All other errors are detected in the RPC path. pNFS has other possible errors, such as LAYOUTGET failing, or non-RPC based I/0 failing.
Four new rpc_ops allow pNFS to switch between using the normal server (server->rsize,rpages,wsize,wpages) read/write sizes and the data server read/write sizes (server->ds_rsize,ds_rpages,ds_wsize,ds_wpages) without if statements in the normal NFS code path. Note that there is still a chicken-and-egg problem due to this decision being made proir to the request size being calculated.
rsize(struct inode *, struct nfs_read_data *) wsize(struct inode *, struct nfs_write_data *) rpages(struct inode *, unsigned int *) wpages(struct inode *, unsigned int *)
Two new rpc_ops allow isolation of pNFS processing from normal NFS processing in the pageing setup for read and write.
pagein_one(struct list_head *, struct inode *) flush_one(struct inode *, struct list_head *, int, int)
Finally, non-RPC based I/0 drivers can use the page setup routines. At the conclusion of I/O, the pages need to be returned. This code exists in the RPC callback routines, which were called in the old pNFS client code. This resulted in many if(pnfs_XXX) switches around portions of the callback code that are RPC related. I split the callbacks into functions, and created new pNFS_xxx_norpc() callbacks that call only the portions of the RPC callbacks that apply.
The above process has been applied to the read/write/commit code paths. The directIO code path has yet to be examined.