Configuring pNFS/spnfsd
From Linux NFS
Note: spnfs was dropped from git://linux-nfs.org/~bhalevy/linux-pnfs.git pnfs-all-3.2
Contents[hide] |
What is pNFS ?
pNFS is a new NFS feature provided in NFSv4.1, also known as Parallel NFS. Parallel NFS (pNFS) extends Network File Sharing version 4 (NFSv4) to allow clients to directly access file data on the storage used by the NFSv4 server. This ability to bypass the server for data access can increase both performance and parallelism, but requires additional client functionality for data access, some of which is dependent on the class of storage used.
Parallel NFS comes with various ways of accessing the data directly. For the moment, three such "layouts" have been provided.
- the LAYOUT4_FILE that stripes accross multiple NFS Server
- the LAYOUT4_BLOCK_VOLUME that allow the client to access data as stored in a block device
- LAYOUT4_OSD2_OBJECTS that is based on the OSD2 protocol.
NFSv4.1 and pNFS are described by the following RFCs:
- RFC5661 : Network File System (NFS) Version 4 Minor Version 1 Protocol
- RFC5662 : Network File System (NFS) Version 4 Minor Version 1, External Data Representation Standard (XDR) Description
- RFC5663 : Parallel NFS (pNFS) Block/Volume Layout
- RFC5664 : Object-Based Parallel NFS (pNFS) Operations
What is spNFS ?
spNFS is a simple pNFS LAYOUT4_FILE server implementation that uses independent nfs servers as data servers, and puts most of the MDS logic in a userspace daemon. As of early 2011, it is mostly unmaintained, and we no longer recommend its use; the following directions are still available if you want to try spNFS anyway, but you may be better off with a different server implementation (see http://wiki.linux-nfs.org/wiki/index.php/PNFS_server_projects).
Content of this document
This document describes how 3 machines were set up to build a basic pNFS/LAYOUT4_FILE test configuration using spNFS on the server side.
(WARNING: as of February 2011, the spNFS code is mostly unmaintained; we no longer recommend it.)
The machines I used are:
- nfsmds, IP addr = XX.YY.ZZ.A, used as Metadata Server
- nfsds, IP addr = XX.YY.ZZ.B, used a Data Server
- nfsclient, IP addr = XX.YY.ZZ.C, used as client
Where is the source code?
The first things to be done are recompiling a kernel and a nfs-utils distribution that are compatible. I used those from Benny Halevy's git repository:
# Get kernel repository git clone git://git.linux-nfs.org/projects/bhalevy/linux-pnfs.git # Get nfs-utils repository git://linux-nfs.org/~bhalevy/pnfs-nfs-utils.git
For this document, I used the repositories version with the following status:
- pnfs-nfs-utils: commit id = 2b5373db8615a52c47dbcf3ab968fad7cdcc6fed (pnfs-nfs-utils-1-2-2)
- kernel linux-pnfs: commit id = cbd09e0fb2b160a06a44aad1c21786b99401823f (pnfs-all-2.6.33-2010-03-09)
Let's go configuring now...
Building the pnfs Kernel
The kernel compilation goes ok. Just make sure that you have the right options configured in .config
CONFIG_NETWORK_FILESYSTEMS=y CONFIG_NFS_FS=m CONFIG_NFS_V4=y CONFIG_NFS_V4_1=y CONFIG_PNFS=y CONFIG_NFSD=m CONFIG_PNFSD=y # CONFIG_PNFSD_LOCAL_EXPORT is not set CONFIG_SPNFS=y CONFIG_SPNFS_LAYOUTSEGMENTS=y
With kernel 2.6.34 or higher, add (should be the same as CONFIG_NFS_FS)
CONFIG_PNFS_FILE_LAYOUT=m
Building nfs-utils
Compiling pnfs-nfs-utils will be done as this
# autoreconf --instal # ./configure --prefix=/usr && make && make install
but you have to make sure that you have the following products installed (all nodes were installed with a Fedora 12):
- libtirpc + libtirpc-dev
- tcp_wrappers + tcp_wrapper-libs + tcp_wrappers-devel
- libblkid + libblkid-devel
- libevent + libevent-devel
- libnfsidmap
- device-mapper-devel (starting Fedora 15)
You'll find all of them as rpm packages, but the libnfsidmap. For this one, you'll have to get the lastest version, compile and install it (do not forget to specify "./configure --prefix=/usr"). You can get it from nfs-utils-lib-devel-1.1.4-8 or higher as well.
Basically, as command like the following one should do all the required work (example for Fedora 15):
# yum install libtirpc{,-devel} tcp_wrappers{,-devel} libevent{,-devel} libnfsidmap{,-devel} openldap-devel \ libgssglue{,-devel} krb5-devel libblkid{,-devel} device-mapper-devel libcap{,-devel}
Configuring the test bed to used pNFS over LAYOUT4_FILES
In this configuration, the client (nfsclient) will mount the MDS (nfsmds). The client has inserted a specific kernel module, known as the layout driver to connect to the DS. All of the metadata traffic will go through the MDS, but data traffic will be done in-between the DS and the client.
The MDS should be able to mount the DS and have root access on it. It runs a user space daemon, the spnfsd (which is part of nfs-utils) that uses this mount point to get information from the DS.
Configuring the spNFS Data Server
The Data Server is just a regular NFSv4.1 server. It is important that the Metadata Data Server had root access on it, to prevent from weird behaviour due to EPERM errors.
The Data Server's /etc/exports will look like this on nfsds:
/export/spnfs *(rw,sync,fsid=0,insecure,no_subtree_check,pnfs,no_root_squash)
Configuring the spNFS Metadata Server
The MDS is a client to the DS, and runs the spnfsd. It is as well a NFSv4.1 server with pNFS enabled.
The spnfsd configuration is done in two steps:
- configuring the MDS as client to the DS
- Writing the /etc/spnfsd.conf file
On the MDS, the /etc/fstab should contain this line:
nfsds:/ /spnfs/XX.YY.ZZ.B nfs4 minorversion=1 0 0
It is mandatory to have the mount point done over NFSv4 and with minorversion set to 1.
Its /etc/spnfsd will look like this (this is a single DS configuration)
[General] Verbosity = 1 Stripe-size = 8192 Dense-striping = 0 Pipefs-Directory = /var/lib/nfs/rpc_pipefs DS-Mount-Directory = /spnfs [DataServers] NumDS = 1 DS1_IP = XX.YY.ZZ.B DS1_PORT = 2049 DS1_ROOT = / DS1_ID = 1
Finally the /etc/exports will be like this
/export *(rw,sync,pnfs,fsid=0,insecure,no_subtree_check,no_root_squash)
Notice the pnfs token within the export's options
Configuring the client
The client is to be used as a regular NFSv4.1 client. The only thing to do is making sure that layout driver kernel module is loaded
# modprobe nfs_layout_nfsv41_files
(previously called nfslayoutdriver in pre-2.6.26 kernels)
Then you can mount the MDS on the client:
# mount -t nfs4 -o minorversion=1 nfsmds:/ /mnt
warning: Before making any read/write operations, make sure that the NFSv4 grace delay is passed. Usually it take 90s after the nfs service starts..
Basic test
The first test is pretty simple: On the client, I write 50 bytes to a file:
# echo "jljlkjljjhkjhkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhk" > ./myfile # ls -i ./myfile 330246 myfile
On the DS, I should see a new file whose name contains the fileid of myfile and located in the root of what it exports to the MDS.
# ls -l /export/spnfs/330246* -rwxrwxrwx 1 root root 50 Mar 24 10:49 /export/spnfs/330246.2343187478 # cat /export/spnfs/330246.2343187478 jljlkjljjhkjhkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhk
As you can see, this file, located on the DS contains the data written by the client.
On the MDS, the file has the right size, but no blocks allocated if watched outside NFS. It contains no data.
# cd /export # stat myfile File: `myfile' Size: 50 Blocks: 0 IO Block: 4096 regular file Device: fd00h/64768d Inode: 330246 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2010-03-24 12:56:02.331151053 +0100 Modify: 2010-03-24 10:49:08.997150735 +0100 Change: 2010-03-24 10:49:08.997150735 +0100 # cat myfile (no output, the file is empty)
-- Philippe Deniel 2010-04-07