Cluster client migration prototype

From Linux NFS

Revision as of 20:08, 10 January 2008 by Richterd (Talk | contribs)
Jump to: navigation, search

As part of CITI's work with IBM, we looked at some of the issues involved with NFSv4 client migration and developed an initial prototype. Our setup involved a cluster of equivalent NFS servers attached to a GFS2 disk array, with each server exporting the same directory from the GFS2 filesystem. The intent was to provide an interface by which an administrator could selectively migrate NFSv4 clients from one server to another (e.g., to take a server down for maintenance).

Contents

Prototype overview

The prototype is a proof-of-concept: the "right way" to migrate a client would be to transfer all of the client-related state from one server to another and then have the client reorient to the new server and continue without interruption; instead, this prototype leverages parts of the existing reboot-recovery process. To briefly explain reboot-recovery, when a Linux NFSv4 server starts, it enters a ~90sec phase called a grace period; during this time, eligible clients may contact the server and reclaim state for open files and locks they were holding prior to a server crash/reboot. In order to allow clients to reclaim state without conflicts, new opens, etc, are disallowed during the grace period.

Migration overview

During a migration, the cluster is put into an artificial grace period and the target-server is notified that a new client is eligible to perform reclaims. When the client contacts the source-server, it receives an error message saying that the file system has moved and sees that it should migrate to the target-server. The client establishes a connection to the target-server and reclaims its state almost identically to how it would after a server reboot. Shortly thereafter, the grace period expires, the client is purged from the source-server, and then it's business as usual.

Statetransfer daemon

To go into a bit more detail, the migration prototype is based off of a redesigned approach to reboot-recovery that Andy Adamson developed, wherein a new userspace daemon (so far named rpc.stransd) takes over some responsibilities previously handled within the kernel. For the most part, the daemon is responsible for keeping track of the clientids of legitimate NFS clients who have established state on the server; the daemon records these clientids in stable storage.

For migration, the administrator runs a client program (so far called rpc.strans) that contacts the source-server's rpc.stransd and sends the IP address of the client to migrate. rpc.stransd looks up all clientids associated with that IP address and sends them to the target-server's rpc.stransd, which saves them in stable storage and notifies its (the target-server's) knfsd that the clientids are eligible for reclaim. Then, when the client has received the error message that the file system has moved, it sends an FS_LOCATIONS request to the source-server in order to find out where next it should go and receives a reply containing the target-server's IP address. Since it is migrating, the client reuses its existing clientid (already in the target-server's eligible-to-reclaim list) when it contacts the target-server instead of creating a new one, and thereafter proceeds to reclaim its state.

Going forward

The mechanism by which one rpc.stransd transfers clientids to another will be expanded so that all client open/lock/delegation state held on the source-server can be directly sent to the target-server and loaded into memory. By directly transferring the state instead of relying on reclaims, the invasive and problematic cluster-wide grace period can be avoided entirely.

Limitations

The existing prototype is limited in many ways: for ease of integration, only the creation of a symlink completes a migration event on the client; there is no security associated with the triggering of a migration; the GFS2 code in the kernel version used in the prototype is fragile; the list goes on. Nevertheless, we have migrated clients at CITI that are able to -- to the extent that the maturity of that version of the GFS2 code permits -- continue functioning normally after a migration.


Prototype code

As a compromise between the setup of the original prototype and the relative stability of GFS2, the current code based off of the 2.6.19.7 Linux kernel. Until a proper git repository is online, here is a

A network trace of the client 141.211.133.86 migrating from server 141.211.133.212 to 141.211.133.213 is available from CITI's website. After filtering for only NFS traffic, packets 104/106 show a file initially being opened; then the migration was triggered; then, packets 128/130 show the client trying to make a symlink and getting a "moved" error; packets 140/142 show the client making contact with the target server; packets 156/158 show the client reclaiming state for the file it had open; and finally, packets 239/241 show subsequent "normal" operation as another file is read after the artificial grace period expired.

Personal tools