Server Side Copy

From Linux NFS


Latest revision as of 18:18, 9 August 2013

The server-side copy feature lets an NFS client copy a file on the server without transferring the data over the network. Without this feature, an NFS client copies data by reading it from the server and then writing it back, so every byte crosses the network twice. With the server-side copy operation, the client instead instructs the server to copy the data locally.

The main use case is virtual machine migration between servers operating over NFS. Another use is copying large files from one directory on a server to a different directory on the same server.

The application calling the copyfile() system call is in charge of opening and closing file descriptors before and after the copy call. If a file can't be opened, it can't be copied.



Data type reference

typedef uint64_t length4;
typedef uint64_t offset4;

const COPY4_GUARDED     = 0x00000001;
const COPY4_METADATA    = 0x00000002;

struct write_response4 {
        stateid4        wr_callback_id<1>;
        count4          wr_count;
        stable_how4     wr_committed;
        verifier4       wr_writeverf;
};


Argument

struct COPY4args {
        /* SAVED_FH: source file */
        /* CURRENT_FH: destination file or */
        /*             directory           */
        stateid4        ca_src_stateid;
        stateid4        ca_dst_stateid;
        offset4         ca_src_offset;
        offset4         ca_dst_offset;
        length4         ca_count;
        uint32_t        ca_flags;
        component4      ca_destination;
        netloc4         ca_source_server<>;
};


Result

union COPY4res switch (nfsstat4 cr_status) {
case NFS4_OK:
        write_response4 resok4;
default:
        length4         cr_bytes_copied;
};
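
As a rough illustration of how a client might interpret these structures, the sketch below models write_response4 and COPY4res as Python dataclasses. The field names mirror the XDR above, but the Python classes and the helper describe_copy_result() are hypothetical and exist only for this example. It assumes the convention used elsewhere on this page: an empty wr_callback_id list means the copy completed synchronously, while a stateid in the list means a callback will follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NFS4_OK = 0  # nfsstat4 value for success


@dataclass
class WriteResponse4:
    """Illustrative stand-in for write_response4."""
    wr_callback_id: List[bytes] = field(default_factory=list)  # stateid4<1>: 0 or 1 entries
    wr_count: int = 0
    wr_committed: int = 0
    wr_writeverf: bytes = b"\0" * 8


@dataclass
class Copy4Res:
    """Illustrative stand-in for the COPY4res union."""
    cr_status: int
    resok4: Optional[WriteResponse4] = None  # present when cr_status == NFS4_OK
    cr_bytes_copied: int = 0                 # present for any other status


def describe_copy_result(res: Copy4Res) -> str:
    """Hypothetical helper: classify a COPY reply."""
    if res.cr_status != NFS4_OK:
        return "failed after %d bytes" % res.cr_bytes_copied
    if res.resok4.wr_callback_id:
        return "asynchronous, wait for callback"
    return "synchronous, %d bytes copied" % res.resok4.wr_count
```

A reply with an empty callback list can be handled immediately, whereas a reply carrying a stateid has to be parked until the server reports completion.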


Copy range system call

ssize_t vfs_copy_range(struct file *file_in, loff_t pos_in,
                       struct file *file_out, loff_t pos_out,
                       size_t count);

struct file_operations {

   <snip>

   ssize_t (*copy_range)(struct file *, loff_t, struct file *, loff_t, size_t);
};

My modifications

If the filesystem doesn't support the copy_range file operation, fall back on do_splice_direct() to copy the data rather than returning an error.

Also fall back if the filesystem returns -ENOTSUPP to the VFS, so that NFS v4 and v4.1 can use the copy operation too without having to include "../internal.h" and export a VFS function to modules.
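
The fallback decision can be sketched in userspace Python. This is only a model of the control flow, not kernel code: vfs_copy_range_model() and splice_fallback() are hypothetical names, files are represented by bytes and bytearray objects, and the copy_range_op argument stands in for the filesystem's copy_range file operation (None when the filesystem does not provide one).

```python
ENOTSUPP = 524  # kernel-internal errno value; not exposed by Python's errno module


def splice_fallback(src: bytes, pos_in: int, dst: bytearray, pos_out: int, count: int) -> int:
    """Stand-in for do_splice_direct(): copy the bytes through the generic path."""
    chunk = src[pos_in:pos_in + count]
    if len(dst) < pos_out:
        # Pad with zeros so the data lands at the requested offset.
        dst.extend(b"\0" * (pos_out - len(dst)))
    dst[pos_out:pos_out + len(chunk)] = chunk
    return len(chunk)


def vfs_copy_range_model(copy_range_op, src, pos_in, dst, pos_out, count):
    """Try the filesystem's copy_range op first, then fall back to splice."""
    if copy_range_op is not None:
        ret = copy_range_op(src, pos_in, dst, pos_out, count)
        if ret != -ENOTSUPP:
            return ret  # success or a real error: pass it through unchanged
    # Either there is no copy_range op, or the filesystem declined the request.
    return splice_fallback(src, pos_in, dst, pos_out, count)
```

With this shape, a filesystem such as NFS v4 or v4.1 only has to return -ENOTSUPP from its own operation; it never needs to call the generic copy path itself.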


Synchronous copy

Synchronous copy is significantly easier to implement, so it is a good milestone for implementing the entire copy operation. Later patches can expand on the sync code to make the copy asynchronous.

Client

  • Enable the copy_range operation for v4.2; return -ENOTSUPP for v4 and v4.1.
  • Prefer a lock stateid if the file has one; otherwise use the open stateid.
  • Send the compound:
SEQUENCE
PUTFH   /* source */
SAVEFH
PUTFH   /* destination */
COPY
  • The server-to-server case can be ignored for now: it may be removed from the RFC, and the vfs_copy_range() function gives an error if the file moves to a different superblock.
  • The server will tell us the number of bytes copied; return this to the vfs_copy_range() function.
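
The client steps above can be sketched as follows. This is a simplified model rather than the actual client code: select_stateid() and build_copy_compound() are hypothetical helpers, and each operation is represented as a plain (name, arguments) tuple instead of an XDR-encoded request.

```python
from typing import List, Optional, Tuple


def select_stateid(lock_stateid: Optional[bytes], open_stateid: bytes) -> bytes:
    """Prefer the lock stateid when the file has one, else use the open stateid."""
    return lock_stateid if lock_stateid is not None else open_stateid


def build_copy_compound(src_fh: bytes, dst_fh: bytes,
                        src_stateid: bytes, dst_stateid: bytes,
                        src_offset: int, dst_offset: int,
                        count: int) -> List[Tuple[str, dict]]:
    """Return the operations of the COPY compound in the order they are sent."""
    return [
        ("SEQUENCE", {}),
        ("PUTFH", {"fh": src_fh}),   # source becomes CURRENT_FH
        ("SAVEFH", {}),              # source moves to SAVED_FH
        ("PUTFH", {"fh": dst_fh}),   # destination becomes CURRENT_FH
        ("COPY", {
            "ca_src_stateid": src_stateid,
            "ca_dst_stateid": dst_stateid,
            "ca_src_offset": src_offset,
            "ca_dst_offset": dst_offset,
            "ca_count": count,
        }),
    ]
```

The ordering matters: SAVEFH has to run between the two PUTFH operations so that COPY sees the source in SAVED_FH and the destination in CURRENT_FH.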

Server

  • OP_COPY op_flags should mimic the flags set in OP_WRITE.
  • Use nfs_preprocess_stateid_op() to find files associated with both the CURRENT_FH and the SAVED_FH.
  • Call the vfs_copy_range() function with arguments provided by the client.
  • Only copy the first 1 GB (1073741824 bytes) of the requested range to avoid holding an RPC slot for too long.
  • Call vfs_fsync_range() after the data is copied and set stable_how to NFS_FILE_SYNC.
  • Return an empty stateid list to the client to show that the copy was done synchronously.
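
A userspace approximation of the server steps above might look like this. It is a sketch only: sync_copy() is a hypothetical function, a pread/pwrite loop stands in for vfs_copy_range(), os.fsync() (which syncs the whole file) stands in for vfs_fsync_range(), and the reply is a plain dictionary using the write_response4 field names.

```python
import os

COPY_CAP = 1 << 30   # 1 GiB cap per COPY request
NFS_FILE_SYNC = 2    # stable_how value: data and metadata are on stable storage


def sync_copy(fd_in, pos_in, fd_out, pos_out, count, cap=COPY_CAP, bufsize=1 << 20):
    """Copy at most `cap` bytes, sync the destination, and build the reply."""
    count = min(count, cap)  # never hold the request for more than `cap` bytes
    copied = 0
    while copied < count:
        buf = os.pread(fd_in, min(bufsize, count - copied), pos_in + copied)
        if not buf:
            break  # reached end of the source file
        copied += os.pwrite(fd_out, buf, pos_out + copied)
    os.fsync(fd_out)  # make the data stable before replying
    return {
        "wr_callback_id": [],        # empty list: the copy was synchronous
        "wr_count": copied,
        "wr_committed": NFS_FILE_SYNC,
    }
```

Because wr_count may be smaller than the requested count, the caller is expected to issue another copy for the remaining range, which is what the test program further down this page does.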


Asynchronous copy

Asynchronous copy frees up RPC slots, since the server can do the copy in its own time and simply notify the client once it is done. To be spec compliant, the OFFLOAD_STATUS and OFFLOAD_ABORT operations also need to be implemented, but since I mostly want to prepare the client for this case, the server patch may be submitted at a later time.

Client

  • Keep a list of offloads that we know are in flight. Use a spinlock to protect list access.
  • Use a struct completion to put the thread to sleep until the callback comes in if we detect an async copy.
  • Watch for the OP_CB_OFFLOAD callback from the server.
  • Match the callback stateid to a stateid on the offload waitlist.
  • Call complete() on the completion struct.
  • Return bytes_copied from the callback data and not the COPY reply data.
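
The waitlist described above can be modelled with Python threading primitives. This is an analogy, not the kernel implementation: the OffloadWaitlist class and wait_for_copy() are hypothetical, a threading.Lock plays the role of the spinlock, and a threading.Event plays the role of the struct completion.

```python
import threading


class OffloadWaitlist:
    """Tracks copies that are in flight on the server."""

    def __init__(self):
        self._lock = threading.Lock()  # protects _pending, like the spinlock
        self._pending = {}             # stateid -> waiting entry

    def add(self, stateid):
        """Register an async copy before sleeping on it."""
        entry = {"done": threading.Event(), "bytes_copied": None}
        with self._lock:
            self._pending[stateid] = entry
        return entry

    def cb_offload(self, stateid, bytes_copied):
        """Handle the callback: match the stateid and wake the waiter."""
        with self._lock:
            entry = self._pending.pop(stateid, None)
        if entry is None:
            return False  # no copy with this stateid is waiting
        entry["bytes_copied"] = bytes_copied
        entry["done"].set()  # analogous to complete()
        return True


def wait_for_copy(entry, timeout=None):
    """Sleep until the callback arrives, then return its byte count."""
    if not entry["done"].wait(timeout):
        raise TimeoutError("no callback received")
    return entry["bytes_copied"]
```

Note that the value returned to the caller comes from the callback, not from the original COPY reply, matching the last step in the list.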

Server

  • Remove the 1GB copy cap.
  • Schedule the copy to run later using a work struct.
  • Need to allocate a new structure to pass to the delayed function so the main thread can be freed up while the copy runs.
  • Need to initialize a new stateid to represent the copy and return it to the client so it knows to expect the callback.
  • Call CB_OFFLOAD after completing the copy.
    • Free the stateid during nfsd4_cb_offload_release()
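
The server-side flow above can be sketched with a thread pool standing in for the kernel workqueue. Everything here is illustrative: AsyncCopyServer is a hypothetical class, stateids are plain integers, files are bytes and bytearray objects, and send_cb_offload is a caller-supplied function that stands in for sending CB_OFFLOAD to the client.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class AsyncCopyServer:
    def __init__(self, send_cb_offload):
        self._pool = ThreadPoolExecutor(max_workers=1)  # workqueue stand-in
        self._ids = itertools.count(1)
        self._send_cb_offload = send_cb_offload
        self.active_stateids = set()

    def copy(self, src: bytes, dst: bytearray, count: int):
        """Queue the copy and return immediately with a one-entry stateid list."""
        stateid = next(self._ids)  # new stateid representing this copy
        self.active_stateids.add(stateid)
        # Separate structure handed to the delayed function, so the
        # caller does not have to wait for the copy itself.
        work = {"stateid": stateid, "src": src, "dst": dst, "count": count}
        self._pool.submit(self._run, work)
        return [stateid]

    def _run(self, work):
        data = work["src"][:work["count"]]  # no 1 GB cap in the async case
        work["dst"][:len(data)] = data
        self._send_cb_offload(work["stateid"], len(data))
        # Free the stateid once the callback is done, as the release hook would.
        self.active_stateids.discard(work["stateid"])

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The non-empty stateid list returned by copy() is what tells the client to expect a callback instead of treating the reply as a finished synchronous copy.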


Userspace test program

My test program is similar to cp. Run it as `nfscopy.py file1 file2` to make a copy of file1 named file2. If an entire file cannot be copied in one call then copy will be called again with the range set to the remaining data.


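#!/usr/bin/python

import ctypes
import os
import sys

# use_errno lets us read errno after a failed syscall.
libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.syscall.restype = ctypes.c_long	# the syscall returns ssize_t

# Syscall number for copy_range; must match the kernel being tested.
SYS_COPY_RANGE = 314

def copyfile(f_in, f_out):
	"""Copy all of f_in to f_out, calling copy_range until no data remains."""
	size = os.fstat(f_in.fileno()).st_size
	copied = 0

	while size != 0:
		# Offsets are passed as pointers to 64-bit loff_t values. Use a
		# separate variable per file in case the kernel updates them.
		pos_in = ctypes.c_longlong(copied)
		pos_out = ctypes.c_longlong(copied)
		print("Offset: %s size: %s" % (copied, size))

		ret = libc.syscall(ctypes.c_long(SYS_COPY_RANGE),
				   ctypes.c_int(f_in.fileno()),  ctypes.byref(pos_in),
				   ctypes.c_int(f_out.fileno()), ctypes.byref(pos_out),
				   ctypes.c_size_t(size)) # count
		print("SYS_COPY_RANGE returns: %s" % ret)
		if ret < 0:
			print("ERROR: " + os.strerror(ctypes.get_errno()))
			return ret
		if ret == 0:
			# No progress was made; stop instead of looping forever.
			return ret
		copied = copied + ret
		size = size - ret
		print("Total: %s" % copied)
	return copied


if len(sys.argv) != 3:
	print("Usage: " + sys.argv[0] + " src dst")
	sys.exit(1)

if not os.path.exists(sys.argv[1]):
	print("ERROR: " + sys.argv[1] + " does not exist :(")
	sys.exit(1)

# Open in binary mode; the copy operates on raw bytes.
with open(sys.argv[1], "rb") as f_in, open(sys.argv[2], "wb") as f_out:
	if copyfile(f_in, f_out) < 0:
		sys.exit(1)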