Server Side Copy

From Linux NFS

Amschuma (Talk | contribs)

Revision as of 17:50, 9 August 2013

The server-side copy feature lets an NFS client ask the server to copy a file locally, so that the file data never crosses the network. Without this feature, an NFS client copies data from one location to another by reading it from the server over the network and then writing it back over the network to the server. With the server-side copy operation, the client instead instructs the server to perform the copy itself, and only the request and its result travel over the network.

The main use case is virtual machine migration between servers operating over NFS. Another is copying large files from one directory on a server to a different directory on the same server.

The application calling the copyfile() system call is in charge of opening and closing file descriptors before and after the copy call. If a file can't be opened, it can't be copied.


1. Data type reference

  typedef uint64_t length4;
  typedef uint64_t offset4;
  const COPY4_GUARDED     = 0x00000001;
  const COPY4_METADATA    = 0x00000002;
  struct write_response4 {
          stateid4        wr_callback_id<1>;
          count4          wr_count;
          stable_how4     wr_committed;
          verifier4       wr_writeverf;
  };
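As a rough illustration of how write_response4 looks on the wire, the sketch below packs and unpacks it with Python's struct module. The layouts of stateid4 (a 32-bit seqid plus 12 opaque bytes), count4 (32 bits), stable_how4 (a 32-bit enum) and verifier4 (8 opaque bytes) are assumptions taken from the NFSv4.1 base types rather than from this page, and the class name WriteResponse4 is hypothetical.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed XDR layouts (NFSv4.1 base types, not defined on this page):
#   stateid4    = uint32 seqid + opaque other[12]
#   count4      = uint32
#   stable_how4 = enum, encoded as uint32
#   verifier4   = opaque[8]
STATEID_FMT = ">I12s"
UNSTABLE4, DATA_SYNC4, FILE_SYNC4 = 0, 1, 2


@dataclass
class WriteResponse4:
    """Hypothetical Python mirror of struct write_response4."""
    wr_callback_id: List[Tuple[int, bytes]] = field(default_factory=list)
    wr_count: int = 0
    wr_committed: int = FILE_SYNC4
    wr_writeverf: bytes = b"\0" * 8

    def pack(self) -> bytes:
        # wr_callback_id<1> is a variable-length array of at most one entry.
        if len(self.wr_callback_id) > 1:
            raise ValueError("wr_callback_id<1> holds at most one stateid")
        out = struct.pack(">I", len(self.wr_callback_id))
        for seqid, other in self.wr_callback_id:
            out += struct.pack(STATEID_FMT, seqid, other)
        out += struct.pack(">II8s", self.wr_count, self.wr_committed,
                           self.wr_writeverf)
        return out

    @classmethod
    def unpack(cls, data: bytes) -> "WriteResponse4":
        (n,) = struct.unpack_from(">I", data, 0)
        if n > 1:
            raise ValueError("wr_callback_id<1> holds at most one stateid")
        offset = 4
        ids = []
        for _ in range(n):
            ids.append(struct.unpack_from(STATEID_FMT, data, offset))
            offset += struct.calcsize(STATEID_FMT)
        count, committed, verf = struct.unpack_from(">II8s", data, offset)
        return cls(ids, count, committed, verf)
```

An empty wr_callback_id list encodes as a zero element count followed directly by wr_count, which is the form the synchronous reply in section 5.2 relies on.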


2. Argument

  struct COPY4args {
          /* SAVED_FH: source file */
          /* CURRENT_FH: destination file or */
          /*             directory           */
          stateid4        ca_src_stateid;
          stateid4        ca_dst_stateid;
          offset4         ca_src_offset;
          offset4         ca_dst_offset;
          length4         ca_count;
          uint32_t        ca_flags;
          component4      ca_destination;
          netloc4         ca_source_server<>;
  };
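To make the argument fields easier to reason about, here is a minimal Python mirror of COPY4args. The class name Copy4Args, the use of plain bytes and str for stateid4, component4 and netloc4, and the decision to reject unknown flag bits are all assumptions of this sketch, not something the protocol text above states.

```python
from dataclasses import dataclass, field
from typing import List

COPY4_GUARDED = 0x00000001
COPY4_METADATA = 0x00000002

UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


@dataclass
class Copy4Args:
    """Hypothetical stand-in for struct COPY4args (types simplified)."""
    ca_src_stateid: bytes
    ca_dst_stateid: bytes
    ca_src_offset: int = 0          # offset4 is an unsigned 64-bit value
    ca_dst_offset: int = 0
    ca_count: int = 0               # length4 is an unsigned 64-bit value
    ca_flags: int = 0               # uint32_t bit mask of COPY4_* flags
    ca_destination: str = ""
    ca_source_server: List[str] = field(default_factory=list)

    def __post_init__(self):
        for name in ("ca_src_offset", "ca_dst_offset", "ca_count"):
            if not 0 <= getattr(self, name) <= UINT64_MAX:
                raise ValueError(name + " does not fit in 64 bits")
        if not 0 <= self.ca_flags <= UINT32_MAX:
            raise ValueError("ca_flags does not fit in 32 bits")
        # Choice made by this sketch only: refuse bits with no known meaning.
        if self.ca_flags & ~(COPY4_GUARDED | COPY4_METADATA):
            raise ValueError("unknown bits set in ca_flags")

    @property
    def guarded(self) -> bool:
        return bool(self.ca_flags & COPY4_GUARDED)

    @property
    def copy_metadata(self) -> bool:
        return bool(self.ca_flags & COPY4_METADATA)
```

For example, Copy4Args(src_id, dst_id, ca_count=4096, ca_flags=COPY4_GUARDED) describes a guarded copy of the first 4096 bytes.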


3. Result

  union COPY4res switch (nfsstat4 cr_status) {
  case NFS4_OK:
          write_response4 resok4;
  default:
          length4         cr_bytes_copied;
  };
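The union can be read as three cases once the synchronous and asynchronous behaviour from sections 5 and 6 is taken into account. The sketch below is one possible client-side interpretation; CopyOutcome and interpret_copy4res are hypothetical names, and resok4 is represented as a plain dict for brevity.

```python
from dataclasses import dataclass
from typing import Optional

NFS4_OK = 0


@dataclass
class CopyOutcome:
    kind: str                       # "sync", "async" or "failed"
    bytes_copied: Optional[int]     # None while an async copy is pending
    callback_id: Optional[bytes]    # stateid to expect in the callback


def interpret_copy4res(cr_status, resok4=None, cr_bytes_copied=0):
    """Map a decoded COPY4res onto what the client should do next."""
    if cr_status != NFS4_OK:
        # Default arm: only the number of bytes copied before the failure.
        return CopyOutcome("failed", cr_bytes_copied, None)
    if resok4["wr_callback_id"]:
        # A stateid was returned, so the final count arrives in a callback.
        return CopyOutcome("async", None, resok4["wr_callback_id"][0])
    # Empty stateid list: the copy already finished on the server.
    return CopyOutcome("sync", resok4["wr_count"], None)
```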


4. Copy range system call

  ssize_t vfs_copy_range(struct file *file_in, loff_t pos_in,
                         struct file *file_out, loff_t pos_out,
                         size_t count);
  struct file_operations {
     <snip>
     ssize_t (*copy_range)(struct file *, loff_t, struct file *, loff_t, size_t);
  };

4.1. My modifications

  Rather than returning an error when the filesystem doesn't support the
  copy_range file operation, fall back on do_splice_direct() to copy the
  data.  Do the same fallback when the filesystem returns -ENOTSUPP to the
  VFS, so that NFS v4 and v4.1 can use the copy operation too.
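The fallback logic can be mimicked in user space. The sketch below is only an analogy: offload_copy stands in for a filesystem's copy_range operation and uses os.copy_file_range (Python 3.8+, the interface that mainline Linux later exposed) when it is available, while generic_copy stands in for do_splice_direct() by moving the data through a buffer. All function names and the set of errno values treated as "not supported" are assumptions of this sketch. Note that ENOTSUPP is a kernel-internal code; user space normally sees EOPNOTSUPP or ENOSYS instead.

```python
import errno
import os

# Assumption: errors that mean "no offload here, try the generic path".
FALLBACK_ERRNOS = {errno.ENOSYS, errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL}


def offload_copy(fd_in, pos_in, fd_out, pos_out, count):
    """Stand-in for the filesystem's copy_range file operation."""
    copy_file_range = getattr(os, "copy_file_range", None)
    if copy_file_range is None:
        raise OSError(errno.EOPNOTSUPP, "no copy offload available")
    return copy_file_range(fd_in, fd_out, count, pos_in, pos_out)


def generic_copy(fd_in, pos_in, fd_out, pos_out, count, chunk=1 << 20):
    """Stand-in for do_splice_direct(): copy through a local buffer."""
    done = 0
    while done < count:
        buf = os.pread(fd_in, min(chunk, count - done), pos_in + done)
        if not buf:
            break                   # hit end of file
        written = 0
        while written < len(buf):   # pwrite may write less than asked
            written += os.pwrite(fd_out, buf[written:],
                                 pos_out + done + written)
        done += len(buf)
    return done


def copy_range(fd_in, pos_in, fd_out, pos_out, count, offload=offload_copy):
    """Try the offload first and fall back when it is not supported."""
    try:
        return offload(fd_in, pos_in, fd_out, pos_out, count)
    except OSError as err:
        if err.errno not in FALLBACK_ERRNOS:
            raise                   # real errors are still reported
        return generic_copy(fd_in, pos_in, fd_out, pos_out, count)
```

Like the system call, copy_range() here may return fewer bytes than requested, so callers are expected to loop.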


5. Synchronous copy

  Synchronous copy is significantly easier to implement, so it is a good
  milestone for implementing the entire copy operation.  Later patches can
  expand on the sync code to make the copy asynchronous.

5.1. Client

  - Enable the copy_range operation for v4.2; return -ENOTSUPP for v4 and v4.1.
  - Prefer a lock stateid if the file has one, otherwise use the open stateid.
  - Send the compound:
        SEQUENCE
        PUTFH   /* source */
        SAVEFH
        PUTFH   /* destination */
        COPY
  - The server-to-server case can be ignored for now: it may be removed
    from the RFC, and the vfs_copy_range() function gives an error if the
    file moves to a different superblock.
  - The server reports the number of bytes copied; return this value to
    the vfs_copy_range() function.
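The client steps above can be sketched as plain data manipulation. Nothing below talks to a server: operations are modelled as (name, arguments) tuples, and select_stateid, build_copy_compound and nfs_copy_range are hypothetical names. The value 524 for ENOTSUPP is the kernel-internal constant and does not exist in Python's errno module, so it is defined by hand.

```python
ENOTSUPP = 524   # kernel-internal errno, not exported to user space


def select_stateid(lock_stateid, open_stateid):
    """Prefer the lock stateid when the file has one."""
    return lock_stateid if lock_stateid is not None else open_stateid


def build_copy_compound(src_fh, dst_fh, src_stateid, dst_stateid,
                        src_offset, dst_offset, count):
    """Return the operations of the COPY compound in wire order."""
    return [
        ("SEQUENCE", {}),
        ("PUTFH", {"fh": src_fh}),      # source becomes CURRENT_FH
        ("SAVEFH", {}),                 # ... and is moved to SAVED_FH
        ("PUTFH", {"fh": dst_fh}),      # destination becomes CURRENT_FH
        ("COPY", {
            "ca_src_stateid": src_stateid,
            "ca_dst_stateid": dst_stateid,
            "ca_src_offset": src_offset,
            "ca_dst_offset": dst_offset,
            "ca_count": count,
        }),
    ]


def nfs_copy_range(minorversion, src, dst, src_offset, dst_offset, count):
    """src and dst are dicts with 'fh', 'lock_stateid' and 'open_stateid'."""
    if minorversion < 2:
        return -ENOTSUPP                # lets the VFS fall back to splice
    return build_copy_compound(
        src["fh"], dst["fh"],
        select_stateid(src["lock_stateid"], src["open_stateid"]),
        select_stateid(dst["lock_stateid"], dst["open_stateid"]),
        src_offset, dst_offset, count)
```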

5.2. Server

  - OP_COPY op_flags should mimic the flags set in OP_WRITE.
  - Use nfs4_preprocess_stateid_op() to find files associated with both
    the CURRENT_FH and the SAVED_FH.
  - Call the vfs_copy_range() function with arguments provided by the client.
  - Only copy the first 1 GB (1073741824 bytes) of the requested range to
    avoid holding an RPC slot for too long.
  - Call vfs_fsync_range() after the data is copied and set stable_how to
    NFS_FILE_SYNC.
  - Return an empty stateid list to the client to show that the copy was
    done synchronously.
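A user-space model of the synchronous server path might look like the sketch below. It is an illustration only: a pread/pwrite loop stands in for vfs_copy_range(), os.fsync() stands in for vfs_fsync_range(), the reply is a plain dict, and the function name server_sync_copy and its cap and chunk parameters are hypothetical.

```python
import os

COPY_CAP = 1 << 30      # 1 GB, so one request cannot hold a slot too long
FILE_SYNC4 = 2          # assumed numeric value of the FILE_SYNC stable_how


def server_sync_copy(fd_src, src_offset, fd_dst, dst_offset, count,
                     cap=COPY_CAP, chunk=1 << 20):
    """Copy at most `cap` bytes, flush them, and build the reply."""
    count = min(count, cap)
    copied = 0
    while copied < count:
        buf = os.pread(fd_src, min(chunk, count - copied),
                       src_offset + copied)
        if not buf:
            break                       # source ended early
        written = 0
        while written < len(buf):       # pwrite may be short
            written += os.pwrite(fd_dst, buf[written:],
                                 dst_offset + copied + written)
        copied += len(buf)
    os.fsync(fd_dst)                    # data is stable before replying
    return {
        "wr_callback_id": [],           # empty list: copy was synchronous
        "wr_count": copied,
        "wr_committed": FILE_SYNC4,
    }
```

Because of the cap, a client that asks for more than 1 GB sees a short wr_count and has to issue another COPY for the remainder.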


6. Asynchronous copy

  Asynchronous copy will free up RPC slots, since the server can do the copy
  on its own time and simply notify the client once it's done.  To be spec
  compliant, the OFFLOAD_STATUS and OFFLOAD_ABORT operations also need to be
  implemented, but since I mostly want to prepare the client for this case
  the server patch may be submitted at a later time.

6.1. Client

  - Keep a list of offloads that we know are in flight.  Use a spinlock to
    protect list access.
  - Use a struct completion to put the thread to sleep until the callback
    comes in if we detect an async copy.
  - Watch for the OP_CB_OFFLOAD callback from the server.
  - Match the callback stateid to a stateid on the offload waitlist.
  - Call complete() on the completion struct.
  - Return bytes_copied from the callback data and not the COPY reply data.
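The waitlist described above maps naturally onto Python's threading primitives. In this sketch a threading.Lock plays the role of the spinlock and a threading.Event plays the role of struct completion; the class OffloadWaitlist and its method names are hypothetical. The entry is registered before waiting so that an early callback is not lost, which is an ordering a real client would also need to guarantee.

```python
import threading


class OffloadWaitlist:
    """Tracks in-flight offloads, keyed by the stateid from the COPY reply."""

    def __init__(self):
        self._lock = threading.Lock()   # protects the list, like the spinlock
        self._pending = {}

    def add(self, stateid):
        """Register an offload; call this before the callback can arrive."""
        entry = {"done": threading.Event(), "status": None,
                 "bytes_copied": None}
        with self._lock:
            self._pending[stateid] = entry
        return entry

    def cb_offload(self, stateid, status, bytes_copied):
        """Handle a CB_OFFLOAD callback; returns False for unknown stateids."""
        with self._lock:
            entry = self._pending.pop(stateid, None)
        if entry is None:
            return False
        entry["status"] = status
        entry["bytes_copied"] = bytes_copied
        entry["done"].set()             # equivalent of complete()
        return True

    def wait(self, entry, timeout=None):
        """Sleep until the callback arrives, then return its byte count."""
        if not entry["done"].wait(timeout):
            raise TimeoutError("no CB_OFFLOAD received")
        # Use the callback data, not the wr_count from the COPY reply.
        return entry["bytes_copied"]
```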

6.2. Server

  - Remove the 1GB copy cap.
  - Schedule the copy to run later using a work struct.
  - Need to allocate a new structure to pass to the delayed function, since
    it must outlive the thread that handled the COPY request.
  - Need to initialize a new stateid to represent the copy and return it to
    the client so it knows to expect the callback.
  - Call CB_OFFLOAD after completing the copy.
    - Free the stateid during nfsd4_cb_offload_release()
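The server side of the asynchronous case can be modelled with a single-worker thread pool standing in for the work struct. This is a sketch under several assumptions: AsyncCopyServer, its methods, the b"copy-N" stateid format and the dict-based reply are all hypothetical, 5 is used as a generic I/O error status, and the caller must keep both file descriptors open until the callback has fired.

```python
import itertools
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncCopyServer:
    def __init__(self, send_cb_offload):
        self._pool = ThreadPoolExecutor(max_workers=1)  # "work struct" queue
        self._send_cb_offload = send_cb_offload
        self._lock = threading.Lock()
        self._stateids = set()
        self._ids = itertools.count(1)

    def copy(self, fd_src, src_offset, fd_dst, dst_offset, count):
        """Queue the copy and reply at once with a stateid for the callback."""
        stateid = b"copy-%d" % next(self._ids)
        with self._lock:
            self._stateids.add(stateid)
        # The job owns its own copy of the arguments, so it stays valid
        # after the request handler has returned.
        job = (stateid, fd_src, src_offset, fd_dst, dst_offset, count)
        self._pool.submit(self._run, job)
        return {"wr_callback_id": [stateid], "wr_count": 0}

    def _run(self, job):
        stateid, fd_src, src_offset, fd_dst, dst_offset, count = job
        copied, status = 0, 0
        try:
            while copied < count:       # no 1 GB cap in the async case
                buf = os.pread(fd_src, min(1 << 20, count - copied),
                               src_offset + copied)
                if not buf:
                    break
                written = 0
                while written < len(buf):
                    written += os.pwrite(fd_dst, buf[written:],
                                         dst_offset + copied + written)
                copied += len(buf)
            os.fsync(fd_dst)
        except OSError:
            status = 5                  # assumed generic I/O error status
        try:
            self._send_cb_offload(stateid, status, copied)
        finally:
            with self._lock:            # release step: free the stateid
                self._stateids.discard(stateid)

    def active_stateids(self):
        with self._lock:
            return set(self._stateids)

    def close(self):
        self._pool.shutdown(wait=True)
```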


7. Userspace test program

  My test program is similar to cp.  Run it as `nfscopy.py file1 file2` to
  make a copy of file1 named file2.  If the entire file cannot be copied in
  one call, the copy is called again with the range set to the remaining
  data.

7.1. Code

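  #!/usr/bin/python
  import sys
  import os
  from ctypes import (CDLL, byref, c_int, c_long, c_longlong, c_size_t,
                      get_errno)

  libc = CDLL("libc.so.6", use_errno=True)
  libc.syscall.restype = c_long   # the call returns ssize_t, not int

  # Syscall number used by the kernel under test.
  SYS_COPY_RANGE = 314

  def copyfile(f_in, f_out):
      """Copy all of f_in to f_out.  Returns bytes copied, or < 0 on error."""
      size = os.fstat(f_in.fileno()).st_size
      copied = 0
      while size != 0:
          # The offsets are passed by pointer and must be 64 bits wide.
          # Each file gets its own variable, reset on every iteration.
          pos_in = c_longlong(copied)
          pos_out = c_longlong(copied)
          print("Offset: %s size: %s" % (copied, size))
          ret = libc.syscall(c_long(SYS_COPY_RANGE),
                             c_int(f_in.fileno()), byref(pos_in),
                             c_int(f_out.fileno()), byref(pos_out),
                             c_size_t(size))  # count
          print("SYS_COPY_RANGE returns: %s" % ret)
          # Check for failure before touching the running totals.
          if ret < 0:
              print("ERROR: " + os.strerror(get_errno()))
              return ret
          if ret == 0:
              break   # no progress, stop instead of looping forever
          copied = copied + ret
          size = size - ret
          print("Total: %s" % copied)
      return copied


  if len(sys.argv) != 3:
      print("Usage: " + sys.argv[0] + " src dst")
      sys.exit(1)
  if not os.path.exists(sys.argv[1]):
      print("ERROR: " + sys.argv[1] + " does not exist :(")
      sys.exit(1)
  # Binary mode, since the file contents are copied byte for byte.
  with open(sys.argv[1], "rb") as f_in, open(sys.argv[2], "wb") as f_out:
      result = copyfile(f_in, f_out)
  sys.exit(1 if result < 0 else 0)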