Some NFS file transfers fail and hang automounting

Latest revision as of 15:42, 22 October 2010

About

  • Kernel version: 2.6.33.5-112.fc13.x86_64
  • Bug 16213 (https://bugzilla.kernel.org/show_bug.cgi?id=16213)
  • Reported by: Philippe Dax (June 15, 2010)
  • Fixed by: Trond Myklebust (June 16, 2010)

Symptoms

  • Given a 50 MB file "foo" on a remote machine "remote":
    • The following command never finishes:
      <localmachine $> cp /remote_mount_point/foo bar
    • bar ends up smaller than foo.
    • Automounting on the local machine hangs.
  • The following message appears in /var/log/messages:
    kernel: Callback slot table overflowed
  • The problem does not occur if foo is smaller than 10 MB.
  • The final size of bar appears to be random.
  • The problem occurs with:
    sunrpc.tcp_slot_table_entries = 16
  • The problem does NOT occur with:
    sunrpc.tcp_slot_table_entries = 32
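
To see which of the two settings above a machine is using, the sysctl can be read from procfs. The following is a minimal sketch, assuming the sunrpc module is loaded so that /proc/sys/sunrpc/tcp_slot_table_entries exists. The helper names are hypothetical and introduced only for illustration.

```python
from pathlib import Path

# procfs file backing the sunrpc.tcp_slot_table_entries sysctl.
# Assumption: the sunrpc module is loaded; otherwise the file is absent.
SLOT_TABLE_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")


def read_slot_table_entries(path=SLOT_TABLE_PATH):
    """Return the configured slot table size, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def matches_reported_failure(entries):
    """True if the value equals 16, the setting under which the bug was reported."""
    return entries == 16


if __name__ == "__main__":
    entries = read_slot_table_entries()
    if entries is None:
        print("sunrpc.tcp_slot_table_entries is not readable on this machine")
    elif matches_reported_failure(entries):
        print(f"slot table size is {entries}: the setting the report failed with")
    else:
        print(f"slot table size is {entries}")
```

This only reports the setting; it does not detect the bug itself, and a matching value on a fixed kernel is harmless.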

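Resolution

  • This problem was fixed by commit b76ce56192bcf618013fb9aecd83488cffd645cc (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b76ce56192bcf618013fb9aecd83488cffd645cc):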

commit b76ce56192bcf618013fb9aecd83488cffd645cc
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Wed Jun 16 13:57:32 2010 -0400

    SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()
    
    If the attempt to read the calldir fails, then instead of storing the read
    bytes, we currently discard them. This leads to a garbage final result when
    upon re-entry to the same routine, we read the remaining bytes.
    
    Fixes the regression in bugzilla number 16213. Please see
        https://bugzilla.kernel.org/show_bug.cgi?id=16213
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: stable@kernel.org
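
The failure mode described in the commit message can be modelled outside the kernel. The sketch below is not the kernel code: it is a small Python model, assuming the calldir field is a single 4-byte big-endian word that may arrive split across two reads. The class FieldReader, the flag keep_partial, and the 0xAA filler (standing in for whatever stale bytes the real code would have seen) are all hypothetical names and choices made for illustration.

```python
import struct

FIELD_SIZE = 4   # assumed: calldir is one 4-byte big-endian word
RPC_REPLY = 1    # RPC message type value for a reply


class FieldReader:
    """Reads one fixed-size field that may arrive over several feed() calls."""

    def __init__(self, keep_partial):
        self.keep_partial = keep_partial
        self.buf = bytearray(FIELD_SIZE)  # storage that persists across calls
        self.offset = 0                   # bytes of the field consumed so far
        self.value = None                 # decoded field, once complete

    def feed(self, data):
        """Consume up to the remaining field bytes; return how many were used."""
        if self.value is not None:
            return 0
        chunk = data[:FIELD_SIZE - self.offset]
        if self.keep_partial:
            target = self.buf
        else:
            # Buggy variant: a fresh scratch buffer on every call, so bytes
            # from an earlier short read are lost. 0xAA is an arbitrary
            # stand-in for stale memory contents.
            target = bytearray(b"\xaa" * FIELD_SIZE)
        target[self.offset:self.offset + len(chunk)] = chunk
        self.offset += len(chunk)
        if self.offset == FIELD_SIZE:
            (self.value,) = struct.unpack(">I", bytes(target))
        return len(chunk)


def read_in_pieces(reader, wire, split):
    """Deliver wire in two pieces, split at the given byte offset."""
    reader.feed(wire[:split])
    reader.feed(wire[split:])
    return reader.value


if __name__ == "__main__":
    wire = struct.pack(">I", RPC_REPLY)
    print(hex(read_in_pieces(FieldReader(keep_partial=True), wire, 2)))
    print(hex(read_in_pieces(FieldReader(keep_partial=False), wire, 2)))
```

With keep_partial=True the field decodes to RPC_REPLY regardless of where the split falls. With keep_partial=False the bytes from the first short read are gone by the time the second call completes the field, so the decoded value is wrong, which matches the "garbage final result" the commit message describes.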