Soft lockup in nfs_commit_inode()

From Linux NFS

Revision as of 17:40, 29 July 2011 by Amschuma (Talk | contribs)

About

  • Kernel versions: 2.6.36, 2.6.37, 2.6.38
  • Bug 29062
  • Reported by: Roman Kononov (February 13, 2011)
  • Closed by: Trond Myklebust (April 2, 2011)

Symptoms

  • A soft lockup message appears in dmesg, similar to:
Feb 12 00:17:12 10.10.10.102 kernel: BUG: soft lockup - CPU#3 stuck for 67s!
  • The call trace points to nfs_commit_inode():
Feb 12 00:17:12 10.10.10.102 kernel: Call Trace: 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b98e8>] ? out_of_line_wait_on_bit_lock+0x28/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810550e0>] ? wake_bit_function+0x0/0x40 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8100372e>] ? reschedule_interrupt+0xe/0x20 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133489>] ? nfs_commit_inode+0xb9/0x1c0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff811337a9>] ? nfs_wb_page+0x69/0xc0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81133ce1>] ? nfs_flush_incompatible+0x41/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81124c9f>] ? nfs_write_begin+0x8f/0x220 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81081638>] ? generic_file_buffered_write+0x118/0x280 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083419>] ? __generic_file_aio_write+0x229/0x420 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81035aa3>] ? load_balance+0x133/0x700 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81083677>] ? generic_file_aio_write+0x67/0xe0 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81125ad1>] ? nfs_file_write+0xe1/0x230 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff8104a472>] ? get_signal_to_deliver+0x92/0x350 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c2abf>] ? do_sync_write+0xbf/0x100 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff813b8c63>] ? schedule+0x273/0x900 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c31f6>] ? vfs_write+0xc6/0x180 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff810c350e>] ? sys_write+0x4e/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81040276>] ? sys_gettimeofday+0x36/0x90 
Feb 12 00:17:12 10.10.10.102 kernel:  [<ffffffff81002c6b>] ? system_call_fastpath+0x16/0x1b 
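
To pick reports like the one above out of a saved log, a small userspace helper can match the message text. This is only a sketch: the function name parse_soft_lockup is hypothetical, and the pattern is taken from the single message quoted in this report, so other kernel versions may word the message differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look for a soft lockup report in one log line.
 * Returns 1 and fills *cpu and *seconds on a match, 0 otherwise.
 * The pattern mirrors the message quoted in this report only.
 */
int parse_soft_lockup(const char *line, int *cpu, int *seconds)
{
    /* Skip the syslog prefix (timestamp, host, "kernel:") if present. */
    const char *p = strstr(line, "BUG: soft lockup - CPU#");

    if (p == NULL)
        return 0;

    /* Both numbers must convert for the line to count as a match. */
    return sscanf(p, "BUG: soft lockup - CPU#%d stuck for %ds!",
                  cpu, seconds) == 2;
}
```

Feeding it the message line from the report should yield CPU 3 and 67 seconds, while the "Call Trace:" lines are not matched.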

git bisect results

$ git bisect log
# bad: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
# good: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect start 'v2.6.36' 'v2.6.35' '--' 'fs/nfs'
# skip: [452e93523d9433f83670e7b42cbe75319c208762] NFSv4: Clean up the process of renewing the NFSv4 lease
git bisect skip 452e93523d9433f83670e7b42cbe75319c208762
# skip: [a17c2153d2e271b0cbacae9bed83b0eaa41db7e1] SUNRPC: Move the bound cred to struct rpc_rqst
git bisect skip a17c2153d2e271b0cbacae9bed83b0eaa41db7e1
# bad: [df486a25900f4dba9cdc3886c4ac871951c6aef3] NFS: Fix the selection of security flavours in Kconfig
git bisect bad df486a25900f4dba9cdc3886c4ac871951c6aef3
# bad: [77041ed9b49a9e10f374bfa6e482d30ee7a3d46e] NFSv4: Ensure the lockowners are labelled using the fl_owner and/or fl_pid
git bisect bad 77041ed9b49a9e10f374bfa6e482d30ee7a3d46e
# skip: [c48f4f3541e67881c9eb7c46e052f5ece48ef530] NFSv41: Convert the various reboot recovery ops etc to minor version ops
git bisect skip c48f4f3541e67881c9eb7c46e052f5ece48ef530
# good: [d185a334c748b3ca9de1f3a293fd8a9cf68378ab] NFSv4.1: Simplify nfs41_sequence_done()
git bisect good d185a334c748b3ca9de1f3a293fd8a9cf68378ab
# skip: [a4432345352c2be157ed844603147ac2c82f209c] NFSv41: Deprecate nfs_client->cl_minorversion
git bisect skip a4432345352c2be157ed844603147ac2c82f209c
# good: [035168ab39f66e4946d493f9ee20d11e154f332a] NFSv4.1: Make nfs4_setup_sequence take a nfs_server argument
git bisect good 035168ab39f66e4946d493f9ee20d11e154f332a
# good: [d77d76ffb638bd013782138cca6d8f4918c5afd6] NFSv41: Clean up exclusive create
git bisect good d77d76ffb638bd013782138cca6d8f4918c5afd6
# good: [1f0e890dba5b0f543fea47732116b1c65d55614e] NFSv4: Clean up struct nfs4_state_owner
git bisect good 1f0e890dba5b0f543fea47732116b1c65d55614e
# skip: [daccbded7f153ec84a3baf3136052e41d0eab555] NFSv4: Clean up for lockowner XDR encoding
git bisect skip daccbded7f153ec84a3baf3136052e41d0eab555
# bad: [d3c7b7ccc199ee564177ee914c04771d6bc00295] NFSv4: Add support for the RELEASE_LOCKOWNER operation
git bisect bad d3c7b7ccc199ee564177ee914c04771d6bc00295
# bad: [f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5] NFSv4: Ensure that we track the NFSv4 lock state in read/write requests.
git bisect bad f11ac8db5d07b6e99d41ff4aa39d878ee5cef1c5


Resolution

Commit b8413f98f997bb3ed7327e6d7117e7e91ce010c3
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon Mar 21 15:37:01 2011 -0400

    NFS: Fix a hang/infinite loop in nfs_wb_page()
    
    When one of the two waits in nfs_commit_inode() is interrupted, it
    returns a non-negative value, which causes nfs_wb_page() to think
    that the operation was successful causing it to busy-loop rather
    than exiting.
    It also causes nfs_file_fsync() to incorrectly report the file as
    being successfully committed to disk.
    
    This patch fixes both problems by ensuring that we return an error
    if the attempts to wait fail.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: stable@kernel.org
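
The failure mode described in the commit message can be illustrated with a userspace model. This is a sketch under stated assumptions, not the kernel code: struct fake_page, wait_interrupted(), commit_buggy(), commit_fixed() and write_back() are all hypothetical names, EINTR stands in for whatever error the interrupted wait produces in the kernel, and the max_iters cap exists only so that the demonstration terminates.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for page state; not a kernel structure. */
struct fake_page {
    bool needs_commit;   /* true while the page still awaits a commit */
};

/* Simulated wait that is always interrupted (assumption: -EINTR). */
int wait_interrupted(void)
{
    return -EINTR;
}

/* Buggy pattern: the wait fails, but the caller is told "no error". */
int commit_buggy(struct fake_page *p)
{
    if (wait_interrupted() < 0)
        return 0;            /* error swallowed, page left uncommitted */
    p->needs_commit = false;
    return 1;
}

/* Fixed pattern: a failed wait is propagated as a negative error. */
int commit_fixed(struct fake_page *p)
{
    int err = wait_interrupted();

    if (err < 0)
        return err;
    p->needs_commit = false;
    return 1;
}

typedef int (*commit_fn)(struct fake_page *);

/*
 * Caller modeled on the description of nfs_wb_page(): keep retrying
 * until the page is clean or the commit reports an error.
 * max_iters is a safety cap for this sketch only; *iters reports how
 * many attempts were made. Returns the commit error, or 0 otherwise.
 */
int write_back(struct fake_page *p, commit_fn commit,
               int max_iters, int *iters)
{
    int n = 0;

    while (p->needs_commit && n < max_iters) {
        int ret;

        n++;
        ret = commit(p);
        if (ret < 0) {
            *iters = n;
            return ret;      /* an error breaks the loop */
        }
    }
    *iters = n;
    return 0;
}
```

With commit_buggy the loop runs until the cap is hit and the page is still uncommitted, which corresponds to the busy-loop in the report. With commit_fixed the first attempt returns the error and the caller exits immediately.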
