Trond Myklebust wrote:
> On Fri, 2011-02-04 at 16:36 -0500, Jim Rees wrote:
> > I have a report here of iozone hanging when run on nfs4 client against an
> > EMC server. We have reproduced this problem with a wide range of client
> > kernel versions, from 2.6.33.3-85.fc13.x86_64 up to
> > 2.6.38-0.rc3.git2.1.pnfs_wave3_20110203.fc15.x86_64, and on both 4.0 and
> > 4.1. It seems to happen only with heavy multi-threaded iozone testing with
> > big files. The iozone is something like this:
> >
> > iozone -r 2m -s 256m -w -W -c -t 12 -i 0 -o
> >
> > The call trace is usually something like this:
> >
> > [] ? sync_page+0x0/0x45
> > [] io_schedule+0x6e/0xb0
> > [] sync_page+0x41/0x45
> > [] __wait_on_bit+0x43/0x76
> > [] wait_on_page_bit+0x6d/0x74
> > [] ? wake_bit_function+0x0/0x2e
> > [] ? pagevec_lookup_tag+0x20/0x29
> > [] filemap_fdatawait_range+0x9f/0x173
> > [] filemap_write_and_wait_range+0x3e/0x51
> > [] vfs_fsync_range+0x5a/0xad
> > [] generic_write_sync+0x53/0x55
> > [] generic_file_aio_write+0x86/0xa2
> > [] nfs_file_write+0xed/0x169 [nfs]
> > [] do_sync_write+0xbf/0xfc
> > [] ? __slab_free+0x28/0x22e
> > [] ? might_fault+0x1c/0x1e
> > [] ? security_file_permission+0x11/0x13
> > [] vfs_write+0xa9/0x106
> > [] sys_write+0x45/0x69
> > [] system_call_fastpath+0x16/0x1b
> >
> > I have a pcap file here but it's 8GB. I am trying to distill it to the
> > important parts.
> >
> > Those of you who are familiar with the page cache, is there any obvious
> > deadlock here that jumps out at you?
>
> The above just tells you that something is waiting for the PG_writeback
> lock (IOW: it is waiting for a writeback of the page to the server to
> complete). It doesn't actually tell you why that page writeback is failing
> to complete.
>
> Can you send us the output of 'dmesg' after you do
>
> echo 0 >/proc/sys/sunrpc/rpc_debug
>
> as root? The 'echo' command needs to be done during the hang.

Attached.
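Trond's point, that the trace only shows a task sleeping until PG_writeback clears and says nothing about why the clear never happens, can be sketched with a toy model. This is a hypothetical Python illustration, not kernel code: the `Page` class and its method names are invented for the example, `threading.Condition` stands in for the kernel's bit-wait queue, and `end_writeback()` stands in for whatever clears the flag once the server's reply to the write has been processed.

```python
import threading


class Page:
    """Hypothetical toy model of a page-cache page with a PG_writeback-style flag.

    Not kernel code: threading.Condition stands in for the bit-wait queue
    that wait_on_page_bit() sleeps on.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self.writeback = False

    def start_writeback(self):
        # Set when the page is handed off to be written to the server.
        with self._cond:
            self.writeback = True

    def end_writeback(self):
        # Cleared only when the write completes (assumed here: after the
        # server's reply is processed); wakes every thread waiting on the flag.
        with self._cond:
            self.writeback = False
            self._cond.notify_all()

    def wait_on_writeback(self, timeout=None):
        # Analogue of waiting for PG_writeback to clear. Returns True if the
        # flag cleared, False if the timeout expired first. The kernel waiter
        # has no timeout, so a write that never completes looks like a hang.
        with self._cond:
            return self._cond.wait_for(lambda: not self.writeback, timeout=timeout)


if __name__ == "__main__":
    # Case 1: the "reply" arrives 50 ms later, so the waiter wakes up.
    ok_page = Page()
    ok_page.start_writeback()
    threading.Timer(0.05, ok_page.end_writeback).start()
    print("completed:", ok_page.wait_on_writeback(timeout=2.0))

    # Case 2: nothing ever calls end_writeback(). The waiter's own state,
    # like the stack trace, cannot say why; only the timeout ends the wait.
    stuck_page = Page()
    stuck_page.start_writeback()
    print("completed:", stuck_page.wait_on_writeback(timeout=0.2))
```

`Condition.wait_for` rechecks the predicate after every wakeup while holding the lock, so a wakeup that races with the flag being set again cannot be mistaken for completion. The timeout exists only so the demo terminates; the useful diagnostic is on the completion side, which is why the dump of pending RPC tasks requested above matters more than the waiter's stack.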