[RFC]: make nfs_wait_on_request() KILLABLE

From: Tuomas Räsänen @ 2014-10-02  9:01 UTC
  To: linux-nfs

Hi,

Before David Jeffery's commit:

  92a5655 nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait

we often experienced softlockups in our systems due to busy-looping
after SIGKILL.

With that patch applied, softlockups are less frequent but not
completely gone. They now occur with call traces like the
following:

 [<c1045c27>] ? kvm_clock_get_cycles+0x17/0x20
 [<c10b2028>] ? ktime_get_ts+0x48/0x140
 [<f8b77be0>] ? nfs_free_request+0x90/0x90 [nfs]
 [<c1656fb6>] io_schedule+0x86/0x100
 [<f8b77bed>] nfs_wait_bit_uninterruptible+0xd/0x20 [nfs]
 [<c16572d1>] __wait_on_bit+0x51/0x70
 [<f8b77be0>] ? nfs_free_request+0x90/0x90 [nfs]
 [<f8b77be0>] ? nfs_free_request+0x90/0x90 [nfs]
 [<c165734b>] out_of_line_wait_on_bit+0x5b/0x70
 [<c1091470>] ? autoremove_wake_function+0x40/0x40
 [<f8b77f3e>] nfs_wait_on_request+0x2e/0x30 [nfs]
 [<f8b7c5ae>] nfs_updatepage+0x11e/0x7d0 [nfs]
 [<f8b7b15b>] ? nfs_page_find_request+0x3b/0x50 [nfs]
 [<f8b7c41d>] ? nfs_flush_incompatible+0x6d/0xe0 [nfs]
 [<f8b6f1a0>] nfs_write_end+0x110/0x280 [nfs]
 [<c10503f2>] ? kmap_atomic_prot+0xe2/0x100
 [<c1050283>] ? __kunmap_atomic+0x63/0x80
 [<c1121e52>] generic_file_buffered_write+0x132/0x210
 [<c112362d>] __generic_file_aio_write+0x25d/0x460
 [<f8b71df2>] ? __nfs_revalidate_inode+0x102/0x2e0 [nfs]
 [<c1123883>] generic_file_aio_write+0x53/0x90
 [<f8b6e267>] nfs_file_write+0xa7/0x1d0 [nfs]
 [<c12a78eb>] ? common_file_perm+0x4b/0xe0
 [<c11794f7>] do_sync_write+0x57/0x90
 [<c11794a0>] ? do_sync_readv_writev+0x80/0x80
 [<c1179975>] vfs_write+0x95/0x1b0
 [<c117a019>] SyS_write+0x49/0x90
 [<c165a297>] syscall_call+0x7/0x7
 [<c1650000>] ? balance_dirty_pages.isra.18+0x390/0x4c3

As I understand it, nfs_wait_on_request() is waiting for outstanding
requests to complete. For some reason they do not finish in a timely
manner, and the admin eventually sends SIGKILL to the process.
However, nfs_wait_on_request() has put the task into
TASK_UNINTERRUPTIBLE state, so the signal does not wake it and the
process is not killed.
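
To illustrate the distinction I mean, here is a minimal userspace model
of the check that decides whether a pending signal may wake a sleeping
task. It roughly mirrors the logic of the kernel's
signal_pending_state() as I understand it, but it is only a sketch: the
flag values are illustrative, and struct model_task and
model_signal_wakes() are hypothetical names, not kernel code.

```c
/* Userspace model only; this is not kernel code. */
#include <stdbool.h>

/* Illustrative flag values that follow the kernel's bitmask scheme. */
#define TASK_INTERRUPTIBLE   0x01
#define TASK_UNINTERRUPTIBLE 0x02
#define TASK_WAKEKILL        0x80
#define TASK_KILLABLE        (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for the signal state of a sleeping task. */
struct model_task {
	bool signal_pending;        /* some signal is queued */
	bool fatal_signal_pending;  /* SIGKILL is queued */
};

/*
 * Return true if a task sleeping in `state` may be woken by its
 * pending signal.
 */
static bool model_signal_wakes(long state, const struct model_task *t)
{
	/* A plain uninterruptible sleep ignores every signal. */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!t->signal_pending)
		return false;
	/* Interruptible: any signal wakes. Killable: only fatal ones. */
	return (state & TASK_INTERRUPTIBLE) || t->fatal_signal_pending;
}
```

In this model, a task with SIGKILL pending is woken when it sleeps in
TASK_KILLABLE but not when it sleeps in TASK_UNINTERRUPTIBLE, which
matches what we see: the process stays stuck after the admin kills it.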

Why is nfs_wait_on_request() UNINTERRUPTIBLE instead of KILLABLE?

Would the following patch fix the issue?

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index be7cbce..6a1766d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -459,8 +459,9 @@ void nfs_release_request(struct nfs_page *req)
 int
 nfs_wait_on_request(struct nfs_page *req)
 {
-       return wait_on_bit_io(&req->wb_flags, PG_BUSY,
-                             TASK_UNINTERRUPTIBLE);
+       return wait_on_bit_action(&req->wb_flags, PG_BUSY,
+                               nfs_wait_bit_killable,
+                               TASK_KILLABLE);
 }
 
 /*

-- 
Tuomas
