linux-nfs.vger.kernel.org archive mirror
* User process NFS write hang in wait_on_commit with kworker
@ 2019-06-18  0:06 Alan Post
From: Alan Post @ 2019-06-18  0:06 UTC
  To: linux-nfs

On May 20th I reported "User process NFS write hang followed
by automount hang requiring reboot" to this list.  There I
had a process that would hang on NFS write, followed by sync
hanging, eventually leading to my need to reboot the host.

On June 4th, after upgrading to Linux 4.19.44, I reported
the issue resolved.  Since that time, as I've rolled out
Linux 4.19.44, the issue has come back--sort of.

I have begun once again getting sync hangs following a
hung NFS write.  The hung write has a different stack trace
than any I previously reported:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_file_fsync+0xa7/0x1d0 [nfs]
    [<0>] filp_close+0x25/0x70
    [<0>] put_files_struct+0x66/0xb0
    [<0>] do_exit+0x2af/0xbb0
    [<0>] do_group_exit+0x35/0xa0
    [<0>] __x64_sys_exit_group+0xf/0x10
    [<0>] do_syscall_64+0x45/0x100
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<0>] 0xffffffffffffffff

And there is an attendant kworker thread:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_write_inode+0x5c/0x90 [nfs]
    [<0>] nfs4_write_inode+0xd/0x30 [nfsv4]
    [<0>] __writeback_single_inode+0x27a/0x320
    [<0>] writeback_sb_inodes+0x19a/0x460
    [<0>] wb_writeback+0x102/0x2f0
    [<0>] wb_workfn+0xa3/0x400
    [<0>] process_one_work+0x1e3/0x3d0
    [<0>] worker_thread+0x28/0x3c0
    [<0>] kthread+0x10e/0x130
    [<0>] ret_from_fork+0x35/0x40
    [<0>] 0xffffffffffffffff
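
Both traces above look like the output of /proc/<pid>/stack, so the
same check can be scripted to list every task currently parked in
wait_on_commit.  What follows is only a minimal sketch under that
assumption: the helper names (stack_symbols, is_commit_waiter,
find_commit_waiters) are made up for illustration, and reading
/proc/<pid>/stack normally requires root.

```python
import re
from pathlib import Path

# Matches one frame of /proc/<pid>/stack output, for example:
#   [<0>] wait_on_commit+0x60/0x90 [nfs]
# The trailing "[<0>] 0xffffffffffffffff" line has no symbol and is skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-fx]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def stack_symbols(stack_text):
    """Return the symbol names found in a stack dump, innermost frame first."""
    return [m.group(1) for m in FRAME_RE.finditer(stack_text)]


def is_commit_waiter(stack_text, symbol="wait_on_commit"):
    """True if the innermost frame of the stack dump is `symbol`."""
    symbols = stack_symbols(stack_text)
    return bool(symbols) and symbols[0] == symbol


def find_commit_waiters(proc_root="/proc"):
    """Yield (pid, comm, symbols) for each task whose stack starts in
    wait_on_commit.  Hypothetical helper; only the main thread of each
    process is inspected."""
    root = Path(proc_root)
    if not root.is_dir():
        return
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            stack_text = (entry / "stack").read_text()
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # task exited, or we lack permission to read it
        if is_commit_waiter(stack_text):
            yield int(entry.name), comm, stack_symbols(stack_text)
```

Iterating over find_commit_waiters() as root on an affected client
would be expected to report both the exiting user process and the
kworker shown above, assuming they are still blocked.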

Oddly enough, I can clear the problem without rebooting the host.
I use iptables to block all traffic between the NFS server and the
NFS client for long enough that any open TCP connections time out.
After that, the connection apparently reestablishes and the hung
process unblocks.
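
For anyone wanting to reproduce the workaround, here is a rough sketch
of one way to script it.  The exact rules and the length of the block
are not stated above, so everything specific here is an assumption:
the DROP rules on INPUT and OUTPUT, the function names, and the example
address 192.0.2.10 (a documentation address standing in for the NFS
server).  The block duration is left to the caller.

```python
import subprocess
import time


def block_rules(server_ip):
    """Rule specs that drop all traffic to and from server_ip (assumed rules)."""
    return [
        ["INPUT", "-s", server_ip, "-j", "DROP"],
        ["OUTPUT", "-d", server_ip, "-j", "DROP"],
    ]


def block_for(server_ip, seconds, run=subprocess.check_call, sleep=time.sleep):
    """Insert the DROP rules, wait `seconds`, then delete them again.

    `run` and `sleep` are injectable so the command sequence can be
    inspected without root.  Rules that were inserted are always removed,
    even if the wait is interrupted.
    """
    applied = []
    try:
        for rule in block_rules(server_ip):
            run(["iptables", "-I"] + rule)  # -I inserts at the top of the chain
            applied.append(rule)
        sleep(seconds)
    finally:
        for rule in applied:
            run(["iptables", "-D"] + rule)  # -D deletes the matching rule
```

Calling block_for("192.0.2.10", seconds) as root, with seconds chosen
to exceed the TCP timeout, would approximate the manual procedure.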

I can't explain what's keeping the connection alive but apparently
stalled--requiring my manual intervention.  Do any of you have
ideas or speculation?  I'm happy to poke around in a packet capture
if the information provided isn't sufficient.

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@prgmr.com

Thread overview: 9+ messages
2019-06-18  0:06 User process NFS write hang in wait_on_commit with kworker Alan Post
2019-06-18 15:29 ` Benjamin Coddington
2019-06-19  0:07   ` Alan Post
2019-06-19 12:38     ` Benjamin Coddington
2019-06-21 20:47       ` Alan Post
2019-06-28 18:33         ` Alan Post
2019-07-02  9:55           ` Benjamin Coddington
2019-07-03 21:32             ` Alan Post
2019-07-05 23:53               ` Tom Talpey
