From: Jason L Tibbitts III <tibbs@math.uh.edu>
To: linux-nfs@vger.kernel.org
Subject: Re: Need help debugging NFS issues new to 4.20 kernel
Date: Thu, 24 Jan 2019 13:28:46 -0600 [thread overview]
Message-ID: <ufaa7jpam8h.fsf@epithumia.math.uh.edu> (raw)
In-Reply-To: <ufaimyearlx.fsf@epithumia.math.uh.edu> (Jason L. Tibbitts, III's message of "Thu, 24 Jan 2019 11:32:42 -0600")
Just had another machine get into this state. This time I thought to at
least do alt-sysrq-w. The result is a big pile of backtraces but it
looks like basically every NFS operation is hung up.
I'm not sure how useful it will be, but I tried to categorize the
various backtraces. Most of them are like the following, going through
nfs_file_fsync:
tcsh D 0 9511 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
schedule+0x28/0x80
io_schedule+0x12/0x40
wait_on_page_bit_common+0x11a/0x270
? file_check_and_advance_wb_err+0xc0/0xc0
__filemap_fdatawait_range+0xe2/0x130
? __filemap_fdatawrite_range+0xc8/0xf0
filemap_write_and_wait_range+0x45/0x80
nfs_file_fsync+0x44/0x1e0 [nfs]
filp_close+0x31/0x60
put_files_struct+0x6c/0xc0
do_exit+0x2b4/0xb30
? __fput+0x151/0x220
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
? do_sys_open+0x128/0x210
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4760d46ca2
Code: Bad RIP value.
RSP: 002b:00007ffc29c788d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: fffffffffffffe00 RBX: 00005652381bf990 RCX: 00007f4760d46ca2
RDX: 0000000000000000 RSI: 00005652381d4b40 RDI: 00000000ffffff9c
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc29c78920
R10: 0000000000000000 R11: 0000000000000246 R12: 00005652381d4b40
R13: 0000000000000000 R14: 00005652381dcd30 R15: 0000000000000000
There are a few others which are hung on locking operations:
QThread D 0 9740 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_proc_lock+0x1a9/0x320 [nfsv4]
? do_unlk+0x98/0xe0 [nfs]
locks_remove_posix+0xb1/0x140
? __nfs_commit_inode+0xc4/0x1b0 [nfs]
filp_close+0x50/0x60
put_files_struct+0x6c/0xc0
do_exit+0x2b4/0xb30
? mem_cgroup_try_charge+0xe6/0x180
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff3e9321421
Code: Bad RIP value.
RSP: 002b:00007ff3d7ffeac0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 00007ff3c8004db0 RCX: 00007ff3e9321421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 00007ff3c8004db0
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ff3c8000c78
R10: 00007ff3d0003480 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007ff3e6e48f10 R14: 00000000ffffffff R15: 0000000000000001
gvfsd-metadata D 0 24382 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_do_close+0x2bf/0x2f0 [nfsv4]
__put_nfs_open_context+0xa9/0x100 [nfs]
nfs_file_release+0x39/0x40 [nfs]
__fput+0xb4/0x220
task_work_run+0x84/0xa0
do_exit+0x2d2/0xb30
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
? handle_mm_fault+0xda/0x200
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3cf843d421
Code: Bad RIP value.
RSP: 002b:00007ffc224ea250 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 000055c9db8fa2a0 RCX: 00007f3cf843d421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 000055c9db8fa2a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 000055c9db8ea488
R10: 000055c9db8dbae0 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007f3cf85bff10 R14: 00000000ffffffff R15: 0000000000000001
And some others waiting on metadata operations:
localStorage DB D 0 24794 9621 0x00000004
Call Trace:
? __schedule+0x253/0x850
schedule+0x28/0x80
io_schedule+0x12/0x40
wait_on_page_bit_common+0x11a/0x270
? file_check_and_advance_wb_err+0xc0/0xc0
__filemap_fdatawait_range+0xe2/0x130
filemap_write_and_wait+0x47/0x70
nfs_wb_all+0x1a/0x120 [nfs]
nfs_setattr+0x1cb/0x1d0 [nfs]
notify_change+0x2da/0x440
do_truncate+0x74/0xc0
do_sys_ftruncate+0xb6/0x100
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1cd1382b
Code: Bad RIP value.
RSP: 002b:00007eff0c3787c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 0000000000bd0000 RCX: 00007eff1cd1382b
RDX: 0002000100000000 RSI: 0000000000bd0000 RDI: 000000000000005c
RBP: 00007eff034d61a0 R08: 00007ffc7d3b40b0 R09: 00007ffc7d3b4080
R10: 000000000274ded2 R11: 0000000000000246 R12: 000000000000005c
R13: 00007eff02f081c0 R14: 00007eff025fd8e0 R15: 00007eff034832e0
df D 0 25187 25186 0x00000000
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
? __rpc_atrun+0x20/0x20 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
__rpc_execute+0xe0/0x350 [sunrpc]
? recalibrate_cpu_khz+0x10/0x10
? ktime_get+0x36/0xa0
rpc_run_task+0x121/0x180 [sunrpc]
nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
_nfs4_proc_getattr+0xf8/0x120 [nfsv4]
nfs4_proc_getattr+0x75/0x100 [nfsv4]
__nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
nfs_getattr+0x118/0x2b0 [nfs]
? security_inode_getattr+0x30/0x50
vfs_statx+0x89/0xe0
__do_sys_newstat+0x39/0x70
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f7dc6195579
Code: Bad RIP value.
RSP: 002b:00007ffd44e725a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 000055ad9832ef80 RCX: 00007f7dc6195579
RDX: 00007ffd44e72650 RSI: 00007ffd44e72650 RDI: 000055ad9832e380
RBP: 000055ad9832e310 R08: 0000000000000003 R09: aaaaaaaaaaaaaaab
R10: 000055ad98326010 R11: 0000000000000246 R12: 000055ad9832e3c0
R13: 0000000000000000 R14: 000055ad9832b9b0 R15: 0000000000000000
And others on writes:
mozStorage #1 D 0 24816 9621 0x00000004
Call Trace:
? __schedule+0x253/0x850
? __switch_to_asm+0x40/0x70
schedule+0x28/0x80
io_schedule+0x12/0x40
__lock_page+0x13c/0x230
? file_check_and_advance_wb_err+0xc0/0xc0
pagecache_get_page+0x1ae/0x340
grab_cache_page_write_begin+0x1c/0x40
nfs_write_begin+0x63/0x340 [nfs]
generic_perform_write+0xf4/0x1b0
? __handle_mm_fault+0xb9d/0x1380
nfs_file_write+0xdc/0x200 [nfs]
__vfs_write+0x136/0x1a0
vfs_write+0xa5/0x1a0
ksys_write+0x4f/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1d131d57
Code: Bad RIP value.
RSP: 002b:00007eff00546380 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000002d RCX: 00007eff1d131d57
RDX: 0000000000008000 RSI: 00007efef8085000 RDI: 000000000000002d
RBP: 00007efef8085000 R08: 0000000000000000 R09: 00007ffc7d3b4080
R10: 000000000274d526 R11: 0000000000000293 R12: 0000000000008000
R13: 00007eff08604868 R14: 00000000000300c8 R15: 00007efef8085000
Not sure what this one is waiting on:
kdm-uh D 0 25357 25345 0x00000000
Call Trace:
? __schedule+0x253/0x850
? __switch_to_asm+0x34/0x70
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
? __rpc_atrun+0x20/0x20 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
__rpc_execute+0xe0/0x350 [sunrpc]
? recalibrate_cpu_khz+0x10/0x10
? ktime_get+0x36/0xa0
rpc_run_task+0x121/0x180 [sunrpc]
nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
_nfs4_proc_getattr+0xf8/0x120 [nfsv4]
nfs4_proc_getattr+0x75/0x100 [nfsv4]
__nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
nfs_do_access+0x2cd/0x400 [nfs]
? rpcauth_lookupcred+0x99/0xc0 [sunrpc]
nfs_permission+0x1b9/0x1e0 [nfs]
inode_permission+0xbe/0x180
link_path_walk.part.49+0x3f2/0x520
path_openat+0x9f/0x1610
? page_counter_try_charge+0x57/0xc0
? memcg_kmem_charge_memcg+0x76/0xa0
? get_random_u32+0x3e/0xd0
do_filp_open+0x93/0x100
? __check_object_size+0x15d/0x189
do_sys_open+0x186/0x210
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f0fc5dca2
Code: Bad RIP value.
RSP: 002b:00007ffc18c62840 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055a70650bfa0 RCX: 00007f3f0fc5dca2
RDX: 0000000000000800 RSI: 000055a70650bcf0 RDI: 00000000ffffff9c
RBP: 0000000000001e34 R08: 000055a70650c3b0 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc18c62a30 R14: 000055a70650bcf0 R15: 00007ffc18c62afc
And there's one kernel thread, I assume for the mount since that's the
IP of the NFS server.
172.21.86.74-ma D 0 25099 2 0x80000000
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_proc_sequence+0x3c/0x50 [nfsv4]
nfs4_run_state_manager+0x257/0x760 [nfsv4]
? nfs4_do_reclaim+0x680/0x680 [nfsv4]
kthread+0x112/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x35/0x40
next prev parent reply other threads:[~2019-01-24 20:09 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-24 17:32 Need help debugging NFS issues new to 4.20 kernel Jason L Tibbitts III
2019-01-24 19:28 ` Jason L Tibbitts III [this message]
2019-01-24 19:58 ` Trond Myklebust
2019-01-25 19:13 ` Schumaker, Anna
2019-01-26 17:59 ` Sasha Levin
2019-01-25 19:51 ` Jason L Tibbitts III
2019-02-05 18:12 ` Jason Tibbitts
2019-02-06 12:05 ` Benjamin Coddington
[not found] ` <87imxwab12.fsf@hippogriff.math.uh.edu>
2019-02-07 11:13 ` Benjamin Coddington
[not found] ` <87d0o3aadg.fsf@hippogriff.math.uh.edu>
2019-02-08 12:01 ` Benjamin Coddington
2019-02-08 15:19 ` Chuck Lever
2019-02-08 17:17 ` Jason L Tibbitts III
2019-02-15 20:33 ` Jason L Tibbitts III
2019-02-16 14:46 ` Trond Myklebust
2019-02-20 2:13 ` Jason L Tibbitts III
2019-02-20 15:25 ` Jason L Tibbitts III
2019-02-20 15:37 ` Trond Myklebust
2019-02-20 15:39 ` Chuck Lever
2019-02-20 15:41 ` Trond Myklebust
2019-02-21 18:19 ` Jason L Tibbitts III
2019-02-25 19:24 ` Jason L Tibbitts III
2019-02-25 23:15 ` Benjamin Coddington
2019-02-20 16:25 ` Jason L Tibbitts III
2019-02-20 16:45 ` Trond Myklebust
2019-02-20 16:49 ` Jason L Tibbitts III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ufaa7jpam8h.fsf@epithumia.math.uh.edu \
--to=tibbs@math.uh.edu \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).