From: Jason L Tibbitts III <tibbs@math.uh.edu>
To: linux-nfs@vger.kernel.org
Subject: Re: Need help debugging NFS issues new to 4.20 kernel
Date: Thu, 24 Jan 2019 13:28:46 -0600 [thread overview]
Message-ID: <ufaa7jpam8h.fsf@epithumia.math.uh.edu> (raw)
In-Reply-To: <ufaimyearlx.fsf@epithumia.math.uh.edu> (Jason L. Tibbitts, III's message of "Thu, 24 Jan 2019 11:32:42 -0600")
Just had another machine get into this state. This time I thought to at
least do alt-sysrq-w. The result is a big pile of backtraces but it
looks like basically every NFS operation is hung up.
I'm not sure how useful it will be, but I tried to categorize the
various backtraces. Most of them are like the following, going through
nfs_file_fsync:
tcsh D 0 9511 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
schedule+0x28/0x80
io_schedule+0x12/0x40
wait_on_page_bit_common+0x11a/0x270
? file_check_and_advance_wb_err+0xc0/0xc0
__filemap_fdatawait_range+0xe2/0x130
? __filemap_fdatawrite_range+0xc8/0xf0
filemap_write_and_wait_range+0x45/0x80
nfs_file_fsync+0x44/0x1e0 [nfs]
filp_close+0x31/0x60
put_files_struct+0x6c/0xc0
do_exit+0x2b4/0xb30
? __fput+0x151/0x220
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
? do_sys_open+0x128/0x210
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4760d46ca2
Code: Bad RIP value.
RSP: 002b:00007ffc29c788d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: fffffffffffffe00 RBX: 00005652381bf990 RCX: 00007f4760d46ca2
RDX: 0000000000000000 RSI: 00005652381d4b40 RDI: 00000000ffffff9c
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc29c78920
R10: 0000000000000000 R11: 0000000000000246 R12: 00005652381d4b40
R13: 0000000000000000 R14: 00005652381dcd30 R15: 0000000000000000
There are a few others which are hung on locking operations:
QThread D 0 9740 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_proc_lock+0x1a9/0x320 [nfsv4]
? do_unlk+0x98/0xe0 [nfs]
locks_remove_posix+0xb1/0x140
? __nfs_commit_inode+0xc4/0x1b0 [nfs]
filp_close+0x50/0x60
put_files_struct+0x6c/0xc0
do_exit+0x2b4/0xb30
? mem_cgroup_try_charge+0xe6/0x180
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff3e9321421
Code: Bad RIP value.
RSP: 002b:00007ff3d7ffeac0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 00007ff3c8004db0 RCX: 00007ff3e9321421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 00007ff3c8004db0
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ff3c8000c78
R10: 00007ff3d0003480 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007ff3e6e48f10 R14: 00000000ffffffff R15: 0000000000000001
gvfsd-metadata D 0 24382 1 0x80000006
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_do_close+0x2bf/0x2f0 [nfsv4]
__put_nfs_open_context+0xa9/0x100 [nfs]
nfs_file_release+0x39/0x40 [nfs]
__fput+0xb4/0x220
task_work_run+0x84/0xa0
do_exit+0x2d2/0xb30
do_group_exit+0x3a/0xa0
get_signal+0x276/0x590
do_signal+0x36/0x610
? handle_mm_fault+0xda/0x200
exit_to_usermode_loop+0x71/0xe0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3cf843d421
Code: Bad RIP value.
RSP: 002b:00007ffc224ea250 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 000055c9db8fa2a0 RCX: 00007f3cf843d421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 000055c9db8fa2a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 000055c9db8ea488
R10: 000055c9db8dbae0 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007f3cf85bff10 R14: 00000000ffffffff R15: 0000000000000001
And some others waiting on metadata operations:
localStorage DB D 0 24794 9621 0x00000004
Call Trace:
? __schedule+0x253/0x850
schedule+0x28/0x80
io_schedule+0x12/0x40
wait_on_page_bit_common+0x11a/0x270
? file_check_and_advance_wb_err+0xc0/0xc0
__filemap_fdatawait_range+0xe2/0x130
filemap_write_and_wait+0x47/0x70
nfs_wb_all+0x1a/0x120 [nfs]
nfs_setattr+0x1cb/0x1d0 [nfs]
notify_change+0x2da/0x440
do_truncate+0x74/0xc0
do_sys_ftruncate+0xb6/0x100
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1cd1382b
Code: Bad RIP value.
RSP: 002b:00007eff0c3787c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 0000000000bd0000 RCX: 00007eff1cd1382b
RDX: 0002000100000000 RSI: 0000000000bd0000 RDI: 000000000000005c
RBP: 00007eff034d61a0 R08: 00007ffc7d3b40b0 R09: 00007ffc7d3b4080
R10: 000000000274ded2 R11: 0000000000000246 R12: 000000000000005c
R13: 00007eff02f081c0 R14: 00007eff025fd8e0 R15: 00007eff034832e0
df D 0 25187 25186 0x00000000
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
? __rpc_atrun+0x20/0x20 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
__rpc_execute+0xe0/0x350 [sunrpc]
? recalibrate_cpu_khz+0x10/0x10
? ktime_get+0x36/0xa0
rpc_run_task+0x121/0x180 [sunrpc]
nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
_nfs4_proc_getattr+0xf8/0x120 [nfsv4]
nfs4_proc_getattr+0x75/0x100 [nfsv4]
__nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
nfs_getattr+0x118/0x2b0 [nfs]
? security_inode_getattr+0x30/0x50
vfs_statx+0x89/0xe0
__do_sys_newstat+0x39/0x70
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f7dc6195579
Code: Bad RIP value.
RSP: 002b:00007ffd44e725a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 000055ad9832ef80 RCX: 00007f7dc6195579
RDX: 00007ffd44e72650 RSI: 00007ffd44e72650 RDI: 000055ad9832e380
RBP: 000055ad9832e310 R08: 0000000000000003 R09: aaaaaaaaaaaaaaab
R10: 000055ad98326010 R11: 0000000000000246 R12: 000055ad9832e3c0
R13: 0000000000000000 R14: 000055ad9832b9b0 R15: 0000000000000000
And others on writes:
mozStorage #1 D 0 24816 9621 0x00000004
Call Trace:
? __schedule+0x253/0x850
? __switch_to_asm+0x40/0x70
schedule+0x28/0x80
io_schedule+0x12/0x40
__lock_page+0x13c/0x230
? file_check_and_advance_wb_err+0xc0/0xc0
pagecache_get_page+0x1ae/0x340
grab_cache_page_write_begin+0x1c/0x40
nfs_write_begin+0x63/0x340 [nfs]
generic_perform_write+0xf4/0x1b0
? __handle_mm_fault+0xb9d/0x1380
nfs_file_write+0xdc/0x200 [nfs]
__vfs_write+0x136/0x1a0
vfs_write+0xa5/0x1a0
ksys_write+0x4f/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1d131d57
Code: Bad RIP value.
RSP: 002b:00007eff00546380 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000002d RCX: 00007eff1d131d57
RDX: 0000000000008000 RSI: 00007efef8085000 RDI: 000000000000002d
RBP: 00007efef8085000 R08: 0000000000000000 R09: 00007ffc7d3b4080
R10: 000000000274d526 R11: 0000000000000293 R12: 0000000000008000
R13: 00007eff08604868 R14: 00000000000300c8 R15: 00007efef8085000
Not sure what this one is waiting on:
kdm-uh D 0 25357 25345 0x00000000
Call Trace:
? __schedule+0x253/0x850
? __switch_to_asm+0x34/0x70
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
? __rpc_atrun+0x20/0x20 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
__rpc_execute+0xe0/0x350 [sunrpc]
? recalibrate_cpu_khz+0x10/0x10
? ktime_get+0x36/0xa0
rpc_run_task+0x121/0x180 [sunrpc]
nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
_nfs4_proc_getattr+0xf8/0x120 [nfsv4]
nfs4_proc_getattr+0x75/0x100 [nfsv4]
__nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
nfs_do_access+0x2cd/0x400 [nfs]
? rpcauth_lookupcred+0x99/0xc0 [sunrpc]
nfs_permission+0x1b9/0x1e0 [nfs]
inode_permission+0xbe/0x180
link_path_walk.part.49+0x3f2/0x520
path_openat+0x9f/0x1610
? page_counter_try_charge+0x57/0xc0
? memcg_kmem_charge_memcg+0x76/0xa0
? get_random_u32+0x3e/0xd0
do_filp_open+0x93/0x100
? __check_object_size+0x15d/0x189
do_sys_open+0x186/0x210
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f0fc5dca2
Code: Bad RIP value.
RSP: 002b:00007ffc18c62840 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055a70650bfa0 RCX: 00007f3f0fc5dca2
RDX: 0000000000000800 RSI: 000055a70650bcf0 RDI: 00000000ffffff9c
RBP: 0000000000001e34 R08: 000055a70650c3b0 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc18c62a30 R14: 000055a70650bcf0 R15: 00007ffc18c62afc
And there's one kernel thread, I assume for the mount since that's the
IP of the NFS server.
172.21.86.74-ma D 0 25099 2 0x80000000
Call Trace:
? __schedule+0x253/0x850
? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
schedule+0x28/0x80
rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
__wait_on_bit+0x6c/0x80
out_of_line_wait_on_bit+0x91/0xb0
? init_wait_var_entry+0x40/0x40
nfs4_proc_sequence+0x3c/0x50 [nfsv4]
nfs4_run_state_manager+0x257/0x760 [nfsv4]
? nfs4_do_reclaim+0x680/0x680 [nfsv4]
kthread+0x112/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x35/0x40
next prev parent reply other threads:[~2019-01-24 20:09 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-24 17:32 Need help debugging NFS issues new to 4.20 kernel Jason L Tibbitts III
2019-01-24 19:28 ` Jason L Tibbitts III [this message]
2019-01-24 19:58 ` Trond Myklebust
2019-01-25 19:13 ` Schumaker, Anna
2019-01-26 17:59 ` Sasha Levin
2019-01-25 19:51 ` Jason L Tibbitts III
2019-02-05 18:12 ` Jason Tibbitts
2019-02-06 12:05 ` Benjamin Coddington
[not found] ` <87imxwab12.fsf@hippogriff.math.uh.edu>
2019-02-07 11:13 ` Benjamin Coddington
[not found] ` <87d0o3aadg.fsf@hippogriff.math.uh.edu>
2019-02-08 12:01 ` Benjamin Coddington
2019-02-08 15:19 ` Chuck Lever
2019-02-08 17:17 ` Jason L Tibbitts III
2019-02-15 20:33 ` Jason L Tibbitts III
2019-02-16 14:46 ` Trond Myklebust
2019-02-20 2:13 ` Jason L Tibbitts III
2019-02-20 15:25 ` Jason L Tibbitts III
2019-02-20 15:37 ` Trond Myklebust
2019-02-20 15:39 ` Chuck Lever
2019-02-20 15:41 ` Trond Myklebust
2019-02-21 18:19 ` Jason L Tibbitts III
2019-02-25 19:24 ` Jason L Tibbitts III
2019-02-25 23:15 ` Benjamin Coddington
2019-02-20 16:25 ` Jason L Tibbitts III
2019-02-20 16:45 ` Trond Myklebust
2019-02-20 16:49 ` Jason L Tibbitts III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ufaa7jpam8h.fsf@epithumia.math.uh.edu \
--to=tibbs@math.uh.edu \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.