linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason L Tibbitts III <tibbs@math.uh.edu>
To: linux-nfs@vger.kernel.org
Subject: Re: Need help debugging NFS issues new to 4.20 kernel
Date: Thu, 24 Jan 2019 13:28:46 -0600	[thread overview]
Message-ID: <ufaa7jpam8h.fsf@epithumia.math.uh.edu> (raw)
In-Reply-To: <ufaimyearlx.fsf@epithumia.math.uh.edu> (Jason L. Tibbitts, III's message of "Thu, 24 Jan 2019 11:32:42 -0600")

Just had another machine get into this state.  This time I thought to at
least do alt-sysrq-w.  The result is a big pile of backtraces but it
looks like basically every NFS operation is hung up.

I'm not sure how useful it will be, but I tried to categorize the
various backtraces.  Most of them are like the following, going through
nfs_file_fsync:

tcsh            D    0  9511      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 wait_on_page_bit_common+0x11a/0x270
 ? file_check_and_advance_wb_err+0xc0/0xc0
 __filemap_fdatawait_range+0xe2/0x130
 ? __filemap_fdatawrite_range+0xc8/0xf0
 filemap_write_and_wait_range+0x45/0x80
 nfs_file_fsync+0x44/0x1e0 [nfs]
 filp_close+0x31/0x60
 put_files_struct+0x6c/0xc0
 do_exit+0x2b4/0xb30
 ? __fput+0x151/0x220
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 ? do_sys_open+0x128/0x210
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4760d46ca2
Code: Bad RIP value.
RSP: 002b:00007ffc29c788d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: fffffffffffffe00 RBX: 00005652381bf990 RCX: 00007f4760d46ca2
RDX: 0000000000000000 RSI: 00005652381d4b40 RDI: 00000000ffffff9c
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffc29c78920
R10: 0000000000000000 R11: 0000000000000246 R12: 00005652381d4b40
R13: 0000000000000000 R14: 00005652381dcd30 R15: 0000000000000000


There are a few others which are hung on locking operations:

QThread         D    0  9740      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_proc_lock+0x1a9/0x320 [nfsv4]
 ? do_unlk+0x98/0xe0 [nfs]
 locks_remove_posix+0xb1/0x140
 ? __nfs_commit_inode+0xc4/0x1b0 [nfs]
 filp_close+0x50/0x60
 put_files_struct+0x6c/0xc0
 do_exit+0x2b4/0xb30
 ? mem_cgroup_try_charge+0xe6/0x180
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff3e9321421
Code: Bad RIP value.
RSP: 002b:00007ff3d7ffeac0 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 00007ff3c8004db0 RCX: 00007ff3e9321421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 00007ff3c8004db0
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007ff3c8000c78
R10: 00007ff3d0003480 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007ff3e6e48f10 R14: 00000000ffffffff R15: 0000000000000001

gvfsd-metadata  D    0 24382      1 0x80000006
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_do_close+0x2bf/0x2f0 [nfsv4]
 __put_nfs_open_context+0xa9/0x100 [nfs]
 nfs_file_release+0x39/0x40 [nfs]
 __fput+0xb4/0x220
 task_work_run+0x84/0xa0
 do_exit+0x2d2/0xb30
 do_group_exit+0x3a/0xa0
 get_signal+0x276/0x590
 do_signal+0x36/0x610
 ? handle_mm_fault+0xda/0x200
 exit_to_usermode_loop+0x71/0xe0
 do_syscall_64+0x14d/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3cf843d421
Code: Bad RIP value.
RSP: 002b:00007ffc224ea250 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
RAX: fffffffffffffdfc RBX: 000055c9db8fa2a0 RCX: 00007f3cf843d421
RDX: 00000000ffffffff RSI: 0000000000000001 RDI: 000055c9db8fa2a0
RBP: 0000000000000001 R08: 0000000000000000 R09: 000055c9db8ea488
R10: 000055c9db8dbae0 R11: 0000000000000293 R12: 00000000ffffffff
R13: 00007f3cf85bff10 R14: 00000000ffffffff R15: 0000000000000001


And some others waiting on metadata operations:

localStorage DB D    0 24794   9621 0x00000004
Call Trace:
 ? __schedule+0x253/0x850
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 wait_on_page_bit_common+0x11a/0x270
 ? file_check_and_advance_wb_err+0xc0/0xc0
 __filemap_fdatawait_range+0xe2/0x130
 filemap_write_and_wait+0x47/0x70
 nfs_wb_all+0x1a/0x120 [nfs]
 nfs_setattr+0x1cb/0x1d0 [nfs]
 notify_change+0x2da/0x440
 do_truncate+0x74/0xc0
 do_sys_ftruncate+0xb6/0x100
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1cd1382b
Code: Bad RIP value.
RSP: 002b:00007eff0c3787c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
RAX: ffffffffffffffda RBX: 0000000000bd0000 RCX: 00007eff1cd1382b
RDX: 0002000100000000 RSI: 0000000000bd0000 RDI: 000000000000005c
RBP: 00007eff034d61a0 R08: 00007ffc7d3b40b0 R09: 00007ffc7d3b4080
R10: 000000000274ded2 R11: 0000000000000246 R12: 000000000000005c
R13: 00007eff02f081c0 R14: 00007eff025fd8e0 R15: 00007eff034832e0

df              D    0 25187  25186 0x00000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 ? __rpc_atrun+0x20/0x20 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 ? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 __rpc_execute+0xe0/0x350 [sunrpc]
 ? recalibrate_cpu_khz+0x10/0x10
 ? ktime_get+0x36/0xa0
 rpc_run_task+0x121/0x180 [sunrpc]
 nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
 _nfs4_proc_getattr+0xf8/0x120 [nfsv4]
 nfs4_proc_getattr+0x75/0x100 [nfsv4]
 __nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
 nfs_getattr+0x118/0x2b0 [nfs]
 ? security_inode_getattr+0x30/0x50
 vfs_statx+0x89/0xe0
 __do_sys_newstat+0x39/0x70
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f7dc6195579
Code: Bad RIP value.
RSP: 002b:00007ffd44e725a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 000055ad9832ef80 RCX: 00007f7dc6195579
RDX: 00007ffd44e72650 RSI: 00007ffd44e72650 RDI: 000055ad9832e380
RBP: 000055ad9832e310 R08: 0000000000000003 R09: aaaaaaaaaaaaaaab
R10: 000055ad98326010 R11: 0000000000000246 R12: 000055ad9832e3c0
R13: 0000000000000000 R14: 000055ad9832b9b0 R15: 0000000000000000


And others on writes:

mozStorage #1   D    0 24816   9621 0x00000004
Call Trace:
 ? __schedule+0x253/0x850
 ? __switch_to_asm+0x40/0x70
 schedule+0x28/0x80
 io_schedule+0x12/0x40
 __lock_page+0x13c/0x230
 ? file_check_and_advance_wb_err+0xc0/0xc0
 pagecache_get_page+0x1ae/0x340
 grab_cache_page_write_begin+0x1c/0x40
 nfs_write_begin+0x63/0x340 [nfs]
 generic_perform_write+0xf4/0x1b0
 ? __handle_mm_fault+0xb9d/0x1380
 nfs_file_write+0xdc/0x200 [nfs]
 __vfs_write+0x136/0x1a0
 vfs_write+0xa5/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7eff1d131d57
Code: Bad RIP value.
RSP: 002b:00007eff00546380 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000002d RCX: 00007eff1d131d57
RDX: 0000000000008000 RSI: 00007efef8085000 RDI: 000000000000002d
RBP: 00007efef8085000 R08: 0000000000000000 R09: 00007ffc7d3b4080
R10: 000000000274d526 R11: 0000000000000293 R12: 0000000000008000
R13: 00007eff08604868 R14: 00000000000300c8 R15: 00007efef8085000


Not sure what this one is waiting on:

kdm-uh          D    0 25357  25345 0x00000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __switch_to_asm+0x34/0x70
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 ? __rpc_atrun+0x20/0x20 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 ? trace_event_raw_event_rpc_stats_latency+0x240/0x240 [sunrpc]
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 __rpc_execute+0xe0/0x350 [sunrpc]
 ? recalibrate_cpu_khz+0x10/0x10
 ? ktime_get+0x36/0xa0
 rpc_run_task+0x121/0x180 [sunrpc]
 nfs4_call_sync_sequence+0x64/0xa0 [nfsv4]
 _nfs4_proc_getattr+0xf8/0x120 [nfsv4]
 nfs4_proc_getattr+0x75/0x100 [nfsv4]
 __nfs_revalidate_inode+0x1b7/0x2a0 [nfs]
 nfs_do_access+0x2cd/0x400 [nfs]
 ? rpcauth_lookupcred+0x99/0xc0 [sunrpc]
 nfs_permission+0x1b9/0x1e0 [nfs]
 inode_permission+0xbe/0x180
 link_path_walk.part.49+0x3f2/0x520
 path_openat+0x9f/0x1610
 ? page_counter_try_charge+0x57/0xc0
 ? memcg_kmem_charge_memcg+0x76/0xa0
 ? get_random_u32+0x3e/0xd0
 do_filp_open+0x93/0x100
 ? __check_object_size+0x15d/0x189
 do_sys_open+0x186/0x210
 do_syscall_64+0x5b/0x160
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f0fc5dca2
Code: Bad RIP value.
RSP: 002b:00007ffc18c62840 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 000055a70650bfa0 RCX: 00007f3f0fc5dca2
RDX: 0000000000000800 RSI: 000055a70650bcf0 RDI: 00000000ffffff9c
RBP: 0000000000001e34 R08: 000055a70650c3b0 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffc18c62a30 R14: 000055a70650bcf0 R15: 00007ffc18c62afc

And there's one kernel thread, I assume for the mount since that's the
IP of the NFS server.

172.21.86.74-ma D    0 25099      2 0x80000000
Call Trace:
 ? __schedule+0x253/0x850
 ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
 schedule+0x28/0x80
 rpc_wait_bit_killable+0x1e/0x90 [sunrpc]
 __wait_on_bit+0x6c/0x80
 out_of_line_wait_on_bit+0x91/0xb0
 ? init_wait_var_entry+0x40/0x40
 nfs4_proc_sequence+0x3c/0x50 [nfsv4]
 nfs4_run_state_manager+0x257/0x760 [nfsv4]
 ? nfs4_do_reclaim+0x680/0x680 [nfsv4]
 kthread+0x112/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x35/0x40


  reply	other threads:[~2019-01-24 20:09 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-24 17:32 Need help debugging NFS issues new to 4.20 kernel Jason L Tibbitts III
2019-01-24 19:28 ` Jason L Tibbitts III [this message]
2019-01-24 19:58 ` Trond Myklebust
2019-01-25 19:13   ` Schumaker, Anna
2019-01-26 17:59     ` Sasha Levin
2019-01-25 19:51   ` Jason L Tibbitts III
2019-02-05 18:12     ` Jason Tibbitts
2019-02-06 12:05       ` Benjamin Coddington
     [not found]         ` <87imxwab12.fsf@hippogriff.math.uh.edu>
2019-02-07 11:13           ` Benjamin Coddington
     [not found]             ` <87d0o3aadg.fsf@hippogriff.math.uh.edu>
2019-02-08 12:01               ` Benjamin Coddington
2019-02-08 15:19                 ` Chuck Lever
2019-02-08 17:17                   ` Jason L Tibbitts III
2019-02-15 20:33                 ` Jason L Tibbitts III
2019-02-16 14:46                   ` Trond Myklebust
2019-02-20  2:13                     ` Jason L Tibbitts III
2019-02-20 15:25                     ` Jason L Tibbitts III
2019-02-20 15:37                       ` Trond Myklebust
2019-02-20 15:39                         ` Chuck Lever
2019-02-20 15:41                         ` Trond Myklebust
2019-02-21 18:19                           ` Jason L Tibbitts III
2019-02-25 19:24                             ` Jason L Tibbitts III
2019-02-25 23:15                               ` Benjamin Coddington
2019-02-20 16:25                         ` Jason L Tibbitts III
2019-02-20 16:45                           ` Trond Myklebust
2019-02-20 16:49                             ` Jason L Tibbitts III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ufaa7jpam8h.fsf@epithumia.math.uh.edu \
    --to=tibbs@math.uh.edu \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).