All of lore.kernel.org
 help / color / mirror / Atom feed
From: Scott Mayhew <smayhew@redhat.com>
To: NeilBrown <neilb@suse.de>
Cc: "J.  Bruce Fields" <bfields@fieldses.org>,
	Timothy Pearson <tpearson@raptorengineering.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org, Trond Myklebust <trondmy@gmail.com>
Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
Date: Fri, 8 Oct 2021 16:27:10 -0400	[thread overview]
Message-ID: <YWCpnsdVqssFaLrf@aion.usersys.redhat.com> (raw)
In-Reply-To: <162880418532.15074.7140645794203395299@noble.neil.brown.name>

On Fri, 13 Aug 2021, NeilBrown wrote:

> On Fri, 13 Aug 2021, J.  Bruce Fields wrote:
> > On Tue, Aug 10, 2021 at 10:43:31AM +1000, NeilBrown wrote:
> > > 
> > > The problem here appears to be that a signalled task is being retried
> > > without clearing the SIGNALLED flag.  That is causing the infinite loop
> > > and the soft lockup.
> > > 
> > > This bug appears to have been introduced in Linux 5.2 by
> > > Commit: ae67bd3821bb ("SUNRPC: Fix up task signalling")
> > 
> > I wonder how we arrived here.  Does it require that an rpc task returns
> > from one of those rpc_delay() calls just as rpc_shutdown_client() is
> > signalling it?  That's the only way async tasks get signalled, I think.
> 
> I don't think "just as" is needed.
> I think it could only happen if rpc_shutdown_client() were called when
> there were active tasks - presumably from nfsd4_process_cb_update(), but
> I don't know the callback code well.
> If any of those active tasks has a ->done handler which might try to
> reschedule the task when tk_status == -ERESTARTSYS, then you get into
> the infinite loop.

This thread seems to have fizzled out, but I'm pretty sure I hit this
during the Virtual Bakeathon yesterday.  My server was unresponsive but
I eventually managed to get a vmcore.

[182411.119788] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f2f40905 xid 5d83adfb
[182437.775113] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u4:1:216458]
[182437.775633] Modules linked in: nfs_layout_flexfiles nfsv3 nfs_layout_nfsv41_files bluetooth ecdh_generic ecc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netfs rpcrdma rdma_cm iw_cm ib_cm ib_core tun rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib isofs cdrom nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink xfs intel_rapl_msr intel_rapl_common libcrc32c kvm_intel qxl kvm drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops irqbypass cec joydev virtio_balloon pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm fuse ext4 mbcache jbd2 ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net serio_raw ata_piix net_failover libata virtio_blk virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod
[182437.780157] CPU: 1 PID: 216458 Comm: kworker/u4:1 Kdump: loaded Not tainted 5.14.0-5.el9.x86_64 #1
[182437.780894] Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
[182437.781567] Workqueue: rpciod rpc_async_schedule [sunrpc]
[182437.782500] RIP: 0010:try_to_grab_pending+0x12/0x160
[182437.783104] Code: e7 e8 72 f3 ff ff e9 6e ff ff ff 48 89 df e8 65 f3 ff ff eb b7 0f 1f 00 0f 1f 44 00 00 41 55 41 54 55 48 89 d5 53 48 89 fb 9c <58> fa 48 89 02 40 84 f6 0f 85 92 00 00 00 f0 48 0f ba 2b 00 72 09
[182437.784261] RSP: 0018:ffffb5b24066fd30 EFLAGS: 00000246
[182437.785052] RAX: 0000000000000000 RBX: ffffffffc05e0768 RCX: 00000000000007d0
[182437.785760] RDX: ffffb5b24066fd60 RSI: 0000000000000001 RDI: ffffffffc05e0768
[182437.786399] RBP: ffffb5b24066fd60 R08: ffffffffc05e0708 R09: ffffffffc05e0708
[182437.787010] R10: 0000000000000003 R11: 0000000000000003 R12: ffffffffc05e0768
[182437.787621] R13: ffff9daa40312400 R14: 00000000000007d0 R15: 0000000000000000
[182437.788235] FS:  0000000000000000(0000) GS:ffff9daa5bd00000(0000) knlGS:0000000000000000
[182437.788859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[182437.789483] CR2: 00007f8f73d5d828 CR3: 000000008a010003 CR4: 00000000001706e0
[182437.790188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[182437.790831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[182437.791522] Call Trace:
[182437.792183]  mod_delayed_work_on+0x3c/0x90
[182437.792866]  __rpc_sleep_on_priority_timeout+0x107/0x110 [sunrpc]
[182437.793553]  rpc_delay+0x56/0x90 [sunrpc]
[182437.794236]  nfsd4_cb_sequence_done+0x202/0x290 [nfsd]
[182437.794910]  nfsd4_cb_done+0x18/0xf0 [nfsd]
[182437.795974]  rpc_exit_task+0x58/0x100 [sunrpc]
[182437.796955]  ? rpc_do_put_task+0x60/0x60 [sunrpc]
[182437.797645]  __rpc_execute+0x5e/0x250 [sunrpc]
[182437.798375]  rpc_async_schedule+0x29/0x40 [sunrpc]
[182437.799043]  process_one_work+0x1e6/0x380
[182437.799703]  worker_thread+0x53/0x3d0
[182437.800393]  ? process_one_work+0x380/0x380
[182437.801029]  kthread+0x10f/0x130
[182437.801686]  ? set_kthread_struct+0x40/0x40
[182437.802333]  ret_from_fork+0x22/0x30

The process causing the soft lockup warnings:

crash> set 216458
    PID: 216458
COMMAND: "kworker/u4:1"
   TASK: ffff9da94281e200  [THREAD_INFO: ffff9da94281e200]
    CPU: 0
  STATE: TASK_RUNNING (ACTIVE)
crash> bt
PID: 216458  TASK: ffff9da94281e200  CPU: 0   COMMAND: "kworker/u4:1"
 #0 [fffffe000000be50] crash_nmi_callback at ffffffffb3055c31
 #1 [fffffe000000be58] nmi_handle at ffffffffb30268f8
 #2 [fffffe000000bea0] default_do_nmi at ffffffffb3a36d42
 #3 [fffffe000000bec8] exc_nmi at ffffffffb3a36f49
 #4 [fffffe000000bef0] end_repeat_nmi at ffffffffb3c013cb
    [exception RIP: add_timer]
    RIP: ffffffffb317c230  RSP: ffffb5b24066fd58  RFLAGS: 00000046
    RAX: 000000010b3585fc  RBX: 0000000000000000  RCX: 00000000000007d0
    RDX: ffffffffc05e0768  RSI: ffff9daa40312400  RDI: ffffffffc05e0788
    RBP: 0000000000002000   R8: ffffffffc05e0770   R9: ffffffffc05e0788
    R10: 0000000000000003  R11: 0000000000000003  R12: ffffffffc05e0768
    R13: ffff9daa40312400  R14: 00000000000007d0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffffb5b24066fd58] add_timer at ffffffffb317c230
 #6 [ffffb5b24066fd58] mod_delayed_work_on at ffffffffb3103247
 #7 [ffffb5b24066fd98] __rpc_sleep_on_priority_timeout at ffffffffc0580547 [sunrpc]
 #8 [ffffb5b24066fdc8] rpc_delay at ffffffffc0589ed6 [sunrpc]
 #9 [ffffb5b24066fde8] nfsd4_cb_sequence_done at ffffffffc06731b2 [nfsd]
#10 [ffffb5b24066fe10] nfsd4_cb_done at ffffffffc0673258 [nfsd]
#11 [ffffb5b24066fe30] rpc_exit_task at ffffffffc05800a8 [sunrpc]
#12 [ffffb5b24066fe40] __rpc_execute at ffffffffc0589fee [sunrpc]
#13 [ffffb5b24066fe70] rpc_async_schedule at ffffffffc058a209 [sunrpc]
#14 [ffffb5b24066fe88] process_one_work at ffffffffb31026c6
#15 [ffffb5b24066fed0] worker_thread at ffffffffb31028b3
#16 [ffffb5b24066ff10] kthread at ffffffffb310960f
#17 [ffffb5b24066ff50] ret_from_fork at ffffffffb30034f2

Looking at the rpc_task being executed:

crash> rpc_task.tk_status,tk_callback,tk_action,tk_runstate,tk_client,tk_flags ffff9da94120bd00
  tk_status = 0x0
  tk_callback = 0xffffffffc057bc60 <__rpc_atrun>
  tk_action = 0xffffffffc0571f20 <call_start>
  tk_runstate = 0x47
  tk_client = 0xffff9da958909c00
  tk_flags = 0x2281

tk_runstate has the following flags set: RPC_TASK_SIGNALLED, RPC_TASK_ACTIVE,
RPC_TASK_QUEUED, and RPC_TASK_RUNNING.

tk_flags is RPC_TASK_NOCONNECT|RPC_TASK_SOFT|RPC_TASK_DYNAMIC|RPC_TASK_ASYNC.

There's another kworker thread calling rpc_shutdown_client() via
nfsd4_process_cb_update():

crash> bt 0x342a3
PID: 213667  TASK: ffff9daa4fde9880  CPU: 1   COMMAND: "kworker/u4:4"
 #0 [ffffb5b24077bbe0] __schedule at ffffffffb3a40ec6
 #1 [ffffb5b24077bc60] schedule at ffffffffb3a4124c
 #2 [ffffb5b24077bc78] schedule_timeout at ffffffffb3a45058
 #3 [ffffb5b24077bcd0] rpc_shutdown_client at ffffffffc056fbb3 [sunrpc]
 #4 [ffffb5b24077bd20] nfsd4_process_cb_update at ffffffffc0672c6c [nfsd]
 #5 [ffffb5b24077be68] nfsd4_run_cb_work at ffffffffc0672f0f [nfsd]
 #6 [ffffb5b24077be88] process_one_work at ffffffffb31026c6
 #7 [ffffb5b24077bed0] worker_thread at ffffffffb31028b3
 #8 [ffffb5b24077bf10] kthread at ffffffffb310960f
 #9 [ffffb5b24077bf50] ret_from_fork at ffffffffb30034f2

The rpc_clnt being shut down is:

crash> nfs4_client.cl_cb_client ffff9daa454db808
  cl_cb_client = 0xffff9da958909c00

Which is the same as the tk_client for the rpc_task being executed by the
thread triggering the soft lockup warnings.

-Scott

> 
> > 
> > > Prior to this commit a flag RPC_TASK_KILLED was used, and it gets
> > > cleared by rpc_reset_task_statistics() (called from rpc_exit_task()).
> > > After this commit a new flag RPC_TASK_SIGNALLED is used, and it is never
> > > cleared.
> > > 
> > > A fix might be to clear RPC_TASK_SIGNALLED in
> > > rpc_reset_task_statistics(), but I'll leave that decision to someone
> > > else.
> > 
> > Might be worth testing with that change just to verify that this is
> > what's happening.
> > 
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index c045f63d11fa..caa931888747 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -813,7 +813,8 @@ static void
> >  rpc_reset_task_statistics(struct rpc_task *task)
> >  {
> >  	task->tk_timeouts = 0;
> > -	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SENT);
> > +	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SIGNALLED|
> > +							RPC_TASK_SENT);
> 
> NONONONONO.
> RPC_TASK_SIGNALLED is a flag in tk_runstate.
> So you need
> 	clear_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
> 
> NeilBrown
> 


  reply	other threads:[~2021-10-08 20:27 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-05  9:44 CPU stall, eventual host hang with BTRFS + NFS under heavy load Timothy Pearson
2021-07-05  9:47 ` Timothy Pearson
2021-07-23 21:01   ` J. Bruce Fields
     [not found]   ` <B4D8C4B7-EE8C-456C-A6C5-D25FF1F3608E@rutgers.edu>
     [not found]     ` <3A4DF3BB-955C-4301-BBED-4D5F02959F71@rutgers.edu>
2021-08-09 17:06       ` Timothy Pearson
2021-08-09 17:15         ` hedrick
2021-08-09 17:25           ` Timothy Pearson
2021-08-09 17:37           ` Chuck Lever III
     [not found]             ` <F5179A41-FB9A-4AB1-BE58-C2859DB7EC06@rutgers.edu>
2021-08-09 18:30               ` Timothy Pearson
2021-08-09 18:38                 ` hedrick
2021-08-09 18:44                   ` Timothy Pearson
2021-08-09 18:49                   ` J. Bruce Fields
     [not found]                     ` <15AD846A-4638-4ACF-B47C-8EF655AD6E85@rutgers.edu>
2021-08-09 18:56                       ` Timothy Pearson
2021-08-09 20:54                         ` Charles Hedrick
2021-08-09 21:49                           ` Timothy Pearson
2021-08-09 22:01                             ` Charles Hedrick
     [not found]             ` <1119B476-171F-4C5A-9DEF-184F211A6A98@rutgers.edu>
2021-08-10 16:22               ` Timothy Pearson
2021-08-16 14:43                 ` hedrick
2021-08-09 18:30           ` J. Bruce Fields
2021-08-09 18:34             ` hedrick
     [not found]             ` <413163A6-8484-4170-9877-C0C2D50B13C0@rutgers.edu>
2021-08-10 14:58               ` J. Bruce Fields
2021-07-23 21:00 ` J. Bruce Fields
2021-07-23 21:22   ` Timothy Pearson
2021-07-28 19:51     ` Timothy Pearson
2021-08-02 19:28       ` J. Bruce Fields
2021-08-10  0:43 ` NeilBrown
2021-08-10  0:54   ` J.  Bruce Fields
2021-08-12 14:44   ` J.  Bruce Fields
2021-08-12 21:36     ` NeilBrown
2021-10-08 20:27       ` Scott Mayhew [this message]
2021-10-08 20:53         ` Timothy Pearson
2021-10-08 21:11         ` J.  Bruce Fields
2021-10-09 17:33         ` Chuck Lever III
2021-10-11 14:30           ` Bruce Fields
2021-10-11 16:36             ` Chuck Lever III
2021-10-11 21:57               ` NeilBrown
2021-10-14 22:36                 ` Trond Myklebust
2021-10-14 22:51                   ` NeilBrown
2021-10-15  8:03                     ` Trond Myklebust
2021-10-15  8:05                       ` Trond Myklebust
2021-12-01 18:36                         ` Scott Mayhew
2021-12-01 19:35                           ` Trond Myklebust
2021-12-01 20:13                             ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YWCpnsdVqssFaLrf@aion.usersys.redhat.com \
    --to=smayhew@redhat.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=tpearson@raptorengineering.com \
    --cc=trondmy@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.