All of lore.kernel.org
 help / color / mirror / Atom feed
From: "NeilBrown" <neilb@suse.de>
To: "J.  Bruce Fields" <bfields@fieldses.org>
Cc: "Timothy Pearson" <tpearson@raptorengineering.com>,
	"Chuck Lever" <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org, "Trond Myklebust" <trondmy@gmail.com>
Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
Date: Fri, 13 Aug 2021 07:36:25 +1000	[thread overview]
Message-ID: <162880418532.15074.7140645794203395299@noble.neil.brown.name> (raw)
In-Reply-To: <20210812144428.GA9536@fieldses.org>

On Fri, 13 Aug 2021, J.  Bruce Fields wrote:
> On Tue, Aug 10, 2021 at 10:43:31AM +1000, NeilBrown wrote:
> > 
> > The problem here appears to be that a signalled task is being retried
> > without clearing the SIGNALLED flag.  That is causing the infinite loop
> > and the soft lockup.
> > 
> > This bug appears to have been introduced in Linux 5.2 by
> > Commit: ae67bd3821bb ("SUNRPC: Fix up task signalling")
> 
> I wonder how we arrived here.  Does it require that an rpc task returns
> from one of those rpc_delay() calls just as rpc_shutdown_client() is
> signalling it?  That's the only way async tasks get signalled, I think.

I don't think "just as" is needed.
I think it could only happen if rpc_shutdown_client() were called when
there were active tasks - presumably from nfsd4_process_cb_update(), but
I don't know the callback code well.
If any of those active tasks has a ->done handler which might try to
reschedule the task when tk_status == -ERESTARTSYS, then you get into
the infinite loop.

> 
> > Prior to this commit a flag RPC_TASK_KILLED was used, and it gets
> > cleared by rpc_reset_task_statistics() (called from rpc_exit_task()).
> > After this commit a new flag RPC_TASK_SIGNALLED is used, and it is never
> > cleared.
> > 
> > A fix might be to clear RPC_TASK_SIGNALLED in
> > rpc_reset_task_statistics(), but I'll leave that decision to someone
> > else.
> 
> Might be worth testing with that change just to verify that this is
> what's happening.
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index c045f63d11fa..caa931888747 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -813,7 +813,8 @@ static void
>  rpc_reset_task_statistics(struct rpc_task *task)
>  {
>  	task->tk_timeouts = 0;
> -	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SENT);
> +	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SIGNALLED|
> +							RPC_TASK_SENT);

NONONONONO.
RPC_TASK_SIGNALLED is a flag in tk_runstate.
So you need
	clear_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);

NeilBrown

  reply	other threads:[~2021-08-12 21:36 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-05  9:44 CPU stall, eventual host hang with BTRFS + NFS under heavy load Timothy Pearson
2021-07-05  9:47 ` Timothy Pearson
2021-07-23 21:01   ` J. Bruce Fields
     [not found]   ` <B4D8C4B7-EE8C-456C-A6C5-D25FF1F3608E@rutgers.edu>
     [not found]     ` <3A4DF3BB-955C-4301-BBED-4D5F02959F71@rutgers.edu>
2021-08-09 17:06       ` Timothy Pearson
2021-08-09 17:15         ` hedrick
2021-08-09 17:25           ` Timothy Pearson
2021-08-09 17:37           ` Chuck Lever III
     [not found]             ` <F5179A41-FB9A-4AB1-BE58-C2859DB7EC06@rutgers.edu>
2021-08-09 18:30               ` Timothy Pearson
2021-08-09 18:38                 ` hedrick
2021-08-09 18:44                   ` Timothy Pearson
2021-08-09 18:49                   ` J. Bruce Fields
     [not found]                     ` <15AD846A-4638-4ACF-B47C-8EF655AD6E85@rutgers.edu>
2021-08-09 18:56                       ` Timothy Pearson
2021-08-09 20:54                         ` Charles Hedrick
2021-08-09 21:49                           ` Timothy Pearson
2021-08-09 22:01                             ` Charles Hedrick
     [not found]             ` <1119B476-171F-4C5A-9DEF-184F211A6A98@rutgers.edu>
2021-08-10 16:22               ` Timothy Pearson
2021-08-16 14:43                 ` hedrick
2021-08-09 18:30           ` J. Bruce Fields
2021-08-09 18:34             ` hedrick
     [not found]             ` <413163A6-8484-4170-9877-C0C2D50B13C0@rutgers.edu>
2021-08-10 14:58               ` J. Bruce Fields
2021-07-23 21:00 ` J. Bruce Fields
2021-07-23 21:22   ` Timothy Pearson
2021-07-28 19:51     ` Timothy Pearson
2021-08-02 19:28       ` J. Bruce Fields
2021-08-10  0:43 ` NeilBrown
2021-08-10  0:54   ` J.  Bruce Fields
2021-08-12 14:44   ` J.  Bruce Fields
2021-08-12 21:36     ` NeilBrown [this message]
2021-10-08 20:27       ` Scott Mayhew
2021-10-08 20:53         ` Timothy Pearson
2021-10-08 21:11         ` J.  Bruce Fields
2021-10-09 17:33         ` Chuck Lever III
2021-10-11 14:30           ` Bruce Fields
2021-10-11 16:36             ` Chuck Lever III
2021-10-11 21:57               ` NeilBrown
2021-10-14 22:36                 ` Trond Myklebust
2021-10-14 22:51                   ` NeilBrown
2021-10-15  8:03                     ` Trond Myklebust
2021-10-15  8:05                       ` Trond Myklebust
2021-12-01 18:36                         ` Scott Mayhew
2021-12-01 19:35                           ` Trond Myklebust
2021-12-01 20:13                             ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=162880418532.15074.7140645794203395299@noble.neil.brown.name \
    --to=neilb@suse.de \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tpearson@raptorengineering.com \
    --cc=trondmy@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.