* Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
@ 2021-08-16 19:19 David Chen
  2021-08-16 19:30 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: David Chen @ 2021-08-16 19:19 UTC (permalink / raw)
  To: stable; +Cc: Greg Kroah-Hartman, Paul E. McKenney, neeraju

Hi Greg,

We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
Can we backport it to the 4.14 and 4.19 branches?
The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:

-		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
+		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);

I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.

Thanks,
David


* Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-16 19:19 Request for backport fd6bc19d7676 to 4.14 and 4.19 branch David Chen
@ 2021-08-16 19:30 ` Greg Kroah-Hartman
  2021-08-16 22:02   ` David Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-16 19:30 UTC (permalink / raw)
  To: David Chen; +Cc: stable, Paul E. McKenney, neeraju

On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> Hi Greg,
> 
> We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> Can we backport it to the 4.14 and 4.19 branches?
> The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> 
> -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> 
> I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.

Please provide a working backport, me hand-editing patches does not
scale, and this way you get the proper credit for backporting it (after
testing it).

You have tested this, right?

thanks,

greg k-h


* RE: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-16 19:30 ` Greg Kroah-Hartman
@ 2021-08-16 22:02   ` David Chen
  2021-08-17  6:16     ` Greg Kroah-Hartman
  2021-08-18  7:19     ` Neeraj Upadhyay
  0 siblings, 2 replies; 9+ messages in thread
From: David Chen @ 2021-08-16 22:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: stable, Paul E. McKenney, neeraju



> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Sent: Monday, August 16, 2021 12:31 PM
> To: David Chen <david.chen@nutanix.com>
> Cc: stable@vger.kernel.org; Paul E. McKenney
> <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> 
> On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > Hi Greg,
> >
> > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > Can we backport it to the 4.14 and 4.19 branches?
> > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> >
> > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> >
> > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> 
> Please provide a working backport, me hand-editing patches does not scale,
> and this way you get the proper credit for backporting it (after testing it).

Sure, appended at the end.

> 
> You have tested this, right?

I don't have a good repro for the original issue, so I only ran rcutorture and
some basic workload tests to see if anything obvious went wrong.

> 
> thanks,
> 
> greg k-h

--------

From 307a212335fe143027e3a9f7a9d548beead7ba33 Mon Sep 17 00:00:00 2001
From: Neeraj Upadhyay <neeraju@codeaurora.org>
Date: Tue, 19 Nov 2019 03:17:07 +0000
Subject: [PATCH] rcu: Fix missed wakeup of exp_wq waiters

[ Upstream commit fd6bc19d7676a060a171d1cf3dcbf6fd797eb05f ]

Tasks waiting within exp_funnel_lock() for an expedited grace period to
elapse can be starved due to the following sequence of events:

1.	Tasks A and B both attempt to start an expedited grace
	period at about the same time.	This grace period will have
	completed when the lower four bits of the rcu_state structure's
	->expedited_sequence field are 0b'0100', for example, when the
	initial value of this counter is zero.	Task A wins, and thus
	does the actual work of starting the grace period, including
	acquiring the rcu_state structure's .exp_mutex and sets the
	counter to 0b'0001'.

2.	Because task B lost the race to start the grace period, it
	waits on ->expedited_sequence to reach 0b'0100' inside of
	exp_funnel_lock(). This task therefore blocks on the rcu_node
	structure's ->exp_wq[1] field, keeping in mind that the
	end-of-grace-period value of ->expedited_sequence (0b'0100')
	is shifted down two bits before indexing the ->exp_wq[] field.

3.	Task C attempts to start another expedited grace period,
	but blocks on ->exp_mutex, which is still held by Task A.

4.	The aforementioned expedited grace period completes, so that
	->expedited_sequence now has the value 0b'0100'.  A kworker task
	therefore acquires the rcu_state structure's ->exp_wake_mutex
	and starts awakening any tasks waiting for this grace period.

5.	One of the first tasks awakened happens to be Task A.  Task A
	therefore releases the rcu_state structure's ->exp_mutex,
	which allows Task C to start the next expedited grace period,
	which causes the lower four bits of the rcu_state structure's
	->expedited_sequence field to become 0b'0101'.

6.	Task C's expedited grace period completes, so that the lower four
	bits of the rcu_state structure's ->expedited_sequence field now
	become 0b'1000'.

7.	The kworker task from step 4 above continues its wakeups.
	Unfortunately, the wake_up_all() refetches the rcu_state
	structure's .expedited_sequence field:

	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

	This results in the wakeup being applied to the rcu_node
	structure's ->exp_wq[2] field, which is unfortunate given that
	Task B is instead waiting on ->exp_wq[1].
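
To make the indexing concrete, here is a minimal user-space sketch of
the slot arithmetic (RCU_SEQ_CTR_SHIFT mirrors kernel/rcu/rcu.h; the
two sequence values come from the steps above):

	#include <stdio.h>

	#define RCU_SEQ_CTR_SHIFT 2

	static unsigned long rcu_seq_ctr(unsigned long s)
	{
		return s >> RCU_SEQ_CTR_SHIFT;
	}

	int main(void)
	{
		unsigned long s = 0x4;    /* 0b'0100: end of the GP Task B waits for */
		unsigned long live = 0x8; /* 0b'1000: ->expedited_sequence at step 7 */

		/* Slot from the snapshot "s": exp_wq[1], where Task B sleeps. */
		printf("exp_wq[%lu]\n", rcu_seq_ctr(s) & 0x3);

		/* Slot from the refetched live counter: exp_wq[2], missing Task B. */
		printf("exp_wq[%lu]\n", rcu_seq_ctr(live) & 0x3);
		return 0;
	}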

On a busy system, no harm is done (or at least no permanent harm is done).
Some later expedited grace period will redo the wakeup.  But on a quiet
system, such as many embedded systems, it might be a good long time before
there was another expedited grace period.  On such embedded systems,
this situation could therefore result in a system hang.

This issue manifested as DPM device timeout during suspend (which
usually qualifies as a quiet time) due to a SCSI device being stuck in
_synchronize_rcu_expedited(), with the following stack trace:

	schedule()
	synchronize_rcu_expedited()
	synchronize_rcu()
	scsi_device_quiesce()
	scsi_bus_suspend()
	dpm_run_callback()
	__device_suspend()

This commit therefore prevents such delays, timeouts, and hangs by
making rcu_exp_wait_wake() use its "s" argument consistently instead of
refetching from rcu_state.expedited_sequence.

Fixes: 3b5f668e715b ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: David Chen <david.chen@nutanix.com>
---
 kernel/rcu/tree_exp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 46d61b597731..f90d10c1c3c8 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -534,7 +534,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 			spin_unlock(&rnp->exp_lock);
 		}
 		smp_mb(); /* All above changes before wakeup. */
-		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
+		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
 	}
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
 	mutex_unlock(&rsp->exp_wake_mutex);
-- 
2.22.3



* Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-16 22:02   ` David Chen
@ 2021-08-17  6:16     ` Greg Kroah-Hartman
  2021-08-17 18:47       ` David Chen
  2021-08-18  7:19     ` Neeraj Upadhyay
  1 sibling, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-17  6:16 UTC (permalink / raw)
  To: David Chen; +Cc: stable, Paul E. McKenney, neeraju

On Mon, Aug 16, 2021 at 10:02:28PM +0000, David Chen wrote:
> 
> 
> > -----Original Message-----
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Sent: Monday, August 16, 2021 12:31 PM
> > To: David Chen <david.chen@nutanix.com>
> > Cc: stable@vger.kernel.org; Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > 
> > On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > > Hi Greg,
> > >
> > > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > > Can we backport it to the 4.14 and 4.19 branches?
> > > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> > >
> > > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> > >
> > > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> > 
> > Please provide a working backport, me hand-editing patches does not scale,
> > and this way you get the proper credit for backporting it (after testing it).
> 
> Sure, appended at the end.
> 
> > 
> > You have tested this, right?
> 
> I don't have a good repro for the original issue, so I only ran rcutorture and
> some basic workload tests to see if anything obvious went wrong.

Ideally you would be able to also hit this without the patch on the
older kernels, is this the case?

thanks,

greg k-h


* RE: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-17  6:16     ` Greg Kroah-Hartman
@ 2021-08-17 18:47       ` David Chen
  2021-08-18  6:55         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: David Chen @ 2021-08-17 18:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: stable, Paul E. McKenney, neeraju



> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Sent: Monday, August 16, 2021 11:16 PM
> To: David Chen <david.chen@nutanix.com>
> Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> 
> On Mon, Aug 16, 2021 at 10:02:28PM +0000, David Chen wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Sent: Monday, August 16, 2021 12:31 PM
> > > To: David Chen <david.chen@nutanix.com>
> > > Cc: stable@vger.kernel.org; Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > >
> > > On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > > > Hi Greg,
> > > >
> > > > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > > > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > > > Can we backport it to the 4.14 and 4.19 branches?
> > > > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> > > >
> > > > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > > > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> > > >
> > > > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> > >
> > > Please provide a working backport, me hand-editing patches does not scale,
> > > and this way you get the proper credit for backporting it (after testing it).
> >
> > Sure, appended at the end.
> >
> > >
> > > You have tested this, right?
> >
> > I don't have a good repro for the original issue, so I only ran rcutorture and
> > some basic workload tests to see if anything obvious went wrong.
> 
> Ideally you would be able to also hit this without the patch on the
> older kernels, is this the case?
> 
So far we've only seen this once. I was able to figure out the issue from the vmcore,
but I haven't been able to reproduce it. I think the nature of the bug makes it
very difficult to hit: it requires a race with synchronize_rcu_expedited, and once
a thread hangs, you can't just call synchronize_rcu_expedited again, because the
next expedited grace period might rescue the hung thread.

> thanks,
> 
> greg k-h


* Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-17 18:47       ` David Chen
@ 2021-08-18  6:55         ` Greg Kroah-Hartman
  2021-08-19  0:28           ` David Chen
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-18  6:55 UTC (permalink / raw)
  To: David Chen; +Cc: stable, Paul E. McKenney, neeraju

On Tue, Aug 17, 2021 at 06:47:45PM +0000, David Chen wrote:
> 
> 
> > -----Original Message-----
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Sent: Monday, August 16, 2021 11:16 PM
> > To: David Chen <david.chen@nutanix.com>
> > Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > 
> > On Mon, Aug 16, 2021 at 10:02:28PM +0000, David Chen wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Sent: Monday, August 16, 2021 12:31 PM
> > > > To: David Chen <david.chen@nutanix.com>
> > > > Cc: stable@vger.kernel.org; Paul E. McKenney
> > > > <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > > >
> > > > On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > > > > Hi Greg,
> > > > >
> > > > > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > > > > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > > > > Can we backport it to the 4.14 and 4.19 branches?
> > > > > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> > > > >
> > > > > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > > > > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> > > > >
> > > > > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> > > >
> > > > Please provide a working backport, me hand-editing patches does not scale,
> > > > and this way you get the proper credit for backporting it (after testing it).
> > >
> > > Sure, appended at the end.
> > >
> > > >
> > > > You have tested this, right?
> > >
> > > I don't have a good repro for the original issue, so I only ran rcutorture and
> > > some basic workload tests to see if anything obvious went wrong.
> > 
> > Ideally you would be able to also hit this without the patch on the
> > older kernels, is this the case?
> > 
> So far we've only seen this once. I was able to figure out the issue from the vmcore,
> but I haven't been able to reproduce it. I think the nature of the bug makes it
> very difficult to hit: it requires a race with synchronize_rcu_expedited, and once
> a thread hangs, you can't just call synchronize_rcu_expedited again, because the
> next expedited grace period might rescue the hung thread.

I would like a bit more verification that this is really needed, and
some acks from the developers/maintainers involved, before accepting
this change.

thanks,

greg k-h


* Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-16 22:02   ` David Chen
  2021-08-17  6:16     ` Greg Kroah-Hartman
@ 2021-08-18  7:19     ` Neeraj Upadhyay
  1 sibling, 0 replies; 9+ messages in thread
From: Neeraj Upadhyay @ 2021-08-18  7:19 UTC (permalink / raw)
  To: David Chen, Greg Kroah-Hartman; +Cc: stable, Paul E. McKenney



On 8/17/2021 3:32 AM, David Chen wrote:
> 
> 
>> -----Original Message-----
>> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Sent: Monday, August 16, 2021 12:31 PM
>> To: David Chen <david.chen@nutanix.com>
>> Cc: stable@vger.kernel.org; Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
>> Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
>>
>> On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
>>> Hi Greg,
>>>
>>> We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
>>> The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
>>> Can we backport it to the 4.14 and 4.19 branches?
>>> The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
>>>
>>> -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
>>> +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
>>>
>>> I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
>>
>> Please provide a working backport, me hand-editing patches does not scale,
>> and this way you get the proper credit for backporting it (after testing it).
> 
> Sure, appended at the end.
> 
>>
>> You have tested this, right?
> 
> I don't have a good repro for the original issue, so I only ran rcutorture and
> some basic workload tests to see if anything obvious went wrong.
> 
>>
>> thanks,
>>
>> greg k-h
> 
> --------
> 
>  From 307a212335fe143027e3a9f7a9d548beead7ba33 Mon Sep 17 00:00:00 2001
> From: Neeraj Upadhyay <neeraju@codeaurora.org>
> Date: Tue, 19 Nov 2019 03:17:07 +0000
> Subject: [PATCH] rcu: Fix missed wakeup of exp_wq waiters
> 
> [ Upstream commit fd6bc19d7676a060a171d1cf3dcbf6fd797eb05f ]
> 
> Tasks waiting within exp_funnel_lock() for an expedited grace period to
> elapse can be starved due to the following sequence of events:
> 
> 1.	Tasks A and B both attempt to start an expedited grace
> 	period at about the same time.	This grace period will have
> 	completed when the lower four bits of the rcu_state structure's
> 	->expedited_sequence field are 0b'0100', for example, when the
> 	initial value of this counter is zero.	Task A wins, and thus
> 	does the actual work of starting the grace period, including
> 	acquiring the rcu_state structure's .exp_mutex and sets the
> 	counter to 0b'0001'.
> 
> 2.	Because task B lost the race to start the grace period, it
> 	waits on ->expedited_sequence to reach 0b'0100' inside of
> 	exp_funnel_lock(). This task therefore blocks on the rcu_node
> 	structure's ->exp_wq[1] field, keeping in mind that the
> 	end-of-grace-period value of ->expedited_sequence (0b'0100')
> 	is shifted down two bits before indexing the ->exp_wq[] field.
> 
> 3.	Task C attempts to start another expedited grace period,
> 	but blocks on ->exp_mutex, which is still held by Task A.
> 
> 4.	The aforementioned expedited grace period completes, so that
> 	->expedited_sequence now has the value 0b'0100'.  A kworker task
> 	therefore acquires the rcu_state structure's ->exp_wake_mutex
> 	and starts awakening any tasks waiting for this grace period.
> 
> 5.	One of the first tasks awakened happens to be Task A.  Task A
> 	therefore releases the rcu_state structure's ->exp_mutex,
> 	which allows Task C to start the next expedited grace period,
> 	which causes the lower four bits of the rcu_state structure's
> 	->expedited_sequence field to become 0b'0101'.
> 
> 6.	Task C's expedited grace period completes, so that the lower four
> 	bits of the rcu_state structure's ->expedited_sequence field now
> 	become 0b'1000'.
> 
> 7.	The kworker task from step 4 above continues its wakeups.
> 	Unfortunately, the wake_up_all() refetches the rcu_state
> 	structure's .expedited_sequence field:
> 
> 	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

Minor: on these kernel versions we had an rsp pointer per RCU flavor,
whereas post-4.20 kernels have a single rcu_state, so the commit message
could be corrected here. The functionality is mostly unchanged and the
same fix is applicable.

> 
> 	This results in the wakeup being applied to the rcu_node
> 	structure's ->exp_wq[2] field, which is unfortunate given that
> 	Task B is instead waiting on ->exp_wq[1].
> 
> On a busy system, no harm is done (or at least no permanent harm is done).
> Some later expedited grace period will redo the wakeup.  But on a quiet
> system, such as many embedded systems, it might be a good long time before
> there was another expedited grace period.  On such embedded systems,
> this situation could therefore result in a system hang.
> 
> This issue manifested as DPM device timeout during suspend (which
> usually qualifies as a quiet time) due to a SCSI device being stuck in
> _synchronize_rcu_expedited(), with the following stack trace:
> 
> 	schedule()
> 	synchronize_rcu_expedited()
> 	synchronize_rcu()
> 	scsi_device_quiesce()
> 	scsi_bus_suspend()
> 	dpm_run_callback()
> 	__device_suspend()
> 
> This commit therefore prevents such delays, timeouts, and hangs by
> making rcu_exp_wait_wake() use its "s" argument consistently instead of
> refetching from rcu_state.expedited_sequence.
> 
> Fixes: 3b5f668e715b ("rcu: Overlap wakeups with next expedited grace period")
> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> Signed-off-by: David Chen <david.chen@nutanix.com>
> ---
>   kernel/rcu/tree_exp.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 46d61b597731..f90d10c1c3c8 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -534,7 +534,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
>   			spin_unlock(&rnp->exp_lock);
>   		}
>   		smp_mb(); /* All above changes before wakeup. */
> -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
>   	}
>   	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
>   	mutex_unlock(&rsp->exp_wake_mutex);
> 


Acked-by: Neeraj Upadhyay <neeraju@codeaurora.org>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation


* RE: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-18  6:55         ` Greg Kroah-Hartman
@ 2021-08-19  0:28           ` David Chen
  2021-09-23  7:52             ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: David Chen @ 2021-08-19  0:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: stable, Paul E. McKenney, neeraju



> -----Original Message-----
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Sent: Tuesday, August 17, 2021 11:55 PM
> To: David Chen <david.chen@nutanix.com>
> Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> 
> On Tue, Aug 17, 2021 at 06:47:45PM +0000, David Chen wrote:
> >
> >
> > > -----Original Message-----
> > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Sent: Monday, August 16, 2021 11:16 PM
> > > To: David Chen <david.chen@nutanix.com>
> > > Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > >
> > > On Mon, Aug 16, 2021 at 10:02:28PM +0000, David Chen wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > Sent: Monday, August 16, 2021 12:31 PM
> > > > > To: David Chen <david.chen@nutanix.com>
> > > > > Cc: stable@vger.kernel.org; Paul E. McKenney
> > > > > <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > > > >
> > > > > On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > > > > > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > > > > > Can we backport it to the 4.14 and 4.19 branches?
> > > > > > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> > > > > >
> > > > > > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > > > > > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> > > > > >
> > > > > > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> > > > >
> > > > > Please provide a working backport, me hand-editing patches does not scale,
> > > > > and this way you get the proper credit for backporting it (after testing it).
> > > >
> > > > Sure, appended at the end.
> > > >
> > > > >
> > > > > You have tested this, right?
> > > >
> > > > I don't have a good repro for the original issue, so I only ran rcutorture and
> > > > some basic workload tests to see if anything obvious went wrong.
> > >
> > > Ideally you would be able to also hit this without the patch on the
> > > older kernels, is this the case?
> > >
> > So far we've only seen this once. I was able to figure out the issue from the vmcore,
> > but I haven't been able to reproduce it. I think the nature of the bug makes it
> > very difficult to hit: it requires a race with synchronize_rcu_expedited, and once
> > a thread hangs, you can't just call synchronize_rcu_expedited again, because the
> > next expedited grace period might rescue the hung thread.
> 
> I would like a bit more verification that this is really needed, and
> some acks from the developers/maintainers involved, before accepting
> this change.
> 
https://lkml.org/lkml/2019/11/18/184
From the original discussion, Neeraj said they hit the issue on 4.9, 4.14 and 4.19 as well.
I also tried running with the "WARN_ON(s_low != exp_low);" mentioned above, without
the fix, and forced a schedule before "mutex_lock(&rsp->exp_wake_mutex);" to simulate
random latency from running on a VM. I was able to trigger the warning.
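
Roughly, the debug change looked like the sketch below (reconstructed;
the exact delay primitive and hunk placement may differ from what I ran,
but the WARN_ON condition and the lock site are as described above):

--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
+	/* Debug only: widen the race window before taking the wake mutex. */
+	schedule_timeout_uninterruptible(1);
 	mutex_lock(&rsp->exp_wake_mutex);
@@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 		smp_mb(); /* All above changes before wakeup. */
+		/* Fires when the live counter has already moved past "s". */
+		WARN_ON((rcu_seq_ctr(s) & 0x3) !=
+			(rcu_seq_ctr(rsp->expedited_sequence) & 0x3));
 		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);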

[  162.760480] WARNING: CPU: 2 PID: 1129 at kernel/rcu/tree_exp.h:549 rcu_exp_wait_wake+0x4a5/0x6c0
[  162.760482] Modules linked in: rcutorture torture nls_utf8 isofs nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_filter sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ttm aesni_intel crypto_simd drm_kms_helper drm sg joydev syscopyarea sysfillrect virtio_balloon sysimgblt fb_sys_fops i2c_piix4 input_leds pcspkr qemu_fw_cfg loop binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod sr_mod cdrom ata_generic pata_acpi virtio_net virtio_scsi ata_piix virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log sha3_generic authenc cmac wp512 twofish_generic twofish_x86_64 twofish_common
[  162.760509]  tea sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic seed salsa20_generic rmd320 rmd256 rmd160 rmd128 michael_mic md4 khazad fcrypt dm_crypt dm_mod dax des_generic deflate cts crc32c_intel ccm cast6_avx_x86_64 cast6_generic cast_common camellia_generic ablk_helper cryptd xts lrw glue_helper blowfish_generic blowfish_common arc4 ansi_cprng fuse [last unloaded: rcu_kprobe]
[  162.760524] CPU: 2 PID: 1129 Comm: kworker/2:3 Tainted: G        W  O    4.14.243-1.nutanix.20210810.test.el7.x86_64 #1
[  162.760524] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
[  162.760525] Workqueue: events wait_rcu_exp_gp
[  162.760526] task: ffffa083e92745c0 task.stack: ffffb29442cb8000
[  162.760527] RIP: 0010:rcu_exp_wait_wake+0x4a5/0x6c0
[  162.760527] RSP: 0018:ffffb29442cbbde8 EFLAGS: 00010206
[  162.760528] RAX: 0000000000000000 RBX: ffffffff932b43c0 RCX: 0000000000000000
[  162.760529] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
[  162.760529] RBP: ffffb29442cbbe58 R08: ffffffff932b43c0 R09: ffffb29442cbbd70
[  162.760530] R10: ffffb29442cbbba0 R11: 000000000000011b R12: ffffffff932b2440
[  162.760531] R13: 000000000000157c R14: ffffffff932b4240 R15: 0000000000000003
[  162.760531] FS:  0000000000000000(0000) GS:ffffa083efa80000(0000) knlGS:0000000000000000
[  162.760532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  162.760533] CR2: 00007f6d6d5160c8 CR3: 000000002320a001 CR4: 00000000001606e0
[  162.760535] Call Trace:
[  162.760537]  ? cpu_needs_another_gp+0x70/0x70
[  162.760538]  wait_rcu_exp_gp+0x2b/0x30
[  162.760539]  process_one_work+0x18f/0x3c0
[  162.760540]  worker_thread+0x35/0x3c0
[  162.760541]  kthread+0x128/0x140
[  162.760542]  ? process_one_work+0x3c0/0x3c0
[  162.760543]  ? __kthread_cancel_work+0x50/0x50
[  162.760544]  ret_from_fork+0x35/0x40
[  162.760545] Code: 4c 24 30 49 8b 94 24 10 13 04 00 48 c7 c7 d0 d7 05 93 0f 95 c0 48 2b 75 a8 44 0f be 80 d8 d2 05 93 e8 99 2f 70 00 e9 ae fe ff ff <0f> 0b e9 ec fc ff ff 65 8b 05 2d 40 f1 6d 89 c0 48 0f a3 05 d3
[  162.760570] ---[ end trace 2cc2ddd257a55220 ]---

The warning firing means that the waker skipped the slot it was supposed to
wake_up_all on, which could result in a missed wakeup.

> thanks,
> 
> greg k-h


* Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
  2021-08-19  0:28           ` David Chen
@ 2021-09-23  7:52             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-23  7:52 UTC (permalink / raw)
  To: David Chen; +Cc: stable, Paul E. McKenney, neeraju

On Thu, Aug 19, 2021 at 12:28:40AM +0000, David Chen wrote:
> 
> 
> > -----Original Message-----
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Sent: Tuesday, August 17, 2021 11:55 PM
> > To: David Chen <david.chen@nutanix.com>
> > Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > 
> > On Tue, Aug 17, 2021 at 06:47:45PM +0000, David Chen wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Sent: Monday, August 16, 2021 11:16 PM
> > > > To: David Chen <david.chen@nutanix.com>
> > > > Cc: stable@vger.kernel.org; Paul E. McKenney <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > > >
> > > > On Mon, Aug 16, 2021 at 10:02:28PM +0000, David Chen wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > > > Sent: Monday, August 16, 2021 12:31 PM
> > > > > > To: David Chen <david.chen@nutanix.com>
> > > > > > Cc: stable@vger.kernel.org; Paul E. McKenney
> > > > > > <paulmck@linux.vnet.ibm.com>; neeraju@codeaurora.org
> > > > > > Subject: Re: Request for backport fd6bc19d7676 to 4.14 and 4.19 branch
> > > > > >
> > > > > > On Mon, Aug 16, 2021 at 07:19:34PM +0000, David Chen wrote:
> > > > > > > Hi Greg,
> > > > > > >
> > > > > > > We recently hit a hung task timeout issue in synchronize_rcu_expedited on the 4.14 branch.
> > > > > > > The issue seems to be identical to the one described in `fd6bc19d7676 rcu: Fix missed wakeup of exp_wq waiters`.
> > > > > > > Can we backport it to the 4.14 and 4.19 branches?
> > > > > > > The patch doesn't apply cleanly, but it should be trivial to resolve; just do this:
> > > > > > >
> > > > > > > -		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rsp->expedited_sequence) & 0x3]);
> > > > > > > +		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> > > > > > >
> > > > > > > I don't know if we should do it for 4.9, because the handling of the sequence number is a bit different.
> > > > > >
> > > > > > Please provide a working backport, me hand-editing patches does not scale,
> > > > > > and this way you get the proper credit for backporting it (after testing it).
> > > > >
> > > > > Sure, appended at the end.
> > > > >
> > > > > >
> > > > > > You have tested this, right?
> > > > >
> > > > > I don't have a good repro for the original issue, so I only ran rcutorture and
> > > > > some basic workload tests to see if anything obvious went wrong.
> > > >
> > > > Ideally you would be able to also hit this without the patch on the
> > > > older kernels, is this the case?
> > > >
> > > So far we've only seen this once. I was able to figure out the issue from the vmcore,
> > > but I haven't been able to reproduce it. I think the nature of the bug makes it
> > > very difficult to hit: it requires a race with synchronize_rcu_expedited, and once
> > > a thread hangs, you can't just call synchronize_rcu_expedited again, because the
> > > next expedited grace period might rescue the hung thread.
> > 
> > I would like a bit more verification that this is really needed, and
> > some acks from the developers/maintainers involved, before accepting
> > this change.
> > 
> https://lkml.org/lkml/2019/11/18/184
> From the original discussion, Neeraj said they hit the issue on 4.9, 4.14 and 4.19 as well.
> I also tried running with the "WARN_ON(s_low != exp_low);" mentioned above, without
> the fix, and forced a schedule before "mutex_lock(&rsp->exp_wake_mutex);" to simulate
> random latency from running on a VM. I was able to trigger the warning.
> 
> [  162.760480] WARNING: CPU: 2 PID: 1129 at kernel/rcu/tree_exp.h:549 rcu_exp_wait_wake+0x4a5/0x6c0
> [  162.760482] Modules linked in: rcutorture torture nls_utf8 isofs nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_filter sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ttm aesni_intel crypto_simd drm_kms_helper drm sg joydev syscopyarea sysfillrect virtio_balloon sysimgblt fb_sys_fops i2c_piix4 input_leds pcspkr qemu_fw_cfg loop binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod sr_mod cdrom ata_generic pata_acpi virtio_net virtio_scsi ata_piix virtio_pci serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log sha3_generic authenc cmac wp512 twofish_generic twofish_x86_64 twofish_common
> [  162.760509]  tea sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic seed salsa20_generic rmd320 rmd256 rmd160 rmd128 michael_mic md4 khazad fcrypt dm_crypt dm_mod dax des_generic deflate cts crc32c_intel ccm cast6_avx_x86_64 cast6_generic cast_common camellia_generic ablk_helper cryptd xts lrw glue_helper blowfish_generic blowfish_common arc4 ansi_cprng fuse [last unloaded: rcu_kprobe]
> [  162.760524] CPU: 2 PID: 1129 Comm: kworker/2:3 Tainted: G        W  O    4.14.243-1.nutanix.20210810.test.el7.x86_64 #1
> [  162.760524] Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> [  162.760525] Workqueue: events wait_rcu_exp_gp
> [  162.760526] task: ffffa083e92745c0 task.stack: ffffb29442cb8000
> [  162.760527] RIP: 0010:rcu_exp_wait_wake+0x4a5/0x6c0
> [  162.760527] RSP: 0018:ffffb29442cbbde8 EFLAGS: 00010206
> [  162.760528] RAX: 0000000000000000 RBX: ffffffff932b43c0 RCX: 0000000000000000
> [  162.760529] RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
> [  162.760529] RBP: ffffb29442cbbe58 R08: ffffffff932b43c0 R09: ffffb29442cbbd70
> [  162.760530] R10: ffffb29442cbbba0 R11: 000000000000011b R12: ffffffff932b2440
> [  162.760531] R13: 000000000000157c R14: ffffffff932b4240 R15: 0000000000000003
> [  162.760531] FS:  0000000000000000(0000) GS:ffffa083efa80000(0000) knlGS:0000000000000000
> [  162.760532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  162.760533] CR2: 00007f6d6d5160c8 CR3: 000000002320a001 CR4: 00000000001606e0
> [  162.760535] Call Trace:
> [  162.760537]  ? cpu_needs_another_gp+0x70/0x70
> [  162.760538]  wait_rcu_exp_gp+0x2b/0x30
> [  162.760539]  process_one_work+0x18f/0x3c0
> [  162.760540]  worker_thread+0x35/0x3c0
> [  162.760541]  kthread+0x128/0x140
> [  162.760542]  ? process_one_work+0x3c0/0x3c0
> [  162.760543]  ? __kthread_cancel_work+0x50/0x50
> [  162.760544]  ret_from_fork+0x35/0x40
> [  162.760545] Code: 4c 24 30 49 8b 94 24 10 13 04 00 48 c7 c7 d0 d7 05 93 0f 95 c0 48 2b 75 a8 44 0f be 80 d8 d2 05 93 e8 99 2f 70 00 e9 ae fe ff ff <0f> 0b e9 ec fc ff ff 65 8b 05 2d 40 f1 6d 89 c0 48 0f a3 05 d3
> [  162.760570] ---[ end trace 2cc2ddd257a55220 ]---
> 
> The warning firing means that the waker skipped the slot it was supposed to
> wake_up_all on, which could result in a missed wakeup.

Ok, now queued up, thanks.

greg k-h


