All of lore.kernel.org
* synchronize_rcu_expedited gets stuck in hotplug path
@ 2022-01-18 11:46 Mukesh Ojha
  2022-01-18 20:06 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Mukesh Ojha @ 2022-01-18 11:46 UTC (permalink / raw)
  To: lkml; +Cc: paulmck, Thomas Gleixner

Hi,

We are facing an issue in a hotplug test where cpuhp/2 gets stuck in
synchronize_rcu_expedited() at state CPUHP_AP_ONLINE_DYN, in the path
shown in [1], and is not able to proceed. We see that wait_rcu_exp_gp()
is queued to CPU 2, and it looks as if it did not get a chance to run,
since it shows up as pending on CPU 2 [2].

So, when exactly does CPU 2 become available for scheduling in the
hotplug path? Is it after CPUHP_AP_ACTIVE?

This looks like a deadlock. Can it be fixed by queueing
wait_rcu_exp_gp() on another workqueue, or is this a wrong use of
synchronize_rcu() in the hotplug path?

[1]

=======================================================
Process: cpuhp/2, [affinity: 0x4] cpu: 2 pid: 24 start: 0xffffff87803e4a00
=====================================================
     Task name: cpuhp/2 [affinity: 0x4] pid: 24 cpu: 2 prio: 120 start: ffffff87803e4a00
     state: 0x2[D] exit_state: 0x0 stack base: 0xffffffc010160000
     Last_enqueued_ts:      59.022215498 Last_sleep_ts: 59.022922946
     Stack:
     [<ffffffe9f4074354>] __switch_to+0x248
     [<ffffffe9f5c02474>] __schedule+0x5b0
     [<ffffffe9f5c02b28>] schedule+0x80
     [<ffffffe9f42321a4>] synchronize_rcu_expedited+0x1c4
     [<ffffffe9f423b294>] synchronize_rcu+0x4c
     [<ffffffe9f6d04ab0>] waltgov_stop[sched_walt]+0x78
     [<ffffffe9f512fa28>] cpufreq_add_policy_cpu+0xc0
     [<ffffffe9f512e48c>] cpufreq_online[jt]+0x10f4
     [<ffffffe9f51323b8>] cpuhp_cpufreq_online+0x14
     [<ffffffe9f4128d3c>] cpuhp_invoke_callback+0x2f8
     [<ffffffe9f412c30c>] cpuhp_thread_fun+0x130
     [<ffffffe9f4187a58>] smpboot_thread_fn+0x180
     [<ffffffe9f417d98c>] kthread+0x150
     [<ffffffe9f4013918>] ret_to_user[jt]+0x0


[2]

CPU 2
pool 0
IDLE Workqueue worker: kworker/2:3 current_work: (None)
IDLE Workqueue worker: kworker/2:2 current_work: (None)
IDLE Workqueue worker: kworker/2:1 current_work: (None)
IDLE Workqueue worker: kworker/2:0 current_work: (None)
Pending entry: wait_rcu_exp_gp[jt]
Pending entry: lru_add_drain_per_cpu[jt]
Pending entry: wq_barrier_func[jt]

Thanks,
Mukesh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-18 11:46 synchronize_rcu_expedited gets stuck in hotplug path Mukesh Ojha
@ 2022-01-18 20:06 ` Paul E. McKenney
  2022-01-18 20:11   ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2022-01-18 20:06 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: lkml, Thomas Gleixner, tj, jiangshanlai

On Tue, Jan 18, 2022 at 05:16:39PM +0530, Mukesh Ojha wrote:
> Hi ,
> 
> We are facing one issue in hotplug test where cpuhp/2 gets stuck in below
> path [1] in
> synchronize_rcu_expedited at state CPUHP_AP_ONLINE_DYN and it is not able to
> proceed.
> We see wait_rcu_exp_gp() is queued to cpu2  and it looks like it did not get
> chance to
> run as we see it as in pending state at cpu2 [2].
> 
> So, when exactly cpu2 gets available for scheduling in hotplug path, is it
> after
> CPUHP_AP_ACTIVE?
> 
> It looks to be dead lock here. Can it be fixed by making wait_rcu_exp_gp()
> queued on another wq ?
> or is it a wrong usage of synchronise_rcu in hotplug path?
> 
> [1]
> 
> =======================================================
> Process: cpuhp/2, [affinity: 0x4] cpu: 2 pid: 24 start: 0xffffff87803e4a00
> =====================================================
>     Task name: cpuhp/2 [affinity: 0x4] pid: 24 cpu: 2 prio: 120 start:
> ffffff87803e4a00
>     state: 0x2[D] exit_state: 0x0 stack base: 0xffffffc010160000
>     Last_enqueued_ts:      59.022215498 Last_sleep_ts: 59.022922946
>     Stack:
>     [<ffffffe9f4074354>] __switch_to+0x248
>     [<ffffffe9f5c02474>] __schedule+0x5b0
>     [<ffffffe9f5c02b28>] schedule+0x80
>     [<ffffffe9f42321a4>] synchronize_rcu_expedited+0x1c4
>     [<ffffffe9f423b294>] synchronize_rcu+0x4c
>     [<ffffffe9f6d04ab0>] waltgov_stop[sched_walt]+0x78
>     [<ffffffe9f512fa28>] cpufreq_add_policy_cpu+0xc0
>     [<ffffffe9f512e48c>] cpufreq_online[jt]+0x10f4
>     [<ffffffe9f51323b8>] cpuhp_cpufreq_online+0x14
>     [<ffffffe9f4128d3c>] cpuhp_invoke_callback+0x2f8
>     [<ffffffe9f412c30c>] cpuhp_thread_fun+0x130
>     [<ffffffe9f4187a58>] smpboot_thread_fn+0x180
>     [<ffffffe9f417d98c>] kthread+0x150
>     [<ffffffe9f4013918>] ret_to_user[jt]+0x0
> 
> 
> [2]
> 
> CPU 2
> pool 0
> IDLE Workqueue worker: kworker/2:3 current_work: (None)
> IDLE Workqueue worker: kworker/2:2 current_work: (None)
> IDLE Workqueue worker: kworker/2:1 current_work: (None)
> IDLE Workqueue worker: kworker/2:0 current_work: (None)
> Pending entry: wait_rcu_exp_gp[jt]
> Pending entry: lru_add_drain_per_cpu[jt]
> Pending entry: wq_barrier_func[jt]

Interesting.  Adding Tejun and Lai on CC for their perspective.

As you say, the incoming CPU invoked synchronize_rcu_expedited() which
in turn invoked queue_work().  By default, workqueues will of course
queue that work on the current CPU.  But in this case, the CPU's bit
is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
be reached until after the grace period ends, which cannot happen until
the workqueue handler is invoked.

I could imagine doing something as shown in the (untested) patch below,
but first does this help?

If it does help, would this sort of check be appropriate here or
should it instead go into workqueues?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb9..03c0556b29f22 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -849,7 +849,15 @@ void synchronize_rcu_expedited(void)
 		/* Marshall arguments & schedule the expedited grace period. */
 		rew.rew_s = s;
 		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
-		queue_work(rcu_gp_wq, &rew.rew_work);
+		preempt_disable();
+		if (cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) {
+			preempt_enable();
+			queue_work(rcu_gp_wq, &rew.rew_work);
+		} else {
+			// Avoid incoming CPUs.
+			preempt_enable();
+			queue_work_on(cpumask_first(cpu_active_mask), rcu_gp_wq, &rew.rew_work);
+		}
 	}
 
 	/* Wait for expedited grace period to complete. */


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-18 20:06 ` Paul E. McKenney
@ 2022-01-18 20:11   ` Tejun Heo
  2022-01-18 21:41     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2022-01-18 20:11 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Mukesh Ojha, lkml, Thomas Gleixner, jiangshanlai

Hello,

On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
> Interesting.  Adding Tejun and Lai on CC for their perspective.
> 
> As you say, the incoming CPU invoked synchronize_rcu_expedited() which
> in turn invoked queue_work().  By default, workqueues will of course
> queue that work on the current CPU.  But in this case, the CPU's bit
> is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
> the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
> be reached until after the grace period ends, which cannot happen until
> the workqueue handler is invoked.
> 
> I could imagine doing something as shown in the (untested) patch below,
> but first does this help?
> 
> If it does help, would this sort of check be appropriate here or
> should it instead go into workqueues?

Maybe it can be solved by rearranging the hotplug sequence, but it is
fragile to schedule per-CPU work items from hotplug paths. Maybe the whole
issue can be side-stepped by making synchronize_rcu_expedited() use an
unbound workqueue instead? Does it need to be per-CPU?

Thanks.

-- 
tejun


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-18 20:11   ` Tejun Heo
@ 2022-01-18 21:41     ` Paul E. McKenney
  2022-01-24 14:02       ` Mukesh Ojha
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2022-01-18 21:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Mukesh Ojha, lkml, Thomas Gleixner, jiangshanlai

On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
> > Interesting.  Adding Tejun and Lai on CC for their perspective.
> > 
> > As you say, the incoming CPU invoked synchronize_rcu_expedited() which
> > in turn invoked queue_work().  By default, workqueues will of course
> > queue that work on the current CPU.  But in this case, the CPU's bit
> > is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
> > the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
> > be reached until after the grace period ends, which cannot happen until
> > the workqueue handler is invoked.
> > 
> > I could imagine doing something as shown in the (untested) patch below,
> > but first does this help?
> > 
> > If it does help, would this sort of check be appropriate here or
> > should it instead go into workqueues?
> 
> Maybe it can be solved by rearranging the hotplug sequence but it's fragile
> to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
> be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
> instead? Does it require to be per-cpu?

Good point!

And now that you mention it, RCU expedited grace periods already avoid
using workqueues during early boot.  The (again untested) patch below
extends that approach to incoming CPUs.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb9..1a45667402260 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
  */
 void synchronize_rcu_expedited(void)
 {
-	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
+	bool no_wq;
 	struct rcu_exp_work rew;
 	struct rcu_node *rnp;
 	unsigned long s;
@@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
 	if (exp_funnel_lock(s))
 		return;  /* Someone else did our work for us. */
 
+	/* Don't use workqueue during boot or from an incoming CPU. */
+	preempt_disable();
+	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
+		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
+	preempt_enable();
+
 	/* Ensure that load happens before action based on it. */
-	if (unlikely(boottime)) {
-		/* Direct call during scheduler init and early_initcalls(). */
+	if (unlikely(no_wq)) {
+		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
 		rcu_exp_sel_wait_wake(s);
 	} else {
 		/* Marshall arguments & schedule the expedited grace period. */
@@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
 	/* Let the next expedited grace period start. */
 	mutex_unlock(&rcu_state.exp_mutex);
 
-	if (likely(!boottime))
+	if (likely(!no_wq))
 		destroy_work_on_stack(&rew.rew_work);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-18 21:41     ` Paul E. McKenney
@ 2022-01-24 14:02       ` Mukesh Ojha
  2022-01-24 16:44         ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Mukesh Ojha @ 2022-01-24 14:02 UTC (permalink / raw)
  To: paulmck, Tejun Heo; +Cc: lkml, Thomas Gleixner, jiangshanlai


On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
> On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
>>> Interesting.  Adding Tejun and Lai on CC for their perspective.
>>>
>>> As you say, the incoming CPU invoked synchronize_rcu_expedited() which
>>> in turn invoked queue_work().  By default, workqueues will of course
>>> queue that work on the current CPU.  But in this case, the CPU's bit
>>> is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
>>> the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
>>> be reached until after the grace period ends, which cannot happen until
>>> the workqueue handler is invoked.
>>>
>>> I could imagine doing something as shown in the (untested) patch below,
>>> but first does this help?
>>>
>>> If it does help, would this sort of check be appropriate here or
>>> should it instead go into workqueues?
>> Maybe it can be solved by rearranging the hotplug sequence but it's fragile
>> to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
>> be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
>> instead? Does it require to be per-cpu?
> Good point!
>
> And now that you mention it, RCU expedited grace periods already avoid
> using workqueues during early boot.  The (again untested) patch below
> extends that approach to incoming CPUs.
>
> Thoughts?

Hi Paul,

We are no longer seeing the issue with this patch applied.
Can this patch be merged?

-Mukesh

>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 60197ea24ceb9..1a45667402260 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>    */
>   void synchronize_rcu_expedited(void)
>   {
> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> +	bool no_wq;
>   	struct rcu_exp_work rew;
>   	struct rcu_node *rnp;
>   	unsigned long s;
> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>   	if (exp_funnel_lock(s))
>   		return;  /* Someone else did our work for us. */
>   
> +	/* Don't use workqueue during boot or from an incoming CPU. */
> +	preempt_disable();
> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> +	preempt_enable();
> +
>   	/* Ensure that load happens before action based on it. */
> -	if (unlikely(boottime)) {
> -		/* Direct call during scheduler init and early_initcalls(). */
> +	if (unlikely(no_wq)) {
> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>   		rcu_exp_sel_wait_wake(s);
>   	} else {
>   		/* Marshall arguments & schedule the expedited grace period. */
> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>   	/* Let the next expedited grace period start. */
>   	mutex_unlock(&rcu_state.exp_mutex);
>   
> -	if (likely(!boottime))
> +	if (likely(!no_wq))
>   		destroy_work_on_stack(&rew.rew_work);
>   }
>   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-24 14:02       ` Mukesh Ojha
@ 2022-01-24 16:44         ` Paul E. McKenney
  2022-01-24 16:58           ` Mukesh Ojha
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2022-01-24 16:44 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: Tejun Heo, lkml, Thomas Gleixner, jiangshanlai

On Mon, Jan 24, 2022 at 07:32:01PM +0530, Mukesh Ojha wrote:
> 
> On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
> > On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
> > > Hello,
> > > 
> > > On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
> > > > Interesting.  Adding Tejun and Lai on CC for their perspective.
> > > > 
> > > > As you say, the incoming CPU invoked synchronize_rcu_expedited() which
> > > > in turn invoked queue_work().  By default, workqueues will of course
> > > > queue that work on the current CPU.  But in this case, the CPU's bit
> > > > is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
> > > > the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
> > > > be reached until after the grace period ends, which cannot happen until
> > > > the workqueue handler is invoked.
> > > > 
> > > > I could imagine doing something as shown in the (untested) patch below,
> > > > but first does this help?
> > > > 
> > > > If it does help, would this sort of check be appropriate here or
> > > > should it instead go into workqueues?
> > > Maybe it can be solved by rearranging the hotplug sequence but it's fragile
> > > to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
> > > be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
> > > instead? Does it require to be per-cpu?
> > Good point!
> > 
> > And now that you mention it, RCU expedited grace periods already avoid
> > using workqueues during early boot.  The (again untested) patch below
> > extends that approach to incoming CPUs.
> > 
> > Thoughts?
> 
> Hi Paul,
> 
> We are not seeing the issue after this patch.
> Can we merge this patch ?

It is currently in -rcu and should also be in -next shortly.  Left to
myself, and assuming further testing and reviews all go well, I would
submit it during the upcoming v5.18 merge window.

Does that work for you?  Or do you need it in mainline sooner?

							Thanx, Paul

> -Mukesh
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 60197ea24ceb9..1a45667402260 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >    */
> >   void synchronize_rcu_expedited(void)
> >   {
> > -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> > +	bool no_wq;
> >   	struct rcu_exp_work rew;
> >   	struct rcu_node *rnp;
> >   	unsigned long s;
> > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
> >   	if (exp_funnel_lock(s))
> >   		return;  /* Someone else did our work for us. */
> > +	/* Don't use workqueue during boot or from an incoming CPU. */
> > +	preempt_disable();
> > +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> > +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> > +	preempt_enable();
> > +
> >   	/* Ensure that load happens before action based on it. */
> > -	if (unlikely(boottime)) {
> > -		/* Direct call during scheduler init and early_initcalls(). */
> > +	if (unlikely(no_wq)) {
> > +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
> >   		rcu_exp_sel_wait_wake(s);
> >   	} else {
> >   		/* Marshall arguments & schedule the expedited grace period. */
> > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
> >   	/* Let the next expedited grace period start. */
> >   	mutex_unlock(&rcu_state.exp_mutex);
> > -	if (likely(!boottime))
> > +	if (likely(!no_wq))
> >   		destroy_work_on_stack(&rew.rew_work);
> >   }
> >   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-24 16:44         ` Paul E. McKenney
@ 2022-01-24 16:58           ` Mukesh Ojha
  2022-01-25 20:21             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Mukesh Ojha @ 2022-01-24 16:58 UTC (permalink / raw)
  To: paulmck; +Cc: Tejun Heo, lkml, Thomas Gleixner, jiangshanlai


On 1/24/2022 10:14 PM, Paul E. McKenney wrote:
> On Mon, Jan 24, 2022 at 07:32:01PM +0530, Mukesh Ojha wrote:
>> On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
>>> On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
>>>>> Interesting.  Adding Tejun and Lai on CC for their perspective.
>>>>>
>>>>> As you say, the incoming CPU invoked synchronize_rcu_expedited() which
>>>>> in turn invoked queue_work().  By default, workqueues will of course
>>>>> queue that work on the current CPU.  But in this case, the CPU's bit
>>>>> is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
>>>>> the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
>>>>> be reached until after the grace period ends, which cannot happen until
>>>>> the workqueue handler is invoked.
>>>>>
>>>>> I could imagine doing something as shown in the (untested) patch below,
>>>>> but first does this help?
>>>>>
>>>>> If it does help, would this sort of check be appropriate here or
>>>>> should it instead go into workqueues?
>>>> Maybe it can be solved by rearranging the hotplug sequence but it's fragile
>>>> to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
>>>> be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
>>>> instead? Does it require to be per-cpu?
>>> Good point!
>>>
>>> And now that you mention it, RCU expedited grace periods already avoid
>>> using workqueues during early boot.  The (again untested) patch below
>>> extends that approach to incoming CPUs.
>>>
>>> Thoughts?
>> Hi Paul,
>>
>> We are not seeing the issue after this patch.
>> Can we merge this patch ?
> It is currently in -rcu and should also be in -next shortly.  Left to
> myself, and assuming further testing and reviews all go well, I would
> submit it during the upcoming v5.18 merge window.
>
> Does that work for you?  Or do you need it in mainline sooner?

Before reporting this issue, we had seen only one instance of it.
We also had this fix tested with the same set of test cases and have
not observed any issue so far.

I would be happy to get a mail once it clears all the testing and is
merged into -next. I would then cherry-pick it into the android-5.10
branch.

-Mukesh

>
> 							Thanx, Paul
>
>> -Mukesh
>>
>>> 							Thanx, Paul
>>>
>>> ------------------------------------------------------------------------
>>>
>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>> index 60197ea24ceb9..1a45667402260 100644
>>> --- a/kernel/rcu/tree_exp.h
>>> +++ b/kernel/rcu/tree_exp.h
>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>>>     */
>>>    void synchronize_rcu_expedited(void)
>>>    {
>>> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>> +	bool no_wq;
>>>    	struct rcu_exp_work rew;
>>>    	struct rcu_node *rnp;
>>>    	unsigned long s;
>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>    	if (exp_funnel_lock(s))
>>>    		return;  /* Someone else did our work for us. */
>>> +	/* Don't use workqueue during boot or from an incoming CPU. */
>>> +	preempt_disable();
>>> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>> +	preempt_enable();
>>> +
>>>    	/* Ensure that load happens before action based on it. */
>>> -	if (unlikely(boottime)) {
>>> -		/* Direct call during scheduler init and early_initcalls(). */
>>> +	if (unlikely(no_wq)) {
>>> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>>>    		rcu_exp_sel_wait_wake(s);
>>>    	} else {
>>>    		/* Marshall arguments & schedule the expedited grace period. */
>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>    	/* Let the next expedited grace period start. */
>>>    	mutex_unlock(&rcu_state.exp_mutex);
>>> -	if (likely(!boottime))
>>> +	if (likely(!no_wq))
>>>    		destroy_work_on_stack(&rew.rew_work);
>>>    }
>>>    EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-24 16:58           ` Mukesh Ojha
@ 2022-01-25 20:21             ` Paul E. McKenney
  2022-01-26  7:32               ` Mukesh Ojha
  2022-01-26  7:33               ` Mukesh Ojha
  0 siblings, 2 replies; 10+ messages in thread
From: Paul E. McKenney @ 2022-01-25 20:21 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: Tejun Heo, lkml, Thomas Gleixner, jiangshanlai

On Mon, Jan 24, 2022 at 10:28:28PM +0530, Mukesh Ojha wrote:
> 
> On 1/24/2022 10:14 PM, Paul E. McKenney wrote:
> > On Mon, Jan 24, 2022 at 07:32:01PM +0530, Mukesh Ojha wrote:
> > > On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
> > > > On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
> > > > > > Interesting.  Adding Tejun and Lai on CC for their perspective.
> > > > > > 
> > > > > > As you say, the incoming CPU invoked synchronize_rcu_expedited() which
> > > > > > in turn invoked queue_work().  By default, workqueues will of course
> > > > > > queue that work on the current CPU.  But in this case, the CPU's bit
> > > > > > is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
> > > > > > the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
> > > > > > be reached until after the grace period ends, which cannot happen until
> > > > > > the workqueue handler is invoked.
> > > > > > 
> > > > > > I could imagine doing something as shown in the (untested) patch below,
> > > > > > but first does this help?
> > > > > > 
> > > > > > If it does help, would this sort of check be appropriate here or
> > > > > > should it instead go into workqueues?
> > > > > Maybe it can be solved by rearranging the hotplug sequence but it's fragile
> > > > > to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
> > > > > be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
> > > > > instead? Does it require to be per-cpu?
> > > > Good point!
> > > > 
> > > > And now that you mention it, RCU expedited grace periods already avoid
> > > > using workqueues during early boot.  The (again untested) patch below
> > > > extends that approach to incoming CPUs.
> > > > 
> > > > Thoughts?
> > > Hi Paul,
> > > 
> > > We are not seeing the issue after this patch.
> > > Can we merge this patch ?
> > It is currently in -rcu and should also be in -next shortly.  Left to
> > myself, and assuming further testing and reviews all go well, I would
> > submit it during the upcoming v5.18 merge window.
> > 
> > Does that work for you?  Or do you need it in mainline sooner?
> 
> Before reporting this issue, we saw only one instance of it.
> Also got this fix tested with same set of test cases, did not observe any
> issue as of yet.
> 
> I would be happy to get a mail once it clear all the testing and get merges
> to -next. I would cherry-pick it in android branch-5.10.

It is in -next as of next-20220125.

							Thanx, Paul

> -Mukesh
> 
> > 
> > 							Thanx, Paul
> > 
> > > -Mukesh
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index 60197ea24ceb9..1a45667402260 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > > >     */
> > > >    void synchronize_rcu_expedited(void)
> > > >    {
> > > > -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> > > > +	bool no_wq;
> > > >    	struct rcu_exp_work rew;
> > > >    	struct rcu_node *rnp;
> > > >    	unsigned long s;
> > > > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
> > > >    	if (exp_funnel_lock(s))
> > > >    		return;  /* Someone else did our work for us. */
> > > > +	/* Don't use workqueue during boot or from an incoming CPU. */
> > > > +	preempt_disable();
> > > > +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> > > > +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> > > > +	preempt_enable();
> > > > +
> > > >    	/* Ensure that load happens before action based on it. */
> > > > -	if (unlikely(boottime)) {
> > > > -		/* Direct call during scheduler init and early_initcalls(). */
> > > > +	if (unlikely(no_wq)) {
> > > > +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
> > > >    		rcu_exp_sel_wait_wake(s);
> > > >    	} else {
> > > >    		/* Marshall arguments & schedule the expedited grace period. */
> > > > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
> > > >    	/* Let the next expedited grace period start. */
> > > >    	mutex_unlock(&rcu_state.exp_mutex);
> > > > -	if (likely(!boottime))
> > > > +	if (likely(!no_wq))
> > > >    		destroy_work_on_stack(&rew.rew_work);
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-25 20:21             ` Paul E. McKenney
@ 2022-01-26  7:32               ` Mukesh Ojha
  2022-01-26  7:33               ` Mukesh Ojha
  1 sibling, 0 replies; 10+ messages in thread
From: Mukesh Ojha @ 2022-01-26  7:32 UTC (permalink / raw)
  To: paulmck; +Cc: Tejun Heo, lkml, Thomas Gleixner, jiangshanlai


On 1/26/2022 1:51 AM, Paul E. McKenney wrote:
> On Mon, Jan 24, 2022 at 10:28:28PM +0530, Mukesh Ojha wrote:
>> On 1/24/2022 10:14 PM, Paul E. McKenney wrote:
>>> On Mon, Jan 24, 2022 at 07:32:01PM +0530, Mukesh Ojha wrote:
>>>> On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
>>>>> On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
>>>>>>> Interesting.  Adding Tejun and Lai on CC for their perspective.
>>>>>>>
>>>>>>> As you say, the incoming CPU invoked synchronize_rcu_expedited() which
>>>>>>> in turn invoked queue_work().  By default, workqueues will of course
>>>>>>> queue that work on the current CPU.  But in this case, the CPU's bit
>>>>>>> is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
>>>>>>> the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
>>>>>>> be reached until after the grace period ends, which cannot happen until
>>>>>>> the workqueue handler is invoked.
>>>>>>>
>>>>>>> I could imagine doing something as shown in the (untested) patch below,
>>>>>>> but first does this help?
>>>>>>>
>>>>>>> If it does help, would this sort of check be appropriate here or
>>>>>>> should it instead go into workqueues?
>>>>>> Maybe it can be solved by rearranging the hotplug sequence but it's fragile
>>>>>> to schedule per-cpu work items from hotplug paths. Maybe the whole issue can
>>>>>> be side-stepped by making synchronize_rcu_expedited() use unbound workqueue
>>>>>> instead? Does it require to be per-cpu?
>>>>> Good point!
>>>>>
>>>>> And now that you mention it, RCU expedited grace periods already avoid
>>>>> using workqueues during early boot.  The (again untested) patch below
>>>>> extends that approach to incoming CPUs.
>>>>>
>>>>> Thoughts?
>>>> Hi Paul,
>>>>
>>>> We are not seeing the issue after this patch.
>>>> Can we merge this patch ?
>>> It is currently in -rcu and should also be in -next shortly.  Left to
>>> myself, and assuming further testing and reviews all go well, I would
>>> submit it during the upcoming v5.18 merge window.
>>>
>>> Does that work for you?  Or do you need it in mainline sooner?
>> Before reporting this issue, we saw only one instance of it.
>> Also got this fix tested with same set of test cases, did not observe any
>> issue as of yet.
>>
>> I would be happy to get a mail once it clear all the testing and get merges
>> to -next. I would cherry-pick it in android branch-5.10.
> It is in -next as of next-20220125.

Thanks :-)


> 							Thanx, Paul
>
>> -Mukesh
>>
>>> 							Thanx, Paul
>>>
>>>> -Mukesh
>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>>> index 60197ea24ceb9..1a45667402260 100644
>>>>> --- a/kernel/rcu/tree_exp.h
>>>>> +++ b/kernel/rcu/tree_exp.h
>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>>>>>      */
>>>>>     void synchronize_rcu_expedited(void)
>>>>>     {
>>>>> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>>>> +	bool no_wq;
>>>>>     	struct rcu_exp_work rew;
>>>>>     	struct rcu_node *rnp;
>>>>>     	unsigned long s;
>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>>>     	if (exp_funnel_lock(s))
>>>>>     		return;  /* Someone else did our work for us. */
>>>>> +	/* Don't use workqueue during boot or from an incoming CPU. */
>>>>> +	preempt_disable();
>>>>> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>>>> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>>>> +	preempt_enable();
>>>>> +
>>>>>     	/* Ensure that load happens before action based on it. */
>>>>> -	if (unlikely(boottime)) {
>>>>> -		/* Direct call during scheduler init and early_initcalls(). */
>>>>> +	if (unlikely(no_wq)) {
>>>>> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>>>>>     		rcu_exp_sel_wait_wake(s);
>>>>>     	} else {
>>>>>     		/* Marshall arguments & schedule the expedited grace period. */
>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>>>     	/* Let the next expedited grace period start. */
>>>>>     	mutex_unlock(&rcu_state.exp_mutex);
>>>>> -	if (likely(!boottime))
>>>>> +	if (likely(!no_wq))
>>>>>     		destroy_work_on_stack(&rew.rew_work);
>>>>>     }
>>>>>     EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


* Re: synchronize_rcu_expedited gets stuck in hotplug path
  2022-01-25 20:21             ` Paul E. McKenney
  2022-01-26  7:32               ` Mukesh Ojha
@ 2022-01-26  7:33               ` Mukesh Ojha
  1 sibling, 0 replies; 10+ messages in thread
From: Mukesh Ojha @ 2022-01-26  7:33 UTC (permalink / raw)
  To: paulmck; +Cc: Tejun Heo, lkml, Thomas Gleixner, jiangshanlai


On 1/26/2022 1:51 AM, Paul E. McKenney wrote:
> On Mon, Jan 24, 2022 at 10:28:28PM +0530, Mukesh Ojha wrote:
>> On 1/24/2022 10:14 PM, Paul E. McKenney wrote:
>>> On Mon, Jan 24, 2022 at 07:32:01PM +0530, Mukesh Ojha wrote:
>>>> On 1/19/2022 3:11 AM, Paul E. McKenney wrote:
>>>>> On Tue, Jan 18, 2022 at 10:11:34AM -1000, Tejun Heo wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Tue, Jan 18, 2022 at 12:06:46PM -0800, Paul E. McKenney wrote:
>>>>>>> Interesting.  Adding Tejun and Lai on CC for their perspective.
>>>>>>>
>>>>>>> As you say, the incoming CPU invoked synchronize_rcu_expedited() which
>>>>>>> in turn invoked queue_work().  By default, workqueues will of course
>>>>>>> queue that work on the current CPU.  But in this case, the CPU's bit
>>>>>>> is not yet set in the cpu_active_mask.  Thus, a workqueue scheduled on
>>>>>>> the incoming CPU won't be invoked until CPUHP_AP_ACTIVE, which won't
>>>>>>> be reached until after the grace period ends, which cannot happen until
>>>>>>> the workqueue handler is invoked.
>>>>>>>
>>>>>>> I could imagine doing something as shown in the (untested) patch below,
>>>>>>> but first does this help?
>>>>>>>
>>>>>>> If it does help, would this sort of check be appropriate here or
>>>>>>> should it instead go into workqueues?
>>>>>> Maybe it can be solved by rearranging the hotplug sequence, but it's fragile
>>>>>> to schedule per-CPU work items from hotplug paths. Maybe the whole issue can
>>>>>> be side-stepped by making synchronize_rcu_expedited() use an unbound workqueue
>>>>>> instead? Does it need to be per-CPU?
>>>>> Good point!
>>>>>
>>>>> And now that you mention it, RCU expedited grace periods already avoid
>>>>> using workqueues during early boot.  The (again untested) patch below
>>>>> extends that approach to incoming CPUs.
>>>>>
>>>>> Thoughts?
>>>> Hi Paul,
>>>>
>>>> We are no longer seeing the issue with this patch applied.
>>>> Can we merge this patch?
>>> It is currently in -rcu and should also be in -next shortly.  Left to
>>> myself, and assuming further testing and reviews all go well, I would
>>> submit it during the upcoming v5.18 merge window.
>>>
>>> Does that work for you?  Or do you need it in mainline sooner?
>> Before reporting this issue, we had seen only one instance of it.
>> We also had this fix tested with the same set of test cases and have not
>> observed any issue so far.
>>
>> I would be happy to get a mail once it clears all the testing and gets
>> merged to -next. I will then cherry-pick it into the android-5.10 branch.
> It is in -next as of next-20220125.

Thanks :-)


> 							Thanx, Paul
>
>> -Mukesh
>>
>>> 							Thanx, Paul
>>>
>>>> -Mukesh
>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>>> index 60197ea24ceb9..1a45667402260 100644
>>>>> --- a/kernel/rcu/tree_exp.h
>>>>> +++ b/kernel/rcu/tree_exp.h
>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>>>>>      */
>>>>>     void synchronize_rcu_expedited(void)
>>>>>     {
>>>>> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>>>> +	bool no_wq;
>>>>>     	struct rcu_exp_work rew;
>>>>>     	struct rcu_node *rnp;
>>>>>     	unsigned long s;
>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>>>     	if (exp_funnel_lock(s))
>>>>>     		return;  /* Someone else did our work for us. */
>>>>> +	/* Don't use workqueue during boot or from an incoming CPU. */
>>>>> +	preempt_disable();
>>>>> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>>>> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>>>> +	preempt_enable();
>>>>> +
>>>>>     	/* Ensure that load happens before action based on it. */
>>>>> -	if (unlikely(boottime)) {
>>>>> -		/* Direct call during scheduler init and early_initcalls(). */
>>>>> +	if (unlikely(no_wq)) {
>>>>> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>>>>>     		rcu_exp_sel_wait_wake(s);
>>>>>     	} else {
>>>>>     		/* Marshall arguments & schedule the expedited grace period. */
>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>>>     	/* Let the next expedited grace period start. */
>>>>>     	mutex_unlock(&rcu_state.exp_mutex);
>>>>> -	if (likely(!boottime))
>>>>> +	if (likely(!no_wq))
>>>>>     		destroy_work_on_stack(&rew.rew_work);
>>>>>     }
>>>>>     EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);


end of thread, other threads:[~2022-01-26  7:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-18 11:46 synchronize_rcu_expedited gets stuck in hotplug path Mukesh Ojha
2022-01-18 20:06 ` Paul E. McKenney
2022-01-18 20:11   ` Tejun Heo
2022-01-18 21:41     ` Paul E. McKenney
2022-01-24 14:02       ` Mukesh Ojha
2022-01-24 16:44         ` Paul E. McKenney
2022-01-24 16:58           ` Mukesh Ojha
2022-01-25 20:21             ` Paul E. McKenney
2022-01-26  7:32               ` Mukesh Ojha
2022-01-26  7:33               ` Mukesh Ojha
