linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rcu 0/3] Expedited-grace-period updates for v5.18
@ 2022-02-04 22:54 Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-04 22:54 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series provides updates for RCU expedited grace periods:

1.	Fix check for idle context in rcu_exp_handler, courtesy of
	Neeraj Upadhyay.

2.	Mark ->expmask access in synchronize_rcu_expedited_wait().

3.	Allow expedited RCU grace periods on incoming CPUs.

						Thanx, Paul

------------------------------------------------------------------------

 b/kernel/rcu/tree_exp.h |    2 +-
 kernel/rcu/tree_exp.h   |   17 ++++++++++++-----
 2 files changed, 13 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler
  2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney
@ 2022-02-04 22:55 ` Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney
  2 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Neeraj Upadhyay,
	Frederic Weisbecker, Paul E . McKenney

From: Neeraj Upadhyay <quic_neeraju@quicinc.com>

For PREEMPT_RCU, the rcu_exp_handler() function checks
whether the current CPU is in idle, by calling
rcu_dynticks_curr_cpu_in_eqs(). However, rcu_exp_handler()
is called in IPI handler context. So, it should be checking
the idle context using rcu_is_cpu_rrupt_from_idle(). Fix this
by using rcu_is_cpu_rrupt_from_idle() instead of
rcu_dynticks_curr_cpu_in_eqs(). Non-preempt configuration
already uses the correct check.

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_exp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 237a79989abae..1568c8ef185b2 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -656,7 +656,7 @@ static void rcu_exp_handler(void *unused)
 	 */
 	if (!depth) {
 		if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
-		    rcu_dynticks_curr_cpu_in_eqs()) {
+		    rcu_is_cpu_rrupt_from_idle()) {
 			rcu_report_exp_rdp(rdp);
 		} else {
 			WRITE_ONCE(rdp->cpu_no_qs.b.exp, true);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait()
  2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney
@ 2022-02-04 22:55 ` Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney
  2 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

This commit adds a READ_ONCE() to an access to the rcu_node structure's
->expmask field to prevent compiler mischief.  Detected by KCSAN.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_exp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 1568c8ef185b2..60197ea24ceb9 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -502,7 +502,8 @@ static void synchronize_rcu_expedited_wait(void)
 		if (synchronize_rcu_expedited_wait_once(1))
 			return;
 		rcu_for_each_leaf_node(rnp) {
-			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) {
+			mask = READ_ONCE(rnp->expmask);
+			for_each_leaf_node_cpu_mask(rnp, cpu, mask) {
 				rdp = per_cpu_ptr(&rcu_data, cpu);
 				if (rdp->rcu_forced_tick_exp)
 					continue;
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney
  2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney
@ 2022-02-04 22:55 ` Paul E. McKenney
  2022-02-08 18:56   ` Tejun Heo
  2022-02-09 18:23   ` Mukesh Ojha
  2 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Mukesh Ojha, Tejun Heo

Although it is usually safe to invoke synchronize_rcu_expedited() from a
preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
invoke a workqueue handler will hang due to RCU waiting on a CPU that
the scheduler is not paying attention to.  This commit therefore expands
use of the existing workqueue-independent synchronize_rcu_expedited()
from early boot to also include CPUs that are being hotplugged.

Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_exp.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb9..1a45667402260 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
  */
 void synchronize_rcu_expedited(void)
 {
-	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
+	bool no_wq;
 	struct rcu_exp_work rew;
 	struct rcu_node *rnp;
 	unsigned long s;
@@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
 	if (exp_funnel_lock(s))
 		return;  /* Someone else did our work for us. */
 
+	/* Don't use workqueue during boot or from an incoming CPU. */
+	preempt_disable();
+	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
+		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
+	preempt_enable();
+
 	/* Ensure that load happens before action based on it. */
-	if (unlikely(boottime)) {
-		/* Direct call during scheduler init and early_initcalls(). */
+	if (unlikely(no_wq)) {
+		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
 		rcu_exp_sel_wait_wake(s);
 	} else {
 		/* Marshall arguments & schedule the expedited grace period. */
@@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
 	/* Let the next expedited grace period start. */
 	mutex_unlock(&rcu_state.exp_mutex);
 
-	if (likely(!boottime))
+	if (likely(!no_wq))
 		destroy_work_on_stack(&rew.rew_work);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
-- 
2.31.1.189.g2e36527f23


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney
@ 2022-02-08 18:56   ` Tejun Heo
  2022-02-09 18:23   ` Mukesh Ojha
  1 sibling, 0 replies; 19+ messages in thread
From: Tejun Heo @ 2022-02-08 18:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu, linux-kernel, kernel-team, rostedt, Mukesh Ojha

On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> Although it is usually safe to invoke synchronize_rcu_expedited() from a
> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> invoke a workqueue handler will hang due to RCU waiting on a CPU that
> the scheduler is not paying attention to.  This commit therefore expands
> use of the existing workqueue-independent synchronize_rcu_expedited()
> from early boot to also include CPUs that are being hotplugged.
> 
> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney
  2022-02-08 18:56   ` Tejun Heo
@ 2022-02-09 18:23   ` Mukesh Ojha
  2022-02-09 22:06     ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-09 18:23 UTC (permalink / raw)
  To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Tejun Heo


On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
> Although it is usually safe to invoke synchronize_rcu_expedited() from a
> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> invoke a workqueue handler will hang due to RCU waiting on a CPU that
> the scheduler is not paying attention to.  This commit therefore expands
> use of the existing workqueue-independent synchronize_rcu_expedited()
> from early boot to also include CPUs that are being hotplugged.
>
> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>   kernel/rcu/tree_exp.h | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 60197ea24ceb9..1a45667402260 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>    */
>   void synchronize_rcu_expedited(void)
>   {
> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> +	bool no_wq;
>   	struct rcu_exp_work rew;
>   	struct rcu_node *rnp;
>   	unsigned long s;
> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>   	if (exp_funnel_lock(s))
>   		return;  /* Someone else did our work for us. */
>   
> +	/* Don't use workqueue during boot or from an incoming CPU. */
> +	preempt_disable();
> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> +	preempt_enable();
> +
>   	/* Ensure that load happens before action based on it. */
> -	if (unlikely(boottime)) {
> -		/* Direct call during scheduler init and early_initcalls(). */
> +	if (unlikely(no_wq)) {
> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>   		rcu_exp_sel_wait_wake(s);
>   	} else {
>   		/* Marshall arguments & schedule the expedited grace period. */
> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>   	/* Let the next expedited grace period start. */
>   	mutex_unlock(&rcu_state.exp_mutex);
>   
> -	if (likely(!boottime))
> +	if (likely(!no_wq))
>   		destroy_work_on_stack(&rew.rew_work);
>   }
>   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

Can we reach a condition after this change where no_wq = true and during 
rcu_stall report where exp_task = 0 list and exp_mask contain only this 
cpu ?

-Mukesh



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-09 18:23   ` Mukesh Ojha
@ 2022-02-09 22:06     ` Paul E. McKenney
  2022-02-11 18:44       ` Mukesh Ojha
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-09 22:06 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo

On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
> 
> On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
> > Although it is usually safe to invoke synchronize_rcu_expedited() from a
> > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> > invoke a workqueue handler will hang due to RCU waiting on a CPU that
> > the scheduler is not paying attention to.  This commit therefore expands
> > use of the existing workqueue-independent synchronize_rcu_expedited()
> > from early boot to also include CPUs that are being hotplugged.
> > 
> > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >   kernel/rcu/tree_exp.h | 14 ++++++++++----
> >   1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 60197ea24ceb9..1a45667402260 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> >    */
> >   void synchronize_rcu_expedited(void)
> >   {
> > -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> > +	bool no_wq;
> >   	struct rcu_exp_work rew;
> >   	struct rcu_node *rnp;
> >   	unsigned long s;
> > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
> >   	if (exp_funnel_lock(s))
> >   		return;  /* Someone else did our work for us. */
> > +	/* Don't use workqueue during boot or from an incoming CPU. */
> > +	preempt_disable();
> > +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> > +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> > +	preempt_enable();
> > +
> >   	/* Ensure that load happens before action based on it. */
> > -	if (unlikely(boottime)) {
> > -		/* Direct call during scheduler init and early_initcalls(). */
> > +	if (unlikely(no_wq)) {
> > +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
> >   		rcu_exp_sel_wait_wake(s);
> >   	} else {
> >   		/* Marshall arguments & schedule the expedited grace period. */
> > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
> >   	/* Let the next expedited grace period start. */
> >   	mutex_unlock(&rcu_state.exp_mutex);
> > -	if (likely(!boottime))
> > +	if (likely(!no_wq))
> >   		destroy_work_on_stack(&rew.rew_work);
> >   }
> >   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
> 
> Can we reach a condition after this change where no_wq = true and during
> rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu
> ?

Hello, Mukesh, and thank you for looking this over!

At first glance, I do not believe that this can happen because the
expedited grace-period machinery avoids waiting on the current CPU.
(See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
early in the function and the get_cpu() later in the function.)

But please let me know if I am missing something here.

But suppose that we could in fact reach this condition.  What bad thing
would happen?  Other than a resched_cpu() having been invoked several
times on a not-yet-online CPU, of course.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-09 22:06     ` Paul E. McKenney
@ 2022-02-11 18:44       ` Mukesh Ojha
  2022-02-11 22:14         ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-11 18:44 UTC (permalink / raw)
  To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo


On 2/10/2022 3:36 AM, Paul E. McKenney wrote:
> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a
>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>> the scheduler is not paying attention to.  This commit therefore expands
>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>> from early boot to also include CPUs that are being hotplugged.
>>>
>>> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>> ---
>>>    kernel/rcu/tree_exp.h | 14 ++++++++++----
>>>    1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>> index 60197ea24ceb9..1a45667402260 100644
>>> --- a/kernel/rcu/tree_exp.h
>>> +++ b/kernel/rcu/tree_exp.h
>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>>>     */
>>>    void synchronize_rcu_expedited(void)
>>>    {
>>> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>> +	bool no_wq;
>>>    	struct rcu_exp_work rew;
>>>    	struct rcu_node *rnp;
>>>    	unsigned long s;
>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>    	if (exp_funnel_lock(s))
>>>    		return;  /* Someone else did our work for us. */
>>> +	/* Don't use workqueue during boot or from an incoming CPU. */
>>> +	preempt_disable();
>>> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>> +	preempt_enable();
>>> +
>>>    	/* Ensure that load happens before action based on it. */
>>> -	if (unlikely(boottime)) {
>>> -		/* Direct call during scheduler init and early_initcalls(). */
>>> +	if (unlikely(no_wq)) {
>>> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>>>    		rcu_exp_sel_wait_wake(s);
>>>    	} else {
>>>    		/* Marshall arguments & schedule the expedited grace period. */
>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>    	/* Let the next expedited grace period start. */
>>>    	mutex_unlock(&rcu_state.exp_mutex);
>>> -	if (likely(!boottime))
>>> +	if (likely(!no_wq))
>>>    		destroy_work_on_stack(&rew.rew_work);
>>>    }
>>>    EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>> Can we reach a condition after this change where no_wq = true and during
>> rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu
>> ?
> Hello, Mukesh, and thank you for looking this over!
>
> At first glance, I do not believe that this can happen because the
> expedited grace-period machinery avoids waiting on the current CPU.
> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
> early in the function and the get_cpu() later in the function.)
>
> But please let me know if I am missing something here.
>
> But suppose that we could in fact reach this condition.  What bad thing
> would happen?  Other than a resched_cpu() having been invoked several
> times on a not-yet-online CPU, of course.  ;-)


I thought more about this, what if  synchronize_rcu_expedited thread got 
schedule out and run on some other cpu
and we clear out cpu on which it ran next from exp_mask.

Queuing the work on same cpu ensures that it will always be right cpu to 
clear out.
Do you think this can happen ?

-Mukesh

>
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-11 18:44       ` Mukesh Ojha
@ 2022-02-11 22:14         ` Paul E. McKenney
  2022-02-12  8:47           ` Mukesh Ojha
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-11 22:14 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo

On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote:
> 
> On 2/10/2022 3:36 AM, Paul E. McKenney wrote:
> > On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
> > > On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
> > > > Although it is usually safe to invoke synchronize_rcu_expedited() from a
> > > > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> > > > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> > > > invoke a workqueue handler will hang due to RCU waiting on a CPU that
> > > > the scheduler is not paying attention to.  This commit therefore expands
> > > > use of the existing workqueue-independent synchronize_rcu_expedited()
> > > > from early boot to also include CPUs that are being hotplugged.
> > > > 
> > > > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > > > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> > > > Cc: Tejun Heo <tj@kernel.org>
> > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > > ---
> > > >    kernel/rcu/tree_exp.h | 14 ++++++++++----
> > > >    1 file changed, 10 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index 60197ea24ceb9..1a45667402260 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > > >     */
> > > >    void synchronize_rcu_expedited(void)
> > > >    {
> > > > -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> > > > +	bool no_wq;
> > > >    	struct rcu_exp_work rew;
> > > >    	struct rcu_node *rnp;
> > > >    	unsigned long s;
> > > > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
> > > >    	if (exp_funnel_lock(s))
> > > >    		return;  /* Someone else did our work for us. */
> > > > +	/* Don't use workqueue during boot or from an incoming CPU. */
> > > > +	preempt_disable();
> > > > +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> > > > +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> > > > +	preempt_enable();
> > > > +
> > > >    	/* Ensure that load happens before action based on it. */
> > > > -	if (unlikely(boottime)) {
> > > > -		/* Direct call during scheduler init and early_initcalls(). */
> > > > +	if (unlikely(no_wq)) {
> > > > +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
> > > >    		rcu_exp_sel_wait_wake(s);
> > > >    	} else {
> > > >    		/* Marshall arguments & schedule the expedited grace period. */
> > > > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
> > > >    	/* Let the next expedited grace period start. */
> > > >    	mutex_unlock(&rcu_state.exp_mutex);
> > > > -	if (likely(!boottime))
> > > > +	if (likely(!no_wq))
> > > >    		destroy_work_on_stack(&rew.rew_work);
> > > >    }
> > > >    EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
> > > Can we reach a condition after this change where no_wq = true and during
> > > rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu
> > > ?
> > Hello, Mukesh, and thank you for looking this over!
> > 
> > At first glance, I do not believe that this can happen because the
> > expedited grace-period machinery avoids waiting on the current CPU.
> > (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
> > early in the function and the get_cpu() later in the function.)
> > 
> > But please let me know if I am missing something here.
> > 
> > But suppose that we could in fact reach this condition.  What bad thing
> > would happen?  Other than a resched_cpu() having been invoked several
> > times on a not-yet-online CPU, of course.  ;-)
> 
> 
> I thought more about this, what if  synchronize_rcu_expedited thread got
> schedule out and run on some other cpu
> and we clear out cpu on which it ran next from exp_mask.
> 
> Queuing the work on same cpu ensures that it will always be right cpu to
> clear out.
> Do you think this can happen ?

Indeed it might.

But if it did, the scheduler would invoke RCU's hook, which is named
rcu_note_context_switch(), and do so on the pre-switch CPU.  There are
two implementations for this function, one for CONFIG_PREEMPT=y
and another for CONFIG_PREEMPT=n.  Both look to me like they invoke
rcu_report_exp_rdp() when needed, one directly and the other via the
CONFIG_PREEMPT=n variant of rcu_qs().

Am I missing something?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-11 22:14         ` Paul E. McKenney
@ 2022-02-12  8:47           ` Mukesh Ojha
  2022-02-12 11:28             ` Neeraj Upadhyay
  0 siblings, 1 reply; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-12  8:47 UTC (permalink / raw)
  To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo


On 2/12/2022 3:44 AM, Paul E. McKenney wrote:
> On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote:
>> On 2/10/2022 3:36 AM, Paul E. McKenney wrote:
>>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
>>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
>>>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a
>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>>>> the scheduler is not paying attention to.  This commit therefore expands
>>>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>>>> from early boot to also include CPUs that are being hotplugged.
>>>>>
>>>>> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
>>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>>> ---
>>>>>     kernel/rcu/tree_exp.h | 14 ++++++++++----
>>>>>     1 file changed, 10 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>>> index 60197ea24ceb9..1a45667402260 100644
>>>>> --- a/kernel/rcu/tree_exp.h
>>>>> +++ b/kernel/rcu/tree_exp.h
>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
>>>>>      */
>>>>>     void synchronize_rcu_expedited(void)
>>>>>     {
>>>>> -	bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>>>> +	bool no_wq;
>>>>>     	struct rcu_exp_work rew;
>>>>>     	struct rcu_node *rnp;
>>>>>     	unsigned long s;
>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>>>     	if (exp_funnel_lock(s))
>>>>>     		return;  /* Someone else did our work for us. */
>>>>> +	/* Don't use workqueue during boot or from an incoming CPU. */
>>>>> +	preempt_disable();
>>>>> +	no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>>>> +		!cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>>>> +	preempt_enable();
>>>>> +
>>>>>     	/* Ensure that load happens before action based on it. */
>>>>> -	if (unlikely(boottime)) {
>>>>> -		/* Direct call during scheduler init and early_initcalls(). */
>>>>> +	if (unlikely(no_wq)) {
>>>>> +		/* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */
>>>>>     		rcu_exp_sel_wait_wake(s);
>>>>>     	} else {
>>>>>     		/* Marshall arguments & schedule the expedited grace period. */
>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>>>     	/* Let the next expedited grace period start. */
>>>>>     	mutex_unlock(&rcu_state.exp_mutex);
>>>>> -	if (likely(!boottime))
>>>>> +	if (likely(!no_wq))
>>>>>     		destroy_work_on_stack(&rew.rew_work);
>>>>>     }
>>>>>     EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>>>> Can we reach a condition after this change where no_wq = true and during
>>>> rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu
>>>> ?
>>> Hello, Mukesh, and thank you for looking this over!
>>>
>>> At first glance, I do not believe that this can happen because the
>>> expedited grace-period machinery avoids waiting on the current CPU.
>>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
>>> early in the function and the get_cpu() later in the function.)
>>>
>>> But please let me know if I am missing something here.
>>>
>>> But suppose that we could in fact reach this condition.  What bad thing
>>> would happen?  Other than a resched_cpu() having been invoked several
>>> times on a not-yet-online CPU, of course.  ;-)
>>
>> I thought more about this, what if  synchronize_rcu_expedited thread got
>> schedule out and run on some other cpu
>> and we clear out cpu on which it ran next from exp_mask.
>>
>> Queuing the work on same cpu ensures that it will always be right cpu to
>> clear out.
>> Do you think this can happen ?
> Indeed it might.
>
> But if it did, the scheduler would invoke RCU's hook, which is named
> rcu_note_context_switch(), and do so on the pre-switch CPU.  There are
> two implementations for this function, one for CONFIG_PREEMPT=y
> and another for CONFIG_PREEMPT=n.  Both look to me like they invoke
> rcu_report_exp_rdp() when needed, one directly and the other via the
> CONFIG_PREEMPT=n variant of rcu_qs().
>
> Am I missing something?
>
> 	

There is a issue we are facing where exp_mask is not getting cleared and 
rcu_stall report that
the cpu we are waiting on sometime in idle and sometime executing some 
other task but
it is not clearing itself from exp_mask from a very long time and in all 
the instances exp_task list is NULL.

    expmask = 8,     ==> cpu3

[80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited stalls 
on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/.
[80235.534757][T12441] rcu: blocking rcu_node structures:
[80235.540102][T12441] Task dump for CPU 3:
[80235.540118][T12441] task:core_ctl        state:D stack:    0 pid:  
172 ppid:     2 flags:0x00000008
[80235.540150][T12441] Call trace:
[80235.540178][T12441]  __switch_to+0x2a8/0x3ac
[80235.540207][T12441]  rcu_state+0x11b0/0x1480


[80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited stalls 
on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/.
[80299.022623][T12441] rcu: blocking rcu_node structures:
[80299.027924][T12441] Task dump for CPU 3:
[80299.027942][T12441] task:swapper/3       state:R  running task     
stack:    0 pid:    0 ppid:     1 flags:0x00000008
[80299.027993][T12441] Call trace:
[80299.028025][T12441]  __switch_to+0x2a8/0x3ac
[80299.028051][T12441]  0xffffffc010113eb4


As we were not seeing this earlier.
Below is compile tested patch, can we do something like this  ?

==========================================><====================================================

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6453ac5..f0332e4 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct 
rcu_node *rnp)
   */
  void synchronize_rcu_expedited(void)
  {
-    bool no_wq;
+    bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
+    bool is_active;
      struct rcu_exp_work rew;
      struct rcu_node *rnp;
      unsigned long s;
+    int next_cpu;

      RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
               lock_is_held(&rcu_lock_map) ||
@@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void)
      if (exp_funnel_lock(s))
          return;  /* Someone else did our work for us. */

-    /* Don't use workqueue during boot or from an incoming CPU. */
-    preempt_disable();
-    no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
-        !cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
-    preempt_enable();
-
      /* Ensure that load happens before action based on it. */
      if (unlikely(no_wq)) {
-        /* Direct call during scheduler init, early_initcalls() and 
incoming CPUs. */
+        /* Direct call during scheduler init, early_initcalls(). */
          rcu_exp_sel_wait_wake(s);
+        mutex_unlock(&rcu_state.exp_mutex);
+        return;
+    }
+
+    preempt_disable();
+    is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
+    preempt_enable();
+
+    rew.rew_s = s;
+    if (!is_active) {
+        INIT_WORK(&rew.rew_work, wait_rcu_exp_gp);
+        next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask);
+        if (next_cpu >= nr_cpu_ids)
+            next_cpu = cpumask_first(cpu_active_mask);
+
+        queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work);
      } else {
          /* Marshall arguments & schedule the expedited grace period. */
-        rew.rew_s = s;
          INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
          queue_work(rcu_gp_wq, &rew.rew_work);
      }
@@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void)
      /* Let the next expedited grace period start. */
      mutex_unlock(&rcu_state.exp_mutex);

-    if (likely(!no_wq))
+    if (likely(is_active))
          destroy_work_on_stack(&rew.rew_work);
+    else
+        flush_work(&rew.rew_work);
  }
  EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
-- 
2.7.4


-Mukesh




> 					Thanx, Paul

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-12  8:47           ` Mukesh Ojha
@ 2022-02-12 11:28             ` Neeraj Upadhyay
  2022-02-12 13:56               ` Mukesh Ojha
  0 siblings, 1 reply; 19+ messages in thread
From: Neeraj Upadhyay @ 2022-02-12 11:28 UTC (permalink / raw)
  To: Mukesh Ojha, paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo

Hi Mukesh,

On 2/12/2022 2:17 PM, Mukesh Ojha wrote:
> 
> On 2/12/2022 3:44 AM, Paul E. McKenney wrote:
>> On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote:
>>> On 2/10/2022 3:36 AM, Paul E. McKenney wrote:
>>>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
>>>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
>>>>>> Although it is usually safe to invoke synchronize_rcu_expedited() 
>>>>>> from a
>>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a 
>>>>>> notifier
>>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>>>>> the scheduler is not paying attention to.  This commit therefore 
>>>>>> expands
>>>>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>>>>> from early boot to also include CPUs that are being hotplugged.
>>>>>>
>>>>>> Link: 
>>>>>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ 
>>>>>>
>>>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>> ---
>>>>>>     kernel/rcu/tree_exp.h | 14 ++++++++++----
>>>>>>     1 file changed, 10 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>>>> index 60197ea24ceb9..1a45667402260 100644
>>>>>> --- a/kernel/rcu/tree_exp.h
>>>>>> +++ b/kernel/rcu/tree_exp.h
>>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct 
>>>>>> rcu_node *rnp)
>>>>>>      */
>>>>>>     void synchronize_rcu_expedited(void)
>>>>>>     {
>>>>>> -    bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>>>>> +    bool no_wq;
>>>>>>         struct rcu_exp_work rew;
>>>>>>         struct rcu_node *rnp;
>>>>>>         unsigned long s;
>>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>>>>         if (exp_funnel_lock(s))
>>>>>>             return;  /* Someone else did our work for us. */
>>>>>> +    /* Don't use workqueue during boot or from an incoming CPU. */
>>>>>> +    preempt_disable();
>>>>>> +    no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>>>>> +        !cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>>>>> +    preempt_enable();
>>>>>> +
>>>>>>         /* Ensure that load happens before action based on it. */
>>>>>> -    if (unlikely(boottime)) {
>>>>>> -        /* Direct call during scheduler init and 
>>>>>> early_initcalls(). */
>>>>>> +    if (unlikely(no_wq)) {
>>>>>> +        /* Direct call for scheduler init, early_initcall()s, and 
>>>>>> incoming CPUs. */
>>>>>>             rcu_exp_sel_wait_wake(s);
>>>>>>         } else {
>>>>>>             /* Marshall arguments & schedule the expedited grace 
>>>>>> period. */
>>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>>>>         /* Let the next expedited grace period start. */
>>>>>>         mutex_unlock(&rcu_state.exp_mutex);
>>>>>> -    if (likely(!boottime))
>>>>>> +    if (likely(!no_wq))
>>>>>>             destroy_work_on_stack(&rew.rew_work);
>>>>>>     }
>>>>>>     EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>>>>> Can we reach a condition after this change where no_wq = true and 
>>>>> during
>>>>> rcu_stall report where exp_task = 0 list and exp_mask contain only 
>>>>> this cpu
>>>>> ?
>>>> Hello, Mukesh, and thank you for looking this over!
>>>>
>>>> At first glance, I do not believe that this can happen because the
>>>> expedited grace-period machinery avoids waiting on the current CPU.
>>>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
>>>> early in the function and the get_cpu() later in the function.)
>>>>
>>>> But please let me know if I am missing something here.
>>>>
>>>> But suppose that we could in fact reach this condition.  What bad thing
>>>> would happen?  Other than a resched_cpu() having been invoked several
>>>> times on a not-yet-online CPU, of course.  ;-)
>>>
>>> I thought more about this, what if  synchronize_rcu_expedited thread got
>>> schedule out and run on some other cpu
>>> and we clear out cpu on which it ran next from exp_mask.
>>>
>>> Queuing the work on same cpu ensures that it will always be right cpu to
>>> clear out.
>>> Do you think this can happen ?
>> Indeed it might.
>>
>> But if it did, the scheduler would invoke RCU's hook, which is named
>> rcu_note_context_switch(), and do so on the pre-switch CPU.  There are
>> two implementations for this function, one for CONFIG_PREEMPT=y
>> and another for CONFIG_PREEMPT=n.  Both look to me like they invoke
>> rcu_report_exp_rdp() when needed, one directly and the other via the
>> CONFIG_PREEMPT=n variant of rcu_qs().
>>
>> Am I missing something?
>>
>>
> 
> There is a issue we are facing where exp_mask is not getting cleared and 
> rcu_stall report that
> the cpu we are waiting on sometime in idle and sometime executing some 
> other task but
> it is not clearing itself from exp_mask from a very long time and in all 
> the instances exp_task list is NULL.

Can you please check whether [1] is present in your tree?



Thanks
Neeraj

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/rcu/tree_exp.h?h=v5.17-rc3&id=81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b
> 
>     expmask = 8,     ==> cpu3
> 
> [80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited stalls 
> on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/.
> [80235.534757][T12441] rcu: blocking rcu_node structures:
> [80235.540102][T12441] Task dump for CPU 3:
> [80235.540118][T12441] task:core_ctl        state:D stack:    0 pid: 172 
> ppid:     2 flags:0x00000008
> [80235.540150][T12441] Call trace:
> [80235.540178][T12441]  __switch_to+0x2a8/0x3ac
> [80235.540207][T12441]  rcu_state+0x11b0/0x1480
> 
> 
> [80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited stalls 
> on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/.
> [80299.022623][T12441] rcu: blocking rcu_node structures:
> [80299.027924][T12441] Task dump for CPU 3:
> [80299.027942][T12441] task:swapper/3       state:R  running task 
> stack:    0 pid:    0 ppid:     1 flags:0x00000008
> [80299.027993][T12441] Call trace:
> [80299.028025][T12441]  __switch_to+0x2a8/0x3ac
> [80299.028051][T12441]  0xffffffc010113eb4
> 
> 
> As we were not seeing this earlier.
> Below is compile tested patch, can we do something like this  ?
> 
> ==========================================><==================================================== 
> 
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 6453ac5..f0332e4 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct 
> rcu_node *rnp)
>    */
>   void synchronize_rcu_expedited(void)
>   {
> -    bool no_wq;
> +    bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
> +    bool is_active;
>       struct rcu_exp_work rew;
>       struct rcu_node *rnp;
>       unsigned long s;
> +    int next_cpu;
> 
>       RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
>                lock_is_held(&rcu_lock_map) ||
> @@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void)
>       if (exp_funnel_lock(s))
>           return;  /* Someone else did our work for us. */
> 
> -    /* Don't use workqueue during boot or from an incoming CPU. */
> -    preempt_disable();
> -    no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
> -        !cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> -    preempt_enable();
> -
>       /* Ensure that load happens before action based on it. */
>       if (unlikely(no_wq)) {
> -        /* Direct call during scheduler init, early_initcalls() and 
> incoming CPUs. */
> +        /* Direct call during scheduler init, early_initcalls(). */
>           rcu_exp_sel_wait_wake(s);
> +        mutex_unlock(&rcu_state.exp_mutex);
> +        return;
> +    }
> +
> +    preempt_disable();
> +    is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
> +    preempt_enable();
> +
> +    rew.rew_s = s;
> +    if (!is_active) {
> +        INIT_WORK(&rew.rew_work, wait_rcu_exp_gp);
> +        next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask);
> +        if (next_cpu >= nr_cpu_ids)
> +            next_cpu = cpumask_first(cpu_active_mask);
> +
> +        queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work);
>       } else {
>           /* Marshall arguments & schedule the expedited grace period. */
> -        rew.rew_s = s;
>           INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
>           queue_work(rcu_gp_wq, &rew.rew_work);
>       }
> @@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void)
>       /* Let the next expedited grace period start. */
>       mutex_unlock(&rcu_state.exp_mutex);
> 
> -    if (likely(!no_wq))
> +    if (likely(is_active))
>           destroy_work_on_stack(&rew.rew_work);
> +    else
> +        flush_work(&rew.rew_work);
>   }
>   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-12 11:28             ` Neeraj Upadhyay
@ 2022-02-12 13:56               ` Mukesh Ojha
  0 siblings, 0 replies; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-12 13:56 UTC (permalink / raw)
  To: Neeraj Upadhyay, paulmck
  Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo


On 2/12/2022 4:58 PM, Neeraj Upadhyay wrote:
> Hi Mukesh,
>
> On 2/12/2022 2:17 PM, Mukesh Ojha wrote:
>>
>> On 2/12/2022 3:44 AM, Paul E. McKenney wrote:
>>> On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote:
>>>> On 2/10/2022 3:36 AM, Paul E. McKenney wrote:
>>>>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote:
>>>>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote:
>>>>>>> Although it is usually safe to invoke 
>>>>>>> synchronize_rcu_expedited() from a
>>>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a 
>>>>>>> notifier
>>>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its 
>>>>>>> attempts to
>>>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU 
>>>>>>> that
>>>>>>> the scheduler is not paying attention to.  This commit therefore 
>>>>>>> expands
>>>>>>> use of the existing workqueue-independent 
>>>>>>> synchronize_rcu_expedited()
>>>>>>> from early boot to also include CPUs that are being hotplugged.
>>>>>>>
>>>>>>> Link: 
>>>>>>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ 
>>>>>>>
>>>>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>>>>> ---
>>>>>>>     kernel/rcu/tree_exp.h | 14 ++++++++++----
>>>>>>>     1 file changed, 10 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>>>>>> index 60197ea24ceb9..1a45667402260 100644
>>>>>>> --- a/kernel/rcu/tree_exp.h
>>>>>>> +++ b/kernel/rcu/tree_exp.h
>>>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct 
>>>>>>> rcu_node *rnp)
>>>>>>>      */
>>>>>>>     void synchronize_rcu_expedited(void)
>>>>>>>     {
>>>>>>> -    bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>>>>>>> +    bool no_wq;
>>>>>>>         struct rcu_exp_work rew;
>>>>>>>         struct rcu_node *rnp;
>>>>>>>         unsigned long s;
>>>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void)
>>>>>>>         if (exp_funnel_lock(s))
>>>>>>>             return;  /* Someone else did our work for us. */
>>>>>>> +    /* Don't use workqueue during boot or from an incoming CPU. */
>>>>>>> +    preempt_disable();
>>>>>>> +    no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>>>>>>> +        !cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>>>>>>> +    preempt_enable();
>>>>>>> +
>>>>>>>         /* Ensure that load happens before action based on it. */
>>>>>>> -    if (unlikely(boottime)) {
>>>>>>> -        /* Direct call during scheduler init and 
>>>>>>> early_initcalls(). */
>>>>>>> +    if (unlikely(no_wq)) {
>>>>>>> +        /* Direct call for scheduler init, early_initcall()s, 
>>>>>>> and incoming CPUs. */
>>>>>>>             rcu_exp_sel_wait_wake(s);
>>>>>>>         } else {
>>>>>>>             /* Marshall arguments & schedule the expedited grace 
>>>>>>> period. */
>>>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void)
>>>>>>>         /* Let the next expedited grace period start. */
>>>>>>>         mutex_unlock(&rcu_state.exp_mutex);
>>>>>>> -    if (likely(!boottime))
>>>>>>> +    if (likely(!no_wq))
>>>>>>>             destroy_work_on_stack(&rew.rew_work);
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
>>>>>> Can we reach a condition after this change where no_wq = true and 
>>>>>> during
>>>>>> rcu_stall report where exp_task = 0 list and exp_mask contain 
>>>>>> only this cpu
>>>>>> ?
>>>>> Hello, Mukesh, and thank you for looking this over!
>>>>>
>>>>> At first glance, I do not believe that this can happen because the
>>>>> expedited grace-period machinery avoids waiting on the current CPU.
>>>>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id()
>>>>> early in the function and the get_cpu() later in the function.)
>>>>>
>>>>> But please let me know if I am missing something here.
>>>>>
>>>>> But suppose that we could in fact reach this condition. What bad 
>>>>> thing
>>>>> would happen?  Other than a resched_cpu() having been invoked several
>>>>> times on a not-yet-online CPU, of course.  ;-)
>>>>
>>>> I thought more about this, what if synchronize_rcu_expedited thread 
>>>> got
>>>> schedule out and run on some other cpu
>>>> and we clear out cpu on which it ran next from exp_mask.
>>>>
>>>> Queuing the work on same cpu ensures that it will always be right 
>>>> cpu to
>>>> clear out.
>>>> Do you think this can happen ?
>>> Indeed it might.
>>>
>>> But if it did, the scheduler would invoke RCU's hook, which is named
>>> rcu_note_context_switch(), and do so on the pre-switch CPU. There are
>>> two implementations for this function, one for CONFIG_PREEMPT=y
>>> and another for CONFIG_PREEMPT=n.  Both look to me like they invoke
>>> rcu_report_exp_rdp() when needed, one directly and the other via the
>>> CONFIG_PREEMPT=n variant of rcu_qs().
>>>
>>> Am I missing something?
>>>
>>>
>>
>> There is a issue we are facing where exp_mask is not getting cleared 
>> and rcu_stall report that
>> the cpu we are waiting on sometime in idle and sometime executing 
>> some other task but
>> it is not clearing itself from exp_mask from a very long time and in 
>> all the instances exp_task list is NULL.
>
> Can you please check whether [1] is present in your tree?
>
Thanks Neeraj.
It is not there, will check the results with this patch.

-Mukesh

>
>
> Thanks
> Neeraj
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/rcu/tree_exp.h?h=v5.17-rc3&id=81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b
>>
>>     expmask = 8,     ==> cpu3
>>
>> [80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited 
>> stalls on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/.
>> [80235.534757][T12441] rcu: blocking rcu_node structures:
>> [80235.540102][T12441] Task dump for CPU 3:
>> [80235.540118][T12441] task:core_ctl        state:D stack:    0 pid: 
>> 172 ppid:     2 flags:0x00000008
>> [80235.540150][T12441] Call trace:
>> [80235.540178][T12441]  __switch_to+0x2a8/0x3ac
>> [80235.540207][T12441]  rcu_state+0x11b0/0x1480
>>
>>
>> [80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited 
>> stalls on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/.
>> [80299.022623][T12441] rcu: blocking rcu_node structures:
>> [80299.027924][T12441] Task dump for CPU 3:
>> [80299.027942][T12441] task:swapper/3       state:R  running task 
>> stack:    0 pid:    0 ppid:     1 flags:0x00000008
>> [80299.027993][T12441] Call trace:
>> [80299.028025][T12441]  __switch_to+0x2a8/0x3ac
>> [80299.028051][T12441]  0xffffffc010113eb4
>>
>>
>> As we were not seeing this earlier.
>> Below is compile tested patch, can we do something like this  ?
>>
>> ==========================================><==================================================== 
>>
>>
>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>> index 6453ac5..f0332e4 100644
>> --- a/kernel/rcu/tree_exp.h
>> +++ b/kernel/rcu/tree_exp.h
>> @@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct 
>> rcu_node *rnp)
>>    */
>>   void synchronize_rcu_expedited(void)
>>   {
>> -    bool no_wq;
>> +    bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
>> +    bool is_active;
>>       struct rcu_exp_work rew;
>>       struct rcu_node *rnp;
>>       unsigned long s;
>> +    int next_cpu;
>>
>>       RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
>>                lock_is_held(&rcu_lock_map) ||
>> @@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void)
>>       if (exp_funnel_lock(s))
>>           return;  /* Someone else did our work for us. */
>>
>> -    /* Don't use workqueue during boot or from an incoming CPU. */
>> -    preempt_disable();
>> -    no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT ||
>> -        !cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>> -    preempt_enable();
>> -
>>       /* Ensure that load happens before action based on it. */
>>       if (unlikely(no_wq)) {
>> -        /* Direct call during scheduler init, early_initcalls() and 
>> incoming CPUs. */
>> +        /* Direct call during scheduler init, early_initcalls(). */
>>           rcu_exp_sel_wait_wake(s);
>> +        mutex_unlock(&rcu_state.exp_mutex);
>> +        return;
>> +    }
>> +
>> +    preempt_disable();
>> +    is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask);
>> +    preempt_enable();
>> +
>> +    rew.rew_s = s;
>> +    if (!is_active) {
>> +        INIT_WORK(&rew.rew_work, wait_rcu_exp_gp);
>> +        next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask);
>> +        if (next_cpu >= nr_cpu_ids)
>> +            next_cpu = cpumask_first(cpu_active_mask);
>> +
>> +        queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work);
>>       } else {
>>           /* Marshall arguments & schedule the expedited grace 
>> period. */
>> -        rew.rew_s = s;
>>           INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
>>           queue_work(rcu_gp_wq, &rew.rew_work);
>>       }
>> @@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void)
>>       /* Let the next expedited grace period start. */
>>       mutex_unlock(&rcu_state.exp_mutex);
>>
>> -    if (likely(!no_wq))
>> +    if (likely(is_active))
>>           destroy_work_on_stack(&rew.rew_work);
>> +    else
>> +        flush_work(&rew.rew_work);
>>   }
>>   EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-18 17:33       ` Mukesh Ojha
@ 2022-02-18 18:09         ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-18 18:09 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: kernel-team, linux-kernel, rcu, rostedt, tj, Peter Zijlstra

On Fri, Feb 18, 2022 at 11:03:03PM +0530, Mukesh Ojha wrote:
> 
> On 2/15/2022 11:09 PM, Paul E. McKenney wrote:
> > On Tue, Feb 15, 2022 at 07:53:10PM +0530, Mukesh Ojha wrote:
> > > On 2/14/2022 10:14 PM, Paul E. McKenney wrote:
> > > > On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
> > > > > On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> > > > > > Although it is usually safe to invoke synchronize_rcu_expedited() from a
> > > > > > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> > > > > > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> > > > > > invoke a workqueue handler will hang due to RCU waiting on a CPU that
> > > > > > the scheduler is not paying attention to.  This commit therefore expands
> > > > > > use of the existing workqueue-independent synchronize_rcu_expedited()
> > > > > > from early boot to also include CPUs that are being hotplugged.
> > > > > > 
> > > > > > Link:https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > > > > > Reported-by: Mukesh Ojha<quic_mojha@quicinc.com>
> > > > > > Cc: Tejun Heo<tj@kernel.org>
> > > > > > Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
> > > > > I'm surprised by this scheduler behaviour.
> > > > > 
> > > > > Since sched_cpu_activate() hasn't been called yet,
> > > > > rq->balance_callback = balance_push_callback. As a result, balance_push() should
> > > > > be called at the end of schedule() when the workqueue is picked as the next task.
> > > > > Then eventually the workqueue should be immediately preempted by the stop task to
> > > > > be migrated elsewhere.
> > > > > 
> > > > > So I must be missing something. For the fun, I booted the following and it
> > > > > didn't produce any issue:
> > > > > 
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 80faf2273ce9..b1e74a508881 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
> > > > >    	// Stop-machine done, so allow nohz_full to disable tick.
> > > > >    	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > > > +	if (cpu != 0)
> > > > > +		synchronize_rcu_expedited();
> > > > >    	return 0;
> > > > >    }
> > > > That does seem compelling.  And others have argued that the workqueue
> > > > system's handling of offline CPUs should deal with this.
> > > > 
> > > > Mukesh, was this a theoretical bug, or did you actually make it happen?
> > > > If you made it happen, as seems to have been the case given your original
> > > > email [1], could you please post your reproducer?
> > > No, it was not theoretical one. We saw this issue only once in our testing
> > > and i don't think it is easy to reproduce otherwise
> > > it would been fixed by now.
> > > 
> > > When one of thread calling synchronize_expedite_rcu with timer of 20s but it
> > > did not get the exp funnel
> > > lock for 20s and there we crash it with panic() on timeout.
> > > 
> > > The other thread cpuhp which was having the lock got stuck at the point
> > > mentioned at the below link.
> > > 
> > > https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > OK.  Are you able to create an in-kernel reproducer, perhaps similar to
> > Frederic's change above?
> > 
> > I am worried that the patch that I am carrying might be fixing some
> > other bug by accident...
> 
> Just for information, we are running on 5.10 kernel and after numerous
> attempt, i was not able to reproduce the issue:-)

Thank you for checking!

I will drop this commit from -rcu's "dev" branch, but leave a tag
"exponl.2022.02.18a" should it ever prove necessary.

							Thanx, Paul

> Thanks,
> -Mukesh
> 
> > 
> > 							Thanx, Paul
> > 
> > > e.g Below sample test in combination of many other test in parallel
> > > 
> > > :loop
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu0/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu1/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu3/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu4/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu5/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu6/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
> > > 
> > > adb shell "echo 0 > /sys/devices/system/cpu/cpu7/online"
> > > 
> > > adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
> > > 
> > > goto loop
> > > 
> > > 
> > > 
> > > Thanks, Mukesh
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > [1]https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-15 17:39     ` Paul E. McKenney
  2022-02-17 16:13       ` Mukesh Ojha
@ 2022-02-18 17:33       ` Mukesh Ojha
  2022-02-18 18:09         ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-18 17:33 UTC (permalink / raw)
  To: paulmck; +Cc: kernel-team, linux-kernel, rcu, rostedt, tj, Peter Zijlstra


On 2/15/2022 11:09 PM, Paul E. McKenney wrote:
> On Tue, Feb 15, 2022 at 07:53:10PM +0530, Mukesh Ojha wrote:
>> On 2/14/2022 10:14 PM, Paul E. McKenney wrote:
>>> On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
>>>> On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
>>>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a
>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>>>> the scheduler is not paying attention to.  This commit therefore expands
>>>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>>>> from early boot to also include CPUs that are being hotplugged.
>>>>>
>>>>> Link:https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
>>>>> Reported-by: Mukesh Ojha<quic_mojha@quicinc.com>
>>>>> Cc: Tejun Heo<tj@kernel.org>
>>>>> Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
>>>> I'm surprised by this scheduler behaviour.
>>>>
>>>> Since sched_cpu_activate() hasn't been called yet,
>>>> rq->balance_callback = balance_push_callback. As a result, balance_push() should
>>>> be called at the end of schedule() when the workqueue is picked as the next task.
>>>> Then eventually the workqueue should be immediately preempted by the stop task to
>>>> be migrated elsewhere.
>>>>
>>>> So I must be missing something. For the fun, I booted the following and it
>>>> didn't produce any issue:
>>>>
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 80faf2273ce9..b1e74a508881 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
>>>>    	// Stop-machine done, so allow nohz_full to disable tick.
>>>>    	tick_dep_clear(TICK_DEP_BIT_RCU);
>>>> +	if (cpu != 0)
>>>> +		synchronize_rcu_expedited();
>>>>    	return 0;
>>>>    }
>>> That does seem compelling.  And others have argued that the workqueue
>>> system's handling of offline CPUs should deal with this.
>>>
>>> Mukesh, was this a theoretical bug, or did you actually make it happen?
>>> If you made it happen, as seems to have been the case given your original
>>> email [1], could you please post your reproducer?
>> No, it was not theoretical one. We saw this issue only once in our testing
>> and i don't think it is easy to reproduce otherwise
>> it would been fixed by now.
>>
>> When one of thread calling synchronize_expedite_rcu with timer of 20s but it
>> did not get the exp funnel
>> lock for 20s and there we crash it with panic() on timeout.
>>
>> The other thread cpuhp which was having the lock got stuck at the point
>> mentioned at the below link.
>>
>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> OK.  Are you able to create an in-kernel reproducer, perhaps similar to
> Frederic's change above?
>
> I am worried that the patch that I am carrying might be fixing some
> other bug by accident...

Just for information, we are running on 5.10 kernel and after numerous 
attempt, i was not able to reproduce the issue:-)

Thanks,
-Mukesh

>
> 							Thanx, Paul
>
>> e.g Below sample test in combination of many other test in parallel
>>
>> :loop
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu0/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu1/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu3/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu4/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu5/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu6/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu7/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
>>
>> goto loop
>>
>>
>>
>> Thanks, Mukesh
>>
>>> 							Thanx, Paul
>>>
>>> [1]https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-15 17:39     ` Paul E. McKenney
@ 2022-02-17 16:13       ` Mukesh Ojha
  2022-02-18 17:33       ` Mukesh Ojha
  1 sibling, 0 replies; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-17 16:13 UTC (permalink / raw)
  To: paulmck; +Cc: kernel-team, linux-kernel, rcu, rostedt, tj, Peter Zijlstra


On 2/15/2022 11:09 PM, Paul E. McKenney wrote:
> On Tue, Feb 15, 2022 at 07:53:10PM +0530, Mukesh Ojha wrote:
>> On 2/14/2022 10:14 PM, Paul E. McKenney wrote:
>>> On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
>>>> On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
>>>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a
>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>>>> the scheduler is not paying attention to.  This commit therefore expands
>>>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>>>> from early boot to also include CPUs that are being hotplugged.
>>>>>
>>>>> Link:https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
>>>>> Reported-by: Mukesh Ojha<quic_mojha@quicinc.com>
>>>>> Cc: Tejun Heo<tj@kernel.org>
>>>>> Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
>>>> I'm surprised by this scheduler behaviour.
>>>>
>>>> Since sched_cpu_activate() hasn't been called yet,
>>>> rq->balance_callback = balance_push_callback. As a result, balance_push() should
>>>> be called at the end of schedule() when the workqueue is picked as the next task.
>>>> Then eventually the workqueue should be immediately preempted by the stop task to
>>>> be migrated elsewhere.
>>>>
>>>> So I must be missing something. For the fun, I booted the following and it
>>>> didn't produce any issue:
>>>>
>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>>> index 80faf2273ce9..b1e74a508881 100644
>>>> --- a/kernel/rcu/tree.c
>>>> +++ b/kernel/rcu/tree.c
>>>> @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
>>>>    	// Stop-machine done, so allow nohz_full to disable tick.
>>>>    	tick_dep_clear(TICK_DEP_BIT_RCU);
>>>> +	if (cpu != 0)
>>>> +		synchronize_rcu_expedited();
>>>>    	return 0;
>>>>    }
>>> That does seem compelling.  And others have argued that the workqueue
>>> system's handling of offline CPUs should deal with this.
>>>
>>> Mukesh, was this a theoretical bug, or did you actually make it happen?
>>> If you made it happen, as seems to have been the case given your original
>>> email [1], could you please post your reproducer?
>> No, it was not theoretical one. We saw this issue only once in our testing
>> and i don't think it is easy to reproduce otherwise
>> it would been fixed by now.
>>
>> When one of thread calling synchronize_expedite_rcu with timer of 20s but it
>> did not get the exp funnel
>> lock for 20s and there we crash it with panic() on timeout.
>>
>> The other thread cpuhp which was having the lock got stuck at the point
>> mentioned at the below link.
>>
>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> OK.  Are you able to create an in-kernel reproducer, perhaps similar to
> Frederic's change above?
>
> I am worried that the patch that I am carrying might be fixing some
> other bug by accident...
>
I have started overnight test to reproduce this. let me see if we hit this.
if not, feel free to take decision on this patch.

Thanks,
-Mukesh


> 							Thanx, Paul
>
>> e.g Below sample test in combination of many other test in parallel
>>
>> :loop
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu0/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu1/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu3/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu4/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu5/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu6/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
>>
>> adb shell "echo 0 > /sys/devices/system/cpu/cpu7/online"
>>
>> adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
>>
>> goto loop
>>
>>
>>
>> Thanks, Mukesh
>>
>>> 							Thanx, Paul
>>>
>>> [1]https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
       [not found]   ` <f8cff19c-5e8f-a7ed-c2ff-49a264b4e342@quicinc.com>
@ 2022-02-15 17:39     ` Paul E. McKenney
  2022-02-17 16:13       ` Mukesh Ojha
  2022-02-18 17:33       ` Mukesh Ojha
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-15 17:39 UTC (permalink / raw)
  To: Mukesh Ojha; +Cc: kernel-team, linux-kernel, rcu, rostedt, tj, Peter Zijlstra

On Tue, Feb 15, 2022 at 07:53:10PM +0530, Mukesh Ojha wrote:
> 
> On 2/14/2022 10:14 PM, Paul E. McKenney wrote:
> > On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
> > > On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> > > > Although it is usually safe to invoke synchronize_rcu_expedited() from a
> > > > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> > > > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> > > > invoke a workqueue handler will hang due to RCU waiting on a CPU that
> > > > the scheduler is not paying attention to.  This commit therefore expands
> > > > use of the existing workqueue-independent synchronize_rcu_expedited()
> > > > from early boot to also include CPUs that are being hotplugged.
> > > > 
> > > > Link:https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > > > Reported-by: Mukesh Ojha<quic_mojha@quicinc.com>
> > > > Cc: Tejun Heo<tj@kernel.org>
> > > > Signed-off-by: Paul E. McKenney<paulmck@kernel.org>
> > > I'm surprised by this scheduler behaviour.
> > > 
> > > Since sched_cpu_activate() hasn't been called yet,
> > > rq->balance_callback = balance_push_callback. As a result, balance_push() should
> > > be called at the end of schedule() when the workqueue is picked as the next task.
> > > Then eventually the workqueue should be immediately preempted by the stop task to
> > > be migrated elsewhere.
> > > 
> > > So I must be missing something. For the fun, I booted the following and it
> > > didn't produce any issue:
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 80faf2273ce9..b1e74a508881 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
> > >   	// Stop-machine done, so allow nohz_full to disable tick.
> > >   	tick_dep_clear(TICK_DEP_BIT_RCU);
> > > +	if (cpu != 0)
> > > +		synchronize_rcu_expedited();
> > >   	return 0;
> > >   }
> > That does seem compelling.  And others have argued that the workqueue
> > system's handling of offline CPUs should deal with this.
> > 
> > Mukesh, was this a theoretical bug, or did you actually make it happen?
> > If you made it happen, as seems to have been the case given your original
> > email [1], could you please post your reproducer?
> 
> No, it was not theoretical one. We saw this issue only once in our testing
> and i don't think it is easy to reproduce otherwise
> it would been fixed by now.
> 
> When one of thread calling synchronize_expedite_rcu with timer of 20s but it
> did not get the exp funnel
> lock for 20s and there we crash it with panic() on timeout.
> 
> The other thread cpuhp which was having the lock got stuck at the point
> mentioned at the below link.
> 
> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

OK.  Are you able to create an in-kernel reproducer, perhaps similar to
Frederic's change above?

I am worried that the patch that I am carrying might be fixing some
other bug by accident...

							Thanx, Paul

> e.g Below sample test in combination of many other test in parallel
> 
> :loop
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu0/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu1/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu3/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu4/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu5/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu6/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
> 
> adb shell "echo 0 > /sys/devices/system/cpu/cpu7/online"
> 
> adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
> 
> goto loop
> 
> 
> 
> Thanks, Mukesh
> 
> > 
> > 							Thanx, Paul
> > 
> > [1]https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-14 16:44 ` Paul E. McKenney
@ 2022-02-15 14:28   ` Mukesh Ojha
       [not found]   ` <f8cff19c-5e8f-a7ed-c2ff-49a264b4e342@quicinc.com>
  1 sibling, 0 replies; 19+ messages in thread
From: Mukesh Ojha @ 2022-02-15 14:28 UTC (permalink / raw)
  To: paulmck, kernel-team, linux-kernel, rcu, rostedt, tj, Peter Zijlstra


On 2/14/2022 10:14 PM, Paul E. McKenney wrote:
> On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
>> On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a
>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that
>>> the scheduler is not paying attention to.  This commit therefore expands
>>> use of the existing workqueue-independent synchronize_rcu_expedited()
>>> from early boot to also include CPUs that are being hotplugged.
>>>
>>> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>> I'm surprised by this scheduler behaviour.
>>
>> Since sched_cpu_activate() hasn't been called yet,
>> rq->balance_callback = balance_push_callback. As a result, balance_push() should
>> be called at the end of schedule() when the workqueue is picked as the next task.
>> Then eventually the workqueue should be immediately preempted by the stop task to
>> be migrated elsewhere.
>>
>> So I must be missing something. For the fun, I booted the following and it
>> didn't produce any issue:
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index 80faf2273ce9..b1e74a508881 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
>>   
>>   	// Stop-machine done, so allow nohz_full to disable tick.
>>   	tick_dep_clear(TICK_DEP_BIT_RCU);
>> +	if (cpu != 0)
>> +		synchronize_rcu_expedited();
>>   	return 0;
>>   }
> That does seem compelling.  And others have argued that the workqueue
> system's handling of offline CPUs should deal with this.
>
> Mukesh, was this a theoretical bug, or did you actually make it happen?
> If you made it happen, as seems to have been the case given your original
> email [1], could you please post your reproducer?
>
> 							Thanx, Paul


No, it was not theoretical one. We saw this issue only once in our 
testing and i don't think it is easy to reproduce otherwise
it would been fixed by now.

When one of thread calling synchronize_expedite_rcu with timer of 20s 
but it did not get the exp funnel
lock for 20s and there we crash it with panic() on timeout.

The other thread cpuhp which was having the lock got stuck at the point 
mentioned at the below link.

> [1] https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

Below sample test in combination of many other tests

:loop
adb shell "echo 0 > /sys/devices/system/cpu/cpu0/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu0/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu1/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu1/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu2/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu3/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu3/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu4/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu4/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu5/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu5/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu6/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu6/online"
adb shell "echo 0 > /sys/devices/system/cpu/cpu7/online"
adb shell "echo 1 > /sys/devices/system/cpu/cpu7/online"
goto loop

-Mukesh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
  2022-02-09 23:38 Frederic Weisbecker
@ 2022-02-14 16:44 ` Paul E. McKenney
  2022-02-15 14:28   ` Mukesh Ojha
       [not found]   ` <f8cff19c-5e8f-a7ed-c2ff-49a264b4e342@quicinc.com>
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-02-14 16:44 UTC (permalink / raw)
  To: kernel-team, linux-kernel, quic_mojha, rcu, rostedt, tj, Peter Zijlstra

On Thu, Feb 10, 2022 at 12:38:11AM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> > Although it is usually safe to invoke synchronize_rcu_expedited() from a
> > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> > invoke a workqueue handler will hang due to RCU waiting on a CPU that
> > the scheduler is not paying attention to.  This commit therefore expands
> > use of the existing workqueue-independent synchronize_rcu_expedited()
> > from early boot to also include CPUs that are being hotplugged.
> > 
> > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> > Cc: Tejun Heo <tj@kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> I'm surprised by this scheduler behaviour.
> 
> Since sched_cpu_activate() hasn't been called yet,
> rq->balance_callback = balance_push_callback. As a result, balance_push() should
> be called at the end of schedule() when the workqueue is picked as the next task.
> Then eventually the workqueue should be immediately preempted by the stop task to
> be migrated elsewhere.
> 
> So I must be missing something. For the fun, I booted the following and it
> didn't produce any issue:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 80faf2273ce9..b1e74a508881 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
>  
>  	// Stop-machine done, so allow nohz_full to disable tick.
>  	tick_dep_clear(TICK_DEP_BIT_RCU);
> +	if (cpu != 0)
> +		synchronize_rcu_expedited();
>  	return 0;
>  }

That does seem compelling.  And others have argued that the workqueue
system's handling of offline CPUs should deal with this.

Mukesh, was this a theoretical bug, or did you actually make it happen?
If you made it happen, as seems to have been the case given your original
email [1], could you please post your reproducer?

							Thanx, Paul

[1] https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs
@ 2022-02-09 23:38 Frederic Weisbecker
  2022-02-14 16:44 ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2022-02-09 23:38 UTC (permalink / raw)
  To: paulmck
  Cc: kernel-team, linux-kernel, quic_mojha, rcu, rostedt, tj, Peter Zijlstra

On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> Although it is usually safe to invoke synchronize_rcu_expedited() from a
> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> invoke a workqueue handler will hang due to RCU waiting on a CPU that
> the scheduler is not paying attention to.  This commit therefore expands
> use of the existing workqueue-independent synchronize_rcu_expedited()
> from early boot to also include CPUs that are being hotplugged.
> 
> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
> Cc: Tejun Heo <tj@kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

I'm surprised by this scheduler behaviour.

Since sched_cpu_activate() hasn't been called yet,
rq->balance_callback = balance_push_callback. As a result, balance_push() should
be called at the end of schedule() when the workqueue is picked as the next task.
Then eventually the workqueue should be immediately preempted by the stop task to
be migrated elsewhere.

So I must be missing something. For the fun, I booted the following and it
didn't produce any issue:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 80faf2273ce9..b1e74a508881 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
+	if (cpu != 0)
+		synchronize_rcu_expedited();
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-02-18 18:09 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney
2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney
2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney
2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney
2022-02-08 18:56   ` Tejun Heo
2022-02-09 18:23   ` Mukesh Ojha
2022-02-09 22:06     ` Paul E. McKenney
2022-02-11 18:44       ` Mukesh Ojha
2022-02-11 22:14         ` Paul E. McKenney
2022-02-12  8:47           ` Mukesh Ojha
2022-02-12 11:28             ` Neeraj Upadhyay
2022-02-12 13:56               ` Mukesh Ojha
2022-02-09 23:38 Frederic Weisbecker
2022-02-14 16:44 ` Paul E. McKenney
2022-02-15 14:28   ` Mukesh Ojha
     [not found]   ` <f8cff19c-5e8f-a7ed-c2ff-49a264b4e342@quicinc.com>
2022-02-15 17:39     ` Paul E. McKenney
2022-02-17 16:13       ` Mukesh Ojha
2022-02-18 17:33       ` Mukesh Ojha
2022-02-18 18:09         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).