* [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 @ 2022-02-04 22:54 Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Paul E. McKenney @ 2022-02-04 22:54 UTC (permalink / raw) To: rcu; +Cc: linux-kernel, kernel-team, rostedt Hello! This series provides updates for RCU expedited grace periods: 1. Fix check for idle context in rcu_exp_handler, courtesy of Neeraj Upadhyay. 2. Mark ->expmask access in synchronize_rcu_expedited_wait(). 3. Allow expedited RCU grace periods on incoming CPUs. Thanx, Paul ------------------------------------------------------------------------ b/kernel/rcu/tree_exp.h | 2 +- kernel/rcu/tree_exp.h | 17 ++++++++++++----- 2 files changed, 13 insertions(+), 6 deletions(-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler 2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney @ 2022-02-04 22:55 ` Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney 2 siblings, 0 replies; 12+ messages in thread From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw) To: rcu Cc: linux-kernel, kernel-team, rostedt, Neeraj Upadhyay, Frederic Weisbecker, Paul E . McKenney From: Neeraj Upadhyay <quic_neeraju@quicinc.com> For PREEMPT_RCU, the rcu_exp_handler() function checks whether the current CPU is in idle, by calling rcu_dynticks_curr_cpu_in_eqs(). However, rcu_exp_handler() is called in IPI handler context. So, it should be checking the idle context using rcu_is_cpu_rrupt_from_idle(). Fix this by using rcu_is_cpu_rrupt_from_idle() instead of rcu_dynticks_curr_cpu_in_eqs(). Non-preempt configuration already uses the correct check. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> --- kernel/rcu/tree_exp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 237a79989abae..1568c8ef185b2 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -656,7 +656,7 @@ static void rcu_exp_handler(void *unused) */ if (!depth) { if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) || - rcu_dynticks_curr_cpu_in_eqs()) { + rcu_is_cpu_rrupt_from_idle()) { rcu_report_exp_rdp(rdp); } else { WRITE_ONCE(rdp->cpu_no_qs.b.exp, true); -- 2.31.1.189.g2e36527f23 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() 2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney @ 2022-02-04 22:55 ` Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney 2 siblings, 0 replies; 12+ messages in thread From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw) To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney This commit adds a READ_ONCE() to an access to the rcu_node structure's ->expmask field to prevent compiler mischief. Detected by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> --- kernel/rcu/tree_exp.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 1568c8ef185b2..60197ea24ceb9 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -502,7 +502,8 @@ static void synchronize_rcu_expedited_wait(void) if (synchronize_rcu_expedited_wait_once(1)) return; rcu_for_each_leaf_node(rnp) { - for_each_leaf_node_cpu_mask(rnp, cpu, rnp->expmask) { + mask = READ_ONCE(rnp->expmask); + for_each_leaf_node_cpu_mask(rnp, cpu, mask) { rdp = per_cpu_ptr(&rcu_data, cpu); if (rdp->rcu_forced_tick_exp) continue; -- 2.31.1.189.g2e36527f23 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney @ 2022-02-04 22:55 ` Paul E. McKenney 2022-02-08 18:56 ` Tejun Heo 2022-02-09 18:23 ` Mukesh Ojha 2 siblings, 2 replies; 12+ messages in thread From: Paul E. McKenney @ 2022-02-04 22:55 UTC (permalink / raw) To: rcu Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney, Mukesh Ojha, Tejun Heo Although it is usually safe to invoke synchronize_rcu_expedited() from a preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to invoke a workqueue handler will hang due to RCU waiting on a CPU that the scheduler is not paying attention to. This commit therefore expands use of the existing workqueue-independent synchronize_rcu_expedited() from early boot to also include CPUs that are being hotplugged. Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> --- kernel/rcu/tree_exp.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 60197ea24ceb9..1a45667402260 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) */ void synchronize_rcu_expedited(void) { - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); + bool no_wq; struct rcu_exp_work rew; struct rcu_node *rnp; unsigned long s; @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) if (exp_funnel_lock(s)) return; /* Someone else did our work for us. */ + /* Don't use workqueue during boot or from an incoming CPU. */ + preempt_disable(); + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); + preempt_enable(); + /* Ensure that load happens before action based on it. */ - if (unlikely(boottime)) { - /* Direct call during scheduler init and early_initcalls(). */ + if (unlikely(no_wq)) { + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ rcu_exp_sel_wait_wake(s); } else { /* Marshall arguments & schedule the expedited grace period. */ @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) /* Let the next expedited grace period start. */ mutex_unlock(&rcu_state.exp_mutex); - if (likely(!boottime)) + if (likely(!no_wq)) destroy_work_on_stack(&rew.rew_work); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); -- 2.31.1.189.g2e36527f23 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney @ 2022-02-08 18:56 ` Tejun Heo 2022-02-09 18:23 ` Mukesh Ojha 1 sibling, 0 replies; 12+ messages in thread From: Tejun Heo @ 2022-02-08 18:56 UTC (permalink / raw) To: Paul E. McKenney; +Cc: rcu, linux-kernel, kernel-team, rostedt, Mukesh Ojha On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote: > Although it is usually safe to invoke synchronize_rcu_expedited() from a > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to > invoke a workqueue handler will hang due to RCU waiting on a CPU that > the scheduler is not paying attention to. This commit therefore expands > use of the existing workqueue-independent synchronize_rcu_expedited() > from early boot to also include CPUs that are being hotplugged. > > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> > Cc: Tejun Heo <tj@kernel.org> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney 2022-02-08 18:56 ` Tejun Heo @ 2022-02-09 18:23 ` Mukesh Ojha 2022-02-09 22:06 ` Paul E. McKenney 1 sibling, 1 reply; 12+ messages in thread From: Mukesh Ojha @ 2022-02-09 18:23 UTC (permalink / raw) To: Paul E. McKenney, rcu; +Cc: linux-kernel, kernel-team, rostedt, Tejun Heo On 2/5/2022 4:25 AM, Paul E. McKenney wrote: > Although it is usually safe to invoke synchronize_rcu_expedited() from a > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to > invoke a workqueue handler will hang due to RCU waiting on a CPU that > the scheduler is not paying attention to. This commit therefore expands > use of the existing workqueue-independent synchronize_rcu_expedited() > from early boot to also include CPUs that are being hotplugged. > > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> > Cc: Tejun Heo <tj@kernel.org> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > --- > kernel/rcu/tree_exp.h | 14 ++++++++++---- > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index 60197ea24ceb9..1a45667402260 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) > */ > void synchronize_rcu_expedited(void) > { > - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); > + bool no_wq; > struct rcu_exp_work rew; > struct rcu_node *rnp; > unsigned long s; > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) > if (exp_funnel_lock(s)) > return; /* Someone else did our work for us. */ > > + /* Don't use workqueue during boot or from an incoming CPU. */ > + preempt_disable(); > + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || > + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); > + preempt_enable(); > + > /* Ensure that load happens before action based on it. */ > - if (unlikely(boottime)) { > - /* Direct call during scheduler init and early_initcalls(). */ > + if (unlikely(no_wq)) { > + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ > rcu_exp_sel_wait_wake(s); > } else { > /* Marshall arguments & schedule the expedited grace period. */ > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) > /* Let the next expedited grace period start. */ > mutex_unlock(&rcu_state.exp_mutex); > > - if (likely(!boottime)) > + if (likely(!no_wq)) > destroy_work_on_stack(&rew.rew_work); > } > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); Can we reach a condition after this change where no_wq = true and during rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu ? -Mukesh ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-09 18:23 ` Mukesh Ojha @ 2022-02-09 22:06 ` Paul E. McKenney 2022-02-11 18:44 ` Mukesh Ojha 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-02-09 22:06 UTC (permalink / raw) To: Mukesh Ojha; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: > > On 2/5/2022 4:25 AM, Paul E. McKenney wrote: > > Although it is usually safe to invoke synchronize_rcu_expedited() from a > > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier > > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to > > invoke a workqueue handler will hang due to RCU waiting on a CPU that > > the scheduler is not paying attention to. This commit therefore expands > > use of the existing workqueue-independent synchronize_rcu_expedited() > > from early boot to also include CPUs that are being hotplugged. > > > > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ > > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> > > Cc: Tejun Heo <tj@kernel.org> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > --- > > kernel/rcu/tree_exp.h | 14 ++++++++++---- > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > index 60197ea24ceb9..1a45667402260 100644 > > --- a/kernel/rcu/tree_exp.h > > +++ b/kernel/rcu/tree_exp.h > > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) > > */ > > void synchronize_rcu_expedited(void) > > { > > - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); > > + bool no_wq; > > struct rcu_exp_work rew; > > struct rcu_node *rnp; > > unsigned long s; > > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) > > if (exp_funnel_lock(s)) > > return; /* Someone else did our work for us. */ > > + /* Don't use workqueue during boot or from an incoming CPU. */ > > + preempt_disable(); > > + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || > > + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); > > + preempt_enable(); > > + > > /* Ensure that load happens before action based on it. */ > > - if (unlikely(boottime)) { > > - /* Direct call during scheduler init and early_initcalls(). */ > > + if (unlikely(no_wq)) { > > + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ > > rcu_exp_sel_wait_wake(s); > > } else { > > /* Marshall arguments & schedule the expedited grace period. */ > > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) > > /* Let the next expedited grace period start. */ > > mutex_unlock(&rcu_state.exp_mutex); > > - if (likely(!boottime)) > > + if (likely(!no_wq)) > > destroy_work_on_stack(&rew.rew_work); > > } > > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); > > Can we reach a condition after this change where no_wq = true and during > rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu > ? Hello, Mukesh, and thank you for looking this over! At first glance, I do not believe that this can happen because the expedited grace-period machinery avoids waiting on the current CPU. (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() early in the function and the get_cpu() later in the function.) But please let me know if I am missing something here. But suppose that we could in fact reach this condition. What bad thing would happen? Other than a resched_cpu() having been invoked several times on a not-yet-online CPU, of course. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-09 22:06 ` Paul E. McKenney @ 2022-02-11 18:44 ` Mukesh Ojha 2022-02-11 22:14 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Mukesh Ojha @ 2022-02-11 18:44 UTC (permalink / raw) To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo On 2/10/2022 3:36 AM, Paul E. McKenney wrote: > On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: >> On 2/5/2022 4:25 AM, Paul E. McKenney wrote: >>> Although it is usually safe to invoke synchronize_rcu_expedited() from a >>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier >>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to >>> invoke a workqueue handler will hang due to RCU waiting on a CPU that >>> the scheduler is not paying attention to. This commit therefore expands >>> use of the existing workqueue-independent synchronize_rcu_expedited() >>> from early boot to also include CPUs that are being hotplugged. >>> >>> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ >>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> >>> Cc: Tejun Heo <tj@kernel.org> >>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> >>> --- >>> kernel/rcu/tree_exp.h | 14 ++++++++++---- >>> 1 file changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h >>> index 60197ea24ceb9..1a45667402260 100644 >>> --- a/kernel/rcu/tree_exp.h >>> +++ b/kernel/rcu/tree_exp.h >>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) >>> */ >>> void synchronize_rcu_expedited(void) >>> { >>> - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); >>> + bool no_wq; >>> struct rcu_exp_work rew; >>> struct rcu_node *rnp; >>> unsigned long s; >>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) >>> if (exp_funnel_lock(s)) >>> return; /* Someone else did our work for us. */ >>> + /* Don't use workqueue during boot or from an incoming CPU. */ >>> + preempt_disable(); >>> + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || >>> + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >>> + preempt_enable(); >>> + >>> /* Ensure that load happens before action based on it. */ >>> - if (unlikely(boottime)) { >>> - /* Direct call during scheduler init and early_initcalls(). */ >>> + if (unlikely(no_wq)) { >>> + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ >>> rcu_exp_sel_wait_wake(s); >>> } else { >>> /* Marshall arguments & schedule the expedited grace period. */ >>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) >>> /* Let the next expedited grace period start. */ >>> mutex_unlock(&rcu_state.exp_mutex); >>> - if (likely(!boottime)) >>> + if (likely(!no_wq)) >>> destroy_work_on_stack(&rew.rew_work); >>> } >>> EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); >> Can we reach a condition after this change where no_wq = true and during >> rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu >> ? > Hello, Mukesh, and thank you for looking this over! > > At first glance, I do not believe that this can happen because the > expedited grace-period machinery avoids waiting on the current CPU. > (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() > early in the function and the get_cpu() later in the function.) > > But please let me know if I am missing something here. > > But suppose that we could in fact reach this condition. What bad thing > would happen? Other than a resched_cpu() having been invoked several > times on a not-yet-online CPU, of course. ;-) I thought more about this, what if synchronize_rcu_expedited thread got schedule out and run on some other cpu and we clear out cpu on which it ran next from exp_mask. Queuing the work on same cpu ensures that it will always be right cpu to clear out. Do you think this can happen ? -Mukesh > > Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-11 18:44 ` Mukesh Ojha @ 2022-02-11 22:14 ` Paul E. McKenney 2022-02-12 8:47 ` Mukesh Ojha 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-02-11 22:14 UTC (permalink / raw) To: Mukesh Ojha; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote: > > On 2/10/2022 3:36 AM, Paul E. McKenney wrote: > > On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: > > > On 2/5/2022 4:25 AM, Paul E. McKenney wrote: > > > > Although it is usually safe to invoke synchronize_rcu_expedited() from a > > > > preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier > > > > between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to > > > > invoke a workqueue handler will hang due to RCU waiting on a CPU that > > > > the scheduler is not paying attention to. This commit therefore expands > > > > use of the existing workqueue-independent synchronize_rcu_expedited() > > > > from early boot to also include CPUs that are being hotplugged. > > > > > > > > Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ > > > > Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> > > > > Cc: Tejun Heo <tj@kernel.org> > > > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org> > > > > --- > > > > kernel/rcu/tree_exp.h | 14 ++++++++++---- > > > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > > index 60197ea24ceb9..1a45667402260 100644 > > > > --- a/kernel/rcu/tree_exp.h > > > > +++ b/kernel/rcu/tree_exp.h > > > > @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) > > > > */ > > > > void synchronize_rcu_expedited(void) > > > > { > > > > - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); > > > > + bool no_wq; > > > > struct rcu_exp_work rew; > > > > struct rcu_node *rnp; > > > > unsigned long s; > > > > @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) > > > > if (exp_funnel_lock(s)) > > > > return; /* Someone else did our work for us. */ > > > > + /* Don't use workqueue during boot or from an incoming CPU. */ > > > > + preempt_disable(); > > > > + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || > > > > + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); > > > > + preempt_enable(); > > > > + > > > > /* Ensure that load happens before action based on it. */ > > > > - if (unlikely(boottime)) { > > > > - /* Direct call during scheduler init and early_initcalls(). */ > > > > + if (unlikely(no_wq)) { > > > > + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ > > > > rcu_exp_sel_wait_wake(s); > > > > } else { > > > > /* Marshall arguments & schedule the expedited grace period. */ > > > > @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) > > > > /* Let the next expedited grace period start. */ > > > > mutex_unlock(&rcu_state.exp_mutex); > > > > - if (likely(!boottime)) > > > > + if (likely(!no_wq)) > > > > destroy_work_on_stack(&rew.rew_work); > > > > } > > > > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); > > > Can we reach a condition after this change where no_wq = true and during > > > rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu > > > ? > > Hello, Mukesh, and thank you for looking this over! > > > > At first glance, I do not believe that this can happen because the > > expedited grace-period machinery avoids waiting on the current CPU. > > (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() > > early in the function and the get_cpu() later in the function.) > > > > But please let me know if I am missing something here. > > > > But suppose that we could in fact reach this condition. What bad thing > > would happen? Other than a resched_cpu() having been invoked several > > times on a not-yet-online CPU, of course. ;-) > > > I thought more about this, what if synchronize_rcu_expedited thread got > schedule out and run on some other cpu > and we clear out cpu on which it ran next from exp_mask. > > Queuing the work on same cpu ensures that it will always be right cpu to > clear out. > Do you think this can happen ? Indeed it might. But if it did, the scheduler would invoke RCU's hook, which is named rcu_note_context_switch(), and do so on the pre-switch CPU. There are two implementations for this function, one for CONFIG_PREEMPT=y and another for CONFIG_PREEMPT=n. Both look to me like they invoke rcu_report_exp_rdp() when needed, one directly and the other via the CONFIG_PREEMPT=n variant of rcu_qs(). Am I missing something? Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-11 22:14 ` Paul E. McKenney @ 2022-02-12 8:47 ` Mukesh Ojha 2022-02-12 11:28 ` Neeraj Upadhyay 0 siblings, 1 reply; 12+ messages in thread From: Mukesh Ojha @ 2022-02-12 8:47 UTC (permalink / raw) To: paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo On 2/12/2022 3:44 AM, Paul E. McKenney wrote: > On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote: >> On 2/10/2022 3:36 AM, Paul E. McKenney wrote: >>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: >>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote: >>>>> Although it is usually safe to invoke synchronize_rcu_expedited() from a >>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier >>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to >>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that >>>>> the scheduler is not paying attention to. This commit therefore expands >>>>> use of the existing workqueue-independent synchronize_rcu_expedited() >>>>> from early boot to also include CPUs that are being hotplugged. >>>>> >>>>> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ >>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> >>>>> Cc: Tejun Heo <tj@kernel.org> >>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> >>>>> --- >>>>> kernel/rcu/tree_exp.h | 14 ++++++++++---- >>>>> 1 file changed, 10 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h >>>>> index 60197ea24ceb9..1a45667402260 100644 >>>>> --- a/kernel/rcu/tree_exp.h >>>>> +++ b/kernel/rcu/tree_exp.h >>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) >>>>> */ >>>>> void synchronize_rcu_expedited(void) >>>>> { >>>>> - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); >>>>> + bool no_wq; >>>>> struct rcu_exp_work rew; >>>>> struct rcu_node *rnp; >>>>> unsigned long s; >>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) >>>>> if (exp_funnel_lock(s)) >>>>> return; /* Someone else did our work for us. */ >>>>> + /* Don't use workqueue during boot or from an incoming CPU. */ >>>>> + preempt_disable(); >>>>> + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || >>>>> + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >>>>> + preempt_enable(); >>>>> + >>>>> /* Ensure that load happens before action based on it. */ >>>>> - if (unlikely(boottime)) { >>>>> - /* Direct call during scheduler init and early_initcalls(). */ >>>>> + if (unlikely(no_wq)) { >>>>> + /* Direct call for scheduler init, early_initcall()s, and incoming CPUs. */ >>>>> rcu_exp_sel_wait_wake(s); >>>>> } else { >>>>> /* Marshall arguments & schedule the expedited grace period. */ >>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) >>>>> /* Let the next expedited grace period start. */ >>>>> mutex_unlock(&rcu_state.exp_mutex); >>>>> - if (likely(!boottime)) >>>>> + if (likely(!no_wq)) >>>>> destroy_work_on_stack(&rew.rew_work); >>>>> } >>>>> EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); >>>> Can we reach a condition after this change where no_wq = true and during >>>> rcu_stall report where exp_task = 0 list and exp_mask contain only this cpu >>>> ? >>> Hello, Mukesh, and thank you for looking this over! >>> >>> At first glance, I do not believe that this can happen because the >>> expedited grace-period machinery avoids waiting on the current CPU. >>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() >>> early in the function and the get_cpu() later in the function.) >>> >>> But please let me know if I am missing something here. >>> >>> But suppose that we could in fact reach this condition. What bad thing >>> would happen? Other than a resched_cpu() having been invoked several >>> times on a not-yet-online CPU, of course. ;-) >> >> I thought more about this, what if synchronize_rcu_expedited thread got >> schedule out and run on some other cpu >> and we clear out cpu on which it ran next from exp_mask. >> >> Queuing the work on same cpu ensures that it will always be right cpu to >> clear out. >> Do you think this can happen ? > Indeed it might. > > But if it did, the scheduler would invoke RCU's hook, which is named > rcu_note_context_switch(), and do so on the pre-switch CPU. There are > two implementations for this function, one for CONFIG_PREEMPT=y > and another for CONFIG_PREEMPT=n. Both look to me like they invoke > rcu_report_exp_rdp() when needed, one directly and the other via the > CONFIG_PREEMPT=n variant of rcu_qs(). > > Am I missing something? > > There is a issue we are facing where exp_mask is not getting cleared and rcu_stall report that the cpu we are waiting on sometime in idle and sometime executing some other task but it is not clearing itself from exp_mask from a very long time and in all the instances exp_task list is NULL. expmask = 8, ==> cpu3 [80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/. [80235.534757][T12441] rcu: blocking rcu_node structures: [80235.540102][T12441] Task dump for CPU 3: [80235.540118][T12441] task:core_ctl state:D stack: 0 pid: 172 ppid: 2 flags:0x00000008 [80235.540150][T12441] Call trace: [80235.540178][T12441] __switch_to+0x2a8/0x3ac [80235.540207][T12441] rcu_state+0x11b0/0x1480 [80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/. [80299.022623][T12441] rcu: blocking rcu_node structures: [80299.027924][T12441] Task dump for CPU 3: [80299.027942][T12441] task:swapper/3 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000008 [80299.027993][T12441] Call trace: [80299.028025][T12441] __switch_to+0x2a8/0x3ac [80299.028051][T12441] 0xffffffc010113eb4 As we were not seeing this earlier. Below is compile tested patch, can we do something like this ? ==========================================><==================================================== diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 6453ac5..f0332e4 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp) */ void synchronize_rcu_expedited(void) { - bool no_wq; + bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT); + bool is_active; struct rcu_exp_work rew; struct rcu_node *rnp; unsigned long s; + int next_cpu; RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || lock_is_held(&rcu_lock_map) || @@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void) if (exp_funnel_lock(s)) return; /* Someone else did our work for us. */ - /* Don't use workqueue during boot or from an incoming CPU. */ - preempt_disable(); - no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || - !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); - preempt_enable(); - /* Ensure that load happens before action based on it. */ if (unlikely(no_wq)) { - /* Direct call during scheduler init, early_initcalls() and incoming CPUs. */ + /* Direct call during scheduler init, early_initcalls(). */ rcu_exp_sel_wait_wake(s); + mutex_unlock(&rcu_state.exp_mutex); + return; + } + + preempt_disable(); + is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask); + preempt_enable(); + + rew.rew_s = s; + if (!is_active) { + INIT_WORK(&rew.rew_work, wait_rcu_exp_gp); + next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask); + if (next_cpu >= nr_cpu_ids) + next_cpu = cpumask_first(cpu_active_mask); + + queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work); } else { /* Marshall arguments & schedule the expedited grace period. */ - rew.rew_s = s; INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp); queue_work(rcu_gp_wq, &rew.rew_work); } @@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void) /* Let the next expedited grace period start. */ mutex_unlock(&rcu_state.exp_mutex); - if (likely(!no_wq)) + if (likely(is_active)) destroy_work_on_stack(&rew.rew_work); + else + flush_work(&rew.rew_work); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); -- 2.7.4 -Mukesh > Thanx, Paul ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-12 8:47 ` Mukesh Ojha @ 2022-02-12 11:28 ` Neeraj Upadhyay 2022-02-12 13:56 ` Mukesh Ojha 0 siblings, 1 reply; 12+ messages in thread From: Neeraj Upadhyay @ 2022-02-12 11:28 UTC (permalink / raw) To: Mukesh Ojha, paulmck; +Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo Hi Mukesh, On 2/12/2022 2:17 PM, Mukesh Ojha wrote: > > On 2/12/2022 3:44 AM, Paul E. McKenney wrote: >> On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote: >>> On 2/10/2022 3:36 AM, Paul E. McKenney wrote: >>>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: >>>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote: >>>>>> Although it is usually safe to invoke synchronize_rcu_expedited() >>>>>> from a >>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a >>>>>> notifier >>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to >>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU that >>>>>> the scheduler is not paying attention to. This commit therefore >>>>>> expands >>>>>> use of the existing workqueue-independent synchronize_rcu_expedited() >>>>>> from early boot to also include CPUs that are being hotplugged. >>>>>> >>>>>> Link: >>>>>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ >>>>>> >>>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> >>>>>> Cc: Tejun Heo <tj@kernel.org> >>>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> >>>>>> --- >>>>>> kernel/rcu/tree_exp.h | 14 ++++++++++---- >>>>>> 1 file changed, 10 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h >>>>>> index 60197ea24ceb9..1a45667402260 100644 >>>>>> --- a/kernel/rcu/tree_exp.h >>>>>> +++ b/kernel/rcu/tree_exp.h >>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct >>>>>> rcu_node *rnp) >>>>>> */ >>>>>> void synchronize_rcu_expedited(void) >>>>>> { >>>>>> - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); >>>>>> + bool no_wq; >>>>>> struct rcu_exp_work rew; >>>>>> struct rcu_node *rnp; >>>>>> unsigned long s; >>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) >>>>>> if (exp_funnel_lock(s)) >>>>>> return; /* Someone else did our work for us. */ >>>>>> + /* Don't use workqueue during boot or from an incoming CPU. */ >>>>>> + preempt_disable(); >>>>>> + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || >>>>>> + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >>>>>> + preempt_enable(); >>>>>> + >>>>>> /* Ensure that load happens before action based on it. */ >>>>>> - if (unlikely(boottime)) { >>>>>> - /* Direct call during scheduler init and >>>>>> early_initcalls(). */ >>>>>> + if (unlikely(no_wq)) { >>>>>> + /* Direct call for scheduler init, early_initcall()s, and >>>>>> incoming CPUs. */ >>>>>> rcu_exp_sel_wait_wake(s); >>>>>> } else { >>>>>> /* Marshall arguments & schedule the expedited grace >>>>>> period. */ >>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) >>>>>> /* Let the next expedited grace period start. */ >>>>>> mutex_unlock(&rcu_state.exp_mutex); >>>>>> - if (likely(!boottime)) >>>>>> + if (likely(!no_wq)) >>>>>> destroy_work_on_stack(&rew.rew_work); >>>>>> } >>>>>> EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); >>>>> Can we reach a condition after this change where no_wq = true and >>>>> during >>>>> rcu_stall report where exp_task = 0 list and exp_mask contain only >>>>> this cpu >>>>> ? >>>> Hello, Mukesh, and thank you for looking this over! >>>> >>>> At first glance, I do not believe that this can happen because the >>>> expedited grace-period machinery avoids waiting on the current CPU. >>>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() >>>> early in the function and the get_cpu() later in the function.) >>>> >>>> But please let me know if I am missing something here. >>>> >>>> But suppose that we could in fact reach this condition. What bad thing >>>> would happen? Other than a resched_cpu() having been invoked several >>>> times on a not-yet-online CPU, of course. ;-) >>> >>> I thought more about this, what if synchronize_rcu_expedited thread got >>> schedule out and run on some other cpu >>> and we clear out cpu on which it ran next from exp_mask. >>> >>> Queuing the work on same cpu ensures that it will always be right cpu to >>> clear out. >>> Do you think this can happen ? >> Indeed it might. >> >> But if it did, the scheduler would invoke RCU's hook, which is named >> rcu_note_context_switch(), and do so on the pre-switch CPU. There are >> two implementations for this function, one for CONFIG_PREEMPT=y >> and another for CONFIG_PREEMPT=n. Both look to me like they invoke >> rcu_report_exp_rdp() when needed, one directly and the other via the >> CONFIG_PREEMPT=n variant of rcu_qs(). >> >> Am I missing something? >> >> > > There is a issue we are facing where exp_mask is not getting cleared and > rcu_stall report that > the cpu we are waiting on sometime in idle and sometime executing some > other task but > it is not clearing itself from exp_mask from a very long time and in all > the instances exp_task list is NULL. Can you please check whether [1] is present in your tree? Thanks Neeraj [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/rcu/tree_exp.h?h=v5.17-rc3&id=81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b > > expmask = 8, ==> cpu3 > > [80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited stalls > on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/. > [80235.534757][T12441] rcu: blocking rcu_node structures: > [80235.540102][T12441] Task dump for CPU 3: > [80235.540118][T12441] task:core_ctl state:D stack: 0 pid: 172 > ppid: 2 flags:0x00000008 > [80235.540150][T12441] Call trace: > [80235.540178][T12441] __switch_to+0x2a8/0x3ac > [80235.540207][T12441] rcu_state+0x11b0/0x1480 > > > [80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited stalls > on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/. > [80299.022623][T12441] rcu: blocking rcu_node structures: > [80299.027924][T12441] Task dump for CPU 3: > [80299.027942][T12441] task:swapper/3 state:R running task > stack: 0 pid: 0 ppid: 1 flags:0x00000008 > [80299.027993][T12441] Call trace: > [80299.028025][T12441] __switch_to+0x2a8/0x3ac > [80299.028051][T12441] 0xffffffc010113eb4 > > > As we were not seeing this earlier. > Below is compile tested patch, can we do something like this ? > > ==========================================><==================================================== > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index 6453ac5..f0332e4 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct > rcu_node *rnp) > */ > void synchronize_rcu_expedited(void) > { > - bool no_wq; > + bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT); > + bool is_active; > struct rcu_exp_work rew; > struct rcu_node *rnp; > unsigned long s; > + int next_cpu; > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || > lock_is_held(&rcu_lock_map) || > @@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void) > if (exp_funnel_lock(s)) > return; /* Someone else did our work for us. */ > > - /* Don't use workqueue during boot or from an incoming CPU. */ > - preempt_disable(); > - no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || > - !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); > - preempt_enable(); > - > /* Ensure that load happens before action based on it. */ > if (unlikely(no_wq)) { > - /* Direct call during scheduler init, early_initcalls() and > incoming CPUs. */ > + /* Direct call during scheduler init, early_initcalls(). */ > rcu_exp_sel_wait_wake(s); > + mutex_unlock(&rcu_state.exp_mutex); > + return; > + } > + > + preempt_disable(); > + is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask); > + preempt_enable(); > + > + rew.rew_s = s; > + if (!is_active) { > + INIT_WORK(&rew.rew_work, wait_rcu_exp_gp); > + next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask); > + if (next_cpu >= nr_cpu_ids) > + next_cpu = cpumask_first(cpu_active_mask); > + > + queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work); > } else { > /* Marshall arguments & schedule the expedited grace period. */ > - rew.rew_s = s; > INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp); > queue_work(rcu_gp_wq, &rew.rew_work); > } > @@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void) > /* Let the next expedited grace period start. */ > mutex_unlock(&rcu_state.exp_mutex); > > - if (likely(!no_wq)) > + if (likely(is_active)) > destroy_work_on_stack(&rew.rew_work); > + else > + flush_work(&rew.rew_work); > } > EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs 2022-02-12 11:28 ` Neeraj Upadhyay @ 2022-02-12 13:56 ` Mukesh Ojha 0 siblings, 0 replies; 12+ messages in thread From: Mukesh Ojha @ 2022-02-12 13:56 UTC (permalink / raw) To: Neeraj Upadhyay, paulmck Cc: rcu, linux-kernel, kernel-team, rostedt, Tejun Heo On 2/12/2022 4:58 PM, Neeraj Upadhyay wrote: > Hi Mukesh, > > On 2/12/2022 2:17 PM, Mukesh Ojha wrote: >> >> On 2/12/2022 3:44 AM, Paul E. McKenney wrote: >>> On Sat, Feb 12, 2022 at 12:14:20AM +0530, Mukesh Ojha wrote: >>>> On 2/10/2022 3:36 AM, Paul E. McKenney wrote: >>>>> On Wed, Feb 09, 2022 at 11:53:33PM +0530, Mukesh Ojha wrote: >>>>>> On 2/5/2022 4:25 AM, Paul E. McKenney wrote: >>>>>>> Although it is usually safe to invoke >>>>>>> synchronize_rcu_expedited() from a >>>>>>> preemption-enabled CPU-hotplug notifier, if it is invoked from a >>>>>>> notifier >>>>>>> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its >>>>>>> attempts to >>>>>>> invoke a workqueue handler will hang due to RCU waiting on a CPU >>>>>>> that >>>>>>> the scheduler is not paying attention to. This commit therefore >>>>>>> expands >>>>>>> use of the existing workqueue-independent >>>>>>> synchronize_rcu_expedited() >>>>>>> from early boot to also include CPUs that are being hotplugged. >>>>>>> >>>>>>> Link: >>>>>>> https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/ >>>>>>> >>>>>>> Reported-by: Mukesh Ojha <quic_mojha@quicinc.com> >>>>>>> Cc: Tejun Heo <tj@kernel.org> >>>>>>> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> >>>>>>> --- >>>>>>> kernel/rcu/tree_exp.h | 14 ++++++++++---- >>>>>>> 1 file changed, 10 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h >>>>>>> index 60197ea24ceb9..1a45667402260 100644 >>>>>>> --- a/kernel/rcu/tree_exp.h >>>>>>> +++ b/kernel/rcu/tree_exp.h >>>>>>> @@ -816,7 +816,7 @@ static int rcu_print_task_exp_stall(struct >>>>>>> rcu_node *rnp) >>>>>>> */ >>>>>>> void synchronize_rcu_expedited(void) >>>>>>> { >>>>>>> - bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT); >>>>>>> + bool no_wq; >>>>>>> struct rcu_exp_work rew; >>>>>>> struct rcu_node *rnp; >>>>>>> unsigned long s; >>>>>>> @@ -841,9 +841,15 @@ void synchronize_rcu_expedited(void) >>>>>>> if (exp_funnel_lock(s)) >>>>>>> return; /* Someone else did our work for us. */ >>>>>>> + /* Don't use workqueue during boot or from an incoming CPU. */ >>>>>>> + preempt_disable(); >>>>>>> + no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || >>>>>>> + !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >>>>>>> + preempt_enable(); >>>>>>> + >>>>>>> /* Ensure that load happens before action based on it. */ >>>>>>> - if (unlikely(boottime)) { >>>>>>> - /* Direct call during scheduler init and >>>>>>> early_initcalls(). */ >>>>>>> + if (unlikely(no_wq)) { >>>>>>> + /* Direct call for scheduler init, early_initcall()s, >>>>>>> and incoming CPUs. */ >>>>>>> rcu_exp_sel_wait_wake(s); >>>>>>> } else { >>>>>>> /* Marshall arguments & schedule the expedited grace >>>>>>> period. */ >>>>>>> @@ -861,7 +867,7 @@ void synchronize_rcu_expedited(void) >>>>>>> /* Let the next expedited grace period start. */ >>>>>>> mutex_unlock(&rcu_state.exp_mutex); >>>>>>> - if (likely(!boottime)) >>>>>>> + if (likely(!no_wq)) >>>>>>> destroy_work_on_stack(&rew.rew_work); >>>>>>> } >>>>>>> EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); >>>>>> Can we reach a condition after this change where no_wq = true and >>>>>> during >>>>>> rcu_stall report where exp_task = 0 list and exp_mask contain >>>>>> only this cpu >>>>>> ? >>>>> Hello, Mukesh, and thank you for looking this over! >>>>> >>>>> At first glance, I do not believe that this can happen because the >>>>> expedited grace-period machinery avoids waiting on the current CPU. >>>>> (See sync_rcu_exp_select_node_cpus(), both the raw_smp_processor_id() >>>>> early in the function and the get_cpu() later in the function.) >>>>> >>>>> But please let me know if I am missing something here. >>>>> >>>>> But suppose that we could in fact reach this condition. What bad >>>>> thing >>>>> would happen? Other than a resched_cpu() having been invoked several >>>>> times on a not-yet-online CPU, of course. ;-) >>>> >>>> I thought more about this, what if synchronize_rcu_expedited thread >>>> got >>>> schedule out and run on some other cpu >>>> and we clear out cpu on which it ran next from exp_mask. >>>> >>>> Queuing the work on same cpu ensures that it will always be right >>>> cpu to >>>> clear out. >>>> Do you think this can happen ? >>> Indeed it might. >>> >>> But if it did, the scheduler would invoke RCU's hook, which is named >>> rcu_note_context_switch(), and do so on the pre-switch CPU. There are >>> two implementations for this function, one for CONFIG_PREEMPT=y >>> and another for CONFIG_PREEMPT=n. Both look to me like they invoke >>> rcu_report_exp_rdp() when needed, one directly and the other via the >>> CONFIG_PREEMPT=n variant of rcu_qs(). >>> >>> Am I missing something? >>> >>> >> >> There is a issue we are facing where exp_mask is not getting cleared >> and rcu_stall report that >> the cpu we are waiting on sometime in idle and sometime executing >> some other task but >> it is not clearing itself from exp_mask from a very long time and in >> all the instances exp_task list is NULL. > > Can you please check whether [1] is present in your tree? > Thanks Neeraj. It is not there, will check the results with this patch. -Mukesh > > > Thanks > Neeraj > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/rcu/tree_exp.h?h=v5.17-rc3&id=81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b >> >> expmask = 8, ==> cpu3 >> >> [80235.522440][T12441] rcu: INFO: rcu_preempt detected expedited >> stalls on CPUs/tasks: { 3-... } 9163622 jiffies s: 634705 root: 0x8/. >> [80235.534757][T12441] rcu: blocking rcu_node structures: >> [80235.540102][T12441] Task dump for CPU 3: >> [80235.540118][T12441] task:core_ctl state:D stack: 0 pid: >> 172 ppid: 2 flags:0x00000008 >> [80235.540150][T12441] Call trace: >> [80235.540178][T12441] __switch_to+0x2a8/0x3ac >> [80235.540207][T12441] rcu_state+0x11b0/0x1480 >> >> >> [80299.010105][T12441] rcu: INFO: rcu_preempt detected expedited >> stalls on CPUs/tasks: { 3-... } 9179494 jiffies s: 634705 root: 0x8/. >> [80299.022623][T12441] rcu: blocking rcu_node structures: >> [80299.027924][T12441] Task dump for CPU 3: >> [80299.027942][T12441] task:swapper/3 state:R running task >> stack: 0 pid: 0 ppid: 1 flags:0x00000008 >> [80299.027993][T12441] Call trace: >> [80299.028025][T12441] __switch_to+0x2a8/0x3ac >> [80299.028051][T12441] 0xffffffc010113eb4 >> >> >> As we were not seeing this earlier. >> Below is compile tested patch, can we do something like this ? >> >> ==========================================><==================================================== >> >> >> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h >> index 6453ac5..f0332e4 100644 >> --- a/kernel/rcu/tree_exp.h >> +++ b/kernel/rcu/tree_exp.h >> @@ -812,10 +812,12 @@ static int rcu_print_task_exp_stall(struct >> rcu_node *rnp) >> */ >> void synchronize_rcu_expedited(void) >> { >> - bool no_wq; >> + bool no_wq = (rcu_scheduler_active == RCU_SCHEDULER_INIT); >> + bool is_active; >> struct rcu_exp_work rew; >> struct rcu_node *rnp; >> unsigned long s; >> + int next_cpu; >> >> RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || >> lock_is_held(&rcu_lock_map) || >> @@ -837,19 +839,28 @@ void synchronize_rcu_expedited(void) >> if (exp_funnel_lock(s)) >> return; /* Someone else did our work for us. */ >> >> - /* Don't use workqueue during boot or from an incoming CPU. */ >> - preempt_disable(); >> - no_wq = rcu_scheduler_active == RCU_SCHEDULER_INIT || >> - !cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >> - preempt_enable(); >> - >> /* Ensure that load happens before action based on it. */ >> if (unlikely(no_wq)) { >> - /* Direct call during scheduler init, early_initcalls() and >> incoming CPUs. */ >> + /* Direct call during scheduler init, early_initcalls(). */ >> rcu_exp_sel_wait_wake(s); >> + mutex_unlock(&rcu_state.exp_mutex); >> + return; >> + } >> + >> + preempt_disable(); >> + is_active = cpumask_test_cpu(smp_processor_id(), cpu_active_mask); >> + preempt_enable(); >> + >> + rew.rew_s = s; >> + if (!is_active) { >> + INIT_WORK(&rew.rew_work, wait_rcu_exp_gp); >> + next_cpu = cpumask_next(smp_processor_id(), cpu_active_mask); >> + if (next_cpu >= nr_cpu_ids) >> + next_cpu = cpumask_first(cpu_active_mask); >> + >> + queue_work_on(next_cpu, rcu_gp_wq, &rew.rew_work); >> } else { >> /* Marshall arguments & schedule the expedited grace >> period. */ >> - rew.rew_s = s; >> INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp); >> queue_work(rcu_gp_wq, &rew.rew_work); >> } >> @@ -863,7 +874,9 @@ void synchronize_rcu_expedited(void) >> /* Let the next expedited grace period start. */ >> mutex_unlock(&rcu_state.exp_mutex); >> >> - if (likely(!no_wq)) >> + if (likely(is_active)) >> destroy_work_on_stack(&rew.rew_work); >> + else >> + flush_work(&rew.rew_work); >> } >> EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-02-12 13:56 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-04 22:54 [PATCH rcu 0/3] Expedited-grace-period updates for v5.18 Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 1/3] rcu/exp: Fix check for idle context in rcu_exp_handler Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 2/3] rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() Paul E. McKenney 2022-02-04 22:55 ` [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on incoming CPUs Paul E. McKenney 2022-02-08 18:56 ` Tejun Heo 2022-02-09 18:23 ` Mukesh Ojha 2022-02-09 22:06 ` Paul E. McKenney 2022-02-11 18:44 ` Mukesh Ojha 2022-02-11 22:14 ` Paul E. McKenney 2022-02-12 8:47 ` Mukesh Ojha 2022-02-12 11:28 ` Neeraj Upadhyay 2022-02-12 13:56 ` Mukesh Ojha
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.