* [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed
@ 2022-02-28  9:36 Zqiang
  2022-03-03 16:49 ` Frederic Weisbecker
From: Zqiang @ 2022-02-28  9:36 UTC
  To: paulmck, frederic; +Cc: linux-kernel

When CONFIG_RCU_NOCB_CPU is enabled and the 'rcu_nocbs' boot parameter is set,
the rcuop and rcuog kthreads are created. However, their creation may fail.
If it does, clear the rdp offloaded flags.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 kernel/rcu/tree_nocb.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 46694e13398a..94b279147954 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 				"rcuog/%d", rdp_gp->cpu);
 		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
 			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
-			return;
+			goto end;
 		}
 		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
 		if (kthread_prio)
@@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	t = kthread_run(rcu_nocb_cb_kthread, rdp,
 			"rcuo%c/%d", rcu_state.abbr, cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
-		return;
+		goto end;
 
 	if (kthread_prio)
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
+	return;
+end:
+	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
+		rcu_segcblist_offload(&rdp->cblist, false);
+		rcu_segcblist_clear_flags(&rdp->cblist,
+				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
+		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
+		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
+	}
+	return;
 }
 
 /* How many CB CPU IDs per GP kthread?  Default of -1 for sqrt(nr_cpu_ids). */
-- 
2.25.1



* Re: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed
  2022-02-28  9:36 [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed Zqiang
@ 2022-03-03 16:49 ` Frederic Weisbecker
  2022-03-08  7:37   ` Zhang, Qiang1
From: Frederic Weisbecker @ 2022-03-03 16:49 UTC
  To: Zqiang
  Cc: paulmck, linux-kernel, Neeraj Upadhyay, Uladzislau Rezki, Boqun Feng

On Mon, Feb 28, 2022 at 05:36:29PM +0800, Zqiang wrote:
> When CONFIG_RCU_NOCB_CPU is enabled and the 'rcu_nocbs' boot parameter is set,
> the rcuop and rcuog kthreads are created. However, their creation may fail.
> If it does, clear the rdp offloaded flags.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  kernel/rcu/tree_nocb.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 46694e13398a..94b279147954 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  				"rcuog/%d", rdp_gp->cpu);
>  		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
>  			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> -			return;
> +			goto end;
>  		}
>  		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
>  		if (kthread_prio)
> @@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  	t = kthread_run(rcu_nocb_cb_kthread, rdp,
>  			"rcuo%c/%d", rcu_state.abbr, cpu);
>  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> -		return;
> +		goto end;
>  
>  	if (kthread_prio)
>  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
>  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> +	return;
> +end:
> +	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
> +		rcu_segcblist_offload(&rdp->cblist, false);
> +		rcu_segcblist_clear_flags(&rdp->cblist,
> +				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> +		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
> +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
> +	}

Thank you, the consequences are indeed bad otherwise because the target is
considered offloaded but nothing actually handles the callbacks.

A few issues though:

* The rdp_gp kthread may be running concurrently. If it's iterating this rdp and
  the SEGCBLIST_LOCKING flag is cleared in the middle, rcu_nocb_unlock() won't
  release (among many other possible issues).

* we should clear the cpu from rcu_nocb_mask or we won't be able to later
  re-offload it.

* we should then delete the rdp from the group list:

     list_del_rcu(&rdp->nocb_entry_rdp);

So ideally we should call rcu_nocb_rdp_deoffload(). But then bear in mind:

1) We must lock rcu_state.barrier_mutex and hotplug read lock. But since we
   are calling rcutree_prepare_cpu(), we may be holding the hotplug write lock
   already.

   Therefore we first need to invert the locking dependency order between
   rcu_state.barrier_mutex and hotplug lock and then just lock the barrier_mutex
   before calling rcu_nocb_rdp_deoffload() from our failure path.
   

2) On rcu_nocb_rdp_deoffload(), handle non-existing nocb_gp and/or nocb_cb
   kthreads. Make sure we are holding nocb_gp_kthread_mutex.
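
Roughly, the failure path could then end up looking like this -- an untested
sketch only, assuming rcu_nocb_rdp_deoffload() is reworked per 1) and 2) so it
can be called directly with the rdp and tolerates never-spawned kthreads:

	end:
		/*
		 * Sketch: relies on the inverted barrier_mutex vs hotplug lock
		 * ordering and on a deoffload path that copes with kthreads
		 * that were never created.
		 */
		mutex_lock(&rcu_state.barrier_mutex);
		if (rcu_rdp_is_offloaded(rdp)) {
			rcu_nocb_rdp_deoffload(rdp);
			cpumask_clear_cpu(cpu, rcu_nocb_mask);
		}
		mutex_unlock(&rcu_state.barrier_mutex);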

I'm going to take your patch and adapt it along those lines.

Thanks!


* RE: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed
  2022-03-03 16:49 ` Frederic Weisbecker
@ 2022-03-08  7:37   ` Zhang, Qiang1
  2022-03-09 21:06     ` Frederic Weisbecker
From: Zhang, Qiang1 @ 2022-03-08  7:37 UTC
  To: Frederic Weisbecker
  Cc: paulmck, linux-kernel, Neeraj Upadhyay, Uladzislau Rezki, Boqun Feng


On Mon, Feb 28, 2022 at 05:36:29PM +0800, Zqiang wrote:
> When CONFIG_RCU_NOCB_CPU is enabled and the 'rcu_nocbs' boot parameter is set,
> the rcuop and rcuog kthreads are created. However, their creation may fail.
> If it does, clear the rdp offloaded flags.
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  kernel/rcu/tree_nocb.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 
> 46694e13398a..94b279147954 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  				"rcuog/%d", rdp_gp->cpu);
>  		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
>  			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> -			return;
> +			goto end;
>  		}
>  		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
>  		if (kthread_prio)
> @@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  	t = kthread_run(rcu_nocb_cb_kthread, rdp,
>  			"rcuo%c/%d", rcu_state.abbr, cpu);
>  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> -		return;
> +		goto end;
>  
>  	if (kthread_prio)
>  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
>  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
>  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> +	return;
> +end:
> +	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
> +		rcu_segcblist_offload(&rdp->cblist, false);
> +		rcu_segcblist_clear_flags(&rdp->cblist,
> +				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> +		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
> +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
> +	}
>>
>>Thank you, the consequences are indeed bad otherwise because the target is considered offloaded but nothing actually handles the callbacks.
>>
>>A few issues though:
>>
>>* The rdp_gp kthread may be running concurrently. If it's iterating this rdp and
>>  the SEGCBLIST_LOCKING flag is cleared in the middle, rcu_nocb_unlock() won't
>>  release (among many other possible issues).
>>
>>* we should clear the cpu from rcu_nocb_mask or we won't be able to later
>>  re-offload it.
>>
>>* we should then delete the rdp from the group list:
>>
>>     list_del_rcu(&rdp->nocb_entry_rdp);
>>
>>So ideally we should call rcu_nocb_rdp_deoffload(). But then bear in mind:
>>
>>1) We must lock rcu_state.barrier_mutex and hotplug read lock. But since we
>>   are calling rcutree_prepare_cpu(), we may be holding the hotplug write lock
>>   already.
>>
>>   Therefore we first need to invert the locking dependency order between
>>   rcu_state.barrier_mutex and hotplug lock and then just lock the barrier_mutex
>>   before calling rcu_nocb_rdp_deoffload() from our failure path.
>>   
>>
>>2) On rcu_nocb_rdp_deoffload(), handle non-existing nocb_gp and/or nocb_cb
>>   kthreads. Make sure we are holding nocb_gp_kthread_mutex.

Sorry for my late reply. Is the nocb_gp_kthread_mutex really necessary?
CPU online/offline is a serial operation; it is protected by cpus_write_lock().

Thanks
Zqiang

>>
>>I'm going to take your patch and adapt it along those lines.
>>
>>Thanks!


* Re: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed
  2022-03-08  7:37   ` Zhang, Qiang1
@ 2022-03-09 21:06     ` Frederic Weisbecker
  2022-03-10  2:37       ` Zhang, Qiang1
From: Frederic Weisbecker @ 2022-03-09 21:06 UTC
  To: Zhang, Qiang1
  Cc: paulmck, linux-kernel, Neeraj Upadhyay, Uladzislau Rezki, Boqun Feng

On Tue, Mar 08, 2022 at 07:37:24AM +0000, Zhang, Qiang1 wrote:
> 
> On Mon, Feb 28, 2022 at 05:36:29PM +0800, Zqiang wrote:
> > When CONFIG_RCU_NOCB_CPU is enabled and the 'rcu_nocbs' boot parameter is set,
> > the rcuop and rcuog kthreads are created. However, their creation may fail.
> > If it does, clear the rdp offloaded flags.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > ---
> >  kernel/rcu/tree_nocb.h | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index 
> > 46694e13398a..94b279147954 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> >  				"rcuog/%d", rdp_gp->cpu);
> >  		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
> >  			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> > -			return;
> > +			goto end;
> >  		}
> >  		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> >  		if (kthread_prio)
> > @@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> >  	t = kthread_run(rcu_nocb_cb_kthread, rdp,
> >  			"rcuo%c/%d", rcu_state.abbr, cpu);
> >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > -		return;
> > +		goto end;
> >  
> >  	if (kthread_prio)
> >  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> >  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
> >  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> > +	return;
> > +end:
> > +	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
> > +		rcu_segcblist_offload(&rdp->cblist, false);
> > +		rcu_segcblist_clear_flags(&rdp->cblist,
> > +				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> > +		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
> > +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
> > +	}
> >>
> >>Thank you, the consequences are indeed bad otherwise because the target is considered offloaded but nothing actually handles the callbacks.
> >>
> >>A few issues though:
> >>
> >>* The rdp_gp kthread may be running concurrently. If it's iterating this rdp and
> >>  the SEGCBLIST_LOCKING flag is cleared in the middle, rcu_nocb_unlock() won't
> >>  release (among many other possible issues).
> >>
> >>* we should clear the cpu from rcu_nocb_mask or we won't be able to later
> >>  re-offload it.
> >>
> >>* we should then delete the rdp from the group list:
> >>
> >>     list_del_rcu(&rdp->nocb_entry_rdp);
> >>
> >>So ideally we should call rcu_nocb_rdp_deoffload(). But then bear in mind:
> >>
> >>1) We must lock rcu_state.barrier_mutex and hotplug read lock. But since we
> >>   are calling rcutree_prepare_cpu(), we may be holding the hotplug write lock
> >>   already.
> >>
> >>   Therefore we first need to invert the locking dependency order between
> >>   rcu_state.barrier_mutex and hotplug lock and then just lock the barrier_mutex
> >>   before calling rcu_nocb_rdp_deoffload() from our failure path.
> >>   
> >>
> >>2) On rcu_nocb_rdp_deoffload(), handle non-existing nocb_gp and/or nocb_cb
> >>   kthreads. Make sure we are holding nocb_gp_kthread_mutex.
> 
> Sorry for my late reply. Is the nocb_gp_kthread_mutex really necessary?
> CPU online/offline is a serial operation; it is protected by cpus_write_lock().

And you're right! But some people are working on making cpu_up() able to work
in parallel for faster bring-up on boot.
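
To illustrate -- this just mirrors the shape of the existing code in
rcu_spawn_cpu_nocb_kthread(), slightly simplified, not a new proposal -- with
a parallel cpu_up(), two CPUs sharing the same rdp_gp could reach the rcuog
spawn concurrently, and the mutex is what keeps a single rcuog per group:

	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
	if (!rdp_gp->nocb_gp_kthread) {
		/* Only the first CPU of the group actually spawns rcuog. */
		t = kthread_run(rcu_nocb_gp_kthread, rdp_gp,
				"rcuog/%d", rdp_gp->cpu);
		if (!WARN_ON_ONCE(IS_ERR(t)))
			WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
	}
	mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);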


* RE: [PATCH] rcu/nocb: Clear rdp offloaded flags when rcuop/rcuog kthreads spawn failed
  2022-03-09 21:06     ` Frederic Weisbecker
@ 2022-03-10  2:37       ` Zhang, Qiang1
From: Zhang, Qiang1 @ 2022-03-10  2:37 UTC
  To: Frederic Weisbecker
  Cc: paulmck, linux-kernel, Neeraj Upadhyay, Uladzislau Rezki, Boqun Feng

On Tue, Mar 08, 2022 at 07:37:24AM +0000, Zhang, Qiang1 wrote:
> 
> On Mon, Feb 28, 2022 at 05:36:29PM +0800, Zqiang wrote:
> > When CONFIG_RCU_NOCB_CPU is enabled and the 'rcu_nocbs' boot parameter is
> > set, the rcuop and rcuog kthreads are created. However, their creation may
> > fail. If it does, clear the rdp offloaded flags.
> > 
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > ---
> >  kernel/rcu/tree_nocb.h | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h index
> > 46694e13398a..94b279147954 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1246,7 +1246,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> >  				"rcuog/%d", rdp_gp->cpu);
> >  		if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
> >  			mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
> > -			return;
> > +			goto end;
> >  		}
> >  		WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
> >  		if (kthread_prio)
> > @@ -1258,12 +1258,22 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> >  	t = kthread_run(rcu_nocb_cb_kthread, rdp,
> >  			"rcuo%c/%d", rcu_state.abbr, cpu);
> >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > -		return;
> > +		goto end;
> >  
> >  	if (kthread_prio)
> >  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> >  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
> >  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
> > +	return;
> > +end:
> > +	if (cpumask_test_cpu(cpu, rcu_nocb_mask)) {
> > +		rcu_segcblist_offload(&rdp->cblist, false);
> > +		rcu_segcblist_clear_flags(&rdp->cblist,
> > +				SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
> > +		rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_LOCKING);
> > +		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_RCU_CORE);
> > +	}
> >>
> >>Thank you, the consequences are indeed bad otherwise because the target is considered offloaded but nothing actually handles the callbacks.
> >>
> >>A few issues though:
> >>
> >>* The rdp_gp kthread may be running concurrently. If it's iterating 
> >>this rdp and
> >>  the SEGCBLIST_LOCKING flag is cleared in the middle, 
> >>rcu_nocb_unlock() won't
> >>  release (among many other possible issues).
> >>
> >>* we should clear the cpu from rcu_nocb_mask or we won't be able to 
> >>later
> >>  re-offload it.
> >>
> >>* we should then delete the rdp from the group list:
> >>
> >>     list_del_rcu(&rdp->nocb_entry_rdp);
> >>
> >>So ideally we should call rcu_nocb_rdp_deoffload(). But then bear in mind:
> >>
> >>1) We must lock rcu_state.barrier_mutex and hotplug read lock. But since we
> >>   are calling rcutree_prepare_cpu(), we may be holding the hotplug write lock
> >>   already.
> >>
> >>   Therefore we first need to invert the locking dependency order between
> >>   rcu_state.barrier_mutex and hotplug lock and then just lock the barrier_mutex
> >>   before calling rcu_nocb_rdp_deoffload() from our failure path.
> >>   
> >>
> >>2) On rcu_nocb_rdp_deoffload(), handle non-existing nocb_gp and/or nocb_cb
> >>   kthreads. Make sure we are holding nocb_gp_kthread_mutex.
> 
> Sorry for my late reply. Is the nocb_gp_kthread_mutex really necessary?
> CPU online/offline is a serial operation; it is protected
> by cpus_write_lock().
>
>And you're right! But some people are working on making cpu_up() able to work in parallel for faster bring-up on boot.

Thank you for the explanation. Are you going to make the above changes to this patch?

Thanks
Zqiang


