[PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
@ 2022-05-05 10:16 Uladzislau Rezki (Sony)
  2022-05-05 19:09 ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki (Sony) @ 2022-05-05 10:16 UTC (permalink / raw)
  To: LKML, RCU, Paul E . McKenney
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Uladzislau Rezki, Oleksiy Avramchenko

Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
decide if an offloading has to be done in a high-prio context or
not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
parameters and by default it is off.

This patch splits the boosting preempted RCU readers and those
kthreads which directly responsible for driving expedited grace
periods forward with enabling/disabling the offloading from/to
SCHED_FIFO/SCHED_OTHER contexts.

The main reason of such split is, for example on Android there
are some workloads which require fast expedited grace period to
be done whereas offloading in RT context can lead to starvation
and hogging a CPU for a long time what is not acceptable for
latency sensitive environment. For instance:

<snip>
<...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
<snip>

invoking 34 619 callbacks will take time thus making other CFS
tasks waiting in run-queue to be starved due to such behaviour.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/Kconfig     | 14 ++++++++++++++
 kernel/rcu/tree.c      |  5 ++++-
 kernel/rcu/tree_nocb.h |  3 ++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4c..074630b94902 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
 	  Say Y here if you want offload all CPUs by default on boot.
 	  Say N here if you are unsure.
 
+config RCU_NOCB_CPU_CB_BOOST
+	bool "Perform offloading from real-time kthread"
+	depends on RCU_NOCB_CPU && RCU_BOOST
+	default n
+	help
+	  Use this option to offload callbacks from the SCHED_FIFO context
+	  to make the process faster. As a side effect of this approach is
+	  a latency especially for the SCHED_OTHER tasks which will not be
+	  able to preempt an offloading kthread. That latency depends on a
+	  number of callbacks to be invoked.
+
+	  Say Y here if you want to set RT priority for offloading kthreads.
+	  Say N here if you are unsure.
+
 config TASKS_TRACE_RCU_READ_MB
 	bool "Tasks Trace RCU readers use memory barriers in user and idle"
 	depends on RCU_EXPERT && TASKS_TRACE_RCU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9dc4c4e82db6..d769a15bc0e3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 
-/* rcuc/rcub/rcuop kthread realtime priority */
+/*
+ * rcuc/rcub/rcuop kthread realtime priority. The former
+ * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
+ */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
 module_param(kthread_prio, int, 0444);
 
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 60cc92cc6655..a2823be9b1d0 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
 		goto end;
 
-	if (kthread_prio)
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
 	return;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-05 10:16 [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context Uladzislau Rezki (Sony)
@ 2022-05-05 19:09 ` Paul E. McKenney
  2022-05-06 16:22   ` Uladzislau Rezki
  2022-05-10 14:07   ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-05 19:09 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko, bigeasy

On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> decide if an offloading has to be done in a high-prio context or
> not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> parameters and by default it is off.
> 
> This patch splits the boosting preempted RCU readers and those
> kthreads which directly responsible for driving expedited grace
> periods forward with enabling/disabling the offloading from/to
> SCHED_FIFO/SCHED_OTHER contexts.
> 
> The main reason of such split is, for example on Android there
> are some workloads which require fast expedited grace period to
> be done whereas offloading in RT context can lead to starvation
> and hogging a CPU for a long time what is not acceptable for
> latency sensitive environment. For instance:
> 
> <snip>
> <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> <snip>
> 
> invoking 34 619 callbacks will take time thus making other CFS
> tasks waiting in run-queue to be starved due to such behaviour.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

All good points!

Some questions and comments below.

Adding Sebastian on CC for his perspective.

						Thanx, Paul

> ---
>  kernel/rcu/Kconfig     | 14 ++++++++++++++
>  kernel/rcu/tree.c      |  5 ++++-
>  kernel/rcu/tree_nocb.h |  3 ++-
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 27aab870ae4c..074630b94902 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
>  	  Say Y here if you want offload all CPUs by default on boot.
>  	  Say N here if you are unsure.
>  
> +config RCU_NOCB_CPU_CB_BOOST
> +	bool "Perform offloading from real-time kthread"
> +	depends on RCU_NOCB_CPU && RCU_BOOST
> +	default n

I understand that you need this to default to "n" on your systems.
However, other groups already using callback offloading should not see
a sudden change.  I don't see an Android-specific defconfig file, but
perhaps something in drivers/android/Kconfig?

One easy way to make this work would be to invert the sense of this
Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
"n", but then select it somewhere in drivers/android/Kconfig.  But I
would not be surprised if there is a better way.

> +	help
> +	  Use this option to offload callbacks from the SCHED_FIFO context
> +	  to make the process faster. As a side effect of this approach is
> +	  a latency especially for the SCHED_OTHER tasks which will not be
> +	  able to preempt an offloading kthread. That latency depends on a
> +	  number of callbacks to be invoked.
> +
> +	  Say Y here if you want to set RT priority for offloading kthreads.
> +	  Say N here if you are unsure.
> +
>  config TASKS_TRACE_RCU_READ_MB
>  	bool "Tasks Trace RCU readers use memory barriers in user and idle"
>  	depends on RCU_EXPERT && TASKS_TRACE_RCU
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9dc4c4e82db6..d769a15bc0e3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
>  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
>  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
>  
> -/* rcuc/rcub/rcuop kthread realtime priority */
> +/*
> + * rcuc/rcub/rcuop kthread realtime priority. The former
> + * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.

Aren't the rcuo[ps] kthreads controlled by the RCU_NOCB_CPU_CB_BOOST
Kconfig option?  (As opposed to the "former", which is "rcuc".)

> + */
>  static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
>  module_param(kthread_prio, int, 0444);
>  
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 60cc92cc6655..a2823be9b1d0 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
>  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
>  		goto end;
>  
> -	if (kthread_prio)
> +	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))

Don't we need both non-zero kthread_prio and the proper setting of the
new Kconfig option before we run it at SCHED_FIFO?

Yes, we could rely on sched_setscheduler_nocheck() erroring out in
that case, but that sounds like an accident waiting to happen.

>  		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
> +
>  	WRITE_ONCE(rdp->nocb_cb_kthread, t);
>  	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
>  	return;
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-05 19:09 ` Paul E. McKenney
@ 2022-05-06 16:22   ` Uladzislau Rezki
  2022-05-06 18:24     ` Paul E. McKenney
  2022-05-10 14:07   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-06 16:22 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko, bigeasy

> On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > decide if an offloading has to be done in a high-prio context or
> > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > parameters and by default it is off.
> > 
> > This patch splits the boosting preempted RCU readers and those
> > kthreads which directly responsible for driving expedited grace
> > periods forward with enabling/disabling the offloading from/to
> > SCHED_FIFO/SCHED_OTHER contexts.
> > 
> > The main reason of such split is, for example on Android there
> > are some workloads which require fast expedited grace period to
> > be done whereas offloading in RT context can lead to starvation
> > and hogging a CPU for a long time what is not acceptable for
> > latency sensitive environment. For instance:
> > 
> > <snip>
> > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > <snip>
> > 
> > invoking 34 619 callbacks will take time thus making other CFS
> > tasks waiting in run-queue to be starved due to such behaviour.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> All good points!
> 
> Some questions and comments below.
> 
> Adding Sebastian on CC for his perspective.
> 
> 						Thanx, Paul
> 
> > ---
> >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> >  kernel/rcu/tree.c      |  5 ++++-
> >  kernel/rcu/tree_nocb.h |  3 ++-
> >  3 files changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > index 27aab870ae4c..074630b94902 100644
> > --- a/kernel/rcu/Kconfig
> > +++ b/kernel/rcu/Kconfig
> > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> >  	  Say Y here if you want offload all CPUs by default on boot.
> >  	  Say N here if you are unsure.
> >  
> > +config RCU_NOCB_CPU_CB_BOOST
> > +	bool "Perform offloading from real-time kthread"
> > +	depends on RCU_NOCB_CPU && RCU_BOOST
> > +	default n
> 
> I understand that you need this to default to "n" on your systems.
> However, other groups already using callback offloading should not see
> a sudden change.  I don't see an Android-specific defconfig file, but
> perhaps something in drivers/android/Kconfig?
> 
> One easy way to make this work would be to invert the sense of this
> Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> "n", but then select it somewhere in drivers/android/Kconfig.  But I
> would not be surprised if there is a better way.
> 
It was done deliberately, i mean off by default. Because the user has to
think before enabling it for its workloads. It is not a big issue for
kthreads which drive a grace period forward, because their context runtime
i find pretty short. Whereas an offloading callback kthread can stuck
for a long time depending on workloads.

Also, i put it that way because initially those kthreads were staying
as SCHED_NORMAL even though the RCU_BOOST was set in kernel config.

<snip>
commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
Author: Alison Chaiken <achaiken@aurora.tech>
Date:   Tue Jan 11 15:32:52 2022 -0800

    rcu: Elevate priority of offloaded callback threads
    
    When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
    parameter signals initialization code to boost the priority of rcuc
    callbacks to the designated value.  With the additional
    CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
    command-line parameter, the callbacks on the listed cores are
    offloaded to new rcuop kthreads that are not pinned to the cores whose
    post-grace-period work is performed.  While the rcuop kthreads perform
    the same function as the rcuc kthreads they offload, the kthread_prio
    parameter only boosts the priority of the rcuc kthreads.  Fix this
    inconsistency by elevating rcuop kthreads to the same priority as the rcuc
    kthreads.
    
    Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
<snip>

I have a doubt that it is needed for CONFIG_PREEMPT_RT=y. The reason i mentioned
above it is a source of extra latency. That is why i have made it inactive by default.

Any thoughts?

> > +	help
> > +	  Use this option to offload callbacks from the SCHED_FIFO context
> > +	  to make the process faster. As a side effect of this approach is
> > +	  a latency especially for the SCHED_OTHER tasks which will not be
> > +	  able to preempt an offloading kthread. That latency depends on a
> > +	  number of callbacks to be invoked.
> > +
> > +	  Say Y here if you want to set RT priority for offloading kthreads.
> > +	  Say N here if you are unsure.
> > +
> >  config TASKS_TRACE_RCU_READ_MB
> >  	bool "Tasks Trace RCU readers use memory barriers in user and idle"
> >  	depends on RCU_EXPERT && TASKS_TRACE_RCU
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 9dc4c4e82db6..d769a15bc0e3 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
> >  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> >  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> >  
> > -/* rcuc/rcub/rcuop kthread realtime priority */
> > +/*
> > + * rcuc/rcub/rcuop kthread realtime priority. The former
> > + * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
> 
> Aren't the rcuo[ps] kthreads controlled by the RCU_NOCB_CPU_CB_BOOST
> Kconfig option?  (As opposed to the "former", which is "rcuc".)
> 
The CONFIG_RCU_NOCB_CPU_CB_BOOST controls only the last what is
the rcuo CB kthread or "rcuo%c/%d" name. Sorry it is not "former"
it is the last in the rcuc/rcub/rcuop sequence. It was a typo :)

> > + */
> >  static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
> >  module_param(kthread_prio, int, 0444);
> >  
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 60cc92cc6655..a2823be9b1d0 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> >  		goto end;
> >  
> > -	if (kthread_prio)
> > +	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
> 
> Don't we need both non-zero kthread_prio and the proper setting of the
> new Kconfig option before we run it at SCHED_FIFO?
> 
> Yes, we could rely on sched_setscheduler_nocheck() erroring out in
> that case, but that sounds like an accident waiting to happen.
> 
As far as i see it is odd, because the "kthread_prio" is verified so
there is a sanity check to check if the value is correct for SCHED_FIFO
case and does some adjustment if not. There is sanitize_kthread_prio()
that does all trick.

Looking at the kthread_prio variable. If it is set all the code that
takes into account of it switches to SCHED_FIFO class. Maybe rename it
to something kthread_rt_prio? It might be a bad idea though because of
former dependencies of distros and so on :)



--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-06 16:22   ` Uladzislau Rezki
@ 2022-05-06 18:24     ` Paul E. McKenney
  2022-05-07  9:11       ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-06 18:24 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko, bigeasy

On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > decide if an offloading has to be done in a high-prio context or
> > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > parameters and by default it is off.
> > > 
> > > This patch splits the boosting preempted RCU readers and those
> > > kthreads which directly responsible for driving expedited grace
> > > periods forward with enabling/disabling the offloading from/to
> > > SCHED_FIFO/SCHED_OTHER contexts.
> > > 
> > > The main reason of such split is, for example on Android there
> > > are some workloads which require fast expedited grace period to
> > > be done whereas offloading in RT context can lead to starvation
> > > and hogging a CPU for a long time what is not acceptable for
> > > latency sensitive environment. For instance:
> > > 
> > > <snip>
> > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > <snip>
> > > 
> > > invoking 34 619 callbacks will take time thus making other CFS
> > > tasks waiting in run-queue to be starved due to such behaviour.
> > > 
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > 
> > All good points!
> > 
> > Some questions and comments below.
> > 
> > Adding Sebastian on CC for his perspective.
> > 
> > 						Thanx, Paul
> > 
> > > ---
> > >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> > >  kernel/rcu/tree.c      |  5 ++++-
> > >  kernel/rcu/tree_nocb.h |  3 ++-
> > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > index 27aab870ae4c..074630b94902 100644
> > > --- a/kernel/rcu/Kconfig
> > > +++ b/kernel/rcu/Kconfig
> > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > >  	  Say Y here if you want offload all CPUs by default on boot.
> > >  	  Say N here if you are unsure.
> > >  
> > > +config RCU_NOCB_CPU_CB_BOOST
> > > +	bool "Perform offloading from real-time kthread"
> > > +	depends on RCU_NOCB_CPU && RCU_BOOST
> > > +	default n
> > 
> > I understand that you need this to default to "n" on your systems.
> > However, other groups already using callback offloading should not see
> > a sudden change.  I don't see an Android-specific defconfig file, but
> > perhaps something in drivers/android/Kconfig?
> > 
> > One easy way to make this work would be to invert the sense of this
> > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > would not be surprised if there is a better way.
> > 
> It was done deliberately, i mean off by default. Because the user has to
> think before enabling it for its workloads. It is not a big issue for
> kthreads which drive a grace period forward, because their context runtime
> i find pretty short. Whereas an offloading callback kthread can stuck
> for a long time depending on workloads.
> 
> Also, i put it that way because initially those kthreads were staying
> as SCHED_NORMAL even though the RCU_BOOST was set in kernel config.
> 
> <snip>
> commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
> Author: Alison Chaiken <achaiken@aurora.tech>
> Date:   Tue Jan 11 15:32:52 2022 -0800
> 
>     rcu: Elevate priority of offloaded callback threads
>     
>     When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
>     parameter signals initialization code to boost the priority of rcuc
>     callbacks to the designated value.  With the additional
>     CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
>     command-line parameter, the callbacks on the listed cores are
>     offloaded to new rcuop kthreads that are not pinned to the cores whose
>     post-grace-period work is performed.  While the rcuop kthreads perform
>     the same function as the rcuc kthreads they offload, the kthread_prio
>     parameter only boosts the priority of the rcuc kthreads.  Fix this
>     inconsistency by elevating rcuop kthreads to the same priority as the rcuc
>     kthreads.
>     
>     Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> <snip>
> 
> I have a doubt that it is needed for CONFIG_PREEMPT_RT=y. The reason i mentioned
> above it is a source of extra latency. That is why i have made it inactive by default.
> 
> Any thoughts?

My first thought is that Alison does real RT work.  Let's please therefore
avoid assuming that she doesn't know what she is doing.  ;-)

One thing that she knows is that RT workloads usually run the most
latency-sensitive parts of their application at far higher priority
than they do the rcuo[ps] kthreads.  This means that they do not have
the same issues with these kthreads that you see.

> > > +	help
> > > +	  Use this option to offload callbacks from the SCHED_FIFO context
> > > +	  to make the process faster. As a side effect of this approach is
> > > +	  a latency especially for the SCHED_OTHER tasks which will not be
> > > +	  able to preempt an offloading kthread. That latency depends on a
> > > +	  number of callbacks to be invoked.
> > > +
> > > +	  Say Y here if you want to set RT priority for offloading kthreads.
> > > +	  Say N here if you are unsure.
> > > +
> > >  config TASKS_TRACE_RCU_READ_MB
> > >  	bool "Tasks Trace RCU readers use memory barriers in user and idle"
> > >  	depends on RCU_EXPERT && TASKS_TRACE_RCU
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 9dc4c4e82db6..d769a15bc0e3 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
> > >  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> > >  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> > >  
> > > -/* rcuc/rcub/rcuop kthread realtime priority */
> > > +/*
> > > + * rcuc/rcub/rcuop kthread realtime priority. The former
> > > + * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
> > 
> > Aren't the rcuo[ps] kthreads controlled by the RCU_NOCB_CPU_CB_BOOST
> > Kconfig option?  (As opposed to the "former", which is "rcuc".)
> > 
> The CONFIG_RCU_NOCB_CPU_CB_BOOST controls only the last what is
> the rcuo CB kthread or "rcuo%c/%d" name. Sorry it is not "former"
> it is the last in the rcuc/rcub/rcuop sequence. It was a typo :)

I do know that feeling!  Absolutely not a problem, please just fix it
in the next version.

> > > + */
> > >  static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
> > >  module_param(kthread_prio, int, 0444);
> > >  
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 60cc92cc6655..a2823be9b1d0 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> > >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > >  		goto end;
> > >  
> > > -	if (kthread_prio)
> > > +	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
> > 
> > Don't we need both non-zero kthread_prio and the proper setting of the
> > new Kconfig option before we run it at SCHED_FIFO?
> > 
> > Yes, we could rely on sched_setscheduler_nocheck() erroring out in
> > that case, but that sounds like an accident waiting to happen.
> > 
> As far as i see it is odd, because the "kthread_prio" is verified so
> there is a sanity check to check if the value is correct for SCHED_FIFO
> case and does some adjustment if not. There is sanitize_kthread_prio()
> that does all trick.

Agreed, and like I said, we could rely on sched_setscheduler_nocheck()
erroring out in that case.  But people do sometimes turn error cases
into some other functionality.  Keeping the check of kthread_prio makes
it clear to people reading the code what our intent is and also avoids
strange breakage should someone find a use for SCHED_FIFO priority zero.

So please put the check of kthread_prio back in for the next version.

> Looking at the kthread_prio variable. If it is set all the code that
> takes into account of it switches to SCHED_FIFO class. Maybe rename it
> to something kthread_rt_prio? It might be a bad idea though because of
> former dependencies of distros and so on :)

Where were you when the kthread_prio patch was first submitted?  ;-)

But agreed, last I checked there were some tens of billions of Linux
kernel instances running out there.  If such a change affected only
0.1% of that total, we could be ruining tens of millions of system's
days with such a name change.  There would thus need to be a very good
reason to change the name, and we don't have one.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-06 18:24     ` Paul E. McKenney
@ 2022-05-07  9:11       ` Uladzislau Rezki
  2022-05-07 22:32         ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-07  9:11 UTC (permalink / raw)
  To: Paul E. McKenney, Alison Chaiken, Sebastian Andrzej Siewior
  Cc: Uladzislau Rezki, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Oleksiy Avramchenko, bigeasy

> On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > > decide if an offloading has to be done in a high-prio context or
> > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > > parameters and by default it is off.
> > > > 
> > > > This patch splits the boosting preempted RCU readers and those
> > > > kthreads which directly responsible for driving expedited grace
> > > > periods forward with enabling/disabling the offloading from/to
> > > > SCHED_FIFO/SCHED_OTHER contexts.
> > > > 
> > > > The main reason of such split is, for example on Android there
> > > > are some workloads which require fast expedited grace period to
> > > > be done whereas offloading in RT context can lead to starvation
> > > > and hogging a CPU for a long time what is not acceptable for
> > > > latency sensitive environment. For instance:
> > > > 
> > > > <snip>
> > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > > <snip>
> > > > 
> > > > invoking 34 619 callbacks will take time thus making other CFS
> > > > tasks waiting in run-queue to be starved due to such behaviour.
> > > > 
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > 
> > > All good points!
> > > 
> > > Some questions and comments below.
> > > 
> > > Adding Sebastian on CC for his perspective.
> > > 
> > > 						Thanx, Paul
> > > 
> > > > ---
> > > >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> > > >  kernel/rcu/tree.c      |  5 ++++-
> > > >  kernel/rcu/tree_nocb.h |  3 ++-
> > > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > > index 27aab870ae4c..074630b94902 100644
> > > > --- a/kernel/rcu/Kconfig
> > > > +++ b/kernel/rcu/Kconfig
> > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > > >  	  Say Y here if you want offload all CPUs by default on boot.
> > > >  	  Say N here if you are unsure.
> > > >  
> > > > +config RCU_NOCB_CPU_CB_BOOST
> > > > +	bool "Perform offloading from real-time kthread"
> > > > +	depends on RCU_NOCB_CPU && RCU_BOOST
> > > > +	default n
> > > 
> > > I understand that you need this to default to "n" on your systems.
> > > However, other groups already using callback offloading should not see
> > > a sudden change.  I don't see an Android-specific defconfig file, but
> > > perhaps something in drivers/android/Kconfig?
> > > 
We saw a sudden change when the priority was lifted up for rcuop kthreads. 
I would like to know the reason. As for Android, i would like to avoid
it to be Android specific. It is better just to enable boosting by
default for nocb kthreads.

> > > One easy way to make this work would be to invert the sense of this
> > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > would not be surprised if there is a better way.
In that situation probably we should just enable it by default.

> > It was done deliberately, i mean off by default. Because the user has to
> > think before enabling it for its workloads. It is not a big issue for
> > kthreads which drive a grace period forward, because their context runtime
> > i find pretty short. Whereas an offloading callback kthread can stuck
> > for a long time depending on workloads.
> > 
> > Also, i put it that way because initially those kthreads were staying
> > as SCHED_NORMAL even though the RCU_BOOST was set in kernel config.
> > 
> > <snip>
> > commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
> > Author: Alison Chaiken <achaiken@aurora.tech>
> > Date:   Tue Jan 11 15:32:52 2022 -0800
> > 
> >     rcu: Elevate priority of offloaded callback threads
> >     
> >     When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
> >     parameter signals initialization code to boost the priority of rcuc
> >     callbacks to the designated value.  With the additional
> >     CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
> >     command-line parameter, the callbacks on the listed cores are
> >     offloaded to new rcuop kthreads that are not pinned to the cores whose
> >     post-grace-period work is performed.  While the rcuop kthreads perform
> >     the same function as the rcuc kthreads they offload, the kthread_prio
> >     parameter only boosts the priority of the rcuc kthreads.  Fix this
> >     inconsistency by elevating rcuop kthreads to the same priority as the rcuc
> >     kthreads.
> >     
> >     Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > <snip>
> > 
> > I have a doubt that it is needed for CONFIG_PREEMPT_RT=y. The reason i mentioned
> > above it is a source of extra latency. That is why i have made it inactive by default.
> > 
> > Any thoughts?
> 
> My first thought is that Alison does real RT work.  Let's please therefore
> avoid assuming that she doesn't know what she is doing.  ;-)
> 
I read a commit message that is what i know about the patch.

>
> One thing that she knows is that RT workloads usually run the most
> latency-sensitive parts of their application at far higher priority
> than they do the rcuo[ps] kthreads.  This means that they do not have
> the same issues with these kthreads that you see.
> 
We can make it ON by default for CONFIG_PREEMPT_RT=y kernels. But i do
not want to guess. The correct way is just to ask if Sebastian and Alison
would like to have it by default on.

I have added them into "To".

> > > > +	help
> > > > +	  Use this option to offload callbacks from the SCHED_FIFO context
> > > > +	  to make the process faster. As a side effect of this approach is
> > > > +	  a latency especially for the SCHED_OTHER tasks which will not be
> > > > +	  able to preempt an offloading kthread. That latency depends on a
> > > > +	  number of callbacks to be invoked.
> > > > +
> > > > +	  Say Y here if you want to set RT priority for offloading kthreads.
> > > > +	  Say N here if you are unsure.
> > > > +
> > > >  config TASKS_TRACE_RCU_READ_MB
> > > >  	bool "Tasks Trace RCU readers use memory barriers in user and idle"
> > > >  	depends on RCU_EXPERT && TASKS_TRACE_RCU
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 9dc4c4e82db6..d769a15bc0e3 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
> > > >  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> > > >  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> > > >  
> > > > -/* rcuc/rcub/rcuop kthread realtime priority */
> > > > +/*
> > > > + * rcuc/rcub/rcuop kthread realtime priority. The former
> > > > + * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
> > > 
> > > Aren't the rcuo[ps] kthreads controlled by the RCU_NOCB_CPU_CB_BOOST
> > > Kconfig option?  (As opposed to the "former", which is "rcuc".)
> > > 
> > The CONFIG_RCU_NOCB_CPU_CB_BOOST controls only the last what is
> > the rcuo CB kthread or "rcuo%c/%d" name. Sorry it is not "former"
> > it is the last in the rcuc/rcub/rcuop sequence. It was a typo :)
> 
> I do know that feeling!  Absolutely not a problem, please just fix it
> in the next version.
> 
Will do :)

> > > > + */
> > > >  static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
> > > >  module_param(kthread_prio, int, 0444);
> > > >  
> > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > > index 60cc92cc6655..a2823be9b1d0 100644
> > > > --- a/kernel/rcu/tree_nocb.h
> > > > +++ b/kernel/rcu/tree_nocb.h
> > > > @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> > > >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > > >  		goto end;
> > > >  
> > > > -	if (kthread_prio)
> > > > +	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
> > > 
> > > Don't we need both non-zero kthread_prio and the proper setting of the
> > > new Kconfig option before we run it at SCHED_FIFO?
> > > 
> > > Yes, we could rely on sched_setscheduler_nocheck() erroring out in
> > > that case, but that sounds like an accident waiting to happen.
> > > 
> > As far as i see it is odd, because the "kthread_prio" is verified so
> > there is a sanity check to check if the value is correct for SCHED_FIFO
> > case and does some adjustment if not. There is sanitize_kthread_prio()
> > that does all trick.
> 
> Agreed, and like I said, we could rely on sched_setscheduler_nocheck()
> erroring out in that case.  But people do sometimes turn error cases
> into some other functionality.  Keeping the check of kthread_prio makes
> it clear to people reading the code what our intent is and also avoids
> strange breakage should someone find a use for SCHED_FIFO priority zero.
> 
Hm... I can place it back, though it is useless, IMHO.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-07  9:11       ` Uladzislau Rezki
@ 2022-05-07 22:32         ` Paul E. McKenney
  2022-05-08  6:28           ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-07 22:32 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Alison Chaiken, Sebastian Andrzej Siewior, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On Sat, May 07, 2022 at 11:11:58AM +0200, Uladzislau Rezki wrote:
> > On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > > > decide if an offloading has to be done in a high-prio context or
> > > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > > > parameters and by default it is off.
> > > > > 
> > > > > This patch splits the boosting preempted RCU readers and those
> > > > > kthreads which directly responsible for driving expedited grace
> > > > > periods forward with enabling/disabling the offloading from/to
> > > > > SCHED_FIFO/SCHED_OTHER contexts.
> > > > > 
> > > > > The main reason of such split is, for example on Android there
> > > > > are some workloads which require fast expedited grace period to
> > > > > be done whereas offloading in RT context can lead to starvation
> > > > > and hogging a CPU for a long time what is not acceptable for
> > > > > latency sensitive environment. For instance:
> > > > > 
> > > > > <snip>
> > > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > > > <snip>
> > > > > 
> > > > > invoking 34 619 callbacks will take time thus making other CFS
> > > > > tasks waiting in run-queue to be starved due to such behaviour.
> > > > > 
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > 
> > > > All good points!
> > > > 
> > > > Some questions and comments below.
> > > > 
> > > > Adding Sebastian on CC for his perspective.
> > > > 
> > > > 						Thanx, Paul
> > > > 
> > > > > ---
> > > > >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> > > > >  kernel/rcu/tree.c      |  5 ++++-
> > > > >  kernel/rcu/tree_nocb.h |  3 ++-
> > > > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > > > index 27aab870ae4c..074630b94902 100644
> > > > > --- a/kernel/rcu/Kconfig
> > > > > +++ b/kernel/rcu/Kconfig
> > > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > > > >  	  Say Y here if you want offload all CPUs by default on boot.
> > > > >  	  Say N here if you are unsure.
> > > > >  
> > > > > +config RCU_NOCB_CPU_CB_BOOST
> > > > > +	bool "Perform offloading from real-time kthread"
> > > > > +	depends on RCU_NOCB_CPU && RCU_BOOST
> > > > > +	default n
> > > > 
> > > > I understand that you need this to default to "n" on your systems.
> > > > However, other groups already using callback offloading should not see
> > > > a sudden change.  I don't see an Android-specific defconfig file, but
> > > > perhaps something in drivers/android/Kconfig?
> > > > 
> We saw a sudden change when the priority was lifted up for rcuop kthreads. 
> I would like to know the reason. As for Android, i would like to avoid
> it to be Android specific. It is better just to enable boosting by
> default for nocb kthreads.

No, because that breaks an existing use case, which uses RCU_BOOST
to avoid OOM on busy systems.

> > > > One easy way to make this work would be to invert the sense of this
> > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > > would not be surprised if there is a better way.
> 
> In that situation probably we should just enable it by default.

You are within your rights to cause it to be enabled by default -within-
-Android-.  You are -not- within your rights to break other workloads.

If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.

It is not -that- hard, guys!  ;-)

> > > It was done deliberately, i mean off by default. Because the user has to
> > > think before enabling it for its workloads. It is not a big issue for
> > > kthreads which drive a grace period forward, because their context runtime
> > > i find pretty short. Whereas an offloading callback kthread can stuck
> > > for a long time depending on workloads.
> > > 
> > > Also, i put it that way because initially those kthreads were staying
> > > as SCHED_NORMAL even though the RCU_BOOST was set in kernel config.
> > > 
> > > <snip>
> > > commit c8b16a65267e35ecc5621dbc81cbe7e5b0992fce
> > > Author: Alison Chaiken <achaiken@aurora.tech>
> > > Date:   Tue Jan 11 15:32:52 2022 -0800
> > > 
> > >     rcu: Elevate priority of offloaded callback threads
> > >     
> > >     When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line
> > >     parameter signals initialization code to boost the priority of rcuc
> > >     callbacks to the designated value.  With the additional
> > >     CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs
> > >     command-line parameter, the callbacks on the listed cores are
> > >     offloaded to new rcuop kthreads that are not pinned to the cores whose
> > >     post-grace-period work is performed.  While the rcuop kthreads perform
> > >     the same function as the rcuc kthreads they offload, the kthread_prio
> > >     parameter only boosts the priority of the rcuc kthreads.  Fix this
> > >     inconsistency by elevating rcuop kthreads to the same priority as the rcuc
> > >     kthreads.
> > >     
> > >     Signed-off-by: Alison Chaiken <achaiken@aurora.tech>
> > >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > <snip>
> > > 
> > > I have a doubt that it is needed for CONFIG_PREEMPT_RT=y. The reason i mentioned
> > > above it is a source of extra latency. That is why i have made it inactive by default.
> > > 
> > > Any thoughts?
> > 
> > My first thought is that Alison does real RT work.  Let's please therefore
> > avoid assuming that she doesn't know what she is doing.  ;-)
> > 
> I read a commit message that is what i know about the patch.

Here is the rest of that commit, it is not long:

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5bf0312f66760..9e4c5b281f003 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -153,7 +153,7 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 
-/* rcuc/rcub kthread realtime priority */
+/* rcuc/rcub/rcuop kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
 module_param(kthread_prio, int, 0444);
 
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index ea889cbfc3b95..547c41437c767 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1270,6 +1270,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 			"rcuo%c/%d", rcu_state.abbr, cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
 		return;
+
+	if (kthread_prio)
+		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
 }

------------------------------------------------------------------------

In short, your patch undoes her patch.  Had the -rt guys wanted the
rcuo[ps] kthreads to run at SCHED_OTHER priority, they would not have
produced this commit.

> > One thing that she knows is that RT workloads usually run the most
> > latency-sensitive parts of their application at far higher priority
> > than they do the rcuo[ps] kthreads.  This means that they do not have
> > the same issues with these kthreads that you see.
> > 
> We can make it ON by default for CONFIG_PREEMPT_RT=y kernels. But i do
> not want to guess. The correct way is just to ask if Sebastian and Alison
> would like to have it by default on.
> 
> I have added them into "To".

Good, because I will not accept a change in the default unless both of
them give Acks or better for that change.

> > > > > +	help
> > > > > +	  Use this option to offload callbacks from the SCHED_FIFO context
> > > > > +	  to make the process faster. As a side effect of this approach is
> > > > > +	  a latency especially for the SCHED_OTHER tasks which will not be
> > > > > +	  able to preempt an offloading kthread. That latency depends on a
> > > > > +	  number of callbacks to be invoked.
> > > > > +
> > > > > +	  Say Y here if you want to set RT priority for offloading kthreads.
> > > > > +	  Say N here if you are unsure.
> > > > > +
> > > > >  config TASKS_TRACE_RCU_READ_MB
> > > > >  	bool "Tasks Trace RCU readers use memory barriers in user and idle"
> > > > >  	depends on RCU_EXPERT && TASKS_TRACE_RCU
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 9dc4c4e82db6..d769a15bc0e3 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
> > > > >  static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
> > > > >  static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
> > > > >  
> > > > > -/* rcuc/rcub/rcuop kthread realtime priority */
> > > > > +/*
> > > > > + * rcuc/rcub/rcuop kthread realtime priority. The former
> > > > > + * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
> > > > 
> > > > Aren't the rcuo[ps] kthreads controlled by the RCU_NOCB_CPU_CB_BOOST
> > > > Kconfig option?  (As opposed to the "former", which is "rcuc".)
> > > > 
> > > The CONFIG_RCU_NOCB_CPU_CB_BOOST controls only the last what is
> > > the rcuo CB kthread or "rcuo%c/%d" name. Sorry it is not "former"
> > > it is the last in the rcuc/rcub/rcuop sequence. It was a typo :)
> > 
> > I do know that feeling!  Absolutely not a problem, please just fix it
> > in the next version.
> > 
> Will do :)
> 
> > > > > + */
> > > > >  static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
> > > > >  module_param(kthread_prio, int, 0444);
> > > > >  
> > > > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > > > index 60cc92cc6655..a2823be9b1d0 100644
> > > > > --- a/kernel/rcu/tree_nocb.h
> > > > > +++ b/kernel/rcu/tree_nocb.h
> > > > > @@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
> > > > >  	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
> > > > >  		goto end;
> > > > >  
> > > > > -	if (kthread_prio)
> > > > > +	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
> > > > 
> > > > Don't we need both non-zero kthread_prio and the proper setting of the
> > > > new Kconfig option before we run it at SCHED_FIFO?
> > > > 
> > > > Yes, we could rely on sched_setscheduler_nocheck() erroring out in
> > > > that case, but that sounds like an accident waiting to happen.
> > > > 
> > > As far as i see it is odd, because the "kthread_prio" is verified so
> > > there is a sanity check to check if the value is correct for SCHED_FIFO
> > > case and does some adjustment if not. There is sanitize_kthread_prio()
> > > that does all trick.
> > 
> > Agreed, and like I said, we could rely on sched_setscheduler_nocheck()
> > erroring out in that case.  But people do sometimes turn error cases
> > into some other functionality.  Keeping the check of kthread_prio makes
> > it clear to people reading the code what our intent is and also avoids
> > strange breakage should someone find a use for SCHED_FIFO priority zero.
> > 
> Hm... I can place it back, though it is useless, IMHO.

Not useless at all.  First, it saves inspecting each and every patch
for a change in sched_setscheduler_nocheck() behavior.  Second, it makes
it immediately obvious to the casual reader what is happening.  Third,
I won't accept the patch without that check being there.

							Thanx, Paul

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-07 22:32         ` Paul E. McKenney
@ 2022-05-08  6:28           ` Joel Fernandes
  2022-05-08 21:32             ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-05-08  6:28 UTC (permalink / raw)
  To: Paul E. McKenney, Steven Rostedt
  Cc: Uladzislau Rezki, Alison Chaiken, Sebastian Andrzej Siewior,
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay,
	Oleksiy Avramchenko

On Sat, May 7, 2022 at 6:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sat, May 07, 2022 at 11:11:58AM +0200, Uladzislau Rezki wrote:
> > > On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > > > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > > > > decide if an offloading has to be done in a high-prio context or
> > > > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > > > > parameters and by default it is off.
> > > > > >
> > > > > > This patch splits the boosting preempted RCU readers and those
> > > > > > kthreads which directly responsible for driving expedited grace
> > > > > > periods forward with enabling/disabling the offloading from/to
> > > > > > SCHED_FIFO/SCHED_OTHER contexts.
> > > > > >
> > > > > > The main reason of such split is, for example on Android there
> > > > > > are some workloads which require fast expedited grace period to
> > > > > > be done whereas offloading in RT context can lead to starvation
> > > > > > and hogging a CPU for a long time what is not acceptable for
> > > > > > latency sensitive environment. For instance:
> > > > > >
> > > > > > <snip>
> > > > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > > > > <snip>
> > > > > >
> > > > > > invoking 34 619 callbacks will take time thus making other CFS
> > > > > > tasks waiting in run-queue to be starved due to such behaviour.
> > > > > >
> > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > >
> > > > > All good points!
> > > > >
> > > > > Some questions and comments below.
> > > > >
> > > > > Adding Sebastian on CC for his perspective.
> > > > >
> > > > >                                                 Thanx, Paul
> > > > >
> > > > > > ---
> > > > > >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> > > > > >  kernel/rcu/tree.c      |  5 ++++-
> > > > > >  kernel/rcu/tree_nocb.h |  3 ++-
> > > > > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > > > > index 27aab870ae4c..074630b94902 100644
> > > > > > --- a/kernel/rcu/Kconfig
> > > > > > +++ b/kernel/rcu/Kconfig
> > > > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > > > > >         Say Y here if you want offload all CPUs by default on boot.
> > > > > >         Say N here if you are unsure.
> > > > > >
> > > > > > +config RCU_NOCB_CPU_CB_BOOST
> > > > > > +     bool "Perform offloading from real-time kthread"
> > > > > > +     depends on RCU_NOCB_CPU && RCU_BOOST
> > > > > > +     default n
> > > > >
> > > > > I understand that you need this to default to "n" on your systems.
> > > > > However, other groups already using callback offloading should not see
> > > > > a sudden change.  I don't see an Android-specific defconfig file, but
> > > > > perhaps something in drivers/android/Kconfig?
> > > > >
> > We saw a sudden change when the priority was lifted up for rcuop kthreads.
> > I would like to know the reason. As for Android, i would like to avoid
> > it to be Android specific. It is better just to enable boosting by
> > default for nocb kthreads.
>
> No, because that breaks an existing use case, which uses RCU_BOOST
> to avoid OOM on busy systems.
>
> > > > > One easy way to make this work would be to invert the sense of this
> > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > > > would not be surprised if there is a better way.
> >
> > In that situation probably we should just enable it by default.
>
> You are within your rights to cause it to be enabled by default -within-
> -Android-.  You are -not- within your rights to break other workloads.
>
> If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
>
> It is not -that- hard, guys!  ;-)

I think on the topic of RT, +Steven Rostedt should chime in as well
considering he wrote a good chunk of the RT scheduler ;-). Personally,
I feel the issue of "rcu callback offload" threads running as RT or
not should not be a matter of CONFIG option or the system in concern.
Instead it should be a function of how many callbacks there are to
run.  The reason I say this is, RT threads should not be doing a lot
of work anyway, lest they cause RT throttling and starvation of other
threads.

Also, I think it is wrong to assume that a certain kind of system will
always have a certain number of callbacks to process at a time. That
seems prone to poor design due to assumptions which may not always be
true.

Can we not have 2 sets of RCU offload threads, one which operate at RT
and only process few callbacks at a time, while another which is the
lower priority CFS offload thread - executes whenever there is a lot
of CBs pending? Just a thought.

Otherwise, I feel like we might be again proliferating CONFIG options
and increasing burden on the user to get it the CONFIG right.

thanks,

- Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-08  6:28           ` Joel Fernandes
@ 2022-05-08 21:32             ` Paul E. McKenney
  2022-05-09  0:17               ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-08 21:32 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Sun, May 08, 2022 at 02:28:49AM -0400, Joel Fernandes wrote:
> On Sat, May 7, 2022 at 6:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sat, May 07, 2022 at 11:11:58AM +0200, Uladzislau Rezki wrote:
> > > > On Fri, May 06, 2022 at 06:22:26PM +0200, Uladzislau Rezki wrote:
> > > > > > On Thu, May 05, 2022 at 12:16:41PM +0200, Uladzislau Rezki (Sony) wrote:
> > > > > > > Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
> > > > > > > decide if an offloading has to be done in a high-prio context or
> > > > > > > not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
> > > > > > > parameters and by default it is off.
> > > > > > >
> > > > > > > This patch splits the boosting preempted RCU readers and those
> > > > > > > kthreads which directly responsible for driving expedited grace
> > > > > > > periods forward with enabling/disabling the offloading from/to
> > > > > > > SCHED_FIFO/SCHED_OTHER contexts.
> > > > > > >
> > > > > > > The main reason of such split is, for example on Android there
> > > > > > > are some workloads which require fast expedited grace period to
> > > > > > > be done whereas offloading in RT context can lead to starvation
> > > > > > > and hogging a CPU for a long time what is not acceptable for
> > > > > > > latency sensitive environment. For instance:
> > > > > > >
> > > > > > > <snip>
> > > > > > > <...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
> > > > > > > <snip>
> > > > > > >
> > > > > > > invoking 34 619 callbacks will take time thus making other CFS
> > > > > > > tasks waiting in run-queue to be starved due to such behaviour.
> > > > > > >
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > >
> > > > > > All good points!
> > > > > >
> > > > > > Some questions and comments below.
> > > > > >
> > > > > > Adding Sebastian on CC for his perspective.
> > > > > >
> > > > > >                                                 Thanx, Paul
> > > > > >
> > > > > > > ---
> > > > > > >  kernel/rcu/Kconfig     | 14 ++++++++++++++
> > > > > > >  kernel/rcu/tree.c      |  5 ++++-
> > > > > > >  kernel/rcu/tree_nocb.h |  3 ++-
> > > > > > >  3 files changed, 20 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> > > > > > > index 27aab870ae4c..074630b94902 100644
> > > > > > > --- a/kernel/rcu/Kconfig
> > > > > > > +++ b/kernel/rcu/Kconfig
> > > > > > > @@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
> > > > > > >         Say Y here if you want offload all CPUs by default on boot.
> > > > > > >         Say N here if you are unsure.
> > > > > > >
> > > > > > > +config RCU_NOCB_CPU_CB_BOOST
> > > > > > > +     bool "Perform offloading from real-time kthread"
> > > > > > > +     depends on RCU_NOCB_CPU && RCU_BOOST
> > > > > > > +     default n
> > > > > >
> > > > > > I understand that you need this to default to "n" on your systems.
> > > > > > However, other groups already using callback offloading should not see
> > > > > > a sudden change.  I don't see an Android-specific defconfig file, but
> > > > > > perhaps something in drivers/android/Kconfig?
> > > > > >
> > > We saw a sudden change when the priority was lifted up for rcuop kthreads.
> > > I would like to know the reason. As for Android, i would like to avoid
> > > it to be Android specific. It is better just to enable boosting by
> > > default for nocb kthreads.
> >
> > No, because that breaks an existing use case, which uses RCU_BOOST
> > to avoid OOM on busy systems.
> >
> > > > > > One easy way to make this work would be to invert the sense of this
> > > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > > > > would not be surprised if there is a better way.
> > >
> > > In that situation probably we should just enable it by default.
> >
> > You are within your rights to cause it to be enabled by default -within-
> > -Android-.  You are -not- within your rights to break other workloads.
> >
> > If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
> >
> > It is not -that- hard, guys!  ;-)
> 
> I think on the topic of RT, +Steven Rostedt should chime in as well
> considering he wrote a good chunk of the RT scheduler ;-). Personally,
> I feel the issue of "rcu callback offload" threads running as RT or
> not should not be a matter of CONFIG option or the system in concern.
> Instead it should be a function of how many callbacks there are to
> run.  The reason I say this is, RT threads should not be doing a lot
> of work anyway, lest they cause RT throttling and starvation of other
> threads.

This gets complicated surprisingly quickly.  For but one example, you
would find yourself wanting time-based boosting, most likely before you
wanted boosting based on numbers of callbacks.  And it is all too easy
to drive considerably complexity into the mix before proving that it is
really needed.  Especially given how rare the need for RCU priority
boosting is to begin with.

> Also, I think it is wrong to assume that a certain kind of system will
> always have a certain number of callbacks to process at a time. That
> seems prone to poor design due to assumptions which may not always be
> true.

Who was assuming that?  Uladzislau was measuring rather than assuming,
if that was what you were getting at.  Or if you are thinking about
things like qhimark, your point is exactly why there is both a default
(which has worked quite well for a very long time) and the ability to
adjust based on the needs of your specific system.

> Can we not have 2 sets of RCU offload threads, one which operate at RT
> and only process few callbacks at a time, while another which is the
> lower priority CFS offload thread - executes whenever there is a lot
> of CBs pending? Just a thought.

How about if we start by solving the problems we know that we have?

> Otherwise, I feel like we might be again proliferating CONFIG options
> and increasing burden on the user to get it the CONFIG right.

I bet that we will solve this without adding any new Kconfig options.
And I bet that the burden is at worst on the device designer, not on
the user.  Plus it is entirely possible that there might be a way to
automatically configure things to handle what we know about today,
again without adding Kconfig options.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-08 21:32             ` Paul E. McKenney
@ 2022-05-09  0:17               ` Joel Fernandes
  2022-05-09  3:37                 ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-05-09  0:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> > > > > > > One easy way to make this work would be to invert the sense of this
> > > > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > > > > > would not be surprised if there is a better way.
> > > >
> > > > In that situation probably we should just enable it by default.
> > >
> > > You are within your rights to cause it to be enabled by default -within-
> > > -Android-.  You are -not- within your rights to break other workloads.
> > >
> > > If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
> > >
> > > It is not -that- hard, guys!  ;-)
> >
> > I think on the topic of RT, +Steven Rostedt should chime in as well
> > considering he wrote a good chunk of the RT scheduler ;-). Personally,
> > I feel the issue of "rcu callback offload" threads running as RT or
> > not should not be a matter of CONFIG option or the system in concern.
> > Instead it should be a function of how many callbacks there are to
> > run.  The reason I say this is, RT threads should not be doing a lot
> > of work anyway, lest they cause RT throttling and starvation of other
> > threads.
>
> This gets complicated surprisingly quickly.  For but one example, you
> would find yourself wanting time-based boosting, most likely before you
> wanted boosting based on numbers of callbacks.  And it is all too easy
> to drive considerably complexity into the mix before proving that it is
> really needed.  Especially given how rare the need for RCU priority
> boosting is to begin with.

I think this patch does not deal with or change the behavior of
dynamic priority boosting preempted RCU readers, but rather it makes
it such that the no-cb offload threads that execute the callbacks. So
I am not sure why you are talking about the boosting behavior of
preempted RCU readers? I was referring only to the nocb offload
kthreads which as I understand, Vlad *does not* want to run at RT
priority.

> > Also, I think it is wrong to assume that a certain kind of system will
> > always have a certain number of callbacks to process at a time. That
> > seems prone to poor design due to assumptions which may not always be
> > true.
>
> Who was assuming that?  Uladzislau was measuring rather than assuming,
> if that was what you were getting at.  Or if you are thinking about
> things like qhimark, your point is exactly why there is both a default
> (which has worked quite well for a very long time) and the ability to
> adjust based on the needs of your specific system.

I was merely saying that based on measurements make assumptions, but
in the real world the assumption may not be true, then everything
falls apart. Instead I feel, callback threads should be RT only if 1.
As you mentioned, the time based thing. 2. If the CB list is long and
there's lot of processing. But instead, if it is made a CONFIG option,
then that forces a fixed behavior which may fall apart in the real
world. I think adding more CONFIGs and special cases is more complex
but that's my opinion.

> > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > and only process few callbacks at a time, while another which is the
> > lower priority CFS offload thread - executes whenever there is a lot
> > of CBs pending? Just a thought.
>
> How about if we start by solving the problems we know that we have?

I don't know why you would say that, because we are talking about
solving the specific problem Vlad's patch addresses, not random
problems. Which is that, Android wants to run expedited GPs, but when
the callback list is large, the RT nocb thread can starve other
things. Did I misunderstand the patch? If so, sorry about that but
that's what my email was discussing. i.e. running of CBs in RT
threads. I suck at writing well as I clearly miscommunicated.

> > Otherwise, I feel like we might be again proliferating CONFIG options
> > and increasing burden on the user to get it the CONFIG right.
>
> I bet that we will solve this without adding any new Kconfig options.
> And I bet that the burden is at worst on the device designer, not on
> the user.  Plus it is entirely possible that there might be a way to
> automatically configure things to handle what we know about today,
> again without adding Kconfig options.

Yes, agreed.

Thanks,
Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09  0:17               ` Joel Fernandes
@ 2022-05-09  3:37                 ` Paul E. McKenney
  2022-05-09 17:17                   ` Joel Fernandes
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-09  3:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > > > > > > > One easy way to make this work would be to invert the sense of this
> > > > > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > > > > "n", but then select it somewhere in drivers/android/Kconfig.  But I
> > > > > > > > would not be surprised if there is a better way.
> > > > >
> > > > > In that situation probably we should just enable it by default.
> > > >
> > > > You are within your rights to cause it to be enabled by default -within-
> > > > -Android-.  You are -not- within your rights to break other workloads.
> > > >
> > > > If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
> > > >
> > > > It is not -that- hard, guys!  ;-)
> > >
> > > I think on the topic of RT, +Steven Rostedt should chime in as well
> > > considering he wrote a good chunk of the RT scheduler ;-). Personally,
> > > I feel the issue of "rcu callback offload" threads running as RT or
> > > not should not be a matter of CONFIG option or the system in concern.
> > > Instead it should be a function of how many callbacks there are to
> > > run.  The reason I say this is, RT threads should not be doing a lot
> > > of work anyway, lest they cause RT throttling and starvation of other
> > > threads.
> >
> > This gets complicated surprisingly quickly.  For but one example, you
> > would find yourself wanting time-based boosting, most likely before you
> > wanted boosting based on numbers of callbacks.  And it is all too easy
> > to drive considerably complexity into the mix before proving that it is
> > really needed.  Especially given how rare the need for RCU priority
> > boosting is to begin with.
> 
> I think this patch does not deal with or change the behavior of
> dynamic priority boosting preempted RCU readers, but rather it makes
> it such that the no-cb offload threads that execute the callbacks. So
> I am not sure why you are talking about the boosting behavior of
> preempted RCU readers? I was referring only to the nocb offload
> kthreads which as I understand, Vlad *does not* want to run at RT
> priority.

OK.  Exactly what is the problem that you are trying to solve?  ;-)

And yes, I fully understand that Uladzislau does not want to run the rcuo
kthreads at RT priority, even in kernels built with CONFIG_RCU_BOOST=y.
Which makes sense, given that he is looking to solve a very different
problem than CONFIG_RCU_BOOST was designed to solve.  So adjustments must
be made.  The discussion is the exact form of the next set of adjustments,
which I expect to be quite straightforward.

> > > Also, I think it is wrong to assume that a certain kind of system will
> > > always have a certain number of callbacks to process at a time. That
> > > seems prone to poor design due to assumptions which may not always be
> > > true.
> >
> > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > if that was what you were getting at.  Or if you are thinking about
> > things like qhimark, your point is exactly why there is both a default
> > (which has worked quite well for a very long time) and the ability to
> > adjust based on the needs of your specific system.
> 
> I was merely saying that based on measurements make assumptions, but
> in the real world the assumption may not be true, then everything
> falls apart. Instead I feel, callback threads should be RT only if 1.
> As you mentioned, the time based thing. 2. If the CB list is long and
> there's lot of processing. But instead, if it is made a CONFIG option,
> then that forces a fixed behavior which may fall apart in the real
> world. I think adding more CONFIGs and special cases is more complex
> but that's my opinion.

Again, exactly what problem are you trying to solve?

From what I can see, Uladzislau's issue can be addressed by statically
setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
is on exactly how RCU is to be informed of this, at kernel build time.

> > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > and only process few callbacks at a time, while another which is the
> > > lower priority CFS offload thread - executes whenever there is a lot
> > > of CBs pending? Just a thought.
> >
> > How about if we start by solving the problems we know that we have?
> 
> I don't know why you would say that, because we are talking about
> solving the specific problem Vlad's patch addresses, not random
> problems. Which is that, Android wants to run expedited GPs, but when
> the callback list is large, the RT nocb thread can starve other
> things. Did I misunderstand the patch? If so, sorry about that but
> that's what my email was discussing. i.e. running of CBs in RT
> threads. I suck at writing well as I clearly miscommunicated.

OK.

Why do you believe that this needs anything other than small adjustments
the defaults of existing Kconfig options?  Or am I completely missing
the point of your proposal?

> > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > and increasing burden on the user to get it the CONFIG right.
> >
> > I bet that we will solve this without adding any new Kconfig options.
> > And I bet that the burden is at worst on the device designer, not on
> > the user.  Plus it is entirely possible that there might be a way to
> > automatically configure things to handle what we know about today,
> > again without adding Kconfig options.
> 
> Yes, agreed.

If I change my last sentence to read as follows, are we still in
agreement?

	Plus it is entirely possible that there might be a way to
	automatically configure things to handle what we know about today,
	again without adding Kconfig options and without changing runtime
	code beyond that covered by Uladzislau's patch.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09  3:37                 ` Paul E. McKenney
@ 2022-05-09 17:17                   ` Joel Fernandes
  2022-05-09 18:14                     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Fernandes @ 2022-05-09 17:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
 [...]
> > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > always have a certain number of callbacks to process at a time. That
> > > > seems prone to poor design due to assumptions which may not always be
> > > > true.
> > >
> > > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > > if that was what you were getting at.  Or if you are thinking about
> > > things like qhimark, your point is exactly why there is both a default
> > > (which has worked quite well for a very long time) and the ability to
> > > adjust based on the needs of your specific system.
> >
> > I was merely saying that based on measurements make assumptions, but
> > in the real world the assumption may not be true, then everything
> > falls apart. Instead I feel, callback threads should be RT only if 1.
> > As you mentioned, the time based thing. 2. If the CB list is long and
> > there's lot of processing. But instead, if it is made a CONFIG option,
> > then that forces a fixed behavior which may fall apart in the real
> > world. I think adding more CONFIGs and special cases is more complex
> > but that's my opinion.
>
> Again, exactly what problem are you trying to solve?
>
> From what I can see, Uladzislau's issue can be addressed by statically
> setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
> is on exactly how RCU is to be informed of this, at kernel build time.
>
> > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > and only process few callbacks at a time, while another which is the
> > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > of CBs pending? Just a thought.
> > >
> > > How about if we start by solving the problems we know that we have?
> >
> > I don't know why you would say that, because we are talking about
> > solving the specific problem Vlad's patch addresses, not random
> > problems. Which is that, Android wants to run expedited GPs, but when
> > the callback list is large, the RT nocb thread can starve other
> > things. Did I misunderstand the patch? If so, sorry about that but
> > that's what my email was discussing. i.e. running of CBs in RT
> > threads. I suck at writing well as I clearly miscommunicated.
>
> OK.
>
> Why do you believe that this needs anything other than small adjustments
> the defaults of existing Kconfig options?  Or am I completely missing
> the point of your proposal?
>
> > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > and increasing burden on the user to get it the CONFIG right.
> > >
> > > I bet that we will solve this without adding any new Kconfig options.
> > > And I bet that the burden is at worst on the device designer, not on
> > > the user.  Plus it is entirely possible that there might be a way to
> > > automatically configure things to handle what we know about today,
> > > again without adding Kconfig options.
> >
> > Yes, agreed.
>
> If I change my last sentence to read as follows, are we still in
> agreement?
>
>         Plus it is entirely possible that there might be a way to
>         automatically configure things to handle what we know about today,
>         again without adding Kconfig options and without changing runtime
>         code beyond that covered by Uladzislau's patch.

Yes, actually the automatic configuration of things is what I meant,
that's the "problem" I was referring to, where the system does the
right thing for a broader range of systems, without requiring the
users to find RCU issues and hand-tune them (that requires said users
to have tracing and debugging skills and get lucky finding a problem).
To be fair, I did not propose any solutions to such problems either,
it is just some ideas. I don't like knobs too much and I don't trust
users or system designers to get them right most of the time.

In that sense,  I don't think making rcuo threads run as RT or not
(which this patch does) is really fixing the problems. In one case,
you might have priority inversion, in another case you might cause
starvation. Probably what is needed is best of both worlds. That said,
I don't have better solutions right now than what I mentioned, which
is to assign priorities to the callbacks themselves and run them in
threads of different priorities.

For the record, I am not against the patch or anything like that (and
even if I was, I am not sure that it matters for merging :P)

Thanks,

 - Joel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 17:17                   ` Joel Fernandes
@ 2022-05-09 18:14                     ` Paul E. McKenney
  2022-05-09 18:28                       ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-09 18:14 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>  [...]
> > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > always have a certain number of callbacks to process at a time. That
> > > > > seems prone to poor design due to assumptions which may not always be
> > > > > true.
> > > >
> > > > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > > > if that was what you were getting at.  Or if you are thinking about
> > > > things like qhimark, your point is exactly why there is both a default
> > > > (which has worked quite well for a very long time) and the ability to
> > > > adjust based on the needs of your specific system.
> > >
> > > I was merely saying that based on measurements make assumptions, but
> > > in the real world the assumption may not be true, then everything
> > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > then that forces a fixed behavior which may fall apart in the real
> > > world. I think adding more CONFIGs and special cases is more complex
> > > but that's my opinion.
> >
> > Again, exactly what problem are you trying to solve?
> >
> > From what I can see, Uladzislau's issue can be addressed by statically
> > setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
> > is on exactly how RCU is to be informed of this, at kernel build time.
> >
> > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > and only process few callbacks at a time, while another which is the
> > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > of CBs pending? Just a thought.
> > > >
> > > > How about if we start by solving the problems we know that we have?
> > >
> > > I don't know why you would say that, because we are talking about
> > > solving the specific problem Vlad's patch addresses, not random
> > > problems. Which is that, Android wants to run expedited GPs, but when
> > > the callback list is large, the RT nocb thread can starve other
> > > things. Did I misunderstand the patch? If so, sorry about that but
> > > that's what my email was discussing. i.e. running of CBs in RT
> > > threads. I suck at writing well as I clearly miscommunicated.
> >
> > OK.
> >
> > Why do you believe that this needs anything other than small adjustments
> > the defaults of existing Kconfig options?  Or am I completely missing
> > the point of your proposal?
> >
> > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > and increasing burden on the user to get it the CONFIG right.
> > > >
> > > > I bet that we will solve this without adding any new Kconfig options.
> > > > And I bet that the burden is at worst on the device designer, not on
> > > > the user.  Plus it is entirely possible that there might be a way to
> > > > automatically configure things to handle what we know about today,
> > > > again without adding Kconfig options.
> > >
> > > Yes, agreed.
> >
> > If I change my last sentence to read as follows, are we still in
> > agreement?
> >
> >         Plus it is entirely possible that there might be a way to
> >         automatically configure things to handle what we know about today,
> >         again without adding Kconfig options and without changing runtime
> >         code beyond that covered by Uladzislau's patch.
> 
> Yes, actually the automatic configuration of things is what I meant,
> that's the "problem" I was referring to, where the system does the
> right thing for a broader range of systems, without requiring the
> users to find RCU issues and hand-tune them (that requires said users
> to have tracing and debugging skills and get lucky finding a problem).
> To be fair, I did not propose any solutions to such problems either,
> it is just some ideas. I don't like knobs too much and I don't trust
> users or system designers to get them right most of the time.
> 
> In that sense,  I don't think making rcuo threads run as RT or not
> (which this patch does) is really fixing the problems. In one case,
> you might have priority inversion, in another case you might cause
> starvation. Probably what is needed is best of both worlds. That said,
> I don't have better solutions right now than what I mentioned, which
> is to assign priorities to the callbacks themselves and run them in
> threads of different priorities.
> 
> For the record, I am not against the patch or anything like that (and
> even if I was, I am not sure that it matters for merging :P)

Fair enough!

And for the record at this end, I would not be surprised if in 2032
RCU offloaded callback invocation has sophisticated dynamic tuning of
priorities and much else besides.  But one step at a time!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 18:14                     ` Paul E. McKenney
@ 2022-05-09 18:28                       ` Uladzislau Rezki
  2022-05-09 18:39                         ` Paul E. McKenney
  2022-05-10 14:01                         ` Steven Rostedt
  0 siblings, 2 replies; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-09 18:28 UTC (permalink / raw)
  To: Paul E. McKenney, Joel Fernandes
  Cc: Joel Fernandes, Steven Rostedt, Uladzislau Rezki, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >  [...]
> > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > true.
> > > > >
> > > > > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > > > > if that was what you were getting at.  Or if you are thinking about
> > > > > things like qhimark, your point is exactly why there is both a default
> > > > > (which has worked quite well for a very long time) and the ability to
> > > > > adjust based on the needs of your specific system.
> > > >
> > > > I was merely saying that based on measurements make assumptions, but
> > > > in the real world the assumption may not be true, then everything
> > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > then that forces a fixed behavior which may fall apart in the real
> > > > world. I think adding more CONFIGs and special cases is more complex
> > > > but that's my opinion.
> > >
> > > Again, exactly what problem are you trying to solve?
> > >
> > > From what I can see, Uladzislau's issue can be addressed by statically
> > > setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
> > > is on exactly how RCU is to be informed of this, at kernel build time.
> > >
> > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > and only process few callbacks at a time, while another which is the
> > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > of CBs pending? Just a thought.
> > > > >
> > > > > How about if we start by solving the problems we know that we have?
> > > >
> > > > I don't know why you would say that, because we are talking about
> > > > solving the specific problem Vlad's patch addresses, not random
> > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > the callback list is large, the RT nocb thread can starve other
> > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > threads. I suck at writing well as I clearly miscommunicated.
> > >
> > > OK.
> > >
> > > Why do you believe that this needs anything other than small adjustments
> > > the defaults of existing Kconfig options?  Or am I completely missing
> > > the point of your proposal?
> > >
> > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > >
> > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > the user.  Plus it is entirely possible that there might be a way to
> > > > > automatically configure things to handle what we know about today,
> > > > > again without adding Kconfig options.
> > > >
> > > > Yes, agreed.
> > >
> > > If I change my last sentence to read as follows, are we still in
> > > agreement?
> > >
> > >         Plus it is entirely possible that there might be a way to
> > >         automatically configure things to handle what we know about today,
> > >         again without adding Kconfig options and without changing runtime
> > >         code beyond that covered by Uladzislau's patch.
> > 
> > Yes, actually the automatic configuration of things is what I meant,
> > that's the "problem" I was referring to, where the system does the
> > right thing for a broader range of systems, without requiring the
> > users to find RCU issues and hand-tune them (that requires said users
> > to have tracing and debugging skills and get lucky finding a problem).
> > To be fair, I did not propose any solutions to such problems either,
> > it is just some ideas. I don't like knobs too much and I don't trust
> > users or system designers to get them right most of the time.
> > 
> > In that sense,  I don't think making rcuo threads run as RT or not
> > (which this patch does) is really fixing the problems. In one case,
> > you might have priority inversion, in another case you might cause
> > starvation. Probably what is needed is best of both worlds. That said,
> > I don't have better solutions right now than what I mentioned, which
> > is to assign priorities to the callbacks themselves and run them in
> > threads of different priorities.
> > 
> > For the record, I am not against the patch or anything like that (and
> > even if I was, I am not sure that it matters for merging :P)
> 
> Fair enough!
> 
> And for the record at this end, I would not be surprised if in 2032
> RCU offloaded callback invocation has sophisticated dynamic tuning of
> priorities and much else besides.  But one step at a time!  ;-)
> 
hh... It is hard to comment because i am a bit lost in this big conversation :)

What i have got so far. Joel does not like adding extra *_CONFIG
options, actually me too since it becomes more complicated thus
it requires more specific attention from users. I prefer to make
the code common but it is not possible sometimes to make it common,
because we have different kind of kernels and workloads.

From the other hand the patch splits the BOOSTING logic into two peaces
because driving the grace periods kthreads in RT priority is not a big
issue because their run-times are short. Whereas running the "kthreads-callbacks"
in the RT context can be long so we end up in throttled situation for
other workloads.

I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
was mainly designed for that kind of kernels. So we can align with Alison
patch and her decision, so i do not see any issues. So far RT folk seems
does not mind in having "callback-kthreads" as SCHED_FIFO :)

Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
by default and OFF for other cases?

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 18:28                       ` Uladzislau Rezki
@ 2022-05-09 18:39                         ` Paul E. McKenney
  2022-05-09 18:43                           ` Uladzislau Rezki
  2022-05-10 14:09                           ` Steven Rostedt
  2022-05-10 14:01                         ` Steven Rostedt
  1 sibling, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-09 18:39 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Joel Fernandes, Steven Rostedt, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Mon, May 09, 2022 at 08:28:26PM +0200, Uladzislau Rezki wrote:
> > On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >  [...]
> > > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > > true.
> > > > > >
> > > > > > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > > > > > if that was what you were getting at.  Or if you are thinking about
> > > > > > things like qhimark, your point is exactly why there is both a default
> > > > > > (which has worked quite well for a very long time) and the ability to
> > > > > > adjust based on the needs of your specific system.
> > > > >
> > > > > I was merely saying that based on measurements make assumptions, but
> > > > > in the real world the assumption may not be true, then everything
> > > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > > then that forces a fixed behavior which may fall apart in the real
> > > > > world. I think adding more CONFIGs and special cases is more complex
> > > > > but that's my opinion.
> > > >
> > > > Again, exactly what problem are you trying to solve?
> > > >
> > > > From what I can see, Uladzislau's issue can be addressed by statically
> > > > setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
> > > > is on exactly how RCU is to be informed of this, at kernel build time.
> > > >
> > > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > > and only process few callbacks at a time, while another which is the
> > > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > > of CBs pending? Just a thought.
> > > > > >
> > > > > > How about if we start by solving the problems we know that we have?
> > > > >
> > > > > I don't know why you would say that, because we are talking about
> > > > > solving the specific problem Vlad's patch addresses, not random
> > > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > > the callback list is large, the RT nocb thread can starve other
> > > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > > threads. I suck at writing well as I clearly miscommunicated.
> > > >
> > > > OK.
> > > >
> > > > Why do you believe that this needs anything other than small adjustments
> > > > the defaults of existing Kconfig options?  Or am I completely missing
> > > > the point of your proposal?
> > > >
> > > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > > >
> > > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > > the user.  Plus it is entirely possible that there might be a way to
> > > > > > automatically configure things to handle what we know about today,
> > > > > > again without adding Kconfig options.
> > > > >
> > > > > Yes, agreed.
> > > >
> > > > If I change my last sentence to read as follows, are we still in
> > > > agreement?
> > > >
> > > >         Plus it is entirely possible that there might be a way to
> > > >         automatically configure things to handle what we know about today,
> > > >         again without adding Kconfig options and without changing runtime
> > > >         code beyond that covered by Uladzislau's patch.
> > > 
> > > Yes, actually the automatic configuration of things is what I meant,
> > > that's the "problem" I was referring to, where the system does the
> > > right thing for a broader range of systems, without requiring the
> > > users to find RCU issues and hand-tune them (that requires said users
> > > to have tracing and debugging skills and get lucky finding a problem).
> > > To be fair, I did not propose any solutions to such problems either,
> > > it is just some ideas. I don't like knobs too much and I don't trust
> > > users or system designers to get them right most of the time.
> > > 
> > > In that sense,  I don't think making rcuo threads run as RT or not
> > > (which this patch does) is really fixing the problems. In one case,
> > > you might have priority inversion, in another case you might cause
> > > starvation. Probably what is needed is best of both worlds. That said,
> > > I don't have better solutions right now than what I mentioned, which
> > > is to assign priorities to the callbacks themselves and run them in
> > > threads of different priorities.
> > > 
> > > For the record, I am not against the patch or anything like that (and
> > > even if I was, I am not sure that it matters for merging :P)
> > 
> > Fair enough!
> > 
> > And for the record at this end, I would not be surprised if in 2032
> > RCU offloaded callback invocation has sophisticated dynamic tuning of
> > priorities and much else besides.  But one step at a time!  ;-)
> > 
> hh... It is hard to comment because i am a bit lost in this big conversation :)
> 
> What i have got so far. Joel does not like adding extra *_CONFIG
> options, actually me too since it becomes more complicated thus
> it requires more specific attention from users. I prefer to make
> the code common but it is not possible sometimes to make it common,
> because we have different kind of kernels and workloads.
> 
> >From the other hand the patch splits the BOOSTING logic into two peaces
> because driving the grace periods kthreads in RT priority is not a big
> issue because their run-times are short. Whereas running the "kthreads-callbacks"
> in the RT context can be long so we end up in throttled situation for
> other workloads.
> 
> I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> was mainly designed for that kind of kernels. So we can align with Alison
> patch and her decision, so i do not see any issues. So far RT folk seems
> does not mind in having "callback-kthreads" as SCHED_FIFO :)
> 
> Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
> by default and OFF for other cases?

Yes, please!

This allows your current RCU_NOCB_CPU_CB_BOOST with something like
this in place of the "default n":

	default y if PREEMPT_RT
	default n if !PREEMPT_RT

There might be a simpler way of doing this, but this would work.

Could you please send a v2 with the requested updates?

						Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 18:39                         ` Paul E. McKenney
@ 2022-05-09 18:43                           ` Uladzislau Rezki
  2022-05-10 14:09                           ` Steven Rostedt
  1 sibling, 0 replies; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-09 18:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Joel Fernandes, Steven Rostedt, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> On Mon, May 09, 2022 at 08:28:26PM +0200, Uladzislau Rezki wrote:
> > > On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > > > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >  [...]
> > > > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > > > true.
> > > > > > >
> > > > > > > Who was assuming that?  Uladzislau was measuring rather than assuming,
> > > > > > > if that was what you were getting at.  Or if you are thinking about
> > > > > > > things like qhimark, your point is exactly why there is both a default
> > > > > > > (which has worked quite well for a very long time) and the ability to
> > > > > > > adjust based on the needs of your specific system.
> > > > > >
> > > > > > I was merely saying that based on measurements make assumptions, but
> > > > > > in the real world the assumption may not be true, then everything
> > > > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > > > then that forces a fixed behavior which may fall apart in the real
> > > > > > world. I think adding more CONFIGs and special cases is more complex
> > > > > > but that's my opinion.
> > > > >
> > > > > Again, exactly what problem are you trying to solve?
> > > > >
> > > > > From what I can see, Uladzislau's issue can be addressed by statically
> > > > > setting the rcuo kthreads to SCHED_OTHER at boot time.  The discussion
> > > > > is on exactly how RCU is to be informed of this, at kernel build time.
> > > > >
> > > > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > > > and only process few callbacks at a time, while another which is the
> > > > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > > > of CBs pending? Just a thought.
> > > > > > >
> > > > > > > How about if we start by solving the problems we know that we have?
> > > > > >
> > > > > > I don't know why you would say that, because we are talking about
> > > > > > solving the specific problem Vlad's patch addresses, not random
> > > > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > > > the callback list is large, the RT nocb thread can starve other
> > > > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > > > threads. I suck at writing well as I clearly miscommunicated.
> > > > >
> > > > > OK.
> > > > >
> > > > > Why do you believe that this needs anything other than small adjustments
> > > > > the defaults of existing Kconfig options?  Or am I completely missing
> > > > > the point of your proposal?
> > > > >
> > > > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > > > >
> > > > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > > > the user.  Plus it is entirely possible that there might be a way to
> > > > > > > automatically configure things to handle what we know about today,
> > > > > > > again without adding Kconfig options.
> > > > > >
> > > > > > Yes, agreed.
> > > > >
> > > > > If I change my last sentence to read as follows, are we still in
> > > > > agreement?
> > > > >
> > > > >         Plus it is entirely possible that there might be a way to
> > > > >         automatically configure things to handle what we know about today,
> > > > >         again without adding Kconfig options and without changing runtime
> > > > >         code beyond that covered by Uladzislau's patch.
> > > > 
> > > > Yes, actually the automatic configuration of things is what I meant,
> > > > that's the "problem" I was referring to, where the system does the
> > > > right thing for a broader range of systems, without requiring the
> > > > users to find RCU issues and hand-tune them (that requires said users
> > > > to have tracing and debugging skills and get lucky finding a problem).
> > > > To be fair, I did not propose any solutions to such problems either,
> > > > it is just some ideas. I don't like knobs too much and I don't trust
> > > > users or system designers to get them right most of the time.
> > > > 
> > > > In that sense,  I don't think making rcuo threads run as RT or not
> > > > (which this patch does) is really fixing the problems. In one case,
> > > > you might have priority inversion, in another case you might cause
> > > > starvation. Probably what is needed is best of both worlds. That said,
> > > > I don't have better solutions right now than what I mentioned, which
> > > > is to assign priorities to the callbacks themselves and run them in
> > > > threads of different priorities.
> > > > 
> > > > For the record, I am not against the patch or anything like that (and
> > > > even if I was, I am not sure that it matters for merging :P)
> > > 
> > > Fair enough!
> > > 
> > > And for the record at this end, I would not be surprised if in 2032
> > > RCU offloaded callback invocation has sophisticated dynamic tuning of
> > > priorities and much else besides.  But one step at a time!  ;-)
> > > 
> > hh... It is hard to comment because i am a bit lost in this big conversation :)
> > 
> > What i have got so far. Joel does not like adding extra *_CONFIG
> > options, actually me too since it becomes more complicated thus
> > it requires more specific attention from users. I prefer to make
> > the code common but it is not possible sometimes to make it common,
> > because we have different kind of kernels and workloads.
> > 
> > >From the other hand the patch splits the BOOSTING logic into two peaces
> > because driving the grace periods kthreads in RT priority is not a big
> > issue because their run-times are short. Whereas running the "kthreads-callbacks"
> > in the RT context can be long so we end up in throttled situation for
> > other workloads.
> > 
> > I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> > was mainly designed for that kind of kernels. So we can align with Alison
> > patch and her decision, so i do not see any issues. So far RT folk seems
> > does not mind in having "callback-kthreads" as SCHED_FIFO :)
> > 
> > Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
> > by default and OFF for other cases?
> 
> Yes, please!
> 
> This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> this in place of the "default n":
> 
> 	default y if PREEMPT_RT
> 	default n if !PREEMPT_RT
> 
> There might be a simpler way of doing this, but this would work.
> 
> Could you please send a v2 with the requested updates?
> 
No problem, will send an updated version.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 18:28                       ` Uladzislau Rezki
  2022-05-09 18:39                         ` Paul E. McKenney
@ 2022-05-10 14:01                         ` Steven Rostedt
  2022-05-11 13:39                           ` Uladzislau Rezki
  1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2022-05-10 14:01 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Mon, 9 May 2022 20:28:26 +0200
Uladzislau Rezki <urezki@gmail.com> wrote:

> I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> was mainly designed for that kind of kernels. So we can align with Alison
> patch and her decision, so i do not see any issues. So far RT folk seems
> does not mind in having "callback-kthreads" as SCHED_FIFO :)

That's because RT folks set the threads they care about to a higher RT
priority than the kthreads. ;-)


-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-05 19:09 ` Paul E. McKenney
  2022-05-06 16:22   ` Uladzislau Rezki
@ 2022-05-10 14:07   ` Sebastian Andrzej Siewior
  2022-05-10 17:14     ` Paul E. McKenney
  1 sibling, 1 reply; 28+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-05-10 14:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On 2022-05-05 12:09:15 [-0700], Paul E. McKenney wrote:
> All good points!
> 
> Some questions and comments below.
> 
> Adding Sebastian on CC for his perspective.

Thank you.
I may missing things, I tried to digest the thread…

In my understanding: The boosting option is used to allow a SCHED_OTHER
task within a RCU section to allow to leave the RCU section while tasks
with higher priority occupy the CPU.
As far as the RCU callbacks are concerned, I'm not aware that it would
be beneficial to run them with an elevated priority. On SMP systems,
there is the suggestion to have a housekeeping CPU and to offload the
RCU callbacks to this CPU and no to bother the CPU with the RT workload.

> 						Thanx, Paul

Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-09 18:39                         ` Paul E. McKenney
  2022-05-09 18:43                           ` Uladzislau Rezki
@ 2022-05-10 14:09                           ` Steven Rostedt
  2022-05-10 14:24                             ` Paul E. McKenney
  2022-05-10 14:35                             ` Uladzislau Rezki
  1 sibling, 2 replies; 28+ messages in thread
From: Steven Rostedt @ 2022-05-10 14:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Mon, 9 May 2022 11:39:34 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> this in place of the "default n":
> 
> 	default y if PREEMPT_RT
> 	default n if !PREEMPT_RT

BTW, I don't think you need the !PREEMPT_RT, because all configs are
'n' by  default. That is:

	default y if PREEMPT_RT

should be good enough.

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-10 14:09                           ` Steven Rostedt
@ 2022-05-10 14:24                             ` Paul E. McKenney
  2022-05-10 14:35                             ` Uladzislau Rezki
  1 sibling, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-10 14:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uladzislau Rezki, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Tue, May 10, 2022 at 10:09:46AM -0400, Steven Rostedt wrote:
> On Mon, 9 May 2022 11:39:34 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> > this in place of the "default n":
> > 
> > 	default y if PREEMPT_RT
> > 	default n if !PREEMPT_RT
> 
> BTW, I don't think you need the !PREEMPT_RT, because all configs are
> 'n' by  default. That is:
> 
> 	default y if PREEMPT_RT
> 
> should be good enough.

Good point, thank you!

That said, there is a lot of "default n" in a lot of Kconfig files.
And I am OK making this explicit.  So Uladzislau's choice.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-10 14:09                           ` Steven Rostedt
  2022-05-10 14:24                             ` Paul E. McKenney
@ 2022-05-10 14:35                             ` Uladzislau Rezki
  1 sibling, 0 replies; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-10 14:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Uladzislau Rezki, Joel Fernandes,
	Alison Chaiken, Sebastian Andrzej Siewior, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Oleksiy Avramchenko

On Tue, May 10, 2022 at 10:09:46AM -0400, Steven Rostedt wrote:
> On Mon, 9 May 2022 11:39:34 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> > this in place of the "default n":
> > 
> > 	default y if PREEMPT_RT
> > 	default n if !PREEMPT_RT
> 
> BTW, I don't think you need the !PREEMPT_RT, because all configs are
> 'n' by  default. That is:
> 
> 	default y if PREEMPT_RT
> 
> should be good enough.
> 
Thank you! This is what i have in the next v2 :)

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-10 14:07   ` Sebastian Andrzej Siewior
@ 2022-05-10 17:14     ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2022-05-10 17:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Uladzislau Rezki (Sony),
	LKML, RCU, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Oleksiy Avramchenko

On Tue, May 10, 2022 at 04:07:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-05-05 12:09:15 [-0700], Paul E. McKenney wrote:
> > All good points!
> > 
> > Some questions and comments below.
> > 
> > Adding Sebastian on CC for his perspective.
> 
> Thank you.
> I may missing things, I tried to digest the thread…
> 
> In my understanding: The boosting option is used to allow a SCHED_OTHER
> task within a RCU section to allow to leave the RCU section while tasks
> with higher priority occupy the CPU.
> As far as the RCU callbacks are concerned, I'm not aware that it would
> be beneficial to run them with an elevated priority. On SMP systems,
> there is the suggestion to have a housekeeping CPU and to offload the
> RCU callbacks to this CPU and no to bother the CPU with the RT workload.

Agreed, even within RT, there are multiple ways to get this job done.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-10 14:01                         ` Steven Rostedt
@ 2022-05-11 13:39                           ` Uladzislau Rezki
  2022-05-11 14:29                             ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-11 13:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uladzislau Rezki, Paul E. McKenney, Joel Fernandes,
	Alison Chaiken, Sebastian Andrzej Siewior, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Oleksiy Avramchenko

> On Mon, 9 May 2022 20:28:26 +0200
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> > was mainly designed for that kind of kernels. So we can align with Alison
> > patch and her decision, so i do not see any issues. So far RT folk seems
> > does not mind in having "callback-kthreads" as SCHED_FIFO :)
> 
> That's because RT folks set the threads they care about to a higher RT
> priority than the kthreads. ;-)
> 
That explains many things :)

I have one question, it is partly related to the topic that is in question
and to this thread also. I was tracing a "long" duration of the offloading
kthreads which actually invoke them one by one. And the picture was like
below from ftrace point of view:

<snip>
 rcuop/6-54  [000] .N..  183.753018: rcu_invoke_callback:  rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
 rcuop/6-54  [000] .N..  183.753020: rcu_invoke_callback:  rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
 rcuop/6-54  [000] .N..  183.753021: rcu_invoke_callback:  rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
 ... 
 rcuop/6-54  [000] .N..  183.755941: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
 rcuop/6-54  [000] .N..  183.755942: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
 rcuop/6-54  [000] dN..  183.755944: rcu_batch_end:        rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
 rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      Start context switch
 rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      End context switch
<snip>

i spent some time in order to understand why the context was not switched,
even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
was sent to the CPU_0 to reschedule. The last "." in latency field shows
that a context has not disabled any preemption. So everything should be fine.

An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
so a task becomes non preemtable but the ftrace does not provide any signal about
it. So i was fooled for some time by my tracer logs.

Do you have any thoughts about it? Should it be solved or signaled
somehow that a task in fact is not preemtable if a counter > 0?

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-11 13:39                           ` Uladzislau Rezki
@ 2022-05-11 14:29                             ` Steven Rostedt
  2022-05-11 14:51                               ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2022-05-11 14:29 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Wed, 11 May 2022 15:39:56 +0200
Uladzislau Rezki <urezki@gmail.com> wrote:

> <snip>
>  rcuop/6-54  [000] .N..  183.753018: rcu_invoke_callback:  rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
>  rcuop/6-54  [000] .N..  183.753020: rcu_invoke_callback:  rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
>  rcuop/6-54  [000] .N..  183.753021: rcu_invoke_callback:  rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
>  ... 
>  rcuop/6-54  [000] .N..  183.755941: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
>  rcuop/6-54  [000] .N..  183.755942: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
>  rcuop/6-54  [000] dN..  183.755944: rcu_batch_end:        rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
>  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      Start context switch
>  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      End context switch
> <snip>
> 
> i spent some time in order to understand why the context was not switched,
> even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> was sent to the CPU_0 to reschedule. The last "." in latency field shows
> that a context has not disabled any preemption. So everything should be fine.
> 
> An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> so a task becomes non preemtable but the ftrace does not provide any signal about
> it. So i was fooled for some time by my tracer logs.
> 
> Do you have any thoughts about it? Should it be solved or signaled
> somehow that a task in fact is not preemtable if a counter > 0?

Hmm, it should show it in the first part (where the 'd' is). Is this a
snapshot from the kernel or from trace-cmd?

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-11 14:29                             ` Steven Rostedt
@ 2022-05-11 14:51                               ` Uladzislau Rezki
  2022-05-11 14:53                                 ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-11 14:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uladzislau Rezki, Paul E. McKenney, Joel Fernandes,
	Alison Chaiken, Sebastian Andrzej Siewior, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Oleksiy Avramchenko

On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> On Wed, 11 May 2022 15:39:56 +0200
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > <snip>
> >  rcuop/6-54  [000] .N..  183.753018: rcu_invoke_callback:  rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> >  rcuop/6-54  [000] .N..  183.753020: rcu_invoke_callback:  rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> >  rcuop/6-54  [000] .N..  183.753021: rcu_invoke_callback:  rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> >  ... 
> >  rcuop/6-54  [000] .N..  183.755941: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> >  rcuop/6-54  [000] .N..  183.755942: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> >  rcuop/6-54  [000] dN..  183.755944: rcu_batch_end:        rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      Start context switch
> >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      End context switch
> > <snip>
> > 
> > i spent some time in order to understand why the context was not switched,
> > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > that a context has not disabled any preemption. So everything should be fine.
> > 
> > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > so a task becomes non preemtable but the ftrace does not provide any signal about
> > it. So i was fooled for some time by my tracer logs.
> > 
> > Do you have any thoughts about it? Should it be solved or signaled
> > somehow that a task in fact is not preemtable if a counter > 0?
> 
> Hmm, it should show it in the first part (where the 'd' is). Is this a
> snapshot from the kernel or from trace-cmd?
> 
I do both and the behavior is the same. But the above one looks like a
kernel trace output, the trace-cmd snapshot looks differently. So you
mean "s" has to be there then?

<snip>
	entry->preempt_count		= pc & 0xff;
	entry->pid			= (tsk) ? tsk->pid : 0;
	entry->type			= type;
	entry->flags =
#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
		(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
#else
		TRACE_FLAG_IRQS_NOSUPPORT |
#endif
		((pc & NMI_MASK    ) ? TRACE_FLAG_NMI     : 0) |
		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
		((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
		(tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
		(test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
<snip>

BTW, i am not the 5.10 kernel. I have not checked the latest kernel
and what ftrace reports under holding local_bh_disable().

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-11 14:51                               ` Uladzislau Rezki
@ 2022-05-11 14:53                                 ` Uladzislau Rezki
  2022-05-11 17:17                                   ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-11 14:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Steven Rostedt, Paul E. McKenney, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> > On Wed, 11 May 2022 15:39:56 +0200
> > Uladzislau Rezki <urezki@gmail.com> wrote:
> > 
> > > <snip>
> > >  rcuop/6-54  [000] .N..  183.753018: rcu_invoke_callback:  rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> > >  rcuop/6-54  [000] .N..  183.753020: rcu_invoke_callback:  rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> > >  rcuop/6-54  [000] .N..  183.753021: rcu_invoke_callback:  rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> > >  ... 
> > >  rcuop/6-54  [000] .N..  183.755941: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> > >  rcuop/6-54  [000] .N..  183.755942: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> > >  rcuop/6-54  [000] dN..  183.755944: rcu_batch_end:        rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> > >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      Start context switch
> > >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      End context switch
> > > <snip>
> > > 
> > > i spent some time in order to understand why the context was not switched,
> > > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > > that a context has not disabled any preemption. So everything should be fine.
> > > 
> > > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > > so a task becomes non preemtable but the ftrace does not provide any signal about
> > > it. So i was fooled for some time by my tracer logs.
> > > 
> > > Do you have any thoughts about it? Should it be solved or signaled
> > > somehow that a task in fact is not preemtable if a counter > 0?
> > 
> > Hmm, it should show it in the first part (where the 'd' is). Is this a
> > snapshot from the kernel or from trace-cmd?
> > 
> I do both and the behavior is the same. But the above one looks like a
> kernel trace output, the trace-cmd snapshot looks differently. So you
> mean "s" has to be there then?
> 
> <snip>
> 	entry->preempt_count		= pc & 0xff;
> 	entry->pid			= (tsk) ? tsk->pid : 0;
> 	entry->type			= type;
> 	entry->flags =
> #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
> 		(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
> #else
> 		TRACE_FLAG_IRQS_NOSUPPORT |
> #endif
> 		((pc & NMI_MASK    ) ? TRACE_FLAG_NMI     : 0) |
> 		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
> 		((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
> 		(tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
> 		(test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
> <snip>
> 
> BTW, i am not the 5.10 kernel. I have not checked the latest kernel
> and what ftrace reports under holding local_bh_disable().
>
Sorry, the was a typo. I am checking 5.10 kernel and the trace was taken
on that kernel.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-11 14:53                                 ` Uladzislau Rezki
@ 2022-05-11 17:17                                   ` Uladzislau Rezki
  2022-05-16 16:22                                     ` Steven Rostedt
  0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-11 17:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Steven Rostedt, Paul E. McKenney, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

> > On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> > > On Wed, 11 May 2022 15:39:56 +0200
> > > Uladzislau Rezki <urezki@gmail.com> wrote:
> > > 
> > > > <snip>
> > > >  rcuop/6-54  [000] .N..  183.753018: rcu_invoke_callback:  rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> > > >  rcuop/6-54  [000] .N..  183.753020: rcu_invoke_callback:  rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> > > >  rcuop/6-54  [000] .N..  183.753021: rcu_invoke_callback:  rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> > > >  ... 
> > > >  rcuop/6-54  [000] .N..  183.755941: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> > > >  rcuop/6-54  [000] .N..  183.755942: rcu_invoke_callback:  rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> > > >  rcuop/6-54  [000] dN..  183.755944: rcu_batch_end:        rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> > > >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      Start context switch
> > > >  rcuop/6-54  [000] dN..  183.755946: rcu_utilization:      End context switch
> > > > <snip>
> > > > 
> > > > i spent some time in order to understand why the context was not switched,
> > > > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > > > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > > > that a context has not disabled any preemption. So everything should be fine.
> > > > 
> > > > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > > > so a task becomes non preemtable but the ftrace does not provide any signal about
> > > > it. So i was fooled for some time by my tracer logs.
> > > > 
> > > > Do you have any thoughts about it? Should it be solved or signaled
> > > > somehow that a task in fact is not preemtable if a counter > 0?
> > > 
> > > Hmm, it should show it in the first part (where the 'd' is). Is this a
> > > snapshot from the kernel or from trace-cmd?
> > > 
> > I do both and the behavior is the same. But the above one looks like a
> > kernel trace output, the trace-cmd snapshot looks differently. So you
> > mean "s" has to be there then?
> > 
> > <snip>
> > 	entry->preempt_count		= pc & 0xff;
> > 	entry->pid			= (tsk) ? tsk->pid : 0;
> > 	entry->type			= type;
> > 	entry->flags =
> > #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
> > 		(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
> > #else
> > 		TRACE_FLAG_IRQS_NOSUPPORT |
> > #endif
> > 		((pc & NMI_MASK    ) ? TRACE_FLAG_NMI     : 0) |
> > 		((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
> > 		((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
> > 		(tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
> > 		(test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
> > <snip>
> > 
> > BTW, i am not the 5.10 kernel. I have not checked the latest kernel
> > and what ftrace reports under holding local_bh_disable().
> >
> Sorry, the was a typo. I am checking 5.10 kernel and the trace was taken
> on that kernel.
> 
OK. It was added on the latest kernel:

root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
  vmalloc_test/0-1296    [062] b....    18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()

root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:64
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
root@pc638:/home/urezki# uname -a
Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
root@pc638:/home/urezki#

so it shows *bh* disabled sections.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-11 17:17                                   ` Uladzislau Rezki
@ 2022-05-16 16:22                                     ` Steven Rostedt
  2022-05-16 16:42                                       ` Uladzislau Rezki
  0 siblings, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2022-05-16 16:22 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Paul E. McKenney, Joel Fernandes, Alison Chaiken,
	Sebastian Andrzej Siewior, LKML, RCU, Frederic Weisbecker,
	Neeraj Upadhyay, Oleksiy Avramchenko

On Wed, 11 May 2022 19:17:42 +0200
Uladzislau Rezki <urezki@gmail.com> wrote:

> OK. It was added on the latest kernel:
> 
> root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
>   vmalloc_test/0-1296    [062] b....    18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()
> 
> root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0   #P:64
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
> root@pc638:/home/urezki# uname -a
> Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
> root@pc638:/home/urezki#
> 
> so it shows *bh* disabled sections.

Ah yes. That was added with commit 289e7b0f7eb47 ("tracing: Account bottom
half disabled sections."). Maybe I should mark that as stable?

-- Steve

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context
  2022-05-16 16:22                                     ` Steven Rostedt
@ 2022-05-16 16:42                                       ` Uladzislau Rezki
  0 siblings, 0 replies; 28+ messages in thread
From: Uladzislau Rezki @ 2022-05-16 16:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Uladzislau Rezki, Paul E. McKenney, Joel Fernandes,
	Alison Chaiken, Sebastian Andrzej Siewior, LKML, RCU,
	Frederic Weisbecker, Neeraj Upadhyay, Oleksiy Avramchenko

> On Wed, 11 May 2022 19:17:42 +0200
> Uladzislau Rezki <urezki@gmail.com> wrote:
> 
> > OK. It was added on the latest kernel:
> > 
> > root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
> >   vmalloc_test/0-1296    [062] b....    18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()
> > 
> > root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 0/0   #P:64
> > #
> > #                                _-----=> irqs-off/BH-disabled
> > #                               / _----=> need-resched
> > #                              | / _---=> hardirq/softirq
> > #                              || / _--=> preempt-depth
> > #                              ||| / _-=> migrate-disable
> > #                              |||| /     delay
> > #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> > #              | |         |   |||||     |         |
> > root@pc638:/home/urezki# uname -a
> > Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
> > root@pc638:/home/urezki#
> > 
> > so it shows *bh* disabled sections.
> 
> Ah yes. That was added with commit 289e7b0f7eb47 ("tracing: Account bottom
> half disabled sections."). Maybe I should mark that as stable?
> 
BTW, a commit message of the "289e7b0f7eb47" change exactly describes
my first thought why it behaves like that. So if it is possible it would
be great to have it in the 5.10 and 5.15 stable kernels.

Thanks!

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-05-16 16:43 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-05 10:16 [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context Uladzislau Rezki (Sony)
2022-05-05 19:09 ` Paul E. McKenney
2022-05-06 16:22   ` Uladzislau Rezki
2022-05-06 18:24     ` Paul E. McKenney
2022-05-07  9:11       ` Uladzislau Rezki
2022-05-07 22:32         ` Paul E. McKenney
2022-05-08  6:28           ` Joel Fernandes
2022-05-08 21:32             ` Paul E. McKenney
2022-05-09  0:17               ` Joel Fernandes
2022-05-09  3:37                 ` Paul E. McKenney
2022-05-09 17:17                   ` Joel Fernandes
2022-05-09 18:14                     ` Paul E. McKenney
2022-05-09 18:28                       ` Uladzislau Rezki
2022-05-09 18:39                         ` Paul E. McKenney
2022-05-09 18:43                           ` Uladzislau Rezki
2022-05-10 14:09                           ` Steven Rostedt
2022-05-10 14:24                             ` Paul E. McKenney
2022-05-10 14:35                             ` Uladzislau Rezki
2022-05-10 14:01                         ` Steven Rostedt
2022-05-11 13:39                           ` Uladzislau Rezki
2022-05-11 14:29                             ` Steven Rostedt
2022-05-11 14:51                               ` Uladzislau Rezki
2022-05-11 14:53                                 ` Uladzislau Rezki
2022-05-11 17:17                                   ` Uladzislau Rezki
2022-05-16 16:22                                     ` Steven Rostedt
2022-05-16 16:42                                       ` Uladzislau Rezki
2022-05-10 14:07   ` Sebastian Andrzej Siewior
2022-05-10 17:14     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).