linux-kernel.vger.kernel.org archive mirror
* [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
@ 2014-06-13  0:16 Frederic Weisbecker
  2014-06-13  1:24 ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13  0:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: LKML, Frederic Weisbecker, Josh Triplett, Steven Rostedt,
	Mathieu Desnoyers

CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays, but actual
users are likely a tiny minority, if there are any at all.

Also there is a risk that affining the GP kthread to a single CPU could
end up noticeably reducing RCU performance and increasing energy
consumption.

So let's affine the GP kthread only when nohz full is actually in use
(i.e. when the nohz_full= boot parameter is filled in or CONFIG_NO_HZ_FULL_ALL=y).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 kernel/rcu/tree_plugin.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index cbc2c45..726f52c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
  */
 static void rcu_bind_gp_kthread(void)
 {
-#ifdef CONFIG_NO_HZ_FULL
-	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
+	int cpu;
+
+	if (!tick_nohz_full_enabled())
+		return;
+
+	cpu = ACCESS_ONCE(tick_do_timer_cpu);
 
 	if (cpu < 0 || cpu >= nr_cpu_ids)
 		return;
+
 	if (raw_smp_processor_id() != cpu)
 		set_cpus_allowed_ptr(current, cpumask_of(cpu));
-#endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  0:16 [PATCH] rcu: Only pin GP kthread when full dynticks is actually used Frederic Weisbecker
@ 2014-06-13  1:24 ` Paul E. McKenney
  2014-06-13  1:35   ` Paul E. McKenney
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13  1:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> users should be a tiny minority, if actually any.
> 
> Also there is a risk that affining the GP kthread to a single CPU could
> end up noticeably reducing RCU performances and increasing energy
> consumption.
> 
> So lets affine the GP kthread only when nohz full is actually used
> (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
>  kernel/rcu/tree_plugin.h | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index cbc2c45..726f52c 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
>   */
>  static void rcu_bind_gp_kthread(void)
>  {
> -#ifdef CONFIG_NO_HZ_FULL
> -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> +	int cpu;
> +
> +	if (!tick_nohz_full_enabled())
> +		return;
> +
> +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> 
>  	if (cpu < 0 || cpu >= nr_cpu_ids)
>  		return;
> +
>  	if (raw_smp_processor_id() != cpu)
>  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> -#endif /* #ifdef CONFIG_NO_HZ_FULL */
>  }

Hello, Frederic,

I have the following queued.  Shall I port yours on top of mine, or is
there an issue with mine?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs

Binding the grace-period kthreads to the timekeeping CPU resulted in
significant performance decreases for some workloads.  For more detail,
see:

https://lkml.org/lkml/2014/6/3/395 for benchmark numbers

https://lkml.org/lkml/2014/6/4/218 for CPU statistics

It turns out that it is necessary to bind the grace-period kthreads
to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
In other cases, it suffices to bind the grace-period kthreads to the
set of non-nohz_full CPUs.

This commit therefore creates a tick_nohz_not_full_mask that is the
complement of tick_nohz_full_mask, and then binds the grace-period
kthread to the set of CPUs indicated by this new mask, which covers
the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
case still binds the grace-period kthreads to the timekeeping CPU.

Reported-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/tick.h b/include/linux/tick.h
index b84773cb9f4c..1fe0c05eee39 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -162,6 +162,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 #ifdef CONFIG_NO_HZ_FULL
 extern bool tick_nohz_full_running;
 extern cpumask_var_t tick_nohz_full_mask;
+extern cpumask_var_t tick_nohz_not_full_mask;
 
 static inline bool tick_nohz_full_enabled(void)
 {
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7ce734040a5e..ec7627becaf0 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2863,7 +2863,12 @@ static void rcu_bind_gp_kthread(void)
 
 	if (cpu < 0 || cpu >= nr_cpu_ids)
 		return;
+#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
 	if (raw_smp_processor_id() != cpu)
 		set_cpus_allowed_ptr(current, cpumask_of(cpu));
+#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+	if (!cpumask_test_cpu(raw_smp_processor_id(), tick_nohz_not_full_mask))
+		set_cpus_allowed_ptr(current, tick_nohz_not_full_mask);
+#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9f8af69c67ec..02209e957e76 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -151,6 +151,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 
 #ifdef CONFIG_NO_HZ_FULL
 cpumask_var_t tick_nohz_full_mask;
+cpumask_var_t tick_nohz_not_full_mask;
 bool tick_nohz_full_running;
 
 static bool can_stop_full_tick(void)
@@ -278,6 +279,7 @@ static int __init tick_nohz_full_setup(char *str)
 	int cpu;
 
 	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
 	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
 		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
 		return 1;
@@ -288,6 +290,8 @@ static int __init tick_nohz_full_setup(char *str)
 		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
 		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
 	}
+	cpumask_andnot(tick_nohz_not_full_mask,
+		       cpu_possible_mask, tick_nohz_full_mask);
 	tick_nohz_full_running = true;
 
 	return 1;
@@ -332,6 +336,8 @@ static int tick_nohz_init_all(void)
 	err = 0;
 	cpumask_setall(tick_nohz_full_mask);
 	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
+	cpumask_clear(tick_nohz_not_full_mask);
+	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
 	tick_nohz_full_running = true;
 #endif
 	return err;



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  1:24 ` Paul E. McKenney
@ 2014-06-13  1:35   ` Paul E. McKenney
  2014-06-13 12:47     ` Frederic Weisbecker
  2014-06-13  2:05   ` Paul E. McKenney
  2014-06-13 12:42   ` Frederic Weisbecker
  2 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13  1:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > users should be a tiny minority, if actually any.
> > 
> > Also there is a risk that affining the GP kthread to a single CPU could
> > end up noticeably reducing RCU performances and increasing energy
> > consumption.
> > 
> > So lets affine the GP kthread only when nohz full is actually used
> > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)

Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
can see long RCU grace periods, as in about two seconds each.  It is
not hard for me to detect this situation.  Is there some way I can
call for a given CPU's scheduling-clock interrupt to be turned on?

I believe that the nsproxy guys were seeing something like this as well.

							Thanx, Paul

> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index cbc2c45..726f52c 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> >   */
> >  static void rcu_bind_gp_kthread(void)
> >  {
> > -#ifdef CONFIG_NO_HZ_FULL
> > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > +	int cpu;
> > +
> > +	if (!tick_nohz_full_enabled())
> > +		return;
> > +
> > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > 
> >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> >  		return;
> > +
> >  	if (raw_smp_processor_id() != cpu)
> >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> 
> Hello, Frederic,
> 
> I have the following queued.  Shall I port yours on top of mine, or is
> there an issue with mine?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
> 
> Binding the grace-period kthreads to the timekeeping CPU resulted in
> significant performance decreases for some workloads.  For more detail,
> see:
> 
> https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> 
> https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> 
> It turns out that it is necessary to bind the grace-period kthreads
> to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> In other cases, it suffices to bind the grace-period kthreads to the
> set of non-nohz_full CPUs.
> 
> This commit therefore creates a tick_nohz_not_full_mask that is the
> complement of tick_nohz_full_mask, and then binds the grace-period
> kthread to the set of CPUs indicated by this new mask, which covers
> the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> case still binds the grace-period kthreads to the timekeeping CPU.
> 
> Reported-by: Jet Chen <jet.chen@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b84773cb9f4c..1fe0c05eee39 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -162,6 +162,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
>  #ifdef CONFIG_NO_HZ_FULL
>  extern bool tick_nohz_full_running;
>  extern cpumask_var_t tick_nohz_full_mask;
> +extern cpumask_var_t tick_nohz_not_full_mask;
>  
>  static inline bool tick_nohz_full_enabled(void)
>  {
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 7ce734040a5e..ec7627becaf0 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2863,7 +2863,12 @@ static void rcu_bind_gp_kthread(void)
>  
>  	if (cpu < 0 || cpu >= nr_cpu_ids)
>  		return;
> +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
>  	if (raw_smp_processor_id() != cpu)
>  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> +	if (!cpumask_test_cpu(raw_smp_processor_id(), tick_nohz_not_full_mask))
> +		set_cpus_allowed_ptr(current, tick_nohz_not_full_mask);
> +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
>  #endif /* #ifdef CONFIG_NO_HZ_FULL */
>  }
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 9f8af69c67ec..02209e957e76 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -151,6 +151,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>  
>  #ifdef CONFIG_NO_HZ_FULL
>  cpumask_var_t tick_nohz_full_mask;
> +cpumask_var_t tick_nohz_not_full_mask;
>  bool tick_nohz_full_running;
>  
>  static bool can_stop_full_tick(void)
> @@ -278,6 +279,7 @@ static int __init tick_nohz_full_setup(char *str)
>  	int cpu;
>  
>  	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
> +	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
>  	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
>  		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
>  		return 1;
> @@ -288,6 +290,8 @@ static int __init tick_nohz_full_setup(char *str)
>  		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
>  		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
>  	}
> +	cpumask_andnot(tick_nohz_not_full_mask,
> +		       cpu_possible_mask, tick_nohz_full_mask);
>  	tick_nohz_full_running = true;
>  
>  	return 1;
> @@ -332,6 +336,8 @@ static int tick_nohz_init_all(void)
>  	err = 0;
>  	cpumask_setall(tick_nohz_full_mask);
>  	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
> +	cpumask_clear(tick_nohz_not_full_mask);
> +	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
>  	tick_nohz_full_running = true;
>  #endif
>  	return err;



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  1:24 ` Paul E. McKenney
  2014-06-13  1:35   ` Paul E. McKenney
@ 2014-06-13  2:05   ` Paul E. McKenney
  2014-06-13 12:55     ` Frederic Weisbecker
  2014-06-13 12:42   ` Frederic Weisbecker
  2 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13  2:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > users should be a tiny minority, if actually any.
> > 
> > Also there is a risk that affining the GP kthread to a single CPU could
> > end up noticeably reducing RCU performances and increasing energy
> > consumption.
> > 
> > So lets affine the GP kthread only when nohz full is actually used
> > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index cbc2c45..726f52c 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> >   */
> >  static void rcu_bind_gp_kthread(void)
> >  {
> > -#ifdef CONFIG_NO_HZ_FULL
> > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > +	int cpu;
> > +
> > +	if (!tick_nohz_full_enabled())
> > +		return;
> > +
> > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > 
> >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> >  		return;
> > +
> >  	if (raw_smp_processor_id() != cpu)
> >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> 
> Hello, Frederic,
> 
> I have the following queued.  Shall I port yours on top of mine, or is
> there an issue with mine?

OK, I suppose I should show you what it looks like ported on top of mine.

I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
left that.  I also didn't understand how dumping the GP kthreads onto
a single CPU could affect energy efficiency all that much, so I omitted
that from the commit log.  If I am missing something, please enlighten me.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Only pin GP kthread when full dynticks is actually used

CONFIG_NO_HZ_FULL is widely enabled in distros, but only a small fraction
of the users will be taking advantage of it.  Furthermore, there is
a risk that affining the GP kthread to a single CPU could noticeably
increase RCU grace-period latency.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c93c525b71fe..2f5679791594 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2865,7 +2865,7 @@ static void rcu_bind_gp_kthread(void)
 #ifdef CONFIG_NO_HZ_FULL
 	int cpu = tick_do_timer_cpu;
 
-	if (cpu < 0 || cpu >= nr_cpu_ids)
+	if (cpu < 0 || cpu >= nr_cpu_ids || !tick_nohz_full_enabled())
 		return;
 #ifdef CONFIG_NO_HZ_FULL_SYSIDLE
 	if (raw_smp_processor_id() != cpu)



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  1:24 ` Paul E. McKenney
  2014-06-13  1:35   ` Paul E. McKenney
  2014-06-13  2:05   ` Paul E. McKenney
@ 2014-06-13 12:42   ` Frederic Weisbecker
  2014-06-13 15:58     ` Paul E. McKenney
  2 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 12:42 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > users should be a tiny minority, if actually any.
> > 
> > Also there is a risk that affining the GP kthread to a single CPU could
> > end up noticeably reducing RCU performances and increasing energy
> > consumption.
> > 
> > So lets affine the GP kthread only when nohz full is actually used
> > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Josh Triplett <josh@joshtriplett.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > ---
> >  kernel/rcu/tree_plugin.h | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index cbc2c45..726f52c 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> >   */
> >  static void rcu_bind_gp_kthread(void)
> >  {
> > -#ifdef CONFIG_NO_HZ_FULL
> > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > +	int cpu;
> > +
> > +	if (!tick_nohz_full_enabled())
> > +		return;
> > +
> > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > 
> >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> >  		return;
> > +
> >  	if (raw_smp_processor_id() != cpu)
> >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> 
> Hello, Frederic,
> 
> I have the following queued.  Shall I port yours on top of mine, or is
> there an issue with mine?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
> 
> Binding the grace-period kthreads to the timekeeping CPU resulted in
> significant performance decreases for some workloads.  For more detail,
> see:
> 
> https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> 
> https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> 
> It turns out that it is necessary to bind the grace-period kthreads
> to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> In other cases, it suffices to bind the grace-period kthreads to the
> set of non-nohz_full CPUs.
> 
> This commit therefore creates a tick_nohz_not_full_mask that is the
> complement of tick_nohz_full_mask, and then binds the grace-period
> kthread to the set of CPUs indicated by this new mask, which covers
> the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> case still binds the grace-period kthreads to the timekeeping CPU.
> 
> Reported-by: Jet Chen <jet.chen@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b84773cb9f4c..1fe0c05eee39 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -162,6 +162,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
>  #ifdef CONFIG_NO_HZ_FULL
>  extern bool tick_nohz_full_running;
>  extern cpumask_var_t tick_nohz_full_mask;
> +extern cpumask_var_t tick_nohz_not_full_mask;
>  
>  static inline bool tick_nohz_full_enabled(void)
>  {
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 7ce734040a5e..ec7627becaf0 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2863,7 +2863,12 @@ static void rcu_bind_gp_kthread(void)
>  
>  	if (cpu < 0 || cpu >= nr_cpu_ids)
>  		return;
> +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
>  	if (raw_smp_processor_id() != cpu)
>  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> +	if (!cpumask_test_cpu(raw_smp_processor_id(), tick_nohz_not_full_mask))
> +		set_cpus_allowed_ptr(current, tick_nohz_not_full_mask);
> +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
>  #endif /* #ifdef CONFIG_NO_HZ_FULL */
>  }
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 9f8af69c67ec..02209e957e76 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -151,6 +151,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
>  
>  #ifdef CONFIG_NO_HZ_FULL
>  cpumask_var_t tick_nohz_full_mask;
> +cpumask_var_t tick_nohz_not_full_mask;
>  bool tick_nohz_full_running;
>  
>  static bool can_stop_full_tick(void)
> @@ -278,6 +279,7 @@ static int __init tick_nohz_full_setup(char *str)
>  	int cpu;
>  
>  	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
> +	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
>  	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
>  		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
>  		return 1;
> @@ -288,6 +290,8 @@ static int __init tick_nohz_full_setup(char *str)
>  		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
>  		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
>  	}
> +	cpumask_andnot(tick_nohz_not_full_mask,
> +		       cpu_possible_mask, tick_nohz_full_mask);
>  	tick_nohz_full_running = true;
>  
>  	return 1;
> @@ -332,6 +336,8 @@ static int tick_nohz_init_all(void)
>  	err = 0;
>  	cpumask_setall(tick_nohz_full_mask);
>  	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
> +	cpumask_clear(tick_nohz_not_full_mask);
> +	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
>  	tick_nohz_full_running = true;
>  #endif
>  	return err;
> 

Ah, I missed that one. OK, it should fix the issue then, no need for mine.

I must confess, though, that I'm not entirely comfortable with having two
complementary masks like this: tick_nohz_full_mask and tick_nohz_not_full_mask.

Maybe what we should do instead is something like this on RCU kthread init:

	  cpumask_var_t gp_kthread_mask;

	  if (!alloc_cpumask_var(&gp_kthread_mask, GFP_KERNEL))
	      return -ENOMEM;

	  cpumask_andnot(gp_kthread_mask, cpu_possible_mask, tick_nohz_full_mask);

	  set_cpus_allowed_ptr(current, gp_kthread_mask);

	  free_cpumask_var(gp_kthread_mask);


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  1:35   ` Paul E. McKenney
@ 2014-06-13 12:47     ` Frederic Weisbecker
  2014-06-13 15:52       ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 12:47 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > users should be a tiny minority, if actually any.
> > > 
> > > Also there is a risk that affining the GP kthread to a single CPU could
> > > end up noticeably reducing RCU performances and increasing energy
> > > consumption.
> > > 
> > > So lets affine the GP kthread only when nohz full is actually used
> > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> 
> Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> can see long RCU grace periods, as in about two seconds each.  It is
> not hard for me to detect this situation.

Ah yeah, that does sound quite long.

> Is there some way I can
> call for a given CPU's scheduling-clock interrupt to be turned on?

Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/) is merged,
a simple call to tick_nohz_full_kick_cpu() should do the trick, although the
right condition must be checked on the IPI side, maybe with rcu_needs_cpu() or such.

But it would be interesting to identify the sources of these extended grace periods.
If we only restart the tick, we may be ignoring some deeper outstanding issue.

Thanks.

> 
> I believe that the nsproxy guys were seeing something like this as well.
> 
> 							Thanx, Paul


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13  2:05   ` Paul E. McKenney
@ 2014-06-13 12:55     ` Frederic Weisbecker
  2014-06-13 15:55       ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 12:55 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Thu, Jun 12, 2014 at 07:05:54PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > users should be a tiny minority, if actually any.
> > > 
> > > Also there is a risk that affining the GP kthread to a single CPU could
> > > end up noticeably reducing RCU performances and increasing energy
> > > consumption.
> > > 
> > > So lets affine the GP kthread only when nohz full is actually used
> > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index cbc2c45..726f52c 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > >   */
> > >  static void rcu_bind_gp_kthread(void)
> > >  {
> > > -#ifdef CONFIG_NO_HZ_FULL
> > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > +	int cpu;
> > > +
> > > +	if (!tick_nohz_full_enabled())
> > > +		return;
> > > +
> > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > 
> > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > >  		return;
> > > +
> > >  	if (raw_smp_processor_id() != cpu)
> > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > >  }
> > 
> > Hello, Frederic,
> > 
> > I have the following queued.  Shall I port yours on top of mine, or is
> > there an issue with mine?
> 
> OK, I suppose I should show you what it looks like ported on top of mine.

No need to keep my patch as long as yours goes in. It fixes the whole issue.

> 
> I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
> left that.

Yeah, that's because the:

      if (!tick_nohz_full_enabled())
           return;

returns early unconditionally when CONFIG_NO_HZ_FULL=n. So the whole function
body becomes dead code that should be detected and removed by gcc, with the
same result as the ifdef.

> I also didn't understand how dumping the GP kthreads onto
> a single CPU could affect energy efficiency all that much, so I omitted
> that from the commit log.  If I am missing something, please enlighten me.

Because if the kthread is pinned to CPU 0 and a grace period is in progress,
the kthread is going to disturb CPU 0 regardless of its possibly deep sleep
state. OTOH if the kthread has a wide affinity, it can be scheduled onto idle
CPUs that are less cold and thus be woken from a shallower power state.

Well, that's all assuming that the scheduler takes care of these deep idle
states. But I believe it does, IIRC...


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 12:47     ` Frederic Weisbecker
@ 2014-06-13 15:52       ` Paul E. McKenney
  2014-06-13 16:00         ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 15:52 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 02:47:16PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > users should be a tiny minority, if actually any.
> > > > 
> > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > end up noticeably reducing RCU performances and increasing energy
> > > > consumption.
> > > > 
> > > > So lets affine the GP kthread only when nohz full is actually used
> > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > 
> > Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> > can see long RCU grace periods, as in about two seconds each.  It is
> > not hard for me to detect this situation.
> 
> Ah yeah sounds quite long.
> 
> > Is there some way I can
> > call for a given CPU's scheduling-clock interrupt to be turned on?
> 
> Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/) is merged,
> a simple call to tick_nohz_full_kick_cpu() should do the trick. Although the
> right condition must be there on the IPI side. Maybe with rcu_needs_cpu() or such.

I could record the offending GP, and make rcu_needs_cpu() return true
if the current GP matches the offending one.
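
As a toy sketch of that idea (assumption: the names and the signature here are hypothetical, not the kernel's actual `rcu_needs_cpu()` interface), the GP kthread records the number of the grace period it considers offending, and the tick-needed query answers true only while that GP is still current:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: record an "offending" grace period
 * and report that the CPU still needs its scheduling-clock tick
 * while that grace period is the current one. */
static unsigned long offending_gp; /* 0 == none recorded */

static void record_offending_gp(unsigned long gpnum)
{
	offending_gp = gpnum;
}

static bool toy_rcu_needs_cpu(unsigned long current_gp)
{
	return offending_gp != 0 && current_gp == offending_gp;
}
```

Once the offending GP completes and the GP number moves on, the query reverts to false and the tick can be stopped again.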

> But it would be interesting to identify the sources of these extended grace periods.
> If we only restart the tick, we may ignore some deeper outstanding issue.

Some of them have been fixable by other means, but they will probably
come back as system sizes grow.  And I really have put preemption points
into kernel code in response to RCU CPU stall warnings, and the current
state of NO_HZ_FULL effectively ignores these preemption points.  :-/

							Thanx, Paul

> Thanks.
> 
> > 
> > I believe that the nsproxy guys were seeing something like this as well.
> > 
> > 							Thanx, Paul
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 12:55     ` Frederic Weisbecker
@ 2014-06-13 15:55       ` Paul E. McKenney
  2014-06-13 16:03         ` Frederic Weisbecker
  2014-06-13 16:10         ` Paul E. McKenney
  0 siblings, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 15:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 02:55:45PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 12, 2014 at 07:05:54PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > users should be a tiny minority, if actually any.
> > > > 
> > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > end up noticeably reducing RCU performances and increasing energy
> > > > consumption.
> > > > 
> > > > So lets affine the GP kthread only when nohz full is actually used
> > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > > 
> > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > ---
> > > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index cbc2c45..726f52c 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > > >   */
> > > >  static void rcu_bind_gp_kthread(void)
> > > >  {
> > > > -#ifdef CONFIG_NO_HZ_FULL
> > > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > +	int cpu;
> > > > +
> > > > +	if (!tick_nohz_full_enabled())
> > > > +		return;
> > > > +
> > > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > 
> > > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > >  		return;
> > > > +
> > > >  	if (raw_smp_processor_id() != cpu)
> > > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > > >  }
> > > 
> > > Hello, Frederic,
> > > 
> > > I have the following queued.  Shall I port yours on top of mine, or is
> > > there an issue with mine?
> > 
> > OK, I suppose I should show you what it looks like ported on top of mine.
> 
> No need to keep my patch as long as yours goes in. It fixes the whole issue.
> 
> > 
> > I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
> > left that.
> 
> Yeah that's because the:
> 
>       if (!tick_nohz_full_enabled())
>            return;
> 
> returns unconditionally if CONFIG_NO_HZ_FULL=n. So the whole function
> becomes dead code that should be detected and removed by gcc. So same
> result as with the ifdef.
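
The dead-code point can be modeled in userspace C (assumption: this is a simplified stand-in for the helpers in include/linux/tick.h, not the kernel code itself). When `CONFIG_NO_HZ_FULL` is not defined, the enabled check is a compile-time constant false, so the compiler can discard everything past the early return, exactly as the removed `#ifdef` did:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of tick_nohz_full_enabled(): a compile-time
 * constant false when CONFIG_NO_HZ_FULL is not defined. */
#ifdef CONFIG_NO_HZ_FULL
static bool tick_nohz_full_running;
static inline bool tick_nohz_full_enabled_model(void)
{
	return tick_nohz_full_running;
}
#else
static inline bool tick_nohz_full_enabled_model(void)
{
	return false;
}
#endif

static int bind_calls; /* how often the binding path actually ran */

static void rcu_bind_gp_kthread_model(void)
{
	if (!tick_nohz_full_enabled_model())
		return; /* everything below is dead code for NO_HZ_FULL=n */
	bind_calls++; /* stands in for set_cpus_allowed_ptr() */
}
```

Built without `-DCONFIG_NO_HZ_FULL`, the body after the early return is unreachable and the binding path never runs.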

Well, that part of your patch is worthwhile, then!  And I do therefore
need to restructure to invoke tick_nohz_full_enabled() at the very
beginning of the function.

> > I also didn't understand how dumping the GP kthreads onto
> > a single CPU could affect energy efficiency all that much, so I omitted
> > that from the commit log.  If I am missing something, please enlighten me.
> 
> Because if the kthread is pinned to CPU 0 and a grace period is in progress,
> the kthread is going to disturb the CPU 0 regardless of its possibly deep sleep
> state. OTOH if the kthread is widely affine, it can be scheduled to idle CPUs
> that are less cold and thus wake them from less deep power state.
> 
> Well that all assuming that the scheduler takes care of these deep idle state.
> But I believe it does iirc...

My guess is that enough stuff gets dumped on CPU 0 to make this a non-issue,
but point taken nonetheless.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 12:42   ` Frederic Weisbecker
@ 2014-06-13 15:58     ` Paul E. McKenney
  2014-06-13 16:09       ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 15:58 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 02:42:09PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > users should be a tiny minority, if actually any.
> > > 
> > > Also there is a risk that affining the GP kthread to a single CPU could
> > > end up noticeably reducing RCU performances and increasing energy
> > > consumption.
> > > 
> > > So lets affine the GP kthread only when nohz full is actually used
> > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index cbc2c45..726f52c 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > >   */
> > >  static void rcu_bind_gp_kthread(void)
> > >  {
> > > -#ifdef CONFIG_NO_HZ_FULL
> > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > +	int cpu;
> > > +
> > > +	if (!tick_nohz_full_enabled())
> > > +		return;
> > > +
> > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > 
> > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > >  		return;
> > > +
> > >  	if (raw_smp_processor_id() != cpu)
> > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > >  }
> > 
> > Hello, Frederic,
> > 
> > I have the following queued.  Shall I port yours on top of mine, or is
> > there an issue with mine?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
> > 
> > Binding the grace-period kthreads to the timekeeping CPU resulted in
> > significant performance decreases for some workloads.  For more detail,
> > see:
> > 
> > https://lkml.org/lkml/2014/6/3/395 for benchmark numbers
> > 
> > https://lkml.org/lkml/2014/6/4/218 for CPU statistics
> > 
> > It turns out that it is necessary to bind the grace-period kthreads
> > to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
> > on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
> > In other cases, it suffices to bind the grace-period kthreads to the
> > set of non-nohz_full CPUs.
> > 
> > This commit therefore creates a tick_nohz_not_full_mask that is the
> > complement of tick_nohz_full_mask, and then binds the grace-period
> > kthread to the set of CPUs indicated by this new mask, which covers
> > the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
> > case still binds the grace-period kthreads to the timekeeping CPU.
> > 
> > Reported-by: Jet Chen <jet.chen@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index b84773cb9f4c..1fe0c05eee39 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -162,6 +162,7 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
> >  #ifdef CONFIG_NO_HZ_FULL
> >  extern bool tick_nohz_full_running;
> >  extern cpumask_var_t tick_nohz_full_mask;
> > +extern cpumask_var_t tick_nohz_not_full_mask;
> >  
> >  static inline bool tick_nohz_full_enabled(void)
> >  {
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 7ce734040a5e..ec7627becaf0 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -2863,7 +2863,12 @@ static void rcu_bind_gp_kthread(void)
> >  
> >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> >  		return;
> > +#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
> >  	if (raw_smp_processor_id() != cpu)
> >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > +#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> > +	if (!cpumask_test_cpu(raw_smp_processor_id(), tick_nohz_not_full_mask))
> > +		set_cpus_allowed_ptr(current, tick_nohz_not_full_mask);
> > +#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
> >  #endif /* #ifdef CONFIG_NO_HZ_FULL */
> >  }
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 9f8af69c67ec..02209e957e76 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -151,6 +151,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
> >  
> >  #ifdef CONFIG_NO_HZ_FULL
> >  cpumask_var_t tick_nohz_full_mask;
> > +cpumask_var_t tick_nohz_not_full_mask;
> >  bool tick_nohz_full_running;
> >  
> >  static bool can_stop_full_tick(void)
> > @@ -278,6 +279,7 @@ static int __init tick_nohz_full_setup(char *str)
> >  	int cpu;
> >  
> >  	alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
> > +	alloc_bootmem_cpumask_var(&tick_nohz_not_full_mask);
> >  	if (cpulist_parse(str, tick_nohz_full_mask) < 0) {
> >  		pr_warning("NOHZ: Incorrect nohz_full cpumask\n");
> >  		return 1;
> > @@ -288,6 +290,8 @@ static int __init tick_nohz_full_setup(char *str)
> >  		pr_warning("NO_HZ: Clearing %d from nohz_full range for timekeeping\n", cpu);
> >  		cpumask_clear_cpu(cpu, tick_nohz_full_mask);
> >  	}
> > +	cpumask_andnot(tick_nohz_not_full_mask,
> > +		       cpu_possible_mask, tick_nohz_full_mask);
> >  	tick_nohz_full_running = true;
> >  
> >  	return 1;
> > @@ -332,6 +336,8 @@ static int tick_nohz_init_all(void)
> >  	err = 0;
> >  	cpumask_setall(tick_nohz_full_mask);
> >  	cpumask_clear_cpu(smp_processor_id(), tick_nohz_full_mask);
> > +	cpumask_clear(tick_nohz_not_full_mask);
> > +	cpumask_set_cpu(smp_processor_id(), tick_nohz_not_full_mask);
> >  	tick_nohz_full_running = true;
> >  #endif
> >  	return err;
> > 
> 
> Ah I missed that one. Ok it should fix the issue then, no need for mine.
> 
> Ok I must confess that I don't have a comfortable feeling with having two opposed masks
> like this: tick_nohz_full_mask and tick_nohz_not_full_mask.
> 
> Maybe what we should do instead is to have something like this on RCU kthread init:
> 
>           cpumask_var_t gp_kthread_mask;
> 
> >           if (!alloc_cpumask_var(&gp_kthread_mask, GFP_KERNEL))
> >               return -ENOMEM;
> 
>           cpumask_andnot(gp_kthread_mask, cpu_possible_mask, tick_nohz_full_mask);
> 
> 	  set_cpus_allowed_ptr(current, gp_kthread_mask);
> 
> 	  free_cpumask_var(gp_kthread_mask);
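
The mask arithmetic in the sketch above can be illustrated with a userspace model (assumption: the mask fits in 64 bits here, whereas the kernel's `cpumask_var_t` is a variable-size bitmap). `cpumask_andnot(dst, a, b)` keeps the bits set in `a` but not in `b`:

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit model of a CPU mask; bit N set == CPU N in the mask. */
typedef uint64_t cpumask_model;

/* Model of cpumask_andnot(): bits in a that are not in b. */
static cpumask_model mask_andnot(cpumask_model a, cpumask_model b)
{
	return a & ~b;
}
```

So with CPUs 0-3 possible and CPUs 1-2 nohz_full, the result is the housekeeping set {0, 3}.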

I was guessing that RCU's kthreads would not be the only ones that wanted
similar binding.  But if you feel strongly about this, we could at least
start by placing it local to RCU as above.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 15:52       ` Paul E. McKenney
@ 2014-06-13 16:00         ` Frederic Weisbecker
  2014-06-13 16:16           ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 16:00 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 08:52:33AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:47:16PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > > users should be a tiny minority, if actually any.
> > > > > 
> > > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > > end up noticeably reducing RCU performances and increasing energy
> > > > > consumption.
> > > > > 
> > > > > So lets affine the GP kthread only when nohz full is actually used
> > > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > 
> > > Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> > > can see long RCU grace periods, as in about two seconds each.  It is
> > > not hard for me to detect this situation.
> > 
> > Ah yeah sounds quite long.
> > 
> > > Is there some way I can
> > > call for a given CPU's scheduling-clock interrupt to be turned on?
> > 
> > Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/) is merged,
> > a simple call to tick_nohz_full_kick_cpu() should do the trick. Although the
> > right condition must be there on the IPI side. Maybe with rcu_needs_cpu() or such.
> 
> I could record the offending GP, and make rcu_needs_cpu() return true
> if the current GP matches the offending one.
> 
> > But it would be interesting to identify the sources of these extended grace periods.
> > If we only restart the tick, we may ignore some deeper outstanding issue.
> 
> Some of them have been fixable by other means, but they will probably
> come back as system sizes grow.  And I really have put preemption points
> into kernel code in response to RCU CPU stall warnings, and the current
> state of NO_HZ_FULL effectively ignores these preemption points.  :-/

I'm not sure I really understand the issue though. So you have RCU CPU stalls due
to very extended grace periods, right?

I'm not sure how preemption points would solve that. Or maybe you're
trying to trigger quiescent-state reports through these preemption points?

Is it because we have dynticks CPUs staying too long in the kernel without
taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
things?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 15:55       ` Paul E. McKenney
@ 2014-06-13 16:03         ` Frederic Weisbecker
  2014-06-13 16:20           ` Paul E. McKenney
  2014-06-13 16:10         ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 16:03 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 08:55:48AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:55:45PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 12, 2014 at 07:05:54PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > > users should be a tiny minority, if actually any.
> > > > > 
> > > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > > end up noticeably reducing RCU performances and increasing energy
> > > > > consumption.
> > > > > 
> > > > > So lets affine the GP kthread only when nohz full is actually used
> > > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > > > 
> > > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > ---
> > > > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index cbc2c45..726f52c 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > > > >   */
> > > > >  static void rcu_bind_gp_kthread(void)
> > > > >  {
> > > > > -#ifdef CONFIG_NO_HZ_FULL
> > > > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > +	int cpu;
> > > > > +
> > > > > +	if (!tick_nohz_full_enabled())
> > > > > +		return;
> > > > > +
> > > > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > 
> > > > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > > >  		return;
> > > > > +
> > > > >  	if (raw_smp_processor_id() != cpu)
> > > > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > > > >  }
> > > > 
> > > > Hello, Frederic,
> > > > 
> > > > I have the following queued.  Shall I port yours on top of mine, or is
> > > > there an issue with mine?
> > > 
> > > OK, I suppose I should show you what it looks like ported on top of mine.
> > 
> > No need to keep my patch as long as yours goes in. It fixes all the issue.
> > 
> > > 
> > > I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
> > > left that.
> > 
> > Yeah that's because the:
> > 
> >       if (!tick_nohz_full_enabled())
> >            return;
> > 
> > returns unconditionally if CONFIG_NO_HZ_FULL=n. So the whole function
> > becomes dead code that should be detected and removed by gcc. So same
> > result as with the ifdef.
> 
> Well, that part of your patch is worthwhile, then!  And I do therefore
> need to restructure to invoke tick_nohz_full_enabled() at the very
> beginning of the function.

Well, if no CPU is full dynticks, the sole effect of this is that the kthread
will be re-affined globally... So no big issue.

Although it can spare you a call to set_cpus_allowed_ptr(), which can
be worthwhile after all :)

> 
> > > I also didn't understand how dumping the GP kthreads onto
> > > a single CPU could affect energy efficiency all that much, so I omitted
> > > that from the commit log.  If I am missing something, please enlighten me.
> > 
> > Because if the kthread is pinned to CPU 0 and a grace period is in progress,
> > the kthread is going to disturb the CPU 0 regardless of its possibly deep sleep
> > state. OTOH if the kthread is widely affine, it can be scheduled to idle CPUs
> > that are less cold and thus wake them from less deep power state.
> > 
> > Well that all assuming that the scheduler takes care of these deep idle state.
> > But I believe it does iirc...
> 
> My guess is that enough stuff gets dumped on CPU 0 to make this a non-issue,
> but point taken nonetheless.

Ok.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 15:58     ` Paul E. McKenney
@ 2014-06-13 16:09       ` Frederic Weisbecker
  2014-06-13 16:23         ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 16:09 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 08:58:12AM -0700, Paul E. McKenney wrote:
> > Maybe what we should do instead is to have something like this on RCU kthread init:
> > 
> >           cpumask_var_t gp_kthread_mask;
> > 
> >           if (!alloc_cpumask_var(&gp_kthread_mask, GFP_KERNEL))
> >               return -ENOMEM;
> > 
> >           cpumask_andnot(gp_kthread_mask, cpu_possible_mask, tick_nohz_full_mask);
> > 
> > 	  set_cpus_allowed_ptr(current, gp_kthread_mask);
> > 
> > 	  free_cpumask_var(gp_kthread_mask);
> 
> I was guessing that RCU's kthreads would not be the only ones that wanted
> similar binding.  But if you feel strongly about this, we could at least
> start by placing it local to RCU as above.

Hmm, I don't feel comfortable creating mirrored cpumasks (nor pure negation cpumasks in general).
Also, exposing those nohz guts is probably not a good idea either, except to RCU due to
the sysidle stuff.

Now, you're right that we can expect this non-nohz affinity stuff to be reused.

Could it be something like housekeeping_affine(struct task_struct *tsk), maybe?
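
A minimal userspace sketch of what such a helper might look like (assumption: the helper name comes from the suggestion above and is hypothetical here; the masks are 64-bit integers rather than the kernel's `cpumask_var_t`, and the task struct is a stand-in):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit model of a CPU mask; bit N set == CPU N in the mask. */
typedef uint64_t mask_t;

static mask_t cpu_possible = 0xff; /* 8 CPUs in this model */
static mask_t nohz_full    = 0xfe; /* CPUs 1-7 are nohz_full */

struct task_model {
	mask_t cpus_allowed; /* stand-in for the task's affinity mask */
};

/* Hypothetical housekeeping_affine(): restrict a task to the
 * housekeeping CPUs, i.e. the possible CPUs that are not nohz_full. */
static void housekeeping_affine_model(struct task_model *t)
{
	t->cpus_allowed = cpu_possible & ~nohz_full;
}
```

With CPUs 1-7 full dynticks, the helper leaves the task affine to CPU 0 only, without anyone outside the tick code ever holding a negated mask of its own.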

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 15:55       ` Paul E. McKenney
  2014-06-13 16:03         ` Frederic Weisbecker
@ 2014-06-13 16:10         ` Paul E. McKenney
  1 sibling, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 16:10 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 08:55:48AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:55:45PM +0200, Frederic Weisbecker wrote:
> > On Thu, Jun 12, 2014 at 07:05:54PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > > users should be a tiny minority, if actually any.
> > > > > 
> > > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > > end up noticeably reducing RCU performances and increasing energy
> > > > > consumption.
> > > > > 
> > > > > So lets affine the GP kthread only when nohz full is actually used
> > > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > > > 
> > > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > ---
> > > > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index cbc2c45..726f52c 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > > > >   */
> > > > >  static void rcu_bind_gp_kthread(void)
> > > > >  {
> > > > > -#ifdef CONFIG_NO_HZ_FULL
> > > > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > +	int cpu;
> > > > > +
> > > > > +	if (!tick_nohz_full_enabled())
> > > > > +		return;
> > > > > +
> > > > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > 
> > > > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > > >  		return;
> > > > > +
> > > > >  	if (raw_smp_processor_id() != cpu)
> > > > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > > > >  }
> > > > 
> > > > Hello, Frederic,
> > > > 
> > > > I have the following queued.  Shall I port yours on top of mine, or is
> > > > there an issue with mine?
> > > 
> > > OK, I suppose I should show you what it looks like ported on top of mine.
> > 
> > No need to keep my patch as long as yours goes in. It fixes all the issue.
> > 
> > > 
> > > I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
> > > left that.
> > 
> > Yeah that's because the:
> > 
> >       if (!tick_nohz_full_enabled())
> >            return;
> > 
> > returns unconditionally if CONFIG_NO_HZ_FULL=n. So the whole function
> > becomes dead code that should be detected and removed by gcc. So same
> > result as with the ifdef.
> 
> Well, that part of your patch is worthwhile, then!  And I do therefore
> need to restructure to invoke tick_nohz_full_enabled() at the very
> beginning of the function.

Except then I get build errors for CONFIG_NO_HZ_FULL=n because of the
masks.  So as you in fact suggested, back to the original commit...

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:00         ` Frederic Weisbecker
@ 2014-06-13 16:16           ` Paul E. McKenney
  2014-06-13 16:21             ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 16:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 06:00:04PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 08:52:33AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:47:16PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jun 12, 2014 at 06:35:15PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > > > users should be a tiny minority, if actually any.
> > > > > > 
> > > > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > > > end up noticeably reducing RCU performances and increasing energy
> > > > > > consumption.
> > > > > > 
> > > > > > So lets affine the GP kthread only when nohz full is actually used
> > > > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > > 
> > > > Which reminds me...  Kernel-heavy workloads running NO_HZ_FULL_ALL=y
> > > > can see long RCU grace periods, as in about two seconds each.  It is
> > > > not hard for me to detect this situation.
> > > 
> > > Ah yeah sounds quite long.
> > > 
> > > > Is there some way I can
> > > > call for a given CPU's scheduling-clock interrupt to be turned on?
> > > 
> > > Yeah, once the nohz kick patchset (https://lwn.net/Articles/601214/) is merged,
> > > a simple call to tick_nohz_full_kick_cpu() should do the trick. Although the
> > > right condition must be there on the IPI side. Maybe with rcu_needs_cpu() or such.
> > 
> > I could record the offending GP, and make rcu_needs_cpu() return true
> > if the current GP matches the offending one.
> > 
> > > But it would be interesting to identify the sources of these extended grace periods.
> > > If we only restart the tick, we may ignore some deeper outstanding issue.
> > 
> > Some of them have been fixable by other means, but they will probably
> > come back as system sizes grow.  And I really have put preemption points
> > into kernel code in response to RCU CPU stall warnings, and the current
> > state of NO_HZ_FULL effectively ignores these preemption points.  :-/
> 
> I'm not sure I really understand the issue though. So you have RCU CPU stalls due
> to very extended grace periods, right?
> 
> I'm not sure how preemption points would solve that. Or maybe you're
> trying to trigger quiescent states reports through these preemption points?

If we have scheduling-clock interrupts, the preemption points will help
push RCU through its state machine.  If we don't have scheduling-clock
interrupts, RCU can't make progress in this case.

> Is it because we have dynticks CPUs staying too long in the kernel without
> taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> things?

Sort of the former, but combined with the fact that in-kernel CPUs still
need scheduling-clock interrupts for RCU to make progress.  I could
move this to RCU's context-switch hook, but that could be very bad for
workloads that do lots of context switching.
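
The progress problem being described can be captured in a toy state machine (assumption: this is vastly simplified from kernel/rcu/tree.c). A grace period ends only once every CPU has reported a quiescent state; a CPU that stays in the kernel with its tick off never reports one, so the grace period extends indefinitely:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4

/* Toy grace-period tracking: one flag per CPU, set when that CPU
 * reports a quiescent state (normally driven by the tick interrupt). */
static bool qs_reported[NCPUS];

static void report_qs(int cpu)
{
	qs_reported[cpu] = true;
}

/* The grace period completes only when every CPU has reported. */
static bool gp_completed(void)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (!qs_reported[cpu])
			return false;
	return true;
}
```

One holdout CPU is enough to stall completion, which is why restarting that CPU's scheduling-clock interrupt (or hooking context switch, at a cost) is needed to make progress.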

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:03         ` Frederic Weisbecker
@ 2014-06-13 16:20           ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 16:20 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 06:03:20PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 08:55:48AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:55:45PM +0200, Frederic Weisbecker wrote:
> > > On Thu, Jun 12, 2014 at 07:05:54PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 12, 2014 at 06:24:32PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 13, 2014 at 02:16:59AM +0200, Frederic Weisbecker wrote:
> > > > > > CONFIG_NO_HZ_FULL may be enabled widely on distros nowadays but actual
> > > > > > users should be a tiny minority, if actually any.
> > > > > > 
> > > > > > Also there is a risk that affining the GP kthread to a single CPU could
> > > > > > end up noticeably reducing RCU performances and increasing energy
> > > > > > consumption.
> > > > > > 
> > > > > > So lets affine the GP kthread only when nohz full is actually used
> > > > > > (ie: when the nohz_full= parameter is filled or CONFIG_NO_HZ_FULL_ALL=y)
> > > > > > 
> > > > > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > > > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > > > > > ---
> > > > > >  kernel/rcu/tree_plugin.h | 10 +++++++---
> > > > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > > index cbc2c45..726f52c 100644
> > > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > > @@ -2843,12 +2843,16 @@ static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
> > > > > >   */
> > > > > >  static void rcu_bind_gp_kthread(void)
> > > > > >  {
> > > > > > -#ifdef CONFIG_NO_HZ_FULL
> > > > > > -	int cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > > +	int cpu;
> > > > > > +
> > > > > > +	if (!tick_nohz_full_enabled())
> > > > > > +		return;
> > > > > > +
> > > > > > +	cpu = ACCESS_ONCE(tick_do_timer_cpu);
> > > > > > 
> > > > > >  	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > > > >  		return;
> > > > > > +
> > > > > >  	if (raw_smp_processor_id() != cpu)
> > > > > >  		set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > > > > -#endif /* #ifdef CONFIG_NO_HZ_FULL */
> > > > > >  }
> > > > > 
> > > > > Hello, Frederic,
> > > > > 
> > > > > I have the following queued.  Shall I port yours on top of mine, or is
> > > > > there an issue with mine?
> > > > 
> > > > OK, I suppose I should show you what it looks like ported on top of mine.
> > > 
> > > No need to keep my patch as long as yours goes in. It fixes all the issue.
> > > 
> > > > 
> > > > I didn't understand the removal of "#ifdef CONFIG_NO_HZ_FULL", so I
> > > > left that.
> > > 
> > > Yeah that's because the:
> > > 
> > >       if (!tick_nohz_full_enabled())
> > >            return;
> > > 
> > > returns unconditionally if CONFIG_NO_HZ_FULL=n. So the whole function
> > > becomes dead code that should be detected and removed by gcc. So same
> > > result as with the ifdef.
> > 
> > Well, that part of your patch is worthwhile, then!  And I do therefore
> > need to restructure to invoke tick_nohz_full_enabled() at the very
> > beginning of the function.
> 
> Well if no CPU is full dynticks, the sole effect of this is that the kthread
> will be re-affined globally... So no big issue.
> 
> Although that can spare you a call to set_cpus_allowed_ptr(), which can
> be worthwhile after all :)

Good point, I must keep the #ifdef, but can also add the call.

								Thanx, Paul

> > > > I also didn't understand how dumping the GP kthreads onto
> > > > a single CPU could affect energy efficiency all that much, so I omitted
> > > > that from the commit log.  If I am missing something, please enlighten me.
> > > 
> > > Because if the kthread is pinned to CPU 0 and a grace period is in progress,
> > > the kthread is going to disturb CPU 0 regardless of its possibly deep sleep
> > > state. OTOH if the kthread is widely affine, it can be scheduled to idle CPUs
> > > that are less cold and thus wake them from a less deep power state.
> > > 
> > > Well, that's all assuming that the scheduler takes care of these deep idle
> > > states. But I believe it does, iirc...
> > 
> > My guess is that enough stuff gets dumped on CPU 0 to make this a non-issue,
> > but point taken nonetheless.
> 
> Ok.
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:16           ` Paul E. McKenney
@ 2014-06-13 16:21             ` Frederic Weisbecker
  2014-06-13 16:44               ` Josh Triplett
  2014-06-13 20:49               ` Paul E. McKenney
  0 siblings, 2 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 16:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > Is it because we have dynticks CPUs staying too long in the kernel without
> > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > things?
> 
> Sort of the former, but combined with the fact that in-kernel CPUs still
> need scheduling-clock interrupts for RCU to make progress.  I could
> move this to RCU's context-switch hook, but that could be very bad for
> workloads that do lots of context switching.

Or I can restart the tick if the CPU stays in the kernel for too long without
a tick. I think that's what we were doing before but we removed that because
we never implemented it correctly (we sent scheduler IPI that did nothing...)


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:09       ` Frederic Weisbecker
@ 2014-06-13 16:23         ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 16:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 06:09:35PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 08:58:12AM -0700, Paul E. McKenney wrote:
> > > Maybe what we should do instead is to have something like this on RCU kthread init:
> > > 
> > >           cpumask_var_t gp_kthread_mask;
> > > 
> > >           if (!alloc_cpumask_var(&gp_kthread_mask, GFP_KERNEL))
> > >               return -ENOMEM;
> > > 
> > >           cpumask_andnot(gp_kthread_mask, cpu_possible_mask, tick_nohz_full_mask);
> > > 
> > > 	  set_cpus_allowed_ptr(current, gp_kthread_mask);
> > > 
> > > 	  free_cpumask_var(gp_kthread_mask);
> > 
> > I was guessing that RCU's kthreads would not be the only ones that wanted
> > similar binding.  But if you feel strongly about this, we could at least
> > start by placing it local to RCU as above.
> 
> Hmm, I don't feel good about creating mirrored cpumasks (nor negation-only
> cpumasks in general). Also exposing those nohz guts is probably not a good idea
> either, except for RCU due to the sysidle stuff.
> 
> Now you're right that we can expect that this non-nohz affinity stuff is going to be reused.
> 
> Could it be housekeeping_affine(struct task_struct *tsk) maybe?

Now that you mention it, that does sound better.  I will do that.

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:21             ` Frederic Weisbecker
@ 2014-06-13 16:44               ` Josh Triplett
  2014-06-13 20:48                 ` Paul E. McKenney
  2014-06-13 20:49               ` Paul E. McKenney
  1 sibling, 1 reply; 33+ messages in thread
From: Josh Triplett @ 2014-06-13 16:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Paul E. McKenney, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > things?
> > 
> > Sort of the former, but combined with the fact that in-kernel CPUs still
> > need scheduling-clock interrupts for RCU to make progress.  I could
> > move this to RCU's context-switch hook, but that could be very bad for
> > workloads that do lots of context switching.
> 
> Or I can restart the tick if the CPU stays in the kernel for too long without
> a tick. I think that's what we were doing before but we removed that because
> we never implemented it correctly (we sent scheduler IPI that did nothing...)

I wonder if timer slack would make sense here: when you have at least
one RCU callback pending, set a timer with a huge amount of timer slack,
and cancel it if you end up handling the callback via a trip through the
scheduler.

- Josh Triplett


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:44               ` Josh Triplett
@ 2014-06-13 20:48                 ` Paul E. McKenney
  2014-06-13 21:10                   ` Josh Triplett
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 20:48 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > things?
> > > 
> > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > move this to RCU's context-switch hook, but that could be very bad for
> > > workloads that do lots of context switching.
> > 
> > Or I can restart the tick if the CPU stays in the kernel for too long without
> > a tick. I think that's what we were doing before but we removed that because
> > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> 
> I wonder if timer slack would make sense here: when you have at least
> one RCU callback pending, set a timer with a huge amount of timer slack,
> and cancel it if you end up handling the callback via a trip through the
> scheduler.

But in this case, we need the tick even if the current CPU has no callbacks
because it might be in an RCU read-side critical section.

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 16:21             ` Frederic Weisbecker
  2014-06-13 16:44               ` Josh Triplett
@ 2014-06-13 20:49               ` Paul E. McKenney
  2014-06-13 23:13                 ` Frederic Weisbecker
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 20:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > things?
> > 
> > Sort of the former, but combined with the fact that in-kernel CPUs still
> > need scheduling-clock interrupts for RCU to make progress.  I could
> > move this to RCU's context-switch hook, but that could be very bad for
> > workloads that do lots of context switching.
> 
> Or I can restart the tick if the CPU stays in the kernel for too long without
> a tick. I think that's what we were doing before but we removed that because
> we never implemented it correctly (we sent scheduler IPI that did nothing...)

That would work for me!

Just out of curiosity, what would you use to determine that the CPU
had been in the kernel too long?

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 20:48                 ` Paul E. McKenney
@ 2014-06-13 21:10                   ` Josh Triplett
  2014-06-13 22:49                     ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Josh Triplett @ 2014-06-13 21:10 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > things?
> > > > 
> > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > workloads that do lots of context switching.
> > > 
> > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > a tick. I think that's what we were doing before but we removed that because
> > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > 
> > I wonder if timer slack would make sense here: when you have at least
> > one RCU callback pending, set a timer with a huge amount of timer slack,
> > and cancel it if you end up handling the callback via a trip through the
> > scheduler.
> 
> But in this case, we need the tick even if the current CPU has no callbacks
> because it might be in an RCU read-side critical section.

Don't we handle that case via the slowpath of rcu_read_unlock, and a
flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
state; send it an IPI to set the special flag that makes unlock do the
work.")

- Josh Triplett


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 21:10                   ` Josh Triplett
@ 2014-06-13 22:49                     ` Paul E. McKenney
  2014-06-13 23:10                       ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 22:49 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Frederic Weisbecker, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > > things?
> > > > > 
> > > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > > workloads that do lots of context switching.
> > > > 
> > > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > > a tick. I think that's what we were doing before but we removed that because
> > > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > > 
> > > I wonder if timer slack would make sense here: when you have at least
> > > one RCU callback pending, set a timer with a huge amount of timer slack,
> > > and cancel it if you end up handling the callback via a trip through the
> > > scheduler.
> > 
> > But in this case, we need the tick even if the current CPU has no callbacks
> > because it might be in an RCU read-side critical section.
> 
> Don't we handle that case via the slowpath of rcu_read_unlock, and a
> flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
> state; send it an IPI to set the special flag that makes unlock do the
> work.")

There was once such logic on the force-quiescent-state path, and making
that handle this new case was my first proposal.  As Frederic pointed
out, that change requires rcu_needs_cpu()'s cooperation, because otherwise
the CPU will take the IPI, see that it still has but one runnable task,
and then keep its scheduling-clock interrupt off.

The thing that involves rcu_read_unlock_special() is a flag set
by the scheduling-clock interrupt, which doesn't help here.  Also,
if a CPU stays in the kernel for a very long time without passing
through any RCU read-side critical sections, there is nothing that
rcu_read_unlock_special() can do to help.

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 22:49                     ` Paul E. McKenney
@ 2014-06-13 23:10                       ` Frederic Weisbecker
  2014-06-13 23:27                         ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 23:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 03:49:26PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> > On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > > > things?
> > > > > > 
> > > > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > > > workloads that do lots of context switching.
> > > > > 
> > > > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > > > a tick. I think that's what we were doing before but we removed that because
> > > > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > > > 
> > > > I wonder if timer slack would make sense here: when you have at least
> > > > one RCU callback pending, set a timer with a huge amount of timer slack,
> > > > and cancel it if you end up handling the callback via a trip through the
> > > > scheduler.
> > > 
> > > But in this case, we need the tick even if the current CPU has no callbacks
> > > because it might be in an RCU read-side critical section.
> > 
> > Don't we handle that case via the slowpath of rcu_read_unlock, and a
> > flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
> > state; send it an IPI to set the special flag that makes unlock do the
> > work.")
> 
> There was once such logic on the force-quiescent-state path, and making
> that handle this new case was my first proposal.  As Frederic pointed
> out, that change requires rcu_needs_cpu()'s cooperation, because otherwise
> the CPU will take the IPI, see that it still has but one runnable task,
> and then keep its scheduling-clock interrupt off.

Exactly. So that's what happens currently, we call rcu_kick_nohz_cpu()
on extended grace periods but the IPI doesn't reconsider the tick.

In fact it doesn't do anything at all because the scheduler IPI,
when invoked without a reason, doesn't even call irq_enter()/irq_exit(),
so rcu_needs_cpu() isn't quite called from there.

Now that's going to change with https://lwn.net/Articles/601836/ if
we convert rcu_kick_nohz_cpu() to tick_nohz_full_kick_cpu().

Then we have the choice between two options:

* We can add a check in tick_nohz_full_check() and restart the tick if
necessary.

* Extend rcu_needs_cpu() to restore a similar periodic mode until the
grace periods get some progress.


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 20:49               ` Paul E. McKenney
@ 2014-06-13 23:13                 ` Frederic Weisbecker
  2014-06-13 23:22                   ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 23:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 01:49:03PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > things?
> > > 
> > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > move this to RCU's context-switch hook, but that could be very bad for
> > > workloads that do lots of context switching.
> > 
> > Or I can restart the tick if the CPU stays in the kernel for too long without
> > a tick. I think that's what we were doing before but we removed that because
> > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> 
> That would work for me!
> 
> Just out of curiosity, what would you use to determine that the CPU
> had been in the kernel too long?

I'd rather deduce that when grace periods completion go past some delay.
I think that's the requirement for calling rcu_kick_nohz_cpu()?


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 23:13                 ` Frederic Weisbecker
@ 2014-06-13 23:22                   ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 23:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Josh Triplett, Steven Rostedt, Mathieu Desnoyers

On Sat, Jun 14, 2014 at 01:13:25AM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 01:49:03PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > things?
> > > > 
> > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > workloads that do lots of context switching.
> > > 
> > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > a tick. I think that's what we were doing before but we removed that because
> > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > 
> > That would work for me!
> > 
> > Just out of curiosity, what would you use to determine that the CPU
> > had been in the kernel too long?
> 
> I'd rather deduce that when grace periods completion go past some delay.
> I think that's the requirement for calling rcu_kick_nohz_cpu()?

OK, that does work for me.  ;-)

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 23:10                       ` Frederic Weisbecker
@ 2014-06-13 23:27                         ` Paul E. McKenney
  2014-06-13 23:39                           ` Frederic Weisbecker
  0 siblings, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-13 23:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Sat, Jun 14, 2014 at 01:10:35AM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 03:49:26PM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> > > On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > > > > things?
> > > > > > > 
> > > > > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > > > > workloads that do lots of context switching.
> > > > > > 
> > > > > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > > > > a tick. I think that's what we were doing before but we removed that because
> > > > > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > > > > 
> > > > > I wonder if timer slack would make sense here: when you have at least
> > > > > one RCU callback pending, set a timer with a huge amount of timer slack,
> > > > > and cancel it if you end up handling the callback via a trip through the
> > > > > scheduler.
> > > > 
> > > > But in this case, we need the tick even if the current CPU has no callbacks
> > > > because it might be in an RCU read-side critical section.
> > > 
> > > Don't we handle that case via the slowpath of rcu_read_unlock, and a
> > > flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
> > > state; send it an IPI to set the special flag that makes unlock do the
> > > work.")
> > 
> > There was once such logic on the force-quiescent-state path, and making
> > that handle this new case was my first proposal.  As Frederic pointed
> > out, that change requires rcu_needs_cpu()'s cooperation, because otherwise
> > the CPU will take the IPI, see that it still has but one runnable task,
> > and then keep its scheduling-clock interrupt off.
> 
> Exactly. So that's what happens currently, we call rcu_kick_nohz_cpu()
> on extended grace periods but the IPI doesn't reconsider the tick.
> 
> In fact it doesn't do anything at all because the scheduler IPI,
> when invoked without a reason, doesn't even call irq_enter()/irq_exit(),
> so rcu_needs_cpu() isn't quite called from there.
> 
> Now that's going to change with https://lwn.net/Articles/601836/ if
> we convert rcu_kick_nohz_cpu() to tick_nohz_full_kick_cpu().
> 
> Then we have the choice between two options:
> 
> * We can add a check in tick_nohz_full_check() and restart the tick if
> necessary.
> 
> * Extend rcu_needs_cpu() to restore a similar periodic mode until the
> grace periods get some progress.

If I was to extend rcu_needs_cpu(), I would add a flag and another counter
to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
counter equal to the current ->completed value, it would return true.

I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
so it is just a matter of also setting the flag and copying ->completed
to the new counter at that point.  I currently get to this point if the
CPU has managed to run for more than one jiffy without hitting either
idle or userspace execution.  Fair enough?

							Thanx, Paul



* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 23:27                         ` Paul E. McKenney
@ 2014-06-13 23:39                           ` Frederic Weisbecker
  2014-06-14  5:06                             ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-13 23:39 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 04:27:15PM -0700, Paul E. McKenney wrote:
> On Sat, Jun 14, 2014 at 01:10:35AM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 13, 2014 at 03:49:26PM -0700, Paul E. McKenney wrote:
> > > On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> > > > On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > > > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > > > > > things?
> > > > > > > > 
> > > > > > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > > > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > > > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > > > > > workloads that do lots of context switching.
> > > > > > > 
> > > > > > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > > > > > a tick. I think that's what we were doing before but we removed that because
> > > > > > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > > > > > 
> > > > > > I wonder if timer slack would make sense here: when you have at least
> > > > > > one RCU callback pending, set a timer with a huge amount of timer slack,
> > > > > > and cancel it if you end up handling the callback via a trip through the
> > > > > > scheduler.
> > > > > 
> > > > > But in this case, we need the tick even if the current CPU has no callbacks
> > > > > because it might be in an RCU read-side critical section.
> > > > 
> > > > Don't we handle that case via the slowpath of rcu_read_unlock, and a
> > > > flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
> > > > state; send it an IPI to set the special flag that makes unlock do the
> > > > work.")
> > > 
> > > There was once such logic on the force-quiescent-state path, and making
> > > that handle this new case was my first proposal.  As Frederic pointed
> > > out, that change requires rcu_needs_cpu()'s cooperation, because otherwise
> > > the CPU will take the IPI, see that it still has but one runnable task,
> > > and then keep its scheduling-clock interrupt off.
> > 
> > Exactly. So that's what happens currently, we call rcu_kick_nohz_cpu()
> > on extended grace periods but the IPI doesn't reconsider the tick.
> > 
> > In fact it doesn't do anything at all because the scheduler IPI,
> > when invoked without a reason, doesn't even call irq_enter()/irq_exit(),
> > so rcu_needs_cpu() isn't quite called from there.
> > 
> > Now that's going to change with https://lwn.net/Articles/601836/ if
> > we convert rcu_kick_nohz_cpu() to tick_nohz_full_kick_cpu().
> > 
> > Then we have the choice between two options:
> > 
> > * We can add a check in tick_nohz_full_check() and restart the tick if
> > necessary.
> > 
> > * Extend rcu_needs_cpu() to restore a similar periodic mode until the
> > grace periods get some progress.
> 
> If I was to extend rcu_needs_cpu(), I would add a flag and another counter
> to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
> counter equal to the current ->completed value, it would return true.
> 
> I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
> so it is just a matter of also setting the flag and copying ->completed
> to the new counter at that point.  I currently get to this point if the
> CPU has managed to run for more than one jiffy without hitting either
> idle or userspace execution.  Fair enough?

Perfect for me!

Thanks.


* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-13 23:39                           ` Frederic Weisbecker
@ 2014-06-14  5:06                             ` Paul E. McKenney
  2014-06-14 11:26                               ` Paul E. McKenney
  2014-06-14 13:05                               ` Frederic Weisbecker
  0 siblings, 2 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-14  5:06 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Sat, Jun 14, 2014 at 01:39:36AM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 13, 2014 at 04:27:15PM -0700, Paul E. McKenney wrote:
> > On Sat, Jun 14, 2014 at 01:10:35AM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jun 13, 2014 at 03:49:26PM -0700, Paul E. McKenney wrote:
> > > > On Fri, Jun 13, 2014 at 02:10:35PM -0700, Josh Triplett wrote:
> > > > > On Fri, Jun 13, 2014 at 01:48:22PM -0700, Paul E. McKenney wrote:
> > > > > > On Fri, Jun 13, 2014 at 09:44:41AM -0700, Josh Triplett wrote:
> > > > > > > On Fri, Jun 13, 2014 at 06:21:32PM +0200, Frederic Weisbecker wrote:
> > > > > > > > On Fri, Jun 13, 2014 at 09:16:30AM -0700, Paul E. McKenney wrote:
> > > > > > > > > > Is it because we have dynticks CPUs staying too long in the kernel without
> > > > > > > > > > taking any quiescent states? Are we perhaps missing some rcu_user_enter() or
> > > > > > > > > > things?
> > > > > > > > > 
> > > > > > > > > Sort of the former, but combined with the fact that in-kernel CPUs still
> > > > > > > > > need scheduling-clock interrupts for RCU to make progress.  I could
> > > > > > > > > move this to RCU's context-switch hook, but that could be very bad for
> > > > > > > > > workloads that do lots of context switching.
> > > > > > > > 
> > > > > > > > Or I can restart the tick if the CPU stays in the kernel for too long without
> > > > > > > > a tick. I think that's what we were doing before but we removed that because
> > > > > > > > we never implemented it correctly (we sent scheduler IPI that did nothing...)
> > > > > > > 
> > > > > > > I wonder if timer slack would make sense here: when you have at least
> > > > > > > one RCU callback pending, set a timer with a huge amount of timer slack,
> > > > > > > and cancel it if you end up handling the callback via a trip through the
> > > > > > > scheduler.
> > > > > > 
> > > > > > But in this case, we need the tick even if the current CPU has no callbacks
> > > > > > because it might be in an RCU read-side critical section.
> > > > > 
> > > > > Don't we handle that case via the slowpath of rcu_read_unlock, and a
> > > > > flag set via IPI?  ("Oh, that CPU has taken too long to note a quiescent
> > > > > state; send it an IPI to set the special flag that makes unlock do the
> > > > > work.")
> > > > 
> > > > There was once such logic on the force-quiescent-state path, and making
> > > > that handle this new case was my first proposal.  As Frederic pointed
> > > > out, that change requires rcu_needs_cpu()'s cooperation, because otherwise
> > > > the CPU will take the IPI, see that it still has but one runnable task,
> > > > and then keep its scheduling-clock interrupt off.
> > > 
> > > Exactly. So that's what happens currently, we call rcu_kick_nohz_cpu()
> > > on extended grace periods but the IPI doesn't reconsider the tick.
> > > 
> > > In fact it doesn't do anything at all because the scheduler IPI,
> > > when invoked without a reason, doesn't even call irq_enter()/irq_exit(),
> > > so rcu_needs_cpu() isn't quite called from there.
> > > 
> > > Now that's going to change with https://lwn.net/Articles/601836/ if
> > > we convert rcu_kick_nohz_cpu() to tick_nohz_full_kick_cpu().
> > > 
> > > Then we have the choice between two options:
> > > 
> > > * We can add a check in tick_nohz_full_check() and restart the tick if
> > > necessary.
> > > 
> > > * Extend rcu_needs_cpu() to restore a similar periodic mode until the
> > > grace periods get some progress.
> > 
> > If I was to extend rcu_needs_cpu(), I would add a flag and another counter
> > to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
> > counter equal to the current ->completed value, it would return true.
> > 
> > I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
> > so it is just a matter of also setting the flag and copying ->completed
> > to the new counter at that point.  I currently get to this point if the
> > CPU has managed to run for more than one jiffy without hitting either
> > idle or userspace execution.  Fair enough?
> 
> Perfect for me!

One complication...  So if the grace period has gone on for a long time,
and you are returning to kernel mode, RCU will need the scheduling-clock
tick.  However, in that very same situation, if you are returning to
idle or to NO_HZ_FULL userspace execution, RCU does -not- need the
scheduling-clock tick set.

One way I could do this is to have rcu_needs_cpu() return three values:
zero if RCU doesn't need a scheduling-clock tick for any reason,
one if RCU needs a scheduling-clock tick only when returning to kernel
mode, and two if RCU unconditionally needs the scheduling-clock tick.
Would that work, or is there a better approach?
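A rough standalone sketch of what such a tri-state contract could look like,
with the caller in the tick-stop path making the final decision.  All names
here (the enum, keep_tick(), returning_to_kernel) are invented stand-ins for
illustration, not actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state return for rcu_needs_cpu(), modeling the
 * three values proposed above. */
enum rcu_tick_need {
	RCU_TICK_NOT_NEEDED = 0,	/* no tick needed for any reason */
	RCU_TICK_IF_KERNEL  = 1,	/* tick needed only if returning to kernel mode */
	RCU_TICK_NEEDED     = 2,	/* tick needed unconditionally */
};

/* Model of the caller's decision: keep the scheduling-clock tick?
 * In a real implementation, returning_to_kernel would come from the
 * context-tracking state. */
static bool keep_tick(enum rcu_tick_need need, bool returning_to_kernel)
{
	switch (need) {
	case RCU_TICK_NOT_NEEDED:
		return false;
	case RCU_TICK_IF_KERNEL:
		return returning_to_kernel;
	case RCU_TICK_NEEDED:
	default:
		return true;
	}
}
```

The point of the middle value is exactly the complication above: an extended
grace period needs the tick on a CPU re-entering kernel mode, but not on one
headed to idle or NO_HZ_FULL userspace, both of which are quiescent states.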

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-14  5:06                             ` Paul E. McKenney
@ 2014-06-14 11:26                               ` Paul E. McKenney
  2014-06-14 13:10                                 ` Frederic Weisbecker
  2014-06-14 13:05                               ` Frederic Weisbecker
  1 sibling, 1 reply; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-14 11:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 10:06:06PM -0700, Paul E. McKenney wrote:
> On Sat, Jun 14, 2014 at 01:39:36AM +0200, Frederic Weisbecker wrote:
> > On Fri, Jun 13, 2014 at 04:27:15PM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > If I was to extend rcu_needs_cpu(), I would add a flag and another counter
> > > to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
> > > counter equal to the current ->completed value, it would return true.
> > > 
> > > I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
> > > so it is just a matter of also setting the flag and copying ->completed
> > > to the new counter at that point.  I currently get to this point if the
> > > CPU has managed to run for more than one jiffy without hitting either
> > > idle or userspace execution.  Fair enough?
> > 
> > Perfect for me!
> 
> One complication...  So if the grace period has gone on for a long time,
> and you are returning to kernel mode, RCU will need the scheduling-clock
> tick.  However, in that very same situation, if you are returning to
> idle or to NO_HZ_FULL userspace execution, RCU does -not- need the
> scheduling-clock tick set.
> 
> One way I could do this is to have rcu_needs_cpu() return three values:
> Zero for RCU doesn't need a scheduling-clock tick for any reason,
> one if RCU needs a scheduling-clock tick only if returning to kernel
> mode, and two if RCU unconditionally needs the scheduling-clock tick.
> Would that work, or is there a better approach?

You know, it just feels like RCU -should- be able to solve this
internally.  So if determining that you are returning to kernel mode is
at all inconvenient, give me a couple days to think this through.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-14  5:06                             ` Paul E. McKenney
  2014-06-14 11:26                               ` Paul E. McKenney
@ 2014-06-14 13:05                               ` Frederic Weisbecker
  1 sibling, 0 replies; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-14 13:05 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Fri, Jun 13, 2014 at 10:06:06PM -0700, Paul E. McKenney wrote:
> One complication...  So if the grace period has gone on for a long time,
> and you are returning to kernel mode, RCU will need the scheduling-clock
> tick.  However, in that very same situation, if you are returning to
> idle or to NO_HZ_FULL userspace execution, RCU does -not- need the
> scheduling-clock tick set.

Right.

> One way I could do this is to have rcu_needs_cpu() return three values:
> Zero for RCU doesn't need a scheduling-clock tick for any reason,
> one if RCU needs a scheduling-clock tick only if returning to kernel
> mode, and two if RCU unconditionally needs the scheduling-clock tick.
> Would that work, or is there a better approach?

For an interrupt, based on the context-tracking state, I can check where
we return afterward using context_tracking_in_user().

Now rcu_needs_cpu() should probably check that by itself and, depending
on where we return, say whether we keep the tick or not.
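As a rough illustration of that idea — rcu_needs_cpu() consulting the
return context itself.  The function and its parameters (in_user,
gp_stalled, has_callbacks) are invented stand-ins modeling the decision,
not the real kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: would RCU need the scheduling-clock tick on this
 * CPU?  in_user models what context_tracking_in_user() would report for
 * the context we are returning to; gp_stalled models a grace period that
 * has gone on too long without this CPU passing a quiescent state. */
static bool rcu_needs_cpu_sketch(bool in_user, bool gp_stalled,
				 bool has_callbacks)
{
	if (has_callbacks)
		return true;	/* tick needed to advance local callbacks */
	if (!gp_stalled)
		return false;	/* grace period is making progress */
	/* Extended grace period: the tick is needed only if we return to
	 * kernel mode; idle and NO_HZ_FULL userspace are themselves
	 * quiescent states. */
	return !in_user;
}
```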

What do you think?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-14 11:26                               ` Paul E. McKenney
@ 2014-06-14 13:10                                 ` Frederic Weisbecker
  2014-06-14 14:29                                   ` Paul E. McKenney
  0 siblings, 1 reply; 33+ messages in thread
From: Frederic Weisbecker @ 2014-06-14 13:10 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Sat, Jun 14, 2014 at 04:26:39AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 13, 2014 at 10:06:06PM -0700, Paul E. McKenney wrote:
> > On Sat, Jun 14, 2014 at 01:39:36AM +0200, Frederic Weisbecker wrote:
> > > On Fri, Jun 13, 2014 at 04:27:15PM -0700, Paul E. McKenney wrote:
> 
> [ . . . ]
> 
> > > > If I was to extend rcu_needs_cpu(), I would add a flag and another counter
> > > > to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
> > > > counter equal to the current ->completed value, it would return true.
> > > > 
> > > > I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
> > > > so it is just a matter of also setting the flag and copying ->completed
> > > > to the new counter at that point.  I currently get to this point if the
> > > > CPU has managed to run for more than one jiffy without hitting either
> > > > idle or userspace execution.  Fair enough?
> > > 
> > > Perfect for me!
> > 
> > One complication...  So if the grace period has gone on for a long time,
> > and you are returning to kernel mode, RCU will need the scheduling-clock
> > tick.  However, in that very same situation, if you are returning to
> > idle or to NO_HZ_FULL userspace execution, RCU does -not- need the
> > scheduling-clock tick set.
> > 
> > One way I could do this is to have rcu_needs_cpu() return three values:
> > Zero for RCU doesn't need a scheduling-clock tick for any reason,
> > one if RCU needs a scheduling-clock tick only if returning to kernel
> > mode, and two if RCU unconditionally needs the scheduling-clock tick.
> > Would that work, or is there a better approach?
> 
> You know, it just feels like RCU -should- be able to solve this
> internally.

Right, I should have read all emails before answering :)

> So if determining that you are returning to kernel mode is
> at all inconvenient, give me a couple days to think this through.

Right, so context_tracking_in_user() is probably what you need. Its semantics
are:

1) If we are in syscall or exception, it says we are in the kernel
2) If we are in interrupt, it says where we return (kernel or userspace).

Of course that's all subject to a tiny error margin (soft versus hard
userspace detection), but that shouldn't matter much, as even with true
hard detection things can change quickly.
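The two properties above both fall out of one design point: interrupt
entry/exit leaves the tracked state untouched.  A minimal standalone model
of that (the struct and helper names are invented for illustration and only
mirror the shape of the real context-tracking hooks):

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_state { CTX_KERNEL, CTX_USER };

struct ct_sketch {
	enum ctx_state state;	/* modeled per-CPU context-tracking state */
};

/* Called on return to userspace. */
static void user_enter_sketch(struct ct_sketch *ct)
{
	ct->state = CTX_USER;
}

/* Called on syscall/exception entry from userspace. */
static void user_exit_sketch(struct ct_sketch *ct)
{
	ct->state = CTX_KERNEL;
}

/* Interrupt entry/exit deliberately has no hook here: during an interrupt
 * the state still describes the interrupted context, i.e. where the
 * interrupt will return.  That gives property (2); property (1) follows
 * because a syscall or exception switches the state to CTX_KERNEL. */
static bool in_user_sketch(const struct ct_sketch *ct)
{
	return ct->state == CTX_USER;
}
```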

> 
> 							Thanx, Paul
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] rcu: Only pin GP kthread when full dynticks is actually used
  2014-06-14 13:10                                 ` Frederic Weisbecker
@ 2014-06-14 14:29                                   ` Paul E. McKenney
  0 siblings, 0 replies; 33+ messages in thread
From: Paul E. McKenney @ 2014-06-14 14:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Josh Triplett, LKML, Steven Rostedt, Mathieu Desnoyers

On Sat, Jun 14, 2014 at 03:10:15PM +0200, Frederic Weisbecker wrote:
> On Sat, Jun 14, 2014 at 04:26:39AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 13, 2014 at 10:06:06PM -0700, Paul E. McKenney wrote:
> > > On Sat, Jun 14, 2014 at 01:39:36AM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Jun 13, 2014 at 04:27:15PM -0700, Paul E. McKenney wrote:
> > 
> > [ . . . ]
> > 
> > > > > If I was to extend rcu_needs_cpu(), I would add a flag and another counter
> > > > > to the rcu_data structure.  If rcu_needs_cpu() saw the flag set and the
> > > > > counter equal to the current ->completed value, it would return true.
> > > > > 
> > > > > I already have the rcu_kick_nohz_cpu() in rcu_implicit_dynticks_qs(),
> > > > > so it is just a matter of also setting the flag and copying ->completed
> > > > > to the new counter at that point.  I currently get to this point if the
> > > > > CPU has managed to run for more than one jiffy without hitting either
> > > > > idle or userspace execution.  Fair enough?
> > > > 
> > > > Perfect for me!
> > > 
> > > One complication...  So if the grace period has gone on for a long time,
> > > and you are returning to kernel mode, RCU will need the scheduling-clock
> > > tick.  However, in that very same situation, if you are returning to
> > > idle or to NO_HZ_FULL userspace execution, RCU does -not- need the
> > > scheduling-clock tick set.
> > > 
> > > One way I could do this is to have rcu_needs_cpu() return three values:
> > > Zero for RCU doesn't need a scheduling-clock tick for any reason,
> > > one if RCU needs a scheduling-clock tick only if returning to kernel
> > > mode, and two if RCU unconditionally needs the scheduling-clock tick.
> > > Would that work, or is there a better approach?
> > 
> > You know, it just feels like RCU -should- be able to solve this
> > internally.
> 
> Right, I should have read all emails before answering :)
> 
> > So if determining that you are returning to kernel mode is
> > at all inconvenient, give me a couple days to think this through.
> 
> Right so context_tracking_in_user() is probably what you need. Its semantics
> are:
> 
> 1) If we are in syscall or exception, it says we are in the kernel
> 2) If we are in interrupt, it says where we return (kernel or userspace).
> 
> Of course that's all subject to a tiny error margin (soft versus hard
> userspace detection), but that shouldn't matter much, as even with true
> hard detection things can change quickly.

So there is an impressionistic painting of RCU inside my head.  And in
one corner, there is something that might be a dog.  If it really is a
dog, or can be convinced to become one, perhaps it can herd the sheep
in the middle of the painting.  At least they look sort of like sheep.
They might instead be rainclouds.  Or powdered-wig-wearing barristers.
Of course, in the latter case, introducing them to the dog might be
worthwhile just for sheer entertainment value.

Anyway, I am heading out to the gym.  Perhaps a few gym sessions and
a couple of sleep cycles will convert the painting to useful code.
This approach has worked well for me many times over the decades, so
here is hoping that it does again.

Apologies for the twisted metaphor, but you did ask!  Even if only
implicitly.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-06-14 14:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-13  0:16 [PATCH] rcu: Only pin GP kthread when full dynticks is actually used Frederic Weisbecker
2014-06-13  1:24 ` Paul E. McKenney
2014-06-13  1:35   ` Paul E. McKenney
2014-06-13 12:47     ` Frederic Weisbecker
2014-06-13 15:52       ` Paul E. McKenney
2014-06-13 16:00         ` Frederic Weisbecker
2014-06-13 16:16           ` Paul E. McKenney
2014-06-13 16:21             ` Frederic Weisbecker
2014-06-13 16:44               ` Josh Triplett
2014-06-13 20:48                 ` Paul E. McKenney
2014-06-13 21:10                   ` Josh Triplett
2014-06-13 22:49                     ` Paul E. McKenney
2014-06-13 23:10                       ` Frederic Weisbecker
2014-06-13 23:27                         ` Paul E. McKenney
2014-06-13 23:39                           ` Frederic Weisbecker
2014-06-14  5:06                             ` Paul E. McKenney
2014-06-14 11:26                               ` Paul E. McKenney
2014-06-14 13:10                                 ` Frederic Weisbecker
2014-06-14 14:29                                   ` Paul E. McKenney
2014-06-14 13:05                               ` Frederic Weisbecker
2014-06-13 20:49               ` Paul E. McKenney
2014-06-13 23:13                 ` Frederic Weisbecker
2014-06-13 23:22                   ` Paul E. McKenney
2014-06-13  2:05   ` Paul E. McKenney
2014-06-13 12:55     ` Frederic Weisbecker
2014-06-13 15:55       ` Paul E. McKenney
2014-06-13 16:03         ` Frederic Weisbecker
2014-06-13 16:20           ` Paul E. McKenney
2014-06-13 16:10         ` Paul E. McKenney
2014-06-13 12:42   ` Frederic Weisbecker
2014-06-13 15:58     ` Paul E. McKenney
2014-06-13 16:09       ` Frederic Weisbecker
2014-06-13 16:23         ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).