linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] sched: A few nohz_full improvements
@ 2017-06-19  2:11 Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-19  2:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Rik van Riel, Thomas Gleixner, Ingo Molnar

Hi,

As I was working on removing the remaining 1Hz, I stepped in the last
(AFAIK) remaining buggy bits in scheduler_tick() on nohz_full.

Hopefully I can finally offload this tick in the next patchsets.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	nohz/0hz

HEAD: c1811204f1fd1d170223c0d548a57456685e831d

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      sched/loadavg: Generalize idle naming to nohz
      nohz: Move idle balancer registration to idle path
      sched: Spare idle load balancing on nohz_full CPUs


 Documentation/trace/ftrace.txt |  2 +-
 include/linux/sched/nohz.h     |  8 +++----
 kernel/sched/fair.c            |  4 ++++
 kernel/sched/loadavg.c         | 51 +++++++++++++++++++++---------------------
 kernel/time/tick-sched.c       |  9 ++++----
 5 files changed, 40 insertions(+), 34 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz
  2017-06-19  2:11 [PATCH 0/3] sched: A few nohz_full improvements Frederic Weisbecker
@ 2017-06-19  2:12 ` Frederic Weisbecker
  2017-06-20 17:38   ` Rik van Riel
  2017-06-22 11:10   ` [tip:sched/core] sched/loadavg: Generalize "_idle" naming to "_nohz" tip-bot for Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 2/3] nohz: Move idle balancer registration to idle path Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs Frederic Weisbecker
  2 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-19  2:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Rik van Riel, Thomas Gleixner, Ingo Molnar

The loadavg naming code still assumes that nohz == idle whereas its code
is actually handling well both nohz idle and nohz full.

So lets fix the naming according to what the code actually does.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/trace/ftrace.txt |  2 +-
 include/linux/sched/nohz.h     |  8 +++----
 kernel/sched/loadavg.c         | 51 +++++++++++++++++++++---------------------
 kernel/time/tick-sched.c       |  4 ++--
 4 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 94a987b..fff8ff6 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1609,7 +1609,7 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
   <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
-  <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   15us : calc_load_nohz_stop <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : hrtimer_try_to_cancel <-hrtimer_cancel
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 4995b71..7d3f75d 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -23,11 +23,11 @@ static inline void set_cpu_sd_state_idle(void) { }
 #endif
 
 #ifdef CONFIG_NO_HZ_COMMON
-void calc_load_enter_idle(void);
-void calc_load_exit_idle(void);
+void calc_load_nohz_start(void);
+void calc_load_nohz_stop(void);
 #else
-static inline void calc_load_enter_idle(void) { }
-static inline void calc_load_exit_idle(void) { }
+static inline void calc_load_nohz_start(void) { }
+static inline void calc_load_nohz_stop(void) { }
 #endif /* CONFIG_NO_HZ_COMMON */
 
 #if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index f15fb2b..f14716a 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -117,7 +117,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  * load-average relies on per-cpu sampling from the tick, it is affected by
  * NO_HZ.
  *
- * The basic idea is to fold the nr_active delta into a global idle-delta upon
+ * The basic idea is to fold the nr_active delta into a global NO_HZ-delta upon
  * entering NO_HZ state such that we can include this as an 'extra' cpu delta
  * when we read the global state.
  *
@@ -126,7 +126,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  *  - When we go NO_HZ idle during the window, we can negate our sample
  *    contribution, causing under-accounting.
  *
- *    We avoid this by keeping two idle-delta counters and flipping them
+ *    We avoid this by keeping two NO_HZ-delta counters and flipping them
  *    when the window starts, thus separating old and new NO_HZ load.
  *
  *    The only trick is the slight shift in index flip for read vs write.
@@ -137,22 +137,22 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  *    r:0 0 1           1 0           0 1           1 0
  *    w:0 1 1           0 0           1 1           0 0
  *
- *    This ensures we'll fold the old idle contribution in this window while
+ *    This ensures we'll fold the old NO_HZ contribution in this window while
  *    accumlating the new one.
  *
- *  - When we wake up from NO_HZ idle during the window, we push up our
+ *  - When we wake up from NO_HZ during the window, we push up our
  *    contribution, since we effectively move our sample point to a known
  *    busy state.
  *
  *    This is solved by pushing the window forward, and thus skipping the
- *    sample, for this cpu (effectively using the idle-delta for this cpu which
+ *    sample, for this cpu (effectively using the NO_HZ-delta for this cpu which
  *    was in effect at the time the window opened). This also solves the issue
- *    of having to deal with a cpu having been in NOHZ idle for multiple
- *    LOAD_FREQ intervals.
+ *    of having to deal with a cpu having been in NO_HZ for multiple LOAD_FREQ
+ *    intervals.
  *
  * When making the ILB scale, we should try to pull this in as well.
  */
-static atomic_long_t calc_load_idle[2];
+static atomic_long_t calc_load_nohz[2];
 static int calc_load_idx;
 
 static inline int calc_load_write_idx(void)
@@ -167,7 +167,7 @@ static inline int calc_load_write_idx(void)
 
 	/*
 	 * If the folding window started, make sure we start writing in the
-	 * next idle-delta.
+	 * next NO_HZ-delta.
 	 */
 	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
@@ -180,24 +180,24 @@ static inline int calc_load_read_idx(void)
 	return calc_load_idx & 1;
 }
 
-void calc_load_enter_idle(void)
+void calc_load_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
 	long delta;
 
 	/*
-	 * We're going into NOHZ mode, if there's any pending delta, fold it
-	 * into the pending idle delta.
+	 * We're going into NO_HZ mode, if there's any pending delta, fold it
+	 * into the pending NO_HZ delta.
 	 */
 	delta = calc_load_fold_active(this_rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
-		atomic_long_add(delta, &calc_load_idle[idx]);
+		atomic_long_add(delta, &calc_load_nohz[idx]);
 	}
 }
 
-void calc_load_exit_idle(void)
+void calc_load_nohz_stop(void)
 {
 	struct rq *this_rq = this_rq();
 
@@ -217,13 +217,13 @@ void calc_load_exit_idle(void)
 		this_rq->calc_load_update += LOAD_FREQ;
 }
 
-static long calc_load_fold_idle(void)
+static long calc_load_nohz_fold(void)
 {
 	int idx = calc_load_read_idx();
 	long delta = 0;
 
-	if (atomic_long_read(&calc_load_idle[idx]))
-		delta = atomic_long_xchg(&calc_load_idle[idx], 0);
+	if (atomic_long_read(&calc_load_nohz[idx]))
+		delta = atomic_long_xchg(&calc_load_nohz[idx], 0);
 
 	return delta;
 }
@@ -299,9 +299,9 @@ calc_load_n(unsigned long load, unsigned long exp,
 
 /*
  * NO_HZ can leave us missing all per-cpu ticks calling
- * calc_load_account_active(), but since an idle CPU folds its delta into
- * calc_load_tasks_idle per calc_load_account_idle(), all we need to do is fold
- * in the pending idle delta if our idle period crossed a load cycle boundary.
+ * calc_load_fold_active(), but since a NO_HZ CPU folds its delta into
+ * calc_load_nohz per calc_load_nohz_start(), all we need to do is fold
+ * in the pending NO_HZ delta if our NO_HZ period crossed a load cycle boundary.
  *
  * Once we've updated the global active value, we need to apply the exponential
  * weights adjusted to the number of cycles missed.
@@ -330,7 +330,7 @@ static void calc_global_nohz(void)
 	}
 
 	/*
-	 * Flip the idle index...
+	 * Flip the NO_HZ index...
 	 *
 	 * Make sure we first write the new time then flip the index, so that
 	 * calc_load_write_idx() will see the new time when it reads the new
@@ -341,7 +341,7 @@ static void calc_global_nohz(void)
 }
 #else /* !CONFIG_NO_HZ_COMMON */
 
-static inline long calc_load_fold_idle(void) { return 0; }
+static inline long calc_load_nohz_fold(void) { return 0; }
 static inline void calc_global_nohz(void) { }
 
 #endif /* CONFIG_NO_HZ_COMMON */
@@ -362,9 +362,9 @@ void calc_global_load(unsigned long ticks)
 		return;
 
 	/*
-	 * Fold the 'old' idle-delta to include all NO_HZ cpus.
+	 * Fold the 'old' NO_HZ-delta to include all NO_HZ cpus.
 	 */
-	delta = calc_load_fold_idle();
+	delta = calc_load_nohz_fold();
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
@@ -378,7 +378,8 @@ void calc_global_load(unsigned long ticks)
 	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
-	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
+	 * In case we went to NO_HZ for multiple LOAD_FREQ intervals
+	 * catch up in bulk.
 	 */
 	calc_global_nohz();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2046009..1c9a508 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -786,7 +786,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 */
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
-		calc_load_enter_idle();
+		calc_load_nohz_start();
 		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
@@ -833,7 +833,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	 */
 	timer_clear_idle();
 
-	calc_load_exit_idle();
+	calc_load_nohz_stop();
 	touch_softlockup_watchdog_sched();
 	/*
 	 * Cancel the scheduled timer and restore the tick
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] nohz: Move idle balancer registration to idle path
  2017-06-19  2:11 [PATCH 0/3] sched: A few nohz_full improvements Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
@ 2017-06-19  2:12 ` Frederic Weisbecker
  2017-06-20 17:39   ` Rik van Riel
  2017-06-22 11:11   ` [tip:sched/core] nohz: Move idle balancer registration to the " tip-bot for Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs Frederic Weisbecker
  2 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-19  2:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Rik van Riel, Thomas Gleixner, Ingo Molnar

The idle load balancing registration path assumes that we only stop the
tick when the CPU is idle, ignoring the nohz full case. As a result, a
nohz full CPU that is running a task may be chosen to perform idle load
balancing.

Lets make sure that only CPUs in dynticks idle mode can be picked as
idle load balancers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1c9a508..77c3be5 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -785,7 +785,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 * the scheduler tick in nohz_restart_sched_tick.
 	 */
 	if (!ts->tick_stopped) {
-		nohz_balance_enter_idle(cpu);
 		calc_load_nohz_start();
 		cpu_load_update_nohz_start();
 
@@ -938,8 +937,10 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 			ts->idle_expires = expires;
 		}
 
-		if (!was_stopped && ts->tick_stopped)
+		if (!was_stopped && ts->tick_stopped) {
 			ts->idle_jiffies = ts->last_jiffies;
+			nohz_balance_enter_idle(cpu);
+		}
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
  2017-06-19  2:11 [PATCH 0/3] sched: A few nohz_full improvements Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
  2017-06-19  2:12 ` [PATCH 2/3] nohz: Move idle balancer registration to idle path Frederic Weisbecker
@ 2017-06-19  2:12 ` Frederic Weisbecker
  2017-06-20 17:42   ` Rik van Riel
  2017-06-22 11:11   ` [tip:sched/core] sched/fair: " tip-bot for Frederic Weisbecker
  2 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-19  2:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Rik van Riel, Thomas Gleixner, Ingo Molnar

Although idle load balancing obviously only concern idle CPUs, it can
be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get rid
of an idle load balancing duty once a tick fires while it runs a task
and this can take a while in a nohz_full CPU.

We could fix that and escape the idle load balancing duty from the very
idle exit path but that would bring unecessary overhead. Lets just not
bother and leave that job to housekeeping CPUs (those outside nohz_full
range). The nohz_full CPUs simply don't want any disturbance.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d711093..cfca960 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
 	if (!cpu_active(cpu))
 		return;
 
+	/* Spare idle load balancing on CPUs that don't want to be disturbed */
+	if (!is_housekeeping_cpu(cpu))
+		return;
+
 	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
 		return;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz
  2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
@ 2017-06-20 17:38   ` Rik van Riel
  2017-06-22 11:10   ` [tip:sched/core] sched/loadavg: Generalize "_idle" naming to "_nohz" tip-bot for Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2017-06-20 17:38 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> The loadavg naming code still assumes that nohz == idle whereas its
> code
> is actually handling well both nohz idle and nohz full.
> 
> So lets fix the naming according to what the code actually does.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] nohz: Move idle balancer registration to idle path
  2017-06-19  2:12 ` [PATCH 2/3] nohz: Move idle balancer registration to idle path Frederic Weisbecker
@ 2017-06-20 17:39   ` Rik van Riel
  2017-06-22 11:11   ` [tip:sched/core] nohz: Move idle balancer registration to the " tip-bot for Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2017-06-20 17:39 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 721 bytes --]

On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> The idle load balancing registration path assumes that we only stop
> the
> tick when the CPU is idle, ignoring the nohz full case. As a result,
> a
> nohz full CPU that is running a task may be chosen to perform idle
> load
> balancing.
> 
> Lets make sure that only CPUs in dynticks idle mode can be picked as
> idle load balancers.

Woah. Dang.

> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
  2017-06-19  2:12 ` [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs Frederic Weisbecker
@ 2017-06-20 17:42   ` Rik van Riel
  2017-06-20 19:06     ` Mike Galbraith
  2017-06-20 20:26     ` Frederic Weisbecker
  2017-06-22 11:11   ` [tip:sched/core] sched/fair: " tip-bot for Frederic Weisbecker
  1 sibling, 2 replies; 15+ messages in thread
From: Rik van Riel @ 2017-06-20 17:42 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra; +Cc: LKML, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1698 bytes --]

On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> Although idle load balancing obviously only concern idle CPUs, it can
> be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get
> rid
> of an idle load balancing duty once a tick fires while it runs a task
> and this can take a while in a nohz_full CPU.
> 
> We could fix that and escape the idle load balancing duty from the
> very
> idle exit path but that would bring unecessary overhead. Lets just
> not
> bother and leave that job to housekeeping CPUs (those outside
> nohz_full
> range). The nohz_full CPUs simply don't want any disturbance.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/fair.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d711093..cfca960 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
>  	if (!cpu_active(cpu))
>  		return;
>  
> +	/* Spare idle load balancing on CPUs that don't want to be
> disturbed */
> +	if (!is_housekeeping_cpu(cpu))
> +		return;
> +
>  	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
>  		return;

I am not entirely convinced on this one.

Doesn't the if (on_null_domain(cpu_rq(cpu)) test
a few lines down take care of this already?

Do we want nohz_full to always automatically
imply that no idle balancing will happen, like
on isolated CPUs?

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
  2017-06-20 17:42   ` Rik van Riel
@ 2017-06-20 19:06     ` Mike Galbraith
  2017-06-21 13:23       ` Frederic Weisbecker
  2017-06-20 20:26     ` Frederic Weisbecker
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2017-06-20 19:06 UTC (permalink / raw)
  To: Rik van Riel, Frederic Weisbecker, Peter Zijlstra
  Cc: LKML, Thomas Gleixner, Ingo Molnar

On Tue, 2017-06-20 at 13:42 -0400, Rik van Riel wrote:
> On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> > Although idle load balancing obviously only concern idle CPUs, it can
> > be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get
> > rid
> > of an idle load balancing duty once a tick fires while it runs a task
> > and this can take a while in a nohz_full CPU.
> > 
> > We could fix that and escape the idle load balancing duty from the
> > very
> > idle exit path but that would bring unecessary overhead. Lets just
> > not
> > bother and leave that job to housekeeping CPUs (those outside
> > nohz_full
> > range). The nohz_full CPUs simply don't want any disturbance.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d711093..cfca960 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
> >  	if (!cpu_active(cpu))
> >  		return;
> >  
> > +	/* Spare idle load balancing on CPUs that don't want to be
> > disturbed */
> > +	if (!is_housekeeping_cpu(cpu))
> > +		return;
> > +
> >  	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> >  		return;
> 
> I am not entirely convinced on this one.
> 
> Doesn't the if (on_null_domain(cpu_rq(cpu)) test
> a few lines down take care of this already?
> 
> Do we want nohz_full to always automatically
> imply that no idle balancing will happen, like
> on isolated CPUs?

IMO, nohz_full capable CPUs that are not isolated should automatically
become housekeepers, and nohz_full _active_ upon becoming isolated.
 When a used as a housekeeper, you still pay a price for having the
nohz_full capability available, but it doesn't have to be as high. 

In my kernels, I use cpusets to turn nohz on/off set wise, so CPUs can
be ticking, dyntick, nohz_full or housekeeper, RT load balancing and
cpupri on/off as well if you want to assume full responsibility.  It's
a tad (from box of xxl tads) ugly, but more flexible.

	-Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
  2017-06-20 17:42   ` Rik van Riel
  2017-06-20 19:06     ` Mike Galbraith
@ 2017-06-20 20:26     ` Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-20 20:26 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Peter Zijlstra, LKML, Thomas Gleixner, Ingo Molnar

On Tue, Jun 20, 2017 at 01:42:27PM -0400, Rik van Riel wrote:
> On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> > Although idle load balancing obviously only concern idle CPUs, it can
> > be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get
> > rid
> > of an idle load balancing duty once a tick fires while it runs a task
> > and this can take a while in a nohz_full CPU.
> > 
> > We could fix that and escape the idle load balancing duty from the
> > very
> > idle exit path but that would bring unecessary overhead. Lets just
> > not
> > bother and leave that job to housekeeping CPUs (those outside
> > nohz_full
> > range). The nohz_full CPUs simply don't want any disturbance.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  kernel/sched/fair.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d711093..cfca960 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
> >  	if (!cpu_active(cpu))
> >  		return;
> >  
> > +	/* Spare idle load balancing on CPUs that don't want to be
> > disturbed */
> > +	if (!is_housekeeping_cpu(cpu))
> > +		return;
> > +
> >  	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> >  		return;
> 
> I am not entirely convinced on this one.
> 
> Doesn't the if (on_null_domain(cpu_rq(cpu)) test
> a few lines down take care of this already?

It shouldn't, since nohz_full= doesn't imply isolcpus= anymore.
Of course it does if the user manually adds them.

> 
> Do we want nohz_full to always automatically
> imply that no idle balancing will happen, like
> on isolated CPUs?

You're making a good point in that I would prefer that nohz_full be
only about the tick and let some sort of separate isolation subsystem
deal with individual isolation features: nohz, workqueues, idle load
balancing, etc...

That's why I rather used is_housekeeping_cpu() and not !tick_nohz_full_cpu()
because for now housekeepers are ~tick_nohz_full_mask but later it should be
cpu_possible_mask by default or some given set of CPUs defined by the future
isolation subsystem.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs
  2017-06-20 19:06     ` Mike Galbraith
@ 2017-06-21 13:23       ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-21 13:23 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Rik van Riel, Peter Zijlstra, LKML, Thomas Gleixner, Ingo Molnar

On Tue, Jun 20, 2017 at 09:06:48PM +0200, Mike Galbraith wrote:
> On Tue, 2017-06-20 at 13:42 -0400, Rik van Riel wrote:
> > On Mon, 2017-06-19 at 04:12 +0200, Frederic Weisbecker wrote:
> > > Although idle load balancing obviously only concern idle CPUs, it can
> > > be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get
> > > rid
> > > of an idle load balancing duty once a tick fires while it runs a task
> > > and this can take a while in a nohz_full CPU.
> > > 
> > > We could fix that and escape the idle load balancing duty from the
> > > very
> > > idle exit path but that would bring unecessary overhead. Lets just
> > > not
> > > bother and leave that job to housekeeping CPUs (those outside
> > > nohz_full
> > > range). The nohz_full CPUs simply don't want any disturbance.
> > > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Rik van Riel <riel@redhat.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > ---
> > >  kernel/sched/fair.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d711093..cfca960 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8659,6 +8659,10 @@ void nohz_balance_enter_idle(int cpu)
> > >  	if (!cpu_active(cpu))
> > >  		return;
> > >  
> > > +	/* Spare idle load balancing on CPUs that don't want to be
> > > disturbed */
> > > +	if (!is_housekeeping_cpu(cpu))
> > > +		return;
> > > +
> > >  	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> > >  		return;
> > 
> > I am not entirely convinced on this one.
> > 
> > Doesn't the if (on_null_domain(cpu_rq(cpu)) test
> > a few lines down take care of this already?
> > 
> > Do we want nohz_full to always automatically
> > imply that no idle balancing will happen, like
> > on isolated CPUs?
> 
> IMO, nohz_full capable CPUs that are not isolated should automatically
> become housekeepers, and nohz_full _active_ upon becoming isolated.
>  When a used as a housekeeper, you still pay a price for having the
> nohz_full capability available, but it doesn't have to be as high. 

That's right. So in the end checking for housekeeper on idle load balancing
is something we want, but not with the current definition of housekeepers
which is every CPU outside of nohz_full.

I should set this patch aside until I manage to decouple housekeeping from
nohz_full.

> In my kernels, I use cpusets to turn nohz on/off set wise, so CPUs can
> be ticking, dyntick, nohz_full or housekeeper, RT load balancing and
> cpupri on/off as well if you want to assume full responsibility.  It's
> a tad (from box of xxl tads) ugly, but more flexible.

Indeed I think that, in the end, driving the isolation "intensity" through
cpusets is a good idea. It's going to be quite a headache in the case
of nohz_full though if we want to avoid races against tick dependency,
cputime accounting.

But at least I can start to move the other various isolation features
to cpusets.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:sched/core] sched/loadavg: Generalize "_idle" naming to "_nohz"
  2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
  2017-06-20 17:38   ` Rik van Riel
@ 2017-06-22 11:10   ` tip-bot for Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-06-22 11:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, hpa, riel, tglx, peterz, fweisbec, mingo

Commit-ID:  3c85d6db5e5f05ae6c3d7f5a0ceceb43746a5ca7
Gitweb:     http://git.kernel.org/tip/3c85d6db5e5f05ae6c3d7f5a0ceceb43746a5ca7
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Mon, 19 Jun 2017 04:12:00 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Jun 2017 11:30:01 +0200

sched/loadavg: Generalize "_idle" naming to "_nohz"

The loadavg naming code still assumes that nohz == idle whereas its code
is actually handling well both nohz idle and nohz full.

So lets fix the naming according to what the code actually does, to
unconfuse the reader.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/trace/ftrace.txt |  2 +-
 include/linux/sched/nohz.h     |  8 +++----
 kernel/sched/loadavg.c         | 51 +++++++++++++++++++++---------------------
 kernel/time/tick-sched.c       |  4 ++--
 4 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 94a987b..fff8ff6 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1609,7 +1609,7 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
   <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
-  <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   15us : calc_load_nohz_stop <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : hrtimer_try_to_cancel <-hrtimer_cancel
diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 4995b71..7d3f75d 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -23,11 +23,11 @@ static inline void set_cpu_sd_state_idle(void) { }
 #endif
 
 #ifdef CONFIG_NO_HZ_COMMON
-void calc_load_enter_idle(void);
-void calc_load_exit_idle(void);
+void calc_load_nohz_start(void);
+void calc_load_nohz_stop(void);
 #else
-static inline void calc_load_enter_idle(void) { }
-static inline void calc_load_exit_idle(void) { }
+static inline void calc_load_nohz_start(void) { }
+static inline void calc_load_nohz_stop(void) { }
 #endif /* CONFIG_NO_HZ_COMMON */
 
 #if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index f15fb2b..f14716a 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -117,7 +117,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  * load-average relies on per-cpu sampling from the tick, it is affected by
  * NO_HZ.
  *
- * The basic idea is to fold the nr_active delta into a global idle-delta upon
+ * The basic idea is to fold the nr_active delta into a global NO_HZ-delta upon
  * entering NO_HZ state such that we can include this as an 'extra' cpu delta
  * when we read the global state.
  *
@@ -126,7 +126,7 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  *  - When we go NO_HZ idle during the window, we can negate our sample
  *    contribution, causing under-accounting.
  *
- *    We avoid this by keeping two idle-delta counters and flipping them
+ *    We avoid this by keeping two NO_HZ-delta counters and flipping them
  *    when the window starts, thus separating old and new NO_HZ load.
  *
  *    The only trick is the slight shift in index flip for read vs write.
@@ -137,22 +137,22 @@ calc_load(unsigned long load, unsigned long exp, unsigned long active)
  *    r:0 0 1           1 0           0 1           1 0
  *    w:0 1 1           0 0           1 1           0 0
  *
- *    This ensures we'll fold the old idle contribution in this window while
+ *    This ensures we'll fold the old NO_HZ contribution in this window while
  *    accumlating the new one.
  *
- *  - When we wake up from NO_HZ idle during the window, we push up our
+ *  - When we wake up from NO_HZ during the window, we push up our
  *    contribution, since we effectively move our sample point to a known
  *    busy state.
  *
  *    This is solved by pushing the window forward, and thus skipping the
- *    sample, for this cpu (effectively using the idle-delta for this cpu which
+ *    sample, for this cpu (effectively using the NO_HZ-delta for this cpu which
  *    was in effect at the time the window opened). This also solves the issue
- *    of having to deal with a cpu having been in NOHZ idle for multiple
- *    LOAD_FREQ intervals.
+ *    of having to deal with a cpu having been in NO_HZ for multiple LOAD_FREQ
+ *    intervals.
  *
  * When making the ILB scale, we should try to pull this in as well.
  */
-static atomic_long_t calc_load_idle[2];
+static atomic_long_t calc_load_nohz[2];
 static int calc_load_idx;
 
 static inline int calc_load_write_idx(void)
@@ -167,7 +167,7 @@ static inline int calc_load_write_idx(void)
 
 	/*
 	 * If the folding window started, make sure we start writing in the
-	 * next idle-delta.
+	 * next NO_HZ-delta.
 	 */
 	if (!time_before(jiffies, READ_ONCE(calc_load_update)))
 		idx++;
@@ -180,24 +180,24 @@ static inline int calc_load_read_idx(void)
 	return calc_load_idx & 1;
 }
 
-void calc_load_enter_idle(void)
+void calc_load_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
 	long delta;
 
 	/*
-	 * We're going into NOHZ mode, if there's any pending delta, fold it
-	 * into the pending idle delta.
+	 * We're going into NO_HZ mode, if there's any pending delta, fold it
+	 * into the pending NO_HZ delta.
 	 */
 	delta = calc_load_fold_active(this_rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
-		atomic_long_add(delta, &calc_load_idle[idx]);
+		atomic_long_add(delta, &calc_load_nohz[idx]);
 	}
 }
 
-void calc_load_exit_idle(void)
+void calc_load_nohz_stop(void)
 {
 	struct rq *this_rq = this_rq();
 
@@ -217,13 +217,13 @@ void calc_load_exit_idle(void)
 		this_rq->calc_load_update += LOAD_FREQ;
 }
 
-static long calc_load_fold_idle(void)
+static long calc_load_nohz_fold(void)
 {
 	int idx = calc_load_read_idx();
 	long delta = 0;
 
-	if (atomic_long_read(&calc_load_idle[idx]))
-		delta = atomic_long_xchg(&calc_load_idle[idx], 0);
+	if (atomic_long_read(&calc_load_nohz[idx]))
+		delta = atomic_long_xchg(&calc_load_nohz[idx], 0);
 
 	return delta;
 }
@@ -299,9 +299,9 @@ calc_load_n(unsigned long load, unsigned long exp,
 
 /*
  * NO_HZ can leave us missing all per-cpu ticks calling
- * calc_load_account_active(), but since an idle CPU folds its delta into
- * calc_load_tasks_idle per calc_load_account_idle(), all we need to do is fold
- * in the pending idle delta if our idle period crossed a load cycle boundary.
+ * calc_load_fold_active(), but since a NO_HZ CPU folds its delta into
+ * calc_load_nohz per calc_load_nohz_start(), all we need to do is fold
+ * in the pending NO_HZ delta if our NO_HZ period crossed a load cycle boundary.
  *
  * Once we've updated the global active value, we need to apply the exponential
  * weights adjusted to the number of cycles missed.
@@ -330,7 +330,7 @@ static void calc_global_nohz(void)
 	}
 
 	/*
-	 * Flip the idle index...
+	 * Flip the NO_HZ index...
 	 *
 	 * Make sure we first write the new time then flip the index, so that
 	 * calc_load_write_idx() will see the new time when it reads the new
@@ -341,7 +341,7 @@ static void calc_global_nohz(void)
 }
 #else /* !CONFIG_NO_HZ_COMMON */
 
-static inline long calc_load_fold_idle(void) { return 0; }
+static inline long calc_load_nohz_fold(void) { return 0; }
 static inline void calc_global_nohz(void) { }
 
 #endif /* CONFIG_NO_HZ_COMMON */
@@ -362,9 +362,9 @@ void calc_global_load(unsigned long ticks)
 		return;
 
 	/*
-	 * Fold the 'old' idle-delta to include all NO_HZ cpus.
+	 * Fold the 'old' NO_HZ-delta to include all NO_HZ cpus.
 	 */
-	delta = calc_load_fold_idle();
+	delta = calc_load_nohz_fold();
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
@@ -378,7 +378,8 @@ void calc_global_load(unsigned long ticks)
 	WRITE_ONCE(calc_load_update, sample_window + LOAD_FREQ);
 
 	/*
-	 * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk.
+	 * In case we went to NO_HZ for multiple LOAD_FREQ intervals
+	 * catch up in bulk.
 	 */
 	calc_global_nohz();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9c2dc64..b1b58a0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -783,7 +783,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 */
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
-		calc_load_enter_idle();
+		calc_load_nohz_start();
 		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
@@ -823,7 +823,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 	 */
 	timer_clear_idle();
 
-	calc_load_exit_idle();
+	calc_load_nohz_stop();
 	touch_softlockup_watchdog_sched();
 	/*
 	 * Cancel the scheduled timer and restore the tick

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:sched/core] nohz: Move idle balancer registration to the idle path
  2017-06-19  2:12 ` [PATCH 2/3] nohz: Move idle balancer registration to idle path Frederic Weisbecker
  2017-06-20 17:39   ` Rik van Riel
@ 2017-06-22 11:11   ` tip-bot for Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-06-22 11:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, peterz, fweisbec, tglx, linux-kernel, hpa, riel, torvalds

Commit-ID:  a0db971e4eb69fc84eb3d7ef94f718b483550b4a
Gitweb:     http://git.kernel.org/tip/a0db971e4eb69fc84eb3d7ef94f718b483550b4a
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Mon, 19 Jun 2017 04:12:01 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Jun 2017 11:30:01 +0200

nohz: Move idle balancer registration to the idle path

The idle load balancing registration path assumes that we only stop the
tick when the CPU is idle, ignoring the nohz full case. As a result, a
nohz full CPU that is running a task may be chosen to perform idle load
balancing.

Lets make sure that only CPUs in dynticks idle mode can be picked as
idle load balancers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b1b58a0..db023e9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -782,7 +782,6 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	 * the scheduler tick in nohz_restart_sched_tick.
 	 */
 	if (!ts->tick_stopped) {
-		nohz_balance_enter_idle(cpu);
 		calc_load_nohz_start();
 		cpu_load_update_nohz_start();
 
@@ -923,8 +922,10 @@ static void __tick_nohz_idle_enter(struct tick_sched *ts)
 			ts->idle_expires = expires;
 		}
 
-		if (!was_stopped && ts->tick_stopped)
+		if (!was_stopped && ts->tick_stopped) {
 			ts->idle_jiffies = ts->last_jiffies;
+			nohz_balance_enter_idle(cpu);
+		}
 	}
 }
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:sched/core] sched/fair: Spare idle load balancing on nohz_full CPUs
  2017-06-19  2:12 ` [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs Frederic Weisbecker
  2017-06-20 17:42   ` Rik van Riel
@ 2017-06-22 11:11   ` tip-bot for Frederic Weisbecker
  2017-06-22 13:52     ` Frederic Weisbecker
  1 sibling, 1 reply; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2017-06-22 11:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: riel, torvalds, peterz, fweisbec, linux-kernel, hpa, tglx, mingo

Commit-ID:  387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
Gitweb:     http://git.kernel.org/tip/387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Mon, 19 Jun 2017 04:12:02 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Jun 2017 11:30:02 +0200

sched/fair: Spare idle load balancing on nohz_full CPUs

Although idle load balancing obviously only concerns idle CPUs, it can
be a disturbance on a busy nohz_full CPU. Indeed a CPU can only get rid
of an idle load balancing duty once a tick fires while it runs a task
and this can take a while on a nohz_full CPU.

We could fix that and escape the idle load balancing duty from the very
idle exit path but that would bring unecessary overhead. Lets just not
bother and leave that job to housekeeping CPUs (those outside nohz_full
range). The nohz_full CPUs simply don't want any disturbance.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1497838322-10913-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a24661a..694c258 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8683,6 +8683,10 @@ void nohz_balance_enter_idle(int cpu)
 	if (!cpu_active(cpu))
 		return;
 
+	/* Spare idle load balancing on CPUs that don't want to be disturbed: */
+	if (!is_housekeeping_cpu(cpu))
+		return;
+
 	if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
 		return;
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [tip:sched/core] sched/fair: Spare idle load balancing on nohz_full CPUs
  2017-06-22 11:11   ` [tip:sched/core] sched/fair: " tip-bot for Frederic Weisbecker
@ 2017-06-22 13:52     ` Frederic Weisbecker
  2017-06-22 19:47       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2017-06-22 13:52 UTC (permalink / raw)
  To: mingo, tglx, hpa, linux-kernel, riel, torvalds, peterz; +Cc: linux-tip-commits

Hi Ingo,

On Thu, Jun 22, 2017 at 04:11:53AM -0700, tip-bot for Frederic Weisbecker wrote:
> Commit-ID:  387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
> Gitweb:     http://git.kernel.org/tip/387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
> Author:     Frederic Weisbecker <fweisbec@gmail.com>
> AuthorDate: Mon, 19 Jun 2017 04:12:02 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Thu, 22 Jun 2017 11:30:02 +0200
> 
> sched/fair: Spare idle load balancing on nohz_full CPUs

Thanks for applying the series! The two other patches are indeed good but
Rik and Mike have mitigated feelings about this very patch. I think we need to decouple
housekeeping from nohz_full before applying it. Is it possible to set it aside
for now?

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [tip:sched/core] sched/fair: Spare idle load balancing on nohz_full CPUs
  2017-06-22 13:52     ` Frederic Weisbecker
@ 2017-06-22 19:47       ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2017-06-22 19:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: tglx, hpa, linux-kernel, riel, torvalds, peterz, linux-tip-commits


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> Hi Ingo,
> 
> On Thu, Jun 22, 2017 at 04:11:53AM -0700, tip-bot for Frederic Weisbecker wrote:
> > Commit-ID:  387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
> > Gitweb:     http://git.kernel.org/tip/387bc8b5536eeb0a92f4b4ab553539eaea2ac0ba
> > Author:     Frederic Weisbecker <fweisbec@gmail.com>
> > AuthorDate: Mon, 19 Jun 2017 04:12:02 +0200
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Thu, 22 Jun 2017 11:30:02 +0200
> > 
> > sched/fair: Spare idle load balancing on nohz_full CPUs
> 
> Thanks for applying the series! The two other patches are indeed good but
> Rik and Mike have mitigated feelings about this very patch. I think we need to decouple
> housekeeping from nohz_full before applying it. Is it possible to set it aside
> for now?

Well, if patches are forthcoming and the commit isn't buggy per se, we could have 
it with the note of it being improved further.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-06-22 19:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-19  2:11 [PATCH 0/3] sched: A few nohz_full improvements Frederic Weisbecker
2017-06-19  2:12 ` [PATCH 1/3] sched/loadavg: Generalize idle naming to nohz Frederic Weisbecker
2017-06-20 17:38   ` Rik van Riel
2017-06-22 11:10   ` [tip:sched/core] sched/loadavg: Generalize "_idle" naming to "_nohz" tip-bot for Frederic Weisbecker
2017-06-19  2:12 ` [PATCH 2/3] nohz: Move idle balancer registration to idle path Frederic Weisbecker
2017-06-20 17:39   ` Rik van Riel
2017-06-22 11:11   ` [tip:sched/core] nohz: Move idle balancer registration to the " tip-bot for Frederic Weisbecker
2017-06-19  2:12 ` [PATCH 3/3] sched: Spare idle load balancing on nohz_full CPUs Frederic Weisbecker
2017-06-20 17:42   ` Rik van Riel
2017-06-20 19:06     ` Mike Galbraith
2017-06-21 13:23       ` Frederic Weisbecker
2017-06-20 20:26     ` Frederic Weisbecker
2017-06-22 11:11   ` [tip:sched/core] sched/fair: " tip-bot for Frederic Weisbecker
2017-06-22 13:52     ` Frederic Weisbecker
2017-06-22 19:47       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).