linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] sched: Fix/improve nohz cpu load updates v2
@ 2016-04-08  1:07 Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08  1:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

This series fixes issues with CFS cpu load accounting in nohz configs
(both idle and full nohz), with some optimizations along the way.

Following Peterz's reviews, I've tried to improve the changelogs.
Comments have been updated as well and some changes have been refactored.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/nohz-v2

HEAD: 55cb7c1a25acb140253ca4e55f8d43cd31404b55

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      sched: Gather cpu load functions under a more conventional namespace
      sched: Correctly handle nohz ticks cpu load accounting
      sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates


 Documentation/trace/ftrace.txt |  10 +--
 include/linux/sched.h          |   6 +-
 kernel/sched/core.c            |   5 +-
 kernel/sched/fair.c            | 146 +++++++++++++++++++++++++++--------------
 kernel/sched/sched.h           |  10 +--
 kernel/time/tick-sched.c       |   9 +--
 6 files changed, 120 insertions(+), 66 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace
  2016-04-08  1:07 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v2 Frederic Weisbecker
@ 2016-04-08  1:07 ` Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
  2 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08  1:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

The cpu load update related functions currently follow a weak naming
convention: they all start with update_cpu_load_*(), which isn't ideal
because "update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

	update_cpu_load_*() -> cpu_load_update_*()

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 Documentation/trace/ftrace.txt | 10 +++++-----
 include/linux/sched.h          |  4 ++--
 kernel/sched/core.c            |  2 +-
 kernel/sched/fair.c            | 24 ++++++++++++------------
 kernel/sched/sched.h           |  4 ++--
 kernel/time/tick-sched.c       |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  <idle>-0       3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   <idle>-0       3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  <idle>-0       3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  <idle>-0       3dN.2   14us : sched_avg_update <-__update_cpu_load
-  <idle>-0       3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  <idle>-0       3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
+  <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..aa5ee0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4c522a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2915,7 +2915,7 @@ void scheduler_tick(void)
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
-	update_cpu_load_active(rq);
+	cpu_load_update_active(rq);
 	calc_global_load_tick(rq);
 	raw_spin_unlock(&rq->lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e66..f33764d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates, int active)
 {
 	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
 				   unsigned long curr_jiffies,
 				   unsigned long load,
 				   int active)
@@ -4589,7 +4589,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__update_cpu_load(this_rq, load, pending_updates, active);
+		__cpu_load_update(this_rq, load, pending_updates, active);
 	}
 }
 
@@ -4610,7 +4610,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
-static void update_cpu_load_idle(struct rq *this_rq)
+static void cpu_load_update_idle(struct rq *this_rq)
 {
 	/*
 	 * bail if there's load or we're actually up-to-date.
@@ -4618,13 +4618,13 @@ static void update_cpu_load_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__update_cpu_load_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
 }
 
 /*
  * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
  */
-void update_cpu_load_nohz(int active)
+void cpu_load_update_nohz(int active)
 {
 	struct rq *this_rq = this_rq();
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
@@ -4634,7 +4634,7 @@ void update_cpu_load_nohz(int active)
 		return;
 
 	raw_spin_lock(&this_rq->lock);
-	__update_cpu_load_nohz(this_rq, curr_jiffies, load, active);
+	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
 	raw_spin_unlock(&this_rq->lock);
 }
 #endif /* CONFIG_NO_HZ */
@@ -4642,14 +4642,14 @@ void update_cpu_load_nohz(int active)
 /*
  * Called from scheduler_tick()
  */
-void update_cpu_load_active(struct rq *this_rq)
+void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
 	/*
-	 * See the mess around update_cpu_load_idle() / update_cpu_load_nohz().
+	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, load, 1, 1);
+	__cpu_load_update(this_rq, load, 1, 1);
 }
 
 /*
@@ -7957,7 +7957,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, rq->next_balance)) {
 			raw_spin_lock_irq(&rq->lock);
 			update_rq_clock(rq);
-			update_cpu_load_idle(rq);
+			cpu_load_update_idle(rq);
 			raw_spin_unlock_irq(&rq->lock);
 			rebalance_domains(rq, CPU_IDLE);
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..1802013 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -31,9 +31,9 @@ extern void calc_global_load_tick(struct rq *this_rq);
 extern long calc_load_fold_active(struct rq *this_rq);
 
 #ifdef CONFIG_SMP
-extern void update_cpu_load_active(struct rq *this_rq);
+extern void cpu_load_update_active(struct rq *this_rq);
 #else
-static inline void update_cpu_load_active(struct rq *this_rq) { }
+static inline void cpu_load_update_active(struct rq *this_rq) { }
 #endif
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 58e3310..66bdc9a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -806,7 +806,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	update_cpu_load_nohz(active);
+	cpu_load_update_nohz(active);
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
-- 
2.7.0


* [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-08  1:07 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v2 Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
@ 2016-04-08  1:07 ` Frederic Weisbecker
  2016-04-08  9:41   ` Peter Zijlstra
  2016-04-08  1:07 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
  2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08  1:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit periodic mode
and try to minimize ticks as much as possible. The nohz subsystem uses
confusing terminology here: the internal state "ts->tick_stopped", also
exposed through the public tick_nohz_tick_stopped() interface, is a
misnomer because the tick is reduced on a best-effort basis rather than
stopped. In the best case the tick may indeed be stopped entirely, but
there is no guarantee of that. If a timer needs to fire one second
later, a tick will fire while the CPU is in nohz mode, and this is a
very common scenario.

This confusion turns out to be a problem for cpu load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that it
therefore can't be called in dynticks mode. When that happens, the whole
previous tickless load is ignored and the function just records the load
of the current tick, ignoring the potentially long idle period behind it.

In order to solve this, we could account the current load for the
previous nohz time, but then we would risk charging the load of a task
that was freshly enqueued for the whole nohz period.

So instead, let's record the dynticks load on nohz frame entry so we
know what to record in case of nohz ticks, then use this record to
account the tickless load on nohz ticks and at nohz frame end.

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |  6 ++-
 kernel/sched/fair.c      | 95 +++++++++++++++++++++++++++++++-----------------
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 70 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa5ee0d..07b1b1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f33764d..1dd864d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+	unsigned long tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-				   unsigned long curr_jiffies,
-				   unsigned long load,
-				   int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we need to avoid the delta approach from the regular tick when
+ * possible since that would seriously skew the load calculation. This is why we
+ * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on
+ * jiffies deltas for updates happening while in nohz mode (idle ticks, idle
+ * loop exit, nohz_idle_balance, nohz full exit...)
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
+static void cpu_load_update_nohz(struct rq *this_rq,
+				 unsigned long curr_jiffies,
+				 unsigned long load)
 {
 	unsigned long pending_updates;
 
@@ -4589,24 +4601,11 @@ static void __cpu_load_update_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__cpu_load_update(this_rq, load, pending_updates, active);
+		cpu_load_update(this_rq, load, pending_updates);
 	}
 }
 
 /*
- * There is no sane way to deal with nohz on smp when using jiffies because the
- * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
- * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
- *
- * Therefore we cannot use the delta approach from the regular tick since that
- * would seriously skew the load calculation. However we'll make do for those
- * updates happening while idle (nohz_idle_balance) or coming out of idle
- * (tick_nohz_idle_exit).
- *
- * This means we might still be one tick off for nohz periods.
- */
-
-/*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
@@ -4618,26 +4617,54 @@ static void cpu_load_update_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0);
 }
 
 /*
- * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ * Record CPU load on nohz entry so we know the tickless load to account
+ * on nohz exit.
  */
-void cpu_load_update_nohz(int active)
+void cpu_load_update_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
+
+	/*
+	 * This is all lockless but should be fine. If weighted_cpuload changes
+	 * concurrently we'll exit nohz. And cpu_load write can race with
+	 * cpu_load_update_idle() but both updater would be writing the same.
+	 */
+	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
+}
+
+/*
+ * Account the tickless load in the end of a nohz frame.
+ */
+void cpu_load_update_nohz_stop(void)
+{
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
-	unsigned long load = active ? weighted_cpuload(cpu_of(this_rq)) : 0;
+	struct rq *this_rq = this_rq();
+	unsigned long load;
 
 	if (curr_jiffies == this_rq->last_load_update_tick)
 		return;
 
+	load = weighted_cpuload(cpu_of(this_rq));
 	raw_spin_lock(&this_rq->lock);
-	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
+	cpu_load_update_nohz(this_rq, curr_jiffies, load);
 	raw_spin_unlock(&this_rq->lock);
 }
-#endif /* CONFIG_NO_HZ */
+#else /* !CONFIG_NO_HZ_COMMON */
+static inline void cpu_load_update_nohz(struct rq *this_rq,
+					unsigned long curr_jiffies,
+					unsigned long load) { }
+#endif /* CONFIG_NO_HZ_COMMON */
+
+static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
+{
+	/* See the mess around cpu_load_update_nohz(). */
+	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+	cpu_load_update(this_rq, load, 1);
+}
 
 /*
  * Called from scheduler_tick()
@@ -4645,11 +4672,11 @@ void cpu_load_update_nohz(int active)
 void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
-	/*
-	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
-	 */
-	this_rq->last_load_update_tick = jiffies;
-	__cpu_load_update(this_rq, load, 1, 1);
+
+	if (tick_nohz_tick_stopped())
+		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
+	else
+		cpu_load_update_periodic(this_rq, load);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 66bdc9a..31872bc 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -776,6 +776,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_enter_idle();
+		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
@@ -802,11 +803,11 @@ out:
 	return tick;
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int active)
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	cpu_load_update_nohz(active);
+	cpu_load_update_nohz_stop();
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
@@ -833,7 +834,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 #endif
 }
 
@@ -1024,7 +1025,7 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(ts, now);
 
 	if (ts->tick_stopped) {
-		tick_nohz_restart_sched_tick(ts, now, 0);
+		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 
-- 
2.7.0


* [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-08  1:07 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v2 Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
  2016-04-08  1:07 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-08  1:07 ` Frederic Weisbecker
  2016-04-08 10:48   ` Peter Zijlstra
  2 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08  1:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Some of the cpu load update code only concerns NO_HZ configs but is
currently built in all configurations. When NO_HZ isn't built, that code
is harmless but wastes CPU and memory resources:

1) one useless field in struct rq
2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed() gets called twice on every tick only to return
   immediately with no action taken; in this configuration the function
   is dead code.

As a pure optimization, let's build the NO_HZ related code conditionally.

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  |  3 ++-
 kernel/sched/fair.c  | 43 ++++++++++++++++++++++++++++++++-----------
 kernel/sched/sched.h |  6 ++++--
 3 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..59a2821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7327,8 +7327,9 @@ void __init sched_init(void)
 
 		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
 			rq->cpu_load[j] = 0;
-
+#ifdef CONFIG_NO_HZ_COMMON
 		rq->last_load_update_tick = jiffies;
+#endif
 
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1dd864d..4618e5b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_NO_HZ_COMMON
 
 /*
  * per rq 'load' arrray crap; XXX kill this.
@@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	return load;
 }
 
+static unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+		       unsigned long pending_updates, int idx)
+{
+	old_load = decay_load_missed(old_load, pending_updates - 1, idx);
+	if (tickless_load) {
+		old_load -= decay_load_missed(tickless_load, pending_updates - 1, idx);
+		/*
+		 * old_load can never be a negative value because a
+		 * decayed tickless_load cannot be greater than the
+		 * original tickless_load.
+		 */
+		old_load += tickless_load;
+	}
+	return old_load;
+}
+#else /* !CONFIG_NO_HZ_COMMON */
+
+static inline unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+		       unsigned long pending_updates, int idx)
+{
+	return old_load;
+}
+
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
@@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
-		old_load = this_rq->cpu_load[i];
-		old_load = decay_load_missed(old_load, pending_updates - 1, i);
-		if (tickless_load) {
-			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
-			/*
-			 * old_load can never be a negative value because a
-			 * decayed tickless_load cannot be greater than the
-			 * original tickless_load.
-			 */
-			old_load += tickless_load;
-		}
+		old_load = cpu_load_update_missed(this_rq->cpu_load[i],
+						  tickless_load, pending_updates, i);
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
@@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	/* See the mess around cpu_load_update_nohz(). */
 	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
 	cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-- 
2.7.0


* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-08  1:07 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-08  9:41   ` Peter Zijlstra
  2016-04-08 12:53     ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2016-04-08  9:41 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 03:07:12AM +0200, Frederic Weisbecker wrote:
> +void cpu_load_update_nohz_start(void)
>  {
>  	struct rq *this_rq = this_rq();
> +
> +	/*
> +	 * This is all lockless but should be fine. If weighted_cpuload changes
> +	 * concurrently we'll exit nohz. And cpu_load write can race with
> +	 * cpu_load_update_idle() but both updater would be writing the same.
> +	 */
> +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
> +}

There is more to this; this also updates ->cpu_load[0] at possibly much
higher frequency than we've done before, while not updating the other
->cpu_load[] members.

Now, I'm not sure we care, but it is a bit odd.

> +/*
> + * Account the tickless load in the end of a nohz frame.
> + */
> +void cpu_load_update_nohz_stop(void)
> +{
>  	unsigned long curr_jiffies = READ_ONCE(jiffies);
> +	struct rq *this_rq = this_rq();
> +	unsigned long load;
>  
>  	if (curr_jiffies == this_rq->last_load_update_tick)
>  		return;
>  
> +	load = weighted_cpuload(cpu_of(this_rq));
>  	raw_spin_lock(&this_rq->lock);
> +	cpu_load_update_nohz(this_rq, curr_jiffies, load);
>  	raw_spin_unlock(&this_rq->lock);
>  }

And this makes us take rq->lock when waking from nohz; a bit
unfortunate. Do we really need this though? Will not a tick be
forthcoming real-soon-now?


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-08  1:07 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
@ 2016-04-08 10:48   ` Peter Zijlstra
  2016-04-08 12:55     ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2016-04-08 10:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 03:07:13AM +0200, Frederic Weisbecker wrote:
> index 4c522a7..59a2821 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7327,8 +7327,9 @@ void __init sched_init(void)
>  
>  		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
>  			rq->cpu_load[j] = 0;
> -
> +#ifdef CONFIG_NO_HZ_COMMON
>  		rq->last_load_update_tick = jiffies;
> +#endif
>  
>  #ifdef CONFIG_SMP
>  		rq->sd = NULL;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1dd864d..4618e5b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
>  
>  static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
>  {
> +#ifdef CONFIG_NO_HZ_COMMON
>  	/* See the mess around cpu_load_update_nohz(). */
>  	this_rq->last_load_update_tick = READ_ONCE(jiffies);
> +#endif
>  	cpu_load_update(this_rq, load, 1);
>  }
>  

Here you do the simple #ifdef, while here you make a giant mess instead
of the relatively straightforward:

> @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
>  
>  		/* scale is effectively 1 << i now, and >> i divides by scale */
>  
> -		old_load = this_rq->cpu_load[i];
#ifdef CONFIG_NO_HZ_COMMON
> -		old_load = decay_load_missed(old_load, pending_updates - 1, i);
> -		if (tickless_load) {
> -			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
> -			/*
> -			 * old_load can never be a negative value because a
> -			 * decayed tickless_load cannot be greater than the
> -			 * original tickless_load.
> -			 */
> -			old_load += tickless_load;
> -		}
#endif
>  		new_load = this_load;
>  		/*
>  		 * Round up the averaging division if load is increasing. This


* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-08  9:41   ` Peter Zijlstra
@ 2016-04-08 12:53     ` Frederic Weisbecker
  2016-04-08 17:40       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 11:41:54AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:07:12AM +0200, Frederic Weisbecker wrote:
> > +void cpu_load_update_nohz_start(void)
> >  {
> >  	struct rq *this_rq = this_rq();
> > +
> > +	/*
> > +	 * This is all lockless but should be fine. If weighted_cpuload changes
> > +	 * concurrently we'll exit nohz. And cpu_load write can race with
> > +	 * cpu_load_update_idle() but both updater would be writing the same.
> > +	 */
> > +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
> > +}
> 
> There is more to this; this also updates ->cpu_load[0] at possibly much
> higher frequency than we've done before, while not updating the other
> ->cpu_load[] members.
> 
> Now, I'm not sure we care, but it is a bit odd.

This is right. cpu_load[0] is aimed at showing the most recent load, without knowing
when the last update happened (the last tick/update could have been recent or not;
readers shouldn't care). We could indeed worry about it if this field were read together
with the other indexes and those were put into some relation. But it seems that each of
the rq->cpu_load[i] entries is read independently, without relation or comparison. That
said, I'm saying this without much assurance, as I don't know the details of the load
balancing code.

If in doubt I can create a field in struct rq to record the tickless load instead
of storing it in cpu_load[0]. That was in fact the first direction I took in my drafts.

> 
> > +/*
> > + * Account the tickless load in the end of a nohz frame.
> > + */
> > +void cpu_load_update_nohz_stop(void)
> > +{
> >  	unsigned long curr_jiffies = READ_ONCE(jiffies);
> > +	struct rq *this_rq = this_rq();
> > +	unsigned long load;
> >  
> >  	if (curr_jiffies == this_rq->last_load_update_tick)
> >  		return;
> >  
> > +	load = weighted_cpuload(cpu_of(this_rq));
> >  	raw_spin_lock(&this_rq->lock);
> > +	cpu_load_update_nohz(this_rq, curr_jiffies, load);
> >  	raw_spin_unlock(&this_rq->lock);
> >  }
> 
> And this makes us take rq->lock when waking from nohz; a bit
> unfortunate. Do we really need this though? 

Note it was already the case before this patchset; the function was simply
called cpu_load_update_nohz() instead.

I think we need it, at least due to the remote updates performed by
cpu_load_update_idle() (which only updates when the load is 0, though).

> Will not a tick be forthcoming real-soon-now?

No guarantee about that. If the next task runs for less than 10ms (with HZ=100),
there might be no tick until the next idle period.

Besides, in nohz_full the next task may be running tickless as well, in which case
we really need to update now.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-08 10:48   ` Peter Zijlstra
@ 2016-04-08 12:55     ` Frederic Weisbecker
  2016-04-08 17:44       ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-08 12:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 12:48:21PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 03:07:13AM +0200, Frederic Weisbecker wrote:
> > index 4c522a7..59a2821 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7327,8 +7327,9 @@ void __init sched_init(void)
> >  
> >  		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
> >  			rq->cpu_load[j] = 0;
> > -
> > +#ifdef CONFIG_NO_HZ_COMMON
> >  		rq->last_load_update_tick = jiffies;
> > +#endif
> >  
> >  #ifdef CONFIG_SMP
> >  		rq->sd = NULL;
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1dd864d..4618e5b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4661,8 +4680,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
> >  
> >  static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
> >  {
> > +#ifdef CONFIG_NO_HZ_COMMON
> >  	/* See the mess around cpu_load_update_nohz(). */
> >  	this_rq->last_load_update_tick = READ_ONCE(jiffies);
> > +#endif
> >  	cpu_load_update(this_rq, load, 1);
> >  }
> >  
> 
> Here you do the simple #ifdef, while here you make a giant mess instead
> of the relatively straightforward:
> 
> > @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> >  
> >  		/* scale is effectively 1 << i now, and >> i divides by scale */
> >  
> > -		old_load = this_rq->cpu_load[i];
> #ifdef CONFIG_NO_HZ_COMMON
> > -		old_load = decay_load_missed(old_load, pending_updates - 1, i);
> > -		if (tickless_load) {
> > -			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
> > -			/*
> > -			 * old_load can never be a negative value because a
> > -			 * decayed tickless_load cannot be greater than the
> > -			 * original tickless_load.
> > -			 */
> > -			old_load += tickless_load;
> > -		}
> #endif

Ah sure, if you prefer it that way, I can do that.

Thanks.


* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-08 12:53     ` Frederic Weisbecker
@ 2016-04-08 17:40       ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2016-04-08 17:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 02:53:21PM +0200, Frederic Weisbecker wrote:
> On Fri, Apr 08, 2016 at 11:41:54AM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 08, 2016 at 03:07:12AM +0200, Frederic Weisbecker wrote:
> > > +void cpu_load_update_nohz_start(void)
> > >  {
> > >  	struct rq *this_rq = this_rq();
> > > +
> > > +	/*
> > > +	 * This is all lockless but should be fine. If weighted_cpuload changes
> > > +	 * concurrently we'll exit nohz. And cpu_load write can race with
> > > +	 * cpu_load_update_idle() but both updater would be writing the same.
> > > +	 */
> > > +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
> > > +}
> > 
> > There is more to this; this also updates ->cpu_load[0] at possibly much
> > higher frequency than we've done before, while not updating the other
> > ->cpu_load[] members.
> > 
> > Now, I'm not sure we care, but it is a bit odd.
> 
> This is right. cpu_load[0] is aimed at showing the most recent load,
> without knowing when the last update happened (the last tick/update
> could have been recent or not; readers shouldn't care). We could
> indeed worry about it if this field were read together with the other
> indexes and those were put into some relation. But it seems that each
> of the rq->cpu_load[i] entries is read independently, without relation
> or comparison. That said, I'm saying this without much assurance, as I
> don't know the details of the load balancing code.
> 
> If in doubt I can create a field in struct rq to record the tickless
> load instead of storing it in cpu_load[0]. That was in fact the first
> direction I took in my drafts.

Yeah, no, I think this is fine; it just maybe wants a comment.

> > > +/*
> > > + * Account the tickless load in the end of a nohz frame.
> > > + */
> > > +void cpu_load_update_nohz_stop(void)
> > > +{
> > >  	unsigned long curr_jiffies = READ_ONCE(jiffies);
> > > +	struct rq *this_rq = this_rq();
> > > +	unsigned long load;
> > >  
> > >  	if (curr_jiffies == this_rq->last_load_update_tick)
> > >  		return;
> > >  
> > > +	load = weighted_cpuload(cpu_of(this_rq));
> > >  	raw_spin_lock(&this_rq->lock);
> > > +	cpu_load_update_nohz(this_rq, curr_jiffies, load);
> > >  	raw_spin_unlock(&this_rq->lock);
> > >  }
> > 
> > And this makes us take rq->lock when waking from nohz; a bit
> > unfortunate. Do we really need this though? 
> 
> Note it was already the case before this patchset; the function was
> simply called cpu_load_update_nohz() instead.

Ah, ok. I got lost in the whole rename + nohz maze (again). OK no
problem then.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-08 12:55     ` Frederic Weisbecker
@ 2016-04-08 17:44       ` Peter Zijlstra
  2016-04-11 13:18         ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2016-04-08 17:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 02:55:22PM +0200, Frederic Weisbecker wrote:
> > > @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> > >  
> > >  		/* scale is effectively 1 << i now, and >> i divides by scale */
> > >  
> > > -		old_load = this_rq->cpu_load[i];
> > #ifdef CONFIG_NO_HZ_COMMON
> > > -		old_load = decay_load_missed(old_load, pending_updates - 1, i);
> > > -		if (tickless_load) {
> > > -			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
> > > -			/*
> > > -			 * old_load can never be a negative value because a
> > > -			 * decayed tickless_load cannot be greater than the
> > > -			 * original tickless_load.
> > > -			 */
> > > -			old_load += tickless_load;
> > > -		}
> > #endif
> 
> Ah sure, if you prefer it that way, I can do that.

Yes please. I normally favour the thing you did, but here it makes
tricky code that much harder to read.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-08 17:44       ` Peter Zijlstra
@ 2016-04-11 13:18         ` Frederic Weisbecker
  2016-04-11 14:53           ` Chris Metcalf
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-11 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 08, 2016 at 07:44:14PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 02:55:22PM +0200, Frederic Weisbecker wrote:
> > > > @@ -4540,17 +4568,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> > > >  
> > > >  		/* scale is effectively 1 << i now, and >> i divides by scale */
> > > >  
> > > > -		old_load = this_rq->cpu_load[i];
> > > #ifdef CONFIG_NO_HZ_COMMON
> > > > -		old_load = decay_load_missed(old_load, pending_updates - 1, i);
> > > > -		if (tickless_load) {
> > > > -			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
> > > > -			/*
> > > > -			 * old_load can never be a negative value because a
> > > > -			 * decayed tickless_load cannot be greater than the
> > > > -			 * original tickless_load.
> > > > -			 */
> > > > -			old_load += tickless_load;
> > > > -		}
> > > #endif
> > 
> > Ah sure, if you prefer it that way, I can do that.
> 
> Yes please. I normally favour the thing you did, but here it makes
> tricky code that much harder to read.

So I tried that, but it warns about the unused variable tickless_load, so I
would need two scattered bits of ifdeffery in the function:

@@ -4528,7 +4529,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	unsigned long tickless_load = this_rq->cpu_load[0];
+#endif
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4541,6 +4544,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		if (tickless_load) {
 			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4551,6 +4555,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			 */
 			old_load += tickless_load;
 		}
+#endif
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This


Are you still sure you don't want the ifdeffed inline function?

Thanks.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-11 13:18         ` Frederic Weisbecker
@ 2016-04-11 14:53           ` Chris Metcalf
  2016-04-11 18:21             ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Metcalf @ 2016-04-11 14:53 UTC (permalink / raw)
  To: Frederic Weisbecker, Peter Zijlstra
  Cc: LKML, Byungchul Park, Thomas Gleixner, Luiz Capitulino,
	Christoph Lameter, Paul E . McKenney, Mike Galbraith,
	Rik van Riel, Ingo Molnar

On 4/11/2016 9:18 AM, Frederic Weisbecker wrote:
> So I tried that, but it warns about the unused variable tickless_load,
> so I would need two scattered bits of ifdeffery in the function:
>
> @@ -4528,7 +4529,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
>   static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
>   			    unsigned long pending_updates)
>   {
> +#ifdef CONFIG_NO_HZ_COMMON
>   	unsigned long tickless_load = this_rq->cpu_load[0];
> +#endif

Just move the initialization down to the first use, as a regular
assignment, and add __maybe_unused to the declaration, and the compiler
will then keep quiet (see Documentation/CodingStyle).

I have no comment on which of the approaches looks better overall,
but I think using __maybe_unused definitely improves this approach.
-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-11 14:53           ` Chris Metcalf
@ 2016-04-11 18:21             ` Frederic Weisbecker
  2016-04-12 14:23               ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-11 18:21 UTC (permalink / raw)
  To: Chris Metcalf
  Cc: Peter Zijlstra, LKML, Byungchul Park, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Mon, Apr 11, 2016 at 10:53:01AM -0400, Chris Metcalf wrote:
> On 4/11/2016 9:18 AM, Frederic Weisbecker wrote:
> >So I tried that, but it warns about the unused variable tickless_load,
> >so I would need two scattered bits of ifdeffery in the function:
> >
> >@@ -4528,7 +4529,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
> >  static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> >  			    unsigned long pending_updates)
> >  {
> >+#ifdef CONFIG_NO_HZ_COMMON
> >  	unsigned long tickless_load = this_rq->cpu_load[0];
> >+#endif
> 
> Just move the initialization down to the first use, as a regular
> assignment, and add __maybe_unused to the declaration, and the compiler
> will then keep quiet (see Documentation/CodingStyle).
> 
> I have no comment on which of the approaches looks better overall,
> but I think using __maybe_unused definitely improves this approach.

I thought about it, yeah. I usually avoid __maybe_unused because it's often
a bad sign concerning the code layout.

Now in this precise case I wouldn't mind, though. Peter, what's your opinion?

Thanks.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-11 18:21             ` Frederic Weisbecker
@ 2016-04-12 14:23               ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2016-04-12 14:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Chris Metcalf, LKML, Byungchul Park, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Mon, Apr 11, 2016 at 08:21:31PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 11, 2016 at 10:53:01AM -0400, Chris Metcalf wrote:
> > On 4/11/2016 9:18 AM, Frederic Weisbecker wrote:
> > >So I tried that, but it warns about the unused variable tickless_load,
> > >so I would need two scattered bits of ifdeffery in the function:
> > >
> > >@@ -4528,7 +4529,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
> > >  static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> > >  			    unsigned long pending_updates)
> > >  {
> > >+#ifdef CONFIG_NO_HZ_COMMON
> > >  	unsigned long tickless_load = this_rq->cpu_load[0];
> > >+#endif
> > 
> > Just move the initialization down to the first use, as a regular
> > assignment, and add __maybe_unused to the declaration, and the compiler
> > will then keep quiet (see Documentation/CodingStyle).
> > 
> > I have no comment on which of the approaches looks better overall,
> > but I think using __maybe_unused definitely improves this approach.
> 
> I thought about it yeah. I usually avoid __maybe_unused because it's often
> a bad sign concerning the code layout.
> 
> Now in this precise case I wouldn't mind though. Peter what's your opinion?

Sure, go with __maybe_unused.


* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
@ 2016-04-18 13:36   ` Frederic Weisbecker
  0 siblings, 0 replies; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-18 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Wed, Apr 13, 2016 at 03:56:52PM +0200, Frederic Weisbecker wrote:
> Some code in the cpu load update path only concerns NO_HZ configs but is
> built on all configurations. When NO_HZ isn't built, that code is harmless
> but needlessly consumes some CPU and memory resources:
> 
> 1) one useless field in struct rq
> 2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
> 3) decay_load_missed() is called twice on every tick, only to return
>    immediately with no action taken. That function is then dead code.
> 
> For pure optimization purposes, let's conditionally build the NO_HZ
> related code.
> 
> Cc: Byungchul Park <byungchul.park@lge.com>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  kernel/sched/core.c  | 3 ++-
>  kernel/sched/fair.c  | 9 +++++++--
>  kernel/sched/sched.h | 6 ++++--
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4c522a7..59a2821 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7327,8 +7327,9 @@ void __init sched_init(void)
>  
>  		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
>  			rq->cpu_load[j] = 0;
> -
> +#ifdef CONFIG_NO_HZ_COMMON
>  		rq->last_load_update_tick = jiffies;
> +#endif

I forgot to also conditionally initialize it in sched_init(); I need to resend
the patchset with that fixed.


* [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-13 13:56 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3 Frederic Weisbecker
@ 2016-04-13 13:56 ` Frederic Weisbecker
  2016-04-18 13:36   ` Frederic Weisbecker
  0 siblings, 1 reply; 16+ messages in thread
From: Frederic Weisbecker @ 2016-04-13 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Some code in the cpu load update path only concerns NO_HZ configs but is
built on all configurations. When NO_HZ isn't built, that code is harmless
but needlessly consumes some CPU and memory resources:

1) one useless field in struct rq
2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed() is called twice on every tick, only to return
   immediately with no action taken. That function is then dead code.

For pure optimization purposes, let's conditionally build the NO_HZ
related code.

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  | 3 ++-
 kernel/sched/fair.c  | 9 +++++++--
 kernel/sched/sched.h | 6 ++++--
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..59a2821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7327,8 +7327,9 @@ void __init sched_init(void)
 
 		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
 			rq->cpu_load[j] = 0;
-
+#ifdef CONFIG_NO_HZ_COMMON
 		rq->last_load_update_tick = jiffies;
+#endif
 
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8b79f9..93eea4f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,7 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
-
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * per rq 'load' arrray crap; XXX kill this.
  */
@@ -4489,6 +4489,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	}
 	return load;
 }
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
@@ -4528,7 +4529,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = this_rq->cpu_load[0];
+	unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4541,6 +4542,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		if (tickless_load) {
 			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4551,6 +4553,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			 */
 			old_load += tickless_load;
 		}
+#endif
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
@@ -4663,8 +4666,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	/* See the mess around cpu_load_update_nohz(). */
 	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
 	cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-- 
2.7.0

