linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3
@ 2016-04-13 13:56 Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-13 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Thanks Peterz and Chris, here is the v3 that addresses your reviews:

* Add a comment about cpu_load[0] being updated more frequently than
  the other cpu_load[idx]. See the comment above cpu_load_update_nohz_start().

* Simplify ifdeffery for decay_load_missed() calls. Use __maybe_unused
  on the related variable.
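
  For reference, the resulting pattern in the patch is:

	unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];

  where __maybe_unused suppresses the "unused variable" warning when
  CONFIG_NO_HZ_COMMON is off and the decay code using it is compiled out.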

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/nohz-v3

HEAD: 101db6e825806a08e14c3e06b5242c10608db2df

Thanks,
	Frederic
---

Frederic Weisbecker (3):
      sched: Gather cpu load functions under a more conventional namespace
      sched: Correctly handle nohz ticks cpu load accounting
      sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates


 Documentation/trace/ftrace.txt |  10 ++--
 include/linux/sched.h          |   6 ++-
 kernel/sched/core.c            |   5 +-
 kernel/sched/fair.c            | 112 +++++++++++++++++++++++++++--------------
 kernel/sched/sched.h           |  10 ++--
 kernel/time/tick-sched.c       |   9 ++--
 6 files changed, 96 insertions(+), 56 deletions(-)

* [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace
  2016-04-13 13:56 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3 Frederic Weisbecker
@ 2016-04-13 13:56 ` Frederic Weisbecker
  2016-04-23 12:58   ` [tip:sched/core] sched/fair: Gather CPU " tip-bot for Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
  2 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-13 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

The cpu load update related functions currently have a weak naming
convention, starting with update_cpu_load_*(), which isn't ideal as
"update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

	update_cpu_load_*() -> cpu_load_update_*()

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 Documentation/trace/ftrace.txt | 10 +++++-----
 include/linux/sched.h          |  4 ++--
 kernel/sched/core.c            |  2 +-
 kernel/sched/fair.c            | 24 ++++++++++++------------
 kernel/sched/sched.h           |  4 ++--
 kernel/time/tick-sched.c       |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  <idle>-0       3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   <idle>-0       3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  <idle>-0       3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  <idle>-0       3dN.2   14us : sched_avg_update <-__update_cpu_load
-  <idle>-0       3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  <idle>-0       3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
+  <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..aa5ee0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..4c522a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2915,7 +2915,7 @@ void scheduler_tick(void)
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
-	update_cpu_load_active(rq);
+	cpu_load_update_active(rq);
 	calc_global_load_tick(rq);
 	raw_spin_unlock(&rq->lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e66..f33764d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates, int active)
 {
 	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
 				   unsigned long curr_jiffies,
 				   unsigned long load,
 				   int active)
@@ -4589,7 +4589,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__update_cpu_load(this_rq, load, pending_updates, active);
+		__cpu_load_update(this_rq, load, pending_updates, active);
 	}
 }
 
@@ -4610,7 +4610,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
-static void update_cpu_load_idle(struct rq *this_rq)
+static void cpu_load_update_idle(struct rq *this_rq)
 {
 	/*
 	 * bail if there's load or we're actually up-to-date.
@@ -4618,13 +4618,13 @@ static void update_cpu_load_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__update_cpu_load_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
 }
 
 /*
  * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
  */
-void update_cpu_load_nohz(int active)
+void cpu_load_update_nohz(int active)
 {
 	struct rq *this_rq = this_rq();
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
@@ -4634,7 +4634,7 @@ void update_cpu_load_nohz(int active)
 		return;
 
 	raw_spin_lock(&this_rq->lock);
-	__update_cpu_load_nohz(this_rq, curr_jiffies, load, active);
+	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
 	raw_spin_unlock(&this_rq->lock);
 }
 #endif /* CONFIG_NO_HZ */
@@ -4642,14 +4642,14 @@ void update_cpu_load_nohz(int active)
 /*
  * Called from scheduler_tick()
  */
-void update_cpu_load_active(struct rq *this_rq)
+void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
 	/*
-	 * See the mess around update_cpu_load_idle() / update_cpu_load_nohz().
+	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, load, 1, 1);
+	__cpu_load_update(this_rq, load, 1, 1);
 }
 
 /*
@@ -7957,7 +7957,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, rq->next_balance)) {
 			raw_spin_lock_irq(&rq->lock);
 			update_rq_clock(rq);
-			update_cpu_load_idle(rq);
+			cpu_load_update_idle(rq);
 			raw_spin_unlock_irq(&rq->lock);
 			rebalance_domains(rq, CPU_IDLE);
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..1802013 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -31,9 +31,9 @@ extern void calc_global_load_tick(struct rq *this_rq);
 extern long calc_load_fold_active(struct rq *this_rq);
 
 #ifdef CONFIG_SMP
-extern void update_cpu_load_active(struct rq *this_rq);
+extern void cpu_load_update_active(struct rq *this_rq);
 #else
-static inline void update_cpu_load_active(struct rq *this_rq) { }
+static inline void cpu_load_update_active(struct rq *this_rq) { }
 #endif
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 58e3310..66bdc9a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -806,7 +806,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	update_cpu_load_nohz(active);
+	cpu_load_update_nohz(active);
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
-- 
2.7.0

* [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-13 13:56 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3 Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
@ 2016-04-13 13:56 ` Frederic Weisbecker
  2016-04-18  8:22   ` Byungchul Park
                     ` (3 more replies)
  2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
  2 siblings, 4 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-13 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
mode and try to minimize the ticks as much as possible. The nohz
subsystem uses a confusing terminology with the internal state
"ts->tick_stopped", which is also available through its public interface
with tick_nohz_tick_stopped(). This is a misnomer as the tick is in fact
reduced on a best-effort basis rather than stopped. In the best case the
tick can indeed be stopped entirely, but there is no guarantee of that.
If a timer needs to fire one second later, a tick will fire while the
CPU is in nohz mode, and this is a very common scenario.

Now this confusion happens to be a problem with cpu load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When that
happens, the whole previous tickless load is ignored and the function
just records the load for the current tick, ignoring the potentially
long idle periods behind it.

In order to solve this, we could account the current load for the
previous nohz time, but there is a risk that we would account the load
of a task that got freshly enqueued for the whole nohz period.

So instead, let's record the dynticks load on nohz frame entry so we
know what to account in case of nohz ticks, then use this record to
account the tickless load on nohz ticks and at nohz frame end.
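
For illustration, here is a minimal sketch of the resulting flow,
condensed from the patch below (locking and the last_load_update_tick
bookkeeping are omitted; see the diff for the exact code):

	/* Sketch only -- simplified from this patch. */

	/* Entering a nohz frame: remember the load to account later. */
	void cpu_load_update_nohz_start(void)
	{
		struct rq *this_rq = this_rq();

		this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
	}

	/* Any tick, whether periodic or happening in nohz mode. */
	void cpu_load_update_active(struct rq *this_rq)
	{
		unsigned long load = weighted_cpuload(cpu_of(this_rq));

		if (tick_nohz_tick_stopped())
			/* nohz tick: decay over the whole jiffies delta */
			cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
		else
			cpu_load_update_periodic(this_rq, load);
	}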

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |  6 ++-
 kernel/sched/fair.c      | 97 +++++++++++++++++++++++++++++++-----------------
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa5ee0d..07b1b1e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f33764d..a8b79f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4495,7 +4495,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+	unsigned long tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4574,10 +4573,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-				   unsigned long curr_jiffies,
-				   unsigned long load,
-				   int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we need to avoid the delta approach from the regular tick when
+ * possible since that would seriously skew the load calculation. This is why we
+ * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on
+ * jiffies deltas for updates happening while in nohz mode (idle ticks, idle
+ * loop exit, nohz_idle_balance, nohz full exit...)
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
+static void cpu_load_update_nohz(struct rq *this_rq,
+				 unsigned long curr_jiffies,
+				 unsigned long load)
 {
 	unsigned long pending_updates;
 
@@ -4589,24 +4601,11 @@ static void __cpu_load_update_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__cpu_load_update(this_rq, load, pending_updates, active);
+		cpu_load_update(this_rq, load, pending_updates);
 	}
 }
 
 /*
- * There is no sane way to deal with nohz on smp when using jiffies because the
- * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
- * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
- *
- * Therefore we cannot use the delta approach from the regular tick since that
- * would seriously skew the load calculation. However we'll make do for those
- * updates happening while idle (nohz_idle_balance) or coming out of idle
- * (tick_nohz_idle_exit).
- *
- * This means we might still be one tick off for nohz periods.
- */
-
-/*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
@@ -4618,26 +4617,56 @@ static void cpu_load_update_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0);
 }
 
 /*
- * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ * Record CPU load on nohz entry so we know the tickless load to account
+ * on nohz exit. cpu_load[0] happens then to be updated more frequently
+ * than other cpu_load[idx] but it should be fine as cpu_load readers
+ * shouldn't rely into synchronized cpu_load[*] updates.
  */
-void cpu_load_update_nohz(int active)
+void cpu_load_update_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
+
+	/*
+	 * This is all lockless but should be fine. If weighted_cpuload changes
+	 * concurrently we'll exit nohz. And cpu_load write can race with
+	 * cpu_load_update_idle() but both updater would be writing the same.
+	 */
+	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
+}
+
+/*
+ * Account the tickless load in the end of a nohz frame.
+ */
+void cpu_load_update_nohz_stop(void)
+{
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
-	unsigned long load = active ? weighted_cpuload(cpu_of(this_rq)) : 0;
+	struct rq *this_rq = this_rq();
+	unsigned long load;
 
 	if (curr_jiffies == this_rq->last_load_update_tick)
 		return;
 
+	load = weighted_cpuload(cpu_of(this_rq));
 	raw_spin_lock(&this_rq->lock);
-	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
+	cpu_load_update_nohz(this_rq, curr_jiffies, load);
 	raw_spin_unlock(&this_rq->lock);
 }
-#endif /* CONFIG_NO_HZ */
+#else /* !CONFIG_NO_HZ_COMMON */
+static inline void cpu_load_update_nohz(struct rq *this_rq,
+					unsigned long curr_jiffies,
+					unsigned long load) { }
+#endif /* CONFIG_NO_HZ_COMMON */
+
+static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
+{
+	/* See the mess around cpu_load_update_nohz(). */
+	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+	cpu_load_update(this_rq, load, 1);
+}
 
 /*
  * Called from scheduler_tick()
@@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
 void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
-	/*
-	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
-	 */
-	this_rq->last_load_update_tick = jiffies;
-	__cpu_load_update(this_rq, load, 1, 1);
+
+	if (tick_nohz_tick_stopped())
+		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
+	else
+		cpu_load_update_periodic(this_rq, load);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 66bdc9a..31872bc 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -776,6 +776,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_enter_idle();
+		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
@@ -802,11 +803,11 @@ out:
 	return tick;
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int active)
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	cpu_load_update_nohz(active);
+	cpu_load_update_nohz_stop();
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
@@ -833,7 +834,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 #endif
 }
 
@@ -1024,7 +1025,7 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(ts, now);
 
 	if (ts->tick_stopped) {
-		tick_nohz_restart_sched_tick(ts, now, 0);
+		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 
-- 
2.7.0

* [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-13 13:56 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3 Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-13 13:56 ` Frederic Weisbecker
  2016-04-18 13:36   ` Frederic Weisbecker
  2016-04-19 15:36   ` [PATCH v2] " Frederic Weisbecker
  2 siblings, 2 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-13 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Some code in the cpu load update path only concerns NO_HZ configs but is
built on all configurations. When NO_HZ isn't built, that code is harmless
but needlessly consumes CPU and memory resources:

1) one useless field in struct rq
2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed() is called twice on every tick only to return
   immediately with no action taken, making that function dead code.

For pure optimization purposes, let's conditionally build the NO_HZ
related code.
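
To see why item 3 is a no-op without NO_HZ: the periodic tick path always
passes pending_updates == 1, so the decay in cpu_load_update() degenerates
into the following (sketch, not the exact code):

	/* pending_updates - 1 == 0: decay_load_missed() returns its input
	 * unchanged, and the tickless term cancels itself out.
	 */
	old_load = decay_load_missed(old_load, 0, i);		/* = old_load */
	old_load -= decay_load_missed(tickless_load, 0, i);	/* - tickless_load */
	old_load += tickless_load;				/* + tickless_load */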

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  | 3 ++-
 kernel/sched/fair.c  | 9 +++++++--
 kernel/sched/sched.h | 6 ++++--
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..59a2821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7327,8 +7327,9 @@ void __init sched_init(void)
 
 		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
 			rq->cpu_load[j] = 0;
-
+#ifdef CONFIG_NO_HZ_COMMON
 		rq->last_load_update_tick = jiffies;
+#endif
 
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8b79f9..93eea4f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,7 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
-
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * per rq 'load' arrray crap; XXX kill this.
  */
@@ -4489,6 +4489,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	}
 	return load;
 }
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
@@ -4528,7 +4529,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = this_rq->cpu_load[0];
+	unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4541,6 +4542,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		if (tickless_load) {
 			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4551,6 +4553,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			 */
 			old_load += tickless_load;
 		}
+#endif
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
@@ -4663,8 +4666,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	/* See the mess around cpu_load_update_nohz(). */
 	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
 	cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-- 
2.7.0

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-18  8:22   ` Byungchul Park
  2016-04-18  9:17   ` Byungchul Park
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Byungchul Park @ 2016-04-18  8:22 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Wed, Apr 13, 2016 at 03:56:51PM +0200, Frederic Weisbecker wrote:
> @@ -4524,12 +4523,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
>   *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
>   *
>   * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
> - * term. See the @active paramter.
> + * term.
>   */
> -static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
> -			      unsigned long pending_updates, int active)
> +static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
> +			    unsigned long pending_updates)
>  {
> -	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
> +	unsigned long tickless_load = this_rq->cpu_load[0];

Hello,

Good to see my humble code fixed so it can be written like this.

> @@ -4618,26 +4617,56 @@ static void cpu_load_update_idle(struct rq *this_rq)
>  	if (weighted_cpuload(cpu_of(this_rq)))
>  		return;
>  
> -	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
> +	cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0);
>  }
>  
>  /*
> - * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
> + * Record CPU load on nohz entry so we know the tickless load to account
> + * on nohz exit. cpu_load[0] happens then to be updated more frequently
> + * than other cpu_load[idx] but it should be fine as cpu_load readers
> + * shouldn't rely into synchronized cpu_load[*] updates.
>   */
> -void cpu_load_update_nohz(int active)
> +void cpu_load_update_nohz_start(void)
>  {
>  	struct rq *this_rq = this_rq();
> +
> +	/*
> +	 * This is all lockless but should be fine. If weighted_cpuload changes
> +	 * concurrently we'll exit nohz. And cpu_load write can race with
> +	 * cpu_load_update_idle() but both updater would be writing the same.
> +	 */
> +	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));

Like it.

> +/*
> + * Account the tickless load in the end of a nohz frame.
> + */
> +void cpu_load_update_nohz_stop(void)
> +{
>  	unsigned long curr_jiffies = READ_ONCE(jiffies);
> -	unsigned long load = active ? weighted_cpuload(cpu_of(this_rq)) : 0;
> +	struct rq *this_rq = this_rq();
> +	unsigned long load;
>  
>  	if (curr_jiffies == this_rq->last_load_update_tick)
>  		return;
>  
> +	load = weighted_cpuload(cpu_of(this_rq));

Like it.

> @@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
>  void cpu_load_update_active(struct rq *this_rq)
>  {
>  	unsigned long load = weighted_cpuload(cpu_of(this_rq));
> -	/*
> -	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
> -	 */
> -	this_rq->last_load_update_tick = jiffies;
> -	__cpu_load_update(this_rq, load, 1, 1);
> +
> +	if (tick_nohz_tick_stopped())
> +		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
> +	else
> +		cpu_load_update_periodic(this_rq, load);

Oh! We have missed it until now. Terrible.. :-(

Thank you,
Byungchul

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
  2016-04-18  8:22   ` Byungchul Park
@ 2016-04-18  9:17   ` Byungchul Park
  2016-04-18 13:35     ` Frederic Weisbecker
  2016-04-20  7:59   ` Wanpeng Li
  2016-04-23 12:59   ` [tip:sched/core] sched/fair: Correctly handle nohz ticks CPU " tip-bot for Frederic Weisbecker
  3 siblings, 1 reply; 15+ messages in thread
From: Byungchul Park @ 2016-04-18  9:17 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Wed, Apr 13, 2016 at 03:56:51PM +0200, Frederic Weisbecker wrote:
> @@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
>  void cpu_load_update_active(struct rq *this_rq)
>  {
>  	unsigned long load = weighted_cpuload(cpu_of(this_rq));
> -	/*
> -	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
> -	 */
> -	this_rq->last_load_update_tick = jiffies;
> -	__cpu_load_update(this_rq, load, 1, 1);
> +
> +	if (tick_nohz_tick_stopped())
> +		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
> +	else
> +		cpu_load_update_periodic(this_rq, load);

Considering it further, I wonder if this is needed.
(Sorry if I missed something.)

Case 1. tickless -> (scheduler_tick) -> tickless

	I am not sure for this case whether the rq's load can change,
	especially whether it can change *at this point*.
	Please keep in mind that load[0] is set here.

Case 2. tickless -> (scheduler_tick) -> restart tick

	Will be done by the tick restart routine when exiting irq.
	-> no problem.

Case 3. tick -> (scheduler_tick) -> tickless

	Same as before.
	-> no problem.

Case 4. tick -> (scheduler_tick) -> tick

	We can rely on the regular scheduler_tick().
	-> no problem.

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-18  9:17   ` Byungchul Park
@ 2016-04-18 13:35     ` Frederic Weisbecker
  2016-04-19  0:01       ` Byungchul Park
  0 siblings, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-18 13:35 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Peter Zijlstra, LKML, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Mon, Apr 18, 2016 at 06:17:21PM +0900, Byungchul Park wrote:
> On Wed, Apr 13, 2016 at 03:56:51PM +0200, Frederic Weisbecker wrote:
> > @@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
> >  void cpu_load_update_active(struct rq *this_rq)
> >  {
> >  	unsigned long load = weighted_cpuload(cpu_of(this_rq));
> > -	/*
> > -	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
> > -	 */
> > -	this_rq->last_load_update_tick = jiffies;
> > -	__cpu_load_update(this_rq, load, 1, 1);
> > +
> > +	if (tick_nohz_tick_stopped())
> > +		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
> > +	else
> > +		cpu_load_update_periodic(this_rq, load);
> 
> Considering it further, I wonder if this is needed.
> (Sorry if I missed something.)
> 
> Case 1. tickless -> (scheduler_tick) -> tickless
> 
> 	I am not sure for this case whether the rq's load can change,
> 	especially whether it can change *at this point*.
> 	Please keep in mind that load[0] is set here.

load[0] won't change because it's set by cpu_load_update_nohz_start().
But all the other load[idx] need to be decayed further.
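
For example, each missed tick decays load[i] by (1 - 1/2^i), i.e.
new = old - (old >> i): with load[1] = 1024 and two missed updates we
get 1024 -> 512 -> 256. A naive sketch of that formula (illustrative
only; the kernel's decay_load_missed() reaches the same result through
precomputed factors):

	/* Illustrates load[i]_n = (1 - 1/2^i)^n * load[i]_0 */
	static unsigned long naive_decay(unsigned long load,
					 unsigned long missed, int idx)
	{
		while (missed--)
			load -= load >> idx;	/* load *= (1 - 1/2^idx) */
		return load;
	}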

> 
> Case 2. tickless -> (scheduler_tick) -> restart tick
> 
> 	Will be done by the tick restart routine when exiting irq.
> 	-> no problem.
> 
> Case 3. tick -> (scheduler_tick) -> tickless
> 
> 	Same as before.
> 	-> no problem.
> 
> Case 4. tick -> (scheduler_tick) -> tick
> 
> 	We can rely on the regular scheduler_tick().
> 	-> no problem.
> 

Thanks for your review!

* Re: [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
@ 2016-04-18 13:36   ` Frederic Weisbecker
  2016-04-19 15:36   ` [PATCH v2] " Frederic Weisbecker
  1 sibling, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-18 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Wed, Apr 13, 2016 at 03:56:52PM +0200, Frederic Weisbecker wrote:
> Some code in the cpu load update path only concerns NO_HZ configs but is
> built on all configurations. When NO_HZ isn't built, that code is harmless
> but needlessly consumes CPU and memory resources:
> 
> 1) one useless field in struct rq
> 2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
> 3) decay_load_missed() is called twice on every tick only to return
>    immediately with no action taken, making that function dead code.
> 
> For pure optimization purposes, let's conditionally build the NO_HZ
> related code.
> 
> Cc: Byungchul Park <byungchul.park@lge.com>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  kernel/sched/core.c  | 3 ++-
>  kernel/sched/fair.c  | 9 +++++++--
>  kernel/sched/sched.h | 6 ++++--
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4c522a7..59a2821 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7327,8 +7327,9 @@ void __init sched_init(void)
>  
>  		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
>  			rq->cpu_load[j] = 0;
> -
> +#ifdef CONFIG_NO_HZ_COMMON
>  		rq->last_load_update_tick = jiffies;
> +#endif

I forgot to also conditionally initialize it in sched_init(); I need to resend
the patchset with that fixed.

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-18 13:35     ` Frederic Weisbecker
@ 2016-04-19  0:01       ` Byungchul Park
  2016-04-19 14:01         ` Frederic Weisbecker
  0 siblings, 1 reply; 15+ messages in thread
From: Byungchul Park @ 2016-04-19  0:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Mon, Apr 18, 2016 at 03:35:06PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 18, 2016 at 06:17:21PM +0900, Byungchul Park wrote:
> > On Wed, Apr 13, 2016 at 03:56:51PM +0200, Frederic Weisbecker wrote:
> > > @@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
> > >  void cpu_load_update_active(struct rq *this_rq)
> > >  {
> > >  	unsigned long load = weighted_cpuload(cpu_of(this_rq));
> > > -	/*
> > > -	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
> > > -	 */
> > > -	this_rq->last_load_update_tick = jiffies;
> > > -	__cpu_load_update(this_rq, load, 1, 1);
> > > +
> > > +	if (tick_nohz_tick_stopped())
> > > +		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
> > > +	else
> > > +		cpu_load_update_periodic(this_rq, load);
> > 
> > Considering it further, I wonder if this is needed.
> > (Sorry if I missed something.)
> > 
> > Case 1. tickless -> (scheduler_tick) -> tickless
> > 
> > 	I am not sure for this case whether the rq's load can change,
> > 	especially whether it can change *at this point*.
> > 	Please keep in mind that load[0] is set here.
> 
> load[0] won't change because it's set by cpu_load_update_nohz_start().
> But all the other load[idx] need to be decayed further.

Ah. Right. The sched tick will be handled even in case 1...

I like your patches. But I am still wondering whether the sched tick
handling is necessary even in case 1. Of course that's another problem though.

Thanks anyway,
Byungchul

> 
> > 
> > Case 2. tickless -> (scheduler_tick) -> restart tick
> > 
> > 	Will be done by the tick restart routine when exiting irq.
> > 	-> no problem.
> > 
> > Case 3. tick -> (scheduler_tick) -> tickless
> > 
> > 	Same as before.
> > 	-> no problem.
> > 
> > Case 4. tick -> (scheduler_tick) -> tick
> > 
> > 	We can rely on the regular scheduler_tick().
> > 	-> no problem.
> > 
> 
> Thanks for your review!

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-19  0:01       ` Byungchul Park
@ 2016-04-19 14:01         ` Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-19 14:01 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Peter Zijlstra, LKML, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Tue, Apr 19, 2016 at 09:01:00AM +0900, Byungchul Park wrote:
> On Mon, Apr 18, 2016 at 03:35:06PM +0200, Frederic Weisbecker wrote:
> > On Mon, Apr 18, 2016 at 06:17:21PM +0900, Byungchul Park wrote:
> > > On Wed, Apr 13, 2016 at 03:56:51PM +0200, Frederic Weisbecker wrote:
> > > > @@ -4645,11 +4674,11 @@ void cpu_load_update_nohz(int active)
> > > >  void cpu_load_update_active(struct rq *this_rq)
> > > >  {
> > > >  	unsigned long load = weighted_cpuload(cpu_of(this_rq));
> > > > -	/*
> > > > -	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
> > > > -	 */
> > > > -	this_rq->last_load_update_tick = jiffies;
> > > > -	__cpu_load_update(this_rq, load, 1, 1);
> > > > +
> > > > +	if (tick_nohz_tick_stopped())
> > > > +		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
> > > > +	else
> > > > +		cpu_load_update_periodic(this_rq, load);
> > > 
> > > Considering it further, I wonder if this is needed.
> > > (Sorry if I missed something.)
> > > 
> > > Case 1. tickless -> (scheduler_tick) -> tickless
> > > 
> > > 	I am not sure for this case whether the rq's load can change,
> > > 	especially whether it can change *at this point*.
> > > 	Please keep in mind that load[0] is set here.
> > 
> > load[0] won't change because it's set by cpu_load_update_nohz_start().
> > But all the other load[idx] need to be decayed further.
> 
> Ah. Right. The sched tick will be handled even in case 1...
> 
> I like your patches. But I am still wondering whether the sched tick
> handling is necessary even in case 1. Of course that's another problem though.

Right, we could indeed ignore those ticks happening in dynticks/idle and just wait
for the end of the dynticks frame that calls cpu_load_update_nohz_stop(). In fact
the first version of this patchset did that, but Thomas didn't seem to like it.

That said, more local updates mean less need for remote updates through
cpu_load_update_idle().

* [PATCH v2] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates
  2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
  2016-04-18 13:36   ` Frederic Weisbecker
@ 2016-04-19 15:36   ` Frederic Weisbecker
  2016-04-23 12:59     ` [tip:sched/core] sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU " tip-bot for Frederic Weisbecker
  1 sibling, 1 reply; 15+ messages in thread
From: Frederic Weisbecker @ 2016-04-19 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Some code in the cpu load update path only concerns NO_HZ configs but is
built on all configurations. When NO_HZ isn't built, that code is harmless
but needlessly consumes CPU and memory resources:

1) one useless field in struct rq
2) a jiffies record on every tick that is never used (cpu_load_update_periodic)
3) decay_load_missed() is called twice on every tick only to return
   immediately with no action taken, making that function dead code.

For pure optimization purposes, let's conditionally build the NO_HZ
related code.

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c  | 5 ++---
 kernel/sched/fair.c  | 9 +++++++--
 kernel/sched/sched.h | 6 ++++--
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c522a7..06321d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7328,8 +7328,6 @@ void __init sched_init(void)
 		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
 			rq->cpu_load[j] = 0;
 
-		rq->last_load_update_tick = jiffies;
-
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
 		rq->rd = NULL;
@@ -7348,12 +7346,13 @@ void __init sched_init(void)
 
 		rq_attach_root(rq, &def_root_domain);
 #ifdef CONFIG_NO_HZ_COMMON
+		rq->last_load_update_tick = jiffies;
 		rq->nohz_flags = 0;
 #endif
 #ifdef CONFIG_NO_HZ_FULL
 		rq->last_sched_tick = 0;
 #endif
-#endif
+#endif /* CONFIG_SMP */
 		init_rq_hrtick(rq);
 		atomic_set(&rq->nr_iowait, 0);
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8b79f9..93eea4f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,7 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
-
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * per rq 'load' arrray crap; XXX kill this.
  */
@@ -4489,6 +4489,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	}
 	return load;
 }
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
@@ -4528,7 +4529,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = this_rq->cpu_load[0];
+	unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4541,6 +4542,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		if (tickless_load) {
 			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4551,6 +4553,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			 */
 			old_load += tickless_load;
 		}
+#endif
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
@@ -4663,8 +4666,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	/* See the mess around cpu_load_update_nohz(). */
 	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
 	cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..2302bb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+#endif /* CONFIG_SMP */
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-- 
2.7.0

* Re: [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
  2016-04-18  8:22   ` Byungchul Park
  2016-04-18  9:17   ` Byungchul Park
@ 2016-04-20  7:59   ` Wanpeng Li
  2016-04-23 12:59   ` [tip:sched/core] sched/fair: Correctly handle nohz ticks CPU " tip-bot for Frederic Weisbecker
  3 siblings, 0 replies; 15+ messages in thread
From: Wanpeng Li @ 2016-04-20  7:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, LKML, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Hi Frederic,
2016-04-13 21:56 GMT+08:00 Frederic Weisbecker <fweisbec@gmail.com>:
> Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
> mode. In fact "nohz" or "dynticks" only mean that we exit the periodic
> mode and try to minimize the ticks as much as possible. The nohz
> subsystem uses a confusing terminology with the internal state
> "ts->tick_stopped", which is also available through its public interface
> with tick_nohz_tick_stopped(). This is a misnomer as the tick is in fact
> reduced on a best-effort basis rather than stopped. In the best case the
> tick can indeed be stopped entirely, but there is no guarantee of that.
> If a timer needs to fire one second later, a tick will fire while the
> CPU is in nohz mode, and this is a very common scenario.
>
> Now this confusion happens to be a problem with cpu load updates:
> cpu_load_update_active() doesn't handle nohz ticks correctly because it
> assumes that ticks are completely stopped in nohz mode and that
> cpu_load_update_active() can't be called in dynticks mode. When that
> happens, the whole previous tickless load is ignored and the function
> just records the load for the current tick, ignoring the potentially
> long idle periods behind it.

This is for the one-timer-interrupt-per-second scenario, right?

Regards,
Wanpeng Li

* [tip:sched/core] sched/fair: Gather CPU load functions under a more conventional namespace
  2016-04-13 13:56 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
@ 2016-04-23 12:58   ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2016-04-23 12:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, riel, lcapitulino, paulmck, byungchul.park, peterz,
	fweisbec, mingo, hpa, cl, linux-kernel, efault, cmetcalf

Commit-ID:  cee1afce3053e7aa0793fbd5f2e845fa2cef9e33
Gitweb:     http://git.kernel.org/tip/cee1afce3053e7aa0793fbd5f2e845fa2cef9e33
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 13 Apr 2016 15:56:50 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 23 Apr 2016 14:20:41 +0200

sched/fair: Gather CPU load functions under a more conventional namespace

The CPU load update related functions currently have a weak naming
convention, starting with update_cpu_load_*(), which isn't ideal as
"update" is a very generic concept.

Since two of these functions are public already (and a third is to come)
that's enough to introduce a more conventional naming scheme. So let's
do the following rename instead:

	update_cpu_load_*() -> cpu_load_update_*()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/trace/ftrace.txt | 10 +++++-----
 include/linux/sched.h          |  4 ++--
 kernel/sched/core.c            |  2 +-
 kernel/sched/fair.c            | 24 ++++++++++++------------
 kernel/sched/sched.h           |  4 ++--
 kernel/time/tick-sched.c       |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  <idle>-0       3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   <idle>-0       3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  <idle>-0       3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  <idle>-0       3dN.2   14us : sched_avg_update <-__update_cpu_load
-  <idle>-0       3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  <idle>-0       3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
+  <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 13c1c1d..0b7f602 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06efbb9..c98a268 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2917,7 +2917,7 @@ void scheduler_tick(void)
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
-	update_cpu_load_active(rq);
+	cpu_load_update_active(rq);
 	calc_global_load_tick(rq);
 	raw_spin_unlock(&rq->lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c328bd7..ecd81c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4559,7 +4559,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4594,7 +4594,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates, int active)
 {
 	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4642,7 +4642,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
 				   unsigned long curr_jiffies,
 				   unsigned long load,
 				   int active)
@@ -4657,7 +4657,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__update_cpu_load(this_rq, load, pending_updates, active);
+		__cpu_load_update(this_rq, load, pending_updates, active);
 	}
 }
 
@@ -4678,7 +4678,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
-static void update_cpu_load_idle(struct rq *this_rq)
+static void cpu_load_update_idle(struct rq *this_rq)
 {
 	/*
 	 * bail if there's load or we're actually up-to-date.
@@ -4686,13 +4686,13 @@ static void update_cpu_load_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__update_cpu_load_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
 }
 
 /*
  * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
  */
-void update_cpu_load_nohz(int active)
+void cpu_load_update_nohz(int active)
 {
 	struct rq *this_rq = this_rq();
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
@@ -4702,7 +4702,7 @@ void update_cpu_load_nohz(int active)
 		return;
 
 	raw_spin_lock(&this_rq->lock);
-	__update_cpu_load_nohz(this_rq, curr_jiffies, load, active);
+	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
 	raw_spin_unlock(&this_rq->lock);
 }
 #endif /* CONFIG_NO_HZ */
@@ -4710,14 +4710,14 @@ void update_cpu_load_nohz(int active)
 /*
  * Called from scheduler_tick()
  */
-void update_cpu_load_active(struct rq *this_rq)
+void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
 	/*
-	 * See the mess around update_cpu_load_idle() / update_cpu_load_nohz().
+	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, load, 1, 1);
+	__cpu_load_update(this_rq, load, 1, 1);
 }
 
 /*
@@ -8031,7 +8031,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, rq->next_balance)) {
 			raw_spin_lock_irq(&rq->lock);
 			update_rq_clock(rq);
-			update_cpu_load_idle(rq);
+			cpu_load_update_idle(rq);
 			raw_spin_unlock_irq(&rq->lock);
 			rebalance_domains(rq, CPU_IDLE);
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a7cbad7..32d9e22 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -31,9 +31,9 @@ extern void calc_global_load_tick(struct rq *this_rq);
 extern long calc_load_fold_active(struct rq *this_rq);
 
 #ifdef CONFIG_SMP
-extern void update_cpu_load_active(struct rq *this_rq);
+extern void cpu_load_update_active(struct rq *this_rq);
 #else
-static inline void update_cpu_load_active(struct rq *this_rq) { }
+static inline void cpu_load_update_active(struct rq *this_rq) { }
 #endif
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 58e3310..66bdc9a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -806,7 +806,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	update_cpu_load_nohz(active);
+	cpu_load_update_nohz(active);
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:sched/core] sched/fair: Correctly handle nohz ticks CPU load accounting
  2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
                     ` (2 preceding siblings ...)
  2016-04-20  7:59   ` Wanpeng Li
@ 2016-04-23 12:59   ` tip-bot for Frederic Weisbecker
  3 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2016-04-23 12:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, paulmck, peterz, efault, linux-kernel, cmetcalf,
	lcapitulino, fweisbec, riel, cl, tglx, byungchul.park, hpa

Commit-ID:  1f41906a6fda1114debd3898668bd7ab6470ee41
Gitweb:     http://git.kernel.org/tip/1f41906a6fda1114debd3898668bd7ab6470ee41
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Wed, 13 Apr 2016 15:56:51 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 23 Apr 2016 14:20:42 +0200

sched/fair: Correctly handle nohz ticks CPU load accounting

Ticks can happen while the CPU is in dynticks-idle or dynticks-singletask
mode. In fact "nohz" or "dynticks" only mean that we exit periodic mode
and try to minimize the number of ticks. The nohz subsystem uses
confusing terminology with the internal state "ts->tick_stopped", which
is also exposed through its public interface tick_nohz_tick_stopped().
This is a misnomer: the tick is reduced on a best-effort basis rather
than stopped. In the best case the tick can indeed be stopped entirely,
but there is no guarantee of that. If a timer needs to fire one second
later, a tick will fire while the CPU is in nohz mode, and this is a
very common scenario.

This confusion happens to be a problem for CPU load updates:
cpu_load_update_active() doesn't handle nohz ticks correctly because it
assumes that ticks are completely stopped in nohz mode and that
cpu_load_update_active() can't be called in dynticks mode. When it is
called there anyway, the whole previous tickless load is ignored and
the function just records the load for the current tick, discarding the
potentially long idle period that preceded it.

In order to solve this, we could account the current load for the
previous nohz time, but then we would risk accounting the load of a
freshly enqueued task for the whole nohz period.

So instead, let's record the dynticks load on nohz frame entry so we
know what to record in case of nohz ticks, then use this record to
account the tickless load on nohz ticks and at nohz frame end.
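
To make the new accounting concrete, the following is a minimal
userspace sketch of the tickless-load adjustment described above. The
function names and the naive decay loop are stand-ins invented for this
illustration (the kernel's decay_load_missed() uses precomputed
fixed-point tables instead), so treat it as a model of the math rather
than the actual implementation:

#include <stdio.h>

/* Naive (1 - 1/2^idx)^n decay over n missed updates; illustration only. */
static unsigned long decay_missed(unsigned long load, unsigned long n,
				  int idx)
{
	while (n--)
		load -= load >> idx;
	return load;
}

/*
 * Model of one cpu_load[idx] slot update at the end of a nohz frame:
 * the load snapshot taken on nohz entry (tickless_load) was present for
 * the whole frame, so it must not be decayed away. Decay it separately
 * and re-add it undecayed. Assumes tickless_load <= old_load.
 */
static unsigned long one_slot_update(unsigned long old_load,
				     unsigned long tickless_load,
				     unsigned long pending_updates, int idx)
{
	old_load = decay_missed(old_load, pending_updates - 1, idx);
	if (tickless_load) {
		old_load -= decay_missed(tickless_load, pending_updates - 1, idx);
		old_load += tickless_load; /* constant part survives intact */
	}
	return old_load;
}

int main(void)
{
	/* 4 pending updates on slot 2: 100 decays to 43, but the 40
	 * units of tickless load are preserved: 43 - 18 + 40 = 65. */
	printf("%lu\n", one_slot_update(100, 40, 4, 2));
	return 0;
}

Without the snapshot, the 40 units of load that ran through the whole
frame would have been decayed away like stale history instead of being
carried over, which is precisely the bug described above.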

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1460555812-25375-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h    |  6 ++-
 kernel/sched/fair.c      | 97 +++++++++++++++++++++++++++++++-----------------
 kernel/time/tick-sched.c |  9 +++--
 3 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0b7f602..d894f2d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ecd81c4..b70367a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4563,7 +4563,6 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
- * @active: !0 for NOHZ_FULL
  *
  * Update rq->cpu_load[] statistics. This function is usually called every
  * scheduler tick (TICK_NSEC).
@@ -4592,12 +4591,12 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  *   load[i]_n = (1 - 1/2^i)^n * load[i]_0
  *
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
- * term. See the @active paramter.
+ * term.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates, int active)
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
+			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+	unsigned long tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4642,10 +4641,23 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-				   unsigned long curr_jiffies,
-				   unsigned long load,
-				   int active)
+/*
+ * There is no sane way to deal with nohz on smp when using jiffies because the
+ * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
+ * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
+ *
+ * Therefore we need to avoid the delta approach from the regular tick when
+ * possible since that would seriously skew the load calculation. This is why we
+ * use cpu_load_update_periodic() for CPUs out of nohz. However we'll rely on
+ * jiffies deltas for updates happening while in nohz mode (idle ticks, idle
+ * loop exit, nohz_idle_balance, nohz full exit...)
+ *
+ * This means we might still be one tick off for nohz periods.
+ */
+
+static void cpu_load_update_nohz(struct rq *this_rq,
+				 unsigned long curr_jiffies,
+				 unsigned long load)
 {
 	unsigned long pending_updates;
 
@@ -4657,24 +4669,11 @@ static void __cpu_load_update_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__cpu_load_update(this_rq, load, pending_updates, active);
+		cpu_load_update(this_rq, load, pending_updates);
 	}
 }
 
 /*
- * There is no sane way to deal with nohz on smp when using jiffies because the
- * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
- * causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
- *
- * Therefore we cannot use the delta approach from the regular tick since that
- * would seriously skew the load calculation. However we'll make do for those
- * updates happening while idle (nohz_idle_balance) or coming out of idle
- * (tick_nohz_idle_exit).
- *
- * This means we might still be one tick off for nohz periods.
- */
-
-/*
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
@@ -4686,26 +4685,56 @@ static void cpu_load_update_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0);
 }
 
 /*
- * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ * Record CPU load on nohz entry so we know the tickless load to account
+ * on nohz exit. cpu_load[0] happens then to be updated more frequently
+ * than other cpu_load[idx] but it should be fine as cpu_load readers
+ * shouldn't rely into synchronized cpu_load[*] updates.
  */
-void cpu_load_update_nohz(int active)
+void cpu_load_update_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
+
+	/*
+	 * This is all lockless but should be fine. If weighted_cpuload changes
+	 * concurrently we'll exit nohz. And cpu_load write can race with
+	 * cpu_load_update_idle() but both updater would be writing the same.
+	 */
+	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
+}
+
+/*
+ * Account the tickless load in the end of a nohz frame.
+ */
+void cpu_load_update_nohz_stop(void)
+{
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
-	unsigned long load = active ? weighted_cpuload(cpu_of(this_rq)) : 0;
+	struct rq *this_rq = this_rq();
+	unsigned long load;
 
 	if (curr_jiffies == this_rq->last_load_update_tick)
 		return;
 
+	load = weighted_cpuload(cpu_of(this_rq));
 	raw_spin_lock(&this_rq->lock);
-	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
+	cpu_load_update_nohz(this_rq, curr_jiffies, load);
 	raw_spin_unlock(&this_rq->lock);
 }
-#endif /* CONFIG_NO_HZ */
+#else /* !CONFIG_NO_HZ_COMMON */
+static inline void cpu_load_update_nohz(struct rq *this_rq,
+					unsigned long curr_jiffies,
+					unsigned long load) { }
+#endif /* CONFIG_NO_HZ_COMMON */
+
+static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
+{
+	/* See the mess around cpu_load_update_nohz(). */
+	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+	cpu_load_update(this_rq, load, 1);
+}
 
 /*
  * Called from scheduler_tick()
@@ -4713,11 +4742,11 @@ void cpu_load_update_nohz(int active)
 void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
-	/*
-	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
-	 */
-	this_rq->last_load_update_tick = jiffies;
-	__cpu_load_update(this_rq, load, 1, 1);
+
+	if (tick_nohz_tick_stopped())
+		cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), load);
+	else
+		cpu_load_update_periodic(this_rq, load);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 66bdc9a..31872bc 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -776,6 +776,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_enter_idle();
+		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
@@ -802,11 +803,11 @@ out:
 	return tick;
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int active)
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	cpu_load_update_nohz(active);
+	cpu_load_update_nohz_stop();
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
@@ -833,7 +834,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 #endif
 }
 
@@ -1024,7 +1025,7 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(ts, now);
 
 	if (ts->tick_stopped) {
-		tick_nohz_restart_sched_tick(ts, now, 0);
+		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [tip:sched/core] sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates
  2016-04-19 15:36   ` [PATCH v2] " Frederic Weisbecker
@ 2016-04-23 12:59     ` tip-bot for Frederic Weisbecker
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Frederic Weisbecker @ 2016-04-23 12:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: paulmck, tglx, byungchul.park, mingo, efault, riel, lcapitulino,
	linux-kernel, fweisbec, cmetcalf, hpa, cl, peterz

Commit-ID:  9fd81dd5ce0b12341c9f83346f8d32ac68bd3841
Gitweb:     http://git.kernel.org/tip/9fd81dd5ce0b12341c9f83346f8d32ac68bd3841
Author:     Frederic Weisbecker <fweisbec@gmail.com>
AuthorDate: Tue, 19 Apr 2016 17:36:51 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 23 Apr 2016 14:20:42 +0200

sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates

Some of the CPU load update code only concerns NO_HZ configs but is
built on all configurations. When NO_HZ isn't built, that code is
harmless but needlessly consumes CPU and memory:

1) one useless field in struct rq
2) a jiffies record on every tick that is never used
   (cpu_load_update_periodic)
3) decay_load_missed() is called twice on every tick only to return
   immediately with no action taken, making that function dead code

For pure optimization purposes, let's conditionally build the NO_HZ
related code.
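
As a rough illustration of the pattern this applies, here is a
standalone userspace sketch; the struct, the function and the config
macro merely mimic their kernel counterparts and are not the real
definitions:

#include <stdio.h>

/* Compile with -DCONFIG_NO_HZ_COMMON to model a nohz build. */
struct rq_sketch {
	unsigned long cpu_load[5];
#ifdef CONFIG_NO_HZ_COMMON
	unsigned long last_load_update_tick; /* dropped from !nohz builds */
#endif
};

static void periodic_update(struct rq_sketch *rq, unsigned long now)
{
#ifdef CONFIG_NO_HZ_COMMON
	/* The jiffies record is only read by nohz code, so it gets
	 * compiled out together with it. */
	rq->last_load_update_tick = now;
#else
	(void)now; /* unused without nohz */
#endif
	rq->cpu_load[0]++; /* stand-in for the real load update */
}

int main(void)
{
	struct rq_sketch rq = { .cpu_load = { 0 } };

	periodic_update(&rq, 1000);
	printf("sizeof(struct rq_sketch) = %zu\n", sizeof(rq));
	return 0;
}

Building this twice, with and without -DCONFIG_NO_HZ_COMMON, shows the
struct shrinking by one word and the per-tick store disappearing, which
is the class of savings the patch targets.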

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1461080211-16271-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  | 5 ++---
 kernel/sched/fair.c  | 9 +++++++--
 kernel/sched/sched.h | 6 ++++--
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c98a268..71dffbb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7381,8 +7381,6 @@ void __init sched_init(void)
 		for (j = 0; j < CPU_LOAD_IDX_MAX; j++)
 			rq->cpu_load[j] = 0;
 
-		rq->last_load_update_tick = jiffies;
-
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
 		rq->rd = NULL;
@@ -7401,12 +7399,13 @@ void __init sched_init(void)
 
 		rq_attach_root(rq, &def_root_domain);
 #ifdef CONFIG_NO_HZ_COMMON
+		rq->last_load_update_tick = jiffies;
 		rq->nohz_flags = 0;
 #endif
 #ifdef CONFIG_NO_HZ_FULL
 		rq->last_sched_tick = 0;
 #endif
-#endif
+#endif /* CONFIG_SMP */
 		init_rq_hrtick(rq);
 		atomic_set(&rq->nr_iowait, 0);
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b70367a..b8a33ab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
-
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * per rq 'load' arrray crap; XXX kill this.
  */
@@ -4557,6 +4557,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	}
 	return load;
 }
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
@@ -4596,7 +4597,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			    unsigned long pending_updates)
 {
-	unsigned long tickless_load = this_rq->cpu_load[0];
+	unsigned long __maybe_unused tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4609,6 +4610,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
+#ifdef CONFIG_NO_HZ_COMMON
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		if (tickless_load) {
 			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
@@ -4619,6 +4621,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			 */
 			old_load += tickless_load;
 		}
+#endif
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
@@ -4731,8 +4734,10 @@ static inline void cpu_load_update_nohz(struct rq *this_rq,
 
 static void cpu_load_update_periodic(struct rq *this_rq, unsigned long load)
 {
+#ifdef CONFIG_NO_HZ_COMMON
 	/* See the mess around cpu_load_update_nohz(). */
 	this_rq->last_load_update_tick = READ_ONCE(jiffies);
+#endif
 	cpu_load_update(this_rq, load, 1);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 32d9e22..69da6fc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,11 +585,13 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
-	unsigned long last_load_update_tick;
 #ifdef CONFIG_NO_HZ_COMMON
+#ifdef CONFIG_SMP
+	unsigned long last_load_update_tick;
+#endif /* CONFIG_SMP */
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
-#endif
+#endif /* CONFIG_NO_HZ_COMMON */
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2016-04-13 13:56 [PATCH 0/3] sched: Fix/improve nohz cpu load updates v3 Frederic Weisbecker
2016-04-13 13:56 ` [PATCH 1/3] sched: Gather cpu load functions under a more conventional namespace Frederic Weisbecker
2016-04-23 12:58   ` [tip:sched/core] sched/fair: Gather CPU " tip-bot for Frederic Weisbecker
2016-04-13 13:56 ` [PATCH 2/3] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
2016-04-18  8:22   ` Byungchul Park
2016-04-18  9:17   ` Byungchul Park
2016-04-18 13:35     ` Frederic Weisbecker
2016-04-19  0:01       ` Byungchul Park
2016-04-19 14:01         ` Frederic Weisbecker
2016-04-20  7:59   ` Wanpeng Li
2016-04-23 12:59   ` [tip:sched/core] sched/fair: Correctly handle nohz ticks CPU " tip-bot for Frederic Weisbecker
2016-04-13 13:56 ` [PATCH 3/3] sched: Optimize !CONFIG_NO_HZ_COMMON cpu load updates Frederic Weisbecker
2016-04-18 13:36   ` Frederic Weisbecker
2016-04-19 15:36   ` [PATCH v2] " Frederic Weisbecker
2016-04-23 12:59     ` [tip:sched/core] sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU " tip-bot for Frederic Weisbecker
