* [PATCH 0/4] sched: Fix/improve nohz cpu load updates
@ 2016-04-01 13:23 Frederic Weisbecker
  2016-04-01 13:23 ` [PATCH 1/4] sched: Gather cpu load functions under a common namespace Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-01 13:23 UTC
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Here is another attempt to fix the nohz cpu load accounting after the
first try (https://lwn.net/Articles/671749/), this time taking an
entirely different, probably more "natural", direction.

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/nohz

HEAD: fccff1624c5bf5d91977da587167f52ab4ceed5b

Thanks,
	Frederic
---

Frederic Weisbecker (4):
      sched: Gather cpu load functions under a common namespace
      sched: Correctly handle nohz ticks cpu load accounting
      sched: Optimize tick periodic cpu load updates
      sched: Conditionally build cpu load decay code for nohz


 Documentation/trace/ftrace.txt |  10 ++--
 include/linux/sched.h          |   6 ++-
 kernel/sched/core.c            |   2 +-
 kernel/sched/fair.c            | 117 ++++++++++++++++++++++++++---------------
 kernel/sched/sched.h           |   8 +--
 kernel/time/tick-sched.c       |   9 ++--
 6 files changed, 94 insertions(+), 58 deletions(-)

* [PATCH 1/4] sched: Gather cpu load functions under a common namespace
  2016-04-01 13:23 [PATCH 0/4] sched: Fix/improve nohz cpu load updates Frederic Weisbecker
@ 2016-04-01 13:23 ` Frederic Weisbecker
  2016-04-02  7:09   ` Peter Zijlstra
  2016-04-01 13:23 ` [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-01 13:23 UTC
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

This way they are easily grep'able and recognized.

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 Documentation/trace/ftrace.txt | 10 +++++-----
 include/linux/sched.h          |  4 ++--
 kernel/sched/core.c            |  2 +-
 kernel/sched/fair.c            | 24 ++++++++++++------------
 kernel/sched/sched.h           |  4 ++--
 kernel/time/tick-sched.c       |  2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297..9857606 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1562,12 +1562,12 @@ Doing the same with chrt -r 5 and function-trace set.
   <idle>-0       3dN.1   12us : menu_hrtimer_cancel <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : ktime_get <-tick_nohz_idle_exit
   <idle>-0       3dN.1   12us : tick_do_update_jiffies64 <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : update_cpu_load_nohz <-tick_nohz_idle_exit
-  <idle>-0       3dN.1   13us : _raw_spin_lock <-update_cpu_load_nohz
+  <idle>-0       3dN.1   13us : cpu_load_update_nohz <-tick_nohz_idle_exit
+  <idle>-0       3dN.1   13us : _raw_spin_lock <-cpu_load_update_nohz
   <idle>-0       3dN.1   13us : add_preempt_count <-_raw_spin_lock
-  <idle>-0       3dN.2   13us : __update_cpu_load <-update_cpu_load_nohz
-  <idle>-0       3dN.2   14us : sched_avg_update <-__update_cpu_load
-  <idle>-0       3dN.2   14us : _raw_spin_unlock <-update_cpu_load_nohz
+  <idle>-0       3dN.2   13us : __cpu_load_update <-cpu_load_update_nohz
+  <idle>-0       3dN.2   14us : sched_avg_update <-__cpu_load_update
+  <idle>-0       3dN.2   14us : _raw_spin_unlock <-cpu_load_update_nohz
   <idle>-0       3dN.2   14us : sub_preempt_count <-_raw_spin_unlock
   <idle>-0       3dN.1   15us : calc_load_exit_idle <-tick_nohz_idle_exit
   <idle>-0       3dN.1   15us : touch_softlockup_watchdog <-tick_nohz_idle_exit
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 60bba7e..86adc0e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,9 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void update_cpu_load_nohz(int active);
+extern void cpu_load_update_nohz(int active);
 #else
-static inline void update_cpu_load_nohz(int active) { }
+static inline void cpu_load_update_nohz(int active) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d8465ee..e507329 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2897,7 +2897,7 @@ void scheduler_tick(void)
 	raw_spin_lock(&rq->lock);
 	update_rq_clock(rq);
 	curr->sched_class->task_tick(rq, curr, 0);
-	update_cpu_load_active(rq);
+	cpu_load_update_active(rq);
 	calc_global_load_tick(rq);
 	raw_spin_unlock(&rq->lock);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e66..f33764d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4491,7 +4491,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 }
 
 /**
- * __update_cpu_load - update the rq->cpu_load[] statistics
+ * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
  * @this_load: The current load
  * @pending_updates: The number of missed updates
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
+static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates, int active)
 {
 	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
@@ -4574,7 +4574,7 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-static void __update_cpu_load_nohz(struct rq *this_rq,
+static void __cpu_load_update_nohz(struct rq *this_rq,
 				   unsigned long curr_jiffies,
 				   unsigned long load,
 				   int active)
@@ -4589,7 +4589,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__update_cpu_load(this_rq, load, pending_updates, active);
+		__cpu_load_update(this_rq, load, pending_updates, active);
 	}
 }
 
@@ -4610,7 +4610,7 @@ static void __update_cpu_load_nohz(struct rq *this_rq,
  * Called from nohz_idle_balance() to update the load ratings before doing the
  * idle balance.
  */
-static void update_cpu_load_idle(struct rq *this_rq)
+static void cpu_load_update_idle(struct rq *this_rq)
 {
 	/*
 	 * bail if there's load or we're actually up-to-date.
@@ -4618,13 +4618,13 @@ static void update_cpu_load_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__update_cpu_load_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
 }
 
 /*
  * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
  */
-void update_cpu_load_nohz(int active)
+void cpu_load_update_nohz(int active)
 {
 	struct rq *this_rq = this_rq();
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
@@ -4634,7 +4634,7 @@ void update_cpu_load_nohz(int active)
 		return;
 
 	raw_spin_lock(&this_rq->lock);
-	__update_cpu_load_nohz(this_rq, curr_jiffies, load, active);
+	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
 	raw_spin_unlock(&this_rq->lock);
 }
 #endif /* CONFIG_NO_HZ */
@@ -4642,14 +4642,14 @@ void update_cpu_load_nohz(int active)
 /*
  * Called from scheduler_tick()
  */
-void update_cpu_load_active(struct rq *this_rq)
+void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
 	/*
-	 * See the mess around update_cpu_load_idle() / update_cpu_load_nohz().
+	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
 	 */
 	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, load, 1, 1);
+	__cpu_load_update(this_rq, load, 1, 1);
 }
 
 /*
@@ -7957,7 +7957,7 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, rq->next_balance)) {
 			raw_spin_lock_irq(&rq->lock);
 			update_rq_clock(rq);
-			update_cpu_load_idle(rq);
+			cpu_load_update_idle(rq);
 			raw_spin_unlock_irq(&rq->lock);
 			rebalance_domains(rq, CPU_IDLE);
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..1802013 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -31,9 +31,9 @@ extern void calc_global_load_tick(struct rq *this_rq);
 extern long calc_load_fold_active(struct rq *this_rq);
 
 #ifdef CONFIG_SMP
-extern void update_cpu_load_active(struct rq *this_rq);
+extern void cpu_load_update_active(struct rq *this_rq);
 #else
-static inline void update_cpu_load_active(struct rq *this_rq) { }
+static inline void cpu_load_update_active(struct rq *this_rq) { }
 #endif
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 084b79f..d62eb77 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -807,7 +807,7 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	update_cpu_load_nohz(active);
+	cpu_load_update_nohz(active);
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
-- 
2.7.0

* [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-01 13:23 [PATCH 0/4] sched: Fix/improve nohz cpu load updates Frederic Weisbecker
  2016-04-01 13:23 ` [PATCH 1/4] sched: Gather cpu load functions under a common namespace Frederic Weisbecker
@ 2016-04-01 13:23 ` Frederic Weisbecker
  2016-04-02  7:15   ` Peter Zijlstra
  2016-04-01 13:23 ` [PATCH 3/4] sched: Optimize tick periodic cpu load updates Frederic Weisbecker
  2016-04-01 13:23 ` [PATCH 4/4] sched: Conditionally build cpu load decay code for nohz Frederic Weisbecker
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-01 13:23 UTC
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Ticks can happen in the middle of a nohz frame and
cpu_load_update_active() doesn't handle these correctly. It forgets the
whole previous tickless load and just records the current tick, ignoring
potentially long idle periods.

In order to solve this, record the load on nohz frame entry so we know
what to record in case of nohz interruptions, then use this recorded load
to account the tickless load on nohz ticks and nohz frame end.
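
To illustrate, the intended sequence around a nohz frame looks roughly
like this (a simplified call sketch matching the hunks below; locking
and the idle vs. NOHZ_FULL distinction omitted):

    tick_nohz_stop_sched_tick()
        cpu_load_update_nohz_start()   /* snapshot load in cpu_load[0] */

    /* ... tickless frame; a tick may still fire in the middle ... */
    scheduler_tick()
        cpu_load_update_active()       /* decays against the snapshot */

    tick_nohz_restart_sched_tick()
        cpu_load_update_nohz_stop()    /* folds remaining missed updates */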

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 include/linux/sched.h    |  6 +++--
 kernel/sched/fair.c      | 63 +++++++++++++++++++++++++++++-------------------
 kernel/time/tick-sched.c |  9 ++++---
 3 files changed, 47 insertions(+), 31 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 86adc0e..6f9415a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -178,9 +178,11 @@ extern void get_iowait_load(unsigned long *nr_waiters, unsigned long *load);
 extern void calc_global_load(unsigned long ticks);
 
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
-extern void cpu_load_update_nohz(int active);
+extern void cpu_load_update_nohz_start(void);
+extern void cpu_load_update_nohz_stop(void);
 #else
-static inline void cpu_load_update_nohz(int active) { }
+static inline void cpu_load_update_nohz_start(void) { }
+static inline void cpu_load_update_nohz_stop(void) { }
 #endif
 
 extern void dump_cpu_task(int cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f33764d..394f008 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4527,9 +4527,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * term. See the @active paramter.
  */
 static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates, int active)
+			      unsigned long pending_updates)
 {
-	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
+	unsigned long tickless_load = this_rq->cpu_load[0];
 	int i, scale;
 
 	this_rq->nr_load_updates++;
@@ -4567,17 +4567,9 @@ static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 	sched_avg_update(this_rq);
 }
 
-/* Used instead of source_load when we know the type == 0 */
-static unsigned long weighted_cpuload(const int cpu)
-{
-	return cfs_rq_runnable_load_avg(&cpu_rq(cpu)->cfs);
-}
-
-#ifdef CONFIG_NO_HZ_COMMON
-static void __cpu_load_update_nohz(struct rq *this_rq,
-				   unsigned long curr_jiffies,
-				   unsigned long load,
-				   int active)
+static void cpu_load_update(struct rq *this_rq,
+			    unsigned long curr_jiffies,
+			    unsigned long load)
 {
 	unsigned long pending_updates;
 
@@ -4589,10 +4581,17 @@ static void __cpu_load_update_nohz(struct rq *this_rq,
 		 * In the NOHZ_FULL case, we were non-idle, we should consider
 		 * its weighted load.
 		 */
-		__cpu_load_update(this_rq, load, pending_updates, active);
+		__cpu_load_update(this_rq, load, pending_updates);
 	}
 }
 
+/* Used instead of source_load when we know the type == 0 */
+static unsigned long weighted_cpuload(const int cpu)
+{
+	return cfs_rq_runnable_load_avg(&cpu_rq(cpu)->cfs);
+}
+
+#ifdef CONFIG_NO_HZ_COMMON
 /*
  * There is no sane way to deal with nohz on smp when using jiffies because the
  * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
@@ -4618,26 +4617,43 @@ static void cpu_load_update_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	__cpu_load_update_nohz(this_rq, READ_ONCE(jiffies), 0, 0);
+	cpu_load_update(this_rq, READ_ONCE(jiffies), 0);
 }
 
 /*
- * Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
+ * Record CPU load on nohz entry so we know the tickless load to account
+ * on nohz exit.
  */
-void cpu_load_update_nohz(int active)
+void cpu_load_update_nohz_start(void)
 {
 	struct rq *this_rq = this_rq();
+
+	/*
+	 * This is all lockless but should be fine. If weighted_cpuload changes
+	 * concurrently we'll exit nohz. And cpu_load write can race with
+	 * cpu_load_update_idle() but both updaters would be writing the same.
+	 */
+	this_rq->cpu_load[0] = weighted_cpuload(cpu_of(this_rq));
+}
+
+/*
+ * Account the tickless load in the end of a nohz frame.
+ */
+void cpu_load_update_nohz_stop(void)
+{
 	unsigned long curr_jiffies = READ_ONCE(jiffies);
-	unsigned long load = active ? weighted_cpuload(cpu_of(this_rq)) : 0;
+	struct rq *this_rq = this_rq();
+	unsigned long load;
 
 	if (curr_jiffies == this_rq->last_load_update_tick)
 		return;
 
+	load = weighted_cpuload(cpu_of(this_rq));
 	raw_spin_lock(&this_rq->lock);
-	__cpu_load_update_nohz(this_rq, curr_jiffies, load, active);
+	cpu_load_update(this_rq, curr_jiffies, load);
 	raw_spin_unlock(&this_rq->lock);
 }
-#endif /* CONFIG_NO_HZ */
+#endif /* CONFIG_NO_HZ_COMMON */
 
 /*
  * Called from scheduler_tick()
@@ -4645,11 +4661,8 @@ void cpu_load_update_nohz(int active)
 void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
-	/*
-	 * See the mess around cpu_load_update_idle() / cpu_load_update_nohz().
-	 */
-	this_rq->last_load_update_tick = jiffies;
-	__cpu_load_update(this_rq, load, 1, 1);
+
+	cpu_load_update(this_rq, READ_ONCE(jiffies), load);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index d62eb77..342110f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -777,6 +777,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	if (!ts->tick_stopped) {
 		nohz_balance_enter_idle(cpu);
 		calc_load_enter_idle();
+		cpu_load_update_nohz_start();
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
@@ -803,11 +804,11 @@ out:
 	return tick;
 }
 
-static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now, int active)
+static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
 	tick_do_update_jiffies64(now);
-	cpu_load_update_nohz(active);
+	cpu_load_update_nohz_stop();
 
 	calc_load_exit_idle();
 	touch_softlockup_watchdog_sched();
@@ -834,7 +835,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
-		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
+		tick_nohz_restart_sched_tick(ts, ktime_get());
 #endif
 }
 
@@ -1025,7 +1026,7 @@ void tick_nohz_idle_exit(void)
 		tick_nohz_stop_idle(ts, now);
 
 	if (ts->tick_stopped) {
-		tick_nohz_restart_sched_tick(ts, now, 0);
+		tick_nohz_restart_sched_tick(ts, now);
 		tick_nohz_account_idle_ticks(ts);
 	}
 
-- 
2.7.0

* [PATCH 3/4] sched: Optimize tick periodic cpu load updates
  2016-04-01 13:23 [PATCH 0/4] sched: Fix/improve nohz cpu load updates Frederic Weisbecker
  2016-04-01 13:23 ` [PATCH 1/4] sched: Gather cpu load functions under a common namespace Frederic Weisbecker
  2016-04-01 13:23 ` [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-01 13:23 ` Frederic Weisbecker
  2016-04-02  7:23   ` Peter Zijlstra
  2016-04-01 13:23 ` [PATCH 4/4] sched: Conditionally build cpu load decay code for nohz Frederic Weisbecker
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-01 13:23 UTC
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

Don't bother with the whole pending tickless cpu load machinery if
we run a tick periodic kernel. That's less work for the CPU on each tick.
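
Concretely, with the cpu_load_pending() stub below always returning 1
on !CONFIG_NO_HZ_COMMON kernels, the tick path reduces to something
like this (an illustrative sketch of the compiled result, not an
additional change):

    void cpu_load_update_active(struct rq *this_rq)
    {
            unsigned long load = weighted_cpuload(cpu_of(this_rq));

            /* pending_updates == 1: no missed-tick decay to replay */
            cpu_load_update(this_rq, load, 1);
    }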

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/fair.c  | 47 +++++++++++++++++++++++------------------------
 kernel/sched/sched.h |  4 +++-
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 394f008..1bb053e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4526,7 +4526,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * see decay_load_misses(). For NOHZ_FULL we get to subtract and add the extra
  * term. See the @active paramter.
  */
-static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
+static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 			      unsigned long pending_updates)
 {
 	unsigned long tickless_load = this_rq->cpu_load[0];
@@ -4567,24 +4567,6 @@ static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
 	sched_avg_update(this_rq);
 }
 
-static void cpu_load_update(struct rq *this_rq,
-			    unsigned long curr_jiffies,
-			    unsigned long load)
-{
-	unsigned long pending_updates;
-
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	if (pending_updates) {
-		this_rq->last_load_update_tick = curr_jiffies;
-		/*
-		 * In the regular NOHZ case, we were idle, this means load 0.
-		 * In the NOHZ_FULL case, we were non-idle, we should consider
-		 * its weighted load.
-		 */
-		__cpu_load_update(this_rq, load, pending_updates);
-	}
-}
-
 /* Used instead of source_load when we know the type == 0 */
 static unsigned long weighted_cpuload(const int cpu)
 {
@@ -4592,6 +4574,18 @@ static unsigned long weighted_cpuload(const int cpu)
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
+static unsigned long cpu_load_pending(struct rq *this_rq)
+{
+	unsigned long curr_jiffies = READ_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	if (pending_updates)
+		this_rq->last_load_update_tick = curr_jiffies;
+
+	return pending_updates;
+}
+
 /*
  * There is no sane way to deal with nohz on smp when using jiffies because the
  * cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
@@ -4617,7 +4611,7 @@ static void cpu_load_update_idle(struct rq *this_rq)
 	if (weighted_cpuload(cpu_of(this_rq)))
 		return;
 
-	cpu_load_update(this_rq, READ_ONCE(jiffies), 0);
+	cpu_load_update(this_rq, 0, cpu_load_pending(this_rq));
 }
 
 /*
@@ -4641,18 +4635,23 @@ void cpu_load_update_nohz_start(void)
  */
 void cpu_load_update_nohz_stop(void)
 {
-	unsigned long curr_jiffies = READ_ONCE(jiffies);
 	struct rq *this_rq = this_rq();
 	unsigned long load;
 
-	if (curr_jiffies == this_rq->last_load_update_tick)
+	if (jiffies == this_rq->last_load_update_tick)
 		return;
 
 	load = weighted_cpuload(cpu_of(this_rq));
+
 	raw_spin_lock(&this_rq->lock);
-	cpu_load_update(this_rq, curr_jiffies, load);
+	cpu_load_update(this_rq, load, cpu_load_pending(this_rq));
 	raw_spin_unlock(&this_rq->lock);
 }
+#else /* !CONFIG_NO_HZ_COMMON */
+static inline unsigned long cpu_load_pending(struct rq *this_rq)
+{
+	return 1;
+}
 #endif /* CONFIG_NO_HZ_COMMON */
 
 /*
@@ -4662,7 +4661,7 @@ void cpu_load_update_active(struct rq *this_rq)
 {
 	unsigned long load = weighted_cpuload(cpu_of(this_rq));
 
-	cpu_load_update(this_rq, READ_ONCE(jiffies), load);
+	cpu_load_update(this_rq, load, cpu_load_pending(this_rq));
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1802013..d951701 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -585,8 +585,10 @@ struct rq {
 #endif
 	#define CPU_LOAD_IDX_MAX 5
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
+#ifdef CONFIG_NO_HZ_COMMON
+# ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
-#ifdef CONFIG_NO_HZ_COMMON
+# endif
 	u64 nohz_stamp;
 	unsigned long nohz_flags;
 #endif
-- 
2.7.0

* [PATCH 4/4] sched: Conditionally build cpu load decay code for nohz
  2016-04-01 13:23 [PATCH 0/4] sched: Fix/improve nohz cpu load updates Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2016-04-01 13:23 ` [PATCH 3/4] sched: Optimize tick periodic cpu load updates Frederic Weisbecker
@ 2016-04-01 13:23 ` Frederic Weisbecker
  2016-04-02  7:23   ` Peter Zijlstra
  3 siblings, 1 reply; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-01 13:23 UTC
  To: Peter Zijlstra
  Cc: LKML, Frederic Weisbecker, Byungchul Park, Chris Metcalf,
	Thomas Gleixner, Luiz Capitulino, Christoph Lameter,
	Paul E . McKenney, Mike Galbraith, Rik van Riel, Ingo Molnar

This completes the tick periodic kernel optimizations.
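
For reference, the decay logic being made conditional behaves roughly
as in the standalone model below. This is an illustration, not kernel
code: decay_missed() stands in for the table-driven decay_load_missed()
and applies the (2^idx - 1) / 2^idx per-tick decay it approximates, and
the tickless_load correction mirrors the comment preserved in
cpu_load_update_missed() in the diff:

    #include <stdio.h>

    /* One decay step per missed tick: load *= (2^idx - 1) / 2^idx */
    static unsigned long decay_missed(unsigned long load,
                                      unsigned long missed, int idx)
    {
            while (missed--)
                    load -= load >> idx;
            return load;
    }

    int main(void)
    {
            unsigned long old_load = 1024, tickless_load = 512;
            unsigned long pending = 5;    /* jiffies since last update */
            int idx = 2;

            /* Decay the stale load across the missed ticks, then
             * re-inject the load that actually ran tickless so it is
             * not accounted as idle. */
            unsigned long load = decay_missed(old_load, pending - 1, idx);
            load -= decay_missed(tickless_load, pending - 1, idx);
            load += tickless_load;

            printf("cpu_load[%d] folds to %lu\n", idx, load);
            return 0;
    }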

Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/fair.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1bb053e..0bb872e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4423,6 +4423,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 }
 
 #ifdef CONFIG_SMP
+#ifdef CONFIG_NO_HZ_COMMON
 
 /*
  * per rq 'load' arrray crap; XXX kill this.
@@ -4490,6 +4491,33 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	return load;
 }
 
+static unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+		       unsigned long pending_updates, int idx)
+{
+	old_load = decay_load_missed(old_load, pending_updates - 1, idx);
+	if (tickless_load) {
+		old_load -= decay_load_missed(tickless_load, pending_updates - 1, idx);
+		/*
+		 * old_load can never be a negative value because a
+		 * decayed tickless_load cannot be greater than the
+		 * original tickless_load.
+		 */
+		old_load += tickless_load;
+	}
+	return old_load;
+}
+#else /* !CONFIG_NO_HZ_COMMON */
+
+static inline unsigned long
+cpu_load_update_missed(unsigned long old_load, unsigned long tickless_load,
+		       unsigned long pending_updates, int idx)
+{
+	return old_load;
+}
+
+#endif /* CONFIG_NO_HZ_COMMON */
+
 /**
  * __cpu_load_update - update the rq->cpu_load[] statistics
  * @this_rq: The rq to update statistics for
@@ -4541,17 +4569,8 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
-		old_load = this_rq->cpu_load[i];
-		old_load = decay_load_missed(old_load, pending_updates - 1, i);
-		if (tickless_load) {
-			old_load -= decay_load_missed(tickless_load, pending_updates - 1, i);
-			/*
-			 * old_load can never be a negative value because a
-			 * decayed tickless_load cannot be greater than the
-			 * original tickless_load.
-			 */
-			old_load += tickless_load;
-		}
+		old_load = cpu_load_update_missed(this_rq->cpu_load[i],
+						  tickless_load, pending_updates, i);
 		new_load = this_load;
 		/*
 		 * Round up the averaging division if load is increasing. This
-- 
2.7.0

* Re: [PATCH 1/4] sched: Gather cpu load functions under a common namespace
  2016-04-01 13:23 ` [PATCH 1/4] sched: Gather cpu load functions under a common namespace Frederic Weisbecker
@ 2016-04-02  7:09   ` Peter Zijlstra
  2016-04-02 12:28     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-02  7:09 UTC
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 01, 2016 at 03:23:04PM +0200, Frederic Weisbecker wrote:
> This way they are easily grep'able and recognized.

Please mention the actual renames done and the new naming scheme.

* Re: [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-01 13:23 ` [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting Frederic Weisbecker
@ 2016-04-02  7:15   ` Peter Zijlstra
  2016-04-02 12:32     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-02  7:15 UTC
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 01, 2016 at 03:23:05PM +0200, Frederic Weisbecker wrote:
> Ticks can happen in the middle of a nohz frame and

I'm still miffed with that.. And this changelog doesn't even explain why
and how.

> cpu_load_update_active() doesn't handle these correctly. It forgets the
> whole previous tickless load and just records the current tick, ignoring
> potentially long idle periods.
> 
> In order to solve this, record the load on nohz frame entry so we know
> what to record in case of nohz interruptions, then use this recorded load
> to account the tickless load on nohz ticks and nohz frame end.


> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f33764d..394f008 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4527,9 +4527,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
>   * term. See the @active paramter.

^^^^^^^^^^^^^^

What active parameter... you need to update that comment.

>   */
>  static void __cpu_load_update(struct rq *this_rq, unsigned long this_load,
> -			      unsigned long pending_updates, int active)
> +			      unsigned long pending_updates)
>  {
> -	unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;
> +	unsigned long tickless_load = this_rq->cpu_load[0];
>  	int i, scale;
>  
>  	this_rq->nr_load_updates++;

* Re: [PATCH 3/4] sched: Optimize tick periodic cpu load updates
  2016-04-01 13:23 ` [PATCH 3/4] sched: Optimize tick periodic cpu load updates Frederic Weisbecker
@ 2016-04-02  7:23   ` Peter Zijlstra
  2016-04-02 12:38     ` Frederic Weisbecker
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-02  7:23 UTC
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 01, 2016 at 03:23:06PM +0200, Frederic Weisbecker wrote:
> Don't bother with the whole pending tickless cpu load machinery if
> we run a tick periodic kernel. That's less work for the CPU on each tick.

Again, the changelog really could use help. Is this a pure optimization
patch? If so, do you have numbers?

> +++ b/kernel/sched/sched.h
> @@ -585,8 +585,10 @@ struct rq {
>  #endif
>  	#define CPU_LOAD_IDX_MAX 5
>  	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
> +#ifdef CONFIG_NO_HZ_COMMON
> +# ifdef CONFIG_SMP

I'm not a fan of this #ifdef indenting and nothing near there uses this
style, so please don't introduce it here.

>  	unsigned long last_load_update_tick;
> -#ifdef CONFIG_NO_HZ_COMMON
> +# endif
>  	u64 nohz_stamp;
>  	unsigned long nohz_flags;
>  #endif
> -- 
> 2.7.0
> 

* Re: [PATCH 4/4] sched: Conditionally build cpu load decay code for nohz
  2016-04-01 13:23 ` [PATCH 4/4] sched: Conditionally build cpu load decay code for nohz Frederic Weisbecker
@ 2016-04-02  7:23   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-04-02  7:23 UTC
  To: Frederic Weisbecker
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Fri, Apr 01, 2016 at 03:23:07PM +0200, Frederic Weisbecker wrote:
> This completes the tick periodic kernel optimizations.

-ENOCHANGELOG

* Re: [PATCH 1/4] sched: Gather cpu load functions under a common namespace
  2016-04-02  7:09   ` Peter Zijlstra
@ 2016-04-02 12:28     ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-02 12:28 UTC
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Sat, Apr 02, 2016 at 09:09:08AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 01, 2016 at 03:23:04PM +0200, Frederic Weisbecker wrote:
> > This way they are easily grep'able and recognized.
> 
> Please mention the actual renames done and the new naming scheme.

Right, I'll add more details.

Thanks.

* Re: [PATCH 2/4] sched: Correctly handle nohz ticks cpu load accounting
  2016-04-02  7:15   ` Peter Zijlstra
@ 2016-04-02 12:32     ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-02 12:32 UTC
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Sat, Apr 02, 2016 at 09:15:20AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 01, 2016 at 03:23:05PM +0200, Frederic Weisbecker wrote:
> > Ticks can happen in the middle of a nohz frame and
> 
> I'm still miffed with that.. And this changelog doesn't even explain why
> and how.

Indeed, it looks like I've cooked sloppy changelogs in this series; I'll
do another pass on all of them.

> 
> > cpu_load_update_active() doesn't handle these correctly. It forgets the
> > whole previous tickless load and just records the current tick, ignoring
> > potentially long idle periods.
> > 
> > In order to solve this, record the load on nohz frame entry so we know
> > what to record in case of nohz interruptions, then use this recorded load
> > to account the tickless load on nohz ticks and nohz frame end.
> 
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index f33764d..394f008 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4527,9 +4527,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
> >   * term. See the @active paramter.
> 
> ^^^^^^^^^^^^^^
> 
> What active parameter... you need to update that comment.

Yeah, forgot that.

Thanks.

* Re: [PATCH 3/4] sched: Optimize tick periodic cpu load updates
  2016-04-02  7:23   ` Peter Zijlstra
@ 2016-04-02 12:38     ` Frederic Weisbecker
  0 siblings, 0 replies; 12+ messages in thread
From: Frederic Weisbecker @ 2016-04-02 12:38 UTC
  To: Peter Zijlstra
  Cc: LKML, Byungchul Park, Chris Metcalf, Thomas Gleixner,
	Luiz Capitulino, Christoph Lameter, Paul E . McKenney,
	Mike Galbraith, Rik van Riel, Ingo Molnar

On Sat, Apr 02, 2016 at 09:23:37AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 01, 2016 at 03:23:06PM +0200, Frederic Weisbecker wrote:
> > Don't bother with the whole pending tickless cpu load machinery if
> > we run a tick periodic kernel. That's less work for the CPU on each tick.
> 
> Again, the changelog really could use help. Is this a pure optimization
> patch? If so, do you have numbers?

Well, until now we have always tried to keep the nohz code under ifdefs,
both for optimization and for kernel size. I haven't measured it though;
I guess the gain is hardly visible.

> 
> > +++ b/kernel/sched/sched.h
> > @@ -585,8 +585,10 @@ struct rq {
> >  #endif
> >  	#define CPU_LOAD_IDX_MAX 5
> >  	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
> > +#ifdef CONFIG_NO_HZ_COMMON
> > +# ifdef CONFIG_SMP
> 
> I'm not a fan of this #ifdef indenting and nothing near there uses this
> style, so please don't introduce it here.

Ok.

Thanks.

> 
> >  	unsigned long last_load_update_tick;
> > -#ifdef CONFIG_NO_HZ_COMMON
> > +# endif
> >  	u64 nohz_stamp;
> >  	unsigned long nohz_flags;
> >  #endif
> > -- 
> > 2.7.0
> > 
