* [RFC PATCH 0/4] sched: Disabled LB_BIAS with full dynticks
@ 2013-06-20 20:45 Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 1/4] sched: Disable lb_bias feature for " Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-20 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot


Hi,

So I've been looking at update_cpu_load_active(),
update_idle_cpu_load() and update_cpu_load_nohz(), and at first
concluded that dynticks is well handled by decay_load_missed(),
which catches up with the missed nohz load.

However, decay_load_missed() seems to always assume that the nohz
load was zero. So it works well for dynticks idle, but it doesn't
seem to fit full dynticks. (My understanding of the maths involved
in decay_load_missed() might be wrong though, so correct me if
I brainfarted).
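
To make the concern concrete, here is a standalone sketch (illustration
only, this is not the kernel code: decay_load_missed() uses precomputed
lookup tables rather than this naive loop) of the per-index decay the
tick normally applies, and of a catch-up that folds in a zero sample for
every missed tick:

#include <stdio.h>

/* Per-tick update for index i: new = (old * (2^i - 1) + sample) >> i */
static unsigned long decay_one_tick(unsigned long old, unsigned long sample, int i)
{
	unsigned long scale = 1UL << i;

	return (old * (scale - 1) + sample) >> i;
}

/*
 * What the catch-up approximates: fold in a zero sample for each missed
 * tick. Correct when the CPU was idle the whole time, but it decays the
 * load of a full dynticks CPU that was actually busy.
 */
static unsigned long catch_up_assuming_idle(unsigned long load,
					    unsigned long missed, int i)
{
	while (missed--)
		load = decay_one_tick(load, 0, i);

	return load;
}

int main(void)
{
	/* Busy full dynticks CPU with load 2048 missing 10 ticks at idx 1 */
	printf("%lu\n", catch_up_assuming_idle(2048, 10, 1)); /* prints 2 */

	return 0;
}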

Moreover, even if we had a mathematical solution to catch up with the
full dynticks load the way decay_load_missed() does for dynticks idle,
we would need to be able to perform the update from remote readers such
as source_load() or target_load(), which is a problem of its own.

So this patchset proposes a simple solution as a start: disable the
LB_BIAS sched feature in full dynticks mode. LB_BIAS is the only user
of these decayed cpu load stats.

The LB_BIAS sched feature is quite obfuscated though, so I'm not sure
about all the implications of disabling it. It seems to be about the
selection of CPU targets: the various enqueuing places each observe the
target load with a different "timeslice zooming" weight, and that weight
influences the CPU selection.
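
Roughly, with LB_BIAS the balancer uses conservative estimates: a low
guess for a migration source and a high guess for a target, obtained by
comparing the instantaneous runqueue load against one of the decayed
indexes. A standalone sketch of what source_load()/target_load() in
patch 1 do (illustration only, simplified; without LB_BIAS both just
return the instantaneous load):

static unsigned long biased_source_load(unsigned long inst, unsigned long decayed)
{
	/* low guess: don't overestimate the load we could pull away */
	return inst < decayed ? inst : decayed;
}

static unsigned long biased_target_load(unsigned long inst, unsigned long decayed)
{
	/* high guess: don't underestimate how busy the target already is */
	return inst > decayed ? inst : decayed;
}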

Maybe for HPC workloads that run only one task per CPU it doesn't
matter much. But other workloads may suffer from it, especially once
full dynticks becomes usable with more than a single task competing
on a CPU.

Anyway, at least this patchset can help start a discussion.

Those who want to play can fetch from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	sched/core


Thanks,
	Frederic
---

Frederic Weisbecker (4):
      sched: Disable lb_bias feature for full dynticks
      sched: Consolidate nohz cpu load prelude code
      sched: Conditionally build decaying cpu load stats
      sched: Consolidate open coded preemptible() checks


 kernel/context_tracking.c |    3 +-
 kernel/sched/core.c       |    4 +--
 kernel/sched/fair.c       |   13 ++++++-
 kernel/sched/features.h   |    3 ++
 kernel/sched/proc.c       |   75 ++++++++++++++++++++++-----------------------
 kernel/sched/sched.h      |    4 ++
 6 files changed, 57 insertions(+), 45 deletions(-)


* [RFC PATCH 1/4] sched: Disable lb_bias feature for full dynticks
  2013-06-20 20:45 [RFC PATCH 0/4] sched: Disabled LB_BIAS with full dynticks Frederic Weisbecker
@ 2013-06-20 20:45 ` Frederic Weisbecker
  2013-06-20 21:01   ` Paul E. McKenney
  2013-06-20 20:45 ` [RFC PATCH 2/4] sched: Consolidate nohz cpu load prelude code Frederic Weisbecker
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-20 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot

If we run in full dynticks mode, we currently have no way to
correctly update the secondary decaying indexes of the CPU load
stats, as they are normally maintained by update_cpu_load_active()
at each tick.

We have an existing infrastructure that handles tickless loads
(cf. decay_load_missed()), but it seems to only work for idle
tickless loads, which only applies if the CPU hasn't run anything
but the idle task during the tickless timeslice.

Until we can provide a sane mathematical solution to handle full
dynticks loads, let's simply deactivate the LB_BIAS sched feature
when CONFIG_NO_HZ_FULL is enabled, as it is currently the only user
of the decayed load records.

The first load index, which represents the current runqueue load
weight, is still maintained and usable.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c     |   13 +++++++++++--
 kernel/sched/features.h |    3 +++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c0ac2c3..2e8df6f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2937,6 +2937,15 @@ static unsigned long weighted_cpuload(const int cpu)
 	return cpu_rq(cpu)->load.weight;
 }
 
+static inline int sched_lb_bias(void)
+{
+#ifndef CONFIG_NO_HZ_FULL
+	return sched_feat(LB_BIAS);
+#else
+	return 0;
+#endif
+}
+
 /*
  * Return a low guess at the load of a migration-source cpu weighted
  * according to the scheduling class and "nice" value.
@@ -2949,7 +2958,7 @@ static unsigned long source_load(int cpu, int type)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (type == 0 || !sched_lb_bias())
 		return total;
 
 	return min(rq->cpu_load[type-1], total);
@@ -2964,7 +2973,7 @@ static unsigned long target_load(int cpu, int type)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long total = weighted_cpuload(cpu);
 
-	if (type == 0 || !sched_feat(LB_BIAS))
+	if (type == 0 || !sched_lb_bias())
 		return total;
 
 	return max(rq->cpu_load[type-1], total);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 99399f8..635f902 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -43,7 +43,10 @@ SCHED_FEAT(ARCH_POWER, true)
 
 SCHED_FEAT(HRTICK, false)
 SCHED_FEAT(DOUBLE_TICK, false)
+
+#ifndef CONFIG_NO_HZ_FULL
 SCHED_FEAT(LB_BIAS, true)
+#endif
 
 /*
  * Decrement CPU power based on time not spent running tasks
-- 
1.7.5.4



* [RFC PATCH 2/4] sched: Consolidate nohz cpu load prelude code
  2013-06-20 20:45 [RFC PATCH 0/4] sched: Disabled LB_BIAS with full dynticks Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 1/4] sched: Disable lb_bias feature for " Frederic Weisbecker
@ 2013-06-20 20:45 ` Frederic Weisbecker
  2013-06-20 21:01   ` Paul E. McKenney
  2013-06-20 20:45 ` [RFC PATCH 3/4] sched: Conditionally build decaying cpu load stats Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-20 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot

Gather the common code that computes the pending idle cpu load
to decay.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/proc.c |   40 ++++++++++++++--------------------------
 1 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index bb3a6a0..030528a 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -470,11 +470,14 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
  * scheduler tick (TICK_NSEC). With tickless idle this will not be called
  * every tick. We fix it up based on jiffies.
  */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
-			      unsigned long pending_updates)
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
 {
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
 	int i, scale;
 
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
 	this_rq->nr_load_updates++;
 
 	/* Update our load: */
@@ -521,20 +524,15 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
  */
 void update_idle_cpu_load(struct rq *this_rq)
 {
-	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
 	unsigned long load = this_rq->load.weight;
-	unsigned long pending_updates;
 
 	/*
 	 * bail if there's load or we're actually up-to-date.
 	 */
-	if (load || curr_jiffies == this_rq->last_load_update_tick)
+	if (load || jiffies == this_rq->last_load_update_tick)
 		return;
 
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-
-	__update_cpu_load(this_rq, load, pending_updates);
+	__update_cpu_load(this_rq, load);
 }
 
 /*
@@ -543,22 +541,16 @@ void update_idle_cpu_load(struct rq *this_rq)
 void update_cpu_load_nohz(void)
 {
 	struct rq *this_rq = this_rq();
-	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long pending_updates;
 
-	if (curr_jiffies == this_rq->last_load_update_tick)
+	if (jiffies == this_rq->last_load_update_tick)
 		return;
 
 	raw_spin_lock(&this_rq->lock);
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	if (pending_updates) {
-		this_rq->last_load_update_tick = curr_jiffies;
-		/*
-		 * We were idle, this means load 0, the current load might be
-		 * !0 due to remote wakeups and the sort.
-		 */
-		__update_cpu_load(this_rq, 0, pending_updates);
-	}
+	/*
+	 * We were idle, this means load 0, the current load might be
+	 * !0 due to remote wakeups and the sort.
+	 */
+	__update_cpu_load(this_rq, 0);
 	raw_spin_unlock(&this_rq->lock);
 }
 #endif /* CONFIG_NO_HZ */
@@ -568,11 +560,7 @@ void update_cpu_load_nohz(void)
  */
 void update_cpu_load_active(struct rq *this_rq)
 {
-	/*
-	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
-	 */
-	this_rq->last_load_update_tick = jiffies;
-	__update_cpu_load(this_rq, this_rq->load.weight, 1);
+	__update_cpu_load(this_rq, this_rq->load.weight);
 
 	calc_load_account_active(this_rq);
 }
-- 
1.7.5.4



* [RFC PATCH 3/4] sched: Conditionally build decaying cpu load stats
  2013-06-20 20:45 [RFC PATCH 0/4] sched: Disabled LB_BIAS with full dynticks Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 1/4] sched: Disable lb_bias feature for " Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 2/4] sched: Consolidate nohz cpu load prelude code Frederic Weisbecker
@ 2013-06-20 20:45 ` Frederic Weisbecker
  2013-06-20 20:45 ` [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
  3 siblings, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-20 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot

Now that the decaying cpu load stat indexes used by LB_BIAS
are ignored in full dynticks mode, let's conditionally build
that code to optimize the off case.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/proc.c  |   45 ++++++++++++++++++++++++++++-----------------
 kernel/sched/sched.h |    4 ++++
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index 030528a..34920e4 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -394,6 +394,7 @@ static void calc_load_account_active(struct rq *this_rq)
 	this_rq->calc_load_update += LOAD_FREQ;
 }
 
+#ifdef CONFIG_NO_HZ_IDLE
 /*
  * End of global load-average stuff
  */
@@ -465,26 +466,13 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 	return load;
 }
 
-/*
- * Update rq->cpu_load[] statistics. This function is usually called every
- * scheduler tick (TICK_NSEC). With tickless idle this will not be called
- * every tick. We fix it up based on jiffies.
- */
-static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
+static void update_cpu_load_decayed(struct rq *this_rq, unsigned long this_load,
+				    unsigned long pending_updates)
 {
-	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
-	unsigned long pending_updates;
 	int i, scale;
+	unsigned long old_load, new_load;
 
-	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
-	this_rq->last_load_update_tick = curr_jiffies;
-	this_rq->nr_load_updates++;
-
-	/* Update our load: */
-	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
 	for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
-		unsigned long old_load, new_load;
-
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
 		old_load = this_rq->cpu_load[i];
@@ -500,6 +488,30 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
 
 		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
 	}
+}
+#else /* CONFIG_NO_HZ_IDLE */
+static inline void update_cpu_load_decayed(struct rq *this_rq, unsigned long this_load,
+					   unsigned long pending_updates)
+{ }
+#endif
+
+/*
+ * Update rq->cpu_load[] statistics. This function is usually called every
+ * scheduler tick (TICK_NSEC). With tickless idle this will not be called
+ * every tick. We fix it up based on jiffies.
+ */
+static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
+{
+	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
+	unsigned long pending_updates;
+
+	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
+	this_rq->last_load_update_tick = curr_jiffies;
+	this_rq->nr_load_updates++;
+
+	/* Update our load: */
+	this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
+	update_cpu_load_decayed(this_rq, this_load, pending_updates);
 
 	sched_avg_update(this_rq);
 }
@@ -561,6 +573,5 @@ void update_cpu_load_nohz(void)
 void update_cpu_load_active(struct rq *this_rq)
 {
 	__update_cpu_load(this_rq, this_rq->load.weight);
-
 	calc_load_account_active(this_rq);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 029601a..ffa241df 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -410,7 +410,11 @@ struct rq {
 	 * remote CPUs use both these fields when doing load calculation.
 	 */
 	unsigned int nr_running;
+#ifdef CONFIG_NO_HZ_IDLE
 	#define CPU_LOAD_IDX_MAX 5
+#else
+	#define CPU_LOAD_IDX_MAX 1
+#endif
 	unsigned long cpu_load[CPU_LOAD_IDX_MAX];
 	unsigned long last_load_update_tick;
 #ifdef CONFIG_NO_HZ_COMMON
-- 
1.7.5.4



* [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks
  2013-06-20 20:45 [RFC PATCH 0/4] sched: Disabled LB_BIAS with full dynticks Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-06-20 20:45 ` [RFC PATCH 3/4] sched: Conditionally build decaying cpu load stats Frederic Weisbecker
@ 2013-06-20 20:45 ` Frederic Weisbecker
  2013-06-26 13:05   ` Peter Zijlstra
  3 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-06-20 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Ingo Molnar, Li Zhong, Paul E. McKenney,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Borislav Petkov,
	Alex Shi, Paul Turner, Mike Galbraith, Vincent Guittot

preempt_schedule() and preempt_schedule_context() open
code their preemptability checks.

Use the standard API instead for consolidation.
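
(For reference, the standard check, as defined in include/linux/preempt.h,
quoting from memory:)

#ifdef CONFIG_PREEMPT_COUNT
# define preemptible()	(preempt_count() == 0 && !irqs_disabled())
#else
# define preemptible()	0
#endif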

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alex Shi <alex.shi@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/context_tracking.c |    3 +--
 kernel/sched/core.c       |    4 +---
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 6667700..08db730 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -88,10 +88,9 @@ void user_enter(void)
  */
 void __sched notrace preempt_schedule_context(void)
 {
-	struct thread_info *ti = current_thread_info();
 	enum ctx_state prev_ctx;
 
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(!preemptible()))
 		return;
 
 	/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ceeaf0f..f38d0be 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2515,13 +2515,11 @@ void __sched schedule_preempt_disabled(void)
  */
 asmlinkage void __sched notrace preempt_schedule(void)
 {
-	struct thread_info *ti = current_thread_info();
-
 	/*
 	 * If there is a non-zero preempt_count or interrupts are disabled,
 	 * we do not want to preempt the current task. Just return..
 	 */
-	if (likely(ti->preempt_count || irqs_disabled()))
+	if (likely(!preemptible()))
 		return;
 
 	do {
-- 
1.7.5.4



* Re: [RFC PATCH 2/4] sched: Consolidate nohz cpu load prelude code
  2013-06-20 20:45 ` [RFC PATCH 2/4] sched: Consolidate nohz cpu load prelude code Frederic Weisbecker
@ 2013-06-20 21:01   ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2013-06-20 21:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Li Zhong, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Borislav Petkov, Alex Shi, Paul Turner,
	Mike Galbraith, Vincent Guittot

On Thu, Jun 20, 2013 at 10:45:39PM +0200, Frederic Weisbecker wrote:
> Gather the common code that computes the pending idle cpu load
> to decay.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Alex Shi <alex.shi@intel.com>
> Cc: Paul Turner <pjt@google.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/proc.c |   40 ++++++++++++++--------------------------
>  1 files changed, 14 insertions(+), 26 deletions(-)
> 
> diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
> index bb3a6a0..030528a 100644
> --- a/kernel/sched/proc.c
> +++ b/kernel/sched/proc.c
> @@ -470,11 +470,14 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
>   * scheduler tick (TICK_NSEC). With tickless idle this will not be called
>   * every tick. We fix it up based on jiffies.
>   */
> -static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> -			      unsigned long pending_updates)
> +static void __update_cpu_load(struct rq *this_rq, unsigned long this_load)
>  {
> +	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);

Isn't jiffies declared volatile?  (Looks that way to me.)  If so, there
is no need for ACCESS_ONCE().
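
For reference, the declaration in include/linux/jiffies.h looks like
this (quoting from memory, so double-check):

extern unsigned long volatile __jiffy_data jiffies;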

> +	unsigned long pending_updates;
>  	int i, scale;
> 
> +	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
> +	this_rq->last_load_update_tick = curr_jiffies;
>  	this_rq->nr_load_updates++;
> 
>  	/* Update our load: */
> @@ -521,20 +524,15 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
>   */
>  void update_idle_cpu_load(struct rq *this_rq)
>  {
> -	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
>  	unsigned long load = this_rq->load.weight;
> -	unsigned long pending_updates;
> 
>  	/*
>  	 * bail if there's load or we're actually up-to-date.
>  	 */
> -	if (load || curr_jiffies == this_rq->last_load_update_tick)
> +	if (load || jiffies == this_rq->last_load_update_tick)
>  		return;
> 
> -	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
> -	this_rq->last_load_update_tick = curr_jiffies;
> -
> -	__update_cpu_load(this_rq, load, pending_updates);
> +	__update_cpu_load(this_rq, load);
>  }
> 
>  /*
> @@ -543,22 +541,16 @@ void update_idle_cpu_load(struct rq *this_rq)
>  void update_cpu_load_nohz(void)
>  {
>  	struct rq *this_rq = this_rq();
> -	unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> -	unsigned long pending_updates;
> 
> -	if (curr_jiffies == this_rq->last_load_update_tick)
> +	if (jiffies == this_rq->last_load_update_tick)
>  		return;
> 
>  	raw_spin_lock(&this_rq->lock);
> -	pending_updates = curr_jiffies - this_rq->last_load_update_tick;
> -	if (pending_updates) {
> -		this_rq->last_load_update_tick = curr_jiffies;
> -		/*
> -		 * We were idle, this means load 0, the current load might be
> -		 * !0 due to remote wakeups and the sort.
> -		 */
> -		__update_cpu_load(this_rq, 0, pending_updates);
> -	}
> +	/*
> +	 * We were idle, this means load 0, the current load might be
> +	 * !0 due to remote wakeups and the sort.
> +	 */
> +	__update_cpu_load(this_rq, 0);
>  	raw_spin_unlock(&this_rq->lock);
>  }
>  #endif /* CONFIG_NO_HZ */
> @@ -568,11 +560,7 @@ void update_cpu_load_nohz(void)
>   */
>  void update_cpu_load_active(struct rq *this_rq)
>  {
> -	/*
> -	 * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> -	 */
> -	this_rq->last_load_update_tick = jiffies;
> -	__update_cpu_load(this_rq, this_rq->load.weight, 1);
> +	__update_cpu_load(this_rq, this_rq->load.weight);
> 
>  	calc_load_account_active(this_rq);
>  }
> -- 
> 1.7.5.4
> 



* Re: [RFC PATCH 1/4] sched: Disable lb_bias feature for full dynticks
  2013-06-20 20:45 ` [RFC PATCH 1/4] sched: Disable lb_bias feature for " Frederic Weisbecker
@ 2013-06-20 21:01   ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2013-06-20 21:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Li Zhong, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Borislav Petkov, Alex Shi, Paul Turner,
	Mike Galbraith, Vincent Guittot

On Thu, Jun 20, 2013 at 10:45:38PM +0200, Frederic Weisbecker wrote:
> If we run in full dynticks mode, we currently have no way to
> correctly update the secondary decaying indexes of the CPU load
> stats, as they are normally maintained by update_cpu_load_active()
> at each tick.
> 
> We have an existing infrastructure that handles tickless loads
> (cf. decay_load_missed()), but it seems to only work for idle
> tickless loads, which only applies if the CPU hasn't run anything
> but the idle task during the tickless timeslice.
> 
> Until we can provide a sane mathematical solution to handle full
> dynticks loads, let's simply deactivate the LB_BIAS sched feature
> when CONFIG_NO_HZ_FULL is enabled, as it is currently the only user
> of the decayed load records.
> 
> The first load index, which represents the current runqueue load
> weight, is still maintained and usable.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Alex Shi <alex.shi@intel.com>
> Cc: Paul Turner <pjt@google.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c     |   13 +++++++++++--
>  kernel/sched/features.h |    3 +++
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c0ac2c3..2e8df6f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2937,6 +2937,15 @@ static unsigned long weighted_cpuload(const int cpu)
>  	return cpu_rq(cpu)->load.weight;
>  }
> 
> +static inline int sched_lb_bias(void)
> +{
> +#ifndef CONFIG_NO_HZ_FULL
> +	return sched_feat(LB_BIAS);
> +#else
> +	return 0;
> +#endif
> +}
> +
>  /*
>   * Return a low guess at the load of a migration-source cpu weighted
>   * according to the scheduling class and "nice" value.
> @@ -2949,7 +2958,7 @@ static unsigned long source_load(int cpu, int type)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long total = weighted_cpuload(cpu);
> 
> -	if (type == 0 || !sched_feat(LB_BIAS))
> +	if (type == 0 || !sched_lb_bias())
>  		return total;
> 
>  	return min(rq->cpu_load[type-1], total);
> @@ -2964,7 +2973,7 @@ static unsigned long target_load(int cpu, int type)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long total = weighted_cpuload(cpu);
> 
> -	if (type == 0 || !sched_feat(LB_BIAS))
> +	if (type == 0 || !sched_lb_bias())
>  		return total;
> 
>  	return max(rq->cpu_load[type-1], total);
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 99399f8..635f902 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -43,7 +43,10 @@ SCHED_FEAT(ARCH_POWER, true)
> 
>  SCHED_FEAT(HRTICK, false)
>  SCHED_FEAT(DOUBLE_TICK, false)
> +
> +#ifndef CONFIG_NO_HZ_FULL
>  SCHED_FEAT(LB_BIAS, true)
> +#endif
> 
>  /*
>   * Decrement CPU power based on time not spent running tasks
> -- 
> 1.7.5.4
> 



* Re: [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks
  2013-06-20 20:45 ` [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks Frederic Weisbecker
@ 2013-06-26 13:05   ` Peter Zijlstra
  2013-07-01 11:20     ` Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-06-26 13:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Li Zhong, Paul E. McKenney, Steven Rostedt,
	Thomas Gleixner, Borislav Petkov, Alex Shi, Paul Turner,
	Mike Galbraith, Vincent Guittot

On Thu, Jun 20, 2013 at 10:45:41PM +0200, Frederic Weisbecker wrote:
> preempt_schedule() and preempt_schedule_context() open
> code their preemptability checks.
> 
> Use the standard API instead for consolidation.
> 
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Alex Shi <alex.shi@intel.com>
> Cc: Paul Turner <pjt@google.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/context_tracking.c |    3 +--
>  kernel/sched/core.c       |    4 +---
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 6667700..08db730 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -88,10 +88,9 @@ void user_enter(void)
>   */
>  void __sched notrace preempt_schedule_context(void)
>  {
> -	struct thread_info *ti = current_thread_info();
>  	enum ctx_state prev_ctx;
>  
> -	if (likely(ti->preempt_count || irqs_disabled()))
> +	if (likely(!preemptible()))
>  		return;
>  

#ifdef CONFIG_PREEMPT_COUNT
# define preemptible()  (preempt_count() == 0 && !irqs_disabled())
#else
# define preemptible()  0
#endif


Wouldn't that give a problem for !PREEMPT_COUNT?


* Re: [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks
  2013-06-26 13:05   ` Peter Zijlstra
@ 2013-07-01 11:20     ` Frederic Weisbecker
  2013-07-01 11:55       ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-07-01 11:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Li Zhong, Paul E. McKenney, Steven Rostedt,
	Thomas Gleixner, Borislav Petkov, Alex Shi, Paul Turner,
	Mike Galbraith, Vincent Guittot

On Wed, Jun 26, 2013 at 03:05:11PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 20, 2013 at 10:45:41PM +0200, Frederic Weisbecker wrote:
> > preempt_schedule() and preempt_schedule_context() open
> > code their preemptability checks.
> > 
> > Use the standard API instead for consolidation.
> > 
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Alex Shi <alex.shi@intel.com>
> > Cc: Paul Turner <pjt@google.com>
> > Cc: Mike Galbraith <efault@gmx.de>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/context_tracking.c |    3 +--
> >  kernel/sched/core.c       |    4 +---
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> > index 6667700..08db730 100644
> > --- a/kernel/context_tracking.c
> > +++ b/kernel/context_tracking.c
> > @@ -88,10 +88,9 @@ void user_enter(void)
> >   */
> >  void __sched notrace preempt_schedule_context(void)
> >  {
> > -	struct thread_info *ti = current_thread_info();
> >  	enum ctx_state prev_ctx;
> >  
> > -	if (likely(ti->preempt_count || irqs_disabled()))
> > +	if (likely(!preemptible()))
> >  		return;
> >  
> 
> #ifdef CONFIG_PREEMPT_COUNT
> # define preemptible()  (preempt_count() == 0 && !irqs_disabled())
> #else
> # define preemptible()  0
> #endif
> 
> 
> Wouldn't that give a problem for !PREEMPT_COUNT?

preempt_schedule_context() depends on CONFIG_PREEMPT, which in turn
selects CONFIG_PREEMPT_COUNT, so that should work. Or did I miss
something?


* Re: [RFC PATCH 4/4] sched: Consolidate open coded preemptible() checks
  2013-07-01 11:20     ` Frederic Weisbecker
@ 2013-07-01 11:55       ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2013-07-01 11:55 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Ingo Molnar, Li Zhong, Paul E. McKenney, Steven Rostedt,
	Thomas Gleixner, Borislav Petkov, Alex Shi, Paul Turner,
	Mike Galbraith, Vincent Guittot

On Mon, Jul 01, 2013 at 01:20:20PM +0200, Frederic Weisbecker wrote:
> On Wed, Jun 26, 2013 at 03:05:11PM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 20, 2013 at 10:45:41PM +0200, Frederic Weisbecker wrote:
> > > preempt_schedule() and preempt_schedule_context() open
> > > code their preemptability checks.
> > > 
> > > Use the standard API instead for consolidation.
> > > 
> > > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Li Zhong <zhong@linux.vnet.ibm.com>
> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: Alex Shi <alex.shi@intel.com>
> > > Cc: Paul Turner <pjt@google.com>
> > > Cc: Mike Galbraith <efault@gmx.de>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > ---
> > >  kernel/context_tracking.c |    3 +--
> > >  kernel/sched/core.c       |    4 +---
> > >  2 files changed, 2 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> > > index 6667700..08db730 100644
> > > --- a/kernel/context_tracking.c
> > > +++ b/kernel/context_tracking.c
> > > @@ -88,10 +88,9 @@ void user_enter(void)
> > >   */
> > >  void __sched notrace preempt_schedule_context(void)
> > >  {
> > > -	struct thread_info *ti = current_thread_info();
> > >  	enum ctx_state prev_ctx;
> > >  
> > > -	if (likely(ti->preempt_count || irqs_disabled()))
> > > +	if (likely(!preemptible()))
> > >  		return;
> > >  
> > 
> > #ifdef CONFIG_PREEMPT_COUNT
> > # define preemptible()  (preempt_count() == 0 && !irqs_disabled())
> > #else
> > # define preemptible()  0
> > #endif
> > 
> > 
> > Wouldn't that give a problem for !PREEMPT_COUNT?
> 
> preempt_schedule_context() depends on CONFIG_PREEMPT, which in turn
> selects CONFIG_PREEMPT_COUNT, so that should work. Or did I miss
> something?

Ah, indeed. I just tripped over the fact that it's weird not to test
irqs_disabled() even though we don't have the preempt count.




