linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost
@ 2020-03-30  9:01 Valentin Schneider
  2020-03-30 13:35 ` Peter Zijlstra
  2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider
  0 siblings, 2 replies; 3+ messages in thread
From: Valentin Schneider @ 2020-03-30  9:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, vincent.guittot, dietmar.eggemann, riel

sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq->avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

"""
The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
has a signed type and a negative value, the resulting value is
implementation-defined.
"""

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

-4      -> 0xF..FC
-4 >> 3 -> 0xF..FF == -1 != -4 / 8

-8      -> 0xF..F8
-8 >> 3 -> 0xF..FF == -1 == -8 / 8

-9      -> 0xF..F7
-9 >> 3 -> 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

- sub x1, x1, x0
+ subs x1, x1, x0 // set condition codes
+ add x0, x1, #0x7
+ csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1
  add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/core.c  | 6 ------
 kernel/sched/fair.c  | 7 ++-----
 kernel/sched/sched.h | 6 ++++++
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1f923d647ee..fc0776ee01bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2124,12 +2124,6 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	return cpu;
 }
 
-static void update_avg(u64 *avg, u64 sample)
-{
-	s64 diff = sample - *avg;
-	*avg += diff >> 3;
-}
-
 void sched_set_stop_task(int cpu, struct task_struct *stop)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7fb20adabeb..90b600c9b5a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6080,8 +6080,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
 	u64 avg_cost, avg_idle;
-	u64 time, cost;
-	s64 delta;
+	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
 
@@ -6119,9 +6118,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	time = cpu_clock(this) - time;
-	cost = this_sd->avg_scan_cost;
-	delta = (s64)(time - cost) / 8;
-	this_sd->avg_scan_cost += delta;
+	update_avg(&this_sd->avg_scan_cost, time);
 
 	return cpu;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1e72d1b3d3ce..80ee7e9c0b40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -195,6 +195,12 @@ static inline int task_has_dl_policy(struct task_struct *p)
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+static inline void update_avg(u64 *avg, u64 sample)
+{
+	s64 diff = sample - *avg;
+	*avg += diff / 8;
+}
+
 /*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost
  2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
@ 2020-03-30 13:35 ` Peter Zijlstra
  2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2020-03-30 13:35 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, riel

On Mon, Mar 30, 2020 at 10:01:27AM +0100, Valentin Schneider wrote:
> 
> """
> The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
> has a signed type and a negative value, the resulting value is
> implementation-defined.

True; but the kernel uses -fwrapv, which would mandate 2s complement,
which then gets us to the next point:

> """
> 
> Not only this, but (arithmetic) right shifting a negative value (using 2's
> complement) is *not* equivalent to dividing it by the corresponding power
> of 2. 

True; good catch.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip: sched/urgent] sched/fair: Align rq->avg_idle and rq->avg_scan_cost
  2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
  2020-03-30 13:35 ` Peter Zijlstra
@ 2020-04-08 12:20 ` tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-04-08 12:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb
Gitweb:        https://git.kernel.org/tip/d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Mon, 30 Mar 2020 10:01:27 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 08 Apr 2020 11:35:18 +02:00

sched/fair: Align rq->avg_idle and rq->avg_scan_cost

sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq->avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

  """
  The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
  has a signed type and a negative value, the resulting value is
  implementation-defined.
  """

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

  -4      -> 0xF..FC
  -4 >> 3 -> 0xF..FF == -1 != -4 / 8

  -8      -> 0xF..F8
  -8 >> 3 -> 0xF..FF == -1 == -8 / 8

  -9      -> 0xF..F7
  -9 >> 3 -> 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

  - sub x1, x1, x0
  + subs x1, x1, x0 // set condition codes
  + add x0, x1, #0x7
  + csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1
    add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
---
 kernel/sched/core.c  | 6 ------
 kernel/sched/fair.c  | 7 ++-----
 kernel/sched/sched.h | 6 ++++++
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a2694ba..f6b329b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2119,12 +2119,6 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	return cpu;
 }
 
-static void update_avg(u64 *avg, u64 sample)
-{
-	s64 diff = sample - *avg;
-	*avg += diff >> 3;
-}
-
 void sched_set_stop_task(int cpu, struct task_struct *stop)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1ea3ddd..fb025e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6080,8 +6080,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
 	u64 avg_cost, avg_idle;
-	u64 time, cost;
-	s64 delta;
+	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
 
@@ -6119,9 +6118,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	time = cpu_clock(this) - time;
-	cost = this_sd->avg_scan_cost;
-	delta = (s64)(time - cost) / 8;
-	this_sd->avg_scan_cost += delta;
+	update_avg(&this_sd->avg_scan_cost, time);
 
 	return cpu;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0f616bf..cd00814 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -195,6 +195,12 @@ static inline int task_has_dl_policy(struct task_struct *p)
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+static inline void update_avg(u64 *avg, u64 sample)
+{
+	s64 diff = sample - *avg;
+	*avg += diff / 8;
+}
+
 /*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-08 12:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
2020-03-30 13:35 ` Peter Zijlstra
2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).