All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost
@ 2020-03-30  9:01 Valentin Schneider
  2020-03-30 13:35 ` Peter Zijlstra
  2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider
  0 siblings, 2 replies; 3+ messages in thread
From: Valentin Schneider @ 2020-03-30  9:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, vincent.guittot, dietmar.eggemann, riel

sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq->avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

"""
The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
has a signed type and a negative value, the resulting value is
implementation-defined.
"""

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

-4      -> 0xF..FC
-4 >> 3 -> 0xF..FF == -1 != -4 / 8

-8      -> 0xF..F8
-8 >> 3 -> 0xF..FF == -1 == -8 / 8

-9      -> 0xF..F7
-9 >> 3 -> 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

- sub x1, x1, x0
+ subs x1, x1, x0 // set condition codes
+ add x0, x1, #0x7
+ csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1
  add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/core.c  | 6 ------
 kernel/sched/fair.c  | 7 ++-----
 kernel/sched/sched.h | 6 ++++++
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c1f923d647ee..fc0776ee01bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2124,12 +2124,6 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	return cpu;
 }
 
-static void update_avg(u64 *avg, u64 sample)
-{
-	s64 diff = sample - *avg;
-	*avg += diff >> 3;
-}
-
 void sched_set_stop_task(int cpu, struct task_struct *stop)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7fb20adabeb..90b600c9b5a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6080,8 +6080,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
 	u64 avg_cost, avg_idle;
-	u64 time, cost;
-	s64 delta;
+	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
 
@@ -6119,9 +6118,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	time = cpu_clock(this) - time;
-	cost = this_sd->avg_scan_cost;
-	delta = (s64)(time - cost) / 8;
-	this_sd->avg_scan_cost += delta;
+	update_avg(&this_sd->avg_scan_cost, time);
 
 	return cpu;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1e72d1b3d3ce..80ee7e9c0b40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -195,6 +195,12 @@ static inline int task_has_dl_policy(struct task_struct *p)
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+static inline void update_avg(u64 *avg, u64 sample)
+{
+	s64 diff = sample - *avg;
+	*avg += diff / 8;
+}
+
 /*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost
  2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
@ 2020-03-30 13:35 ` Peter Zijlstra
  2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2020-03-30 13:35 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, riel

On Mon, Mar 30, 2020 at 10:01:27AM +0100, Valentin Schneider wrote:
> 
> """
> The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
> has a signed type and a negative value, the resulting value is
> implementation-defined.

True; but the kernel uses -fwrapv, which would mandate 2s complement,
which then gets us to the next point:

> """
> 
> Not only this, but (arithmetic) right shifting a negative value (using 2's
> complement) is *not* equivalent to dividing it by the corresponding power
> of 2. 

True; good catch.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [tip: sched/urgent] sched/fair: Align rq->avg_idle and rq->avg_scan_cost
  2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
  2020-03-30 13:35 ` Peter Zijlstra
@ 2020-04-08 12:20 ` tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-04-08 12:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb
Gitweb:        https://git.kernel.org/tip/d76343c6b2b79f5e89c392bc9ce9dabc4c9e90cb
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Mon, 30 Mar 2020 10:01:27 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 08 Apr 2020 11:35:18 +02:00

sched/fair: Align rq->avg_idle and rq->avg_scan_cost

sched/core.c uses update_avg() for rq->avg_idle and sched/fair.c uses an
open-coded version (with the exact same decay factor) for
rq->avg_scan_cost. On top of that, select_idle_cpu() expects to be able to
compare these two fields.

The only difference between the two is that rq->avg_scan_cost is computed
using a pure division rather than a shift. Turns out it actually matters,
first of all because the shifted value can be negative, and the standard
has this to say about it:

  """
  The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1
  has a signed type and a negative value, the resulting value is
  implementation-defined.
  """

Not only this, but (arithmetic) right shifting a negative value (using 2's
complement) is *not* equivalent to dividing it by the corresponding power
of 2. Let's look at a few examples:

  -4      -> 0xF..FC
  -4 >> 3 -> 0xF..FF == -1 != -4 / 8

  -8      -> 0xF..F8
  -8 >> 3 -> 0xF..FF == -1 == -8 / 8

  -9      -> 0xF..F7
  -9 >> 3 -> 0xF..FE == -2 != -9 / 8

Make update_avg() use a division, and export it to the private scheduler
header to reuse it where relevant. Note that this still lets compilers use
a shift here, but should prevent any unwanted surprise. The disassembly of
select_idle_cpu() remains unchanged on arm64, and ttwu_do_wakeup() gains 2
instructions; the diff sort of looks like this:

  - sub x1, x1, x0
  + subs x1, x1, x0 // set condition codes
  + add x0, x1, #0x7
  + csel x0, x0, x1, mi // x0 = x1 < 0 ? x0 : x1
    add x0, x3, x0, asr #3

which does the right thing (i.e. gives us the expected result while still
using an arithmetic shift)

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200330090127.16294-1-valentin.schneider@arm.com
---
 kernel/sched/core.c  | 6 ------
 kernel/sched/fair.c  | 7 ++-----
 kernel/sched/sched.h | 6 ++++++
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a2694ba..f6b329b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2119,12 +2119,6 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	return cpu;
 }
 
-static void update_avg(u64 *avg, u64 sample)
-{
-	s64 diff = sample - *avg;
-	*avg += diff >> 3;
-}
-
 void sched_set_stop_task(int cpu, struct task_struct *stop)
 {
 	struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1ea3ddd..fb025e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6080,8 +6080,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	struct sched_domain *this_sd;
 	u64 avg_cost, avg_idle;
-	u64 time, cost;
-	s64 delta;
+	u64 time;
 	int this = smp_processor_id();
 	int cpu, nr = INT_MAX;
 
@@ -6119,9 +6118,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	}
 
 	time = cpu_clock(this) - time;
-	cost = this_sd->avg_scan_cost;
-	delta = (s64)(time - cost) / 8;
-	this_sd->avg_scan_cost += delta;
+	update_avg(&this_sd->avg_scan_cost, time);
 
 	return cpu;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0f616bf..cd00814 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -195,6 +195,12 @@ static inline int task_has_dl_policy(struct task_struct *p)
 
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
+static inline void update_avg(u64 *avg, u64 sample)
+{
+	s64 diff = sample - *avg;
+	*avg += diff / 8;
+}
+
 /*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-04-08 12:20 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-30  9:01 [PATCH] sched: Align rq->avg_idle and rq->avg_scan_cost Valentin Schneider
2020-03-30 13:35 ` Peter Zijlstra
2020-04-08 12:20 ` [tip: sched/urgent] sched/fair: " tip-bot2 for Valentin Schneider

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.