linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] sched/fair: Clean up sched metric definitions
@ 2015-10-04 17:56 Yuyang Du
  2015-10-04 17:56 ` [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition Yuyang Du
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Yuyang Du @ 2015-10-04 17:56 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Hi Peter and Ingo,

As discussed recently, the sched metrics need a bit of cleanup. This
series of patches attempts to do that: refactor, rename, remove...

Thanks a lot to Ben, Morten, Dietmar, Vincent, and others who provided
valuable comments.

Thanks,
Yuyang

Yuyang Du (4):
  sched/fair: Generalize the load/util averages resolution definition
  sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE
  sched/fair: Remove scale_load_down() for load_avg
  sched/fair: Rename scale_load() and scale_load_down()

 include/linux/sched.h | 80 +++++++++++++++++++++++++++++++++++++++++++--------
 kernel/sched/core.c   |  8 +++---
 kernel/sched/fair.c   | 26 +++++++----------
 kernel/sched/sched.h  | 30 +++++++++++--------
 4 files changed, 101 insertions(+), 43 deletions(-)

-- 
2.1.4



* [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition
  2015-10-04 17:56 [PATCH 0/4] sched/fair: Clean up sched metric definitions Yuyang Du
@ 2015-10-04 17:56 ` Yuyang Du
  2015-10-05  7:04   ` Ingo Molnar
  2015-10-05 10:52   ` Peter Zijlstra
  2015-10-04 17:56 ` [PATCH 2/4] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE Yuyang Du
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Yuyang Du @ 2015-10-04 17:56 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

A metric needs a certain resolution to determine how much detail we can
look into; the resolution also determines the range of the metric.

For instance, to increase the resolution of [0, 1] (two levels), one
can multiply by 1024 and get [0..1024] (1025 levels).

In sched/fair, a few metrics depend on the resolution: weight, load,
load_avg, util_avg, freq, and capacity. To reduce the risk of mistakes
related to resolution and range, we generalize the resolution by
defining a basic resolution constant, and then define all the metrics
in terms of that basic resolution.

The basic resolution is 1024, or (1 << 10). Further, one can apply the
basic resolution repeatedly to obtain a higher resolution.

As pointed out by Ben Segall, weight (visible to the user, e.g., NICE-0
has 1024) and load (e.g., NICE_0_LOAD) have independent resolutions, but
they must be well calibrated.
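
As a rough standalone illustration of the idea (a sketch only, not
kernel code; the constant names mirror the ones this patch introduces):

#include <stdio.h>

#define SCHED_RESOLUTION_SHIFT	10
#define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)

int main(void)
{
	/* 50% expressed at the basic resolution: 512 out of 1024 */
	long half = SCHED_RESOLUTION_SCALE / 2;

	/* applying the shift once more yields the higher-resolution range */
	long half_hi = half << SCHED_RESOLUTION_SHIFT;

	printf("%ld %ld\n", half, half_hi);	/* 512 524288 */
	return 0;
}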

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 include/linux/sched.h | 15 ++++++++++++---
 kernel/sched/fair.c   |  4 ----
 kernel/sched/sched.h  | 15 ++++++++++-----
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index bd38b3e..b3ba0fb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -910,9 +910,18 @@ enum cpu_idle_type {
 };
 
 /*
+ * Integer metrics need certain resolution to allow how much detail we
+ * can look into, e.g., load, load_avg, util_avg, freq, and capacity.
+ * We define a basic resolution constant number, and then formalize
+ * all these metrics based on that basic resolution.
+ */
+# define SCHED_RESOLUTION_SHIFT	10
+# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
+
+/*
  * Increase resolution of cpu_capacity calculations
  */
-#define SCHED_CAPACITY_SHIFT	10
+#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
 #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
 
 /*
@@ -1180,8 +1189,8 @@ struct load_weight {
  * 1) load_avg factors frequency scaling into the amount of time that a
  * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
  * aggregated such weights of all runnable and blocked sched_entities.
- * 2) util_avg factors frequency and cpu scaling into the amount of time
- * that a sched_entity is running on a CPU, in the range [0..SCHED_LOAD_SCALE].
+ * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
+ * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
  * For cfs_rq, it is the aggregated such times of all runnable and
  * blocked sched_entities.
  * The 64 bit load_sum can:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4df37a4..c61fd8e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2522,10 +2522,6 @@ static u32 __compute_runnable_contrib(u64 n)
 	return contrib + runnable_avg_yN_sum[n];
 }
 
-#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
-#error "load tracking assumes 2^10 as unit"
-#endif
-
 #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3845a71..31b4022 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -53,18 +53,23 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
  * increased costs.
  */
 #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
-# define SCHED_LOAD_RESOLUTION	10
-# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
-# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
+# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
+# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
+# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
 #else
-# define SCHED_LOAD_RESOLUTION	0
+# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
 # define scale_load(w)		(w)
 # define scale_load_down(w)	(w)
 #endif
 
-#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
 #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
 
+/*
+ * NICE_0's weight (visible to user) and its load (invisible to user) have
+ * independent resolution, but they should be well calibrated. We use scale_load()
+ * and scale_load_down(w) to convert between them, the following must be true:
+ * scale_load(prio_to_weight[20]) == NICE_0_LOAD
+ */
 #define NICE_0_LOAD		SCHED_LOAD_SCALE
 #define NICE_0_SHIFT		SCHED_LOAD_SHIFT
 
-- 
2.1.4



* [PATCH 2/4] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE
  2015-10-04 17:56 [PATCH 0/4] sched/fair: Clean up sched metric definitions Yuyang Du
  2015-10-04 17:56 ` [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition Yuyang Du
@ 2015-10-04 17:56 ` Yuyang Du
  2015-10-06  9:29   ` Vincent Guittot
  2015-10-04 17:56 ` [PATCH 3/4] sched/fair: Remove scale_load_down() for load_avg Yuyang Du
  2015-10-04 17:56 ` [PATCH 4/4] sched/fair: Rename scale_load() and scale_load_down() Yuyang Du
  3 siblings, 1 reply; 10+ messages in thread
From: Yuyang Du @ 2015-10-04 17:56 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

After cleaning up the sched metrics, these two ambiguity-causing
definitions are no longer needed. Use NICE_0_LOAD_SHIFT and NICE_0_LOAD
instead (the names state clearly what they are).

Suggested-by: Ben Segall <bsegall@google.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/fair.c  |  4 ++--
 kernel/sched/sched.h | 22 +++++++++++-----------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61fd8e..fdb7937 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,7 +682,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 	sa->period_contrib = 1023;
 	sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
-	sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
+	sa->util_avg = SCHED_CAPACITY_SCALE;
 	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
@@ -6651,7 +6651,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	if (busiest->group_type == group_overloaded &&
 	    local->group_type   == group_overloaded) {
 		load_above_capacity = busiest->sum_nr_running *
-					SCHED_LOAD_SCALE;
+					SCHED_CAPACITY_SCALE;
 		if (load_above_capacity > busiest->group_capacity)
 			load_above_capacity -= busiest->group_capacity;
 		else
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 31b4022..3d03956 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -53,25 +53,25 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
  * increased costs.
  */
 #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
-# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
+# define NICE_0_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
 # define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
 # define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
 #else
-# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
+# define NICE_0_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
 # define scale_load(w)		(w)
 # define scale_load_down(w)	(w)
 #endif
 
-#define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
-
 /*
- * NICE_0's weight (visible to user) and its load (invisible to user) have
- * independent resolution, but they should be well calibrated. We use scale_load()
- * and scale_load_down(w) to convert between them, the following must be true:
- * scale_load(prio_to_weight[20]) == NICE_0_LOAD
+ * Task weight (visible to user) and its load (invisible to user) have
+ * independent resolution, but they should be well calibrated. We use
+ * scale_load() and scale_load_down(w) to convert between them. The
+ * following must be true:
+ *
+ *  scale_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ *
  */
-#define NICE_0_LOAD		SCHED_LOAD_SCALE
-#define NICE_0_SHIFT		SCHED_LOAD_SHIFT
+#define NICE_0_LOAD		(1L << NICE_0_LOAD_SHIFT)
 
 /*
  * Single value that decides SCHED_DEADLINE internal math precision.
@@ -850,7 +850,7 @@ DECLARE_PER_CPU(struct sched_domain *, sd_asym);
 struct sched_group_capacity {
 	atomic_t ref;
 	/*
-	 * CPU capacity of this group, SCHED_LOAD_SCALE being max capacity
+	 * CPU capacity of this group, SCHED_CAPACITY_SCALE being max capacity
 	 * for a single CPU.
 	 */
 	unsigned int capacity;
-- 
2.1.4



* [PATCH 3/4] sched/fair: Remove scale_load_down() for load_avg
  2015-10-04 17:56 [PATCH 0/4] sched/fair: Clean up sched metric definitions Yuyang Du
  2015-10-04 17:56 ` [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition Yuyang Du
  2015-10-04 17:56 ` [PATCH 2/4] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE Yuyang Du
@ 2015-10-04 17:56 ` Yuyang Du
  2015-10-04 17:56 ` [PATCH 4/4] sched/fair: Rename scale_load() and scale_load_down() Yuyang Du
  3 siblings, 0 replies; 10+ messages in thread
From: Yuyang Du @ 2015-10-04 17:56 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Currently, load_avg = scale_load_down(load) * runnable%. This does
not make much sense, because load_avg is primarily the load scaled by
the runnable time ratio.

We therefore remove scale_load_down() for load_avg. But we need to
carefully consider the overflow risk if load has the higher resolution
(2*SCHED_RESOLUTION_SHIFT). The only case in which an overflow may occur
due to this change is a 64bit kernel with increased load resolution. In
that case, the 64bit load_sum can accommodate 4251057
(=2^64/47742/88761/1024) entities with the highest load (=88761*1024)
always runnable on one single cfs_rq, which is a much tighter bound than
before, but should still be fine in practice.
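
The bound above can be reproduced with a small standalone check
(illustrative only; 47742 is LOAD_AVG_MAX and 88761 is the nice -20
weight):

#include <stdio.h>

int main(void)
{
	unsigned long long limit = ~0ULL;	/* ~2^64 */
	/* 47742: LOAD_AVG_MAX, 88761: nice -20 weight, 1024: extra resolution */
	unsigned long long per_entity = 47742ULL * 88761ULL * 1024ULL;

	printf("%llu\n", limit / per_entity);	/* 4251057 */
	return 0;
}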

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 include/linux/sched.h | 69 +++++++++++++++++++++++++++++++++++++++++++--------
 kernel/sched/fair.c   | 11 ++++----
 2 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b3ba0fb..a63271e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1185,18 +1185,65 @@ struct load_weight {
 };
 
 /*
- * The load_avg/util_avg accumulates an infinite geometric series.
- * 1) load_avg factors frequency scaling into the amount of time that a
- * sched_entity is runnable on a rq into its weight. For cfs_rq, it is the
- * aggregated such weights of all runnable and blocked sched_entities.
- * 2) util_avg factors frequency and cpu capacity scaling into the amount of time
- * that a sched_entity is running on a CPU, in the range [0..SCHED_CAPACITY_SCALE].
- * For cfs_rq, it is the aggregated such times of all runnable and
+ * The load_avg/util_avg accumulates an infinite geometric series
+ * (see __update_load_avg() in kernel/sched/fair.c).
+ *
+ * [load_avg definition]
+ *
+ * load_avg = runnable% * load
+ *
+ * where runnable% is the time ratio that a sched_entity is runnable.
+ * For cfs_rq, it is the aggregated such load_avg of all runnable and
  * blocked sched_entities.
- * The 64 bit load_sum can:
- * 1) for cfs_rq, afford 4353082796 (=2^64/47742/88761) entities with
- * the highest weight (=88761) always runnable, we should not overflow
- * 2) for entity, support any load.weight always runnable
+ *
+ * load_avg may also take frequency scaling into account:
+ *
+ * load_avg = runnable% * load * freq%
+ *
+ * where freq% is the CPU frequency normalize to the highest frequency
+ *
+ * [util_avg definition]
+ *
+ * util_avg = running% * SCHED_CAPACITY_SCALE
+ *
+ * where running% is the time ratio that a sched_entity is running on
+ * a CPU. For cfs_rq, it is the aggregated such util_avg of all running
+ * and blocked sched_entities.
+ *
+ * util_avg may also factor frequency scaling and CPU capacity scaling:
+ *
+ * util_avg = running% * SCHED_CAPACITY_SCALE * freq% * capacity%
+ *
+ * where freq% is the same as above, and capacity% is the CPU capacity
+ * normalized to the greatest capacity (due to uarch differences, etc).
+ *
+ * N.B., the above ratios (runnable%, running%, freq%, and capacity%)
+ * themselves are in the range of [0, 1]. We therefore scale them to
+ * not lose the intermediate values due to integer rounding and provide
+ * as fine resolution as necessary. This is for example reflected by
+ * util_avg's SCHED_CAPACITY_SCALE.
+ *
+ * [Overflow issue]
+ *
+ * On 64bit kernel:
+ *
+ * When load has low resolution (SCHED_RESOLUTION_SHIFT), the 64bit
+ * load_sum can have 4353082796 (=2^64/47742/88761) entities with
+ * the highest load (=88761) always runnable on a cfs_rq, we should
+ * not overflow.
+ *
+ * When load has high resolution (2*SCHED_RESOLUTION_SHIFT), the 64bit
+ * load_sum can have 4251057 (=2^64/47742/88761/1024) entities with
+ * the highest load (=88761*1024) always runnable on ONE cfs_rq, we
+ * should be fine.
+ *
+ * For all other cases (including 32bit kernel), struct load_weight's
+ * weight will overflow first before we do, because:
+ *
+ *    Max(load_avg) <= Max(load.weight)
+ *
+ * Then, it is the load_weight's responsibility to consider overflow
+ * issues.
  */
 struct sched_avg {
 	u64 last_update_time, load_sum;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdb7937..807d960 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,7 @@ void init_entity_runnable_average(struct sched_entity *se)
 	 * will definitely be update (after enqueue).
 	 */
 	sa->period_contrib = 1023;
-	sa->load_avg = scale_load_down(se->load.weight);
+	sa->load_avg = se->load.weight;
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 	sa->util_avg = SCHED_CAPACITY_SCALE;
 	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
@@ -2697,7 +2697,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-		scale_load_down(cfs_rq->load.weight), cfs_rq->curr != NULL, cfs_rq);
+		cfs_rq->load.weight, cfs_rq->curr != NULL, cfs_rq);
 
 #ifndef CONFIG_64BIT
 	smp_wmb();
@@ -2718,8 +2718,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
 	 * Track task load average for carrying it to new CPU after migrated, and
 	 * track group sched_entity load average for task_h_load calc in migration
 	 */
-	__update_load_avg(now, cpu, &se->avg,
-			  se->on_rq * scale_load_down(se->load.weight),
+	__update_load_avg(now, cpu, &se->avg, se->on_rq * se->load.weight,
 			  cfs_rq->curr == se, NULL);
 
 	if (update_cfs_rq_load_avg(now, cfs_rq) && update_tg)
@@ -2756,7 +2755,7 @@ skip_aging:
 static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	__update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
-			  &se->avg, se->on_rq * scale_load_down(se->load.weight),
+			  &se->avg, se->on_rq * se->load.weight,
 			  cfs_rq->curr == se, NULL);
 
 	cfs_rq->avg.load_avg = max_t(long, cfs_rq->avg.load_avg - se->avg.load_avg, 0);
@@ -2776,7 +2775,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	migrated = !sa->last_update_time;
 	if (!migrated) {
 		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
-			se->on_rq * scale_load_down(se->load.weight),
+			se->on_rq * se->load.weight,
 			cfs_rq->curr == se, NULL);
 	}
 
-- 
2.1.4



* [PATCH 4/4] sched/fair: Rename scale_load() and scale_load_down()
  2015-10-04 17:56 [PATCH 0/4] sched/fair: Clean up sched metric definitions Yuyang Du
                   ` (2 preceding siblings ...)
  2015-10-04 17:56 ` [PATCH 3/4] sched/fair: Remove scale_load_down() for load_avg Yuyang Du
@ 2015-10-04 17:56 ` Yuyang Du
  3 siblings, 0 replies; 10+ messages in thread
From: Yuyang Du @ 2015-10-04 17:56 UTC (permalink / raw)
  To: mingo, peterz, linux-kernel
  Cc: pjt, bsegall, morten.rasmussen, vincent.guittot,
	dietmar.eggemann, Yuyang Du

Rename scale_load() and scale_load_down() to user_to_kernel_load()
and kernel_to_user_load() respectively, so that the names convey what
they are really about.

Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/core.c  |  8 ++++----
 kernel/sched/fair.c  |  7 ++++---
 kernel/sched/sched.h | 15 ++++++++-------
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffe7b7e..1359871 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -818,12 +818,12 @@ static void set_load_weight(struct task_struct *p)
 	 * SCHED_IDLE tasks get minimal weight:
 	 */
 	if (idle_policy(p->policy)) {
-		load->weight = scale_load(WEIGHT_IDLEPRIO);
+		load->weight = user_to_kernel_load(WEIGHT_IDLEPRIO);
 		load->inv_weight = WMULT_IDLEPRIO;
 		return;
 	}
 
-	load->weight = scale_load(prio_to_weight[prio]);
+	load->weight = user_to_kernel_load(prio_to_weight[prio]);
 	load->inv_weight = prio_to_wmult[prio];
 }
 
@@ -8199,7 +8199,7 @@ static void cpu_cgroup_exit(struct cgroup_subsys_state *css,
 static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
 				struct cftype *cftype, u64 shareval)
 {
-	return sched_group_set_shares(css_tg(css), scale_load(shareval));
+	return sched_group_set_shares(css_tg(css), user_to_kernel_load(shareval));
 }
 
 static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
@@ -8207,7 +8207,7 @@ static u64 cpu_shares_read_u64(struct cgroup_subsys_state *css,
 {
 	struct task_group *tg = css_tg(css);
 
-	return (u64) scale_load_down(tg->shares);
+	return (u64) kernel_to_user_load(tg->shares);
 }
 
 #ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 807d960..72db21e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -189,7 +189,7 @@ static void __update_inv_weight(struct load_weight *lw)
 	if (likely(lw->inv_weight))
 		return;
 
-	w = scale_load_down(lw->weight);
+	w = kernel_to_user_load(lw->weight);
 
 	if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
 		lw->inv_weight = 1;
@@ -213,7 +213,7 @@ static void __update_inv_weight(struct load_weight *lw)
  */
 static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)
 {
-	u64 fact = scale_load_down(weight);
+	u64 fact = kernel_to_user_load(weight);
 	int shift = WMULT_SHIFT;
 
 	__update_inv_weight(lw);
@@ -8205,7 +8205,8 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 	if (!tg->se[0])
 		return -EINVAL;
 
-	shares = clamp(shares, scale_load(MIN_SHARES), scale_load(MAX_SHARES));
+	shares = clamp(shares, user_to_kernel_load(MIN_SHARES),
+		       user_to_kernel_load(MAX_SHARES));
 
 	mutex_lock(&shares_mutex);
 	if (tg->shares == shares)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3d03956..0a1e972 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -54,21 +54,22 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
  */
 #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
 # define NICE_0_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
-# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
-# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
+# define user_to_kernel_load(w)	((w) << SCHED_RESOLUTION_SHIFT)
+# define kernel_to_user_load(w)	((w) >> SCHED_RESOLUTION_SHIFT)
 #else
 # define NICE_0_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
-# define scale_load(w)		(w)
-# define scale_load_down(w)	(w)
+# define user_to_kernel_load(w)	(w)
+# define kernel_to_user_load(w)	(w)
 #endif
 
 /*
  * Task weight (visible to user) and its load (invisible to user) have
  * independent resolution, but they should be well calibrated. We use
- * scale_load() and scale_load_down(w) to convert between them. The
- * following must be true:
+ * user_to_kernel_load() and kernel_to_user_load(w) to convert between
+ * them. The following must be true:
  *
- *  scale_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * user_to_kernel_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ * kernel_to_user_load(NICE_0_LOAD) == prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]
  *
  */
 #define NICE_0_LOAD		(1L << NICE_0_LOAD_SHIFT)
-- 
2.1.4



* Re: [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition
  2015-10-04 17:56 ` [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition Yuyang Du
@ 2015-10-05  7:04   ` Ingo Molnar
  2015-10-06  8:00     ` Yuyang Du
  2015-10-05 10:52   ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2015-10-05  7:04 UTC (permalink / raw)
  To: Yuyang Du
  Cc: peterz, linux-kernel, pjt, bsegall, morten.rasmussen,
	vincent.guittot, dietmar.eggemann


* Yuyang Du <yuyang.du@intel.com> wrote:

> +# define SCHED_RESOLUTION_SHIFT	10

>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */

Might be worth fixing?

Also, I noticed this:

> -# define SCHED_LOAD_RESOLUTION	10
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)

So in the #if 0 (inactive) section we change it from 10 to 20 ...

> -# define SCHED_LOAD_RESOLUTION	0
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>  # define scale_load(w)		(w)
>  # define scale_load_down(w)	(w)
>  #endif
>  
> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)

... then we change the actually active definition from 20 to 10?

Was that intended?

Please double check the 'make allyesconfig' disassembly of kernel/sched/built-in.o 
before/after this patch to make sure it does not change any code.

( No full allyesconfig build needed: 'make -j16 kernel/sched' should cut down on
  the build time. )

Thanks,

	Ingo


* Re: [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition
  2015-10-04 17:56 ` [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition Yuyang Du
  2015-10-05  7:04   ` Ingo Molnar
@ 2015-10-05 10:52   ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2015-10-05 10:52 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, pjt, bsegall, morten.rasmussen,
	vincent.guittot, dietmar.eggemann

On Mon, Oct 05, 2015 at 01:56:56AM +0800, Yuyang Du wrote:
> A metric needs a certain resolution to determine how much detail we can
> look into; the resolution also determines the range of the metric.
> 
> For instance, to increase the resolution of [0, 1] (two levels), one
> can multiply by 1024 and get [0..1024] (1025 levels).
> 

>  /*
> + * Integer metrics need certain resolution to allow how much detail we
> + * can look into, e.g., load, load_avg, util_avg, freq, and capacity.
> + * We define a basic resolution constant number, and then formalize
> + * all these metrics based on that basic resolution.
> + */
> +# define SCHED_RESOLUTION_SHIFT	10
> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> +
> +/*

I find all that most confusing; maybe just refer to fixed-point
arithmetic, which is a well-defined and well-understood concept.
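
As an aside, the fixed-point view can be made concrete with a tiny
standalone sketch (illustrative only; FRAC_BITS here just stands in for
SCHED_RESOLUTION_SHIFT):

#include <stdio.h>

#define FRAC_BITS	10	/* number of fractional bits */

int main(void)
{
	long one  = 1L << FRAC_BITS;	/* 1.0 -> 1024 */
	long half = one / 2;		/* 0.5 -> 512  */

	/* a product of two fixed-point values needs one renormalizing
	 * shift, much like cap_scale() in kernel/sched/fair.c */
	printf("%ld\n", (half * half) >> FRAC_BITS);	/* 256, i.e. 0.25 */
	return 0;
}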


* Re: [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition
  2015-10-05  7:04   ` Ingo Molnar
@ 2015-10-06  8:00     ` Yuyang Du
  2015-10-09 23:04       ` Yuyang Du
  0 siblings, 1 reply; 10+ messages in thread
From: Yuyang Du @ 2015-10-06  8:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: peterz, linux-kernel, pjt, bsegall, morten.rasmussen,
	vincent.guittot, dietmar.eggemann, lizefan

On Mon, Oct 05, 2015 at 09:04:45AM +0200, Ingo Molnar wrote:
> 
> * Yuyang Du <yuyang.du@intel.com> wrote:
> 
> > +# define SCHED_RESOLUTION_SHIFT	10
> 
> >  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> 
> Might be worth fixing?

Yes, it should be. Peter has already brought this up.
 
> Also, I noticed this:
> 
> > -# define SCHED_LOAD_RESOLUTION	10
> > +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> 
> So in the #if 0 (inactive) section we change it from 10 to 20 ...
> 
> > -# define SCHED_LOAD_RESOLUTION	0
> > +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
> >  # define scale_load(w)		(w)
> >  # define scale_load_down(w)	(w)
> >  #endif
> >  
> > -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> 
> ... then we change the actually active definition from 20 to 10?
> 
> Was that intended?

Yes, it is intended. And it is actually the same question as your previous
one.

This is about whether we have a discrepancy between weight (which the user
can assign) and load (which CFS uses to charge the task).

And this is about the precision of charging vs. the precision of the
proportional runtime (weight) that CFS promises to provide.

As charging happens millions of times, cumulative error is the concern.

The comments say that increasing the load resolution improves share
distribution for small-weight tasks.

However, as charging is done roughly as runtime / load, and runtime is
real time in ns (a big number), could a higher/bigger load lead to more
drift, given that the truncation error of the integer division can be
about as large as the divisor?

Or, if higher load resolution is to be used, maybe we fix this (remove
the bloody #if 0) by making it a CONFIG_ option?
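
To put a number on the integer-division point above, here is a tiny
standalone example (purely illustrative, not the actual CFS arithmetic):

#include <stdio.h>

int main(void)
{
	unsigned long long runtime = 1000003;	/* ns, arbitrary */

	/* the remainder is what one division truncates away, and it is
	 * bounded by (divisor - 1), so a bigger divisor can lose more */
	printf("%llu\n", runtime % 1024);		/* 579 ns lost         */
	printf("%llu\n", runtime % (1024 * 1024));	/* all 1000003 ns lost */
	return 0;
}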

> Please double check the 'make allyesconfig' disassembly of kernel/sched/built-in.o 
> before/after this patch to make sure it does not change any code.
> 
> ( No full allyesconfig build needed: 'make -j16 kernel/sched' should cut down on
>   the build time. )

I checked. Unfortunately, the before and after disassemblies differ. But
the difference should not be caused by this patch, as the patch makes no
difference after macro expansion.

 
> Thanks,
> 
> 	Ingo


* Re: [PATCH 2/4] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE
  2015-10-04 17:56 ` [PATCH 2/4] sched/fair: Remove SCHED_LOAD_SHIFT and SCHED_LOAD_SCALE Yuyang Du
@ 2015-10-06  9:29   ` Vincent Guittot
  0 siblings, 0 replies; 10+ messages in thread
From: Vincent Guittot @ 2015-10-06  9:29 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Paul Turner,
	Benjamin Segall, Morten Rasmussen, Dietmar Eggemann

On 4 October 2015 at 19:56, Yuyang Du <yuyang.du@intel.com> wrote:
> After cleaning up the sched metrics, these two ambiguity-causing
> definitions are no longer needed. Use NICE_0_LOAD_SHIFT and NICE_0_LOAD
> instead (the names state clearly what they are).
>
> Suggested-by: Ben Segall <bsegall@google.com>
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  kernel/sched/fair.c  |  4 ++--
>  kernel/sched/sched.h | 22 +++++++++++-----------
>  2 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c61fd8e..fdb7937 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct sched_entity *se)
>         sa->period_contrib = 1023;
>         sa->load_avg = scale_load_down(se->load.weight);
>         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> -       sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> +       sa->util_avg = SCHED_CAPACITY_SCALE;
>         sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }
> @@ -6651,7 +6651,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>         if (busiest->group_type == group_overloaded &&
>             local->group_type   == group_overloaded) {
>                 load_above_capacity = busiest->sum_nr_running *
> -                                       SCHED_LOAD_SCALE;
> +                                       SCHED_CAPACITY_SCALE;

load_above_capacity is then compared against load_avg. In patch 3, you
directly use weight instead of scale_down(weight) to compute load_avg,
which implies that load_above_capacity must also move to the same range.
So you will have to replace SCHED_CAPACITY_SCALE with NICE_0_LOAD.
This comment applies to patch 3, but it was easier to describe the issue
here, next to the code, than in patch 3, which does not reference this
code.

So you should rather use scale_down(NICE_0_LOAD) in this patch and remove
the scale_down in patch 3, keeping only NICE_0_LOAD, so that each patch
is consistent on its own.


>                 if (load_above_capacity > busiest->group_capacity)
>                         load_above_capacity -= busiest->group_capacity;

Here you will also have to move the capacity into the same range as the
load. So in patch 3 you will have to use
scale_load(busiest->group_capacity).
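
Putting the two remarks together, the calculate_imbalance() snippet would
read roughly as follows once patch 3 is applied on top (a sketch of the
suggestion, not a tested change):

	load_above_capacity = busiest->sum_nr_running * NICE_0_LOAD;
	if (load_above_capacity > scale_load(busiest->group_capacity))
		load_above_capacity -= scale_load(busiest->group_capacity);
	/* else branch unchanged */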

Regards,
Vincent

>                 else
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 31b4022..3d03956 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -53,25 +53,25 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>   * increased costs.
>   */
>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> -# define SCHED_LOAD_SHIFT      (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> +# define NICE_0_LOAD_SHIFT     (SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
>  # define scale_load(w)         ((w) << SCHED_RESOLUTION_SHIFT)
>  # define scale_load_down(w)    ((w) >> SCHED_RESOLUTION_SHIFT)
>  #else
> -# define SCHED_LOAD_SHIFT      (SCHED_RESOLUTION_SHIFT)
> +# define NICE_0_LOAD_SHIFT     (SCHED_RESOLUTION_SHIFT)
>  # define scale_load(w)         (w)
>  # define scale_load_down(w)    (w)
>  #endif
>
> -#define SCHED_LOAD_SCALE       (1L << SCHED_LOAD_SHIFT)
> -
>  /*
> - * NICE_0's weight (visible to user) and its load (invisible to user) have
> - * independent resolution, but they should be well calibrated. We use scale_load()
> - * and scale_load_down(w) to convert between them, the following must be true:
> - * scale_load(prio_to_weight[20]) == NICE_0_LOAD
> + * Task weight (visible to user) and its load (invisible to user) have
> + * independent resolution, but they should be well calibrated. We use
> + * scale_load() and scale_load_down(w) to convert between them. The
> + * following must be true:
> + *
> + *  scale_load(prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
> + *
>   */
> -#define NICE_0_LOAD            SCHED_LOAD_SCALE
> -#define NICE_0_SHIFT           SCHED_LOAD_SHIFT
> +#define NICE_0_LOAD            (1L << NICE_0_LOAD_SHIFT)
>
>  /*
>   * Single value that decides SCHED_DEADLINE internal math precision.
> @@ -850,7 +850,7 @@ DECLARE_PER_CPU(struct sched_domain *, sd_asym);
>  struct sched_group_capacity {
>         atomic_t ref;
>         /*
> -        * CPU capacity of this group, SCHED_LOAD_SCALE being max capacity
> +        * CPU capacity of this group, SCHED_CAPACITY_SCALE being max capacity
>          * for a single CPU.
>          */
>         unsigned int capacity;
> --
> 2.1.4
>


* Re: [PATCH 1/4] sched/fair: Generalize the load/util averages resolution definition
  2015-10-06  8:00     ` Yuyang Du
@ 2015-10-09 23:04       ` Yuyang Du
  0 siblings, 0 replies; 10+ messages in thread
From: Yuyang Du @ 2015-10-09 23:04 UTC (permalink / raw)
  To: Ingo Molnar, peterz
  Cc: peterz, linux-kernel, pjt, bsegall, morten.rasmussen,
	vincent.guittot, dietmar.eggemann, lizefan

On Tue, Oct 06, 2015 at 04:00:00PM +0800, Yuyang Du wrote:
> On Mon, Oct 05, 2015 at 09:04:45AM +0200, Ingo Molnar wrote:
> > 
> > * Yuyang Du <yuyang.du@intel.com> wrote:
> > 
> > > +# define SCHED_RESOLUTION_SHIFT	10
> > 
> > >  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> > 
> > Might be worth fixing?
> 
> Yes, it should be. Peter has already brought this up.
  
Do you have any suggestions on how to fix this?

Thanks,
Yuyang

