linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] sched/fair: Initiate a new task's util avg to a bounded value
@ 2016-03-29 20:30 Yuyang Du
  2016-03-30 14:54 ` Peter Zijlstra
  2016-03-31  9:29 ` [tip:sched/core] " tip-bot for Yuyang Du
  0 siblings, 2 replies; 3+ messages in thread
From: Yuyang Du @ 2016-03-29 20:30 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: steve.muckle, morten.rasmussen, aryabinin, pjt, bsegall, Yuyang Du

Hi Peter,

It looks like this patch was left over from last year. I fixed a
bug - it did not initialize the task group's entities.

I have started this new thread to bring it up again.

Please refer to the link below for the previous version and some
experimental data:

http://article.gmane.org/gmane.linux.kernel/2117689

Would you please give it a look?

Thanks,
Yuyang

---
Subject: [PATCH v2] sched/fair: Initiate a new task's util avg to a bounded
 value

A new task's util_avg is set to full utilization of a CPU (100%
time running). This accelerates the new task's utilization ramp-up,
which is useful to boost its execution early on. However, it may
result in (insanely) high utilization for a transient period when a
flood of tasks is spawned. Importantly, it violates the invariant
that CPU utilization is fundamentally bounded, and the side effects
are negative if we don't take any measure to bound it.

This patch proposes an algorithm to address this issue. It has
two methods to approach a sensible initial util_avg:

(1) An expected (or average) util_avg based on its cfs_rq's util_avg:

util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight

(2) A trajectory of how successive new tasks' util develops, which
gives 1/2 of the remaining utilization budget to each new task, so
that the additional util is noticeably large (when overall util is
low) or negligibly small (when overall util is high enough).
Meanwhile, the aggregate utilization remains well bounded:

util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n

where n denotes the nth task.

If util_avg is larger than util_avg_cap, then the effective util is
clamped to the util_avg_cap.
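
For illustration only (not part of this patch), consider hypothetical
numbers for method (1): on a cfs_rq with util_avg = 600 and
load_avg = 2047, a newly forked task with the default weight 1024
would start at 600 / (2047 + 1) * 1024 = 300. For method (2), a
minimal user-space sketch (assuming an initially idle cfs_rq and
using the constant 1024 in place of SCHED_LOAD_SCALE) reproduces the
series shown in the comment added by the patch:

  #include <stdio.h>

  int main(void)
  {
          long cfs_util = 0;  /* assumed: the cfs_rq starts idle */
          int n;

          for (n = 1; n <= 7; n++) {
                  /* each new task gets half of the remaining budget */
                  long cap = (1024 - cfs_util) / 2;

                  cfs_util += cap;
                  printf("task %d util_avg: %4ld, cfs_rq util_avg: %4ld\n",
                         n, cap, cfs_util);
          }
          return 0;
  }

The printed task series (512, 256, 128, ...) and cfs_rq series
(512, 768, 896, ...) converge toward 1024 instead of overshooting it,
which is the bound this patch is after.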

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
---
 kernel/sched/core.c  |  2 ++
 kernel/sched/fair.c  | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |  1 +
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0b21e7a..49180f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2413,6 +2413,8 @@ void wake_up_new_task(struct task_struct *p)
 	 */
 	set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
+	/* Post initialize new task's util average when its cfs_rq is set */
+	post_init_entity_util_avg(&p->se);
 
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 303d639..3f56c3a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,17 +682,69 @@ void init_entity_runnable_average(struct sched_entity *se)
 	sa->period_contrib = 1023;
 	sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
-	sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
-	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+	/*
+	 * At this point, util_avg won't be used in select_task_rq_fair anyway
+	 */
+	sa->util_avg = 0;
+	sa->util_sum = 0;
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
+/*
+ * With new tasks being created, their initial util_avgs are extrapolated
+ * based on the cfs_rq's current util_avg:
+ *
+ * util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight
+ *
+ * However, in many cases, the above util_avg does not give a desired
+ * value. Moreover, the sum of the util_avgs may be divergent, such
+ * as when the series is a harmonic series.
+ *
+ * To solve this problem, we also cap the util_avg of successive tasks to
+ * only 1/2 of the left utilization budget:
+ *
+ * util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
+ *
+ * where n denotes the nth task.
+ *
+ * For example, a simplest series from the beginning would be like:
+ *
+ *  task  util_avg: 512, 256, 128,  64,  32,   16,    8, ...
+ * cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
+ *
+ * Finally, that extrapolated util_avg is clamped to the cap (util_avg_cap)
+ * if util_avg > util_avg_cap.
+ */
+void post_init_entity_util_avg(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	struct sched_avg *sa = &se->avg;
+	long cap = (long)(scale_load_down(SCHED_LOAD_SCALE) - cfs_rq->avg.util_avg) / 2;
+
+	if (cap > 0) {
+		if (cfs_rq->avg.util_avg != 0) {
+			sa->util_avg = cfs_rq->avg.util_avg * se->load.weight;
+			sa->util_avg /=	 (cfs_rq->avg.load_avg + 1);
+
+			if (sa->util_avg > cap)
+				sa->util_avg = cap;
+		}
+		else {
+			sa->util_avg = cap;
+		}
+		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+	}
+}
+
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
 static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
 #else
 void init_entity_runnable_average(struct sched_entity *se)
 {
 }
+void post_init_entity_util_avg(struct sched_entity *se)
+{
+}
 #endif
 
 /*
@@ -8358,6 +8410,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
 		init_entity_runnable_average(se);
+		post_init_entity_util_avg(se);
 	}
 
 	return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e6d4a3f..0283235 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1313,6 +1313,7 @@ extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
+extern void post_init_entity_util_avg(struct sched_entity *se);
 
 #ifdef CONFIG_NO_HZ_FULL
 extern bool sched_can_stop_tick(struct rq *rq);
-- 
2.1.4


* Re: [PATCH v2] sched/fair: Initiate a new task's util avg to a bounded value
  2016-03-29 20:30 [PATCH v2] sched/fair: Initiate a new task's util avg to a bounded value Yuyang Du
@ 2016-03-30 14:54 ` Peter Zijlstra
  2016-03-31  9:29 ` [tip:sched/core] " tip-bot for Yuyang Du
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Zijlstra @ 2016-03-30 14:54 UTC (permalink / raw)
  To: Yuyang Du
  Cc: mingo, linux-kernel, steve.muckle, morten.rasmussen, aryabinin,
	pjt, bsegall

On Wed, Mar 30, 2016 at 04:30:56AM +0800, Yuyang Du wrote:
> Hi Peter,
> 
> It looks like this patch was left over from last year. I fixed a
> bug - it did not initialize the task group's entities.
> 
> I have started this new thread to bring it up again.
> 
> Please refer to the link below for the previous version and some
> experimental data:
> 
> http://article.gmane.org/gmane.linux.kernel/2117689
> 
> Would you please give it a look?

Ah yes, sorry, I seem to have totally missed all that; it's been a
hectic few months.

Yes, this looks nice, I'll stuff it in, see what happens ;-)


* [tip:sched/core] sched/fair: Initiate a new task's util avg to a bounded value
  2016-03-29 20:30 [PATCH v2] sched/fair: Initiate a new task's util avg to a bounded value Yuyang Du
  2016-03-30 14:54 ` Peter Zijlstra
@ 2016-03-31  9:29 ` tip-bot for Yuyang Du
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Yuyang Du @ 2016-03-31  9:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, mingo, tglx, peterz, efault, linux-kernel, torvalds,
	yuyang.du, aryabinin

Commit-ID:  2b8c41daba327c633228169e8bd8ec067ab443f8
Gitweb:     http://git.kernel.org/tip/2b8c41daba327c633228169e8bd8ec067ab443f8
Author:     Yuyang Du <yuyang.du@intel.com>
AuthorDate: Wed, 30 Mar 2016 04:30:56 +0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 Mar 2016 10:49:46 +0200

sched/fair: Initiate a new task's util avg to a bounded value

A new task's util_avg is set to full utilization of a CPU (100% time
running). This accelerates the new task's utilization ramp-up, which
is useful to boost its execution early on. However, it may result in
(insanely) high utilization for a transient period when a flood of
tasks is spawned. Importantly, it violates the invariant that CPU
utilization is fundamentally bounded, and the side effects are
negative if we don't take any measure to bound it.

This patch proposes an algorithm to address this issue. It has
two methods to approach a sensible initial util_avg:

(1) An expected (or average) util_avg based on its cfs_rq's util_avg:

  util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight

(2) A trajectory of how successive new tasks' util develops, which
gives 1/2 of the remaining utilization budget to each new task, so
that the additional util is noticeably large (when overall util is
low) or negligibly small (when overall util is high enough).
Meanwhile, the aggregate utilization remains well bounded:

  util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n

where n denotes the nth task.

If util_avg is larger than util_avg_cap, then the effective util is
clamped to the util_avg_cap.

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Link: http://lkml.kernel.org/r/1459283456-21682-1-git-send-email-yuyang.du@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  |  2 ++
 kernel/sched/fair.c  | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |  1 +
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b5cf01d..1159423 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2431,6 +2431,8 @@ void wake_up_new_task(struct task_struct *p)
 	 */
 	set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
+	/* Post initialize new task's util average when its cfs_rq is set */
+	post_init_entity_util_avg(&p->se);
 
 	rq = __task_rq_lock(p);
 	activate_task(rq, p, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4bb5ace..b8cc1c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -682,17 +682,68 @@ void init_entity_runnable_average(struct sched_entity *se)
 	sa->period_contrib = 1023;
 	sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
-	sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
-	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+	/*
+	 * At this point, util_avg won't be used in select_task_rq_fair anyway
+	 */
+	sa->util_avg = 0;
+	sa->util_sum = 0;
 	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
 }
 
+/*
+ * With new tasks being created, their initial util_avgs are extrapolated
+ * based on the cfs_rq's current util_avg:
+ *
+ *   util_avg = cfs_rq->util_avg / (cfs_rq->load_avg + 1) * se.load.weight
+ *
+ * However, in many cases, the above util_avg does not give a desired
+ * value. Moreover, the sum of the util_avgs may be divergent, such
+ * as when the series is a harmonic series.
+ *
+ * To solve this problem, we also cap the util_avg of successive tasks to
+ * only 1/2 of the left utilization budget:
+ *
+ *   util_avg_cap = (1024 - cfs_rq->avg.util_avg) / 2^n
+ *
+ * where n denotes the nth task.
+ *
+ * For example, a simplest series from the beginning would be like:
+ *
+ *  task  util_avg: 512, 256, 128,  64,  32,   16,    8, ...
+ * cfs_rq util_avg: 512, 768, 896, 960, 992, 1008, 1016, ...
+ *
+ * Finally, that extrapolated util_avg is clamped to the cap (util_avg_cap)
+ * if util_avg > util_avg_cap.
+ */
+void post_init_entity_util_avg(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	struct sched_avg *sa = &se->avg;
+	long cap = (long)(scale_load_down(SCHED_LOAD_SCALE) - cfs_rq->avg.util_avg) / 2;
+
+	if (cap > 0) {
+		if (cfs_rq->avg.util_avg != 0) {
+			sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
+			sa->util_avg /= (cfs_rq->avg.load_avg + 1);
+
+			if (sa->util_avg > cap)
+				sa->util_avg = cap;
+		} else {
+			sa->util_avg = cap;
+		}
+		sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
+	}
+}
+
 static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
 static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
 #else
 void init_entity_runnable_average(struct sched_entity *se)
 {
 }
+void post_init_entity_util_avg(struct sched_entity *se)
+{
+}
 #endif
 
 /*
@@ -8384,6 +8435,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
 		init_entity_runnable_average(se);
+		post_init_entity_util_avg(se);
 	}
 
 	return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..a7cbad7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1313,6 +1313,7 @@ extern void init_dl_task_timer(struct sched_dl_entity *dl_se);
 unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
+extern void post_init_entity_util_avg(struct sched_entity *se);
 
 #ifdef CONFIG_NO_HZ_FULL
 extern bool sched_can_stop_tick(struct rq *rq);


