From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>, Todd Kjos <tkjos@google.com>
Subject: [PATCH 2/2] sched/fair: util_est: add running_sum tracking
Date: Mon,  4 Jun 2018 17:06:00 +0100
Message-ID: <20180604160600.22052-3-patrick.bellasi@arm.com>
In-Reply-To: <20180604160600.22052-1-patrick.bellasi@arm.com>

The estimated utilization of a task is affected by the task being
preempted, either by another FAIR task or by a task of a higher
priority class (i.e. RT or DL). Indeed, when a preemption happens, the
PELT utilization of the preempted task is decayed a bit. That's
actually correct for utilization, whose goal is to measure the actual
CPU bandwidth consumed by a task.

However, the above behavior does not allow us to know exactly what
utilization a task "would have used" had it been running without being
preempted. This reduces the effectiveness of util_est for a task,
because it does not always allow us to predict how much CPU the task
is likely to require.

Let's improve the estimated utilization by adding a new "sort-of" PELT
signal, defined explicitly only for SEs, which behaves as follows (see
the sketch after this list):
 a) at each enqueue time of a task, its value is the (already decayed)
    util_avg of the task being enqueued
 b) it's updated at each update_load_avg()
 c) it can only increase, whenever the task is actually RUNNING on a
    CPU, while it's kept stable while the task is RUNNABLE but not
    actively consuming CPU bandwidth

A signal defined this way is exactly equivalent to util_avg for a task
running alone on a CPU while, in case the task is preempted, it allows
us to know at dequeue time what the task's utilization would have been
had it been running alone on that CPU.

This new signal is named "running_avg", since it tracks the actual
RUNNING time of a task by ignoring any form of preemption.

From an implementation standpoint, since struct sched_avg should fit
into a single cache line, we save space by tracking only the new
running sum:
   p->se.avg.running_sum
while the conversion into a running_avg is done on demand, whenever we
need it, which is at task dequeue time when a new util_est sample has
to be collected.
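
As a minimal sketch, using the field and macro names from this series,
the on-demand conversion boils down to something like:

	static inline unsigned long task_running_avg(struct task_struct *p)
	{
		/* NOTE: the period_contrib compensation is ignored here */
		return p->se.avg.running_sum / LOAD_AVG_MAX;
	}

where task_running_avg() is just an illustrative helper (not added by
this patch); the actual division is open coded in util_est_dequeue(),
see the second hunk below.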

The conversion from "running_sum" to "running_avg" is done by
performing a single division by LOAD_AVG_MAX, which introduces a small
error since the division does not consider the
(sa->period_contrib - 1024) compensation factor used in
___update_load_avg(). However:
 a) this error is expected to be limited (~2-3%)
 b) it can safely be ignored, since the estimated utilization is the
    only consumer and it is already subject to small estimation errors
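
For reference, with the current PELT constants (LOAD_AVG_MAX == 47742)
the exact divider used by ___update_load_avg() is:

	LOAD_AVG_MAX - 1024 + sa->period_contrib

i.e. a value in the [46718..47741] range. Dividing by LOAD_AVG_MAX
instead thus under-estimates the average by at most about:

	1024 / 47742 ~= 2.1%

which is just an illustrative estimate, consistent with the ~2-3%
bound of point a) above.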

The corresponding additional benefit is that, at run time, we pay the
cost of only an additional sum and multiply, while the more expensive
division is required only at dequeue time.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
 include/linux/sched.h |  1 +
 kernel/sched/fair.c   | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9d8732dab264..2bd5f1c68da9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -399,6 +399,7 @@ struct sched_avg {
 	u64				load_sum;
 	u64				runnable_load_sum;
 	u32				util_sum;
+	u32				running_sum;
 	u32				period_contrib;
 	unsigned long			load_avg;
 	unsigned long			runnable_load_avg;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f74441be3f44..5d54d6a4c31f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3161,6 +3161,8 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 		sa->runnable_load_sum =
 			decay_load(sa->runnable_load_sum, periods);
 		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
+		if (running)
+			sa->running_sum = decay_load(sa->running_sum, periods);
 
 		/*
 		 * Step 2
@@ -3176,8 +3178,10 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 		sa->load_sum += load * contrib;
 	if (runnable)
 		sa->runnable_load_sum += runnable * contrib;
-	if (running)
+	if (running) {
 		sa->util_sum += contrib * scale_cpu;
+		sa->running_sum += contrib * scale_cpu;
+	}
 
 	return periods;
 }
@@ -3963,6 +3967,12 @@ static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
 }
 
+static inline void util_est_enqueue_running(struct task_struct *p)
+{
+	/* Initialize the (non-preempted) utilization */
+	p->se.avg.running_sum = p->se.avg.util_sum;
+}
+
 /*
  * Check if a (signed) value is within a specified (unsigned) margin,
  * based on the observation that:
@@ -4018,7 +4028,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	 * Skip update of task's estimated utilization when its EWMA is
 	 * already ~1% close to its last activation value.
 	 */
-	ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
+	ue.enqueued = p->se.avg.running_sum / LOAD_AVG_MAX;
 	last_ewma_diff = ue.enqueued - ue.ewma;
 	if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
 		return;
@@ -5437,6 +5447,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!se)
 		add_nr_running(rq, 1);
 
+	util_est_enqueue_running(p);
+
 	hrtick_update(rq);
 }
 
-- 
2.15.1


Thread overview: 19+ messages
2018-06-04 16:05 [PATCH 0/2] Improve estimated utilization of preempted FAIR tasks Patrick Bellasi
2018-06-04 16:05 ` [PATCH 1/2] sched/fair: pelt: use u32 for util_avg Patrick Bellasi
2018-06-05  1:30   ` kbuild test robot
2018-06-05  1:34   ` kbuild test robot
2018-06-04 16:06 ` Patrick Bellasi [this message]
2018-06-04 17:46   ` [PATCH 2/2] sched/fair: util_est: add running_sum tracking Joel Fernandes
2018-06-05 15:21     ` Patrick Bellasi
2018-06-05 19:33       ` Joel Fernandes
2018-06-05 19:43         ` Joel Fernandes
2018-06-05  1:29   ` kbuild test robot
2018-06-05  6:57   ` Vincent Guittot
2018-06-05 15:11     ` Patrick Bellasi
2018-06-05 15:31       ` Juri Lelli
2018-06-05 16:54         ` Patrick Bellasi
2018-06-05 20:46           ` Joel Fernandes
2018-06-05 23:15             ` Saravana Kannan
2018-06-06  8:26       ` Vincent Guittot
2018-06-06 10:38         ` Patrick Bellasi
2018-06-05 10:46   ` kbuild test robot
