linux-kernel.vger.kernel.org archive mirror
From: josef@toxicpanda.com
To: mingo@redhat.com, peterz@infradead.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Cc: Josef Bacik <jbacik@fb.com>
Subject: [RFC][PATCH] sched: attach extra runtime to the right avg
Date: Thu, 29 Jun 2017 21:56:06 -0400	[thread overview]
Message-ID: <1498787766-9593-1-git-send-email-jbacik@fb.com>

From: Josef Bacik <jbacik@fb.com>

We only track the load avg of a se in 1024ns chunks, so to make up for
the loss of the < 1024ns part of a run/sleep delta we advance
se->avg.last_update_time only by the amount of time we actually
processed.  The problem is there is no way to know whether that leftover
time accrued while we were asleep or while we were running.  Instead,
keep track of the remainder and apply it in the appropriate place: if
the remainder accrued while running, add it to the delta the next time
we update the load avg while running, and likewise for sleeping.  This
(coupled with other fixes) mostly fixes the regression to my workload
introduced by Peter's experimental runnable load propagation patches.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 include/linux/sched.h |  2 ++
 kernel/sched/fair.c   | 21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b69fc6..ed147d4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -316,6 +316,8 @@ struct sched_avg {
 	u64				load_sum;
 	u32				util_sum;
 	u32				period_contrib;
+	u32				sleep_remainder;
+	u32				run_remainder;
 	unsigned long			load_avg;
 	unsigned long			util_avg;
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1..573bdcd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -813,6 +813,8 @@ void post_init_entity_util_avg(struct sched_entity *se)
 			 * expected state.
 			 */
 			se->avg.last_update_time = cfs_rq_clock_task(cfs_rq);
+			se->avg.sleep_remainder = 0;
+			se->avg.run_remainder = 0;
 			return;
 		}
 	}
@@ -2882,14 +2884,19 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
 {
 	u64 delta;
+	u32 remainder = running ? sa->run_remainder : sa->sleep_remainder;
 
-	delta = now - sa->last_update_time;
+	delta = now - sa->last_update_time + remainder;
 	/*
 	 * This should only happen when time goes backwards, which it
 	 * unfortunately does during sched clock init when we swap over to TSC.
 	 */
 	if ((s64)delta < 0) {
 		sa->last_update_time = now;
+		if (running)
+			sa->run_remainder = 0;
+		else
+			sa->sleep_remainder = 0;
 		return 0;
 	}
 
@@ -2897,12 +2904,16 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	 * Use 1024ns as the unit of measurement since it's a reasonable
 	 * approximation of 1us and fast to compute.
 	 */
+	remainder = delta & (1023UL);
+	sa->last_update_time = now;
+	if (running)
+		sa->run_remainder = remainder;
+	else
+		sa->sleep_remainder = remainder;
 	delta >>= 10;
 	if (!delta)
 		return 0;
 
-	sa->last_update_time += delta << 10;
-
 	/*
 	 * Now we know we crossed measurement unit boundaries. The *_avg
 	 * accrues by two steps:
@@ -6102,6 +6113,8 @@ static void migrate_task_rq_fair(struct task_struct *p)
 
 	/* Tell new CPU we are migrated */
 	p->se.avg.last_update_time = 0;
+	p->se.avg.sleep_remainder = 0;
+	p->se.avg.run_remainder = 0;
 
 	/* We have migrated, no longer consider this task hot */
 	p->se.exec_start = 0;
@@ -9259,6 +9272,8 @@ static void task_move_group_fair(struct task_struct *p)
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
+	p->se.avg.sleep_remainder = 0;
+	p->se.avg.run_remainder = 0;
 #endif
 	attach_task_cfs_rq(p);
 }
-- 
2.7.4

