From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity
Date: Tue, 2 May 2017 10:19:05 +0200
Message-ID: <20170502081905.GA4626@worktop.programming.kicks-ass.net>
In-Reply-To: <20170501215604.GB19079@htj.duckdns.org>

On Mon, May 01, 2017 at 05:56:04PM -0400, Tejun Heo wrote:

> I still wonder what gcfs_rq se->load_avg.avg is good for tho?  It's
> nice to keep the value in line but is it actually used anywhere?

It's used to manage the boundary conditions of its parent cfs_rq's
sums. When we enqueue the group se onto the parent cfs_rq, we do:

  enqueue_entity()
    enqueue_entity_load_avg()
      // update runnable
      // add se.avg to cfs_rq.avg if 'migrated'

Now obviously group se's never actually migrate between CPUs, but they
still pass through that code. I think the 'migrated' condition also
triggers for new entities, for example: they 'migrate' into the system.

Similarly for dequeue: that should trigger when group entities get
destroyed; they 'migrate' out of the system.
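
Roughly, what that enqueue path boils down to in this era's fair.c
(a trimmed sketch, so take the exact details with a grain of salt):

  static inline void
  enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          struct sched_avg *sa = &se->avg;

          /* update runnable */
          cfs_rq->runnable_load_avg += sa->load_avg;
          cfs_rq->runnable_load_sum += sa->load_sum;

          /* add se.avg to cfs_rq.avg if 'migrated': no update on any rq yet */
          if (!sa->last_update_time) {
                  attach_entity_load_avg(cfs_rq, se);
                  update_tg_load_avg(cfs_rq, 0);
          }
  }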

But I get what you're saying. They're not overly useful as such.

> So, if changing gcfs_rq se->load_avg.avg to match the gcfs_rq's
> runnable_load_avg is icky, and I can see why that would be, we can
> simply introduce a separate channel of propagation ...

Can you have a play with something like the below? I suspect
'shares_runnable' might work for you here.

It's still 'icky', but no more so than calc_cfs_shares() was to begin
with. When many CPUs are involved the global sum (tg_weight) is still
dominated by avg_load and the division will have a downward bias, but
given the purpose that might not be too bad.
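
To make that bias concrete (made-up numbers, purely illustrative), the
computation is effectively:

  shares_i = tg_shares * load_i / (Sum_j contrib_j - contrib_i + load_i)

Take two CPUs, tg_shares = 1024, and each CPU contributing an avg load
of 512 of which only 256 is runnable. shares_runnable then gives each
CPU 1024 * 256 / (1024 - 512 + 256) = 341, so the two together get 682
rather than 1024; the sum errs low, which is the downward bias above.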

---
 kernel/sched/fair.c | 62 +++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a903276fcb62..7d1fb5f421bc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2630,26 +2630,56 @@ static inline void account_numa_dequeue(struct rq *rq, struct task_struct *p)
 	cfs_rq->nr_running--;
 }
 
+enum shares_type {
+	shares_runnable,
+	shares_avg,
+	shares_weight,
+};
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 # ifdef CONFIG_SMP
-static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
+static long
+calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg, enum shares_type shares_type)
 {
-	long tg_weight, load, shares;
+	long tg_weight, tg_shares, load, shares;
 
-	/*
-	 * This really should be: cfs_rq->avg.load_avg, but instead we use
-	 * cfs_rq->load.weight, which is its upper bound. This helps ramp up
-	 * the shares for small weight interactive tasks.
-	 */
-	load = scale_load_down(cfs_rq->load.weight);
+	tg_shares = READ_ONCE(tg->shares);
+
+	switch (shares_type) {
+	case shares_runnable:
+		/*
+		 * Instead of the correct cfs_rq->avg.load_avg we use
+		 * cfs_rq->runnable_load_avg, which does not include the
+		 * blocked load.
+		 */
+		load = cfs_rq->runnable_load_avg;
+		break;
+
+	case shares_avg:
+		load = cfs_rq->avg.load_avg;
+		break;
+
+	case shares_weight:
+		/*
+		 * Instead of the correct cfs_rq->avg.load_avg we use
+		 * cfs_rq->load.weight, which is its upper bound. This helps
+		 * ramp up the shares for small weight interactive tasks.
+		 */
+		load = scale_load_down(cfs_rq->load.weight);
+		break;
+	}
 
 	tg_weight = atomic_long_read(&tg->load_avg);
 
-	/* Ensure tg_weight >= load */
+	/*
+	 * This ensures the sum is up-to-date for this CPU; for the other two
+	 * approximations it biases the sum towards their value, and in the
+	 * (near) UP case it ensures the division ends up <= 1.
+	 */
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
 	tg_weight += load;
 
-	shares = (tg->shares * load);
+	shares = (tg_shares * load);
 	if (tg_weight)
 		shares /= tg_weight;
 
@@ -2665,15 +2695,11 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	 * case no task is runnable on a CPU MIN_SHARES=2 should be returned
 	 * instead of 0.
 	 */
-	if (shares < MIN_SHARES)
-		shares = MIN_SHARES;
-	if (shares > tg->shares)
-		shares = tg->shares;
-
-	return shares;
+	return clamp_t(long, shares, MIN_SHARES, tg_shares);
 }
 # else /* CONFIG_SMP */
-static inline long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
+static inline long
+calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg, enum shares_type shares_type)
 {
 	return tg->shares;
 }
@@ -2715,7 +2741,7 @@ static void update_cfs_shares(struct sched_entity *se)
 	if (likely(se->load.weight == tg->shares))
 		return;
 #endif
-	shares = calc_cfs_shares(cfs_rq, tg);
+	shares = calc_cfs_shares(cfs_rq, tg, shares_weight);
 
 	reweight_entity(cfs_rq_of(se), se, shares);
 }
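
Not part of the patch, just to illustrate the intended use: a caller on
the propagation path would ask for the runnable variant instead
(hypothetical helper, name made up):

  /* illustration only: estimate the group se's runnable weight from
   * its group cfs_rq's runnable load */
  static long group_runnable_contrib(struct cfs_rq *gcfs_rq)
  {
          return calc_cfs_shares(gcfs_rq, gcfs_rq->tg, shares_runnable);
  }

while update_cfs_shares() keeps passing shares_weight, so existing
behaviour is unchanged.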

Thread overview: 69+ messages
2017-04-24 20:13 [RFC PATCHSET] sched/fair: fix load balancer behavior when cgroup is in use Tejun Heo
2017-04-24 20:14 ` [PATCH 1/2] sched/fair: Fix how load gets propagated from cfs_rq to its sched_entity Tejun Heo
2017-04-24 21:33   ` [PATCH v2 " Tejun Heo
2017-05-03 18:00     ` Peter Zijlstra
2017-05-03 21:45       ` Tejun Heo
2017-05-04  5:51         ` Peter Zijlstra
2017-05-04  6:21           ` Peter Zijlstra
2017-05-04  9:49             ` Dietmar Eggemann
2017-05-04 10:57               ` Peter Zijlstra
2017-05-04 17:39               ` Tejun Heo
2017-05-05 10:36                 ` Dietmar Eggemann
2017-05-04 10:26       ` Vincent Guittot
2017-04-25  8:35   ` [PATCH " Vincent Guittot
2017-04-25 18:12     ` Tejun Heo
2017-04-26 16:51       ` Vincent Guittot
2017-04-26 22:40         ` Tejun Heo
2017-04-27  7:00           ` Vincent Guittot
2017-05-01 14:17         ` Peter Zijlstra
2017-05-01 14:52           ` Peter Zijlstra
2017-05-01 21:56           ` Tejun Heo
2017-05-02  8:19             ` Peter Zijlstra [this message]
2017-05-02  8:30               ` Peter Zijlstra
2017-05-02 20:00                 ` Tejun Heo
2017-05-03  9:10                   ` Peter Zijlstra
2017-04-26 16:14   ` Vincent Guittot
2017-04-26 22:27     ` Tejun Heo
2017-04-27  8:59       ` Vincent Guittot
2017-04-28 17:46         ` Tejun Heo
2017-05-02  7:20           ` Vincent Guittot
2017-04-24 20:14 ` [PATCH 2/2] sched/fair: Always propagate runnable_load_avg Tejun Heo
2017-04-25  8:46   ` Vincent Guittot
2017-04-25  9:05     ` Vincent Guittot
2017-04-25 12:59       ` Vincent Guittot
2017-04-25 18:49         ` Tejun Heo
2017-04-25 20:49           ` Tejun Heo
2017-04-25 21:15             ` Chris Mason
2017-04-25 21:08           ` Tejun Heo
2017-04-26 10:21             ` Vincent Guittot
2017-04-27  0:30               ` Tejun Heo
2017-04-27  8:28                 ` Vincent Guittot
2017-04-28 16:14                   ` Tejun Heo
2017-05-02  6:56                     ` Vincent Guittot
2017-05-02 20:56                       ` Tejun Heo
2017-05-03  7:25                         ` Vincent Guittot
2017-05-03  7:54                           ` Vincent Guittot
2017-04-26 18:12   ` Vincent Guittot
2017-04-26 22:52     ` Tejun Heo
2017-04-27  8:29       ` Vincent Guittot
2017-04-28 20:33         ` Tejun Heo
2017-04-28 20:38           ` Tejun Heo
2017-05-01 15:56           ` Peter Zijlstra
2017-05-02 22:01             ` Tejun Heo
2017-05-02  7:18           ` Vincent Guittot
2017-05-02 13:26             ` Vincent Guittot
2017-05-02 22:37               ` Tejun Heo
2017-05-02 21:50             ` Tejun Heo
2017-05-03  7:34               ` Vincent Guittot
2017-05-03  9:37                 ` Peter Zijlstra
2017-05-03 10:37                   ` Vincent Guittot
2017-05-03 13:09                     ` Peter Zijlstra
2017-05-03 21:49                       ` Tejun Heo
2017-05-04  8:19                         ` Vincent Guittot
2017-05-04 17:43                           ` Tejun Heo
2017-05-04 19:02                             ` Vincent Guittot
2017-05-04 19:04                               ` Tejun Heo
2017-04-24 21:35 ` [PATCH 3/2] sched/fair: Skip __update_load_avg() on cfs_rq sched_entities Tejun Heo
2017-04-24 21:48   ` Peter Zijlstra
2017-04-24 22:54     ` Tejun Heo
2017-04-25 21:09   ` Tejun Heo
