linux-kernel.vger.kernel.org archive mirror
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Paul Turner <pjt@google.com>,
	Chris Mason <clm@fb.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Date: Tue, 2 May 2017 15:26:12 +0200
Message-ID: <20170502132612.GA30485@linaro.org>
In-Reply-To: <CAKfTPtB3PectZVm_z6NhSszXwG-G=y-J1Y5Rxk48bLxHQ8mx4g@mail.gmail.com>

Hi Tejun,

On Tuesday 02 May 2017 at 09:18:53 (+0200), Vincent Guittot wrote:
> On 28 April 2017 at 22:33, Tejun Heo <tj@kernel.org> wrote:
> > Hello, Vincent.
> >
> > On Thu, Apr 27, 2017 at 10:29:10AM +0200, Vincent Guittot wrote:
> >> On 27 April 2017 at 00:52, Tejun Heo <tj@kernel.org> wrote:
> >> > Hello,
> >> >
> >> > On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
> >> >> On 24 April 2017 at 22:14, Tejun Heo <tj@kernel.org> wrote:
> >> >> Can the problem be on the load balance side instead? And more
> >> >> precisely in the wakeup path?
> >> >> After looking at the trace, it seems that task placement happens in
> >> >> the wakeup path, and if it fails to select the right idle cpu at
> >> >> wakeup, you will have to wait for a load balance, which is already
> >> >> too late.
> >> >
> >> > Oh, I was tracing most of the scheduler activities, and the ratios of
> >> > wakeups picking idle CPUs were about the same regardless of cgroup
> >> > membership.  I can confidently say that the latency issue I'm
> >> > seeing comes from the load balancer picking the wrong busiest CPU,
> >> > which is not to say that there can't be other problems.
> >>
> >> Ok. Is there any trace that you can share? Your behavior seems
> >> different from mine.
> >
> >

[snip]

> > You can notice that B's per-task weight is 4.409, which is way higher
> > than A's 2.779, and this is because Q014-asdf's contribution to Q014-/ is
> > twice as high as it should be.  The root queue's runnable avg should
> 
> Are you sure that this is because of blocked load in group A? It can
> be that Q014-asdf has already had to wait before running, and its load
> still increases while it is runnable but not running.
> IIUC your trace, group A has 2 running tasks and group B only one, but
> load_balance selects B because its sgs->avg_load is higher. But this
> can also happen even if the runnable_load_avg of the child cfs_rq was
> propagated correctly to the group entity, because we can have a
> situation where group A has only 1 task with a higher load than the 2
> tasks in group B; even if blocked load is not taken into account,
> load_balance will select A.
> 
> IMHO, we should rather improve the load balance selection. I'm going to
> add smarter group selection in load_balance. That's something we
> should have done already, but it was difficult without load/util_avg
> propagation. It should be doable now.
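To make the avg_load scenario above concrete, here is a minimal sketch. It assumes avg_load is computed roughly the way update_sg_lb_stats() does it: the group load is scaled by SCHED_CAPACITY_SCALE and divided by the group capacity. The struct group_load_stats type, the helper names and the load numbers are hypothetical. The sketch only shows that one heavy task can give a group a higher avg_load than two lighter tasks.

```c
#include <assert.h>

/* Same value as the kernel's SCHED_CAPACITY_SCALE (1 << 10). */
#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical, reduced stand-in for the kernel's struct sg_lb_stats. */
struct group_load_stats {
	unsigned long group_load;     /* summed load of the group's CPUs */
	unsigned long group_capacity; /* summed capacity of the group's CPUs */
	unsigned int sum_nr_running;  /* runnable tasks in the group */
};

/* Load per unit of capacity, scaled by SCHED_CAPACITY_SCALE. */
static unsigned long group_avg_load(const struct group_load_stats *s)
{
	assert(s->group_capacity != 0);
	return s->group_load * SCHED_CAPACITY_SCALE / s->group_capacity;
}

/* Returns 1 if @sgs would be picked over @busiest on avg_load alone. */
static int busier_by_avg_load(const struct group_load_stats *sgs,
			      const struct group_load_stats *busiest)
{
	return group_avg_load(sgs) > group_avg_load(busiest);
}
```

Take two CPUs of capacity 1024 in each group, a load of 600 for the two-task group and 900 for the one-task group. avg_load comes out at 300 and 450, so an avg_load-only comparison prefers the one-task group.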

Could you test the load_balance patch below?
If a group is not overloaded, which means that its threads get all the runtime
they want, we select the busiest group according to the number of running
threads per CPU instead of its avg_load.
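Here is a minimal sketch of that comparison step. It uses simplified, hypothetical stand-ins for the sg_lb_stats fields the patch touches: sum_nr_running, group_weight and avg_load. The type and helper names are illustrative only. As in the patch, tasks per CPU are compared by cross-multiplying, so no division or rounding is involved.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the sg_lb_stats fields the patch uses. */
struct group_task_stats {
	unsigned long avg_load;      /* load per unit of capacity */
	unsigned int sum_nr_running; /* runnable tasks in the group */
	unsigned int group_weight;   /* number of CPUs in the group */
};

/*
 * Returns 1 if @sgs has strictly more runnable tasks per CPU than @busiest.
 * a/x > b/y is evaluated as a*y > b*x, so no division or rounding is needed.
 */
static int more_tasks_per_cpu(const struct group_task_stats *sgs,
			      const struct group_task_stats *busiest)
{
	assert(sgs->group_weight != 0 && busiest->group_weight != 0);
	return (unsigned long)sgs->sum_nr_running * busiest->group_weight >
	       (unsigned long)busiest->sum_nr_running * sgs->group_weight;
}

/*
 * Tie-break between two groups of the same group_type: non-overloaded
 * (group_other) groups are compared by tasks per CPU, all other types
 * by avg_load.
 */
static int pick_over_busiest(const struct group_task_stats *sgs,
			     const struct group_task_stats *busiest,
			     int is_group_other)
{
	if (is_group_other)
		return more_tasks_per_cpu(sgs, busiest);
	return sgs->avg_load > busiest->avg_load;
}
```

For example, when both groups are group_other, a group with 2 tasks on 2 CPUs wins over a group with 1 task on 2 CPUs (2*2 > 1*2), even if the latter has the higher avg_load. Equal tasks per CPU does not displace the current busiest group, which matches the `<=` test in the patch.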

---
 kernel/sched/fair.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a903276..87e3b77 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7069,7 +7069,8 @@ static unsigned long task_h_load(struct task_struct *p)
 /********** Helpers for find_busiest_group ************************/
 
 enum group_type {
-	group_other = 0,
+	group_idle = 0,
+	group_other,
 	group_imbalanced,
 	group_overloaded,
 };
@@ -7383,6 +7384,9 @@ group_type group_classify(struct sched_group *group,
 	if (sgs->group_no_capacity)
 		return group_overloaded;
 
+	if (!sgs->sum_nr_running)
+		return group_idle;
+
 	if (sg_imbalanced(group))
 		return group_imbalanced;
 
@@ -7476,8 +7480,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->group_type < busiest->group_type)
 		return false;
 
-	if (sgs->avg_load <= busiest->avg_load)
+	if (sgs->group_type == group_other) {
+		/*
+		 * The groups are not overloaded so there is enough cpu time
+		 * for all threads. In this case, take the group with the
+		 * highest number of tasks per CPU in order to improve
+		 * scheduling latency.
+		 */
+		if ((sgs->sum_nr_running * busiest->group_weight) <=
+				(busiest->sum_nr_running * sgs->group_weight))
+			return false;
+	} else if (sgs->avg_load <= busiest->avg_load) {
 		return false;
+	}
 
 	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
 		goto asym_packing;
@@ -7969,6 +7984,9 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		    !check_cpu_capacity(rq, env->sd))
 			continue;
 
+		if (!rq->cfs.h_nr_running)
+			continue;
+
 		/*
 		 * For the load comparisons with the other cpu's, consider
 		 * the weighted_cpuload() scaled with the cpu capacity, so
-- 
2.7.4
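The classification and queue-selection changes in the patch can be sketched in isolation as below. This is a simplified model, not the kernel code. struct group_state, struct cpu_state and the helpers are hypothetical. busiest_cpu() also compares raw load values instead of the capacity-scaled weighted_cpuload() used by find_busiest_queue().

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the patched enum: a higher value means busier. */
enum group_type {
	group_idle = 0,
	group_other,
	group_imbalanced,
	group_overloaded,
};

/* Hypothetical inputs standing in for sgs and sched_group state. */
struct group_state {
	bool no_capacity;        /* like sgs->group_no_capacity */
	unsigned int nr_running; /* like sgs->sum_nr_running */
	bool imbalanced;         /* like sg_imbalanced(group) */
};

/* Same order of checks as the patched group_classify(). */
static enum group_type classify(const struct group_state *g)
{
	if (g->no_capacity)
		return group_overloaded;
	if (!g->nr_running)
		return group_idle;
	if (g->imbalanced)
		return group_imbalanced;
	return group_other;
}

/* Hypothetical per-CPU view used when scanning for the busiest queue. */
struct cpu_state {
	unsigned int cfs_h_nr_running; /* like rq->cfs.h_nr_running */
	unsigned long load;            /* raw load, no capacity scaling */
};

/*
 * Returns the index of the most loaded CPU that has at least one CFS task,
 * or -1 if there is none. CPUs without CFS tasks are skipped, as in the
 * patched find_busiest_queue(): there are no CFS tasks on them to pull.
 */
static int busiest_cpu(const struct cpu_state *cpus, int n)
{
	int i, best = -1;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		if (!cpus[i].cfs_h_nr_running)
			continue;
		if (best < 0 || cpus[i].load > cpus[best].load)
			best = i;
	}
	return best;
}
```

Because group_idle is 0, a group with no running tasks always compares lower than any busy group in update_sd_pick_busiest(), which checks group_type before it looks at avg_load.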


> 
> > only contain what's currently active, but because we're scaling load
> > avg, which includes both active and blocked load, we're ending up
> > picking group B over A.
> >
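The blocked-load effect described above can be modelled very roughly as follows. This is a simplification, not PELT. It assumes a child cfs_rq's load_avg is its runnable load plus the still-decaying load of recently blocked tasks, while runnable_load_avg covers queued tasks only. The struct child_cfs_rq type, the helpers and the numbers are hypothetical.

```c
#include <assert.h>

/* Hypothetical, simplified view of a child cfs_rq's load sums. */
struct child_cfs_rq {
	unsigned long runnable_load; /* load of currently queued tasks */
	unsigned long blocked_load;  /* decaying load of recently blocked tasks */
	unsigned int nr_running;     /* currently queued tasks */
};

/* Group contribution if it follows load_avg (runnable + blocked). */
static unsigned long contrib_from_load_avg(const struct child_cfs_rq *c)
{
	return c->runnable_load + c->blocked_load;
}

/* Group contribution if only runnable load is propagated. */
static unsigned long contrib_from_runnable(const struct child_cfs_rq *c)
{
	return c->runnable_load;
}

/* Per-task weight seen from the parent: contribution per queued task. */
static unsigned long per_task_weight(unsigned long contrib, unsigned int nr)
{
	assert(nr != 0);
	return contrib / nr;
}
```

Suppose a single queued task has a hypothetical runnable load of 1000 and the same amount of blocked load is still decaying. The load_avg-based per-task weight then comes out at 2000 instead of 1000.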

