From: Oliver Sang <oliver.sang@intel.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Ingo Molnar <mingo@kernel.org>, Ben Segall <bsegall@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Mel Gorman <mgorman@suse.de>, Mike Galbraith <efault@gmx.de>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
LKML <linux-kernel@vger.kernel.org>,
OTC LSE PnP <otc.lse.pnp@intel.com>,
lkp@lists.01.org
Subject: Re: [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression
Date: Mon, 18 May 2020 15:00:46 +0800 [thread overview]
Message-ID: <20200518070046.GA23559@xsang-OptiPlex-9020> (raw)
In-Reply-To: <20200515141226.17700-1-hdanton@sina.com>
On Fri, May 15, 2020 at 10:12:26PM +0800, Hillf Danton wrote:
>
> On Fri, 15 May 2020 09:43:39 +0800 Oliver Sang wrote:
> > On Thu, May 14, 2020 at 07:09:35PM +0200, Vincent Guittot wrote:
> > > Hi Oliver,
> > >
> > > On Thu, 14 May 2020 at 16:05, kernel test robot <oliver.sang@intel.com> wrote:
> > > >
> > > > Hi Vincent Guittot,
> > > >
> > > > Below report FYI.
> > > > Last year, we actually reported an improvement "[sched/fair] 0b0695f2b3:
> > > > vm-scalability.median 3.1% improvement" on link [1].
> > > > but now we found the regression on pts.compress-gzip.
> > > > This seems align with what showed in "[v4,00/10] sched/fair: rework the CFS
> > > > load balance" (link [2]), where showed the reworked load balance could have
> > > > both positive and negative effect for different test suites.
> > >
> > > We have tried to run all possible use cases but it's impossible to
> > > covers all so there were a possibility that one that is not covered,
> > > would regressed.
> > >
> > > > And also from link [3], the patch set risks regressions.
> > > >
> > > > We also confirmed this regression on another platform
> > > > (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 8G memory),
> > > > below is the data (lower is better).
> > > > v5.4 4.1
> > > > fcf0553db6f4c79387864f6e4ab4a891601f395e 4.01
> > > > 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912 4.89
> > > > v5.5 5.18
> > > > v5.6 4.62
> > > > v5.7-rc2 4.53
> > > > v5.7-rc3 4.59
> > > >
> > > > It seems there are some recovery on latest kernels, but not fully back.
>
> Hi
>
> Here is a tiny diff for growing balance in the over loaded case. Wish it's
> likely going to help you spot the factors behind the regression.
Thanks Hillf!
just wondering what's the target release of below patch?
>
> Hillf
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8683,15 +8683,12 @@ find_idlest_group(struct sched_domain *s
> struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
> struct sg_lb_stats local_sgs, tmp_sgs;
> struct sg_lb_stats *sgs;
> - unsigned long imbalance;
> + unsigned long hal, lal;
> struct sg_lb_stats idlest_sgs = {
> .avg_load = UINT_MAX,
> .group_type = group_overloaded,
> };
>
> - imbalance = scale_load_down(NICE_0_LOAD) *
> - (sd->imbalance_pct-100) / 100;
> -
> do {
> int local_group;
>
> @@ -8744,31 +8741,26 @@ find_idlest_group(struct sched_domain *s
>
> switch (local_sgs.group_type) {
> case group_overloaded:
> - case group_fully_busy:
> - /*
> - * When comparing groups across NUMA domains, it's possible for
> - * the local domain to be very lightly loaded relative to the
> - * remote domains but "imbalance" skews the comparison making
> - * remote CPUs look much more favourable. When considering
> - * cross-domain, add imbalance to the load on the remote node
> - * and consider staying local.
> - */
> -
> - if ((sd->flags & SD_NUMA) &&
> - ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
> - return NULL;
> + if (idlest_sgs.avg_load < local_sgs.avg_load) {
> + hal = local_sgs.avg_load;
> + lal = idlest_sgs.avg_load;
> + } else {
> + lal = local_sgs.avg_load; /* low avg load */
> + hal = idlest_sgs.avg_load; /* high avg load */
> + }
>
> - /*
> - * If the local group is less loaded than the selected
> - * idlest group don't try and push any tasks.
> - */
> - if (idlest_sgs.avg_load >= (local_sgs.avg_load + imbalance))
> + /* No push if groups are balanced in terms of load */
> + if (100 * hal <= sd->imbalance_pct * lal)
> return NULL;
>
> - if (100 * local_sgs.avg_load <= sd->imbalance_pct * idlest_sgs.avg_load)
> + /* No push if it only grows imbalance */
> + if (hal == idlest_sgs.avg_load)
> return NULL;
> break;
>
> + case group_fully_busy:
> + /* No push because groups are unusually balanced */
> + return NULL;
> case group_imbalanced:
> case group_asym_packing:
> /* Those type are not used in the slow wakeup path */
> --
>
> > > > We were just wondering whether you could share some lights the further works
> > > > on the load balance after patch set [2] which could cause the performance
> > > > change?
> > > > And whether you have plan to refine the load balance algorithm further?
> > >
> > > I'm going to have a look at your regression to understand what is
> > > going wrong and how it can be fixed
> >
> > Thanks a lot!
> >
> > >
> > > Thanks
> > > Vincent
> > >
> >
>
next prev parent reply other threads:[~2020-05-18 6:50 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-14 14:15 [sched/fair] 0b0695f2b3: phoronix-test-suite.compress-gzip.0.seconds 19.8% regression kernel test robot
2020-05-14 17:09 ` Vincent Guittot
2020-05-15 1:43 ` Oliver Sang
[not found] ` <20200515141226.17700-1-hdanton@sina.com>
2020-05-18 7:00 ` Oliver Sang [this message]
2020-05-20 13:04 ` Vincent Guittot
2020-05-21 8:38 ` Oliver Sang
2020-05-25 8:02 ` Vincent Guittot
2020-05-29 17:26 ` Vincent Guittot
2020-06-02 5:23 ` Oliver Sang
2020-06-02 14:23 ` Oliver Sang
2020-06-03 17:06 ` Vincent Guittot
2020-06-04 8:56 ` Mel Gorman
2020-06-05 7:06 ` Vincent Guittot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200518070046.GA23559@xsang-OptiPlex-9020 \
--to=oliver.sang@intel.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=efault@gmx.de \
--cc=hdanton@sina.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@lists.01.org \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=otc.lse.pnp@intel.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).