From: Vincent Guittot <vincent.guittot@linaro.org>
To: Hillf Danton <hdanton@sina.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	kernel test robot <rong.a.chen@intel.com>,
	Tao Zhou <ouwen210@hotmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [LKP] [sched/fair] 6c8116c914: stress-ng.mmapfork.ops_per_sec -38.0% regression
Date: Mon, 27 Apr 2020 14:46:07 +0200
Message-ID: <CAKfTPtABxPBmS6=qn96=7X5vfF0ae15M+RAiduH0sb11+gyKew@mail.gmail.com>
In-Reply-To: <20200427113533.4688-1-hdanton@sina.com>

On Mon, 27 Apr 2020 at 13:35, Hillf Danton <hdanton@sina.com> wrote:
>
>
> On Mon, 27 Apr 2020 11:03:58 +0200 Vincent Guittot wrote:
> > On Sun, 26 Apr 2020 at 14:42, Hillf Danton wrote:
> > >
> > > On 4/21/2020 8:47 AM, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a 56.4% improvement of stress-ng.fifo.ops_per_sec due to commit:
> > > >
> > > >
> > > > commit: 6c8116c914b65be5e4d6f66d69c8142eb0648c22 ("sched/fair: Fix condition of avg_load calculation")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > in testcase: stress-ng
> > > > on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
> > > > with following parameters:
> > > >
> > > >     nr_threads: 100%
> > > >     disk: 1HDD
> > > >     testtime: 1s
> > > >     class: scheduler
> > > >     cpufreq_governor: performance
> > > >     ucode: 0xb000038
> > > >     sc_pid_max: 4194304
> > > >
> > >
> > > We need to handle group_fully_busy differently from group_overloaded,
> > > as pushing a task does not help balance load in the former case.
> >
> > Have you tested this patch for the use case above? Do you have figures?
> >
> No, I am looking for a box with 88 threads. I'm likely to get access to
> one as early as three weeks from now.
>
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8744,30 +8744,20 @@ find_idlest_group(struct sched_domain *s
> > >
> > >         switch (local_sgs.group_type) {
> > >         case group_overloaded:
> > > -       case group_fully_busy:
> > > -               /*
> > > -                * When comparing groups across NUMA domains, it's possible for
> > > -                * the local domain to be very lightly loaded relative to the
> > > -                * remote domains but "imbalance" skews the comparison making
> > > -                * remote CPUs look much more favourable. When considering
> > > -                * cross-domain, add imbalance to the load on the remote node
> > > -                * and consider staying local.
> > > -                */
> > > -
> > > -               if ((sd->flags & SD_NUMA) &&
> > > -                   ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
> > > +               if (100 * local_sgs.avg_load <= sd->imbalance_pct * (idlest_sgs.avg_load + imbalance))
> > > +                       return idlest;
> >
> > So you have completely removed the NUMA special case without explaining why.
> >
> That case was for a lightly loaded local domain, as the comment says;
> here it is overloaded.

The load value is not tied to the overloaded state of the group: a group
can be overloaded but still have a low load, especially with cgroups.

That's also why there are two types of comparison:
an absolute comparison for low load values,
and a proportional comparison for normal/high values.
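
To make those two comparisons concrete, here is a minimal user-space sketch
(not the actual mainline code; the helper names are made up and the values
stand in for the sg_lb_stats fields and imbalance_pct):

#include <stdbool.h>

/* Absolute check: meaningful when load values are small. */
static bool abs_less_loaded(unsigned long idlest_avg, unsigned long local_avg,
			    unsigned long imbalance)
{
	return idlest_avg + imbalance < local_avg;
}

/*
 * Proportional check: meaningful for normal/high load values,
 * e.g. with imbalance_pct around 117.
 */
static bool prop_less_loaded(unsigned long idlest_avg, unsigned long local_avg,
			     unsigned int imbalance_pct)
{
	return 100 * local_avg > imbalance_pct * idlest_avg;
}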

>
> > And you have also removed the tests for small load.
> >
> It is a heuristic I want to avoid. It can be replaced with the load of the
> task in question, as a best effort.
>
> > Could you explain the rationale behind all these changes ?
> >
> > Also keep in mind that the current version provides a +58% improvement
> > for stress-ng.fifo
> >
> Yes, it's great. I'm working on the minor one.
>
> > > +               if (local_sgs.avg_load > idlest_sgs.avg_load + imbalance)
> > > +                       return idlest;
> > > +               else
> > >                         return NULL;
> > >
> > > +       case group_fully_busy:
> > >                 /*
> > > -                * If the local group is less loaded than the selected
> > > -                * idlest group don't try and push any tasks.
> > > +                * Pushing a task to the idlest group will make the target group
> > > +                * overloaded and leave the overloaded local group merely fully busy,
> > > +                * so we gain nothing except an exchange of group types.
> >
> > In this case both local and idlest are fully busy, and one of them will
> > become overloaded, so you must compare the loads to spread the load
> > fairly.
> >
> The newly overloaded group may start trying to push a task back out sooner
> than expected, and we'll see task ping-pong if that happens.
>
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8683,15 +8683,11 @@ find_idlest_group(struct sched_domain *s
>         struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
>         struct sg_lb_stats local_sgs, tmp_sgs;
>         struct sg_lb_stats *sgs;
> -       unsigned long imbalance;
>         struct sg_lb_stats idlest_sgs = {
>                         .avg_load = UINT_MAX,
>                         .group_type = group_overloaded,
>         };
>
> -       imbalance = scale_load_down(NICE_0_LOAD) *
> -                               (sd->imbalance_pct-100) / 100;
> -
>         do {
>                 int local_group;
>
> @@ -8743,31 +8739,26 @@ find_idlest_group(struct sched_domain *s
>                 return idlest;
>
>         switch (local_sgs.group_type) {
> -       case group_overloaded:
>         case group_fully_busy:
> -               /*
> -                * When comparing groups across NUMA domains, it's possible for
> -                * the local domain to be very lightly loaded relative to the
> -                * remote domains but "imbalance" skews the comparison making
> -                * remote CPUs look much more favourable. When considering
> -                * cross-domain, add imbalance to the load on the remote node
> -                * and consider staying local.
> -                */
> -
> -               if ((sd->flags & SD_NUMA) &&
> -                   ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load))
> -                       return NULL;
> -
> -               /*
> -                * If the local group is less loaded than the selected
> -                * idlest group don't try and push any tasks.
> -                */
> -               if (idlest_sgs.avg_load >= (local_sgs.avg_load + imbalance))
> -                       return NULL;
> -
> -               if (100 * local_sgs.avg_load <= sd->imbalance_pct * idlest_sgs.avg_load)
> -                       return NULL;
> -               break;
> +               return NULL;
> +       case group_overloaded:
> +               /* No push if balanced */
> +               if (100 * local_sgs.avg_load > sd->imbalance_pct *
> +                                               idlest_sgs.avg_load) {
> +                       unsigned long avg_load;
> +
> +                       avg_load = task_h_load(p) + idlest_sgs.group_load;
> +                       avg_load = (avg_load * SCHED_CAPACITY_SCALE) /
> +                                               idlest_sgs.group_capacity;
> +
> +                       if (100 * local_sgs.avg_load <= sd->imbalance_pct *
> +                                                               avg_load)
> +                               return idlest;
> +
> +                       if (local_sgs.avg_load > avg_load)
> +                               return idlest;
> +               }
> +               return NULL;
>
>         case group_imbalanced:
>         case group_asym_packing:
>
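
For reference, a minimal user-space sketch of the estimate the group_overloaded
branch above computes, i.e. what the idlest group's average load would become
if it received the task (not kernel code; the struct and example numbers are
simplified stand-ins for sg_lb_stats and task_h_load()):

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024UL

struct sg_stats {
	unsigned long group_load;     /* summed runnable load of the group */
	unsigned long group_capacity; /* summed CPU capacity of the group */
};

/* Average load of the group if it gained a task contributing task_load. */
static unsigned long predicted_avg_load(const struct sg_stats *sgs,
					unsigned long task_load)
{
	return ((sgs->group_load + task_load) * SCHED_CAPACITY_SCALE) /
	       sgs->group_capacity;
}

int main(void)
{
	/* Example numbers, not measured: half-loaded 4-CPU group, nice-0 task. */
	struct sg_stats idlest = { .group_load = 2048, .group_capacity = 4096 };

	printf("predicted avg_load: %lu\n", predicted_avg_load(&idlest, 1024));
	return 0;
}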
