From: Vincent Guittot <vincent.guittot@linaro.org>
To: Chen Yu <yu.chen.surf@gmail.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Sachin Sant <sachinp@linux.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, 0day robot <lkp@intel.com>,
	Huang Ying <ying.huang@intel.com>,
	feng.tang@intel.com, zhengjun.xing@linux.intel.com,
	fengwei.yin@intel.com, Aubrey Li <aubrey.li@linux.intel.com>,
	Chen Yu <yu.c.chen@intel.com>,
	tim.c.chen@intel.com
Subject: Re: [sched/pelt] 2d02fa8cc2: stress-ng.pipeherd.ops_per_sec -9.7% regression
Date: Fri, 8 Apr 2022 10:57:19 +0200
Message-ID: <CAKfTPtCCLH8dHx=J7mUNthbu785nr5d1LkZz6z6cG9BK85LFcA@mail.gmail.com>
In-Reply-To: <CADjb_WRbm3pt9kqhPKhTRSAcJFt1whaKw2MMMyYzkdW=rDXFdA@mail.gmail.com>

On Tue, 5 Apr 2022 at 16:23, Chen Yu <yu.chen.surf@gmail.com> wrote:
>
> On Mon, Apr 4, 2022 at 5:53 PM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Fri, 1 Apr 2022 at 20:32, Chen Yu <yu.chen.surf@gmail.com> wrote:
> > >
> > > On Fri, Apr 1, 2022 at 12:17 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > >
> > > > On Thu, 31 Mar 2022 at 16:19, Chen Yu <yu.chen.surf@gmail.com> wrote:
> > > > >
> > > > > Hi Vincent,
> > > > >
> > > > > On Wed, Feb 9, 2022 at 1:17 PM kernel test robot <oliver.sang@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > Greetings,
> > > > > >
> > > > > > FYI, we noticed a -9.7% regression of stress-ng.pipeherd.ops_per_sec due to commit:
> > > > > >
> > > > > >
> > > > > > commit: 2d02fa8cc21a93da35cfba462bf8ab87bf2db651 ("sched/pelt: Relax the sync of load_sum with load_avg")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > >
> > > > > > in testcase: stress-ng
> > > > > > on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
> > > > > > with following parameters:
> > > > > >
> > > > > >         nr_threads: 100%
> > > > > >         testtime: 60s
> > > > > >         class: memory
> > > > > >         test: pipeherd
> > > > > >         cpufreq_governor: performance
> > > > > >         ucode: 0xd000280
> > > > > >
> > > > > This week we re-ran the test, and the regression is still
> > > > > there. As we are evaluating whether this report is valid or
> > > > > whether the slowdown is expected, we would appreciate your
> > > > > suggestions on further steps:
> > > > >
> > > > > 1. If I understand correctly,
> > > > > 2d02fa8cc21a93da35cfba462bf8ab87bf2db651 ("sched/pelt: Relax the sync
> > > > > of load_sum with load_avg")
> > > > > fixed the calculation of load_sum. Before this patch, the
> > > > > contribution part would be 'skipped', which caused load_sum
> > > > > to be lower than expected.
> > > >
> > > > Yes, you understand it correctly
> > > >
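
To illustrate what 'skipped' means here, below is a rough user-space
sketch of the rounding effect (my own simplification, not the kernel
code; the constant is assumed and only of the right order of
magnitude):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Assumed value, same order as PELT's maximum sum */
	const uint64_t divider = 47742;
	uint64_t load_sum = 1000000;

	/* load_avg is derived from load_sum with a truncating divide... */
	uint64_t load_avg = load_sum / divider;
	/* ...so re-deriving load_sum from load_avg drops the remainder,
	 * leaving load_sum lower than expected after each sync */
	uint64_t synced = load_avg * divider;

	printf("load_sum=%llu synced=%llu lost=%llu\n",
	       (unsigned long long)load_sum,
	       (unsigned long long)synced,
	       (unsigned long long)(load_sum - synced));
	return 0;
}
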
> > > > > 2. If the above is true, then after this patch load_sum becomes
> > > > > higher. Is there a scenario where a higher load_sum added to one
> > > > > cfs_rq brings more imbalance between this group and other
> > > > > sched_groups, and thus more task migrations/wakeups? (In the perf
> > > > > result below, it seems that with this patch applied there are
> > > > > slightly more task wakeups.)
> > > >
> > > > This change should not impact load balancing, as that only does
> > > > comparisons, and I expect the load increase to happen on all cfs_rqs.
> > > > The only place that could be impacted is wake_affine_weight(),
> > > > because it removes the task's load from the previous cfs_rq's load
> > > > before comparing.
> > > > The task's load was not impacted by the underestimate, which means
> > > > that the load of the prev cfs_rq might be seen as lower than the
> > > > current cfs_rq's after subtracting the task's load, even though both
> > > > cfs_rqs were similarly underestimated.
> > > > Now the load of the prev cfs_rq is no longer underestimated; it
> > > > becomes comparable to, or slightly higher than, the current cfs_rq's,
> > > > and the task migrates to the current cfs_rq instead of staying on the
> > > > prev one at wakeup.
> > > >
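
For reference, here is a stripped-down user-space sketch of the
comparison wake_affine_weight() does (a deliberate simplification of
kernel/sched/fair.c: capacity scaling and WA_BIAS are left out, and
the helper name is made up):

#include <stdio.h>

/* Compare this_cpu's load with the task's load added against
 * prev_cpu's load with the task's load removed; return 1 when the
 * task should migrate to this_cpu. */
static int wake_affine_weight_sketch(long this_load, long prev_load,
				     long task_load)
{
	long this_eff_load = this_load + task_load;
	long prev_eff_load = prev_load - task_load;

	return this_eff_load < prev_eff_load;
}

int main(void)
{
	/* this_cpu idle (load 0), prev_cpu at 2h+d with h = 1024, d = 16 */
	printf("migrate: %d\n",
	       wake_affine_weight_sketch(0, 2 * 1024 + 16, 1024));
	return 0;
}
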
> > > Could you please elaborate a little more on this scenario? Since both
> > > the current and previous cfs_rqs were underestimated, how could the
> > > previous cfs_rq have a lower load than the current one before this
> > > patch was applied?
> > >
> > > Say the previous cfs_rq has a load of L1, the current cfs_rq has a
> > > load of L2, and the woken task has a load of h. wake_affine_weight()
> > > then compares L1 - h with L2 + h: when L1 < L2 + 2h, the task will
> > > remain on the previous CPU. Since L1 and L2 were underestimated on
> > > the same scale, I'm not quite sure how this patch would affect the
> > > choice between the prev and current CPU.
> >
> > Let's take the example of this_cpu load L1 = 0 and prev_cpu load
> > L2 = 2h'+d, where h' reflects h in the cpu load and d is a small
> > delta load. The task will migrate if we have the condition below:
> >
> >     h < 2h'-h+d
> >
> > With this patch, we can assume that h' == h, as we no longer
> > underestimate the load of cfs_rqs. The condition for migrating the
> > task becomes:
> >     h < h+d
> > and the task will migrate to this CPU as soon as there is a small
> > load on prev_cpu in addition to the 2h'.
> >
> > Without the patch, the load of the cfs_rqs is underestimated, which
> > means that the task's load is underestimated in the cfs_rq. This can
> > be described as h' == h-U, with U being the underestimated part. In
> > this case the condition to migrate the task becomes:
> >     h < h-2U+d
> > The task will migrate to this CPU only if d is large enough to
> > compensate for the underestimation, so we will migrate less often.
> >
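
Plugging made-up numbers into the two conditions (values chosen purely
to illustrate the effect, not measured from the benchmark):

#include <stdio.h>

int main(void)
{
	long h = 1024;	/* task's load */
	long d = 16;	/* small extra load on prev_cpu */
	long U = 100;	/* underestimated part, so h' = h - U */

	/* With the patch (h' == h): migrate if h < h + d */
	printf("patched:   %ld < %ld -> %s\n",
	       h, h + d, h < h + d ? "migrate" : "stay");

	/* Without the patch (h' == h - U): migrate if h < h - 2U + d */
	printf("unpatched: %ld < %ld -> %s\n",
	       h, h - 2 * U + d, h < h - 2 * U + d ? "migrate" : "stay");

	return 0;
}

The unpatched condition only triggers when d > 2U, which is why the
task stayed on prev_cpu more often before the fix.
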
> I see. Thanks for this example! So in this scenario, when the previous
> CPU has a somewhat higher load than the current CPU, without this patch
> applied the OS would have more chances to keep the task on the previous
> CPU, and thus less migration and less rq lock contention (according to
> the perf result). I don't have a good idea of how to deal with this
> case, except disabling WA_WEIGHT or WA_BIAS (which would prefer 'this'
> CPU, I suppose).

I don't think that there is any good solution for this benchmark. It
was taking advantage of the underestimated load_avg because it
apparently doesn't like migrating to the local CPU; on the other hand,
some benchmarks will take advantage of this migration.

Thanks
Vincent
>
> --
> Thanks,
> Chenyu
