From: Abel Wu <wuyun.abel@bytedance.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Josh Don <joshdon@google.com>, Chen Yu <yu.c.chen@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core
Date: Mon, 29 Aug 2022 22:11:39 +0800	[thread overview]
Message-ID: <51009414-5ffb-b6ec-a501-7b2514a0f3cc@bytedance.com> (raw)
In-Reply-To: <20220829130831.odhemmcmuecqxkbz@techsingularity.net>

Hi Mel, thanks for reviewing!

On 8/29/22 9:08 PM, Mel Gorman wrote:
> On Tue, Jul 12, 2022 at 04:20:32PM +0800, Abel Wu wrote:
>> When SIS_UTIL is enabled, the SIS domain scan will be skipped if
>> the LLC is overloaded. Since the overloaded status is checked
>> during load balancing at the LLC level, the interval is llc_size
>> milliseconds. That duration might be long enough to affect the
>> overall system throughput if idle cores are out of reach of the
>> SIS domain scan.
>>
>> Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
> 
> Split this patch to move the this_sd lookup into the SIS_PROP section.

OK

> 
> Otherwise, this is the most controversial patch in the series and the
> most likely to cause problems where it wins on some machines and
> workloads and loses on others.
> 
> The corner case to worry about is a workload that is rapidly idling, where
> the has_idle_core hint is often wrong. This can happen with hackbench for
> example, or at least it was true when I last looked, which is quite some
> time ago. If the hint is often wrong, then the full scan cost will be
> incurred regardless of SIS_UTIL, and the scan will often fail to find a core.

Yes, I couldn't agree more. And the situation can become worse as the LLC
gets bigger, as you mention below. I will exclude this part from v2.
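
For context, the part being excluded is essentially the gating of the
SIS_UTIL cut-off on has_idle_core in select_idle_cpu(). A minimal sketch,
assuming the current upstream structure of select_idle_cpu() (the exact
hunk in the posted patch may differ slightly):

    if (sched_feat(SIS_UTIL)) {
        sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
        if (sd_share) {
            /* because !--nr is the condition to stop scan */
            nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
            /*
             * Only honour the "LLC is overloaded, bail out" signal
             * when no idle core is reported, so that a scan for a
             * whole idle core is still attempted.
             */
            if (nr == 1 && !has_idle_core)
                return -1;
        }
    }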

> 
> So, the first suggestion is to move this patch to the end of the series, as
> the other patches are relatively harmless. They could even be merged in
> isolation as cleanups.
> 
> Second, using the other patches as your baseline, include in the
> changelog what you tested that showed a benefit, what type of machine
> it was and in particular include the number of cores, nodes and the
> span of the LLC.  If you measured any regressions, include that as well
> and make a call on whether you think the patch wins more than it loses.
> The reason to include that information is that the worst corner case
> (all CPUs scanned uselessly) costs more the larger the span of the LLC is.
> If all the testing was done on a 2-core SMT-2 machine, the overhead of the
> patch would be negligible but very different if the LLC span is 20 cores.
> While the patch is not obviously wrong, it definitely needs better data.
> Even if you do not have a large test machine available, it's still helpful
> to have it in the changelog because a reviewer like me can decide "this
> needs testing on a larger machine".

Thanks for the detailed suggestions. I will attach benchmark results
along with some analysis the next time I post performance-related
patches.

> 
> I did queue up the entire series for testing and while it sometimes
> benefitted, there were large counter-examples. tbench4 on Zen3, which has
> multiple small L3 caches per node, showed some large regressions (e.g. a
> 41% loss on 64 clients with a massive increase in variability). tbench4
> (slightly modified in mmtests to produce a usable metric) in general showed
> issues across multiple x86-64 machines, both AMD and Intel and multiple
> generations, with a noticeable increase in system CPU usage when the client
> counts reach the stage where the machine is close to saturated. perfpipe for
> some weird reason showed a large apparent regression on Broadwell, but the
> variability was also crazy so it can probably be ignored. hackbench overall
> looked ok, so it's possible I'm wrong about the idle_cores hint often being
> wrong on that workload and I should double-check that. It's possible the
> hint is wrong some of the time but right often enough to either benefit from
> using an idle core or from finding an idle sibling, which may be preferable
> to stacking tasks on the same CPU.

I attached my test results in one of my replies [1]: netperf showed a ~3.5%
regression, hackbench improved a lot, and tbench4 was roughly a draw. I
tested several times and the results didn't seem to vary much.

> 
> The lack of data and the lack of a note on the potential downside are the
> main reasons why I'm not acking the patch. tbench4 was a particular concern
> in my own tests, and it's possible a better patch would be a hybrid approach
> where a limited search of two cores (excluding the target) is allowed even if
> SIS_UTIL indicates overload but has_idle_cores is true.
> 

Agreed (a rough sketch of such a bounded probe is included below). And the
reason I will exclude this part from v2 is that I plan to make it part of
another feature, the SIS filter [2]. The latest version of the SIS filter
(not posted yet, but coming soon) will contain all the idle cpus, so we
won't need a full scan when has_idle_core is set; scanning the filter will
be enough. It may still cost too much if there are too many false-positive
bits in the filter, though. Does this direction make sense to you?
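
For illustration only, the hybrid approach you describe might look roughly
like the sketch below (based on the current select_idle_cpu(); note the nr
budget today only bounds the !has_idle_core loop and counts CPUs rather than
cores, so the has_idle_core scan loop would also have to learn to honour it):

    if (sched_feat(SIS_UTIL)) {
        sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
        if (sd_share) {
            /* because !--nr is the condition to stop scan */
            nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
            if (nr == 1) {
                if (!has_idle_core)
                    return -1;
                /*
                 * The LLC looks overloaded but the idle-core hint
                 * is set: rather than doing a full scan, allow a
                 * limited probe of two cores (excluding the target).
                 */
                nr = 2;
            }
        }
    }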

[1] https://lore.kernel.org/lkml/eaa543fa-421d-2194-be94-6a5e24a33b37@bytedance.com/
[2] https://lore.kernel.org/lkml/20220619120451.95251-1-wuyun.abel@bytedance.com/

Thanks & Best Regards,
Abel

Thread overview: 34+ messages
2022-07-12  8:20 [PATCH 0/5] sched/fair: SIS improvements and cleanups Abel Wu
2022-07-12  8:20 ` [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core Abel Wu
2022-07-13  3:47   ` Chen Yu
2022-07-13 16:14     ` Abel Wu
2022-07-14  6:19   ` Yicong Yang
2022-07-14  6:58     ` Abel Wu
2022-07-14  7:15       ` Yicong Yang
2022-07-14  8:00         ` Abel Wu
2022-07-14  8:16           ` Yicong Yang
2022-07-14  8:34             ` Yicong Yang
2022-08-04  9:59       ` Chen Yu
2022-08-15  2:54         ` Abel Wu
2022-08-10 13:50   ` Chen Yu
2022-08-15  2:44     ` Abel Wu
2022-08-29 13:08   ` Mel Gorman
2022-08-29 14:11     ` Abel Wu [this message]
2022-08-29 14:56       ` Mel Gorman
2022-09-01 13:08         ` Abel Wu
2022-09-02  4:12         ` Abel Wu
2022-09-02 10:25           ` Mel Gorman
2022-09-05 14:40             ` Abel Wu
2022-09-06  9:57               ` Mel Gorman
2022-09-07  7:27                 ` Chen Yu
2022-09-07  8:41                   ` Mel Gorman
2022-09-07  7:52                 ` Abel Wu
2022-07-12  8:20 ` [PATCH 2/5] sched/fair: default to false in test_idle_cores Abel Wu
2022-08-29 12:36   ` Mel Gorman
2022-07-12  8:20 ` [PATCH 3/5] sched/fair: remove redundant check in select_idle_smt Abel Wu
2022-08-29 12:36   ` Mel Gorman
2022-07-12  8:20 ` [PATCH 4/5] sched/fair: avoid double search on same cpu Abel Wu
2022-08-29 12:36   ` Mel Gorman
2022-07-12  8:20 ` [PATCH 5/5] sched/fair: remove useless check in select_idle_core Abel Wu
2022-08-29 12:37   ` Mel Gorman
2022-08-15 13:31 ` [PATCH 0/5] sched/fair: SIS improvements and cleanups Abel Wu
