From: Subhra Mazumdar <subhra.mazumdar@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	daniel.lezcano@linaro.org, steven.sistare@oracle.com,
	dhaval.giani@oracle.com, rohit.k.jain@oracle.com
Subject: Re: [PATCH 1/3] sched: remove select_idle_core() for scalability
Date: Mon, 30 Apr 2018 16:38:42 -0700	[thread overview]
Message-ID: <e2011797-382b-0ac9-79eb-e17109ef4c96@oracle.com> (raw)
In-Reply-To: <20180425174909.GB4043@hirez.programming.kicks-ass.net>



On 04/25/2018 10:49 AM, Peter Zijlstra wrote:
> On Tue, Apr 24, 2018 at 02:45:50PM -0700, Subhra Mazumdar wrote:
>> So what you said makes sense in theory but is not borne out by real
>> world results. This indicates that threads of these benchmarks care more
>> about running immediately on any idle cpu rather than spending time to find
>> a fully idle core to run on.
> But you only ran on Intel which enumerates siblings far apart in the
> cpuid space. Which is not something we should rely on.
>
>>> So by only doing a linear scan on CPU number you will actually fill
>>> cores instead of equally spreading across cores. Worse still, by
>>> limiting the scan to _4_ you only barely even get onto a next core for
>>> SMT4 hardware, never mind SMT8.
>> Again this doesn't matter for the benchmarks I ran. Most are happy to make
>> the tradeoff on x86 (SMT2). Limiting the scan is mitigated by the fact that
>> the scan window is rotated over all cpus, so idle cpus will be found soon.
> You've not been reading well. The Intel machine you tested this on most
> likely doesn't suffer that problem because of the way it happens to
> iterate SMT threads.
>
> How does Sparc iterate its SMT siblings in cpuid space?
SPARC enumerates the siblings of a core sequentially, although whether the
non-sequential enumeration on x86 is really the reason for the improvements
still needs to be confirmed through tests. I don't have a SPARC test system
handy right now.
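
To make the enumeration point concrete, here is a toy userspace sketch (the
2-core SMT4 layout and the two numbering schemes are made up purely for
illustration, not taken from either machine) of how a linear scan capped at
4 CPUs fills a single core under sequential numbering but spreads across
cores under interleaved numbering:

/*
 * Toy illustration (not kernel code): how sibling numbering affects a
 * linear scan limited to 4 CPUs.  Assumes 2 cores x SMT4 = 8 CPUs.
 *
 *   sequential (SPARC-like):  core0 = {0,1,2,3}, core1 = {4,5,6,7}
 *   interleaved (Intel-like): core0 = {0,2,4,6}, core1 = {1,3,5,7}
 */
#include <stdio.h>

#define NR_CPUS		8
#define SMT		4
#define SCAN_LIMIT	4

static int core_of_seq(int cpu)          { return cpu / SMT; }
static int core_of_interleaved(int cpu)  { return cpu % (NR_CPUS / SMT); }

static void scan(const char *name, int (*core_of)(int))
{
	int seen[NR_CPUS / SMT] = { 0 };
	int cores_touched = 0;

	for (int cpu = 0; cpu < SCAN_LIMIT; cpu++) {
		int core = core_of(cpu);

		if (!seen[core]++)
			cores_touched++;
	}
	printf("%-12s scan of first %d CPUs touches %d core(s)\n",
	       name, SCAN_LIMIT, cores_touched);
}

int main(void)
{
	scan("sequential", core_of_seq);          /* fills a single core  */
	scan("interleaved", core_of_interleaved); /* spreads across cores */
	return 0;
}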
>
> Also, your benchmarks chose an unfortunate nr of threads vs topology.
> The 2^n thing chosen never hits the 100% core case (6,22 resp.).
>
>>> So while I'm not averse to limiting the empty core search; I do feel it
>>> is important to have. Overloading cores when you don't have to is not
>>> good.
>> Can we have a config or a way for enabling/disabling select_idle_core?
> I like Rohit's suggestion of folding select_idle_core and
> select_idle_cpu much better, then it stays SMT aware.
>
> Something like the completely untested patch below.
I tried both the patches you suggested: the first merging select_idle_core
and select_idle_cpu, the second with the new way of calculating avg_idle, and
finally both combined. I ran the following benchmarks for each. The merge-only
patch seems to give similar improvements as my original patch for the Uperf
and Oracle DB tests, but it regresses for hackbench. If we can fix that I am
OK with it. I can do a run of the other benchmarks after that.

I also noticed a possible bug later in the merge code. Shouldn't it be:

if (busy < best_busy) {
	best_busy = busy;
	best_cpu = first_idle;
}

Unfortunately I noticed it only after completing all the runs.
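
To make the comparison concrete, here is a self-contained toy sketch of the
selection logic as I read it (not the actual patch; the per-core numbers are
invented for illustration): walk the cores, take a fully idle core
immediately, otherwise remember the first idle CPU of the least-busy core
seen so far, using the fixed comparison above.

/*
 * Toy sketch, not the actual patch: pick the first idle CPU of the
 * least-busy core, preferring a fully idle core.  The per-core data is
 * invented purely for illustration.
 */
#include <stdio.h>
#include <limits.h>

struct core_state {
	int busy;	/* nr of busy SMT siblings; 0 == fully idle core */
	int first_idle;	/* first idle CPU in this core, -1 if none */
};

static const struct core_state cores[] = {
	{ .busy = 3, .first_idle = 12 },
	{ .busy = 1, .first_idle =  5 },	/* least busy with an idle CPU */
	{ .busy = 4, .first_idle = -1 },	/* no idle sibling at all */
	{ .busy = 2, .first_idle =  9 },
};

int main(void)
{
	int best_busy = INT_MAX, best_cpu = -1;

	for (unsigned int i = 0; i < sizeof(cores) / sizeof(cores[0]); i++) {
		int busy = cores[i].busy;
		int first_idle = cores[i].first_idle;

		if (first_idle < 0)
			continue;
		if (!busy) {			/* fully idle core: take it */
			best_cpu = first_idle;
			break;
		}
		if (busy < best_busy) {		/* the comparison as fixed above */
			best_busy = busy;
			best_cpu = first_idle;
		}
	}
	printf("best_cpu = %d\n", best_cpu);	/* prints 5 for this data */
	return 0;
}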

merge:

Hackbench process on a 2-socket, 44-core, 88-thread Intel x86 machine
(lower is better):
groups  baseline       %stdev  patch %stdev
1       0.5742         21.13   0.5099 (11.2%) 2.24
2       0.5776         7.87    0.5385 (6.77%) 3.38
4       0.9578         1.12    1.0626 (-10.94%) 1.35
8       1.7018         1.35    1.8615 (-9.38%) 0.73
16      2.9955         1.36    3.2424 (-8.24%) 0.66
32      5.4354         0.59    5.749  (-5.77%) 0.55

Uperf pingpong on a 2-socket, 44-core, 88-thread Intel x86 machine with
message size = 8k (higher is better):
threads baseline        %stdev  patch %stdev
8       49.47           0.35    49.98 (1.03%) 1.36
16      95.28           0.77    97.46 (2.29%) 0.11
32      156.77          1.17    167.03 (6.54%) 1.98
48      193.24          0.22    230.96 (19.52%) 2.44
64      216.21          9.33    299.55 (38.54%) 4
128     379.62          10.29   357.87 (-5.73%) 0.85

Oracle DB on a 2-socket, 44-core, 88-thread Intel x86 machine
(normalized, higher is better):
users   baseline        %stdev  patch %stdev
20      1               1.35    0.9919 (-0.81%) 0.14
40      1               0.42    0.9959 (-0.41%) 0.72
60      1               1.54    0.9872 (-1.28%) 1.27
80      1               0.58    0.9925 (-0.75%) 0.5
100     1               0.77    1.0145 (1.45%) 1.29
120     1               0.35    1.0136 (1.36%) 1.15
140     1               0.19    1.0404 (4.04%) 0.91
160     1               0.09    1.0317 (3.17%) 1.41
180     1               0.99    1.0322 (3.22%) 0.51
200     1               1.03    1.0245 (2.45%) 0.95
220     1               1.69    1.0296 (2.96%) 2.83

new avg_idle:

Hackbench process on a 2-socket, 44-core, 88-thread Intel x86 machine
(lower is better):
groups  baseline       %stdev  patch %stdev
1       0.5742         21.13   0.5241 (8.73%) 8.26
2       0.5776         7.87    0.5436 (5.89%) 8.53
4       0.9578         1.12    0.989 (-3.26%) 1.9
8       1.7018         1.35    1.7568 (-3.23%) 1.22
16      2.9955         1.36    3.1119 (-3.89%) 0.92
32      5.4354         0.59    5.5889 (-2.82%) 0.64

Uperf pingpong on a 2-socket, 44-core, 88-thread Intel x86 machine with
message size = 8k (higher is better):
threads baseline        %stdev  patch %stdev
8       49.47           0.35    48.11 (-2.75%) 0.29
16      95.28           0.77    93.67 (-1.68%) 0.68
32      156.77          1.17    158.28 (0.96%) 0.29
48      193.24          0.22    190.04 (-1.66%) 0.34
64      216.21          9.33    189.45 (-12.38%) 2.05
128     379.62          10.29   326.59 (-13.97%) 13.07

Oracle DB on a 2-socket, 44-core, 88-thread Intel x86 machine
(normalized, higher is better):
users   baseline        %stdev  patch %stdev
20      1               1.35    1.0026 (0.26%) 0.25
40      1               0.42    0.9857 (-1.43%) 1.47
60      1               1.54    0.9903 (-0.97%) 0.99
80      1               0.58    0.9968 (-0.32%) 1.19
100     1               0.77    0.9933 (-0.67%) 0.53
120     1               0.35    0.9919 (-0.81%) 0.9
140     1               0.19    0.9915 (-0.85%) 0.36
160     1               0.09    0.9811 (-1.89%) 1.21
180     1               0.99    1.0002 (0.02%) 0.87
200     1               1.03    1.0037 (0.37%) 2.5
220     1               1.69    0.998 (-0.2%) 0.8

merge + new avg_idle:

Hackbench process on a 2-socket, 44-core, 88-thread Intel x86 machine
(lower is better):
groups  baseline       %stdev  patch %stdev
1       0.5742         21.13   0.6522 (-13.58%) 12.53
2       0.5776         7.87    0.7593 (-31.46%) 2.7
4       0.9578         1.12    1.0952 (-14.35%) 1.08
8       1.7018         1.35    1.8722 (-10.01%) 0.68
16      2.9955         1.36    3.2987 (-10.12%) 0.58
32      5.4354         0.59    5.7751 (-6.25%) 0.46

Uperf pingpong on a 2-socket, 44-core, 88-thread Intel x86 machine with
message size = 8k (higher is better):
threads baseline        %stdev  patch %stdev
8       49.47           0.35    51.29 (3.69%) 0.86
16      95.28           0.77    98.95 (3.85%) 0.41
32      156.77          1.17    165.76 (5.74%) 0.26
48      193.24          0.22    234.25 (21.22%) 0.63
64      216.21          9.33    306.87 (41.93%) 2.11
128     379.62          10.29   355.93 (-6.24%) 8.28

Oracle DB on a 2-socket, 44-core, 88-thread Intel x86 machine
(normalized, higher is better):
users   baseline        %stdev  patch %stdev
20      1               1.35    1.0085 (0.85%) 0.72
40      1               0.42    1.0017 (0.17%) 0.3
60      1               1.54    0.9974 (-0.26%) 1.18
80      1               0.58    1.0115 (1.15%) 0.93
100     1               0.77    0.9959 (-0.41%) 1.21
120     1               0.35    1.0034 (0.34%) 0.72
140     1               0.19    1.0123 (1.23%) 0.93
160     1               0.09    1.0057 (0.57%) 0.65
180     1               0.99    1.0195 (1.95%) 0.99
200     1               1.03    1.0474 (4.74%) 0.55
220     1               1.69    1.0392 (3.92%) 0.36
