All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, peterz@infradead.org,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: [PATCH v2 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
Date: Wed, 11 Mar 2020 18:15:52 +0000	[thread overview]
Message-ID: <20200311181601.18314-1-valentin.schneider@arm.com> (raw)

I've been staring at select_task_rq_fair() for some time now, and have come
to "hate" the usage of the sd_flag parameter. It is used as both an
indicator of the wakeup type (ttwu/fork/exec) and as a domain walk search
target. CFS is the only class doing this, the other classes just need the
wakeup type but get passed a domain flag instead.

This series gets rid of select_task_rq()'s sd_flag parameter and also tries
to optimize select_task_rq_fair().

This is based on tip/sched/core at:
a0f03b617c3b ("sched/numa: Stop an exhastive search if a reasonable swap candidate or idle CPU is found")

Patches
=======

o Patch 1 is a simple dead parameter cleanup
o Patches 2-4 get rid of SD_LOAD_BALANCE
o Patches 5-6 involve getting rid of the sd_flag parameter for
  select_task_rq(). 
o Patch 7 is an extra cleanup in the select_task_rq_fair() region.
o Patches 8-9 split the want_affine vs !want_affine paths of
  select_task_rq_fair(), which unearths a small optimization. Sort of a
  single patch split in two for the sake of review.

Testing
=======

Testing was done against a slightly older tip/sched/core at:
25ac227a25ac ("sched/fair: Remove wake_cap()")

I got annoyed by the variance in my 500 iteration runs, so I scripted
something to run batches of 5k iterations. It looks pretty stable from one
batch to another. I also stared at some boxplots to convince myself I
wasn't needlessly making things worse - you too can stare at them here [1].

Note: the 'X%' stats are the percentiles, so 50% is the 50th percentile.

Juno r0
-------

2+4 big.LITTLE. SD levels are {MC, DIE}.

Hackbench
~~~~~~~~~

15000 iterations of
  $ hackbench
(lower is better):

|       |   -PATCH |   +PATCH |  DELTA |
|-------+----------+----------+--------|
| mean  | 0.622235 | 0.618834 | -0.55% |
| std   | 0.018121 | 0.017579 | -2.99% |
| min   | 0.571000 | 0.573000 | +0.35% |
| 50%   | 0.620000 | 0.617000 | -0.48% |
| 75%   | 0.633000 | 0.629000 | -0.63% |
| 99%   | 0.674000 | 0.669000 | -0.74% |
| max   | 0.818000 | 0.756000 | -7.58% |

The boxplot shows a single outlier to the very left for both commits, which
are the minimums reported above. Other than that, +PATCH has somewhat lower
outliers on the righthand side: worst cases are a tad better.

Sysbench
~~~~~~~~

15000 iterations of
  $ sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=6 run
(higher is better):

|       |       -PATCH |       +PATCH |   DELTA |
|-------+--------------+--------------+---------|
| mean  | 15318.954000 | 15628.416933 |  +2.02% |
| std   |   235.466202 |   205.162730 | -12.87% |
| min   | 13025.000000 | 13554.000000 |  +4.06% |
| 50%   | 15366.000000 | 15681.000000 |  +2.05% |
| 75%   | 15497.000000 | 15765.000000 |  +1.73% |
| 99%   | 15651.000000 | 15893.000000 |  +1.55% |
| max   | 15716.000000 | 15972.000000 |  +1.63% |

That's overall a tad better.

Dual-socket Xeon E5
-------------------

Each socket is 10 cores w/ SMT2 - 40 CPUs total. SD levels are
{SMT, MC, NUMA}.

Hackbench
~~~~~~~~~

25000 iterations of
  $ hackbench -l 1000
(lower is better):

|       |       -PATCH |       +PATCH |  DELTA |
|-------+--------------+--------------+--------|
| mean  |     0.946312 |     0.944685 | -0.17% |
| std   |     0.006419 |     0.006447 | +0.44% |
| min   |     0.906000 |     0.897000 | -0.99% |
| 50%   |     0.947000 |     0.945000 | -0.21% |
| 75%   |     0.950000 |     0.949000 | -0.11% |
| 99%   |     0.959000 |     0.958000 | -0.10% |
| max   |     0.988000 |     0.967000 | -2.13% |

The boxplot shows that the min improvement is some sort of fluke - it's a
single point standing out on the left. The mean *is* slightly lowered,
which most likely comes from +PATCH having less high-value outliers.

I looked into some statistical tests, but my samples distribution isn't a
normal distribution (which is a requirement for most of them). This
actually can't happen by construction according to [2], since hackbench
outputs the maximum of a set of random of variables. We could instead use
the average of all sender/receiver pairs, or even the invidual time taken
per each pair; that being said, I don't think each value produced by a pair
could be seen as independent variables, given that there'll be more > 1
task per CPU.

Wilcoxon's test [3] gives me a p-value of ~1e-182, so there *is* a
significant difference between the two datasets, but it does not say if the
difference is in the mean, variance, or any other parameter of the
distribution.

Sysbench
~~~~~~~~~

25000 iterations of
  $ sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=40 run
(higher is better):

|       |       -PATCH |       +PATCH |   DELTA |
|-------+--------------+--------------+---------|
| mean  | 23937.937560 | 24280.668640 |  +1.43% |
| std   |   547.139948 |   484.963639 | -11.36% |
| min   | 21526.000000 | 21917.000000 |  +1.82% |
| 50%   | 24032.000000 | 24374.000000 |  +1.42% |
| 75%   | 24355.000000 | 24638.000000 |  +1.16% |
| 99%   | 24869.010000 | 25084.000000 |  +0.86% |
| max   | 25410.000000 | 25623.000000 |  +0.84% |

As with the Juno, that's overall a tad better.

Takeaway
--------

The TL;DR for those fancy stats seems to be: it's hard to say much about
hackbench, and sysbench likes it a bit. The important thing for me is to
not introduce regressions with my stupid change, and AFAICT it is the case.

Links
=====

[1]: https://htmlpreview.github.io/?https://gist.githubusercontent.com/valschneider/433b3772d1776c52214dd05be2ab2f03/raw/316fbd9f774fa381c60731511c881a3360111563/streamline_v2_bplots.html
[2]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem
[3]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html#scipy.stats.wilcoxon

Revisions
=========

v1 -> v2
--------
o Removed the 'RFC' tag
o Made the sd_flags syctl read-only
o Removed the SD_LOAD_BALANCE flag
o Cleaned up ugly changes thanks to the above

Valentin Schneider (9):
  sched/fair: find_idlest_group(): Remove unused sd_flag parameter
  sched/debug: Make sd->flags sysctl read-only
  sched: Remove checks against SD_LOAD_BALANCE
  sched/topology: Kill SD_LOAD_BALANCE
  sched: Add WF_TTWU, WF_EXEC wakeup flags
  sched: Kill select_task_rq()'s sd_flag parameter
  sched/fair: Dissociate wakeup decisions from SD flag value
  sched/fair: Split select_task_rq_fair want_affine logic
  sched/topology: Define and use shortcut pointers for wakeup sd_flag
    scan

 include/linux/sched/topology.h | 29 +++++++------
 kernel/sched/core.c            | 10 ++---
 kernel/sched/deadline.c        |  4 +-
 kernel/sched/debug.c           |  2 +-
 kernel/sched/fair.c            | 75 +++++++++++++++++++---------------
 kernel/sched/idle.c            |  2 +-
 kernel/sched/rt.c              |  4 +-
 kernel/sched/sched.h           | 13 ++++--
 kernel/sched/stop_task.c       |  2 +-
 kernel/sched/topology.c        | 43 +++++++++----------
 10 files changed, 98 insertions(+), 86 deletions(-)

--
2.24.0


             reply	other threads:[~2020-03-11 18:16 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-11 18:15 Valentin Schneider [this message]
2020-03-11 18:15 ` [PATCH v2 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
2020-03-19  9:05   ` Dietmar Eggemann
2020-03-11 18:15 ` [PATCH v2 2/9] sched/debug: Make sd->flags sysctl read-only Valentin Schneider
2020-03-19  9:07   ` Dietmar Eggemann
2020-03-19 12:04     ` Valentin Schneider
2020-03-11 18:15 ` [PATCH v2 3/9] sched: Remove checks against SD_LOAD_BALANCE Valentin Schneider
2020-03-19 10:28   ` Dietmar Eggemann
2020-03-19 12:05     ` Valentin Schneider
2020-03-23 14:26       ` Dietmar Eggemann
2020-03-23 17:17         ` Valentin Schneider
2020-03-11 18:15 ` [PATCH v2 4/9] sched/topology: Kill SD_LOAD_BALANCE Valentin Schneider
2020-03-19 10:29   ` Dietmar Eggemann
2020-03-19 12:06     ` Valentin Schneider
2020-03-11 18:15 ` [PATCH v2 5/9] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
2020-03-11 18:15 ` [PATCH v2 6/9] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
2020-03-11 18:15 ` [PATCH v2 7/9] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
2020-03-11 18:16 ` [PATCH v2 8/9] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
2020-03-19 10:30   ` Dietmar Eggemann
2020-03-19 12:06     ` Valentin Schneider
2020-03-11 18:16 ` [PATCH v2 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
2020-03-19 10:46   ` Dietmar Eggemann
2020-03-19 12:22     ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200311181601.18314-1-valentin.schneider@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.