linux-kernel.vger.kernel.org archive mirror
From: chris hyser <chris.hyser@oracle.com>
To: Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Joel Fernandes <joelaf@google.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>
Cc: mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	joel@joelfernandes.org, vineeth@bitbyteword.org,
	Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com,
	Aaron Lu <ziqian.lzq@antfin.com>
Subject: Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
Date: Tue, 15 Sep 2020 17:49:23 -0400
Message-ID: <f20f4d5b-574a-7c3d-8c08-3e6b7893fc58@oracle.com>
In-Reply-To: <d02923d38df20f1d8c51cf4df6dce66ac0a385ce.1598643276.git.jdesfossez@digitalocean.com>

On 8/28/20 3:51 PM, Julien Desfossez wrote:
> From: Aaron Lu <aaron.lwe@gmail.com>
> 
> This patch provides a vruntime based way to compare two cfs task's
> priority, be it on the same cpu or different threads of the same core.
> 
> When the two tasks are on the same CPU, we just need to find a common
> cfs_rq both sched_entities are on and then do the comparison.
> 
> When the two tasks are on different threads of the same core, each thread
> will choose the next task to run the usual way and then the root level
> sched entities which the two tasks belong to will be used to decide
> which task runs next core wide.
> 
> An illustration for the cross CPU case:
> 
>     cpu0         cpu1
>   /   |  \     /   |  \
> se1 se2 se3  se4 se5 se6
>      /  \            /   \
>    se21 se22       se61  se62
>    (A)                    /
>                         se621
>                          (B)
> 
> Assume CPU0 and CPU1 are smt siblings and cpu0 has decided task A to
> run next and cpu1 has decided task B to run next. To compare priority
> of task A and B, we compare priority of se2 and se6. Whose vruntime is
> smaller, who wins.
> 
> To make this work, the root level sched entities' vruntime of the two
> threads must be directly comparable. So one of the hyperthread's root
> cfs_rq's min_vruntime is chosen as the core wide one and all root level
> sched entities' vruntime is normalized against it.
> 
> All sub cfs_rqs and sched entities are not interesting in cross cpu
> priority comparison as they will only participate in the usual cpu local
> schedule decision so no need to normalize their vruntimes.
> 
> Signed-off-by: Aaron Lu <ziqian.lzq@antfin.com>
> ---
>   kernel/sched/core.c  |  23 +++----
>   kernel/sched/fair.c  | 142 ++++++++++++++++++++++++++++++++++++++++++-
>   kernel/sched/sched.h |   3 +
>   3 files changed, 150 insertions(+), 18 deletions(-)
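
To make the quoted description concrete, here is a rough Python model of the cross-CPU comparison. It is a sketch, not the kernel code: the names SchedEntity, root_level_entity and pick_core_wide are hypothetical, the vruntime values are made up, and it assumes the root level vruntimes have already been normalized against the single core wide min_vruntime, as the description requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedEntity:
    """Toy stand-in for a sched entity (hypothetical, not the kernel struct)."""
    name: str
    vruntime: int  # assumed already normalized against the core wide min_vruntime
    parent: Optional["SchedEntity"] = None


def root_level_entity(se: SchedEntity) -> SchedEntity:
    """Walk up the hierarchy to the entity queued on the root cfs_rq."""
    while se.parent is not None:
        se = se.parent
    return se


def pick_core_wide(a: SchedEntity, b: SchedEntity) -> SchedEntity:
    """Return whichever task's root level entity has the smaller vruntime.

    Ties go to the first argument; that tie-break is an arbitrary choice here.
    """
    if root_level_entity(b).vruntime < root_level_entity(a).vruntime:
        return b
    return a


# Hierarchy from the illustration: A sits under se2 on cpu0, B under se6 on cpu1.
se2 = SchedEntity("se2", vruntime=100)
task_a = SchedEntity("se21", vruntime=5000, parent=se2)

se6 = SchedEntity("se6", vruntime=250)
se62 = SchedEntity("se62", vruntime=10, parent=se6)
task_b = SchedEntity("se621", vruntime=1, parent=se62)

winner = pick_core_wide(task_a, task_b)
print(winner.name)
```

In this model only se2 and se6 take part in the decision. The vruntimes of se21, se62 and se621 are ignored, which matches the statement that sub cfs_rqs and their entities need no normalization.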


While investigating reported 'uperf' performance regressions between core sched v5 and core sched v6/v7, this patch, in
its v6 form, seems to be the first indicator of about a 40% perf loss in moving from v5 to v6 (and the same accounting
is carried forward into this v7 version). Unfortunately, it is not the easiest thing to trace back, as the patchsets are
not directly comparable in this case, and in moving to v7 the base kernel revision has changed from 5.6 to 5.9.

The regressions were reproduced with the following setup: on a 24 core VM, create a cgroup and in it start the uperf
server and a client running 2 minutes' worth of 100 threads doing short TCP reads and writes. Do this with the cgroup
both core sched tagged and untagged (tearing everything down and rebuilding it in between). Each run is short, so it is
easy to do dozens of runs for statistical averaging.

Whatever the above version of this test might map to in real life, it presumably exacerbates the competition between
softirq threads and the core sched tagged threads that was observed in the reports.

Here are the uperf results for the various patchsets. Note that disabling smt is better than enabling core sched for
these tests, which presumably reflects the overall overhead of core scheduling, which went from bad to really bad. The
primary focus of this email is to start to understand what happened within core sched itself.

patchset          smt=on/cs=off    smt=off     smt=on/cs=on
------------------------------------------------------------
v5-v5.6.y      :    1.78Gb/s       1.57Gb/s      1.07Gb/s
pre-v6-v5.6.y  :    1.75Gb/s       1.55Gb/s    822.16Mb/s
v6-5.7         :    1.87Gb/s       1.56Gb/s    561.6Mb/s
v6-5.7-hotplug :    1.75Gb/s       1.58Gb/s    438.21Mb/s
v7             :    1.80Gb/s       1.61Gb/s    440.44Mb/s
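
As a quick way to read the table, the core sched overhead for each patchset can be expressed as the fraction of smt=on/cs=off throughput that is lost with smt=on/cs=on. The Python sketch below does that arithmetic on the figures above. It assumes 1 Gb/s = 1000 Mb/s, and the names RESULTS_MBPS and cs_overhead are just illustrative.

```python
# Throughput figures copied from the table, converted to Mb/s.
# Assumption: decimal units, i.e. 1 Gb/s = 1000 Mb/s.
# Tuple order: (smt=on/cs=off, smt=off, smt=on/cs=on)
RESULTS_MBPS = {
    "v5-v5.6.y":      (1780.0, 1570.0, 1070.0),
    "pre-v6-v5.6.y":  (1750.0, 1550.0, 822.16),
    "v6-5.7":         (1870.0, 1560.0, 561.6),
    "v6-5.7-hotplug": (1750.0, 1580.0, 438.21),
    "v7":             (1800.0, 1610.0, 440.44),
}


def cs_overhead(cs_off: float, cs_on: float) -> float:
    """Fraction of smt=on/cs=off throughput lost when core sched is enabled."""
    return 1.0 - cs_on / cs_off


overheads = {
    name: cs_overhead(cs_off, cs_on)
    for name, (cs_off, _smt_off, cs_on) in RESULTS_MBPS.items()
}

for name, frac in overheads.items():
    print(f"{name:15s} {frac:6.1%}")
```

This only restates the table in relative terms. It says nothing about which change is responsible for any given step.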

If you start by contrasting v5 and v6 on the same 5.6 base kernel, to try to rule out kernel-to-kernel version
differences, bisecting v6 points to the v6 version of this patch:

"[RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison"

although all that really seems to be saying is that the new means of vruntime accounting (still present in v7) has
caused performance in the patchset to drop, which is plausible: different numbers, different scheduler behavior. A rough
attempt to verify this by backporting parts of the new accounting onto the v5 patchset shows that the initial switch
from old to new accounting dropped perf to about 791Mb/s, and the rest of the changes (as shown in the v6 numbers, though
not backported) only bring the v6 patchset back to 822.16Mb/s. That is not 100% proof, but it seems very suspicious.

This change in vruntime accounting seems to account for about 40% of the total v5-to-v7 perf loss, though clearly lots of
other changes have occurred in between. I am certainly not saying there is a bug here; it just seems time to bring in
the original authors and start a general discussion.
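
For reference, the "about 40%" figure can be reproduced from the numbers quoted in this email. The Python sketch below compares the drop seen after backporting the new accounting onto v5 (about 791Mb/s) with the total smt=on/cs=on drop from v5 to v7. It assumes 1.07Gb/s means 1070Mb/s, and the constant and function names are only illustrative.

```python
# smt=on/cs=on throughput in Mb/s, taken from the figures quoted in this email.
# Assumption: 1.07Gb/s is read as 1070 Mb/s (decimal units).
V5_CS_ON = 1070.0           # v5 on 5.6
V5_NEW_ACCOUNTING = 791.0   # v5 with parts of the new vruntime accounting backported
V7_CS_ON = 440.44           # v7


def share_of_total_loss(before: float, after_change: float, final: float) -> float:
    """Fraction of the total loss (before -> final) seen right after one change."""
    return (before - after_change) / (before - final)


share = share_of_total_loss(V5_CS_ON, V5_NEW_ACCOUNTING, V7_CS_ON)
print(f"{share:.0%}")
```

With these inputs the share comes out a little above 40%. The 791Mb/s input comes from a rough partial backport, so the result is an estimate rather than a measurement.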

-chrish
