From: Mike Galbraith <efault@gmx.de>
To: Chen Yu <yu.c.chen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>,
	Tim Chen <tim.c.chen@intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Abel Wu <wuyun.abel@bytedance.com>,
	Yicong Yang <yangyicong@hisilicon.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Len Brown <len.brown@intel.com>, Chen Yu <yu.chen.surf@gmail.com>,
	Arjan Van De Ven <arjan.van.de.ven@intel.com>,
	Aaron Lu <aaron.lu@intel.com>, Barry Song <baohua@kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Introduce SIS_PAIR to wakeup task on local idle core first
Date: Tue, 16 May 2023 08:23:35 +0200
Message-ID: <19664c68f77f5b23a86e5636a17ad2cbfa073f78.camel@gmx.de>
In-Reply-To: <20230516011159.4552-1-yu.c.chen@intel.com>

On Tue, 2023-05-16 at 09:11 +0800, Chen Yu wrote:
> [Problem Statement]
>
...

> 20.26%    19.89%  [kernel.kallsyms]          [k] update_cfs_group
> 13.53%    12.15%  [kernel.kallsyms]          [k] update_load_avg

Yup, that's a serious problem, but...

> [Benchmark]
>
> The baseline is on sched/core branch on top of
> commit a6fcdd8d95f7 ("sched/debug: Correct printing for rq->nr_uninterruptible")
>
> Tested will-it-scale context_switch1 case, it shows good improvement
> both on a server and a desktop:
>
> Intel(R) Xeon(R) Platinum 8480+, Sapphire Rapids 2 x 56C/112T = 224 CPUs
> context_switch1_processes -s 100 -t 112 -n
> baseline                   SIS_PAIR
> 1.0                        +68.13%
>
> Intel Core(TM) i9-10980XE, Cascade Lake 18C/36T
> context_switch1_processes -s 100 -t 18 -n
> baseline                   SIS_PAIR
> 1.0                        +45.2%

git@homer: ./context_switch1_processes -s 100 -t 8 -n
(running in an autogroup)

   PerfTop:   30853 irqs/sec  kernel:96.8%  exact: 96.8% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------------------------------------

     5.72%  [kernel]       [k] switch_mm_irqs_off
     4.23%  [kernel]       [k] __update_load_avg_se
     3.76%  [kernel]       [k] __update_load_avg_cfs_rq
     3.70%  [kernel]       [k] __schedule
     3.65%  [kernel]       [k] entry_SYSCALL_64
     3.22%  [kernel]       [k] enqueue_task_fair
     2.91%  [kernel]       [k] update_curr
     2.67%  [kernel]       [k] select_task_rq_fair
     2.60%  [kernel]       [k] pipe_read
     2.55%  [kernel]       [k] __switch_to
     2.54%  [kernel]       [k] __calc_delta
     2.44%  [kernel]       [k] dequeue_task_fair
     2.38%  [kernel]       [k] reweight_entity
     2.13%  [kernel]       [k] pipe_write
     1.96%  [kernel]       [k] restore_fpregs_from_fpstate
     1.93%  [kernel]       [k] select_idle_smt
     1.77%  [kernel]       [k] update_load_avg <==
     1.73%  [kernel]       [k] native_sched_clock
     1.66%  [kernel]       [k] try_to_wake_up
     1.52%  [kernel]       [k] _raw_spin_lock_irqsave
     1.47%  [kernel]       [k] update_min_vruntime
     1.42%  [kernel]       [k] update_cfs_group <==
     1.36%  [kernel]       [k] vfs_write
     1.32%  [kernel]       [k] prepare_to_wait_event

...not one with global scope.  My little i7-4790 can play ping-pong all
day long, as can untold numbers of other boxen around the globe.
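The ping-pong in question is pipe based: pipe_read and pipe_write show up in the profile above alongside the scheduler paths. For readers who haven't looked at will-it-scale, here is a rough sketch of that pattern. It is not the context_switch1 source, and pipe_pingpong() is a hypothetical helper.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: bounce one byte between a parent and a forked
 * child over two pipes, iters times.  Each write wakes the peer, and the
 * writer then blocks in read() until the answer comes back.
 * Returns completed round trips, or -1 if setup fails.
 */
static long pipe_pingpong(long iters)
{
	int to_child[2], to_parent[2];
	char c = 'x';
	long done = 0;
	pid_t pid;

	assert(iters >= 0);
	if (pipe(to_child))
		return -1;
	if (pipe(to_parent)) {
		close(to_child[0]);
		close(to_child[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		close(to_parent[0]);
		close(to_parent[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: echo each byte back until the parent closes its end. */
		close(to_child[1]);
		close(to_parent[0]);
		while (read(to_child[0], &c, 1) == 1)
			if (write(to_parent[1], &c, 1) != 1)
				break;
		_exit(0);
	}

	/* Parent: ping, then sleep in read() until the pong arrives. */
	close(to_child[0]);
	close(to_parent[1]);
	while (done < iters &&
	       write(to_child[1], &c, 1) == 1 &&
	       read(to_parent[0], &c, 1) == 1)
		done++;

	close(to_child[1]);	/* child reads EOF and exits */
	close(to_parent[0]);
	waitpid(pid, NULL, 0);
	return done;
}
```

Each round trip is two wakeups with next to no work in between, so the loop measures little besides wakeup and context-switch cost.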

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 48b6f0ca13ac..e65028dcd6a6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7125,6 +7125,21 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>             asym_fits_cpu(task_util, util_min, util_max, target))
>                 return target;
>  
> +       /*
> +        * If the waker and the wakee are good friends to each other,
> +        * putting them within the same SMT domain could reduce C2C
> +        * overhead. SMT idle sibling should be preferred to wakee's
> +        * previous CPU, because the latter could still have the risk of C2C
> +        * overhead.
> +        */
> +       if (sched_feat(SIS_PAIR) && sched_smt_active() &&
> +           current->last_wakee == p && p->last_wakee == current) {
> +               i = select_idle_smt(p, smp_processor_id());
> +
> +               if ((unsigned int)i < nr_cpumask_bits)
> +                       return i;
> +       }
> +
>         /*
>          * If the previous CPU is cache affine and idle, don't be stupid:
>          */

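Global-scope solutions for non-global issues tend not to work out.  The
pairing test above fires on any box with SMT active, whether or not
cross-core cache traffic is what actually hurts it.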
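The pairing test in the quoted hunk only asks whether waker and wakee last woke each other. Beyond sched_smt_active(), nothing in it looks at cache topology or at what a cross-core wakeup costs on the running machine. Below is a minimal user-space model of just that test. It assumes a hypothetical `struct task` carrying only a `last_wakee` pointer; the real field lives in task_struct, as the hunk shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct, reduced to the field the test reads. */
struct task {
	const struct task *last_wakee;	/* task this one most recently woke */
};

/* Note that @waker just woke @wakee by updating its last_wakee. */
static void record_wakeup(struct task *waker, const struct task *wakee)
{
	assert(waker != NULL && wakee != NULL && waker != wakee);
	waker->last_wakee = wakee;
}

/*
 * Models the quoted condition
 *     current->last_wakee == p && p->last_wakee == current
 * with @waker playing current and @wakee playing p.  Nothing here looks
 * at cache topology, LLC size, or the cost of a cross-core wakeup.
 */
static bool is_wake_pair(const struct task *waker, const struct task *wakee)
{
	return waker->last_wakee == wakee && wakee->last_wakee == waker;
}
```

Two tasks playing pipe ping-pong satisfy this as soon as each has woken the other once, on any SMT machine of any size.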

Below is a sample of potential scaling wreckage for boxen that are NOT
akin to the one you're watching turn caches into silicon-based pudding.

Note the *_RR numbers.  Those poked me in the eye because they closely
resemble pipe ping-pong: all fun and games, with about as close to zero
work other than scheduling as network-land can get.  Yet for my box, SMT
was the third best option of three: cross-smt trails both stacked and
cross-core in TCP_RR and UDP_RR.

You just can't beat idle core selection when it comes to getting work
done, which is why select_idle_sibling() (SIS) evolved to look for an
idle core first.
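The preference order described above can be modeled in a few lines. This is a toy, not select_idle_sibling(). The topology, CPU numbering, and function names are hypothetical, and the real code also checks the target and previous CPUs first (as the quoted hunk's context shows) and limits how far it scans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical topology: 4 cores x 2 SMT threads, CPU = core * SMT_WIDTH + thread. */
#define NR_CORES	4
#define SMT_WIDTH	2
#define NR_CPUS		(NR_CORES * SMT_WIDTH)

/* True when no hardware thread of @core is busy. */
static bool core_idle(const bool busy[NR_CPUS], int core)
{
	for (int t = 0; t < SMT_WIDTH; t++)
		if (busy[core * SMT_WIDTH + t])
			return false;
	return true;
}

/*
 * Toy idle-core-first pick: a wholly idle core beats an idle SMT thread
 * whose sibling is busy.  Returns a CPU number, or -1 if nothing is idle.
 */
static int pick_idle_cpu(const bool busy[NR_CPUS])
{
	assert(busy);

	/* Pass 1: a whole idle core, so the wakee has the core to itself. */
	for (int core = 0; core < NR_CORES; core++)
		if (core_idle(busy, core))
			return core * SMT_WIDTH;

	/* Pass 2: settle for any idle thread, sharing a core with busy work. */
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!busy[cpu])
			return cpu;

	return -1;
}
```

With only CPU 0 busy, the model picks CPU 2 (idle core 1) over CPU 1, the idle sibling of the busy thread. The SIS_PAIR hunk instead sends a mutual-wakee pair to an idle SMT sibling of the waker's CPU via select_idle_smt(p, smp_processor_id()).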

Your box and its ilk need help that treats the disease rather than the
symptom, or, barring that, help that precisely targets boxen having the
disease.

	-Mike

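10 seconds of 1 netperf client/server instance, no knobs twiddled.
Placement: stacked = client and server on the same CPU, cross-smt = on
SMT siblings of one core, cross-core = on separate cores.  Higher is better.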

Avg               stacked   cross-smt  cross-core
TCP_SENDFILE-1      65387       65658       96318
TCP_STREAM-1        44322       42390       77850
TCP_MAERTS-1        36636       42333       74122
UDP_STREAM-1        52618       55298       97415
TCP_RR-1           242606      140863      219400
UDP_RR-1           282253      202062      288620
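To put the netperf averages above on one scale, the small sketch below normalizes each placement to stacked and names the slowest one. The struct, enum, and function names are hypothetical. The numbers are copied from the table, and higher is treated as better because netperf reports throughput and transaction rates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum placement { STACKED, CROSS_SMT, CROSS_CORE };

static const char *const placement_name[] = { "stacked", "cross-smt", "cross-core" };

/* Hypothetical record type; values are the netperf averages quoted above. */
struct result {
	const char *test;
	double avg[3];	/* indexed by enum placement; higher is better */
};

static const struct result results[] = {
	{ "TCP_SENDFILE-1", {  65387,  65658,  96318 } },
	{ "TCP_STREAM-1",   {  44322,  42390,  77850 } },
	{ "TCP_MAERTS-1",   {  36636,  42333,  74122 } },
	{ "UDP_STREAM-1",   {  52618,  55298,  97415 } },
	{ "TCP_RR-1",       { 242606, 140863, 219400 } },
	{ "UDP_RR-1",       { 282253, 202062, 288620 } },
};

#define NR_RESULTS (sizeof(results) / sizeof(results[0]))

/* Placement with the lowest average for one test. */
static enum placement worst_placement(const struct result *r)
{
	enum placement worst = STACKED;

	for (int p = CROSS_SMT; p <= CROSS_CORE; p++)
		if (r->avg[p] < r->avg[worst])
			worst = p;
	return worst;
}

/* Print each placement relative to stacked (1.00 = same as stacked). */
void print_relative(FILE *out)
{
	for (size_t i = 0; i < NR_RESULTS; i++) {
		const struct result *r = &results[i];

		assert(r->avg[STACKED] > 0);
		fprintf(out, "%-15s cross-smt %.2f  cross-core %.2f  worst: %s\n",
			r->test,
			r->avg[CROSS_SMT] / r->avg[STACKED],
			r->avg[CROSS_CORE] / r->avg[STACKED],
			placement_name[worst_placement(r)]);
	}
}
```

Calling print_relative(stdout) puts cross-smt at about 0.58 of stacked for TCP_RR-1 and 0.72 for UDP_RR-1, while cross-core stays near stacked (0.90 and 1.02).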

Thread overview: 15+ messages
2023-05-16  1:11 [RFC PATCH] sched/fair: Introduce SIS_PAIR to wakeup task on local idle core first Chen Yu
2023-05-16  6:23 ` Mike Galbraith [this message]
2023-05-16  8:41   ` Chen Yu
2023-05-16 11:51     ` Mike Galbraith
2023-05-17 16:57       ` Chen Yu
2023-05-17 19:52         ` Mike Galbraith
2023-05-18  3:41           ` Chen Yu
2023-05-19 11:15             ` Mike Galbraith
2023-05-18  3:30         ` K Prateek Nayak
2023-05-18  4:17           ` Chen Yu
2023-05-18 10:26             ` K Prateek Nayak
2023-05-22  4:10               ` Chen Yu
2023-05-22  7:10                 ` Mike Galbraith
2023-05-25  7:47                   ` Chen Yu
2023-05-25  9:33                     ` Mike Galbraith
