From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, Jiang Biao <benbjiang@gmail.com>
Subject: RE: [RFC PATCH v8] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Sun, 13 Dec 2020 23:29:28 +0000	[thread overview]
Message-ID: <121565627e944f8e9dde4080d19d5b02@hisilicon.com>
In-Reply-To: <698a61bf-6eea-8725-95c0-a5ea811e2bb4@linux.intel.com>



> -----Original Message-----
> From: Li, Aubrey [mailto:aubrey.li@linux.intel.com]
> Sent: Saturday, December 12, 2020 4:25 AM
> To: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>; Peter Zijlstra <peterz@infradead.org>;
> Juri Lelli <juri.lelli@redhat.com>; Mel Gorman <mgorman@techsingularity.net>;
> Valentin Schneider <valentin.schneider@arm.com>; Qais Yousef
> <qais.yousef@arm.com>; Dietmar Eggemann <dietmar.eggemann@arm.com>; Steven
> Rostedt <rostedt@goodmis.org>; Ben Segall <bsegall@google.com>; Tim Chen
> <tim.c.chen@linux.intel.com>; linux-kernel <linux-kernel@vger.kernel.org>;
> Mel Gorman <mgorman@suse.de>; Jiang Biao <benbjiang@gmail.com>
> Subject: Re: [RFC PATCH v8] sched/fair: select idle cpu from idle cpumask for
> task wakeup
> 
> On 2020/12/11 23:22, Vincent Guittot wrote:
> > On Fri, 11 Dec 2020 at 16:19, Li, Aubrey <aubrey.li@linux.intel.com> wrote:
> >>
> >> On 2020/12/11 23:07, Vincent Guittot wrote:
> >>> On Thu, 10 Dec 2020 at 02:44, Aubrey Li <aubrey.li@linux.intel.com> wrote:
> >>>>
> >>>> Add an idle cpumask to track idle CPUs in the sched domain. Every time
> >>>> a CPU enters idle, it is set in the idle cpumask as a wakeup target.
> >>>> If the CPU is not idle, it is cleared from the idle cpumask during the
> >>>> scheduler tick, which ratelimits idle cpumask updates.
> >>>>
> >>>> When a task wakes up and looks for an idle CPU, scanning the idle
> >>>> cpumask has a lower cost than scanning all the CPUs in the last-level
> >>>> cache domain, especially when the system is heavily loaded.
> >>>>
> >>>> Benchmarks including hackbench, schbench, uperf, sysbench MySQL and
> >>>> kbuild have been run on an x86 4-socket system with 24 cores per
> >>>> socket and 2 hyperthreads per core (192 CPUs in total); no regression
> >>>> was found.
> >>>>
> >>>> v7->v8:
> >>>> - refine update_idle_cpumask, no functionality change
> >>>> - fix a suspicious RCU usage warning with CONFIG_PROVE_RCU=y
> >>>>
> >>>> v6->v7:
> >>>> - place the whole idle cpumask mechanism under CONFIG_SMP
> >>>>
> >>>> v5->v6:
> >>>> - decouple idle cpumask update from stop_tick signal, set idle CPU
> >>>>   in idle cpumask every time the CPU enters idle
> >>>>
> >>>> v4->v5:
> >>>> - add update_idle_cpumask for s2idle case
> >>>> - keep the same ordering of tick_nohz_idle_stop_tick() and update_
> >>>>   idle_cpumask() everywhere
> >>>>
> >>>> v3->v4:
> >>>> - change setting idle cpumask from every idle entry to tickless idle
> >>>>   if cpu driver is available
> >>>> - move clearing idle cpumask to scheduler_tick to decouple nohz mode
> >>>>
> >>>> v2->v3:
> >>>> - change setting idle cpumask to every idle entry, otherwise schbench
> >>>>   has a regression of 99th percentile latency
> >>>> - change clearing idle cpumask to nohz_balancer_kick(), so updating
> >>>>   idle cpumask is ratelimited in the idle exiting path
> >>>> - set SCHED_IDLE cpu in idle cpumask to allow it as a wakeup target
> >>>>
> >>>> v1->v2:
> >>>> - idle cpumask is updated in the nohz routines, by initializing idle
> >>>>   cpumask with sched_domain_span(sd), nohz=off case remains the original
> >>>>   behavior
> >>>>
> >>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>> Cc: Mel Gorman <mgorman@suse.de>
> >>>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> >>>> Cc: Qais Yousef <qais.yousef@arm.com>
> >>>> Cc: Valentin Schneider <valentin.schneider@arm.com>
> >>>> Cc: Jiang Biao <benbjiang@gmail.com>
> >>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> >>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> >>>
> >>> This version looks good to me. I don't see regressions of v5 anymore
> >>> and see some improvements on heavy cases
> >>
> >> v5 or v8?
> >
> > the v8 looks good to me and I don't see the regressions that I have
> > seen with the v5 anymore
> >
> Sounds great, thanks, :)


Hi Aubrey,

The patch looks great, but I didn't find any hackbench improvement on
Kunpeng 920, which has 24 cores in each LLC span. Each LLC span is also
one NUMA node. The topology looks like this:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 128669 MB
node 0 free: 126995 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 128997 MB
node 1 free: 127539 MB
node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 2 size: 129021 MB
node 2 free: 127106 MB
node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 3 size: 127993 MB
node 3 free: 126739 MB
node distances:
node   0   1   2   3
  0:  10  12  20  22
  1:  12  10  22  24
  2:  20  22  10  12
  3:  22  24  12  10

Benchmark command:
numactl -N 0-1 hackbench -p -T -l 20000 -g $1

For each g, I ran the benchmark 10 times and took the average time in
seconds. I tested g from 1 to 10:

g      1       2       3       4       5       6       7       8       9       10
w/o    1.4733  1.5992  1.9353  2.1563  2.8448  3.3305  3.9616  4.4870  5.0786  5.6983
w/     1.4709  1.6152  1.9474  2.1512  2.8298  3.2998  3.9472  4.4803  5.0462  5.6505
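
For the record, here is how I understand the mechanism from the
description above. This is only a toy userspace model, not the actual
patch; the bitmask, the helper names and the fallback are made up for
illustration:

/*
 * Toy userspace model of the idle-cpumask idea (not the patch itself):
 * - one bit per CPU in the LLC domain marks "possibly idle"
 * - the bit is set when the CPU enters idle
 * - the bit is cleared from the tick when the CPU turns out to be busy
 * - the wakeup path scans only the set bits instead of the whole LLC
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t llc_idle_mask;          /* bit i set == CPU i possibly idle */

static void cpu_enters_idle(int cpu)    /* idle entry path */
{
        llc_idle_mask |= 1u << cpu;
}

static void tick_on_busy_cpu(int cpu)   /* scheduler tick, ratelimited clear */
{
        llc_idle_mask &= ~(1u << cpu);
}

static int pick_wakeup_target(void)     /* wakeup fast path */
{
        /*
         * Cost scales with the number of idle CPUs rather than with the
         * LLC size; the real code still rechecks that the chosen CPU is
         * actually idle before using it.
         */
        if (llc_idle_mask)
                return __builtin_ctz(llc_idle_mask);
        return -1;                      /* no hint, fall back to a full scan */
}

int main(void)
{
        cpu_enters_idle(3);
        cpu_enters_idle(17);
        tick_on_busy_cpu(3);            /* CPU 3 picked up work again */
        printf("wakeup target: CPU %d\n", pick_wakeup_target());
        return 0;
}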

Is it because the number of cores in each LLC span is small in my test?

Thanks
Barry


Thread overview: 10+ messages
2020-12-10  1:43 [RFC PATCH v8] sched/fair: select idle cpu from idle cpumask for task wakeup Aubrey Li
2020-12-11 15:07 ` Vincent Guittot
2020-12-11 15:18   ` Li, Aubrey
2020-12-11 15:22     ` Vincent Guittot
2020-12-11 15:24       ` Li, Aubrey
2020-12-13 23:29         ` Song Bao Hua (Barry Song) [this message]
2020-12-15 12:41           ` Li, Aubrey
2021-03-04 13:51   ` Li, Aubrey
2021-03-08 11:30     ` Vincent Guittot
2021-03-08 13:50       ` Li, Aubrey
