From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, Jiang Biao <benbjiang@gmail.com>
Subject: RE: [RFC PATCH v8] sched/fair: select idle cpu from idle cpumask for task wakeup
Date: Sun, 13 Dec 2020 23:29:28 +0000
Message-ID: <121565627e944f8e9dde4080d19d5b02@hisilicon.com>
In-Reply-To: <698a61bf-6eea-8725-95c0-a5ea811e2bb4@linux.intel.com>



> -----Original Message-----
> From: Li, Aubrey [mailto:aubrey.li@linux.intel.com]
> Sent: Saturday, December 12, 2020 4:25 AM
> To: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>; Peter Zijlstra <peterz@infradead.org>;
> Juri Lelli <juri.lelli@redhat.com>; Mel Gorman <mgorman@techsingularity.net>;
> Valentin Schneider <valentin.schneider@arm.com>; Qais Yousef
> <qais.yousef@arm.com>; Dietmar Eggemann <dietmar.eggemann@arm.com>; Steven
> Rostedt <rostedt@goodmis.org>; Ben Segall <bsegall@google.com>; Tim Chen
> <tim.c.chen@linux.intel.com>; linux-kernel <linux-kernel@vger.kernel.org>;
> Mel Gorman <mgorman@suse.de>; Jiang Biao <benbjiang@gmail.com>
> Subject: Re: [RFC PATCH v8] sched/fair: select idle cpu from idle cpumask for
> task wakeup
> 
> On 2020/12/11 23:22, Vincent Guittot wrote:
> > On Fri, 11 Dec 2020 at 16:19, Li, Aubrey <aubrey.li@linux.intel.com> wrote:
> >>
> >> On 2020/12/11 23:07, Vincent Guittot wrote:
> >>> On Thu, 10 Dec 2020 at 02:44, Aubrey Li <aubrey.li@linux.intel.com> wrote:
> >>>>
> >>>> Add idle cpumask to track idle cpus in sched domain. Every time
> >>>> a CPU enters idle, the CPU is set in idle cpumask to be a wakeup
> >>>> target. And if the CPU is not in idle, the CPU is cleared in idle
> >>>> cpumask during scheduler tick to ratelimit idle cpumask update.
> >>>>
> >>>> When a task wakes up to select an idle cpu, scanning idle cpumask
> >>>> has lower cost than scanning all the cpus in last level cache domain,
> >>>> especially when the system is heavily loaded.
> >>>>
> >>>> Benchmarks including hackbench, schbench, uperf, sysbench mysql and
> >>>> kbuild have been tested on a x86 4 socket system with 24 cores per
> >>>> socket and 2 hyperthreads per core, total 192 CPUs, no regression
> >>>> found.
> >>>>
> >>>> v7->v8:
> >>>> - refine update_idle_cpumask, no functionality change
> >>>> - fix a suspicious RCU usage warning with CONFIG_PROVE_RCU=y
> >>>>
> >>>> v6->v7:
> >>>> - place the whole idle cpumask mechanism under CONFIG_SMP
> >>>>
> >>>> v5->v6:
> >>>> - decouple idle cpumask update from stop_tick signal, set idle CPU
> >>>>   in idle cpumask every time the CPU enters idle
> >>>>
> >>>> v4->v5:
> >>>> - add update_idle_cpumask for s2idle case
> >>>> - keep the same ordering of tick_nohz_idle_stop_tick() and update_
> >>>>   idle_cpumask() everywhere
> >>>>
> >>>> v3->v4:
> >>>> - change setting idle cpumask from every idle entry to tickless idle
> >>>>   if cpu driver is available
> >>>> - move clearing idle cpumask to scheduler_tick to decouple nohz mode
> >>>>
> >>>> v2->v3:
> >>>> - change setting idle cpumask to every idle entry, otherwise schbench
> >>>>   has a regression of 99th percentile latency
> >>>> - change clearing idle cpumask to nohz_balancer_kick(), so updating
> >>>>   idle cpumask is ratelimited in the idle exiting path
> >>>> - set SCHED_IDLE cpu in idle cpumask to allow it as a wakeup target
> >>>>
> >>>> v1->v2:
> >>>> - idle cpumask is updated in the nohz routines, by initializing idle
> >>>>   cpumask with sched_domain_span(sd), nohz=off case remains the original
> >>>>   behavior
> >>>>
> >>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>> Cc: Mel Gorman <mgorman@suse.de>
> >>>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> >>>> Cc: Qais Yousef <qais.yousef@arm.com>
> >>>> Cc: Valentin Schneider <valentin.schneider@arm.com>
> >>>> Cc: Jiang Biao <benbjiang@gmail.com>
> >>>> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> >>>> Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
> >>>
> >>> This version looks good to me. I don't see regressions of v5 anymore
> >>> and see some improvements on heavy cases
> >>
> >> v5 or v8?
> >
> > the v8 looks good to me and I don't see the regressions that I have
> > seen with the v5 anymore
> >
> Sounds great, thanks, :)


Hi Aubrey,

The patch looks great, but I didn't see any hackbench improvement
on Kunpeng 920, which has 24 cores in each LLC span. Each LLC span is
also one NUMA node. The topology is:
# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 128669 MB
node 0 free: 126995 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 128997 MB
node 1 free: 127539 MB
node 2 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 2 size: 129021 MB
node 2 free: 127106 MB
node 3 cpus: 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 3 size: 127993 MB
node 3 free: 126739 MB
node distances:
node   0   1   2   3
  0:  10  12  20  22
  1:  12  10  22  24
  2:  20  22  10  12
  3:  22  24  12  10

Benchmark command:
numactl -N 0-1 hackbench -p -T -l 20000 -g $1

For each g, I ran the benchmark 10 times and took the average time.
I tested g from 1 to 10.
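
A minimal sketch of that procedure in Python, for anyone who wants to
repeat it. The function names are illustrative, and the parser assumes
that hackbench reports its result on a line of the form "Time: <seconds>":

```python
import re
import statistics
import subprocess

# Assumption: hackbench prints a result line such as "Time: 1.473" (seconds).
TIME_RE = re.compile(r"^Time:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def parse_time(output):
    """Extract the elapsed time in seconds from hackbench output."""
    match = TIME_RE.search(output)
    if match is None:
        raise ValueError("no 'Time:' line found in hackbench output")
    return float(match.group(1))


def run_hackbench(groups):
    """Run the benchmark command from this mail once and return its stdout."""
    cmd = ["numactl", "-N", "0-1",
           "hackbench", "-p", "-T", "-l", "20000", "-g", str(groups)]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def average_time(groups, runs=10, runner=run_hackbench):
    """Average the reported time over `runs` invocations for one group count.

    `runner` is injectable so the averaging can be checked without hackbench.
    """
    times = [parse_time(runner(groups)) for _ in range(runs)]
    return statistics.mean(times)
```

Calling average_time(g) for g in range(1, 11) would produce one row of
the kind of table shown below.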

g      1       2       3       4       5       6       7       8       9       10
w/o    1.4733  1.5992  1.9353  2.1563  2.8448  3.3305  3.9616  4.4870  5.0786  5.6983
w/     1.4709  1.6152  1.9474  2.1512  2.8298  3.2998  3.9472  4.4803  5.0462  5.6505
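
To put a number on the gap, the relative change per g can be computed
directly from the two rows. A small Python sketch (the variable and
function names are illustrative, the data is copied from the table):

```python
# Average times copied from the table: without ("w/o") and with ("w/") the patch.
without_patch = [1.4733, 1.5992, 1.9353, 2.1563, 2.8448,
                 3.3305, 3.9616, 4.4870, 5.0786, 5.6983]
with_patch = [1.4709, 1.6152, 1.9474, 2.1512, 2.8298,
              3.2998, 3.9472, 4.4803, 5.0462, 5.6505]


def relative_change_percent(base, new):
    """Change of `new` relative to `base` in percent; negative means faster."""
    return (new - base) / base * 100.0


changes = [relative_change_percent(b, n)
           for b, n in zip(without_patch, with_patch)]

for g, change in enumerate(changes, start=1):
    print(f"g={g:2d}: {change:+.2f}%")
```

On these numbers every entry stays within roughly 1% of the baseline,
with g=2 and g=3 slightly slower and the rest slightly faster.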

Is it because the number of cores in each LLC span is small in my test?
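
To make the question concrete, here is a toy model in Python of the
mechanism as described in the quoted changelog. It is not kernel code:
the class and method names are made up for illustration, and it only
counts how many CPUs a wakeup examines, not the real cost of doing so.

```python
class IdleMaskModel:
    """Toy model of one LLC domain with an idle cpumask (not kernel code)."""

    def __init__(self, span):
        self.span = list(span)       # all CPUs in the LLC span
        self.busy = set()            # CPUs currently running a task
        self.idle_mask = set(span)   # starts out as the whole span

    def enter_idle(self, cpu):
        # Changelog: the CPU is set in the idle cpumask on every idle entry.
        self.busy.discard(cpu)
        self.idle_mask.add(cpu)

    def start_running(self, cpu):
        # The mask is deliberately left stale until the next tick.
        self.busy.add(cpu)

    def tick(self, cpu):
        # Changelog: a non-idle CPU is cleared during the scheduler tick.
        if cpu in self.busy:
            self.idle_mask.discard(cpu)

    def _scan(self, candidates):
        # Return (first idle CPU or None, number of CPUs examined).
        for examined, cpu in enumerate(candidates, start=1):
            if cpu not in self.busy:
                return cpu, examined
        return None, len(candidates)

    def scan_full_span(self):
        """Wakeup scan over every CPU in the span."""
        return self._scan(self.span)

    def scan_idle_mask(self):
        """Wakeup scan restricted to CPUs present in the idle mask."""
        return self._scan([c for c in self.span if c in self.idle_mask])
```

With a 24-CPU span and CPUs 0-21 busy and ticked, scan_full_span()
examines 23 CPUs before finding CPU 22, while scan_idle_mask() examines 1.
In this model the saving per wakeup can never exceed the span size.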

Thanks
Barry
