From: K Prateek Nayak <kprateek.nayak@amd.com>
To: Abel Wu <wuyun.abel@bytedance.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Mel Gorman <mgorman@suse.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <valentin.schneider@arm.com>
Cc: Josh Don <joshdon@google.com>, Chen Yu <yu.c.chen@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	"Gautham R . Shenoy" <gautham.shenoy@amd.com>,
	Aubrey Li <aubrey.li@intel.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	Yicong Yang <yangyicong@huawei.com>,
	Barry Song <21cnbao@gmail.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 0/4] sched/fair: Improve scan efficiency of SIS
Date: Tue, 15 Nov 2022 16:58:37 +0530
Message-ID: <906747ff-148c-f058-dc94-7a9225125f52@amd.com>
In-Reply-To: <2a049755-57cb-4943-0850-cbbf2537c97e@bytedance.com>

Hello Abel,

Thank you for taking a look at the report.

On 11/15/2022 2:01 PM, Abel Wu wrote:
> Hi Prateek, thanks very much for your detailed testing!
> 
> On 11/14/22 1:45 PM, K Prateek Nayak wrote:
>> Hello Abel,
>>
>> Sorry for the delay. I've tested the patch on a dual socket Zen3 system
>> (2 x 64C/128T)
>>
>> tl;dr
>>
>> o I do not notice any regressions with the standard benchmarks.
>> o schbench sees a nice improvement in tail latency when the number
>>    of workers is equal to the number of cores in the system in NPS1 and
>>    NPS2 modes. (Marked with "^")
>> o A few data points show improvements in tbench in NPS1 and NPS2 modes.
>>    (Marked with "^")
>>
>> I'm still in the process of running larger workloads. If there is any
>> specific workload you would like me to run on the test system, please
>> do let me know. Below is the detailed report:
> 
> Nothing in particular comes to mind, and I think testing larger workloads
> is great. Thanks!
>
>>
>> Following are the results from running standard benchmarks on a
>> dual socket Zen3 (2 x 64C/128T) machine configured in different
>> NPS modes.
>>
>> NPS modes are used to logically divide a single socket into
>> multiple NUMA regions.
>> Following is the NUMA configuration for each NPS mode on the system:
>>
>> NPS1: Each socket is a NUMA node.
>>      Total 2 NUMA nodes in the dual socket machine.
>>
>>      Node 0: 0-63,   128-191
>>      Node 1: 64-127, 192-255
>>
>> NPS2: Each socket is further logically divided into 2 NUMA regions.
>>      Total 4 NUMA nodes exist over 2 sockets.
>>
>>      Node 0: 0-31,   128-159
>>      Node 1: 32-63,  160-191
>>      Node 2: 64-95,  192-223
>>      Node 3: 96-127, 224-255
>>
>> NPS4: Each socket is logically divided into 4 NUMA regions.
>>      Total 8 NUMA nodes exist over 2 sockets.
>>
>>      Node 0: 0-15,    128-143
>>      Node 1: 16-31,   144-159
>>      Node 2: 32-47,   160-175
>>      Node 3: 48-63,   176-191
>>      Node 4: 64-79,   192-207
>>      Node 5: 80-95,   208-223
>>      Node 6: 96-111,  224-239
>>      Node 7: 112-127, 240-255
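
As a rough cross-check of the node layouts above, the CPU ranges can be
derived programmatically. This is only a sketch under stated assumptions
that are not spelled out in the report: each node covers a contiguous,
equal share of the 128 cores, and the SMT sibling of CPU n is CPU n + 128.
The function name numa_map is hypothetical.

```python
def numa_map(nps, sockets=2, cores_per_socket=64):
    """Return {node: [(first_core, last_core), (first_sibling, last_sibling)]}.

    Assumption: nodes split the cores evenly and contiguously, and the SMT
    sibling of CPU n is CPU n + total_cores (128 on this machine).
    """
    total_cores = sockets * cores_per_socket
    nodes = sockets * nps
    per_node = total_cores // nodes
    layout = {}
    for node in range(nodes):
        first = node * per_node
        last = first + per_node - 1
        layout[node] = [(first, last),
                        (first + total_cores, last + total_cores)]
    return layout


for nps in (1, 2, 4):
    print(f"NPS{nps}:")
    for node, (cores, siblings) in numa_map(nps).items():
        print(f"  Node {node}: {cores[0]}-{cores[1]}, "
              f"{siblings[0]}-{siblings[1]}")
```

On a live system, the authoritative layout comes from the kernel itself
(for example, the output of "numactl -H"), not from a derivation like this.
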
>>
>> Benchmark Results:
>>
>> Kernel versions:
>> - tip:          5.19.0 tip sched/core
>> - sis_core:     5.19.0 tip sched/core + this series
>>
>> When we started testing, the tip was at:
>> commit fdf756f71271 ("sched: Fix more TASK_state comparisons")
>>
>> ~~~~~~~~~~~~~
>> ~ hackbench ~
>> ~~~~~~~~~~~~~
>>
>> o NPS1
>>
>> Test:            tip            sis_core
>>   1-groups:       4.06 (0.00 pct)       4.26 (-4.92 pct)    *
>>   1-groups:       4.14 (0.00 pct)       4.09 (1.20 pct)    [Verification Run]
>>   2-groups:       4.76 (0.00 pct)       4.71 (1.05 pct)
>>   4-groups:       5.22 (0.00 pct)       5.11 (2.10 pct)
>>   8-groups:       5.35 (0.00 pct)       5.31 (0.74 pct)
>> 16-groups:       7.21 (0.00 pct)       6.80 (5.68 pct)
>>
>> o NPS2
>>
>> Test:            tip            sis_core
>>   1-groups:       4.09 (0.00 pct)       4.08 (0.24 pct)
>>   2-groups:       4.70 (0.00 pct)       4.69 (0.21 pct)
>>   4-groups:       5.05 (0.00 pct)       4.92 (2.57 pct)
>>   8-groups:       5.35 (0.00 pct)       5.26 (1.68 pct)
>> 16-groups:       6.37 (0.00 pct)       6.34 (0.47 pct)
>>
>> o NPS4
>>
>> Test:            tip            sis_core
>>   1-groups:       4.07 (0.00 pct)       3.99 (1.96 pct)
>>   2-groups:       4.65 (0.00 pct)       4.59 (1.29 pct)
>>   4-groups:       5.13 (0.00 pct)       5.00 (2.53 pct)
>>   8-groups:       5.47 (0.00 pct)       5.43 (0.73 pct)
>> 16-groups:       6.82 (0.00 pct)       6.56 (3.81 pct)
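
The "pct" columns above appear consistent with the relative difference
(tip - sis_core) / tip * 100, truncated to two decimals, so a positive
value means sis_core took less time than tip. The report does not state
this formula, so treat the sketch below as an inference from the printed
numbers. The function name pct_change is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def pct_change(tip, sis_core):
    """Relative improvement of sis_core over tip, in percent.

    Assumption: lower is better, and the result is truncated toward
    zero to two decimals (inferred from the table, not stated in it).
    """
    tip, sis_core = Decimal(tip), Decimal(sis_core)
    pct = (tip - sis_core) / tip * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


# NPS1 rows from the hackbench table
print("1-groups: ", pct_change("4.06", "4.26"))
print("16-groups:", pct_change("7.21", "6.80"))
```

Decimal is used instead of float so that truncation is not affected by
binary floating-point representation error.
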
> 
> Although each CPU gets 2.5 tasks with 16 groups, which can be
> considered overloaded, I tested on an AMD EPYC 7Y83 machine and
> the total CPU usage was ~82% (with an older kernel version),
> so there is still a lot of idle time.
> 
> I guess the cutoff at 16-groups is because the system is already
> heavily loaded compared to real workloads, so testing more groups
> might just be a waste of time?

The machine has 16 LLCs, so I capped the results at 16-groups.
Previously, I had seen some run-to-run variance with larger group counts,
so I limited the report to 16-groups. I'll run hackbench with more
groups (32, 64, 128, 256) and get back to you with the results, along
with results for a couple of long-running workloads.
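
For reference, the 2.5 tasks per CPU figure quoted above, and the
equivalent for the larger group counts, can be sketched as follows. This
assumes the hackbench default of 20 senders plus 20 receivers (40 tasks)
per group and 256 logical CPUs (2 x 64C/128T); neither number is stated
explicitly in this thread, and the names below are hypothetical.

```python
TASKS_PER_GROUP = 40  # assumption: hackbench default, 20 senders + 20 receivers
NR_CPUS = 256         # 2 sockets x 64 cores x 2 SMT threads


def tasks_per_cpu(groups, tasks_per_group=TASKS_PER_GROUP, nr_cpus=NR_CPUS):
    """Average number of runnable hackbench tasks per logical CPU."""
    return groups * tasks_per_group / nr_cpus


for groups in (16, 32, 64, 128, 256):
    print(f"{groups:>3}-groups: {tasks_per_cpu(groups):.1f} tasks per CPU")
```

This is only an average; as noted above, measured CPU usage can still
leave idle time even when the average is above one task per CPU.
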

> 
> Thanks & Best Regards,
>     Abel
> 
> [..snip..]
>


--
Thanks and Regards,
Prateek
