From: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
To: David Vernet <void@manifault.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, kprateek.nayak@amd.com, aaron.lu@intel.com,
	clm@meta.com, tj@kernel.org, roman.gushchin@linux.dev,
	kernel-team@meta.com
Subject: Re: [PATCH v2 6/7] sched: Shard per-LLC shared runqueues
Date: Wed, 12 Jul 2023 15:36:27 +0530
Message-ID: <ZK57IxhhZMw2ZaEA@BLR-5CG11610CF.amd.com>
In-Reply-To: <20230711195757.GD389526@maniforge>

On Tue, Jul 11, 2023 at 02:57:57PM -0500, David Vernet wrote:
> On Tue, Jul 11, 2023 at 12:49:58PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 10, 2023 at 03:03:41PM -0500, David Vernet wrote:

[..snip..]

> > > +static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
> > > +{
> > > +	return cpu % runq->num_shards;
> > 
> > I would suggest either:
> > 
> > 	(cpu >> 1) % num_shards
> >
> > or keeping num_shards even, to give SMT siblings a fighting chance to
> > hit the same bucket.
> 
> Given that neither of these approaches guarantees that the SMT siblings
> are in the same bucket, I'll just go with your suggestion which is
> simpler.
> 
> Seems inevitable that we'll want to have another debugfs knob to adjust
> the number of shards, but IMO it's preferable to just apply your
> suggestion in v3 and hold off on adding that complexity until we know we
> need it.
> 
> > (I've no idea how SMT4 (or worse SMT8) is typically enumerated, so
> > someone from the Power/Sparc/MIPS world would have to go play with that
> > if they so care)
> 
> Yeah, no idea either. If these things end up varying a lot across
> different architectures then we can look into making shard assignment
> architecture specific.

On POWER, the SMT siblings are enumerated sequentially, i.e.,

CPU id of a thread = Core_id * threads_per_core + thread_id_within_core.
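
For instance, with threads_per_core = 8 (SMT8), core 3's threads would
be:

	3 * 8 + 0 = 24  ...  3 * 8 + 7 = 31	(i.e. CPUs 24-31)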

But IIRC, POWER sets the L2 domain as the LLC. On POWER8 (with SMT8)
and POWER9 (with SMT4 on bare metal and SMT8 in VMs), the LLC size is 8.
So even with SHARED_RUNQ_SHARD_SZ = 6, there will only be 1 shard with
the current formula

	num_shards = max(per_cpu(sd_llc_size, i)/SHARED_RUNQ_SHARD_SZ, 1);

(Aside: with the above formula, on a topology with 6 < sd_llc_size <
12, num_shards will remain 1, and the shard size will exceed the
intended SHARD_SZ. Was this the intention?)
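
If the intent was instead to cap the shard size at SHARD_SZ, something
along these lines (untested, just to illustrate the difference) would
do it:

	/* current: floor; a 10-CPU LLC gives 1 shard of 10 CPUs */
	num_shards = max(per_cpu(sd_llc_size, i) / SHARED_RUNQ_SHARD_SZ, 1);

	/* alternative: round up; a 10-CPU LLC gives 2 shards of 5 CPUs each */
	num_shards = DIV_ROUND_UP(per_cpu(sd_llc_size, i), SHARED_RUNQ_SHARD_SZ);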

Even on x86, there is no uniformity in how the SMT threads are
numbered. On AMD EPYC bare metal, the first threads of all the cores
are enumerated first, followed by their sibling threads. So, on an
EPYC server with 128 cores in total, the SMT siblings are {0, 128},
{1, 129}, ...

With SHARED_RUNQ_SHARD_SZ = 6,

On Zen2 EPYC bare metal, with LLC size = 8, num_shards = 1. This
simplifies stuff!

On Zen3 and Zen4 EPYC bare metal, with LLC size = 16, num_shards = 2.

Here, (cpu % num_shards) ensures that the SMT siblings of a core end
up in the same shard, which they share with 3 other cores.
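
For example, taking one Zen3 LLC with, say, CPUs 0-7 and their
siblings 128-135, and num_shards = 2:

	  0 % 2 == 128 % 2 == 0	-> core 0's threads both land in shard 0
	  1 % 2 == 129 % 2 == 1	-> core 1's threads both land in shard 1

so each shard ends up with 4 cores (8 CPUs).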

On some Intel servers, the CPU numbers can be interleaved across the
two sockets. On my 2-socket, 32-cores-per-socket Ice Lake server, all
the even-numbered CPUs are in one socket and all the odd-numbered CPUs
are in the other.

The SMT siblings are {0, 64}, {2, 66}, ... on one socket and {1, 65},
{3, 67}, ... on the other.

On this system, LLC size = 64. With SHARED_RUNQ_SHARD_SZ = 6,
num_shards = 10.

So with (cpu % num_shards), sibling pairs such as {0, 64} will end up
in different shards.
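
Concretely, for the pair {0, 64}:

	 0 % 10 == 0,        64 % 10 == 4	-> different shards
	(0 >> 1) % 10 == 0, (64 >> 1) % 10 == 2	-> still different shards

so neither the plain modulo nor the (cpu >> 1) variant keeps this
pair together here.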

What would be good to have is:

1. A shard_size determined by individual architectures. If none is
   provided, we pick the default shard_size.

2. A sharding scheme which guarantees that SMT siblings will belong
   to the same shard as long as shard_size is at least as big as the
   SMT size (a rough sketch follows below).
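
Something along the following lines could achieve 2 (untested, and
since topology_core_id() is not guaranteed to be contiguous within an
LLC on every platform, the spread of cores across shards might be
uneven):

	static int shared_runq_shard_idx(const struct shared_runq *runq, int cpu)
	{
		/*
		 * Hash on the core rather than the thread so that all SMT
		 * siblings of a core pick the same shard, regardless of
		 * how the architecture enumerates its threads.
		 */
		return topology_core_id(cpu) % runq->num_shards;
	}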

--
Thanks and Regards
gautham.


