linux-kernel.vger.kernel.org archive mirror
From: David Vernet <void@manifault.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, joshdon@google.com,
	roman.gushchin@linux.dev, tj@kernel.org, kernel-team@meta.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS
Date: Tue, 20 Jun 2023 12:36:26 -0500
Message-ID: <20230620173626.GA3027191@maniforge>
In-Reply-To: <20230616005338.GA115001@ziqianlu-dell>

On Fri, Jun 16, 2023 at 08:53:38AM +0800, Aaron Lu wrote:
> On Thu, Jun 15, 2023 at 06:26:05PM -0500, David Vernet wrote:
>  
> > Ok, it seems that the issue is that I wasn't creating enough netperf
> > clients. I assumed that -n $(nproc) was sufficient. I was able to repro
> 
> Yes that switch is confusing.
> 
> > the contention on my 26 core / 52 thread skylake client as well:
> > 
> > 
>  
> > Thanks for the help in getting the repro on my end.
> 
> You are welcome.
> 
> > So yes, there is certainly a scalability concern to bear in mind for
> > swqueue for LLCs with a lot of cores. If you have a lot of tasks quickly
> > e.g. blocking and waking on futexes in a tight loop, I expect a similar
> > issue would be observed.
> > 
> > On the other hand, the issue did not occur on my 7950X. I also wasn't
> 
> Using netperf/UDP_RR?

Correct.

> > able to repro the contention on the Skylake if I ran with the default
> > netperf workload rather than UDP_RR (even with the additional clients).
> 
> I also tried that on the 18cores/36threads/LLC Skylake and the contention
> is indeed much smaller than UDP_RR:
> 
>      7.30%     7.29%  [kernel.vmlinux]      [k]      native_queued_spin_lock_slowpath
> 
> But I wouldn't say it's entirely gone. Also consider Skylake has a lot
> fewer cores per LLC than later Intel servers like Icelake and Sapphire
> Rapids and I expect things would be worse on those two machines.

I cannot reproduce this contention locally, even on a slightly larger
Skylake. I'm not really sure what to make of the difference here. Perhaps
it's because you're running with CONFIG_SCHED_CORE=y? What is the
change in throughput when you run the default workload on your SKL?

> > I didn't bother to take the mean of all of the throughput results
> > between NO_SWQUEUE and SWQUEUE, but they looked roughly equal.
> > 
> > So swqueue isn't ideal for every configuration, but I'll echo my
> > sentiment from [0] that this shouldn't on its own necessarily preclude
> > it from being merged given that it does help a large class of
> > configurations and workloads, and it's disabled by default.
> > 
> > [0]: https://lore.kernel.org/all/20230615000103.GC2883716@maniforge/
> 
> I was wondering: does it make sense to do some divide on machines with
> big LLCs? Like converting the per-LLC swqueue to per-group swqueue where
> the group can be made of ~8 cpus of the same LLC. This will have a
> similar effect of reducing the number of CPUs in a single LLC so the
> scalability issue can hopefully be fixed while at the same time, it
> might still help some workloads. I realized this isn't ideal in that
> wakeup happens at LLC scale so the group thing may not fit very well
> here.
> 
> Just a thought, feel free to ignore it if you don't think this is
> feasible :-)

That's certainly an idea we could explore, but my inclination would be
to keep everything at a per-LLC granularity. It makes it easier to
reason about performance, both in terms of work conservation per-LLC
(again, not every workload suffers from having large LLCs even if others
do, and halving the size of a swqueue in an LLC could harm other
workloads which benefit from the increased work conservation), and in
terms of contention. To the latter point, I think it would be difficult
to choose a group size that wasn't somewhat artificial and workload
specific. If someone has that requirement, I think sched_ext would be a
better alternative.
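
As a rough illustration of the per-group split being discussed, the
mapping from a CPU to its group could look something like the sketch
below. This is a hypothetical userspace model, not code from this patch
series: struct llc_model, nr_groups(), cpu_to_group() and
group_nr_cpus() are made-up names, and it assumes the CPUs of an LLC are
numbered contiguously, which real topologies do not guarantee.

```c
#include <assert.h>

/* Hypothetical model of one LLC; assumes contiguous CPU numbering. */
struct llc_model {
	int first_cpu;	/* lowest CPU id in the LLC */
	int nr_cpus;	/* number of CPUs sharing the LLC */
};

/* Number of groups needed to cover the LLC, rounding up. */
int nr_groups(const struct llc_model *llc, int group_size)
{
	assert(group_size > 0);
	return (llc->nr_cpus + group_size - 1) / group_size;
}

/* Index of the group (and thus the queue) a given CPU would use. */
int cpu_to_group(const struct llc_model *llc, int cpu, int group_size)
{
	assert(group_size > 0);
	assert(cpu >= llc->first_cpu && cpu < llc->first_cpu + llc->nr_cpus);
	return (cpu - llc->first_cpu) / group_size;
}

/* Number of CPUs in a given group; the last group may be smaller. */
int group_nr_cpus(const struct llc_model *llc, int group, int group_size)
{
	int left;

	assert(group >= 0 && group < nr_groups(llc, group_size));
	left = llc->nr_cpus - group * group_size;
	return left < group_size ? left : group_size;
}
```

With the 36-thread LLC mentioned above and a group size of 8, this gives
five groups, the last covering only four CPUs. The uneven tail is one
small example of why any fixed group size ends up somewhat arbitrary.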

Thanks,
David
