From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Minchan Kim <minchan@kernel.org>,
	 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	 Vitaly Wool <vitaly.wool@konsulko.com>,
	Nhat Pham <nphamcs@gmail.com>,
	 Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: zswap: multiple zpool support
Date: Wed, 14 Jun 2023 13:50:09 -0700	[thread overview]
Message-ID: <CAJD7tkZ0BS5+0P_QiLnYKjm2tV6GGwzdSSXEZX67RLM3Ft9QmA@mail.gmail.com> (raw)
In-Reply-To: <ZInV6Wju2NdLyzC6@cmpxchg.org>

On Wed, Jun 14, 2023 at 7:59 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Jun 13, 2023 at 01:13:59PM -0700, Yosry Ahmed wrote:
> > On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > >
> > > On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > Sorry, I should have been more precise.
> > > >
> > > > I'm saying that using NR_CPUS pools, and replacing the hash with
> > > > smp_processor_id(), would accomplish your goal of pool concurrency.
> > > > But it would do so with a broadly-used, well-understood scaling
> > > > factor. We might not need a config option at all.
> > > >
> > > > The lock would still be there, but contention would be reduced fairly
> > > > optimally (barring preemption) for store concurrency at least. Not
> > > > fully eliminated due to frees and compaction, though, yes.
> >
> > I thought about this again and had some internal discussions, and I am
> > more unsure about it. Beyond the memory overhead, having too many
> > zpools might result in higher fragmentation within the zpools. For
> > zsmalloc, we do not compact across multiple zpools for example.
> >
> > We have been using a specific number of zpools in production for
> > years now; we know it can be tuned to achieve performance gains. OTOH,
> > percpu zpools (or NR_CPUS pools) seem like too big a hammer:
> > probably too many zpools in a lot of cases, and we wouldn't know how
> > many zpools actually fit our workloads.
>
> Is it the same number across your entire fleet and all workloads?

Yes.

>
> How large *is* the number in relation to CPUs?

The ratio differs based on the number of CPUs on the machine: we use
32 zpools on all machines, regardless of CPU count.

>
> > I see value in allowing the number of zpools to be directly
> > configurable (it can always be left as 1), and am worried that with
> > percpu we would be throwing away years of production testing for an
> > unknown.
> >
> > I am obviously biased, but I don't think this adds significant
> > complexity to the zswap code as-is (or as v2 is to be precise).
>
> I had typed out this long list of reasons why I hate this change, and
> then deleted it to suggest the per-cpu scaling factor.
>
> But to summarize my POV, I think a user-facing config option for this
> is completely inappropriate. There are no limits, no guidance, no sane
> default. And it's very selective about micro-optimizing this one lock
> when there are several locks and datastructures of the same scope in
> the swap path. This isn't a reasonable question to ask people building
> kernels. It's writing code through the Kconfig file.

It's not just the swap path; it's any contention that happens within
the zpool between its different operations (map, alloc, compaction,
etc.). My thought was that if a user observes high contention in any
of the zpool operations, they can increase the number of zpools;
basically, this should be decided empirically. If unsure, the user can
just leave it as a single zpool.

>
> Data structure scalability should be solved in code, not with config
> options.

I agree, but until we have a more fundamental architectural solution,
having multiple zpools to address scalability is a win. We can remove
the config option later if needed.

>
> My vote on the patch as proposed is NAK.

I hear the argument about the config option not being ideal here, but
NR_CPUS is also not ideal.

How about if we introduce it as a constant in the kernel? We have a
lot of other constants around the kernel that do not scale with the
machine size (e.g. SWAP_CLUSTER_MAX). We can start with 32, a value
that we have tested in our data centers for many years now and know to
work well. We can revisit later if needed.

WDYT?

