linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Minchan Kim <minchan@kernel.org>,
	 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	 Vitaly Wool <vitaly.wool@konsulko.com>,
	Nhat Pham <nphamcs@gmail.com>,
	 Domenico Cerasuolo <cerasuolodomenico@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: zswap: multiple zpool support
Date: Tue, 20 Jun 2023 12:48:23 -0700	[thread overview]
Message-ID: <CAJD7tkaN7mQEMf2P6grROmB36UOPXpYodp-MUBm-pZzpkVzoCQ@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkZ0BS5+0P_QiLnYKjm2tV6GGwzdSSXEZX67RLM3Ft9QmA@mail.gmail.com>

On Wed, Jun 14, 2023 at 1:50 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Jun 14, 2023 at 7:59 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Tue, Jun 13, 2023 at 01:13:59PM -0700, Yosry Ahmed wrote:
> > > On Mon, Jun 5, 2023 at 6:56 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> > > >
> > > > On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > > Sorry, I should have been more precise.
> > > > >
> > > > > I'm saying that using NR_CPUS pools, and replacing the hash with
> > > > > smp_processor_id(), would accomplish your goal of pool concurrency.
> > > > > But it would do so with a broadly-used, well-understood scaling
> > > > > factor. We might not need a config option at all.
> > > > >
> > > > > The lock would still be there, but contention would be reduced fairly
> > > > > optimally (barring preemption) for store concurrency at least. Not
> > > > > fully eliminated due to frees and compaction, though, yes.
> > >
> > > I thought about this again and had some internal discussions, and I am
> > > more unsure about it. Beyond the memory overhead, having too many
> > > zpools might result in higher fragmentation within the zpools. For
> > > zsmalloc, we do not compact across multiple zpools for example.
> > >
> > > We have been using a specific number of zpools in our production for
> > > years now, we know it can be tuned to achieve performance gains. OTOH,
> > > percpu zpools (or NR_CPUS pools) seems like too big of a hammer,
> > > probably too many zpools in a lot of cases, and we wouldn't know how
> > > many zpools actually fits our workloads.
> >
> > Is it the same number across your entire fleet and all workloads?
>
> Yes.
>
> >
> > How large *is* the number in relation to CPUs?
>
> It differs based on the number of cpus on the machine. We use 32
> zpools on all machines.
>
> >
> > > I see value in allowing the number of zpools to be directly
> > > configurable (it can always be left as 1), and am worried that with
> > > percpu we would be throwing away years of production testing for an
> > > unknown.
> > >
> > > I am obviously biased, but I don't think this adds significant
> > > complexity to the zswap code as-is (or as v2 is to be precise).
> >
> > I had typed out this long list of reasons why I hate this change, and
> > then deleted it to suggest the per-cpu scaling factor.
> >
> > But to summarize my POV, I think a user-facing config option for this
> > is completely inappropriate. There are no limits, no guidance, no sane
> > default. And it's very selective about micro-optimizing this one lock
> > when there are several locks and datastructures of the same scope in
> > the swap path. This isn't a reasonable question to ask people building
> > kernels. It's writing code through the Kconfig file.
>
> It's not just swap path, it's any contention that happens within the
> zpool between its different operations (map, alloc, compaction, etc).
> My thought was that if a user observes high contention in any of the
> zpool operations, they can increase the number of zpools -- basically
> this should be empirically decided. If unsure, the user can just leave
> it as a single zpool.
>
> >
> > Data structure scalability should be solved in code, not with config
> > options.
>
> I agree, but until we have a more fundamental architectural solution,
> having multiple zpools to address scalability is a win. We can remove
> the config option later if needed.
>
> >
> > My vote on the patch as proposed is NAK.
>
> I hear the argument about the config option not being ideal here, but
> NR_CPUs is also not ideal.
>
> How about if we introduce it as a constant in the kernel? We have a
> lot of other constants around the kernel that do not scale with the
> machine size (e.g. SWAP_CLUSTER_MAX). We can start with 32, which is a
> value that we have tested in our data centers for many years now and
> know to work well. We can revisit later if needed.
>
> WDYT?

I sent v3 [1] with the proposed constant instead of a config option,
hopefully this is more acceptable.

[1]https://lore.kernel.org/lkml/20230620194644.3142384-1-yosryahmed@google.com/


      reply	other threads:[~2023-06-20 19:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-31  2:29 [PATCH] mm: zswap: multiple zpool support Yosry Ahmed
2023-05-31  2:32 ` Yosry Ahmed
     [not found] ` <20230601155825.GF102494@cmpxchg.org>
     [not found]   ` <CAJD7tkaFSfpTtB_ua_9QzR2voE1-hixv6RMJZd=WqpGmY93dSw@mail.gmail.com>
     [not found]     ` <20230602164942.GA215355@cmpxchg.org>
     [not found]       ` <CAJD7tkbp96S8MdrcH8y0V2G5Q-Zq6U4DAuweYhP-MjUWgcmjsQ@mail.gmail.com>
     [not found]         ` <20230602183410.GB215355@cmpxchg.org>
     [not found]           ` <CAJD7tka18Vw7HpA1gh6wWJcaCJ_U6jNfCC696pX=UkbiXKZMvQ@mail.gmail.com>
     [not found]             ` <20230602202453.GA218605@cmpxchg.org>
     [not found]               ` <CAJD7tkZBxddm1c4f99KmY+VwKU3jbYBMaNuTtfpis7a1E6242Q@mail.gmail.com>
2023-06-13 20:13                 ` Yosry Ahmed
2023-06-14 14:59                   ` Johannes Weiner
2023-06-14 20:50                     ` Yosry Ahmed
2023-06-20 19:48                       ` Yosry Ahmed [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkaN7mQEMf2P6grROmB36UOPXpYodp-MUBm-pZzpkVzoCQ@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=senozhatsky@chromium.org \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).