Linux-mm Archive on
From: Michal Hocko <>
To: Andrew Morton <>
Cc: Tim Chen <>,
	Matthew Wilcox <>,
	Vladimir Davydov <>,
	Johannes Weiner <>,
	Dave Hansen <>,
	Ying Huang <>,,
Subject: Re: [Patch] mm: Increase pagevec size on large system
Date: Wed, 1 Jul 2020 12:05:57 +0200
Message-ID: <>
In-Reply-To: <>

On Tue 30-06-20 17:27:13, Andrew Morton wrote:
> On Mon, 29 Jun 2020 09:57:42 -0700 Tim Chen <> wrote:
> > I am okay with Matthew's suggestion of keeping the stack pagevec size unchanged.
> > Andrew, do you have a preference?
> > 
> > I was assuming that for people who really care about saving the kernel memory
> > usage, they would make CONFIG_NR_CPUS small. I also have a hard time coming
> > up with a better scheme.
> > 
> > Otherwise, we will have to adjust the pagevec size when we actually 
> > found out how many CPUs we have brought online.  It seems like a lot
> > of added complexity for going that route.
> Even if we were to do this, the worst-case stack usage on the largest
> systems might be an issue.  If it isn't then we might as well hard-wire
> it to 31 elements anyway,

I am not sure this is really a matter of how large the machine is. For
example, in the writeout paths this depends much more on how complex the
IO stack is.

Direct memory reclaim is also a very stack-sensitive context. As we are
not doing any writeout anymore, I believe a large part of the on-stack fs
usage is not really relevant. There seem to be only a few on-stack users
inside mm and they shouldn't be part of the memory reclaim AFAICS.
I simply did
$ git grep "^[[:space:]]*struct pagevec[[:space:]][^*]"
and fortunately there weren't too many hits, so it was easy to get an idea
about the usage. There is some usage in the graphics stack that should be
double checked though.
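
For a rough sense of the stack cost under discussion, here is a minimal
user-space sketch. It assumes a simplified pagevec layout (a count byte, a
flag, and an array of page pointers) modelled on the kernel header of that
era; the struct names pagevec_15 and pagevec_31 and the helper functions
are hypothetical and exist only for this comparison.

```c
#include <stdbool.h>
#include <stddef.h>

struct page;	/* opaque here; only pointers to it are stored */

/* Assumed layout with the 15 slots in use at the time of this thread. */
struct pagevec_15 {
	unsigned char nr;
	bool percpu_pvec_drained;
	struct page *pages[15];
};

/* Same assumed layout with the 31 slots proposed in the thread. */
struct pagevec_31 {
	unsigned char nr;
	bool percpu_pvec_drained;
	struct page *pages[31];
};

/* Bytes one on-stack instance of each variant would occupy. */
static size_t pagevec_15_bytes(void)
{
	return sizeof(struct pagevec_15);
}

static size_t pagevec_31_bytes(void)
{
	return sizeof(struct pagevec_31);
}
```

Under this assumed layout each on-stack instance grows by 16 pointers, which
on a typical 64-bit build with 8-byte pointers means going from 128 to 256
bytes per instance.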

Btw. I think that pvec is likely a suboptimal data structure for many
on-stack users. It allows only very few slots to batch. Something like
mmu_gather, which can optimistically increase the batch, sounds like
something that would be worth exploring.
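
To illustrate what "optimistically increase the batch" could look like, here
is a user-space sketch loosely inspired by the mmu_gather idea: a small
inline array is filled first, extra chunks are allocated on demand, and if
growing is not possible the batch is flushed early instead of failing. This
is not the kernel's mmu_gather code. All names (struct gather, gather_add,
INLINE_SLOTS, CHUNK_SLOTS, max_chunks) are hypothetical, and release_items()
merely counts items where real code would release pages.

```c
#include <stddef.h>
#include <stdlib.h>

#define INLINE_SLOTS 8	/* hypothetical: small part that lives on the stack */
#define CHUNK_SLOTS 64	/* hypothetical: slots per dynamically added chunk */

struct chunk {
	struct chunk *next;
	size_t nr;
	void *items[CHUNK_SLOTS];
};

struct gather {
	void *inline_items[INLINE_SLOTS];
	size_t inline_nr;
	struct chunk *head, *tail;
	size_t nr_chunks, max_chunks;	/* cap on optimistic growth */
	size_t released;		/* total items processed so far */
	size_t flushes;			/* number of flushes performed */
};

static void gather_init(struct gather *g, size_t max_chunks)
{
	g->inline_nr = 0;
	g->head = g->tail = NULL;
	g->nr_chunks = 0;
	g->max_chunks = max_chunks;
	g->released = 0;
	g->flushes = 0;
}

/* Stand-in for the real batched work; here it only counts items. */
static void release_items(struct gather *g, void **items, size_t nr)
{
	(void)items;
	g->released += nr;
}

/* Process everything gathered so far and free the extra chunks. */
static void gather_flush(struct gather *g)
{
	struct chunk *c = g->head, *next;

	release_items(g, g->inline_items, g->inline_nr);
	g->inline_nr = 0;
	while (c) {
		next = c->next;
		release_items(g, c->items, c->nr);
		free(c);
		c = next;
	}
	g->head = g->tail = NULL;
	g->nr_chunks = 0;
	g->flushes++;
}

/* Try to add one more chunk; returns 0 if the cap or malloc says no. */
static int gather_grow(struct gather *g)
{
	struct chunk *c;

	if (g->nr_chunks >= g->max_chunks)
		return 0;
	c = malloc(sizeof(*c));
	if (!c)
		return 0;
	c->next = NULL;
	c->nr = 0;
	if (g->tail)
		g->tail->next = c;
	else
		g->head = c;
	g->tail = c;
	g->nr_chunks++;
	return 1;
}

/* Add an item; flush early only when the batch cannot grow. */
static void gather_add(struct gather *g, void *item)
{
	if (g->inline_nr < INLINE_SLOTS) {
		g->inline_items[g->inline_nr++] = item;
		return;
	}
	if ((!g->tail || g->tail->nr == CHUNK_SLOTS) && !gather_grow(g)) {
		gather_flush(g);
		g->inline_items[g->inline_nr++] = item;
		return;
	}
	g->tail->items[g->tail->nr++] = item;
}
```

With max_chunks set to 0 the sketch degrades to a fixed-size batch that
flushes every INLINE_SLOTS items, much like a small pvec. With a larger cap
the on-stack footprint stays the same while the batch can grow on the heap.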

The main question is whether the improvement is visible on any
non-artificial workloads. If yes, then the quick fix is likely the best
way forward. If this is mostly a microbench thingy, then I would be
happier to see a more long-term solution. E.g. scale pcp pagevec sizes
with the machine size or even use something better than pvec (e.g.
lru_deactivate_file could scale much more and I am not sure the pcp aspect
is really improving anything - why don't we simply invalidate all
gathered pages at once at the end of invalidate_mapping_pages?).
Michal Hocko


Thread overview: 6+ messages
2020-06-26 21:23 Tim Chen
2020-06-27  3:13 ` Matthew Wilcox
2020-06-27  3:47   ` Andrew Morton
2020-06-29 16:57     ` Tim Chen
2020-07-01  0:27       ` Andrew Morton
2020-07-01 10:05         ` Michal Hocko [this message]
