Linux-Fsdevel Archive on lore.kernel.org
 help / Atom feed
From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>, Chris Mason <clm@fb.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"vdavydov.dev@gmail.com" <vdavydov.dev@gmail.com>
Subject: Re: [PATCH 1/2] Revert "mm: don't reclaim inodes with many attached pages"
Date: Thu, 31 Jan 2019 18:57:10 +0000
Message-ID: <20190131185704.GA8755@castle.DHCP.thefacebook.com> (raw)
In-Reply-To: <20190131091011.GP18811@dhcp22.suse.cz>

On Thu, Jan 31, 2019 at 10:10:11AM +0100, Michal Hocko wrote:
> On Thu 31-01-19 12:34:03, Dave Chinner wrote:
> > On Wed, Jan 30, 2019 at 12:21:07PM +0000, Chris Mason wrote:
> > > 
> > > 
> > > On 29 Jan 2019, at 23:17, Dave Chinner wrote:
> > > 
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > This reverts commit a76cf1a474d7dbcd9336b5f5afb0162baa142cf0.
> > > >
> > > > This change causes serious changes to page cache and inode cache
> > > > behaviour and balance, resulting in major performance regressions
> > > > when combining worklaods such as large file copies and kernel
> > > > compiles.
> > > >
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=202441
> > > 
> > > I'm a little confused by the latest comment in the bz:
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=202441#c24
> > 
> > Which says the first patch that changed the shrinker behaviour is
> > the underlying cause of the regression.
> > 
> > > Are these reverts sufficient?
> > 
> > I think so.
> > 
> > > Roman beat me to suggesting Rik's followup.  We hit a different problem 
> > > in prod with small slabs, and have a lot of instrumentation on Rik's 
> > > code helping.
> > 
> > I think that's just another nasty, expedient hack that doesn't solve
> > the underlying problem. Solving the underlying problem does not
> > require changing core reclaim algorithms and upsetting a page
> > reclaim/shrinker balance that has been stable and worked well for
> > just about everyone for years.
> 
> I tend to agree with Dave here. Slab pressure balancing is quite subtle
> and easy to get wrong. If we want to plug the problem with offline
> memcgs then the fix should be targeted at that problem. So maybe we want
> to emulate high pressure on offline memcgs only. There might be other
> issues to resolve for small caches but let's start with something more
> targeted first please.

First, the path proposed by Dave is not regression-safe too. A slab object
can be used by other cgroups as well, so creating an artificial pressure on
the dying cgroup might perfectly affect the rest of the system. We do reparent
slab lists on offlining, so there is even no easy way to iterate over them.
Also, creating an artifical pressure will create unnecessary CPU load.

So I'd really prefer to make the "natural" memory pressure to be applied
in a way, that doesn't leave any stalled objects behind.

Second, the code around slab pressure is not "worked well for years": as I can
see the latest major change was made about a year ago by Josef Bacik
(9092c71bb724 "mm: use sc->priority for slab shrink targets").

The existing balance, even if it works perfectly for some cases, isn't something
set in stone. We're really under-scanning small cgroups, and I strongly believe
that what Rik is proposing is a right thing to do. If we don't scan objects
in small cgroups unless we have really strong memory pressure, we're basically
wasting memory.

And it really makes no sense to reclaim inodes with tons of attached pagecache
as easy as "empty" inodes. At the end, all we need is to free some memory, and
treating a sub-page object equal to many thousands page object is just strange.
If it's simple "wrong" and I do miss something, please, explain. Maybe we need
something more complicated than in my patch, but saying that existing code is
just perfect and can't be touched at all makes no sense to me.

So, assuming all this, can we, please, first check if Rik's patch is addressing
the regression?

Thanks!

  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-30  4:17 [PATCH 0/2] [REGRESSION v4.19-20] mm: shrinkers are now way too aggressive Dave Chinner
2019-01-30  4:17 ` [PATCH 1/2] Revert "mm: don't reclaim inodes with many attached pages" Dave Chinner
2019-01-30 12:21   ` Chris Mason
2019-01-31  1:34     ` Dave Chinner
2019-01-31  9:10       ` Michal Hocko
2019-01-31 18:57         ` Roman Gushchin [this message]
2019-01-31 22:19           ` Dave Chinner
2019-02-04 21:47             ` Dave Chinner
2019-02-07 10:27             ` Jan Kara
2019-02-08  5:37               ` Andrew Morton
2019-02-08  9:55                 ` Jan Kara
2019-02-08 12:50                   ` Jan Kara
2019-02-08 22:49                     ` Andrew Morton
2019-02-09  3:42                       ` Roman Gushchin
2019-02-08 21:25                 ` Dave Chinner
2019-02-11 15:34               ` Wolfgang Walter
2019-01-31 15:48       ` Chris Mason
2019-02-01 23:39         ` Dave Chinner
2019-01-30  4:17 ` [PATCH 2/2] Revert "mm: slowly shrink slabs with a relatively small number of objects" Dave Chinner
2019-01-30  5:48 ` [PATCH 0/2] [REGRESSION v4.19-20] mm: shrinkers are now way too aggressive Roman Gushchin

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190131185704.GA8755@castle.DHCP.thefacebook.com \
    --to=guro@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox