Re: [PATCH] mm: vmscan: fix IO/refault regression in cache workingset transition

From: Johannes Weiner <hannes@cmpxchg.org>
To: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@suse.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: vmscan: fix IO/refault regression in cache workingset transition
Date: Thu, 6 Apr 2017 10:49:22 -0400
Message-ID: <20170406144922.GA32364@cmpxchg.org>
In-Reply-To: <1491430264.16856.43.camel@redhat.com>

On Wed, Apr 05, 2017 at 06:11:04PM -0400, Rik van Riel wrote:
> On Tue, 2017-04-04 at 18:00 -0400, Johannes Weiner wrote:
> 
> > +
> > +	/*
> > +	 * When refaults are being observed, it means a new workingset
> > +	 * is being established. Disable active list protection to get
> > +	 * rid of the stale workingset quickly.
> > +	 */
> 
> This looks a little aggressive. What is this
> expected to do when you have multiple workloads
> sharing the same LRU, and one of the workloads
> is doing refaults, while the other workload is
> continuing to use the same working set as before?

It is aggressive, but it seems to be a trade-off between three things:
maximizing workingset protection during stable periods; minimizing
repeat refaults during workingset transitions; and achieving both of
those when the LRU is shared.

The data point we would need to balance optimally between these cases
is whether the active list is hot or stale, but we only have that once
we disable active list protection and challenge those pages.
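The quoted patch comment keys the decision off observed refaults
rather than off any direct measure of how hot the active list is. A
minimal user-space sketch of that kind of transition detector follows.
It assumes a hypothetical per-LRU pair of counters: a running refault
count and a snapshot taken when the previous reclaim cycle finished.
The struct and function names are illustrative only, not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one LRU's bookkeeping; not struct lruvec. */
struct lru_model {
	unsigned long refault_count;    /* cumulative refaults observed */
	unsigned long refault_snapshot; /* count at end of last reclaim cycle */
};

/*
 * If the counter moved since the last cycle, evicted pages are being
 * faulted back in: treat that as a new workingset being established.
 */
static bool workingset_in_transition(const struct lru_model *lru)
{
	return lru->refault_count != lru->refault_snapshot;
}

/* Called when a reclaim cycle finishes: remember where the counter was. */
static void end_reclaim_cycle(struct lru_model *lru)
{
	lru->refault_snapshot = lru->refault_count;
}
```

Note that the detector only says "refaults happened"; it cannot tell
whether the pages still sitting on the active list are hot or stale,
which is exactly the missing data point described above.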

The more conservative we are about this, the more IO it costs to
establish the incoming workingset pages.

I actually did experiment with this. Instead of disabling active list
protection entirely, I reverted to the more conservative 50/50 ratio
during refaults. The 50/50 split addressed the regression, but the
aggressive behavior fared measurably better across three different
services I tested this on (one of them *is* multi-workingset, but the
jobs are cgrouped so they don't *really* share LRUs).
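To make the three behaviors concrete, here is a rough user-space model
of an "is the inactive list too low?" check. It assumes the stable-period
target is an inactive:active ratio of 1:sqrt(10 * total_gb), as in the
ratio heuristic in mm/vmscan.c, and it assumes 4 KiB pages. The enum,
isqrt() and inactive_is_low() are hypothetical names for this sketch,
not kernel interfaces. A ratio of 1 gives the 50/50 split, and a ratio
of 0 makes the check true whenever the active list is non-empty, i.e.
active list protection is disabled.

```c
#include <stdbool.h>

enum active_protection {
	PROTECT_SQRT, /* stable period: tolerate active up to sqrt(10*gb) x inactive */
	PROTECT_HALF, /* conservative transition: aim for a 50/50 split */
	PROTECT_NONE, /* aggressive transition: always deactivate active pages */
};

/* Small integer square root (stand-in for the kernel's int_sqrt()). */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Returns true when active pages should be moved to the inactive list.
 * Sizes are in pages; 4 KiB pages are assumed for the GB conversion.
 */
static bool inactive_is_low(unsigned long inactive, unsigned long active,
			    enum active_protection mode)
{
	unsigned long ratio;

	switch (mode) {
	case PROTECT_NONE:
		ratio = 0;
		break;
	case PROTECT_HALF:
		ratio = 1;
		break;
	default: {
		unsigned long gb = (inactive + active) >> (30 - 12);

		ratio = gb ? isqrt(10 * gb) : 1;
		break;
	}
	}
	return inactive * ratio < active;
}
```

With 2 GB inactive and 14 GB active (16 GB total, ratio 12), the
stable-period check leaves the active list alone, while both transition
modes start deactivating pages. They differ in when they stop: the
50/50 mode stops once the lists are balanced, the aggressive mode keeps
going as long as anything is left on the active list.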

That win was intriguing, but it would be bad if it came at the expense
of truly shared LRUs (which I haven't been able to quantify).

Since this is a regression fix, it would be fair to be conservative
and use the 50/50 split for transitions here, and leave the more
adaptive behavior for a future optimization.

What do you think?