linux-mm.kvack.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
	hannes@cmpxchg.org, riel@redhat.com, akpm@linux-foundation.org,
	linux-mm@kvack.org, kernel-team@fb.com,
	Josef Bacik <jbacik@fb.com>,
	mhocko@kernel.org, cl@linux.com
Subject: Re: [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation
Date: Thu, 6 Jul 2017 12:56:56 +0900	[thread overview]
Message-ID: <20170706035656.GA24077@bbox> (raw)
In-Reply-To: <20170705235823.GV17542@dastard>

On Thu, Jul 06, 2017 at 09:58:23AM +1000, Dave Chinner wrote:
> On Wed, Jul 05, 2017 at 01:59:12PM +0900, Minchan Kim wrote:
> > Hi Dave,
> > 
> > On Wed, Jul 05, 2017 at 08:57:58AM +1000, Dave Chinner wrote:
> > > On Tue, Jul 04, 2017 at 09:21:37AM -0400, Josef Bacik wrote:
> > > > On Tue, Jul 04, 2017 at 12:01:00PM +0900, Minchan Kim wrote:
> > > > > 1. slab *page* reclaim
> > > > > 
> > > > > Your claim is that slab fragmentation makes it hard to reclaim a page, so we
> > > > > need to reclaim objects more aggressively.
> > > > > 
> > > > > Basically, aggressive scanning doesn't guarantee reclaiming a page; it just
> > > > > increases the probability. Even considering the slab merging feature (i.e.,
> > > > > several object types of the same size mixed in one slab), the probability drops
> > > > > hugely if you try to bail out on a certain shrinker. So to work well, we would
> > > > > have to raise aggressiveness enough to sweep every object from every shrinker.
> > > > > I guess that's why your patch makes the logic very aggressive.
> > > > > My concern with that aggressiveness is that it reclaims all objects too early
> > > > > and ends up voiding the caching scheme. I'm not sure it's a gain in the end.
> > > > >
> > > > 
> > > > Well the fact is what we have doesn't work, and I've been staring at this
> > > > problem for a few months and I don't have a better solution.
> > > > 
> > > > And keep in mind we're talking about a purely slab workload, something that
> > > > isn't likely to be a common case.  And even if our scan target is 2x, we aren't
> > > > going to reclaim the entire cache before we bail out.  We only scan in
> > > > 'batch_size' chunks, which generally is 1024.  In the worst case that we have
> > > > one in use object on every slab page we own then yes we're fucked, but we're
> > > > still fucked with the current code, only with the current code it'll take us 20
> > > > minutes of looping in the vm vs. seconds scanning the whole list twice.
> > > 
> > > Right - this is where growth/allocation rate based aging scans
> > > come into play, rather than waiting for the VM to hit some unknown
> > > ceiling and do an unpredictable amount of scanning.
> > 
> > http://www.spinics.net/lists/linux-mm/msg129470.html
> > 
> > I suggested a static scanning increase (1/12 + 2/12 + 3/12 ...), which is
> > more aggressive than what we have now. With this, in a reclaim cycle (priority
> > 12..0), we guarantee that the entire object list is scanned four times
> > while the LRU is scanned two times. Although I believe we don't need four
> > times (i.e., two times is enough), it's just a compromise with Josef's much
> > too aggressive slab reclaim.
> > It would be more predictable and aggressive from the VM's point of view.
> 
> Yes, I read and understood that post, but you are talking about
> changing reclaim behaviour when there is low memory, not dealing
> with the aging problem. It's a brute force big hammer approach, yet

The reason I mentioned reclaim behaviour is that his patch makes it really
aggressive, and one of the reasons he did that is that slab fragmentation can
prevent us from reclaiming a full slab page. That is what I really dislike.
If slab fragmentation is really the issue, it should be solved by
anti-fragmentation or defragmentation, whatever form that takes, not by
increasing reclaim aggressiveness, which is just a band-aid and could even
hurt system performance via heavy reclaiming.
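
To make that point concrete, here is a toy model (all numbers are made up,
this is not real kernel code): if every slab page keeps even one pinned
in-use object, reclaiming objects never returns a single page, no matter how
aggressively we scan.

/* toy_slab_frag.c - illustrative only; numbers are invented */
#include <stdio.h>

int main(void)
{
	unsigned int objs_per_slab = 32;	/* e.g. 128-byte objects in a 4K slab page */
	unsigned int nr_slabs = 1000;
	unsigned int pinned_per_slab = 1;	/* one in-use object pins the whole page */

	unsigned int freeable = nr_slabs * (objs_per_slab - pinned_per_slab);
	unsigned int pages_freed = 0;		/* no slab page ever becomes empty */

	printf("objects reclaimed: %u, pages freed: %u\n", freeable, pages_freed);
	printf("%u%% of objects dropped, 0%% of memory returned\n",
	       100 * freeable / (nr_slabs * objs_per_slab));
	return 0;
}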

> we already know that increasingly aggressive reclaim of caches is a
> problem for filesystems in that it drives us into OOM conditions
> faster. i.e. aggressive shrinker reclaim trashes the working set of
> cached filesystem metadata and so increases the GFP_NOFS memory

True. That is my point, so please do not make reclaim behaviour much too
aggressive.

> allocation demand required by the filesystem at times of critically
> low memory....

Indeed.

> 
> > If some shrinker cannot be happy with this policy, it could accelerate the
> > scanning for only that shrinker under the shrink_slab call, although I don't
> > like that because it's out of the VM's control; so I'm okay with your
> > per-shrinker aging callback independent of shrink_slab. My point is that if
> > some shrinker is painful to reclaim, it should have its own model to solve
> > that rather than making the general slab reclaim strategy very aggressive.
> 
> We already do this in various filesystems. The issue is that we
> don't know that we should reclaim caches until shrinker callbacks
> start happening. i.e. there's *no feedback mechanism* that allows us
> to age shrinker controlled caches over time. memory reclaim is a
> feedback loop but if we never hit low memory, then it's never
> invoked until we actually run out of memory and so it drowns in
> aging rather than reclaim work when it does get run.

Yes, it's a problem for the page cache as well as for the slab cache.
In the case of the page cache, as you know well, the VM makes a best effort
with two-level LRU promotion/demotion. Although it's not perfect, it has worked.

Can't the FS slab cache handle aging more smartly? The LFU you suggested
would be better than what we have now, but it would be hard to avoid latency.

Anyway, I'm okay with each shrinker having its own aging method, as long as
it doesn't make slab reclaim behaviour much too aggressive without a
convincing reason.
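
To show what I mean, here is a rough userspace sketch. The names and numbers
are invented and it is not a proposal for a real kernel interface: the point
is only that the aging pass is driven by allocation activity, decays an
LFU-style reference counter, and drops single-use objects without touching
the low-memory shrink_slab path at all.

/* aging_sketch.c - hypothetical userspace model, not a kernel patch */
#include <stdio.h>

#define NR_OBJS 8

struct obj {
	int in_cache;
	unsigned int refs;	/* decayed reference counter (LFU-ish) */
};

/* Run every N allocations, not from low-memory reclaim. */
static void age_cache(struct obj *objs, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (!objs[i].in_cache)
			continue;
		if (objs[i].refs == 0) {
			objs[i].in_cache = 0;	/* drop single-use object */
			continue;
		}
		objs[i].refs /= 2;		/* decay, so reuse must be recent */
	}
}

int main(void)
{
	struct obj objs[NR_OBJS] = {
		/* objects 0-1 are hot, the rest were touched only once */
		{1, 8}, {1, 4}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
	};

	/* two aging passes triggered by sustained allocation, not by OOM */
	age_cache(objs, NR_OBJS);
	age_cache(objs, NR_OBJS);

	for (int i = 0; i < NR_OBJS; i++)
		printf("obj %d: %s (refs %u)\n", i,
		       objs[i].in_cache ? "cached" : "reclaimed", objs[i].refs);
	return 0;
}

Of course the real thing would have to live per-shrinker and be rate-limited,
but the aging work would be bounded by the allocation rate, not by how
desperate reclaim is.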

> 
> Stop looking at the code and start thinking about the architecture -
> how the subsystems connect and what control/feedback mechanisms are
> required to allow them to work correctly together. We solve balance
> and breakdown problems by identifying the missing/sub-optimal
> feedback loops and fixing them. In this case, what we are missing is
> the mechanism to detect and control "cache growth in single use
> workloads when there is no memory pressure". Sustained cache
> allocation should trigger some amount of aging regardless of how
> much free memory we have, otherwise we simply fill up memory with
> objects we're never going to use again.....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 23+ messages
2017-06-08 19:19 [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation josef
2017-06-08 19:19 ` [PATCH 2/2] mm: make kswapd try harder to keep active pages in cache josef
2017-06-13  5:28 ` [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation Minchan Kim
2017-06-13 12:01   ` Josef Bacik
2017-06-14  6:40     ` Minchan Kim
2017-06-19 15:11       ` Josef Bacik
2017-06-20  2:46         ` Minchan Kim
2017-06-27 13:59           ` Josef Bacik
2017-06-30  2:17             ` Minchan Kim
2017-06-30 15:03               ` Josef Bacik
2017-07-02  1:58                 ` Dave Chinner
2017-07-03 13:52                   ` Josef Bacik
2017-07-03  1:33                 ` Minchan Kim
2017-07-03 13:50                   ` Josef Bacik
2017-07-04  3:01                     ` Minchan Kim
2017-07-04 13:21                       ` Josef Bacik
2017-07-04 22:57                         ` Dave Chinner
2017-07-05  4:59                           ` Minchan Kim
2017-07-05 23:58                             ` Dave Chinner
2017-07-06  3:56                               ` Minchan Kim [this message]
2017-07-05 13:33                           ` Josef Bacik
2017-07-05 23:30                             ` Dave Chinner
2017-07-05  4:43                         ` Minchan Kim
