From: Josef Bacik <josef@toxicpanda.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
Minchan Kim <minchan@kernel.org>,
hannes@cmpxchg.org, riel@redhat.com, akpm@linux-foundation.org,
linux-mm@kvack.org, kernel-team@fb.com,
Josef Bacik <jbacik@fb.com>,
mhocko@kernel.org, cl@linux.com
Subject: Re: [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation
Date: Mon, 3 Jul 2017 09:52:37 -0400 [thread overview]
Message-ID: <20170703135236.GD27097@destiny> (raw)
In-Reply-To: <20170702015843.GA17762@dastard>
On Sun, Jul 02, 2017 at 11:58:43AM +1000, Dave Chinner wrote:
> [I don't have the full context, haven't seen the proposed patches,
> etc so I'm only commenting on small bits of what I was cc'd on]
>
> On Fri, Jun 30, 2017 at 11:03:24AM -0400, Josef Bacik wrote:
> > On Fri, Jun 30, 2017 at 11:17:13AM +0900, Minchan Kim wrote:
> >
> > <snip>
> >
> > > >
> > > > Because this static step down wastes cycles. Why loop 10 times when you could
> > > > set the target at actual usage and try to get everything in one go? Most
> > > > shrinkable slabs adhere to this default of in use first model, which means that
> > > > we have to hit an object in the lru twice before it is freed. So in order to
> > >
> > > I didn't know that.
> > >
> > > > reclaim anything we have to scan a slab cache's entire lru at least once before
> > > > any reclaim starts happening. If we're doing this static step down thing we
> > >
> > > If it's really true, I think that shrinker should be fixed first.
> > >
> >
> > Easier said than done. I've fixed this for the super shrinkers, but like I said
> > below, all it takes is some asshole doing find / -exec stat {} \; twice to put
> > us back in the same situation again. There's no aging mechanism other than
> > memory reclaim, so we get into this shitty situation of aging+reclaiming at the
> > same time.
>
> To me, this sounds like the problem that needs fixing. i.e.
> threshold detection to trigger demand-based aging of shrinkable
> caches at allocation time rather than waiting for memory pressure to
> trigger aging.
>
> > > I guess at the first time, all of objects in shrinker could be INUSE state
> > > as you said, however, on steady state, they would work like real LRU
> > > to reflect recency, otherwise, I want to call it broken and we shouldn't
> > > design general slab aging model for those specific one.
> >
> > Yeah that's totally a valid argument to make, but the idea of coming up with
> > something completely different hurts my head, and I'm trying to fix this problem
> > right now, not in 6 cycles when we all finally agree on the new mechanism.
>
> Add a shrinker callback to the slab allocation code that checks the
> rate of growth of the cache. If the cache is growing fast and
> getting large, run the shrinker every so often to slow down the rate
> of growth. The larger the cache, the slower the allowed rate of
> growth. It's the feedback loop that we are missing between slab
> allocation and slab reclaim that prevents use from controlling slab
> cache growthi and aging cleanly.
>
> This would also allow us to cap the size of caching slabs easily
> - something people have been asking us to do for years....
>
> Note: we need to make a clear distinction between slabs used as
> "heap" memory and slabs used for allocating large numbers of objects
> that are cached for performance reasons. The problem here is cache
> management - the fact we are using slabs to allocate the memory for
> the objects is largely irrelevant...
>
> > > > I expanded on this above, but I'll give a more concrete example. Consider xfs
> > > > metadata, we allocate a slab object and read in our page, use it, and then free
> > > > the buffer which put's it on the lru list. XFS marks this with a INUSE flag,
> > > > which must be cleared before it is free'd from the LRU. We scan through the
> > > > LRU, clearing the INUSE flag and moving the object to the back of the LRU, but
> > > > not actually free'ing it. This happens for all (well most) objects that end up
> > > > on the LRU, and this design pattern is used _everywhere_. Until recently it was
> > > > used for the super shrinkers, but I changed that so thankfully the bulk of the
> > > > problem is gone. However if you do a find / -exec stat {} \;, and then do it
> > > > again, you'll end up with the same scenario for the super shrinker. There's no
> > > > aging except via pressure on the slabs, so worst case we always have to scan the
> > > > entire slab object lru once before we start reclaiming anything. Being
> > > > agressive here I think is ok, we have things in place to make sure we don't over
> > > > reclaim.
> > >
> > > Thanks for the detail example. Now I understood but the question is it is
> > > always true? I mean at the first stage(ie, first population of objects), it
> > > seems to be but at the steady stage, I guess some of objects have INUSE,
> > > others not by access pattern so it emulates LRU model. No?
> > >
> >
> > Sort of. Lets take icache/dcache. We open a file and close a file, this get's
> > added to the LRU. We open the file and close the file again, it's already on
> > the LRU so it stays where it was and gets the INUSE flag set. Once an object is
> > on the LRU it doesn't move unless we hit it via the shrinker. Even if we open
> > and close the file, and then open and keep it open, the file stays on the LRU,
> > and is only removed once the shrinker hits it, sees it's refcount is > 1, and
> > removes it from the list.
>
> Yes, but the lazy LRU removal is a performance feature - the cost
> in terms of lock contention of removing dentries inodes from the LRU
> on first reference is prohibitive, especially for short term usage
> like 'find / -exec stat {} \;' workloads. Git does this sort of
> traverse/stat a lot, so making this path any slower would make lots
> of people unhappy...
>
Sure we want it when it's useful, but often times it ends up in latencies due to
reclaim. I like your aging thing, it would definitely help in our worst case,
I'll give that a whirl after my vacation. Thanks,
Josef
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-07-03 13:52 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-08 19:19 [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation josef
2017-06-08 19:19 ` [PATCH 2/2] mm: make kswapd try harder to keep active pages in cache josef
2017-06-13 5:28 ` [PATCH 1/2] mm: use slab size in the slab shrinking ratio calculation Minchan Kim
2017-06-13 12:01 ` Josef Bacik
2017-06-14 6:40 ` Minchan Kim
2017-06-19 15:11 ` Josef Bacik
2017-06-20 2:46 ` Minchan Kim
2017-06-27 13:59 ` Josef Bacik
2017-06-30 2:17 ` Minchan Kim
2017-06-30 15:03 ` Josef Bacik
2017-07-02 1:58 ` Dave Chinner
2017-07-03 13:52 ` Josef Bacik [this message]
2017-07-03 1:33 ` Minchan Kim
2017-07-03 13:50 ` Josef Bacik
2017-07-04 3:01 ` Minchan Kim
2017-07-04 13:21 ` Josef Bacik
2017-07-04 22:57 ` Dave Chinner
2017-07-05 4:59 ` Minchan Kim
2017-07-05 23:58 ` Dave Chinner
2017-07-06 3:56 ` Minchan Kim
2017-07-05 13:33 ` Josef Bacik
2017-07-05 23:30 ` Dave Chinner
2017-07-05 4:43 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170703135236.GD27097@destiny \
--to=josef@toxicpanda.com \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=jbacik@fb.com \
--cc=kernel-team@fb.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=minchan@kernel.org \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).