From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@surriel.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <kernel-team@fb.com>,
	Josef Bacik <jbacik@fb.com>, Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] mm: slowly shrink slabs with a relatively small number of objects
Date: Tue, 4 Sep 2018 10:52:46 -0700
Message-ID: <20180904175243.GA4889@tower.DHCP.thefacebook.com>
In-Reply-To: <20180904161431.GP14951@dhcp22.suse.cz>

On Tue, Sep 04, 2018 at 06:14:31PM +0200, Michal Hocko wrote:
> On Tue 04-09-18 08:34:49, Roman Gushchin wrote:
> > On Tue, Sep 04, 2018 at 09:00:05AM +0200, Michal Hocko wrote:
> > > On Mon 03-09-18 13:28:06, Roman Gushchin wrote:
> > > > On Mon, Sep 03, 2018 at 08:29:56PM +0200, Michal Hocko wrote:
> > > > > On Fri 31-08-18 14:31:41, Roman Gushchin wrote:
> > > > > > On Fri, Aug 31, 2018 at 05:15:39PM -0400, Rik van Riel wrote:
> > > > > > > On Fri, 2018-08-31 at 13:34 -0700, Roman Gushchin wrote:
> > > > > > > 
> > > > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > > > index fa2c150ab7b9..c910cf6bf606 100644
> > > > > > > > --- a/mm/vmscan.c
> > > > > > > > +++ b/mm/vmscan.c
> > > > > > > > @@ -476,6 +476,10 @@ static unsigned long do_shrink_slab(struct
> > > > > > > > shrink_control *shrinkctl,
> > > > > > > >  	delta = freeable >> priority;
> > > > > > > >  	delta *= 4;
> > > > > > > >  	do_div(delta, shrinker->seeks);
> > > > > > > > +
> > > > > > > > +	if (delta == 0 && freeable > 0)
> > > > > > > > +		delta = min(freeable, batch_size);
> > > > > > > > +
> > > > > > > >  	total_scan += delta;
> > > > > > > >  	if (total_scan < 0) {
> > > > > > > >  		pr_err("shrink_slab: %pF negative objects to delete
> > > > > > > > nr=%ld\n",
> > > > > > > 
> > > > > > > I agree that we need to shrink slabs with fewer than
> > > > > > > 4096 objects, but do we want to put more pressure on
> > > > > > > a slab the moment it drops below 4096 than we applied
> > > > > > > when it had just over 4096 objects on it?
> > > > > > > 
> > > > > > > With this patch, a slab with 5000 objects on it will
> > > > > > > get 1 item scanned, while a slab with 4000 objects on
> > > > > > > it will see shrinker->batch or SHRINK_BATCH objects
> > > > > > > scanned every time.
> > > > > > > 
> > > > > > > I don't know if this would cause any issues, just
> > > > > > > something to ponder.
> > > > > > 
> > > > > > Hm, fair enough. So, basically we can always do
> > > > > > 
> > > > > >     delta = max(delta, min(freeable, batch_size));
> > > > > > 
> > > > > > Does it look better?
> > > > > 
> > > > > Why don't you use the same heuristic we use for the normal LRU reclaim?
> > > > 
> > > > Because we do reparent kmem lru lists on offlining.
> > > > Take a look at memcg_offline_kmem().
> > > 
> > > Then I must be missing something. Why are we growing the number of dead
> > > cgroups then?
> > 
> > We do reparent LRU lists, but not objects. Objects (or, more precisely, pages)
> > are still holding a reference to the memcg.
> 
> OK, this is what I missed. I thought that the reparenting includes all
> the pages as well. Is there any strong reason that we cannot do that?
> Performance/Locking/etc.?
> 
> Or maybe do not reparent at all and rely on the same reclaim heuristic
> we do for normal pages?
> 
> I am not opposing your patch but I am trying to figure out whether that
> is the best approach.

I don't think the current logic makes sense. Why should cgroups
with fewer than 4k kernel objects be excluded from being scanned?

Reparenting all pages is definitely an option to consider,
but it's not free in any case, so if there is no problem,
why should we? Let's keep it as a last resort. In my case,
the proposed patch works perfectly: the number of dying cgroups
hovers around 100, where it grew steadily to 2k and more before.

I believe that reparenting of LRU lists is required to minimize
the number of LRU lists to scan, but I'm not sure.
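
To make the arithmetic discussed above concrete, here is a minimal
userspace sketch (not kernel code) of the three delta calculations:
the existing one, the "delta == 0" fallback from the patch, and the
max()-based variant suggested in the follow-up. The function names are
made up for illustration, and the constants DEF_PRIORITY = 12,
DEFAULT_SEEKS = 2 and SHRINK_BATCH = 128 are assumptions about kernels
of that era; none of them are stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; not taken from this thread. */
#define DEF_PRIORITY  12   /* starting reclaim priority */
#define DEFAULT_SEEKS 2    /* typical shrinker->seeks */
#define SHRINK_BATCH  128  /* default batch size */

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Existing logic: (freeable >> priority) * 4 / seeks. */
uint64_t delta_base(uint64_t freeable, int priority, unsigned int seeks)
{
	uint64_t delta;

	assert(seeks != 0);  /* this sketch does not handle seeks == 0 */
	delta = freeable >> priority;
	delta *= 4;
	return delta / seeks;
}

/* Patch as posted: fall back to a batch only when delta rounds to 0. */
uint64_t delta_patched(uint64_t freeable, int priority,
		       unsigned int seeks, uint64_t batch_size)
{
	uint64_t delta = delta_base(freeable, priority, seeks);

	if (delta == 0 && freeable > 0)
		delta = min_u64(freeable, batch_size);
	return delta;
}

/* Follow-up variant: always scan at least min(freeable, batch_size). */
uint64_t delta_floored(uint64_t freeable, int priority,
		       unsigned int seeks, uint64_t batch_size)
{
	return max_u64(delta_base(freeable, priority, seeks),
		       min_u64(freeable, batch_size));
}
```

Under those assumed constants, a shift by 12 rounds anything below 4096
down to zero, which is where the 4k threshold comes from. With the
patch as posted, 4000 objects yield a delta of 128 while 5000 objects
yield only 2, which is the discontinuity Rik pointed out; the max()
variant yields 128 in both cases.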
