From: Vlastimil Babka <vbabka@suse.cz>
To: Yang Shi <shy828301@gmail.com>,
	guro@fb.com, ktkhai@virtuozzo.com, shakeelb@google.com,
	david@fromorbit.com, hannes@cmpxchg.org, mhocko@suse.com,
	akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v7 PATCH 12/12] mm: vmscan: shrink deferred objects proportional to priority
Date: Thu, 11 Feb 2021 14:10:49 +0100
Message-ID: <acd1915c-306b-08a8-9e0f-b06c1e09fb4c@suse.cz>
In-Reply-To: <20210209174646.1310591-13-shy828301@gmail.com>

On 2/9/21 6:46 PM, Yang Shi wrote:
> The number of deferred objects might get windup to an absurd number, and it
> results in clamp of slab objects.  It is undesirable for sustaining workingset.
> 
> So shrink deferred objects proportional to priority and cap nr_deferred to twice
> of cache items.

Makes sense to me. At the very least it's simpler than the old code, and
avoiding absurd growth of nr_deferred should be a good thing, as is the
"proportional to priority" part.

I just suspect there's a bit of unnecessary bias in the implementation, as
explained below:

> The idea is borrowed from Dave Chinner's patch:
> https://lore.kernel.org/linux-xfs/20191031234618.15403-13-david@fromorbit.com/
> 
> Tested with kernel build and vfs metadata heavy workload in our production
> environment, no regression is spotted so far.
> 
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/vmscan.c | 40 +++++-----------------------------------
>  1 file changed, 5 insertions(+), 35 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 66163082cc6f..d670b119d6bd 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -654,7 +654,6 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  	 */
>  	nr = count_nr_deferred(shrinker, shrinkctl);
>  
> -	total_scan = nr;
>  	if (shrinker->seeks) {
>  		delta = freeable >> priority;
>  		delta *= 4;
> @@ -668,37 +667,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		delta = freeable / 2;
>  	}
>  
> +	total_scan = nr >> priority;
>  	total_scan += delta;

So, our scan goal consists of the part based on freeable objects (delta), plus a
part of the deferred objects (nr >> priority). Fine.
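
For illustration only, the goal computation can be written as a small
standalone function. This is a sketch, not the kernel code: scan_goal() is a
hypothetical name, plain long arithmetic stands in for the kernel types, and
the division by seeks is assumed to sit in the lines elided between the two
hunks above. The 2 * freeable cap is the one the patch applies a few lines
further down.

```c
/*
 * Hypothetical standalone model of the scan goal after this patch.
 * Not kernel code: names and types are simplified for illustration.
 */
static long scan_goal(long nr, long freeable, int priority, int seeks)
{
	long delta;
	long total_scan;

	if (seeks) {
		delta = freeable >> priority;
		delta *= 4;
		delta /= seeks;	/* assumed: elided between the quoted hunks */
	} else {
		delta = freeable / 2;
	}

	/* Deferred work is now scaled by priority, then new work is added. */
	total_scan = nr >> priority;
	total_scan += delta;

	/* Cap from the patch: never more than twice the freeable objects. */
	if (total_scan > 2 * freeable)
		total_scan = 2 * freeable;

	return total_scan;
}
```

With nr = 400, freeable = 1000, priority = 2 and seeks = 2, this gives
100 + 500 = 600. A huge nr at priority 0, e.g. scan_goal(100000, 1000, 0, 2),
is clamped to 2000.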

> -	if (total_scan < 0) {
> -		pr_err("shrink_slab: %pS negative objects to delete nr=%ld\n",
> -		       shrinker->scan_objects, total_scan);
> -		total_scan = freeable;
> -		next_deferred = nr;
> -	} else
> -		next_deferred = total_scan;
> -
> -	/*
> -	 * We need to avoid excessive windup on filesystem shrinkers
> -	 * due to large numbers of GFP_NOFS allocations causing the
> -	 * shrinkers to return -1 all the time. This results in a large
> -	 * nr being built up so when a shrink that can do some work
> -	 * comes along it empties the entire cache due to nr >>>
> -	 * freeable. This is bad for sustaining a working set in
> -	 * memory.
> -	 *
> -	 * Hence only allow the shrinker to scan the entire cache when
> -	 * a large delta change is calculated directly.
> -	 */
> -	if (delta < freeable / 4)
> -		total_scan = min(total_scan, freeable / 2);
> -
> -	/*
> -	 * Avoid risking looping forever due to too large nr value:
> -	 * never try to free more than twice the estimate number of
> -	 * freeable entries.
> -	 */
> -	if (total_scan > freeable * 2)
> -		total_scan = freeable * 2;
> +	total_scan = min(total_scan, (2 * freeable));

Probably unnecessary as we cap next_deferred below anyway? So total_scan cannot
grow without limits anymore. But can't hurt.

>  	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr,
>  				   freeable, delta, total_scan, priority);
> @@ -737,10 +708,9 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>  		cond_resched();
>  	}
>  
> -	if (next_deferred >= scanned)
> -		next_deferred -= scanned;
> -	else
> -		next_deferred = 0;
> +	next_deferred = max_t(long, (nr - scanned), 0) + total_scan;

And here's the bias, I think. Suppose we scanned 0, e.g. due to GFP_NOFS. We
count as newly deferred both the "delta" part of total_scan, which is fine, and
the "nr >> priority" part. For that part we failed to do our share of the
"reduce nr_deferred" work, but I don't think that means we should also increase
nr_deferred by the amount of that failed work.
OTOH, if we succeed and scan exactly the whole goal, we subtract from
nr_deferred both the "nr >> priority" part, which is correct, and delta,
which was new work, not deferred work, so that's incorrect as well IMHO.
So the calculation should probably be something like this?

	next_deferred = max_t(long, nr + delta - scanned, 0);
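
To make the difference concrete, here is a minimal standalone sketch of both
formulas. It rests on assumptions: the helper names are hypothetical, the
numbers are made up, the final min(next_deferred, 2 * freeable) cap is left
out, and total_scan at this point is taken to be the unscanned remainder of
the goal (the scan loop, not visible in the quoted hunks, is assumed to
subtract what it scanned).

```c
/* Hypothetical helpers for comparison only; not kernel code. */
static long max_long(long a, long b)
{
	return a > b ? a : b;
}

/* Formula as posted in v7. 'goal' is total_scan before the scan loop. */
static long next_deferred_v7(long nr, long goal, long scanned)
{
	/* Assumed: the loop leaves total_scan = goal - scanned. */
	long remaining = goal - scanned;

	return max_long(nr - scanned, 0) + remaining;
}

/* Formula suggested in this reply. */
static long next_deferred_suggested(long nr, long delta, long scanned)
{
	return max_long(nr + delta - scanned, 0);
}
```

Take nr = 400, delta = 500 and priority = 2, so the goal is 100 + 500 = 600.
If nothing is scanned, the v7 formula yields 1000 while the suggested one
yields 900; the extra 100 is the "nr >> priority" part counted twice. If the
whole goal of 600 is scanned, v7 yields 0 while the suggested formula yields
300, i.e. nr minus only the deferred share that was actually worked off.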

Thanks,
Vlastimil

> +	next_deferred = min(next_deferred, (2 * freeable));
> +
>  	/*
>  	 * move the unused scan count back into the shrinker in a
>  	 * manner that handles concurrent updates.
> 

