linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@kernel.org>
Cc: Nils Holland <nholland@tisys.org>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
	linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on)
Date: Thu, 29 Dec 2016 10:20:26 +0900	[thread overview]
Message-ID: <20161229012026.GB15541@bbox> (raw)
In-Reply-To: <20161227155532.GI1308@dhcp22.suse.cz>

On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote:
> Hi,
> could you try to run with the following patch on top of the previous
> one? I do not think it will make a large change in your workload but
> I think we need something like that so some testing under which is known
> to make a high lowmem pressure would be really appreciated. If you have
> more time to play with it then running with and without the patch with
> mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us
> whether it make any difference at all.
> 
> I would also appreciate if Mel and Johannes had a look at it. I am not
> yet sure whether we need the same thing for anon/file balancing in
> get_scan_count. I suspect we need but need to think more about that.
> 
> Thanks a lot again!
> ---
> From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Tue, 27 Dec 2016 16:28:44 +0100
> Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count
> 
> get_scan_count considers the whole node LRU size when
> - doing SCAN_FILE due to many page cache inactive pages
> - calculating the number of pages to scan
> 
> in both cases this might lead to unexpected behavior especially on 32b
> systems where we can expect lowmem memory pressure very often.
> 
> A large highmem zone can easily distort SCAN_FILE heuristic because
> there might be only few file pages from the eligible zones on the node
> lru and we would still enforce file lru scanning which can lead to
> trashing while we could still scan anonymous pages.

Nit:
It doesn't make thrashing because isolate_lru_pages filter out them
but I agree it makes pointless CPU burning to find eligible pages.

> 
> The later use of lruvec_lru_size can be problematic as well. Especially
> when there are not many pages from the eligible zones. We would have to
> skip over many pages to find anything to reclaim but shrink_node_memcg
> would only reduce the remaining number to scan by SWAP_CLUSTER_MAX
> at maximum. Therefore we can end up going over a large LRU many times
> without actually having chance to reclaim much if anything at all. The
> closer we are out of memory on lowmem zone the worse the problem will
> be.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmscan.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c98b1a585992..785b4d7fb8a0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int
>  }
>  
>  /*
> + * Return the number of pages on the given lru which are eligibne for the
                                                            eligible
> + * given zone_idx
> + */
> +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec,
> +		enum lru_list lru, int zone_idx)

Nit:

Although there is a comment, function name is rather confusing when I compared
it with lruvec_zone_lru_size.

lruvec_eligible_zones_lru_size is better?


> +{
> +	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> +	unsigned long lru_size;
> +	int zid;
> +
> +	lru_size = lruvec_lru_size(lruvec, lru);
> +	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> +		struct zone *zone = &pgdat->node_zones[zid];
> +		unsigned long size;
> +
> +		if (!managed_zone(zone))
> +			continue;
> +
> +		size = lruvec_zone_lru_size(lruvec, lru, zid);
> +		lru_size -= min(size, lru_size);
> +	}
> +
> +	return lru_size;
> +}
> +
> +/*
>   * Add a shrinker callback to be called from the vm.
>   */
>  int register_shrinker(struct shrinker *shrinker)
> @@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * system is under heavy pressure.
>  	 */
>  	if (!inactive_list_is_low(lruvec, true, sc) &&
> -	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) {
> +	    lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
>  		scan_balance = SCAN_FILE;
>  		goto out;
>  	}
> @@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  			unsigned long size;
>  			unsigned long scan;
>  
> -			size = lruvec_lru_size(lruvec, lru);
> +			size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx);
>  			scan = size >> sc->priority;
>  
>  			if (!scan && pass && force_scan)
> -- 
> 2.10.2

Nit:

With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than
own custom calculation to filter out non-eligible pages. 

Anyway, I think this patch does right things so I suppose this.

Acked-by: Minchan Kim <minchan@kernel.org>

  parent reply	other threads:[~2016-12-29  1:20 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-15 22:57 OOM: Better, but still there on 4.9 Nils Holland
2016-12-16  7:39 ` Michal Hocko
2016-12-16 15:58   ` OOM: Better, but still there on Michal Hocko
2016-12-16 15:58     ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko
2016-12-16 15:58     ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko
2016-12-16 17:31       ` Johannes Weiner
2016-12-16 22:12         ` Michal Hocko
2016-12-17 11:17           ` Tetsuo Handa
2016-12-18 16:37             ` Michal Hocko
2016-12-16 18:47     ` OOM: Better, but still there on Nils Holland
2016-12-17  0:02       ` Michal Hocko
2016-12-17 12:59         ` Nils Holland
2016-12-17 14:44           ` Tetsuo Handa
2016-12-17 17:11             ` Nils Holland
2016-12-17 21:06             ` Nils Holland
2016-12-18  5:14               ` Tetsuo Handa
2016-12-19 13:45               ` Michal Hocko
2016-12-20  2:08                 ` Nils Holland
2016-12-21  7:36                   ` Michal Hocko
2016-12-21 11:00                     ` Tetsuo Handa
2016-12-21 11:16                       ` Michal Hocko
2016-12-21 14:04                         ` Chris Mason
2016-12-22 10:10                     ` Nils Holland
2016-12-22 10:27                       ` Michal Hocko
2016-12-22 10:35                         ` Nils Holland
2016-12-22 10:46                           ` Tetsuo Handa
2016-12-22 19:17                       ` Michal Hocko
2016-12-22 21:46                         ` Nils Holland
2016-12-23 10:51                           ` Michal Hocko
2016-12-23 12:18                             ` Nils Holland
2016-12-23 12:57                               ` Michal Hocko
2016-12-23 14:47                                 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko
2016-12-23 22:26                                   ` Nils Holland
2016-12-26 12:48                                     ` Michal Hocko
2016-12-26 18:57                                       ` Nils Holland
2016-12-27  8:08                                         ` Michal Hocko
2016-12-27 11:23                                           ` Nils Holland
2016-12-27 11:27                                             ` Michal Hocko
2016-12-27 15:55                                       ` Michal Hocko
2016-12-27 16:28                                         ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot
2016-12-28  8:51                                           ` Michal Hocko
2016-12-27 19:33                                         ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland
2016-12-28  8:57                                           ` Michal Hocko
2016-12-29  1:20                                         ` Minchan Kim [this message]
2016-12-29  9:04                                           ` Michal Hocko
2016-12-30  2:05                                             ` Minchan Kim
2016-12-30 10:40                                               ` Michal Hocko
2016-12-29  0:31                                       ` Minchan Kim
2016-12-29  0:48                                         ` Minchan Kim
2016-12-29  8:52                                           ` Michal Hocko
2016-12-30 10:19                                       ` Mel Gorman
2016-12-30 11:05                                         ` Michal Hocko
2016-12-30 12:43                                           ` Mel Gorman
2016-12-25 22:25                                   ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot
2016-12-26 12:26                                     ` Michal Hocko
2016-12-26 12:50                                       ` Michal Hocko
2016-12-18  0:28             ` OOM: Better, but still there on Xin Zhou
2016-12-16 18:15   ` OOM: Better, but still there on 4.9 Chris Mason
2016-12-16 22:14     ` Michal Hocko
2016-12-16 22:47       ` Chris Mason
2016-12-16 23:31         ` Michal Hocko
2016-12-16 19:50   ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161229012026.GB15541@bbox \
    --to=minchan@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.cz \
    --cc=hannes@cmpxchg.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=nholland@tisys.org \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).