From: Minchan Kim <minchan@kernel.org> To: Michal Hocko <mhocko@kernel.org> Cc: Nils Holland <nholland@tisys.org>, Mel Gorman <mgorman@suse.de>, Johannes Weiner <hannes@cmpxchg.org>, Vladimir Davydov <vdavydov.dev@gmail.com>, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>, linux-btrfs@vger.kernel.org Subject: Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Date: Thu, 29 Dec 2016 10:20:26 +0900 [thread overview] Message-ID: <20161229012026.GB15541@bbox> (raw) In-Reply-To: <20161227155532.GI1308@dhcp22.suse.cz> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote: > Hi, > could you try to run with the following patch on top of the previous > one? I do not think it will make a large change in your workload but > I think we need something like that so some testing under which is known > to make a high lowmem pressure would be really appreciated. If you have > more time to play with it then running with and without the patch with > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us > whether it make any difference at all. > > I would also appreciate if Mel and Johannes had a look at it. I am not > yet sure whether we need the same thing for anon/file balancing in > get_scan_count. I suspect we need but need to think more about that. > > Thanks a lot again! > --- > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Tue, 27 Dec 2016 16:28:44 +0100 > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count > > get_scan_count considers the whole node LRU size when > - doing SCAN_FILE due to many page cache inactive pages > - calculating the number of pages to scan > > in both cases this might lead to unexpected behavior especially on 32b > systems where we can expect lowmem memory pressure very often. > > A large highmem zone can easily distort SCAN_FILE heuristic because > there might be only few file pages from the eligible zones on the node > lru and we would still enforce file lru scanning which can lead to > trashing while we could still scan anonymous pages. Nit: It doesn't make thrashing because isolate_lru_pages filter out them but I agree it makes pointless CPU burning to find eligible pages. > > The later use of lruvec_lru_size can be problematic as well. Especially > when there are not many pages from the eligible zones. We would have to > skip over many pages to find anything to reclaim but shrink_node_memcg > would only reduce the remaining number to scan by SWAP_CLUSTER_MAX > at maximum. Therefore we can end up going over a large LRU many times > without actually having chance to reclaim much if anything at all. The > closer we are out of memory on lowmem zone the worse the problem will > be. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/vmscan.c | 30 ++++++++++++++++++++++++++++-- > 1 file changed, 28 insertions(+), 2 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index c98b1a585992..785b4d7fb8a0 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int > } > > /* > + * Return the number of pages on the given lru which are eligibne for the eligible > + * given zone_idx > + */ > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec, > + enum lru_list lru, int zone_idx) Nit: Although there is a comment, function name is rather confusing when I compared it with lruvec_zone_lru_size. lruvec_eligible_zones_lru_size is better? > +{ > + struct pglist_data *pgdat = lruvec_pgdat(lruvec); > + unsigned long lru_size; > + int zid; > + > + lru_size = lruvec_lru_size(lruvec, lru); > + for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) { > + struct zone *zone = &pgdat->node_zones[zid]; > + unsigned long size; > + > + if (!managed_zone(zone)) > + continue; > + > + size = lruvec_zone_lru_size(lruvec, lru, zid); > + lru_size -= min(size, lru_size); > + } > + > + return lru_size; > +} > + > +/* > * Add a shrinker callback to be called from the vm. > */ > int register_shrinker(struct shrinker *shrinker) > @@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, > * system is under heavy pressure. > */ > if (!inactive_list_is_low(lruvec, true, sc) && > - lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) { > + lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) { > scan_balance = SCAN_FILE; > goto out; > } > @@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, > unsigned long size; > unsigned long scan; > > - size = lruvec_lru_size(lruvec, lru); > + size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx); > scan = size >> sc->priority; > > if (!scan && pass && force_scan) > -- > 2.10.2 Nit: With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than own custom calculation to filter out non-eligible pages. Anyway, I think this patch does right things so I suppose this. Acked-by: Minchan Kim <minchan@kernel.org>
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org> To: Michal Hocko <mhocko@kernel.org> Cc: Nils Holland <nholland@tisys.org>, Mel Gorman <mgorman@suse.de>, Johannes Weiner <hannes@cmpxchg.org>, Vladimir Davydov <vdavydov.dev@gmail.com>, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>, linux-btrfs@vger.kernel.org Subject: Re: [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Date: Thu, 29 Dec 2016 10:20:26 +0900 [thread overview] Message-ID: <20161229012026.GB15541@bbox> (raw) In-Reply-To: <20161227155532.GI1308@dhcp22.suse.cz> On Tue, Dec 27, 2016 at 04:55:33PM +0100, Michal Hocko wrote: > Hi, > could you try to run with the following patch on top of the previous > one? I do not think it will make a large change in your workload but > I think we need something like that so some testing under which is known > to make a high lowmem pressure would be really appreciated. If you have > more time to play with it then running with and without the patch with > mm_vmscan_direct_reclaim_{start,end} tracepoints enabled could tell us > whether it make any difference at all. > > I would also appreciate if Mel and Johannes had a look at it. I am not > yet sure whether we need the same thing for anon/file balancing in > get_scan_count. I suspect we need but need to think more about that. > > Thanks a lot again! > --- > From b51f50340fe9e40b68be198b012f8ab9869c1850 Mon Sep 17 00:00:00 2001 > From: Michal Hocko <mhocko@suse.com> > Date: Tue, 27 Dec 2016 16:28:44 +0100 > Subject: [PATCH] mm, vmscan: consider eligible zones in get_scan_count > > get_scan_count considers the whole node LRU size when > - doing SCAN_FILE due to many page cache inactive pages > - calculating the number of pages to scan > > in both cases this might lead to unexpected behavior especially on 32b > systems where we can expect lowmem memory pressure very often. > > A large highmem zone can easily distort SCAN_FILE heuristic because > there might be only few file pages from the eligible zones on the node > lru and we would still enforce file lru scanning which can lead to > trashing while we could still scan anonymous pages. Nit: It doesn't make thrashing because isolate_lru_pages filter out them but I agree it makes pointless CPU burning to find eligible pages. > > The later use of lruvec_lru_size can be problematic as well. Especially > when there are not many pages from the eligible zones. We would have to > skip over many pages to find anything to reclaim but shrink_node_memcg > would only reduce the remaining number to scan by SWAP_CLUSTER_MAX > at maximum. Therefore we can end up going over a large LRU many times > without actually having chance to reclaim much if anything at all. The > closer we are out of memory on lowmem zone the worse the problem will > be. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/vmscan.c | 30 ++++++++++++++++++++++++++++-- > 1 file changed, 28 insertions(+), 2 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index c98b1a585992..785b4d7fb8a0 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -252,6 +252,32 @@ unsigned long lruvec_zone_lru_size(struct lruvec *lruvec, enum lru_list lru, int > } > > /* > + * Return the number of pages on the given lru which are eligibne for the eligible > + * given zone_idx > + */ > +static unsigned long lruvec_lru_size_zone_idx(struct lruvec *lruvec, > + enum lru_list lru, int zone_idx) Nit: Although there is a comment, function name is rather confusing when I compared it with lruvec_zone_lru_size. lruvec_eligible_zones_lru_size is better? > +{ > + struct pglist_data *pgdat = lruvec_pgdat(lruvec); > + unsigned long lru_size; > + int zid; > + > + lru_size = lruvec_lru_size(lruvec, lru); > + for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) { > + struct zone *zone = &pgdat->node_zones[zid]; > + unsigned long size; > + > + if (!managed_zone(zone)) > + continue; > + > + size = lruvec_zone_lru_size(lruvec, lru, zid); > + lru_size -= min(size, lru_size); > + } > + > + return lru_size; > +} > + > +/* > * Add a shrinker callback to be called from the vm. > */ > int register_shrinker(struct shrinker *shrinker) > @@ -2207,7 +2233,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, > * system is under heavy pressure. > */ > if (!inactive_list_is_low(lruvec, true, sc) && > - lruvec_lru_size(lruvec, LRU_INACTIVE_FILE) >> sc->priority) { > + lruvec_lru_size_zone_idx(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) { > scan_balance = SCAN_FILE; > goto out; > } > @@ -2274,7 +2300,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg, > unsigned long size; > unsigned long scan; > > - size = lruvec_lru_size(lruvec, lru); > + size = lruvec_lru_size_zone_idx(lruvec, lru, sc->reclaim_idx); > scan = size >> sc->priority; > > if (!scan && pass && force_scan) > -- > 2.10.2 Nit: With this patch, inactive_list_is_low can use lruvec_lru_size_zone_idx rather than own custom calculation to filter out non-eligible pages. Anyway, I think this patch does right things so I suppose this. Acked-by: Minchan Kim <minchan@kernel.org> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-12-29 1:20 UTC|newest] Thread overview: 123+ messages / expand[flat|nested] mbox.gz Atom feed top 2016-12-15 22:57 OOM: Better, but still there on 4.9 Nils Holland 2016-12-16 7:39 ` Michal Hocko 2016-12-16 7:39 ` Michal Hocko 2016-12-16 15:58 ` OOM: Better, but still there on Michal Hocko 2016-12-16 15:58 ` Michal Hocko 2016-12-16 15:58 ` [PATCH 1/2] mm: consolidate GFP_NOFAIL checks in the allocator slowpath Michal Hocko 2016-12-16 15:58 ` Michal Hocko 2016-12-16 15:58 ` [PATCH 2/2] mm, oom: do not enfore OOM killer for __GFP_NOFAIL automatically Michal Hocko 2016-12-16 15:58 ` Michal Hocko 2016-12-16 17:31 ` Johannes Weiner 2016-12-16 17:31 ` Johannes Weiner 2016-12-16 22:12 ` Michal Hocko 2016-12-16 22:12 ` Michal Hocko 2016-12-17 11:17 ` Tetsuo Handa 2016-12-17 11:17 ` Tetsuo Handa 2016-12-18 16:37 ` Michal Hocko 2016-12-18 16:37 ` Michal Hocko 2016-12-16 18:47 ` OOM: Better, but still there on Nils Holland 2016-12-16 18:47 ` Nils Holland 2016-12-17 0:02 ` Michal Hocko 2016-12-17 0:02 ` Michal Hocko 2016-12-17 12:59 ` Nils Holland 2016-12-17 12:59 ` Nils Holland 2016-12-17 14:44 ` Tetsuo Handa 2016-12-17 14:44 ` Tetsuo Handa 2016-12-17 17:11 ` Nils Holland 2016-12-17 17:11 ` Nils Holland 2016-12-17 21:06 ` Nils Holland 2016-12-17 21:06 ` Nils Holland 2016-12-18 5:14 ` Tetsuo Handa 2016-12-18 5:14 ` Tetsuo Handa 2016-12-19 13:45 ` Michal Hocko 2016-12-19 13:45 ` Michal Hocko 2016-12-20 2:08 ` Nils Holland 2016-12-20 2:08 ` Nils Holland 2016-12-21 7:36 ` Michal Hocko 2016-12-21 7:36 ` Michal Hocko 2016-12-21 11:00 ` Tetsuo Handa 2016-12-21 11:00 ` Tetsuo Handa 2016-12-21 11:16 ` Michal Hocko 2016-12-21 11:16 ` Michal Hocko 2016-12-21 14:04 ` Chris Mason 2016-12-21 14:04 ` Chris Mason 2016-12-22 10:10 ` Nils Holland 2016-12-22 10:10 ` Nils Holland 2016-12-22 10:27 ` Michal Hocko 2016-12-22 10:27 ` Michal Hocko 2016-12-22 10:35 ` Nils Holland 2016-12-22 10:35 ` Nils Holland 2016-12-22 10:46 ` Tetsuo Handa 2016-12-22 10:46 ` Tetsuo Handa 2016-12-22 19:17 ` Michal Hocko 2016-12-22 19:17 ` Michal Hocko 2016-12-22 21:46 ` Nils Holland 2016-12-22 21:46 ` Nils Holland 2016-12-23 10:51 ` Michal Hocko 2016-12-23 10:51 ` Michal Hocko 2016-12-23 12:18 ` Nils Holland 2016-12-23 12:18 ` Nils Holland 2016-12-23 12:57 ` Michal Hocko 2016-12-23 12:57 ` Michal Hocko 2016-12-23 14:47 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Michal Hocko 2016-12-23 14:47 ` Michal Hocko 2016-12-23 22:26 ` Nils Holland 2016-12-23 22:26 ` Nils Holland 2016-12-26 12:48 ` Michal Hocko 2016-12-26 12:48 ` Michal Hocko 2016-12-26 18:57 ` Nils Holland 2016-12-26 18:57 ` Nils Holland 2016-12-27 8:08 ` Michal Hocko 2016-12-27 8:08 ` Michal Hocko 2016-12-27 11:23 ` Nils Holland 2016-12-27 11:23 ` Nils Holland 2016-12-27 11:27 ` Michal Hocko 2016-12-27 11:27 ` Michal Hocko 2016-12-27 15:55 ` Michal Hocko 2016-12-27 15:55 ` Michal Hocko 2016-12-27 16:28 ` [PATCH] mm, vmscan: consider eligible zones in get_scan_count kbuild test robot 2016-12-28 8:51 ` Michal Hocko 2016-12-28 8:51 ` Michal Hocko 2016-12-27 19:33 ` [RFC PATCH] mm, memcg: fix (Re: OOM: Better, but still there on) Nils Holland 2016-12-27 19:33 ` Nils Holland 2016-12-28 8:57 ` Michal Hocko 2016-12-28 8:57 ` Michal Hocko 2016-12-29 1:20 ` Minchan Kim [this message] 2016-12-29 1:20 ` Minchan Kim 2016-12-29 9:04 ` Michal Hocko 2016-12-29 9:04 ` Michal Hocko 2016-12-30 2:05 ` Minchan Kim 2016-12-30 2:05 ` Minchan Kim 2016-12-30 10:40 ` Michal Hocko 2016-12-30 10:40 ` Michal Hocko 2016-12-29 0:31 ` Minchan Kim 2016-12-29 0:31 ` Minchan Kim 2016-12-29 0:48 ` Minchan Kim 2016-12-29 0:48 ` Minchan Kim 2016-12-29 8:52 ` Michal Hocko 2016-12-29 8:52 ` Michal Hocko 2016-12-30 10:19 ` Mel Gorman 2016-12-30 10:19 ` Mel Gorman 2016-12-30 11:05 ` Michal Hocko 2016-12-30 11:05 ` Michal Hocko 2016-12-30 12:43 ` Mel Gorman 2016-12-30 12:43 ` Mel Gorman 2016-12-25 22:25 ` [lkp-developer] [mm, memcg] d18e2b2aca: WARNING:at_mm/memcontrol.c:#mem_cgroup_update_lru_size kernel test robot 2016-12-25 22:25 ` kernel test robot 2016-12-26 12:26 ` Michal Hocko 2016-12-26 12:26 ` Michal Hocko 2016-12-26 12:26 ` Michal Hocko 2016-12-26 12:50 ` Michal Hocko 2016-12-26 12:50 ` Michal Hocko 2016-12-26 12:50 ` Michal Hocko 2016-12-18 0:28 ` OOM: Better, but still there on Xin Zhou 2016-12-16 18:15 ` OOM: Better, but still there on 4.9 Chris Mason 2016-12-16 18:15 ` Chris Mason 2016-12-16 22:14 ` Michal Hocko 2016-12-16 22:14 ` Michal Hocko 2016-12-16 22:47 ` Chris Mason 2016-12-16 22:47 ` Chris Mason 2016-12-16 23:31 ` Michal Hocko 2016-12-16 23:31 ` Michal Hocko 2016-12-16 19:50 ` Chris Mason 2016-12-16 19:50 ` Chris Mason
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20161229012026.GB15541@bbox \ --to=minchan@kernel.org \ --cc=clm@fb.com \ --cc=dsterba@suse.cz \ --cc=hannes@cmpxchg.org \ --cc=linux-btrfs@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mgorman@suse.de \ --cc=mhocko@kernel.org \ --cc=nholland@tisys.org \ --cc=penguin-kernel@I-love.SAKURA.ne.jp \ --cc=vdavydov.dev@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.