From: hejianet <hejianet@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Vlastimil Babka <vbabka@suse.cz>,
	Minchan Kim <minchan@kernel.org>,
	Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH] mm/vmscan: fix high cpu usage of kswapd if there
Date: Wed, 22 Feb 2017 22:31:50 +0800
Message-ID: <e07c7437-37e4-3630-0bd9-3f225412fd52@gmail.com>
In-Reply-To: <20170222114105.GI5753@dhcp22.suse.cz>

Hi Michal

On 22/02/2017 7:41 PM, Michal Hocko wrote:
> On Wed 22-02-17 17:04:48, Jia He wrote:
>> When I try to dynamically allocate the hugepages more than system total
>> free memory:
>> e.g. echo 4000 >/proc/sys/vm/nr_hugepages
>
> I assume that the command has terminated with less huge pages allocated
> than requested but
>
Yes, in the end fewer than 4000 hugepages were allocated:
HugePages_Total:    1864
HugePages_Free:     1864
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

In the bad case, although kswapd takes 100% cpu, HugePages_Total does
not increase at all.

>> Node 3, zone      DMA
> [...]
>>   pages free     2951
>>         min      2821
>>         low      3526
>>         high     4231
>
> it left the zone below high watermark with
>
>>    node_scanned  0
>>         spanned  245760
>>         present  245760
>>         managed  245388
>>       nr_free_pages 2951
>>       nr_zone_inactive_anon 0
>>       nr_zone_active_anon 0
>>       nr_zone_inactive_file 0
>>       nr_zone_active_file 0
>
> no pages reclaimable, so kswapd will not go to sleep. It would be quite
> easy and comfortable to call it a misconfiguration but it seems that
> it might be quite easy to hit with NUMA machines which have large
> differences in the node sizes. I guess it makes sense to back off
> the kswapd rather than burning CPU without any way to make forward
> progress.
Agreed.

>
> [...]
>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 532a2a7..a05e3ab 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -3139,7 +3139,8 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
>>  		if (!managed_zone(zone))
>>  			continue;
>>
>> -		if (!zone_balanced(zone, order, classzone_idx))
>> +		if (!zone_balanced(zone, order, classzone_idx)
>> +			&& zone_reclaimable_pages(zone))
>>  			return false;
>
> OK, this makes some sense, although zone_reclaimable_pages doesn't count
> SLAB reclaimable pages. So we might go to sleep with a reclaimable slab
> still around. This is not really easy to address because the reclaimable
> slab doesn't really imply that those pages will be reclaimed...
Yes, even in the bad case, when kswapd takes all the cpu, the number of
reclaimable pages does not decrease.

>
>>  	}
>>
>> @@ -3502,6 +3503,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>  {
>>  	pg_data_t *pgdat;
>>  	int z;
>> +	int node_has_relaimable_pages = 0;
>>
>>  	if (!managed_zone(zone))
>>  		return;
>> @@ -3522,8 +3524,15 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>>
>>  		if (zone_balanced(zone, order, classzone_idx))
>>  			return;
>> +
>> +		if (!zone_reclaimable_pages(zone))
>> +			node_has_relaimable_pages = 1;
>
> What, this doesn't make any sense? Did you mean if (zone_reclaimable_pages)?
I mean that if any one zone has reclaimable pages, then that zone's *node*
has reclaimable pages, so the kswapdN thread for this node should be woken
up. For example, node 1 has 2 zones: zone A has no reclaimable pages but
zone B does. Node 1 therefore has reclaimable pages, and kswapd1 will be
woken up.
I use node_has_relaimable_pages in the loop to check the reclaimable page
count of every zone. That is why I prefer the name
node_has_relaimable_pages over zone_has_relaimable_pages.

Did I understand it correctly? Thanks

B.R.
Jia
>
>>  	}
>>
>> +	/* Dont wake kswapd if no reclaimable pages */
>> +	if (!node_has_relaimable_pages)
>> +		return;
>> +
>>  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
>>  	wake_up_interruptible(&pgdat->kswapd_wait);
>>  }
>> --
>> 1.8.5.6
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
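
For readers following the thread, the per-node logic described in the reply above can be sketched as a small userspace model in C. This is only an illustration under stated assumptions, not the kernel code and not the final patch: struct zone_model, model_prepare_kswapd_sleep() and model_should_wake_kswapd() are hypothetical names, the three fields stand in for managed_zone(), zone_balanced() and zone_reclaimable_pages(), and the wakeup check assumes the non-negated test that Michal asks about (a zone *with* reclaimable pages sets the flag), since that is what the explanation in the reply implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	bool managed;              /* models managed_zone(zone) */
	bool balanced;             /* models zone_balanced(zone, order, classzone_idx) */
	unsigned long reclaimable; /* models zone_reclaimable_pages(zone) */
};

/*
 * Models the prepare_kswapd_sleep() hunk: an unbalanced zone keeps
 * kswapd awake only if it still has something to reclaim.
 */
bool model_prepare_kswapd_sleep(const struct zone_model *zones, size_t nr)
{
	assert(zones != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (!zones[i].managed)
			continue;
		if (!zones[i].balanced && zones[i].reclaimable > 0)
			return false; /* work left to do, stay awake */
	}
	return true;
}

/*
 * Models the wakeup_kswapd() hunk with the assumed non-negated check:
 * the node counts as reclaimable if any managed zone has reclaimable
 * pages, and kswapd is woken only in that case.
 */
bool model_should_wake_kswapd(const struct zone_model *zones, size_t nr)
{
	bool node_has_reclaimable_pages = false;

	assert(zones != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (!zones[i].managed)
			continue;
		if (zones[i].balanced)
			return false; /* mirrors the early return in the loop */
		if (zones[i].reclaimable > 0)
			node_has_reclaimable_pages = true;
	}
	return node_has_reclaimable_pages;
}
```

With the example from the reply (zone A without reclaimable pages, zone B with some), the model wakes kswapd and does not let it sleep. With a single unbalanced zone that has nothing to reclaim, as in the Node 3 zoneinfo quoted above, it neither wakes kswapd nor keeps it awake.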