From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Rik van Riel <riel@redhat.com>,
anton@sambar.org, linux-kernel@vger.kernel.org,
Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
linuxppc-dev@lists.ozlabs.org, Dan Streetman <ddstreet@ieee.org>
Subject: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages
Date: Fri, 27 Mar 2015 15:23:50 -0700
Message-ID: <20150327222350.GA22887@linux.vnet.ibm.com>
In-Reply-To: <5515BAF7.6070604@intel.com>
On 27.03.2015 [13:17:59 -0700], Dave Hansen wrote:
> On 03/27/2015 12:28 PM, Nishanth Aravamudan wrote:
> > @@ -2585,7 +2585,7 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
> >
> >  	for (i = 0; i <= ZONE_NORMAL; i++) {
> >  		zone = &pgdat->node_zones[i];
> > -		if (!populated_zone(zone))
> > +		if (!populated_zone(zone) || !zone_reclaimable(zone))
> >  			continue;
> >
> >  		pfmemalloc_reserve += min_wmark_pages(zone);
>
> Do you really want zone_reclaimable()? Or do you want something more
> direct like "zone_reclaimable_pages(zone) == 0"?
Yeah, I guess in my testing this worked out to be the same, since
zone_reclaimable_pages(zone) is 0 and so zone_reclaimable(zone) will
always be false. Thanks!
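
For reference, a minimal userspace sketch of why the two checks agree on this node. It assumes zone_reclaimable() compares the scanned-page count against six times zone_reclaimable_pages(), which is my understanding of mm/vmscan.c at the time; struct zone_counters and model_zone_reclaimable() are hypothetical names used only for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-zone counters the kernel consults. */
struct zone_counters {
	unsigned long pages_scanned;		/* models NR_PAGES_SCANNED */
	unsigned long reclaimable_pages;	/* models zone_reclaimable_pages() */
};

/*
 * Assumed shape of zone_reclaimable(): the zone counts as reclaimable
 * while fewer than 6x its reclaimable pages have been scanned.
 */
bool model_zone_reclaimable(const struct zone_counters *z)
{
	assert(z != NULL);
	return z->pages_scanned < z->reclaimable_pages * 6;
}
```

With reclaimable_pages == 0 the right-hand side is 0, so no unsigned scanned count can be smaller and the model returns false regardless of pages_scanned. The direct "== 0" test states that intent without depending on the scan counter.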
Based upon 675becce15 ("mm: vmscan: do not throttle based on pfmemalloc
reserves if node has no ZONE_NORMAL") from Mel.
We have a system with the following topology:
# numactl -H
available: 3 nodes (0,2-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23 24 25 26 27 28 29 30 31
node 0 size: 28273 MB
node 0 free: 27323 MB
node 2 cpus:
node 2 size: 16384 MB
node 2 free: 0 MB
node 3 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 3 size: 30533 MB
node 3 free: 13273 MB
node distances:
node   0   2   3
  0:  10  20  20
  2:  20  10  20
  3:  20  20  10
Node 2 has no free memory because a single 16G hugepage consumes all of it:
# cat /sys/devices/system/node/node2/hugepages/hugepages-16777216kB/nr_hugepages
1
This leads to the following zoneinfo:
Node 2, zone      DMA
  pages free     0
        min      1840
        low      2300
        high     2760
        scanned  0
        spanned  262144
        present  262144
        managed  262144
...
  all_unreclaimable: 1
If one then attempts to allocate some normal 16M hugepages via:
echo 37 > /proc/sys/vm/nr_hugepages
the echo never returns and kswapd2 consumes CPU cycles.
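This is because throttle_direct_reclaim() ends up calling
wait_event(pfmemalloc_wait, pfmemalloc_watermark_ok...).
pfmemalloc_watermark_ok() in turn sums the min watermarks of all
populated zones on the node (up to ZONE_NORMAL) as the pfmemalloc
reserve. If that reserve is non-zero, it reports the watermarks as ok
only when there are sufficient free pages. On node 2 the reserve is
non-zero but no pages are free or reclaimable, so the condition never
becomes true.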
675becce15 already added a condition for memoryless nodes. In this case,
though, the node has memory; it is just all consumed (and not
reclaimable). Effectively, the result is the same for this call to
pfmemalloc_watermark_ok(), so skipping zones with no reclaimable pages
seems like a reasonable additional condition.
With this change, the aforementioned 16M hugepage allocation attempt
succeeds and correctly round-robins between Nodes 0 and 3.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
---
v1 -> v2:
Check against zone_reclaimable_pages, rather than zone_reclaimable,
based upon feedback from Dave Hansen.
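
The effect of the extra condition on node 2 can be sketched in userspace as follows. This is a simplified model, not the kernel function: struct model_zone, struct model_node, model_watermark_ok() and model_node2() are hypothetical names, the "free pages must exceed half the reserve" threshold is my assumption about what pfmemalloc_watermark_ok() did at the time, and the kswapd wakeup is omitted. The numbers come from the zoneinfo in the changelog, with reclaimable pages taken as 0 as observed in testing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_ZONES 1	/* the zoneinfo above only shows a DMA zone */

/* Hypothetical, simplified stand-ins for struct zone and pg_data_t. */
struct model_zone {
	unsigned long present_pages;	/* 0 means the zone is not populated */
	unsigned long free_pages;	/* models NR_FREE_PAGES */
	unsigned long min_wmark;	/* models min_wmark_pages(zone) */
	unsigned long reclaimable;	/* models zone_reclaimable_pages(zone) */
};

struct model_node {
	struct model_zone zones[MODEL_NR_ZONES];
};

/* skip_unreclaimable selects the behaviour with (true) or without the patch. */
bool model_watermark_ok(const struct model_node *node, bool skip_unreclaimable)
{
	unsigned long reserve = 0, free_pages = 0;
	size_t i;

	assert(node != NULL);

	for (i = 0; i < MODEL_NR_ZONES; i++) {
		const struct model_zone *z = &node->zones[i];

		if (z->present_pages == 0)
			continue;
		if (skip_unreclaimable && z->reclaimable == 0)
			continue;

		reserve += z->min_wmark;
		free_pages += z->free_pages;
	}

	/* No reserves counted on this node: do not throttle. */
	if (reserve == 0)
		return true;

	/* Assumed threshold: free pages must exceed half the reserve. */
	return free_pages > reserve / 2;
}

/* Node 2 as reported by zoneinfo: populated, nothing free or reclaimable. */
struct model_node model_node2(void)
{
	struct model_node n = {
		.zones = {
			{ .present_pages = 262144, .free_pages = 0,
			  .min_wmark = 1840, .reclaimable = 0 },
		},
	};
	return n;
}
```

Without the new condition the model counts a reserve of 1840 pages against 0 free pages and reports the watermark as not ok, which matches the observed hang. With it, the only zone is skipped, the reserve stays 0, and the caller is not throttled.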
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd71bac..c627fa4c991f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2646,7 +2646,8 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
 
 	for (i = 0; i <= ZONE_NORMAL; i++) {
 		zone = &pgdat->node_zones[i];
-		if (!populated_zone(zone))
+		if (!populated_zone(zone) ||
+		    zone_reclaimable_pages(zone) == 0)
 			continue;
 
 		pfmemalloc_reserve += min_wmark_pages(zone);