From: Con Kolivas <kernel@kolivas.org>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andrew Morton <akpm@osdl.org>,
ck@vds.kolivas.org, linux list <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: [PATCH] mm: limit lowmem_reserve
Date: Thu, 18 May 2006 00:11:41 +1000
Message-ID: <200605180011.43216.kernel@kolivas.org>
In-Reply-To: <443710F7.3040201@yahoo.com.au>
Sorry to resuscitate this old thread, but I'm still not sure we resolved it,
and I want to make sure the issue isn't there the way I see it.
On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > Ok. I think I presented enough information for why I thought
> > zone_watermark_ok would fail (for ZONE_DMA). With 16MB ZONE_DMA and a
> > vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
> > that much ZONE_DMA free, I don't think I've ever seen that much free on
> > my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
> > Changing the tunable can make the lowmem_reserve larger than ZONE_DMA is
> > on any vmsplit too as far as I understand the ratio.
>
> Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
> 12MB protection should never come into it (unless it is buggy?).
An i386 PC with a 3GB split will have approx
4000 pages in ZONE_DMA
and the lowmem_reserve for ZONE_DMA will be set to approx
0 0 3000 3000
So if we call zone_watermark_ok with a zone of ZONE_DMA and a classzone_idx of
ZONE_NORMAL, the test will almost always fail, since it's almost impossible to
have 3000 free ZONE_DMA pages. I believe it can happen like this:
In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a
ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over
all zones from 0 to end_zone and (vmscan.c:1147) we end up calling
if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))
which would now call zone_watermark_ok with zone being a ZONE_DMA, and
end_zone being the idx of a ZONE_NORMAL.
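To make the failure mode concrete, here is a minimal userspace sketch of the
check as I understand it, for order-0 only and ignoring the alloc-flag
adjustments. struct zone_model, watermark_ok and count_failing_zones are
hypothetical names made up for this illustration, not kernel code, and the
pages_high value used with them is an assumption:

```c
#define MAX_NR_ZONES 4

/* Hypothetical, stripped-down stand-in for the kernel's struct zone. */
struct zone_model {
	const char *name;
	long free_pages;
	long pages_high;                   /* assumed value, for illustration */
	long lowmem_reserve[MAX_NR_ZONES]; /* indexed by classzone_idx */
};

/*
 * Simplified model of the watermark test for an order-0 request:
 * the zone passes only if its free pages exceed the mark plus the
 * reserve it holds against allocations of class classzone_idx.
 */
int watermark_ok(const struct zone_model *z, long mark, int classzone_idx)
{
	return z->free_pages > mark + z->lowmem_reserve[classzone_idx];
}

/*
 * Model of the scan described above: every zone from 0 to end_zone is
 * tested against its own pages_high, but with end_zone as the class.
 * Returns how many zones fail and would therefore be reclaimed from.
 */
int count_failing_zones(const struct zone_model *zones, int end_zone)
{
	int i, failing = 0;

	for (i = 0; i <= end_zone; i++)
		if (!watermark_ok(&zones[i], zones[i].pages_high, end_zone))
			failing++;
	return failing;
}
```

With a ZONE_DMA of 2500 free pages, pages_high of 48 and a reserve of
0 0 3000 3000, the check passes for classzone_idx 0 but fails for
classzone_idx 2, which is the situation described above.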
So in summary if I'm not mistaken (and I'm good at being mistaken), if we
balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end
up trying to flush the crap out of ZONE_DMA.
On my test case this indeed happens and my ZONE_DMA never goes below 3000
pages free. If I lower the reserve even further my pages free gets stuck at
3208 and can't free any more, and doesn't ever drop below that either.
Here is the patch I was proposing:
---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
---
mm/page_alloc.c | 24 +++++++++++++++++++++---
1 files changed, 21 insertions(+), 3 deletions(-)
Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c 2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c 2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
zone->lowmem_reserve[j] = 0;
for (idx = j-1; idx >= 0; idx--) {
+ unsigned long max_reserve;
+ unsigned long reserve;
struct zone *lower_zone;
+ lower_zone = pgdat->node_zones + idx;
+ /*
+ * Put an upper limit on the reserve at 1/4
+ * the lower_zone size. This prevents large
+ * zone size differences such as 3G VMSPLIT
+ * or low sysctl values from making
+ * zone_watermark_ok always fail. This
+ * enforces a lower limit on the reserve_ratio
+ */
+ max_reserve = lower_zone->present_pages / 4;
+
if (sysctl_lowmem_reserve_ratio[idx] < 1)
sysctl_lowmem_reserve_ratio[idx] = 1;
-
- lower_zone = pgdat->node_zones + idx;
- lower_zone->lowmem_reserve[j] = present_pages /
+ reserve = present_pages /
sysctl_lowmem_reserve_ratio[idx];
+ if (max_reserve && reserve > max_reserve) {
+ reserve = max_reserve;
+ sysctl_lowmem_reserve_ratio[idx] =
+ present_pages / max_reserve;
+ }
+
+ lower_zone->lowmem_reserve[j] = reserve;
present_pages += lower_zone->present_pages;
}
}
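For anyone who wants to check the arithmetic without booting a kernel, the
clamp above can be modelled in userspace roughly as follows. clamp_reserve
and struct reserve_result are hypothetical names for this sketch only; the
logic is meant to mirror the hunk, not replace it:

```c
/* Hypothetical result pair: the reserve set and the ratio after clamping. */
struct reserve_result {
	unsigned long reserve;
	unsigned long ratio;
};

/*
 * higher_pages:     pages accumulated from the zones above lower_zone
 * lower_zone_pages: present_pages of the lower zone being protected
 * ratio:            the sysctl_lowmem_reserve_ratio entry for that zone
 */
struct reserve_result clamp_reserve(unsigned long higher_pages,
				    unsigned long lower_zone_pages,
				    unsigned long ratio)
{
	struct reserve_result r;
	/* Cap the reserve at 1/4 of the lower zone, as in the patch. */
	unsigned long max_reserve = lower_zone_pages / 4;

	if (ratio < 1)
		ratio = 1;
	r.reserve = higher_pages / ratio;
	/* A zero cap (empty lower zone) leaves the reserve unclamped. */
	if (max_reserve && r.reserve > max_reserve) {
		r.reserve = max_reserve;
		/* Raise the ratio so it reflects the clamped reserve. */
		ratio = higher_pages / max_reserve;
	}
	r.ratio = ratio;
	return r;
}
```

As an illustration with made-up numbers, 768000 higher pages at a ratio of
256 would ask for 3000 pages from a 4000 page lower zone; the clamp brings
that down to 1000 pages and raises the ratio to 768.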
--
-ck