linux-kernel.vger.kernel.org archive mirror
* 2.6.16-ck3
@ 2006-04-02  4:01 Con Kolivas
  2006-04-02  4:46 ` 2.6.16-ck3 Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-02  4:01 UTC (permalink / raw)
  To: ck list; +Cc: linux list


These are patches designed to improve system responsiveness and interactivity. 
They are configurable for any workload, but the default ck patch is aimed at 
the desktop, while cks is available with more emphasis on server workloads.

THESE INCLUDE THE PATCHES FROM 2.6.16.1 SO START WITH 2.6.16 AS YOUR BASE

Apply to 2.6.16
http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/2.6.16/2.6.16-ck3/patch-2.6.16-ck3.bz2

or server version
http://www.kernel.org/pub/linux/kernel/people/ck/patches/cks/patch-2.6.16-cks3.bz2

web:
http://kernel.kolivas.org

all patches:
http://www.kernel.org/pub/linux/kernel/people/ck/patches/

Split patches available.


Changes:

Added:
 +sched-staircase14.2_15.patch
A major improvement in the staircase code fixes poor I/O throughput under 
heavy CPU load, makes the scheduler starvation-resistant, improves behaviour 
under heavy "system" loads, fixes a slowdown in "compute" mode, and adds 
numerous other micro-optimisations.


Modified:
 -swsusp-post_resume_aggressive_swap_prefetch.patch
 +swsusp-post_resume_aggressive_swap_prefetch-1.patch
A minor change makes the aggressive swap prefetching on swsusp resume start 
slightly later, which should speed up resume time.

 -2.6.16-ck2-version.patch
 +2.6.16-ck3-version.patch
Version update


Cheers,
Con


* Re: 2.6.16-ck3
  2006-04-02  4:01 2.6.16-ck3 Con Kolivas
@ 2006-04-02  4:46 ` Nick Piggin
  2006-04-02  8:51   ` 2.6.16-ck3 Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-02  4:46 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux list, Andrew Morton

Con Kolivas wrote:
> These are patches designed to improve system responsiveness and interactivity. 
> They are configurable for any workload, but the default ck patch is aimed at 
> the desktop, while cks is available with more emphasis on server workloads.
> 
> THESE INCLUDE THE PATCHES FROM 2.6.16.1 SO START WITH 2.6.16 AS YOUR BASE
> 
> Apply to 2.6.16
> http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/2.6.16/2.6.16-ck3/patch-2.6.16-ck3.bz2
> 

The swap prefetching here, and the one in -mm, AFAICS still do not follow
the lowmem reserve ratio correctly. This might explain why prefetching
appears to help some people after updatedb swaps stuff out to make room
for pagecache -- it may actually be dipping into lower zones when it
shouldn't.
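
For reference, the check that prefetch ought to mirror looks roughly like
this (a simplified sketch of 2.6.16's zone_watermark_ok(); the higher-order
loop and the allocation-flag adjustments are omitted):

	/*
	 * A zone only passes if its free pages exceed the requested
	 * watermark *plus* the lowmem reserve for the allocation's
	 * classzone; skipping the reserve term lets a caller eat into
	 * memory that should be protected for lower zones.
	 */
	int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
			      int classzone_idx, int alloc_flags)
	{
		long free_pages = z->free_pages - (1 << order) + 1;

		if (free_pages <= mark + z->lowmem_reserve[classzone_idx])
			return 0;
		return 1;
	}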

-- 
SUSE Labs, Novell Inc.

* Re: 2.6.16-ck3
  2006-04-02  4:46 ` 2.6.16-ck3 Nick Piggin
@ 2006-04-02  8:51   ` Con Kolivas
  2006-04-02  9:37     ` 2.6.16-ck3 Nick Piggin
  2006-04-02  9:39     ` [ck] 2.6.16-ck3 Con Kolivas
  0 siblings, 2 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-02  8:51 UTC (permalink / raw)
  To: Nick Piggin; +Cc: ck list, linux list, Andrew Morton

On Sunday 02 April 2006 14:46, Nick Piggin wrote:
> The swap prefetching here, and the one in -mm, AFAICS still do not follow
> the lowmem reserve ratio correctly. This might explain why prefetching
> appears to help some people after updatedb swaps stuff out to make room
> for pagecache -- it may actually be dipping into lower zones when it
> shouldn't.

Curious. I was under the impression lowmem reserve only did anything if you 
manually set it, and the users reporting on swap prefetch behaviour are not 
the sort of users likely to do so. I'm happy to fix whatever the lowmem 
reserve bug is but I doubt this bug is making swap prefetch behave better for 
ordinary users. Well, whatever the case is I'll have another look at lowmem 
reserve of course. 

Cheers,
Con


* Re: 2.6.16-ck3
  2006-04-02  8:51   ` 2.6.16-ck3 Con Kolivas
@ 2006-04-02  9:37     ` Nick Piggin
  2006-04-02  9:39     ` [ck] 2.6.16-ck3 Con Kolivas
  1 sibling, 0 replies; 33+ messages in thread
From: Nick Piggin @ 2006-04-02  9:37 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck list, linux list, Andrew Morton

Con Kolivas wrote:
> On Sunday 02 April 2006 14:46, Nick Piggin wrote:
> 
> Curious. I was under the impression lowmem reserve only did anything if you 
> manually set it, and the users reporting on swap prefetch behaviour are not 
> the sort of users likely to do so. I'm happy to fix whatever the lowmem 

It has been enabled by default for over a year.

> reserve bug is but I doubt this bug is making swap prefetch behave better for 
> ordinary users. Well, whatever the case is I'll have another look at lowmem 
> reserve of course. 
> 

It would potentially make swap prefetch very happy to swap pages into the
dma zone and the normal zone on highmem systems when the system is
otherwise full of pagecache. So it might easily change behaviour on those
systems.

-- 
SUSE Labs, Novell Inc.

* Re: [ck] Re: 2.6.16-ck3
  2006-04-02  8:51   ` 2.6.16-ck3 Con Kolivas
  2006-04-02  9:37     ` 2.6.16-ck3 Nick Piggin
@ 2006-04-02  9:39     ` Con Kolivas
  2006-04-02  9:51       ` Nick Piggin
  1 sibling, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-02  9:39 UTC (permalink / raw)
  To: ck; +Cc: Nick Piggin, Andrew Morton, linux list

On Sunday 02 April 2006 18:51, Con Kolivas wrote:
> On Sunday 02 April 2006 14:46, Nick Piggin wrote:
> > The swap prefetching here, and the one in -mm, AFAICS still do not follow
> > the lowmem reserve ratio correctly. This might explain why prefetching
> > appears to help some people after updatedb swaps stuff out to make room
> > for pagecache -- it may actually be dipping into lower zones when it
> > shouldn't.
>
> Curious. I was under the impression lowmem reserve only did anything if you
> manually set it, and the users reporting on swap prefetch behaviour are not
> the sort of users likely to do so. I'm happy to fix whatever the lowmem
> reserve bug is but I doubt this bug is making swap prefetch behave better
> for ordinary users. Well, whatever the case is I'll have another look at
> lowmem reserve of course.

Ok I can't see what I'm doing wrong.

here are my watermarks

idx = zone_idx(z);
ns->lowfree[idx] = z->pages_high * 3 + z->lowmem_reserve[idx];
ns->highfree[idx] = ns->lowfree[idx] + z->pages_high;

It's (3 * pages_high) + lowmem_reserve, which is well in excess of the 
reserve, so I can't see any problem. Am I missing something?

Cheers,
Con


* Re: [ck] Re: 2.6.16-ck3
  2006-04-02  9:39     ` [ck] 2.6.16-ck3 Con Kolivas
@ 2006-04-02  9:51       ` Nick Piggin
  2006-04-03  2:48         ` lowmem_reserve question Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-02  9:51 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck, Andrew Morton, linux list

Con Kolivas wrote:

> Ok I can't see what I'm doing wrong.
> 
> here are my watermarks
> 
> idx = zone_idx(z);
> ns->lowfree[idx] = z->pages_high * 3 + z->lowmem_reserve[idx];
> ns->highfree[idx] = ns->lowfree[idx] + z->pages_high;
> 
> It's (3 * pages_high) + lowmem_reserve, which is well in excess of the 
> reserve, so I can't see any problem. Am I missing something?
> 

That zone->lowmem_reserve[zone_idx(zone)] == 0 ?

;)

lowmem_reserve could be much bigger than zone->pages_high * 3, when the
higher zones are much larger.
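
(With a 3GB lowmem split, for instance, ZONE_DMA's reserve against
ZONE_NORMAL allocations works out to roughly 3072 pages at the default
ratio of 256, while pages_high * 3 for a 16MB DMA zone is only on the
order of 60-70 pages.)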

-- 
SUSE Labs, Novell Inc.

* lowmem_reserve question
  2006-04-02  9:51       ` Nick Piggin
@ 2006-04-03  2:48         ` Con Kolivas
  2006-04-03  4:42           ` Mike Galbraith
  2006-04-04  2:35           ` [ck] " Con Kolivas
  0 siblings, 2 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-03  2:48 UTC (permalink / raw)
  To: Nick Piggin; +Cc: ck, Andrew Morton, linux list

On Sunday 02 April 2006 19:51, Nick Piggin wrote:
> That zone->lowmem_reserve[zone_idx(zone)] == 0 ?

I haven't figured out how to tackle the swap prefetch issue with lowmem 
reserve just yet. While trying to digest just what the lowmem_reserve does 
and how it's utilised, I looked at some of the numbers:

int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = { 256, 256, 32 };

lower_zone->lowmem_reserve[j] = present_pages /
	sysctl_lowmem_reserve_ratio[idx];

This is interesting because there are no bounds on this value, and it seems 
possible to set the sysctl to give a lowmem_reserve that is larger than the 
zone size. Ok, that's a sysctl, so if a user sets it wrongly that's their 
fault... or should there be some upper bound?

Furthermore, now that we have the option of up to 3GB lowmem split on 32bit we 
can have a default lowmem_reserve of almost 12MB (if I'm reading it right) 
which seems very tight with only 16MB of ZONE_DMA. 
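
(The arithmetic, in case I'm misreading it: ~3GB of lowmem is ~786432
pages, and 786432 / 256 = 3072 pages, i.e. 12MB of reserve protecting a
4096-page ZONE_DMA.)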

On a basically idle 1GB lowmem box that I have it looks like this:

Node 0, zone      DMA
  pages free     1025
        min      15
        low      18
        high     22
        active   2185
        inactive 0
        scanned  555 (a: 21 i: 6)
        spanned  4096
        present  4096
        protection: (0, 0, 1007, 1007)

With 3GB lowmem the default settings seem too tight to me. The way I see it, 
there should be some upper bounds on the lowmem reserves. Or perhaps I'm just 
missing something again... I'm feeling even thicker than usual.

Cheers,
Con


* Re: lowmem_reserve question
  2006-04-03  2:48         ` lowmem_reserve question Con Kolivas
@ 2006-04-03  4:42           ` Mike Galbraith
  2006-04-03  4:48             ` Con Kolivas
  2006-04-04  2:35           ` [ck] " Con Kolivas
  1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2006-04-03  4:42 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux list

On Mon, 2006-04-03 at 12:48 +1000, Con Kolivas wrote:
> Furthermore, now that we have the option of up to 3GB lowmem split on 32bit we 
> can have a default lowmem_reserve of almost 12MB (if I'm reading it right) 
> which seems very tight with only 16MB of ZONE_DMA. 

I haven't crawled around in the vm for ages, but I think that's only
16MB if you support antique cards.

	-Mike



* Re: lowmem_reserve question
  2006-04-03  4:42           ` Mike Galbraith
@ 2006-04-03  4:48             ` Con Kolivas
  2006-04-03  4:50               ` [ck] " Con Kolivas
  2006-04-03  5:14               ` Mike Galbraith
  0 siblings, 2 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-03  4:48 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Nick Piggin, ck, Andrew Morton, linux list

On Monday 03 April 2006 14:42, Mike Galbraith wrote:
> On Mon, 2006-04-03 at 12:48 +1000, Con Kolivas wrote:
> > Furthermore, now that we have the option of up to 3GB lowmem split on
> > 32bit we can have a default lowmem_reserve of almost 12MB (if I'm reading
> > it right) which seems very tight with only 16MB of ZONE_DMA.
>
> I haven't crawled around in the vm for ages, but I think that's only
> 16MB if you support antique cards.

That's what the RAM is used for, but that is all the ZONE_DMA 32bit machines 
get, whether you use it for that purpose or not. My concern is that this will 
have all sorts of effects on the balancing since it will always appear almost 
full.

Cheers,
Con


* Re: [ck] Re: lowmem_reserve question
  2006-04-03  4:48             ` Con Kolivas
@ 2006-04-03  4:50               ` Con Kolivas
  2006-04-03  5:14               ` Mike Galbraith
  1 sibling, 0 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-03  4:50 UTC (permalink / raw)
  To: ck; +Cc: Mike Galbraith, Nick Piggin, linux list, Andrew Morton

On Monday 03 April 2006 14:48, Con Kolivas wrote:
> On Monday 03 April 2006 14:42, Mike Galbraith wrote:
> > On Mon, 2006-04-03 at 12:48 +1000, Con Kolivas wrote:
> > > Furthermore, now that we have the option of up to 3GB lowmem split on
> > > 32bit we can have a default lowmem_reserve of almost 12MB (if I'm
> > > reading it right) which seems very tight with only 16MB of ZONE_DMA.
> >
> > I haven't crawled around in the vm for ages, but I think that's only
> > 16MB if you support antique cards.
>
> That's what the RAM is used for, but that is all the ZONE_DMA 32bit
> machines get

32bit i386 architecture I mean.

> , whether you use it for that purpose or not. My concern is 
> that this will have all sorts of effects on the balancing since it will
> always appear almost full.

Cheers,
Con


* Re: lowmem_reserve question
  2006-04-03  4:48             ` Con Kolivas
  2006-04-03  4:50               ` [ck] " Con Kolivas
@ 2006-04-03  5:14               ` Mike Galbraith
  2006-04-03  5:18                 ` Con Kolivas
  1 sibling, 1 reply; 33+ messages in thread
From: Mike Galbraith @ 2006-04-03  5:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux list

On Mon, 2006-04-03 at 14:48 +1000, Con Kolivas wrote:
> On Monday 03 April 2006 14:42, Mike Galbraith wrote:
> > On Mon, 2006-04-03 at 12:48 +1000, Con Kolivas wrote:
> > > Furthermore, now that we have the option of up to 3GB lowmem split on
> > > 32bit we can have a default lowmem_reserve of almost 12MB (if I'm reading
> > > it right) which seems very tight with only 16MB of ZONE_DMA.
> >
> > I haven't crawled around in the vm for ages, but I think that's only
> > 16MB if you support antique cards.
> 
> That's what the RAM is used for, but that is all the ZONE_DMA 32bit machines 
> get, whether you use it for that purpose or not. My concern is that this will 
> have all sorts of effects on the balancing since it will always appear almost 
> full.

If that dinky 16MB zone still exists, and always appears nearly full, be
happy.  It used to be a real PITA.

	-Mike



* Re: lowmem_reserve question
  2006-04-03  5:14               ` Mike Galbraith
@ 2006-04-03  5:18                 ` Con Kolivas
  2006-04-03  5:31                   ` Mike Galbraith
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-03  5:18 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Nick Piggin, ck, Andrew Morton, linux list

On Monday 03 April 2006 15:14, Mike Galbraith wrote:
> On Mon, 2006-04-03 at 14:48 +1000, Con Kolivas wrote:
> > On Monday 03 April 2006 14:42, Mike Galbraith wrote:
> > > On Mon, 2006-04-03 at 12:48 +1000, Con Kolivas wrote:
> > > > Furthermore, now that we have the option of up to 3GB lowmem split on
> > > > 32bit we can have a default lowmem_reserve of almost 12MB (if I'm
> > > > reading it right) which seems very tight with only 16MB of ZONE_DMA.
> > >
> > > I haven't crawled around in the vm for ages, but I think that's only
> > > 16MB if you support antique cards.
> >
> > That's what the RAM is used for, but that is all the ZONE_DMA 32bit
> > machines get, whether you use it for that purpose or not. My concern is
> > that this will have all sorts of effects on the balancing since it will
> > always appear almost full.
>
> If that dinky 16MB zone still exists, and always appears nearly full, be
> happy.  It used to be a real PITA.

That's not the point. If you try to do any allocation anywhere else it also 
checks that zone, and it will find it full (always) leading to reclaim all 
over the place for no good reason. This has nothing to do with actually 
wanting to use that space or otherwise.
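
A rough sketch of what I mean, paraphrasing the allocator's zonelist walk
(not the exact 2.6.16 source; flag handling trimmed):

	/*
	 * Every GFP_KERNEL allocation walks the zonelist from
	 * ZONE_NORMAL down to ZONE_DMA. When it falls back to the DMA
	 * zone it tests pages_low + lowmem_reserve[ZONE_NORMAL]; with
	 * ~12MB of reserve on a 16MB zone that test essentially never
	 * passes, so the zone always looks full to the allocator.
	 */
	for (z = zonelist->zones; *z != NULL; z++) {
		if (!zone_watermark_ok(*z, order, (*z)->pages_low,
				       classzone_idx, alloc_flags))
			continue;	/* looks full -> adds to pressure */
		page = buffered_rmqueue(*z, order, gfp_mask);
		if (page)
			return page;
	}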

Cheers,
Con


* Re: lowmem_reserve question
  2006-04-03  5:18                 ` Con Kolivas
@ 2006-04-03  5:31                   ` Mike Galbraith
  0 siblings, 0 replies; 33+ messages in thread
From: Mike Galbraith @ 2006-04-03  5:31 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Nick Piggin, ck, Andrew Morton, linux list

On Mon, 2006-04-03 at 15:18 +1000, Con Kolivas wrote:
> On Monday 03 April 2006 15:14, Mike Galbraith wrote:
> > If that dinky 16MB zone still exists, and always appears nearly full, be
> > happy.  It used to be a real PITA.
> 
> That's not the point. If you try to do any allocation anywhere else it also 
> checks that zone, and it will find it full (always) leading to reclaim all 
> over the place for no good reason. This has nothing to do with actually 
> wanting to use that space or otherwise.

That doesn't make any sense.  Why would you scan/reclaim if the zone is
not depleted?

Like I said, I'm _way_ out of date.  The problem scenario used to be
that you run low on memory, dip into dinky dma zone, pin it, then grind
to powder trying to find a reclaimable dma page.

	-Mike



* Re: [ck] lowmem_reserve question
  2006-04-03  2:48         ` lowmem_reserve question Con Kolivas
  2006-04-03  4:42           ` Mike Galbraith
@ 2006-04-04  2:35           ` Con Kolivas
  2006-04-06  1:10             ` [PATCH] mm: limit lowmem_reserve Con Kolivas
  1 sibling, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-04  2:35 UTC (permalink / raw)
  To: ck; +Cc: Nick Piggin, Andrew Morton, linux list

On Mon, 3 Apr 2006 12:48 pm, Con Kolivas wrote:
> While trying to digest just what the lowmem_reserve does
> and how it's utilised, I looked at some of the numbers:
>
> int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = { 256, 256, 32 };
>
> lower_zone->lowmem_reserve[j] = present_pages /
> 	sysctl_lowmem_reserve_ratio[idx];
>
> This is interesting because there are no bounds on this value, and it seems
> possible to set the sysctl to give a lowmem_reserve that is larger than the
> zone size. Ok, that's a sysctl, so if a user sets it wrongly that's
> their fault... or should there be some upper bound?
>
> Furthermore, now that we have the option of up to 3GB lowmem split on 32bit
> we can have a default lowmem_reserve of almost 12MB (if I'm reading it
> right) which seems very tight with only 16MB of ZONE_DMA.
>
> On a basically idle 1GB lowmem box that I have it looks like this:
>
> Node 0, zone      DMA
>   pages free     1025
>         min      15
>         low      18
>         high     22
>         active   2185
>         inactive 0
>         scanned  555 (a: 21 i: 6)
>         spanned  4096
>         present  4096
>         protection: (0, 0, 1007, 1007)
>
> With 3GB lowmem the default settings seem too tight to me. The way I see
> it, there should be some upper bounds on the lowmem reserves. Or perhaps
> I'm just missing something again... I'm feeling even thicker than usual.

Silence. Low priority I guess.

If I propose a patch that might get some response. /me threatens to post a 
patch.

Cheers,
Con


* [PATCH] mm: limit lowmem_reserve
  2006-04-04  2:35           ` [ck] " Con Kolivas
@ 2006-04-06  1:10             ` Con Kolivas
  2006-04-06  1:29               ` Respin: " Con Kolivas
  2006-04-07  6:25               ` Nick Piggin
  0 siblings, 2 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  1:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ck, Nick Piggin, linux list, linux-mm

It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok always fail if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:09:17.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}


* Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:10             ` [PATCH] mm: limit lowmem_reserve Con Kolivas
@ 2006-04-06  1:29               ` Con Kolivas
  2006-04-06  2:43                 ` Andrew Morton
  2006-04-07  6:25               ` Nick Piggin
  1 sibling, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  1:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, ck, Nick Piggin, linux-mm

Err, the zone needs to have some pages too, sorry.

Respin
---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:29               ` Respin: " Con Kolivas
@ 2006-04-06  2:43                 ` Andrew Morton
  2006-04-06  2:55                   ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2006-04-06  2:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, ck, nickpiggin, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> It is possible with a low enough lowmem_reserve ratio to make
>  zone_watermark_ok fail repeatedly if the lower_zone is small enough.

Is that actually a problem?


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:43                 ` Andrew Morton
@ 2006-04-06  2:55                   ` Con Kolivas
  2006-04-06  2:58                     ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  2:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, ck, nickpiggin, linux-mm

On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
>
> Is that actually a problem?

Every single call to get_page_from_freelist will call on zone reclaim. It 
seems a problem to me if every call to __alloc_pages will do that?
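
That is, in get_page_from_freelist we have (roughly, from the tree I'm
looking at; the watermark selection is trimmed):

	if (!zone_watermark_ok(*z, order, mark, classzone_idx, alloc_flags)) {
		/*
		 * The watermark failed, so zone_reclaim() is tried
		 * before the zone is skipped. If the watermark can
		 * never pass, this runs on every allocation attempt
		 * that considers the zone.
		 */
		if (!zone_reclaim_mode ||
		    !zone_reclaim(*z, gfp_mask, order))
			continue;
	}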

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:55                   ` Con Kolivas
@ 2006-04-06  2:58                     ` Con Kolivas
  2006-04-06  3:40                       ` Andrew Morton
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  2:58 UTC (permalink / raw)
  To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > It is possible with a low enough lowmem_reserve ratio to make
> > >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> >
> > Is that actually a problem?
>
> Every single call to get_page_from_freelist will call on zone reclaim. It
> seems a problem to me if every call to __alloc_pages will do that?

every call to __alloc_pages of that zone I mean

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:58                     ` Con Kolivas
@ 2006-04-06  3:40                       ` Andrew Morton
  2006-04-06  4:36                         ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2006-04-06  3:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck, nickpiggin, linux-kernel, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > It is possible with a low enough lowmem_reserve ratio to make
> > > >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> > >
> > > Is that actually a problem?
> >
> > Every single call to get_page_from_freelist will call on zone reclaim. It
> > seems a problem to me if every call to __alloc_pages will do that?
> 
> every call to __alloc_pages of that zone I mean
> 

One would need to check with the NUMA guys.  zone_reclaim() has a
(lame-looking) timer in there to prevent it from doing too much work.

That, or I'm missing something.  This problem wasn't particularly well
described, sorry.


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  3:40                       ` Andrew Morton
@ 2006-04-06  4:36                         ` Con Kolivas
  2006-04-06  4:52                           ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  4:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ck, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > >  zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > enough.
> > > >
> > > > Is that actually a problem?
> > >
> > > Every single call to get_page_from_freelist will call on zone reclaim.
> > > It seems a problem to me if every call to __alloc_pages will do that?
> >
> > every call to __alloc_pages of that zone I mean
>
> One would need to check with the NUMA guys.  zone_reclaim() has a
> (lame-looking) timer in there to prevent it from doing too much work.
>
> That, or I'm missing something.  This problem wasn't particularly well
> described, sorry.

Ah ok. This all came about because I'm trying to honour the lowmem_reserve 
better in swap_prefetch at Nick's request. It's hard to honour a watermark 
that on some configurations is never reached.

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  4:36                         ` Con Kolivas
@ 2006-04-06  4:52                           ` Con Kolivas
  0 siblings, 0 replies; 33+ messages in thread
From: Con Kolivas @ 2006-04-06  4:52 UTC (permalink / raw)
  To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 14:36, Con Kolivas wrote:
> On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > > >  zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > > enough.
> > > > >
> > > > > Is that actually a problem?
> > > >
> > > > Every single call to get_page_from_freelist will call on zone
> > > > reclaim. It seems a problem to me if every call to __alloc_pages will
> > > > do that?
> > >
> > > every call to __alloc_pages of that zone I mean
> >
> > One would need to check with the NUMA guys.  zone_reclaim() has a
> > (lame-looking) timer in there to prevent it from doing too much work.
> >
> > That, or I'm missing something.  This problem wasn't particularly well
> > described, sorry.
>
> Ah ok. This all came about because I'm trying to honour the lowmem_reserve
> better in swap_prefetch at Nick's request. It's hard to honour a watermark
> that on some configurations is never reached.

Forget that. If the numa people don't care about it I shouldn't touch it. I 
thought I was doing something helpful at the source but got no response from 
Nick or the other numa_ids out there, so they obviously don't care. I'll 
tackle it differently in swap prefetch.

Cheers,
Con


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:10             ` [PATCH] mm: limit lowmem_reserve Con Kolivas
  2006-04-06  1:29               ` Respin: " Con Kolivas
@ 2006-04-07  6:25               ` Nick Piggin
  2006-04-07  9:02                 ` Con Kolivas
  1 sibling, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-07  6:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> It is possible with a low enough lowmem_reserve ratio to make
> zone_watermark_ok always fail if the lower_zone is small enough.

I don't see how this would happen?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07  6:25               ` Nick Piggin
@ 2006-04-07  9:02                 ` Con Kolivas
  2006-04-07 12:40                   ` Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-07  9:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 16:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> > zone_watermark_ok always fail if the lower_zone is small enough.
>
> I don't see how this would happen?

3GB lowmem and a reserve ratio of 180 is enough to do it.
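
(~786432 pages of lowmem / 180 gives a reserve of ~4369 pages, which is
more than the whole 4096-page ZONE_DMA, so the watermark can never be
met no matter how much is freed.)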

Cheers,
Con


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07  9:02                 ` Con Kolivas
@ 2006-04-07 12:40                   ` Nick Piggin
  2006-04-08  0:15                     ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-07 12:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 16:25, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>It is possible with a low enough lowmem_reserve ratio to make
>>>zone_watermark_ok always fail if the lower_zone is small enough.
>>
>>I don't see how this would happen?
> 
> 
> 3GB lowmem and a reserve ratio of 180 is enough to do it.
> 

How would zone_watermark_ok always fail though?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07 12:40                   ` Nick Piggin
@ 2006-04-08  0:15                     ` Con Kolivas
  2006-04-08  0:55                       ` Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-08  0:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 22:40, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 16:25, Nick Piggin wrote:
> >>Con Kolivas wrote:
> >>>It is possible with a low enough lowmem_reserve ratio to make
> >>>zone_watermark_ok always fail if the lower_zone is small enough.
> >>
> >>I don't see how this would happen?
> >
> > 3GB lowmem and a reserve ratio of 180 is enough to do it.
>
> How would zone_watermark_ok always fail though?

Withdrew this patch a while back; ignore

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  0:15                     ` Con Kolivas
@ 2006-04-08  0:55                       ` Nick Piggin
  2006-04-08  1:01                         ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-08  0:55 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 22:40, Nick Piggin wrote:
> 

>>How would zone_watermark_ok always fail though?
> 
> 
> Withdrew this patch a while back; ignore
> 

Well, whether or not that particular patch is a good idea, it
is definitely a bug if zone_watermark_ok could ever always
fail due to lowmem reserve, and we should fix it.

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  0:55                       ` Nick Piggin
@ 2006-04-08  1:01                         ` Con Kolivas
  2006-04-08  1:25                           ` Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-04-08  1:01 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Saturday 08 April 2006 10:55, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 22:40, Nick Piggin wrote:
> >>How would zone_watermark_ok always fail though?
> >
> > Withdrew this patch a while back; ignore
>
> Well, whether or not that particular patch is a good idea, it
> is definitely a bug if zone_watermark_ok could ever always
> fail due to lowmem reserve, and we should fix it.

Ok. I think I presented enough information for why I thought zone_watermark_ok 
would fail (for ZONE_DMA). With 16MB of ZONE_DMA and a vmsplit of 3GB we have 
a lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free; I 
don't think I've ever seen that much free on my ZONE_DMA on an ordinary 
desktop without any particular ZONE_DMA users. Changing the tunable can make 
the lowmem_reserve larger than all of ZONE_DMA on any vmsplit too, as far as 
I understand the ratio.

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  1:01                         ` Con Kolivas
@ 2006-04-08  1:25                           ` Nick Piggin
  2006-05-17 14:11                             ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-04-08  1:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Saturday 08 April 2006 10:55, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>On Friday 07 April 2006 22:40, Nick Piggin wrote:
>>>
>>>>How would zone_watermark_ok always fail though?
>>>
>>>Withdrew this patch a while back; ignore
>>
>>Well, whether or not that particular patch is a good idea, it
>>is definitely a bug if zone_watermark_ok could ever always
>>fail due to lowmem reserve, and we should fix it.
> 
> 
> Ok. I think I presented enough information for why I thought zone_watermark_ok 
> would fail (for ZONE_DMA). With 16MB of ZONE_DMA and a vmsplit of 3GB we have 
> a lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free; I 
> don't think I've ever seen that much free on my ZONE_DMA on an ordinary 
> desktop without any particular ZONE_DMA users. Changing the tunable can make 
> the lowmem_reserve larger than all of ZONE_DMA on any vmsplit too, as far as 
> I understand the ratio.
> 

Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
12MB protection should never come into it (unless it is buggy?).

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  1:25                           ` Nick Piggin
@ 2006-05-17 14:11                             ` Con Kolivas
  2006-05-18  7:11                               ` Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-05-17 14:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

I hate to resuscitate this old thread, sorry, but I'm still not sure we 
resolved it, and I want to make sure the issue I'm seeing isn't really there.

On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > Ok. I think I presented enough information for why I thought
> > zone_watermark_ok would fail (for ZONE_DMA). With 16MB of ZONE_DMA and a
> > vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
> > that much ZONE_DMA free; I don't think I've ever seen that much free on
> > my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
> > Changing the tunable can make the lowmem_reserve larger than all of
> > ZONE_DMA on any vmsplit too, as far as I understand the ratio.
>
> Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
> 12MB protection should never come into it (unless it is buggy?).

An i386 PC with a 3GB split will have approx

4000 pages of ZONE_DMA

and the lowmem reserve code will set ZONE_DMA's lowmem_reserve to approx

0 0 3000 3000

So if we call zone_watermark_ok with a zone of ZONE_DMA and a classzone_idx 
of ZONE_NORMAL, the test will almost always fail, since it's almost 
impossible to have 3000 free ZONE_DMA pages. I believe it can happen like 
this:

In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a 
ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over 
all zones from 0 to end_zone and (vmscan.c:1147) we end up calling

if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))

which would now call zone_watermark_ok with zone being a ZONE_DMA, and 
end_zone being the idx of a ZONE_NORMAL.

So in summary if I'm not mistaken (and I'm good at being mistaken), if we 
balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end 
up trying to flush the crap out of ZONE_DMA.
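
Roughly, paraphrasing the loop rather than quoting it exactly:

	for (i = 0; i <= end_zone; i++) {
		struct zone *zone = pgdat->node_zones + i;

		/*
		 * With end_zone at ZONE_NORMAL, ZONE_DMA is tested
		 * against pages_high + lowmem_reserve[ZONE_NORMAL]
		 * (~3000 pages here), so it is judged below its
		 * watermark and scanned even though nothing wants
		 * DMA memory.
		 */
		if (!zone_watermark_ok(zone, order, zone->pages_high,
				       end_zone, 0))
			all_zones_ok = 0;
	}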

In my test case this indeed happens and my ZONE_DMA never goes below 3000
pages free. If I lower the reserve even further my pages free gets stuck at
3208 and can't free any more, and doesn't ever drop below that either.

Here is the patch I was proposing

---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}


-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-17 14:11                             ` Con Kolivas
@ 2006-05-18  7:11                               ` Nick Piggin
  2006-05-18  7:21                                 ` Con Kolivas
  0 siblings, 1 reply; 33+ messages in thread
From: Nick Piggin @ 2006-05-18  7:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> I hate to resuscitate this old thread, sorry, but I'm still not sure we 
> resolved it, and I want to make sure the issue I'm seeing isn't really there.
> 

OK, reclaim is slightly different.

> On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>Ok. I think I presented enough information for why I thought
>>>zone_watermark_ok would fail (for ZONE_DMA). With 16MB of ZONE_DMA and a
>>>vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
>>>that much ZONE_DMA free; I don't think I've ever seen that much free on
>>>my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
>>>Changing the tunable can make the lowmem_reserve larger than all of
>>>ZONE_DMA on any vmsplit too, as far as I understand the ratio.
>>
>>Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
>>12MB protection should never come into it (unless it is buggy?).
> 
> 
> An i386 PC with a 3GB split will have approx
> 
> 4000 pages of ZONE_DMA
> 
> and the lowmem reserve code will set ZONE_DMA's lowmem_reserve to approx
> 
> 0 0 3000 3000
> 
> So if we call zone_watermark_ok with a zone of ZONE_DMA and a classzone_idx 
> of ZONE_NORMAL, the test will almost always fail, since it's almost 
> impossible to have 3000 free ZONE_DMA pages. I believe it can happen like 
> this:
> 
> In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a 
> ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over 
> all zones from 0 to end_zone and (vmscan.c:1147) we end up calling
> 
> if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))
> 
> which would now call zone_watermark_ok with zone being a ZONE_DMA, and 
> end_zone being the idx of a ZONE_NORMAL.
> 
> So in summary if I'm not mistaken (and I'm good at being mistaken), if we 
> balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end 
> up trying to flush the crap out of ZONE_DMA.

If we're under memory pressure, kswapd will try to free up any candidate
zone, yes.

> 
> In my test case this indeed happens and my ZONE_DMA never goes below 3000
> pages free. If I lower the reserve even further my pages free gets stuck at
> 3208 and can't free any more, and doesn't ever drop below that either.
> 
> Here is the patch I was proposing

What problem does that fix though?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-18  7:11                               ` Nick Piggin
@ 2006-05-18  7:21                                 ` Con Kolivas
  2006-05-18  7:26                                   ` Nick Piggin
  0 siblings, 1 reply; 33+ messages in thread
From: Con Kolivas @ 2006-05-18  7:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Thursday 18 May 2006 17:11, Nick Piggin wrote:
> If we're under memory pressure, kswapd will try to free up any candidate
> zone, yes.
>
> > On my test case this indeed happens and my ZONE_DMA never goes below 3000
> > pages free. If I lower the reserve even further my pages free gets stuck
> > at 3208 and can't free any more, and doesn't ever drop below that either.
> >
> > Here is the patch I was proposing
>
> What problem does that fix though?

It's a generic concern, and I honestly don't know how significant it is, which 
is why I'm asking if it needs attention. The concern is that any time we're 
under any sort of memory pressure, ZONE_DMA will undergo intense reclaim even 
though there may not really be anything specifically going on in ZONE_DMA. It 
just seems a waste of cycles.

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-18  7:21                                 ` Con Kolivas
@ 2006-05-18  7:26                                   ` Nick Piggin
  0 siblings, 0 replies; 33+ messages in thread
From: Nick Piggin @ 2006-05-18  7:26 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Thursday 18 May 2006 17:11, Nick Piggin wrote:
> 
>>If we're under memory pressure, kswapd will try to free up any candidate
>>zone, yes.
>>
>>
>>>On my test case this indeed happens and my ZONE_DMA never goes below 3000
>>>pages free. If I lower the reserve even further my pages free gets stuck
>>>at 3208 and can't free any more, and doesn't ever drop below that either.
>>>
>>>Here is the patch I was proposing
>>
>>What problem does that fix though?
> 
> 
> It's a generic concern, and I honestly don't know how significant it is, which 
> is why I'm asking if it needs attention. The concern is that any time we're 
> under any sort of memory pressure, ZONE_DMA will undergo intense reclaim even 
> though there may not really be anything specifically going on in ZONE_DMA. It 
> just seems a waste of cycles.
> 

If it doesn't have any/much pagecache or slab cache in it, there won't be
intense reclaim; if it does, then it can be reclaimed and the memory used.

Reclaim / allocation could be slightly smarter about scaling watermarks;
however, I don't think it is much of an issue at the moment.

-- 
SUSE Labs, Novell Inc.
