linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
@ 2008-09-29 17:10 Gerald Schaefer
  2008-09-29 17:36 ` Andy Whitcroft
  0 siblings, 1 reply; 8+ messages in thread
From: Gerald Schaefer @ 2008-09-29 17:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, schwidefsky, heiko.carstens, KAMEZAWA Hiroyuki,
	Yasunori Goto, Mel Gorman, Andy Whitcroft, Andrew Morton

Hi,

is zone->lru_lock really the right lock to take in setup_per_zone_pages_min()?
All other functions in mm/page_alloc.c take zone->lock instead, for working
with page->lru free-list or PageBuddy().

setup_per_zone_pages_min() eventually calls move_freepages(), which is also
manipulating the page->lru free-list and checking for PageBuddy(). Both
should be protected by zone->lock instead of zone->lru_lock, if I understood
that right, or else there could be a race with the other functions in
mm/page_alloc.c.

We ran into a list corruption bug in free_pages_bulk() once, during memory
hotplug stress test, but cannot reproduce it easily. So I cannot verify if
using zone->lock instead of zone->lru_lock would fix it, but to me it looks
like this may be the problem.

Any thoughts?

BTW, I also wonder if a spin_lock_irq() would be enough, instead of
spin_lock_irqsave(), because this function should never be called from
interrupt context, right?

Thanks,
Gerald



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
  2008-09-29 17:10 setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock Gerald Schaefer
@ 2008-09-29 17:36 ` Andy Whitcroft
  2008-09-29 21:20   ` Gerald Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2008-09-29 17:36 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: linux-kernel, linux-mm, schwidefsky, heiko.carstens,
	KAMEZAWA Hiroyuki, Yasunori Goto, Mel Gorman, Andrew Morton

On Mon, Sep 29, 2008 at 07:10:57PM +0200, Gerald Schaefer wrote:
> Hi,
> 
> is zone->lru_lock really the right lock to take in setup_per_zone_pages_min()?
> All other functions in mm/page_alloc.c take zone->lock instead, for working
> with page->lru free-list or PageBuddy().
> 
> setup_per_zone_pages_min() eventually calls move_freepages(), which is also
> manipulating the page->lru free-list and checking for PageBuddy(). Both
> should be protected by zone->lock instead of zone->lru_lock, if I understood
> that right, or else there could be a race with the other functions in
> mm/page_alloc.c.
> 
> We ran into a list corruption bug in free_pages_bulk() once, during memory
> hotplug stress test, but cannot reproduce it easily. So I cannot verify if
> using zone->lock instead of zone->lru_lock would fix it, but to me it looks
> like this may be the problem.
> 
> Any thoughts?
> 
> BTW, I also wonder if a spin_lock_irq() would be enough, instead of
> spin_lock_irqsave(), because this function should never be called from
> interrupt context, right?

The allocator protects it freelists using zone->lock (as we can see in
rmqueue_bulk), so anything which manipulates those should also be using
that lock.  move_freepages() is scanning the cmap and picking up free
pages directly off the free lists, it is expecting those lists to be
stable; it would appear to need zone->lock.  It does look like
setup_per_zone_pages_min() is holding the wrong thing at first look.

-apw

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
  2008-09-29 17:36 ` Andy Whitcroft
@ 2008-09-29 21:20   ` Gerald Schaefer
  2008-09-30  0:40     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 8+ messages in thread
From: Gerald Schaefer @ 2008-09-29 21:20 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, schwidefsky, heiko.carstens,
	KAMEZAWA Hiroyuki, Yasunori Goto, Mel Gorman, Andrew Morton

On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> The allocator protects it freelists using zone->lock (as we can see in
> rmqueue_bulk), so anything which manipulates those should also be using
> that lock.  move_freepages() is scanning the cmap and picking up free
> pages directly off the free lists, it is expecting those lists to be
> stable; it would appear to need zone->lock.  It does look like
> setup_per_zone_pages_min() is holding the wrong thing at first look.

I just noticed that the spin_lock in that function is much older than the
call to setup_zone_migrate_reserve(), which then calls move_freepages().
So it seems that the zone->lru_lock there does (did?) have another purpose,
maybe protecting zone->present_pages/pages_min/etc.

Looks like the need for a zone->lock (if any) was added later, but I'm not
sure if makes sense to take both locks together, or if the lru_lock is still
needed at all.

Thanks,
Gerald



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
  2008-09-29 21:20   ` Gerald Schaefer
@ 2008-09-30  0:40     ` KAMEZAWA Hiroyuki
  2008-09-30  1:53       ` Yasunori Goto
  0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-30  0:40 UTC (permalink / raw)
  To: gerald.schaefer
  Cc: Andy Whitcroft, linux-kernel, linux-mm, schwidefsky,
	heiko.carstens, Yasunori Goto, Mel Gorman, Andrew Morton

On Mon, 29 Sep 2008 23:20:05 +0200
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> > The allocator protects it freelists using zone->lock (as we can see in
> > rmqueue_bulk), so anything which manipulates those should also be using
> > that lock.  move_freepages() is scanning the cmap and picking up free
> > pages directly off the free lists, it is expecting those lists to be
> > stable; it would appear to need zone->lock.  It does look like
> > setup_per_zone_pages_min() is holding the wrong thing at first look.
> 
> I just noticed that the spin_lock in that function is much older than the
> call to setup_zone_migrate_reserve(), which then calls move_freepages().
> So it seems that the zone->lru_lock there does (did?) have another purpose,
> maybe protecting zone->present_pages/pages_min/etc.
> 
Maybe.

> Looks like the need for a zone->lock (if any) was added later, but I'm not
> sure if makes sense to take both locks together, or if the lru_lock is still
> needed at all.
>
At first look, replacing zone->lru_lock with zone->lock is enough...
This function is an only one function which use zone->lru_lock in page_alloc.c
And zone_watermark_ok() which access zone->pages_min/low/high is not under any
locks. So, taking zone->lru_lock here doesn't seem to be necessary...


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
  2008-09-30  0:40     ` KAMEZAWA Hiroyuki
@ 2008-09-30  1:53       ` Yasunori Goto
  2008-10-01 17:39         ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
  0 siblings, 1 reply; 8+ messages in thread
From: Yasunori Goto @ 2008-09-30  1:53 UTC (permalink / raw)
  To: gerald.schaefer
  Cc: Andy Whitcroft, linux-kernel, linux-mm, schwidefsky,
	heiko.carstens, Mel Gorman, Andrew Morton, KAMEZAWA Hiroyuki

> On Mon, 29 Sep 2008 23:20:05 +0200
> Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> 
> > On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> > > The allocator protects it freelists using zone->lock (as we can see in
> > > rmqueue_bulk), so anything which manipulates those should also be using
> > > that lock.  move_freepages() is scanning the cmap and picking up free
> > > pages directly off the free lists, it is expecting those lists to be
> > > stable; it would appear to need zone->lock.  It does look like
> > > setup_per_zone_pages_min() is holding the wrong thing at first look.
> > 
> > I just noticed that the spin_lock in that function is much older than the
> > call to setup_zone_migrate_reserve(), which then calls move_freepages().
> > So it seems that the zone->lru_lock there does (did?) have another purpose,
> > maybe protecting zone->present_pages/pages_min/etc.
> > 
> Maybe.

The zone->lru_lock() have been used before memory hotplug code was
implemented. But I can't find any reason why it have been used.

> 
> > Looks like the need for a zone->lock (if any) was added later, but I'm not
> > sure if makes sense to take both locks together, or if the lru_lock is still
> > needed at all.
> >
> At first look, replacing zone->lru_lock with zone->lock is enough...
> This function is an only one function which use zone->lru_lock in page_alloc.c
> And zone_watermark_ok() which access zone->pages_min/low/high is not under any
> locks. So, taking zone->lru_lock here doesn't seem to be necessary...

I agree.


Bye.
-- 
Yasunori Goto 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
  2008-09-30  1:53       ` Yasunori Goto
@ 2008-10-01 17:39         ` Gerald Schaefer
  2008-10-02  5:49           ` KAMEZAWA Hiroyuki
  2008-10-02 10:00           ` Yasunori Goto
  0 siblings, 2 replies; 8+ messages in thread
From: Gerald Schaefer @ 2008-10-01 17:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yasunori Goto, Andy Whitcroft, linux-kernel, linux-mm,
	schwidefsky, heiko.carstens, Mel Gorman, KAMEZAWA Hiroyuki

From: Gerald Schaefer <gerald.schaefer@de.ibm.com> 

This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
There seems to be no need for the lru_lock anymore, but there is a need for
zone->lock instead, because that function may call move_freepages() via
setup_zone_migrate_reserve().

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>

---
 mm/page_alloc.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
 	for_each_zone(zone) {
 		u64 tmp;
 
-		spin_lock_irqsave(&zone->lru_lock, flags);
+		spin_lock_irqsave(&zone->lock, flags);
 		tmp = (u64)pages_min * zone->present_pages;
 		do_div(tmp, lowmem_pages);
 		if (is_highmem(zone)) {
@@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
 		zone->pages_low   = zone->pages_min + (tmp >> 2);
 		zone->pages_high  = zone->pages_min + (tmp >> 1);
 		setup_zone_migrate_reserve(zone);
-		spin_unlock_irqrestore(&zone->lru_lock, flags);
+		spin_unlock_irqrestore(&zone->lock, flags);
 	}
 
 	/* update totalreserve_pages */



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
  2008-10-01 17:39         ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
@ 2008-10-02  5:49           ` KAMEZAWA Hiroyuki
  2008-10-02 10:00           ` Yasunori Goto
  1 sibling, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-10-02  5:49 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Andrew Morton, Yasunori Goto, Andy Whitcroft, linux-kernel,
	linux-mm, schwidefsky, heiko.carstens, Mel Gorman

On Wed, 01 Oct 2008 19:39:32 +0200
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> From: Gerald Schaefer <gerald.schaefer@de.ibm.com> 
> 
> This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
> There seems to be no need for the lru_lock anymore, but there is a need for
> zone->lock instead, because that function may call move_freepages() via
> setup_zone_migrate_reserve().
> 
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> 
Thank you!.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


> ---
>  mm/page_alloc.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
>  	for_each_zone(zone) {
>  		u64 tmp;
>  
> -		spin_lock_irqsave(&zone->lru_lock, flags);
> +		spin_lock_irqsave(&zone->lock, flags);
>  		tmp = (u64)pages_min * zone->present_pages;
>  		do_div(tmp, lowmem_pages);
>  		if (is_highmem(zone)) {
> @@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
>  		zone->pages_low   = zone->pages_min + (tmp >> 2);
>  		zone->pages_high  = zone->pages_min + (tmp >> 1);
>  		setup_zone_migrate_reserve(zone);
> -		spin_unlock_irqrestore(&zone->lru_lock, flags);
> +		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>  
>  	/* update totalreserve_pages */
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
  2008-10-01 17:39         ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
  2008-10-02  5:49           ` KAMEZAWA Hiroyuki
@ 2008-10-02 10:00           ` Yasunori Goto
  1 sibling, 0 replies; 8+ messages in thread
From: Yasunori Goto @ 2008-10-02 10:00 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Andrew Morton, Andy Whitcroft, linux-kernel, linux-mm,
	schwidefsky, heiko.carstens, Mel Gorman, KAMEZAWA Hiroyuki


Thanks!

Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>


> From: Gerald Schaefer <gerald.schaefer@de.ibm.com> 
> 
> This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
> There seems to be no need for the lru_lock anymore, but there is a need for
> zone->lock instead, because that function may call move_freepages() via
> setup_zone_migrate_reserve().
> 
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> 
> ---
>  mm/page_alloc.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
>  	for_each_zone(zone) {
>  		u64 tmp;
>  
> -		spin_lock_irqsave(&zone->lru_lock, flags);
> +		spin_lock_irqsave(&zone->lock, flags);
>  		tmp = (u64)pages_min * zone->present_pages;
>  		do_div(tmp, lowmem_pages);
>  		if (is_highmem(zone)) {
> @@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
>  		zone->pages_low   = zone->pages_min + (tmp >> 2);
>  		zone->pages_high  = zone->pages_min + (tmp >> 1);
>  		setup_zone_migrate_reserve(zone);
> -		spin_unlock_irqrestore(&zone->lru_lock, flags);
> +		spin_unlock_irqrestore(&zone->lock, flags);
>  	}
>  
>  	/* update totalreserve_pages */
> 
> 

-- 
Yasunori Goto 



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-10-02 10:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-29 17:10 setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock Gerald Schaefer
2008-09-29 17:36 ` Andy Whitcroft
2008-09-29 21:20   ` Gerald Schaefer
2008-09-30  0:40     ` KAMEZAWA Hiroyuki
2008-09-30  1:53       ` Yasunori Goto
2008-10-01 17:39         ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
2008-10-02  5:49           ` KAMEZAWA Hiroyuki
2008-10-02 10:00           ` Yasunori Goto

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).