* setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
@ 2008-09-29 17:10 Gerald Schaefer
2008-09-29 17:36 ` Andy Whitcroft
0 siblings, 1 reply; 8+ messages in thread
From: Gerald Schaefer @ 2008-09-29 17:10 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, schwidefsky, heiko.carstens, KAMEZAWA Hiroyuki,
Yasunori Goto, Mel Gorman, Andy Whitcroft, Andrew Morton
Hi,
is zone->lru_lock really the right lock to take in setup_per_zone_pages_min()?
All other functions in mm/page_alloc.c take zone->lock instead, for working
with page->lru free-list or PageBuddy().
setup_per_zone_pages_min() eventually calls move_freepages(), which is also
manipulating the page->lru free-list and checking for PageBuddy(). Both
should be protected by zone->lock instead of zone->lru_lock, if I understood
that right, or else there could be a race with the other functions in
mm/page_alloc.c.
We ran into a list corruption bug in free_pages_bulk() once, during memory
hotplug stress test, but cannot reproduce it easily. So I cannot verify if
using zone->lock instead of zone->lru_lock would fix it, but to me it looks
like this may be the problem.
Any thoughts?
BTW, I also wonder if a spin_lock_irq() would be enough, instead of
spin_lock_irqsave(), because this function should never be called from
interrupt context, right?
Thanks,
Gerald
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
2008-09-29 17:10 setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock Gerald Schaefer
@ 2008-09-29 17:36 ` Andy Whitcroft
2008-09-29 21:20 ` Gerald Schaefer
0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2008-09-29 17:36 UTC (permalink / raw)
To: Gerald Schaefer
Cc: linux-kernel, linux-mm, schwidefsky, heiko.carstens,
KAMEZAWA Hiroyuki, Yasunori Goto, Mel Gorman, Andrew Morton
On Mon, Sep 29, 2008 at 07:10:57PM +0200, Gerald Schaefer wrote:
> Hi,
>
> is zone->lru_lock really the right lock to take in setup_per_zone_pages_min()?
> All other functions in mm/page_alloc.c take zone->lock instead, for working
> with page->lru free-list or PageBuddy().
>
> setup_per_zone_pages_min() eventually calls move_freepages(), which is also
> manipulating the page->lru free-list and checking for PageBuddy(). Both
> should be protected by zone->lock instead of zone->lru_lock, if I understood
> that right, or else there could be a race with the other functions in
> mm/page_alloc.c.
>
> We ran into a list corruption bug in free_pages_bulk() once, during memory
> hotplug stress test, but cannot reproduce it easily. So I cannot verify if
> using zone->lock instead of zone->lru_lock would fix it, but to me it looks
> like this may be the problem.
>
> Any thoughts?
>
> BTW, I also wonder if a spin_lock_irq() would be enough, instead of
> spin_lock_irqsave(), because this function should never be called from
> interrupt context, right?
The allocator protects it freelists using zone->lock (as we can see in
rmqueue_bulk), so anything which manipulates those should also be using
that lock. move_freepages() is scanning the cmap and picking up free
pages directly off the free lists, it is expecting those lists to be
stable; it would appear to need zone->lock. It does look like
setup_per_zone_pages_min() is holding the wrong thing at first look.
-apw
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
2008-09-29 17:36 ` Andy Whitcroft
@ 2008-09-29 21:20 ` Gerald Schaefer
2008-09-30 0:40 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 8+ messages in thread
From: Gerald Schaefer @ 2008-09-29 21:20 UTC (permalink / raw)
To: Andy Whitcroft
Cc: linux-kernel, linux-mm, schwidefsky, heiko.carstens,
KAMEZAWA Hiroyuki, Yasunori Goto, Mel Gorman, Andrew Morton
On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> The allocator protects it freelists using zone->lock (as we can see in
> rmqueue_bulk), so anything which manipulates those should also be using
> that lock. move_freepages() is scanning the cmap and picking up free
> pages directly off the free lists, it is expecting those lists to be
> stable; it would appear to need zone->lock. It does look like
> setup_per_zone_pages_min() is holding the wrong thing at first look.
I just noticed that the spin_lock in that function is much older than the
call to setup_zone_migrate_reserve(), which then calls move_freepages().
So it seems that the zone->lru_lock there does (did?) have another purpose,
maybe protecting zone->present_pages/pages_min/etc.
Looks like the need for a zone->lock (if any) was added later, but I'm not
sure if makes sense to take both locks together, or if the lru_lock is still
needed at all.
Thanks,
Gerald
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
2008-09-29 21:20 ` Gerald Schaefer
@ 2008-09-30 0:40 ` KAMEZAWA Hiroyuki
2008-09-30 1:53 ` Yasunori Goto
0 siblings, 1 reply; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-09-30 0:40 UTC (permalink / raw)
To: gerald.schaefer
Cc: Andy Whitcroft, linux-kernel, linux-mm, schwidefsky,
heiko.carstens, Yasunori Goto, Mel Gorman, Andrew Morton
On Mon, 29 Sep 2008 23:20:05 +0200
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> > The allocator protects it freelists using zone->lock (as we can see in
> > rmqueue_bulk), so anything which manipulates those should also be using
> > that lock. move_freepages() is scanning the cmap and picking up free
> > pages directly off the free lists, it is expecting those lists to be
> > stable; it would appear to need zone->lock. It does look like
> > setup_per_zone_pages_min() is holding the wrong thing at first look.
>
> I just noticed that the spin_lock in that function is much older than the
> call to setup_zone_migrate_reserve(), which then calls move_freepages().
> So it seems that the zone->lru_lock there does (did?) have another purpose,
> maybe protecting zone->present_pages/pages_min/etc.
>
Maybe.
> Looks like the need for a zone->lock (if any) was added later, but I'm not
> sure if makes sense to take both locks together, or if the lru_lock is still
> needed at all.
>
At first look, replacing zone->lru_lock with zone->lock is enough...
This function is an only one function which use zone->lru_lock in page_alloc.c
And zone_watermark_ok() which access zone->pages_min/low/high is not under any
locks. So, taking zone->lru_lock here doesn't seem to be necessary...
Thanks,
-Kame
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock
2008-09-30 0:40 ` KAMEZAWA Hiroyuki
@ 2008-09-30 1:53 ` Yasunori Goto
2008-10-01 17:39 ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
0 siblings, 1 reply; 8+ messages in thread
From: Yasunori Goto @ 2008-09-30 1:53 UTC (permalink / raw)
To: gerald.schaefer
Cc: Andy Whitcroft, linux-kernel, linux-mm, schwidefsky,
heiko.carstens, Mel Gorman, Andrew Morton, KAMEZAWA Hiroyuki
> On Mon, 29 Sep 2008 23:20:05 +0200
> Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
>
> > On Mon, 2008-09-29 at 18:36 +0100, Andy Whitcroft wrote:
> > > The allocator protects it freelists using zone->lock (as we can see in
> > > rmqueue_bulk), so anything which manipulates those should also be using
> > > that lock. move_freepages() is scanning the cmap and picking up free
> > > pages directly off the free lists, it is expecting those lists to be
> > > stable; it would appear to need zone->lock. It does look like
> > > setup_per_zone_pages_min() is holding the wrong thing at first look.
> >
> > I just noticed that the spin_lock in that function is much older than the
> > call to setup_zone_migrate_reserve(), which then calls move_freepages().
> > So it seems that the zone->lru_lock there does (did?) have another purpose,
> > maybe protecting zone->present_pages/pages_min/etc.
> >
> Maybe.
The zone->lru_lock() have been used before memory hotplug code was
implemented. But I can't find any reason why it have been used.
>
> > Looks like the need for a zone->lock (if any) was added later, but I'm not
> > sure if makes sense to take both locks together, or if the lru_lock is still
> > needed at all.
> >
> At first look, replacing zone->lru_lock with zone->lock is enough...
> This function is an only one function which use zone->lru_lock in page_alloc.c
> And zone_watermark_ok() which access zone->pages_min/low/high is not under any
> locks. So, taking zone->lru_lock here doesn't seem to be necessary...
I agree.
Bye.
--
Yasunori Goto
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
2008-09-30 1:53 ` Yasunori Goto
@ 2008-10-01 17:39 ` Gerald Schaefer
2008-10-02 5:49 ` KAMEZAWA Hiroyuki
2008-10-02 10:00 ` Yasunori Goto
0 siblings, 2 replies; 8+ messages in thread
From: Gerald Schaefer @ 2008-10-01 17:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Yasunori Goto, Andy Whitcroft, linux-kernel, linux-mm,
schwidefsky, heiko.carstens, Mel Gorman, KAMEZAWA Hiroyuki
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
There seems to be no need for the lru_lock anymore, but there is a need for
zone->lock instead, because that function may call move_freepages() via
setup_zone_migrate_reserve().
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
for_each_zone(zone) {
u64 tmp;
- spin_lock_irqsave(&zone->lru_lock, flags);
+ spin_lock_irqsave(&zone->lock, flags);
tmp = (u64)pages_min * zone->present_pages;
do_div(tmp, lowmem_pages);
if (is_highmem(zone)) {
@@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
zone->pages_low = zone->pages_min + (tmp >> 2);
zone->pages_high = zone->pages_min + (tmp >> 1);
setup_zone_migrate_reserve(zone);
- spin_unlock_irqrestore(&zone->lru_lock, flags);
+ spin_unlock_irqrestore(&zone->lock, flags);
}
/* update totalreserve_pages */
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
2008-10-01 17:39 ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
@ 2008-10-02 5:49 ` KAMEZAWA Hiroyuki
2008-10-02 10:00 ` Yasunori Goto
1 sibling, 0 replies; 8+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-10-02 5:49 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Andrew Morton, Yasunori Goto, Andy Whitcroft, linux-kernel,
linux-mm, schwidefsky, heiko.carstens, Mel Gorman
On Wed, 01 Oct 2008 19:39:32 +0200
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:
> From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>
> This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
> There seems to be no need for the lru_lock anymore, but there is a need for
> zone->lock instead, because that function may call move_freepages() via
> setup_zone_migrate_reserve().
>
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>
Thank you!.
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
> for_each_zone(zone) {
> u64 tmp;
>
> - spin_lock_irqsave(&zone->lru_lock, flags);
> + spin_lock_irqsave(&zone->lock, flags);
> tmp = (u64)pages_min * zone->present_pages;
> do_div(tmp, lowmem_pages);
> if (is_highmem(zone)) {
> @@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
> zone->pages_low = zone->pages_min + (tmp >> 2);
> zone->pages_high = zone->pages_min + (tmp >> 1);
> setup_zone_migrate_reserve(zone);
> - spin_unlock_irqrestore(&zone->lru_lock, flags);
> + spin_unlock_irqrestore(&zone->lock, flags);
> }
>
> /* update totalreserve_pages */
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock
2008-10-01 17:39 ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
2008-10-02 5:49 ` KAMEZAWA Hiroyuki
@ 2008-10-02 10:00 ` Yasunori Goto
1 sibling, 0 replies; 8+ messages in thread
From: Yasunori Goto @ 2008-10-02 10:00 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Andrew Morton, Andy Whitcroft, linux-kernel, linux-mm,
schwidefsky, heiko.carstens, Mel Gorman, KAMEZAWA Hiroyuki
Thanks!
Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>
> From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>
> This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock.
> There seems to be no need for the lru_lock anymore, but there is a need for
> zone->lock instead, because that function may call move_freepages() via
> setup_zone_migrate_reserve().
>
> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -4207,7 +4207,7 @@ void setup_per_zone_pages_min(void)
> for_each_zone(zone) {
> u64 tmp;
>
> - spin_lock_irqsave(&zone->lru_lock, flags);
> + spin_lock_irqsave(&zone->lock, flags);
> tmp = (u64)pages_min * zone->present_pages;
> do_div(tmp, lowmem_pages);
> if (is_highmem(zone)) {
> @@ -4239,7 +4239,7 @@ void setup_per_zone_pages_min(void)
> zone->pages_low = zone->pages_min + (tmp >> 2);
> zone->pages_high = zone->pages_min + (tmp >> 1);
> setup_zone_migrate_reserve(zone);
> - spin_unlock_irqrestore(&zone->lru_lock, flags);
> + spin_unlock_irqrestore(&zone->lock, flags);
> }
>
> /* update totalreserve_pages */
>
>
--
Yasunori Goto
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-10-02 10:04 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-29 17:10 setup_per_zone_pages_min(): zone->lock vs. zone->lru_lock Gerald Schaefer
2008-09-29 17:36 ` Andy Whitcroft
2008-09-29 21:20 ` Gerald Schaefer
2008-09-30 0:40 ` KAMEZAWA Hiroyuki
2008-09-30 1:53 ` Yasunori Goto
2008-10-01 17:39 ` [PATCH] setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock Gerald Schaefer
2008-10-02 5:49 ` KAMEZAWA Hiroyuki
2008-10-02 10:00 ` Yasunori Goto
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).