From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757562AbYHZJ3h (ORCPT ); Tue, 26 Aug 2008 05:29:37 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752737AbYHZJ33 (ORCPT ); Tue, 26 Aug 2008 05:29:29 -0400 Received: from bigben2.bytemark.co.uk ([80.68.81.132]:58356 "EHLO bigben2.bytemark.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752632AbYHZJ32 (ORCPT ); Tue, 26 Aug 2008 05:29:28 -0400 Date: Tue, 26 Aug 2008 10:29:29 +0100 From: Andy Whitcroft To: Mel Gorman Cc: Adam Litke , Dave Hansen , linux-mm , linux-kernel , Andrew Morton , nacc , agl Subject: Re: [BUG] [PATCH v2] Make setup_zone_migrate_reserve() aware of overlapping nodes Message-ID: <20080826092929.GD29207@brain> References: <1218837685.12953.11.camel@localhost.localdomain> <1219252134.13885.25.camel@localhost.localdomain> <1219255911.8960.41.camel@nimitz> <1219262152.13885.27.camel@localhost.localdomain> <20080821113338.GA29950@csn.ul.ie> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080821113338.GA29950@csn.ul.ie> User-Agent: Mutt/1.5.17+20080114 (2008-01-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 21, 2008 at 12:33:39PM +0100, Mel Gorman wrote: > On (20/08/08 14:55), Adam Litke didst pronounce: > > Changes since V1 > > - Fix build for !NUMA > > - Add VM_BUG_ON() to catch this problem at the source > > > > I have gotten to the root cause of the hugetlb badness I reported back on > > August 15th. My system has the following memory topology (note the > > overlapping node): > > > > Node 0 Memory: 0x8000000-0x44000000 > > Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000 > > > > setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking > > for a pageblock to move onto the MIGRATE_RESERVE list. Finding no > > candidates, it happily continues the scan into 0x8000000-0x44000000. When > > a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on > > the wrong zone. Oops. > > > > (Andrew: once the proper fix is agreed upon, this should also be a > > candidate for -stable.) > > > > setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes. > > > > Signed-off-by: Adam Litke > > > > zone_to_nid(zone) is called every time in the loop even though it will never > change. This is less than optimal but setup_zone_migrate_reserve() is only > called during init and when min_free_kbytes is adjusted so it's not worth > worrying about. Otherwise it looks good. > > Acked-by: Mel Gorman > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index af982f7..feb7916 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -694,6 +694,9 @@ static int move_freepages(struct zone *zone, > > #endif > > > > for (page = start_page; page <= end_page;) { > > + /* Make sure we are not inadvertently changing nodes */ > > + VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone)); > > + > > if (!pfn_valid_within(page_to_pfn(page))) { > > page++; > > continue; > > @@ -2516,6 +2519,10 @@ static void setup_zone_migrate_reserve(struct zone *zone) > > continue; > > page = pfn_to_page(pfn); > > > > + /* Watch out for overlapping nodes */ > > + if (page_to_nid(page) != zone_to_nid(zone)) > > + continue; > > + > > /* Blocks with reserved pages will never free, skip them. */ > > if (PageReserved(page)) > > continue; This patch looks sane. I do note that we have a config option to tell us whether we have any possibility of overlapping nodes, and we have an early version of a check for this early_pfn_in_nid() in mm.h. You might consider having a non-early variant of this which could be optimised away for those arches which do not have CONFIG_NODES_SPAN_OTHER_NODES. In 'unearlifying' this to pfn_in_nid() I think we have a small naming issue with these function as they are only valid for use with pfns within an existing node. They should probabally both be *pfn_in_nid_within() or something in line with pfn_valid_within(). -apw