From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932884AbaFLIVU (ORCPT ); Thu, 12 Jun 2014 04:21:20 -0400
Received: from cantor2.suse.de ([195.135.220.15]:41206 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754632AbaFLIVR (ORCPT ); Thu, 12 Jun 2014 04:21:17 -0400
Message-ID: <539962EC.20901@suse.cz>
Date: Thu, 12 Jun 2014 10:21:00 +0200
From: Vlastimil Babka
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Minchan Kim
CC: David Rientjes, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton, Greg Thelen, Mel Gorman, Joonsoo Kim, Michal Nazarewicz,
	Naoya Horiguchi, Christoph Lameter, Rik van Riel
Subject: Re: [RFC PATCH 09/10] mm, compaction: try to capture the just-created
	high-order freepage
References: <1402305982-6928-1-git-send-email-vbabka@suse.cz>
	<1402305982-6928-9-git-send-email-vbabka@suse.cz>
	<53986E31.7090500@suse.cz> <20140612022011.GB12415@bbox>
In-Reply-To: <20140612022011.GB12415@bbox>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/12/2014 04:20 AM, Minchan Kim wrote:
> On Wed, Jun 11, 2014 at 04:56:49PM +0200, Vlastimil Babka wrote:
>> On 06/09/2014 11:26 AM, Vlastimil Babka wrote:
>>> Compaction uses watermark checking to determine whether it succeeded in
>>> creating a high-order free page. My testing has shown that this is quite
>>> racy, and it can happen that the watermark check in compaction succeeds
>>> while, moments later, the watermark check in page allocation fails, even
>>> though the number of free pages has increased in the meantime.
>>>
>>> It should be more reliable if direct compaction captured the high-order
>>> free page as soon as it is detected, and passed it back to allocation.
>>> This would also reduce the window for somebody else to allocate the free
>>> page.
>>>
>>> This has already been implemented by 1fb3f8ca0e92 ("mm: compaction:
>>> capture a suitable high-order page immediately when it is made
>>> available"), but was later reverted by 8fb74b9f ("mm: compaction:
>>> partially revert capture of suitable high-order page") due to flaws.
>>>
>>> This patch differs from the previous attempt in two aspects:
>>>
>>> 1) The previous patch scanned free lists to capture the page. In this
>>>    patch, only the cc->order aligned block that the migration scanner
>>>    just finished is considered, but only if pages were actually isolated
>>>    for migration in that block. Tracking cc->order aligned blocks also
>>>    has benefits for the following patch, which skips blocks where
>>>    non-migratable pages were found.
>>>
>
> Generally I like this.

Thanks.

>>> 2) In this patch, the isolated free page is allocated by extending
>>>    get_page_from_freelist() and buffered_rmqueue(). This ensures that it
>>>    gets all operations such as prep_new_page() and the page->pfmemalloc
>>>    setting that were missing in the previous attempt, that zone
>>>    statistics are updated, etc.
>>>
>
> But this part is a problem.
> Capturing is not common, but you are adding more overhead to the hotpath
> for rare cases that are OK to fail anyway, so it's not a good deal.
> In such a case, we have no choice but to do the things you mentioned
> (e.g. statistics, prep_new_page, pfmemalloc) manually in
> __alloc_pages_direct_compact.

OK, I will try.
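
For illustration only (this is not the posted patch, and the helper name is
made up): the manual handling in __alloc_pages_direct_compact() could look
roughly like the sketch below. The exact prep_new_page() signature and the
pfmemalloc handling differ between trees, so treat every detail as an
assumption.

/*
 * Hypothetical sketch: redo for a captured page what buffered_rmqueue() /
 * get_page_from_freelist() would normally have done for us.
 */
static struct page *prep_captured_page(struct page *page, int order,
				       gfp_t gfp_mask, int alloc_flags)
{
	if (!page)
		return NULL;

	/* sanity checks, flag clearing, compound setup for the new page */
	if (prep_new_page(page, order, gfp_mask))
		return NULL;		/* bad page detected, give it up */

	/* normally set by buffered_rmqueue() */
	page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);

	/*
	 * Zone counters (NR_FREE_PAGES and friends) would also have to be
	 * adjusted at the point where capture pulls the page out of the
	 * buddy lists.
	 */
	return page;
}

Whether duplicating this is really cheaper overall than extending
get_page_from_freelist() is exactly the trade-off being discussed here.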
>>> Evaluation is pending.
>>
>> Uh, so if anyone wants to test it, here's a fixed version, as initial
>> evaluation showed it did not actually capture anything (which should not
>> affect patch 10/10, though), and debugging this took a while.
>>
>> - for pageblock_order (i.e. THP), capture was never attempted, as the for
>>   cycle in isolate_migratepages_range() ended right before the
>>   low_pfn == next_capture_pfn check
>> - lru_add_drain() has to be done before the pcplists drain. This made a
>>   big difference (~50 successful captures -> ~1300 successful captures)
>>
>> Note that __alloc_pages_direct_compact() is missing lru_add_drain() as
>> well, and all the existing watermark-based compaction termination
>> decisions (which happen before the drain in __alloc_pages_direct_compact())
>> don't do any draining at all.
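
To make the first fix above concrete (a simplified skeleton assuming the
check sits inside the scanner's for loop; this is not the actual patch): with
cc->order == pageblock_order the only capture candidate ends exactly at
end_pfn, so an in-loop check can never fire.

/*
 * Simplified skeleton: the scanner's loop body never runs with
 * low_pfn == end_pfn, so a capture candidate whose boundary coincides
 * with end_pfn is never evaluated inside the loop.
 */
static int capture_check_reached(unsigned long low_pfn, unsigned long end_pfn,
				 unsigned long next_capture_pfn)
{
	for (; low_pfn < end_pfn; low_pfn++) {
		if (low_pfn == next_capture_pfn)
			return 1;	/* capture would be attempted here */
		/* page isolation work elided */
	}
	return 0;	/* the last candidate's check is never reached */
}

Presumably the fixed version also has to evaluate the candidate once the scan
reaches end_pfn, outside the loop body.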
>>
>> -----8<-----
>> From: Vlastimil Babka
>> Date: Wed, 28 May 2014 17:05:18 +0200
>> Subject: [PATCH fixed 09/10] mm, compaction: try to capture the just-created
>>  high-order freepage
>>
>> Compaction uses watermark checking to determine whether it succeeded in
>> creating a high-order free page. My testing has shown that this is quite
>> racy, and it can happen that the watermark check in compaction succeeds
>> while, moments later, the watermark check in page allocation fails, even
>> though the number of free pages has increased in the meantime.
>>
>> It should be more reliable if direct compaction captured the high-order
>> free page as soon as it is detected, and passed it back to allocation.
>> This would also reduce the window for somebody else to allocate the free
>> page.
>>
>> This has already been implemented by 1fb3f8ca0e92 ("mm: compaction:
>> capture a suitable high-order page immediately when it is made
>> available"), but was later reverted by 8fb74b9f ("mm: compaction:
>> partially revert capture of suitable high-order page") due to flaws.
>>
>> This patch differs from the previous attempt in two aspects:
>>
>> 1) The previous patch scanned free lists to capture the page. In this
>>    patch, only the cc->order aligned block that the migration scanner
>>    just finished is considered, but only if pages were actually isolated
>>    for migration in that block. Tracking cc->order aligned blocks also
>>    has benefits for the following patch, which skips blocks where
>>    non-migratable pages were found.
>>
>> 2) In this patch, the isolated free page is allocated by extending
>>    get_page_from_freelist() and buffered_rmqueue(). This ensures that it
>>    gets all operations such as prep_new_page() and the page->pfmemalloc
>>    setting that were missing in the previous attempt, that zone
>>    statistics are updated, etc.
>>
>> Evaluation is pending.
>>
>> Signed-off-by: Vlastimil Babka
>> Cc: Minchan Kim
>> Cc: Mel Gorman
>> Cc: Joonsoo Kim
>> Cc: Michal Nazarewicz
>> Cc: Naoya Horiguchi
>> Cc: Christoph Lameter
>> Cc: Rik van Riel
>> Cc: David Rientjes
>> ---
>>  include/linux/compaction.h |   5 ++-
>>  mm/compaction.c            | 103 +++++++++++++++++++++++++++++++++++++++++++--
>>  mm/internal.h              |   2 +
>>  mm/page_alloc.c            |  69 ++++++++++++++++++++++++------
>>  4 files changed, 161 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index 01e3132..69579f5 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -10,6 +10,8 @@
>>  #define COMPACT_PARTIAL 2
>>  /* The full zone was compacted */
>>  #define COMPACT_COMPLETE 3
>> +/* Captured a high-order free page in direct compaction */
>> +#define COMPACT_CAPTURED 4
>>
>>  #ifdef CONFIG_COMPACTION
>>  extern int sysctl_compact_memory;
>> @@ -22,7 +24,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
>>  extern int fragmentation_index(struct zone *zone, unsigned int order);
>>  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
>>  			int order, gfp_t gfp_mask, nodemask_t *mask,
>> -			enum migrate_mode mode, bool *contended);
>> +			enum migrate_mode mode, bool *contended,
>> +			struct page **captured_page);
>>  extern void compact_pgdat(pg_data_t *pgdat, int order);
>>  extern void reset_isolation_suitable(pg_data_t *pgdat);
>>  extern unsigned long compaction_suitable(struct zone *zone, int order);
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index d1e30ba..2988758 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -541,6 +541,16 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>  	const isolate_mode_t mode = (cc->mode == MIGRATE_ASYNC ?
>>  					ISOLATE_ASYNC_MIGRATE : 0) |
>>  				    (unevictable ? ISOLATE_UNEVICTABLE : 0);
>> +	unsigned long capture_pfn = 0;	/* current candidate for capturing */
>> +	unsigned long next_capture_pfn = 0;	/* next candidate for capturing */
>> +
>> +	if (cc->order > PAGE_ALLOC_COSTLY_ORDER
>> +	    && gfpflags_to_migratetype(cc->gfp_mask) == MIGRATE_MOVABLE
>> +	    && cc->order <= pageblock_order) {
>
> You sent this with an RFC mark, so I will not review it in detail, just the
> design.
>
> Why does capture only work for a limited high-order range?

I thought the overhead of maintaining the pfns and trying the capture would
be a bad tradeoff for low-order compactions, which I suppose have a good
chance of succeeding even without capture. But I admit I don't have data to
support this yet.

> Direct compaction is a really costly operation for the process, done at the
> cost of its own resources (i.e. timeslice), so anyone doing direct
> compaction deserves a precious result regardless of order.
>
> Another question: why couldn't the capture work only for MIGRATE_CMA?

CMA allocations don't go through standard direct compaction. They also use
memory isolation to prevent parallel activity from stealing the pages freed
by compaction. And importantly, they set cc->order = -1, as the goal is not
to compact a single high-order page, but an arbitrarily long range.
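
For reference, the CMA path in alloc_contig_range() builds its own
compact_control roughly along these lines (this mirrors mm/page_alloc.c of
kernels in this era only approximately; the exact field set varies by
version, and older trees use .sync instead of .mode):

	/* sketch of alloc_contig_range()'s setup; fields vary by kernel version */
	struct compact_control cc = {
		.nr_migratepages = 0,
		.order = -1,			/* no single target order */
		.zone = page_zone(pfn_to_page(start)),
		.mode = MIGRATE_SYNC,
		.ignore_skip_hint = true,
	};

So the capture logic, which keys on a meaningful cc->order, has nothing to
key on there, and the start_isolate_page_range()/undo_isolate_page_range()
calls around the migration already protect the freed range from parallel
allocations.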