Subject: [patch] mm, compaction: ignore watermarks when isolating free pages
From: David Rientjes
Date: 2016-06-15 22:34 UTC
To: Andrew Morton, Mel Gorman
Cc: Hugh Dickins, Vlastimil Babka, Joonsoo Kim, linux-kernel, linux-mm

The goal of memory compaction is to defragment memory by moving migratable
pages to free pages at the end of the zone.  No additional memory is being
allocated.

Ignore per-zone low watermarks in __isolate_free_page() because memory is
either fully migrated or isolated free pages are returned when migration
fails.

This fixes an issue where the compaction freeing scanner can isolate
memory but the zone drops below its low watermark for that page order, so
the scanner must continue to scan all memory pointlessly.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2484,23 +2484,14 @@ EXPORT_SYMBOL_GPL(split_page);
 
 int __isolate_free_page(struct page *page, unsigned int order)
 {
-	unsigned long watermark;
 	struct zone *zone;
-	int mt;
+	const int mt = get_pageblock_migratetype(page);
 
 	BUG_ON(!PageBuddy(page));
-
 	zone = page_zone(page);
-	mt = get_pageblock_migratetype(page);
-
-	if (!is_migrate_isolate(mt)) {
-		/* Obey watermarks as if the page was being allocated */
-		watermark = low_wmark_pages(zone) + (1 << order);
-		if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
-			return 0;
 
+	if (!is_migrate_isolate(mt))
 		__mod_zone_freepage_state(zone, -(1UL << order), mt);
-	}
 
 	/* Remove page from free list */
 	list_del(&page->lru);
@@ -2520,7 +2511,6 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		}
 	}
 
-
 	return 1UL << order;
 }
Subject: Re: [patch] mm, compaction: ignore watermarks when isolating free pages
From: Vlastimil Babka
Date: 2016-06-16 7:15 UTC
To: David Rientjes, Andrew Morton, Mel Gorman
Cc: Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm, Michal Hocko

On 06/16/2016 12:34 AM, David Rientjes wrote:
> The goal of memory compaction is to defragment memory by moving migratable
> pages to free pages at the end of the zone.  No additional memory is being
> allocated.
>
> Ignore per-zone low watermarks in __isolate_free_page() because memory is
> either fully migrated or isolated free pages are returned when migration
> fails.

Michal Hocko suggested that too, but I didn't think it safe for compaction
to go below the min watermark, even temporarily.  It means the system is
struggling with order-0 allocations, so making that worse for the benefit
of high-order allocations doesn't make sense.  The high-order allocation
would likely fail anyway due to watermark checks, even if a page of
sufficient order was formed by compaction.  So in my series, I just
changed the low watermark check to min [1].

> This fixes an issue where the compaction freeing scanner can isolate
> memory but the zone drops below its low watermark for that page order, so
> the scanner must continue to scan all memory pointlessly.

Good point; it looks like failing the watermark check is the only reason
why __isolate_free_page() can fail.  isolate_freepages_block() and its
callers should take this as an indication that compaction should return
with failure immediately.

[1] http://article.gmane.org/gmane.linux.kernel/2231369

> Signed-off-by: David Rientjes <rientjes@google.com>
[... remainder of quoted patch snipped ...]
Subject: [patch] mm, compaction: abort free scanner if split fails
From: David Rientjes
Date: 2016-06-20 22:27 UTC
To: Andrew Morton, Vlastimil Babka
Cc: Mel Gorman, Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm

If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly.  If the
per-zone watermark is insufficient for a free page of order <= cc->order,
then terminate the scanner since future splits will also likely fail.

This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/compaction.c | 49 +++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 
 		/* Found a free page, will break it into order-0 pages */
 		order = page_order(page);
-		isolated = __isolate_free_page(page, page_order(page));
+		isolated = __isolate_free_page(page, order);
+		if (!isolated)
+			break;
 		set_page_private(page, order);
 		total_isolated += isolated;
 		list_add_tail(&page->lru, freelist);
 
-		/* If a page was split, advance to the end of it */
-		if (isolated) {
-			cc->nr_freepages += isolated;
-			if (!strict &&
-			    cc->nr_migratepages <= cc->nr_freepages) {
-				blockpfn += isolated;
-				break;
-			}
-
-			blockpfn += isolated - 1;
-			cursor += isolated - 1;
-			continue;
+		/* Advance to the end of split page */
+		cc->nr_freepages += isolated;
+		if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
+			blockpfn += isolated;
+			break;
 		}
+		blockpfn += isolated - 1;
+		cursor += isolated - 1;
+		continue;
 
 isolate_fail:
 		if (strict)
@@ -521,6 +519,9 @@ isolate_fail:
 
 	}
 
+	if (locked)
+		spin_unlock_irqrestore(&cc->zone->lock, flags);
+
 	/*
 	 * There is a tiny chance that we have read bogus compound_order(),
 	 * so be careful to not go outside of the pageblock.
@@ -542,9 +543,6 @@ isolate_fail:
 	if (strict && blockpfn < end_pfn)
 		total_isolated = 0;
 
-	if (locked)
-		spin_unlock_irqrestore(&cc->zone->lock, flags);
-
 	/* Update the pageblock-skip if the whole pageblock was scanned */
 	if (blockpfn == end_pfn)
 		update_pageblock_skip(cc, valid_page, total_isolated, false);
@@ -622,7 +620,7 @@ isolate_freepages_range(struct compact_control *cc,
 		 */
 	}
 
-	/* split_free_page does not map the pages */
+	/* __isolate_free_page() does not map the pages */
 	map_pages(&freelist);
 
 	if (pfn < end_pfn) {
@@ -1071,6 +1069,7 @@ static void isolate_freepages(struct compact_control *cc)
 			block_end_pfn = block_start_pfn,
 			block_start_pfn -= pageblock_nr_pages,
 			isolate_start_pfn = block_start_pfn) {
+		unsigned long isolated;
 
 		/*
 		 * This can iterate a massively long zone without finding any
@@ -1095,8 +1094,12 @@ static void isolate_freepages(struct compact_control *cc)
 			continue;
 
 		/* Found a block suitable for isolating free pages from. */
-		isolate_freepages_block(cc, &isolate_start_pfn,
-					block_end_pfn, freelist, false);
+		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
+						block_end_pfn, freelist, false);
+		/* If free page split failed, do not continue needlessly */
+		if (!isolated && isolate_start_pfn < block_end_pfn &&
+		    cc->nr_freepages <= cc->nr_migratepages)
+			break;
 
 		/*
 		 * If we isolated enough freepages, or aborted due to async
@@ -1124,7 +1127,7 @@ static void isolate_freepages(struct compact_control *cc)
 		}
 	}
 
-	/* split_free_page does not map the pages */
+	/* __isolate_free_page() does not map the pages */
 	map_pages(freelist);
 
 	/*
@@ -1703,6 +1706,12 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		/* Don't attempt compaction if splitting free page will fail */
+		if (!zone_watermark_ok(zone, 0,
+				low_wmark_pages(zone) + (1 << order),
+				0, 0))
+			continue;
+
 		status = compact_zone_order(zone, order, gfp_mask, mode,
 				&zone_contended, alloc_flags,
 				ac_classzone_idx(ac));
Subject: Re: [patch] mm, compaction: abort free scanner if split fails
From: Vlastimil Babka
Date: 2016-06-21 11:43 UTC
To: David Rientjes, Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm, Minchan Kim

On 06/21/2016 12:27 AM, David Rientjes wrote:
> If the memory compaction free scanner cannot successfully split a free
> page (only possible due to per-zone low watermark), terminate the free
> scanner rather than continuing to scan memory needlessly.
>
> If the per-zone watermark is insufficient for a free page of
> order <= cc->order, then terminate the scanner since future splits will
> also likely fail.
>
> This prevents the compaction freeing scanner from scanning all memory on
> very large zones (very noticeable for zones > 128GB, for instance) when
> all splits will likely fail.
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

But some notes below.

> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>
>  		/* Found a free page, will break it into order-0 pages */
>  		order = page_order(page);
> -		isolated = __isolate_free_page(page, page_order(page));
> +		isolated = __isolate_free_page(page, order);
> +		if (!isolated)
> +			break;

This seems to fix as a side effect a bug in Joonsoo's mmotm patch
mm-compaction-split-freepages-without-holding-the-zone-lock.patch, that
Minchan found: http://marc.info/?l=linux-mm&m=146607176528495&w=2

So it should be noted somewhere so they are merged together.  Or Joonsoo
posts an isolated fix and this patch has to rebase.

[...]

>  		/* Found a block suitable for isolating free pages from. */
> -		isolate_freepages_block(cc, &isolate_start_pfn,
> -					block_end_pfn, freelist, false);
> +		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> +						block_end_pfn, freelist, false);
> +		/* If free page split failed, do not continue needlessly */

More accurately, free page isolation failed?

> +		if (!isolated && isolate_start_pfn < block_end_pfn &&
> +		    cc->nr_freepages <= cc->nr_migratepages)
> +			break;

[...]

> +		/* Don't attempt compaction if splitting free page will fail */
> +		if (!zone_watermark_ok(zone, 0,
> +				low_wmark_pages(zone) + (1 << order),
> +				0, 0))
> +			continue;
> +

Please don't add this, compact_zone already checks this via
compaction_suitable() (and the usual 2 << order gap), so this is adding
yet another watermark check with a different kind of gap.  Thanks.
Subject: Re: [patch] mm, compaction: abort free scanner if split fails
From: David Rientjes
Date: 2016-06-21 20:43 UTC
To: Vlastimil Babka
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm, Minchan Kim

On Tue, 21 Jun 2016, Vlastimil Babka wrote:

> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> >
> >  		/* Found a free page, will break it into order-0 pages */
> >  		order = page_order(page);
> > -		isolated = __isolate_free_page(page, page_order(page));
> > +		isolated = __isolate_free_page(page, order);
> > +		if (!isolated)
> > +			break;
>
> This seems to fix as a side effect a bug in Joonsoo's mmotm patch
> mm-compaction-split-freepages-without-holding-the-zone-lock.patch, that
> Minchan found: http://marc.info/?l=linux-mm&m=146607176528495&w=2
>
> So it should be noted somewhere so they are merged together.  Or Joonsoo
> posts an isolated fix and this patch has to rebase.

Indeed, I hadn't noticed the differences between Linus's tree and -mm.
Thanks very much for pointing it out.

My interest is to eventually backport this to a much older kernel where we
suffer from the same issue: it seems the freeing scanner has never
terminated when splitting a free page fails, and we feel it because some
of our systems have 128GB zones and migrate_pages() can call
compaction_alloc() several times if it keeps getting -EAGAIN.  It's very
expensive.

I'm not sure we should label this as a -fix for
mm-compaction-split-freepages-without-holding-the-zone-lock.patch, since
the problem it addresses has seemingly existed for years.  Perhaps it
would be better to have two patches: one as a -fix, and then the abort on
page split failure on top.  I'll send out a two patch series in this form.

[...]

> > +		/* If free page split failed, do not continue needlessly */
>
> More accurately, free page isolation failed?

Eek, maybe.  The condition should only trigger if we terminated early
because

 - need_resched() or zone->lock contention for MIGRATE_ASYNC, or
 - __isolate_free_page() fails.

And the latter can only fail because of this (somewhat arbitrary) split
watermark check.  I'll rename it because the condition includes both, but
I thought the immediately following check for cc->contended and its
comment were explanatory enough.

[...]

> > +		/* Don't attempt compaction if splitting free page will fail */
> > +		if (!zone_watermark_ok(zone, 0,
> > +				low_wmark_pages(zone) + (1 << order),
> > +				0, 0))
> > +			continue;
>
> Please don't add this, compact_zone already checks this via
> compaction_suitable() (and the usual 2 << order gap), so this is adding
> yet another watermark check with a different kind of gap.  Thanks.

Good point, thanks.