* [patch] mm, compaction: ignore watermarks when isolating free pages
From: David Rientjes @ 2016-06-15 22:34 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman
Cc: Hugh Dickins, Vlastimil Babka, Joonsoo Kim, linux-kernel, linux-mm
The goal of memory compaction is to defragment memory by moving migratable
pages to free pages at the end of the zone. No additional memory is being
allocated.
Ignore per-zone low watermarks in __isolate_free_page() because memory is
either fully migrated or isolated free pages are returned when migration
fails.
This fixes an issue where the compaction freeing scanner finds free memory
but cannot isolate it because the zone would drop below its low watermark
for that page order, so the scanner continues to scan all remaining memory
pointlessly.
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/page_alloc.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2484,23 +2484,14 @@ EXPORT_SYMBOL_GPL(split_page);
int __isolate_free_page(struct page *page, unsigned int order)
{
- unsigned long watermark;
struct zone *zone;
- int mt;
+ const int mt = get_pageblock_migratetype(page);
BUG_ON(!PageBuddy(page));
-
zone = page_zone(page);
- mt = get_pageblock_migratetype(page);
-
- if (!is_migrate_isolate(mt)) {
- /* Obey watermarks as if the page was being allocated */
- watermark = low_wmark_pages(zone) + (1 << order);
- if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
- return 0;
+ if (!is_migrate_isolate(mt))
__mod_zone_freepage_state(zone, -(1UL << order), mt);
- }
/* Remove page from free list */
list_del(&page->lru);
@@ -2520,7 +2511,6 @@ int __isolate_free_page(struct page *page, unsigned int order)
}
}
-
return 1UL << order;
}
* Re: [patch] mm, compaction: ignore watermarks when isolating free pages
From: Vlastimil Babka @ 2016-06-16 7:15 UTC (permalink / raw)
To: David Rientjes, Andrew Morton, Mel Gorman
Cc: Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm, Michal Hocko
On 06/16/2016 12:34 AM, David Rientjes wrote:
> The goal of memory compaction is to defragment memory by moving migratable
> pages to free pages at the end of the zone. No additional memory is being
> allocated.
>
> Ignore per-zone low watermarks in __isolate_free_page() because memory is
> either fully migrated or isolated free pages are returned when migration
> fails.
Michal Hocko suggested that too, but I didn't think it was safe for
compaction to go below the min watermark, even temporarily. It means
the system is struggling with order-0 allocations, so making that worse
for the benefit of high-order allocations doesn't make sense. The
high-order allocation would likely fail anyway due to watermark checks,
even if a page of sufficient order was formed by compaction. So in my
series, I just changed the low watermark check to min [1].
> This fixes an issue where the compaction freeing scanner can isolate
> memory but the zone drops below its low watermark for that page order, so
> the scanner must continue to scan all memory pointlessly.
Good point, it looks like failing the watermark check is the only reason
__isolate_free_page() can fail. isolate_freepages_block() and its
callers should take this as an indication that compaction should return
with failure immediately.
[1] http://article.gmane.org/gmane.linux.kernel/2231369
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
> mm/page_alloc.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2484,23 +2484,14 @@ EXPORT_SYMBOL_GPL(split_page);
>
> int __isolate_free_page(struct page *page, unsigned int order)
> {
> - unsigned long watermark;
> struct zone *zone;
> - int mt;
> + const int mt = get_pageblock_migratetype(page);
>
> BUG_ON(!PageBuddy(page));
> -
> zone = page_zone(page);
> - mt = get_pageblock_migratetype(page);
> -
> - if (!is_migrate_isolate(mt)) {
> - /* Obey watermarks as if the page was being allocated */
> - watermark = low_wmark_pages(zone) + (1 << order);
> - if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
> - return 0;
>
> + if (!is_migrate_isolate(mt))
> __mod_zone_freepage_state(zone, -(1UL << order), mt);
> - }
>
> /* Remove page from free list */
> list_del(&page->lru);
> @@ -2520,7 +2511,6 @@ int __isolate_free_page(struct page *page, unsigned int order)
> }
> }
>
> -
> return 1UL << order;
> }
>
>
* [patch] mm, compaction: abort free scanner if split fails
From: David Rientjes @ 2016-06-20 22:27 UTC (permalink / raw)
To: Andrew Morton, Vlastimil Babka
Cc: Mel Gorman, Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm
If the memory compaction free scanner cannot successfully split a free
page (only possible due to per-zone low watermark), terminate the free
scanner rather than continuing to scan memory needlessly.
If the per-zone watermark is insufficient for a free page of
order <= cc->order, then terminate the scanner since future splits will
also likely fail.
This prevents the compaction freeing scanner from scanning all memory on
very large zones (very noticeable for zones > 128GB, for instance) when
all splits will likely fail.
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 49 +++++++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
/* Found a free page, will break it into order-0 pages */
order = page_order(page);
- isolated = __isolate_free_page(page, page_order(page));
+ isolated = __isolate_free_page(page, order);
+ if (!isolated)
+ break;
set_page_private(page, order);
total_isolated += isolated;
list_add_tail(&page->lru, freelist);
- /* If a page was split, advance to the end of it */
- if (isolated) {
- cc->nr_freepages += isolated;
- if (!strict &&
- cc->nr_migratepages <= cc->nr_freepages) {
- blockpfn += isolated;
- break;
- }
-
- blockpfn += isolated - 1;
- cursor += isolated - 1;
- continue;
+ /* Advance to the end of split page */
+ cc->nr_freepages += isolated;
+ if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
+ blockpfn += isolated;
+ break;
}
+ blockpfn += isolated - 1;
+ cursor += isolated - 1;
+ continue;
isolate_fail:
if (strict)
@@ -521,6 +519,9 @@ isolate_fail:
}
+ if (locked)
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+
/*
* There is a tiny chance that we have read bogus compound_order(),
* so be careful to not go outside of the pageblock.
@@ -542,9 +543,6 @@ isolate_fail:
if (strict && blockpfn < end_pfn)
total_isolated = 0;
- if (locked)
- spin_unlock_irqrestore(&cc->zone->lock, flags);
-
/* Update the pageblock-skip if the whole pageblock was scanned */
if (blockpfn == end_pfn)
update_pageblock_skip(cc, valid_page, total_isolated, false);
@@ -622,7 +620,7 @@ isolate_freepages_range(struct compact_control *cc,
*/
}
- /* split_free_page does not map the pages */
+ /* __isolate_free_page() does not map the pages */
map_pages(&freelist);
if (pfn < end_pfn) {
@@ -1071,6 +1069,7 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
+ unsigned long isolated;
/*
* This can iterate a massively long zone without finding any
@@ -1095,8 +1094,12 @@ static void isolate_freepages(struct compact_control *cc)
continue;
/* Found a block suitable for isolating free pages from. */
- isolate_freepages_block(cc, &isolate_start_pfn,
- block_end_pfn, freelist, false);
+ isolated = isolate_freepages_block(cc, &isolate_start_pfn,
+ block_end_pfn, freelist, false);
+ /* If free page split failed, do not continue needlessly */
+ if (!isolated && isolate_start_pfn < block_end_pfn &&
+ cc->nr_freepages <= cc->nr_migratepages)
+ break;
/*
* If we isolated enough freepages, or aborted due to async
@@ -1124,7 +1127,7 @@ static void isolate_freepages(struct compact_control *cc)
}
}
- /* split_free_page does not map the pages */
+ /* __isolate_free_page() does not map the pages */
map_pages(freelist);
/*
@@ -1703,6 +1706,12 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
continue;
}
+ /* Don't attempt compaction if splitting free page will fail */
+ if (!zone_watermark_ok(zone, 0,
+ low_wmark_pages(zone) + (1 << order),
+ 0, 0))
+ continue;
+
status = compact_zone_order(zone, order, gfp_mask, mode,
&zone_contended, alloc_flags,
ac_classzone_idx(ac));
* Re: [patch] mm, compaction: abort free scanner if split fails
From: Vlastimil Babka @ 2016-06-21 11:43 UTC (permalink / raw)
To: David Rientjes, Andrew Morton
Cc: Mel Gorman, Hugh Dickins, Joonsoo Kim, linux-kernel, linux-mm,
Minchan Kim
On 06/21/2016 12:27 AM, David Rientjes wrote:
> If the memory compaction free scanner cannot successfully split a free
> page (only possible due to per-zone low watermark), terminate the free
> scanner rather than continuing to scan memory needlessly.
>
> If the per-zone watermark is insufficient for a free page of
> order <= cc->order, then terminate the scanner since future splits will
> also likely fail.
>
> This prevents the compaction freeing scanner from scanning all memory on
> very large zones (very noticeable for zones > 128GB, for instance) when
> all splits will likely fail.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
But some notes below.
> ---
> mm/compaction.c | 49 +++++++++++++++++++++++++++++--------------------
> 1 file changed, 29 insertions(+), 20 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>
> /* Found a free page, will break it into order-0 pages */
> order = page_order(page);
> - isolated = __isolate_free_page(page, page_order(page));
> + isolated = __isolate_free_page(page, order);
> + if (!isolated)
> + break;
This seems to fix, as a side effect, a bug in Joonsoo's mmotm patch
mm-compaction-split-freepages-without-holding-the-zone-lock.patch that
Minchan found: http://marc.info/?l=linux-mm&m=146607176528495&w=2
So it should be noted somewhere so they are merged together. Or Joonsoo
posts an isolated fix and this patch has to be rebased.
> set_page_private(page, order);
> total_isolated += isolated;
> list_add_tail(&page->lru, freelist);
>
> - /* If a page was split, advance to the end of it */
> - if (isolated) {
> - cc->nr_freepages += isolated;
> - if (!strict &&
> - cc->nr_migratepages <= cc->nr_freepages) {
> - blockpfn += isolated;
> - break;
> - }
> -
> - blockpfn += isolated - 1;
> - cursor += isolated - 1;
> - continue;
> + /* Advance to the end of split page */
> + cc->nr_freepages += isolated;
> + if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
> + blockpfn += isolated;
> + break;
> }
> + blockpfn += isolated - 1;
> + cursor += isolated - 1;
> + continue;
>
> isolate_fail:
> if (strict)
> @@ -521,6 +519,9 @@ isolate_fail:
>
> }
>
> + if (locked)
> + spin_unlock_irqrestore(&cc->zone->lock, flags);
> +
> /*
> * There is a tiny chance that we have read bogus compound_order(),
> * so be careful to not go outside of the pageblock.
> @@ -542,9 +543,6 @@ isolate_fail:
> if (strict && blockpfn < end_pfn)
> total_isolated = 0;
>
> - if (locked)
> - spin_unlock_irqrestore(&cc->zone->lock, flags);
> -
> /* Update the pageblock-skip if the whole pageblock was scanned */
> if (blockpfn == end_pfn)
> update_pageblock_skip(cc, valid_page, total_isolated, false);
> @@ -622,7 +620,7 @@ isolate_freepages_range(struct compact_control *cc,
> */
> }
>
> - /* split_free_page does not map the pages */
> + /* __isolate_free_page() does not map the pages */
> map_pages(&freelist);
>
> if (pfn < end_pfn) {
> @@ -1071,6 +1069,7 @@ static void isolate_freepages(struct compact_control *cc)
> block_end_pfn = block_start_pfn,
> block_start_pfn -= pageblock_nr_pages,
> isolate_start_pfn = block_start_pfn) {
> + unsigned long isolated;
>
> /*
> * This can iterate a massively long zone without finding any
> @@ -1095,8 +1094,12 @@ static void isolate_freepages(struct compact_control *cc)
> continue;
>
> /* Found a block suitable for isolating free pages from. */
> - isolate_freepages_block(cc, &isolate_start_pfn,
> - block_end_pfn, freelist, false);
> + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> + block_end_pfn, freelist, false);
> + /* If free page split failed, do not continue needlessly */
More accurately, free page isolation failed?
> + if (!isolated && isolate_start_pfn < block_end_pfn &&
> + cc->nr_freepages <= cc->nr_migratepages)
> + break;
>
> /*
> * If we isolated enough freepages, or aborted due to async
> @@ -1124,7 +1127,7 @@ static void isolate_freepages(struct compact_control *cc)
> }
> }
>
> - /* split_free_page does not map the pages */
> + /* __isolate_free_page() does not map the pages */
> map_pages(freelist);
>
> /*
> @@ -1703,6 +1706,12 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
> continue;
> }
>
> + /* Don't attempt compaction if splitting free page will fail */
> + if (!zone_watermark_ok(zone, 0,
> + low_wmark_pages(zone) + (1 << order),
> + 0, 0))
> + continue;
> +
Please don't add this, compact_zone already checks this via
compaction_suitable() (and the usual 2 << order gap), so this is adding
yet another watermark check with a different kind of gap.
Thanks.
> status = compact_zone_order(zone, order, gfp_mask, mode,
> &zone_contended, alloc_flags,
> ac_classzone_idx(ac));
>
* Re: [patch] mm, compaction: abort free scanner if split fails
From: David Rientjes @ 2016-06-21 20:43 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Mel Gorman, Hugh Dickins, Joonsoo Kim,
linux-kernel, linux-mm, Minchan Kim
On Tue, 21 Jun 2016, Vlastimil Babka wrote:
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -494,24 +494,22 @@ static unsigned long isolate_freepages_block(struct
> > compact_control *cc,
> >
> > /* Found a free page, will break it into order-0 pages */
> > order = page_order(page);
> > - isolated = __isolate_free_page(page, page_order(page));
> > + isolated = __isolate_free_page(page, order);
> > + if (!isolated)
> > + break;
>
> This seems to fix as a side-effect a bug in Joonsoo's mmotm patch
> mm-compaction-split-freepages-without-holding-the-zone-lock.patch, that
> Minchan found: http://marc.info/?l=linux-mm&m=146607176528495&w=2
>
> So it should be noted somewhere so they are merged together. Or Joonsoo posts
> an isolated fix and this patch has to rebase.
>
Indeed, I hadn't noticed the differences between Linus's tree and -mm.
Thanks very much for pointing it out.
My interest is to eventually backport this to a much older kernel where we
suffer from the same issue: it seems we have never terminated the freeing
scanner when splitting the free page fails, and we feel it because some of
our systems have 128GB zones and migrate_pages() can call
compaction_alloc() several times if it keeps getting -EAGAIN. It's very
expensive.
I'm not sure we should label it as a -fix for
mm-compaction-split-freepages-without-holding-the-zone-lock.patch since
the problem this patch is addressing has seemingly existed for years.
Perhaps it would be better to have two patches, one as a -fix and then the
abort on page split failure on top. I'll send out a two patch series in
this form.
> > set_page_private(page, order);
> > total_isolated += isolated;
> > list_add_tail(&page->lru, freelist);
> >
> > - /* If a page was split, advance to the end of it */
> > - if (isolated) {
> > - cc->nr_freepages += isolated;
> > - if (!strict &&
> > - cc->nr_migratepages <= cc->nr_freepages) {
> > - blockpfn += isolated;
> > - break;
> > - }
> > -
> > - blockpfn += isolated - 1;
> > - cursor += isolated - 1;
> > - continue;
> > + /* Advance to the end of split page */
> > + cc->nr_freepages += isolated;
> > + if (!strict && cc->nr_migratepages <= cc->nr_freepages) {
> > + blockpfn += isolated;
> > + break;
> > }
> > + blockpfn += isolated - 1;
> > + cursor += isolated - 1;
> > + continue;
> >
> > isolate_fail:
> > if (strict)
> > @@ -521,6 +519,9 @@ isolate_fail:
> >
> > }
> >
> > + if (locked)
> > + spin_unlock_irqrestore(&cc->zone->lock, flags);
> > +
> > /*
> > * There is a tiny chance that we have read bogus compound_order(),
> > * so be careful to not go outside of the pageblock.
> > @@ -542,9 +543,6 @@ isolate_fail:
> > if (strict && blockpfn < end_pfn)
> > total_isolated = 0;
> >
> > - if (locked)
> > - spin_unlock_irqrestore(&cc->zone->lock, flags);
> > -
> > /* Update the pageblock-skip if the whole pageblock was scanned */
> > if (blockpfn == end_pfn)
> > update_pageblock_skip(cc, valid_page, total_isolated, false);
> > @@ -622,7 +620,7 @@ isolate_freepages_range(struct compact_control *cc,
> > */
> > }
> >
> > - /* split_free_page does not map the pages */
> > + /* __isolate_free_page() does not map the pages */
> > map_pages(&freelist);
> >
> > if (pfn < end_pfn) {
> > @@ -1071,6 +1069,7 @@ static void isolate_freepages(struct compact_control
> > *cc)
> > block_end_pfn = block_start_pfn,
> > block_start_pfn -= pageblock_nr_pages,
> > isolate_start_pfn = block_start_pfn) {
> > + unsigned long isolated;
> >
> > /*
> > * This can iterate a massively long zone without finding any
> > @@ -1095,8 +1094,12 @@ static void isolate_freepages(struct compact_control
> > *cc)
> > continue;
> >
> > /* Found a block suitable for isolating free pages from. */
> > - isolate_freepages_block(cc, &isolate_start_pfn,
> > - block_end_pfn, freelist, false);
> > + isolated = isolate_freepages_block(cc, &isolate_start_pfn,
> > + block_end_pfn, freelist,
> > false);
> > + /* If free page split failed, do not continue needlessly */
>
> More accurately, free page isolation failed?
>
Eek, maybe. The condition should only trigger if we terminated early because
- need_resched() or zone->lock contention for MIGRATE_ASYNC, or
- __isolate_free_page() fails.
And the latter can only fail because of this (somewhat arbitrary) split
watermark check. I'll rename it because it covers both, but I thought
the next immediate condition check for cc->contended and its comment was
explanatory enough.
> > + if (!isolated && isolate_start_pfn < block_end_pfn &&
> > + cc->nr_freepages <= cc->nr_migratepages)
> > + break;
> >
> > /*
> > * If we isolated enough freepages, or aborted due to async
> > @@ -1124,7 +1127,7 @@ static void isolate_freepages(struct compact_control
> > *cc)
> > }
> > }
> >
> > - /* split_free_page does not map the pages */
> > + /* __isolate_free_page() does not map the pages */
> > map_pages(freelist);
> >
> > /*
> > @@ -1703,6 +1706,12 @@ enum compact_result try_to_compact_pages(gfp_t
> > gfp_mask, unsigned int order,
> > continue;
> > }
> >
> > + /* Don't attempt compaction if splitting free page will fail
> > */
> > + if (!zone_watermark_ok(zone, 0,
> > + low_wmark_pages(zone) + (1 << order),
> > + 0, 0))
> > + continue;
> > +
>
> Please don't add this, compact_zone already checks this via
> compaction_suitable() (and the usual 2 << order gap), so this is adding yet
> another watermark check with a different kind of gap.
>
Good point, thanks.