* [PATCH 0/5] Improve hugepage allocation success rates under load V4
@ 2012-08-14 16:41 Mel Gorman
2012-08-14 16:41 ` [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
Changelog since V3
o Add patch to backoff compaction in the event of lock contention
o Rebase to mmotm, cope with the removal of __GFP_NO_KSWAPD
o Removed RFC
Changelog since V2
o Capture !MIGRATE_MOVABLE pages where possible
o Document the treatment of MIGRATE_MOVABLE pages while capturing
o Expand changelogs
Changelog since V1
o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
o Expanded changelogs a little
Allocation success rates have been far lower since 3.4 due to commit
[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
commit was introduced for good reasons and it was known in advance that
the success rates would suffer but it was justified on the grounds that
the high allocation success rates were achieved by aggressive reclaim.
Success rates are expected to suffer even more in 3.6 due to commit
[7db8889a: mm: have order > 0 compaction start off where it left] which
testing has shown to severely reduce allocation success rates under load -
to 0% in one case.
This series aims to improve the allocation success rates without regressing
the benefits of commit fe2c2a10. The series is based on latest mmotm and
takes into account that the __GFP_NO_KSWAPD flag is going away.
Patch 1 updates a stale comment seeing as I was in the general area.
Patch 2 updates reclaim/compaction to reclaim pages scaled on the number
of recent failures.
Patch 3 captures suitable high-order pages freed by compaction to reduce
races with parallel allocation requests.
Patch 4 fixes the upstream commit [7db8889a: mm: have order > 0 compaction
start off where it left] to enable compaction again.
Patch 5 identifies when compaction is taking too long due to contention
and aborts.
STRESS-HIGHALLOC
                 3.6-rc1-akpm     full-series
Pass 1          36.00 ( 0.00%)   51.00 (15.00%)
Pass 2          42.00 ( 0.00%)   63.00 (21.00%)
while Rested    86.00 ( 0.00%)   86.00 ( 0.00%)
From
http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__stress-highalloc-performance-ext3/hydra/comparison.html
I know that the allocation success rate in 3.3.6 was 78% in comparison to
36% in the current akpm tree. With the full series applied, the success
rate is up to around 51% with some variability in the results. This is
not as high a success rate as 3.3.6 but it does not reclaim excessively,
which is a key point.
MMTests Statistics: vmstat
Page Ins                       3050912     3078892
Page Outs                      8033528     8039096
Swap Ins                             0           0
Swap Outs                            0           0
Note that swap in/out rates remain at 0. In 3.3.6 with 78% success rates
there were 71881 pages swapped out.
Direct pages scanned             70942      122976
Kswapd pages scanned           1366300     1520122
Kswapd pages reclaimed         1366214     1484629
Direct pages reclaimed           70936      105716
Kswapd efficiency                  99%         97%
Kswapd velocity               1072.550    1182.615
Direct efficiency                  99%         85%
Direct velocity                 55.690      95.672
The kswapd velocity changes very little as expected. kswapd velocity
is around the 1000 pages/sec mark whereas in kernel 3.3.6 with the high
allocation success rates it was 8140 pages/second. Direct velocity is higher
as a result of patch 2 of the series but this is expected and is acceptable.
The direct reclaim and kswapd velocities change very little.
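The efficiency rows in the table above are consistent with pages reclaimed divided by pages scanned, expressed as a whole percentage. The sketch below is a minimal user-space illustration of that arithmetic. The function name and the use of truncating integer division are assumptions chosen because they reproduce the quoted figures; they are not taken from MMTests itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole-percentage reclaim efficiency: pages reclaimed per 100 pages
 * scanned. Truncating division is an assumption made for this sketch.
 */
unsigned int reclaim_efficiency(uint64_t reclaimed, uint64_t scanned)
{
	assert(reclaimed <= scanned);	/* cannot reclaim more than was scanned */
	if (scanned == 0)
		return 100;		/* arbitrary choice for an idle scanner */
	return (unsigned int)(reclaimed * 100 / scanned);
}
```

For example, reclaim_efficiency(105716, 122976) gives 85, which matches the direct reclaim efficiency reported for the full series.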
If these get accepted for merging then there is a difficulty in how they
should be handled. Commit [7db8889a: mm: have order > 0 compaction start
off where it left] is broken but it is already in 3.6-rc1 and needs to
be fixed. However, if just patch 4 from this series is applied then Jim
Schutt's workload is known to break again as his workload also requires
patch 5. While it would be preferred to have all these patches in 3.6 to
improve compaction in general, it would at least be acceptable if just
patches 4 and 5 were merged to 3.6 to fix a known problem without breaking
compaction completely. On the face of it, that would force __GFP_NO_KSWAPD
patches to be merged at the same time but I can do a version of this series
with __GFP_NO_KSWAPD change reverted and then rebase it on top of this
series. That might be best overall because I note that the __GFP_NO_KSWAPD
patch should have removed deferred_compaction from page_alloc.c but did
not, and fixing that causes collisions with this series.
include/linux/compaction.h | 4 +-
include/linux/mm.h | 1 +
mm/compaction.c | 245 +++++++++++++++++++++++++++++++++-----------
mm/internal.h | 2 +
mm/page_alloc.c | 78 ++++++++++----
mm/vmscan.c | 10 ++
6 files changed, 256 insertions(+), 84 deletions(-)
--
1.7.9.2
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
@ 2012-08-14 16:41 ` Mel Gorman
2012-08-14 16:41 ` [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
The comment about order applied when the check was
order > PAGE_ALLOC_COSTLY_ORDER which has not been the case since
[c5a73c3d: thp: use compaction for all allocation orders]. Fixing
the comment while I'm in the general area.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
mm/compaction.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 36276e6..ea588eb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -812,11 +812,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
struct zone *zone;
int rc = COMPACT_SKIPPED;
- /*
- * Check whether it is worth even starting compaction. The order check is
- * made because an assumption is made that the page allocator can satisfy
- * the "cheaper" orders without taking special steps
- */
+ /* Check if the GFP flags allow compaction */
if (!order || !may_enter_fs || !may_perform_io)
return rc;
--
1.7.9.2
* [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
2012-08-14 16:41 ` [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
@ 2012-08-14 16:41 ` Mel Gorman
2012-08-14 16:41 ` [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
If allocation fails after compaction then compaction may be deferred for
a number of allocation attempts. If there are subsequent failures,
compact_defer_shift is increased to defer for longer periods. This patch
uses that information to scale the number of pages reclaimed with
compact_defer_shift until allocations succeed again. The rationale is
that reclaiming the normal number of pages still allowed compaction to
fail and its success depends on the number of pages. If it's failing,
reclaim more pages until it succeeds again.
Note that this is not implying that VM reclaim is not reclaiming enough
pages or that its logic is broken. try_to_free_pages() always asks for
SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
what it does. Direct reclaim stops normally with this check.
if (sc->nr_reclaimed >= sc->nr_to_reclaim)
goto out;
should_continue_reclaim delays when that check is made until a minimum number
of pages for reclaim/compaction are reclaimed. It is possible that this patch
could instead set nr_to_reclaim in try_to_free_pages() and drive it from
there but that behaves differently and not necessarily for the better. If
driven from do_try_to_free_pages(), it is also possible that priorities
will rise. When they reach DEF_PRIORITY-2, it will also start stalling
and setting pages for immediate reclaim which is more disruptive than is
desirable in this case. That is a more wide-reaching change that could
cause another regression related to THP requests causing interactive jitter.
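As a rough illustration of the scaling described in this changelog, the sketch below models the reclaim target in user space. The two zone fields are passed as plain arguments, which is a hypothetical calling convention used only to keep the example self-contained; the arithmetic follows the diff further down.

```c
#include <assert.h>

/*
 * User-space model of the reclaim target computed in
 * should_continue_reclaim() with this patch applied. Passing the zone
 * fields as arguments is an assumption made for illustration.
 */
unsigned long pages_for_compaction(int order, int compact_order_failed,
				   unsigned int compact_defer_shift)
{
	unsigned long pages;

	assert(order >= 0);
	pages = 2UL << order;	/* baseline: twice the requested block size */

	/* Scale up only if compaction has been failing at or below this order */
	if (compact_order_failed <= order)
		pages <<= compact_defer_shift;
	return pages;
}
```

With order 9 and three consecutive deferrals, pages_for_compaction(9, 9, 3) returns 8192 instead of the baseline 1024, so each further failure doubles the amount reclaimed before compaction is retried.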
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
mm/vmscan.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8d01243..0dd35ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1743,6 +1743,7 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
{
unsigned long pages_for_compaction;
unsigned long inactive_lru_pages;
+ struct zone *zone;
/* If not in reclaim/compaction mode, stop */
if (!in_reclaim_compaction(sc))
@@ -1776,6 +1777,15 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
* inactive lists are large enough, continue reclaiming
*/
pages_for_compaction = (2UL << sc->order);
+
+ /*
+ * If compaction is deferred for sc->order then scale the number of
+ * pages reclaimed based on the number of consecutive allocation
+ * failures
+ */
+ zone = lruvec_zone(lruvec);
+ if (zone->compact_order_failed <= sc->order)
+ pages_for_compaction <<= zone->compact_defer_shift;
inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
if (nr_swap_pages > 0)
inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
--
1.7.9.2
* [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
2012-08-14 16:41 ` [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
2012-08-14 16:41 ` [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
@ 2012-08-14 16:41 ` Mel Gorman
2012-08-14 16:41 ` [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
2012-08-14 16:41 ` [PATCH 5/5] mm: compaction: Abort async compaction if locks are contended or taking too long Mel Gorman
4 siblings, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
While compaction is migrating pages to free up large contiguous blocks for
allocation it races with other allocation requests that may steal these
blocks or break them up. This patch alters direct compaction to capture a
suitable free page as soon as it becomes available to reduce this race. It
uses similar logic to split_free_page() to ensure that watermarks are
still obeyed.
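The rule for which free lists a direct compactor may capture from can be sketched in user space as follows. The enum values and helper names are hypothetical and only mirror the ordering that the patch appears to rely on (free lists 0 up to MIGRATE_PCPTYPES); the real logic is compact_capture_page() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Model values; assumed to mirror the kernel's migratetype ordering. */
enum model_migratetype {
	MODEL_UNMOVABLE,
	MODEL_RECLAIMABLE,
	MODEL_MOVABLE,
	MODEL_PCPTYPES		/* number of free-list types searched here */
};

struct mtype_range {
	int low;	/* first free-list type to search */
	int high;	/* one past the last type to search */
};

/* MOVABLE requests may search every list, others only their own. */
struct mtype_range capture_range(int alloc_type)
{
	struct mtype_range r;

	assert(alloc_type >= 0 && alloc_type < MODEL_PCPTYPES);
	if (alloc_type == MODEL_MOVABLE) {
		r.low = 0;
		r.high = MODEL_PCPTYPES;
	} else {
		r.low = alloc_type;
		r.high = alloc_type + 1;
	}
	return r;
}

/* True if a request of alloc_type may capture a page from list_type. */
bool may_capture_from(int alloc_type, int list_type)
{
	struct mtype_range r = capture_range(alloc_type);

	return list_type >= r.low && list_type < r.high;
}
```

The asymmetry is deliberate: a movable request taking a page from an unmovable list does not make fragmentation worse, whereas the reverse could pollute movable pageblocks with pages that are hard to migrate.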
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
include/linux/compaction.h | 4 +-
include/linux/mm.h | 1 +
mm/compaction.c | 88 ++++++++++++++++++++++++++++++++++++++------
mm/internal.h | 1 +
mm/page_alloc.c | 63 +++++++++++++++++++++++--------
5 files changed, 128 insertions(+), 29 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 133ddcf..fd20c15 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
- bool sync);
+ bool sync, struct page **page);
extern int compact_pgdat(pg_data_t *pgdat, int order);
extern unsigned long compaction_suitable(struct zone *zone, int order);
@@ -64,7 +64,7 @@ static inline bool compaction_deferred(struct zone *zone, int order)
#else
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
- bool sync)
+ bool sync, struct page **page)
{
return COMPACT_CONTINUE;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0514fe9..5ddb11b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -442,6 +442,7 @@ void put_pages_list(struct list_head *pages);
void split_page(struct page *page, unsigned int order);
int split_free_page(struct page *page);
+int capture_free_page(struct page *page, int alloc_order, int migratetype);
/*
* Compound pages have a destructor function. Provide a
diff --git a/mm/compaction.c b/mm/compaction.c
index ea588eb..a806a9c 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,59 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}
+static void compact_capture_page(struct compact_control *cc)
+{
+ unsigned long flags;
+ int mtype, mtype_low, mtype_high;
+
+ if (!cc->page || *cc->page)
+ return;
+
+ /*
+ * For MIGRATE_MOVABLE allocations we capture a suitable page ASAP
+ * regardless of the migratetype of the freelist it is captured from.
+ * This is fine because the order for a high-order MIGRATE_MOVABLE
+ * allocation is typically at least a pageblock size and overall
+ * fragmentation is not impaired. Other allocation types must
+ * capture pages from their own migratelist because otherwise they
+ * could pollute other pageblocks like MIGRATE_MOVABLE with
+ * difficult to move pages and making fragmentation worse overall.
+ */
+ if (cc->migratetype == MIGRATE_MOVABLE) {
+ mtype_low = 0;
+ mtype_high = MIGRATE_PCPTYPES;
+ } else {
+ mtype_low = cc->migratetype;
+ mtype_high = cc->migratetype + 1;
+ }
+
+ /* Speculatively examine the free lists without zone lock */
+ for (mtype = mtype_low; mtype < mtype_high; mtype++) {
+ int order;
+ for (order = cc->order; order < MAX_ORDER; order++) {
+ struct page *page;
+ struct free_area *area;
+ area = &(cc->zone->free_area[order]);
+ if (list_empty(&area->free_list[mtype]))
+ continue;
+
+ /* Take the lock and attempt capture of the page */
+ spin_lock_irqsave(&cc->zone->lock, flags);
+ if (!list_empty(&area->free_list[mtype])) {
+ page = list_entry(area->free_list[mtype].next,
+ struct page, lru);
+ if (capture_free_page(page, cc->order, mtype)) {
+ spin_unlock_irqrestore(&cc->zone->lock,
+ flags);
+ *cc->page = page;
+ return;
+ }
+ }
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+ }
+ }
+}
+
/*
* Isolate free pages onto a private freelist. Caller must hold zone->lock.
* If @strict is true, will abort returning 0 on any invalid PFNs or non-free
@@ -589,7 +642,6 @@ static unsigned long start_free_pfn(struct zone *zone)
static int compact_finished(struct zone *zone,
struct compact_control *cc)
{
- unsigned int order;
unsigned long watermark;
if (fatal_signal_pending(current))
@@ -632,14 +684,22 @@ static int compact_finished(struct zone *zone,
return COMPACT_CONTINUE;
/* Direct compactor: Is a suitable page free? */
- for (order = cc->order; order < MAX_ORDER; order++) {
- /* Job done if page is free of the right migratetype */
- if (!list_empty(&zone->free_area[order].free_list[cc->migratetype]))
- return COMPACT_PARTIAL;
-
- /* Job done if allocation would set block type */
- if (order >= pageblock_order && zone->free_area[order].nr_free)
+ if (cc->page) {
+ /* Was a suitable page captured? */
+ if (*cc->page)
return COMPACT_PARTIAL;
+ } else {
+ unsigned int order;
+ for (order = cc->order; order < MAX_ORDER; order++) {
+ struct free_area *area = &zone->free_area[cc->order];
+ /* Job done if page is free of the right migratetype */
+ if (!list_empty(&area->free_list[cc->migratetype]))
+ return COMPACT_PARTIAL;
+
+ /* Job done if allocation would set block type */
+ if (cc->order >= pageblock_order && area->nr_free)
+ return COMPACT_PARTIAL;
+ }
}
return COMPACT_CONTINUE;
@@ -761,6 +821,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
goto out;
}
}
+
+ /* Capture a page now if it is a suitable size */
+ compact_capture_page(cc);
}
out:
@@ -773,7 +836,7 @@ out:
static unsigned long compact_zone_order(struct zone *zone,
int order, gfp_t gfp_mask,
- bool sync)
+ bool sync, struct page **page)
{
struct compact_control cc = {
.nr_freepages = 0,
@@ -782,6 +845,7 @@ static unsigned long compact_zone_order(struct zone *zone,
.migratetype = allocflags_to_migratetype(gfp_mask),
.zone = zone,
.sync = sync,
+ .page = page,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
@@ -803,7 +867,7 @@ int sysctl_extfrag_threshold = 500;
*/
unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
- bool sync)
+ bool sync, struct page **page)
{
enum zone_type high_zoneidx = gfp_zone(gfp_mask);
int may_enter_fs = gfp_mask & __GFP_FS;
@@ -823,7 +887,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
nodemask) {
int status;
- status = compact_zone_order(zone, order, gfp_mask, sync);
+ status = compact_zone_order(zone, order, gfp_mask, sync, page);
rc = max(status, rc);
/* If a normal allocation would succeed, stop compacting */
@@ -878,6 +942,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
struct compact_control cc = {
.order = order,
.sync = false,
+ .page = NULL,
};
return __compact_pgdat(pgdat, &cc);
@@ -888,6 +953,7 @@ static int compact_node(int nid)
struct compact_control cc = {
.order = -1,
.sync = true,
+ .page = NULL,
};
return __compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 3314f79..b03f05e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,6 +130,7 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
+ struct page **page; /* Page captured of requested size */
};
unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cefac39..d1759f5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1379,16 +1379,11 @@ void split_page(struct page *page, unsigned int order)
}
/*
- * Similar to split_page except the page is already free. As this is only
- * being used for migration, the migratetype of the block also changes.
- * As this is called with interrupts disabled, the caller is responsible
- * for calling arch_alloc_page() and kernel_map_page() after interrupts
- * are enabled.
- *
- * Note: this is probably too low level an operation for use in drivers.
- * Please consult with lkml before using this in your driver.
+ * Similar to the split_page family of functions except that the page
+ * required at the given order and being isolated now to prevent races
+ * with parallel allocators
*/
-int split_free_page(struct page *page)
+int capture_free_page(struct page *page, int alloc_order, int migratetype)
{
unsigned int order;
unsigned long watermark;
@@ -1410,10 +1405,11 @@ int split_free_page(struct page *page)
rmv_page_order(page);
__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
- /* Split into individual pages */
- set_page_refcounted(page);
- split_page(page, order);
+ if (alloc_order != order)
+ expand(zone, page, alloc_order, order,
+ &zone->free_area[order], migratetype);
+ /* Set the pageblock if the captured page is at least a pageblock */
if (order >= pageblock_order - 1) {
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
@@ -1424,7 +1420,35 @@ int split_free_page(struct page *page)
}
}
- return 1 << order;
+ return 1UL << order;
+}
+
+/*
+ * Similar to split_page except the page is already free. As this is only
+ * being used for migration, the migratetype of the block also changes.
+ * As this is called with interrupts disabled, the caller is responsible
+ * for calling arch_alloc_page() and kernel_map_page() after interrupts
+ * are enabled.
+ *
+ * Note: this is probably too low level an operation for use in drivers.
+ * Please consult with lkml before using this in your driver.
+ */
+int split_free_page(struct page *page)
+{
+ unsigned int order;
+ int nr_pages;
+
+ BUG_ON(!PageBuddy(page));
+ order = page_order(page);
+
+ nr_pages = capture_free_page(page, order, 0);
+ if (!nr_pages)
+ return 0;
+
+ /* Split into individual pages */
+ set_page_refcounted(page);
+ split_page(page, order);
+ return nr_pages;
}
/*
@@ -2093,7 +2117,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
bool *deferred_compaction,
unsigned long *did_some_progress)
{
- struct page *page;
+ struct page *page = NULL;
if (!order)
return NULL;
@@ -2105,10 +2129,16 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
current->flags |= PF_MEMALLOC;
*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
- nodemask, sync_migration);
+ nodemask, sync_migration, &page);
current->flags &= ~PF_MEMALLOC;
- if (*did_some_progress != COMPACT_SKIPPED) {
+ /* If compaction captured a page, prep and use it */
+ if (page) {
+ prep_new_page(page, order, gfp_mask);
+ goto got_page;
+ }
+
+ if (*did_some_progress != COMPACT_SKIPPED) {
/* Page migration frees to the PCP lists but we want merging */
drain_pages(get_cpu());
put_cpu();
@@ -2118,6 +2148,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
alloc_flags & ~ALLOC_NO_WATERMARKS,
preferred_zone, migratetype);
if (page) {
+got_page:
preferred_zone->compact_considered = 0;
preferred_zone->compact_defer_shift = 0;
if (order >= preferred_zone->compact_order_failed)
--
1.7.9.2
* [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
` (2 preceding siblings ...)
2012-08-14 16:41 ` [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
@ 2012-08-14 16:41 ` Mel Gorman
2012-09-13 15:52 ` Rik van Riel
2012-08-14 16:41 ` [PATCH 5/5] mm: compaction: Abort async compaction if locks are contended or taking too long Mel Gorman
4 siblings, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
commit [7db8889a: mm: have order > 0 compaction start off where it left]
introduced a caching mechanism to reduce the amount of work the free page
scanner does in compaction. However, it has a problem. Consider two processes
simultaneously scanning free pages
C
Process A M S F
|---------------------------------------|
Process B M FS
C is zone->compact_cached_free_pfn
S is cc->start_free_pfn
M is cc->migrate_pfn
F is cc->free_pfn
In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.
Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.
Process A moves to "end_of_zone - one_pageblock" and runs this check
if (cc->order > 0 && (!cc->wrapped ||
zone->compact_cached_free_pfn >
cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);
compact_cached_free_pfn is above where it started so the free scanner skips
almost the entire space it should have scanned. When there are multiple
processes compacting it can end in a situation where the entire zone is
not being scanned at all. Further, it is possible for two processes to
ping-pong updates to compact_cached_free_pfn which is just random.
Overall, the end result wrecks allocation success rates.
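To make the skip concrete, the sketch below models the quoted check as a pure function in user space. The function name and the pfn values in the example are hypothetical; only the condition itself comes from the code quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the skip check that this patch removes. Returns the pfn at
 * which the free scanner resumes.
 */
unsigned long skip_to(unsigned long pfn, unsigned long cached_free_pfn,
		      unsigned long start_free_pfn, int order, bool wrapped)
{
	assert(order >= 0);
	if (order > 0 && (!wrapped || cached_free_pfn > start_free_pfn))
		return pfn < cached_free_pfn ? pfn : cached_free_pfn;
	return pfn;
}
```

Suppose a wrapped scanner started at pfn 4608, is now near the end of the zone at pfn 1048064, and another process has just set the cached value to 5120. Then skip_to(1048064, 5120, 4608, 9, true) returns 5120, so nearly the whole zone between the two positions is never examined.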
There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.
First, it gets rid of the skip logic because it's not clear that it matters
if two free scanners happen to be in the same block but with racing updates
it's too easy for it to skip over blocks it should not.
Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.
If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.
If a scanner has not wrapped when it has finished isolating pages it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.
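The update rules above can be modelled in user space as pure functions over the cached value. This is a sketch under stated assumptions: the function names other than the start_free_pfn() arithmetic are hypothetical, and the kernel updates zone->compact_cached_free_pfn in place rather than returning a new value.

```c
#include <assert.h>
#include <stdbool.h>

/* Start pfn of the last pageblock, following start_free_pfn() in the diff. */
unsigned long zone_last_block(unsigned long zone_start_pfn,
			      unsigned long spanned_pages,
			      unsigned long pageblock_nr_pages)
{
	/* The mask below only works for power-of-two block sizes */
	assert(pageblock_nr_pages &&
	       !(pageblock_nr_pages & (pageblock_nr_pages - 1)));
	return (zone_start_pfn + spanned_pages) & ~(pageblock_nr_pages - 1);
}

/* Cached value after a pageblock yielded free pages during the scan. */
unsigned long cached_after_isolate(unsigned long cached, unsigned long high_pfn,
				   int order, bool wrapped)
{
	/* Only a wrapped scanner moves the cached position here */
	return (order > 0 && wrapped) ? high_pfn : cached;
}

/* Cached value once the free scanner has finished isolating pages. */
unsigned long cached_after_scan(unsigned long cached, unsigned long reset_pfn,
				unsigned long high_pfn, int order, bool wrapped)
{
	/* A scanner that has not wrapped only replaces a freshly reset value */
	if (order > 0 && !wrapped && cached == reset_pfn)
		return high_pfn;
	return cached;
}
```

Here reset_pfn stands for the value start_free_pfn(zone) would return. Once a non-wrapped scanner has replaced that value, later non-wrapped scanners leave it alone, which is what limits the ping-pong updates described earlier.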
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
---
mm/compaction.c | 54 ++++++++++++++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 26 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a806a9c..c2d0958 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -437,6 +437,20 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * Returns the start pfn of the last page block in a zone. This is the starting
+ * point for full compaction of a zone. Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+ unsigned long free_pfn;
+ free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ free_pfn &= ~(pageblock_nr_pages-1);
+ return free_pfn;
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -475,17 +489,6 @@ static void isolate_freepages(struct zone *zone,
pfn -= pageblock_nr_pages) {
unsigned long isolated;
- /*
- * Skip ahead if another thread is compacting in the area
- * simultaneously. If we wrapped around, we can only skip
- * ahead if zone->compact_cached_free_pfn also wrapped to
- * above our starting point.
- */
- if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
- pfn = min(pfn, zone->compact_cached_free_pfn);
-
if (!pfn_valid(pfn))
continue;
@@ -528,7 +531,15 @@ static void isolate_freepages(struct zone *zone,
*/
if (isolated) {
high_pfn = max(high_pfn, pfn);
- if (cc->order > 0)
+
+ /*
+ * If the free scanner has wrapped, update
+ * compact_cached_free_pfn to point to the highest
+ * pageblock with free pages. This reduces excessive
+ * scanning of full pageblocks near the end of the
+ * zone
+ */
+ if (cc->order > 0 && cc->wrapped)
zone->compact_cached_free_pfn = high_pfn;
}
}
@@ -538,6 +549,11 @@ static void isolate_freepages(struct zone *zone,
cc->free_pfn = high_pfn;
cc->nr_freepages = nr_freepages;
+
+ /* If compact_cached_free_pfn is reset then set it now */
+ if (cc->order > 0 && !cc->wrapped &&
+ zone->compact_cached_free_pfn == start_free_pfn(zone))
+ zone->compact_cached_free_pfn = high_pfn;
}
/*
@@ -625,20 +641,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
return ISOLATE_SUCCESS;
}
-/*
- * Returns the start pfn of the last page block in a zone. This is the starting
- * point for full compaction of a zone. Compaction searches for free pages from
- * the end of each zone, while isolate_freepages_block scans forward inside each
- * page block.
- */
-static unsigned long start_free_pfn(struct zone *zone)
-{
- unsigned long free_pfn;
- free_pfn = zone->zone_start_pfn + zone->spanned_pages;
- free_pfn &= ~(pageblock_nr_pages-1);
- return free_pfn;
-}
-
static int compact_finished(struct zone *zone,
struct compact_control *cc)
{
--
1.7.9.2
* [PATCH 5/5] mm: compaction: Abort async compaction if locks are contended or taking too long
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
` (3 preceding siblings ...)
2012-08-14 16:41 ` [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
@ 2012-08-14 16:41 ` Mel Gorman
4 siblings, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2012-08-14 16:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Rik van Riel, Minchan Kim, Jim Schutt, Linux-MM, LKML, Mel Gorman
Jim Schutt reported a problem that pointed at compaction contending
heavily on locks. The workload is straightforward and, in his own words:
The systems in question have 24 SAS drives spread across 3 HBAs,
running 24 Ceph OSD instances, one per drive. FWIW these servers
are dual-socket Intel 5675 Xeons w/48 GB memory. I've got ~160
Ceph Linux clients doing dd simultaneously to a Ceph file system
backed by 12 of these servers.
Early in the test everything looks fine
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b swpd   free buff    cache si so   bi      bo     in     cs us sy id wa st
 31 15    0 287216  576 38606628  0  0    2    1158      2     14  1  3 95  0  0
 27 15    0 225288  576 38583384  0  0   18 2222016 203357 134876 11 56 17 15  0
 28 17    0 219256  576 38544736  0  0   11 2305932 203141 146296 11 49 23 17  0
  6 18    0 215596  576 38552872  0  0    7 2363207 215264 166502 12 45 22 20  0
 22 18    0 226984  576 38596404  0  0    3 2445741 223114 179527 12 43 23 22  0
and then it goes to pot
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b swpd   free buff    cache si so   bi      bo     in     cs us sy id wa st
163  8    0 464308  576 36791368  0  0   11   22210    866    536  3 13 79  4  0
207 14    0 917752  576 36181928  0  0  712 1345376 134598  47367  7 90  1  2  0
123 12    0 685516  576 36296148  0  0  429 1386615 158494  60077  8 84  5  3  0
123 12    0 598572  576 36333728  0  0 1107 1233281 147542  62351  7 84  5  4  0
622  7    0 660768  576 36118264  0  0  557 1345548 151394  59353  7 85  4  3  0
223 11    0 283960  576 36463868  0  0   46 1107160 121846  33006  6 93  1  1  0
Note that system CPU usage is very high and blocks being written out have
dropped by 42%. He analysed this with perf and found
perf record -g -a sleep 10
perf report --sort symbol --call-graph fractal,5
34.63% [k] _raw_spin_lock_irqsave
|
|--97.30%-- isolate_freepages
| compaction_alloc
| unmap_and_move
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_slowpath
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| do_page_fault
| page_fault
| |
| |--87.39%-- skb_copy_datagram_iovec
| | tcp_recvmsg
| | inet_recvmsg
| | sock_recvmsg
| | sys_recvfrom
| | system_call
| | __recv
| | |
| | --100.00%-- (nil)
| |
| --12.61%-- memcpy
--2.70%-- [...]
There was other data but primarily it is all showing that compaction is
contended heavily on the zone->lock and zone->lru_lock.
commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
while isolating pages for migration] noted that it was possible for
migration to hold the lru_lock for an excessive amount of time. Very
broadly speaking this patch expands the concept.
This patch introduces compact_checklock_irqsave() to check if a lock
is contended or the process needs to be scheduled. If either condition
is true then async compaction is aborted and the caller is informed.
The page allocator will fail a THP allocation if compaction failed due
to contention. This patch also introduces compact_trylock_irqsave()
which will acquire the lock only if it is not contended and the process
does not need to schedule.
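The decision described above can be summarised with a small user-space model. The enum and function names are hypothetical, and the sketch is based only on this changelog and the comment in the diff; the real helper also has to drop and retake the spinlock, which is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_decision {
	LOCK_PROCEED,		/* no pressure: take or keep the lock */
	LOCK_YIELD_THEN_RETRY,	/* sync: drop the lock, reschedule, retake */
	LOCK_ABORT		/* async: drop the lock and report contention */
};

/* What compaction does when it checks a coarse lock, per the changelog. */
enum lock_decision compact_lock_decision(bool need_resched,
					 bool lock_contended, bool sync)
{
	if (!need_resched && !lock_contended)
		return LOCK_PROCEED;
	return sync ? LOCK_YIELD_THEN_RETRY : LOCK_ABORT;
}
```

In this model an async compactor that sees either condition returns LOCK_ABORT, which corresponds to the case where the page allocator fails the THP allocation rather than spinning on zone->lock or zone->lru_lock.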
Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/compaction.h | 4 +-
mm/compaction.c | 105 ++++++++++++++++++++++++++++++++++----------
mm/internal.h | 1 +
mm/page_alloc.c | 17 ++++---
4 files changed, 96 insertions(+), 31 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index fd20c15..0e38a1d 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -22,7 +22,7 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
- bool sync, struct page **page);
+ bool sync, bool *contended, struct page **page);
extern int compact_pgdat(pg_data_t *pgdat, int order);
extern unsigned long compaction_suitable(struct zone *zone, int order);
@@ -64,7 +64,7 @@ static inline bool compaction_deferred(struct zone *zone, int order)
#else
static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
- bool sync, struct page **page)
+ bool sync, bool *contended, struct page **page)
{
return COMPACT_CONTINUE;
}
diff --git a/mm/compaction.c b/mm/compaction.c
index c2d0958..b95e263 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,47 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}
+/*
+ * Compaction requires the taking of some coarse locks that are potentially
+ * very heavily contended. Check if the process needs to be scheduled or
+ * if the lock is contended. For async compaction, back out if contention
+ * is severe. For sync compaction, schedule.
+ *
+ * Returns true if the lock is held.
+ * Returns false if the lock is released and compaction should abort
+ */
+static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
+ bool locked, struct compact_control *cc)
+{
+ if (need_resched() || spin_is_contended(lock)) {
+ if (locked) {
+ spin_unlock_irqrestore(lock, *flags);
+ locked = false;
+ }
+
+ /* async aborts if taking too long or contended */
+ if (!cc->sync) {
+ if (cc->contended)
+ *cc->contended = true;
+ return false;
+ }
+
+ cond_resched();
+ if (fatal_signal_pending(current))
+ return false;
+ }
+
+ if (!locked)
+ spin_lock_irqsave(lock, *flags);
+ return true;
+}
+
+static inline bool compact_trylock_irqsave(spinlock_t *lock,
+ unsigned long *flags, struct compact_control *cc)
+{
+ return compact_checklock_irqsave(lock, flags, false, cc);
+}
+
static void compact_capture_page(struct compact_control *cc)
{
unsigned long flags;
@@ -87,7 +128,8 @@ static void compact_capture_page(struct compact_control *cc)
continue;
/* Take the lock and attempt capture of the page */
- spin_lock_irqsave(&cc->zone->lock, flags);
+ if (!compact_trylock_irqsave(&cc->zone->lock, &flags, cc))
+ return;
if (!list_empty(&area->free_list[mtype])) {
page = list_entry(area->free_list[mtype].next,
struct page, lru);
@@ -226,7 +268,7 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
}
/* Update the number of anon and file isolated pages in the zone */
-static void acct_isolated(struct zone *zone, struct compact_control *cc)
+static void acct_isolated(struct zone *zone, bool locked, struct compact_control *cc)
{
struct page *page;
unsigned int count[2] = { 0, };
@@ -234,8 +276,14 @@ static void acct_isolated(struct zone *zone, struct compact_control *cc)
list_for_each_entry(page, &cc->migratepages, lru)
count[!!page_is_file_cache(page)]++;
- __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
- __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+ /* If locked we can use the interrupt unsafe versions */
+ if (locked) {
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+ } else {
+ mod_zone_page_state(zone, NR_ISOLATED_ANON, count[0]);
+ mod_zone_page_state(zone, NR_ISOLATED_FILE, count[1]);
+ }
}
/* Similar to reclaim, but different enough that they don't share logic */
@@ -281,6 +329,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
struct list_head *migratelist = &cc->migratepages;
isolate_mode_t mode = 0;
struct lruvec *lruvec;
+ unsigned long flags;
+ bool locked;
/*
* Ensure that there are not too many pages isolated from the LRU
@@ -300,25 +350,22 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
/* Time to isolate some pages for migration */
cond_resched();
- spin_lock_irq(&zone->lru_lock);
+ spin_lock_irqsave(&zone->lru_lock, flags);
+ locked = true;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
- bool locked = true;
/* give a chance to irqs before checking need_resched() */
if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
- spin_unlock_irq(&zone->lru_lock);
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
locked = false;
}
- if (need_resched() || spin_is_contended(&zone->lru_lock)) {
- if (locked)
- spin_unlock_irq(&zone->lru_lock);
- cond_resched();
- spin_lock_irq(&zone->lru_lock);
- if (fatal_signal_pending(current))
- break;
- } else if (!locked)
- spin_lock_irq(&zone->lru_lock);
+
+ /* Check if it is ok to still hold the lock */
+ locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
+ locked, cc);
+ if (!locked)
+ break;
/*
* migrate_pfn does not necessarily start aligned to a
@@ -402,9 +449,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
}
}
- acct_isolated(zone, cc);
+ acct_isolated(zone, locked, cc);
- spin_unlock_irq(&zone->lru_lock);
+ if (locked)
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
@@ -514,7 +562,16 @@ static void isolate_freepages(struct zone *zone,
* are disabled
*/
isolated = 0;
- spin_lock_irqsave(&zone->lock, flags);
+
+ /*
+ * The zone lock must be held to isolate freepages.
+ * Unfortunately this is a very coarse lock and can be
+ * heavily contended if there are parallel allocations
+ * or parallel compactions. For async compaction do not
+ * spin on the lock
+ */
+ if (!compact_trylock_irqsave(&zone->lock, &flags, cc))
+ break;
if (suitable_migration_target(page)) {
end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
trace_mm_compaction_freepage_scanpfn(pfn);
@@ -837,8 +894,8 @@ out:
}
static unsigned long compact_zone_order(struct zone *zone,
- int order, gfp_t gfp_mask,
- bool sync, struct page **page)
+ int order, gfp_t gfp_mask, bool sync,
+ bool *contended, struct page **page)
{
struct compact_control cc = {
.nr_freepages = 0,
@@ -848,6 +905,7 @@ static unsigned long compact_zone_order(struct zone *zone,
.zone = zone,
.sync = sync,
.page = page,
+ .contended = contended,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
@@ -869,7 +927,7 @@ int sysctl_extfrag_threshold = 500;
*/
unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *nodemask,
- bool sync, struct page **page)
+ bool sync, bool *contended, struct page **page)
{
enum zone_type high_zoneidx = gfp_zone(gfp_mask);
int may_enter_fs = gfp_mask & __GFP_FS;
@@ -889,7 +947,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
nodemask) {
int status;
- status = compact_zone_order(zone, order, gfp_mask, sync, page);
+ status = compact_zone_order(zone, order, gfp_mask, sync,
+ contended, page);
rc = max(status, rc);
/* If a normal allocation would succeed, stop compacting */
diff --git a/mm/internal.h b/mm/internal.h
index b03f05e..e549a7f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,6 +130,7 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
+ bool *contended; /* True if a lock was contended */
struct page **page; /* Page captured of requested size */
};
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d1759f5..373d05f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2114,7 +2114,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
int migratetype, bool sync_migration,
- bool *deferred_compaction,
+ bool *contended_compaction, bool *deferred_compaction,
unsigned long *did_some_progress)
{
struct page *page = NULL;
@@ -2129,7 +2129,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
current->flags |= PF_MEMALLOC;
*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
- nodemask, sync_migration, &page);
+ nodemask, sync_migration,
+ contended_compaction, &page);
current->flags &= ~PF_MEMALLOC;
/* If compaction captured a page, prep and use it */
@@ -2182,7 +2183,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
int migratetype, bool sync_migration,
- bool *deferred_compaction,
+ bool *contended_compaction, bool *deferred_compaction,
unsigned long *did_some_progress)
{
return NULL;
@@ -2355,6 +2356,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
unsigned long did_some_progress;
bool sync_migration = false;
bool deferred_compaction = false;
+ bool contended_compaction = false;
/*
* In the slowpath, we sanity check order to avoid ever trying to
@@ -2451,6 +2453,7 @@ rebalance:
nodemask,
alloc_flags, preferred_zone,
migratetype, sync_migration,
+ &contended_compaction,
&deferred_compaction,
&did_some_progress);
if (page)
@@ -2460,10 +2463,11 @@ rebalance:
/*
* If compaction is deferred for high-order allocations, it is because
* sync compaction recently failed. In this is the case and the caller
- * has requested the system not be heavily disrupted, fail the
- * allocation now instead of entering direct reclaim
+ * requested a movable allocation that does not heavily disrupt the
+ * system then fail the allocation instead of entering direct reclaim.
*/
- if (deferred_compaction)
+ if (contended_compaction &&
+ (gfp_mask & (__GFP_MOVABLE|__GFP_REPEAT)) == __GFP_MOVABLE)
goto nopage;
/* Try direct reclaim and then allocating */
@@ -2534,6 +2538,7 @@ rebalance:
nodemask,
alloc_flags, preferred_zone,
migratetype, sync_migration,
+ &contended_compaction,
&deferred_compaction,
&did_some_progress);
if (page)
--
1.7.9.2
* Re: [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages
2012-08-14 16:41 ` [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
@ 2012-09-13 15:52 ` Rik van Riel
0 siblings, 0 replies; 7+ messages in thread
From: Rik van Riel @ 2012-09-13 15:52 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, Minchan Kim, Jim Schutt, Linux-MM, LKML
On 08/14/2012 12:41 PM, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
>
> C
> Process A M S F
> |---------------------------------------|
> Process B M FS
>
> C is zone->compact_cached_free_pfn
> S is cc->start_free_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
... actually, unless I am mistaken there may be a simple
approach to keep my "skip ahead" logic but make it proof
against the above scenario.
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block
It is not so much about being in the same block, as it is
about multiple invocations starting at the same block over
and over again.
> but with racing updates
> it's too easy for it to skip over blocks it should not.
If one thread stops compaction free page scanning in one
block, the next invocation will start by scanning that
block again, until it is exhausted.
We just need to make the code proof against the race you
described.
> @@ -475,17 +489,6 @@ static void isolate_freepages(struct zone *zone,
> pfn -= pageblock_nr_pages) {
> unsigned long isolated;
>
> - /*
> - * Skip ahead if another thread is compacting in the area
> - * simultaneously. If we wrapped around, we can only skip
> - * ahead if zone->compact_cached_free_pfn also wrapped to
> - * above our starting point.
> - */
> - if (cc->order > 0 && (!cc->wrapped ||
> - zone->compact_cached_free_pfn >
> - cc->start_free_pfn))
> - pfn = min(pfn, zone->compact_cached_free_pfn);
> -
> if (!pfn_valid(pfn))
> continue;
I think the skipping logic should look something like this:
static bool compaction_may_skip(struct zone *zone,
				struct compact_control *cc)
{
	/* If we have not wrapped, we can only skip downwards. */
	if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
		return true;
	/* If we have wrapped, we can skip ahead to our start point. */
	if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
		return true;
	return false;
}
	if (cc->order > 0 && compaction_may_skip(zone, cc))
		pfn = min(pfn, zone->compact_cached_free_pfn);
I believe that would close the hole you described, while
not re-introducing the quadratic "start at the same block
every invocation, until we wrap" behaviour.
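For anyone who wants to poke at the four cases, the check above can be
exercised in user space. This is only a rough sketch under stated
assumptions: struct model_zone, struct model_compact_control and
next_free_scan_pfn() are hypothetical names made up for the illustration,
they carry just the fields the check reads, and the pfn values used with
them are arbitrary.

```c
#include <stdbool.h>

/* Hypothetical reduced zone: only the cached free-scanner position. */
struct model_zone {
	unsigned long compact_cached_free_pfn;
};

/* Hypothetical reduced compact_control: only what the check reads. */
struct model_compact_control {
	unsigned long start_free_pfn; /* where this scanner started */
	bool wrapped;                 /* scanner has wrapped around the zone */
	int order;                    /* order being compacted for */
};

static bool compaction_may_skip(const struct model_zone *zone,
				const struct model_compact_control *cc)
{
	/* If we have not wrapped, we can only skip downwards. */
	if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
		return true;

	/* If we have wrapped, we can skip ahead to our start point. */
	if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
		return true;

	return false;
}

/* Returns the pfn the free scanner would continue from. */
static unsigned long next_free_scan_pfn(const struct model_zone *zone,
					const struct model_compact_control *cc,
					unsigned long pfn)
{
	if (cc->order > 0 && compaction_may_skip(zone, cc)) {
		unsigned long cached = zone->compact_cached_free_pfn;

		return pfn < cached ? pfn : cached; /* min(pfn, cached) */
	}
	return pfn;
}
```

The interesting case is a scanner that has not wrapped while the cached pfn
sits above its starting point, for example because another scanner wrapped
and updated it. In this model that scanner keeps its own pfn instead of
skipping.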
Thread overview: 7+ messages
2012-08-14 16:41 [PATCH 0/5] Improve hugepage allocation success rates under load V4 Mel Gorman
2012-08-14 16:41 ` [PATCH 1/5] mm: compaction: Update comment in try_to_compact_pages Mel Gorman
2012-08-14 16:41 ` [PATCH 2/5] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures Mel Gorman
2012-08-14 16:41 ` [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available Mel Gorman
2012-08-14 16:41 ` [PATCH 4/5] mm: have order > 0 compaction start near a pageblock with free pages Mel Gorman
2012-09-13 15:52 ` Rik van Riel
2012-08-14 16:41 ` [PATCH 5/5] mm: compaction: Abort async compaction if locks are contended or taking too long Mel Gorman