* [PATCH -mm v2] mm: have order > 0 compaction start off where it left
@ 2012-06-28 17:55 ` Rik van Riel
  0 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-06-28 17:55 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Mel Gorman, Andrew Morton, jaschut, minchan,
	kamezawa.hiroyu

Order > 0 compaction stops when enough free pages of the correct
page order have been coalesced. When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.

However, the compaction code always starts out looking for things to
compact at the start of the zone, and for free pages to compact things
to at the end of the zone.

This can cause quadratic behaviour, with isolate_freepages starting
at the end of the zone each time, even though previous invocations
of the compaction code already filled up all free memory on that end
of the zone.

This can cause isolate_freepages to take enormous amounts of CPU
with certain workloads on larger memory systems.

The obvious solution is to have isolate_freepages remember where
it left off last time, and continue at that point the next time
it gets invoked for an order > 0 compaction. This could cause
compaction to fail if cc->free_pfn and cc->migrate_pfn are close
together initially; in that case we restart from the end of the
zone and try once more.

Forced full (order == -1) compactions are left alone.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
v2: implement Mel's suggestions, handling wrap-around etc
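
In outline, the resume logic this adds to compact_zone is (a sketch of
the idea only; the authoritative version is in the diff below):

	if (cc->order > 0) {
		/* Incremental: resume where the previous run stopped. */
		cc->free_pfn = zone->compact_cached_free_pfn;
		cc->start_free_pfn = cc->free_pfn;
	} else {
		/* Full (order == -1) run: start at the end of the zone. */
		cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
		cc->free_pfn &= ~(pageblock_nr_pages - 1);
	}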

 include/linux/mmzone.h |    4 ++++
 mm/compaction.c        |   48 ++++++++++++++++++++++++++++++++++++++++++++----
 mm/internal.h          |    2 ++
 mm/page_alloc.c        |    5 +++++
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2427706..e629594 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -369,6 +369,10 @@ struct zone {
 	 */
 	spinlock_t		lock;
 	int                     all_unreclaimable; /* All pages pinned */
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+	/* pfn where the last order > 0 compaction isolated free pages */
+	unsigned long		compact_cached_free_pfn;
+#endif
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/* see spanned/present_pages for more description */
 	seqlock_t		span_seqlock;
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..2668b77 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -422,6 +422,17 @@ static void isolate_freepages(struct zone *zone,
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
 
+		/*
+		 * Skip ahead if another thread is compacting in the area
+		 * simultaneously. If we wrapped around, we can only skip
+		 * ahead if zone->compact_cached_free_pfn also wrapped to
+		 * above our starting point.
+		 */
+		if (cc->order > 0 && (!cc->wrapped ||
+				      zone->compact_cached_free_pfn >
+				      cc->start_free_pfn))
+			pfn = min(pfn, zone->compact_cached_free_pfn);
+
 		if (!pfn_valid(pfn))
 			continue;
 
@@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		if (isolated)
 			high_pfn = max(high_pfn, pfn);
+		if (cc->order > 0)
+			zone->compact_cached_free_pfn = high_pfn;
 	}
 
 	/* split_free_page does not map the pages */
@@ -565,8 +578,27 @@ static int compact_finished(struct zone *zone,
 	if (fatal_signal_pending(current))
 		return COMPACT_PARTIAL;
 
-	/* Compaction run completes if the migrate and free scanner meet */
-	if (cc->free_pfn <= cc->migrate_pfn)
+	/*
+	 * A full (order == -1) compaction run starts at the beginning and
+	 * end of a zone; it completes when the migrate and free scanner meet. 
+	 * A partial (order > 0) compaction can start with the free scanner
+	 * at a random point in the zone, and may have to restart.
+	 */
+	if (cc->free_pfn <= cc->migrate_pfn) {
+		if (cc->order > 0 && !cc->wrapped) {
+			/* We started partway through; restart at the end. */
+			unsigned long free_pfn;
+			free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+			free_pfn &= ~(pageblock_nr_pages-1);
+			zone->compact_cached_free_pfn = free_pfn;
+			cc->wrapped = 1;
+			return COMPACT_CONTINUE;
+		}
+		return COMPACT_COMPLETE;
+	}
+
+	/* We wrapped around and ended up where we started. */
+	if (cc->wrapped && cc->free_pfn <= cc->start_free_pfn)
 		return COMPACT_COMPLETE;
 
 	/*
@@ -664,8 +696,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 	/* Setup to move all movable pages to the end of the zone */
 	cc->migrate_pfn = zone->zone_start_pfn;
-	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
-	cc->free_pfn &= ~(pageblock_nr_pages-1);
+
+	if (cc->order > 0) {
+		/* Incremental compaction. Start where the last one stopped. */
+		cc->free_pfn = zone->compact_cached_free_pfn;
+		cc->start_free_pfn = cc->free_pfn;
+	} else {
+		/* Order == -1 starts at the end of the zone. */
+		cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
+		cc->free_pfn &= ~(pageblock_nr_pages-1);
+	}
 
 	migrate_prep_local();
 
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..0b72461 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -118,8 +118,10 @@ struct compact_control {
 	unsigned long nr_freepages;	/* Number of isolated free pages */
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
+	unsigned long start_free_pfn;	/* where we started the search */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	bool sync;			/* Synchronous migration */
+	bool wrapped;			/* Last round for order>0 compaction */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..c353a61 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4394,6 +4394,11 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+		zone->compact_cached_free_pfn = zone->zone_start_pfn +
+						zone->spanned_pages;
+		zone->compact_cached_free_pfn &= ~(pageblock_nr_pages-1);
+#endif
 #ifdef CONFIG_NUMA
 		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)



* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 17:55 ` Rik van Riel
@ 2012-06-28 20:19   ` Jim Schutt
  -1 siblings, 0 replies; 50+ messages in thread
From: Jim Schutt @ 2012-06-28 20:19 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, minchan,
	kamezawa.hiroyu

On 06/28/2012 11:55 AM, Rik van Riel wrote:

> ---
> v2: implement Mel's suggestions, handling wrap-around etc
>

So far I've run a total of ~28 TB of data over seventy minutes
or so through 12 machines running this version of the patch;
still no hint of trouble, still great performance.

Tested-by: Jim Schutt <jaschut@sandia.gov>

Thanks!

-- Jim



* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 20:19   ` Jim Schutt
@ 2012-06-28 20:57     ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-06-28 20:57 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, minchan,
	kamezawa.hiroyu

On 06/28/2012 04:19 PM, Jim Schutt wrote:
> On 06/28/2012 11:55 AM, Rik van Riel wrote:
>
>> ---
>> v2: implement Mel's suggestions, handling wrap-around etc
>>
>
> So far I've run a total of ~28 TB of data over seventy minutes
> or so through 12 machines running this version of the patch;
> still no hint of trouble, still great performance.
>
> Tested-by: Jim Schutt <jaschut@sandia.gov>

Awesome, thank you very much for analyzing and reporting
the issue, and testing the patch!

Mel? Andrew? :)

-- 
All rights reversed


* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 17:55 ` Rik van Riel
@ 2012-06-28 20:59   ` Andrew Morton
  -1 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2012-06-28 20:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, jaschut, minchan, kamezawa.hiroyu

On Thu, 28 Jun 2012 13:55:20 -0400
Rik van Riel <riel@redhat.com> wrote:

> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
> 
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
> 
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
> 
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
> 
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
> 
> Forced full (order == -1) compactions are left alone.

Is there a quality of service impact here?  Newly-compactable pages
at lower pfns than compact_cached_free_pfn will now get missed, leading
to a form of fragmentation?

> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated)
>  			high_pfn = max(high_pfn, pfn);
> +		if (cc->order > 0)
> +			zone->compact_cached_free_pfn = high_pfn;

Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
assume so, if lots of code in other places is correct but it's
unobvious from reading this function.

>  	}
>  
>  	/* split_free_page does not map the pages */
>
> ...
>
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -118,8 +118,10 @@ struct compact_control {
>  	unsigned long nr_freepages;	/* Number of isolated free pages */
>  	unsigned long nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
> +	unsigned long start_free_pfn;	/* where we started the search */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
>  	bool sync;			/* Synchronous migration */
> +	bool wrapped;			/* Last round for order>0 compaction */

This comment is incomprehensible :(

>  
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */



* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 20:59   ` Andrew Morton
@ 2012-06-28 21:24     ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-06-28 21:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, jaschut, minchan, kamezawa.hiroyu

On 06/28/2012 04:59 PM, Andrew Morton wrote:
> On Thu, 28 Jun 2012 13:55:20 -0400
> Rik van Riel<riel@redhat.com>  wrote:
>
>> Order>  0 compaction stops when enough free pages of the correct
>> page order have been coalesced. When doing subsequent higher order
>> allocations, it is possible for compaction to be invoked many times.
>>
>> However, the compaction code always starts out looking for things to
>> compact at the start of the zone, and for free pages to compact things
>> to at the end of the zone.
>>
>> This can cause quadratic behaviour, with isolate_freepages starting
>> at the end of the zone each time, even though previous invocations
>> of the compaction code already filled up all free memory on that end
>> of the zone.
>>
>> This can cause isolate_freepages to take enormous amounts of CPU
>> with certain workloads on larger memory systems.
>>
>> The obvious solution is to have isolate_freepages remember where
>> it left off last time, and continue at that point the next time
>> it gets invoked for an order>  0 compaction. This could cause
>> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
>> together initially, in that case we restart from the end of the
>> zone and try once more.
>>
>> Forced full (order == -1) compactions are left alone.
>
> Is there a quality of service impact here?  Newly-compactable pages
> at lower pfns than compact_cached_free_pfn will now get missed, leading
> to a form of fragmentation?

The compaction side of the zone always starts at the
very beginning of the zone.  I believe we can get
away with this, because skipping a whole transparent
hugepage or non-movable block is 512 times faster than
scanning an entire block for target pages in
isolate_freepages.
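
(For scale: with 4kB base pages and 2MB pageblocks, as on x86,
pageblock_nr_pages is 2MB / 4kB = 512 pages, which is where that
factor of 512 comes from.)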

>> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>>   		 */
>>   		if (isolated)
>>   			high_pfn = max(high_pfn, pfn);
>> +		if (cc->order>  0)
>> +			zone->compact_cached_free_pfn = high_pfn;
>
> Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
> assume so, if lots of code in other places is correct but it's
> unobvious from reading this function.

Reading the code a few more times, I believe that it is
indeed aligned to pageblock size.

>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -118,8 +118,10 @@ struct compact_control {
>>   	unsigned long nr_freepages;	/* Number of isolated free pages */
>>   	unsigned long nr_migratepages;	/* Number of pages to migrate */
>>   	unsigned long free_pfn;		/* isolate_freepages search base */
>> +	unsigned long start_free_pfn;	/* where we started the search */
>>   	unsigned long migrate_pfn;	/* isolate_migratepages search base */
>>   	bool sync;			/* Synchronous migration */
>> +	bool wrapped;			/* Last round for order>0 compaction */
>
> This comment is incomprehensible :(

Agreed.  I'm not sure how to properly describe that variable
in 30 or so characters :)

It denotes whether the current invocation of compaction,
called with order > 0, has had free_pfn and migrate_pfn
meet, resulting in free_pfn being reset to the top of
the zone.

Now, how to describe that briefly?

-- 
All rights reversed


* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 21:24     ` Rik van Riel
@ 2012-06-28 21:35       ` Andrew Morton
  -1 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2012-06-28 21:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, jaschut, minchan, kamezawa.hiroyu

On Thu, 28 Jun 2012 17:24:25 -0400
Rik van Riel <riel@redhat.com> wrote:

> 
> >> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
> >>   		 */
> >>   		if (isolated)
> >>   			high_pfn = max(high_pfn, pfn);
> >> +		if (cc->order>  0)
> >> +			zone->compact_cached_free_pfn = high_pfn;
> >
> > Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
> > assume so, if lots of code in other places is correct but it's
> > unobvious from reading this function.
> 
> Reading the code a few more times, I believe that it is
> indeed aligned to pageblock size.

I'll slip this into -next for a while.

--- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
+++ a/mm/compaction.c
@@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
 
+		WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
 		/*
 		 * Record the highest PFN we isolated pages from. When next
 		 * looking for free pages, the search will restart here as
_

> >> --- a/mm/internal.h
> >> +++ b/mm/internal.h
> >> @@ -118,8 +118,10 @@ struct compact_control {
> >>   	unsigned long nr_freepages;	/* Number of isolated free pages */
> >>   	unsigned long nr_migratepages;	/* Number of pages to migrate */
> >>   	unsigned long free_pfn;		/* isolate_freepages search base */
> >> +	unsigned long start_free_pfn;	/* where we started the search */
> >>   	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> >>   	bool sync;			/* Synchronous migration */
> >> +	bool wrapped;			/* Last round for order>0 compaction */
> >
> > This comment is incomprehensible :(
> 
> Agreed.  I'm not sure how to properly describe that variable
> in 30 or so characters :)
> 
> It denotes whether the current invocation of compaction,
> called with order > 0, has had free_pfn and migrate_pfn
> meet, resulting in free_pfn being reset to the top of
> the zone.
> 
> Now, how to describe that briefly?

Use a multi-line comment above the definition ;)
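
Something along these lines, perhaps (wording is only a sketch of the
description above):

	/*
	 * The free scanner of an order > 0 compaction started partway
	 * through the zone, free_pfn and migrate_pfn met, and free_pfn
	 * was reset to the end of the zone; this is the last pass
	 * before the run is declared complete.
	 */
	bool wrapped;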


* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 17:55 ` Rik van Riel
@ 2012-06-28 23:27   ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-06-28 23:27 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut,
	kamezawa.hiroyu

Hi Rik,

Thanks for the quick, great work!
I have some nitpicks below.

On 06/29/2012 02:55 AM, Rik van Riel wrote:

> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
> 
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
> 
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
> 
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
> 
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
> 
> Forced full (order == -1) compactions are left alone.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Reported-by: Jim Schutt <jaschut@sandia.gov>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> v2: implement Mel's suggestions, handling wrap-around etc
> 
>  include/linux/mmzone.h |    4 ++++
>  mm/compaction.c        |   48 ++++++++++++++++++++++++++++++++++++++++++++----
>  mm/internal.h          |    2 ++
>  mm/page_alloc.c        |    5 +++++
>  4 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2427706..e629594 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -369,6 +369,10 @@ struct zone {
>  	 */
>  	spinlock_t		lock;
>  	int                     all_unreclaimable; /* All pages pinned */
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +	/* pfn where the last order > 0 compaction isolated free pages */


How about using the term "partial compaction" instead of (order > 0)?

> +	unsigned long		compact_cached_free_pfn;
> +#endif
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	/* see spanned/present_pages for more description */
>  	seqlock_t		span_seqlock;
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7ea259d..2668b77 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -422,6 +422,17 @@ static void isolate_freepages(struct zone *zone,
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
>  
> +		/*
> +		 * Skip ahead if another thread is compacting in the area
> +		 * simultaneously. If we wrapped around, we can only skip
> +		 * ahead if zone->compact_cached_free_pfn also wrapped to
> +		 * above our starting point.
> +		 */
> +		if (cc->order > 0 && (!cc->wrapped ||


So if (partial_compaction(cc) && ... ) or if (!full_compaction(cc) &&  ...)
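
A minimal sketch of such a helper (the name and its location are just
a suggestion):

static inline bool partial_compaction(struct compact_control *cc)
{
	return cc->order > 0;
}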

> +				      zone->compact_cached_free_pfn >
> +				      cc->start_free_pfn))
> +			pfn = min(pfn, zone->compact_cached_free_pfn);


Can the pfn end up below migrate_pfn here?
I mean, do we need something like this?

if (pfn <= low_pfn)
	goto out;

Otherwise, we could end up stealing free pages that migration just created.

> +
>  		if (!pfn_valid(pfn))
>  			continue;
>  
> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated)
>  			high_pfn = max(high_pfn, pfn);
> +		if (cc->order > 0)
> +			zone->compact_cached_free_pfn = high_pfn;


Why do we cache high_pfn instead of pfn?
If we can't isolate any page, compact_cached_free_pfn would become low_pfn.
I expect that's not what you want.

>  	}
>  
>  	/* split_free_page does not map the pages */
> @@ -565,8 +578,27 @@ static int compact_finished(struct zone *zone,
>  	if (fatal_signal_pending(current))
>  		return COMPACT_PARTIAL;
>  
> -	/* Compaction run completes if the migrate and free scanner meet */
> -	if (cc->free_pfn <= cc->migrate_pfn)
> +	/*
> +	 * A full (order == -1) compaction run starts at the beginning and
> +	 * end of a zone; it completes when the migrate and free scanner meet. 
> +	 * A partial (order > 0) compaction can start with the free scanner
> +	 * at a random point in the zone, and may have to restart.
> +	 */
> +	if (cc->free_pfn <= cc->migrate_pfn) {
> +		if (cc->order > 0 && !cc->wrapped) {
> +			/* We started partway through; restart at the end. */
> +			unsigned long free_pfn;
> +			free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +			free_pfn &= ~(pageblock_nr_pages-1);
> +			zone->compact_cached_free_pfn = free_pfn;
> +			cc->wrapped = 1;
> +			return COMPACT_CONTINUE;
> +		}
> +		return COMPACT_COMPLETE;
> +	}
> +
> +	/* We wrapped around and ended up where we started. */
> +	if (cc->wrapped && cc->free_pfn <= cc->start_free_pfn)
>  		return COMPACT_COMPLETE;
>  
>  	/*
> @@ -664,8 +696,16 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>  
>  	/* Setup to move all movable pages to the end of the zone */
>  	cc->migrate_pfn = zone->zone_start_pfn;
> -	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> -	cc->free_pfn &= ~(pageblock_nr_pages-1);
> +
> +	if (cc->order > 0) {
> +		/* Incremental compaction. Start where the last one stopped. */
> +		cc->free_pfn = zone->compact_cached_free_pfn;
> +		cc->start_free_pfn = cc->free_pfn;
> +	} else {
> +		/* Order == -1 starts at the end of the zone. */
> +		cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
> +		cc->free_pfn &= ~(pageblock_nr_pages-1);
> +	}
>  
>  	migrate_prep_local();
>  
> diff --git a/mm/internal.h b/mm/internal.h
> index 2ba87fb..0b72461 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -118,8 +118,10 @@ struct compact_control {
>  	unsigned long nr_freepages;	/* Number of isolated free pages */
>  	unsigned long nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
> +	unsigned long start_free_pfn;	/* where we started the search */


For me, free_pfn and start_free_pfn are rather confusing.
I hope we can add more detailed comments for free_pfn and start_free_pfn.
For start_free_pfn,
/* Where incremental compaction starts the search */

For free_pfn,
/* the highest PFN we isolated pages from */
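
Applied to the struct, that might read:

	unsigned long free_pfn;		/* the highest PFN we isolated pages from */
	unsigned long start_free_pfn;	/* where incremental compaction starts the search */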

>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
>  	bool sync;			/* Synchronous migration */
> +	bool wrapped;			/* Last round for order>0 compaction */
>  
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..c353a61 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4394,6 +4394,11 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  
>  		zone->spanned_pages = size;
>  		zone->present_pages = realsize;
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +		zone->compact_cached_free_pfn = zone->zone_start_pfn +
> +						zone->spanned_pages;
> +		zone->compact_cached_free_pfn &= ~(pageblock_nr_pages-1);
> +#endif
>  #ifdef CONFIG_NUMA
>  		zone->node = nid;
>  		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
> 



-- 
Kind regards,
Minchan Kim


* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 17:55 ` Rik van Riel
@ 2012-06-29 10:02   ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2012-06-29 10:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Andrew Morton, jaschut, minchan, kamezawa.hiroyu

On Thu, Jun 28, 2012 at 01:55:20PM -0400, Rik van Riel wrote:
> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
> 
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
> 
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
> 
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
> 
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
> 
> Forced full (order == -1) compactions are left alone.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Reported-by: Jim Schutt <jaschut@sandia.gov>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> v2: implement Mel's suggestions, handling wrap-around etc
> 

I have not tested it myself but it looks correct! Thanks very much.

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 17:55 ` Rik van Riel
@ 2012-06-30  3:51   ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 50+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-30  3:51 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut, minchan

(2012/06/29 2:55), Rik van Riel wrote:
> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
>
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
>
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
>
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
>
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
>
> Forced full (order == -1) compactions are left alone.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Reported-by: Jim Schutt <jaschut@sandia.gov>
> Signed-off-by: Rik van Riel <riel@redhat.com>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 21:35       ` Andrew Morton
@ 2012-07-02 17:42         ` Sasha Levin
  -1 siblings, 0 replies; 50+ messages in thread
From: Sasha Levin @ 2012-07-02 17:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, linux-mm, linux-kernel, Mel Gorman, jaschut,
	minchan, kamezawa.hiroyu, Dave Jones

On Thu, 2012-06-28 at 14:35 -0700, Andrew Morton wrote:
> On Thu, 28 Jun 2012 17:24:25 -0400 Rik van Riel <riel@redhat.com> wrote:
> > 
> > >> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
> > >>             */
> > >>            if (isolated)
> > >>                    high_pfn = max(high_pfn, pfn);
> > >> +          if (cc->order > 0)
> > >> +                  zone->compact_cached_free_pfn = high_pfn;
> > >
> > > Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
> > > assume so, if lots of code in other places is correct but it's
> > > unobvious from reading this function.
> > 
> > Reading the code a few more times, I believe that it is
> > indeed aligned to pageblock size.
> 
> I'll slip this into -next for a while.
> 
> --- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
> +++ a/mm/compaction.c
> @@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
>                 }
>                 spin_unlock_irqrestore(&zone->lock, flags);
>  
> +               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
>                 /*
>                  * Record the highest PFN we isolated pages from. When next
>                  * looking for free pages, the search will restart here as 

I've triggered the following with today's -next:

[  372.893094] ------------[ cut here ]------------
[  372.896319] WARNING: at mm/compaction.c:470 isolate_freepages+0x1d9/0x370()
[  372.898900] Pid: 11417, comm: trinity-child55 Tainted: G        W    3.5.0-rc5-next-20120702-sasha-00007-g9a2ee81 #496
[  372.902293] Call Trace:
[  372.903152]  [<ffffffff810eb427>] warn_slowpath_common+0x87/0xb0
[  372.905295]  [<ffffffff810eb465>] warn_slowpath_null+0x15/0x20
[  372.908327]  [<ffffffff81207cd9>] isolate_freepages+0x1d9/0x370
[  372.912093]  [<ffffffff81207e98>] compaction_alloc+0x28/0x50
[  372.916275]  [<ffffffff81241c6c>] unmap_and_move+0x3c/0x140
[  372.920132]  [<ffffffff81241e30>] migrate_pages+0xc0/0x170
[  372.924141]  [<ffffffff81207e70>] ? isolate_freepages+0x370/0x370
[  372.928388]  [<ffffffff812087f2>] compact_zone+0x112/0x450
[  372.932342]  [<ffffffff81135488>] ? sched_clock_cpu+0x108/0x120
[  372.935289]  [<ffffffff81208f24>] compact_zone_order+0xb4/0xd0
[  372.937439]  [<ffffffff81208ff6>] try_to_compact_pages+0xb6/0x120
[  372.939535]  [<ffffffff811eae32>] __alloc_pages_direct_compact+0xc2/0x220
[  372.941912]  [<ffffffff811eb366>] __alloc_pages_slowpath+0x3d6/0x6a0
[  372.948587]  [<ffffffff811ead35>] ? get_page_from_freelist+0x625/0x660
[  372.957577]  [<ffffffff811eb876>] __alloc_pages_nodemask+0x246/0x330
[  372.964765]  [<ffffffff8122ea3f>] alloc_pages_vma+0x12f/0x140
[  372.971493]  [<ffffffff8124e9b4>] ? mem_cgroup_count_vm_event+0x144/0x170
[  372.980908]  [<ffffffff81247675>] do_huge_pmd_anonymous_page+0xf5/0x260
[  372.988788]  [<ffffffff81210573>] handle_mm_fault+0x1f3/0x360
[  372.997296]  [<ffffffff8120c037>] ? follow_page+0xe7/0x5a0
[  373.000737]  [<ffffffff81210bd8>] __get_user_pages+0x438/0x5d0
[  373.003473]  [<ffffffff81211cc6>] __mlock_vma_pages_range+0xc6/0xd0
[  373.007813]  [<ffffffff81211ec5>] mlock_vma_pages_range+0x75/0xb0
[  373.010956]  [<ffffffff81215adc>] mmap_region+0x4bc/0x5f0
[  373.012891]  [<ffffffff81215ec9>] do_mmap_pgoff+0x2b9/0x350
[  373.015029]  [<ffffffff812006bc>] ? vm_mmap_pgoff+0x6c/0xb0
[  373.017098]  [<ffffffff812006d4>] vm_mmap_pgoff+0x84/0xb0
[  373.019086]  [<ffffffff812133d2>] sys_mmap_pgoff+0x182/0x190
[  373.021106]  [<ffffffff8199070e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  373.025749]  [<ffffffff8106d4dd>] sys_mmap+0x1d/0x20
[  373.029489]  [<ffffffff8373b5f9>] system_call_fastpath+0x16/0x1b
[  373.042335] ---[ end trace 6d450e935ee18981 ]---


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-02 17:42         ` Sasha Levin
@ 2012-07-03  0:57           ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-07-03  0:57 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman, jaschut,
	minchan, kamezawa.hiroyu, Dave Jones

On 07/02/2012 01:42 PM, Sasha Levin wrote:
> On Thu, 2012-06-28 at 14:35 -0700, Andrew Morton wrote:
>> On Thu, 28 Jun 2012 17:24:25 -0400 Rik van Riel <riel@redhat.com> wrote:
>>>
>>>>> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>>>>>              */
>>>>>             if (isolated)
>>>>>                     high_pfn = max(high_pfn, pfn);
>>>>> +          if (cc->order > 0)
>>>>> +                  zone->compact_cached_free_pfn = high_pfn;
>>>>
>>>> Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
>>>> assume so, if lots of code in other places is correct but it's
>>>> unobvious from reading this function.
>>>
>>> Reading the code a few more times, I believe that it is
>>> indeed aligned to pageblock size.
>>
>> I'll slip this into -next for a while.
>>
>> --- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
>> +++ a/mm/compaction.c
>> @@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
>>                  }
>>                  spin_unlock_irqrestore(&zone->lock, flags);
>>
>> +               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
>>                  /*
>>                   * Record the highest PFN we isolated pages from. When next
>>                   * looking for free pages, the search will restart here as
>
> I've triggered the following with today's -next:

I've been staring at the migrate code for most of the afternoon,
and am not sure how this is triggered.

At this point, I'm going to focus my attention on addressing
Minchan's comments on my code, and hope that someone more
familiar with the migrate code knows how high_pfn ends up
not being pageblock_nr_pages aligned...

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-03  0:57           ` Rik van Riel
@ 2012-07-03  2:54             ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-03  2:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	jaschut, kamezawa.hiroyu, Dave Jones

On 07/03/2012 09:57 AM, Rik van Riel wrote:

> On 07/02/2012 01:42 PM, Sasha Levin wrote:
>> On Thu, 2012-06-28 at 14:35 -0700, Andrew Morton wrote:
>>> On Thu, 28 Jun 2012 17:24:25 -0400 Rik van Riel <riel@redhat.com> wrote:
>>>>
>>>>>> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>>>>>>              */
>>>>>>             if (isolated)
>>>>>>                     high_pfn = max(high_pfn, pfn);
>>>>>> +          if (cc->order > 0)
>>>>>> +                  zone->compact_cached_free_pfn = high_pfn;
>>>>>
>>>>> Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
>>>>> assume so, if lots of code in other places is correct but it's
>>>>> unobvious from reading this function.
>>>>
>>>> Reading the code a few more times, I believe that it is
>>>> indeed aligned to pageblock size.
>>>
>>> I'll slip this into -next for a while.
>>>
>>> ---
>>> a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
>>>
>>> +++ a/mm/compaction.c
>>> @@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
>>>                  }
>>>                  spin_unlock_irqrestore(&zone->lock, flags);
>>>
>>> +               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
>>>                  /*
>>>                   * Record the highest PFN we isolated pages from.
>>> When next
>>>                   * looking for free pages, the search will restart
>>> here as
>>
>> I've triggered the following with today's -next:
> 
> I've been staring at the migrate code for most of the afternoon,
> and am not sure how this is triggered.
> 
> At this point, I'm going to focus my attention on addressing
> Minchan's comments on my code, and hoping someone who is more
> familiar with the migrate code knows how high_pfn ends up
> being not pageblock_nr_pages aligned...
> 


migrate_pfn does not necessarily start aligned to a pageblock.

        /* Setup to move all movable pages to the end of the zone */
        cc->migrate_pfn = zone->zone_start_pfn;

In isolate_freepages, high_pfn = low_pfn = cc->migrate_pfn + pageblock_nr_pages /* migrate_pfn isn't necessarily aligned to a pageblock */
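
To make that concrete, a hypothetical example (invented numbers; assume
pageblock_nr_pages == 512 and a zone that starts at an unaligned pfn):

	unsigned long zone_start_pfn = 1000;	/* hypothetical, not 512-aligned */
	unsigned long migrate_pfn = zone_start_pfn;	/* where compaction starts */
	unsigned long high_pfn = migrate_pfn + 512;	/* 1512 */

	/* 1512 & 511 == 488, so the WARN_ON_ONCE above fires */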

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-03  0:57           ` Rik van Riel
@ 2012-07-03 10:10             ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2012-07-03 10:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Sasha Levin, Andrew Morton, linux-mm, linux-kernel, jaschut,
	minchan, kamezawa.hiroyu, Dave Jones

On Mon, Jul 02, 2012 at 08:57:03PM -0400, Rik van Riel wrote:
> On 07/02/2012 01:42 PM, Sasha Levin wrote:
> >On Thu, 2012-06-28 at 14:35 -0700, Andrew Morton wrote:
> >>On Thu, 28 Jun 2012 17:24:25 -0400 Rik van Riel <riel@redhat.com> wrote:
> >>>
> >>>>>@@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
> >>>>>             */
> >>>>>            if (isolated)
> >>>>>                    high_pfn = max(high_pfn, pfn);
> >>>>>+          if (cc->order > 0)
> >>>>>+                  zone->compact_cached_free_pfn = high_pfn;
> >>>>
> >>>>Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
> >>>>assume so, if lots of code in other places is correct but it's
> >>>>unobvious from reading this function.
> >>>
> >>>Reading the code a few more times, I believe that it is
> >>>indeed aligned to pageblock size.
> >>
> >>I'll slip this into -next for a while.
> >>
> >>--- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
> >>+++ a/mm/compaction.c
> >>@@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
> >>                 }
> >>                 spin_unlock_irqrestore(&zone->lock, flags);
> >>
> >>+               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
> >>                 /*
> >>                  * Record the highest PFN we isolated pages from. When next
> >>                  * looking for free pages, the search will restart here as
> >
> >I've triggered the following with today's -next:
> 
> I've been staring at the migrate code for most of the afternoon,
> and am not sure how this is triggered.
> 

That warning is placed in isolate_freepages(). When the migration
scanner and the free scanner have almost met, it is possible for
high_pfn to be

cc->migrate_pfn + pageblock_nr_pages

and that is not necessarily pageblock aligned. Forcing it to be aligned
raises the possibility that the free scanner moves to another zone. This
is very unlikely but could happen if a high zone was very small.

I should have caught this when the warning was proposed :( IMO it's
safe to just drop the warning.
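
A minimal sketch of where that value comes from (simplified from the
top of isolate_freepages(); pfn is the free scanner position, and this
is not the exact kernel code):

	unsigned long low_pfn = cc->migrate_pfn + pageblock_nr_pages;
	unsigned long high_pfn = min(low_pfn, pfn);

cc->migrate_pfn starts at zone->zone_start_pfn, which need not be
pageblock aligned, so high_pfn inherits the misalignment.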

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-06-28 23:27   ` Minchan Kim
@ 2012-07-03 14:59     ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-07-03 14:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut,
	kamezawa.hiroyu

On 06/28/2012 07:27 PM, Minchan Kim wrote:

>> index 7ea259d..2668b77 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -422,6 +422,17 @@ static void isolate_freepages(struct zone *zone,
>>   					pfn -= pageblock_nr_pages) {
>>   		unsigned long isolated;
>>
>> +		/*
>> +		 * Skip ahead if another thread is compacting in the area
>> +		 * simultaneously. If we wrapped around, we can only skip
>> +		 * ahead if zone->compact_cached_free_pfn also wrapped to
>> +		 * above our starting point.
>> +		 */
>> +		if (cc->order > 0 && (!cc->wrapped ||
>
>
> So if (partial_compaction(cc) && ... ) or if (!full_compaction(cc) && ...

I am not sure that we want to abstract away what is happening
here.  We also are quite explicit with the meaning of cc->order
in compact_finished and other places in the compaction code.

>> +				      zone->compact_cached_free_pfn >
>> +				      cc->start_free_pfn))
>> +			pfn = min(pfn, zone->compact_cached_free_pfn);
>
>
> The pfn can be where migrate_pfn below?
> I mean we need this?
>
> if (pfn <= low_pfn)
> 	goto out;

That is a good point. I guess there is a small possibility that
another compaction thread is below us with cc->free_pfn and
cc->migrate_pfn, and we just inherited its cc->free_pfn via
zone->compact_cached_free_pfn, bringing us to below our own
cc->migrate_pfn.

Given that this was already possible with parallel compaction
in the past, I am not sure how important it is. It could result
in wasting a little bit of CPU, but your fix for it looks easy
enough.
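
Something like this, at the point where we consult the cached pfn (a
sketch only; it assumes an "out" label doing the function's normal
cleanup, which the current code does not have under that name):

	pfn = min(pfn, zone->compact_cached_free_pfn);
	/* did we inherit a free pfn below our own migrate scanner? */
	if (pfn <= low_pfn)
		goto out;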

Mel, any downside to compaction bailing (well, wrapping around)
a little earlier, like Minchan suggested?

>> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>>   		 */
>>   		if (isolated)
>>   			high_pfn = max(high_pfn, pfn);
>> +		if (cc->order > 0)
>> +			zone->compact_cached_free_pfn = high_pfn;
>
>
> Why do we cache high_pfn instead of pfn?

Reading the code, because we may not have isolated every
possible free page from this memory block.  The same reason
cc->free_pfn is set to high_pfn right before the function
exits.

> If we can't isolate any page, compact_cached_free_pfn would become low_pfn.
> I expect it's not what you want.

I guess we should only cache the value of high_pfn if
we isolated some pages?  In other words, this:

	if (isolated) {
		high_pfn = max(high_pfn, pfn);
		if (cc->order > 0)
			zone->compact_cached_free_pfn = high_pfn;
	}


-- 
All rights reversed

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH -mm] mm: minor fixes for compaction
  2012-06-28 23:27   ` Minchan Kim
@ 2012-07-03 20:13     ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-07-03 20:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut,
	kamezawa.hiroyu

This patch makes the comment for cc->wrapped longer, explaining
what is really going on. It also incorporates the comment fix
pointed out by Minchan.

Additionally, Minchan found that, when no pages get isolated,
high_pfn could be a value that is much lower than desired,
which might potentially cause compaction to skip a range of
pages.

Only assign zone->compact_cached_free_pfn if we actually
isolated free pages for compaction.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
This does not address the one bit in Minchan's review that I am not sure about...

 include/linux/mmzone.h |    2 +-
 mm/compaction.c        |    7 ++++---
 mm/internal.h          |    6 +++++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e629594..e957fa1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -370,7 +370,7 @@ struct zone {
 	spinlock_t		lock;
 	int                     all_unreclaimable; /* All pages pinned */
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
-	/* pfn where the last order > 0 compaction isolated free pages */
+	/* pfn where the last incremental compaction isolated free pages */
 	unsigned long		compact_cached_free_pfn;
 #endif
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/mm/compaction.c b/mm/compaction.c
index 2668b77..2867166 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -472,10 +472,11 @@ static void isolate_freepages(struct zone *zone,
 		 * looking for free pages, the search will restart here as
 		 * page migration may have returned some pages to the allocator
 		 */
-		if (isolated)
+		if (isolated) {
 			high_pfn = max(high_pfn, pfn);
-		if (cc->order > 0)
-			zone->compact_cached_free_pfn = high_pfn;
+			if (cc->order > 0)
+				zone->compact_cached_free_pfn = high_pfn;
+		}
 	}
 
 	/* split_free_page does not map the pages */
diff --git a/mm/internal.h b/mm/internal.h
index 0b72461..da6b9b2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -121,7 +121,11 @@ struct compact_control {
 	unsigned long start_free_pfn;	/* where we started the search */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	bool sync;			/* Synchronous migration */
-	bool wrapped;			/* Last round for order>0 compaction */
+	bool wrapped;			/* Order > 0 compactions are
+					   incremental, once free_pfn
+					   and migrate_pfn meet, we restart
+					   from the top of the zone;
+					   remember we wrapped around. */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-03 10:10             ` Mel Gorman
@ 2012-07-03 21:48               ` Andrew Morton
  -1 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2012-07-03 21:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Sasha Levin, linux-mm, linux-kernel, jaschut,
	minchan, kamezawa.hiroyu, Dave Jones

On Tue, 3 Jul 2012 11:10:24 +0100
Mel Gorman <mel@csn.ul.ie> wrote:

> > >>>>>+          if (cc->order > 0)
> > >>>>>+                  zone->compact_cached_free_pfn = high_pfn;
> > >>>>
> > >>>>Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
> > >>>>assume so, if lots of code in other places is correct but it's
> > >>>>unobvious from reading this function.
> > >>>
> > >>>Reading the code a few more times, I believe that it is
> > >>>indeed aligned to pageblock size.
> > >>
> > >>I'll slip this into -next for a while.
> > >>
> > >>--- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
> > >>+++ a/mm/compaction.c
> > >>@@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
> > >>                 }
> > >>                 spin_unlock_irqrestore(&zone->lock, flags);
> > >>
> > >>+               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
> > >>                 /*
> > >>                  * Record the highest PFN we isolated pages from. When next
> > >>                  * looking for free pages, the search will restart here as
> > >
> > >I've triggered the following with today's -next:
> > 
> > I've been staring at the migrate code for most of the afternoon,
> > and am not sure how this is triggered.
> > 
> 
> That warning is placed in isolate_freepages(). When the migration
> scanner and free scanner have almost met it is possible for high_pfn to
> be
> 
> cc->migrate_pfn + pageblock_nr_pages
> 
> and that is not necessarily pageblock aligned. Forcing it to be aligned
> raises the possibility that the free scanner moves to another zone. This
> is very unlikely but could happen if a high zone was very small.
> 
> I should have caught this when the warning was proposed :( IMO it's
> safe to just drop the warning.

The rest of this patch takes care to ensure that
->compact_cached_free_pfn is aligned to pageblock_nr_pages.  But it now
appears that this particular site will violate that.

What's up?  Do we need to fix this site, or do we remove all that
make-compact_cached_free_pfn-aligned code?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-03 14:59     ` Rik van Riel
@ 2012-07-04  2:28       ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-04  2:28 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut,
	kamezawa.hiroyu

Hi Rik,

On 07/03/2012 11:59 PM, Rik van Riel wrote:

> On 06/28/2012 07:27 PM, Minchan Kim wrote:
> 
>>> index 7ea259d..2668b77 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -422,6 +422,17 @@ static void isolate_freepages(struct zone *zone,
>>>                       pfn -= pageblock_nr_pages) {
>>>           unsigned long isolated;
>>>
>>> +        /*
>>> +         * Skip ahead if another thread is compacting in the area
>>> +         * simultaneously. If we wrapped around, we can only skip
>>> +         * ahead if zone->compact_cached_free_pfn also wrapped to
>>> +         * above our starting point.
>>> +         */
>>> +        if (cc->order > 0 && (!cc->wrapped ||
>>
>>
>> So if (partial_compaction(cc) && ... ) or if (!full_compaction(cc) &&
>> ...
> 
> I am not sure that we want to abstract away what is happening
> here.  We also are quite explicit with the meaning of cc->order
> in compact_finished and other places in the compaction code.
> 
>>> +                      zone->compact_cached_free_pfn >
>>> +                      cc->start_free_pfn))
>>> +            pfn = min(pfn, zone->compact_cached_free_pfn);
>>
>>
>> The pfn can be where migrate_pfn below?
>> I mean we need this?
>>
>> if (pfn <= low_pfn)
>>     goto out;
> 
> That is a good point. I guess there is a small possibility that
> another compaction thread is below us with cc->free_pfn and
> cc->migrate_pfn, and we just inherited its cc->free_pfn via
> zone->compact_cached_free_pfn, bringing us to below our own
> cc->migrate_pfn.
> 
> Given that this was already possible with parallel compaction
> in the past, I am not sure how important it is. It could result
> in wasting a little bit of CPU, but your fix for it looks easy
> enough.


In the past, it was impossible since each compaction had its own per-context free_pfn.
 

> 
> Mel, any downside to compaction bailing (well, wrapping around)
> a little earlier, like Minchan suggested?


I can't speak for Mel. But IMHO, if we hit such a case, we can ignore
compact_cached_free_pfn and just go with pfn instead of bailing out early.

> 
>>> @@ -463,6 +474,8 @@ static void isolate_freepages(struct zone *zone,
>>>            */
>>>           if (isolated)
>>>               high_pfn = max(high_pfn, pfn);
>>> +        if (cc->order > 0)
>>> +            zone->compact_cached_free_pfn = high_pfn;
>>
>>
>> Why do we cache high_pfn instead of pfn?
> 
> Reading the code, because we may not have isolated every
> possible free page from this memory block.  The same reason
> cc->free_pfn is set to high_pfn right before the function
> exits.

> 

>> If we can't isolate any page, compact_cached_free_pfn would become
>> low_pfn.
>> I expect it's not what you want.
> 
> I guess we should only cache the value of high_pfn if
> we isolated some pages?  In other words, this:
> 
>     if (isolated) {
>         high_pfn = max(high_pfn, pfn);
>         if (cc->order > 0)
>             zone->compact_cached_free_pfn = high_pfn;
>     }
> 
> 


I agree.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-03 21:48               ` Andrew Morton
@ 2012-07-04  2:34                 ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-04  2:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

Hi Andrew,

On 07/04/2012 06:48 AM, Andrew Morton wrote:

> On Tue, 3 Jul 2012 11:10:24 +0100
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
>>>>>>>> +          if (cc->order > 0)
>>>>>>>> +                  zone->compact_cached_free_pfn = high_pfn;
>>>>>>>
>>>>>>> Is high_pfn guaranteed to be aligned to pageblock_nr_pages here?  I
>>>>>>> assume so, if lots of code in other places is correct but it's
>>>>>>> unobvious from reading this function.
>>>>>>
>>>>>> Reading the code a few more times, I believe that it is
>>>>>> indeed aligned to pageblock size.
>>>>>
>>>>> I'll slip this into -next for a while.
>>>>>
>>>>> --- a/mm/compaction.c~isolate_freepages-check-that-high_pfn-is-aligned-as-expected
>>>>> +++ a/mm/compaction.c
>>>>> @@ -456,6 +456,7 @@ static void isolate_freepages(struct zon
>>>>>                 }
>>>>>                 spin_unlock_irqrestore(&zone->lock, flags);
>>>>>
>>>>> +               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
>>>>>                 /*
>>>>>                  * Record the highest PFN we isolated pages from. When next
>>>>>                  * looking for free pages, the search will restart here as
>>>>
>>>> I've triggered the following with today's -next:
>>>
>>> I've been staring at the migrate code for most of the afternoon,
>>> and am not sure how this is triggered.
>>>
>>
>> That warning is placed in isolate_freepages(). When the migration
>> scanner and free scanner have almost met it is possible for high_pfn to
>> be
>>
>> cc->migrate_pfn + pageblock_nr_pages
>>
>> and that is not necessarily pageblock aligned. Forcing it to be aligned
>> raises the possibility that the free scanner moves to another zone. This
>> is very unlikely but could happen if a high zone was very small.
>>
>> I should have caught this when the warning was proposed :( IMO it's
>> safe to just drop the warning.
> 
> The rest of this patch takes care to ensure that
> ->compact_cached_free_pfn is aligned to pageblock_nr_pages.  But it now
> appears that this particular site will violate that.
> 
> What's up?  Do we need to fix this site, or do we remove all that
> make-compact_cached_free_pfn-aligned code?


I vote for removing the warning because it isn't related to Rik's incremental compaction.
Let's see.

high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages.
Here, cc->migrate_pfn isn't necessarily pageblock aligned.
So if we don't consider compact_cached_free_pfn, it can hit.

static void isolate_freepages()
{
	high_pfn = min(low_pfn, pfn);	/* == cc->migrate_pfn + pageblock_nr_pages */
	for (..) {
		...
		WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
	}
}

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm] mm: minor fixes for compaction
  2012-07-03 20:13     ` Rik van Riel
@ 2012-07-04  2:36       ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-04  2:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Andrew Morton, jaschut,
	kamezawa.hiroyu

On 07/04/2012 05:13 AM, Rik van Riel wrote:

> This patch makes the comment for cc->wrapped longer, explaining
> what is really going on. It also incorporates the comment fix
> pointed out by Minchan.
> 
> Additionally, Minchan found that, when no pages get isolated,
> high_pte could be a value that is much lower than desired,
> which might potentially cause compaction to skip a range of
> pages.
> 
> Only assign zone->compact_cached_free_pfn if we actually
> isolated free pages for compaction.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>


Reviewed-by: Minchan Kim <minchan@kernel.org>

> ---
> This does not address the one bit in Minchan's review that I am not sure about...


About this part, let's wait for Mel's opinion.

Thanks.
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-04  2:34                 ` Minchan Kim
@ 2012-07-04  7:42                   ` Andrew Morton
  -1 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2012-07-04  7:42 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Rik van Riel, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

On Wed, 04 Jul 2012 11:34:09 +0900 Minchan Kim <minchan@kernel.org> wrote:

> > The rest of this patch takes care to ensure that
> > ->compact_cached_free_pfn is aligned to pageblock_nr_pages.  But it now
> > appears that this particular site will violate that.
> > 
> > What's up?  Do we need to fix this site, or do we remove all that
> > make-compact_cached_free_pfn-aligned code?
> 
> 
> I vote for removing the warning, because it is not related to Rik's
> incremental compaction. Let's see.
> 
> high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages.
> Here, cc->migrate_pfn isn't necessarily pageblock aligned.
> So even without considering compact_cached_free_pfn, the warning can trigger.
> 
> static void isolate_freepages()
> {
> 	high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages;
> 	for (..) {
> 		...
> 		 WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
> 		
> 	}
> }

Please look at the patch.  In numerous places it is aligning
compact_cached_free_pfn to a multiple of pageblock_nr_pages.  But in
one place it doesn't do that.  So are all those alignment operations
necessary?


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-04  7:42                   ` Andrew Morton
@ 2012-07-04  8:01                     ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-04  8:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Rik van Riel, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

On 07/04/2012 04:42 PM, Andrew Morton wrote:

> On Wed, 04 Jul 2012 11:34:09 +0900 Minchan Kim <minchan@kernel.org> wrote:
> 
>>> The rest of this patch takes care to ensure that
>>> ->compact_cached_free_pfn is aligned to pageblock_nr_pages.  But it now
>>> appears that this particular site will violate that.
>>>
>>> What's up?  Do we need to fix this site, or do we remove all that
>>> make-compact_cached_free_pfn-aligned code?
>>
>>
>> I vote for removing the warning, because it is not related to Rik's
>> incremental compaction. Let's see.
>>
>> high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages.
>> Here, cc->migrate_pfn isn't necessarily pageblock aligned.
>> So even without considering compact_cached_free_pfn, the warning can trigger.
>>
>> static void isolate_freepages()
>> {
>> 	high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages;
>> 	for (..) {
>> 		...
>> 		 WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
>> 		
>> 	}
>> }
> 
> Please look at the patch.  In numerous places it is aligning
> compact_cached_free_pfn to a multiple of pageblock_nr_pages.  But in
> one place it doesn't do that.  So are all those alignment operations
> necessary?


I mean, if you *really* want to check the alignment, you should do something like the following:

barrios@bbox:~/linux-memcg$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 6bb3e9f..12416d4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -467,16 +467,18 @@ static void isolate_freepages(struct zone *zone,
                }
                spin_unlock_irqrestore(&zone->lock, flags);
 
-               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
                /*
                 * Record the highest PFN we isolated pages from. When next
                 * looking for free pages, the search will restart here as
                 * page migration may have returned some pages to the allocator
                 */
-               if (isolated)
+               if (isolated) {
                        high_pfn = max(high_pfn, pfn);
-               if (cc->order > 0)
-                       zone->compact_cached_free_pfn = high_pfn;
+                       if (cc->order > 0) {
+                               WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
+                               zone->compact_cached_free_pfn = high_pfn;
+                       }
+               }
        }
 
        /* split_free_page does not map the pages */


That's because high_pfn can remain unaligned within the loop if it is never
reset by max(high_pfn, pfn), and that is legal. So regardless of Rik's patch,
adding such a warning to that code could emit the WARNING, too. Rik already
sent a patch similar to the above, but he wanted the WARN_ON_ONCE problem
solved by someone else.
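
As a standalone sketch of that failure mode (all values assumed for
illustration; this is not the kernel loop itself):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* assumed pageblock size */

int main(void)
{
	unsigned long migrate_pfn = 1000;	/* stopped mid-pageblock */
	unsigned long high_pfn = migrate_pfn + PAGEBLOCK_NR_PAGES;
	unsigned long pfn, isolated;

	for (pfn = 4096; pfn > migrate_pfn; pfn -= PAGEBLOCK_NR_PAGES) {
		isolated = 0;	/* suppose nothing gets isolated */
		if (isolated)	/* so high_pfn is never updated... */
			high_pfn = pfn > high_pfn ? pfn : high_pfn;
	}

	/* ...and an unconditional check sees the stale, unaligned value. */
	printf("high_pfn = %lu, misalignment = %lu\n",
	       high_pfn, high_pfn & (PAGEBLOCK_NR_PAGES - 1));
	return 0;
}

Checking alignment only inside the "if (isolated)" branch, as in the
diff above, avoids warning on that stale value.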


-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-04  7:42                   ` Andrew Morton
@ 2012-07-04  9:57                     ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2012-07-04  9:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

On Wed, Jul 04, 2012 at 12:42:19AM -0700, Andrew Morton wrote:
> On Wed, 04 Jul 2012 11:34:09 +0900 Minchan Kim <minchan@kernel.org> wrote:
> 
> > > The rest of this patch takes care to ensure that
> > > ->compact_cached_free_pfn is aligned to pageblock_nr_pages.  But it now
> > > appears that this particular site will violate that.
> > > 
> > > What's up?  Do we need to fix this site, or do we remove all that
> > > make-compact_cached_free_pfn-aligned code?
> > 
> > 
> > I vote for removing the warning, because it is not related to Rik's
> > incremental compaction. Let's see.
> > 
> > high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages.
> > Here, cc->migrate_pfn isn't necessarily pageblock aligned.
> > So even without considering compact_cached_free_pfn, the warning can trigger.
> > 
> > static void isolate_freepages()
> > {
> > 	high_pfn = min(low_pfn, pfn) = cc->migrate_pfn + pageblock_nr_pages;
> > 	for (..) {
> > 		...
> > 		 WARN_ON_ONCE(high_pfn & (pageblock_nr_pages - 1));
> > 		
> > 	}
> > }
> 
> Please look at the patch.  In numerous places it is aligning
> compact_cached_free_pfn to a multiple of pageblock_nr_pages.  But in
> one place it doesn't do that.  So are all those alignment operations
> necessary?
> 

I don't think the alignments are necessary. What matters most is that
the scanner does not leave the zone.
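
A minimal sketch of the bounds check that implies (a hypothetical
helper, not code from the patch; the zone fields are the usual ones):

static unsigned long clamp_to_zone(struct zone *zone, unsigned long pfn)
{
	unsigned long start = zone->zone_start_pfn;
	unsigned long end = zone->zone_start_pfn + zone->spanned_pages;

	/* Keep the cached free-scanner position inside the zone. */
	if (pfn < start)
		pfn = start;
	if (pfn >= end)
		pfn = end - 1;
	return pfn;
}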

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v2] mm: have order > 0 compaction start off where it left
  2012-07-04  2:28       ` Minchan Kim
@ 2012-07-04 10:08         ` Mel Gorman
  -1 siblings, 0 replies; 50+ messages in thread
From: Mel Gorman @ 2012-07-04 10:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, linux-mm, linux-kernel, Andrew Morton, jaschut,
	kamezawa.hiroyu

On Wed, Jul 04, 2012 at 11:28:19AM +0900, Minchan Kim wrote:
> >>> +                      zone->compact_cached_free_pfn >
> >>> +                      cc->start_free_pfn))
> >>> +            pfn = min(pfn, zone->compact_cached_free_pfn);
> >>
> >>
> >> The pfn can be where migrate_pfn below?
> >> I mean we need this?
> >>
> >> if (pfn <= low_pfn)
> >>     goto out;
> > 
> > That is a good point. I guess there is a small possibility that
> > another compaction thread is below us with cc->free_pfn and
> > cc->migrate_pfn, and we just inherited its cc->free_pfn via
> > zone->compact_cached_free_pfn, bringing us to below our own
> > cc->migrate_pfn.
> > 
> > Given that this was already possible with parallel compaction
> > in the past, I am not sure how important it is. It could result
> > in wasting a little bit of CPU, but your fix for it looks easy
> > enough.
> 
> In the past, it was impossible, since we had a per-compaction-context free_pfn.
>  
> 
> > 
> > Mel, any downside to compaction bailing (well, wrapping around)
> > a little earlier, like Minchan suggested?
> 
> 
> I can't speak for Mel. But IMHO, if we hit such a case, we can ignore
> compact_cached_free_pfn and just go with pfn instead of bailing early.
> 

Wrapping early is not a problem. As Minchan pointed out, this applies in
the case where the migrate scanner and free scanner have almost met or
have overlapped. In this case, it might even be better to wrap early
because there is no point isolating free pages that will then have to be
migrated a second time when the free scanner wraps and the migrate
scanner moves in.
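
In code terms, that is Minchan's guard quoted above; a sketch of where
it would sit in isolate_freepages() (placement illustrative, not the
final patch):

	/* Possibly skip ahead to where the last compaction left off. */
	pfn = min(pfn, zone->compact_cached_free_pfn);

	/*
	 * The scanners have met or crossed: stop here and let
	 * compact_finished() wrap the free scanner back to the end
	 * of the zone.
	 */
	if (pfn <= low_pfn)
		goto out;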

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH -mm v3] mm: have order > 0 compaction start off where it left
  2012-07-04  8:01                     ` Minchan Kim
@ 2012-07-11 20:18                       ` Rik van Riel
  -1 siblings, 0 replies; 50+ messages in thread
From: Rik van Riel @ 2012-07-11 20:18 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Mel Gorman, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

This patch makes the comment for cc->wrapped longer, explaining
what is really going on. It also incorporates the comment fix
pointed out by Minchan.

Additionally, Minchan found that, when no pages get isolated,
high_pte could be a value that is much lower than desired,
which might potentially cause compaction to skip a range of
pages.

Only assign zone->compact_cached_free_pfn if we actually
isolated free pages for compaction.

Split out the calculation to get the start of the last page
block in a zone into its own, commented function.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/mmzone.h |    2 +-
 mm/compaction.c        |   30 ++++++++++++++++++++++--------
 mm/internal.h          |    6 +++++-
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e629594..e957fa1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -370,7 +370,7 @@ struct zone {
 	spinlock_t		lock;
 	int                     all_unreclaimable; /* All pages pinned */
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
-	/* pfn where the last order > 0 compaction isolated free pages */
+	/* pfn where the last incremental compaction isolated free pages */
 	unsigned long		compact_cached_free_pfn;
 #endif
 #ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/mm/compaction.c b/mm/compaction.c
index 2668b77..3812c3e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -472,10 +472,11 @@ static void isolate_freepages(struct zone *zone,
 		 * looking for free pages, the search will restart here as
 		 * page migration may have returned some pages to the allocator
 		 */
-		if (isolated)
+		if (isolated) {
 			high_pfn = max(high_pfn, pfn);
-		if (cc->order > 0)
-			zone->compact_cached_free_pfn = high_pfn;
+			if (cc->order > 0)
+				zone->compact_cached_free_pfn = high_pfn;
+		}
 	}
 
 	/* split_free_page does not map the pages */
@@ -569,6 +570,21 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	return ISOLATE_SUCCESS;
 }
 
+/*
+ * Returns the start pfn of the laste page block in a zone.
+ * This is the starting point for full compaction of a zone.
+ * Compaction searches for free pages from the end of each zone,
+ * while isolate_freepages_block scans forward inside each page
+ * block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+	unsigned long free_pfn;
+	free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+	free_pfn &= ~(pageblock_nr_pages-1);
+	return free_pfn;
+}
+
 static int compact_finished(struct zone *zone,
 			    struct compact_control *cc)
 {
@@ -587,10 +603,9 @@ static int compact_finished(struct zone *zone,
 	if (cc->free_pfn <= cc->migrate_pfn) {
 		if (cc->order > 0 && !cc->wrapped) {
 			/* We started partway through; restart at the end. */
-			unsigned long free_pfn;
-			free_pfn = zone->zone_start_pfn + zone->spanned_pages;
-			free_pfn &= ~(pageblock_nr_pages-1);
+			unsigned long free_pfn = start_free_pfn(zone);
 			zone->compact_cached_free_pfn = free_pfn;
+			cc->free_pfn = free_pfn;
 			cc->wrapped = 1;
 			return COMPACT_CONTINUE;
 		}
@@ -703,8 +718,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		cc->start_free_pfn = cc->free_pfn;
 	} else {
 		/* Order == -1 starts at the end of the zone. */
-		cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
-		cc->free_pfn &= ~(pageblock_nr_pages-1);
+		cc->free_pfn = start_free_pfn(zone);
 	}
 
 	migrate_prep_local();
diff --git a/mm/internal.h b/mm/internal.h
index 0b72461..da6b9b2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -121,7 +121,11 @@ struct compact_control {
 	unsigned long start_free_pfn;	/* where we started the search */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	bool sync;			/* Synchronous migration */
-	bool wrapped;			/* Last round for order>0 compaction */
+	bool wrapped;			/* Order > 0 compactions are
+					   incremental, once free_pfn
+					   and migrate_pfn meet, we restart
+					   from the top of the zone;
+					   remember we wrapped around. */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH -mm v3] mm: have order > 0 compaction start off where it left
  2012-07-11 20:18                       ` Rik van Riel
@ 2012-07-12  2:26                         ` Minchan Kim
  -1 siblings, 0 replies; 50+ messages in thread
From: Minchan Kim @ 2012-07-12  2:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Mel Gorman, Sasha Levin, linux-mm, linux-kernel,
	jaschut, kamezawa.hiroyu, Dave Jones

Hi Rik,

On Wed, Jul 11, 2012 at 04:18:00PM -0400, Rik van Riel wrote:
> This patch makes the comment for cc->wrapped longer, explaining
> what is really going on. It also incorporates the comment fix
> pointed out by Minchan.
> 
> Additionally, Minchan found that, when no pages get isolated,
> high_pte could be a value that is much lower than desired,

s/high_pte/high_pfn

> which might potentially cause compaction to skip a range of
> pages.
> 
> Only assign zone->compact_cached_free_pfn if we actually
> isolated free pages for compaction.
> 
> Split out the calculation to get the start of the last page
> block in a zone into its own, commented function.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Minchan Kim <minchan@kernel.org>

> ---
>  include/linux/mmzone.h |    2 +-
>  mm/compaction.c        |   30 ++++++++++++++++++++++--------
>  mm/internal.h          |    6 +++++-
>  3 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e629594..e957fa1 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -370,7 +370,7 @@ struct zone {
>  	spinlock_t		lock;
>  	int                     all_unreclaimable; /* All pages pinned */
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> -	/* pfn where the last order > 0 compaction isolated free pages */
> +	/* pfn where the last incremental compaction isolated free pages */
>  	unsigned long		compact_cached_free_pfn;
>  #endif
>  #ifdef CONFIG_MEMORY_HOTPLUG
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 2668b77..3812c3e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -472,10 +472,11 @@ static void isolate_freepages(struct zone *zone,
>  		 * looking for free pages, the search will restart here as
>  		 * page migration may have returned some pages to the allocator
>  		 */
> -		if (isolated)
> +		if (isolated) {
>  			high_pfn = max(high_pfn, pfn);
> -		if (cc->order > 0)
> -			zone->compact_cached_free_pfn = high_pfn;
> +			if (cc->order > 0)
> +				zone->compact_cached_free_pfn = high_pfn;
> +		}
>  	}
>  
>  	/* split_free_page does not map the pages */
> @@ -569,6 +570,21 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>  	return ISOLATE_SUCCESS;
>  }
>  
> +/*
> + * Returns the start pfn of the laste page block in a zone.

s/laste/last/




^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread

Thread overview: 25 messages
-- links below jump to the message on this page --
2012-06-28 17:55 [PATCH -mm v2] mm: have order > 0 compaction start off where it left Rik van Riel
2012-06-28 20:19 ` Jim Schutt
2012-06-28 20:57   ` Rik van Riel
2012-06-28 20:59 ` Andrew Morton
2012-06-28 21:24   ` Rik van Riel
2012-06-28 21:35     ` Andrew Morton
2012-07-02 17:42       ` Sasha Levin
2012-07-03  0:57         ` Rik van Riel
2012-07-03  2:54           ` Minchan Kim
2012-07-03 10:10           ` Mel Gorman
2012-07-03 21:48             ` Andrew Morton
2012-07-04  2:34               ` Minchan Kim
2012-07-04  7:42                 ` Andrew Morton
2012-07-04  8:01                   ` Minchan Kim
2012-07-11 20:18                     ` [PATCH -mm v3] " Rik van Riel
2012-07-12  2:26                       ` Minchan Kim
2012-07-04  9:57                   ` [PATCH -mm v2] " Mel Gorman
2012-06-28 23:27 ` Minchan Kim
2012-07-03 14:59   ` Rik van Riel
2012-07-04  2:28     ` Minchan Kim
2012-07-04 10:08       ` Mel Gorman
2012-07-03 20:13   ` [PATCH -mm] mm: minor fixes for compaction Rik van Riel
2012-07-04  2:36     ` Minchan Kim
2012-06-29 10:02 ` [PATCH -mm v2] mm: have order > 0 compaction start off where it left Mel Gorman
2012-06-30  3:51 ` Kamezawa Hiroyuki
