* [PATCH -mm] mm: have order>0 compaction start off where it left
@ 2012-06-28  3:37 ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-06-28  3:37 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, Mel Gorman, kamezawa.hiroyu, minchan, linux-kernel, jaschut

Order > 0 compaction stops when enough free pages of the correct
page order have been coalesced. When doing subsequent higher order
allocations, it is possible for compaction to be invoked many times.

However, the compaction code always starts out looking for pages to
migrate at the start of the zone, and for free pages to migrate them
to at the end of the zone.

This can cause quadratic behaviour, with isolate_freepages starting
at the end of the zone each time, even though previous invocations
of the compaction code already filled up all free memory on that end
of the zone.

This can cause isolate_freepages to take enormous amounts of CPU
with certain workloads on larger memory systems.

The obvious solution is to have isolate_freepages remember where
it left off last time, and continue at that point the next time
it gets invoked for an order > 0 compaction. This could cause
compaction to fail if cc->free_pfn and cc->migrate_pfn are close
together initially; in that case, we restart from the end of the
zone and try once more.

Forced full (order == -1) compactions are left alone.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
CAUTION: due to the time of day, I have only COMPILE tested this code

 include/linux/mmzone.h |    4 ++++
 mm/compaction.c        |   25 +++++++++++++++++++++++--
 mm/internal.h          |    1 +
 mm/page_alloc.c        |    4 ++++
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2427706..b8a5c36 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -369,6 +369,10 @@ struct zone {
 	 */
 	spinlock_t		lock;
 	int                     all_unreclaimable; /* All pages pinned */
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+	/* pfn where the last order > 0 compaction isolated free pages */
+	unsigned long		last_free_pfn;
+#endif
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/* see spanned/present_pages for more description */
 	seqlock_t		span_seqlock;
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..0e9e995 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -422,6 +422,10 @@ static void isolate_freepages(struct zone *zone,
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
 
+		/* Skip ahead if somebody else is compacting simultaneously. */
+		if (cc->order > 0)
+			pfn = min(pfn, zone->last_free_pfn);
+
 		if (!pfn_valid(pfn))
 			continue;
 
@@ -463,6 +467,8 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		if (isolated)
 			high_pfn = max(high_pfn, pfn);
+		if (cc->order > 0)
+			zone->last_free_pfn = high_pfn;
 	}
 
 	/* split_free_page does not map the pages */
@@ -565,9 +571,24 @@ static int compact_finished(struct zone *zone,
 	if (fatal_signal_pending(current))
 		return COMPACT_PARTIAL;
 
-	/* Compaction run completes if the migrate and free scanner meet */
-	if (cc->free_pfn <= cc->migrate_pfn)
+	/*
+	 * A full (order == -1) compaction run starts at the beginning and
+	 * end of a zone; it completes when the migrate and free scanner meet. 
+	 * A partial (order > 0) compaction can start with the free scanner
+	 * at a random point in the zone, and may have to restart.
+	 */
+	if (cc->free_pfn <= cc->migrate_pfn) {
+		if (cc->order > 0 && !cc->last_round) {
+			/* We started partway through; restart at the end. */
+			unsigned long free_pfn;
+			free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+			free_pfn &= ~(pageblock_nr_pages-1);
+			zone->last_free_pfn = free_pfn;
+			cc->last_round = 1;
+			return COMPACT_CONTINUE;
+		}
 		return COMPACT_COMPLETE;
+	}
 
 	/*
 	 * order == -1 is expected when compacting via
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..b041874 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -120,6 +120,7 @@ struct compact_control {
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
 	bool sync;			/* Synchronous migration */
+	bool last_round;		/* Last round for order>0 compaction */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..86de652 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4394,6 +4394,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
+#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+		zone->last_free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+		zone->last_free_pfn &= ~(pageblock_nr_pages-1);
+#endif
 #ifdef CONFIG_NUMA
 		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)


* Re: [PATCH -mm] mm: have order>0 compaction start off where it left
  2012-06-28  3:37 ` Rik van Riel
@ 2012-06-28 10:29   ` Mel Gorman
  -1 siblings, 0 replies; 12+ messages in thread
From: Mel Gorman @ 2012-06-28 10:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, akpm, kamezawa.hiroyu, minchan, linux-kernel, jaschut

On Wed, Jun 27, 2012 at 11:37:42PM -0400, Rik van Riel wrote:
> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
> 
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
> 
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
> 
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
> 
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
> 
> Forced full (order == -1) compactions are left alone.
> 
> Reported-by: Jim Schutt <jaschut@sandia.gov>
> Signed-off-by: Rik van Riel <riel@redhat.com>

In principle, this is a good idea and I like it. Not so sure about the
details :)

> ---
> CAUTION: due to the time of day, I have only COMPILE tested this code
> 
>  include/linux/mmzone.h |    4 ++++
>  mm/compaction.c        |   25 +++++++++++++++++++++++--
>  mm/internal.h          |    1 +
>  mm/page_alloc.c        |    4 ++++
>  4 files changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 2427706..b8a5c36 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -369,6 +369,10 @@ struct zone {
>  	 */
>  	spinlock_t		lock;
>  	int                     all_unreclaimable; /* All pages pinned */
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +	/* pfn where the last order > 0 compaction isolated free pages */
> +	unsigned long		last_free_pfn;
> +#endif

last_free_pfn could be misleading as a name. At a glance it implies that
it stores the PFN of the highest free page. compact_cached_free_pfn?

>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	/* see spanned/present_pages for more description */
>  	seqlock_t		span_seqlock;
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7ea259d..0e9e995 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -422,6 +422,10 @@ static void isolate_freepages(struct zone *zone,
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
>  
> +		/* Skip ahead if somebody else is compacting simultaneously. */
> +		if (cc->order > 0)
> +			pfn = min(pfn, zone->last_free_pfn);
> +
>  		if (!pfn_valid(pfn))
>  			continue;
>  
> @@ -463,6 +467,8 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		if (isolated)
>  			high_pfn = max(high_pfn, pfn);
> +		if (cc->order > 0)
> +			zone->last_free_pfn = high_pfn;
>  	}
>  
>  	/* split_free_page does not map the pages */

This is not necessarily good behaviour.

Let's say there are two parallel compactions running. Process A meets
the migration PFN and moves to the end of the zone to restart. Process B
finishes scanning mid-way through the zone and updates last_free_pfn. This
will cause Process A to "jump" to where Process B left off, which is not
necessarily desirable.

Another side effect is that a workload that allocates/frees
aggressively will not compact as well, because the "free" scanner is not
scanning the end of the zone each time. It would be better if
last_free_pfn were updated whenever a full pageblock is encountered.

So:

1. Initialise last_free_pfn to the end of the zone
2. On compaction, scan from last_free_pfn and record where it started
3. If a pageblock is full, update last_free_pfn
4. If the migration and free scanner meet, reset last_free_pfn and
   the free scanner. Abort if the free scanner wraps to where it started

Does that make sense?
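
For illustration only, here is a rough user-space sketch of that scheme. It is
not the kernel patch; the names compact_cached_free_pfn and cc.wrapped follow
the suggestions above, and the toy zone layout is an assumption made purely
for the demo:

/*
 * Rough sketch of the caching-and-wrap scheme: cache the free scanner
 * position, resume from it, wrap to the zone end once when the scanners
 * meet, and abort when the wrapped scanner reaches its starting point.
 */
#include <stdbool.h>
#include <stdio.h>

#define ZONE_START      0UL
#define ZONE_END        64UL   /* the zone spans 64 "pageblocks" in this demo */

/* Step 1: initialised to the zone end; pretend a previous compaction run
 * has already pulled the cache down to the middle of the zone. */
static unsigned long compact_cached_free_pfn = 32UL;

struct compact_control {
        unsigned long free_pfn;         /* free scanner position */
        unsigned long migrate_pfn;      /* migrate scanner position */
        unsigned long start_free_pfn;   /* where this run's free scanner began */
        bool wrapped;                   /* free scanner wrapped to the zone end */
};

/* Pretend the top pageblocks were already filled by earlier compaction. */
static bool pageblock_full(unsigned long pfn)
{
        return pfn >= 56;
}

/* One downward step of the free scanner; false means "compaction finished". */
static bool free_scanner_step(struct compact_control *cc)
{
        /* Step 3: cache the position of full pageblocks so the next run
         * does not start its scan above them. */
        if (pageblock_full(cc->free_pfn))
                compact_cached_free_pfn = cc->free_pfn;

        /* Step 4: the scanners met.  Reset the cache and wrap once. */
        if (cc->free_pfn <= cc->migrate_pfn) {
                if (cc->wrapped)
                        return false;
                cc->free_pfn = ZONE_END;
                compact_cached_free_pfn = ZONE_END;
                cc->wrapped = true;
                return true;
        }

        cc->free_pfn--;
        /* Abort if the wrapped scanner gets back to where this run started. */
        if (cc->wrapped && cc->free_pfn <= cc->start_free_pfn)
                return false;
        return true;
}

int main(void)
{
        struct compact_control cc = {
                /* Step 2: start scanning from the cached position. */
                .free_pfn       = compact_cached_free_pfn,
                .start_free_pfn = compact_cached_free_pfn,
                .migrate_pfn    = ZONE_START + 4,
                .wrapped        = false,
        };

        while (free_scanner_step(&cc))
                ;
        printf("done: free_pfn=%lu wrapped=%d cached=%lu\n",
               cc.free_pfn, cc.wrapped, compact_cached_free_pfn);
        return 0;
}

In this toy run the scanner starts mid-zone, meets the migrate scanner, wraps
to the zone end, skips the already-full tail while updating the cache, and
aborts when it returns to its starting point.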

> @@ -565,9 +571,24 @@ static int compact_finished(struct zone *zone,
>  	if (fatal_signal_pending(current))
>  		return COMPACT_PARTIAL;
>  
> -	/* Compaction run completes if the migrate and free scanner meet */
> -	if (cc->free_pfn <= cc->migrate_pfn)
> +	/*
> +	 * A full (order == -1) compaction run starts at the beginning and
> +	 * end of a zone; it completes when the migrate and free scanner meet. 
> +	 * A partial (order > 0) compaction can start with the free scanner
> +	 * at a random point in the zone, and may have to restart.
> +	 */
> +	if (cc->free_pfn <= cc->migrate_pfn) {
> +		if (cc->order > 0 && !cc->last_round) {
> +			/* We started partway through; restart at the end. */
> +			unsigned long free_pfn;
> +			free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +			free_pfn &= ~(pageblock_nr_pages-1);
> +			zone->last_free_pfn = free_pfn;
> +			cc->last_round = 1;
> +			return COMPACT_CONTINUE;
> +		}
>  		return COMPACT_COMPLETE;
> +	}
>  
>  	/*
>  	 * order == -1 is expected when compacting via
> diff --git a/mm/internal.h b/mm/internal.h
> index 2ba87fb..b041874 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -120,6 +120,7 @@ struct compact_control {
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
>  	bool sync;			/* Synchronous migration */
> +	bool last_round;		/* Last round for order>0 compaction */
>  

I don't get what you mean by last_round. Did you mean "wrapped"? When
false, it means the free scanner started from last_free_pfn; when true, it
means it started from last_free_pfn, met the migrate scanner and wrapped
around to the end of the zone?

>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..86de652 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4394,6 +4394,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
>  
>  		zone->spanned_pages = size;
>  		zone->present_pages = realsize;
> +#if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +		zone->last_free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +		zone->last_free_pfn &= ~(pageblock_nr_pages-1);
> +#endif
>  #ifdef CONFIG_NUMA
>  		zone->node = nid;
>  		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
> 

-- 
Mel Gorman
SUSE Labs

* Re: [PATCH -mm] mm: have order>0 compaction start off where it left
  2012-06-28 10:29   ` Mel Gorman
@ 2012-06-28 16:30     ` Rik van Riel
  -1 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-06-28 16:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, akpm, kamezawa.hiroyu, minchan, linux-kernel, jaschut

On 06/28/2012 06:29 AM, Mel Gorman wrote:

> Lets say there are two parallel compactions running. Process A meets
> the migration PFN and moves to the end of the zone to restart. Process B
> finishes scanning mid-way through the zone and updates last_free_pfn. This
> will cause Process A to "jump" to where Process B left off which is not
> necessarily desirable.
>
> Another side effect is that a workload that allocations/frees
> aggressively will not compact as well as the "free" scanner is not
> scanning the end of the zone each time. It would be better if
> last_free_pfn was updated when a full pageblock was encountered
>
> So;
>
> 1. Initialise last_free_pfn to the end of the zone
> 2. On compaction, scan from last_free_pfn and record where it started
> 3. If a pageblock is full, update last_free_pfn
> 4. If the migration and free scanner meet, reset last_free_pfn and
>     the free scanner. Abort if the free scanner wraps to where it started
>
> Does that make sense?

Yes, that makes sense.  We still have to keep track
of whether we have wrapped around, but I guess that
allows for a better name for the bool :)

Maybe cc->wrapped?

Does anyone have a better name?

As for point (4), should we abort when we wrap
around to where we started, or should we abort
when free_pfn and migrate_pfn meet after we
wrapped around?

>> diff --git a/mm/internal.h b/mm/internal.h
>> index 2ba87fb..b041874 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -120,6 +120,7 @@ struct compact_control {
>>   	unsigned long free_pfn;		/* isolate_freepages search base */
>>   	unsigned long migrate_pfn;	/* isolate_migratepages search base */
>>   	bool sync;			/* Synchronous migration */
>> +	bool last_round;		/* Last round for order>0 compaction */
>>
>
> I don't get what you mean by last_round. Did you mean "wrapped". When
> false, it means the free scanner started from last_pfn and when true it
> means it started from last_pfn, met the migrate scanner and wrapped
> around to the end of the zone?

Yes, I do mean "wrapped" :)

-- 
All rights reversed

* Re: [PATCH -mm] mm: have order>0 compaction start off where it left
  2012-06-28  3:37 ` Rik van Riel
@ 2012-06-28 17:16   ` Jim Schutt
  -1 siblings, 0 replies; 12+ messages in thread
From: Jim Schutt @ 2012-06-28 17:16 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, akpm, Mel Gorman, kamezawa.hiroyu, minchan, linux-kernel


On 06/27/2012 09:37 PM, Rik van Riel wrote:
> Order>  0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
>
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
>
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
>
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
>
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order>  0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially, in that case we restart from the end of the
> zone and try once more.
>
> Forced full (order == -1) compactions are left alone.
>
> Reported-by: Jim Schutt<jaschut@sandia.gov>
> Signed-off-by: Rik van Riel<riel@redhat.com>

Tested-by: Jim Schutt <jaschut@sandia.gov>

Please let me know if you further refine this patch
and would like me to test it with my workload.

> ---
> CAUTION: due to the time of day, I have only COMPILE tested this code
>
>   include/linux/mmzone.h |    4 ++++
>   mm/compaction.c        |   25 +++++++++++++++++++++++--
>   mm/internal.h          |    1 +
>   mm/page_alloc.c        |    4 ++++
>   4 files changed, 32 insertions(+), 2 deletions(-)

This patch is working great for me.

FWIW here's a typical vmstat report, after ~20 minutes of my Ceph load:

2012-06-28 10:59:16.887-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
  r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
23 21          0     393128        480   36883448    0    0     8 49583   90  273   6 25  58 11  0
  6 18          0     397892        480   36912832    0    0   281 2293321 203321 168790  11 43  22 24  0
17 23          0     394540        480   36921356    0    0   262 2227505 202744 163158  11 45  20 23  0
25 17          0     359404        480   36972884    0    0   205 2243941 201087 167874  11 42  23 24  0
21 20          0     367400        480   36934416    0    0   232 2310577 200666 156693  12 50  17 22  0
12 18          0     378048        480   36890624    0    0   232 2235455 196480 165692  11 44  22 24  0
17 18          0     372444        480   36874484    0    0   280 2185592 195885 168416  11 43  24 23  0
51 16          0     372760        480   36841148    0    0   245 2211135 195711 158012  11 46  23 20  0
23 17          0     375272        480   36847292    0    0   228 2323708 207079 164988  12 49  19 20  0
10 26          0     373540        480   36889240    0    0   341 2290586 201708 167954  11 46  19 23  0
44 14          0     303828        480   37020940    0    0   302 2180893 199958 168619  11 40  23 26  0
24 14          0     359320        480   36970272    0    0   345 2173978 197097 163760  11 47  22 20  0
32 19          0     355744        480   36917372    0    0   267 2276251 200123 167776  11 46  19 23  0
34 19          0     360824        480   36900032    0    0   259 2252057 200942 170912  11 43  21 25  0
13 17          0     361288        480   36919360    0    0   253 2149189 188426 170940  10 40  27 23  0
15 16          0     341828        480   36883988    0    0   317 2272817 205203 173732  11 48  19 21  0

Also FWIW, here's a typical "perf top" report with the patch applied:

    PerfTop:   17575 irqs/sec  kernel:80.4%  exact:  0.0% [1000Hz cycles],  (all, 24 CPUs)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

              samples  pcnt function                    DSO
              _______ _____ ___________________________ ________________________________________________________________________________________

             27583.00 11.6% copy_user_generic_string    /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
             18387.00  7.8% __crc32c_le                 /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
             17264.00  7.3% _raw_spin_lock_irqsave      /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
             13890.00  5.9% ceph_crc32c_le              /usr/bin/ceph-osd
              5952.00  2.5% __copy_user_nocache         /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              4663.00  2.0% memmove                     /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              3141.00  1.3% _raw_spin_lock              /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2939.00  1.2% rb_prev                     /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2933.00  1.2% clflush_cache_range         /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2586.00  1.1% __list_del_entry            /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2357.00  1.0% intel_idle                  /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2168.00  0.9% __set_page_dirty_nobuffers  /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2110.00  0.9% get_pageblock_flags_group   /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2103.00  0.9% set_page_dirty              /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              2090.00  0.9% futex_wake                  /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1959.00  0.8% __memcpy                    /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1696.00  0.7% generic_bin_search          /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
              1628.00  0.7% btree_set_page_dirty        /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
              1606.00  0.7% _raw_spin_unlock_irqrestore /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1516.00  0.6% map_private_extent_buffer   /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
              1481.00  0.6% futex_requeue               /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1447.00  0.6% isolate_migratepages_range  /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1365.00  0.6% __schedule                  /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1361.00  0.6% memcpy                      /lib64/libc-2.12.so
              1263.00  0.5% trace_hardirqs_off          /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1238.00  0.5% tg_load_down                /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1220.00  0.5% move_freepages_block        /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1198.00  0.5% find_iova                   /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
              1139.00  0.5% process_responses           /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko

So far I've run a total of ~20 TB of data over fifty minutes
or so through 12 machines running this patch; no hint of
trouble, great performance.

Without this patch I would typically start having trouble
after just a few minutes of this load.

Thanks!

-- Jim



* Re: [PATCH -mm] mm: have order>0 compaction start off where it left
  2012-06-28 17:16   ` Jim Schutt
@ 2012-06-28 17:25     ` Rik van Riel
  -1 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-06-28 17:25 UTC (permalink / raw)
  To: Jim Schutt
  Cc: linux-mm, akpm, Mel Gorman, kamezawa.hiroyu, minchan, linux-kernel

On 06/28/2012 01:16 PM, Jim Schutt wrote:
>
> On 06/27/2012 09:37 PM, Rik van Riel wrote:
>> Order> 0 compaction stops when enough free pages of the correct
>> page order have been coalesced. When doing subsequent higher order
>> allocations, it is possible for compaction to be invoked many times.
>>
>> However, the compaction code always starts out looking for things to
>> compact at the start of the zone, and for free pages to compact things
>> to at the end of the zone.
>>
>> This can cause quadratic behaviour, with isolate_freepages starting
>> at the end of the zone each time, even though previous invocations
>> of the compaction code already filled up all free memory on that end
>> of the zone.
>>
>> This can cause isolate_freepages to take enormous amounts of CPU
>> with certain workloads on larger memory systems.
>>
>> The obvious solution is to have isolate_freepages remember where
>> it left off last time, and continue at that point the next time
>> it gets invoked for an order> 0 compaction. This could cause
>> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
>> together initially, in that case we restart from the end of the
>> zone and try once more.
>>
>> Forced full (order == -1) compactions are left alone.
>>
>> Reported-by: Jim Schutt<jaschut@sandia.gov>
>> Signed-off-by: Rik van Riel<riel@redhat.com>
>
> Tested-by: Jim Schutt<jaschut@sandia.gov>
>
> Please let me know if you further refine this patch
> and would like me to test it with my workload.

Mel pointed out a serious problem with the way wrapping
cc->free_pfn back to the top of the zone is handled.

I will send you a new patch once I have a fix for that.

> So far I've run a total of ~20 TB of data over fifty minutes
> or so through 12 machines running this patch; no hint of
> trouble, great performance.
>
> Without this patch I would typically start having trouble
> after just a few minutes of this load.

Good to hear that!

Thank you for testing last night's version.

-- 
All rights reversed

* Re: [PATCH -mm] mm: have order>0 compaction start off where it left
  2012-06-28 16:30     ` Rik van Riel
@ 2012-06-29  2:58       ` Kamezawa Hiroyuki
  -1 siblings, 0 replies; 12+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-29  2:58 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Mel Gorman, linux-mm, akpm, minchan, linux-kernel, jaschut

(2012/06/29 1:30), Rik van Riel wrote:
> On 06/28/2012 06:29 AM, Mel Gorman wrote:
>
>> Lets say there are two parallel compactions running. Process A meets
>> the migration PFN and moves to the end of the zone to restart. Process B
>> finishes scanning mid-way through the zone and updates last_free_pfn. This
>> will cause Process A to "jump" to where Process B left off which is not
>> necessarily desirable.
>>
>> Another side effect is that a workload that allocations/frees
>> aggressively will not compact as well as the "free" scanner is not
>> scanning the end of the zone each time. It would be better if
>> last_free_pfn was updated when a full pageblock was encountered
>>
>> So;
>>
>> 1. Initialise last_free_pfn to the end of the zone
>> 2. On compaction, scan from last_free_pfn and record where it started
>> 3. If a pageblock is full, update last_free_pfn
>> 4. If the migration and free scanner meet, reset last_free_pfn and
>>     the free scanner. Abort if the free scanner wraps to where it started
>>
>> Does that make sense?
>
> Yes, that makes sense.  We still have to keep track
> of whether we have wrapped around, but I guess that
> allows for a better name for the bool :)
>
> Maybe cc->wrapped?
>
> Does anyone have a better name?
>

cc->second_scan ? (I have no sense of naming ;)

> As for point (4), should we abort when we wrap
> around to where we started, or should we abort
> when free_pfn and migrate_pfn meet after we
> wrapped around?
>

I'd like to vote for aborting earlier.

Regards,
-Kame

