* [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Bartlomiej Zolnierkiewicz @ 2012-04-26  8:15 UTC
  To: linux-mm; +Cc: Mel Gorman, Minchan Kim, Marek Szyprowski, Kyungmin Park

From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks

When Unmovable pages are freed from an Unmovable type pageblock
(and some Movable type pages are left in it) the type of
the pageblock remains unchanged and therefore the pageblock
cannot be used as a migration target during compaction.

Fix it by:

* Adding enum compaction_mode (COMPACTION_ASYNC_PARTIAL,
  COMPACTION_ASYNC_FULL and COMPACTION_SYNC) and then converting
  sync field in struct compact_control to use it.

* Scanning the Unmovable pageblocks (during COMPACTION_ASYNC_FULL
  and COMPACTION_SYNC compactions) and building a count based on
  finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
  If all pages within the Unmovable pageblock are in one of those
  three sets, change the whole pageblock type to Movable.


My particular test case (on an ARM EXYNOS4 device with 512 MiB,
which means 131072 standard 4KiB pages in 'Normal' zone) is to:
- allocate 120000 pages for kernel's usage
- free every second page (60000 pages) of memory just allocated
- allocate and use 60000 pages from user space
- free remaining 60000 pages of kernel memory
(now we have fragmented memory occupied mostly by user space pages)
- try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

The results:
- with compaction disabled I get 11 successful allocations
- with compaction enabled - 14 successful allocations
- with this patch I'm able to get all 100 successful allocations

Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
v2: redo the patch based on review from Mel Gorman.
    (http://marc.info/?l=linux-mm&m=133519311025444&w=2)
v3: apply review comments from Minchan Kim.
    (http://marc.info/?l=linux-mm&m=133531540308862&w=2)

 include/linux/compaction.h |   21 ++++++++++++++-
 mm/compaction.c            |   61 +++++++++++++++++++++++++++++++++------------
 mm/internal.h              |    6 +++-
 mm/page_alloc.c            |   24 ++++++++---------
 4 files changed, 82 insertions(+), 30 deletions(-)

Index: b/include/linux/compaction.h
===================================================================
--- a/include/linux/compaction.h	2012-04-25 17:57:06.000000000 +0200
+++ b/include/linux/compaction.h	2012-04-26 09:58:54.272510940 +0200
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,21 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * ASYNC_PARTIAL uses asynchronous migration and scans only Movable
+ * pageblocks for pages to migrate from, ASYNC_FULL additionally
+ * scans Unmovable pageblocks (for use as migration target pages)
+ * and converts them to Movable ones if possible, SYNC uses
+ * synchronous migration, scans all pageblocks for pages to migrate
+ * from and also scans Unmovable pageblocks (for use as migration
+ * target pages) and converts them to Movable ones if possible.
+ */
+enum compaction_mode {
+	COMPACTION_ASYNC_PARTIAL,
+	COMPACTION_ASYNC_FULL,
+	COMPACTION_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
@@ -22,7 +39,7 @@
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *mask,
-			bool sync);
+			enum compaction_mode mode);
 extern int compact_pgdat(pg_data_t *pgdat, int order);
 extern unsigned long compaction_suitable(struct zone *zone, int order);
 
@@ -64,7 +81,7 @@
 #else
 static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			bool sync)
+			enum compaction_mode mode)
 {
 	return COMPACT_CONTINUE;
 }
Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-04-25 17:57:09.000000000 +0200
+++ b/mm/compaction.c	2012-04-26 10:14:09.880510831 +0200
@@ -235,7 +235,7 @@
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACTION_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -303,7 +303,8 @@
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACTION_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -324,7 +325,7 @@
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACTION_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		/* Try isolate the page */
@@ -357,9 +358,33 @@
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+static bool rescue_unmovable_pageblock(struct page *page)
+{
+	unsigned long pfn, start_pfn, end_pfn;
+
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages;
+
+	for (pfn = start_pfn; pfn < end_pfn;) {
+		page = pfn_to_page(pfn);
+
+		if (PageBuddy(page))
+			pfn += (1 << page_order(page));
+		else if (page_count(page) == 0 || PageLRU(page))
+			pfn++;
+		else
+			return false;
+	}
+
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+	return true;
+}
 
 /* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+static bool suitable_migration_target(struct page *page,
+				      enum compaction_mode mode)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
@@ -376,6 +401,11 @@
 	if (migrate_async_suitable(migratetype))
 		return true;
 
+	if ((mode == COMPACTION_ASYNC_FULL || mode == COMPACTION_SYNC) &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    rescue_unmovable_pageblock(page))
+		return true;
+
 	/* Otherwise skip the block */
 	return false;
 }
@@ -434,7 +464,7 @@
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		if (!suitable_migration_target(page, cc->mode))
 			continue;
 
 		/*
@@ -445,7 +475,7 @@
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		if (suitable_migration_target(page, cc->mode)) {
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
@@ -682,8 +712,9 @@
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)cc, false,
+			(cc->mode == COMPACTION_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -712,7 +743,7 @@
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compaction_mode mode)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -720,7 +751,7 @@
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
@@ -742,7 +773,7 @@
  */
 unsigned long try_to_compact_pages(struct zonelist *zonelist,
 			int order, gfp_t gfp_mask, nodemask_t *nodemask,
-			bool sync)
+			enum compaction_mode mode)
 {
 	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
 	int may_enter_fs = gfp_mask & __GFP_FS;
@@ -766,7 +797,7 @@
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		status = compact_zone_order(zone, order, gfp_mask, mode);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
@@ -805,7 +836,7 @@
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACTION_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -820,7 +851,7 @@
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACTION_ASYNC_FULL,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -830,7 +861,7 @@
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACTION_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h	2012-04-25 17:57:09.000000000 +0200
+++ b/mm/internal.h	2012-04-26 09:58:15.024510943 +0200
@@ -94,6 +94,9 @@
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,7 +123,7 @@
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compaction_mode mode;	/* partial/full/sync compaction */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-04-25 17:57:09.000000000 +0200
+++ b/mm/page_alloc.c	2012-04-26 10:02:48.756510912 +0200
@@ -232,7 +232,7 @@
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -967,8 +967,8 @@
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -2074,7 +2074,7 @@
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, bool sync_migration,
+	int migratetype, enum compaction_mode migration_mode,
 	bool *deferred_compaction,
 	unsigned long *did_some_progress)
 {
@@ -2090,7 +2090,7 @@
 
 	current->flags |= PF_MEMALLOC;
 	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
-						nodemask, sync_migration);
+						nodemask, migration_mode);
 	current->flags &= ~PF_MEMALLOC;
 	if (*did_some_progress != COMPACT_SKIPPED) {
 
@@ -2122,7 +2122,7 @@
 		 * As async compaction considers a subset of pageblocks, only
 		 * defer if the failure was a sync compaction failure.
 		 */
-		if (sync_migration)
+		if (migration_mode == COMPACTION_SYNC)
 			defer_compaction(preferred_zone, order);
 
 		cond_resched();
@@ -2135,7 +2135,7 @@
 __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct zonelist *zonelist, enum zone_type high_zoneidx,
 	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
-	int migratetype, bool sync_migration,
+	int migratetype, enum compaction_mode migration_mode,
 	bool *deferred_compaction,
 	unsigned long *did_some_progress)
 {
@@ -2298,7 +2298,7 @@
 	int alloc_flags;
 	unsigned long pages_reclaimed = 0;
 	unsigned long did_some_progress;
-	bool sync_migration = false;
+	enum compaction_mode migration_mode = COMPACTION_ASYNC_PARTIAL;
 	bool deferred_compaction = false;
 
 	/*
@@ -2380,12 +2380,12 @@
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, sync_migration,
+					migratetype, migration_mode,
 					&deferred_compaction,
 					&did_some_progress);
 	if (page)
 		goto got_pg;
-	sync_migration = true;
+	migration_mode = COMPACTION_SYNC;
 
 	/*
 	 * If compaction is deferred for high-order allocations, it is because
@@ -2463,7 +2463,7 @@
 					zonelist, high_zoneidx,
 					nodemask,
 					alloc_flags, preferred_zone,
-					migratetype, sync_migration,
+					migratetype, migration_mode,
 					&deferred_compaction,
 					&did_some_progress);
 		if (page)
@@ -5673,7 +5673,7 @@
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACTION_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Mel Gorman @ 2012-04-26 14:36 UTC
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, Minchan Kim, Marek Szyprowski, Kyungmin Park

On Thu, Apr 26, 2012 at 10:15:54AM +0200, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Subject: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
> 
> When Unmovable pages are freed from an Unmovable type pageblock
> (and some Movable type pages are left in it) the type of
> the pageblock remains unchanged and therefore the pageblock
> cannot be used as a migration target during compaction.
> 

Add a note saying that waiting until an allocation takes ownership of
the block may take too long.

> Fix it by:
> 
> * Adding enum compaction_mode (COMPACTION_ASYNC_PARTIAL,
>   COMPACTION_ASYNC_FULL and COMPACTION_SYNC) and then converting
>   sync field in struct compact_control to use it.
> 

Other compaction constants use just COMPACT such as COMPACT_CONTINUE,
COMPACT_SKIPPED and so on. I suggest you do the same and use
COMPACT_ASYNC_PARTIAL, COMPACT_ASYNC_FULL and COMPACT_SYNC.
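
Combined with the compact_mode naming I suggest further down, the
declaration would then read (illustration only):

enum compact_mode {
	COMPACT_ASYNC_PARTIAL,
	COMPACT_ASYNC_FULL,
	COMPACT_SYNC,
};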

> * Scanning the Unmovable pageblocks (during COMPACTION_ASYNC_FULL
>   and COMPACTION_SYNC compactions) and building a count based on
>   finding PageBuddy pages, page_count(page) == 0 or PageLRU pages.
>   If all pages within the Unmovable pageblock are in one of those
>   three sets, change the whole pageblock type to Movable.
> 

Good.

> My particular test case (on an ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
> 
> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
> v2: redo the patch based on review from Mel Gorman.
>     (http://marc.info/?l=linux-mm&m=133519311025444&w=2)
> v3: apply review comments from Minchan Kim.
>     (http://marc.info/?l=linux-mm&m=133531540308862&w=2)
> 
>  include/linux/compaction.h |   21 ++++++++++++++-
>  mm/compaction.c            |   61 +++++++++++++++++++++++++++++++++------------
>  mm/internal.h              |    6 +++-
>  mm/page_alloc.c            |   24 ++++++++---------
>  4 files changed, 82 insertions(+), 30 deletions(-)
> 
> Index: b/include/linux/compaction.h
> ===================================================================
> --- a/include/linux/compaction.h	2012-04-25 17:57:06.000000000 +0200
> +++ b/include/linux/compaction.h	2012-04-26 09:58:54.272510940 +0200
> @@ -1,6 +1,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>  
> +#include <linux/node.h>
> +
>  /* Return values for compact_zone() and try_to_compact_pages() */
>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>  #define COMPACT_SKIPPED		0
> @@ -11,6 +13,21 @@
>  /* The full zone was compacted */
>  #define COMPACT_COMPLETE	3
>  
> +/*
> + * ASYNC_PARTIAL uses asynchronous migration and scans only Movable
> + * pageblocks for pages to migrate from, ASYNC_FULL additionally
> + * scans Unmovable pageblocks (for use as migration target pages)
> + * and converts them to Movable ones if possible, SYNC uses
> + * synchronous migration, scans all pageblocks for pages to migrate
> + * from and also scans Unmovable pageblocks (for use as migration
> + * target pages) and converts them to Movable ones if possible.
> + */

This comment could be less ambiguous. How about the following?

/*
 * compaction supports three modes
 *
 * COMPACT_ASYNC_PARTIAL uses asynchronous migration and only scans
 *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
 * COMPACT_ASYNC_FULL uses asynchronous migration and only scans
 *    MIGRATE_MOVABLE pageblocks as migration sources.
 *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
 *    targets and converts them to MIGRATE_MOVABLE if possible
 * COMPACT_SYNC uses synchronous migration and scans all pageblocks
 */

> +enum compaction_mode {
> +	COMPACTION_ASYNC_PARTIAL,
> +	COMPACTION_ASYNC_FULL,
> +	COMPACTION_SYNC,
> +};
> +
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compact_memory;
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> @@ -22,7 +39,7 @@
>  extern int fragmentation_index(struct zone *zone, unsigned int order);
>  extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *mask,
> -			bool sync);
> +			enum compaction_mode mode);

Same for the naming. The control structure is called compact_control so
make the mode compact_mode as well for consistency.

>  extern int compact_pgdat(pg_data_t *pgdat, int order);
>  extern unsigned long compaction_suitable(struct zone *zone, int order);
>  
> @@ -64,7 +81,7 @@
>  #else
>  static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *nodemask,
> -			bool sync)
> +			enum compaction_mode mode)
>  {
>  	return COMPACT_CONTINUE;
>  }
> Index: b/mm/compaction.c
> ===================================================================
> --- a/mm/compaction.c	2012-04-25 17:57:09.000000000 +0200
> +++ b/mm/compaction.c	2012-04-26 10:14:09.880510831 +0200
> @@ -235,7 +235,7 @@
>  	 */
>  	while (unlikely(too_many_isolated(zone))) {
>  		/* async migration should just abort */
> -		if (!cc->sync)
> +		if (cc->mode != COMPACTION_SYNC)
>  			return 0;
>  
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> @@ -303,7 +303,8 @@
>  		 * satisfies the allocation
>  		 */
>  		pageblock_nr = low_pfn >> pageblock_order;
> -		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> +		if (cc->mode != COMPACTION_SYNC &&
> +		    last_pageblock_nr != pageblock_nr &&
>  		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
>  			low_pfn += pageblock_nr_pages;
>  			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
> @@ -324,7 +325,7 @@
>  			continue;
>  		}
>  
> -		if (!cc->sync)
> +		if (cc->mode != COMPACTION_SYNC)
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>  
>  		/* Try isolate the page */
> @@ -357,9 +358,33 @@
>  
>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>  #ifdef CONFIG_COMPACTION
> +static bool rescue_unmovable_pageblock(struct page *page)
> +{
> +	unsigned long pfn, start_pfn, end_pfn;
> +
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages;
> +
> +	for (pfn = start_pfn; pfn < end_pfn;) {
> +		page = pfn_to_page(pfn);
> +

pfn_to_page() is not cheap at all.

What you need to do to avoid the lookup every time and also deal with
pageblocks that straddle zones is something like this

start_page = pfn_to_page(start_pfn)
end_page = pfn_to_page(end_pfn)
/* Do not deal with pageblocks that overlap zones */
if (page_zone(start_page) != page_zone(end_page))
	return;

for (page = start_page, pfn = start_pfn; page < end_page; pfn++, page++)

This avoids doing a PFN lookup every time and will be faster. On ARM,
you also have to call pfn_valid_within so

	if (!pfn_valid_within(pfn))
		continue;

> +		if (PageBuddy(page))
> +			pfn += (1 << page_order(page));

If you use the for loop I have above, you will need to update both pfn
and page and put in a -1

pfn += (1 << page_order(page)) - 1;
page += (1 << page_order(page)) - 1;


> +		else if (page_count(page) == 0 || PageLRU(page))
> +			pfn++;

This would become "continue"

> +		else
> +			return false;

The else is unnecessary there, just return false.

> +	}
> +
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +	return true;
> +}

Despite all my complaining, this function looks more or less like what I
expected.
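
Putting those pieces together, the end result might look something like
this (an untested sketch only; whether page_order() can be examined
safely here without zone->lock still needs thought):

static bool rescue_unmovable_pageblock(struct page *page)
{
	unsigned long pfn, start_pfn, end_pfn;
	struct page *start_page, *end_page;

	pfn = page_to_pfn(page);
	start_pfn = pfn & ~(pageblock_nr_pages - 1);
	end_pfn = start_pfn + pageblock_nr_pages;

	start_page = pfn_to_page(start_pfn);
	end_page = pfn_to_page(end_pfn);

	/* Do not deal with pageblocks that overlap zones */
	if (page_zone(start_page) != page_zone(end_page))
		return false;

	for (page = start_page, pfn = start_pfn; page < end_page;
	     pfn++, page++) {
		if (!pfn_valid_within(pfn))
			continue;

		if (PageBuddy(page)) {
			/* Skip over the whole free chunk in one step */
			int order = page_order(page);

			pfn += (1 << order) - 1;
			page += (1 << order) - 1;
			continue;
		}

		if (page_count(page) == 0 || PageLRU(page))
			continue;

		return false;
	}

	set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
	move_freepages_block(page_zone(start_page), start_page,
			     MIGRATE_MOVABLE);
	return true;
}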

>  
>  /* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +static bool suitable_migration_target(struct page *page,
> +				      enum compaction_mode mode)
>  {
>  
>  	int migratetype = get_pageblock_migratetype(page);
> @@ -376,6 +401,11 @@
>  	if (migrate_async_suitable(migratetype))
>  		return true;
>  
> +	if ((mode == COMPACTION_ASYNC_FULL || mode == COMPACTION_SYNC) &&

mode != COMPACT_ASYNC_PARTIAL ?

> +	    migratetype == MIGRATE_UNMOVABLE &&
> +	    rescue_unmovable_pageblock(page))
> +		return true;
> +
>  	/* Otherwise skip the block */
>  	return false;
>  }
> @@ -434,7 +464,7 @@
>  			continue;
>  
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		if (!suitable_migration_target(page, cc->mode))
>  			continue;
>  
>  		/*
> @@ -445,7 +475,7 @@
>  		 */
>  		isolated = 0;
>  		spin_lock_irqsave(&zone->lock, flags);
> -		if (suitable_migration_target(page)) {
> +		if (suitable_migration_target(page, cc->mode)) {
>  			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
>  			isolated = isolate_freepages_block(pfn, end_pfn,
>  							   freelist, false);
> @@ -682,8 +712,9 @@
>  
>  		nr_migrate = cc->nr_migratepages;
>  		err = migrate_pages(&cc->migratepages, compaction_alloc,
> -				(unsigned long)cc, false,
> -				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
> +			(unsigned long)cc, false,
> +			(cc->mode == COMPACTION_SYNC) ? MIGRATE_SYNC_LIGHT
> +						      : MIGRATE_ASYNC);
>  		update_nr_listpages(cc);
>  		nr_remaining = cc->nr_migratepages;
>  
> @@ -712,7 +743,7 @@
>  
>  static unsigned long compact_zone_order(struct zone *zone,
>  				 int order, gfp_t gfp_mask,
> -				 bool sync)
> +				 enum compaction_mode mode)
>  {
>  	struct compact_control cc = {
>  		.nr_freepages = 0,
> @@ -720,7 +751,7 @@
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
> -		.sync = sync,
> +		.mode = mode,
>  	};
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
> @@ -742,7 +773,7 @@
>   */
>  unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  			int order, gfp_t gfp_mask, nodemask_t *nodemask,
> -			bool sync)
> +			enum compaction_mode mode)
>  {
>  	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
>  	int may_enter_fs = gfp_mask & __GFP_FS;
> @@ -766,7 +797,7 @@
>  								nodemask) {
>  		int status;
>  
> -		status = compact_zone_order(zone, order, gfp_mask, sync);
> +		status = compact_zone_order(zone, order, gfp_mask, mode);
>  		rc = max(status, rc);
>  
>  		/* If a normal allocation would succeed, stop compacting */
> @@ -805,7 +836,7 @@
>  			if (ok && cc->order > zone->compact_order_failed)
>  				zone->compact_order_failed = cc->order + 1;
>  			/* Currently async compaction is never deferred. */
> -			else if (!ok && cc->sync)
> +			else if (!ok && cc->mode == COMPACTION_SYNC)
>  				defer_compaction(zone, cc->order);
>  		}
>  
> @@ -820,7 +851,7 @@
>  {
>  	struct compact_control cc = {
>  		.order = order,
> -		.sync = false,
> +		.mode = COMPACTION_ASYNC_FULL,
>  	};
>  
>  	return __compact_pgdat(pgdat, &cc);
> @@ -830,7 +861,7 @@
>  {
>  	struct compact_control cc = {
>  		.order = -1,
> -		.sync = true,
> +		.mode = COMPACTION_SYNC,
>  	};
>  
>  	return __compact_pgdat(NODE_DATA(nid), &cc);
> Index: b/mm/internal.h
> ===================================================================
> --- a/mm/internal.h	2012-04-25 17:57:09.000000000 +0200
> +++ b/mm/internal.h	2012-04-26 09:58:15.024510943 +0200
> @@ -94,6 +94,9 @@
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void set_pageblock_migratetype(struct page *page, int migratetype);
> +extern int move_freepages_block(struct zone *zone, struct page *page,
> +				int migratetype);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> @@ -101,6 +104,7 @@
>  #endif
>  
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +#include <linux/compaction.h>
>  
>  /*
>   * in mm/compaction.c
> @@ -119,7 +123,7 @@
>  	unsigned long nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> -	bool sync;			/* Synchronous migration */
> +	enum compaction_mode mode;	/* partial/full/sync compaction */
>  
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c	2012-04-25 17:57:09.000000000 +0200
> +++ b/mm/page_alloc.c	2012-04-26 10:02:48.756510912 +0200
> @@ -232,7 +232,7 @@
>  
>  int page_group_by_mobility_disabled __read_mostly;
>  
> -static void set_pageblock_migratetype(struct page *page, int migratetype)
> +void set_pageblock_migratetype(struct page *page, int migratetype)
>  {
>  
>  	if (unlikely(page_group_by_mobility_disabled))
> @@ -967,8 +967,8 @@
>  	return pages_moved;
>  }
>  
> -static int move_freepages_block(struct zone *zone, struct page *page,
> -				int migratetype)
> +int move_freepages_block(struct zone *zone, struct page *page,
> +			 int migratetype)
>  {
>  	unsigned long start_pfn, end_pfn;
>  	struct page *start_page, *end_page;
> @@ -2074,7 +2074,7 @@
>  __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>  	struct zonelist *zonelist, enum zone_type high_zoneidx,
>  	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
> -	int migratetype, bool sync_migration,
> +	int migratetype, enum compaction_mode migration_mode,
>  	bool *deferred_compaction,
>  	unsigned long *did_some_progress)
>  {
> @@ -2090,7 +2090,7 @@
>  
>  	current->flags |= PF_MEMALLOC;
>  	*did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
> -						nodemask, sync_migration);
> +						nodemask, migration_mode);
>  	current->flags &= ~PF_MEMALLOC;
>  	if (*did_some_progress != COMPACT_SKIPPED) {
>  
> @@ -2122,7 +2122,7 @@
>  		 * As async compaction considers a subset of pageblocks, only
>  		 * defer if the failure was a sync compaction failure.
>  		 */
> -		if (sync_migration)
> +		if (migration_mode == COMPACTION_SYNC)
>  			defer_compaction(preferred_zone, order);
>  
>  		cond_resched();
> @@ -2135,7 +2135,7 @@
>  __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
>  	struct zonelist *zonelist, enum zone_type high_zoneidx,
>  	nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
> -	int migratetype, bool sync_migration,
> +	int migratetype, enum compaction_mode migration_mode,
>  	bool *deferred_compaction,
>  	unsigned long *did_some_progress)
>  {
> @@ -2298,7 +2298,7 @@
>  	int alloc_flags;
>  	unsigned long pages_reclaimed = 0;
>  	unsigned long did_some_progress;
> -	bool sync_migration = false;
> +	enum compaction_mode migration_mode = COMPACTION_ASYNC_PARTIAL;
>  	bool deferred_compaction = false;
>  
>  	/*
> @@ -2380,12 +2380,12 @@
>  					zonelist, high_zoneidx,
>  					nodemask,
>  					alloc_flags, preferred_zone,
> -					migratetype, sync_migration,
> +					migratetype, migration_mode,
>  					&deferred_compaction,
>  					&did_some_progress);
>  	if (page)
>  		goto got_pg;
> -	sync_migration = true;
> +	migration_mode = COMPACTION_SYNC;
>  

Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
used for the proc interface but it's not used via the page allocator at
all.

Minimally I was expecting to see it being used from the page allocator.

A better option might be to track the number of MIGRATE_UNMOVABLE blocks that
were skipped over during COMPACT_ASYNC_PARTIAL and if it was a high
percentage and it looked like compaction failed then to retry with
COMPACT_ASYNC_FULL. If you took this option, try_to_compact_pages()
would still only take sync as a parameter and keep the decision within
compaction.c
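
As a rough sketch of what I mean (nr_pageblocks_skipped and
nr_pageblocks_scanned would be new counters maintained by the free
scanner, and the 50% threshold is plucked out of the air):

static unsigned long compact_zone_order(struct zone *zone, int order,
					gfp_t gfp_mask, bool sync)
{
	unsigned long rc;
	struct compact_control cc = {
		.order = order,
		.migratetype = allocflags_to_migratetype(gfp_mask),
		.zone = zone,
		.mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_PARTIAL,
	};

	INIT_LIST_HEAD(&cc.freepages);
	INIT_LIST_HEAD(&cc.migratepages);

	rc = compact_zone(zone, &cc);

	/*
	 * Retry once with a full scan if most of what the free scanner
	 * saw was skipped MIGRATE_UNMOVABLE pageblocks.
	 */
	if (!sync && rc != COMPACT_PARTIAL &&
	    cc.nr_pageblocks_skipped > cc.nr_pageblocks_scanned / 2) {
		cc.mode = COMPACT_ASYNC_FULL;
		rc = compact_zone(zone, &cc);
	}

	return rc;
}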

>  	/*
>  	 * If compaction is deferred for high-order allocations, it is because
> @@ -2463,7 +2463,7 @@
>  					zonelist, high_zoneidx,
>  					nodemask,
>  					alloc_flags, preferred_zone,
> -					migratetype, sync_migration,
> +					migratetype, migration_mode,
>  					&deferred_compaction,
>  					&did_some_progress);
>  		if (page)
> @@ -5673,7 +5673,7 @@
>  		.nr_migratepages = 0,
>  		.order = -1,
>  		.zone = page_zone(pfn_to_page(start)),
> -		.sync = true,
> +		.mode = COMPACTION_SYNC,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>  

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Rik van Riel @ 2012-04-26 15:42 UTC
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, Mel Gorman, Minchan Kim, Marek Szyprowski, Kyungmin Park

On 04/26/2012 04:15 AM, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz<b.zolnierkie@samsung.com>
> Subject: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks

> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations

Nice!

I am looking forward to a patch with the cleanups
suggested by Mel :)


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Rik van Riel @ 2012-04-26 15:53 UTC
  To: Mel Gorman
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, Minchan Kim,
	Marek Szyprowski, Kyungmin Park

On 04/26/2012 10:36 AM, Mel Gorman wrote:

> Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
> used for the proc interface but it's not used via the page allocator at
> all.

He is using COMPACT_SYNC for the proc interface, and
COMPACT_ASYNC_FULL from kswapd.

> Minimally I was expecting to see it being used from the page allocator.

Makes sense, especially if we get the CPU overhead
saving stuff that we talked about at LSF to work :)

> A better option might be to track the number of MIGRATE_UNMOVABLE blocks that
> were skipped over during COMPACT_ASYNC_PARTIAL and if it was a high
> percentage and it looked like compaction failed then to retry with
> COMPACT_ASYNC_FULL. If you took this option, try_to_compact_pages()
> would still only take sync as a parameter and keep the decision within
> compaction.c

This I don't get.

If we have a small number of MIGRATE_UNMOVABLE blocks,
is it worth skipping over them?

If we have really large number of MIGRATE_UNMOVABLE blocks,
did we let things get out of hand?    By giving the page
allocator this many unmovable blocks to choose from, we
could have ended up with actually non-compactable memory.

If we have a medium number of MIGRATE_UNMOVABLE blocks,
is it worth doing a restart and scanning all the movable
blocks again?

In other words, could it be better to always try to
rescue the unmovable blocks?


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Mel Gorman @ 2012-04-26 16:47 UTC
  To: Rik van Riel
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, Minchan Kim,
	Marek Szyprowski, Kyungmin Park

On Thu, Apr 26, 2012 at 11:53:47AM -0400, Rik van Riel wrote:
> On 04/26/2012 10:36 AM, Mel Gorman wrote:
> 
> >Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
> >used for the proc interface but it's not used via the page allocator at
> >all.
> 
> He is using COMPACT_SYNC for the proc interface, and
> COMPACT_ASYNC_FULL from kswapd.
> 

Ah, yes, of course. My bad.

Even that is not particularly satisfactory though as it's depending on
kswapd to do the work so it's a bit of a race to see if kswapd completes
the job before the page allocator needs it.

> >Minimally I was expecting to see it being used from the page allocator.
> 
> Makes sense, especially if we get the CPU overhead
> saving stuff that we talked about at LSF to work :)
> 

True.

> >A better option might be to track the number of MIGRATE_UNMOVABLE blocks that
> >were skipped over during COMPACT_ASYNC_PARTIAL and if it was a high
> >percentage and it looked like compaction failed then to retry with
> >COMPACT_ASYNC_FULL. If you took this option, try_to_compact_pages()
> >would still only take sync as a parameter and keep the decision within
> >compaction.c
> 
> This I don't get.
> 
> If we have a small number of MIGRATE_UNMOVABLE blocks,
> is it worth skipping over them?
> 

We do not know in advance how many MIGRATE_UNMOVABLE blocks are going to
be encountered. Even if we kept track of the number of MIGRATE_UNMOVABLE
pageblocks in the zone, it would not tell us how many pageblocks the
scanner will see.

> If we have really large number of MIGRATE_UNMOVABLE blocks,
> did we let things get out of hand?  By giving the page
> allocator this many unmovable blocks to choose from, we
> could have ended up with actually non-compactable memory.
> 

If there are a large number of MIGRATE_UNMOVABLE blocks, each with a single
unmovable page at the end of the block then the worst case situation
is that the second pass (COMPACT_ASYNC_PARTIAL being the first pass)
is useless and slow due to the scanning within MIGRATE_UNMOVABLE blocks.

When this situation occurs, I would also expect that the third pass
(COMPACT_SYNC) will also fail and then compaction will get deferred to
limit further damage.

In the average case, I would expect the large number of
MIGRATE_UNMOVABLE blocks to also be partially populated which means that
scans of these blocks will also be partial limiting the amount of
scanning we do. How much this is limited is impossible to estimate as
it's dependent on the workload.

> If we have a medium number of MIGRATE_UNMOVABLE blocks,
> is it worth doing a restart and scanning all the movable
> blocks again?
> 

This goes back to the same problem that we do not know how many
MIGRATE_UNMOVABLE pageblocks are going to be encountered in advance.
However, I see your point.

Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
would examine UNMOVABLE blocks, rescue them and use what blocks it
rescues as migration targets. The third pass (COMPACT_SYNC) would work
as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.

That would avoid rescanning the movable blocks uselessly on the second
pass but should still work for Bartlomiej's workload.
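
In other words, something like:

enum compact_mode {
	COMPACT_ASYNC_MOVABLE,		/* async, MOVABLE targets only */
	COMPACT_ASYNC_UNMOVABLE,	/* async, also rescue UNMOVABLE blocks */
	COMPACT_SYNC,			/* sync migration, all pageblocks */
};

with the page allocator trying the three in that order.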

What do you think?

> In other words, could it be better to always try to
> rescue the unmovable blocks?

I do not think we should always scan within unmovable blocks on the
first pass. I strongly suspect it would lead to excessive amounts of CPU
time spent in mm/compaction.c.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Rik van Riel @ 2012-04-26 18:52 UTC
  To: Mel Gorman
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, Minchan Kim,
	Marek Szyprowski, Kyungmin Park

On 04/26/2012 12:47 PM, Mel Gorman wrote:

> Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
> COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
> the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
> blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
> would examine UNMOVABLE blocks, rescue them and use what blocks it
> rescues as migration targets. The third pass (COMPACT_SYNC) would work
> as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.
>
> That would avoid rescanning the movable blocks uselessly on the second
> pass but should still work for Bartlomiej's workload.
>
> What do you think?

This makes sense.

>> In other words, could it be better to always try to
>> rescue the unmovable blocks?
>
> I do not think we should always scan within unmovable blocks on the
> first pass. I strongly suspect it would lead to excessive amounts of CPU
> time spent in mm/compaction.c.

Maybe my systems are not typical.  I have not seen
more than about 10% of the memory blocks marked as
unmovable in my system.


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Minchan Kim @ 2012-04-27  0:58 UTC
  To: Mel Gorman
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On 04/27/2012 01:47 AM, Mel Gorman wrote:

> On Thu, Apr 26, 2012 at 11:53:47AM -0400, Rik van Riel wrote:
>> On 04/26/2012 10:36 AM, Mel Gorman wrote:
>>
>>> Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
>>> used for the proc interface but it's not used via the page allocator at
>>> all.
>>
>> He is using COMPACT_SYNC for the proc interface, and
>> COMPACT_ASYNC_FULL from kswapd.
>>
> 
> Ah, yes, of course. My bad.
> 
> Even that is not particularly satisfactory though as it's depending on
> kswapd to do the work so it's a bit of a race to see if kswapd completes
> the job before the page allocator needs it.


That was the direction my review suggested.
My point is that I don't want to add more latency to the direct reclaim async
path if we can avoid it, although reclaim is already a slow path.

If async direct reclaim fails to compact memory with COMPACT_ASYNC_PARTIAL,
it ends up trying to compact memory with COMPACT_SYNC anyway, so allocating
a big order page should still be no problem, and it matches the existing
async/sync approach.

While latency is important in direct reclaim, it isn't for kswapd.
So I think using COMPACT_ASYNC_FULL in kswapd makes sense.

> 
> >>> Minimally I was expecting to see it being used from the page allocator.
>>
>> Makes sense, especially if we get the CPU overhead
>> saving stuff that we talked about at LSF to work :)
>>
> 
> True.
> 
>>> A better option might be to track the number of MIGRATE_UNMOVABLE blocks that
>>> were skipped over during COMPACT_ASYNC_PARTIAL and if it was a high
>>> percentage and it looked like compaction failed then to retry with
>>> COMPACT_ASYNC_FULL. If you took this option, try_to_compact_pages()
>>> would still only take sync as a parameter and keep the decision within
>>> compaction.c
>>
>> This I don't get.
>>
>> If we have a small number of MIGRATE_UNMOVABLE blocks,
>> is it worth skipping over them?
>>
> 
> We do not know in advance how many MIGRATE_UNMOVABLE blocks are going to
> be encountered. Even if we kept track of the number of MIGRATE_UNMOVABLE
> pageblocks in the zone, it would not tell us how many pageblocks the
> scanner will see.
> 
>> If we have really large number of MIGRATE_UNMOVABLE blocks,
>> did we let things get out of hand?  By giving the page
>> allocator this many unmovable blocks to choose from, we
>> could have ended up with actually non-compactable memory.
>>
> 
> If there are a large number of MIGRATE_UNMOVABLE blocks, each with a single
> unmovable page at the end of the block then the worst case situation
> is that the second pass (COMPACT_ASYNC_PARTIAL being the first pass)
> is useless and slow due to the scanning within MIGRATE_UNMOVABLE blocks.
> 
> When this situation occurs, I would also expect that the third pass
> (COMPACT_SYNC) will also fail and then compaction will get deferred to
> limit further damage.
> 
> In the average case, I would expect the large number of
> MIGRATE_UNMOVABLE blocks to also be partially populated which means that
> scans of these blocks will also be partial limiting the amount of
> scanning we do. How much this is limited is impossible to estimate as
> it's dependent on the workload.
> 
>> If we have a medium number of MIGRATE_UNMOVABLE blocks,
>> is it worth doing a restart and scanning all the movable
>> blocks again?
>>
> 
> > This goes back to the same problem that we do not know how many
> > MIGRATE_UNMOVABLE pageblocks are going to be encountered in advance.
> > However, I see your point.
> 
> Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
> COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
> the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
> blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
> would examine UNMOVABLE blocks, rescue them and use what blocks it
> rescues as migration targets. The third pass (COMPACT_SYNC) would work


It does make sense.

> as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.


I don't get it. Why does kswapd use only COMPACT_ASYNC_MOVABLE?
As I mentioned, latency isn't important in kswapd, so I think having kswapd
always rescue unmovable blocks would help the success rate of direct
reclaim's first pass (COMPACT_ASYNC_MOVABLE).

> 
> That would avoid rescanning the movable blocks uselessly on the second
> pass but should still work for Bartlomiej's workload.
> 
> What do you think?
> 
>> In other words, could it be better to always try to
>> rescue the unmovable blocks?
> 
> I do not think we should always scan within unmovable blocks on the
> first pass. I strongly suspect it would lead to excessive amounts of CPU
> time spent in mm/compaction.c.


Agree.

> 



-- 
Kind regards,
Minchan Kim


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Mel Gorman @ 2012-04-27  9:45 UTC
  To: Rik van Riel
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, Minchan Kim,
	Marek Szyprowski, Kyungmin Park

On Thu, Apr 26, 2012 at 02:52:56PM -0400, Rik van Riel wrote:
> >Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
> >COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
> >the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
> >blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
> >would examine UNMOVABLE blocks, rescue them and use what blocks it
> >rescues as migration targets. The third pass (COMPACT_SYNC) would work
> >as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.
> >
> >That would avoid rescanning the movable blocks uselessly on the second
> >pass but should still work for Bartlomiej's workload.
> >
> >What do you think?
> 
> This makes sense.
> 
> >>In other words, could it be better to always try to
> >>rescue the unmovable blocks?
> >
> >I do not think we should always scan within unmovable blocks on the
> >first pass. I strongly suspect it would lead to excessive amounts of CPU
> >time spent in mm/compaction.c.
> 
> Maybe my systems are not typical.  I have not seen
> more than about 10% of the memory blocks marked as
> unmovable in my system.

I see even less than 10% on my systems but I do not consider them to be
typical and there will be systems where there are more unmovable pageblocks
for whatever reason (lots of page table pages, anon_vmas and vmas for
example). Hence I'd rather not assume that the number is typically low.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Mel Gorman @ 2012-04-27  9:56 UTC
  To: Minchan Kim
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On Fri, Apr 27, 2012 at 09:58:10AM +0900, Minchan Kim wrote:
> On 04/27/2012 01:47 AM, Mel Gorman wrote:
> 
> > On Thu, Apr 26, 2012 at 11:53:47AM -0400, Rik van Riel wrote:
> >> On 04/26/2012 10:36 AM, Mel Gorman wrote:
> >>
> >>> Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
> >>> used for the proc interface but it's not used via the page allocator at
> >>> all.
> >>
> >> He is using COMPACT_SYNC for the proc interface, and
> >> COMPACT_ASYNC_FULL from kswapd.
> >>
> > 
> > Ah, yes, of course. My bad.
> > 
> > Even that is not particularly satisfactory though as it's depending on
> > kswapd to do the work so it's a bit of a race to see if kswapd completes
> > the job before the page allocator needs it.
> 
> 
> That was the direction my review suggested.

Ah.

> My point is that I don't want to add more latency to the direct reclaim async
> path if we can avoid it, although reclaim is already a slow path.
> 

Your statement was

   Direct reclaim latency is critical on latency sensitive applications(of
   course, you can argue it's already very slow once we reach this path,
   but at least, let's not increase more overhead if we can) so I think
   it would be better to use ASYNC_PARTIAL.  If we fail to allocate in
   this phase, we set it with COMPACTION_SYNC in next phase, below code.

If a path is latency sensitive they have already lost if they are in this
path. They have entered compaction and may enter direct reclaim shortly
so latency is bad at this point. If the application is latency sensitive
they probably should disable THP to avoid any spikes due to THP allocation.

So I still maintain that the page allocator should not be depending on
kswapd to do the work for it. If the caller wants high-order pages, it
must be prepared to pay the cost of allocation.

> If async direct reclaim fails to compact memory with COMPACT_ASYNC_PARTIAL,
> it ends up trying to compact memory with COMPACT_SYNC anyway, so allocating
> a big order page should still be no problem, and it matches the existing
> async/sync approach.
> 

Is a compromise whereby a second pass considers only MIGRATE_UNMOVABLE
pageblocks for rescue and migration targets acceptable? It would be nicer
again if try_to_compact_pages() still accepted a "sync" parameter and would
decide itself if a COMPACT_ASYNC_FULL pass was necessary when sync==false.
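
Sketched out, the decision could stay private to compaction.c like this
(illustration only; __compact_zone_order() stands for the current
compact_zone_order() taking an explicit mode):

static unsigned long compact_zone_order(struct zone *zone, int order,
					gfp_t gfp_mask, bool sync)
{
	unsigned long rc;

	if (sync)
		return __compact_zone_order(zone, order, gfp_mask,
					    COMPACT_SYNC);

	/* First async pass: only MOVABLE pageblocks as migration targets */
	rc = __compact_zone_order(zone, order, gfp_mask,
				  COMPACT_ASYNC_PARTIAL);

	/* Second async pass: also rescue and use UNMOVABLE pageblocks */
	if (rc != COMPACT_PARTIAL)
		rc = __compact_zone_order(zone, order, gfp_mask,
					  COMPACT_ASYNC_FULL);

	return rc;
}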

> While latency is important in direct reclaim, it isn't for kswapd.

That does not mean we should tie up kswapd in compaction.c for longer
than is necessary. It should be getting out of compaction ASAP in case
reclaim is necessary.

> So I think using COMPACT_ASYNC_FULL in kswapd makes sense.
> 

I'm not convinced but am not willing to push on it either. I do think
that the caller of the page allocator does have to use
COMPACT_ASYNC_FULL though and cannot be depending on kswapd to do the
work.

> > <SNIP>
> >
> > This goes back to the same problem of we do not know how many
> > MIGRATE_UNMOVABLE pageblocks are going to be encountered in advance However,
> > I see your point.
> > 
> > Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
> > COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
> > the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
> > blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
> > would examine UNMOVABLE blocks, rescue them and use what blocks it
> > rescues as migration targets. The third pass (COMPACT_SYNC) would work
> 
> 
> It does make sense.
> 
> > as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.
> 
> I don't get it. Why does kswapd use only COMPACT_ASYNC_MOVABLE?

Because kswapd's primary responsibility is reclaim, not compaction.

> As I mentioned, latency isn't important in kswapd, so I think having kswapd
> always rescue unmovable blocks would help the success rate of direct
> reclaim's first pass (COMPACT_ASYNC_MOVABLE).
> 

Latency for kswapd can be important if processes are entering direct
reclaim because kswapd was running compaction instead of reclaim. The
cost is indirect and difficult to detect which is why I would prefer
kswapds use of compaction was as fast as possible.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
From: Minchan Kim @ 2012-04-30  2:44 UTC
  To: Mel Gorman
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On 04/27/2012 06:56 PM, Mel Gorman wrote:

> On Fri, Apr 27, 2012 at 09:58:10AM +0900, Minchan Kim wrote:
>> On 04/27/2012 01:47 AM, Mel Gorman wrote:
>>
>>> On Thu, Apr 26, 2012 at 11:53:47AM -0400, Rik van Riel wrote:
>>>> On 04/26/2012 10:36 AM, Mel Gorman wrote:
>>>>
>>>>> Hmm, at what point does COMPACT_ASYNC_FULL get used? I see it gets
>>>>> used for the proc interface but it's not used via the page allocator at
>>>>> all.
>>>>
>>>> He is using COMPACT_SYNC for the proc interface, and
>>>> COMPACT_ASYNC_FULL from kswapd.
>>>>
>>>
>>> Ah, yes, of course. My bad.
>>>
>>> Even that is not particularly satisfactory though as it's depending on
>>> kswapd to do the work so it's a bit of a race to see if kswapd completes
>>> the job before the page allocator needs it.
>>
>>
>> That was the direction my review suggested.
> 
> Ah.
> 
>> My point is that I don't want to add more latency to the direct reclaim async
>> path if we can avoid it, although reclaim is already a slow path.
>>
> 
> Your statement was
> 
>    Direct reclaim latency is critical on latency sensitive applications(of
>    course, you can argue it's already very slow once we reach this path,
>    but at least, let's not increase more overhead if we can) so I think
>    it would be better to use ASYNC_PARTIAL.  If we fail to allocate in
>    this phase, we set it with COMPACTION_SYNC in next phase, below code.
> 
> If a path is latency sensitive they have already lost if they are in this
> path. They have entered compaction and may enter direct reclaim shortly
> so latency is bad at this point. If the application is latency sensitive
> they probably should disable THP to avoid any spikes due to THP allocation.


THP isn't the only latency factor.
In the case of ARM, we allocate 4 pages (i.e. order-2) for the pgd.
That means it can affect fork latency.

> 
> So I still maintain that the page allocator should not be depending on
> kswapd to do the work for it. If the caller wants high-order pages, it
> must be prepared to pay the cost of allocation.


I think it would be better if kswapd helps us.

> 
>> If async direct reclaim fails to compact memory with COMPACT_ASYNC_PARTIAL,
>> it ends up trying again with COMPACT_SYNC, so there would be no problem
>> allocating a high-order page; that is already the current approach with
>> the async and sync modes.
>>
> 
> Is a compromise whereby a second pass considers only MIGRATE_UNMOVABLE
> pageblocks for rescues and migration targets acceptable? It would be nicer
> again if try_to_compact_pages() still accepted a "sync" parameter and would
> decide itself if a COMPACT_ASYNC_FULL pass was necessary when sync==false.


Looks good to me. 

> 
>> While latency is important in direct reclaim, it isn't in kswapd.
> 
> That does not mean we should tie up kswapd in compaction.c for longer
> than is necessary. It should be getting out of compaction ASAP in case
> reclaim is necessary.


Why do you think of compaction and reclaim as separate?
If kswapd starts compaction, it means someone in the direct reclaim path
asked kswapd for a high-order page. So I think compaction is a part of reclaim.
In this case, compaction is necessary.

> 
>> So I think using COMPACT_ASYNC_FULL in kswapd makes sense.
>>
> 
> I'm not convinced but am not willing to push on it either. I do think
> that the caller of the page allocator does have to use
> COMPACT_ASYNC_FULL though and cannot be depending on kswapd to do the
> work.


I agree with your second-stage approach in direct reclaim:
1. ASYNC-MOVABLE only
2. ASYNC-UNMOVABLE only
3. SYNC
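
Roughly, in code (the exact enum names are still being discussed in this
thread, so treat these as placeholders):

	enum compact_mode {
		COMPACT_ASYNC_MOVABLE,	 /* 1. use Movable pageblocks only */
		COMPACT_ASYNC_UNMOVABLE, /* 2. also rescue Unmovable blocks */
		COMPACT_SYNC,		 /* 3. full synchronous compaction */
	};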

Another reason we should check unmovable pageblocks in kswapd is atomic
allocations, which are the only case where kswapd can help us.
I hope that reason convinces you.
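
(The case in mind is something like the following -- an atomic high-order
allocation that cannot run the compaction passes itself; illustrative only:)

	/* An atomic context cannot enter direct reclaim or compaction,
	 * so only kswapd can make a contiguous block available for it. */
	struct page *page = alloc_pages(GFP_ATOMIC, 2);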

> 
>>> <SNIP>
>>>
>>> This goes back to the same problem that we do not know in advance how
>>> many MIGRATE_UNMOVABLE pageblocks are going to be encountered. However,
>>> I see your point.
>>>
>>> Instead of COMPACT_ASYNC_PARTIAL and COMPACT_ASYNC_FULL should we have
>>> COMPACT_ASYNC_MOVABLE and COMPACT_ASYNC_UNMOVABLE? The first pass from
>>> the page allocator (COMPACT_ASYNC_MOVABLE) would only consider MOVABLE
>>> blocks as migration targets. The second pass (COMPACT_ASYNC_UNMOVABLE)
>>> would examine UNMOVABLE blocks, rescue them and use what blocks it
>>> rescues as migration targets. The third pass (COMPACT_SYNC) would work
>>
>>
>> It does make sense.
>>
>>> as it does currently. kswapd would only ever use COMPACT_ASYNC_MOVABLE.
>>
>> I don't get it. Why does kswapd use only COMPACT_ASYNC_MOVABLE?
> 
> Because kswapd's primary responsibility is reclaim, not compaction.


Again, I think compaction is a part of reclaim.

> 
>> As I mentioned, latency isn't as important in kswapd, so I think having
>> kswapd always rescue unmovable blocks would help the success rate of
>> direct reclaim's first pass (COMPACT_ASYNC_MOVABLE).
>>
> 
> Latency for kswapd can be important if processes are entering direct
> reclaim because kswapd was running compaction instead of reclaim. The
> cost is indirect and difficult to detect, which is why I would prefer
> that kswapd's use of compaction be as fast as possible.
> 


-- 
Kind regards,
Minchan Kim


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
  2012-04-30  2:44           ` Minchan Kim
@ 2012-04-30  8:31             ` Mel Gorman
  2012-04-30  8:55               ` Minchan Kim
  0 siblings, 1 reply; 13+ messages in thread
From: Mel Gorman @ 2012-04-30  8:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On Mon, Apr 30, 2012 at 11:44:47AM +0900, Minchan Kim wrote:
> > Your statement was
> > 
> >    Direct reclaim latency is critical for latency-sensitive applications
> >    (of course, you can argue it's already very slow once we reach this
> >    path, but at least let's not add more overhead if we can avoid it), so
> >    I think it would be better to use ASYNC_PARTIAL.  If we fail to
> >    allocate in this phase, we set it to COMPACTION_SYNC in the next
> >    phase, in the code below.
> > 
> > If a path is latency sensitive they have already lost if they are in this
> > path. They have entered compaction and may enter direct reclaim shortly
> > so latency is bad at this point. If the application is latency sensitive
> > they probably should disable THP to avoid any spikes due to THP allocation.
> 
> 
> THP isn't the only latency factor.
> On ARM, we allocate 4 pages (i.e., order-2) for the pgd,
> which means compaction latency can affect fork() latency.
> 

order-2 is below PAGE_ALLOC_COSTLY_ORDER, where the expectation is that
reclaim on its own should be able to free the necessary pages. That aside,
part of what you are proposing is that the page allocator not use ASYNC_FULL
and instead depend on kswapd to compact memory out of line.  That means
the caller of fork() either has to reclaim pages itself or loop until kswapd
does the compaction. That does not make sense from a latency perspective.

> > So I still maintain that the page allocator should not be depending on
> > kswapd to do the work for it. If the caller wants high-order pages, it
> > must be prepared to pay the cost of allocation.
> 
> 
> I think it would be better if kswapd helps us.
> 

Help maybe, but you are proposing the caller of fork() does not do the work
necessary to allocate the order-2 page (using ASYNC_PARTIAL, ASYNC_FULL
and SYNC) and instead depends on kswapd to do it.

> >> If async direct reclaim fails to compact memory with COMPACT_ASYNC_PARTIAL,
> >> it ends up trying again with COMPACT_SYNC, so there would be no problem
> >> allocating a high-order page; that is already the current approach with
> >> the async and sync modes.
> >>
> > 
> > Is a compromise whereby a second pass considers only MIGRATE_UNMOVABLE
> > pageblocks for rescues and migration targets acceptable? It would be nicer
> > again if try_to_compact_pages() still accepted a "sync" parameter and would
> > decide itself if a COMPACT_ASYNC_FULL pass was necessary when sync==false.
> 
> 
> Looks good to me. 
> 

Ok.

> > 
> >> While latency is important in direct reclaim, it isn't in kswapd.
> > 
> > That does not mean we should tie up kswapd in compaction.c for longer
> > than is necessary. It should be getting out of compaction ASAP in case
> > reclaim is necessary.
> 
> Why do you think of compaction and reclaim as separate?
> If kswapd starts compaction, it means someone in the direct reclaim path
> asked kswapd for a high-order page.

It's not all about high-order pages. If kswapd is running compaction and a
caller needs an order-0 page, it may enter direct reclaim instead, which is
worse from a latency perspective. The possibility of this situation should
be limited as much as possible without a very strong compelling reason. I
do not think there is a compelling reason right now to take the risk.

> So I think compaction is a part of reclaim.
> In this case, compaction is necessary.
> 
> > 
> >> So I think using COMPACT_ASYNC_FULL in kswapd makes sense.
> >>
> > 
> > I'm not convinced but am not willing to push on it either. I do think
> > that the caller of the page allocator does have to use
> > COMPACT_ASYNC_FULL though and cannot be depending on kswapd to do the
> > work.
> 
> I agree with your second-stage approach in direct reclaim:
> 1. ASYNC-MOVABLE only
> 2. ASYNC-UNMOVABLE only
> 3. SYNC
> 

Ok, then can we at least start with that? Specifically that the
page allocator continue to pass in sync to try_to_compact_pages() and
try_to_compact_pages() doing compaction first as ASYNC_PARTIAL and then
deciding whether it should do a second pass as ASYNC_FULL?
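
For illustration, a rough sketch of that flow, following the
ASYNC_PARTIAL/ASYNC_FULL naming used above (the __try_to_compact_pages()
helper and the "saw many unmovable pageblocks" signal are assumptions for
this sketch, not the actual patch):

	unsigned long try_to_compact_pages(struct zonelist *zonelist,
			int order, gfp_t gfp_mask, nodemask_t *nodemask,
			bool sync)
	{
		enum compact_mode mode;
		unsigned long rc;
		bool many_unmovable = false;

		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_PARTIAL;
		rc = __try_to_compact_pages(zonelist, order, gfp_mask,
					    nodemask, mode, &many_unmovable);

		/* Second async pass only if the first one was hampered
		 * by Unmovable pageblocks and did not already succeed. */
		if (!sync && rc != COMPACT_PARTIAL && many_unmovable)
			rc = __try_to_compact_pages(zonelist, order,
						    gfp_mask, nodemask,
						    COMPACT_ASYNC_FULL,
						    NULL);
		return rc;
	}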

> Another reason we should check unmovable pageblocks in kswapd is atomic
> allocations, which are the only case where kswapd can help us.
> I hope that reason convinces you.
> 

It doesn't really. High-order atomic allocations are something that should
be avoided as much as possible and the longer kswapd runs compaction the
greater the risk that processes stall in direct reclaim unnecessarily.
I know the current logic of kswapd using compaction.c is meant to help high
order atomics but that does not mean I think kswapd should spend even more
time in compaction.c without a compelling use case.

At the very least, make kswapd's use of ASYNC_FULL a separate patch. I will
not ACK it without compelling data backing it up, but patch 1 would still be
there to handle Bartlomiej's adverse workload.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
  2012-04-30  8:31             ` Mel Gorman
@ 2012-04-30  8:55               ` Minchan Kim
  2012-04-30  9:16                 ` Mel Gorman
  0 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2012-04-30  8:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On 04/30/2012 05:31 PM, Mel Gorman wrote:

> On Mon, Apr 30, 2012 at 11:44:47AM +0900, Minchan Kim wrote:
>>> Your statement was
>>>
>>>    Direct reclaim latency is critical for latency-sensitive applications
>>>    (of course, you can argue it's already very slow once we reach this
>>>    path, but at least let's not add more overhead if we can avoid it), so
>>>    I think it would be better to use ASYNC_PARTIAL.  If we fail to
>>>    allocate in this phase, we set it to COMPACTION_SYNC in the next
>>>    phase, in the code below.
>>>
>>> If a path is latency sensitive they have already lost if they are in this
>>> path. They have entered compaction and may enter direct reclaim shortly
>>> so latency is bad at this point. If the application is latency sensitive
>>> they probably should disable THP to avoid any spikes due to THP allocation.
>>
>>
>> THP isn't the only latency factor.
>> On ARM, we allocate 4 pages (i.e., order-2) for the pgd,
>> which means compaction latency can affect fork() latency.
>>
> 
> order-2 is below PAGE_ALLOC_COSTLY_ORDER, where the expectation is that
> reclaim on its own should be able to free the necessary pages. That aside,
> part of what you are proposing is that the page allocator not use ASYNC_FULL
> and instead depend on kswapd to compact memory out of line.  That means
> the caller of fork() either has to reclaim pages itself or loop until kswapd
> does the compaction. That does not make sense from a latency perspective.
> 
>>> So I still maintain that the page allocator should not be depending on
>>> kswapd to do the work for it. If the caller wants high-order pages, it
>>> must be prepared to pay the cost of allocation.
>>
>>
>> I think it would be better if kswapd helps us.
>>
> 
> Help maybe, but you are proposing the caller of fork() does not do the work
> necessary to allocate the order-2 page (using ASYNC_PARTIAL, ASYNC_FULL
> and SYNC) and instead depends on kswapd to do it.


Hmm, there was a misunderstanding.
I agreed with your page allocator suggestion once you suggested
ASYNC_PARTIAL, ASYNC_FULL and SYNC. The concern was only about kswapd. :)

> 
>>>> If async direct reclaim fails to compact memory with COMPACT_ASYNC_PARTIAL,
>>>> it ends up trying again with COMPACT_SYNC, so there would be no problem
>>>> allocating a high-order page; that is already the current approach with
>>>> the async and sync modes.
>>>>
>>>
>>> Is a compromise whereby a second pass considers only MIGRATE_UNMOVABLE
>>> pageblocks for rescues and migration targets acceptable? It would be nicer
>>> again if try_to_compact_pages() still accepted a "sync" parameter and would
>>> decide itself if a COMPACT_ASYNC_FULL pass was necessary when sync==false.
>>
>>
>> Looks good to me. 
>>
> 
> Ok.
> 
>>>
>>>> While latency is important in direct reclaim, it isn't in kswapd.
>>>
>>> That does not mean we should tie up kswapd in compaction.c for longer
>>> than is necessary. It should be getting out of compaction ASAP in case
>>> reclaim is necessary.
>>
>> Why do you think of compaction and reclaim as separate?
>> If kswapd starts compaction, it means someone in the direct reclaim path
>> asked kswapd for a high-order page.
> 
> It's not all about high-order pages. If kswapd is running compaction and a
> caller needs an order-0 page, it may enter direct reclaim instead, which is
> worse from a latency perspective. The possibility of this situation should
> be limited as much as possible without a very strong compelling reason. I
> do not think there is a compelling reason right now to take the risk.



Hmm, I understand your point.
Suggestion:
Couldn't we code kswapd to give up its compaction immediately if another
task requests an order-0 page in the direct reclaim path?

> 
>> So I think compaction is a part of reclaim.
>> In this case, compaction is necessary.
>>
>>>
>>>> So I think using COMPACT_ASYNC_FULL in kswapd makes sense.
>>>>
>>>
>>> I'm not convinced but am not willing to push on it either. I do think
>>> that the caller of the page allocator does have to use
>>> COMPACT_ASYNC_FULL though and cannot be depending on kswapd to do the
>>> work.
>>
>> I agree with your second-stage approach in direct reclaim:
>> 1. ASYNC-MOVABLE only
>> 2. ASYNC-UNMOVABLE only
>> 3. SYNC
>>
> 
> Ok, then can we at least start with that? Specifically that the
> page allocator continue to pass in sync to try_to_compact_pages() and
> try_to_compact_pages() doing compaction first as ASYNC_PARTIAL and then
> deciding whether it should do a second pass as ASYNC_FULL?


Yeb. We can proceed to the second pass once we have found many unmovable
pageblocks during the first ASYNC_PARTIAL compaction.
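
For example (nr_unmovable_skipped is a hypothetical counter, not a field
in the current patch):

	/* While scanning for free pages in the ASYNC_PARTIAL pass,
	 * remember how many Unmovable pageblocks were skipped so the
	 * caller can decide whether an ASYNC_FULL second pass is
	 * worthwhile. */
	static void note_skipped_pageblock(struct compact_control *cc,
					   struct page *page)
	{
		if (get_pageblock_migratetype(page) == MIGRATE_UNMOVABLE)
			cc->nr_unmovable_skipped++;
	}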

> 
>> Another reason we should check unmovable pageblocks in kswapd is atomic
>> allocations, which are the only case where kswapd can help us.
>> I hope that reason convinces you.
>>
> 
> It doesn't really. High-order atomic allocations are something that should
> be avoided as much as possible and the longer kswapd runs compaction the
> greater the risk that processes stall in direct reclaim unnecessarily.
> I know the current logic of kswapd using compaction.c is meant to help high
> order atomics but that does not mean I think kswapd should spend even more
> time in compaction.c without a compelling use case.
> 


My suggestion may mitigate the problem.

> At the very least, make kswapd's use of ASYNC_FULL a separate patch. I will
> not ACK it without compelling data backing it up, but patch 1 would still be
> there to handle Bartlomiej's adverse workload.
> 


If I have time, I will try it, but right now I don't have time to produce
such data. So let's remember this discussion and revisit it later if we
run into the problem.

Thanks.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks
  2012-04-30  8:55               ` Minchan Kim
@ 2012-04-30  9:16                 ` Mel Gorman
  0 siblings, 0 replies; 13+ messages in thread
From: Mel Gorman @ 2012-04-30  9:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Rik van Riel, Bartlomiej Zolnierkiewicz, linux-mm,
	Marek Szyprowski, Kyungmin Park

On Mon, Apr 30, 2012 at 05:55:09PM +0900, Minchan Kim wrote:
> > <SNIP>
> > 
> > Help maybe, but you are proposing the caller of fork() does not do the work
> > necessary to allocate the order-2 page (using ASYNC_PARTIAL, ASYNC_FULL
> > and SYNC) and instead depends on kswapd to do it.
> 
> 
> Hmm, there was a misunderstanding.
> I agreed with your page allocator suggestion once you suggested
> ASYNC_PARTIAL, ASYNC_FULL and SYNC. The concern was only about kswapd. :)
> 

Understood :)

> >> <SNIP>
> >> Why do you think of compaction and reclaim as separate?
> >> If kswapd starts compaction, it means someone in the direct reclaim path
> >> asked kswapd for a high-order page.
> > 
> > It's not all about high-order pages. If kswapd is running compaction and a
> > caller needs an order-0 page, it may enter direct reclaim instead, which is
> > worse from a latency perspective. The possibility of this situation should
> > be limited as much as possible without a very strong compelling reason. I
> > do not think there is a compelling reason right now to take the risk.
> 
> Hmm, I understand your point.
> Suggestion:
> Couldn't we code kswapd to give up its compaction immediately if another
> task requests an order-0 page in the direct reclaim path?
> 

That would be desirable, and it's possible you can do it by slightly
altering how pgdat->classzone_idx is handled so that its default value is
MAX_ZONE or something similar. If that value changes from its default
while kswapd is in compaction, it can decide whether to stop compaction
or not. It might decide to continue compaction if it has been woken for a
high-order allocation, for example.
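
A rough sketch of that check (the MAX_NR_ZONES default and the helper
itself are assumptions for illustration only):

	/* Polled from kswapd's compaction loop.  Assumes classzone_idx
	 * is reset to MAX_NR_ZONES whenever no wakeup is pending. */
	static bool kswapd_compaction_should_stop(pg_data_t *pgdat,
						  bool woken_for_high_order)
	{
		/* classzone_idx still at its default: no new wakeup */
		if (pgdat->classzone_idx == MAX_NR_ZONES)
			return false;

		/* A new wakeup is pending and may be an order-0 request
		 * about to hit direct reclaim; stop unless the pending
		 * high-order work is judged more important. */
		return !woken_for_high_order;
	}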

> >> So I think compaction is a part of reclaim.
> >> In this case, compaction is necessary.
> >>
> >>>
> >>>> So I think using COMPACT_ASYNC_FULL in kswapd makes sense.
> >>>>
> >>>
> >>> I'm not convinced but am not willing to push on it either. I do think
> >>> that the caller of the page allocator does have to use
> >>> COMPACT_ASYNC_FULL though and cannot be depending on kswapd to do the
> >>> work.
> >>
> >> I agree with your second-stage approach in direct reclaim:
> >> 1. ASYNC-MOVABLE only
> >> 2. ASYNC-UNMOVABLE only
> >> 3. SYNC
> >>
> > 
> > Ok, then can we at least start with that? Specifically that the
> > page allocator continue to pass in sync to try_to_compact_pages() and
> > try_to_compact_pages() doing compaction first as ASYNC_PARTIAL and then
> > deciding whether it should do a second pass as ASYNC_FULL?
> 
> Yeb. We can proceed to the second pass once we have found many unmovable
> pageblocks during the first ASYNC_PARTIAL compaction.
> 

Exactly.

> >> Another reason we should check unmovable pageblocks in kswapd is atomic
> >> allocations, which are the only case where kswapd can help us.
> >> I hope that reason convinces you.
> >>
> > 
> > It doesn't really. High-order atomic allocations are something that should
> > be avoided as much as possible and the longer kswapd runs compaction the
> > greater the risk that processes stall in direct reclaim unnecessarily.
> > I know the current logic of kswapd using compaction.c is meant to help high
> > order atomics but that does not mean I think kswapd should spend even more
> > time in compaction.c without a compelling use case.
> > 
> 
> My suggestion may mitigate the problem.
> 

Yes, it may.

> > At the very least, make kswapd's use of ASYNC_FULL a separate patch. I will
> > not ACK it without compelling data backing it up, but patch 1 would still be
> > there to handle Bartlomiej's adverse workload.
> 
> If I have time, I will try it, but right now I don't have time to produce
> such data. So let's remember this discussion and revisit it later if we
> run into the problem.
> 

Will do. Minimally add a link to this thread to the changelog.

-- 
Mel Gorman
SUSE Labs


end of thread

Thread overview: 13 messages
2012-04-26  8:15 [PATCH v3] mm: compaction: handle incorrect Unmovable type pageblocks Bartlomiej Zolnierkiewicz
2012-04-26 14:36 ` Mel Gorman
2012-04-26 15:53   ` Rik van Riel
2012-04-26 16:47     ` Mel Gorman
2012-04-26 18:52       ` Rik van Riel
2012-04-27  9:45         ` Mel Gorman
2012-04-27  0:58       ` Minchan Kim
2012-04-27  9:56         ` Mel Gorman
2012-04-30  2:44           ` Minchan Kim
2012-04-30  8:31             ` Mel Gorman
2012-04-30  8:55               ` Minchan Kim
2012-04-30  9:16                 ` Mel Gorman
2012-04-26 15:42 ` Rik van Riel
