* [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-08  8:46 ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-08  8:46 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Hugh Dickins, KOSAKI Motohiro, Dave Jones, Cong Wang,
	Markus Trippelsdorf, Mel Gorman, Minchan Kim, Rik van Riel,
	Marek Szyprowski, Kyungmin Park, Andrew Morton, Linus Torvalds


Hi,

This version is much simpler, as it just uses __count_immobile_pages()
instead of its own open-coded version, and it integrates changes
from Minchan Kim (without the page_count change, as it doesn't seem
correct and __count_immobile_pages() does the check in the standard
way; if it is still a problem, I think that removing the 1st phase
check altogether would be better than adding more locking complexity).

The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
to make it possible to easily check if the code is working in practice.
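Once applied, the new counter shows up in /proc/vmstat and can be
checked with, e.g.:

	$ grep compact_rescued_unmovable_blocks /proc/vmstat
	compact_rescued_unmovable_blocks 42

(the value above is only an illustration).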

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center


From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks

When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE
type pageblock (and some MIGRATE_MOVABLE pages are left in it),
waiting until an allocation takes ownership of the block may
take too long.  The type of the pageblock remains unchanged,
so the pageblock cannot be used as a migration target during
compaction.

Fix it by:

* Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
  and COMPACT_SYNC) and then converting sync field in struct
  compact_control to use it.

* Adding nr_pageblocks_skipped field to struct compact_control
  and tracking how many destination pageblocks were of
  MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
  ran fully in try_to_compact_pages() (COMPACT_COMPLETE), it implies
  that there is no suitable page for allocation.  In this case,
  check if there were enough MIGRATE_UNMOVABLE pageblocks to try
  a second pass in COMPACT_ASYNC_UNMOVABLE mode (sketched below).

* Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
  and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
  a count based on finding PageBuddy pages, page_count(page) == 0
  or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
  pageblock are in one of those three sets change the whole
  pageblock type to MIGRATE_MOVABLE.
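Condensed from the try_to_compact_pages() hunk in the diff below, the
per-zone retry flow is roughly this (a sketch, not the complete code):

	mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
retry:
	status = compact_zone_order(zone, order, gfp_mask, mode,
				    &nr_pageblocks_skipped);
	rc = max(status, rc);

	/* If a normal allocation would succeed, stop compacting. */
	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
		break;

	/*
	 * A full but fruitless MOVABLE-only pass that skipped some
	 * MIGRATE_UNMOVABLE pageblocks is worth one more pass that
	 * also tries to rescue those pageblocks.
	 */
	if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE &&
	    nr_pageblocks_skipped) {
		mode = COMPACT_ASYNC_UNMOVABLE;
		goto retry;
	}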

My particular test case (on an ARM EXYNOS4 device with 512 MiB,
which means 131072 standard 4KiB pages in the 'Normal' zone) is to:
- allocate 95000 pages for kernel's usage
- free every second page (47500 pages) of memory just allocated
- allocate and use 60000 pages from user space
- free remaining 60000 pages of kernel memory
(now we have fragmented memory occupied mostly by user space pages)
- try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
  (a rough kernel-side sketch of this sequence follows)
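A minimal sketch of the kernel-side steps (hypothetical test-module
code for illustration only; it is not part of the patch, all names
are made up, and the user-space step is elided):

	#include <linux/module.h>
	#include <linux/gfp.h>
	#include <linux/mm.h>

	#define NR_SMALL	95000
	#define NR_HUGE		100

	/* A real test would likely vmalloc() these arrays. */
	static struct page *pages[NR_SMALL];
	static struct page *huge[NR_HUGE];

	static int __init frag_test_init(void)
	{
		int i, ok = 0;

		/* Allocate 95000 order-0 pages for the kernel. */
		for (i = 0; i < NR_SMALL; i++)
			pages[i] = alloc_page(GFP_KERNEL);

		/* Free every second page to fragment memory. */
		for (i = 0; i < NR_SMALL; i += 2) {
			if (pages[i]) {
				__free_page(pages[i]);
				pages[i] = NULL;
			}
		}

		/*
		 * (A user-space helper now allocates and touches
		 * 60000 pages; that step is elided here.)
		 */

		/* Free the remaining kernel pages. */
		for (i = 1; i < NR_SMALL; i += 2) {
			if (pages[i]) {
				__free_page(pages[i]);
				pages[i] = NULL;
			}
		}

		/* Try 100 order-9 allocations, counting successes. */
		for (i = 0; i < NR_HUGE; i++) {
			huge[i] = alloc_pages(GFP_KERNEL, 9);
			if (huge[i])
				ok++;
		}
		pr_info("order-9 successes: %d\n", ok);

		/* Clean up. */
		for (i = 0; i < NR_HUGE; i++)
			if (huge[i])
				__free_pages(huge[i], 9);

		return 0;
	}
	module_init(frag_test_init);

	MODULE_LICENSE("GPL");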

The results:
- with compaction disabled I get 10 successful allocations
- with compaction enabled - 11 successful allocations
- with this patch I'm able to get 25 successful allocations

NOTE: If we can make kswapd aware of order-0 request during
compaction, we can enhance kswapd with changing mode to
COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
Please see the following thread:

	http://marc.info/?l=linux-mm&m=133552069417068&w=2

[minchan@kernel.org: minor cleanups]
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
v2:
- redo the patch basing on review from Mel Gorman
  (http://marc.info/?l=linux-mm&m=133519311025444&w=2)
v3:
- apply review comments from Minchan Kim
  (http://marc.info/?l=linux-mm&m=133531540308862&w=2)
v4:
- more review comments from Mel
  (http://marc.info/?l=linux-mm&m=133545110625042&w=2)
v5:
- even more comments from Mel
  (http://marc.info/?l=linux-mm&m=133577669023492&w=2)
- fix patch description
v6: (based on comments from Minchan Kim and Mel Gorman)
- add note about kswapd
- rename nr_pageblocks to nr_pageblocks_scanned and nr_skipped
  to nr_pageblocks_skipped
- fix pageblocks counting in suitable_migration_target()
- fix try_to_compact_pages() to do COMPACT_ASYNC_UNMOVABLE per zone 
v7:
- minor cleanups from Minchan Kim
- cleanup try_to_compact_pages()
v8:
- document rescue_unmovable_pageblock()
- enum result_smt -> enum smt_result
- fix suitable_migration_target() documentation
- add comment about zeroing cc->nr_pageblocks_skipped
- fix FAIL_UNMOVABLE_TARGET handling in isolate_freepages()
v9:
- use right page for pageblock conversion in rescue_unmovable_pageblock()
- split rescue_unmovable_pageblock() on can_rescue_unmovable_pageblock()
  and __rescue_unmovable_pageblock()
- add missing locking
- modify test-case slightly
v10:
- merge changes from Minchan Kim
- use __count_immobile_pages()
- add compact_rescued_unmovable_blocks vmevent to vmstats

 include/linux/compaction.h    |   19 +++++
 include/linux/vm_event_item.h |    1 
 mm/compaction.c               |  140 +++++++++++++++++++++++++++++++++++-------
 mm/internal.h                 |   11 +++
 mm/page_alloc.c               |   16 +++-
 mm/vmstat.c                   |    1 
 6 files changed, 158 insertions(+), 30 deletions(-)

Index: b/include/linux/compaction.h
===================================================================
--- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
+++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and are converted to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
Index: b/include/linux/vm_event_item.h
===================================================================
--- a/include/linux/vm_event_item.h	2012-06-08 09:55:12.653681272 +0200
+++ b/include/linux/vm_event_item.h	2012-06-08 10:02:32.069681221 +0200
@@ -39,6 +39,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 #ifdef CONFIG_COMPACTION
 		COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
 		COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+		COMPACT_RESCUED_UNMOVABLE_BLOCKS,
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-08 09:01:32.105681656 +0200
+++ b/mm/compaction.c	2012-06-08 10:03:38.413681248 +0200
@@ -235,7 +235,7 @@ isolate_migratepages_range(struct zone *
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -303,7 +303,8 @@ isolate_migratepages_range(struct zone *
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -324,7 +325,7 @@ isolate_migratepages_range(struct zone *
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		/* Try isolate the page */
@@ -357,27 +358,86 @@ isolate_migratepages_range(struct zone *
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page)
+{
+	struct zone *zone;
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page;
+
+	zone = page_zone(page);
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	start_page = pfn_to_page(start_pfn);
+
+	/*
+	 * Race with page allocator/reclaimer can happen so that it can
+	 * deceive unmovable block to migratable type on this pageblock.
+	 * It could regress on anti-fragmentation but it's rare and not
+	 * critical.
+	 */
+	return __count_immobile_pages(zone, start_page, 0);
+}
+
+static void rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+
+	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
+}
+
+/*
+ * MIGRATE_TARGET : good for migration target
+ * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET : can't migrate by another reasons.
+ */
+enum smt_result {
+	MIGRATE_TARGET,
+	RESCUE_UNMOVABLE_TARGET,
+	UNMOVABLE_TARGET,
+	SKIP_TARGET,
+};
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+/*
+ * Returns MIGRATE_TARGET if the page is within a block
+ * suitable for migration to, UNMOVABLE_TARGET if the page
+ * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+			      struct compact_control *cc)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return SKIP_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return MIGRATE_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return MIGRATE_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page))
+		return RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return SKIP_TARGET;
 }
 
 /*
@@ -411,6 +471,13 @@ static void isolate_freepages(struct zon
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_unmovable_pageblock = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -418,6 +485,7 @@ static void isolate_freepages(struct zon
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -434,9 +502,12 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc);
+		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
+			if (ret == UNMOVABLE_TARGET)
+				cc->nr_unmovable_pageblock++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -445,12 +516,16 @@ static void isolate_freepages(struct zon
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc);
+		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
+			if (ret == RESCUE_UNMOVABLE_TARGET)
+				rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == UNMOVABLE_TARGET)
+			cc->nr_unmovable_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -682,8 +757,9 @@ static int compact_zone(struct zone *zon
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -712,7 +788,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -720,12 +797,17 @@ static unsigned long compact_zone_order(
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -750,6 +832,8 @@ unsigned long try_to_compact_pages(struc
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -766,12 +850,22 @@ unsigned long try_to_compact_pages(struc
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -805,7 +899,7 @@ static int __compact_pgdat(pg_data_t *pg
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -820,7 +914,7 @@ int compact_pgdat(pg_data_t *pgdat, int 
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -830,7 +924,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h	2012-06-08 09:01:32.109681656 +0200
+++ b/mm/internal.h	2012-06-08 09:06:09.481681670 +0200
@@ -94,6 +94,11 @@ extern void putback_lru_page(struct page
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
+extern int __count_immobile_pages(struct zone *zone, struct page *page,
+				  int count);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +106,7 @@ extern bool is_free_buddy_page(struct pa
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +125,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_unmovable_pageblock;
 };
 
 unsigned long
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-06-08 09:01:32.125681656 +0200
+++ b/mm/page_alloc.c	2012-06-08 09:04:51.693681694 +0200
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -954,8 +954,8 @@ static int move_freepages(struct zone *z
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
  * page allocater never alloc memory from ISOLATE block.
  */
 
-static int
-__count_immobile_pages(struct zone *zone, struct page *page, int count)
+int __count_immobile_pages(struct zone *zone, struct page *page, int count)
 {
 	unsigned long pfn, iter, found;
 	int mt;
@@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
 			continue;
 
 		page = pfn_to_page(check);
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(page) != zone)
+			return false;
+
 		if (!page_count(page)) {
 			if (PageBuddy(page))
 				iter += (1 << page_order(page)) - 1;
@@ -5654,7 +5658,7 @@ static int __alloc_contig_migrate_range(
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c	2012-06-08 09:39:22.421681379 +0200
+++ b/mm/vmstat.c	2012-06-08 10:02:22.053681226 +0200
@@ -767,6 +767,7 @@ const char * const vmstat_text[] = {
 	"compact_stall",
 	"compact_fail",
 	"compact_success",
+	"compact_rescued_unmovable_blocks",
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-08  8:46 ` Bartlomiej Zolnierkiewicz
@ 2012-06-08 21:18   ` Andrew Morton
  -1 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2012-06-08 21:18 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Minchan Kim, Rik van Riel, Marek Szyprowski, Kyungmin Park,
	Linus Torvalds

On Fri, 08 Jun 2012 10:46:32 +0200
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:

> 
> Hi,
> 
> This version is much simpler, as it just uses __count_immobile_pages()
> instead of its own open-coded version, and it integrates changes
> from Minchan Kim (without the page_count change, as it doesn't seem
> correct and __count_immobile_pages() does the check in the standard
> way; if it is still a problem, I think that removing the 1st phase
> check altogether would be better than adding more locking complexity).
> 
> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> to make it possible to easily check if the code is working in practice.
> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center
> 
> 
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
> 
> When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE
> type pageblock (and some MIGRATE_MOVABLE pages are left in it),
> waiting until an allocation takes ownership of the block may
> take too long.  The type of the pageblock remains unchanged,
> so the pageblock cannot be used as a migration target during
> compaction.
> 
> Fix it by:
> 
> * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
>   and COMPACT_SYNC) and then converting sync field in struct
>   compact_control to use it.
> 
> * Adding nr_pageblocks_skipped field to struct compact_control
>   and tracking how many destination pageblocks were of
>   MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
>   ran fully in try_to_compact_pages() (COMPACT_COMPLETE), it implies
>   that there is no suitable page for allocation.  In this case,
>   check if there were enough MIGRATE_UNMOVABLE pageblocks to try
>   a second pass in COMPACT_ASYNC_UNMOVABLE mode.
> 
> * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
>   and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
>   a count based on finding PageBuddy pages, page_count(page) == 0
>   or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
>   pageblock are in one of those three sets change the whole
>   pageblock type to MIGRATE_MOVABLE.
> 
> My particular test case (on an ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in the 'Normal' zone) is to:
> - allocate 95000 pages for kernel's usage
> - free every second page (47500 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
> 
> The results:
> - with compaction disabled I get 10 successful allocations
> - with compaction enabled - 11 successful allocations
> - with this patch I'm able to get 25 successful allocations
> 
> NOTE: If we can make kswapd aware of order-0 request during
> compaction, we can enhance kswapd with changing mode to
> COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
> Please see the following thread:
> 
> 	http://marc.info/?l=linux-mm&m=133552069417068&w=2
> 
>
> ...
>
> --- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
> +++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
> @@ -1,6 +1,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>  
> +#include <linux/node.h>

Why was this addition needed?  (I think I asked this before)

>  /* Return values for compact_zone() and try_to_compact_pages() */
>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>  #define COMPACT_SKIPPED		0
>
> ...
>
> +static bool can_rescue_unmovable_pageblock(struct page *page)
> +{
> +	struct zone *zone;
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	start_page = pfn_to_page(start_pfn);
> +
> +	/*
> +	 * Race with page allocator/reclaimer can happen so that it can
> +	 * deceive unmovable block to migratable type on this pageblock.
> +	 * It could regress on anti-fragmentation but it's rare and not
> +	 * critical.
> +	 */

This is quite ungrammatical and needs a rewrite, please.  Suggest the
use of well-understood terms MIGRATE_UNMOVABLE, MIGRATE_MOVABLE etc
rather than "unmovable block", etc.

Please explain "could regress" and also explain why it is "not critical".

> +	return __count_immobile_pages(zone, start_page, 0);
> +}
> +
> +static void rescue_unmovable_pageblock(struct page *page)
> +{
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +
> +	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
> +}
> +
> +/*
> + * MIGRATE_TARGET : good for migration target
> + * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.

s/TARTET/TARGET/

> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET : can't migrate by another reasons.
> + */
> +enum smt_result {
> +	MIGRATE_TARGET,
> +	RESCUE_UNMOVABLE_TARGET,
> +	UNMOVABLE_TARGET,
> +	SKIP_TARGET,
> +};
>  
>
> ...
>
> @@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
>   * page allocater never alloc memory from ISOLATE block.
>   */
>  
> -static int
> -__count_immobile_pages(struct zone *zone, struct page *page, int count)
> +int __count_immobile_pages(struct zone *zone, struct page *page, int count)

We may as well fix the return type of this while we're in there.  It
should be bool.

Also, the comment over __count_immobile_pages() is utter rubbish.  Can
we please cook up a new one?  Something human-readable which also
describes the return value.

>  {
>  	unsigned long pfn, iter, found;
>  	int mt;
> @@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
>  			continue;
>  
>  		page = pfn_to_page(check);
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(page) != zone)
> +			return false;

I don't really understand this bit.  Wasn't it wrong to walk across
zones in the original code?  Did you do something which will newly
cause this to walk between zones?  It doesn't seem to be changelogged,
and the comment commits the common mistake of explaining "what" but not
"why".


>  		if (!page_count(page)) {
>  			if (PageBuddy(page))
>  				iter += (1 << page_order(page)) - 1;
>
> ...
>

A few tweaks:

--- a/mm/compaction.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/compaction.c
@@ -394,10 +394,10 @@ static void rescue_unmovable_pageblock(s
 }
 
 /*
- * MIGRATE_TARGET : good for migration target
- * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
- * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
- * SKIP_TARGET : can't migrate by another reasons.
+ * MIGRATE_TARGET: good for migration target
+ * RESCUE_UNMOVABLE_TARGET: good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET: can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET: can't migrate by another reasons.
  */
 enum smt_result {
 	MIGRATE_TARGET,
@@ -407,9 +407,9 @@ enum smt_result {
 };
 
 /*
- * Returns MIGRATE_TARGET if the page is within a block
- * suitable for migration to, UNMOVABLE_TARGET if the page
- * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ * Returns MIGRATE_TARGET if the page is within a block suitable for migration
+ * to, UNMOVABLE_TARGET if the page is within a MIGRATE_UNMOVABLE block,
+ * SKIP_TARGET otherwise.
  */
 static enum smt_result suitable_migration_target(struct page *page,
 			      struct compact_control *cc)
--- a/mm/internal.h~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/internal.h
@@ -97,7 +97,7 @@ extern void putback_lru_page(struct page
 extern void set_pageblock_migratetype(struct page *page, int migratetype);
 extern int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype);
-extern int __count_immobile_pages(struct zone *zone, struct page *page,
+extern bool __count_immobile_pages(struct zone *zone, struct page *page,
 				  int count);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
--- a/mm/page_alloc.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/page_alloc.c
@@ -221,7 +221,6 @@ int page_group_by_mobility_disabled __re
 
 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
-
 	if (unlikely(page_group_by_mobility_disabled))
 		migratetype = MIGRATE_UNMOVABLE;
 
@@ -5472,7 +5471,7 @@ void set_pageblock_flags_group(struct pa
  * page allocater never alloc memory from ISOLATE block.
  */
 
-int __count_immobile_pages(struct zone *zone, struct page *page, int count)
+bool __count_immobile_pages(struct zone *zone, struct page *page, int count)
 {
 	unsigned long pfn, iter, found;
 	int mt;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-08 21:18   ` Andrew Morton
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Morton @ 2012-06-08 21:18 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Minchan Kim, Rik van Riel, Marek Szyprowski, Kyungmin Park,
	Linus Torvalds

On Fri, 08 Jun 2012 10:46:32 +0200
Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:

> 
> Hi,
> 
> This version is much simpler as it just uses __count_immobile_pages()
> instead of using its own open coded version and it integrates changes
> from Minchan Kim (without page_count change as it doesn't seem correct
> and __count_immobile_pages() does the check in the standard way; if it
> still is a problem I think that removing 1st phase check altogether
> would be better instead of adding more locking complexity).
> 
> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> to make it possible to easily check if the code is working in practice.
> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center
> 
> 
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
> 
> When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE
> type pageblock (and some MIGRATE_MOVABLE pages are left in it)
> waiting until an allocation takes ownership of the block may
> take too long.  The type of the pageblock remains unchanged
> so the pageblock cannot be used as a migration target during
> compaction.
> 
> Fix it by:
> 
> * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
>   and COMPACT_SYNC) and then converting sync field in struct
>   compact_control to use it.
> 
> * Adding nr_pageblocks_skipped field to struct compact_control
>   and tracking how many destination pageblocks were of
>   MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
>   ran fully in try_to_compact_pages() (COMPACT_COMPLETE) it implies
>   that there is not a suitable page for allocation.  In this case
>   then check how if there were enough MIGRATE_UNMOVABLE pageblocks
>   to try a second pass in COMPACT_ASYNC_UNMOVABLE mode.
> 
> * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
>   and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
>   a count based on finding PageBuddy pages, page_count(page) == 0
>   or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
>   pageblock are in one of those three sets change the whole
>   pageblock type to MIGRATE_MOVABLE.
> 
> My particular test case (on a ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> - allocate 95000 pages for kernel's usage
> - free every second page (47500 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
> 
> The results:
> - with compaction disabled I get 10 successful allocations
> - with compaction enabled - 11 successful allocations
> - with this patch I'm able to get 25 successful allocations
> 
> NOTE: If we can make kswapd aware of order-0 request during
> compaction, we can enhance kswapd with changing mode to
> COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
> Please see the following thread:
> 
> 	http://marc.info/?l=linux-mm&m=133552069417068&w=2
> 
>
> ...
>
> --- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
> +++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
> @@ -1,6 +1,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>  
> +#include <linux/node.h>

Why was this addition needed?  (I think I asked this before)

>  /* Return values for compact_zone() and try_to_compact_pages() */
>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>  #define COMPACT_SKIPPED		0
>
> ...
>
> +static bool can_rescue_unmovable_pageblock(struct page *page)
> +{
> +	struct zone *zone;
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	start_page = pfn_to_page(start_pfn);
> +
> +	/*
> +	 * Race with page allocator/reclaimer can happen so that it can
> +	 * deceive unmovable block to migratable type on this pageblock.
> +	 * It could regress on anti-fragmentation but it's rare and not
> +	 * critical.
> +	 */

This is quite ungramattical and needs a rewrite, please.  Suggest the
use of well-understood terms MIGRATE_UNMOVABLE, MIGRATE_MOVABLE etc
rather than "unmovable block", etc.

Please explain "could regress" and also explain why it is "not critical".

> +	return __count_immobile_pages(zone, start_page, 0);
> +}
> +
> +static void rescue_unmovable_pageblock(struct page *page)
> +{
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +
> +	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
> +}
> +
> +/*
> + * MIGRATE_TARGET : good for migration target
> + * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.

s/TARTET/TARGET/

> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET : can't migrate by another reasons.
> + */
> +enum smt_result {
> +	MIGRATE_TARGET,
> +	RESCUE_UNMOVABLE_TARGET,
> +	UNMOVABLE_TARGET,
> +	SKIP_TARGET,
> +};
>  
>
> ...
>
> @@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
>   * page allocater never alloc memory from ISOLATE block.
>   */
>  
> -static int
> -__count_immobile_pages(struct zone *zone, struct page *page, int count)
> +int __count_immobile_pages(struct zone *zone, struct page *page, int count)

We may as well fix the return type of this while we're in there.  It
should be bool.

Also, the comment over __count_immobile_pages() is utter rubbish.  Can
we please cook up a new one?  Something human-readable which also
describes the return value.

>  {
>  	unsigned long pfn, iter, found;
>  	int mt;
> @@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
>  			continue;
>  
>  		page = pfn_to_page(check);
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(page) != zone)
> +			return false;

I don't really understand this bit.  Wasn't it wrong to walk across
zones in the original code?  Did you do something which will newly
cause this to walk between zones?  It doesn't seem to be changelogged,
and the comment commits the common mistake of explaining "what" but not
"why".


>  		if (!page_count(page)) {
>  			if (PageBuddy(page))
>  				iter += (1 << page_order(page)) - 1;
>
> ...
>

A few tweaks:

--- a/mm/compaction.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/compaction.c
@@ -394,10 +394,10 @@ static void rescue_unmovable_pageblock(s
 }
 
 /*
- * MIGRATE_TARGET : good for migration target
- * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
- * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
- * SKIP_TARGET : can't migrate by another reasons.
+ * MIGRATE_TARGET: good for migration target
+ * RESCUE_UNMOVABLE_TARGET: good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET: can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET: can't migrate by another reasons.
  */
 enum smt_result {
 	MIGRATE_TARGET,
@@ -407,9 +407,9 @@ enum smt_result {
 };
 
 /*
- * Returns MIGRATE_TARGET if the page is within a block
- * suitable for migration to, UNMOVABLE_TARGET if the page
- * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ * Returns MIGRATE_TARGET if the page is within a block suitable for migration
+ * to, UNMOVABLE_TARGET if the page is within a MIGRATE_UNMOVABLE block,
+ * SKIP_TARGET otherwise.
  */
 static enum smt_result suitable_migration_target(struct page *page,
 			      struct compact_control *cc)
--- a/mm/internal.h~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/internal.h
@@ -97,7 +97,7 @@ extern void putback_lru_page(struct page
 extern void set_pageblock_migratetype(struct page *page, int migratetype);
 extern int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype);
-extern int __count_immobile_pages(struct zone *zone, struct page *page,
+extern bool __count_immobile_pages(struct zone *zone, struct page *page,
 				  int count);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
--- a/mm/page_alloc.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
+++ a/mm/page_alloc.c
@@ -221,7 +221,6 @@ int page_group_by_mobility_disabled __re
 
 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
-
 	if (unlikely(page_group_by_mobility_disabled))
 		migratetype = MIGRATE_UNMOVABLE;
 
@@ -5472,7 +5471,7 @@ void set_pageblock_flags_group(struct pa
  * page allocater never alloc memory from ISOLATE block.
  */
 
-int __count_immobile_pages(struct zone *zone, struct page *page, int count)
+bool __count_immobile_pages(struct zone *zone, struct page *page, int count)
 {
 	unsigned long pfn, iter, found;
 	int mt;


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-08  8:46 ` Bartlomiej Zolnierkiewicz
@ 2012-06-11  1:26   ` Minchan Kim
  -1 siblings, 0 replies; 18+ messages in thread
From: Minchan Kim @ 2012-06-11  1:26 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Rik van Riel, Marek Szyprowski, Kyungmin Park, Andrew Morton,
	Linus Torvalds

Hi Bartlomiej,

On 06/08/2012 05:46 PM, Bartlomiej Zolnierkiewicz wrote:

> 
> Hi,
> 
> This version is much simpler as it just uses __count_immobile_pages()
> instead of using its own open coded version and it integrates changes


That's a good idea. I hadn't noticed that function was there.
When I looked at the function, I saw it has a problem, too.
Please look at this:

https://lkml.org/lkml/2012/6/10/180

If reviewers are okay with that patch, I would like to resend your patch based on it.

> from Minchan Kim (without page_count change as it doesn't seem correct


Why do you think so?
If it isn't correct, how can you prevent racing with THP page freeing?

> and __count_immobile_pages() does the check in the standard way; if it
> still is a problem I think that removing 1st phase check altogether
> would be better instead of adding more locking complexity).
> 
> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> to make it possible to easily check if the code is working in practice.


I think that part should be another patch.

1. Adding a new vmstat would be arguable, so it might hold up merging of this patch.
2. The new vmstat is only there to show whether this patch is effective or not in real
   practice, so once we have proven that, let's revert it. Separating it into its own
   patch would make that easy.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-08 21:18   ` Andrew Morton
@ 2012-06-11  2:05     ` Minchan Kim
  -1 siblings, 0 replies; 18+ messages in thread
From: Minchan Kim @ 2012-06-11  2:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, linux-kernel, Hugh Dickins,
	KOSAKI Motohiro, Dave Jones, Cong Wang, Markus Trippelsdorf,
	Mel Gorman, Rik van Riel, Marek Szyprowski, Kyungmin Park,
	Linus Torvalds

Hi Andrew,

On 06/09/2012 06:18 AM, Andrew Morton wrote:

> On Fri, 08 Jun 2012 10:46:32 +0200
> Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
> 
>>
>> Hi,
>>
>> This version is much simpler as it just uses __count_immobile_pages()
>> instead of using its own open coded version and it integrates changes
>> from Minchan Kim (without page_count change as it doesn't seem correct

>> and __count_immobile_pages() does the check in the standard way; if it

>> still is a problem I think that removing 1st phase check altogether
>> would be better instead of adding more locking complexity).
>>
>> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
>> to make it possible to easily check if the code is working in practice.
>>
>> Best regards,
>> --
>> Bartlomiej Zolnierkiewicz
>> Samsung Poland R&D Center
>>
>>
>> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
>> Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
>>
>> When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE
>> type pageblock (and some MIGRATE_MOVABLE pages are left in it)
>> waiting until an allocation takes ownership of the block may
>> take too long.  The type of the pageblock remains unchanged
>> so the pageblock cannot be used as a migration target during
>> compaction.
>>
>> Fix it by:
>>
>> * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
>>   and COMPACT_SYNC) and then converting sync field in struct
>>   compact_control to use it.
>>
>> * Adding nr_pageblocks_skipped field to struct compact_control
>>   and tracking how many destination pageblocks were of
>>   MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
>>   ran fully in try_to_compact_pages() (COMPACT_COMPLETE) it implies
>>   that there is not a suitable page for allocation.  In this case
>>   then check how if there were enough MIGRATE_UNMOVABLE pageblocks
>>   to try a second pass in COMPACT_ASYNC_UNMOVABLE mode.
>>
>> * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
>>   and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
>>   a count based on finding PageBuddy pages, page_count(page) == 0
>>   or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
>>   pageblock are in one of those three sets change the whole
>>   pageblock type to MIGRATE_MOVABLE.
>>
>> My particular test case (on a ARM EXYNOS4 device with 512 MiB,
>> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
>> - allocate 95000 pages for kernel's usage
>> - free every second page (47500 pages) of memory just allocated
>> - allocate and use 60000 pages from user space
>> - free remaining 60000 pages of kernel memory
>> (now we have fragmented memory occupied mostly by user space pages)
>> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>>
>> The results:
>> - with compaction disabled I get 10 successful allocations
>> - with compaction enabled - 11 successful allocations
>> - with this patch I'm able to get 25 successful allocations
>>
>> NOTE: If we can make kswapd aware of order-0 request during
>> compaction, we can enhance kswapd with changing mode to
>> COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
>> Please see the following thread:
>>
>> 	http://marc.info/?l=linux-mm&m=133552069417068&w=2
>>
>>
>> ...
>>
>> --- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
>> +++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
>> @@ -1,6 +1,8 @@
>>  #ifndef _LINUX_COMPACTION_H
>>  #define _LINUX_COMPACTION_H
>>  
>> +#include <linux/node.h>
> 
> Why was this addition needed?  (I think I asked this before)
> 
>>  /* Return values for compact_zone() and try_to_compact_pages() */
>>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>>  #define COMPACT_SKIPPED		0
>>
>> ...
>>
>> +static bool can_rescue_unmovable_pageblock(struct page *page)
>> +{
>> +	struct zone *zone;
>> +	unsigned long pfn, start_pfn, end_pfn;
>> +	struct page *start_page;
>> +
>> +	zone = page_zone(page);
>> +	pfn = page_to_pfn(page);
>> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> +	start_page = pfn_to_page(start_pfn);
>> +
>> +	/*
>> +	 * Race with page allocator/reclaimer can happen so that it can
>> +	 * deceive unmovable block to migratable type on this pageblock.
>> +	 * It could regress on anti-fragmentation but it's rare and not
>> +	 * critical.
>> +	 */
> 
> This is quite ungrammatical and needs a rewrite, please.  Suggest the


My bad.

> use of well-understood terms MIGRATE_UNMOVABLE, MIGRATE_MOVABLE etc
> rather than "unmovable block", etc.


Bartlomiej, please rewrite it.
It seems I have no talent for English. :(

> 
> Please explain "could regress" and also explain why it is "not critical".


Regress: it is possible that a MIGRATE_MOVABLE pageblock ends up holding
         unmovable pages.
Not critical: we are already exposed to that, because the allocator fallback
              mechanism can allocate unmovable pages in a MIGRATE_MOVABLE block.

> 
>> +	return __count_immobile_pages(zone, start_page, 0);
>> +}
>> +
>> +static void rescue_unmovable_pageblock(struct page *page)
>> +{
>> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
>> +
>> +	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
>> +}
>> +
>> +/*
>> + * MIGRATE_TARGET : good for migration target
>> + * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
> 
> s/TARTET/TARGET/
> 
>> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
>> + * SKIP_TARGET : can't migrate by another reasons.
>> + */
>> +enum smt_result {
>> +	MIGRATE_TARGET,
>> +	RESCUE_UNMOVABLE_TARGET,
>> +	UNMOVABLE_TARGET,
>> +	SKIP_TARGET,
>> +};
>>  
>>
>> ...
>>
>> @@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
>>   * page allocater never alloc memory from ISOLATE block.
>>   */
>>  
>> -static int
>> -__count_immobile_pages(struct zone *zone, struct page *page, int count)
>> +int __count_immobile_pages(struct zone *zone, struct page *page, int count)
> 
> We may as well fix the return type of this while we're in there.  It
> should be bool.
> 

> Also, the comment over __count_immobile_pages() is utter rubbish.  Can

> we please cook up a new one?  Something human-readable which also
> describes the return value.



The function name is rather awkward, too.
I will try to clean that up.

> 
>>  {
>>  	unsigned long pfn, iter, found;
>>  	int mt;
>> @@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
>>  			continue;
>>  
>>  		page = pfn_to_page(check);
>> +
>> +		/* Do not deal with pageblocks that overlap zones */
>> +		if (page_zone(page) != zone)
>> +			return false;
> 
> I don't really understand this bit.  Wasn't it wrong to walk across
> zones in the original code?  Did you do something which will newly
> cause this to walk between zones?  It doesn't seem to be changelogged,
> and the comment commits the common mistake of explaining "what" but not
> "why".


I saw a similar check in isolate_freepages() and remember what Mel said:

"
Node-0 Node-1 Node-0
DMA DMA DMA
0-1023 1024-2047 2048-4096

In that case, a PFN scanner can enter a new node and zone but the migrate
and free scanners have not necessarily met. This configuration is *extremely*
rare but it happens on messed-up LPAR configurations on POWER
"
http://lkml.indiana.edu/hypermail/linux/kernel/1002.2/01140.html
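
To make that concrete, here is a minimal userspace model of the problem,
not kernel code; the zone boundaries and pageblock size below are made up
for illustration. When a zone boundary is not pageblock-aligned, a single
pageblock can contain pages from two different zones, which is what the
"page_zone(page) != zone" bail-out guards against:

#include <stdio.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 1024UL	/* hypothetical pageblock size */

/* Toy model of the interleaved layout above; the boundary at pfn 900
 * is deliberately not pageblock-aligned. */
static int pfn_to_zone_id(unsigned long pfn)
{
	if (pfn < 900)
		return 0;		/* Node-0 DMA */
	if (pfn < 2048)
		return 1;		/* Node-1 DMA */
	return 2;			/* Node-0 DMA, second range */
}

/* Mirrors the bail-out: give up on any pageblock whose pages are not
 * all in one zone. */
static bool pageblock_in_one_zone(unsigned long start_pfn)
{
	int zone = pfn_to_zone_id(start_pfn);
	unsigned long pfn;

	for (pfn = start_pfn; pfn < start_pfn + PAGEBLOCK_NR_PAGES; pfn++)
		if (pfn_to_zone_id(pfn) != zone)
			return false;
	return true;
}

int main(void)
{
	/* The block at pfn 0 straddles the unaligned boundary at 900. */
	printf("block at 0:    %s\n",
	       pageblock_in_one_zone(0) ? "one zone" : "overlaps zones");
	printf("block at 1024: %s\n",
	       pageblock_in_one_zone(1024) ? "one zone" : "overlaps zones");
	return 0;
}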

> 

> 
>>  		if (!page_count(page)) {
>>  			if (PageBuddy(page))
>>  				iter += (1 << page_order(page)) - 1;
>>
>> ...
>>
> 
> A few tweaks:
> 
> --- a/mm/compaction.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/compaction.c
> @@ -394,10 +394,10 @@ static void rescue_unmovable_pageblock(s
>  }
>  
>  /*
> - * MIGRATE_TARGET : good for migration target
> - * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
> - * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> - * SKIP_TARGET : can't migrate by another reasons.
> + * MIGRATE_TARGET: good for migration target
> + * RESCUE_UNMOVABLE_TARGET: good only if we can rescue the unmovable pageblock.
> + * UNMOVABLE_TARGET: can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET: can't migrate by another reasons.
>   */
>  enum smt_result {
>  	MIGRATE_TARGET,
> @@ -407,9 +407,9 @@ enum smt_result {
>  };
>  
>  /*
> - * Returns MIGRATE_TARGET if the page is within a block
> - * suitable for migration to, UNMOVABLE_TARGET if the page
> - * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
> + * Returns MIGRATE_TARGET if the page is within a block suitable for migration
> + * to, UNMOVABLE_TARGET if the page is within a MIGRATE_UNMOVABLE block,
> + * SKIP_TARGET otherwise.
>   */
>  static enum smt_result suitable_migration_target(struct page *page,
>  			      struct compact_control *cc)
> --- a/mm/internal.h~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/internal.h
> @@ -97,7 +97,7 @@ extern void putback_lru_page(struct page
>  extern void set_pageblock_migratetype(struct page *page, int migratetype);
>  extern int move_freepages_block(struct zone *zone, struct page *page,
>  				int migratetype);
> -extern int __count_immobile_pages(struct zone *zone, struct page *page,
> +extern bool __count_immobile_pages(struct zone *zone, struct page *page,
>  				  int count);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
> --- a/mm/page_alloc.c~mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix
> +++ a/mm/page_alloc.c
> @@ -221,7 +221,6 @@ int page_group_by_mobility_disabled __re
>  
>  void set_pageblock_migratetype(struct page *page, int migratetype)
>  {
> -
>  	if (unlikely(page_group_by_mobility_disabled))
>  		migratetype = MIGRATE_UNMOVABLE;
>  
> @@ -5472,7 +5471,7 @@ void set_pageblock_flags_group(struct pa
>   * page allocater never alloc memory from ISOLATE block.
>   */
>  
> -int __count_immobile_pages(struct zone *zone, struct page *page, int count)
> +bool __count_immobile_pages(struct zone *zone, struct page *page, int count)
>  {
>  	unsigned long pfn, iter, found;
>  	int mt;
> 



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-08 21:18   ` Andrew Morton
@ 2012-06-11 10:37     ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-11 10:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Minchan Kim, Rik van Riel, Marek Szyprowski, Kyungmin Park,
	Linus Torvalds

On Friday 08 June 2012 23:18:33 Andrew Morton wrote:
> On Fri, 08 Jun 2012 10:46:32 +0200
> Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
> 
> > 
> > Hi,
> > 
> > This version is much simpler as it just uses __count_immobile_pages()
> > instead of using its own open coded version and it integrates changes
> > from Minchan Kim (without page_count change as it doesn't seem correct
> > and __count_immobile_pages() does the check in the standard way; if it
> > still is a problem I think that removing 1st phase check altogether
> > would be better instead of adding more locking complexity).
> > 
> > The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> > to make it possible to easily check if the code is working in practice.
> > 
> > Best regards,
> > --
> > Bartlomiej Zolnierkiewicz
> > Samsung Poland R&D Center
> > 
> > 
> > From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
> > Subject: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
> > 
> > When MIGRATE_UNMOVABLE pages are freed from MIGRATE_UNMOVABLE
> > type pageblock (and some MIGRATE_MOVABLE pages are left in it)
> > waiting until an allocation takes ownership of the block may
> > take too long.  The type of the pageblock remains unchanged
> > so the pageblock cannot be used as a migration target during
> > compaction.
> > 
> > Fix it by:
> > 
> > * Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
> >   and COMPACT_SYNC) and then converting sync field in struct
> >   compact_control to use it.
> > 
> > * Adding nr_pageblocks_skipped field to struct compact_control
> >   and tracking how many destination pageblocks were of
> >   MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
> >   ran fully in try_to_compact_pages() (COMPACT_COMPLETE) it implies
> >   that there is not a suitable page for allocation.  In this case
> >   then check how if there were enough MIGRATE_UNMOVABLE pageblocks
> >   to try a second pass in COMPACT_ASYNC_UNMOVABLE mode.
> > 
> > * Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
> >   and COMPACT_ASYNC_UNMOVABLE compaction modes) and building
> >   a count based on finding PageBuddy pages, page_count(page) == 0
> >   or PageLRU pages.  If all pages within the MIGRATE_UNMOVABLE
> >   pageblock are in one of those three sets change the whole
> >   pageblock type to MIGRATE_MOVABLE.
> > 
> > My particular test case (on a ARM EXYNOS4 device with 512 MiB,
> > which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> > - allocate 95000 pages for kernel's usage
> > - free every second page (47500 pages) of memory just allocated
> > - allocate and use 60000 pages from user space
> > - free remaining 60000 pages of kernel memory
> > (now we have fragmented memory occupied mostly by user space pages)
> > - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
> > 
> > The results:
> > - with compaction disabled I get 10 successful allocations
> > - with compaction enabled - 11 successful allocations
> > - with this patch I'm able to get 25 successful allocations
> > 
> > NOTE: If we can make kswapd aware of order-0 request during
> > compaction, we can enhance kswapd with changing mode to
> > COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
> > Please see the following thread:
> > 
> > 	http://marc.info/?l=linux-mm&m=133552069417068&w=2
> > 
> >
> > ...
> >
> > --- a/include/linux/compaction.h	2012-06-08 09:01:32.041681656 +0200
> > +++ b/include/linux/compaction.h	2012-06-08 09:01:35.697681651 +0200
> > @@ -1,6 +1,8 @@
> >  #ifndef _LINUX_COMPACTION_H
> >  #define _LINUX_COMPACTION_H
> >  
> > +#include <linux/node.h>
> 
> Why was this addition needed?  (I think I asked this before)

You did. :)

It is needed to fix a build failure; please see:
http://www.spinics.net/lists/linux-mm/msg33901.html

> >  /* Return values for compact_zone() and try_to_compact_pages() */
> >  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
> >  #define COMPACT_SKIPPED		0
> >
> > ...
> >
> > +static bool can_rescue_unmovable_pageblock(struct page *page)
> > +{
> > +	struct zone *zone;
> > +	unsigned long pfn, start_pfn, end_pfn;
> > +	struct page *start_page;
> > +
> > +	zone = page_zone(page);
> > +	pfn = page_to_pfn(page);
> > +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> > +	start_page = pfn_to_page(start_pfn);
> > +
> > +	/*
> > +	 * Race with page allocator/reclaimer can happen so that it can
> > +	 * deceive unmovable block to migratable type on this pageblock.
> > +	 * It could regress on anti-fragmentation but it's rare and not
> > +	 * critical.
> > +	 */
> 
> This is quite ungrammatical and needs a rewrite, please.  Suggest the
> use of well-understood terms MIGRATE_UNMOVABLE, MIGRATE_MOVABLE etc
> rather than "unmovable block", etc.
> 
> Please explain "could regress" and also explain why it is "not critical".

Ok.

> > +	return __count_immobile_pages(zone, start_page, 0);
> > +}
> > +
> > +static void rescue_unmovable_pageblock(struct page *page)
> > +{
> > +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> > +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> > +
> > +	count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
> > +}
> > +
> > +/*
> > + * MIGRATE_TARGET : good for migration target
> > + * RESCUE_UNMOVABLE_TARTET : good only if we can rescue the unmovable pageblock.
> 
> s/TARTET/TARGET/
> 
> > + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> > + * SKIP_TARGET : can't migrate by another reasons.
> > + */
> > +enum smt_result {
> > +	MIGRATE_TARGET,
> > +	RESCUE_UNMOVABLE_TARGET,
> > +	UNMOVABLE_TARGET,
> > +	SKIP_TARGET,
> > +};
> >  
> >
> > ...
> >
> > @@ -5476,8 +5476,7 @@ void set_pageblock_flags_group(struct pa
> >   * page allocater never alloc memory from ISOLATE block.
> >   */
> >  
> > -static int
> > -__count_immobile_pages(struct zone *zone, struct page *page, int count)
> > +int __count_immobile_pages(struct zone *zone, struct page *page, int count)
> 
> We may as well fix the return type of this while we're in there.  It
> should be bool.
> 
> Also, the comment over __count_immobile_pages() is utter rubbish.  Can
> we please cook up a new one?  Something human-readable which also
> describes the return value.

Ok.

> >  {
> >  	unsigned long pfn, iter, found;
> >  	int mt;
> > @@ -5500,6 +5499,11 @@ __count_immobile_pages(struct zone *zone
> >  			continue;
> >  
> >  		page = pfn_to_page(check);
> > +
> > +		/* Do not deal with pageblocks that overlap zones */
> > +		if (page_zone(page) != zone)
> > +			return false;
> 
> I don't really understand this bit.  Wasn't it wrong to walk across
> zones in the original code?  Did you do something which will newly
> cause this to walk between zones?  It doesn't seem to be changelogged,
> and the comment commits the common mistake of explaining "what" but not
> "why".

Minchan explained this in his mail (https://lkml.org/lkml/2012/6/10/212):

"I saw similar function in isolate_freepages and remember Mel said. 

"
Node-0 Node-1 Node-0
DMA DMA DMA
0-1023 1024-2047 2048-4096

In that case, a PFN scanner can enter a new node and zone but the migrate
and free scanners have not necessarily met. This configuration is *extremely*
rare but it happens on messed-up LPAR configurations on POWER
"
http://lkml.indiana.edu/hypermail/linux/kernel/1002.2/01140.html"

but I wonder if it is really needed, as __count_immobile_pages() is called per
pageblock and the code only walks inside one pageblock at a time.
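
For what it's worth, the alignment math does confine the scan to exactly one
pageblock, so a zone crossing can only happen when a zone boundary itself
falls inside a pageblock (the unaligned LPAR layout quoted above). A quick
userspace check of the computation used in can_rescue_unmovable_pageblock(),
with PAGEBLOCK_NR_PAGES as a made-up stand-in for the kernel's
pageblock_nr_pages:

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* e.g. 2 MiB of 4 KiB pages */

int main(void)
{
	unsigned long pfn = 131000;	/* arbitrary pfn inside a block */
	unsigned long start_pfn = pfn & ~(PAGEBLOCK_NR_PAGES - 1);

	/* Every pfn visited lies in [start_pfn, start_pfn + block size). */
	printf("pfn %lu -> pageblock [%lu, %lu)\n",
	       pfn, start_pfn, start_pfn + PAGEBLOCK_NR_PAGES);
	return 0;
}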

> 
> >  		if (!page_count(page)) {
> >  			if (PageBuddy(page))
> >  				iter += (1 << page_order(page)) - 1;
> >
> > ...
> >
> 
> A few tweaks:

Thanks!

My incremental fixes on top of your patch:

From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: mm-compaction-handle-incorrect-migrate_unmovable-type-pageblocks-fix-2

Fix can_rescue_unmovable_pageblock() comment and __count_immobile_pages()
documentation.

Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 mm/compaction.c |    9 +++++----
 mm/page_alloc.c |   16 +++++++++++-----
 2 files changed, 16 insertions(+), 9 deletions(-)

Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-11 11:43:08.619790227 +0200
+++ b/mm/compaction.c	2012-06-11 11:48:32.263790242 +0200
@@ -374,10 +374,11 @@ static bool can_rescue_unmovable_pageblo
 	start_page = pfn_to_page(start_pfn);
 
 	/*
-	 * Race with page allocator/reclaimer can happen so that it can
-	 * deceive unmovable block to migratable type on this pageblock.
-	 * It could regress on anti-fragmentation but it's rare and not
-	 * critical.
+	 * A race with the page allocator/reclaimer can happen, so it is
+	 * possible that a MIGRATE_UNMOVABLE type page will end up in a
+	 * MIGRATE_MOVABLE type pageblock.  However, such a situation is rare
+	 * and not critical (the page allocator fallback mechanism can also
+	 * allocate MIGRATE_UNMOVABLE type pages in MIGRATE_MOVABLE pageblocks).
 	 */
 	return __count_immobile_pages(zone, start_page, 0);
 }
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-06-11 11:49:31.603790172 +0200
+++ b/mm/page_alloc.c	2012-06-11 12:14:41.231789996 +0200
@@ -5469,12 +5469,18 @@ void set_pageblock_flags_group(struct pa
 			__clear_bit(bitidx + start_bitidx, bitmap);
 }
 
-/*
- * This is designed as sub function...plz see page_isolation.c also.
- * set/clear page block's type to be ISOLATE.
- * page allocater never alloc memory from ISOLATE block.
+/**
+ * __count_immobile_pages - Check pageblock for MIGRATE_UNMOVABLE type pages.
+ * @zone: Zone pages are in.
+ * @page: The first page in the pageblock.
+ * @count: The count of allowed MIGRATE_UNMOVABLE type pages.
+ *
+ * Count the number of MIGRATE_UNMOVABLE type pages in the pageblock
+ * starting with @page.  Returns true if the @zone is of ZONE_MOVABLE
+ * type or the pageblock is of MIGRATE_MOVABLE or MIGRATE_CMA type.
+ * If the number of MIGRATE_UNMOVABLE type pages inside the pageblock
+ * exceeds @count, returns false; otherwise returns true.
  */
-
 bool __count_immobile_pages(struct zone *zone, struct page *page, int count)
 {
 	unsigned long pfn, iter, found;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-11  1:26   ` Minchan Kim
@ 2012-06-11 10:43     ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-11 10:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Rik van Riel, Marek Szyprowski, Kyungmin Park, Andrew Morton,
	Linus Torvalds

On Monday 11 June 2012 03:26:49 Minchan Kim wrote:
> Hi Bartlomiej,
> 
> On 06/08/2012 05:46 PM, Bartlomiej Zolnierkiewicz wrote:
> 
> > 
> > Hi,
> > 
> > This version is much simpler as it just uses __count_immobile_pages()
> > instead of using its own open coded version and it integrates changes
> 
> 
> That's a good idea. I hadn't noticed that function was there.
> When I looked at the function, I saw it has a problem, too.
> Please look at this:
> 
> https://lkml.org/lkml/2012/6/10/180
> 
> If reviewers are okay with that patch, I would like to resend your patch based on it.

OK, I will later merge all the changes into v11 and rebase on top of your patch.

> > from Minchan Kim (without page_count change as it doesn't seem correct
> 
> 
> Why do you think so?
> If it isn't correct, how can you prevent racing with THP page freeing?

After seeing the explanation for the previous fix, it is all clear now.

> > and __count_immobile_pages() does the check in the standard way; if it
> > still is a problem I think that removing 1st phase check altogether
> > would be better instead of adding more locking complexity).
> > 
> > The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> > to make it possible to easily check if the code is working in practice.
> 
> 
> I think that part should be another patch.
> 
> 1. Adding a new vmstat would be arguable, so it might hold up merging of this patch.

Why would it be arguable?  It seems non-intrusive and obvious to me.

> 2. The new vmstat is only there to show whether this patch is effective or not in real
>    practice, so once we have proven that, let's revert it. Separating it into its own
>    patch would make that easy.

I would like to add this vmstat permanently, not only for the testing period.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-11 10:43     ` Bartlomiej Zolnierkiewicz
@ 2012-06-11 13:39       ` Minchan Kim
  -1 siblings, 0 replies; 18+ messages in thread
From: Minchan Kim @ 2012-06-11 13:39 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Minchan Kim, linux-mm, linux-kernel, Hugh Dickins,
	KOSAKI Motohiro, Dave Jones, Cong Wang, Markus Trippelsdorf,
	Mel Gorman, Rik van Riel, Marek Szyprowski, Kyungmin Park,
	Andrew Morton, Linus Torvalds

On Mon, Jun 11, 2012 at 12:43:14PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Monday 11 June 2012 03:26:49 Minchan Kim wrote:
> > Hi Bartlomiej,
> > 
> > On 06/08/2012 05:46 PM, Bartlomiej Zolnierkiewicz wrote:
> > 
> > > 
> > > Hi,
> > > 
> > > This version is much simpler as it just uses __count_immobile_pages()
> > > instead of using its own open coded version and it integrates changes
> > 
> > 
> > That's a good idea. I hadn't noticed that function was there.
> > When I looked at the function, I saw that it has a problem, too.
> > Please look at this.
> > 
> > https://lkml.org/lkml/2012/6/10/180
> > 
> > If the reviewers are okay with that patch, I would like to resend your patch based on it.
> 
> OK, I will merge all the changes into v11 later and rebase on top of your patch.
> 
> > > from Minchan Kim (without page_count change as it doesn't seem correct
> > 
> > 
> > Why do you think so?
> > If it isn't correct, how can you prevent racing with THP page freeing?
> 
> > After seeing the explanation for the previous fix, it is all clear now.
> 
> > > and __count_immobile_pages() does the check in the standard way; if it
> > > still is a problem I think that removing 1st phase check altogether
> > > would be better instead of adding more locking complexity).
> > > 
> > > The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> > > to make it possible to easily check if the code is working in practice.
> > 
> > 
> > I think that part should be another patch.
> > 
> > 1. Adding a new vmstat would be arguable, so it might hold up merging this patch.
> 
> Why would it be arguable?  It seems non-intrusive and obvious to me.

Once you add a new vmstat, someone can write userspace code that depends on it.
That means your new vmstat would become a new ABI, so we should be careful.

> 
> > 2. The new vmstat is only there to check whether this patch is effective in
> >    real practice, so once we have proven that in the future, let's revert the
> >    vmstat.  Separating it out would make that easy.
> 
> I would like to add this vmstat permanently, not only for the testing period.

"I would like to add this vmstat permanently" isn't logical at all.
You should mention why we need such vmstat and how administrator can parse it/
handle it if he needs.

If we had Documentation/vmstat.txt, you should have written it down there. Sigh.

> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-11 13:39       ` Minchan Kim
@ 2012-06-11 14:54         ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 18+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-11 14:54 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Rik van Riel, Marek Szyprowski, Kyungmin Park, Andrew Morton,
	Linus Torvalds

On Monday 11 June 2012 15:39:16 Minchan Kim wrote:
> On Mon, Jun 11, 2012 at 12:43:14PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Monday 11 June 2012 03:26:49 Minchan Kim wrote:
> > > Hi Bartlomiej,
> > > 
> > > On 06/08/2012 05:46 PM, Bartlomiej Zolnierkiewicz wrote:
> > > 
> > > > 
> > > > Hi,
> > > > 
> > > > This version is much simpler as it just uses __count_immobile_pages()
> > > > instead of using its own open coded version and it integrates changes
> > > 
> > > 
> > > That's a good idea. I hadn't noticed that function was there.
> > > When I looked at the function, I saw that it has a problem, too.
> > > Please look at this.
> > > 
> > > https://lkml.org/lkml/2012/6/10/180
> > > 
> > > If the reviewers are okay with that patch, I would like to resend your patch based on it.
> > 
> > OK, I will merge all the changes into v11 later and rebase on top of your patch.
> > 
> > > > from Minchan Kim (without page_count change as it doesn't seem correct
> > > 
> > > 
> > > Why do you think so?
> > > If it isn't correct, how can you prevent racing with THP page freeing?
> > 
> > After seeing the explanation for the previous fix, it is all clear now.
> > 
> > > > and __count_immobile_pages() does the check in the standard way; if it
> > > > still is a problem I think that removing 1st phase check altogether
> > > > would be better instead of adding more locking complexity).
> > > > 
> > > > The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
> > > > to make it possible to easily check if the code is working in practice.
> > > 
> > > 
> > > I think that part should be another patch.
> > > 
> > > 1. Adding a new vmstat would be arguable, so it might hold up merging this patch.
> > 
> > Why would it be arguable?  It seems non-intrusive and obvious to me.
> 
> Once you add a new vmstat, someone can write userspace code that depends on it.
> That means your new vmstat would become a new ABI, so we should be careful.

I know about that, but I doubt it will ever be used by user space for any
purpose other than showing kernel statistics (and even that is unlikely).
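
To illustrate what such a dependency would look like (hypothetical
userspace code, not any existing tool):

#include <stdio.h>
#include <string.h>

/* return the value of a named /proc/vmstat counter, or -1 */
static long read_vmstat(const char *name)
{
        char key[64];
        long val;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
                return -1;
        while (fscanf(f, "%63s %ld", key, &val) == 2) {
                if (!strcmp(key, name)) {
                        fclose(f);
                        return val;
                }
        }
        fclose(f);
        return -1;
}

Once a tool hardcodes "compact_rescued_unmovable_blocks" like this,
renaming or removing the counter breaks it -- that is the ABI concern.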

> > 
> > > 2. The new vmstat is only there to check whether this patch is effective in
> > >    real practice, so once we have proven that in the future, let's revert the
> > >    vmstat.  Separating it out would make that easy.
> > 
> > I would like to add this vmstat permanently, not only for the testing period.
> 
> "I would like to add this vmstat permanently" isn't logical at all.
> You should mention why we need such vmstat and how administrator can parse it/
> handle it if he needs.

I quickly went through the vmstat history and I saw rationales like this
one for adding new vmstat counters: "Optional patch, but useful for
development and understanding the system."  The new counter falls into
this category.

compact_rescued_unmovable_blocks shows the number of MIGRATE_UNMOVABLE
pageblocks converted back to MIGRATE_MOVABLE type by the memory compaction
code.  Non-zero values indicate that large kernel-originated allocations
of MIGRATE_UNMOVABLE type happen in the system and that they need special
handling from the memory compaction code.
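
For reference, wiring such a counter up amounts to a standard vmevent;
roughly like the sketch below (the exact hunks and the helper name are
illustrative, not the literal v10 patch):

/* include/linux/vm_event_item.h */
enum vm_event_item {
        /* ... existing events ... */
#ifdef CONFIG_COMPACTION
        /* ... existing compaction events ... */
        COMPACT_RESCUED_UNMOVABLE_BLOCKS,
#endif
        /* ... */
};

/* mm/vmstat.c -- the /proc/vmstat names must match the enum order */
const char * const vmstat_text[] = {
        /* ... */
        "compact_rescued_unmovable_blocks",
        /* ... */
};

/* mm/compaction.c -- bump it where the pageblock type is changed */
static void rescue_unmovable_pageblock(struct zone *zone,
                                       struct page *page)
{
        set_pageblock_migratetype(page, MIGRATE_MOVABLE);
        move_freepages_block(zone, page, MIGRATE_MOVABLE);
        count_vm_event(COMPACT_RESCUED_UNMOVABLE_BLOCKS);
}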

> If we had Documentation/vmstat.txt, you should have written it down there. Sigh.

But we don't have vmstat.txt even though we have a lot of vmstat counters. :(

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center

* Re: [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-11 14:54         ` Bartlomiej Zolnierkiewicz
@ 2012-06-11 23:37           ` Minchan Kim
  -1 siblings, 0 replies; 18+ messages in thread
From: Minchan Kim @ 2012-06-11 23:37 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Hugh Dickins, KOSAKI Motohiro,
	Dave Jones, Cong Wang, Markus Trippelsdorf, Mel Gorman,
	Rik van Riel, Marek Szyprowski, Kyungmin Park, Andrew Morton,
	Linus Torvalds

On 06/11/2012 11:54 PM, Bartlomiej Zolnierkiewicz wrote:

> On Monday 11 June 2012 15:39:16 Minchan Kim wrote:
>> On Mon, Jun 11, 2012 at 12:43:14PM +0200, Bartlomiej Zolnierkiewicz wrote:
>>> On Monday 11 June 2012 03:26:49 Minchan Kim wrote:
>>>> Hi Bartlomiej,
>>>>
>>>> On 06/08/2012 05:46 PM, Bartlomiej Zolnierkiewicz wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> This version is much simpler as it just uses __count_immobile_pages()
>>>>> instead of using its own open coded version and it integrates changes
>>>>
>>>>
>>>> That's a good idea. I hadn't noticed that function was there.
>>>> When I looked at the function, I saw that it has a problem, too.
>>>> Please look at this.
>>>>
>>>> https://lkml.org/lkml/2012/6/10/180
>>>>
>>>> If the reviewers are okay with that patch, I would like to resend your patch based on it.
>>>
>>> OK, I will merge all the changes into v11 later and rebase on top of your patch.
>>>
>>>>> from Minchan Kim (without page_count change as it doesn't seem correct
>>>>
>>>>
>>>> Why do you think so?
>>>> If it isn't correct, how can you prevent racing with THP page freeing?
>>>
>>> After seeing the explanation for the previous fix, it is all clear now.
>>>
>>>>> and __count_immobile_pages() does the check in the standard way; if it
>>>>> still is a problem I think that removing 1st phase check altogether
>>>>> would be better instead of adding more locking complexity).
>>>>>
>>>>> The patch also adds compact_rescued_unmovable_blocks vmevent to vmstats
>>>>> to make it possible to easily check if the code is working in practice.
>>>>
>>>>
>>>> I think that part should be another patch.
>>>>
>>>> 1. Adding a new vmstat would be arguable, so it might hold up merging this patch.
>>>
>>> Why would it be arguable?  It seems non-intrusive and obvious to me.
>>
>> Once you add a new vmstat, someone can write userspace code that depends on it.
>> That means your new vmstat would become a new ABI, so we should be careful.
> 
> I know about that, but I doubt it will ever be used by user space for any
> purpose other than showing kernel statistics (and even that is unlikely).
> 
>>>
>>>> 2. The new vmstat is only there to check whether this patch is effective in
>>>>    real practice, so once we have proven that in the future, let's revert the
>>>>    vmstat.  Separating it out would make that easy.
>>>
>>> I would like to add this vmstat permanently, not only for the testing period.
>>
>> "I would like to add this vmstat permanently" isn't logical at all.
>> You should mention why we need such vmstat and how administrator can parse it/
>> handle it if he needs.
> 
> I quickly went through the vmstat history and I saw rationales like this
> one for adding new vmstat counters: "Optional patch, but useful for
> development and understanding the system."  The new counter falls into
> this category.
> 


I can't agree with your "useful for development" argument.
If we did that a long time ago, I think it was a wrong decision and one that
should not be repeated.

> compact_rescued_unmovable_blocks shows the number of MIGRATE_UNMOVABLE
> pageblocks converted back to MIGRATE_MOVABLE type by the memory compaction
> code.  Non-zero values indicate that large kernel-originated allocations
> of MIGRATE_UNMOVABLE type happen in the system and that they need special
> handling from the memory compaction code.


So what can an admin do after he looks at it?
Why should he need to know that "some unmovable blocks were rescued"?

What an admin wants is the allocation ratio or the compaction success ratio,
NOT how many blocks were rescued.  If the patch is useful, the THP
allocation/compaction success ratio should improve, and I think showing that
is enough if you want to keep the counter permanently.
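
For example, a compaction success ratio can already be derived from the
existing compact_success and compact_fail events in /proc/vmstat, using
deltas between two snapshots taken around a test run (analysis-side
sketch, not kernel code):

/* delta-based compaction success ratio between two snapshots */
static double compact_success_ratio(long succ0, long fail0,
                                    long succ1, long fail1)
{
        long ok = succ1 - succ0;
        long bad = fail1 - fail0;

        return (ok + bad) ? (double)ok / (ok + bad) : 0.0;
}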

> 
> If we had Documentation/vmstat.txt, you should have written it down there. Sigh.
> 
> But we don't have vmstat.txt even though we have a lot of vmstat counters. :(
> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center
> 

-- 
Kind regards,
Minchan Kim

Thread overview: 9 messages
2012-06-08  8:46 [PATCH v10] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks Bartlomiej Zolnierkiewicz
2012-06-08 21:18 ` Andrew Morton
2012-06-11  2:05   ` Minchan Kim
2012-06-11 10:37   ` Bartlomiej Zolnierkiewicz
2012-06-11  1:26 ` Minchan Kim
2012-06-11 10:43   ` Bartlomiej Zolnierkiewicz
2012-06-11 13:39     ` Minchan Kim
2012-06-11 14:54       ` Bartlomiej Zolnierkiewicz
2012-06-11 23:37         ` Minchan Kim
