* [PATCH 0/7] improve robustness on handling migratetype
@ 2014-01-09  7:04 Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Hello,

I found some weaknesses in migratetype handling during code review and
while testing CMA.

First, we don't have any synchronization on getting/setting the pageblock
migratetype. When we change a migratetype, we hold the zone lock, so a
writer-writer race doesn't exist. But while someone changes a migratetype,
others can read it concurrently and may observe a totally unintended
value. Although I haven't heard of any problem report about this, it is
better to protect it properly.

Second, (get/set)_freepage_migratetype isn't used properly. I guess it
was introduced for per-cpu page (pcp) performance, but it is now also
used by memory isolation. For that case, the stored information isn't
sufficient, so we need to fix it.

Third, there is a problem in the buddy allocator. It doesn't consider
migratetype when merging buddies, so pages from a CMA or isolated region
can be moved to another migratetype's freelist. This makes CMA fail over
and over. To prevent it, the buddy allocator should consider migratetype
if CMA/ISOLATE is enabled.

This patchset is aimed at fixing these problems and is based on v3.13-rc7.

Thanks.

Joonsoo Kim (7):
  mm/page_alloc: synchronize get/set pageblock
  mm/cma: fix cma free page accounting
  mm/page_alloc: move set_freepage_migratetype() to better place
  mm/isolation: remove invalid check condition
  mm/page_alloc: separate interface to set/get migratetype of freepage
  mm/page_alloc: store freelist migratetype to the page on buddy
    properly
  mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy

 include/linux/mm.h             |   35 +++++++++++++++++++++---
 include/linux/mmzone.h         |    2 ++
 include/linux/page-isolation.h |    1 -
 mm/page_alloc.c                |   59 ++++++++++++++++++++++++++--------------
 mm/page_isolation.c            |    5 +---
 5 files changed, 73 insertions(+), 29 deletions(-)

-- 
1.7.9.5



* [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  9:08   ` Michal Nazarewicz
  2014-01-09  7:04 ` [PATCH 2/7] mm/cma: fix cma free page accounting Joonsoo Kim
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Right now, getting/setting the pageblock bits is done without any
synchronization, so there is a race condition and the migratetype can
end up as an unintended value. Sometimes we move pageblocks from one
migratetype to another while, at the same time, a page in such a
pageblock is freed. In this case we can read a totally unintended
value, since the pageblock bits are not read/written atomically; they
are accessed one bit at a time.

Since set pageblock is used far less frequently than get pageblock, I
think a seqlock is the proper method to synchronize them. This type of
lock has minimal overhead when there are many readers and few writers,
so it fits this situation.
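
For readers unfamiliar with seqlocks, the pattern applied below looks
roughly like this in generic form (a sketch only, not part of the
patch; 'protected_value' stands in for the pageblock bits):

	static seqlock_t lock;		/* set up with seqlock_init() */
	static unsigned long protected_value;

	/* Reader: retries until no writer intervened during the read. */
	static unsigned long read_value(void)
	{
		unsigned long v;
		unsigned int seq;

		do {
			seq = read_seqbegin(&lock);
			v = protected_value;
		} while (read_seqretry(&lock, seq));

		return v;
	}

	/* Writer: serialized against other writers; bumping the sequence
	 * count forces concurrent readers to retry. */
	static void write_value(unsigned long v)
	{
		unsigned long flags;

		write_seqlock_irqsave(&lock, flags);
		protected_value = v;
		write_sequnlock_irqrestore(&lock, flags);
	}

Readers never block, so the cost on the get side is two reads of the
sequence counter in the common no-writer case.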

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bd791e4..feaa607 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -79,6 +79,7 @@ static inline int get_pageblock_migratetype(struct page *page)
 {
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
+void set_pageblock_migratetype(struct page *page, int migratetype);
 
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
@@ -367,6 +368,7 @@ struct zone {
 #endif
 	struct free_area	free_area[MAX_ORDER];
 
+	seqlock_t		pageblock_seqlock;
 #ifndef CONFIG_SPARSEMEM
 	/*
 	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..58e2a89 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -23,7 +23,6 @@ static inline bool is_migrate_isolate(int migratetype)
 
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
-void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype);
 int move_freepages(struct zone *zone,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5248fe0..b36aa5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4788,6 +4788,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 		spin_lock_init(&zone->lock);
 		spin_lock_init(&zone->lru_lock);
 		zone_seqlock_init(zone);
+		seqlock_init(&zone->pageblock_seqlock);
 		zone->zone_pgdat = pgdat;
 		zone_pcp_init(zone);
 
@@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page,
 	unsigned long pfn, bitidx;
 	unsigned long flags = 0;
 	unsigned long value = 1;
+	unsigned int seq;
 
 	zone = page_zone(page);
 	pfn = page_to_pfn(page);
 	bitmap = get_pageblock_bitmap(zone, pfn);
 	bitidx = pfn_to_bitidx(zone, pfn);
 
-	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
-		if (test_bit(bitidx + start_bitidx, bitmap))
-			flags |= value;
+	do {
+		seq = read_seqbegin(&zone->pageblock_seqlock);
+		for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
+			if (test_bit(bitidx + start_bitidx, bitmap))
+				flags |= value;
+	} while (read_seqretry(&zone->pageblock_seqlock, seq));
 
 	return flags;
 }
@@ -5954,6 +5959,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
 	unsigned long *bitmap;
 	unsigned long pfn, bitidx;
 	unsigned long value = 1;
+	unsigned long irq_flags;
 
 	zone = page_zone(page);
 	pfn = page_to_pfn(page);
@@ -5961,11 +5967,13 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
 	bitidx = pfn_to_bitidx(zone, pfn);
 	VM_BUG_ON(!zone_spans_pfn(zone, pfn));
 
+	write_seqlock_irqsave(&zone->pageblock_seqlock, irq_flags);
 	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
 		if (flags & value)
 			__set_bit(bitidx + start_bitidx, bitmap);
 		else
 			__clear_bit(bitidx + start_bitidx, bitmap);
+	write_sequnlock_irqrestore(&zone->pageblock_seqlock, irq_flags);
 }
 
 /*
-- 
1.7.9.5



* [PATCH 2/7] mm/cma: fix cma free page accounting
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09 21:10   ` Laura Abbott
  2014-01-09  7:04 ` [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place Joonsoo Kim
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

CMA pages can be allocated not only by order-0 requests but also by
high-order requests. So we should account free CMA pages in both
places.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b36aa5a..1489c301 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 							  start_migratetype,
 							  migratetype);
 
+			/* CMA pages cannot be stolen */
+			if (is_migrate_cma(migratetype)) {
+				__mod_zone_page_state(zone,
+					NR_FREE_CMA_PAGES, -(1 << order));
+			}
+
 			/* Remove the page from the freelists */
 			list_del(&page->lru);
 			rmv_page_order(page);
@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		}
 		set_freepage_migratetype(page, mt);
 		list = &page->lru;
-		if (is_migrate_cma(mt))
-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
-- 
1.7.9.5



* [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 2/7] mm/cma: fix cma free page accounting Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 4/7] mm/isolation: remove invalid check condition Joonsoo Kim
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

set_freepage_migratetype() informs us of the buddy freelist where the
page should be linked when it goes back to the buddy allocator.

Currently this is done in rmqueue_bulk(), where we have to call
get_pageblock_migratetype() to know the exact migratetype when
CONFIG_CMA is enabled. That function has some overhead, so removing
the call is preferable.

To remove it, we move set_freepage_migratetype() to __rmqueue_fallback()
and __rmqueue_smallest(). In those functions we already know the
migratetype, so we don't need to call get_pageblock_migratetype().

Removing the is_migrate_isolate() check is safe, since what we want to
ensure is that a page from CMA will not go to another migratetype's
freelist.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1489c301..4913829 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -903,6 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
+		set_freepage_migratetype(page, migratetype);
 		return page;
 	}
 
@@ -1093,8 +1094,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 
 			/* CMA pages cannot be stolen */
 			if (is_migrate_cma(migratetype)) {
+				set_freepage_migratetype(page, migratetype);
 				__mod_zone_page_state(zone,
 					NR_FREE_CMA_PAGES, -(1 << order));
+			} else {
+				set_freepage_migratetype(page,
+							start_migratetype);
 			}
 
 			/* Remove the page from the freelists */
@@ -1153,7 +1158,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, int cold)
 {
-	int mt = migratetype, i;
+	int i;
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
@@ -1174,12 +1179,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		if (IS_ENABLED(CONFIG_CMA)) {
-			mt = get_pageblock_migratetype(page);
-			if (!is_migrate_cma(mt) && !is_migrate_isolate(mt))
-				mt = migratetype;
-		}
-		set_freepage_migratetype(page, mt);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
-- 
1.7.9.5



* [PATCH 4/7] mm/isolation: remove invalid check condition
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (2 preceding siblings ...)
  2014-01-09  7:04 ` [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  7:04 ` [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Joonsoo Kim
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

test_pages_isolated() checks the stability of pages. It checks two
conditions: one is that the page has the isolate migratetype, and the
other is that the page is in the buddy allocator on the isolate
freelist. When both conditions are satisfied, we can determine that
the page is stable and go forward.

__test_page_isolated_in_pageblock() is one of the main functions for
this test. In that function, if it meets a page with page_count 0 and
the isolate migratetype, it decides that this page is stable. But this
is not true, because such a page may be sitting on a pcp list, and then
it can be allocated by other users even though we hold the zone lock.
So remove this check.
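
As a sketch of the problematic window (pcp lists are protected only by
disabled interrupts on the local CPU, not by zone->lock):

	/*
	 * CPU A (isolation test)          CPU B (allocation)
	 *
	 * spin_lock_irqsave(&zone->lock)
	 * sees page_count(page) == 0 and
	 * freepage migratetype ==
	 * MIGRATE_ISOLATE, so the page
	 * is judged "stable"
	 *                                 buffered_rmqueue()
	 *                                   takes the same page from its
	 *                                   own pcp list, without ever
	 *                                   touching zone->lock
	 * spin_unlock_irqrestore(&zone->lock)
	 */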

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..534fb3a 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,9 +199,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			}
 			pfn += 1 << page_order(page);
 		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
 		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy
-- 
1.7.9.5



* [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (3 preceding siblings ...)
  2014-01-09  7:04 ` [PATCH 4/7] mm/isolation: remove invalid check condition Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  9:18   ` Michal Nazarewicz
  2014-01-09  7:04 ` [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Joonsoo Kim
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

Currently, we use (set/get)_freepage_migratetype for two use cases.
One is to know the buddy list where a page will be linked, and the
other is to know the buddy list where a page is linked now.

But we should handle these two use cases differently, because the
stored information isn't sufficient for the second one, and keeping it
accurate has some overhead. Whenever a page is merged or split in the
buddy allocator, this information isn't properly re-assigned, so it may
be stale for the second use case.

This patch just separates the interfaces, so there is no functional
change. A following patch will take further steps on this issue.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3552717..2733e0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -257,14 +257,31 @@ struct inode;
 #define page_private(page)		((page)->private)
 #define set_page_private(page, v)	((page)->private = (v))
 
-/* It's valid only if the page is free path or free_list */
-static inline void set_freepage_migratetype(struct page *page, int migratetype)
+/*
+ * It's valid only if the page is on buddy. It represents
+ * which freelist the page is linked.
+ */
+static inline void set_buddy_migratetype(struct page *page, int migratetype)
+{
+	page->index = migratetype;
+}
+
+static inline int get_buddy_migratetype(struct page *page)
+{
+	return page->index;
+}
+
+/*
+ * It's valid only if the page is on pcp list. It represents
+ * which freelist the page should go on buddy.
+ */
+static inline void set_pcp_migratetype(struct page *page, int migratetype)
 {
 	page->index = migratetype;
 }
 
-/* It's valid only if the page is free path or free_list */
-static inline int get_freepage_migratetype(struct page *page)
+/* It's valid only if the page is on pcp list */
+static inline int get_pcp_migratetype(struct page *page)
 {
 	return page->index;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4913829..c9e6622 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -681,7 +681,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
-			mt = get_freepage_migratetype(page);
+			mt = get_pcp_migratetype(page);
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
@@ -745,7 +745,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
 	migratetype = get_pageblock_migratetype(page);
-	set_freepage_migratetype(page, migratetype);
+	set_buddy_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, order, migratetype);
 	local_irq_restore(flags);
 }
@@ -903,7 +903,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		rmv_page_order(page);
 		area->nr_free--;
 		expand(zone, page, order, current_order, area, migratetype);
-		set_freepage_migratetype(page, migratetype);
+		set_pcp_migratetype(page, migratetype);
 		return page;
 	}
 
@@ -971,7 +971,7 @@ int move_freepages(struct zone *zone,
 		order = page_order(page);
 		list_move(&page->lru,
 			  &zone->free_area[order].free_list[migratetype]);
-		set_freepage_migratetype(page, migratetype);
+		set_buddy_migratetype(page, migratetype);
 		page += 1 << order;
 		pages_moved += 1 << order;
 	}
@@ -1094,12 +1094,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 
 			/* CMA pages cannot be stolen */
 			if (is_migrate_cma(migratetype)) {
-				set_freepage_migratetype(page, migratetype);
+				set_pcp_migratetype(page, migratetype);
 				__mod_zone_page_state(zone,
 					NR_FREE_CMA_PAGES, -(1 << order));
 			} else {
-				set_freepage_migratetype(page,
-							start_migratetype);
+				set_pcp_migratetype(page, start_migratetype);
 			}
 
 			/* Remove the page from the freelists */
@@ -1346,7 +1345,7 @@ void free_hot_cold_page(struct page *page, int cold)
 		return;
 
 	migratetype = get_pageblock_migratetype(page);
-	set_freepage_migratetype(page, migratetype);
+	set_pcp_migratetype(page, migratetype);
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 534fb3a..c341413 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -190,7 +190,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * is MIGRATE_ISOLATE. Catch it and move the page into
 			 * MIGRATE_ISOLATE list.
 			 */
-			if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) {
+			if (get_buddy_migratetype(page) != MIGRATE_ISOLATE) {
 				struct page *end_page;
 
 				end_page = page + (1 << page_order(page)) - 1;
-- 
1.7.9.5



* [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (4 preceding siblings ...)
  2014-01-09  7:04 ` [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  9:19   ` Michal Nazarewicz
  2014-01-09  7:04 ` [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Joonsoo Kim
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

To maintain freelist migratetype information on buddy pages, the
migratetype should be set again whenever the page order changes.
set_page_order() is the best place to do this, because it is called
whenever the page order changes, so this patch adds
set_buddy_migratetype() to set_page_order().

This patch also enables set/get_buddy_migratetype() only when it is
really needed (CONFIG_MEMORY_ISOLATION or CONFIG_CMA), because it has
some overhead.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2733e0b..046e09f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -258,6 +258,12 @@ struct inode;
 #define set_page_private(page, v)	((page)->private = (v))
 
 /*
+ * This is for tracking the type of the list on buddy.
+ * It imposes some performance overhead to the buddy allocator,
+ * so we make it enabled only if it is needed.
+ */
+#if defined(CONFIG_MEMORY_ISOLATION) || defined(CONFIG_CMA)
+/*
  * It's valid only if the page is on buddy. It represents
  * which freelist the page is linked.
  */
@@ -270,6 +276,10 @@ static inline int get_buddy_migratetype(struct page *page)
 {
 	return page->index;
 }
+#else
+static inline void set_buddy_migratetype(struct page *page, int migratetype) {}
+static inline int get_buddy_migratetype(struct page *page) { return 0; }
+#endif
 
 /*
  * It's valid only if the page is on pcp list. It represents
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9e6622..2548b42 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -446,9 +446,11 @@ static inline void set_page_guard_flag(struct page *page) { }
 static inline void clear_page_guard_flag(struct page *page) { }
 #endif
 
-static inline void set_page_order(struct page *page, int order)
+static inline void set_page_order(struct page *page, int order,
+						int migratetype)
 {
 	set_page_private(page, order);
+	set_buddy_migratetype(page, migratetype);
 	__SetPageBuddy(page);
 }
 
@@ -588,7 +590,7 @@ static inline void __free_one_page(struct page *page,
 		page_idx = combined_idx;
 		order++;
 	}
-	set_page_order(page, order);
+	set_page_order(page, order, migratetype);
 
 	/*
 	 * If this is not the largest possible page, check if the buddy
@@ -745,7 +747,6 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
 	migratetype = get_pageblock_migratetype(page);
-	set_buddy_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, order, migratetype);
 	local_irq_restore(flags);
 }
@@ -834,7 +835,7 @@ static inline void expand(struct zone *zone, struct page *page,
 #endif
 		list_add(&page[size].lru, &area->free_list[migratetype]);
 		area->nr_free++;
-		set_page_order(&page[size], high);
+		set_page_order(&page[size], high, migratetype);
 	}
 }
 
-- 
1.7.9.5



* [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (5 preceding siblings ...)
  2014-01-09  7:04 ` [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Joonsoo Kim
@ 2014-01-09  7:04 ` Joonsoo Kim
  2014-01-09  9:22   ` Michal Nazarewicz
  2014-01-09  9:06 ` [PATCH 0/7] improve robustness on handling migratetype Michal Nazarewicz
  2014-01-09  9:27 ` Mel Gorman
  8 siblings, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09  7:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim, Joonsoo Kim

If (MAX_ORDER-1) is greater than pageblock_order, it is possible for
pages of different migratetypes to be merged and linked into an
unintended freelist. For example, with pageblock_order == 9 and
MAX_ORDER == 11, two adjacent order-9 pageblocks of different
migratetypes can merge into one order-10 page that lands on a single
freelist.

While testing CMA, CMA pages were merged and linked into the MOVABLE
freelist by the above issue, and then the pages changed their
migratetype to UNMOVABLE via try_to_steal_freepages(). After that, CMA
allocation from this region always fails.

To prevent this, we should not merge pages on the MIGRATE_(CMA|ISOLATE)
freelists.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2548b42..ea99cee 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page,
 			__mod_zone_freepage_state(zone, 1 << order,
 						  migratetype);
 		} else {
+			int buddy_mt = get_buddy_migratetype(buddy);
+
+			/* We don't want to merge cma, isolate pages */
+			if (unlikely(order >= pageblock_order) &&
+				migratetype != buddy_mt &&
+				(migratetype >= MIGRATE_PCPTYPES ||
+				buddy_mt >= MIGRATE_PCPTYPES)) {
+				break;
+			}
 			list_del(&buddy->lru);
 			zone->free_area[order].nr_free--;
 			rmv_page_order(buddy);
-- 
1.7.9.5



* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (6 preceding siblings ...)
  2014-01-09  7:04 ` [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Joonsoo Kim
@ 2014-01-09  9:06 ` Michal Nazarewicz
  2014-01-09 14:05   ` Joonsoo Kim
  2014-01-09  9:27 ` Mel Gorman
  8 siblings, 1 reply; 24+ messages in thread
From: Michal Nazarewicz @ 2014-01-09  9:06 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Andi Kleen, Wei Yongjun, Tang Chen, linux-mm, linux-kernel,
	Joonsoo Kim, Joonsoo Kim


On Thu, Jan 09 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> Third, there is a problem in the buddy allocator. It doesn't consider
> migratetype when merging buddies, so pages from a CMA or isolated region
> can be moved to another migratetype's freelist. This makes CMA fail over
> and over. To prevent it, the buddy allocator should consider migratetype
> if CMA/ISOLATE is enabled.

There should never be a situation where a CMA page shares a pageblock
(or a max-order page) with a non-CMA page, though, so this should never
be an issue.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


* Re: [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock
  2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
@ 2014-01-09  9:08   ` Michal Nazarewicz
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Nazarewicz @ 2014-01-09  9:08 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Andi Kleen, Wei Yongjun, Tang Chen, linux-mm, linux-kernel,
	Joonsoo Kim, Joonsoo Kim


On Thu, Jan 09 2014, Joonsoo Kim wrote:
> @@ -5927,15 +5928,19 @@ unsigned long get_pageblock_flags_group(struct page *page,
>  	unsigned long pfn, bitidx;
>  	unsigned long flags = 0;
>  	unsigned long value = 1;
> +	unsigned int seq;
>  
>  	zone = page_zone(page);
>  	pfn = page_to_pfn(page);
>  	bitmap = get_pageblock_bitmap(zone, pfn);
>  	bitidx = pfn_to_bitidx(zone, pfn);
>  
> -	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
> -		if (test_bit(bitidx + start_bitidx, bitmap))
> -			flags |= value;
> +	do {

+		flags = 0;

> +		seq = read_seqbegin(&zone->pageblock_seqlock);
> +		for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
> +			if (test_bit(bitidx + start_bitidx, bitmap))
> +				flags |= value;
> +	} while (read_seqretry(&zone->pageblock_seqlock, seq));
>  
>  	return flags;
>  }

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


* Re: [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage
  2014-01-09  7:04 ` [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Joonsoo Kim
@ 2014-01-09  9:18   ` Michal Nazarewicz
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Nazarewicz @ 2014-01-09  9:18 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Andi Kleen, Wei Yongjun, Tang Chen, linux-mm, linux-kernel,
	Joonsoo Kim, Joonsoo Kim


On Thu, Jan 09 2014, Joonsoo Kim wrote:
> Currently, we use (set/get)_freepage_migratetype for two use cases.
> One is to know the buddy list where a page will be linked, and the
> other is to know the buddy list where a page is linked now.
>
> But we should handle these two use cases differently, because the
> stored information isn't sufficient for the second one, and keeping it
> accurate has some overhead. Whenever a page is merged or split in the
> buddy allocator, this information isn't properly re-assigned, so it may
> be stale for the second use case.
>
> This patch just separates the interfaces, so there is no functional
> change. A following patch will take further steps on this issue.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

I think this patch would be smaller if it was pushed earlier in the
patchset.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


* Re: [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly
  2014-01-09  7:04 ` [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Joonsoo Kim
@ 2014-01-09  9:19   ` Michal Nazarewicz
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Nazarewicz @ 2014-01-09  9:19 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Andi Kleen, Wei Yongjun, Tang Chen, linux-mm, linux-kernel,
	Joonsoo Kim, Joonsoo Kim


On Thu, Jan 09 2014, Joonsoo Kim wrote:
> To maintain freelist migratetype information on buddy pages, the
> migratetype should be set again whenever the page order changes.
> set_page_order() is the best place to do this, because it is called
> whenever the page order changes, so this patch adds
> set_buddy_migratetype() to set_page_order().
>
> This patch also enables set/get_buddy_migratetype() only when it is
> really needed (CONFIG_MEMORY_ISOLATION or CONFIG_CMA), because it has
> some overhead.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


* Re: [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
  2014-01-09  7:04 ` [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Joonsoo Kim
@ 2014-01-09  9:22   ` Michal Nazarewicz
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Nazarewicz @ 2014-01-09  9:22 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Andi Kleen, Wei Yongjun, Tang Chen, linux-mm, linux-kernel,
	Joonsoo Kim, Joonsoo Kim


On Thu, Jan 09 2014, Joonsoo Kim wrote:
> If (MAX_ORDER-1) is greater than pageblock_order, it is possible for
> pages of different migratetypes to be merged and linked into an
> unintended freelist. For example, with pageblock_order == 9 and
> MAX_ORDER == 11, two adjacent order-9 pageblocks of different
> migratetypes can merge into one order-10 page that lands on a single
> freelist.
>
> While testing CMA, CMA pages were merged and linked into the MOVABLE
> freelist by the above issue, and then the pages changed their
> migratetype to UNMOVABLE via try_to_steal_freepages(). After that, CMA
> allocation from this region always fails.
>
> To prevent this, we should not merge pages on the MIGRATE_(CMA|ISOLATE)
> freelists.

This is strange.  CMA regions are always multiples of max-order pages
(or pageblocks, whichever is larger), so MOVABLE free pages should
never be inside a CMA region.

If what you're describing happens, it looks like an issue somewhere
else.
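
(For reference, this invariant comes from the CMA reservation path; in
v3.13, dma_contiguous_reserve_area() in drivers/base/dma-contiguous.c
sanitizes the requested region roughly like this:

	/* Align the region to the larger of a MAX_ORDER page and a
	 * pageblock, so that a pageblock / max-order buddy never mixes
	 * CMA and non-CMA memory. */
	alignment = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
	base = ALIGN(base, alignment);
	size = ALIGN(size, alignment);

so a correctly reserved region cannot share a max-order buddy with
non-CMA pages.)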

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2548b42..ea99cee 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -581,6 +581,15 @@ static inline void __free_one_page(struct page *page,
>  			__mod_zone_freepage_state(zone, 1 << order,
>  						  migratetype);
>  		} else {
> +			int buddy_mt = get_buddy_migratetype(buddy);
> +
> +			/* We don't want to merge cma, isolate pages */
> +			if (unlikely(order >= pageblock_order) &&
> +				migratetype != buddy_mt &&
> +				(migratetype >= MIGRATE_PCPTYPES ||
> +				buddy_mt >= MIGRATE_PCPTYPES)) {
> +				break;
> +			}
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
>  			rmv_page_order(buddy);
> -- 
> 1.7.9.5
>

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--


* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
                   ` (7 preceding siblings ...)
  2014-01-09  9:06 ` [PATCH 0/7] improve robustness on handling migratetype Michal Nazarewicz
@ 2014-01-09  9:27 ` Mel Gorman
  2014-01-10  8:48   ` Joonsoo Kim
  8 siblings, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2014-01-09  9:27 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim

On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> Hello,
> 
> I found some weaknesses in migratetype handling during code review and
> while testing CMA.
>
> First, we don't have any synchronization on getting/setting the pageblock
> migratetype. When we change a migratetype, we hold the zone lock, so a
> writer-writer race doesn't exist. But while someone changes a migratetype,
> others can read it concurrently and may observe a totally unintended
> value. Although I haven't heard of any problem report about this, it is
> better to protect it properly.
> 

This is deliberate. The migratetypes are, for the majority of users,
advisory and aimed at fragmentation avoidance. It was important that
their cost be kept as low as possible, and the general case is that
migration types change very rarely. In many cases the zone lock is
held. In other cases, such as splitting free pages, the cost is simply
not justified.

I doubt there is any amount of data you could add in support that would
justify hammering the free fast paths (which call
get_pageblock_migratetype()).

> Second, (get/set)_freepage_migratetype isn't used properly. I guess it
> was introduced for per-cpu page (pcp) performance, but it is now also
> used by memory isolation. For that case, the stored information isn't
> sufficient, so we need to fix it.
>
> Third, there is a problem in the buddy allocator. It doesn't consider
> migratetype when merging buddies, so pages from a CMA or isolated region
> can be moved to another migratetype's freelist. This makes CMA fail over
> and over. To prevent it, the buddy allocator should consider migratetype
> if CMA/ISOLATE is enabled.

Without looking at the patches, this is likely to add some cost to the
page free fast path -- a heavy cost if it's a pageblock lookup and a
lighter cost if you are using cached page information that is
potentially stale. Why not force CMA regions to be aligned on a
MAX_ORDER_NR_PAGES boundary instead, to avoid any possibility of
merging issues?

> This patchset is aimed at fixing these problems and based on v3.13-rc7.
> 
>   mm/page_alloc: synchronize get/set pageblock

cost with no justification.

>   mm/cma: fix cma free page accounting

sounds like it would be a fix, but it is unrelated to the leader and
should be separated out on its own

>   mm/page_alloc: move set_freepage_migratetype() to better place

Very vague. If this does something useful then it could do with a better
subject.

>   mm/isolation: remove invalid check condition

Looks harmless.

>   mm/page_alloc: separate interface to set/get migratetype of freepage
>   mm/page_alloc: store freelist migratetype to the page on buddy
>     properly

Potentially sounds useful

>   mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy
> 

Sounds unnecessary if CMA regions were MAX_ORDER_NR_PAGES aligned;
then the free paths would be unaffected for everybody.

I didn't look at the patches because it felt like cost without any
supporting justification. Superficially it looks like patch 1 needs to
go away and the last patch could be done without affecting !CMA users.
The rest are potentially useful, but there should have been some
supporting data on how this helps CMA, with some backup showing that
the page allocation paths are not impacted as a result.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-09  9:06 ` [PATCH 0/7] improve robustness on handling migratetype Michal Nazarewicz
@ 2014-01-09 14:05   ` Joonsoo Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-09 14:05 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Jiang Liu, Mel Gorman, Cody P Schafer, Johannes Weiner,
	Michal Hocko, Minchan Kim, Andi Kleen, Wei Yongjun, Tang Chen,
	Linux Memory Management List, LKML

2014/1/9 Michal Nazarewicz <mina86@mina86.com>:
> On Thu, Jan 09 2014, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> Third, there is a problem in the buddy allocator. It doesn't consider
>> migratetype when merging buddies, so pages from a CMA or isolated region
>> can be moved to another migratetype's freelist. This makes CMA fail over
>> and over. To prevent it, the buddy allocator should consider migratetype
>> if CMA/ISOLATE is enabled.
>
> There should never be a situation where a CMA page shares a pageblock
> (or a max-order page) with a non-CMA page, though, so this should never
> be an issue.

Right... it never happens.
When I ported the CMA region reservation code to my own test code,
I made a mistake. Sorry for the noise.

Thanks.


* Re: [PATCH 2/7] mm/cma: fix cma free page accounting
  2014-01-09  7:04 ` [PATCH 2/7] mm/cma: fix cma free page accounting Joonsoo Kim
@ 2014-01-09 21:10   ` Laura Abbott
  2014-01-10  8:50     ` Joonsoo Kim
  0 siblings, 1 reply; 24+ messages in thread
From: Laura Abbott @ 2014-01-09 21:10 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Jiang Liu, Mel Gorman,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel, Joonsoo Kim

On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> CMA pages can be allocated not only by order-0 requests but also by
> high-order requests. So we should account free CMA pages in both
> places.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b36aa5a..1489c301 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>   							  start_migratetype,
>   							  migratetype);
>
> +			/* CMA pages cannot be stolen */
> +			if (is_migrate_cma(migratetype)) {
> +				__mod_zone_page_state(zone,
> +					NR_FREE_CMA_PAGES, -(1 << order));
> +			}
> +
>   			/* Remove the page from the freelists */
>   			list_del(&page->lru);
>   			rmv_page_order(page);
> @@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>   		}
>   		set_freepage_migratetype(page, mt);
>   		list = &page->lru;
> -		if (is_migrate_cma(mt))
> -			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> -					      -(1 << order));
>   	}
>   	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
>   	spin_unlock(&zone->lock);
>

Wouldn't this result in double counting? In the buffered_rmqueue()
non-zero-order path we call __mod_zone_freepage_state(), which already
accounts for CMA pages if the migratetype is CMA, so it seems like we
would get hit twice:

buffered_rmqueue
    __rmqueue
        __rmqueue_fallback
            decrement
    __mod_zone_freepage_state
       decrement
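
For reference, __mod_zone_freepage_state() (include/linux/vmstat.h as
of v3.13) already does, roughly:

	static inline void __mod_zone_freepage_state(struct zone *zone,
					int nr_pages, int migratetype)
	{
		__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
		/* free CMA pages are tracked separately as well */
		if (is_migrate_cma(migratetype))
			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
					      nr_pages);
	}

so also decrementing NR_FREE_CMA_PAGES in __rmqueue_fallback() counts
the same pages twice.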

Thanks,
Laura
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation


* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-09  9:27 ` Mel Gorman
@ 2014-01-10  8:48   ` Joonsoo Kim
  2014-01-10  9:48     ` Mel Gorman
  2014-01-29 16:52     ` Vlastimil Babka
  0 siblings, 2 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-10  8:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel

On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > Hello,
> > 
> > I found some weaknesses in migratetype handling during code review and
> > while testing CMA.
> >
> > First, we don't have any synchronization on getting/setting the pageblock
> > migratetype. When we change a migratetype, we hold the zone lock, so a
> > writer-writer race doesn't exist. But while someone changes a migratetype,
> > others can read it concurrently and may observe a totally unintended
> > value. Although I haven't heard of any problem report about this, it is
> > better to protect it properly.
> > 
> 
> This is deliberate. The migratetypes are, for the majority of users,
> advisory and aimed at fragmentation avoidance. It was important that
> their cost be kept as low as possible, and the general case is that
> migration types change very rarely. In many cases the zone lock is
> held. In other cases, such as splitting free pages, the cost is simply
> not justified.
>
> I doubt there is any amount of data you could add in support that would
> justify hammering the free fast paths (which call
> get_pageblock_migratetype()).

Hello, Mel.

There is a possibility that we can get an unintended value such as 6
as the migratetype if a reader-writer race on get/set
pageblock_migratetype happens. It is possible because we read the
value without any synchronization. And migratetype 6 has no
corresponding buddy freelist, so an array index overrun is possible
and the system can break, although I haven't heard of it actually
occurring.
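
For illustration, a minimal user-space sketch (not kernel code) of how
reading the three migratetype bits one at a time, concurrently with a
bit-by-bit write, can observe a value that was never stored:

	#include <stdio.h>

	/* The 3 migratetype bits live in a shared bitmap and are
	 * read/written one test_bit()/__set_bit() at a time. */
	static int bit[3];

	static void store_mt(int mt)
	{
		for (int i = 0; i < 3; i++)
			bit[i] = (mt >> i) & 1;
	}

	static int load_mt(void)
	{
		int mt = 0;

		for (int i = 0; i < 3; i++)
			mt |= bit[i] << i;
		return mt;
	}

	int main(void)
	{
		store_mt(5);	/* MIGRATE_ISOLATE = 0b101 */

		/* A writer starts changing the block to MIGRATE_MOVABLE
		 * (0b010), but a reader runs after only bits 0 and 1
		 * were updated: */
		bit[0] = 0;
		bit[1] = 1;

		printf("torn read sees %d\n", load_mt());	/* 6 */
		return 0;
	}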

I admit that my solution is too expensive. However, I think we need
some solution, don't we? Do you have any better idea?

> 
> > Second, (get/set)_freepage_migrate isn't used properly. I guess that it
> > would be introduced for per cpu page(pcp) performance, but, it is also
> > used by memory isolation, now. For that case, the information isn't
> > enough to use, so we need to fix it.
> > 
> > Third, there is the problem on buddy allocator. It doesn't consider
> > migratetype when merging buddy, so pages from cma or isolate region can
> > be moved to other migratetype freelist. It makes CMA failed over and over.
> > To prevent it, the buddy allocator should consider migratetype if
> > CMA/ISOLATE is enabled.
> 
> Without looking at the patches, this is likely to add some cost to the
> page free fast path -- a heavy cost if it's a pageblock lookup and a
> lighter cost if you are using cached page information that is
> potentially stale. Why not force CMA regions to be aligned on a
> MAX_ORDER_NR_PAGES boundary instead, to avoid any possibility of
> merging issues?
> 

That was my mistake. The CMA region is aligned to MAX_ORDER_NR_PAGES, so
it can't happen. Sorry for the noise.

> > This patchset is aimed at fixing these problems and based on v3.13-rc7.
> > 
> >   mm/page_alloc: synchronize get/set pageblock
> 
> cost with no justification.
> 
> >   mm/cma: fix cma free page accounting
> 
> sounds like it would be a fix, but it is unrelated to the leader and
> should be separated out on its own

Yes, it is not related to this topic, and it is an incorrect patch as
Laura pointed out, so I will drop it.

> >   mm/page_alloc: move set_freepage_migratetype() to better place
> 
> Very vague. If this does something useful then it could do with a better
> subject.

Okay.

> >   mm/isolation: remove invalid check condition
> 
> Looks harmless.
> 
> >   mm/page_alloc: separate interface to set/get migratetype of freepage
> >   mm/page_alloc: store freelist migratetype to the page on buddy
> >     properly
> 
> Potentially sounds useful
> 

I made these two patches for the last patch, to reduce its performance
impact. If the last patch is dropped, it is better to remove the last
callsite that uses the freepage migratetype to determine the buddy
freelist type. I will respin.

Thanks.


* Re: [PATCH 2/7] mm/cma: fix cma free page accounting
  2014-01-09 21:10   ` Laura Abbott
@ 2014-01-10  8:50     ` Joonsoo Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-10  8:50 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Mel Gorman, Cody P Schafer, Johannes Weiner, Michal Hocko,
	Minchan Kim, Michal Nazarewicz, Andi Kleen, Wei Yongjun,
	Tang Chen, linux-mm, linux-kernel

On Thu, Jan 09, 2014 at 01:10:29PM -0800, Laura Abbott wrote:
> On 1/8/2014 11:04 PM, Joonsoo Kim wrote:
> >CMA pages can be allocated not only by order-0 requests but also by
> >high-order requests. So we should account free CMA pages in both
> >places.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index b36aa5a..1489c301 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -1091,6 +1091,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
> >  							  start_migratetype,
> >  							  migratetype);
> >
> >+			/* CMA pages cannot be stolen */
> >+			if (is_migrate_cma(migratetype)) {
> >+				__mod_zone_page_state(zone,
> >+					NR_FREE_CMA_PAGES, -(1 << order));
> >+			}
> >+
> >  			/* Remove the page from the freelists */
> >  			list_del(&page->lru);
> >  			rmv_page_order(page);
> >@@ -1175,9 +1181,6 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
> >  		}
> >  		set_freepage_migratetype(page, mt);
> >  		list = &page->lru;
> >-		if (is_migrate_cma(mt))
> >-			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
> >-					      -(1 << order));
> >  	}
> >  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> >  	spin_unlock(&zone->lock);
> >
> 
> Wouldn't this result in double counting? In the buffered_rmqueue()
> non-zero-order path we call __mod_zone_freepage_state(), which already
> accounts for CMA pages if the migratetype is CMA, so it seems like we
> would get hit twice:
> 
> buffered_rmqueue
>    __rmqueue
>        __rmqueue_fallback
>            decrement
>    __mod_zone_freepage_state
>       decrement
> 

Hello, Laura.

You are right. I missed it. I will drop this patch.

Thanks.


* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-10  8:48   ` Joonsoo Kim
@ 2014-01-10  9:48     ` Mel Gorman
  2014-01-13  1:57       ` Joonsoo Kim
  2014-01-29 16:52     ` Vlastimil Babka
  1 sibling, 1 reply; 24+ messages in thread
From: Mel Gorman @ 2014-01-10  9:48 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel

On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote:
> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > > Hello,
> > > 
> > > I found some weaknesses in migratetype handling during code review and
> > > while testing CMA.
> > >
> > > First, we don't have any synchronization on getting/setting the pageblock
> > > migratetype. When we change a migratetype, we hold the zone lock, so a
> > > writer-writer race doesn't exist. But while someone changes a migratetype,
> > > others can read it concurrently and may observe a totally unintended
> > > value. Although I haven't heard of any problem report about this, it is
> > > better to protect it properly.
> > > 
> > 
> > This is deliberate. The migratetypes are, for the majority of users,
> > advisory and aimed at fragmentation avoidance. It was important that
> > their cost be kept as low as possible, and the general case is that
> > migration types change very rarely. In many cases the zone lock is
> > held. In other cases, such as splitting free pages, the cost is simply
> > not justified.
> >
> > I doubt there is any amount of data you could add in support that would
> > justify hammering the free fast paths (which call
> > get_pageblock_migratetype()).
> 
> Hello, Mel.
> 
> > There is a possibility that we can get an unintended value such as 6
> > as the migratetype if a reader-writer race on get/set
> > pageblock_migratetype happens. It is possible because we read the
> > value without any synchronization. And migratetype 6 has no
> > corresponding buddy freelist, so an array index overrun is possible
> > and the system can break, although I haven't heard of it actually
> > occurring.
> >
> > I admit that my solution is too expensive. However, I think we need
> > some solution, don't we? Do you have any better idea?
> 

It's not something I have ever heard of or seen occurring, but if
you've identified that it's a real possibility, then split
get_pageblock_migratetype() into locked and unlocked versions. Ensure
that calls to set_pageblock_migratetype() are always made under
zone->lock, and use the locked get_pageblock_migratetype() where
zone->lock is held, both of which should already be true in the
majority of cases. Use the unlocked version otherwise, but instead of
synchronizing, check whether it returns >= MIGRATE_TYPES and return
MIGRATE_MOVABLE in the unlikely event of a race. This will avoid
harming the fast paths for the majority of users and limit the damage
if a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE.
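
A minimal sketch of what that could look like (hypothetical helpers
following the description above, not an existing API):

	/* Fast path: may race with set_pageblock_migratetype();
	 * clamp any torn value so free_list[] is never over-indexed. */
	static inline int get_pageblock_migratetype_unlocked(struct page *page)
	{
		int mt = get_pageblock_flags_group(page, PB_migrate,
						   PB_migrate_end);

		if (unlikely(mt >= MIGRATE_TYPES))
			mt = MIGRATE_MOVABLE;
		return mt;
	}

	/* Slow path: caller holds zone->lock, so the value is stable. */
	static inline int get_pageblock_migratetype_locked(struct page *page)
	{
		return get_pageblock_flags_group(page, PB_migrate,
						 PB_migrate_end);
	}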

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-10  9:48     ` Mel Gorman
@ 2014-01-13  1:57       ` Joonsoo Kim
  0 siblings, 0 replies; 24+ messages in thread
From: Joonsoo Kim @ 2014-01-13  1:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel

On Fri, Jan 10, 2014 at 09:48:34AM +0000, Mel Gorman wrote:
> On Fri, Jan 10, 2014 at 05:48:55PM +0900, Joonsoo Kim wrote:
> > On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> > > On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> > > > Hello,
> > > > 
> > > > I found some weaknesses in migratetype handling during code review and
> > > > while testing CMA.
> > > >
> > > > First, we don't have any synchronization on getting/setting the pageblock
> > > > migratetype. When we change a migratetype, we hold the zone lock, so a
> > > > writer-writer race doesn't exist. But while someone changes a migratetype,
> > > > others can read it concurrently and may observe a totally unintended
> > > > value. Although I haven't heard of any problem report about this, it is
> > > > better to protect it properly.
> > > > 
> > > 
> > > This is deliberate. The migratetypes are, for the majority of users,
> > > advisory and aimed at fragmentation avoidance. It was important that
> > > their cost be kept as low as possible, and the general case is that
> > > migration types change very rarely. In many cases the zone lock is
> > > held. In other cases, such as splitting free pages, the cost is simply
> > > not justified.
> > >
> > > I doubt there is any amount of data you could add in support that would
> > > justify hammering the free fast paths (which call
> > > get_pageblock_migratetype()).
> > 
> > Hello, Mel.
> > 
> > There is a possibility that we can get an unintended value such as 6
> > as the migratetype if a reader-writer race on get/set
> > pageblock_migratetype happens. It is possible because we read the
> > value without any synchronization. And migratetype 6 has no
> > corresponding buddy freelist, so an array index overrun is possible
> > and the system can break, although I haven't heard of it actually
> > occurring.
> >
> > I admit that my solution is too expensive. However, I think we need
> > some solution, don't we? Do you have any better idea?
> > 
> 
> It's not something I have ever heard of or seen occurring, but if
> you've identified that it's a real possibility, then split
> get_pageblock_migratetype() into locked and unlocked versions. Ensure
> that calls to set_pageblock_migratetype() are always made under
> zone->lock, and use the locked get_pageblock_migratetype() where
> zone->lock is held, both of which should already be true in the
> majority of cases. Use the unlocked version otherwise, but instead of
> synchronizing, check whether it returns >= MIGRATE_TYPES and return
> MIGRATE_MOVABLE in the unlikely event of a race. This will avoid
> harming the fast paths for the majority of users and limit the damage
> if a MIGRATE_CMA region is accidentally treated as MIGRATE_MOVABLE.

Okay.
I will re-investigate it, and if I have identified that it's a real possibility,
I will re-make this patch according to your advice.

Thanks for comment!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-10  8:48   ` Joonsoo Kim
  2014-01-10  9:48     ` Mel Gorman
@ 2014-01-29 16:52     ` Vlastimil Babka
  2014-01-31 15:39       ` Mel Gorman
  2014-02-03  7:45       ` Joonsoo Kim
  1 sibling, 2 replies; 24+ messages in thread
From: Vlastimil Babka @ 2014-01-29 16:52 UTC (permalink / raw)
  To: Joonsoo Kim, Mel Gorman
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Jiang Liu,
	Cody P Schafer, Johannes Weiner, Michal Hocko, Minchan Kim,
	Michal Nazarewicz, Andi Kleen, Wei Yongjun, Tang Chen, linux-mm,
	linux-kernel

On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>> Hello,
>>>
>>> I found some weaknesses on handling migratetype during code review and
>>> testing CMA.
>>>
>>> First, we don't have any synchronization method on get/set pageblock
>>> migratetype. When we change migratetype, we hold the zone lock. So
>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>> others can get migratetype. This may introduce totally unintended value
>>> as migratetype. Although I haven't heard of any problem report about
>>> that, it is better to protect properly.
>>>
>>
>> This is deliberate. The migratetypes for the majority of users are advisory
>> and aimed for fragmentation avoidance. It was important that the cost of
>> that be kept as low as possible and the general case is that migration types
>> change very rarely. In many cases, the zone lock is held. In other cases,
>> such as splitting free pages, the cost is simply not justified.
>>
>> I doubt there is any amount of data you could add in support that would
>> justify hammering the free fast paths (which call get_pageblock_migratetype).
>
> Hello, Mel.
>
> There is a possibility that we can get an unintended value such as 6 as the
> migratetype if a reader-writer (get/set pageblock_migratetype) race happens.
> This is possible because we read the value without any synchronization. And
> this migratetype, 6, has no place in the buddy freelist, so an array index
> overrun is possible and the system can break, although I haven't heard that
> it occurs.

Hello,

it seems this can indeed happen. I'm working on memory compaction
improvements, and in a prototype patch I'm basically adding calls to
start_isolate_page_range() and undo_isolate_page_range() in some
functions under compact_zone(). With this I've seen occurrences of
NULL pointers in move_freepages() and free_one_page(), in places
where free_list[migratetype] is manipulated by e.g. list_move(). That
led me to question the value of migratetype and I found this thread.
Adding some debugging in get_pageblock_migratetype() and voila, I get
a value of 6 being read.

So is it just my patch adding a dangerous situation, or does it exist in
mainline as well? Looking at free_one_page(), it takes zone->lock, but
get_pageblock_migratetype() is called by its callers
(free_hot_cold_page() or __free_pages_ok()) outside of the lock. The
migratetype determined there is then used under free_one_page() to
access a free_list.

It seems that this could race with set_pageblock_migratetype() called
from try_to_steal_freepages() (despite the latter being properly
locked). There are also other callers, but those seem to be limited to
initialization and isolation, which should be rare (?). However,
try_to_steal_freepages() can be called repeatedly. So I assume that the
race happens but never manifests as a fatal error as long as only
MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and MIGRATE_MOVABLE values are
used. Only MIGRATE_CMA and MIGRATE_ISOLATE have values with the 0x4 bit
set, and can thus result in invalid values due to non-atomic access.
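
To make the torn read concrete, here is a sketch (assuming the v3.13
values MIGRATE_MOVABLE == 2, MIGRATE_ISOLATE == 5, MIGRATE_TYPES == 6;
the interleaving shown is hypothetical, not a captured trace). The
migratetype bits in zone->pageblock_flags are written and read one bit
at a time:

	writer (zone->lock held):  010 (MOVABLE) -> 101 (ISOLATE),
	                           one set_bit()/clear_bit() at a time
	lockless reader:           can observe any mix of old and new
	                           bits, e.g. 110 == 6

A reader that sees 6 (>= MIGRATE_TYPES) and uses it to index
zone->free_area[order].free_list[] runs off the end of the array, which
would explain the NULL pointers seen above.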

Does that make sense to you and should we thus proceed with patching 
this race?

Vlastimil

> I think that my solution is too expensive. However, I think that we need a
> solution, don't we? Do you have any better idea?
>
>>
>>> Second, (get/set)_freepage_migrate isn't used properly. I guess that it
>>> would be introduced for per cpu page(pcp) performance, but, it is also
>>> used by memory isolation, now. For that case, the information isn't
>>> enough to use, so we need to fix it.
>>>
>>> Third, there is the problem on buddy allocator. It doesn't consider
>>> migratetype when merging buddy, so pages from cma or isolate region can
>>> be moved to other migratetype freelist. It makes CMA failed over and over.
>>> To prevent it, the buddy allocator should consider migratetype if
>>> CMA/ISOLATE is enabled.
>>
>> Without looking at the patches, this is likely to add some cost to the
>> page free fast path -- heavy cost if it's a pageblock lookup and lighter
>> cost if you are using cached page information which is potentially stale.
>> Why not force CMA regions to be aligned on MAX_ORDER_NR_PAGES boundary
>> instead to avoid any possibility of merging issues?
>>
>
> That was my mistake. The CMA region is aligned to MAX_ORDER_NR_PAGES, so it
> can't happen. Sorry for the noise.
>
>>> This patchset is aimed at fixing these problems and based on v3.13-rc7.
>>>
>>>    mm/page_alloc: synchronize get/set pageblock
>>
>> cost with no justification.
>>
>>>    mm/cma: fix cma free page accounting
>>
>> sounds like it would be a fix but unrelated to the leader and should be
>> separated out on its own
>
> Yes, it is not related to this topic and it is wrong patch as Laura
> pointed out, so I will drop it.
>
>>>    mm/page_alloc: move set_freepage_migratetype() to better place
>>
>> Very vague. If this does something useful then it could do with a better
>> subject.
>
> Okay.
>
>>>    mm/isolation: remove invalid check condition
>>
>> Looks harmless.
>>
>>>    mm/page_alloc: separate interface to set/get migratetype of freepage
>>>    mm/page_alloc: store freelist migratetype to the page on buddy
>>>      properly
>>
>> Potentially sounds useful
>>
>
> I made these two patches for the last patch, to reduce its performance
> impact. If the last patch is dropped, it is better to remove the last
> callsite that uses the freelist migratetype to know the buddy freelist
> type. I will respin.
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-29 16:52     ` Vlastimil Babka
@ 2014-01-31 15:39       ` Mel Gorman
  2014-02-03  7:45       ` Joonsoo Kim
  1 sibling, 0 replies; 24+ messages in thread
From: Mel Gorman @ 2014-01-31 15:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Jiang Liu, Cody P Schafer, Johannes Weiner, Michal Hocko,
	Minchan Kim, Michal Nazarewicz, Andi Kleen, Wei Yongjun,
	Tang Chen, linux-mm, linux-kernel

On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> >>>Hello,
> >>>
> >>>I found some weaknesses on handling migratetype during code review and
> >>>testing CMA.
> >>>
> >>>First, we don't have any synchronization method on get/set pageblock
> >>>migratetype. When we change migratetype, we hold the zone lock. So
> >>>writer-writer race doesn't exist. But while someone changes migratetype,
> >>>others can get migratetype. This may introduce totally unintended value
> >>>as migratetype. Although I haven't heard of any problem report about
> >>>that, it is better to protect properly.
> >>>
> >>
> >>This is deliberate. The migratetypes for the majority of users are advisory
> >>and aimed for fragmentation avoidance. It was important that the cost of
> >>that be kept as low as possible and the general case is that migration types
> >>change very rarely. In many cases, the zone lock is held. In other cases,
> >>such as splitting free pages, the cost is simply not justified.
> >>
> >>I doubt there is any amount of data you could add in support that would
> >>justify hammering the free fast paths (which call get_pageblock_migratetype).
> >
> >Hello, Mel.
> >
> >There is a possibility that we can get an unintended value such as 6 as the
> >migratetype if a reader-writer (get/set pageblock_migratetype) race happens.
> >This is possible because we read the value without any synchronization. And
> >this migratetype, 6, has no place in the buddy freelist, so an array index
> >overrun is possible and the system can break, although I haven't heard that
> >it occurs.
> 
> Hello,
> 
> it seems this can indeed happen. I'm working on memory compaction
> improvements, and in a prototype patch I'm basically adding calls to
> start_isolate_page_range() and undo_isolate_page_range() in some
> functions under compact_zone(). With this I've seen occurrences of
> NULL pointers in move_freepages() and free_one_page(), in places
> where free_list[migratetype] is manipulated by e.g. list_move(). That
> led me to question the value of migratetype and I found this thread.
> Adding some debugging in get_pageblock_migratetype() and voila, I
> get a value of 6 being read.
> 
> So is it just my patch adding a dangerous situation, or does it exist in
> mainline as well? Looking at free_one_page(), it takes zone->lock, but
> get_pageblock_migratetype() is called by its callers
> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
> The migratetype determined there is then used under free_one_page()
> to access a free_list.
> 
> It seems that this could race with set_pageblock_migratetype()
> called from try_to_steal_freepages() (despite the latter being
> properly locked). There are also other callers, but those seem to be
> limited to initialization and isolation, which should be rare (?).
> However, try_to_steal_freepages() can be called repeatedly.
> So I assume that the race happens but never manifests as a fatal
> error as long as only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE
> have values with the 0x4 bit set, and can thus result in invalid
> values due to non-atomic access.
> 
> Does that make sense to you and should we thus proceed with patching
> this race?
> 

If you have direct evidence then it is indeed a problem. The key would be
to avoid taking the zone->lock just to stabilise this, and instead modify
when get_pageblock_migratetype is called to make it safe. Looking at the
callers of get_pageblock_migratetype, it would appear that:

1. __free_pages_ok's call to get_pageblock_migratetype can move into
   free_one_page() under the zone lock as long as you also move
   the set_freepage_migratetype call. The migratetype will be read
   twice by the free_hot_cold_page->free_one_page call but that's
   ok because you have established that it is necessary

2. rmqueue_bulk calls under zone->lock

3. free_hot_cold_page cannot take zone->lock to stabilise the
   migratetype read, but if it gets a bad read due to a race it can
   enter the slow path. Force it to call free_one_page() there and
   take the lock in the event of a race, instead of only calling in
   there when is_migrate_isolate() is true (see the sketch after
   this list). Consider adding a debug patch that counts with vmstat
   how often this race occurs and check the value with and without
   the compaction patches you've added

4. It's not obvious but __isolate_free_page should already hold the zone lock

5. buffered_rmqueue, move the call to get_pageblock_migratetype under
   the zone lock. It'll just cost a local variable.

6. A race in setup_zone_migrate_reserve is relatively harmless. Check
   system_state == SYSTEM_BOOTING and take the zone->lock if the system
   is live. Release, resched and reacquire if need_resched()

7. has_unmovable_pages is harmless, the range should be isolated and
   not racing against other updates
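
As a rough sketch of item 3 above (based on the v3.13
free_hot_cold_page() flow; the exact check is an assumption, not a
tested patch):

	/* In free_hot_cold_page(), after the unlocked migratetype read: */
	if (migratetype >= MIGRATE_PCPTYPES) {
		if (unlikely(migratetype >= MIGRATE_TYPES ||
			     is_migrate_isolate(migratetype))) {
			/*
			 * Torn read or isolated pageblock: take the slow
			 * path. With the change from item 1, free_one_page()
			 * re-reads the migratetype under zone->lock.
			 */
			free_one_page(zone, page, 0, migratetype);
			goto out;
		}
		/* CMA/RESERVE pages keep using the movable pcp list. */
		migratetype = MIGRATE_MOVABLE;
	}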

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-01-29 16:52     ` Vlastimil Babka
  2014-01-31 15:39       ` Mel Gorman
@ 2014-02-03  7:45       ` Joonsoo Kim
  2014-02-03  9:16         ` Vlastimil Babka
  1 sibling, 1 reply; 24+ messages in thread
From: Joonsoo Kim @ 2014-02-03  7:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Mel Gorman, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Jiang Liu, Cody P Schafer, Johannes Weiner, Michal Hocko,
	Minchan Kim, Michal Nazarewicz, Andi Kleen, Wei Yongjun,
	Tang Chen, linux-mm, linux-kernel

On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
> >On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
> >>On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
> >>>Hello,
> >>>
> >>>I found some weaknesses on handling migratetype during code review and
> >>>testing CMA.
> >>>
> >>>First, we don't have any synchronization method on get/set pageblock
> >>>migratetype. When we change migratetype, we hold the zone lock. So
> >>>writer-writer race doesn't exist. But while someone changes migratetype,
> >>>others can get migratetype. This may introduce totally unintended value
> >>>as migratetype. Although I haven't heard of any problem report about
> >>>that, it is better to protect properly.
> >>>
> >>
> >>This is deliberate. The migratetypes for the majority of users are advisory
> >>and aimed for fragmentation avoidance. It was important that the cost of
> >>that be kept as low as possible and the general case is that migration types
> >>change very rarely. In many cases, the zone lock is held. In other cases,
> >>such as splitting free pages, the cost is simply not justified.
> >>
> >>I doubt there is any amount of data you could add in support that would
> >>justify hammering the free fast paths (which call get_pageblock_migratetype).
> >
> >Hello, Mel.
> >
> >There is a possibility that we can get an unintended value such as 6 as the
> >migratetype if a reader-writer (get/set pageblock_migratetype) race happens.
> >This is possible because we read the value without any synchronization. And
> >this migratetype, 6, has no place in the buddy freelist, so an array index
> >overrun is possible and the system can break, although I haven't heard that
> >it occurs.
> 
> Hello,
> 
> it seems this can indeed happen. I'm working on memory compaction
> improvements, and in a prototype patch I'm basically adding calls to
> start_isolate_page_range() and undo_isolate_page_range() in some
> functions under compact_zone(). With this I've seen occurrences of
> NULL pointers in move_freepages() and free_one_page(), in places
> where free_list[migratetype] is manipulated by e.g. list_move(). That
> led me to question the value of migratetype and I found this thread.
> Adding some debugging in get_pageblock_migratetype() and voila, I
> get a value of 6 being read.
> 
> So is it just my patch adding a dangerous situation, or does it exist in
> mainline as well? Looking at free_one_page(), it takes zone->lock, but
> get_pageblock_migratetype() is called by its callers
> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
> The migratetype determined there is then used under free_one_page()
> to access a free_list.
> 
> It seems that this could race with set_pageblock_migratetype()
> called from try_to_steal_freepages() (despite the latter being
> properly locked). There are also other callers, but those seem to be
> limited to initialization and isolation, which should be rare (?).
> However, try_to_steal_freepages() can be called repeatedly.
> So I assume that the race happens but never manifests as a fatal
> error as long as only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
> MIGRATE_MOVABLE values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE
> have values with the 0x4 bit set, and can thus result in invalid
> values due to non-atomic access.
> 
> Does that make sense to you and should we thus proceed with patching
> this race?
> 

Hello,

This race is possible even without your prototype patch, although with very
low probability. Some code related to memory failure uses
set_migratetype_isolate(), which could result in this race.

Although it may be a very rare case and not critical, it is better to fix
this race. I prefer that we don't depend on luck. :)

Mel's suggestion looks good to me. Do you have another idea?

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/7] improve robustness on handling migratetype
  2014-02-03  7:45       ` Joonsoo Kim
@ 2014-02-03  9:16         ` Vlastimil Babka
  0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2014-02-03  9:16 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Mel Gorman, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Jiang Liu, Cody P Schafer, Johannes Weiner, Michal Hocko,
	Minchan Kim, Michal Nazarewicz, Andi Kleen, Wei Yongjun,
	Tang Chen, linux-mm, linux-kernel

On 02/03/2014 08:45 AM, Joonsoo Kim wrote:
> On Wed, Jan 29, 2014 at 05:52:41PM +0100, Vlastimil Babka wrote:
>> On 01/10/2014 09:48 AM, Joonsoo Kim wrote:
>>> On Thu, Jan 09, 2014 at 09:27:20AM +0000, Mel Gorman wrote:
>>>> On Thu, Jan 09, 2014 at 04:04:40PM +0900, Joonsoo Kim wrote:
>>>>> Hello,
>>>>>
>>>>> I found some weaknesses on handling migratetype during code review and
>>>>> testing CMA.
>>>>>
>>>>> First, we don't have any synchronization method on get/set pageblock
>>>>> migratetype. When we change migratetype, we hold the zone lock. So
>>>>> writer-writer race doesn't exist. But while someone changes migratetype,
>>>>> others can get migratetype. This may introduce totally unintended value
>>>>> as migratetype. Although I haven't heard of any problem report about
>>>>> that, it is better to protect properly.
>>>>>
>>>>
>>>> This is deliberate. The migratetypes for the majority of users are advisory
>>>> and aimed for fragmentation avoidance. It was important that the cost of
>>>> that be kept as low as possible and the general case is that migration types
>>>> change very rarely. In many cases, the zone lock is held. In other cases,
>>>> such as splitting free pages, the cost is simply not justified.
>>>>
>>>> I doubt there is any amount of data you could add in support that would
>>>> justify hammering the free fast paths (which call get_pageblock_migratetype).
>>>
>>> Hello, Mel.
>>>
>>> There is a possibility that we can get an unintended value such as 6 as the
>>> migratetype if a reader-writer (get/set pageblock_migratetype) race happens.
>>> This is possible because we read the value without any synchronization. And
>>> this migratetype, 6, has no place in the buddy freelist, so an array index
>>> overrun is possible and the system can break, although I haven't heard that
>>> it occurs.
>>
>> Hello,
>>
>> it seems this can indeed happen. I'm working on memory compaction
>> improvements, and in a prototype patch I'm basically adding calls to
>> start_isolate_page_range() and undo_isolate_page_range() in some
>> functions under compact_zone(). With this I've seen occurrences of
>> NULL pointers in move_freepages() and free_one_page(), in places
>> where free_list[migratetype] is manipulated by e.g. list_move(). That
>> led me to question the value of migratetype and I found this thread.
>> Adding some debugging in get_pageblock_migratetype() and voila, I
>> get a value of 6 being read.
>>
>> So is it just my patch adding a dangerous situation, or does it exist in
>> mainline as well? Looking at free_one_page(), it takes zone->lock, but
>> get_pageblock_migratetype() is called by its callers
>> (free_hot_cold_page() or __free_pages_ok()) outside of the lock.
>> The migratetype determined there is then used under free_one_page()
>> to access a free_list.
>>
>> It seems that this could race with set_pageblock_migratetype()
>> called from try_to_steal_freepages() (despite the latter being
>> properly locked). There are also other callers, but those seem to be
>> limited to initialization and isolation, which should be rare (?).
>> However, try_to_steal_freepages() can be called repeatedly.
>> So I assume that the race happens but never manifests as a fatal
>> error as long as only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE and
>> MIGRATE_MOVABLE values are used. Only MIGRATE_CMA and MIGRATE_ISOLATE
>> have values with the 0x4 bit set, and can thus result in invalid
>> values due to non-atomic access.
>>
>> Does that make sense to you and should we thus proceed with patching
>> this race?
>>
>
> Hello,
>
> This race is possible even without your prototype patch, although with very
> low probability. Some code related to memory failure uses
> set_migratetype_isolate(), which could result in this race.
>
> Although it may be a very rare case and not critical, it is better to fix
> this race. I prefer that we don't depend on luck. :)

I agree :) I also don't like the possibility that the non-fatal type of 
race (where higher-order bits are not involved) occurs and can hurt 
anti-fragmentation, or even suddenly become a problem in the future if 
e.g. more migratetypes are added. I'll try to quantify that with a debug 
patch.
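
Mel suggested counting this with vmstat; a minimal sketch of such a
debug patch (PGBLOCK_MT_RACE is a made-up event name and would need an
entry in enum vm_event_item and vmstat_text[]):

	/* Debug-only check on the unlocked pageblock read. */
	static inline int get_pageblock_migratetype_dbg(struct page *page)
	{
		int mt = get_pageblock_flags_group(page, PB_migrate,
						   PB_migrate_end);

		/* Count how often a torn, out-of-range value is seen. */
		if (unlikely(mt >= MIGRATE_TYPES))
			count_vm_event(PGBLOCK_MT_RACE);
		return mt;
	}

Note this only catches the fatal, out-of-range case; quantifying the
non-fatal mixes of valid low-bit values would need something like a
second read under zone->lock for comparison.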

> Mel's suggestion looks good to me. Do you have another idea?

No, it sounds good so I'm going to work on this as outlined.

> Thanks.
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-02-03  9:17 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-09  7:04 [PATCH 0/7] improve robustness on handling migratetype Joonsoo Kim
2014-01-09  7:04 ` [PATCH 1/7] mm/page_alloc: synchronize get/set pageblock Joonsoo Kim
2014-01-09  9:08   ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 2/7] mm/cma: fix cma free page accounting Joonsoo Kim
2014-01-09 21:10   ` Laura Abbott
2014-01-10  8:50     ` Joonsoo Kim
2014-01-09  7:04 ` [PATCH 3/7] mm/page_alloc: move set_freepage_migratetype() to better place Joonsoo Kim
2014-01-09  7:04 ` [PATCH 4/7] mm/isolation: remove invalid check condition Joonsoo Kim
2014-01-09  7:04 ` [PATCH 5/7] mm/page_alloc: separate interface to set/get migratetype of freepage Joonsoo Kim
2014-01-09  9:18   ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 6/7] mm/page_alloc: store freelist migratetype to the page on buddy properly Joonsoo Kim
2014-01-09  9:19   ` Michal Nazarewicz
2014-01-09  7:04 ` [PATCH 7/7] mm/page_alloc: don't merge MIGRATE_(CMA|ISOLATE) pages on buddy Joonsoo Kim
2014-01-09  9:22   ` Michal Nazarewicz
2014-01-09  9:06 ` [PATCH 0/7] improve robustness on handling migratetype Michal Nazarewicz
2014-01-09 14:05   ` Joonsoo Kim
2014-01-09  9:27 ` Mel Gorman
2014-01-10  8:48   ` Joonsoo Kim
2014-01-10  9:48     ` Mel Gorman
2014-01-13  1:57       ` Joonsoo Kim
2014-01-29 16:52     ` Vlastimil Babka
2014-01-31 15:39       ` Mel Gorman
2014-02-03  7:45       ` Joonsoo Kim
2014-02-03  9:16         ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).