* [RFC PATCH v3 0/4] fix freepage count problems in memory isolation
@ 2014-08-26  8:08 Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-08-26  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel, Joonsoo Kim

This is version 3 of the patchset, an improved and minimized version of
version 1, to fix the freepage accounting problem during memory isolation.
I tried a different approach in version 2, but it turned out to be really
complicated, so I changed my mind and improved version 1 instead. You can
find versions 1 and 2 at the links [1] and [2], respectively.

IMO, this v3 is better than v2: it is simpler, so it is easier to
maintain, and it doesn't change the pageblock isolation logic, so it is
much easier to backport.

These problems were found while testing my patchset [3]. There are some
race conditions around pageblock isolation, and these races cause an
incorrect freepage count.

Before describing the bugs themselves, let me first define what is
counted as a freepage (a rough sketch of this rule follows the list).

1. Pages on a buddy list are counted as freepages.
2. Pages on the isolate migratetype buddy list are *not* counted as freepages.
3. Pages on a CMA buddy list are also counted as CMA freepages.
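
For reference, here is a rough sketch of this rule (a simplified
illustration, not the exact kernel code; in the tree the isolate check is
done by the callers of __mod_zone_freepage_state()):

static void account_freed_pages(struct zone *zone, unsigned int order,
				int migratetype)
{
	/* Rule 2: freepages on the isolate buddy list are not counted */
	if (is_migrate_isolate(migratetype))
		return;
	/* Rule 1: count them as free pages */
	__mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
	/* Rule 3: CMA freepages are additionally counted as CMA free pages */
	if (is_migrate_cma(migratetype))
		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1 << order);
}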

Now, I describe the problems and the related patches.

Patch 1: There are race conditions when getting the pageblock
migratetype, resulting in misplacement of freepages on the buddy list,
an incorrect freepage count and unavailability of freepages.

Patch 2: Freepages on the pcp list can carry stale cached information
that is used to determine which buddy list they should go to. This
causes misplacement of freepages on the buddy list and an incorrect
freepage count.

Patch 4: Merging freepages that belong to pageblocks with different
migratetypes causes a freepage accounting problem. This patch fixes it.

Without patchset [3], the above problems don't show up in my CMA
allocation test, because the CMA reserved pages aren't used at all, so
there is no chance for the races to occur.

With patchset [3], I ran a simple CMA allocation test and got the results below.

- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- kernel build (make -j16) running in the background
- 30 CMA allocation attempts (8 MB * 30 = 240 MB) at 5 second intervals
- Result: more than 5000 pages went missing from the freepage count

With patchset [3] and this patchset, no freepage count was missed, so I
conclude that the problems are solved.

These problems could also affect memory hot-remove users, although I
haven't checked that further.

This patchset is based on linux-next-20140826.
Please see individual patches for more information.

Thanks.

[1]: https://lkml.org/lkml/2014/7/4/79
[2]: https://lkml.org/lkml/2014/8/6/52
[3]: Aggressively allocate the pages on cma reserved memory
     https://lkml.org/lkml/2014/5/30/291

Joonsoo Kim (4):
  mm/page_alloc: fix incorrect isolation behavior by rechecking
    migratetype
  mm/page_alloc: add freepage on isolate pageblock to correct buddy
    list
  mm/page_alloc: move migratetype recheck logic to __free_one_page()
  mm/page_alloc: restrict max order of merging on isolated pageblock

 include/linux/mmzone.h         |    4 ++++
 include/linux/page-isolation.h |    8 ++++++++
 mm/page_alloc.c                |   28 +++++++++++++++++++---------
 mm/page_isolation.c            |    2 ++
 4 files changed, 33 insertions(+), 9 deletions(-)

-- 
1.7.9.5



* [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
@ 2014-08-26  8:08 ` Joonsoo Kim
  2014-08-29 17:46   ` Naoya Horiguchi
  2014-09-08  8:31   ` Vlastimil Babka
  2014-08-26  8:08 ` [RFC PATCH v3 2/4] mm/page_alloc: add freepage on isolate pageblock to correct buddy list Joonsoo Kim
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-08-26  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel, Joonsoo Kim

There are two paths that reach the core free function of the buddy
allocator, __free_one_page(): one is free_one_page()->__free_one_page()
and the other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each path has a race condition causing serious problems. This patch
focuses on the first type of freepath; the following patch will solve
the problem in the second type of freepath.

In the first type of freepath, we get the migratetype of the page being
freed without holding the zone lock, so it can be racy. There are two
cases of this race.

1. Pages are added to the isolate buddy list after the original
migratetype has been restored.

CPU1                                   CPU2

get migratetype => return MIGRATE_ISOLATE
call free_one_page() with MIGRATE_ISOLATE

				grab the zone lock
				unisolate pageblock
				release the zone lock

grab the zone lock
call __free_one_page() with MIGRATE_ISOLATE
freepage goes onto the isolate buddy list,
although the pageblock is already unisolated

This can cause two problems. One is that we can't use this page anymore
until the next isolation attempt on this pageblock, because the freepage
sits on the isolate buddy list. The other is that the freepage accounting
can go wrong due to merging between different buddy lists. Freepages on
the isolate buddy list aren't counted as freepages, but those on normal
buddy lists are. If a merge happens, a buddy freepage on a normal buddy
list is inevitably moved to the isolate buddy list without any adjustment
of the freepage accounting, so the count becomes incorrect.

2. Pages are added to a normal buddy list while the pageblock is
isolated. This is similar to the above case.

This can also cause two problems. One is that we can't keep these
freepages from being allocated: although the pageblock is isolated, the
freepage is added to a normal buddy list, so it can be allocated without
any restriction. The other problem is the same as in case 1, that is,
incorrect freepage accounting.

These races can be prevented by checking the migratetype again while
holding the zone lock. Because that is a somewhat heavy operation and
isn't needed in the common case, we want to avoid the recheck as much as
possible. So this patch introduces a new field, nr_isolate_pageblock, in
struct zone to record whether the zone has any isolated pageblock. With
this, we can skip re-checking the migratetype in the common case and do
it only if there is an isolated pageblock. This solves the problems
mentioned above.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/mmzone.h         |    4 ++++
 include/linux/page-isolation.h |    8 ++++++++
 mm/page_alloc.c                |   10 ++++++++--
 mm/page_isolation.c            |    2 ++
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 318df70..23e69f1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -431,6 +431,10 @@ struct zone {
 	 */
 	int			nr_migrate_reserve_block;
 
+#ifdef CONFIG_MEMORY_ISOLATION
+	unsigned long		nr_isolate_pageblock;
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/* see spanned/present_pages for more description */
 	seqlock_t		span_seqlock;
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..2dc1e16 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,6 +2,10 @@
 #define __LINUX_PAGEISOLATION_H
 
 #ifdef CONFIG_MEMORY_ISOLATION
+static inline bool has_isolate_pageblock(struct zone *zone)
+{
+	return zone->nr_isolate_pageblock;
+}
 static inline bool is_migrate_isolate_page(struct page *page)
 {
 	return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
@@ -11,6 +15,10 @@ static inline bool is_migrate_isolate(int migratetype)
 	return migratetype == MIGRATE_ISOLATE;
 }
 #else
+static inline bool has_isolate_pageblock(struct zone *zone)
+{
+	return false;
+}
 static inline bool is_migrate_isolate_page(struct page *page)
 {
 	return false;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f86023b..51e0d13 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
 
+	if (unlikely(has_isolate_pageblock(zone))) {
+		migratetype = get_pfnblock_migratetype(page, pfn);
+		if (is_migrate_isolate(migratetype))
+			goto skip_counting;
+	}
+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
+
+skip_counting:
 	__free_one_page(page, pfn, zone, order, migratetype);
-	if (unlikely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
 	spin_unlock(&zone->lock);
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..1fa4a4d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -60,6 +60,7 @@ out:
 		int migratetype = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
 
 		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
@@ -83,6 +84,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	nr_pages = move_freepages_block(zone, page, migratetype);
 	__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	set_pageblock_migratetype(page, migratetype);
+	zone->nr_isolate_pageblock--;
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
-- 
1.7.9.5



* [RFC PATCH v3 2/4] mm/page_alloc: add freepage on isolate pageblock to correct buddy list
  2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
@ 2014-08-26  8:08 ` Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 3/4] mm/page_alloc: move migratetype recheck logic to __free_one_page() Joonsoo Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-08-26  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel, Joonsoo Kim

In free_pcppages_bulk(), we use the cached migratetype of a freepage to
determine which buddy list the freepage will be added to. This
information is stored when the freepage is added to the pcp list, so if
isolation of this freepage's pageblock begins after it is stored, the
cached information can be stale. In other words, it holds the original
migratetype rather than MIGRATE_ISOLATE.

There are two problems caused by this stale information. One is that we
can't keep these freepages from being allocated: although the pageblock
is isolated, the freepage is added to a normal buddy list, so it can be
allocated without any restriction. The other problem is incorrect
freepage accounting: freepages on an isolate pageblock should not be
counted as freepages.

The following is the relevant code snippet in free_pcppages_bulk():

/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, page_to_pfn(page), zone, 0, mt);
trace_mm_page_pcpu_drain(page, 0, mt);
if (likely(!is_migrate_isolate_page(page))) {
	__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
	if (is_migrate_cma(mt))
		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
}

As you can see in the snippet above, the current code already handles
the second problem, incorrect freepage accounting, by re-fetching the
pageblock migratetype through is_migrate_isolate_page(page). But because
this re-fetched information isn't used for __free_one_page(), the first
problem isn't solved. This patch re-fetches the pageblock migratetype
before __free_one_page() and uses it for __free_one_page().

In addition to moving the re-fetch earlier, this patch applies an
optimization: the migratetype is re-fetched only if the zone has an
isolated pageblock. Pageblock isolation is a rare event, so this
optimization avoids the re-fetch in the common case.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 51e0d13..6c952b6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -716,14 +716,17 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
 			mt = get_freepage_migratetype(page);
+			if (unlikely(has_isolate_pageblock(zone))) {
+				mt = get_pageblock_migratetype(page);
+				if (is_migrate_isolate(mt))
+					goto skip_counting;
+			}
+			__mod_zone_freepage_state(zone, 1, mt);
+
+skip_counting:
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-			if (likely(!is_migrate_isolate_page(page))) {
-				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
-				if (is_migrate_cma(mt))
-					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
-			}
 		} while (--to_free && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
-- 
1.7.9.5



* [RFC PATCH v3 3/4] mm/page_alloc: move migratetype recheck logic to __free_one_page()
  2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 2/4] mm/page_alloc: add freepage on isolate pageblock to correct buddy list Joonsoo Kim
@ 2014-08-26  8:08 ` Joonsoo Kim
  2014-08-26  8:08 ` [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock Joonsoo Kim
  2014-09-15  5:09 ` [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Minchan Kim
  4 siblings, 0 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-08-26  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel, Joonsoo Kim

All the callers of __free_one_page() have the same migratetype recheck
logic, so we can move it into __free_one_page(). This reduces the line
count and helps future maintenance. It is also a preparation step for
"mm/page_alloc: restrict max order of merging on isolated pageblock",
which fixes the freepage accounting problem for freepages larger than
pageblock order.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6c952b6..809bfd3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -578,7 +578,14 @@ static inline void __free_one_page(struct page *page,
 			return;
 
 	VM_BUG_ON(migratetype == -1);
+	if (unlikely(has_isolate_pageblock(zone))) {
+		migratetype = get_pfnblock_migratetype(page, pfn);
+		if (is_migrate_isolate(migratetype))
+			goto skip_counting;
+	}
+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
 
+skip_counting:
 	page_idx = pfn & ((1 << MAX_ORDER) - 1);
 
 	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
@@ -716,14 +723,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
 			mt = get_freepage_migratetype(page);
-			if (unlikely(has_isolate_pageblock(zone))) {
-				mt = get_pageblock_migratetype(page);
-				if (is_migrate_isolate(mt))
-					goto skip_counting;
-			}
-			__mod_zone_freepage_state(zone, 1, mt);
 
-skip_counting:
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
@@ -743,14 +743,6 @@ static void free_one_page(struct zone *zone,
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
 
-	if (unlikely(has_isolate_pageblock(zone))) {
-		migratetype = get_pfnblock_migratetype(page, pfn);
-		if (is_migrate_isolate(migratetype))
-			goto skip_counting;
-	}
-	__mod_zone_freepage_state(zone, 1 << order, migratetype);
-
-skip_counting:
 	__free_one_page(page, pfn, zone, order, migratetype);
 	spin_unlock(&zone->lock);
 }
-- 
1.7.9.5



* [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock
  2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
                   ` (2 preceding siblings ...)
  2014-08-26  8:08 ` [RFC PATCH v3 3/4] mm/page_alloc: move migratetype recheck logic to __free_one_page() Joonsoo Kim
@ 2014-08-26  8:08 ` Joonsoo Kim
  2014-08-29 16:52   ` Naoya Horiguchi
  2014-09-15  5:09 ` [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Minchan Kim
  4 siblings, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2014-08-26  8:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel, Joonsoo Kim

The current pageblock isolation logic isolates each pageblock
individually. This causes a freepage accounting problem if a
pageblock-order freepage on an isolate pageblock is merged with another
freepage on a normal pageblock. We can prevent the merge by restricting
the maximum merge order to pageblock order when the freepage is on an
isolate pageblock.

A side effect of this change is that unmerged buddy freepages can remain
even after pageblock isolation is finished, because undoing pageblock
isolation just moves freepages from the isolate buddy list to the normal
buddy list rather than considering a merge. But I think it doesn't
matter, because 1) almost all allocation requests are for pageblock order
or below, 2) the caller of pageblock isolation will use these freepages,
so they would be split in any case, and 3) a merge would happen soon
after some alloc/free activity on this pageblock and its buddy.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 809bfd3..8ba9fb0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -570,6 +570,7 @@ static inline void __free_one_page(struct page *page,
 	unsigned long combined_idx;
 	unsigned long uninitialized_var(buddy_idx);
 	struct page *buddy;
+	int max_order = MAX_ORDER;
 
 	VM_BUG_ON(!zone_is_initialized(zone));
 
@@ -580,18 +581,26 @@ static inline void __free_one_page(struct page *page,
 	VM_BUG_ON(migratetype == -1);
 	if (unlikely(has_isolate_pageblock(zone))) {
 		migratetype = get_pfnblock_migratetype(page, pfn);
-		if (is_migrate_isolate(migratetype))
+		if (is_migrate_isolate(migratetype)) {
+			/*
+			 * We restrict max order of merging to prevent merge
+			 * between freepages on isolate pageblock and normal
+			 * pageblock. Without this, pageblock isolation
+			 * could cause incorrect freepage accounting.
+			 */
+			max_order = pageblock_order + 1;
 			goto skip_counting;
+		}
 	}
 	__mod_zone_freepage_state(zone, 1 << order, migratetype);
 
 skip_counting:
-	page_idx = pfn & ((1 << MAX_ORDER) - 1);
+	page_idx = pfn & ((1 << max_order) - 1);
 
 	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
 	VM_BUG_ON_PAGE(bad_range(zone, page), page);
 
-	while (order < MAX_ORDER-1) {
+	while (order < max_order - 1) {
 		buddy_idx = __find_buddy_index(page_idx, order);
 		buddy = page + (buddy_idx - page_idx);
 		if (!page_is_buddy(page, buddy, order))
-- 
1.7.9.5



* Re: [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock
  2014-08-26  8:08 ` [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock Joonsoo Kim
@ 2014-08-29 16:52   ` Naoya Horiguchi
  2014-09-01  0:15     ` Joonsoo Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2014-08-29 16:52 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel

Hi Joonsoo,

On Tue, Aug 26, 2014 at 05:08:18PM +0900, Joonsoo Kim wrote:
> Current pageblock isolation logic could isolate each pageblock
> individually. This causes freepage accounting problem if freepage with
> pageblock order on isolate pageblock is merged with other freepage on
> normal pageblock. We can prevent merging by restricting max order of
> merging to pageblock order if freepage is on isolate pageblock.
> 
> Side-effect of this change is that there could be non-merged buddy
> freepage even if finishing pageblock isolation, because undoing pageblock
> isolation is just to move freepage from isolate buddy list to normal buddy
> list rather than to consider merging. But, I think it doesn't matter
> because 1) almost allocation request are for equal or below pageblock
> order, 2) caller of pageblock isolation will use this freepage so
> freepage will split in any case and 3) merge would happen soon after
> some alloc/free on this and buddy pageblock.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 809bfd3..8ba9fb0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -570,6 +570,7 @@ static inline void __free_one_page(struct page *page,
>  	unsigned long combined_idx;
>  	unsigned long uninitialized_var(buddy_idx);
>  	struct page *buddy;
> +	int max_order = MAX_ORDER;
>  
>  	VM_BUG_ON(!zone_is_initialized(zone));
>  
> @@ -580,18 +581,26 @@ static inline void __free_one_page(struct page *page,
>  	VM_BUG_ON(migratetype == -1);
>  	if (unlikely(has_isolate_pageblock(zone))) {
>  		migratetype = get_pfnblock_migratetype(page, pfn);
> -		if (is_migrate_isolate(migratetype))
> +		if (is_migrate_isolate(migratetype)) {
> +			/*
> +			 * We restrict max order of merging to prevent merge
> +			 * between freepages on isolate pageblock and normal
> +			 * pageblock. Without this, pageblock isolation
> +			 * could cause incorrect freepage accounting.
> +			 */
> +			max_order = pageblock_order + 1;

When pageblock_order >= MAX_ORDER, order in the while loop below could
go beyond MAX_ORDER - 1. Or can that never happen?

Thanks,
Naoya Horiguchi

>  			goto skip_counting;
> +		}
>  	}
>  	__mod_zone_freepage_state(zone, 1 << order, migratetype);
>  
>  skip_counting:
> -	page_idx = pfn & ((1 << MAX_ORDER) - 1);
> +	page_idx = pfn & ((1 << max_order) - 1);
>  
>  	VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page);
>  	VM_BUG_ON_PAGE(bad_range(zone, page), page);
>  
> -	while (order < MAX_ORDER-1) {
> +	while (order < max_order - 1) {
>  		buddy_idx = __find_buddy_index(page_idx, order);
>  		buddy = page + (buddy_idx - page_idx);
>  		if (!page_is_buddy(page, buddy, order))
> -- 
> 1.7.9.5
> 
> 


* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
@ 2014-08-29 17:46   ` Naoya Horiguchi
  2014-09-01  0:14     ` Joonsoo Kim
  2014-09-08  8:31   ` Vlastimil Babka
  1 sibling, 1 reply; 14+ messages in thread
From: Naoya Horiguchi @ 2014-08-29 17:46 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel

On Tue, Aug 26, 2014 at 05:08:15PM +0900, Joonsoo Kim wrote:
> There are two paths to reach core free function of buddy allocator,
> __free_one_page(), one is free_one_page()->__free_one_page() and the
> other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
> Each paths has race condition causing serious problems. At first, this
> patch is focused on first type of freepath. And then, following patch
> will solve the problem in second type of freepath.
> 
> In the first type of freepath, we got migratetype of freeing page without
> holding the zone lock, so it could be racy. There are two cases of this
> race.
> 
> 1. pages are added to isolate buddy list after restoring orignal
> migratetype
> 
> CPU1                                   CPU2
> 
> get migratetype => return MIGRATE_ISOLATE
> call free_one_page() with MIGRATE_ISOLATE
> 
> 				grab the zone lock
> 				unisolate pageblock
> 				release the zone lock
> 
> grab the zone lock
> call __free_one_page() with MIGRATE_ISOLATE
> freepage go into isolate buddy list,
> although pageblock is already unisolated
> 
> This may cause two problems. One is that we can't use this page anymore
> until next isolation attempt of this pageblock, because freepage is on
> isolate pageblock. The other is that freepage accouting could be wrong
> due to merging between different buddy list. Freepages on isolate buddy
> list aren't counted as freepage, but ones on normal buddy list are counted
> as freepage. If merge happens, buddy freepage on normal buddy list is
> inevitably moved to isolate buddy list without any consideration of
> freepage accouting so it could be incorrect.
> 
> 2. pages are added to normal buddy list while pageblock is isolated.
> It is similar with above case.
> 
> This also may cause two problems. One is that we can't keep these
> freepages from being allocated. Although this pageblock is isolated,
> freepage would be added to normal buddy list so that it could be
> allocated without any restriction. And the other problem is same as
> case 1, that it, incorrect freepage accouting.
> 
> This race condition would be prevented by checking migratetype again
> with holding the zone lock. Because it is somewhat heavy operation
> and it isn't needed in common case, we want to avoid rechecking as much
> as possible. So this patch introduce new variable, nr_isolate_pageblock
> in struct zone to check if there is isolated pageblock.
> With this, we can avoid to re-check migratetype in common case and do
> it only if there is isolated pageblock. This solve above
> mentioned problems.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/mmzone.h         |    4 ++++
>  include/linux/page-isolation.h |    8 ++++++++
>  mm/page_alloc.c                |   10 ++++++++--
>  mm/page_isolation.c            |    2 ++
>  4 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 318df70..23e69f1 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -431,6 +431,10 @@ struct zone {
>  	 */
>  	int			nr_migrate_reserve_block;
>  
> +#ifdef CONFIG_MEMORY_ISOLATION

Is it worth adding a comment here, especially about locking?
The patch itself looks good to me.

Thanks,
Naoya Horiguchi

> +	unsigned long		nr_isolate_pageblock;
> +#endif
> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  	/* see spanned/present_pages for more description */
>  	seqlock_t		span_seqlock;
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..2dc1e16 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -2,6 +2,10 @@
>  #define __LINUX_PAGEISOLATION_H
>  
>  #ifdef CONFIG_MEMORY_ISOLATION
> +static inline bool has_isolate_pageblock(struct zone *zone)
> +{
> +	return zone->nr_isolate_pageblock;
> +}
>  static inline bool is_migrate_isolate_page(struct page *page)
>  {
>  	return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
> @@ -11,6 +15,10 @@ static inline bool is_migrate_isolate(int migratetype)
>  	return migratetype == MIGRATE_ISOLATE;
>  }
>  #else
> +static inline bool has_isolate_pageblock(struct zone *zone)
> +{
> +	return false;
> +}
>  static inline bool is_migrate_isolate_page(struct page *page)
>  {
>  	return false;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f86023b..51e0d13 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
>  	if (nr_scanned)
>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
>  
> +	if (unlikely(has_isolate_pageblock(zone))) {
> +		migratetype = get_pfnblock_migratetype(page, pfn);
> +		if (is_migrate_isolate(migratetype))
> +			goto skip_counting;
> +	}
> +	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> +
> +skip_counting:
>  	__free_one_page(page, pfn, zone, order, migratetype);
> -	if (unlikely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
>  	spin_unlock(&zone->lock);
>  }
>  
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index d1473b2..1fa4a4d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -60,6 +60,7 @@ out:
>  		int migratetype = get_pageblock_migratetype(page);
>  
>  		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +		zone->nr_isolate_pageblock++;
>  		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
>  
>  		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> @@ -83,6 +84,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  	nr_pages = move_freepages_block(zone, page, migratetype);
>  	__mod_zone_freepage_state(zone, nr_pages, migratetype);
>  	set_pageblock_migratetype(page, migratetype);
> +	zone->nr_isolate_pageblock--;
>  out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
> -- 
> 1.7.9.5
> 
> 


* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-08-29 17:46   ` Naoya Horiguchi
@ 2014-09-01  0:14     ` Joonsoo Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-09-01  0:14 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel

On Fri, Aug 29, 2014 at 01:46:41PM -0400, Naoya Horiguchi wrote:
> On Tue, Aug 26, 2014 at 05:08:15PM +0900, Joonsoo Kim wrote:
> > There are two paths to reach core free function of buddy allocator,
> > __free_one_page(), one is free_one_page()->__free_one_page() and the
> > other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
> > Each paths has race condition causing serious problems. At first, this
> > patch is focused on first type of freepath. And then, following patch
> > will solve the problem in second type of freepath.
> > 
> > In the first type of freepath, we got migratetype of freeing page without
> > holding the zone lock, so it could be racy. There are two cases of this
> > race.
> > 
> > 1. pages are added to isolate buddy list after restoring orignal
> > migratetype
> > 
> > CPU1                                   CPU2
> > 
> > get migratetype => return MIGRATE_ISOLATE
> > call free_one_page() with MIGRATE_ISOLATE
> > 
> > 				grab the zone lock
> > 				unisolate pageblock
> > 				release the zone lock
> > 
> > grab the zone lock
> > call __free_one_page() with MIGRATE_ISOLATE
> > freepage go into isolate buddy list,
> > although pageblock is already unisolated
> > 
> > This may cause two problems. One is that we can't use this page anymore
> > until next isolation attempt of this pageblock, because freepage is on
> > isolate pageblock. The other is that freepage accouting could be wrong
> > due to merging between different buddy list. Freepages on isolate buddy
> > list aren't counted as freepage, but ones on normal buddy list are counted
> > as freepage. If merge happens, buddy freepage on normal buddy list is
> > inevitably moved to isolate buddy list without any consideration of
> > freepage accouting so it could be incorrect.
> > 
> > 2. pages are added to normal buddy list while pageblock is isolated.
> > It is similar with above case.
> > 
> > This also may cause two problems. One is that we can't keep these
> > freepages from being allocated. Although this pageblock is isolated,
> > freepage would be added to normal buddy list so that it could be
> > allocated without any restriction. And the other problem is same as
> > case 1, that it, incorrect freepage accouting.
> > 
> > This race condition would be prevented by checking migratetype again
> > with holding the zone lock. Because it is somewhat heavy operation
> > and it isn't needed in common case, we want to avoid rechecking as much
> > as possible. So this patch introduce new variable, nr_isolate_pageblock
> > in struct zone to check if there is isolated pageblock.
> > With this, we can avoid to re-check migratetype in common case and do
> > it only if there is isolated pageblock. This solve above
> > mentioned problems.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  include/linux/mmzone.h         |    4 ++++
> >  include/linux/page-isolation.h |    8 ++++++++
> >  mm/page_alloc.c                |   10 ++++++++--
> >  mm/page_isolation.c            |    2 ++
> >  4 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 318df70..23e69f1 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -431,6 +431,10 @@ struct zone {
> >  	 */
> >  	int			nr_migrate_reserve_block;
> >  
> > +#ifdef CONFIG_MEMORY_ISOLATION
> 
> It's worth adding some comment, especially about locking?
> The patch itself looks good me.

Okay. Will do. :)
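
Something like the following, perhaps (just a quick sketch of the kind of
comment Naoya is asking for, not the final wording):

#ifdef CONFIG_MEMORY_ISOLATION
	/*
	 * Number of isolated pageblocks. It is used to solve an incorrect
	 * freepage counting problem due to racy retrieval of the pageblock
	 * migratetype. Protected by zone->lock.
	 */
	unsigned long		nr_isolate_pageblock;
#endif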

Thanks.


* Re: [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock
  2014-08-29 16:52   ` Naoya Horiguchi
@ 2014-09-01  0:15     ` Joonsoo Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-09-01  0:15 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel

On Fri, Aug 29, 2014 at 12:52:44PM -0400, Naoya Horiguchi wrote:
> Hi Joonsoo,
> 
> On Tue, Aug 26, 2014 at 05:08:18PM +0900, Joonsoo Kim wrote:
> > Current pageblock isolation logic could isolate each pageblock
> > individually. This causes freepage accounting problem if freepage with
> > pageblock order on isolate pageblock is merged with other freepage on
> > normal pageblock. We can prevent merging by restricting max order of
> > merging to pageblock order if freepage is on isolate pageblock.
> > 
> > Side-effect of this change is that there could be non-merged buddy
> > freepage even if finishing pageblock isolation, because undoing pageblock
> > isolation is just to move freepage from isolate buddy list to normal buddy
> > list rather than to consider merging. But, I think it doesn't matter
> > because 1) almost allocation request are for equal or below pageblock
> > order, 2) caller of pageblock isolation will use this freepage so
> > freepage will split in any case and 3) merge would happen soon after
> > some alloc/free on this and buddy pageblock.
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/page_alloc.c |   15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 809bfd3..8ba9fb0 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -570,6 +570,7 @@ static inline void __free_one_page(struct page *page,
> >  	unsigned long combined_idx;
> >  	unsigned long uninitialized_var(buddy_idx);
> >  	struct page *buddy;
> > +	int max_order = MAX_ORDER;
> >  
> >  	VM_BUG_ON(!zone_is_initialized(zone));
> >  
> > @@ -580,18 +581,26 @@ static inline void __free_one_page(struct page *page,
> >  	VM_BUG_ON(migratetype == -1);
> >  	if (unlikely(has_isolate_pageblock(zone))) {
> >  		migratetype = get_pfnblock_migratetype(page, pfn);
> > -		if (is_migrate_isolate(migratetype))
> > +		if (is_migrate_isolate(migratetype)) {
> > +			/*
> > +			 * We restrict max order of merging to prevent merge
> > +			 * between freepages on isolate pageblock and normal
> > +			 * pageblock. Without this, pageblock isolation
> > +			 * could cause incorrect freepage accounting.
> > +			 */
> > +			max_order = pageblock_order + 1;
> 
> When pageblock_order >= MAX_ORDER, order in the while loop below could
> go beyond MAX_ORDER - 1. Or can that never happen?

Yes, you are right. I will fix it in the next spin.
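
One simple way would be to cap it, something like (just a sketch; the
next spin may do it differently):

	max_order = min_t(unsigned int, MAX_ORDER, pageblock_order + 1);

so that the merge loop below, while (order < max_order - 1), can never
walk past MAX_ORDER - 1 even when pageblock_order >= MAX_ORDER.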

Thanks.


* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
  2014-08-29 17:46   ` Naoya Horiguchi
@ 2014-09-08  8:31   ` Vlastimil Babka
  2014-09-15  2:31     ` Joonsoo Kim
  1 sibling, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2014-09-08  8:31 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Peter Zijlstra, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/26/2014 10:08 AM, Joonsoo Kim wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f86023b..51e0d13 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
>   	if (nr_scanned)
>   		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
>
> +	if (unlikely(has_isolate_pageblock(zone))) {
> +		migratetype = get_pfnblock_migratetype(page, pfn);
> +		if (is_migrate_isolate(migratetype))
> +			goto skip_counting;
> +	}
> +	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> +
> +skip_counting:

Here, wouldn't a simple 'else __mod_zone_freepage_state...' look better 
than goto + label? (same for the following 2 patches). Or does that 
generate worse code?

>   	__free_one_page(page, pfn, zone, order, migratetype);
> -	if (unlikely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
>   	spin_unlock(&zone->lock);
>   }
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index d1473b2..1fa4a4d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -60,6 +60,7 @@ out:
>   		int migratetype = get_pageblock_migratetype(page);
>
>   		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +		zone->nr_isolate_pageblock++;
>   		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
>
>   		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> @@ -83,6 +84,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>   	nr_pages = move_freepages_block(zone, page, migratetype);
>   	__mod_zone_freepage_state(zone, nr_pages, migratetype);
>   	set_pageblock_migratetype(page, migratetype);
> +	zone->nr_isolate_pageblock--;
>   out:
>   	spin_unlock_irqrestore(&zone->lock, flags);
>   }
>



* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-09-08  8:31   ` Vlastimil Babka
@ 2014-09-15  2:31     ` Joonsoo Kim
  2014-09-24 13:30       ` Vlastimil Babka
  0 siblings, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2014-09-15  2:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Mon, Sep 08, 2014 at 10:31:29AM +0200, Vlastimil Babka wrote:
> On 08/26/2014 10:08 AM, Joonsoo Kim wrote:
> 
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index f86023b..51e0d13 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
> >  	if (nr_scanned)
> >  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
> >
> >+	if (unlikely(has_isolate_pageblock(zone))) {
> >+		migratetype = get_pfnblock_migratetype(page, pfn);
> >+		if (is_migrate_isolate(migratetype))
> >+			goto skip_counting;
> >+	}
> >+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> >+
> >+skip_counting:
> 
> Here, wouldn't a simple 'else __mod_zone_freepage_state...' look
> better than goto + label? (same for the following 2 patches). Or
> does that generate worse code?

To remove the goto label, we need two __mod_zone_freepage_state() calls,
as below. On my system it doesn't generate worse code, but I am not sure
that would stay true if more logic were added. I think goto + label is
better.

+	if (unlikely(has_isolate_pageblock(zone))) {
+		migratetype = get_pfnblock_migratetype(page, pfn);
+		if (!is_migrate_isolate(migratetype))
+			__mod_zone_freepage_state(zone, 1 << order, migratetype);
+	} else {
+		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+	}

Anyway, which one do you think is better, v2 or v3? Still v3? :)

Thanks.


* Re: [RFC PATCH v3 0/4] fix freepage count problems in memory isolation
  2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
                   ` (3 preceding siblings ...)
  2014-08-26  8:08 ` [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock Joonsoo Kim
@ 2014-09-15  5:09 ` Minchan Kim
  4 siblings, 0 replies; 14+ messages in thread
From: Minchan Kim @ 2014-09-15  5:09 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, Vlastimil Babka,
	linux-mm, linux-kernel

Hi Joonsoo,

Sorry for the late response.

On Tue, Aug 26, 2014 at 05:08:14PM +0900, Joonsoo Kim wrote:
> This is version 3 patchset which is improved and minimized version of
> version 1 to fix freepage accounting problem during memory isolation.
> I tried different approach in version 2, but, it looks really complicated
> so I change my mind to improve version 1. You can see version 1, 2 in
> following links [1] [2], respectively.
> 
> IMO, this v3 is better than v2, because this is simpler than v2 so
> better for maintainance and this doesn't change pageblock isolation
> logic so it is much easier to backport.
> 
> This problems are found by testing my patchset [3]. There are some race
> conditions on pageblock isolation and these race cause incorrect
> freepage count.
> 
> Before describing bugs itself, I first explain definition of freepage.
> 
> 1. pages on buddy list are counted as freepage.
> 2. pages on isolate migratetype buddy list are *not* counted as freepage.
> 3. pages on cma buddy list are counted as CMA freepage, too.
> 
> Now, I describe problems and related patch.
> 
> Patch 1: There is race conditions on getting pageblock migratetype that
> it results in misplacement of freepages on buddy list, incorrect
> freepage count and un-availability of freepage.
> 
> Patch 2: Freepages on pcp list could have stale cached information to
> determine migratetype of buddy list to go. This causes misplacement
> of freepages on buddy list and incorrect freepage count.
> 
> Patch 4: Merging between freepages on different migratetype of
> pageblocks will cause freepages accouting problem. This patch fixes it.

Looking through this patchset, it's really simple compared to v2.
Although it adds some overhead on the hot path, I support this patchset
unless someone suggests simpler/cleaner code to fix these horrible race
bugs.

Thanks a lot!

> 
> Without patchset [3], above problem doesn't happens on my CMA allocation
> test, because CMA reserved pages aren't used at all. So there is no
> chance for above race.
> 
> With patchset [3], I did simple CMA allocation test and get below result.
> 
> - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
> - run kernel build (make -j16) on background
> - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
> - Result: more than 5000 freepage count are missed
> 
> With patchset [3] and this patchset, I found that no freepage count are
> missed so that I conclude that problems are solved.
> 
> These problems can be possible on memory hot remove users, although
> I didn't check it further.
> 
> This patchset is based on linux-next-20140826.
> Please see individual patches for more information.
> 
> Thanks.
> 
> [1]: https://lkml.org/lkml/2014/7/4/79
> [2]: lkml.org/lkml/2014/8/6/52
> [3]: Aggressively allocate the pages on cma reserved memory
>      https://lkml.org/lkml/2014/5/30/291
> 
> Joonsoo Kim (4):
>   mm/page_alloc: fix incorrect isolation behavior by rechecking
>     migratetype
>   mm/page_alloc: add freepage on isolate pageblock to correct buddy
>     list
>   mm/page_alloc: move migratetype recheck logic to __free_one_page()
>   mm/page_alloc: restrict max order of merging on isolated pageblock
> 
>  include/linux/mmzone.h         |    4 ++++
>  include/linux/page-isolation.h |    8 ++++++++
>  mm/page_alloc.c                |   28 +++++++++++++++++++---------
>  mm/page_isolation.c            |    2 ++
>  4 files changed, 33 insertions(+), 9 deletions(-)
> 
> -- 
> 1.7.9.5
> 

-- 
Kind regards,
Minchan Kim


* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-09-15  2:31     ` Joonsoo Kim
@ 2014-09-24 13:30       ` Vlastimil Babka
  2014-09-25  6:13         ` Joonsoo Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Vlastimil Babka @ 2014-09-24 13:30 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 09/15/2014 04:31 AM, Joonsoo Kim wrote:
> On Mon, Sep 08, 2014 at 10:31:29AM +0200, Vlastimil Babka wrote:
>> On 08/26/2014 10:08 AM, Joonsoo Kim wrote:
>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index f86023b..51e0d13 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
>>>   	if (nr_scanned)
>>>   		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
>>>
>>> +	if (unlikely(has_isolate_pageblock(zone))) {
>>> +		migratetype = get_pfnblock_migratetype(page, pfn);
>>> +		if (is_migrate_isolate(migratetype))
>>> +			goto skip_counting;
>>> +	}
>>> +	__mod_zone_freepage_state(zone, 1 << order, migratetype);
>>> +
>>> +skip_counting:
>>
>> Here, wouldn't a simple 'else __mod_zone_freepage_state...' look
>> better than goto + label? (same for the following 2 patches). Or
>> does that generate worse code?
>
> To remove goto label, we need two __mod_zone_freepage_state() like
> as below. On my system, it doesn't generate worse code, but, I am not
> sure that this is true if more logic would be added. I think that
> goto + label is better.

Oh right, I missed that. It's a bit subtle, but I don't see a nicer 
solution right now.

> +	if (unlikely(has_isolate_pageblock(zone))) {
> +		migratetype = get_pfnblock_migratetype(page, pfn);
> +               if (!is_migrate_isolate(migratetype))
> +                       __mod_zone_freepage_state(zone, 1 << order, migratetype);
> +       } else {
> +               __mod_zone_freepage_state(zone, 1 << order, migratetype);
>          }
>

Yeah that would be uglier I guess.

> Anyway, What do you think which one is better, either v2 or v3? Still, v3? :)

Yeah v3 is much better than v1 was, and better for backporting than v2. 
The changelogs also look quite clear. The overhead shouldn't be bad with 
the per-zone flag guarding get_pfnblock_migratetype.

I'm just not sure about patch 4 and potentially leaving unmerged buddies
behind. How would it look if we instead made sure isolation works on
whole MAX_ORDER blocks?

Vlastimil

> Thanks.
>



* Re: [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  2014-09-24 13:30       ` Vlastimil Babka
@ 2014-09-25  6:13         ` Joonsoo Kim
  0 siblings, 0 replies; 14+ messages in thread
From: Joonsoo Kim @ 2014-09-25  6:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Peter Zijlstra,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Sep 24, 2014 at 03:30:26PM +0200, Vlastimil Babka wrote:
> On 09/15/2014 04:31 AM, Joonsoo Kim wrote:
> >On Mon, Sep 08, 2014 at 10:31:29AM +0200, Vlastimil Babka wrote:
> >>On 08/26/2014 10:08 AM, Joonsoo Kim wrote:
> >>
> >>>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>index f86023b..51e0d13 100644
> >>>--- a/mm/page_alloc.c
> >>>+++ b/mm/page_alloc.c
> >>>@@ -740,9 +740,15 @@ static void free_one_page(struct zone *zone,
> >>>  	if (nr_scanned)
> >>>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
> >>>
> >>>+	if (unlikely(has_isolate_pageblock(zone))) {
> >>>+		migratetype = get_pfnblock_migratetype(page, pfn);
> >>>+		if (is_migrate_isolate(migratetype))
> >>>+			goto skip_counting;
> >>>+	}
> >>>+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> >>>+
> >>>+skip_counting:
> >>
> >>Here, wouldn't a simple 'else __mod_zone_freepage_state...' look
> >>better than goto + label? (same for the following 2 patches). Or
> >>does that generate worse code?
> >
> >To remove goto label, we need two __mod_zone_freepage_state() like
> >as below. On my system, it doesn't generate worse code, but, I am not
> >sure that this is true if more logic would be added. I think that
> >goto + label is better.
> 
> Oh right, I missed that. It's a bit subtle, but I don't see a nicer
> solution right now.
> 
> >+	if (unlikely(has_isolate_pageblock(zone))) {
> >+		migratetype = get_pfnblock_migratetype(page, pfn);
> >+               if (!is_migrate_isolate(migratetype))
> >+                       __mod_zone_freepage_state(zone, 1 << order, migratetype);
> >+       } else {
> >+               __mod_zone_freepage_state(zone, 1 << order, migratetype);
> >         }
> >
> 
> Yeah that would be uglier I guess.
> 
> >Anyway, What do you think which one is better, either v2 or v3? Still, v3? :)
> 
> Yeah v3 is much better than v1 was, and better for backporting than
> v2. The changelogs also look quite clear. The overhead shouldn't be
> bad with the per-zone flag guarding get_pfnblock_migratetype.

Okay. I will go this way. :)

> 
> I'm just not sure about patch 4 and potentially leaving unmerged
> budies behind. How would it look if instead we made sure isolation
> works on whole MAX_ORDER blocks instead?
> 

If alloc_contig_range() succeeds and free_contig_range() is later called
to free the range, no unmerged buddies are left behind. If
alloc_contig_range() fails, we can end up with unmerged buddies, but that
is a rare case and not a big problem, because we normally don't want to
allocate MAX_ORDER-1 pages; we mostly want to allocate pages of at most
pageblock_order. After some splitting and merging of freepages, the page
could become a MAX_ORDER-1 page again, so it's not a real issue, IMO.

Thanks.


Thread overview: 14 messages
2014-08-26  8:08 [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Joonsoo Kim
2014-08-26  8:08 ` [RFC PATCH v3 1/4] mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype Joonsoo Kim
2014-08-29 17:46   ` Naoya Horiguchi
2014-09-01  0:14     ` Joonsoo Kim
2014-09-08  8:31   ` Vlastimil Babka
2014-09-15  2:31     ` Joonsoo Kim
2014-09-24 13:30       ` Vlastimil Babka
2014-09-25  6:13         ` Joonsoo Kim
2014-08-26  8:08 ` [RFC PATCH v3 2/4] mm/page_alloc: add freepage on isolate pageblock to correct buddy list Joonsoo Kim
2014-08-26  8:08 ` [RFC PATCH v3 3/4] mm/page_alloc: move migratetype recheck logic to __free_one_page() Joonsoo Kim
2014-08-26  8:08 ` [RFC PATCH v3 4/4] mm/page_alloc: restrict max order of merging on isolated pageblock Joonsoo Kim
2014-08-29 16:52   ` Naoya Horiguchi
2014-09-01  0:15     ` Joonsoo Kim
2014-09-15  5:09 ` [RFC PATCH v3 0/4] fix freepage count problems in memory isolation Minchan Kim
