All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] fix freepage count problems in memory isolation
@ 2014-08-06  7:18 ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Hello,

This patchset aims at fixing problems during memory isolation found by
testing my patchset [1].

These are really subtle problems so I can be wrong. If you find what I am
missing, please let me know.

Before describing bugs itself, I first explain definition of freepage.

1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
4. pages for guard are *not* counted as freepage.

Now, I describe problems and related patch.

Patch 1: If guard page are cleared and merged into isolate buddy list,
we should not add freepage count.

Patch 4: There is race conditions that results in misplacement of free
pages on buddy list. Then, it results in incorrect freepage count and
un-availability of freepage.

Patch 5: To count freepage correctly, we should prevent freepage from
being added to buddy list in some period of isolation. Without it, we
cannot be sure if the freepage is counted or not and miscount number
of freepage.

Patch 7: In spite of above fixes, there is one more condition for
incorrect freepage count. pageblock isolation could be done in pageblock
unit  so we can't prevent freepage from merging with page on next
pageblock. To fix it, start_isolate_page_range() and
undo_isolate_page_range() is modified to process whole range at one go.
With this change, if input parameter of start_isolate_page_range() and
undo_isolate_page_range() is properly aligned, there is no condition for
incorrect merging.

Without patchset [1], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.

With patchset [1], I did simple CMA allocation test and get below result.

- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed

With patchset [1] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.

These problems can be possible on memory hot remove users, although
I didn't check it further.

This patchset is based on linux-next-20140728.
Please see individual patches for more information.

Thanks.

[1]: Aggressively allocate the pages on cma reserved memory
     https://lkml.org/lkml/2014/5/30/291

Joonsoo Kim (8):
  mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  mm/isolation: remove unstable check for isolated page
  mm/page_alloc: fix pcp high, batch management
  mm/isolation: close the two race problems related to pageblock
    isolation
  mm/isolation: change pageblock isolation logic to fix freepage
    counting bugs
  mm/isolation: factor out pre/post logic on
    set/unset_migratetype_isolate()
  mm/isolation: fix freepage counting bug on
    start/undo_isolat_page_range()
  mm/isolation: remove useless race handling related to pageblock
    isolation

 include/linux/page-isolation.h |    2 +
 mm/internal.h                  |    5 +
 mm/page_alloc.c                |  223 +++++++++++++++++-------------
 mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
 4 files changed, 368 insertions(+), 154 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v2 0/8] fix freepage count problems in memory isolation
@ 2014-08-06  7:18 ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Hello,

This patchset aims at fixing problems during memory isolation found by
testing my patchset [1].

These are really subtle problems so I can be wrong. If you find what I am
missing, please let me know.

Before describing bugs itself, I first explain definition of freepage.

1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
4. pages for guard are *not* counted as freepage.

Now, I describe problems and related patch.

Patch 1: If guard page are cleared and merged into isolate buddy list,
we should not add freepage count.

Patch 4: There is race conditions that results in misplacement of free
pages on buddy list. Then, it results in incorrect freepage count and
un-availability of freepage.

Patch 5: To count freepage correctly, we should prevent freepage from
being added to buddy list in some period of isolation. Without it, we
cannot be sure if the freepage is counted or not and miscount number
of freepage.

Patch 7: In spite of above fixes, there is one more condition for
incorrect freepage count. pageblock isolation could be done in pageblock
unit  so we can't prevent freepage from merging with page on next
pageblock. To fix it, start_isolate_page_range() and
undo_isolate_page_range() is modified to process whole range at one go.
With this change, if input parameter of start_isolate_page_range() and
undo_isolate_page_range() is properly aligned, there is no condition for
incorrect merging.

Without patchset [1], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.

With patchset [1], I did simple CMA allocation test and get below result.

- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed

With patchset [1] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.

These problems can be possible on memory hot remove users, although
I didn't check it further.

This patchset is based on linux-next-20140728.
Please see individual patches for more information.

Thanks.

[1]: Aggressively allocate the pages on cma reserved memory
     https://lkml.org/lkml/2014/5/30/291

Joonsoo Kim (8):
  mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  mm/isolation: remove unstable check for isolated page
  mm/page_alloc: fix pcp high, batch management
  mm/isolation: close the two race problems related to pageblock
    isolation
  mm/isolation: change pageblock isolation logic to fix freepage
    counting bugs
  mm/isolation: factor out pre/post logic on
    set/unset_migratetype_isolate()
  mm/isolation: fix freepage counting bug on
    start/undo_isolat_page_range()
  mm/isolation: remove useless race handling related to pageblock
    isolation

 include/linux/page-isolation.h |    2 +
 mm/internal.h                  |    5 +
 mm/page_alloc.c                |  223 +++++++++++++++++-------------
 mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
 4 files changed, 368 insertions(+), 154 deletions(-)

-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH v2 1/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

In __free_one_page(), we check the buddy page if it is guard page.
And, if so, we should clear guard attribute on the buddy page. But,
currently, we clear original page's order rather than buddy one's.
This doesn't have any problem, because resetting buddy's order
is useless and the original page's order is re-assigned soon.
But, it is better to correct code.

Additionally, I change (set/clear)_page_guard_flag() to
(set/clear)_page_guard() and makes these functions do all works
needed for guard page. This may make code more understandable.

One more thing, I did in this patch, is that fixing freepage accounting.
If we clear guard page and link it onto isolate buddy list, we should
not increase freepage count.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b99643d4..e6fee4b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 }
 __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
 
-static inline void set_page_guard_flag(struct page *page)
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, order);
+	/* Guard pages are not available for any usage */
+	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
 }
 
-static inline void clear_page_guard_flag(struct page *page)
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, 0);
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, (1 << order), migratetype);
 }
 #else
-static inline void set_page_guard_flag(struct page *page) { }
-static inline void clear_page_guard_flag(struct page *page) { }
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
 #endif
 
 static inline void set_page_order(struct page *page, unsigned int order)
@@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
 		 * merge with it and move up one order.
 		 */
 		if (page_is_guard(buddy)) {
-			clear_page_guard_flag(buddy);
-			set_page_private(page, 0);
-			__mod_zone_freepage_state(zone, 1 << order,
-						  migratetype);
+			clear_page_guard(zone, buddy, order, migratetype);
 		} else {
 			list_del(&buddy->lru);
 			zone->free_area[order].nr_free--;
@@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
 			 * pages will stay not present in virtual address space
 			 */
 			INIT_LIST_HEAD(&page[size].lru);
-			set_page_guard_flag(&page[size]);
-			set_page_private(&page[size], high);
-			/* Guard pages are not available for any usage */
-			__mod_zone_freepage_state(zone, -(1 << high),
-						  migratetype);
+			set_page_guard(zone, &page[size], high, migratetype);
 			continue;
 		}
 #endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 1/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

In __free_one_page(), we check the buddy page if it is guard page.
And, if so, we should clear guard attribute on the buddy page. But,
currently, we clear original page's order rather than buddy one's.
This doesn't have any problem, because resetting buddy's order
is useless and the original page's order is re-assigned soon.
But, it is better to correct code.

Additionally, I change (set/clear)_page_guard_flag() to
(set/clear)_page_guard() and makes these functions do all works
needed for guard page. This may make code more understandable.

One more thing, I did in this patch, is that fixing freepage accounting.
If we clear guard page and link it onto isolate buddy list, we should
not increase freepage count.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b99643d4..e6fee4b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 }
 __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
 
-static inline void set_page_guard_flag(struct page *page)
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, order);
+	/* Guard pages are not available for any usage */
+	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
 }
 
-static inline void clear_page_guard_flag(struct page *page)
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, 0);
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, (1 << order), migratetype);
 }
 #else
-static inline void set_page_guard_flag(struct page *page) { }
-static inline void clear_page_guard_flag(struct page *page) { }
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
 #endif
 
 static inline void set_page_order(struct page *page, unsigned int order)
@@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
 		 * merge with it and move up one order.
 		 */
 		if (page_is_guard(buddy)) {
-			clear_page_guard_flag(buddy);
-			set_page_private(page, 0);
-			__mod_zone_freepage_state(zone, 1 << order,
-						  migratetype);
+			clear_page_guard(zone, buddy, order, migratetype);
 		} else {
 			list_del(&buddy->lru);
 			zone->free_area[order].nr_free--;
@@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
 			 * pages will stay not present in virtual address space
 			 */
 			INIT_LIST_HEAD(&page[size].lru);
-			set_page_guard_flag(&page[size]);
-			set_page_private(&page[size], high);
-			/* Guard pages are not available for any usage */
-			__mod_zone_freepage_state(zone, -(1 << high),
-						  migratetype);
+			set_page_guard(zone, &page[size], high, migratetype);
 			continue;
 		}
 #endif
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

per cpu pages structure, aka pcp, has high and batch values to control
how many pages we perform caching. This values could be updated
asynchronously and updater should ensure that this doesn't make any
problem. For this purpose, pageset_update() is implemented and do some
memory synchronization. But, it turns out to be wrong when I implemented
new feature using this. There is no corresponding smp_rmb() in read-side
so that it can't guarantee anything. Without correct updating, system
could hang in free_pcppages_bulk() due to larger batch value than high.
To properly update this values, we need to synchronization primitives on
read-side, but, it hurts allocator's fastpath.

There is another choice for synchronization, that is, sending IPI. This
is somewhat expensive, but, this is really rare case so I guess it has
no problem here. However, reducing IPI is very helpful here. Current
logic handles each CPU's pcp update one by one. To reduce sending IPI,
we need to re-ogranize the code to handle all CPU's pcp update at one go.
This patch implement these requirements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 80 insertions(+), 59 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b99643d4..44672dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static void setup_zone_pageset(struct zone *zone);
 
@@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu) {
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+	setup_pageset(&boot_pageset);
 
+	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 		/*
 		 * We now know the "local memory node" for each node--
@@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
  * outside of boot time (or some other assurance that no concurrent updaters
  * exist).
  */
-static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
-		unsigned long batch)
+static void pageset_update(struct zone *zone, int high, int batch)
 {
-       /* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
+	int cpu;
+	struct per_cpu_pages *pcp;
+
+	/* start with a fail safe value for batch */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = 1;
+	}
+	kick_all_cpus_sync();
+
+	/* Update high, then batch, in order */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->high = high;
+	}
+	kick_all_cpus_sync();
 
-       /* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = batch;
+	}
+}
+
+/*
+ * pageset_get_values_by_high() gets the high water mark for
+ * hot per_cpu_pagelist to the value high for the pageset p.
+ */
+static void pageset_get_values_by_high(int input_high,
+				int *output_high, int *output_batch)
+{
+	*output_batch = max(1, input_high / 4);
+	if ((input_high / 4) > (PAGE_SHIFT * 8))
+		*output_batch = PAGE_SHIFT * 8;
+}
 
-	pcp->batch = batch;
+/* a companion to pageset_get_values_by_high() */
+static void pageset_get_values_by_batch(int input_batch,
+				int *output_high, int *output_batch)
+{
+	*output_high = 6 * input_batch;
+	*output_batch = max(1, 1 * input_batch);
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_get_values(struct zone *zone, int *high, int *batch)
 {
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
+	if (percpu_pagelist_fraction) {
+		pageset_get_values_by_high(
+			(zone->managed_pages / percpu_pagelist_fraction),
+			high, batch);
+	} else
+		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
@@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+/* Use this only in boot time, because it doesn't do any synchronization */
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
 {
-	pageset_init(p);
-	pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
-}
-
-static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
-{
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone->managed_pages /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
-}
+	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
 
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+	pageset_get_values_by_batch(0, &high, &batch);
 
-	pageset_init(pcp);
-	pageset_set_high_and_batch(zone, pcp);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(pcp, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
+
+	pageset_get_values(zone, &high, &batch);
+
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-	for_each_possible_cpu(cpu)
-		zone_pageset_init(zone, cpu);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 /*
@@ -5925,11 +5947,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		goto out;
 
 	for_each_populated_zone(zone) {
-		unsigned int cpu;
+		int high, batch;
 
-		for_each_possible_cpu(cpu)
-			pageset_set_high_and_batch(zone,
-					per_cpu_ptr(zone->pageset, cpu));
+		pageset_get_values(zone, &high, &batch);
+		pageset_update(zone, high, batch);
 	}
 out:
 	mutex_unlock(&pcp_batch_high_lock);
@@ -6452,11 +6473,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
  */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	unsigned cpu;
+	int high, batch;
+
 	mutex_lock(&pcp_batch_high_lock);
-	for_each_possible_cpu(cpu)
-		pageset_set_high_and_batch(zone,
-				per_cpu_ptr(zone->pageset, cpu));
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

per cpu pages structure, aka pcp, has high and batch values to control
how many pages we perform caching. This values could be updated
asynchronously and updater should ensure that this doesn't make any
problem. For this purpose, pageset_update() is implemented and do some
memory synchronization. But, it turns out to be wrong when I implemented
new feature using this. There is no corresponding smp_rmb() in read-side
so that it can't guarantee anything. Without correct updating, system
could hang in free_pcppages_bulk() due to larger batch value than high.
To properly update this values, we need to synchronization primitives on
read-side, but, it hurts allocator's fastpath.

There is another choice for synchronization, that is, sending IPI. This
is somewhat expensive, but, this is really rare case so I guess it has
no problem here. However, reducing IPI is very helpful here. Current
logic handles each CPU's pcp update one by one. To reduce sending IPI,
we need to re-ogranize the code to handle all CPU's pcp update at one go.
This patch implement these requirements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 80 insertions(+), 59 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b99643d4..44672dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static void setup_zone_pageset(struct zone *zone);
 
@@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu) {
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+	setup_pageset(&boot_pageset);
 
+	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 		/*
 		 * We now know the "local memory node" for each node--
@@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
  * outside of boot time (or some other assurance that no concurrent updaters
  * exist).
  */
-static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
-		unsigned long batch)
+static void pageset_update(struct zone *zone, int high, int batch)
 {
-       /* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
+	int cpu;
+	struct per_cpu_pages *pcp;
+
+	/* start with a fail safe value for batch */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = 1;
+	}
+	kick_all_cpus_sync();
+
+	/* Update high, then batch, in order */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->high = high;
+	}
+	kick_all_cpus_sync();
 
-       /* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = batch;
+	}
+}
+
+/*
+ * pageset_get_values_by_high() gets the high water mark for
+ * hot per_cpu_pagelist to the value high for the pageset p.
+ */
+static void pageset_get_values_by_high(int input_high,
+				int *output_high, int *output_batch)
+{
+	*output_batch = max(1, input_high / 4);
+	if ((input_high / 4) > (PAGE_SHIFT * 8))
+		*output_batch = PAGE_SHIFT * 8;
+}
 
-	pcp->batch = batch;
+/* a companion to pageset_get_values_by_high() */
+static void pageset_get_values_by_batch(int input_batch,
+				int *output_high, int *output_batch)
+{
+	*output_high = 6 * input_batch;
+	*output_batch = max(1, 1 * input_batch);
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_get_values(struct zone *zone, int *high, int *batch)
 {
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
+	if (percpu_pagelist_fraction) {
+		pageset_get_values_by_high(
+			(zone->managed_pages / percpu_pagelist_fraction),
+			high, batch);
+	} else
+		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
@@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+/* Use this only in boot time, because it doesn't do any synchronization */
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
 {
-	pageset_init(p);
-	pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
-}
-
-static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
-{
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone->managed_pages /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
-}
+	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
 
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+	pageset_get_values_by_batch(0, &high, &batch);
 
-	pageset_init(pcp);
-	pageset_set_high_and_batch(zone, pcp);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(pcp, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
+
+	pageset_get_values(zone, &high, &batch);
+
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-	for_each_possible_cpu(cpu)
-		zone_pageset_init(zone, cpu);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 /*
@@ -5925,11 +5947,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		goto out;
 
 	for_each_populated_zone(zone) {
-		unsigned int cpu;
+		int high, batch;
 
-		for_each_possible_cpu(cpu)
-			pageset_set_high_and_batch(zone,
-					per_cpu_ptr(zone->pageset, cpu));
+		pageset_get_values(zone, &high, &batch);
+		pageset_update(zone, high, batch);
 	}
 out:
 	mutex_unlock(&pcp_batch_high_lock);
@@ -6452,11 +6473,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
  */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	unsigned cpu;
+	int high, batch;
+
 	mutex_lock(&pcp_batch_high_lock);
-	for_each_possible_cpu(cpu)
-		pageset_set_high_and_batch(zone,
-				per_cpu_ptr(zone->pageset, cpu));
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

The check '!PageBuddy(page) && page_count(page) == 0 &&
migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
Although it could go into buddy allocator within a short time,
futher operation such as isolate_freepages_range() in CMA, called after
test_page_isolated_in_pageblock(), could be failed due to this unstability
since it requires that the page is on buddy. I think that removing
this unstability is good thing.

And, following patch makes isolated freepage has new status matched with
this condition and this check is the obstacle to that change. So remove
it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..3100f98 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 						MIGRATE_ISOLATE);
 			}
 			pfn += 1 << page_order(page);
-		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
-		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy
 			 * system, and page_count() is not 0.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

The check '!PageBuddy(page) && page_count(page) == 0 &&
migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
Although it could go into buddy allocator within a short time,
futher operation such as isolate_freepages_range() in CMA, called after
test_page_isolated_in_pageblock(), could be failed due to this unstability
since it requires that the page is on buddy. I think that removing
this unstability is good thing.

And, following patch makes isolated freepage has new status matched with
this condition and this check is the obstacle to that change. So remove
it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..3100f98 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 						MIGRATE_ISOLATE);
 			}
 			pfn += 1 << page_order(page);
-		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
-		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy
 			 * system, and page_count() is not 0.
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

In __free_one_page(), we check the buddy page if it is guard page.
And, if so, we should clear guard attribute on the buddy page. But,
currently, we clear original page's order rather than buddy one's.
This doesn't have any problem, because resetting buddy's order
is useless and the original page's order is re-assigned soon.
But, it is better to correct code.

Additionally, I change (set/clear)_page_guard_flag() to
(set/clear)_page_guard() and makes these functions do all works
needed for guard page. This may make code more understandable.

One more thing, I did in this patch, is that fixing freepage accounting.
If we clear guard page and link it onto isolate buddy list, we should
not increase freepage count.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44672dc..3e1e344 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 }
 __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
 
-static inline void set_page_guard_flag(struct page *page)
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, order);
+	/* Guard pages are not available for any usage */
+	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
 }
 
-static inline void clear_page_guard_flag(struct page *page)
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, 0);
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, (1 << order), migratetype);
 }
 #else
-static inline void set_page_guard_flag(struct page *page) { }
-static inline void clear_page_guard_flag(struct page *page) { }
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
 #endif
 
 static inline void set_page_order(struct page *page, unsigned int order)
@@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
 		 * merge with it and move up one order.
 		 */
 		if (page_is_guard(buddy)) {
-			clear_page_guard_flag(buddy);
-			set_page_private(page, 0);
-			__mod_zone_freepage_state(zone, 1 << order,
-						  migratetype);
+			clear_page_guard(zone, buddy, order, migratetype);
 		} else {
 			list_del(&buddy->lru);
 			zone->free_area[order].nr_free--;
@@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
 			 * pages will stay not present in virtual address space
 			 */
 			INIT_LIST_HEAD(&page[size].lru);
-			set_page_guard_flag(&page[size]);
-			set_page_private(&page[size], high);
-			/* Guard pages are not available for any usage */
-			__mod_zone_freepage_state(zone, -(1 << high),
-						  migratetype);
+			set_page_guard(zone, &page[size], high, migratetype);
 			continue;
 		}
 #endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

In __free_one_page(), we check the buddy page if it is guard page.
And, if so, we should clear guard attribute on the buddy page. But,
currently, we clear original page's order rather than buddy one's.
This doesn't have any problem, because resetting buddy's order
is useless and the original page's order is re-assigned soon.
But, it is better to correct code.

Additionally, I change (set/clear)_page_guard_flag() to
(set/clear)_page_guard() and makes these functions do all works
needed for guard page. This may make code more understandable.

One more thing, I did in this patch, is that fixing freepage accounting.
If we clear guard page and link it onto isolate buddy list, we should
not increase freepage count.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 44672dc..3e1e344 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
 }
 __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
 
-static inline void set_page_guard_flag(struct page *page)
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, order);
+	/* Guard pages are not available for any usage */
+	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
 }
 
-static inline void clear_page_guard_flag(struct page *page)
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype)
 {
 	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
+	set_page_private(page, 0);
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, (1 << order), migratetype);
 }
 #else
-static inline void set_page_guard_flag(struct page *page) { }
-static inline void clear_page_guard_flag(struct page *page) { }
+static inline void set_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
+static inline void clear_page_guard(struct zone *zone, struct page *page,
+				unsigned int order, int migratetype) {}
 #endif
 
 static inline void set_page_order(struct page *page, unsigned int order)
@@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
 		 * merge with it and move up one order.
 		 */
 		if (page_is_guard(buddy)) {
-			clear_page_guard_flag(buddy);
-			set_page_private(page, 0);
-			__mod_zone_freepage_state(zone, 1 << order,
-						  migratetype);
+			clear_page_guard(zone, buddy, order, migratetype);
 		} else {
 			list_del(&buddy->lru);
 			zone->free_area[order].nr_free--;
@@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
 			 * pages will stay not present in virtual address space
 			 */
 			INIT_LIST_HEAD(&page[size].lru);
-			set_page_guard_flag(&page[size]);
-			set_page_private(&page[size], high);
-			/* Guard pages are not available for any usage */
-			__mod_zone_freepage_state(zone, -(1 << high),
-						  migratetype);
+			set_page_guard(zone, &page[size], high, migratetype);
 			continue;
 		}
 #endif
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 3/8] mm/isolation: remove unstable check for isolated page
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

The check '!PageBuddy(page) && page_count(page) == 0 &&
migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
Although it could go into buddy allocator within a short time,
futher operation such as isolate_freepages_range() in CMA, called after
test_page_isolated_in_pageblock(), could be failed due to this unstability
since it requires that the page is on buddy. I think that removing
this unstability is good thing.

And, following patch makes isolated freepage has new status matched with
this condition and this check is the obstacle to that change. So remove
it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..3100f98 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 						MIGRATE_ISOLATE);
 			}
 			pfn += 1 << page_order(page);
-		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
-		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy
 			 * system, and page_count() is not 0.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 3/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

The check '!PageBuddy(page) && page_count(page) == 0 &&
migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
Although it could go into buddy allocator within a short time,
futher operation such as isolate_freepages_range() in CMA, called after
test_page_isolated_in_pageblock(), could be failed due to this unstability
since it requires that the page is on buddy. I think that removing
this unstability is good thing.

And, following patch makes isolated freepage has new status matched with
this condition and this check is the obstacle to that change. So remove
it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..3100f98 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 						MIGRATE_ISOLATE);
 			}
 			pfn += 1 << page_order(page);
-		}
-		else if (page_count(page) == 0 &&
-			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
-			pfn += 1;
-		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
+		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
 			 * The HWPoisoned page may be not in buddy
 			 * system, and page_count() is not 0.
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

per cpu pages structure, aka pcp, has high and batch values to control
how many pages we perform caching. This values could be updated
asynchronously and updater should ensure that this doesn't make any
problem. For this purpose, pageset_update() is implemented and do some
memory synchronization. But, it turns out to be wrong when I implemented
new feature using this. There is no corresponding smp_rmb() in read-side
so that it can't guarantee anything. Without correct updating, system
could hang in free_pcppages_bulk() due to larger batch value than high.
To properly update this values, we need to synchronization primitives on
read-side, but, it hurts allocator's fastpath.

There is another choice for synchronization, that is, sending IPI. This
is somewhat expensive, but, this is really rare case so I guess it has
no problem here. However, reducing IPI is very helpful here. Current
logic handles each CPU's pcp update one by one. To reduce sending IPI,
we need to re-ogranize the code to handle all CPU's pcp update at one go.
This patch implement these requirements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 80 insertions(+), 59 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6fee4b..3e1e344 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3800,7 +3800,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static void setup_zone_pageset(struct zone *zone);
 
@@ -3846,9 +3846,9 @@ static int __build_all_zonelists(void *data)
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu) {
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+	setup_pageset(&boot_pageset);
 
+	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 		/*
 		 * We now know the "local memory node" for each node--
@@ -4230,24 +4230,59 @@ static int zone_batchsize(struct zone *zone)
  * outside of boot time (or some other assurance that no concurrent updaters
  * exist).
  */
-static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
-		unsigned long batch)
+static void pageset_update(struct zone *zone, int high, int batch)
 {
-       /* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
+	int cpu;
+	struct per_cpu_pages *pcp;
+
+	/* start with a fail safe value for batch */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = 1;
+	}
+	kick_all_cpus_sync();
+
+	/* Update high, then batch, in order */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->high = high;
+	}
+	kick_all_cpus_sync();
 
-       /* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = batch;
+	}
+}
+
+/*
+ * pageset_get_values_by_high() gets the high water mark for
+ * hot per_cpu_pagelist to the value high for the pageset p.
+ */
+static void pageset_get_values_by_high(int input_high,
+				int *output_high, int *output_batch)
+{
+	*output_batch = max(1, input_high / 4);
+	if ((input_high / 4) > (PAGE_SHIFT * 8))
+		*output_batch = PAGE_SHIFT * 8;
+}
 
-	pcp->batch = batch;
+/* a companion to pageset_get_values_by_high() */
+static void pageset_get_values_by_batch(int input_batch,
+				int *output_high, int *output_batch)
+{
+	*output_high = 6 * input_batch;
+	*output_batch = max(1, 1 * input_batch);
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_get_values(struct zone *zone, int *high, int *batch)
 {
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
+	if (percpu_pagelist_fraction) {
+		pageset_get_values_by_high(
+			(zone->managed_pages / percpu_pagelist_fraction),
+			high, batch);
+	} else
+		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
@@ -4263,51 +4298,38 @@ static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+/* Use this only in boot time, because it doesn't do any synchronization */
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
 {
-	pageset_init(p);
-	pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
-}
-
-static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
-{
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone->managed_pages /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
-}
+	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
 
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+	pageset_get_values_by_batch(0, &high, &batch);
 
-	pageset_init(pcp);
-	pageset_set_high_and_batch(zone, pcp);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(pcp, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
+
+	pageset_get_values(zone, &high, &batch);
+
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-	for_each_possible_cpu(cpu)
-		zone_pageset_init(zone, cpu);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 /*
@@ -5928,11 +5950,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		goto out;
 
 	for_each_populated_zone(zone) {
-		unsigned int cpu;
+		int high, batch;
 
-		for_each_possible_cpu(cpu)
-			pageset_set_high_and_batch(zone,
-					per_cpu_ptr(zone->pageset, cpu));
+		pageset_get_values(zone, &high, &batch);
+		pageset_update(zone, high, batch);
 	}
 out:
 	mutex_unlock(&pcp_batch_high_lock);
@@ -6455,11 +6476,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
  */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	unsigned cpu;
+	int high, batch;
+
 	mutex_lock(&pcp_batch_high_lock);
-	for_each_possible_cpu(cpu)
-		pageset_set_high_and_batch(zone,
-				per_cpu_ptr(zone->pageset, cpu));
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

per cpu pages structure, aka pcp, has high and batch values to control
how many pages we perform caching. This values could be updated
asynchronously and updater should ensure that this doesn't make any
problem. For this purpose, pageset_update() is implemented and do some
memory synchronization. But, it turns out to be wrong when I implemented
new feature using this. There is no corresponding smp_rmb() in read-side
so that it can't guarantee anything. Without correct updating, system
could hang in free_pcppages_bulk() due to larger batch value than high.
To properly update this values, we need to synchronization primitives on
read-side, but, it hurts allocator's fastpath.

There is another choice for synchronization, that is, sending IPI. This
is somewhat expensive, but, this is really rare case so I guess it has
no problem here. However, reducing IPI is very helpful here. Current
logic handles each CPU's pcp update one by one. To reduce sending IPI,
we need to re-ogranize the code to handle all CPU's pcp update at one go.
This patch implement these requirements.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 80 insertions(+), 59 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6fee4b..3e1e344 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3800,7 +3800,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
  * not check if the processor is online before following the pageset pointer.
  * Other parts of the kernel may not check if the zone is available.
  */
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
 static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
 static void setup_zone_pageset(struct zone *zone);
 
@@ -3846,9 +3846,9 @@ static int __build_all_zonelists(void *data)
 	 * needs the percpu allocator in order to allocate its pagesets
 	 * (a chicken-egg dilemma).
 	 */
-	for_each_possible_cpu(cpu) {
-		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
+	setup_pageset(&boot_pageset);
 
+	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 		/*
 		 * We now know the "local memory node" for each node--
@@ -4230,24 +4230,59 @@ static int zone_batchsize(struct zone *zone)
  * outside of boot time (or some other assurance that no concurrent updaters
  * exist).
  */
-static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
-		unsigned long batch)
+static void pageset_update(struct zone *zone, int high, int batch)
 {
-       /* start with a fail safe value for batch */
-	pcp->batch = 1;
-	smp_wmb();
+	int cpu;
+	struct per_cpu_pages *pcp;
+
+	/* start with a fail safe value for batch */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = 1;
+	}
+	kick_all_cpus_sync();
+
+	/* Update high, then batch, in order */
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->high = high;
+	}
+	kick_all_cpus_sync();
 
-       /* Update high, then batch, in order */
-	pcp->high = high;
-	smp_wmb();
+	for_each_possible_cpu(cpu) {
+		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
+		pcp->batch = batch;
+	}
+}
+
+/*
+ * pageset_get_values_by_high() gets the high water mark for
+ * hot per_cpu_pagelist to the value high for the pageset p.
+ */
+static void pageset_get_values_by_high(int input_high,
+				int *output_high, int *output_batch)
+{
+	*output_batch = max(1, input_high / 4);
+	if ((input_high / 4) > (PAGE_SHIFT * 8))
+		*output_batch = PAGE_SHIFT * 8;
+}
 
-	pcp->batch = batch;
+/* a companion to pageset_get_values_by_high() */
+static void pageset_get_values_by_batch(int input_batch,
+				int *output_high, int *output_batch)
+{
+	*output_high = 6 * input_batch;
+	*output_batch = max(1, 1 * input_batch);
 }
 
-/* a companion to pageset_set_high() */
-static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
+static void pageset_get_values(struct zone *zone, int *high, int *batch)
 {
-	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
+	if (percpu_pagelist_fraction) {
+		pageset_get_values_by_high(
+			(zone->managed_pages / percpu_pagelist_fraction),
+			high, batch);
+	} else
+		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
 }
 
 static void pageset_init(struct per_cpu_pageset *p)
@@ -4263,51 +4298,38 @@ static void pageset_init(struct per_cpu_pageset *p)
 		INIT_LIST_HEAD(&pcp->lists[migratetype]);
 }
 
-static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
+/* Use this only in boot time, because it doesn't do any synchronization */
+static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
 {
-	pageset_init(p);
-	pageset_set_batch(p, batch);
-}
-
-/*
- * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
- * to the value high for the pageset p.
- */
-static void pageset_set_high(struct per_cpu_pageset *p,
-				unsigned long high)
-{
-	unsigned long batch = max(1UL, high / 4);
-	if ((high / 4) > (PAGE_SHIFT * 8))
-		batch = PAGE_SHIFT * 8;
-
-	pageset_update(&p->pcp, high, batch);
-}
-
-static void pageset_set_high_and_batch(struct zone *zone,
-				       struct per_cpu_pageset *pcp)
-{
-	if (percpu_pagelist_fraction)
-		pageset_set_high(pcp,
-			(zone->managed_pages /
-				percpu_pagelist_fraction));
-	else
-		pageset_set_batch(pcp, zone_batchsize(zone));
-}
+	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
 
-static void __meminit zone_pageset_init(struct zone *zone, int cpu)
-{
-	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+	pageset_get_values_by_batch(0, &high, &batch);
 
-	pageset_init(pcp);
-	pageset_set_high_and_batch(zone, pcp);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(pcp, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 static void __meminit setup_zone_pageset(struct zone *zone)
 {
 	int cpu;
+	int high, batch;
+	struct per_cpu_pageset *p;
+
+	pageset_get_values(zone, &high, &batch);
+
 	zone->pageset = alloc_percpu(struct per_cpu_pageset);
-	for_each_possible_cpu(cpu)
-		zone_pageset_init(zone, cpu);
+	for_each_possible_cpu(cpu) {
+		p = per_cpu_ptr(zone->pageset, cpu);
+		pageset_init(p);
+		p->pcp.high = high;
+		p->pcp.batch = batch;
+	}
 }
 
 /*
@@ -5928,11 +5950,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
 		goto out;
 
 	for_each_populated_zone(zone) {
-		unsigned int cpu;
+		int high, batch;
 
-		for_each_possible_cpu(cpu)
-			pageset_set_high_and_batch(zone,
-					per_cpu_ptr(zone->pageset, cpu));
+		pageset_get_values(zone, &high, &batch);
+		pageset_update(zone, high, batch);
 	}
 out:
 	mutex_unlock(&pcp_batch_high_lock);
@@ -6455,11 +6476,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
  */
 void __meminit zone_pcp_update(struct zone *zone)
 {
-	unsigned cpu;
+	int high, batch;
+
 	mutex_lock(&pcp_batch_high_lock);
-	for_each_possible_cpu(cpu)
-		pageset_set_high_and_batch(zone,
-				per_cpu_ptr(zone->pageset, cpu));
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
 	mutex_unlock(&pcp_batch_high_lock);
 }
 #endif
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

We got migratetype of the freeing page without holding the zone lock so
it could be racy. There are two cases of this race.

1. pages are added to isolate buddy list after restoring original
migratetype.
2. pages are added to normal buddy list while pageblock is isolated.

If case 1 happens, we can't allocate freepages on isolate buddy list
until next pageblock isolation occurs.
In case of 2, pages could be merged with pages on isolate buddy list and
located on normal buddy list. This makes freepage counting incorrect
and break the property of pageblock isolation.

One solution to this problem is checking pageblock migratetype with
holding zone lock in __free_one_page() and I posted it before, but,
it didn't get welcome since it needs the hook in zone lock critical
section on freepath.

This is another solution to this problem and impose most overhead on
pageblock isolation logic. Following is how this solution works.

1. Extends irq disabled period on freepath to call
get_pfnblock_migratetype() with irq disabled. With this, we can be
sure that future freed pages will see modified pageblock migratetype
after certain synchronization point so we don't need to hold the zone
lock to get correct pageblock migratetype. Although it extends irq
disabled period on freepath, I guess it is marginal and better than
adding the hook in zone lock critical section.

2. #1 requires IPI for synchronization and we can't hold the zone lock
during processing IPI. In this time, some pages could be moved from buddy
list to pcp list on page allocation path and later it could be moved again
from pcp list to buddy list. In this time, this page would be on isolate
pageblock, so, the hook is required on free_pcppages_bulk() to prevent
misplacement. To remove this possibility, disabling and draining pcp
list is needed during isolation. It guaratees that there is no page on pcp
list on all cpus while isolation, so misplacement problem can't happen.

Note that this doesn't fix freepage counting problem. To fix it,
we need more logic. Following patches will do it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/internal.h       |    2 ++
 mm/page_alloc.c     |   27 ++++++++++++++++++++-------
 mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
 3 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index a1b651b..81b8884 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 /*
  * in mm/page_alloc.c
  */
+extern void zone_pcp_disable(struct zone *zone);
+extern void zone_pcp_enable(struct zone *zone);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e1e344..4517b1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-			if (likely(!is_migrate_isolate_page(page))) {
-				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
-				if (is_migrate_cma(mt))
-					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
-			}
+			__mod_zone_freepage_state(zone, 1, mt);
 		} while (--to_free && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
@@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	if (!free_pages_prepare(page, order))
 		return;
 
-	migratetype = get_pfnblock_migratetype(page, pfn);
 	local_irq_save(flags);
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	__count_vm_events(PGFREE, 1 << order);
 	set_freepage_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, pfn, order, migratetype);
@@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
 	if (!free_pages_prepare(page, 0))
 		return;
 
+	local_irq_save(flags);
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	set_freepage_migratetype(page, migratetype);
-	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
 	/*
@@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
+#ifdef CONFIG_MEMORY_ISOLATION
+void zone_pcp_disable(struct zone *zone)
+{
+	mutex_lock(&pcp_batch_high_lock);
+	pageset_update(zone, 1, 1);
+}
+
+void zone_pcp_enable(struct zone *zone)
+{
+	int high, batch;
+
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
+	mutex_unlock(&pcp_batch_high_lock);
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 3100f98..439158d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	struct memory_isolate_notify arg;
 	int notifier_ret;
 	int ret = -EBUSY;
+	unsigned long nr_pages;
+	int migratetype;
 
 	zone = page_zone(page);
-
 	spin_lock_irqsave(&zone->lock, flags);
 
 	pfn = page_to_pfn(page);
@@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	 */
 
 out:
-	if (!ret) {
-		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
+	if (ret) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return ret;
+	}
 
-		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	migratetype = get_pageblock_migratetype(page);
+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	spin_unlock_irqrestore(&zone->lock, flags);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
-	}
+	zone_pcp_disable(zone);
+
+	/*
+	 * After this point, freed pages will see MIGRATE_ISOLATE as
+	 * their pageblock migratetype on all cpus. And pcp list has
+	 * no free page.
+	 */
+	on_each_cpu(drain_local_pages, NULL, 1);
 
+	spin_lock_irqsave(&zone->lock, flags);
+	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret)
-		drain_all_pages();
-	return ret;
+
+	zone_pcp_enable(zone);
+
+	return 0;
 }
 
 void unset_migratetype_isolate(struct page *page, unsigned migratetype)
@@ -80,9 +93,17 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
+
+	set_pageblock_migratetype(page, migratetype);
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	/* Freed pages will see original migratetype after this point */
+	kick_all_cpus_sync();
+
+	spin_lock_irqsave(&zone->lock, flags);
 	nr_pages = move_freepages_block(zone, page, migratetype);
 	__mod_zone_freepage_state(zone, nr_pages, migratetype);
-	set_pageblock_migratetype(page, migratetype);
+
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

We got migratetype of the freeing page without holding the zone lock so
it could be racy. There are two cases of this race.

1. pages are added to isolate buddy list after restoring original
migratetype.
2. pages are added to normal buddy list while pageblock is isolated.

If case 1 happens, we can't allocate freepages on isolate buddy list
until next pageblock isolation occurs.
In case of 2, pages could be merged with pages on isolate buddy list and
located on normal buddy list. This makes freepage counting incorrect
and break the property of pageblock isolation.

One solution to this problem is checking pageblock migratetype with
holding zone lock in __free_one_page() and I posted it before, but,
it didn't get welcome since it needs the hook in zone lock critical
section on freepath.

This is another solution to this problem and impose most overhead on
pageblock isolation logic. Following is how this solution works.

1. Extends irq disabled period on freepath to call
get_pfnblock_migratetype() with irq disabled. With this, we can be
sure that future freed pages will see modified pageblock migratetype
after certain synchronization point so we don't need to hold the zone
lock to get correct pageblock migratetype. Although it extends irq
disabled period on freepath, I guess it is marginal and better than
adding the hook in zone lock critical section.

2. #1 requires IPI for synchronization and we can't hold the zone lock
during processing IPI. In this time, some pages could be moved from buddy
list to pcp list on page allocation path and later it could be moved again
from pcp list to buddy list. In this time, this page would be on isolate
pageblock, so, the hook is required on free_pcppages_bulk() to prevent
misplacement. To remove this possibility, disabling and draining pcp
list is needed during isolation. It guaratees that there is no page on pcp
list on all cpus while isolation, so misplacement problem can't happen.

Note that this doesn't fix freepage counting problem. To fix it,
we need more logic. Following patches will do it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/internal.h       |    2 ++
 mm/page_alloc.c     |   27 ++++++++++++++++++++-------
 mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
 3 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index a1b651b..81b8884 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 /*
  * in mm/page_alloc.c
  */
+extern void zone_pcp_disable(struct zone *zone);
+extern void zone_pcp_enable(struct zone *zone);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e1e344..4517b1d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
 			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
 			trace_mm_page_pcpu_drain(page, 0, mt);
-			if (likely(!is_migrate_isolate_page(page))) {
-				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
-				if (is_migrate_cma(mt))
-					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
-			}
+			__mod_zone_freepage_state(zone, 1, mt);
 		} while (--to_free && --batch_free && !list_empty(list));
 	}
 	spin_unlock(&zone->lock);
@@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	if (!free_pages_prepare(page, order))
 		return;
 
-	migratetype = get_pfnblock_migratetype(page, pfn);
 	local_irq_save(flags);
+	migratetype = get_pfnblock_migratetype(page, pfn);
 	__count_vm_events(PGFREE, 1 << order);
 	set_freepage_migratetype(page, migratetype);
 	free_one_page(page_zone(page), page, pfn, order, migratetype);
@@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
 	if (!free_pages_prepare(page, 0))
 		return;
 
+	local_irq_save(flags);
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	set_freepage_migratetype(page, migratetype);
-	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
 	/*
@@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
 }
 #endif
 
+#ifdef CONFIG_MEMORY_ISOLATION
+void zone_pcp_disable(struct zone *zone)
+{
+	mutex_lock(&pcp_batch_high_lock);
+	pageset_update(zone, 1, 1);
+}
+
+void zone_pcp_enable(struct zone *zone)
+{
+	int high, batch;
+
+	pageset_get_values(zone, &high, &batch);
+	pageset_update(zone, high, batch);
+	mutex_unlock(&pcp_batch_high_lock);
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 3100f98..439158d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	struct memory_isolate_notify arg;
 	int notifier_ret;
 	int ret = -EBUSY;
+	unsigned long nr_pages;
+	int migratetype;
 
 	zone = page_zone(page);
-
 	spin_lock_irqsave(&zone->lock, flags);
 
 	pfn = page_to_pfn(page);
@@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	 */
 
 out:
-	if (!ret) {
-		unsigned long nr_pages;
-		int migratetype = get_pageblock_migratetype(page);
+	if (ret) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return ret;
+	}
 
-		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	migratetype = get_pageblock_migratetype(page);
+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	spin_unlock_irqrestore(&zone->lock, flags);
 
-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
-	}
+	zone_pcp_disable(zone);
+
+	/*
+	 * After this point, freed pages will see MIGRATE_ISOLATE as
+	 * their pageblock migratetype on all cpus. And pcp list has
+	 * no free page.
+	 */
+	on_each_cpu(drain_local_pages, NULL, 1);
 
+	spin_lock_irqsave(&zone->lock, flags);
+	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	__mod_zone_freepage_state(zone, -nr_pages, migratetype);
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret)
-		drain_all_pages();
-	return ret;
+
+	zone_pcp_enable(zone);
+
+	return 0;
 }
 
 void unset_migratetype_isolate(struct page *page, unsigned migratetype)
@@ -80,9 +93,17 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	spin_lock_irqsave(&zone->lock, flags);
 	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
 		goto out;
+
+	set_pageblock_migratetype(page, migratetype);
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	/* Freed pages will see original migratetype after this point */
+	kick_all_cpus_sync();
+
+	spin_lock_irqsave(&zone->lock, flags);
 	nr_pages = move_freepages_block(zone, page, migratetype);
 	__mod_zone_freepage_state(zone, nr_pages, migratetype);
-	set_pageblock_migratetype(page, migratetype);
+
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current pageblock isolation logic has a problem that results in incorrect
freepage counting. move_freepages_block() doesn't return number of
moved pages so freepage count could be wrong if some pages are freed
inbetween set_pageblock_migratetype() and move_freepages_block(). Although
we fix move_freepages_block() to return number of moved pages, the problem
wouldn't be fixed completely because buddy allocator doesn't care if merged
pages are on different buddy list or not. If some page on normal buddy list
is merged with isolated page and moved to isolate buddy list, freepage
count should be subtracted, but, it didn't and can't now.

To fix this case, freed page should not be added to buddy list
inbetween set_pageblock_migratetype() and move_freepages_block().
In this patch, I introduce hook, deactivate_isolate_page() on
free_one_page() for freeing page on isolate pageblock. This page will
be marked as PageIsolated() and handled specially in pageblock
isolation logic.

Overall design of changed pageblock isolation logic is as following.

1. ISOLATION
- check pageblock is suitable for pageblock isolation.
- change migratetype of pageblock to MIGRATE_ISOLATE.
- disable pcp list.
- drain pcp list.
- pcp couldn't have any freepage at this point.
- synchronize all cpus to see correct migratetype.
- freed pages on this pageblock will be handled specially and
not added to buddy list from here. With this way, there is no
possibility of merging pages on different buddy list.
- move freepages on normal buddy list to isolate buddy list.
There is no page on isolate buddy list so move_freepages_block()
returns number of moved freepages correctly.
- enable pcp list.

2. TEST-ISOLATION
- activates freepages marked as PageIsolated() and add to isolate
buddy list.
- test if pageblock is properly isolated.

3. UNDO-ISOLATION
- move freepages from isolate buddy list to normal buddy list.
There is no page on normal buddy list so move_freepages_block()
return number of moved freepages correctly.
- change migratetype of pageblock to normal migratetype
- synchronize all cpus.
- activate isolated freepages and add to normal buddy list.

With this patch, most of freepage counting bugs are solved and
exceptional handling for freepage count is done in pageblock isolation
logic rather than allocator.

Remain problem is for page with pageblock_order. Following patch
will fix it, too.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/page-isolation.h |    2 +
 mm/internal.h                  |    3 ++
 mm/page_alloc.c                |   28 ++++++-----
 mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..3dd39fe 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
 }
 #endif
 
+void deactivate_isolated_page(struct zone *zone, struct page *page,
+				unsigned int order);
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
diff --git a/mm/internal.h b/mm/internal.h
index 81b8884..c70750a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
  */
 extern void zone_pcp_disable(struct zone *zone);
 extern void zone_pcp_enable(struct zone *zone);
+extern void __free_one_page(struct page *page, unsigned long pfn,
+		struct zone *zone, unsigned int order,
+		int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4517b1d..82da4a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * -- nyc
  */
 
-static inline void __free_one_page(struct page *page,
+void __free_one_page(struct page *page,
 		unsigned long pfn,
 		struct zone *zone, unsigned int order,
 		int migratetype)
@@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
 				int migratetype)
 {
 	unsigned long nr_scanned;
+
+	if (unlikely(is_migrate_isolate(migratetype))) {
+		deactivate_isolated_page(zone, page, order);
+		return;
+	}
+
 	spin_lock(&zone->lock);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
 
 	__free_one_page(page, pfn, zone, order, migratetype);
-	if (unlikely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
 	spin_unlock(&zone->lock);
 }
 
@@ -6413,6 +6418,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	lru_add_drain_all();
 	drain_all_pages();
 
+	/* Make sure the range is really isolated. */
+	if (test_pages_isolated(start, end, false)) {
+		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
+		       start, end);
+		ret = -EBUSY;
+		goto done;
+	}
+
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
@@ -6423,15 +6436,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		outer_start &= ~0UL << order;
 	}
 
-	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
-		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
-		       outer_start, end);
-		ret = -EBUSY;
-		goto done;
-	}
-
-
 	/* Grab isolated pages from freelists. */
 	outer_end = isolate_freepages_range(&cc, outer_start, end);
 	if (!outer_end) {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 439158d..898361f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -9,6 +9,75 @@
 #include <linux/hugetlb.h>
 #include "internal.h"
 
+#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
+
+static inline int PageIsolated(struct page *page)
+{
+	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
+}
+
+static inline void __SetPageIsolated(struct page *page)
+{
+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageIsolated(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+	atomic_set(&page->_mapcount, -1);
+}
+
+void deactivate_isolated_page(struct zone *zone, struct page *page,
+				unsigned int order)
+{
+	spin_lock(&zone->lock);
+
+	set_page_private(page, order);
+	__SetPageIsolated(page);
+
+	spin_unlock(&zone->lock);
+}
+
+static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,
+				unsigned long end_pfn, int migratetype)
+{
+	unsigned long flags;
+	struct page *page;
+	unsigned long pfn = start_pfn;
+	unsigned int order;
+	unsigned long nr_pages = 0;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	while (pfn < end_pfn) {
+		if (!pfn_valid_within(pfn)) {
+			pfn++;
+			continue;
+		}
+
+		page = pfn_to_page(pfn);
+		if (PageBuddy(page)) {
+			pfn += 1 << page_order(page);
+		} else if (PageIsolated(page)) {
+			__ClearPageIsolated(page);
+			set_freepage_migratetype(page, migratetype);
+			order = page_order(page);
+			__free_one_page(page, pfn, zone, order, migratetype);
+
+			pfn += 1 << order;
+			nr_pages += 1 << order;
+		} else {
+			pfn++;
+		}
+	}
+
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -88,24 +157,26 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
+	unsigned long start_pfn, end_pfn;
 
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
-	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
-		goto out;
+	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return;
+	}
 
+	nr_pages = move_freepages_block(zone, page, migratetype);
+	__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	set_pageblock_migratetype(page, migratetype);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* Freed pages will see original migratetype after this point */
 	kick_all_cpus_sync();
 
-	spin_lock_irqsave(&zone->lock, flags);
-	nr_pages = move_freepages_block(zone, page, migratetype);
-	__mod_zone_freepage_state(zone, nr_pages, migratetype);
-
-out:
-	spin_unlock_irqrestore(&zone->lock, flags);
+	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages;
+	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
 }
 
 static inline struct page *
@@ -242,6 +313,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	struct page *page;
 	struct zone *zone;
 	int ret;
+	int order;
+	unsigned long outer_start;
 
 	/*
 	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
@@ -256,10 +329,24 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
 	if ((pfn < end_pfn) || !page)
 		return -EBUSY;
-	/* Check all pages are free or marked as ISOLATED */
+
 	zone = page_zone(page);
+	activate_isolated_pages(zone, start_pfn, end_pfn, MIGRATE_ISOLATE);
+
+	/* Check all pages are free or marked as ISOLATED */
 	spin_lock_irqsave(&zone->lock, flags);
-	ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+	order = 0;
+	outer_start = start_pfn;
+	while (!PageBuddy(pfn_to_page(outer_start))) {
+		if (++order >= MAX_ORDER) {
+			spin_unlock_irqrestore(&zone->lock, flags);
+			return -EBUSY;
+		}
+
+		outer_start &= ~0UL << order;
+	}
+
+	ret = __test_page_isolated_in_pageblock(outer_start, end_pfn,
 						skip_hwpoisoned_pages);
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current pageblock isolation logic has a problem that results in incorrect
freepage counting. move_freepages_block() doesn't return number of
moved pages so freepage count could be wrong if some pages are freed
inbetween set_pageblock_migratetype() and move_freepages_block(). Although
we fix move_freepages_block() to return number of moved pages, the problem
wouldn't be fixed completely because buddy allocator doesn't care if merged
pages are on different buddy list or not. If some page on normal buddy list
is merged with isolated page and moved to isolate buddy list, freepage
count should be subtracted, but, it didn't and can't now.

To fix this case, freed page should not be added to buddy list
inbetween set_pageblock_migratetype() and move_freepages_block().
In this patch, I introduce hook, deactivate_isolate_page() on
free_one_page() for freeing page on isolate pageblock. This page will
be marked as PageIsolated() and handled specially in pageblock
isolation logic.

Overall design of changed pageblock isolation logic is as following.

1. ISOLATION
- check pageblock is suitable for pageblock isolation.
- change migratetype of pageblock to MIGRATE_ISOLATE.
- disable pcp list.
- drain pcp list.
- pcp couldn't have any freepage at this point.
- synchronize all cpus to see correct migratetype.
- freed pages on this pageblock will be handled specially and
not added to buddy list from here. With this way, there is no
possibility of merging pages on different buddy list.
- move freepages on normal buddy list to isolate buddy list.
There is no page on isolate buddy list so move_freepages_block()
returns number of moved freepages correctly.
- enable pcp list.

2. TEST-ISOLATION
- activates freepages marked as PageIsolated() and add to isolate
buddy list.
- test if pageblock is properly isolated.

3. UNDO-ISOLATION
- move freepages from isolate buddy list to normal buddy list.
There is no page on normal buddy list so move_freepages_block()
return number of moved freepages correctly.
- change migratetype of pageblock to normal migratetype
- synchronize all cpus.
- activate isolated freepages and add to normal buddy list.

With this patch, most of freepage counting bugs are solved and
exceptional handling for freepage count is done in pageblock isolation
logic rather than allocator.

Remain problem is for page with pageblock_order. Following patch
will fix it, too.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/page-isolation.h |    2 +
 mm/internal.h                  |    3 ++
 mm/page_alloc.c                |   28 ++++++-----
 mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 118 insertions(+), 22 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..3dd39fe 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
 }
 #endif
 
+void deactivate_isolated_page(struct zone *zone, struct page *page,
+				unsigned int order);
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
diff --git a/mm/internal.h b/mm/internal.h
index 81b8884..c70750a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
  */
 extern void zone_pcp_disable(struct zone *zone);
 extern void zone_pcp_enable(struct zone *zone);
+extern void __free_one_page(struct page *page, unsigned long pfn,
+		struct zone *zone, unsigned int order,
+		int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4517b1d..82da4a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
  * -- nyc
  */
 
-static inline void __free_one_page(struct page *page,
+void __free_one_page(struct page *page,
 		unsigned long pfn,
 		struct zone *zone, unsigned int order,
 		int migratetype)
@@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
 				int migratetype)
 {
 	unsigned long nr_scanned;
+
+	if (unlikely(is_migrate_isolate(migratetype))) {
+		deactivate_isolated_page(zone, page, order);
+		return;
+	}
+
 	spin_lock(&zone->lock);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
 	if (nr_scanned)
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
 
 	__free_one_page(page, pfn, zone, order, migratetype);
-	if (unlikely(!is_migrate_isolate(migratetype)))
-		__mod_zone_freepage_state(zone, 1 << order, migratetype);
+	__mod_zone_freepage_state(zone, 1 << order, migratetype);
 	spin_unlock(&zone->lock);
 }
 
@@ -6413,6 +6418,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	lru_add_drain_all();
 	drain_all_pages();
 
+	/* Make sure the range is really isolated. */
+	if (test_pages_isolated(start, end, false)) {
+		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
+		       start, end);
+		ret = -EBUSY;
+		goto done;
+	}
+
 	order = 0;
 	outer_start = start;
 	while (!PageBuddy(pfn_to_page(outer_start))) {
@@ -6423,15 +6436,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		outer_start &= ~0UL << order;
 	}
 
-	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
-		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
-		       outer_start, end);
-		ret = -EBUSY;
-		goto done;
-	}
-
-
 	/* Grab isolated pages from freelists. */
 	outer_end = isolate_freepages_range(&cc, outer_start, end);
 	if (!outer_end) {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 439158d..898361f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -9,6 +9,75 @@
 #include <linux/hugetlb.h>
 #include "internal.h"
 
+#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
+
+static inline int PageIsolated(struct page *page)
+{
+	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
+}
+
+static inline void __SetPageIsolated(struct page *page)
+{
+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
+	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
+}
+
+static inline void __ClearPageIsolated(struct page *page)
+{
+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
+	atomic_set(&page->_mapcount, -1);
+}
+
+void deactivate_isolated_page(struct zone *zone, struct page *page,
+				unsigned int order)
+{
+	spin_lock(&zone->lock);
+
+	set_page_private(page, order);
+	__SetPageIsolated(page);
+
+	spin_unlock(&zone->lock);
+}
+
+static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,
+				unsigned long end_pfn, int migratetype)
+{
+	unsigned long flags;
+	struct page *page;
+	unsigned long pfn = start_pfn;
+	unsigned int order;
+	unsigned long nr_pages = 0;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	while (pfn < end_pfn) {
+		if (!pfn_valid_within(pfn)) {
+			pfn++;
+			continue;
+		}
+
+		page = pfn_to_page(pfn);
+		if (PageBuddy(page)) {
+			pfn += 1 << page_order(page);
+		} else if (PageIsolated(page)) {
+			__ClearPageIsolated(page);
+			set_freepage_migratetype(page, migratetype);
+			order = page_order(page);
+			__free_one_page(page, pfn, zone, order, migratetype);
+
+			pfn += 1 << order;
+			nr_pages += 1 << order;
+		} else {
+			pfn++;
+		}
+	}
+
+	if (!is_migrate_isolate(migratetype))
+		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
 int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 {
 	struct zone *zone;
@@ -88,24 +157,26 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
+	unsigned long start_pfn, end_pfn;
 
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
-	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
-		goto out;
+	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) {
+		spin_unlock_irqrestore(&zone->lock, flags);
+		return;
+	}
 
+	nr_pages = move_freepages_block(zone, page, migratetype);
+	__mod_zone_freepage_state(zone, nr_pages, migratetype);
 	set_pageblock_migratetype(page, migratetype);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	/* Freed pages will see original migratetype after this point */
 	kick_all_cpus_sync();
 
-	spin_lock_irqsave(&zone->lock, flags);
-	nr_pages = move_freepages_block(zone, page, migratetype);
-	__mod_zone_freepage_state(zone, nr_pages, migratetype);
-
-out:
-	spin_unlock_irqrestore(&zone->lock, flags);
+	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages;
+	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
 }
 
 static inline struct page *
@@ -242,6 +313,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	struct page *page;
 	struct zone *zone;
 	int ret;
+	int order;
+	unsigned long outer_start;
 
 	/*
 	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
@@ -256,10 +329,24 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
 	if ((pfn < end_pfn) || !page)
 		return -EBUSY;
-	/* Check all pages are free or marked as ISOLATED */
+
 	zone = page_zone(page);
+	activate_isolated_pages(zone, start_pfn, end_pfn, MIGRATE_ISOLATE);
+
+	/* Check all pages are free or marked as ISOLATED */
 	spin_lock_irqsave(&zone->lock, flags);
-	ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
+	order = 0;
+	outer_start = start_pfn;
+	while (!PageBuddy(pfn_to_page(outer_start))) {
+		if (++order >= MAX_ORDER) {
+			spin_unlock_irqrestore(&zone->lock, flags);
+			return -EBUSY;
+		}
+
+		outer_start &= ~0UL << order;
+	}
+
+	ret = __test_page_isolated_in_pageblock(outer_start, end_pfn,
 						skip_hwpoisoned_pages);
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 6/8] mm/isolation: factor out pre/post logic on set/unset_migratetype_isolate()
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current isolation logic isolates each pageblock individually.
This causes freepage counting problem when page with pageblock order is
merged with other page on different buddy list. To prevent it, we should
handle whole range at one time in start_isolate_page_range(). This patch
is preparation of that work.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |   45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 898361f..b91f9ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -78,18 +78,14 @@ static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
+static int set_migratetype_isolate_pre(struct page *page,
+				bool skip_hwpoisoned_pages)
 {
-	struct zone *zone;
-	unsigned long flags, pfn;
+	struct zone *zone = page_zone(page);
+	unsigned long pfn;
 	struct memory_isolate_notify arg;
 	int notifier_ret;
 	int ret = -EBUSY;
-	unsigned long nr_pages;
-	int migratetype;
-
-	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
 
 	pfn = page_to_pfn(page);
 	arg.start_pfn = pfn;
@@ -110,7 +106,7 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
 	notifier_ret = notifier_to_errno(notifier_ret);
 	if (notifier_ret)
-		goto out;
+		return ret;
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
@@ -124,10 +120,20 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	 * removable-by-driver pages reported by notifier, we'll fail.
 	 */
 
-out:
-	if (ret) {
+	return ret;
+}
+
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long flags;
+	unsigned long nr_pages;
+	int migratetype;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	if (set_migratetype_isolate_pre(page, skip_hwpoisoned_pages)) {
 		spin_unlock_irqrestore(&zone->lock, flags);
-		return ret;
+		return -EBUSY;
 	}
 
 	migratetype = get_pageblock_migratetype(page);
@@ -153,11 +159,20 @@ out:
 	return 0;
 }
 
+static void unset_migratetype_isolate_post(struct page *page,
+					unsigned migratetype)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages;
+	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
+}
 void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
-	unsigned long start_pfn, end_pfn;
 
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
@@ -174,9 +189,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	/* Freed pages will see original migratetype after this point */
 	kick_all_cpus_sync();
 
-	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
-	end_pfn = start_pfn + pageblock_nr_pages;
-	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
+	unset_migratetype_isolate_post(page, migratetype);
 }
 
 static inline struct page *
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 6/8] mm/isolation: factor out pre/post logic on set/unset_migratetype_isolate()
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current isolation logic isolates each pageblock individually.
This causes freepage counting problem when page with pageblock order is
merged with other page on different buddy list. To prevent it, we should
handle whole range at one time in start_isolate_page_range(). This patch
is preparation of that work.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |   45 +++++++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 898361f..b91f9ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -78,18 +78,14 @@ static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
+static int set_migratetype_isolate_pre(struct page *page,
+				bool skip_hwpoisoned_pages)
 {
-	struct zone *zone;
-	unsigned long flags, pfn;
+	struct zone *zone = page_zone(page);
+	unsigned long pfn;
 	struct memory_isolate_notify arg;
 	int notifier_ret;
 	int ret = -EBUSY;
-	unsigned long nr_pages;
-	int migratetype;
-
-	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
 
 	pfn = page_to_pfn(page);
 	arg.start_pfn = pfn;
@@ -110,7 +106,7 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
 	notifier_ret = notifier_to_errno(notifier_ret);
 	if (notifier_ret)
-		goto out;
+		return ret;
 	/*
 	 * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
 	 * We just check MOVABLE pages.
@@ -124,10 +120,20 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
 	 * removable-by-driver pages reported by notifier, we'll fail.
 	 */
 
-out:
-	if (ret) {
+	return ret;
+}
+
+int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long flags;
+	unsigned long nr_pages;
+	int migratetype;
+
+	spin_lock_irqsave(&zone->lock, flags);
+	if (set_migratetype_isolate_pre(page, skip_hwpoisoned_pages)) {
 		spin_unlock_irqrestore(&zone->lock, flags);
-		return ret;
+		return -EBUSY;
 	}
 
 	migratetype = get_pageblock_migratetype(page);
@@ -153,11 +159,20 @@ out:
 	return 0;
 }
 
+static void unset_migratetype_isolate_post(struct page *page,
+					unsigned migratetype)
+{
+	struct zone *zone = page_zone(page);
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages;
+	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
+}
 void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 {
 	struct zone *zone;
 	unsigned long flags, nr_pages;
-	unsigned long start_pfn, end_pfn;
 
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
@@ -174,9 +189,7 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	/* Freed pages will see original migratetype after this point */
 	kick_all_cpus_sync();
 
-	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
-	end_pfn = start_pfn + pageblock_nr_pages;
-	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
+	unset_migratetype_isolate_post(page, migratetype);
 }
 
 static inline struct page *
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 7/8] mm/isolation: fix freepage counting bug on start/undo_isolat_page_range()
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current isolation logic isolates each pageblock individually.
This causes freepage counting problem when page with pageblock order is
merged with other page on different buddy list. We can prevent it by
following solutions.

1. decrease MAX_ORDER to pageblock order
2. prevent merging buddy pages if they are on different buddy list

Solution 1. looks really easy, but, I'm not sure if there is a user
to allocate more than pageblock order.

Solution 2. seems not to get greeted, because it needs to inserts
hooks to the core part of allocator.

So, this is solution 3, that is, making start/undo_isolat_page_range()
bug free through handling whole range at one go. If given range is
aligned with MAX_ORDER properly, page isn't merged with other page on
different buddy list. So we can calm down freepage counting bug.
Unfortunately, this solution only works for MAX_ORDER aligned range
like as CMA and aligning range is caller's duty.

Although we can go with solution 1., this patch is still useful since
some synchronization call is reduced since we call them in batch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |  105 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 21 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b91f9ec..063f1f9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -222,30 +222,63 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn;
-	unsigned long undo_pfn;
-	struct page *page;
+	unsigned long flags = 0, nr_pages;
+	struct page *page = NULL;
+	struct zone *zone = NULL;
 
 	BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
 	BUG_ON((end_pfn) & (pageblock_nr_pages - 1));
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page &&
-		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
-			undo_pfn = pfn;
-			goto undo;
+		if (!page)
+			continue;
+
+		if (!zone) {
+			zone = page_zone(page);
+			spin_lock_irqsave(&zone->lock, flags);
+		}
+
+		if (set_migratetype_isolate_pre(page, skip_hwpoisoned_pages)) {
+			spin_unlock_irqrestore(&zone->lock, flags);
+			return -EBUSY;
 		}
 	}
-	return 0;
-undo:
-	for (pfn = start_pfn;
-	     pfn < undo_pfn;
-	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
-	return -EBUSY;
+	if (!zone)
+		return 0;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	zone_pcp_disable(zone);
+	/*
+	 * After this point, freed page will see MIGRATE_ISOLATE as
+	 * their pageblock migratetype on all cpus. And pcp list has
+	 * no free page.
+	 */
+	on_each_cpu(drain_local_pages, NULL, 1);
+
+	spin_lock_irqsave(&zone->lock, flags);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	zone_pcp_enable(zone);
+
+	return 0;
 }
 
 /*
@@ -256,18 +289,48 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn;
 	struct page *page;
+	struct zone *zone = NULL;
+	unsigned long flags, nr_pages;
+
 	BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
 	BUG_ON((end_pfn) & (pageblock_nr_pages - 1));
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		if (!zone) {
+			zone = page_zone(page);
+			spin_lock_irqsave(&zone->lock, flags);
+		}
+
+		if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+			continue;
+
+		nr_pages = move_freepages_block(zone, page, migratetype);
+		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		set_pageblock_migratetype(page, migratetype);
+	}
+
+	if (!zone)
+		return 0;
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	/* Freed pages will see original migratetype after this point */
+	kick_all_cpus_sync();
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+		if (!page)
 			continue;
-		unset_migratetype_isolate(page, migratetype);
+
+		unset_migratetype_isolate_post(page, migratetype);
 	}
 	return 0;
 }
+
 /*
  * Test all pages in the range is free(means isolated) or not.
  * all pages in [start_pfn...end_pfn) must be in the same zone.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 7/8] mm/isolation: fix freepage counting bug on start/undo_isolat_page_range()
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

Current isolation logic isolates each pageblock individually.
This causes freepage counting problem when page with pageblock order is
merged with other page on different buddy list. We can prevent it by
following solutions.

1. decrease MAX_ORDER to pageblock order
2. prevent merging buddy pages if they are on different buddy list

Solution 1. looks really easy, but, I'm not sure if there is a user
to allocate more than pageblock order.

Solution 2. seems not to get greeted, because it needs to inserts
hooks to the core part of allocator.

So, this is solution 3, that is, making start/undo_isolat_page_range()
bug free through handling whole range at one go. If given range is
aligned with MAX_ORDER properly, page isn't merged with other page on
different buddy list. So we can calm down freepage counting bug.
Unfortunately, this solution only works for MAX_ORDER aligned range
like as CMA and aligning range is caller's duty.

Although we can go with solution 1., this patch is still useful since
some synchronization call is reduced since we call them in batch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |  105 ++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 84 insertions(+), 21 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b91f9ec..063f1f9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -222,30 +222,63 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, bool skip_hwpoisoned_pages)
 {
 	unsigned long pfn;
-	unsigned long undo_pfn;
-	struct page *page;
+	unsigned long flags = 0, nr_pages;
+	struct page *page = NULL;
+	struct zone *zone = NULL;
 
 	BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
 	BUG_ON((end_pfn) & (pageblock_nr_pages - 1));
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page &&
-		    set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
-			undo_pfn = pfn;
-			goto undo;
+		if (!page)
+			continue;
+
+		if (!zone) {
+			zone = page_zone(page);
+			spin_lock_irqsave(&zone->lock, flags);
+		}
+
+		if (set_migratetype_isolate_pre(page, skip_hwpoisoned_pages)) {
+			spin_unlock_irqrestore(&zone->lock, flags);
+			return -EBUSY;
 		}
 	}
-	return 0;
-undo:
-	for (pfn = start_pfn;
-	     pfn < undo_pfn;
-	     pfn += pageblock_nr_pages)
-		unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
 
-	return -EBUSY;
+	if (!zone)
+		return 0;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	zone_pcp_disable(zone);
+	/*
+	 * After this point, freed page will see MIGRATE_ISOLATE as
+	 * their pageblock migratetype on all cpus. And pcp list has
+	 * no free page.
+	 */
+	on_each_cpu(drain_local_pages, NULL, 1);
+
+	spin_lock_irqsave(&zone->lock, flags);
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
+		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
+	}
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	zone_pcp_enable(zone);
+
+	return 0;
 }
 
 /*
@@ -256,18 +289,48 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn;
 	struct page *page;
+	struct zone *zone = NULL;
+	unsigned long flags, nr_pages;
+
 	BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
 	BUG_ON((end_pfn) & (pageblock_nr_pages - 1));
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
+		page = __first_valid_page(pfn, pageblock_nr_pages);
+		if (!page)
+			continue;
+
+		if (!zone) {
+			zone = page_zone(page);
+			spin_lock_irqsave(&zone->lock, flags);
+		}
+
+		if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+			continue;
+
+		nr_pages = move_freepages_block(zone, page, migratetype);
+		__mod_zone_freepage_state(zone, nr_pages, migratetype);
+		set_pageblock_migratetype(page, migratetype);
+	}
+
+	if (!zone)
+		return 0;
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+
+	/* Freed pages will see original migratetype after this point */
+	kick_all_cpus_sync();
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+		if (!page)
 			continue;
-		unset_migratetype_isolate(page, migratetype);
+
+		unset_migratetype_isolate_post(page, migratetype);
 	}
 	return 0;
 }
+
 /*
  * Test all pages in the range is free(means isolated) or not.
  * all pages in [start_pfn...end_pfn) must be in the same zone.
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 8/8] mm/isolation: remove useless race handling related to pageblock isolation
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:18   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

There is a mistake on moving freepage from normal buddy list to isolate
buddy list. If we move page from normal buddy list to isolate buddy list,
We should subtract freepage count in this case, but, it didn't.

And, previous patches ('mm/isolation: close the two race problems related
to pageblock isolation' and 'mm/isolation: change pageblock isolation logic
to fix freepage counting bugs') solves the race related to pageblock
isolation. So, this misplacement cannot happen and this workaround
aren't needed anymore.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |   14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 063f1f9..48c8836 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -351,20 +351,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 		}
 		page = pfn_to_page(pfn);
 		if (PageBuddy(page)) {
-			/*
-			 * If race between isolatation and allocation happens,
-			 * some free pages could be in MIGRATE_MOVABLE list
-			 * although pageblock's migratation type of the page
-			 * is MIGRATE_ISOLATE. Catch it and move the page into
-			 * MIGRATE_ISOLATE list.
-			 */
-			if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) {
-				struct page *end_page;
-
-				end_page = page + (1 << page_order(page)) - 1;
-				move_freepages(page_zone(page), page, end_page,
-						MIGRATE_ISOLATE);
-			}
 			pfn += 1 << page_order(page);
 		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH v2 8/8] mm/isolation: remove useless race handling related to pageblock isolation
@ 2014-08-06  7:18   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel, Joonsoo Kim

There is a mistake on moving freepage from normal buddy list to isolate
buddy list. If we move page from normal buddy list to isolate buddy list,
We should subtract freepage count in this case, but, it didn't.

And, previous patches ('mm/isolation: close the two race problems related
to pageblock isolation' and 'mm/isolation: change pageblock isolation logic
to fix freepage counting bugs') solves the race related to pageblock
isolation. So, this misplacement cannot happen and this workaround
aren't needed anymore.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 mm/page_isolation.c |   14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 063f1f9..48c8836 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -351,20 +351,6 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 		}
 		page = pfn_to_page(pfn);
 		if (PageBuddy(page)) {
-			/*
-			 * If race between isolatation and allocation happens,
-			 * some free pages could be in MIGRATE_MOVABLE list
-			 * although pageblock's migratation type of the page
-			 * is MIGRATE_ISOLATE. Catch it and move the page into
-			 * MIGRATE_ISOLATE list.
-			 */
-			if (get_freepage_migratetype(page) != MIGRATE_ISOLATE) {
-				struct page *end_page;
-
-				end_page = page + (1 << page_order(page)) - 1;
-				move_freepages(page_zone(page), page, end_page,
-						MIGRATE_ISOLATE);
-			}
 			pfn += 1 << page_order(page);
 		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
 			/*
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-06  7:25   ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, Peter Zijlstra, Vlastimil Babka,
	linux-kernel

On Wed, Aug 06, 2014 at 04:18:26PM +0900, Joonsoo Kim wrote:
> Joonsoo Kim (8):
>   mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
>   mm/isolation: remove unstable check for isolated page
>   mm/page_alloc: fix pcp high, batch management
>   mm/isolation: close the two race problems related to pageblock
>     isolation
>   mm/isolation: change pageblock isolation logic to fix freepage
>     counting bugs
>   mm/isolation: factor out pre/post logic on
>     set/unset_migratetype_isolate()
>   mm/isolation: fix freepage counting bug on
>     start/undo_isolat_page_range()
>   mm/isolation: remove useless race handling related to pageblock
>     isolation
> 
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    5 +
>  mm/page_alloc.c                |  223 +++++++++++++++++-------------
>  mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
>  4 files changed, 368 insertions(+), 154 deletions(-)
> 

Sorry, Peter and Vlastimil.

I missed you two on CC due to typo, so manually add on CC of
cover-letter. I will do better next time. :)

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
@ 2014-08-06  7:25   ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-06  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, Peter Zijlstra, Vlastimil Babka,
	linux-kernel

On Wed, Aug 06, 2014 at 04:18:26PM +0900, Joonsoo Kim wrote:
> Joonsoo Kim (8):
>   mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
>   mm/isolation: remove unstable check for isolated page
>   mm/page_alloc: fix pcp high, batch management
>   mm/isolation: close the two race problems related to pageblock
>     isolation
>   mm/isolation: change pageblock isolation logic to fix freepage
>     counting bugs
>   mm/isolation: factor out pre/post logic on
>     set/unset_migratetype_isolate()
>   mm/isolation: fix freepage counting bug on
>     start/undo_isolat_page_range()
>   mm/isolation: remove useless race handling related to pageblock
>     isolation
> 
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    5 +
>  mm/page_alloc.c                |  223 +++++++++++++++++-------------
>  mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
>  4 files changed, 368 insertions(+), 154 deletions(-)
> 

Sorry, Peter and Vlastimil.

I missed you two on CC due to typo, so manually add on CC of
cover-letter. I will do better next time. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-06 15:12     ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-06 15:12 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> Overall design of changed pageblock isolation logic is as following.

I'll reply here since the overall design part is described in this patch 
(would be worth to have it in cover letter as well IMHO).

> 1. ISOLATION
> - check pageblock is suitable for pageblock isolation.
> - change migratetype of pageblock to MIGRATE_ISOLATE.
> - disable pcp list.

Is it needed to disable the pcp list? Shouldn't drain be enough? After 
the drain you already are sure that future freeing will see 
MIGRATE_ISOLATE and skip pcp list anyway, so why disable it completely?

> - drain pcp list.
> - pcp couldn't have any freepage at this point.
> - synchronize all cpus to see correct migratetype.

This synchronization should already happen through the drain, no?

> - freed pages on this pageblock will be handled specially and
> not added to buddy list from here. With this way, there is no
> possibility of merging pages on different buddy list.
> - move freepages on normal buddy list to isolate buddy list.

Is there any advantage of moving the pages to isolate buddy list at this 
point, when we already have the new PageIsolated marking? Maybe not 
right now, but could this be later replaced by just splitting and 
marking PageIsolated the pages from normal buddy list? I guess memory 
hot-remove does not benefit from having buddy-merged pages and CMA 
probably also doesn't?

> There is no page on isolate buddy list so move_freepages_block()
> returns number of moved freepages correctly.
> - enable pcp list.
>
> 2. TEST-ISOLATION
> - activates freepages marked as PageIsolated() and add to isolate
> buddy list.
> - test if pageblock is properly isolated.
>
> 3. UNDO-ISOLATION
> - move freepages from isolate buddy list to normal buddy list.
> There is no page on normal buddy list so move_freepages_block()
> return number of moved freepages correctly.
> - change migratetype of pageblock to normal migratetype
> - synchronize all cpus.
> - activate isolated freepages and add to normal buddy list.

The lack of pcp list deactivation in the undo part IMHO suggests that it 
is indeed not needed.

> With this patch, most of freepage counting bugs are solved and
> exceptional handling for freepage count is done in pageblock isolation
> logic rather than allocator.

\o/


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-06 15:12     ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-06 15:12 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> Overall design of changed pageblock isolation logic is as following.

I'll reply here since the overall design part is described in this patch 
(would be worth to have it in cover letter as well IMHO).

> 1. ISOLATION
> - check pageblock is suitable for pageblock isolation.
> - change migratetype of pageblock to MIGRATE_ISOLATE.
> - disable pcp list.

Is it needed to disable the pcp list? Shouldn't drain be enough? After 
the drain you already are sure that future freeing will see 
MIGRATE_ISOLATE and skip pcp list anyway, so why disable it completely?

> - drain pcp list.
> - pcp couldn't have any freepage at this point.
> - synchronize all cpus to see correct migratetype.

This synchronization should already happen through the drain, no?

> - freed pages on this pageblock will be handled specially and
> not added to buddy list from here. With this way, there is no
> possibility of merging pages on different buddy list.
> - move freepages on normal buddy list to isolate buddy list.

Is there any advantage of moving the pages to isolate buddy list at this 
point, when we already have the new PageIsolated marking? Maybe not 
right now, but could this be later replaced by just splitting and 
marking PageIsolated the pages from normal buddy list? I guess memory 
hot-remove does not benefit from having buddy-merged pages and CMA 
probably also doesn't?

> There is no page on isolate buddy list so move_freepages_block()
> returns number of moved freepages correctly.
> - enable pcp list.
>
> 2. TEST-ISOLATION
> - activates freepages marked as PageIsolated() and add to isolate
> buddy list.
> - test if pageblock is properly isolated.
>
> 3. UNDO-ISOLATION
> - move freepages from isolate buddy list to normal buddy list.
> There is no page on normal buddy list so move_freepages_block()
> return number of moved freepages correctly.
> - change migratetype of pageblock to normal migratetype
> - synchronize all cpus.
> - activate isolated freepages and add to normal buddy list.

The lack of pcp list deactivation in the undo part IMHO suggests that it 
is indeed not needed.

> With this patch, most of freepage counting bugs are solved and
> exceptional handling for freepage count is done in pageblock isolation
> logic rather than allocator.

\o/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
  2014-08-06  7:18 ` Joonsoo Kim
@ 2014-08-07  0:49   ` Zhang Yanfei
  -1 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  0:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

Hi Joonsoo,

The first 3 patches in this patchset are in a bit of mess.

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> Hello,
> 
> This patchset aims at fixing problems during memory isolation found by
> testing my patchset [1].
> 
> These are really subtle problems so I can be wrong. If you find what I am
> missing, please let me know.
> 
> Before describing bugs itself, I first explain definition of freepage.
> 
> 1. pages on buddy list are counted as freepage.
> 2. pages on isolate migratetype buddy list are *not* counted as freepage.
> 3. pages on cma buddy list are counted as CMA freepage, too.
> 4. pages for guard are *not* counted as freepage.
> 
> Now, I describe problems and related patch.
> 
> Patch 1: If guard page are cleared and merged into isolate buddy list,
> we should not add freepage count.
> 
> Patch 4: There is race conditions that results in misplacement of free
> pages on buddy list. Then, it results in incorrect freepage count and
> un-availability of freepage.
> 
> Patch 5: To count freepage correctly, we should prevent freepage from
> being added to buddy list in some period of isolation. Without it, we
> cannot be sure if the freepage is counted or not and miscount number
> of freepage.
> 
> Patch 7: In spite of above fixes, there is one more condition for
> incorrect freepage count. pageblock isolation could be done in pageblock
> unit  so we can't prevent freepage from merging with page on next
> pageblock. To fix it, start_isolate_page_range() and
> undo_isolate_page_range() is modified to process whole range at one go.
> With this change, if input parameter of start_isolate_page_range() and
> undo_isolate_page_range() is properly aligned, there is no condition for
> incorrect merging.
> 
> Without patchset [1], above problem doesn't happens on my CMA allocation
> test, because CMA reserved pages aren't used at all. So there is no
> chance for above race.
> 
> With patchset [1], I did simple CMA allocation test and get below result.
> 
> - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
> - run kernel build (make -j16) on background
> - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
> - Result: more than 5000 freepage count are missed
> 
> With patchset [1] and this patchset, I found that no freepage count are
> missed so that I conclude that problems are solved.
> 
> These problems can be possible on memory hot remove users, although
> I didn't check it further.
> 
> This patchset is based on linux-next-20140728.
> Please see individual patches for more information.
> 
> Thanks.
> 
> [1]: Aggressively allocate the pages on cma reserved memory
>      https://lkml.org/lkml/2014/5/30/291
> 
> Joonsoo Kim (8):
>   mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
>   mm/isolation: remove unstable check for isolated page
>   mm/page_alloc: fix pcp high, batch management
>   mm/isolation: close the two race problems related to pageblock
>     isolation
>   mm/isolation: change pageblock isolation logic to fix freepage
>     counting bugs
>   mm/isolation: factor out pre/post logic on
>     set/unset_migratetype_isolate()
>   mm/isolation: fix freepage counting bug on
>     start/undo_isolat_page_range()
>   mm/isolation: remove useless race handling related to pageblock
>     isolation
> 
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    5 +
>  mm/page_alloc.c                |  223 +++++++++++++++++-------------
>  mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
>  4 files changed, 368 insertions(+), 154 deletions(-)
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
@ 2014-08-07  0:49   ` Zhang Yanfei
  0 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  0:49 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

Hi Joonsoo,

The first 3 patches in this patchset are in a bit of mess.

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> Hello,
> 
> This patchset aims at fixing problems during memory isolation found by
> testing my patchset [1].
> 
> These are really subtle problems so I can be wrong. If you find what I am
> missing, please let me know.
> 
> Before describing bugs itself, I first explain definition of freepage.
> 
> 1. pages on buddy list are counted as freepage.
> 2. pages on isolate migratetype buddy list are *not* counted as freepage.
> 3. pages on cma buddy list are counted as CMA freepage, too.
> 4. pages for guard are *not* counted as freepage.
> 
> Now, I describe problems and related patch.
> 
> Patch 1: If guard page are cleared and merged into isolate buddy list,
> we should not add freepage count.
> 
> Patch 4: There is race conditions that results in misplacement of free
> pages on buddy list. Then, it results in incorrect freepage count and
> un-availability of freepage.
> 
> Patch 5: To count freepage correctly, we should prevent freepage from
> being added to buddy list in some period of isolation. Without it, we
> cannot be sure if the freepage is counted or not and miscount number
> of freepage.
> 
> Patch 7: In spite of above fixes, there is one more condition for
> incorrect freepage count. pageblock isolation could be done in pageblock
> unit  so we can't prevent freepage from merging with page on next
> pageblock. To fix it, start_isolate_page_range() and
> undo_isolate_page_range() is modified to process whole range at one go.
> With this change, if input parameter of start_isolate_page_range() and
> undo_isolate_page_range() is properly aligned, there is no condition for
> incorrect merging.
> 
> Without patchset [1], above problem doesn't happens on my CMA allocation
> test, because CMA reserved pages aren't used at all. So there is no
> chance for above race.
> 
> With patchset [1], I did simple CMA allocation test and get below result.
> 
> - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
> - run kernel build (make -j16) on background
> - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
> - Result: more than 5000 freepage count are missed
> 
> With patchset [1] and this patchset, I found that no freepage count are
> missed so that I conclude that problems are solved.
> 
> These problems can be possible on memory hot remove users, although
> I didn't check it further.
> 
> This patchset is based on linux-next-20140728.
> Please see individual patches for more information.
> 
> Thanks.
> 
> [1]: Aggressively allocate the pages on cma reserved memory
>      https://lkml.org/lkml/2014/5/30/291
> 
> Joonsoo Kim (8):
>   mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
>   mm/isolation: remove unstable check for isolated page
>   mm/page_alloc: fix pcp high, batch management
>   mm/isolation: close the two race problems related to pageblock
>     isolation
>   mm/isolation: change pageblock isolation logic to fix freepage
>     counting bugs
>   mm/isolation: factor out pre/post logic on
>     set/unset_migratetype_isolate()
>   mm/isolation: fix freepage counting bug on
>     start/undo_isolat_page_range()
>   mm/isolation: remove useless race handling related to pageblock
>     isolation
> 
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    5 +
>  mm/page_alloc.c                |  223 +++++++++++++++++-------------
>  mm/page_isolation.c            |  292 +++++++++++++++++++++++++++++++---------
>  4 files changed, 368 insertions(+), 154 deletions(-)
> 


-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-07  1:46     ` Zhang Yanfei
  -1 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  1:46 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> In __free_one_page(), we check the buddy page if it is guard page.
> And, if so, we should clear guard attribute on the buddy page. But,
> currently, we clear original page's order rather than buddy one's.
> This doesn't have any problem, because resetting buddy's order
> is useless and the original page's order is re-assigned soon.
> But, it is better to correct code.
> 
> Additionally, I change (set/clear)_page_guard_flag() to
> (set/clear)_page_guard() and makes these functions do all works
> needed for guard page. This may make code more understandable.
> 
> One more thing, I did in this patch, is that fixing freepage accounting.
> If we clear guard page and link it onto isolate buddy list, we should
> not increase freepage count.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

> ---
>  mm/page_alloc.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b99643d4..e6fee4b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>  }
>  __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
>  
> -static inline void set_page_guard_flag(struct page *page)
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, order);
> +	/* Guard pages are not available for any usage */
> +	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>  }
>  
> -static inline void clear_page_guard_flag(struct page *page)
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, 0);
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>  }
>  #else
> -static inline void set_page_guard_flag(struct page *page) { }
> -static inline void clear_page_guard_flag(struct page *page) { }
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
>  #endif
>  
>  static inline void set_page_order(struct page *page, unsigned int order)
> @@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
>  		 * merge with it and move up one order.
>  		 */
>  		if (page_is_guard(buddy)) {
> -			clear_page_guard_flag(buddy);
> -			set_page_private(page, 0);
> -			__mod_zone_freepage_state(zone, 1 << order,
> -						  migratetype);
> +			clear_page_guard(zone, buddy, order, migratetype);
>  		} else {
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
> @@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
>  			 * pages will stay not present in virtual address space
>  			 */
>  			INIT_LIST_HEAD(&page[size].lru);
> -			set_page_guard_flag(&page[size]);
> -			set_page_private(&page[size], high);
> -			/* Guard pages are not available for any usage */
> -			__mod_zone_freepage_state(zone, -(1 << high),
> -						  migratetype);
> +			set_page_guard(zone, &page[size], high, migratetype);
>  			continue;
>  		}
>  #endif
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
@ 2014-08-07  1:46     ` Zhang Yanfei
  0 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  1:46 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> In __free_one_page(), we check the buddy page if it is guard page.
> And, if so, we should clear guard attribute on the buddy page. But,
> currently, we clear original page's order rather than buddy one's.
> This doesn't have any problem, because resetting buddy's order
> is useless and the original page's order is re-assigned soon.
> But, it is better to correct code.
> 
> Additionally, I change (set/clear)_page_guard_flag() to
> (set/clear)_page_guard() and makes these functions do all works
> needed for guard page. This may make code more understandable.
> 
> One more thing, I did in this patch, is that fixing freepage accounting.
> If we clear guard page and link it onto isolate buddy list, we should
> not increase freepage count.
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>

> ---
>  mm/page_alloc.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b99643d4..e6fee4b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>  }
>  __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
>  
> -static inline void set_page_guard_flag(struct page *page)
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, order);
> +	/* Guard pages are not available for any usage */
> +	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>  }
>  
> -static inline void clear_page_guard_flag(struct page *page)
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, 0);
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>  }
>  #else
> -static inline void set_page_guard_flag(struct page *page) { }
> -static inline void clear_page_guard_flag(struct page *page) { }
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
>  #endif
>  
>  static inline void set_page_order(struct page *page, unsigned int order)
> @@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
>  		 * merge with it and move up one order.
>  		 */
>  		if (page_is_guard(buddy)) {
> -			clear_page_guard_flag(buddy);
> -			set_page_private(page, 0);
> -			__mod_zone_freepage_state(zone, 1 << order,
> -						  migratetype);
> +			clear_page_guard(zone, buddy, order, migratetype);
>  		} else {
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
> @@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
>  			 * pages will stay not present in virtual address space
>  			 */
>  			INIT_LIST_HEAD(&page[size].lru);
> -			set_page_guard_flag(&page[size]);
> -			set_page_private(&page[size], high);
> -			/* Guard pages are not available for any usage */
> -			__mod_zone_freepage_state(zone, -(1 << high),
> -						  migratetype);
> +			set_page_guard(zone, &page[size], high, migratetype);
>  			continue;
>  		}
>  #endif
> 


-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-07  2:11     ` Zhang Yanfei
  -1 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  2:11 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

Hi Joonsoo,

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> per cpu pages structure, aka pcp, has high and batch values to control
> how many pages we perform caching. This values could be updated
> asynchronously and updater should ensure that this doesn't make any
> problem. For this purpose, pageset_update() is implemented and do some
> memory synchronization. But, it turns out to be wrong when I implemented
> new feature using this. There is no corresponding smp_rmb() in read-side

Out of curiosity, what new feature are you implementing?

IIRC, pageset_update() is used to update high and batch which can be changed
during:

system boot
sysfs
memory hot-plug

So it seems to me that the latter two would have the problems you described here.

Thanks.

> so that it can't guarantee anything. Without correct updating, system
> could hang in free_pcppages_bulk() due to larger batch value than high.
> To properly update this values, we need to synchronization primitives on
> read-side, but, it hurts allocator's fastpath.
> 
> There is another choice for synchronization, that is, sending IPI. This
> is somewhat expensive, but, this is really rare case so I guess it has
> no problem here. However, reducing IPI is very helpful here. Current
> logic handles each CPU's pcp update one by one. To reduce sending IPI,
> we need to re-ogranize the code to handle all CPU's pcp update at one go.
> This patch implement these requirements.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e6fee4b..3e1e344 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3800,7 +3800,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
>   * not check if the processor is online before following the pageset pointer.
>   * Other parts of the kernel may not check if the zone is available.
>   */
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
>  static void setup_zone_pageset(struct zone *zone);
>  
> @@ -3846,9 +3846,9 @@ static int __build_all_zonelists(void *data)
>  	 * needs the percpu allocator in order to allocate its pagesets
>  	 * (a chicken-egg dilemma).
>  	 */
> -	for_each_possible_cpu(cpu) {
> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> +	setup_pageset(&boot_pageset);
>  
> +	for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>  		/*
>  		 * We now know the "local memory node" for each node--
> @@ -4230,24 +4230,59 @@ static int zone_batchsize(struct zone *zone)
>   * outside of boot time (or some other assurance that no concurrent updaters
>   * exist).
>   */
> -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> -		unsigned long batch)
> +static void pageset_update(struct zone *zone, int high, int batch)
>  {
> -       /* start with a fail safe value for batch */
> -	pcp->batch = 1;
> -	smp_wmb();
> +	int cpu;
> +	struct per_cpu_pages *pcp;
> +
> +	/* start with a fail safe value for batch */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = 1;
> +	}
> +	kick_all_cpus_sync();
> +
> +	/* Update high, then batch, in order */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->high = high;
> +	}
> +	kick_all_cpus_sync();
>  
> -       /* Update high, then batch, in order */
> -	pcp->high = high;
> -	smp_wmb();
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = batch;
> +	}
> +}
> +
> +/*
> + * pageset_get_values_by_high() gets the high water mark for
> + * hot per_cpu_pagelist to the value high for the pageset p.
> + */
> +static void pageset_get_values_by_high(int input_high,
> +				int *output_high, int *output_batch)
> +{
> +	*output_batch = max(1, input_high / 4);
> +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> +		*output_batch = PAGE_SHIFT * 8;
> +}
>  
> -	pcp->batch = batch;
> +/* a companion to pageset_get_values_by_high() */
> +static void pageset_get_values_by_batch(int input_batch,
> +				int *output_high, int *output_batch)
> +{
> +	*output_high = 6 * input_batch;
> +	*output_batch = max(1, 1 * input_batch);
>  }
>  
> -/* a companion to pageset_set_high() */
> -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +static void pageset_get_values(struct zone *zone, int *high, int *batch)
>  {
> -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> +	if (percpu_pagelist_fraction) {
> +		pageset_get_values_by_high(
> +			(zone->managed_pages / percpu_pagelist_fraction),
> +			high, batch);
> +	} else
> +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
>  }
>  
>  static void pageset_init(struct per_cpu_pageset *p)
> @@ -4263,51 +4298,38 @@ static void pageset_init(struct per_cpu_pageset *p)
>  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
>  }
>  
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> +/* Use this only in boot time, because it doesn't do any synchronization */
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
>  {
> -	pageset_init(p);
> -	pageset_set_batch(p, batch);
> -}
> -
> -/*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> - */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> -{
> -	unsigned long batch = max(1UL, high / 4);
> -	if ((high / 4) > (PAGE_SHIFT * 8))
> -		batch = PAGE_SHIFT * 8;
> -
> -	pageset_update(&p->pcp, high, batch);
> -}
> -
> -static void pageset_set_high_and_batch(struct zone *zone,
> -				       struct per_cpu_pageset *pcp)
> -{
> -	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> -			(zone->managed_pages /
> -				percpu_pagelist_fraction));
> -	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> -}
> +	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
>  
> -static void __meminit zone_pageset_init(struct zone *zone, int cpu)
> -{
> -	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
> +	pageset_get_values_by_batch(0, &high, &batch);
>  
> -	pageset_init(pcp);
> -	pageset_set_high_and_batch(zone, pcp);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(pcp, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  static void __meminit setup_zone_pageset(struct zone *zone)
>  {
>  	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
> +
> +	pageset_get_values(zone, &high, &batch);
> +
>  	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> -	for_each_possible_cpu(cpu)
> -		zone_pageset_init(zone, cpu);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(zone->pageset, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  /*
> @@ -5928,11 +5950,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
>  		goto out;
>  
>  	for_each_populated_zone(zone) {
> -		unsigned int cpu;
> +		int high, batch;
>  
> -		for_each_possible_cpu(cpu)
> -			pageset_set_high_and_batch(zone,
> -					per_cpu_ptr(zone->pageset, cpu));
> +		pageset_get_values(zone, &high, &batch);
> +		pageset_update(zone, high, batch);
>  	}
>  out:
>  	mutex_unlock(&pcp_batch_high_lock);
> @@ -6455,11 +6476,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>   */
>  void __meminit zone_pcp_update(struct zone *zone)
>  {
> -	unsigned cpu;
> +	int high, batch;
> +
>  	mutex_lock(&pcp_batch_high_lock);
> -	for_each_possible_cpu(cpu)
> -		pageset_set_high_and_batch(zone,
> -				per_cpu_ptr(zone->pageset, cpu));
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
>  #endif
> 


-- 
Thanks.
Zhang Yanfei

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-07  2:11     ` Zhang Yanfei
  0 siblings, 0 replies; 84+ messages in thread
From: Zhang Yanfei @ 2014-08-07  2:11 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

Hi Joonsoo,

On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> per cpu pages structure, aka pcp, has high and batch values to control
> how many pages we perform caching. This values could be updated
> asynchronously and updater should ensure that this doesn't make any
> problem. For this purpose, pageset_update() is implemented and do some
> memory synchronization. But, it turns out to be wrong when I implemented
> new feature using this. There is no corresponding smp_rmb() in read-side

Out of curiosity, what new feature are you implementing?

IIRC, pageset_update() is used to update high and batch which can be changed
during:

system boot
sysfs
memory hot-plug

So it seems to me that the latter two would have the problems you described here.

Thanks.

> so that it can't guarantee anything. Without correct updating, system
> could hang in free_pcppages_bulk() due to larger batch value than high.
> To properly update this values, we need to synchronization primitives on
> read-side, but, it hurts allocator's fastpath.
> 
> There is another choice for synchronization, that is, sending IPI. This
> is somewhat expensive, but, this is really rare case so I guess it has
> no problem here. However, reducing IPI is very helpful here. Current
> logic handles each CPU's pcp update one by one. To reduce sending IPI,
> we need to re-ogranize the code to handle all CPU's pcp update at one go.
> This patch implement these requirements.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e6fee4b..3e1e344 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3800,7 +3800,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
>   * not check if the processor is online before following the pageset pointer.
>   * Other parts of the kernel may not check if the zone is available.
>   */
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
>  static void setup_zone_pageset(struct zone *zone);
>  
> @@ -3846,9 +3846,9 @@ static int __build_all_zonelists(void *data)
>  	 * needs the percpu allocator in order to allocate its pagesets
>  	 * (a chicken-egg dilemma).
>  	 */
> -	for_each_possible_cpu(cpu) {
> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> +	setup_pageset(&boot_pageset);
>  
> +	for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>  		/*
>  		 * We now know the "local memory node" for each node--
> @@ -4230,24 +4230,59 @@ static int zone_batchsize(struct zone *zone)
>   * outside of boot time (or some other assurance that no concurrent updaters
>   * exist).
>   */
> -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> -		unsigned long batch)
> +static void pageset_update(struct zone *zone, int high, int batch)
>  {
> -       /* start with a fail safe value for batch */
> -	pcp->batch = 1;
> -	smp_wmb();
> +	int cpu;
> +	struct per_cpu_pages *pcp;
> +
> +	/* start with a fail safe value for batch */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = 1;
> +	}
> +	kick_all_cpus_sync();
> +
> +	/* Update high, then batch, in order */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->high = high;
> +	}
> +	kick_all_cpus_sync();
>  
> -       /* Update high, then batch, in order */
> -	pcp->high = high;
> -	smp_wmb();
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = batch;
> +	}
> +}
> +
> +/*
> + * pageset_get_values_by_high() gets the high water mark for
> + * hot per_cpu_pagelist to the value high for the pageset p.
> + */
> +static void pageset_get_values_by_high(int input_high,
> +				int *output_high, int *output_batch)
> +{
> +	*output_batch = max(1, input_high / 4);
> +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> +		*output_batch = PAGE_SHIFT * 8;
> +}
>  
> -	pcp->batch = batch;
> +/* a companion to pageset_get_values_by_high() */
> +static void pageset_get_values_by_batch(int input_batch,
> +				int *output_high, int *output_batch)
> +{
> +	*output_high = 6 * input_batch;
> +	*output_batch = max(1, 1 * input_batch);
>  }
>  
> -/* a companion to pageset_set_high() */
> -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +static void pageset_get_values(struct zone *zone, int *high, int *batch)
>  {
> -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> +	if (percpu_pagelist_fraction) {
> +		pageset_get_values_by_high(
> +			(zone->managed_pages / percpu_pagelist_fraction),
> +			high, batch);
> +	} else
> +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
>  }
>  
>  static void pageset_init(struct per_cpu_pageset *p)
> @@ -4263,51 +4298,38 @@ static void pageset_init(struct per_cpu_pageset *p)
>  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
>  }
>  
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> +/* Use this only in boot time, because it doesn't do any synchronization */
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
>  {
> -	pageset_init(p);
> -	pageset_set_batch(p, batch);
> -}
> -
> -/*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> - */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> -{
> -	unsigned long batch = max(1UL, high / 4);
> -	if ((high / 4) > (PAGE_SHIFT * 8))
> -		batch = PAGE_SHIFT * 8;
> -
> -	pageset_update(&p->pcp, high, batch);
> -}
> -
> -static void pageset_set_high_and_batch(struct zone *zone,
> -				       struct per_cpu_pageset *pcp)
> -{
> -	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> -			(zone->managed_pages /
> -				percpu_pagelist_fraction));
> -	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> -}
> +	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
>  
> -static void __meminit zone_pageset_init(struct zone *zone, int cpu)
> -{
> -	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
> +	pageset_get_values_by_batch(0, &high, &batch);
>  
> -	pageset_init(pcp);
> -	pageset_set_high_and_batch(zone, pcp);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(pcp, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  static void __meminit setup_zone_pageset(struct zone *zone)
>  {
>  	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
> +
> +	pageset_get_values(zone, &high, &batch);
> +
>  	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> -	for_each_possible_cpu(cpu)
> -		zone_pageset_init(zone, cpu);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(zone->pageset, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  /*
> @@ -5928,11 +5950,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
>  		goto out;
>  
>  	for_each_populated_zone(zone) {
> -		unsigned int cpu;
> +		int high, batch;
>  
> -		for_each_possible_cpu(cpu)
> -			pageset_set_high_and_batch(zone,
> -					per_cpu_ptr(zone->pageset, cpu));
> +		pageset_get_values(zone, &high, &batch);
> +		pageset_update(zone, high, batch);
>  	}
>  out:
>  	mutex_unlock(&pcp_batch_high_lock);
> @@ -6455,11 +6476,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>   */
>  void __meminit zone_pcp_update(struct zone *zone)
>  {
> -	unsigned cpu;
> +	int high, batch;
> +
>  	mutex_lock(&pcp_batch_high_lock);
> -	for_each_possible_cpu(cpu)
> -		pageset_set_high_and_batch(zone,
> -				per_cpu_ptr(zone->pageset, cpu));
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
>  #endif
> 


-- 
Thanks.
Zhang Yanfei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-06 15:12     ` Vlastimil Babka
@ 2014-08-07  8:19       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 05:12:20PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >Overall design of changed pageblock isolation logic is as following.
> 
> I'll reply here since the overall design part is described in this
> patch (would be worth to have it in cover letter as well IMHO).
> 
> >1. ISOLATION
> >- check pageblock is suitable for pageblock isolation.
> >- change migratetype of pageblock to MIGRATE_ISOLATE.
> >- disable pcp list.
> 
> Is it needed to disable the pcp list? Shouldn't drain be enough?
> After the drain you already are sure that future freeing will see
> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
> completely?

Yes, it is needed. Until we move freepages from normal buddy list
to isolate buddy list, freepages could be allocated by others. In this
case, they could be moved to pcp list. When it is flushed from pcp list
to buddy list, we need to check whether it is on isolate migratetype
pageblock or not. But, we don't want that hook in free_pcppages_bulk()
because it is page allocator's normal freepath. To remove it, we shoule
disable the pcp list here.

> 
> >- drain pcp list.
> >- pcp couldn't have any freepage at this point.
> >- synchronize all cpus to see correct migratetype.
> 
> This synchronization should already happen through the drain, no?

Yes, this line should be removed. Now synchronization is complete
through the drain. It is leftover from not submitted implementation attempt.

> >- freed pages on this pageblock will be handled specially and
> >not added to buddy list from here. With this way, there is no
> >possibility of merging pages on different buddy list.
> >- move freepages on normal buddy list to isolate buddy list.
> 
> Is there any advantage of moving the pages to isolate buddy list at
> this point, when we already have the new PageIsolated marking? Maybe
> not right now, but could this be later replaced by just splitting
> and marking PageIsolated the pages from normal buddy list? I guess
> memory hot-remove does not benefit from having buddy-merged pages
> and CMA probably also doesn't?

At least, we need to detach freepages on this pageblock from buddy
list to prevent futher allocation of these pages. In this case, moving
looks more simple approach to me.

> >There is no page on isolate buddy list so move_freepages_block()
> >returns number of moved freepages correctly.
> >- enable pcp list.
> >
> >2. TEST-ISOLATION
> >- activates freepages marked as PageIsolated() and add to isolate
> >buddy list.
> >- test if pageblock is properly isolated.
> >
> >3. UNDO-ISOLATION
> >- move freepages from isolate buddy list to normal buddy list.
> >There is no page on normal buddy list so move_freepages_block()
> >return number of moved freepages correctly.
> >- change migratetype of pageblock to normal migratetype
> >- synchronize all cpus.
> >- activate isolated freepages and add to normal buddy list.
> 
> The lack of pcp list deactivation in the undo part IMHO suggests
> that it is indeed not needed.

It is different situation. When UNDO, pages would be on isolate buddy
list so moving from buddy list to pcp list couldn't be possible and
then pcp list deactivation isn't needed.

> >With this patch, most of freepage counting bugs are solved and
> >exceptional handling for freepage count is done in pageblock isolation
> >logic rather than allocator.
> 
> \o/

:)

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07  8:19       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 05:12:20PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >Overall design of changed pageblock isolation logic is as following.
> 
> I'll reply here since the overall design part is described in this
> patch (would be worth to have it in cover letter as well IMHO).
> 
> >1. ISOLATION
> >- check pageblock is suitable for pageblock isolation.
> >- change migratetype of pageblock to MIGRATE_ISOLATE.
> >- disable pcp list.
> 
> Is it needed to disable the pcp list? Shouldn't drain be enough?
> After the drain you already are sure that future freeing will see
> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
> completely?

Yes, it is needed. Until we move freepages from normal buddy list
to isolate buddy list, freepages could be allocated by others. In this
case, they could be moved to pcp list. When it is flushed from pcp list
to buddy list, we need to check whether it is on isolate migratetype
pageblock or not. But, we don't want that hook in free_pcppages_bulk()
because it is page allocator's normal freepath. To remove it, we shoule
disable the pcp list here.

> 
> >- drain pcp list.
> >- pcp couldn't have any freepage at this point.
> >- synchronize all cpus to see correct migratetype.
> 
> This synchronization should already happen through the drain, no?

Yes, this line should be removed. Now synchronization is complete
through the drain. It is leftover from not submitted implementation attempt.

> >- freed pages on this pageblock will be handled specially and
> >not added to buddy list from here. With this way, there is no
> >possibility of merging pages on different buddy list.
> >- move freepages on normal buddy list to isolate buddy list.
> 
> Is there any advantage of moving the pages to isolate buddy list at
> this point, when we already have the new PageIsolated marking? Maybe
> not right now, but could this be later replaced by just splitting
> and marking PageIsolated the pages from normal buddy list? I guess
> memory hot-remove does not benefit from having buddy-merged pages
> and CMA probably also doesn't?

At least, we need to detach freepages on this pageblock from buddy
list to prevent futher allocation of these pages. In this case, moving
looks more simple approach to me.

> >There is no page on isolate buddy list so move_freepages_block()
> >returns number of moved freepages correctly.
> >- enable pcp list.
> >
> >2. TEST-ISOLATION
> >- activates freepages marked as PageIsolated() and add to isolate
> >buddy list.
> >- test if pageblock is properly isolated.
> >
> >3. UNDO-ISOLATION
> >- move freepages from isolate buddy list to normal buddy list.
> >There is no page on normal buddy list so move_freepages_block()
> >return number of moved freepages correctly.
> >- change migratetype of pageblock to normal migratetype
> >- synchronize all cpus.
> >- activate isolated freepages and add to normal buddy list.
> 
> The lack of pcp list deactivation in the undo part IMHO suggests
> that it is indeed not needed.

It is different situation. When UNDO, pages would be on isolate buddy
list so moving from buddy list to pcp list couldn't be possible and
then pcp list deactivation isn't needed.

> >With this patch, most of freepage counting bugs are solved and
> >exceptional handling for freepage count is done in pageblock isolation
> >logic rather than allocator.
> 
> \o/

:)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
  2014-08-07  0:49   ` Zhang Yanfei
@ 2014-08-07  8:20     ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:20 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 08:49:00AM +0800, Zhang Yanfei wrote:
> Hi Joonsoo,
> 
> The first 3 patches in this patchset are in a bit of mess.

Sorry about that.
I will do better in next spin. ):

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 0/8] fix freepage count problems in memory isolation
@ 2014-08-07  8:20     ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:20 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 08:49:00AM +0800, Zhang Yanfei wrote:
> Hi Joonsoo,
> 
> The first 3 patches in this patchset are in a bit of mess.

Sorry about that.
I will do better in next spin. ):

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
  2014-08-07  2:11     ` Zhang Yanfei
@ 2014-08-07  8:23       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:23 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 10:11:14AM +0800, Zhang Yanfei wrote:
> Hi Joonsoo,
> 
> On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> > per cpu pages structure, aka pcp, has high and batch values to control
> > how many pages we perform caching. This values could be updated
> > asynchronously and updater should ensure that this doesn't make any
> > problem. For this purpose, pageset_update() is implemented and do some
> > memory synchronization. But, it turns out to be wrong when I implemented
> > new feature using this. There is no corresponding smp_rmb() in read-side
> 
> Out of curiosity, what new feature are you implementing?

I mean just zone_pcp_disable() and zone_pcp_enable(). :)

> IIRC, pageset_update() is used to update high and batch which can be changed
> during:
> 
> system boot
> sysfs
> memory hot-plug
> 
> So it seems to me that the latter two would have the problems you described here.

Yes, I think so. But I'm not sure, because I didn't look at it
in detail. :)

Thanks.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-07  8:23       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07  8:23 UTC (permalink / raw)
  To: Zhang Yanfei
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 10:11:14AM +0800, Zhang Yanfei wrote:
> Hi Joonsoo,
> 
> On 08/06/2014 03:18 PM, Joonsoo Kim wrote:
> > per cpu pages structure, aka pcp, has high and batch values to control
> > how many pages we perform caching. This values could be updated
> > asynchronously and updater should ensure that this doesn't make any
> > problem. For this purpose, pageset_update() is implemented and do some
> > memory synchronization. But, it turns out to be wrong when I implemented
> > new feature using this. There is no corresponding smp_rmb() in read-side
> 
> Out of curiosity, what new feature are you implementing?

I mean just zone_pcp_disable() and zone_pcp_enable(). :)

> IIRC, pageset_update() is used to update high and batch which can be changed
> during:
> 
> system boot
> sysfs
> memory hot-plug
> 
> So it seems to me that the latter two would have the problems you described here.

Yes, I think so. But I'm not sure, because I didn't look at it
in detail. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-07  8:19       ` Joonsoo Kim
@ 2014-08-07  8:53         ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07  8:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/07/2014 10:19 AM, Joonsoo Kim wrote:
>> Is it needed to disable the pcp list? Shouldn't drain be enough?
>> After the drain you already are sure that future freeing will see
>> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
>> completely?
>
> Yes, it is needed. Until we move freepages from normal buddy list
> to isolate buddy list, freepages could be allocated by others. In this
> case, they could be moved to pcp list. When it is flushed from pcp list
> to buddy list, we need to check whether it is on isolate migratetype
> pageblock or not. But, we don't want that hook in free_pcppages_bulk()
> because it is page allocator's normal freepath. To remove it, we shoule
> disable the pcp list here.

Ah, right. I thought that everything going to pcp lists would be through 
freeing which would already observe the isolate migratetype and skip 
pcplist. I forgot about the direct filling of pcplists from buddy list. 
You're right that we don't want extra hooks there.

Still, couldn't this be solved in a simpler way via another pcplist 
drain after the pages are moved from normal to isolate buddy list? 
Should be even faster because instead of disable - drain - enable (5 
all-cpu kicks, since each pageset_update does 2 kicks) you have drain - 
drain (2 kicks). While it's true that pageset_update is single-zone 
operation, I guess we would easily benefit from having a single-zone 
drain operation as well.

Vlastimil




^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07  8:53         ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07  8:53 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/07/2014 10:19 AM, Joonsoo Kim wrote:
>> Is it needed to disable the pcp list? Shouldn't drain be enough?
>> After the drain you already are sure that future freeing will see
>> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
>> completely?
>
> Yes, it is needed. Until we move freepages from normal buddy list
> to isolate buddy list, freepages could be allocated by others. In this
> case, they could be moved to pcp list. When it is flushed from pcp list
> to buddy list, we need to check whether it is on isolate migratetype
> pageblock or not. But, we don't want that hook in free_pcppages_bulk()
> because it is page allocator's normal freepath. To remove it, we shoule
> disable the pcp list here.

Ah, right. I thought that everything going to pcp lists would be through 
freeing which would already observe the isolate migratetype and skip 
pcplist. I forgot about the direct filling of pcplists from buddy list. 
You're right that we don't want extra hooks there.

Still, couldn't this be solved in a simpler way via another pcplist 
drain after the pages are moved from normal to isolate buddy list? 
Should be even faster because instead of disable - drain - enable (5 
all-cpu kicks, since each pageset_update does 2 kicks) you have drain - 
drain (2 kicks). While it's true that pageset_update is single-zone 
operation, I guess we would easily benefit from having a single-zone 
drain operation as well.

Vlastimil



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-07  8:53         ` Vlastimil Babka
@ 2014-08-07 12:26           ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07 12:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/07/2014 10:19 AM, Joonsoo Kim wrote:
>>>
>>> Is it needed to disable the pcp list? Shouldn't drain be enough?
>>> After the drain you already are sure that future freeing will see
>>> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
>>> completely?
>>
>>
>> Yes, it is needed. Until we move freepages from normal buddy list
>> to isolate buddy list, freepages could be allocated by others. In this
>> case, they could be moved to pcp list. When it is flushed from pcp list
>> to buddy list, we need to check whether it is on isolate migratetype
>> pageblock or not. But, we don't want that hook in free_pcppages_bulk()
>> because it is page allocator's normal freepath. To remove it, we shoule
>> disable the pcp list here.
>
>
> Ah, right. I thought that everything going to pcp lists would be through
> freeing which would already observe the isolate migratetype and skip
> pcplist. I forgot about the direct filling of pcplists from buddy list.
> You're right that we don't want extra hooks there.
>
> Still, couldn't this be solved in a simpler way via another pcplist drain
> after the pages are moved from normal to isolate buddy list? Should be even
> faster because instead of disable - drain - enable (5 all-cpu kicks, since
> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
> it's true that pageset_update is single-zone operation, I guess we would
> easily benefit from having a single-zone drain operation as well.

I hope so, but, it's not possible. Consider following situation.

Page A: on pcplist of CPU2 and it is on isolate pageblock.

CPU 1                   CPU 2
drain pcplist
wait IPI finished     move A to normal buddy list
finish IPI
                            A is moved to pcplist by allocation request

move doesn't catch A,
because it is on pcplist.

drain pcplist
wait IPI finished     move A to normal buddy list
finish IPI
                            A is moved to pcplist by allocation request

repeat!!

It could happen infinitely, though, low possibility.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07 12:26           ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07 12:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/07/2014 10:19 AM, Joonsoo Kim wrote:
>>>
>>> Is it needed to disable the pcp list? Shouldn't drain be enough?
>>> After the drain you already are sure that future freeing will see
>>> MIGRATE_ISOLATE and skip pcp list anyway, so why disable it
>>> completely?
>>
>>
>> Yes, it is needed. Until we move freepages from normal buddy list
>> to isolate buddy list, freepages could be allocated by others. In this
>> case, they could be moved to pcp list. When it is flushed from pcp list
>> to buddy list, we need to check whether it is on isolate migratetype
>> pageblock or not. But, we don't want that hook in free_pcppages_bulk()
>> because it is page allocator's normal freepath. To remove it, we shoule
>> disable the pcp list here.
>
>
> Ah, right. I thought that everything going to pcp lists would be through
> freeing which would already observe the isolate migratetype and skip
> pcplist. I forgot about the direct filling of pcplists from buddy list.
> You're right that we don't want extra hooks there.
>
> Still, couldn't this be solved in a simpler way via another pcplist drain
> after the pages are moved from normal to isolate buddy list? Should be even
> faster because instead of disable - drain - enable (5 all-cpu kicks, since
> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
> it's true that pageset_update is single-zone operation, I guess we would
> easily benefit from having a single-zone drain operation as well.

I hope so, but, it's not possible. Consider following situation.

Page A: on pcplist of CPU2 and it is on isolate pageblock.

CPU 1                   CPU 2
drain pcplist
wait IPI finished     move A to normal buddy list
finish IPI
                            A is moved to pcplist by allocation request

move doesn't catch A,
because it is on pcplist.

drain pcplist
wait IPI finished     move A to normal buddy list
finish IPI
                            A is moved to pcplist by allocation request

repeat!!

It could happen infinitely, though, low possibility.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-07 12:26           ` Joonsoo Kim
@ 2014-08-07 13:04             ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 13:04 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> Ah, right. I thought that everything going to pcp lists would be through
>> freeing which would already observe the isolate migratetype and skip
>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>> You're right that we don't want extra hooks there.
>>
>> Still, couldn't this be solved in a simpler way via another pcplist drain
>> after the pages are moved from normal to isolate buddy list? Should be even
>> faster because instead of disable - drain - enable (5 all-cpu kicks, since
>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>> it's true that pageset_update is single-zone operation, I guess we would
>> easily benefit from having a single-zone drain operation as well.
>
> I hope so, but, it's not possible. Consider following situation.
>
> Page A: on pcplist of CPU2 and it is on isolate pageblock.
>
> CPU 1                   CPU 2
> drain pcplist
> wait IPI finished     move A to normal buddy list
> finish IPI
>                              A is moved to pcplist by allocation request
>
> move doesn't catch A,
> because it is on pcplist.
>
> drain pcplist
> wait IPI finished     move A to normal buddy list
> finish IPI
>                              A is moved to pcplist by allocation request
>
> repeat!!
>
> It could happen infinitely, though, low possibility.

Hm I see. Not a correctness issue, but still a failure to isolate. 
Probably not impossible with enough CPU's and considering the fact that 
after pcplists are drained, the next allocation request will try to 
refill them. And during the drain, the pages are added to the beginning 
of the free_list AFAICS, so they will be in the first refill batch.

OK, another attempt for alternative solution proposal :) It's not that I 
would think disabling pcp would be so bad, just want to be sure there is 
no better alternative.

What if the drain operation had a flag telling it to recheck pageblock 
migratetype and don't assume it's on the correct pcplist. Then the 
problem would go away I think? Would it be possible to do without 
affecting the normal drain-pcplist-when-full path? So that the cost is 
only applied to isolation, but lower cost than pcplist disabling.

Actually I look that free_pcppages_bulk() doesn't consider migratetype 
of the pcplist, but uses get_freepage_migratetype(page). So the pcplist 
drain could first scan the pcplists and rewrite the freepage_migratetype 
according to pageblock_migratetype. Then the free_pcppages_bulk() 
operation would be unchanged for normal operation.

Or is this too clumsy? We could be also smart and have an alternative to 
free_pcppages_bulk() which would omit the round-robin stuff (not needed 
for this kind of drain), and have a pfn range to limit its operation to 
pages that we are isolating.

Hm I guess with this approach some pages might still escape us if they 
were moving between normal buddy list and pcplist through rmqueue_bulk() 
and free_pcppages_bulk() (and not through our drain) at the wrong 
moments, but I guess that would require a really specific workload 
(alternating between burst of allocations and deallocations) and 
consistently unlucky timing.

> Thanks.
>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07 13:04             ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 13:04 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>> Ah, right. I thought that everything going to pcp lists would be through
>> freeing which would already observe the isolate migratetype and skip
>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>> You're right that we don't want extra hooks there.
>>
>> Still, couldn't this be solved in a simpler way via another pcplist drain
>> after the pages are moved from normal to isolate buddy list? Should be even
>> faster because instead of disable - drain - enable (5 all-cpu kicks, since
>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>> it's true that pageset_update is single-zone operation, I guess we would
>> easily benefit from having a single-zone drain operation as well.
>
> I hope so, but, it's not possible. Consider following situation.
>
> Page A: on pcplist of CPU2 and it is on isolate pageblock.
>
> CPU 1                   CPU 2
> drain pcplist
> wait IPI finished     move A to normal buddy list
> finish IPI
>                              A is moved to pcplist by allocation request
>
> move doesn't catch A,
> because it is on pcplist.
>
> drain pcplist
> wait IPI finished     move A to normal buddy list
> finish IPI
>                              A is moved to pcplist by allocation request
>
> repeat!!
>
> It could happen infinitely, though, low possibility.

Hm I see. Not a correctness issue, but still a failure to isolate. 
Probably not impossible with enough CPU's and considering the fact that 
after pcplists are drained, the next allocation request will try to 
refill them. And during the drain, the pages are added to the beginning 
of the free_list AFAICS, so they will be in the first refill batch.

OK, another attempt for alternative solution proposal :) It's not that I 
would think disabling pcp would be so bad, just want to be sure there is 
no better alternative.

What if the drain operation had a flag telling it to recheck pageblock 
migratetype and don't assume it's on the correct pcplist. Then the 
problem would go away I think? Would it be possible to do without 
affecting the normal drain-pcplist-when-full path? So that the cost is 
only applied to isolation, but lower cost than pcplist disabling.

Actually I look that free_pcppages_bulk() doesn't consider migratetype 
of the pcplist, but uses get_freepage_migratetype(page). So the pcplist 
drain could first scan the pcplists and rewrite the freepage_migratetype 
according to pageblock_migratetype. Then the free_pcppages_bulk() 
operation would be unchanged for normal operation.

Or is this too clumsy? We could be also smart and have an alternative to 
free_pcppages_bulk() which would omit the round-robin stuff (not needed 
for this kind of drain), and have a pfn range to limit its operation to 
pages that we are isolating.

Hm I guess with this approach some pages might still escape us if they 
were moving between normal buddy list and pcplist through rmqueue_bulk() 
and free_pcppages_bulk() (and not through our drain) at the wrong 
moments, but I guess that would require a really specific workload 
(alternating between burst of allocations and deallocations) and 
consistently unlucky timing.

> Thanks.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-07 13:04             ` Vlastimil Babka
@ 2014-08-07 13:35               ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07 13:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

2014-08-07 22:04 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
>>
>> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>>>
>>> Ah, right. I thought that everything going to pcp lists would be through
>>>
>>> freeing which would already observe the isolate migratetype and skip
>>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>>> You're right that we don't want extra hooks there.
>>>
>>> Still, couldn't this be solved in a simpler way via another pcplist drain
>>> after the pages are moved from normal to isolate buddy list? Should be
>>> even
>>> faster because instead of disable - drain - enable (5 all-cpu kicks,
>>> since
>>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>>> it's true that pageset_update is single-zone operation, I guess we would
>>> easily benefit from having a single-zone drain operation as well.
>>
>>
>> I hope so, but, it's not possible. Consider following situation.
>>
>> Page A: on pcplist of CPU2 and it is on isolate pageblock.
>>
>> CPU 1                   CPU 2
>> drain pcplist
>> wait IPI finished     move A to normal buddy list
>> finish IPI
>>                              A is moved to pcplist by allocation request
>>
>> move doesn't catch A,
>> because it is on pcplist.
>>
>> drain pcplist
>> wait IPI finished     move A to normal buddy list
>> finish IPI
>>                              A is moved to pcplist by allocation request
>>
>> repeat!!
>>
>> It could happen infinitely, though, low possibility.
>
>
> Hm I see. Not a correctness issue, but still a failure to isolate. Probably
> not impossible with enough CPU's and considering the fact that after
> pcplists are drained, the next allocation request will try to refill them.
> And during the drain, the pages are added to the beginning of the free_list
> AFAICS, so they will be in the first refill batch.

I think that it is correctness issue. When page A is moved to normal buddy
list, merge could happen and freepage counting would be incorrect.

> OK, another attempt for alternative solution proposal :) It's not that I
> would think disabling pcp would be so bad, just want to be sure there is no
> better alternative.

Yeah, welcome any comment. :)

> What if the drain operation had a flag telling it to recheck pageblock
> migratetype and don't assume it's on the correct pcplist. Then the problem
> would go away I think? Would it be possible to do without affecting the
> normal drain-pcplist-when-full path? So that the cost is only applied to
> isolation, but lower cost than pcplist disabling.
>
> Actually I look that free_pcppages_bulk() doesn't consider migratetype of
> the pcplist, but uses get_freepage_migratetype(page). So the pcplist drain
> could first scan the pcplists and rewrite the freepage_migratetype according
> to pageblock_migratetype. Then the free_pcppages_bulk() operation would be
> unchanged for normal operation.
>
> Or is this too clumsy? We could be also smart and have an alternative to
> free_pcppages_bulk() which would omit the round-robin stuff (not needed for
> this kind of drain), and have a pfn range to limit its operation to pages
> that we are isolating.
> Hm I guess with this approach some pages might still escape us if they were
> moving between normal buddy list and pcplist through rmqueue_bulk() and
> free_pcppages_bulk() (and not through our drain) at the wrong moments, but I
> guess that would require a really specific workload (alternating between
> burst of allocations and deallocations) and consistently unlucky timing.
>

Yes, it has similar problem as I mentioned above.

Page A: on pcplist of CPU2 and it is on isolate pageblock.

CPU 1                   CPU 2
                            A is on normal buddy list
drain pcplist
wait IPI finished
finish IPI
                             A is moved to pcplist by allocation request
move doesn't catch A,
because it is on pcplist.
                            move A to normal buddy list by free request

drain pcplist
wait IPI finished
finish IPI
                             A is moved to pcplist by allocation request
move doesn't catch A,
because it is on pcplist.
                            move A to normal buddy list by free request

repeat!!

Although it is really corner case, I would like to choose error-free
approach something like pcplist disable. :)

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07 13:35               ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-07 13:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joonsoo Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu,
	Zhang Yanfei, Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim,
	Linux Memory Management List, LKML

2014-08-07 22:04 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
> On 08/07/2014 02:26 PM, Joonsoo Kim wrote:
>>
>> 2014-08-07 17:53 GMT+09:00 Vlastimil Babka <vbabka@suse.cz>:
>>>
>>> Ah, right. I thought that everything going to pcp lists would be through
>>>
>>> freeing which would already observe the isolate migratetype and skip
>>> pcplist. I forgot about the direct filling of pcplists from buddy list.
>>> You're right that we don't want extra hooks there.
>>>
>>> Still, couldn't this be solved in a simpler way via another pcplist drain
>>> after the pages are moved from normal to isolate buddy list? Should be
>>> even
>>> faster because instead of disable - drain - enable (5 all-cpu kicks,
>>> since
>>> each pageset_update does 2 kicks) you have drain - drain (2 kicks). While
>>> it's true that pageset_update is single-zone operation, I guess we would
>>> easily benefit from having a single-zone drain operation as well.
>>
>>
>> I hope so, but, it's not possible. Consider following situation.
>>
>> Page A: on pcplist of CPU2 and it is on isolate pageblock.
>>
>> CPU 1                   CPU 2
>> drain pcplist
>> wait IPI finished     move A to normal buddy list
>> finish IPI
>>                              A is moved to pcplist by allocation request
>>
>> move doesn't catch A,
>> because it is on pcplist.
>>
>> drain pcplist
>> wait IPI finished     move A to normal buddy list
>> finish IPI
>>                              A is moved to pcplist by allocation request
>>
>> repeat!!
>>
>> It could happen infinitely, though, low possibility.
>
>
> Hm I see. Not a correctness issue, but still a failure to isolate. Probably
> not impossible with enough CPU's and considering the fact that after
> pcplists are drained, the next allocation request will try to refill them.
> And during the drain, the pages are added to the beginning of the free_list
> AFAICS, so they will be in the first refill batch.

I think that it is correctness issue. When page A is moved to normal buddy
list, merge could happen and freepage counting would be incorrect.

> OK, another attempt for alternative solution proposal :) It's not that I
> would think disabling pcp would be so bad, just want to be sure there is no
> better alternative.

Yeah, welcome any comment. :)

> What if the drain operation had a flag telling it to recheck pageblock
> migratetype and don't assume it's on the correct pcplist. Then the problem
> would go away I think? Would it be possible to do without affecting the
> normal drain-pcplist-when-full path? So that the cost is only applied to
> isolation, but lower cost than pcplist disabling.
>
> Actually I look that free_pcppages_bulk() doesn't consider migratetype of
> the pcplist, but uses get_freepage_migratetype(page). So the pcplist drain
> could first scan the pcplists and rewrite the freepage_migratetype according
> to pageblock_migratetype. Then the free_pcppages_bulk() operation would be
> unchanged for normal operation.
>
> Or is this too clumsy? We could be also smart and have an alternative to
> free_pcppages_bulk() which would omit the round-robin stuff (not needed for
> this kind of drain), and have a pfn range to limit its operation to pages
> that we are isolating.
> Hm I guess with this approach some pages might still escape us if they were
> moving between normal buddy list and pcplist through rmqueue_bulk() and
> free_pcppages_bulk() (and not through our drain) at the wrong moments, but I
> guess that would require a really specific workload (alternating between
> burst of allocations and deallocations) and consistently unlucky timing.
>

Yes, it has similar problem as I mentioned above.

Page A: on pcplist of CPU2 and it is on isolate pageblock.

CPU 1                   CPU 2
                            A is on normal buddy list
drain pcplist
wait IPI finished
finish IPI
                             A is moved to pcplist by allocation request
move doesn't catch A,
because it is on pcplist.
                            move A to normal buddy list by free request

drain pcplist
wait IPI finished
finish IPI
                             A is moved to pcplist by allocation request
move doesn't catch A,
because it is on pcplist.
                            move A to normal buddy list by free request

repeat!!

Although it is really corner case, I would like to choose error-free
approach something like pcplist disable. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-07 13:49     ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 13:49 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> The check '!PageBuddy(page) && page_count(page) == 0 &&
> migratetype == MIGRATE_ISOLATE' would mean the page on free processing.

What is "the page on free processing"? I thought this test means the 
page is on some CPU's pcplist?

> Although it could go into buddy allocator within a short time,
> futher operation such as isolate_freepages_range() in CMA, called after
> test_page_isolated_in_pageblock(), could be failed due to this unstability

By "unstability" you mean the page can be allocated again from the 
pcplist instead of being freed to buddy list?

> since it requires that the page is on buddy. I think that removing
> this unstability is good thing.
>
> And, following patch makes isolated freepage has new status matched with
> this condition and this check is the obstacle to that change. So remove
> it.

You could also say that pages from isolated pageblocks can no longer 
appear on pcplists after the later patches.

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/page_isolation.c |    6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index d1473b2..3100f98 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>   						MIGRATE_ISOLATE);
>   			}
>   			pfn += 1 << page_order(page);
> -		}
> -		else if (page_count(page) == 0 &&
> -			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
> -			pfn += 1;
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
> +		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
>   			/*
>   			 * The HWPoisoned page may be not in buddy
>   			 * system, and page_count() is not 0.
>


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-07 13:49     ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 13:49 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> The check '!PageBuddy(page) && page_count(page) == 0 &&
> migratetype == MIGRATE_ISOLATE' would mean the page on free processing.

What is "the page on free processing"? I thought this test means the 
page is on some CPU's pcplist?

> Although it could go into buddy allocator within a short time,
> futher operation such as isolate_freepages_range() in CMA, called after
> test_page_isolated_in_pageblock(), could be failed due to this unstability

By "unstability" you mean the page can be allocated again from the 
pcplist instead of being freed to buddy list?

> since it requires that the page is on buddy. I think that removing
> this unstability is good thing.
>
> And, following patch makes isolated freepage has new status matched with
> this condition and this check is the obstacle to that change. So remove
> it.

You could also say that pages from isolated pageblocks can no longer 
appear on pcplists after the later patches.

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/page_isolation.c |    6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index d1473b2..3100f98 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -198,11 +198,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>   						MIGRATE_ISOLATE);
>   			}
>   			pfn += 1 << page_order(page);
> -		}
> -		else if (page_count(page) == 0 &&
> -			get_freepage_migratetype(page) == MIGRATE_ISOLATE)
> -			pfn += 1;
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
> +		} else if (skip_hwpoisoned_pages && PageHWPoison(page)) {
>   			/*
>   			 * The HWPoisoned page may be not in buddy
>   			 * system, and page_count() is not 0.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-07 14:34     ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 14:34 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> We got migratetype of the freeing page without holding the zone lock so
> it could be racy. There are two cases of this race.
>
> 1. pages are added to isolate buddy list after restoring original
> migratetype.
> 2. pages are added to normal buddy list while pageblock is isolated.
>
> If case 1 happens, we can't allocate freepages on isolate buddy list
> until next pageblock isolation occurs.
> In case of 2, pages could be merged with pages on isolate buddy list and
> located on normal buddy list. This makes freepage counting incorrect
> and break the property of pageblock isolation.
>
> One solution to this problem is checking pageblock migratetype with
> holding zone lock in __free_one_page() and I posted it before, but,
> it didn't get welcome since it needs the hook in zone lock critical
> section on freepath.
>
> This is another solution to this problem and impose most overhead on
> pageblock isolation logic. Following is how this solution works.
>
> 1. Extends irq disabled period on freepath to call
> get_pfnblock_migratetype() with irq disabled. With this, we can be
> sure that future freed pages will see modified pageblock migratetype
> after certain synchronization point so we don't need to hold the zone
> lock to get correct pageblock migratetype. Although it extends irq
> disabled period on freepath, I guess it is marginal and better than
> adding the hook in zone lock critical section.
>
> 2. #1 requires IPI for synchronization and we can't hold the zone lock

It would be better to explain here that the synchronization point is 
pcplists draining.

> during processing IPI. In this time, some pages could be moved from buddy
> list to pcp list on page allocation path and later it could be moved again
> from pcp list to buddy list. In this time, this page would be on isolate

It is difficult to understand the problem just by reading this. I guess 
the timelines you included while explaining the problem to me, would 
help here :)

> pageblock, so, the hook is required on free_pcppages_bulk() to prevent

More clearly, a recheck for pageblock's migratetype would be needed in 
free_pcppages_bulk(), which would again impose overhead outside isolation.

> misplacement. To remove this possibility, disabling and draining pcp
> list is needed during isolation. It guaratees that there is no page on pcp
> list on all cpus while isolation, so misplacement problem can't happen.
>
> Note that this doesn't fix freepage counting problem. To fix it,
> we need more logic. Following patches will do it.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/internal.h       |    2 ++
>   mm/page_alloc.c     |   27 ++++++++++++++++++++-------
>   mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
>   3 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index a1b651b..81b8884 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>   /*
>    * in mm/page_alloc.c
>    */
> +extern void zone_pcp_disable(struct zone *zone);
> +extern void zone_pcp_enable(struct zone *zone);
>   extern void __free_pages_bootmem(struct page *page, unsigned int order);
>   extern void prep_compound_page(struct page *page, unsigned long order);
>   #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e1e344..4517b1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>   			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>   			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
>   			trace_mm_page_pcpu_drain(page, 0, mt);
> -			if (likely(!is_migrate_isolate_page(page))) {
> -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> -				if (is_migrate_cma(mt))
> -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> -			}
> +			__mod_zone_freepage_state(zone, 1, mt);

Could be worth mentioning that this can now be removed as it was an 
incomplete attempt to fix freepage counting, but didn't address the 
misplacement.

>   		} while (--to_free && --batch_free && !list_empty(list));
>   	}
>   	spin_unlock(&zone->lock);
> @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>   	if (!free_pages_prepare(page, order))
>   		return;
>
> -	migratetype = get_pfnblock_migratetype(page, pfn);
>   	local_irq_save(flags);
> +	migratetype = get_pfnblock_migratetype(page, pfn);
>   	__count_vm_events(PGFREE, 1 << order);
>   	set_freepage_migratetype(page, migratetype);
>   	free_one_page(page_zone(page), page, pfn, order, migratetype);
> @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
>   	if (!free_pages_prepare(page, 0))
>   		return;
>
> +	local_irq_save(flags);
>   	migratetype = get_pfnblock_migratetype(page, pfn);
>   	set_freepage_migratetype(page, migratetype);
> -	local_irq_save(flags);
>   	__count_vm_event(PGFREE);

Maybe add comments to these two to make it clear that this cannot be 
moved outside of the irq disabled part, in case anyone considers it 
(again) in the future?

> @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>   	 */
>
>   out:
> -	if (!ret) {
> -		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +	if (ret) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return ret;
> +	}
>   on pcplists
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	migratetype = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> -	}
> +	zone_pcp_disable(zone);
> +
> +	/*
> +	 * After this point, freed pages will see MIGRATE_ISOLATE as
> +	 * their pageblock migratetype on all cpus. And pcp list has
> +	 * no free page.
> +	 */
> +	on_each_cpu(drain_local_pages, NULL, 1);

Is there any difference between drain_all_pages() and this, or why 
didn't you use drain_all_pages()?


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-07 14:34     ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 14:34 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> We got migratetype of the freeing page without holding the zone lock so
> it could be racy. There are two cases of this race.
>
> 1. pages are added to isolate buddy list after restoring original
> migratetype.
> 2. pages are added to normal buddy list while pageblock is isolated.
>
> If case 1 happens, we can't allocate freepages on isolate buddy list
> until next pageblock isolation occurs.
> In case of 2, pages could be merged with pages on isolate buddy list and
> located on normal buddy list. This makes freepage counting incorrect
> and break the property of pageblock isolation.
>
> One solution to this problem is checking pageblock migratetype with
> holding zone lock in __free_one_page() and I posted it before, but,
> it didn't get welcome since it needs the hook in zone lock critical
> section on freepath.
>
> This is another solution to this problem and impose most overhead on
> pageblock isolation logic. Following is how this solution works.
>
> 1. Extends irq disabled period on freepath to call
> get_pfnblock_migratetype() with irq disabled. With this, we can be
> sure that future freed pages will see modified pageblock migratetype
> after certain synchronization point so we don't need to hold the zone
> lock to get correct pageblock migratetype. Although it extends irq
> disabled period on freepath, I guess it is marginal and better than
> adding the hook in zone lock critical section.
>
> 2. #1 requires IPI for synchronization and we can't hold the zone lock

It would be better to explain here that the synchronization point is 
pcplists draining.

> during processing IPI. In this time, some pages could be moved from buddy
> list to pcp list on page allocation path and later it could be moved again
> from pcp list to buddy list. In this time, this page would be on isolate

It is difficult to understand the problem just by reading this. I guess 
the timelines you included while explaining the problem to me, would 
help here :)

> pageblock, so, the hook is required on free_pcppages_bulk() to prevent

More clearly, a recheck for pageblock's migratetype would be needed in 
free_pcppages_bulk(), which would again impose overhead outside isolation.

> misplacement. To remove this possibility, disabling and draining pcp
> list is needed during isolation. It guaratees that there is no page on pcp
> list on all cpus while isolation, so misplacement problem can't happen.
>
> Note that this doesn't fix freepage counting problem. To fix it,
> we need more logic. Following patches will do it.
>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   mm/internal.h       |    2 ++
>   mm/page_alloc.c     |   27 ++++++++++++++++++++-------
>   mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
>   3 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index a1b651b..81b8884 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>   /*
>    * in mm/page_alloc.c
>    */
> +extern void zone_pcp_disable(struct zone *zone);
> +extern void zone_pcp_enable(struct zone *zone);
>   extern void __free_pages_bootmem(struct page *page, unsigned int order);
>   extern void prep_compound_page(struct page *page, unsigned long order);
>   #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e1e344..4517b1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>   			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>   			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
>   			trace_mm_page_pcpu_drain(page, 0, mt);
> -			if (likely(!is_migrate_isolate_page(page))) {
> -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> -				if (is_migrate_cma(mt))
> -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> -			}
> +			__mod_zone_freepage_state(zone, 1, mt);

Could be worth mentioning that this can now be removed as it was an 
incomplete attempt to fix freepage counting, but didn't address the 
misplacement.

>   		} while (--to_free && --batch_free && !list_empty(list));
>   	}
>   	spin_unlock(&zone->lock);
> @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>   	if (!free_pages_prepare(page, order))
>   		return;
>
> -	migratetype = get_pfnblock_migratetype(page, pfn);
>   	local_irq_save(flags);
> +	migratetype = get_pfnblock_migratetype(page, pfn);
>   	__count_vm_events(PGFREE, 1 << order);
>   	set_freepage_migratetype(page, migratetype);
>   	free_one_page(page_zone(page), page, pfn, order, migratetype);
> @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
>   	if (!free_pages_prepare(page, 0))
>   		return;
>
> +	local_irq_save(flags);
>   	migratetype = get_pfnblock_migratetype(page, pfn);
>   	set_freepage_migratetype(page, migratetype);
> -	local_irq_save(flags);
>   	__count_vm_event(PGFREE);

Maybe add comments to these two to make it clear that this cannot be 
moved outside of the irq disabled part, in case anyone considers it 
(again) in the future?

> @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>   	 */
>
>   out:
> -	if (!ret) {
> -		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +	if (ret) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return ret;
> +	}
>   on pcplists
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	migratetype = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> -	}
> +	zone_pcp_disable(zone);
> +
> +	/*
> +	 * After this point, freed pages will see MIGRATE_ISOLATE as
> +	 * their pageblock migratetype on all cpus. And pcp list has
> +	 * no free page.
> +	 */
> +	on_each_cpu(drain_local_pages, NULL, 1);

Is there any difference between drain_all_pages() and this, or why 
didn't you use drain_all_pages()?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-07 15:15     ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 15:15 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> Current pageblock isolation logic has a problem that results in incorrect
> freepage counting. move_freepages_block() doesn't return number of
> moved pages so freepage count could be wrong if some pages are freed
> inbetween set_pageblock_migratetype() and move_freepages_block(). Although
> we fix move_freepages_block() to return number of moved pages, the problem

     ^ could

> wouldn't be fixed completely because buddy allocator doesn't care if merged
> pages are on different buddy list or not. If some page on normal buddy list
> is merged with isolated page and moved to isolate buddy list, freepage
> count should be subtracted, but, it didn't and can't now.

... but it's not done now and doing that would impose unwanted overhead 
on buddy merging.

Also the analogous problem exists when undoing isolation?

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   include/linux/page-isolation.h |    2 +
>   mm/internal.h                  |    3 ++
>   mm/page_alloc.c                |   28 ++++++-----
>   mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
>   4 files changed, 118 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..3dd39fe 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
>   }
>   #endif
>
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order);
>   bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>   			 bool skip_hwpoisoned_pages);
>   void set_pageblock_migratetype(struct page *page, int migratetype);
> diff --git a/mm/internal.h b/mm/internal.h
> index 81b8884..c70750a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>    */
>   extern void zone_pcp_disable(struct zone *zone);
>   extern void zone_pcp_enable(struct zone *zone);
> +extern void __free_one_page(struct page *page, unsigned long pfn,
> +		struct zone *zone, unsigned int order,
> +		int migratetype);
>   extern void __free_pages_bootmem(struct page *page, unsigned int order);
>   extern void prep_compound_page(struct page *page, unsigned long order);
>   #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4517b1d..82da4a8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>    * -- nyc
>    */
>
> -static inline void __free_one_page(struct page *page,
> +void __free_one_page(struct page *page,
>   		unsigned long pfn,
>   		struct zone *zone, unsigned int order,
>   		int migratetype)
> @@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
>   				int migratetype)
>   {
>   	unsigned long nr_scanned;
> +
> +	if (unlikely(is_migrate_isolate(migratetype))) {
> +		deactivate_isolated_page(zone, page, order);
> +		return;
> +	}
> +

This would be more effectively done in the callers, which is where 
migratetype is determined - there are two:
- free_hot_cold_page() already has this test, so just call deactivation
   instead of free_one_page() - one test less in this path!
- __free_pages_ok() could add the test to call deactivation, and since 
you remove another test in the hunk below, the net result is the same in 
this path.

> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -9,6 +9,75 @@
>   #include <linux/hugetlb.h>
>   #include "internal.h"
>
> +#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> +
> +static inline int PageIsolated(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}

Hmm wasn't the convention for atomic updates to be without the __ prefix?



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-07 15:15     ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-07 15:15 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Aneesh Kumar K.V, Ritesh Harjani, t.stanislaws,
	Gioh Kim, linux-mm, linux-kernel

On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> Current pageblock isolation logic has a problem that results in incorrect
> freepage counting. move_freepages_block() doesn't return number of
> moved pages so freepage count could be wrong if some pages are freed
> inbetween set_pageblock_migratetype() and move_freepages_block(). Although
> we fix move_freepages_block() to return number of moved pages, the problem

     ^ could

> wouldn't be fixed completely because buddy allocator doesn't care if merged
> pages are on different buddy list or not. If some page on normal buddy list
> is merged with isolated page and moved to isolate buddy list, freepage
> count should be subtracted, but, it didn't and can't now.

... but it's not done now and doing that would impose unwanted overhead 
on buddy merging.

Also the analogous problem exists when undoing isolation?

> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>   include/linux/page-isolation.h |    2 +
>   mm/internal.h                  |    3 ++
>   mm/page_alloc.c                |   28 ++++++-----
>   mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
>   4 files changed, 118 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..3dd39fe 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
>   }
>   #endif
>
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order);
>   bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>   			 bool skip_hwpoisoned_pages);
>   void set_pageblock_migratetype(struct page *page, int migratetype);
> diff --git a/mm/internal.h b/mm/internal.h
> index 81b8884..c70750a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>    */
>   extern void zone_pcp_disable(struct zone *zone);
>   extern void zone_pcp_enable(struct zone *zone);
> +extern void __free_one_page(struct page *page, unsigned long pfn,
> +		struct zone *zone, unsigned int order,
> +		int migratetype);
>   extern void __free_pages_bootmem(struct page *page, unsigned int order);
>   extern void prep_compound_page(struct page *page, unsigned long order);
>   #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4517b1d..82da4a8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>    * -- nyc
>    */
>
> -static inline void __free_one_page(struct page *page,
> +void __free_one_page(struct page *page,
>   		unsigned long pfn,
>   		struct zone *zone, unsigned int order,
>   		int migratetype)
> @@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
>   				int migratetype)
>   {
>   	unsigned long nr_scanned;
> +
> +	if (unlikely(is_migrate_isolate(migratetype))) {
> +		deactivate_isolated_page(zone, page, order);
> +		return;
> +	}
> +

This would be more effectively done in the callers, which is where 
migratetype is determined - there are two:
- free_hot_cold_page() already has this test, so just call deactivation
   instead of free_one_page() - one test less in this path!
- __free_pages_ok() could add the test to call deactivation, and since 
you remove another test in the hunk below, the net result is the same in 
this path.

> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -9,6 +9,75 @@
>   #include <linux/hugetlb.h>
>   #include "internal.h"
>
> +#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> +
> +static inline int PageIsolated(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}

Hmm wasn't the convention for atomic updates to be without the __ prefix?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
  2014-08-07 13:49     ` Vlastimil Babka
@ 2014-08-08  6:22       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 03:49:17PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >The check '!PageBuddy(page) && page_count(page) == 0 &&
> >migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> 
> What is "the page on free processing"? I thought this test means the
> page is on some CPU's pcplist?

Yes, you are right.

> 
> >Although it could go into buddy allocator within a short time,
> >futher operation such as isolate_freepages_range() in CMA, called after
> >test_page_isolated_in_pageblock(), could be failed due to this unstability
> 
> By "unstability" you mean the page can be allocated again from the
> pcplist instead of being freed to buddy list?

Yes.

> >since it requires that the page is on buddy. I think that removing
> >this unstability is good thing.
> >
> >And, following patch makes isolated freepage has new status matched with
> >this condition and this check is the obstacle to that change. So remove
> >it.
> 
> You could also say that pages from isolated pageblocks can no longer
> appear on pcplists after the later patches.

Okay. I will do it.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-08  6:22       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 03:49:17PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >The check '!PageBuddy(page) && page_count(page) == 0 &&
> >migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> 
> What is "the page on free processing"? I thought this test means the
> page is on some CPU's pcplist?

Yes, you are right.

> 
> >Although it could go into buddy allocator within a short time,
> >futher operation such as isolate_freepages_range() in CMA, called after
> >test_page_isolated_in_pageblock(), could be failed due to this unstability
> 
> By "unstability" you mean the page can be allocated again from the
> pcplist instead of being freed to buddy list?

Yes.

> >since it requires that the page is on buddy. I think that removing
> >this unstability is good thing.
> >
> >And, following patch makes isolated freepage has new status matched with
> >this condition and this check is the obstacle to that change. So remove
> >it.
> 
> You could also say that pages from isolated pageblocks can no longer
> appear on pcplists after the later patches.

Okay. I will do it.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-07 14:34     ` Vlastimil Babka
@ 2014-08-08  6:30       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 04:34:41PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >We got migratetype of the freeing page without holding the zone lock so
> >it could be racy. There are two cases of this race.
> >
> >1. pages are added to isolate buddy list after restoring original
> >migratetype.
> >2. pages are added to normal buddy list while pageblock is isolated.
> >
> >If case 1 happens, we can't allocate freepages on isolate buddy list
> >until next pageblock isolation occurs.
> >In case of 2, pages could be merged with pages on isolate buddy list and
> >located on normal buddy list. This makes freepage counting incorrect
> >and break the property of pageblock isolation.
> >
> >One solution to this problem is checking pageblock migratetype with
> >holding zone lock in __free_one_page() and I posted it before, but,
> >it didn't get welcome since it needs the hook in zone lock critical
> >section on freepath.
> >
> >This is another solution to this problem and impose most overhead on
> >pageblock isolation logic. Following is how this solution works.
> >
> >1. Extends irq disabled period on freepath to call
> >get_pfnblock_migratetype() with irq disabled. With this, we can be
> >sure that future freed pages will see modified pageblock migratetype
> >after certain synchronization point so we don't need to hold the zone
> >lock to get correct pageblock migratetype. Although it extends irq
> >disabled period on freepath, I guess it is marginal and better than
> >adding the hook in zone lock critical section.
> >
> >2. #1 requires IPI for synchronization and we can't hold the zone lock
> 
> It would be better to explain here that the synchronization point is
> pcplists draining.

Okay.

> 
> >during processing IPI. In this time, some pages could be moved from buddy
> >list to pcp list on page allocation path and later it could be moved again
> >from pcp list to buddy list. In this time, this page would be on isolate
> 
> It is difficult to understand the problem just by reading this. I
> guess the timelines you included while explaining the problem to me,
> would help here :)

Okay.

> >pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> 
> More clearly, a recheck for pageblock's migratetype would be needed
> in free_pcppages_bulk(), which would again impose overhead outside
> isolation.

Thanks. I will replace above line with yours. :)

> >misplacement. To remove this possibility, disabling and draining pcp
> >list is needed during isolation. It guaratees that there is no page on pcp
> >list on all cpus while isolation, so misplacement problem can't happen.
> >
> >Note that this doesn't fix freepage counting problem. To fix it,
> >we need more logic. Following patches will do it.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> >  mm/internal.h       |    2 ++
> >  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
> >  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
> >  3 files changed, 55 insertions(+), 19 deletions(-)
> >
> >diff --git a/mm/internal.h b/mm/internal.h
> >index a1b651b..81b8884 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >  /*
> >   * in mm/page_alloc.c
> >   */
> >+extern void zone_pcp_disable(struct zone *zone);
> >+extern void zone_pcp_enable(struct zone *zone);
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 3e1e344..4517b1d 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> >  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> >  			trace_mm_page_pcpu_drain(page, 0, mt);
> >-			if (likely(!is_migrate_isolate_page(page))) {
> >-				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> >-				if (is_migrate_cma(mt))
> >-					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> >-			}
> >+			__mod_zone_freepage_state(zone, 1, mt);
> 
> Could be worth mentioning that this can now be removed as it was an
> incomplete attempt to fix freepage counting, but didn't address the
> misplacement.

Okay. I will mention it.

> >  		} while (--to_free && --batch_free && !list_empty(list));
> >  	}
> >  	spin_unlock(&zone->lock);
> >@@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	if (!free_pages_prepare(page, order))
> >  		return;
> >
> >-	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	local_irq_save(flags);
> >+	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	__count_vm_events(PGFREE, 1 << order);
> >  	set_freepage_migratetype(page, migratetype);
> >  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> >@@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
> >  	if (!free_pages_prepare(page, 0))
> >  		return;
> >
> >+	local_irq_save(flags);
> >  	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	set_freepage_migratetype(page, migratetype);
> >-	local_irq_save(flags);
> >  	__count_vm_event(PGFREE);
> 
> Maybe add comments to these two to make it clear that this cannot be
> moved outside of the irq disabled part, in case anyone considers it
> (again) in the future?

Okay.

> 
> >@@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	 */
> >
> >  out:
> >-	if (!ret) {
> >-		unsigned long nr_pages;
> >-		int migratetype = get_pageblock_migratetype(page);
> >+	if (ret) {
> >+		spin_unlock_irqrestore(&zone->lock, flags);
> >+		return ret;
> >+	}
> >  on pcplists
> >-		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> >-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> >+	migratetype = get_pageblock_migratetype(page);
> >+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> >+	spin_unlock_irqrestore(&zone->lock, flags);
> >
> >-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> >-	}
> >+	zone_pcp_disable(zone);
> >+
> >+	/*
> >+	 * After this point, freed pages will see MIGRATE_ISOLATE as
> >+	 * their pageblock migratetype on all cpus. And pcp list has
> >+	 * no free page.
> >+	 */
> >+	on_each_cpu(drain_local_pages, NULL, 1);
> 
> Is there any difference between drain_all_pages() and this, or why
> didn't you use drain_all_pages()?

Yes, there is some difference. What we need here is not only to drain
pages on pcplist but also to synchronize memory on every CPUs. Because
drain_all_pages() send IPI only to CPUs having pages on pcplist, we
cannot be sure that all CPUs are synchronized. So I do it in this way.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-08  6:30       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:30 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 04:34:41PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >We got migratetype of the freeing page without holding the zone lock so
> >it could be racy. There are two cases of this race.
> >
> >1. pages are added to isolate buddy list after restoring original
> >migratetype.
> >2. pages are added to normal buddy list while pageblock is isolated.
> >
> >If case 1 happens, we can't allocate freepages on isolate buddy list
> >until next pageblock isolation occurs.
> >In case of 2, pages could be merged with pages on isolate buddy list and
> >located on normal buddy list. This makes freepage counting incorrect
> >and break the property of pageblock isolation.
> >
> >One solution to this problem is checking pageblock migratetype with
> >holding zone lock in __free_one_page() and I posted it before, but,
> >it didn't get welcome since it needs the hook in zone lock critical
> >section on freepath.
> >
> >This is another solution to this problem and impose most overhead on
> >pageblock isolation logic. Following is how this solution works.
> >
> >1. Extends irq disabled period on freepath to call
> >get_pfnblock_migratetype() with irq disabled. With this, we can be
> >sure that future freed pages will see modified pageblock migratetype
> >after certain synchronization point so we don't need to hold the zone
> >lock to get correct pageblock migratetype. Although it extends irq
> >disabled period on freepath, I guess it is marginal and better than
> >adding the hook in zone lock critical section.
> >
> >2. #1 requires IPI for synchronization and we can't hold the zone lock
> 
> It would be better to explain here that the synchronization point is
> pcplists draining.

Okay.

> 
> >during processing IPI. In this time, some pages could be moved from buddy
> >list to pcp list on page allocation path and later it could be moved again
> >from pcp list to buddy list. In this time, this page would be on isolate
> 
> It is difficult to understand the problem just by reading this. I
> guess the timelines you included while explaining the problem to me,
> would help here :)

Okay.

> >pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> 
> More clearly, a recheck for pageblock's migratetype would be needed
> in free_pcppages_bulk(), which would again impose overhead outside
> isolation.

Thanks. I will replace above line with yours. :)

> >misplacement. To remove this possibility, disabling and draining pcp
> >list is needed during isolation. It guaratees that there is no page on pcp
> >list on all cpus while isolation, so misplacement problem can't happen.
> >
> >Note that this doesn't fix freepage counting problem. To fix it,
> >we need more logic. Following patches will do it.
> >
> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> >  mm/internal.h       |    2 ++
> >  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
> >  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
> >  3 files changed, 55 insertions(+), 19 deletions(-)
> >
> >diff --git a/mm/internal.h b/mm/internal.h
> >index a1b651b..81b8884 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >  /*
> >   * in mm/page_alloc.c
> >   */
> >+extern void zone_pcp_disable(struct zone *zone);
> >+extern void zone_pcp_enable(struct zone *zone);
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 3e1e344..4517b1d 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> >  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> >  			trace_mm_page_pcpu_drain(page, 0, mt);
> >-			if (likely(!is_migrate_isolate_page(page))) {
> >-				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> >-				if (is_migrate_cma(mt))
> >-					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> >-			}
> >+			__mod_zone_freepage_state(zone, 1, mt);
> 
> Could be worth mentioning that this can now be removed as it was an
> incomplete attempt to fix freepage counting, but didn't address the
> misplacement.

Okay. I will mention it.

> >  		} while (--to_free && --batch_free && !list_empty(list));
> >  	}
> >  	spin_unlock(&zone->lock);
> >@@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	if (!free_pages_prepare(page, order))
> >  		return;
> >
> >-	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	local_irq_save(flags);
> >+	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	__count_vm_events(PGFREE, 1 << order);
> >  	set_freepage_migratetype(page, migratetype);
> >  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> >@@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
> >  	if (!free_pages_prepare(page, 0))
> >  		return;
> >
> >+	local_irq_save(flags);
> >  	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	set_freepage_migratetype(page, migratetype);
> >-	local_irq_save(flags);
> >  	__count_vm_event(PGFREE);
> 
> Maybe add comments to these two to make it clear that this cannot be
> moved outside of the irq disabled part, in case anyone considers it
> (again) in the future?

Okay.

> 
> >@@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	 */
> >
> >  out:
> >-	if (!ret) {
> >-		unsigned long nr_pages;
> >-		int migratetype = get_pageblock_migratetype(page);
> >+	if (ret) {
> >+		spin_unlock_irqrestore(&zone->lock, flags);
> >+		return ret;
> >+	}
> >  on pcplists
> >-		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> >-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> >+	migratetype = get_pageblock_migratetype(page);
> >+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> >+	spin_unlock_irqrestore(&zone->lock, flags);
> >
> >-		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> >-	}
> >+	zone_pcp_disable(zone);
> >+
> >+	/*
> >+	 * After this point, freed pages will see MIGRATE_ISOLATE as
> >+	 * their pageblock migratetype on all cpus. And pcp list has
> >+	 * no free page.
> >+	 */
> >+	on_each_cpu(drain_local_pages, NULL, 1);
> 
> Is there any difference between drain_all_pages() and this, or why
> didn't you use drain_all_pages()?

Yes, there is some difference. What we need here is not only to drain
pages on pcplist but also to synchronize memory on every CPUs. Because
drain_all_pages() send IPI only to CPUs having pages on pcplist, we
cannot be sure that all CPUs are synchronized. So I do it in this way.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-07 15:15     ` Vlastimil Babka
@ 2014-08-08  6:45       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 05:15:17PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >Current pageblock isolation logic has a problem that results in incorrect
> >freepage counting. move_freepages_block() doesn't return number of
> >moved pages so freepage count could be wrong if some pages are freed
> >inbetween set_pageblock_migratetype() and move_freepages_block(). Although
> >we fix move_freepages_block() to return number of moved pages, the problem
> 
>     ^ could

Yes, but fixing that is not needed because this patch changes
isolation process and, after that, that behaviour have any problem.

> 
> >wouldn't be fixed completely because buddy allocator doesn't care if merged
> >pages are on different buddy list or not. If some page on normal buddy list
> >is merged with isolated page and moved to isolate buddy list, freepage
> >count should be subtracted, but, it didn't and can't now.
> 
> ... but it's not done now and doing that would impose unwanted
> overhead on buddy merging.

Yes, we don't want more overhead on buddy merging so this patch
introduces PageIsolated() in order to avoid merge problem.

> Also the analogous problem exists when undoing isolation?

There is no merge problem in new (un)isolation process of this patch except
for the page more than pageblock order. This case will be fixed in patch 7.

> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> >  include/linux/page-isolation.h |    2 +
> >  mm/internal.h                  |    3 ++
> >  mm/page_alloc.c                |   28 ++++++-----
> >  mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 118 insertions(+), 22 deletions(-)
> >
> >diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> >index 3fff8e7..3dd39fe 100644
> >--- a/include/linux/page-isolation.h
> >+++ b/include/linux/page-isolation.h
> >@@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
> >  }
> >  #endif
> >
> >+void deactivate_isolated_page(struct zone *zone, struct page *page,
> >+				unsigned int order);
> >  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> >  			 bool skip_hwpoisoned_pages);
> >  void set_pageblock_migratetype(struct page *page, int migratetype);
> >diff --git a/mm/internal.h b/mm/internal.h
> >index 81b8884..c70750a 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >   */
> >  extern void zone_pcp_disable(struct zone *zone);
> >  extern void zone_pcp_enable(struct zone *zone);
> >+extern void __free_one_page(struct page *page, unsigned long pfn,
> >+		struct zone *zone, unsigned int order,
> >+		int migratetype);
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 4517b1d..82da4a8 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
> >   * -- nyc
> >   */
> >
> >-static inline void __free_one_page(struct page *page,
> >+void __free_one_page(struct page *page,
> >  		unsigned long pfn,
> >  		struct zone *zone, unsigned int order,
> >  		int migratetype)
> >@@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
> >  				int migratetype)
> >  {
> >  	unsigned long nr_scanned;
> >+
> >+	if (unlikely(is_migrate_isolate(migratetype))) {
> >+		deactivate_isolated_page(zone, page, order);
> >+		return;
> >+	}
> >+
> 
> This would be more effectively done in the callers, which is where
> migratetype is determined - there are two:
> - free_hot_cold_page() already has this test, so just call deactivation
>   instead of free_one_page() - one test less in this path!
> - __free_pages_ok() could add the test to call deactivation, and
> since you remove another test in the hunk below, the net result is
> the same in this path.

Okay. Will do.

> >--- a/mm/page_isolation.c
> >+++ b/mm/page_isolation.c
> >@@ -9,6 +9,75 @@
> >  #include <linux/hugetlb.h>
> >  #include "internal.h"
> >
> >+#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> >+
> >+static inline int PageIsolated(struct page *page)
> >+{
> >+	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> >+}
> >+
> >+static inline void __SetPageIsolated(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> >+	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> >+}
> >+
> >+static inline void __ClearPageIsolated(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+	atomic_set(&page->_mapcount, -1);
> >+}
> 
> Hmm wasn't the convention for atomic updates to be without the __ prefix?

I copy-and_paste code for PageBuddy(). :)
I guess that __ prefix here means that we should call it with holding
the zone lock. atomic operation is used to satisfy type definition of
page->_mapcount.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-08  6:45       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-08  6:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Thu, Aug 07, 2014 at 05:15:17PM +0200, Vlastimil Babka wrote:
> On 08/06/2014 09:18 AM, Joonsoo Kim wrote:
> >Current pageblock isolation logic has a problem that results in incorrect
> >freepage counting. move_freepages_block() doesn't return number of
> >moved pages so freepage count could be wrong if some pages are freed
> >inbetween set_pageblock_migratetype() and move_freepages_block(). Although
> >we fix move_freepages_block() to return number of moved pages, the problem
> 
>     ^ could

Yes, but fixing that is not needed because this patch changes
isolation process and, after that, that behaviour have any problem.

> 
> >wouldn't be fixed completely because buddy allocator doesn't care if merged
> >pages are on different buddy list or not. If some page on normal buddy list
> >is merged with isolated page and moved to isolate buddy list, freepage
> >count should be subtracted, but, it didn't and can't now.
> 
> ... but it's not done now and doing that would impose unwanted
> overhead on buddy merging.

Yes, we don't want more overhead on buddy merging so this patch
introduces PageIsolated() in order to avoid merge problem.

> Also the analogous problem exists when undoing isolation?

There is no merge problem in new (un)isolation process of this patch except
for the page more than pageblock order. This case will be fixed in patch 7.

> >Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> >---
> >  include/linux/page-isolation.h |    2 +
> >  mm/internal.h                  |    3 ++
> >  mm/page_alloc.c                |   28 ++++++-----
> >  mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 118 insertions(+), 22 deletions(-)
> >
> >diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> >index 3fff8e7..3dd39fe 100644
> >--- a/include/linux/page-isolation.h
> >+++ b/include/linux/page-isolation.h
> >@@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
> >  }
> >  #endif
> >
> >+void deactivate_isolated_page(struct zone *zone, struct page *page,
> >+				unsigned int order);
> >  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> >  			 bool skip_hwpoisoned_pages);
> >  void set_pageblock_migratetype(struct page *page, int migratetype);
> >diff --git a/mm/internal.h b/mm/internal.h
> >index 81b8884..c70750a 100644
> >--- a/mm/internal.h
> >+++ b/mm/internal.h
> >@@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >   */
> >  extern void zone_pcp_disable(struct zone *zone);
> >  extern void zone_pcp_enable(struct zone *zone);
> >+extern void __free_one_page(struct page *page, unsigned long pfn,
> >+		struct zone *zone, unsigned int order,
> >+		int migratetype);
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> >diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >index 4517b1d..82da4a8 100644
> >--- a/mm/page_alloc.c
> >+++ b/mm/page_alloc.c
> >@@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
> >   * -- nyc
> >   */
> >
> >-static inline void __free_one_page(struct page *page,
> >+void __free_one_page(struct page *page,
> >  		unsigned long pfn,
> >  		struct zone *zone, unsigned int order,
> >  		int migratetype)
> >@@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
> >  				int migratetype)
> >  {
> >  	unsigned long nr_scanned;
> >+
> >+	if (unlikely(is_migrate_isolate(migratetype))) {
> >+		deactivate_isolated_page(zone, page, order);
> >+		return;
> >+	}
> >+
> 
> This would be more effectively done in the callers, which is where
> migratetype is determined - there are two:
> - free_hot_cold_page() already has this test, so just call deactivation
>   instead of free_one_page() - one test less in this path!
> - __free_pages_ok() could add the test to call deactivation, and
> since you remove another test in the hunk below, the net result is
> the same in this path.

Okay. Will do.

> >--- a/mm/page_isolation.c
> >+++ b/mm/page_isolation.c
> >@@ -9,6 +9,75 @@
> >  #include <linux/hugetlb.h>
> >  #include "internal.h"
> >
> >+#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> >+
> >+static inline int PageIsolated(struct page *page)
> >+{
> >+	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> >+}
> >+
> >+static inline void __SetPageIsolated(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> >+	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> >+}
> >+
> >+static inline void __ClearPageIsolated(struct page *page)
> >+{
> >+	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >+	atomic_set(&page->_mapcount, -1);
> >+}
> 
> Hmm wasn't the convention for atomic updates to be without the __ prefix?

I copy-and_paste code for PageBuddy(). :)
I guess that __ prefix here means that we should call it with holding
the zone lock. atomic operation is used to satisfy type definition of
page->_mapcount.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-11  9:23     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 84+ messages in thread
From: Aneesh Kumar K.V @ 2014-08-11  9:23 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm,
	linux-kernel, Joonsoo Kim

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> The check '!PageBuddy(page) && page_count(page) == 0 &&
> migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> Although it could go into buddy allocator within a short time,
> futher operation such as isolate_freepages_range() in CMA, called after
> test_page_isolated_in_pageblock(), could be failed due to this unstability
> since it requires that the page is on buddy. I think that removing
> this unstability is good thing.

Is that true in case of check_pages_isolated_cb ? Does that require
PageBuddy to be true ?

>
> And, following patch makes isolated freepage has new status matched with
> this condition and this check is the obstacle to that change. So remove
> it.

Can you quote the patch summary in the above case ? ie, something like

And the followiing patch "mm/....." makes isolate freepage.


-aneesh


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-11  9:23     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 84+ messages in thread
From: Aneesh Kumar K.V @ 2014-08-11  9:23 UTC (permalink / raw)
  To: Joonsoo Kim, Andrew Morton
  Cc: Kirill A. Shutemov, Rik van Riel, Mel Gorman, Johannes Weiner,
	Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei, Srivatsa S. Bhat,
	Tang Chen, Naoya Horiguchi, Bartlomiej Zolnierkiewicz,
	Wen Congyang, Marek Szyprowski, Michal Nazarewicz, Laura Abbott,
	Heesub Shin, Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm,
	linux-kernel

Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:

> The check '!PageBuddy(page) && page_count(page) == 0 &&
> migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> Although it could go into buddy allocator within a short time,
> futher operation such as isolate_freepages_range() in CMA, called after
> test_page_isolated_in_pageblock(), could be failed due to this unstability
> since it requires that the page is on buddy. I think that removing
> this unstability is good thing.

Is that true in case of check_pages_isolated_cb ? Does that require
PageBuddy to be true ?

>
> And, following patch makes isolated freepage has new status matched with
> this condition and this check is the obstacle to that change. So remove
> it.

Can you quote the patch summary in the above case ? ie, something like

And the followiing patch "mm/....." makes isolate freepage.


-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-12  1:24     ` Minchan Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  1:24 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, tglx, cody,
	linux-kernel

Hey Joonsoo,

On Wed, Aug 06, 2014 at 04:18:28PM +0900, Joonsoo Kim wrote:
> per cpu pages structure, aka pcp, has high and batch values to control
> how many pages we perform caching. This values could be updated
> asynchronously and updater should ensure that this doesn't make any
> problem. For this purpose, pageset_update() is implemented and do some
> memory synchronization. But, it turns out to be wrong when I implemented
> new feature using this. There is no corresponding smp_rmb() in read-side
> so that it can't guarantee anything. Without correct updating, system
> could hang in free_pcppages_bulk() due to larger batch value than high.
> To properly update this values, we need to synchronization primitives on
> read-side, but, it hurts allocator's fastpath.
> 
> There is another choice for synchronization, that is, sending IPI. This
> is somewhat expensive, but, this is really rare case so I guess it has
> no problem here. However, reducing IPI is very helpful here. Current
> logic handles each CPU's pcp update one by one. To reduce sending IPI,
> we need to re-ogranize the code to handle all CPU's pcp update at one go.
> This patch implement these requirements.

Let's add right reviewer for the patch.
Cced Cody and Thomas.

> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b99643d4..44672dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
>   * not check if the processor is online before following the pageset pointer.
>   * Other parts of the kernel may not check if the zone is available.
>   */
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
>  static void setup_zone_pageset(struct zone *zone);
>  
> @@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
>  	 * needs the percpu allocator in order to allocate its pagesets
>  	 * (a chicken-egg dilemma).
>  	 */
> -	for_each_possible_cpu(cpu) {
> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> +	setup_pageset(&boot_pageset);
>  
> +	for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>  		/*
>  		 * We now know the "local memory node" for each node--
> @@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
>   * outside of boot time (or some other assurance that no concurrent updaters
>   * exist).
>   */
> -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> -		unsigned long batch)
> +static void pageset_update(struct zone *zone, int high, int batch)
>  {
> -       /* start with a fail safe value for batch */
> -	pcp->batch = 1;
> -	smp_wmb();
> +	int cpu;
> +	struct per_cpu_pages *pcp;
> +
> +	/* start with a fail safe value for batch */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = 1;
> +	}
> +	kick_all_cpus_sync();
> +
> +	/* Update high, then batch, in order */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->high = high;
> +	}
> +	kick_all_cpus_sync();
>  
> -       /* Update high, then batch, in order */
> -	pcp->high = high;
> -	smp_wmb();
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = batch;
> +	}
> +}
> +
> +/*
> + * pageset_get_values_by_high() gets the high water mark for
> + * hot per_cpu_pagelist to the value high for the pageset p.
> + */
> +static void pageset_get_values_by_high(int input_high,
> +				int *output_high, int *output_batch)

You don't use output_high so we could make it as follows,

int pageset_batch(int high);

> +{
> +	*output_batch = max(1, input_high / 4);
> +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> +		*output_batch = PAGE_SHIFT * 8;
> +}
>  
> -	pcp->batch = batch;
> +/* a companion to pageset_get_values_by_high() */
> +static void pageset_get_values_by_batch(int input_batch,
> +				int *output_high, int *output_batch)
> +{
> +	*output_high = 6 * input_batch;
> +	*output_batch = max(1, 1 * input_batch);
>  }
>  
> -/* a companion to pageset_set_high() */
> -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +static void pageset_get_values(struct zone *zone, int *high, int *batch)
>  {
> -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> +	if (percpu_pagelist_fraction) {
> +		pageset_get_values_by_high(
> +			(zone->managed_pages / percpu_pagelist_fraction),
> +			high, batch);
> +	} else
> +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
>  }
>  
>  static void pageset_init(struct per_cpu_pageset *p)
> @@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
>  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
>  }
>  
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> +/* Use this only in boot time, because it doesn't do any synchronization */
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)

If we can use it with only boot_pages in boot time, let's make it more clear.

static void boot_setup_pageset(void)
{
	boot_pageset;
	XXX;
}


}
>  {
> -	pageset_init(p);
> -	pageset_set_batch(p, batch);
> -}
> -
> -/*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> - */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> -{
> -	unsigned long batch = max(1UL, high / 4);
> -	if ((high / 4) > (PAGE_SHIFT * 8))
> -		batch = PAGE_SHIFT * 8;
> -
> -	pageset_update(&p->pcp, high, batch);
> -}
> -
> -static void pageset_set_high_and_batch(struct zone *zone,
> -				       struct per_cpu_pageset *pcp)
> -{
> -	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> -			(zone->managed_pages /
> -				percpu_pagelist_fraction));
> -	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> -}
> +	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
>  
> -static void __meminit zone_pageset_init(struct zone *zone, int cpu)
> -{
> -	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
> +	pageset_get_values_by_batch(0, &high, &batch);
>  
> -	pageset_init(pcp);
> -	pageset_set_high_and_batch(zone, pcp);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(pcp, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  static void __meminit setup_zone_pageset(struct zone *zone)
>  {
>  	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
> +
> +	pageset_get_values(zone, &high, &batch);
> +
>  	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> -	for_each_possible_cpu(cpu)
> -		zone_pageset_init(zone, cpu);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(zone->pageset, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  /*
> @@ -5925,11 +5947,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
>  		goto out;
>  
>  	for_each_populated_zone(zone) {
> -		unsigned int cpu;
> +		int high, batch;
>  
> -		for_each_possible_cpu(cpu)
> -			pageset_set_high_and_batch(zone,
> -					per_cpu_ptr(zone->pageset, cpu));
> +		pageset_get_values(zone, &high, &batch);
> +		pageset_update(zone, high, batch);
>  	}
>  out:
>  	mutex_unlock(&pcp_batch_high_lock);
> @@ -6452,11 +6473,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>   */
>  void __meminit zone_pcp_update(struct zone *zone)
>  {
> -	unsigned cpu;
> +	int high, batch;
> +
>  	mutex_lock(&pcp_batch_high_lock);
> -	for_each_possible_cpu(cpu)
> -		pageset_set_high_and_batch(zone,
> -				per_cpu_ptr(zone->pageset, cpu));
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
>  #endif
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-12  1:24     ` Minchan Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  1:24 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, tglx, cody,
	linux-kernel

Hey Joonsoo,

On Wed, Aug 06, 2014 at 04:18:28PM +0900, Joonsoo Kim wrote:
> per cpu pages structure, aka pcp, has high and batch values to control
> how many pages we perform caching. This values could be updated
> asynchronously and updater should ensure that this doesn't make any
> problem. For this purpose, pageset_update() is implemented and do some
> memory synchronization. But, it turns out to be wrong when I implemented
> new feature using this. There is no corresponding smp_rmb() in read-side
> so that it can't guarantee anything. Without correct updating, system
> could hang in free_pcppages_bulk() due to larger batch value than high.
> To properly update this values, we need to synchronization primitives on
> read-side, but, it hurts allocator's fastpath.
> 
> There is another choice for synchronization, that is, sending IPI. This
> is somewhat expensive, but, this is really rare case so I guess it has
> no problem here. However, reducing IPI is very helpful here. Current
> logic handles each CPU's pcp update one by one. To reduce sending IPI,
> we need to re-ogranize the code to handle all CPU's pcp update at one go.
> This patch implement these requirements.

Let's add right reviewer for the patch.
Cced Cody and Thomas.

> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b99643d4..44672dc 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
>   * not check if the processor is online before following the pageset pointer.
>   * Other parts of the kernel may not check if the zone is available.
>   */
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
>  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
>  static void setup_zone_pageset(struct zone *zone);
>  
> @@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
>  	 * needs the percpu allocator in order to allocate its pagesets
>  	 * (a chicken-egg dilemma).
>  	 */
> -	for_each_possible_cpu(cpu) {
> -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> +	setup_pageset(&boot_pageset);
>  
> +	for_each_possible_cpu(cpu) {
>  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>  		/*
>  		 * We now know the "local memory node" for each node--
> @@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
>   * outside of boot time (or some other assurance that no concurrent updaters
>   * exist).
>   */
> -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> -		unsigned long batch)
> +static void pageset_update(struct zone *zone, int high, int batch)
>  {
> -       /* start with a fail safe value for batch */
> -	pcp->batch = 1;
> -	smp_wmb();
> +	int cpu;
> +	struct per_cpu_pages *pcp;
> +
> +	/* start with a fail safe value for batch */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = 1;
> +	}
> +	kick_all_cpus_sync();
> +
> +	/* Update high, then batch, in order */
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->high = high;
> +	}
> +	kick_all_cpus_sync();
>  
> -       /* Update high, then batch, in order */
> -	pcp->high = high;
> -	smp_wmb();
> +	for_each_possible_cpu(cpu) {
> +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> +		pcp->batch = batch;
> +	}
> +}
> +
> +/*
> + * pageset_get_values_by_high() gets the high water mark for
> + * hot per_cpu_pagelist to the value high for the pageset p.
> + */
> +static void pageset_get_values_by_high(int input_high,
> +				int *output_high, int *output_batch)

You don't use output_high so we could make it as follows,

int pageset_batch(int high);

> +{
> +	*output_batch = max(1, input_high / 4);
> +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> +		*output_batch = PAGE_SHIFT * 8;
> +}
>  
> -	pcp->batch = batch;
> +/* a companion to pageset_get_values_by_high() */
> +static void pageset_get_values_by_batch(int input_batch,
> +				int *output_high, int *output_batch)
> +{
> +	*output_high = 6 * input_batch;
> +	*output_batch = max(1, 1 * input_batch);
>  }
>  
> -/* a companion to pageset_set_high() */
> -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> +static void pageset_get_values(struct zone *zone, int *high, int *batch)
>  {
> -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> +	if (percpu_pagelist_fraction) {
> +		pageset_get_values_by_high(
> +			(zone->managed_pages / percpu_pagelist_fraction),
> +			high, batch);
> +	} else
> +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
>  }
>  
>  static void pageset_init(struct per_cpu_pageset *p)
> @@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
>  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
>  }
>  
> -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> +/* Use this only in boot time, because it doesn't do any synchronization */
> +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)

If we can use it with only boot_pages in boot time, let's make it more clear.

static void boot_setup_pageset(void)
{
	boot_pageset;
	XXX;
}


}
>  {
> -	pageset_init(p);
> -	pageset_set_batch(p, batch);
> -}
> -
> -/*
> - * pageset_set_high() sets the high water mark for hot per_cpu_pagelist
> - * to the value high for the pageset p.
> - */
> -static void pageset_set_high(struct per_cpu_pageset *p,
> -				unsigned long high)
> -{
> -	unsigned long batch = max(1UL, high / 4);
> -	if ((high / 4) > (PAGE_SHIFT * 8))
> -		batch = PAGE_SHIFT * 8;
> -
> -	pageset_update(&p->pcp, high, batch);
> -}
> -
> -static void pageset_set_high_and_batch(struct zone *zone,
> -				       struct per_cpu_pageset *pcp)
> -{
> -	if (percpu_pagelist_fraction)
> -		pageset_set_high(pcp,
> -			(zone->managed_pages /
> -				percpu_pagelist_fraction));
> -	else
> -		pageset_set_batch(pcp, zone_batchsize(zone));
> -}
> +	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
>  
> -static void __meminit zone_pageset_init(struct zone *zone, int cpu)
> -{
> -	struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
> +	pageset_get_values_by_batch(0, &high, &batch);
>  
> -	pageset_init(pcp);
> -	pageset_set_high_and_batch(zone, pcp);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(pcp, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  static void __meminit setup_zone_pageset(struct zone *zone)
>  {
>  	int cpu;
> +	int high, batch;
> +	struct per_cpu_pageset *p;
> +
> +	pageset_get_values(zone, &high, &batch);
> +
>  	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> -	for_each_possible_cpu(cpu)
> -		zone_pageset_init(zone, cpu);
> +	for_each_possible_cpu(cpu) {
> +		p = per_cpu_ptr(zone->pageset, cpu);
> +		pageset_init(p);
> +		p->pcp.high = high;
> +		p->pcp.batch = batch;
> +	}
>  }
>  
>  /*
> @@ -5925,11 +5947,10 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *table, int write,
>  		goto out;
>  
>  	for_each_populated_zone(zone) {
> -		unsigned int cpu;
> +		int high, batch;
>  
> -		for_each_possible_cpu(cpu)
> -			pageset_set_high_and_batch(zone,
> -					per_cpu_ptr(zone->pageset, cpu));
> +		pageset_get_values(zone, &high, &batch);
> +		pageset_update(zone, high, batch);
>  	}
>  out:
>  	mutex_unlock(&pcp_batch_high_lock);
> @@ -6452,11 +6473,11 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>   */
>  void __meminit zone_pcp_update(struct zone *zone)
>  {
> -	unsigned cpu;
> +	int high, batch;
> +
>  	mutex_lock(&pcp_batch_high_lock);
> -	for_each_possible_cpu(cpu)
> -		pageset_set_high_and_batch(zone,
> -				per_cpu_ptr(zone->pageset, cpu));
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
>  #endif
> -- 
> 1.7.9.5
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-12  1:45     ` Minchan Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  1:45 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:30PM +0900, Joonsoo Kim wrote:
> In __free_one_page(), we check the buddy page if it is guard page.
> And, if so, we should clear guard attribute on the buddy page. But,
> currently, we clear original page's order rather than buddy one's.
> This doesn't have any problem, because resetting buddy's order
> is useless and the original page's order is re-assigned soon.
> But, it is better to correct code.
> 
> Additionally, I change (set/clear)_page_guard_flag() to
> (set/clear)_page_guard() and makes these functions do all works
> needed for guard page. This may make code more understandable.
> 
> One more thing, I did in this patch, is that fixing freepage accounting.
> If we clear guard page and link it onto isolate buddy list, we should
> not increase freepage count.

You are saying just "shouldn't do that" but don't say "why" and "result"
I know the reason but as you know, I'm one of the person who is rather
familiar with this part but I guess others should spend some time to get.
Kind detail description is never to look down on person. :)

> 

Nice catch, Joonsoo! But what make me worry is is this patch makes 3 thing
all at once.

1. fix - no candidate for stable
2. clean up
3. fix - candidate for stable.

Could you separate 3 and (1,2) in next spin?


> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 44672dc..3e1e344 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>  }
>  __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
>  
> -static inline void set_page_guard_flag(struct page *page)
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, order);
> +	/* Guard pages are not available for any usage */
> +	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>  }
>  
> -static inline void clear_page_guard_flag(struct page *page)
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, 0);
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>  }
>  #else
> -static inline void set_page_guard_flag(struct page *page) { }
> -static inline void clear_page_guard_flag(struct page *page) { }
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
>  #endif
>  
>  static inline void set_page_order(struct page *page, unsigned int order)
> @@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
>  		 * merge with it and move up one order.
>  		 */
>  		if (page_is_guard(buddy)) {
> -			clear_page_guard_flag(buddy);
> -			set_page_private(page, 0);
> -			__mod_zone_freepage_state(zone, 1 << order,
> -						  migratetype);
> +			clear_page_guard(zone, buddy, order, migratetype);
>  		} else {
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
> @@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
>  			 * pages will stay not present in virtual address space
>  			 */
>  			INIT_LIST_HEAD(&page[size].lru);
> -			set_page_guard_flag(&page[size]);
> -			set_page_private(&page[size], high);
> -			/* Guard pages are not available for any usage */
> -			__mod_zone_freepage_state(zone, -(1 << high),
> -						  migratetype);
> +			set_page_guard(zone, &page[size], high, migratetype);
>  			continue;
>  		}
>  #endif
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
@ 2014-08-12  1:45     ` Minchan Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  1:45 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:30PM +0900, Joonsoo Kim wrote:
> In __free_one_page(), we check the buddy page if it is guard page.
> And, if so, we should clear guard attribute on the buddy page. But,
> currently, we clear original page's order rather than buddy one's.
> This doesn't have any problem, because resetting buddy's order
> is useless and the original page's order is re-assigned soon.
> But, it is better to correct code.
> 
> Additionally, I change (set/clear)_page_guard_flag() to
> (set/clear)_page_guard() and makes these functions do all works
> needed for guard page. This may make code more understandable.
> 
> One more thing, I did in this patch, is that fixing freepage accounting.
> If we clear guard page and link it onto isolate buddy list, we should
> not increase freepage count.

You are saying just "shouldn't do that" but don't say "why" and "result"
I know the reason but as you know, I'm one of the person who is rather
familiar with this part but I guess others should spend some time to get.
Kind detail description is never to look down on person. :)

> 

Nice catch, Joonsoo! But what make me worry is is this patch makes 3 thing
all at once.

1. fix - no candidate for stable
2. clean up
3. fix - candidate for stable.

Could you separate 3 and (1,2) in next spin?


> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/page_alloc.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 44672dc..3e1e344 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -441,18 +441,28 @@ static int __init debug_guardpage_minorder_setup(char *buf)
>  }
>  __setup("debug_guardpage_minorder=", debug_guardpage_minorder_setup);
>  
> -static inline void set_page_guard_flag(struct page *page)
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__set_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, order);
> +	/* Guard pages are not available for any usage */
> +	__mod_zone_freepage_state(zone, -(1 << order), migratetype);
>  }
>  
> -static inline void clear_page_guard_flag(struct page *page)
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype)
>  {
>  	__clear_bit(PAGE_DEBUG_FLAG_GUARD, &page->debug_flags);
> +	set_page_private(page, 0);
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, (1 << order), migratetype);
>  }
>  #else
> -static inline void set_page_guard_flag(struct page *page) { }
> -static inline void clear_page_guard_flag(struct page *page) { }
> +static inline void set_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
> +static inline void clear_page_guard(struct zone *zone, struct page *page,
> +				unsigned int order, int migratetype) {}
>  #endif
>  
>  static inline void set_page_order(struct page *page, unsigned int order)
> @@ -594,10 +604,7 @@ static inline void __free_one_page(struct page *page,
>  		 * merge with it and move up one order.
>  		 */
>  		if (page_is_guard(buddy)) {
> -			clear_page_guard_flag(buddy);
> -			set_page_private(page, 0);
> -			__mod_zone_freepage_state(zone, 1 << order,
> -						  migratetype);
> +			clear_page_guard(zone, buddy, order, migratetype);
>  		} else {
>  			list_del(&buddy->lru);
>  			zone->free_area[order].nr_free--;
> @@ -876,11 +883,7 @@ static inline void expand(struct zone *zone, struct page *page,
>  			 * pages will stay not present in virtual address space
>  			 */
>  			INIT_LIST_HEAD(&page[size].lru);
> -			set_page_guard_flag(&page[size]);
> -			set_page_private(&page[size], high);
> -			/* Guard pages are not available for any usage */
> -			__mod_zone_freepage_state(zone, -(1 << high),
> -						  migratetype);
> +			set_page_guard(zone, &page[size], high, migratetype);
>  			continue;
>  		}
>  #endif
> -- 
> 1.7.9.5
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-12  5:17     ` Minchan Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  5:17 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> We got migratetype of the freeing page without holding the zone lock so
> it could be racy. There are two cases of this race.
> 
> 1. pages are added to isolate buddy list after restoring original
> migratetype.
> 2. pages are added to normal buddy list while pageblock is isolated.
> 
> If case 1 happens, we can't allocate freepages on isolate buddy list
> until next pageblock isolation occurs.
> In case of 2, pages could be merged with pages on isolate buddy list and
> located on normal buddy list. This makes freepage counting incorrect
> and break the property of pageblock isolation.
> 
> One solution to this problem is checking pageblock migratetype with
> holding zone lock in __free_one_page() and I posted it before, but,
> it didn't get welcome since it needs the hook in zone lock critical
> section on freepath.

I didn't review your v1 but IMHO, this patchset is rather complex.
Normally, we don't like adding more overhead in fast path but we did
several time on hotplug/cma, esp so I don't know a few more thing is
really hesitant. In addition, you proved by this patchset how this
isolation code looks ugly and fragile for race problem so I vote
adding more overhead in fast path if it can make code really simple.

Vlastimil?

To Joonsoo,

you want to send this patchset for stable since review is done?
IIRC, you want to fix freepage couting bug and send it to stable but
as I see this patchset, no make sense to send to stable. :(

> 
> This is another solution to this problem and impose most overhead on
> pageblock isolation logic. Following is how this solution works.
> 
> 1. Extends irq disabled period on freepath to call
> get_pfnblock_migratetype() with irq disabled. With this, we can be
> sure that future freed pages will see modified pageblock migratetype
> after certain synchronization point so we don't need to hold the zone
> lock to get correct pageblock migratetype. Although it extends irq
> disabled period on freepath, I guess it is marginal and better than
> adding the hook in zone lock critical section.

Agreed.

> 
> 2. #1 requires IPI for synchronization and we can't hold the zone lock
> during processing IPI. In this time, some pages could be moved from buddy
> list to pcp list on page allocation path and later it could be moved again
> from pcp list to buddy list. In this time, this page would be on isolate
> pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> misplacement. To remove this possibility, disabling and draining pcp
> list is needed during isolation. It guaratees that there is no page on pcp
> list on all cpus while isolation, so misplacement problem can't happen.
> 
> Note that this doesn't fix freepage counting problem. To fix it,
> we need more logic. Following patches will do it.

I hope to revise description in next spin. It's very hard to parse for
stupid me.

> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/internal.h       |    2 ++
>  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
>  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
>  3 files changed, 55 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index a1b651b..81b8884 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void zone_pcp_disable(struct zone *zone);
> +extern void zone_pcp_enable(struct zone *zone);

Nit: Some of pcp functions has prefix zone but others don't.
Which is better? If function has param zone as first argument,
I think it's clear unless the function don't have prefix zone.

>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e1e344..4517b1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
>  			trace_mm_page_pcpu_drain(page, 0, mt);
> -			if (likely(!is_migrate_isolate_page(page))) {
> -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> -				if (is_migrate_cma(mt))
> -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> -			}
> +			__mod_zone_freepage_state(zone, 1, mt);
>  		} while (--to_free && --batch_free && !list_empty(list));
>  	}
>  	spin_unlock(&zone->lock);
> @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  	if (!free_pages_prepare(page, order))
>  		return;
>  
> -	migratetype = get_pfnblock_migratetype(page, pfn);
>  	local_irq_save(flags);
> +	migratetype = get_pfnblock_migratetype(page, pfn);

Could you add some comment about page-isolated locking rule in somewhere?
I think it's valuable to add it in code rather than description.

In addition, as your description, get_pfnblock_migratetype should be
protected by irq_disabled. Then, it would be better to add a comment or
VM_BUG_ON check with irq_disabled in get_pfnblock_migratetype but I think
get_pfnblock_migratetype might be called for other purpose in future.
In that case, it's not necessary to disable irq so we could introduce
"get_freeing_page_migratetype" with irq disabled check and use it.

Question. soft_offline_page doesn't have any lock
for get_pageblock_migratetype. Is it okay?

>  	__count_vm_events(PGFREE, 1 << order);
>  	set_freepage_migratetype(page, migratetype);
>  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
>  	if (!free_pages_prepare(page, 0))
>  		return;
>  
> +	local_irq_save(flags);
>  	migratetype = get_pfnblock_migratetype(page, pfn);
>  	set_freepage_migratetype(page, migratetype);
> -	local_irq_save(flags);
>  	__count_vm_event(PGFREE);
>  
>  	/*
> @@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>  }
>  #endif
>  
> +#ifdef CONFIG_MEMORY_ISOLATION
> +void zone_pcp_disable(struct zone *zone)
> +{
> +	mutex_lock(&pcp_batch_high_lock);
> +	pageset_update(zone, 1, 1);
> +}
> +
> +void zone_pcp_enable(struct zone *zone)
> +{
> +	int high, batch;
> +
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
> +	mutex_unlock(&pcp_batch_high_lock);
> +}
> +#endif

Nit:
It is used for only page_isolation.c so how about moving to page_isolation.c?

> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  /*
>   * The zone indicated has a new number of managed_pages; batch sizes and percpu
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 3100f98..439158d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  	struct memory_isolate_notify arg;
>  	int notifier_ret;
>  	int ret = -EBUSY;
> +	unsigned long nr_pages;
> +	int migratetype;
>  
>  	zone = page_zone(page);
> -

Unnecessary change.

>  	spin_lock_irqsave(&zone->lock, flags);
>  
>  	pfn = page_to_pfn(page);
> @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  	 */
>  
>  out:
> -	if (!ret) {
> -		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +	if (ret) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return ret;
> +	}
>  
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	migratetype = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> -	}
> +	zone_pcp_disable(zone);

You pcp disable/enable per pageblock so that overhead would be severe.
I believe your remaining patches will solve it. Anyway, let's add "
XXX: should save pcp disable/enable" and you could remove the comment
when your further patches handles it so reviewer could be happy with
fact which author already know the problem and someone could solve
the issue even though your furhter patches might reject.

> +
> +	/*
> +	 * After this point, freed pages will see MIGRATE_ISOLATE as
> +	 * their pageblock migratetype on all cpus. And pcp list has
> +	 * no free page.
> +	 */
> +	on_each_cpu(drain_local_pages, NULL, 1);
>  
> +	spin_lock_irqsave(&zone->lock, flags);
> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	__mod_zone_freepage_state(zone, -nr_pages, migratetype);
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -	if (!ret)
> -		drain_all_pages();
> -	return ret;
> +
> +	zone_pcp_enable(zone);
> +
> +	return 0;
>  }
>  
>  void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> @@ -80,9 +93,17 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  	spin_lock_irqsave(&zone->lock, flags);
>  	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>  		goto out;
> +
> +	set_pageblock_migratetype(page, migratetype);
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +
> +	/* Freed pages will see original migratetype after this point */
> +	kick_all_cpus_sync();
> +
> +	spin_lock_irqsave(&zone->lock, flags);
>  	nr_pages = move_freepages_block(zone, page, migratetype);
>  	__mod_zone_freepage_state(zone, nr_pages, migratetype);
> -	set_pageblock_migratetype(page, migratetype);
> +
>  out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-12  5:17     ` Minchan Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  5:17 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> We got migratetype of the freeing page without holding the zone lock so
> it could be racy. There are two cases of this race.
> 
> 1. pages are added to isolate buddy list after restoring original
> migratetype.
> 2. pages are added to normal buddy list while pageblock is isolated.
> 
> If case 1 happens, we can't allocate freepages on isolate buddy list
> until next pageblock isolation occurs.
> In case of 2, pages could be merged with pages on isolate buddy list and
> located on normal buddy list. This makes freepage counting incorrect
> and break the property of pageblock isolation.
> 
> One solution to this problem is checking pageblock migratetype with
> holding zone lock in __free_one_page() and I posted it before, but,
> it didn't get welcome since it needs the hook in zone lock critical
> section on freepath.

I didn't review your v1 but IMHO, this patchset is rather complex.
Normally, we don't like adding more overhead in fast path but we did
several time on hotplug/cma, esp so I don't know a few more thing is
really hesitant. In addition, you proved by this patchset how this
isolation code looks ugly and fragile for race problem so I vote
adding more overhead in fast path if it can make code really simple.

Vlastimil?

To Joonsoo,

you want to send this patchset for stable since review is done?
IIRC, you want to fix freepage couting bug and send it to stable but
as I see this patchset, no make sense to send to stable. :(

> 
> This is another solution to this problem and impose most overhead on
> pageblock isolation logic. Following is how this solution works.
> 
> 1. Extends irq disabled period on freepath to call
> get_pfnblock_migratetype() with irq disabled. With this, we can be
> sure that future freed pages will see modified pageblock migratetype
> after certain synchronization point so we don't need to hold the zone
> lock to get correct pageblock migratetype. Although it extends irq
> disabled period on freepath, I guess it is marginal and better than
> adding the hook in zone lock critical section.

Agreed.

> 
> 2. #1 requires IPI for synchronization and we can't hold the zone lock
> during processing IPI. In this time, some pages could be moved from buddy
> list to pcp list on page allocation path and later it could be moved again
> from pcp list to buddy list. In this time, this page would be on isolate
> pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> misplacement. To remove this possibility, disabling and draining pcp
> list is needed during isolation. It guaratees that there is no page on pcp
> list on all cpus while isolation, so misplacement problem can't happen.
> 
> Note that this doesn't fix freepage counting problem. To fix it,
> we need more logic. Following patches will do it.

I hope to revise description in next spin. It's very hard to parse for
stupid me.

> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  mm/internal.h       |    2 ++
>  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
>  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
>  3 files changed, 55 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index a1b651b..81b8884 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void zone_pcp_disable(struct zone *zone);
> +extern void zone_pcp_enable(struct zone *zone);

Nit: Some of pcp functions has prefix zone but others don't.
Which is better? If function has param zone as first argument,
I think it's clear unless the function don't have prefix zone.

>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3e1e344..4517b1d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
>  			trace_mm_page_pcpu_drain(page, 0, mt);
> -			if (likely(!is_migrate_isolate_page(page))) {
> -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> -				if (is_migrate_cma(mt))
> -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> -			}
> +			__mod_zone_freepage_state(zone, 1, mt);
>  		} while (--to_free && --batch_free && !list_empty(list));
>  	}
>  	spin_unlock(&zone->lock);
> @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  	if (!free_pages_prepare(page, order))
>  		return;
>  
> -	migratetype = get_pfnblock_migratetype(page, pfn);
>  	local_irq_save(flags);
> +	migratetype = get_pfnblock_migratetype(page, pfn);

Could you add some comment about page-isolated locking rule in somewhere?
I think it's valuable to add it in code rather than description.

In addition, as your description, get_pfnblock_migratetype should be
protected by irq_disabled. Then, it would be better to add a comment or
VM_BUG_ON check with irq_disabled in get_pfnblock_migratetype but I think
get_pfnblock_migratetype might be called for other purpose in future.
In that case, it's not necessary to disable irq so we could introduce
"get_freeing_page_migratetype" with irq disabled check and use it.

Question. soft_offline_page doesn't have any lock
for get_pageblock_migratetype. Is it okay?

>  	__count_vm_events(PGFREE, 1 << order);
>  	set_freepage_migratetype(page, migratetype);
>  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
>  	if (!free_pages_prepare(page, 0))
>  		return;
>  
> +	local_irq_save(flags);
>  	migratetype = get_pfnblock_migratetype(page, pfn);
>  	set_freepage_migratetype(page, migratetype);
> -	local_irq_save(flags);
>  	__count_vm_event(PGFREE);
>  
>  	/*
> @@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
>  }
>  #endif
>  
> +#ifdef CONFIG_MEMORY_ISOLATION
> +void zone_pcp_disable(struct zone *zone)
> +{
> +	mutex_lock(&pcp_batch_high_lock);
> +	pageset_update(zone, 1, 1);
> +}
> +
> +void zone_pcp_enable(struct zone *zone)
> +{
> +	int high, batch;
> +
> +	pageset_get_values(zone, &high, &batch);
> +	pageset_update(zone, high, batch);
> +	mutex_unlock(&pcp_batch_high_lock);
> +}
> +#endif

Nit:
It is used for only page_isolation.c so how about moving to page_isolation.c?

> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  /*
>   * The zone indicated has a new number of managed_pages; batch sizes and percpu
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 3100f98..439158d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  	struct memory_isolate_notify arg;
>  	int notifier_ret;
>  	int ret = -EBUSY;
> +	unsigned long nr_pages;
> +	int migratetype;
>  
>  	zone = page_zone(page);
> -

Unnecessary change.

>  	spin_lock_irqsave(&zone->lock, flags);
>  
>  	pfn = page_to_pfn(page);
> @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  	 */
>  
>  out:
> -	if (!ret) {
> -		unsigned long nr_pages;
> -		int migratetype = get_pageblock_migratetype(page);
> +	if (ret) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return ret;
> +	}
>  
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	migratetype = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	spin_unlock_irqrestore(&zone->lock, flags);
>  
> -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> -	}
> +	zone_pcp_disable(zone);

You pcp disable/enable per pageblock so that overhead would be severe.
I believe your remaining patches will solve it. Anyway, let's add "
XXX: should save pcp disable/enable" and you could remove the comment
when your further patches handles it so reviewer could be happy with
fact which author already know the problem and someone could solve
the issue even though your furhter patches might reject.

> +
> +	/*
> +	 * After this point, freed pages will see MIGRATE_ISOLATE as
> +	 * their pageblock migratetype on all cpus. And pcp list has
> +	 * no free page.
> +	 */
> +	on_each_cpu(drain_local_pages, NULL, 1);
>  
> +	spin_lock_irqsave(&zone->lock, flags);
> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	__mod_zone_freepage_state(zone, -nr_pages, migratetype);
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -	if (!ret)
> -		drain_all_pages();
> -	return ret;
> +
> +	zone_pcp_enable(zone);
> +
> +	return 0;
>  }
>  
>  void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> @@ -80,9 +93,17 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  	spin_lock_irqsave(&zone->lock, flags);
>  	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>  		goto out;
> +
> +	set_pageblock_migratetype(page, migratetype);
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +
> +	/* Freed pages will see original migratetype after this point */
> +	kick_all_cpus_sync();
> +
> +	spin_lock_irqsave(&zone->lock, flags);
>  	nr_pages = move_freepages_block(zone, page, migratetype);
>  	__mod_zone_freepage_state(zone, nr_pages, migratetype);
> -	set_pageblock_migratetype(page, migratetype);
> +
>  out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
> -- 
> 1.7.9.5
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-06  7:18   ` Joonsoo Kim
@ 2014-08-12  6:43     ` Minchan Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  6:43 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:34PM +0900, Joonsoo Kim wrote:
> Current pageblock isolation logic has a problem that results in incorrect
> freepage counting. move_freepages_block() doesn't return number of
> moved pages so freepage count could be wrong if some pages are freed
> inbetween set_pageblock_migratetype() and move_freepages_block(). Although

It's a problem introduced by your patch pcp_disable/enable which release the
zone->lock so it would be better to mention it because I got confused
the problem was there. :(

In addition, could you include the situation in description?
It seems you are saying some of pages which was freed could be located in
isolated list already so move_freepages_block moves from isolated list to
isolated list so double accounting happens and it's a BUG. Right?

> we fix move_freepages_block() to return number of moved pages, the problem
> wouldn't be fixed completely because buddy allocator doesn't care if merged
> pages are on different buddy list or not. If some page on normal buddy list
> is merged with isolated page and moved to isolate buddy list, freepage
> count should be subtracted, but, it didn't and can't now.
> 
> To fix this case, freed page should not be added to buddy list
> inbetween set_pageblock_migratetype() and move_freepages_block().
> In this patch, I introduce hook, deactivate_isolate_page() on
> free_one_page() for freeing page on isolate pageblock. This page will
> be marked as PageIsolated() and handled specially in pageblock
> isolation logic.
> 
> Overall design of changed pageblock isolation logic is as following.
> 
> 1. ISOLATION
> - check pageblock is suitable for pageblock isolation.
> - change migratetype of pageblock to MIGRATE_ISOLATE.
> - disable pcp list.
> - drain pcp list.
> - pcp couldn't have any freepage at this point.
> - synchronize all cpus to see correct migratetype.
> - freed pages on this pageblock will be handled specially and
> not added to buddy list from here. With this way, there is no
> possibility of merging pages on different buddy list.

Pz, write down hwo to handle it specially. For instance, mark
the page with new flag and keep it without returning to the
buddy list.

> - move freepages on normal buddy list to isolate buddy list.
> There is no page on isolate buddy list so move_freepages_block()
> returns number of moved freepages correctly.
> - enable pcp list.
> 
> 2. TEST-ISOLATION
> - activates freepages marked as PageIsolated() and add to isolate

I was curious what "activate" means and realized with code inspection.
How about using "- Checking PageIsolated flag of the page and finally
move it into buddy list which should be MIGRATE_ISOLATE migratetype.

> buddy list.
> - test if pageblock is properly isolated.
> 
> 3. UNDO-ISOLATION
> - move freepages from isolate buddy list to normal buddy list.
> There is no page on normal buddy list so move_freepages_block()
> return number of moved freepages correctly.
> - change migratetype of pageblock to normal migratetype
> - synchronize all cpus.
> - activate isolated freepages and add to normal buddy list.
> 
> With this patch, most of freepage counting bugs are solved and
> exceptional handling for freepage count is done in pageblock isolation
> logic rather than allocator.
> 
> Remain problem is for page with pageblock_order. Following patch
> will fix it, too.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    3 ++
>  mm/page_alloc.c                |   28 ++++++-----
>  mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
>  4 files changed, 118 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..3dd39fe 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
>  }
>  #endif
>  
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order);

I don't know what is better name. How about "hijack_isolated_page"?

>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  			 bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
> diff --git a/mm/internal.h b/mm/internal.h
> index 81b8884..c70750a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>   */
>  extern void zone_pcp_disable(struct zone *zone);
>  extern void zone_pcp_enable(struct zone *zone);
> +extern void __free_one_page(struct page *page, unsigned long pfn,
> +		struct zone *zone, unsigned int order,
> +		int migratetype);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4517b1d..82da4a8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>   * -- nyc
>   */
>  
> -static inline void __free_one_page(struct page *page,
> +void __free_one_page(struct page *page,

no inline any more. :(

Personally, it is becoming increasingly clear that it would be better
to add some hooks for isolateed pages to be sure to fix theses problems
without adding more complicated logic.

>  		unsigned long pfn,
>  		struct zone *zone, unsigned int order,
>  		int migratetype)
> @@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
>  				int migratetype)
>  {
>  	unsigned long nr_scanned;
> +
> +	if (unlikely(is_migrate_isolate(migratetype))) {
> +		deactivate_isolated_page(zone, page, order);
> +		return;
> +	}
> +
>  	spin_lock(&zone->lock);
>  	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
>  	if (nr_scanned)
>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
>  
>  	__free_one_page(page, pfn, zone, order, migratetype);
> -	if (unlikely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> +	__mod_zone_freepage_state(zone, 1 << order, migratetype);
>  	spin_unlock(&zone->lock);
>  }
>  
> @@ -6413,6 +6418,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	lru_add_drain_all();
>  	drain_all_pages();
>  
> +	/* Make sure the range is really isolated. */
> +	if (test_pages_isolated(start, end, false)) {
> +		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
> +		       start, end);
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +

It would be better to mention why you moved the logic in description
and please write down a description on test_pages_isolated.
"It moves captured isolated page in freeing path to buddy"

>  	order = 0;
>  	outer_start = start;
>  	while (!PageBuddy(pfn_to_page(outer_start))) {
> @@ -6423,15 +6436,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  		outer_start &= ~0UL << order;
>  	}
>  
> -	/* Make sure the range is really isolated. */
> -	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
> -		       outer_start, end);
> -		ret = -EBUSY;
> -		goto done;
> -	}
> -
> -
>  	/* Grab isolated pages from freelists. */
>  	outer_end = isolate_freepages_range(&cc, outer_start, end);
>  	if (!outer_end) {
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 439158d..898361f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -9,6 +9,75 @@
>  #include <linux/hugetlb.h>
>  #include "internal.h"
>  
> +#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> +
> +static inline int PageIsolated(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}
> +
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order)
> +{
> +	spin_lock(&zone->lock);
> +
> +	set_page_private(page, order);
> +	__SetPageIsolated(page);
> +
> +	spin_unlock(&zone->lock);
> +}
> +
> +static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,

IMO, activate is not a good name.
How about "drain_hijacked_isolate_pages"?

> +				unsigned long end_pfn, int migratetype)
> +{
> +	unsigned long flags;
> +	struct page *page;
> +	unsigned long pfn = start_pfn;
> +	unsigned int order;
> +	unsigned long nr_pages = 0;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +
> +	while (pfn < end_pfn) {
> +		if (!pfn_valid_within(pfn)) {
> +			pfn++;
> +			continue;
> +		}
> +
> +		page = pfn_to_page(pfn);
> +		if (PageBuddy(page)) {
> +			pfn += 1 << page_order(page);
> +		} else if (PageIsolated(page)) {
> +			__ClearPageIsolated(page);
> +			set_freepage_migratetype(page, migratetype);
> +			order = page_order(page);
> +			__free_one_page(page, pfn, zone, order, migratetype);
> +
> +			pfn += 1 << order;
> +			nr_pages += 1 << order;
> +		} else {
> +			pfn++;
> +		}
> +	}
> +
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, nr_pages, migratetype);
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
>  int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -88,24 +157,26 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  {
>  	struct zone *zone;
>  	unsigned long flags, nr_pages;
> +	unsigned long start_pfn, end_pfn;
>  
>  	zone = page_zone(page);
>  	spin_lock_irqsave(&zone->lock, flags);
> -	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> -		goto out;
> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return;
> +	}
>  
> +	nr_pages = move_freepages_block(zone, page, migratetype);
> +	__mod_zone_freepage_state(zone, nr_pages, migratetype);
>  	set_pageblock_migratetype(page, migratetype);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	/* Freed pages will see original migratetype after this point */
>  	kick_all_cpus_sync();
>  
> -	spin_lock_irqsave(&zone->lock, flags);
> -	nr_pages = move_freepages_block(zone, page, migratetype);
> -	__mod_zone_freepage_state(zone, nr_pages, migratetype);
> -
> -out:
> -	spin_unlock_irqrestore(&zone->lock, flags);
> +	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages;
> +	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
>  }
>  
>  static inline struct page *
> @@ -242,6 +313,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	struct page *page;
>  	struct zone *zone;
>  	int ret;
> +	int order;
> +	unsigned long outer_start;
>  
>  	/*
>  	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
> @@ -256,10 +329,24 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
>  	if ((pfn < end_pfn) || !page)
>  		return -EBUSY;
> -	/* Check all pages are free or marked as ISOLATED */
> +
>  	zone = page_zone(page);
> +	activate_isolated_pages(zone, start_pfn, end_pfn, MIGRATE_ISOLATE);
> +
> +	/* Check all pages are free or marked as ISOLATED */
>  	spin_lock_irqsave(&zone->lock, flags);
> -	ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> +	order = 0;
> +	outer_start = start_pfn;
> +	while (!PageBuddy(pfn_to_page(outer_start))) {
> +		if (++order >= MAX_ORDER) {
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			return -EBUSY;
> +		}
> +
> +		outer_start &= ~0UL << order;
> +	}
> +
> +	ret = __test_page_isolated_in_pageblock(outer_start, end_pfn,
>  						skip_hwpoisoned_pages);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret ? 0 : -EBUSY;
> -- 
> 1.7.9.5
> 

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-12  6:43     ` Minchan Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Minchan Kim @ 2014-08-12  6:43 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Wed, Aug 06, 2014 at 04:18:34PM +0900, Joonsoo Kim wrote:
> Current pageblock isolation logic has a problem that results in incorrect
> freepage counting. move_freepages_block() doesn't return number of
> moved pages so freepage count could be wrong if some pages are freed
> inbetween set_pageblock_migratetype() and move_freepages_block(). Although

It's a problem introduced by your patch pcp_disable/enable which release the
zone->lock so it would be better to mention it because I got confused
the problem was there. :(

In addition, could you include the situation in description?
It seems you are saying some of pages which was freed could be located in
isolated list already so move_freepages_block moves from isolated list to
isolated list so double accounting happens and it's a BUG. Right?

> we fix move_freepages_block() to return number of moved pages, the problem
> wouldn't be fixed completely because buddy allocator doesn't care if merged
> pages are on different buddy list or not. If some page on normal buddy list
> is merged with isolated page and moved to isolate buddy list, freepage
> count should be subtracted, but, it didn't and can't now.
> 
> To fix this case, freed page should not be added to buddy list
> inbetween set_pageblock_migratetype() and move_freepages_block().
> In this patch, I introduce hook, deactivate_isolate_page() on
> free_one_page() for freeing page on isolate pageblock. This page will
> be marked as PageIsolated() and handled specially in pageblock
> isolation logic.
> 
> Overall design of changed pageblock isolation logic is as following.
> 
> 1. ISOLATION
> - check pageblock is suitable for pageblock isolation.
> - change migratetype of pageblock to MIGRATE_ISOLATE.
> - disable pcp list.
> - drain pcp list.
> - pcp couldn't have any freepage at this point.
> - synchronize all cpus to see correct migratetype.
> - freed pages on this pageblock will be handled specially and
> not added to buddy list from here. With this way, there is no
> possibility of merging pages on different buddy list.

Pz, write down hwo to handle it specially. For instance, mark
the page with new flag and keep it without returning to the
buddy list.

> - move freepages on normal buddy list to isolate buddy list.
> There is no page on isolate buddy list so move_freepages_block()
> returns number of moved freepages correctly.
> - enable pcp list.
> 
> 2. TEST-ISOLATION
> - activates freepages marked as PageIsolated() and add to isolate

I was curious what "activate" means and realized with code inspection.
How about using "- Checking PageIsolated flag of the page and finally
move it into buddy list which should be MIGRATE_ISOLATE migratetype.

> buddy list.
> - test if pageblock is properly isolated.
> 
> 3. UNDO-ISOLATION
> - move freepages from isolate buddy list to normal buddy list.
> There is no page on normal buddy list so move_freepages_block()
> return number of moved freepages correctly.
> - change migratetype of pageblock to normal migratetype
> - synchronize all cpus.
> - activate isolated freepages and add to normal buddy list.
> 
> With this patch, most of freepage counting bugs are solved and
> exceptional handling for freepage count is done in pageblock isolation
> logic rather than allocator.
> 
> Remain problem is for page with pageblock_order. Following patch
> will fix it, too.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
>  include/linux/page-isolation.h |    2 +
>  mm/internal.h                  |    3 ++
>  mm/page_alloc.c                |   28 ++++++-----
>  mm/page_isolation.c            |  107 ++++++++++++++++++++++++++++++++++++----
>  4 files changed, 118 insertions(+), 22 deletions(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..3dd39fe 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -21,6 +21,8 @@ static inline bool is_migrate_isolate(int migratetype)
>  }
>  #endif
>  
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order);

I don't know what is better name. How about "hijack_isolated_page"?

>  bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  			 bool skip_hwpoisoned_pages);
>  void set_pageblock_migratetype(struct page *page, int migratetype);
> diff --git a/mm/internal.h b/mm/internal.h
> index 81b8884..c70750a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -110,6 +110,9 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>   */
>  extern void zone_pcp_disable(struct zone *zone);
>  extern void zone_pcp_enable(struct zone *zone);
> +extern void __free_one_page(struct page *page, unsigned long pfn,
> +		struct zone *zone, unsigned int order,
> +		int migratetype);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4517b1d..82da4a8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>   * -- nyc
>   */
>  
> -static inline void __free_one_page(struct page *page,
> +void __free_one_page(struct page *page,

no inline any more. :(

Personally, it is becoming increasingly clear that it would be better
to add some hooks for isolateed pages to be sure to fix theses problems
without adding more complicated logic.

>  		unsigned long pfn,
>  		struct zone *zone, unsigned int order,
>  		int migratetype)
> @@ -738,14 +738,19 @@ static void free_one_page(struct zone *zone,
>  				int migratetype)
>  {
>  	unsigned long nr_scanned;
> +
> +	if (unlikely(is_migrate_isolate(migratetype))) {
> +		deactivate_isolated_page(zone, page, order);
> +		return;
> +	}
> +
>  	spin_lock(&zone->lock);
>  	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
>  	if (nr_scanned)
>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
>  
>  	__free_one_page(page, pfn, zone, order, migratetype);
> -	if (unlikely(!is_migrate_isolate(migratetype)))
> -		__mod_zone_freepage_state(zone, 1 << order, migratetype);
> +	__mod_zone_freepage_state(zone, 1 << order, migratetype);
>  	spin_unlock(&zone->lock);
>  }
>  
> @@ -6413,6 +6418,14 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	lru_add_drain_all();
>  	drain_all_pages();
>  
> +	/* Make sure the range is really isolated. */
> +	if (test_pages_isolated(start, end, false)) {
> +		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
> +		       start, end);
> +		ret = -EBUSY;
> +		goto done;
> +	}
> +

It would be better to mention why you moved the logic in description
and please write down a description on test_pages_isolated.
"It moves captured isolated page in freeing path to buddy"

>  	order = 0;
>  	outer_start = start;
>  	while (!PageBuddy(pfn_to_page(outer_start))) {
> @@ -6423,15 +6436,6 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  		outer_start &= ~0UL << order;
>  	}
>  
> -	/* Make sure the range is really isolated. */
> -	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
> -		       outer_start, end);
> -		ret = -EBUSY;
> -		goto done;
> -	}
> -
> -
>  	/* Grab isolated pages from freelists. */
>  	outer_end = isolate_freepages_range(&cc, outer_start, end);
>  	if (!outer_end) {
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 439158d..898361f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -9,6 +9,75 @@
>  #include <linux/hugetlb.h>
>  #include "internal.h"
>  
> +#define ISOLATED_PAGE_MAPCOUNT_VALUE (-64)
> +
> +static inline int PageIsolated(struct page *page)
> +{
> +	return atomic_read(&page->_mapcount) == ISOLATED_PAGE_MAPCOUNT_VALUE;
> +}
> +
> +static inline void __SetPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(atomic_read(&page->_mapcount) != -1, page);
> +	atomic_set(&page->_mapcount, ISOLATED_PAGE_MAPCOUNT_VALUE);
> +}
> +
> +static inline void __ClearPageIsolated(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!PageIsolated(page), page);
> +	atomic_set(&page->_mapcount, -1);
> +}
> +
> +void deactivate_isolated_page(struct zone *zone, struct page *page,
> +				unsigned int order)
> +{
> +	spin_lock(&zone->lock);
> +
> +	set_page_private(page, order);
> +	__SetPageIsolated(page);
> +
> +	spin_unlock(&zone->lock);
> +}
> +
> +static void activate_isolated_pages(struct zone *zone, unsigned long start_pfn,

IMO, activate is not a good name.
How about "drain_hijacked_isolate_pages"?

> +				unsigned long end_pfn, int migratetype)
> +{
> +	unsigned long flags;
> +	struct page *page;
> +	unsigned long pfn = start_pfn;
> +	unsigned int order;
> +	unsigned long nr_pages = 0;
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +
> +	while (pfn < end_pfn) {
> +		if (!pfn_valid_within(pfn)) {
> +			pfn++;
> +			continue;
> +		}
> +
> +		page = pfn_to_page(pfn);
> +		if (PageBuddy(page)) {
> +			pfn += 1 << page_order(page);
> +		} else if (PageIsolated(page)) {
> +			__ClearPageIsolated(page);
> +			set_freepage_migratetype(page, migratetype);
> +			order = page_order(page);
> +			__free_one_page(page, pfn, zone, order, migratetype);
> +
> +			pfn += 1 << order;
> +			nr_pages += 1 << order;
> +		} else {
> +			pfn++;
> +		}
> +	}
> +
> +	if (!is_migrate_isolate(migratetype))
> +		__mod_zone_freepage_state(zone, nr_pages, migratetype);
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
>  int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
>  {
>  	struct zone *zone;
> @@ -88,24 +157,26 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>  {
>  	struct zone *zone;
>  	unsigned long flags, nr_pages;
> +	unsigned long start_pfn, end_pfn;
>  
>  	zone = page_zone(page);
>  	spin_lock_irqsave(&zone->lock, flags);
> -	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> -		goto out;
> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) {
> +		spin_unlock_irqrestore(&zone->lock, flags);
> +		return;
> +	}
>  
> +	nr_pages = move_freepages_block(zone, page, migratetype);
> +	__mod_zone_freepage_state(zone, nr_pages, migratetype);
>  	set_pageblock_migratetype(page, migratetype);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	/* Freed pages will see original migratetype after this point */
>  	kick_all_cpus_sync();
>  
> -	spin_lock_irqsave(&zone->lock, flags);
> -	nr_pages = move_freepages_block(zone, page, migratetype);
> -	__mod_zone_freepage_state(zone, nr_pages, migratetype);
> -
> -out:
> -	spin_unlock_irqrestore(&zone->lock, flags);
> +	start_pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages;
> +	activate_isolated_pages(zone, start_pfn, end_pfn, migratetype);
>  }
>  
>  static inline struct page *
> @@ -242,6 +313,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	struct page *page;
>  	struct zone *zone;
>  	int ret;
> +	int order;
> +	unsigned long outer_start;
>  
>  	/*
>  	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
> @@ -256,10 +329,24 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
>  	if ((pfn < end_pfn) || !page)
>  		return -EBUSY;
> -	/* Check all pages are free or marked as ISOLATED */
> +
>  	zone = page_zone(page);
> +	activate_isolated_pages(zone, start_pfn, end_pfn, MIGRATE_ISOLATE);
> +
> +	/* Check all pages are free or marked as ISOLATED */
>  	spin_lock_irqsave(&zone->lock, flags);
> -	ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
> +	order = 0;
> +	outer_start = start_pfn;
> +	while (!PageBuddy(pfn_to_page(outer_start))) {
> +		if (++order >= MAX_ORDER) {
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +			return -EBUSY;
> +		}
> +
> +		outer_start &= ~0UL << order;
> +	}
> +
> +	ret = __test_page_isolated_in_pageblock(outer_start, end_pfn,
>  						skip_hwpoisoned_pages);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  	return ret ? 0 : -EBUSY;
> -- 
> 1.7.9.5
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-12  5:17     ` Minchan Kim
@ 2014-08-12  9:45       ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-12  9:45 UTC (permalink / raw)
  To: Minchan Kim, Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/12/2014 07:17 AM, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
>>
>> One solution to this problem is checking pageblock migratetype with
>> holding zone lock in __free_one_page() and I posted it before, but,
>> it didn't get welcome since it needs the hook in zone lock critical
>> section on freepath.
>
> I didn't review your v1 but IMHO, this patchset is rather complex.

It is, but the complexity is in the isolation code, and not fast paths, 
so that's justifiable IMHO.

> Normally, we don't like adding more overhead in fast path but we did
> several time on hotplug/cma, esp so I don't know a few more thing is
> really hesitant.

This actually undoes most of the overhead, so I'm all for it. Better 
than keep doing stuff the same way just because it was done previously.

> In addition, you proved by this patchset how this
> isolation code looks ugly and fragile for race problem so I vote
> adding more overhead in fast path if it can make code really simple.

Well, I recommend you to check out the v1 then :) That wasn't really 
simple, that was even more hooks rechecking migratetypes at various 
places of the fast paths, when merging buddies etc. This is much better. 
The complexity is mostly in the isolation code, and the overhead happens 
only during isolation.

> Vlastimil?

Well, I was the main opponent of v1 and suggested to do v2 like this, so 
here you go :)

> To Joonsoo,
>
> you want to send this patchset for stable since review is done?
> IIRC, you want to fix freepage couting bug and send it to stable but
> as I see this patchset, no make sense to send to stable. :(

Yeah that's one disadvantage. But I wouldn't like the v1 for stable even 
more.



^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-12  9:45       ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-12  9:45 UTC (permalink / raw)
  To: Minchan Kim, Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/12/2014 07:17 AM, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
>>
>> One solution to this problem is checking pageblock migratetype with
>> holding zone lock in __free_one_page() and I posted it before, but,
>> it didn't get welcome since it needs the hook in zone lock critical
>> section on freepath.
>
> I didn't review your v1 but IMHO, this patchset is rather complex.

It is, but the complexity is in the isolation code, and not fast paths, 
so that's justifiable IMHO.

> Normally, we don't like adding more overhead in fast path but we did
> several time on hotplug/cma, esp so I don't know a few more thing is
> really hesitant.

This actually undoes most of the overhead, so I'm all for it. Better 
than keep doing stuff the same way just because it was done previously.

> In addition, you proved by this patchset how this
> isolation code looks ugly and fragile for race problem so I vote
> adding more overhead in fast path if it can make code really simple.

Well, I recommend you to check out the v1 then :) That wasn't really 
simple, that was even more hooks rechecking migratetypes at various 
places of the fast paths, when merging buddies etc. This is much better. 
The complexity is mostly in the isolation code, and the overhead happens 
only during isolation.

> Vlastimil?

Well, I was the main opponent of v1 and suggested to do v2 like this, so 
here you go :)

> To Joonsoo,
>
> you want to send this patchset for stable since review is done?
> IIRC, you want to fix freepage couting bug and send it to stable but
> as I see this patchset, no make sense to send to stable. :(

Yeah that's one disadvantage. But I wouldn't like the v1 for stable even 
more.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
  2014-08-12  6:43     ` Minchan Kim
@ 2014-08-12 10:58       ` Vlastimil Babka
  -1 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-12 10:58 UTC (permalink / raw)
  To: Minchan Kim, Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/12/2014 08:43 AM, Minchan Kim wrote:
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>>    * -- nyc
>>    */
>>
>> -static inline void __free_one_page(struct page *page,
>> +void __free_one_page(struct page *page,
>
> no inline any more. :(

That could be hopefully done differently without killing this property.

> Personally, it is becoming increasingly clear that it would be better
> to add some hooks for isolateed pages to be sure to fix theses problems
> without adding more complicated logic.

Might be a valid argument but please do read the v1 discussions and then 
say if you still hold the opinion. Or maybe you will get a better 
picture afterwards and see a more elegant solution :)

Vlastimil

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs
@ 2014-08-12 10:58       ` Vlastimil Babka
  0 siblings, 0 replies; 84+ messages in thread
From: Vlastimil Babka @ 2014-08-12 10:58 UTC (permalink / raw)
  To: Minchan Kim, Joonsoo Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On 08/12/2014 08:43 AM, Minchan Kim wrote:
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -571,7 +571,7 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
>>    * -- nyc
>>    */
>>
>> -static inline void __free_one_page(struct page *page,
>> +void __free_one_page(struct page *page,
>
> no inline any more. :(

That could be hopefully done differently without killing this property.

> Personally, it is becoming increasingly clear that it would be better
> to add some hooks for isolateed pages to be sure to fix theses problems
> without adding more complicated logic.

Might be a valid argument but please do read the v1 discussions and then 
say if you still hold the opinion. Or maybe you will get a better 
picture afterwards and see a more elegant solution :)

Vlastimil

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-12  9:45       ` Vlastimil Babka
@ 2014-08-13  8:09         ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Minchan Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 11:45:32AM +0200, Vlastimil Babka wrote:
> On 08/12/2014 07:17 AM, Minchan Kim wrote:
> >On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> >>
> >>One solution to this problem is checking pageblock migratetype with
> >>holding zone lock in __free_one_page() and I posted it before, but,
> >>it didn't get welcome since it needs the hook in zone lock critical
> >>section on freepath.
> >
> >I didn't review your v1 but IMHO, this patchset is rather complex.
> 
> It is, but the complexity is in the isolation code, and not fast
> paths, so that's justifiable IMHO.
> 
> >Normally, we don't like adding more overhead in fast path but we did
> >several time on hotplug/cma, esp so I don't know a few more thing is
> >really hesitant.
> 
> This actually undoes most of the overhead, so I'm all for it. Better
> than keep doing stuff the same way just because it was done
> previously.
> 
> >In addition, you proved by this patchset how this
> >isolation code looks ugly and fragile for race problem so I vote
> >adding more overhead in fast path if it can make code really simple.
> 
> Well, I recommend you to check out the v1 then :) That wasn't really
> simple, that was even more hooks rechecking migratetypes at various
> places of the fast paths, when merging buddies etc. This is much
> better. The complexity is mostly in the isolation code, and the
> overhead happens only during isolation.

Hmm... Okay.

I agree that this way is so complicated. In fact, the real save is just
one is_migrate_isolate_page() check in free_pcppages_bulk() and
this approach makes isolation process really complicated.

I guess that I could improve v1 patchset. How about waiting my
improved v1 and comparing v1' with v2?

If v1 is implemented cleanly, it may be better than this.
I want to try and compare. :)

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-13  8:09         ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Minchan Kim, Andrew Morton, Kirill A. Shutemov, Rik van Riel,
	Mel Gorman, Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 11:45:32AM +0200, Vlastimil Babka wrote:
> On 08/12/2014 07:17 AM, Minchan Kim wrote:
> >On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> >>
> >>One solution to this problem is checking pageblock migratetype with
> >>holding zone lock in __free_one_page() and I posted it before, but,
> >>it didn't get welcome since it needs the hook in zone lock critical
> >>section on freepath.
> >
> >I didn't review your v1 but IMHO, this patchset is rather complex.
> 
> It is, but the complexity is in the isolation code, and not fast
> paths, so that's justifiable IMHO.
> 
> >Normally, we don't like adding more overhead in fast path but we did
> >several time on hotplug/cma, esp so I don't know a few more thing is
> >really hesitant.
> 
> This actually undoes most of the overhead, so I'm all for it. Better
> than keep doing stuff the same way just because it was done
> previously.
> 
> >In addition, you proved by this patchset how this
> >isolation code looks ugly and fragile for race problem so I vote
> >adding more overhead in fast path if it can make code really simple.
> 
> Well, I recommend you to check out the v1 then :) That wasn't really
> simple, that was even more hooks rechecking migratetypes at various
> places of the fast paths, when merging buddies etc. This is much
> better. The complexity is mostly in the isolation code, and the
> overhead happens only during isolation.

Hmm... Okay.

I agree that this way is so complicated. In fact, the real save is just
one is_migrate_isolate_page() check in free_pcppages_bulk() and
this approach makes isolation process really complicated.

I guess that I could improve v1 patchset. How about waiting my
improved v1 and comparing v1' with v2?

If v1 is implemented cleanly, it may be better than this.
I want to try and compare. :)

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
  2014-08-12  1:24     ` Minchan Kim
@ 2014-08-13  8:13       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, tglx, cody,
	linux-kernel

On Tue, Aug 12, 2014 at 01:24:09AM +0000, Minchan Kim wrote:
> Hey Joonsoo,
> 
> On Wed, Aug 06, 2014 at 04:18:28PM +0900, Joonsoo Kim wrote:
> > per cpu pages structure, aka pcp, has high and batch values to control
> > how many pages we perform caching. This values could be updated
> > asynchronously and updater should ensure that this doesn't make any
> > problem. For this purpose, pageset_update() is implemented and do some
> > memory synchronization. But, it turns out to be wrong when I implemented
> > new feature using this. There is no corresponding smp_rmb() in read-side
> > so that it can't guarantee anything. Without correct updating, system
> > could hang in free_pcppages_bulk() due to larger batch value than high.
> > To properly update this values, we need to synchronization primitives on
> > read-side, but, it hurts allocator's fastpath.
> > 
> > There is another choice for synchronization, that is, sending IPI. This
> > is somewhat expensive, but, this is really rare case so I guess it has
> > no problem here. However, reducing IPI is very helpful here. Current
> > logic handles each CPU's pcp update one by one. To reduce sending IPI,
> > we need to re-ogranize the code to handle all CPU's pcp update at one go.
> > This patch implement these requirements.
> 
> Let's add right reviewer for the patch.
> Cced Cody and Thomas.

Okay. I will do it next time.

> 
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 80 insertions(+), 59 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b99643d4..44672dc 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
> >   * not check if the processor is online before following the pageset pointer.
> >   * Other parts of the kernel may not check if the zone is available.
> >   */
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
> >  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> >  static void setup_zone_pageset(struct zone *zone);
> >  
> > @@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
> >  	 * needs the percpu allocator in order to allocate its pagesets
> >  	 * (a chicken-egg dilemma).
> >  	 */
> > -	for_each_possible_cpu(cpu) {
> > -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> > +	setup_pageset(&boot_pageset);
> >  
> > +	for_each_possible_cpu(cpu) {
> >  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
> >  		/*
> >  		 * We now know the "local memory node" for each node--
> > @@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
> >   * outside of boot time (or some other assurance that no concurrent updaters
> >   * exist).
> >   */
> > -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> > -		unsigned long batch)
> > +static void pageset_update(struct zone *zone, int high, int batch)
> >  {
> > -       /* start with a fail safe value for batch */
> > -	pcp->batch = 1;
> > -	smp_wmb();
> > +	int cpu;
> > +	struct per_cpu_pages *pcp;
> > +
> > +	/* start with a fail safe value for batch */
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->batch = 1;
> > +	}
> > +	kick_all_cpus_sync();
> > +
> > +	/* Update high, then batch, in order */
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->high = high;
> > +	}
> > +	kick_all_cpus_sync();
> >  
> > -       /* Update high, then batch, in order */
> > -	pcp->high = high;
> > -	smp_wmb();
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->batch = batch;
> > +	}
> > +}
> > +
> > +/*
> > + * pageset_get_values_by_high() gets the high water mark for
> > + * hot per_cpu_pagelist to the value high for the pageset p.
> > + */
> > +static void pageset_get_values_by_high(int input_high,
> > +				int *output_high, int *output_batch)
> 
> You don't use output_high so we could make it as follows,

Yeah, it could be. But I want to make pageset_get_values_xxx variants
consistent. I will remove output_high and leave output_batch as is.

> 
> int pageset_batch(int high);
> 
> > +{
> > +	*output_batch = max(1, input_high / 4);
> > +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> > +		*output_batch = PAGE_SHIFT * 8;
> > +}
> >  
> > -	pcp->batch = batch;
> > +/* a companion to pageset_get_values_by_high() */
> > +static void pageset_get_values_by_batch(int input_batch,
> > +				int *output_high, int *output_batch)
> > +{
> > +	*output_high = 6 * input_batch;
> > +	*output_batch = max(1, 1 * input_batch);
> >  }
> >  
> > -/* a companion to pageset_set_high() */
> > -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> > +static void pageset_get_values(struct zone *zone, int *high, int *batch)
> >  {
> > -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> > +	if (percpu_pagelist_fraction) {
> > +		pageset_get_values_by_high(
> > +			(zone->managed_pages / percpu_pagelist_fraction),
> > +			high, batch);
> > +	} else
> > +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
> >  }
> >  
> >  static void pageset_init(struct per_cpu_pageset *p)
> > @@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
> >  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
> >  }
> >  
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> > +/* Use this only in boot time, because it doesn't do any synchronization */
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
> 
> If we can use it with only boot_pages in boot time, let's make it more clear.
> 
> static void boot_setup_pageset(void)
> {
> 	boot_pageset;
> 	XXX;
> }

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management
@ 2014-08-13  8:13       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, tglx, cody,
	linux-kernel

On Tue, Aug 12, 2014 at 01:24:09AM +0000, Minchan Kim wrote:
> Hey Joonsoo,
> 
> On Wed, Aug 06, 2014 at 04:18:28PM +0900, Joonsoo Kim wrote:
> > per cpu pages structure, aka pcp, has high and batch values to control
> > how many pages we perform caching. This values could be updated
> > asynchronously and updater should ensure that this doesn't make any
> > problem. For this purpose, pageset_update() is implemented and do some
> > memory synchronization. But, it turns out to be wrong when I implemented
> > new feature using this. There is no corresponding smp_rmb() in read-side
> > so that it can't guarantee anything. Without correct updating, system
> > could hang in free_pcppages_bulk() due to larger batch value than high.
> > To properly update this values, we need to synchronization primitives on
> > read-side, but, it hurts allocator's fastpath.
> > 
> > There is another choice for synchronization, that is, sending IPI. This
> > is somewhat expensive, but, this is really rare case so I guess it has
> > no problem here. However, reducing IPI is very helpful here. Current
> > logic handles each CPU's pcp update one by one. To reduce sending IPI,
> > we need to re-ogranize the code to handle all CPU's pcp update at one go.
> > This patch implement these requirements.
> 
> Let's add right reviewer for the patch.
> Cced Cody and Thomas.

Okay. I will do it next time.

> 
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/page_alloc.c |  139 ++++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 80 insertions(+), 59 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b99643d4..44672dc 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -3797,7 +3797,7 @@ static void build_zonelist_cache(pg_data_t *pgdat)
> >   * not check if the processor is online before following the pageset pointer.
> >   * Other parts of the kernel may not check if the zone is available.
> >   */
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp);
> >  static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
> >  static void setup_zone_pageset(struct zone *zone);
> >  
> > @@ -3843,9 +3843,9 @@ static int __build_all_zonelists(void *data)
> >  	 * needs the percpu allocator in order to allocate its pagesets
> >  	 * (a chicken-egg dilemma).
> >  	 */
> > -	for_each_possible_cpu(cpu) {
> > -		setup_pageset(&per_cpu(boot_pageset, cpu), 0);
> > +	setup_pageset(&boot_pageset);
> >  
> > +	for_each_possible_cpu(cpu) {
> >  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
> >  		/*
> >  		 * We now know the "local memory node" for each node--
> > @@ -4227,24 +4227,59 @@ static int zone_batchsize(struct zone *zone)
> >   * outside of boot time (or some other assurance that no concurrent updaters
> >   * exist).
> >   */
> > -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high,
> > -		unsigned long batch)
> > +static void pageset_update(struct zone *zone, int high, int batch)
> >  {
> > -       /* start with a fail safe value for batch */
> > -	pcp->batch = 1;
> > -	smp_wmb();
> > +	int cpu;
> > +	struct per_cpu_pages *pcp;
> > +
> > +	/* start with a fail safe value for batch */
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->batch = 1;
> > +	}
> > +	kick_all_cpus_sync();
> > +
> > +	/* Update high, then batch, in order */
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->high = high;
> > +	}
> > +	kick_all_cpus_sync();
> >  
> > -       /* Update high, then batch, in order */
> > -	pcp->high = high;
> > -	smp_wmb();
> > +	for_each_possible_cpu(cpu) {
> > +		pcp = &per_cpu_ptr(zone->pageset, cpu)->pcp;
> > +		pcp->batch = batch;
> > +	}
> > +}
> > +
> > +/*
> > + * pageset_get_values_by_high() gets the high water mark for
> > + * hot per_cpu_pagelist to the value high for the pageset p.
> > + */
> > +static void pageset_get_values_by_high(int input_high,
> > +				int *output_high, int *output_batch)
> 
> You don't use output_high so we could make it as follows,

Yeah, it could be. But I want to make pageset_get_values_xxx variants
consistent. I will remove output_high and leave output_batch as is.

> 
> int pageset_batch(int high);
> 
> > +{
> > +	*output_batch = max(1, input_high / 4);
> > +	if ((input_high / 4) > (PAGE_SHIFT * 8))
> > +		*output_batch = PAGE_SHIFT * 8;
> > +}
> >  
> > -	pcp->batch = batch;
> > +/* a companion to pageset_get_values_by_high() */
> > +static void pageset_get_values_by_batch(int input_batch,
> > +				int *output_high, int *output_batch)
> > +{
> > +	*output_high = 6 * input_batch;
> > +	*output_batch = max(1, 1 * input_batch);
> >  }
> >  
> > -/* a companion to pageset_set_high() */
> > -static void pageset_set_batch(struct per_cpu_pageset *p, unsigned long batch)
> > +static void pageset_get_values(struct zone *zone, int *high, int *batch)
> >  {
> > -	pageset_update(&p->pcp, 6 * batch, max(1UL, 1 * batch));
> > +	if (percpu_pagelist_fraction) {
> > +		pageset_get_values_by_high(
> > +			(zone->managed_pages / percpu_pagelist_fraction),
> > +			high, batch);
> > +	} else
> > +		pageset_get_values_by_batch(zone_batchsize(zone), high, batch);
> >  }
> >  
> >  static void pageset_init(struct per_cpu_pageset *p)
> > @@ -4260,51 +4295,38 @@ static void pageset_init(struct per_cpu_pageset *p)
> >  		INIT_LIST_HEAD(&pcp->lists[migratetype]);
> >  }
> >  
> > -static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
> > +/* Use this only in boot time, because it doesn't do any synchronization */
> > +static void setup_pageset(struct per_cpu_pageset __percpu *pcp)
> 
> If we can use it with only boot_pages in boot time, let's make it more clear.
> 
> static void boot_setup_pageset(void)
> {
> 	boot_pageset;
> 	XXX;
> }

Okay.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
  2014-08-11  9:23     ` Aneesh Kumar K.V
@ 2014-08-13  8:19       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Ritesh Harjani,
	t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Mon, Aug 11, 2014 at 02:53:35PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > The check '!PageBuddy(page) && page_count(page) == 0 &&
> > migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> > Although it could go into buddy allocator within a short time,
> > futher operation such as isolate_freepages_range() in CMA, called after
> > test_page_isolated_in_pageblock(), could be failed due to this unstability
> > since it requires that the page is on buddy. I think that removing
> > this unstability is good thing.
> 
> Is that true in case of check_pages_isolated_cb ? Does that require
> PageBuddy to be true ?

I think so.

> 
> >
> > And, following patch makes isolated freepage has new status matched with
> > this condition and this check is the obstacle to that change. So remove
> > it.
> 
> Can you quote the patch summary in the above case ? ie, something like
> 
> And the followiing patch "mm/....." makes isolate freepage.
> 

Okay.

"mm/isolation: change pageblock isolation logic to fix freepage
counting bugs" introduce PageIsolated() and mark freepages
PageIsolated() during isolation. Those pages are !PageBuddy() and
page_count() == 0.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page
@ 2014-08-13  8:19       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Minchan Kim, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Ritesh Harjani,
	t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Mon, Aug 11, 2014 at 02:53:35PM +0530, Aneesh Kumar K.V wrote:
> Joonsoo Kim <iamjoonsoo.kim@lge.com> writes:
> 
> > The check '!PageBuddy(page) && page_count(page) == 0 &&
> > migratetype == MIGRATE_ISOLATE' would mean the page on free processing.
> > Although it could go into buddy allocator within a short time,
> > futher operation such as isolate_freepages_range() in CMA, called after
> > test_page_isolated_in_pageblock(), could be failed due to this unstability
> > since it requires that the page is on buddy. I think that removing
> > this unstability is good thing.
> 
> Is that true in case of check_pages_isolated_cb ? Does that require
> PageBuddy to be true ?

I think so.

> 
> >
> > And, following patch makes isolated freepage has new status matched with
> > this condition and this check is the obstacle to that change. So remove
> > it.
> 
> Can you quote the patch summary in the above case ? ie, something like
> 
> And the followiing patch "mm/....." makes isolate freepage.
> 

Okay.

"mm/isolation: change pageblock isolation logic to fix freepage
counting bugs" introduce PageIsolated() and mark freepages
PageIsolated() during isolation. Those pages are !PageBuddy() and
page_count() == 0.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
  2014-08-12  1:45     ` Minchan Kim
@ 2014-08-13  8:20       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 01:45:23AM +0000, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:30PM +0900, Joonsoo Kim wrote:
> > In __free_one_page(), we check the buddy page if it is guard page.
> > And, if so, we should clear guard attribute on the buddy page. But,
> > currently, we clear original page's order rather than buddy one's.
> > This doesn't have any problem, because resetting buddy's order
> > is useless and the original page's order is re-assigned soon.
> > But, it is better to correct code.
> > 
> > Additionally, I change (set/clear)_page_guard_flag() to
> > (set/clear)_page_guard() and makes these functions do all works
> > needed for guard page. This may make code more understandable.
> > 
> > One more thing, I did in this patch, is that fixing freepage accounting.
> > If we clear guard page and link it onto isolate buddy list, we should
> > not increase freepage count.
> 
> You are saying just "shouldn't do that" but don't say "why" and "result"
> I know the reason but as you know, I'm one of the person who is rather
> familiar with this part but I guess others should spend some time to get.
> Kind detail description is never to look down on person. :)

Hmm. In fact, the reason is already mentioned in cover letter, but,
it is better to write it here.

Will do.

> > 
> 
> Nice catch, Joonsoo! But what make me worry is is this patch makes 3 thing
> all at once.
> 
> 1. fix - no candidate for stable
> 2. clean up
> 3. fix - candidate for stable.
> 
> Could you separate 3 and (1,2) in next spin?
> 

Okay!

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC
@ 2014-08-13  8:20       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 01:45:23AM +0000, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:30PM +0900, Joonsoo Kim wrote:
> > In __free_one_page(), we check the buddy page if it is guard page.
> > And, if so, we should clear guard attribute on the buddy page. But,
> > currently, we clear original page's order rather than buddy one's.
> > This doesn't have any problem, because resetting buddy's order
> > is useless and the original page's order is re-assigned soon.
> > But, it is better to correct code.
> > 
> > Additionally, I change (set/clear)_page_guard_flag() to
> > (set/clear)_page_guard() and makes these functions do all works
> > needed for guard page. This may make code more understandable.
> > 
> > One more thing, I did in this patch, is that fixing freepage accounting.
> > If we clear guard page and link it onto isolate buddy list, we should
> > not increase freepage count.
> 
> You are saying just "shouldn't do that" but don't say "why" and "result"
> I know the reason but as you know, I'm one of the person who is rather
> familiar with this part but I guess others should spend some time to get.
> Kind detail description is never to look down on person. :)

Hmm. In fact, the reason is already mentioned in cover letter, but,
it is better to write it here.

Will do.

> > 
> 
> Nice catch, Joonsoo! But what make me worry is is this patch makes 3 thing
> all at once.
> 
> 1. fix - no candidate for stable
> 2. clean up
> 3. fix - candidate for stable.
> 
> Could you separate 3 and (1,2) in next spin?
> 

Okay!

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
  2014-08-12  5:17     ` Minchan Kim
@ 2014-08-13  8:29       ` Joonsoo Kim
  -1 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 05:17:45AM +0000, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> > 2. #1 requires IPI for synchronization and we can't hold the zone lock
> > during processing IPI. In this time, some pages could be moved from buddy
> > list to pcp list on page allocation path and later it could be moved again
> > from pcp list to buddy list. In this time, this page would be on isolate
> > pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> > misplacement. To remove this possibility, disabling and draining pcp
> > list is needed during isolation. It guaratees that there is no page on pcp
> > list on all cpus while isolation, so misplacement problem can't happen.
> > 
> > Note that this doesn't fix freepage counting problem. To fix it,
> > we need more logic. Following patches will do it.
> 
> I hope to revise description in next spin. It's very hard to parse for
> stupid me.

Okay. I will do it.

> 
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/internal.h       |    2 ++
> >  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
> >  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
> >  3 files changed, 55 insertions(+), 19 deletions(-)
> > 
> > diff --git a/mm/internal.h b/mm/internal.h
> > index a1b651b..81b8884 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >  /*
> >   * in mm/page_alloc.c
> >   */
> > +extern void zone_pcp_disable(struct zone *zone);
> > +extern void zone_pcp_enable(struct zone *zone);
> 
> Nit: Some of pcp functions has prefix zone but others don't.
> Which is better? If function has param zone as first argument,
> I think it's clear unless the function don't have prefix zone.

Okay.

> 
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e1e344..4517b1d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> >  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> >  			trace_mm_page_pcpu_drain(page, 0, mt);
> > -			if (likely(!is_migrate_isolate_page(page))) {
> > -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> > -				if (is_migrate_cma(mt))
> > -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> > -			}
> > +			__mod_zone_freepage_state(zone, 1, mt);
> >  		} while (--to_free && --batch_free && !list_empty(list));
> >  	}
> >  	spin_unlock(&zone->lock);
> > @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	if (!free_pages_prepare(page, order))
> >  		return;
> >  
> > -	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	local_irq_save(flags);
> > +	migratetype = get_pfnblock_migratetype(page, pfn);
> 
> Could you add some comment about page-isolated locking rule in somewhere?
> I think it's valuable to add it in code rather than description.

Will do.

> In addition, as your description, get_pfnblock_migratetype should be
> protected by irq_disabled. Then, it would be better to add a comment or
> VM_BUG_ON check with irq_disabled in get_pfnblock_migratetype but I think
> get_pfnblock_migratetype might be called for other purpose in future.
> In that case, it's not necessary to disable irq so we could introduce
> "get_freeing_page_migratetype" with irq disabled check and use it.

Okay.

> 
> Question. soft_offline_page doesn't have any lock
> for get_pageblock_migratetype. Is it okay?

Hmm... I think it is okay. But, I guess that it need to check
return value of set_migratetype_isolate().

> 
> >  	__count_vm_events(PGFREE, 1 << order);
> >  	set_freepage_migratetype(page, migratetype);
> >  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> > @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
> >  	if (!free_pages_prepare(page, 0))
> >  		return;
> >  
> > +	local_irq_save(flags);
> >  	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	set_freepage_migratetype(page, migratetype);
> > -	local_irq_save(flags);
> >  	__count_vm_event(PGFREE);
> >  
> >  	/*
> > @@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_MEMORY_ISOLATION
> > +void zone_pcp_disable(struct zone *zone)
> > +{
> > +	mutex_lock(&pcp_batch_high_lock);
> > +	pageset_update(zone, 1, 1);
> > +}
> > +
> > +void zone_pcp_enable(struct zone *zone)
> > +{
> > +	int high, batch;
> > +
> > +	pageset_get_values(zone, &high, &batch);
> > +	pageset_update(zone, high, batch);
> > +	mutex_unlock(&pcp_batch_high_lock);
> > +}
> > +#endif
> 
> Nit:
> It is used for only page_isolation.c so how about moving to page_isolation.c?

I'd like to leave pcp management code in page_alloc.c.

> > +
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  /*
> >   * The zone indicated has a new number of managed_pages; batch sizes and percpu
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index 3100f98..439158d 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	struct memory_isolate_notify arg;
> >  	int notifier_ret;
> >  	int ret = -EBUSY;
> > +	unsigned long nr_pages;
> > +	int migratetype;
> >  
> >  	zone = page_zone(page);
> > -
> 
> Unnecessary change.

Okay.

> 
> >  	spin_lock_irqsave(&zone->lock, flags);
> >  
> >  	pfn = page_to_pfn(page);
> > @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	 */
> >  
> >  out:
> > -	if (!ret) {
> > -		unsigned long nr_pages;
> > -		int migratetype = get_pageblock_migratetype(page);
> > +	if (ret) {
> > +		spin_unlock_irqrestore(&zone->lock, flags);
> > +		return ret;
> > +	}
> >  
> > -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > +	migratetype = get_pageblock_migratetype(page);
> > +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > +	spin_unlock_irqrestore(&zone->lock, flags);
> >  
> > -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> > -	}
> > +	zone_pcp_disable(zone);
> 
> You pcp disable/enable per pageblock so that overhead would be severe.
> I believe your remaining patches will solve it. Anyway, let's add "
> XXX: should save pcp disable/enable" and you could remove the comment
> when your further patches handles it so reviewer could be happy with
> fact which author already know the problem and someone could solve
> the issue even though your furhter patches might reject.

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation
@ 2014-08-13  8:29       ` Joonsoo Kim
  0 siblings, 0 replies; 84+ messages in thread
From: Joonsoo Kim @ 2014-08-13  8:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kirill A. Shutemov, Rik van Riel, Mel Gorman,
	Johannes Weiner, Yasuaki Ishimatsu, Zhang Yanfei,
	Srivatsa S. Bhat, Tang Chen, Naoya Horiguchi,
	Bartlomiej Zolnierkiewicz, Wen Congyang, Marek Szyprowski,
	Michal Nazarewicz, Laura Abbott, Heesub Shin, Aneesh Kumar K.V,
	Ritesh Harjani, t.stanislaws, Gioh Kim, linux-mm, linux-kernel

On Tue, Aug 12, 2014 at 05:17:45AM +0000, Minchan Kim wrote:
> On Wed, Aug 06, 2014 at 04:18:33PM +0900, Joonsoo Kim wrote:
> > 2. #1 requires IPI for synchronization and we can't hold the zone lock
> > during processing IPI. In this time, some pages could be moved from buddy
> > list to pcp list on page allocation path and later it could be moved again
> > from pcp list to buddy list. In this time, this page would be on isolate
> > pageblock, so, the hook is required on free_pcppages_bulk() to prevent
> > misplacement. To remove this possibility, disabling and draining pcp
> > list is needed during isolation. It guaratees that there is no page on pcp
> > list on all cpus while isolation, so misplacement problem can't happen.
> > 
> > Note that this doesn't fix freepage counting problem. To fix it,
> > we need more logic. Following patches will do it.
> 
> I hope to revise description in next spin. It's very hard to parse for
> stupid me.

Okay. I will do it.

> 
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> >  mm/internal.h       |    2 ++
> >  mm/page_alloc.c     |   27 ++++++++++++++++++++-------
> >  mm/page_isolation.c |   45 +++++++++++++++++++++++++++++++++------------
> >  3 files changed, 55 insertions(+), 19 deletions(-)
> > 
> > diff --git a/mm/internal.h b/mm/internal.h
> > index a1b651b..81b8884 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -108,6 +108,8 @@ extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >  /*
> >   * in mm/page_alloc.c
> >   */
> > +extern void zone_pcp_disable(struct zone *zone);
> > +extern void zone_pcp_enable(struct zone *zone);
> 
> Nit: Some of pcp functions has prefix zone but others don't.
> Which is better? If function has param zone as first argument,
> I think it's clear unless the function don't have prefix zone.

Okay.

> 
> >  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> >  extern void prep_compound_page(struct page *page, unsigned long order);
> >  #ifdef CONFIG_MEMORY_FAILURE
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3e1e344..4517b1d 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -726,11 +726,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> >  			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> >  			trace_mm_page_pcpu_drain(page, 0, mt);
> > -			if (likely(!is_migrate_isolate_page(page))) {
> > -				__mod_zone_page_state(zone, NR_FREE_PAGES, 1);
> > -				if (is_migrate_cma(mt))
> > -					__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
> > -			}
> > +			__mod_zone_freepage_state(zone, 1, mt);
> >  		} while (--to_free && --batch_free && !list_empty(list));
> >  	}
> >  	spin_unlock(&zone->lock);
> > @@ -789,8 +785,8 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	if (!free_pages_prepare(page, order))
> >  		return;
> >  
> > -	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	local_irq_save(flags);
> > +	migratetype = get_pfnblock_migratetype(page, pfn);
> 
> Could you add some comment about page-isolated locking rule in somewhere?
> I think it's valuable to add it in code rather than description.

Will do.

> In addition, as your description, get_pfnblock_migratetype should be
> protected by irq_disabled. Then, it would be better to add a comment or
> VM_BUG_ON check with irq_disabled in get_pfnblock_migratetype but I think
> get_pfnblock_migratetype might be called for other purpose in future.
> In that case, it's not necessary to disable irq so we could introduce
> "get_freeing_page_migratetype" with irq disabled check and use it.

Okay.

> 
> Question. soft_offline_page doesn't have any lock
> for get_pageblock_migratetype. Is it okay?

Hmm... I think it is okay. But, I guess that it need to check
return value of set_migratetype_isolate().

> 
> >  	__count_vm_events(PGFREE, 1 << order);
> >  	set_freepage_migratetype(page, migratetype);
> >  	free_one_page(page_zone(page), page, pfn, order, migratetype);
> > @@ -1410,9 +1406,9 @@ void free_hot_cold_page(struct page *page, bool cold)
> >  	if (!free_pages_prepare(page, 0))
> >  		return;
> >  
> > +	local_irq_save(flags);
> >  	migratetype = get_pfnblock_migratetype(page, pfn);
> >  	set_freepage_migratetype(page, migratetype);
> > -	local_irq_save(flags);
> >  	__count_vm_event(PGFREE);
> >  
> >  	/*
> > @@ -6469,6 +6465,23 @@ void free_contig_range(unsigned long pfn, unsigned nr_pages)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_MEMORY_ISOLATION
> > +void zone_pcp_disable(struct zone *zone)
> > +{
> > +	mutex_lock(&pcp_batch_high_lock);
> > +	pageset_update(zone, 1, 1);
> > +}
> > +
> > +void zone_pcp_enable(struct zone *zone)
> > +{
> > +	int high, batch;
> > +
> > +	pageset_get_values(zone, &high, &batch);
> > +	pageset_update(zone, high, batch);
> > +	mutex_unlock(&pcp_batch_high_lock);
> > +}
> > +#endif
> 
> Nit:
> It is used for only page_isolation.c so how about moving to page_isolation.c?

I'd like to leave pcp management code in page_alloc.c.

> > +
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  /*
> >   * The zone indicated has a new number of managed_pages; batch sizes and percpu
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index 3100f98..439158d 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -16,9 +16,10 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	struct memory_isolate_notify arg;
> >  	int notifier_ret;
> >  	int ret = -EBUSY;
> > +	unsigned long nr_pages;
> > +	int migratetype;
> >  
> >  	zone = page_zone(page);
> > -
> 
> Unnecessary change.

Okay.

> 
> >  	spin_lock_irqsave(&zone->lock, flags);
> >  
> >  	pfn = page_to_pfn(page);
> > @@ -55,20 +56,32 @@ int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
> >  	 */
> >  
> >  out:
> > -	if (!ret) {
> > -		unsigned long nr_pages;
> > -		int migratetype = get_pageblock_migratetype(page);
> > +	if (ret) {
> > +		spin_unlock_irqrestore(&zone->lock, flags);
> > +		return ret;
> > +	}
> >  
> > -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > +	migratetype = get_pageblock_migratetype(page);
> > +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > +	spin_unlock_irqrestore(&zone->lock, flags);
> >  
> > -		__mod_zone_freepage_state(zone, -nr_pages, migratetype);
> > -	}
> > +	zone_pcp_disable(zone);
> 
> You pcp disable/enable per pageblock so that overhead would be severe.
> I believe your remaining patches will solve it. Anyway, let's add "
> XXX: should save pcp disable/enable" and you could remove the comment
> when your further patches handles it so reviewer could be happy with
> fact which author already know the problem and someone could solve
> the issue even though your furhter patches might reject.

Okay.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2014-08-13  8:29 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-06  7:18 [PATCH v2 0/8] fix freepage count problems in memory isolation Joonsoo Kim
2014-08-06  7:18 ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 1/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-07  1:46   ` Zhang Yanfei
2014-08-07  1:46     ` Zhang Yanfei
2014-08-06  7:18 ` [PATCH v2 1/8] mm/page_alloc: fix pcp high, batch management Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-12  1:24   ` Minchan Kim
2014-08-12  1:24     ` Minchan Kim
2014-08-13  8:13     ` Joonsoo Kim
2014-08-13  8:13       ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 2/8] mm/isolation: remove unstable check for isolated page Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-07 13:49   ` Vlastimil Babka
2014-08-07 13:49     ` Vlastimil Babka
2014-08-08  6:22     ` Joonsoo Kim
2014-08-08  6:22       ` Joonsoo Kim
2014-08-11  9:23   ` Aneesh Kumar K.V
2014-08-11  9:23     ` Aneesh Kumar K.V
2014-08-13  8:19     ` Joonsoo Kim
2014-08-13  8:19       ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 2/8] mm/page_alloc: correct to clear guard attribute in DEBUG_PAGEALLOC Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-12  1:45   ` Minchan Kim
2014-08-12  1:45     ` Minchan Kim
2014-08-13  8:20     ` Joonsoo Kim
2014-08-13  8:20       ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 3/8] mm/isolation: remove unstable check for isolated page Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 3/8] mm/page_alloc: fix pcp high, batch management Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-07  2:11   ` Zhang Yanfei
2014-08-07  2:11     ` Zhang Yanfei
2014-08-07  8:23     ` Joonsoo Kim
2014-08-07  8:23       ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 4/8] mm/isolation: close the two race problems related to pageblock isolation Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-07 14:34   ` Vlastimil Babka
2014-08-07 14:34     ` Vlastimil Babka
2014-08-08  6:30     ` Joonsoo Kim
2014-08-08  6:30       ` Joonsoo Kim
2014-08-12  5:17   ` Minchan Kim
2014-08-12  5:17     ` Minchan Kim
2014-08-12  9:45     ` Vlastimil Babka
2014-08-12  9:45       ` Vlastimil Babka
2014-08-13  8:09       ` Joonsoo Kim
2014-08-13  8:09         ` Joonsoo Kim
2014-08-13  8:29     ` Joonsoo Kim
2014-08-13  8:29       ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 5/8] mm/isolation: change pageblock isolation logic to fix freepage counting bugs Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-06 15:12   ` Vlastimil Babka
2014-08-06 15:12     ` Vlastimil Babka
2014-08-07  8:19     ` Joonsoo Kim
2014-08-07  8:19       ` Joonsoo Kim
2014-08-07  8:53       ` Vlastimil Babka
2014-08-07  8:53         ` Vlastimil Babka
2014-08-07 12:26         ` Joonsoo Kim
2014-08-07 12:26           ` Joonsoo Kim
2014-08-07 13:04           ` Vlastimil Babka
2014-08-07 13:04             ` Vlastimil Babka
2014-08-07 13:35             ` Joonsoo Kim
2014-08-07 13:35               ` Joonsoo Kim
2014-08-07 15:15   ` Vlastimil Babka
2014-08-07 15:15     ` Vlastimil Babka
2014-08-08  6:45     ` Joonsoo Kim
2014-08-08  6:45       ` Joonsoo Kim
2014-08-12  6:43   ` Minchan Kim
2014-08-12  6:43     ` Minchan Kim
2014-08-12 10:58     ` Vlastimil Babka
2014-08-12 10:58       ` Vlastimil Babka
2014-08-06  7:18 ` [PATCH v2 6/8] mm/isolation: factor out pre/post logic on set/unset_migratetype_isolate() Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 7/8] mm/isolation: fix freepage counting bug on start/undo_isolat_page_range() Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-06  7:18 ` [PATCH v2 8/8] mm/isolation: remove useless race handling related to pageblock isolation Joonsoo Kim
2014-08-06  7:18   ` Joonsoo Kim
2014-08-06  7:25 ` [PATCH v2 0/8] fix freepage count problems in memory isolation Joonsoo Kim
2014-08-06  7:25   ` Joonsoo Kim
2014-08-07  0:49 ` Zhang Yanfei
2014-08-07  0:49   ` Zhang Yanfei
2014-08-07  8:20   ` Joonsoo Kim
2014-08-07  8:20     ` Joonsoo Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.