linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] memory-hotplug: handle page race between allocation and isolation
@ 2012-09-05  7:25 Minchan Kim
  2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-05  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel, Minchan Kim

Memory hotplug has a subtle race, and this patchset fixes it.
(See [3/3] for details, and please confirm the problem before reviewing
the other patches in this series.)

 [1/3] is just a cleanup that prepares for [2/3].
 [2/3] keeps the migratetype information in the freed page's index field
       so that [3/3] can use it.
 [3/3] fixes the race using the information kept by [2/3].

After applying [2/3], the migratetype argument of __free_one_page
and free_one_page is redundant, so we could remove it, but I decided
not to touch them because removing it increases code size by about 50 bytes.

Minchan Kim (3):
  mm: use get_page_migratetype instead of page_private
  mm: remain migratetype in freed page
  memory-hotplug: bug fix race between isolation and allocation

 include/linux/mm.h  |   12 ++++++++++++
 mm/page_alloc.c     |   16 ++++++++++------
 mm/page_isolation.c |    7 +++++--
 3 files changed, 27 insertions(+), 8 deletions(-)

-- 
1.7.9.5



* [PATCH 1/3] mm: use get_page_migratetype instead of page_private
  2012-09-05  7:25 [PATCH 0/3] memory-hotplug: handle page race between allocation and isolation Minchan Kim
@ 2012-09-05  7:26 ` Minchan Kim
  2012-09-05  9:09   ` Mel Gorman
  2012-09-06  2:02   ` Kamezawa Hiroyuki
  2012-09-05  7:26 ` [PATCH 2/3] mm: remain migratetype in freed page Minchan Kim
  2012-09-05  7:26 ` [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation Minchan Kim
  2 siblings, 2 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-05  7:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel, Minchan Kim

The page allocator uses set_page_private and page_private to handle
the migratetype when it frees a page. Let's replace them with
[set|get]_page_migratetype to make the intent clearer.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/mm.h  |   10 ++++++++++
 mm/page_alloc.c     |   11 +++++++----
 mm/page_isolation.c |    2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5c76634..86d61d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -249,6 +249,16 @@ struct inode;
 #define page_private(page)		((page)->private)
 #define set_page_private(page, v)	((page)->private = (v))
 
+static inline void set_page_migratetype(struct page *page, int migratetype)
+{
+	set_page_private(page, migratetype);
+}
+
+static inline int get_page_migratetype(struct page *page)
+{
+	return page_private(page);
+}
+
 /*
  * FIXME: take this include out, include page-flags.h in
  * files which need it (119 of them)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 710d91c..103ba66 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
 			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
-			__free_one_page(page, zone, 0, page_private(page));
-			trace_mm_page_pcpu_drain(page, 0, page_private(page));
+			__free_one_page(page, zone, 0,
+				get_page_migratetype(page));
+			trace_mm_page_pcpu_drain(page, 0,
+				get_page_migratetype(page));
 		} while (--to_free && --batch_free && !list_empty(list));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
@@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	__count_vm_events(PGFREE, 1 << order);
 	free_one_page(page_zone(page), page, order,
 					get_pageblock_migratetype(page));
+
 	local_irq_restore(flags);
 }
 
@@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
 				mt = migratetype;
 		}
-		set_page_private(page, mt);
+		set_page_migratetype(page, mt);
 		list = &page->lru;
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
 		return;
 
 	migratetype = get_pageblock_migratetype(page);
-	set_page_private(page, migratetype);
+	set_page_migratetype(page, migratetype);
 	local_irq_save(flags);
 	if (unlikely(wasMlocked))
 		free_page_mlock(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 64abb33..acf65a7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
 		if (PageBuddy(page))
 			pfn += 1 << page_order(page);
 		else if (page_count(page) == 0 &&
-				page_private(page) == MIGRATE_ISOLATE)
+				get_page_migratetype(page) == MIGRATE_ISOLATE)
 			pfn += 1;
 		else
 			break;
-- 
1.7.9.5



* [PATCH 2/3] mm: remain migratetype in freed page
  2012-09-05  7:25 [PATCH 0/3] memory-hotplug: handle page race between allocation and isolation Minchan Kim
  2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
@ 2012-09-05  7:26 ` Minchan Kim
  2012-09-05  9:25   ` Mel Gorman
  2012-09-05  7:26 ` [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation Minchan Kim
  2 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2012-09-05  7:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel, Minchan Kim

The page allocator doesn't keep the migratetype information in the page
once the page is freed. This patch keeps that information in the freed
page's index field, which isn't used by the free/alloc preparation paths,
so it shouldn't change any behavior except for the one below.

This patch adds a new call site in __free_pages_ok, so it might add a
little overhead, but that path is for high-order allocations, so I
believe the cost is negligible.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/mm.h |    6 ++++--
 mm/page_alloc.c    |    7 ++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 86d61d6..8fd32da 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -251,12 +251,14 @@ struct inode;
 
 static inline void set_page_migratetype(struct page *page, int migratetype)
 {
-	set_page_private(page, migratetype);
+	VM_BUG_ON((unsigned int)migratetype >= MIGRATE_TYPES);
+	page->index = migratetype;
 }
 
 static inline int get_page_migratetype(struct page *page)
 {
-	return page_private(page);
+	VM_BUG_ON((unsigned int)page->index >= MIGRATE_TYPES);
+	return page->index;
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 103ba66..32985dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -723,6 +723,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
 	int wasMlocked = __TestClearPageMlocked(page);
+	int migratetype;
 
 	if (!free_pages_prepare(page, order))
 		return;
@@ -731,9 +732,9 @@ static void __free_pages_ok(struct page *page, unsigned int order)
 	if (unlikely(wasMlocked))
 		free_page_mlock(page);
 	__count_vm_events(PGFREE, 1 << order);
-	free_one_page(page_zone(page), page, order,
-					get_pageblock_migratetype(page));
-
+	migratetype = get_pageblock_migratetype(page);
+	set_page_migratetype(page, migratetype);
+	free_one_page(page_zone(page), page, order, migratetype);
 	local_irq_restore(flags);
 }
 
-- 
1.7.9.5



* [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-05  7:25 [PATCH 0/3] memory-hotplug: handle page race between allocation and isolation Minchan Kim
  2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
  2012-09-05  7:26 ` [PATCH 2/3] mm: remain migratetype in freed page Minchan Kim
@ 2012-09-05  7:26 ` Minchan Kim
  2012-09-05  9:40   ` Mel Gorman
  2012-09-07  6:26   ` jencce zhou
  2 siblings, 2 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-05  7:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel, Minchan Kim

As shown below, memory hotplug has a race between page isolation
and page allocation, so it can hit the BUG_ON in __offline_isolated_pages.

	CPU A					CPU B

start_isolate_page_range
set_migratetype_isolate
spin_lock_irqsave(zone->lock)

				free_hot_cold_page(Page A)
				/* without zone->lock */
				migratetype = get_pageblock_migratetype(Page A);
				/*
				 * Page could be moved into MIGRATE_MOVABLE
				 * of per_cpu_pages
				 */
				list_add_tail(&page->lru, &pcp->lists[migratetype]);

set_pageblock_isolate
move_freepages_block
drain_all_pages

				/* Page A could be in MIGRATE_MOVABLE of free_list. */

check_pages_isolated
__test_page_isolated_in_pageblock
/*
 * We can't catch a freed page which
 * is on free_list[MIGRATE_MOVABLE]
 */
if (PageBuddy(page A))
	pfn += 1 << page_order(page A);

				/* So, Page A could be allocated */

__offline_isolated_pages
/*
 * BUG_ON hit or offline page
 * which is used by someone
 */
BUG_ON(!PageBuddy(page A));

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/page_isolation.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index acf65a7..4699d1f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
 			continue;
 		}
 		page = pfn_to_page(pfn);
-		if (PageBuddy(page))
+		if (PageBuddy(page)) {
+			if (get_page_migratetype(page) != MIGRATE_ISOLATE)
+				break;
 			pfn += 1 << page_order(page);
+		}
 		else if (page_count(page) == 0 &&
 				get_page_migratetype(page) == MIGRATE_ISOLATE)
 			pfn += 1;
-- 
1.7.9.5
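
The effect of the hunk above can be sketched outside the kernel. This is a
minimal userspace model, not the kernel code: struct toy_page, enum toy_mt and
toy_test_isolated() are hypothetical names invented for illustration, and the
check_buddy_mt flag exists only so that the old and new behaviour can be
compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mt { TOY_MOVABLE, TOY_ISOLATE };

/* Toy stand-in for struct page; the field names are illustrative only. */
struct toy_page {
	bool buddy;		/* models PageBuddy(page) */
	unsigned int order;	/* models page_order(page), valid if buddy */
	unsigned int count;	/* models page_count(page) */
	enum toy_mt cached_mt;	/* models get_page_migratetype(page) */
};

/*
 * Walk pages [0, n) and return the index at which the walk stopped.
 * A return value >= n means every page looked isolated.
 *
 * With check_buddy_mt == false this mirrors the pre-patch loop, which
 * trusts any buddy page. With check_buddy_mt == true a buddy page must
 * also carry TOY_ISOLATE, as in the patched loop.
 */
static size_t toy_test_isolated(const struct toy_page *pages, size_t n,
				bool check_buddy_mt)
{
	size_t pfn = 0;

	assert(pages != NULL);

	while (pfn < n) {
		const struct toy_page *page = &pages[pfn];

		if (page->buddy) {
			if (check_buddy_mt && page->cached_mt != TOY_ISOLATE)
				break;
			pfn += (size_t)1 << page->order;
		} else if (page->count == 0 &&
			   page->cached_mt == TOY_ISOLATE) {
			pfn += 1;
		} else {
			break;
		}
	}
	return pfn;
}
```

In this model a buddy page that still carries TOY_MOVABLE is skipped by the
old loop and stops the new one, which corresponds to Page A in the diagram.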



* Re: [PATCH 1/3] mm: use get_page_migratetype instead of page_private
  2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
@ 2012-09-05  9:09   ` Mel Gorman
  2012-09-06  2:17     ` Minchan Kim
  2012-09-06  2:02   ` Kamezawa Hiroyuki
  1 sibling, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-05  9:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Wed, Sep 05, 2012 at 04:26:00PM +0900, Minchan Kim wrote:
> page allocator uses set_page_private and page_private for handling
> migratetype when it frees page. Let's replace them with [set|get]
> _page_migratetype to make it more clear.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Maybe it's because I'm used to seeing set_page_private() in the page
allocator and know what it means, but I fear that it'll be very easy to confuse
get_page_migratetype() with get_pageblock_migratetype(). The former only
works while the page is in the buddy allocator. The latter can be called
at any time. I'm not against the patch as such but I'm not convinced
either :)

One nit below

> ---
>  include/linux/mm.h  |   10 ++++++++++
>  mm/page_alloc.c     |   11 +++++++----
>  mm/page_isolation.c |    2 +-
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5c76634..86d61d6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -249,6 +249,16 @@ struct inode;
>  #define page_private(page)		((page)->private)
>  #define set_page_private(page, v)	((page)->private = (v))
>  
> +static inline void set_page_migratetype(struct page *page, int migratetype)
> +{
> +	set_page_private(page, migratetype);
> +}
> +
> +static inline int get_page_migratetype(struct page *page)
> +{
> +	return page_private(page);
> +}
> +
>  /*
>   * FIXME: take this include out, include page-flags.h in
>   * files which need it (119 of them)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 710d91c..103ba66 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			/* must delete as __free_one_page list manipulates */
>  			list_del(&page->lru);
>  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> -			__free_one_page(page, zone, 0, page_private(page));
> -			trace_mm_page_pcpu_drain(page, 0, page_private(page));
> +			__free_one_page(page, zone, 0,
> +				get_page_migratetype(page));
> +			trace_mm_page_pcpu_drain(page, 0,
> +				get_page_migratetype(page));
>  		} while (--to_free && --batch_free && !list_empty(list));
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
> @@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  	__count_vm_events(PGFREE, 1 << order);
>  	free_one_page(page_zone(page), page, order,
>  					get_pageblock_migratetype(page));
> +
>  	local_irq_restore(flags);
>  }
>  

Unnecessary whitespace change.

> @@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>  			if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
>  				mt = migratetype;
>  		}
> -		set_page_private(page, mt);
> +		set_page_migratetype(page, mt);
>  		list = &page->lru;
>  	}
>  	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
>  		return;
>  
>  	migratetype = get_pageblock_migratetype(page);
> -	set_page_private(page, migratetype);
> +	set_page_migratetype(page, migratetype);
>  	local_irq_save(flags);
>  	if (unlikely(wasMlocked))
>  		free_page_mlock(page);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 64abb33..acf65a7 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>  		if (PageBuddy(page))
>  			pfn += 1 << page_order(page);
>  		else if (page_count(page) == 0 &&
> -				page_private(page) == MIGRATE_ISOLATE)
> +				get_page_migratetype(page) == MIGRATE_ISOLATE)
>  			pfn += 1;
>  		else
>  			break;

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 2/3] mm: remain migratetype in freed page
  2012-09-05  7:26 ` [PATCH 2/3] mm: remain migratetype in freed page Minchan Kim
@ 2012-09-05  9:25   ` Mel Gorman
  2012-09-06  2:28     ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-05  9:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Wed, Sep 05, 2012 at 04:26:01PM +0900, Minchan Kim wrote:
> Page allocator doesn't keep migratetype information to page
> when the page is freed. This patch remains the information
> to freed page's index field which isn't used by free/alloc
> preparing so it shouldn't change any behavir except below one.
> 

This explanation could have been a *LOT* more helpful.

The page allocator caches the pageblock information in page->private while
it is in the PCP freelists but this is overwritten with the order of the
page when freed to the buddy allocator. This patch stores the migratetype
of the page in the page->index field so that it is available at all times.
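
The overwrite described above can be illustrated with a small userspace model.
This is only a sketch under assumptions: struct toy_page and the toy_*
functions are hypothetical stand-ins, not the kernel's definitions, and the
assert() loosely mirrors the VM_BUG_ON range check in the patch.

```c
#include <assert.h>

enum toy_migratetype {
	TOY_UNMOVABLE,
	TOY_RECLAIMABLE,
	TOY_MOVABLE,
	TOY_ISOLATE,
	TOY_MIGRATE_TYPES
};

/* Toy stand-in for struct page; only the two fields discussed here. */
struct toy_page {
	unsigned long private;	/* migratetype on a PCP list, order in buddy */
	unsigned long index;	/* proposed home for the migratetype */
};

/* Models freeing to a per-cpu list: private caches the migratetype. */
static void toy_free_to_pcp(struct toy_page *page, int mt)
{
	assert((unsigned int)mt < TOY_MIGRATE_TYPES);
	page->private = (unsigned long)mt;
	page->index = (unsigned long)mt;	/* what the patch adds */
}

/* Models freeing to the buddy lists: private is reused for the order. */
static void toy_free_to_buddy(struct toy_page *page, unsigned int order)
{
	page->private = order;	/* the cached migratetype is lost here */
}

/* Models get_page_migratetype() after the patch: read it from index. */
static int toy_get_page_migratetype(const struct toy_page *page)
{
	return (int)page->index;
}
```

After toy_free_to_buddy() the private field holds the order, so only the
index field can still answer which migratetype the page was freed with.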

> This patch adds a new call site in __free_pages_ok so it might be
> overhead a bit but it's for high order allocation.
> So I believe damage isn't hurt.
> 

The additional call to set_page_migratetype() is not heavy. If you were
adding a new call to get_pageblock_migratetype() or something equally
expensive I would be more concerned.

> Signed-off-by: Minchan Kim <minchan@kernel.org>

The information you store in the page->index becomes stale if the page
gets moved to another free list by move_freepages(). Not sure if that is
a problem for you or not but it is possible that
get_page_migratetype(page) != get_pageblock_migratetype(page)
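
A minimal sketch of how the two values can diverge, assuming a toy model in
which retagging a pageblock and moving its free pages is collapsed into a
single assignment. All names here (toy_pageblock, toy_page and the toy_*
functions) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mt { TOY_MOVABLE, TOY_ISOLATE, TOY_NR_TYPES };

/* Models the per-pageblock migratetype (get_pageblock_migratetype()). */
struct toy_pageblock {
	enum toy_mt migratetype;
};

/* Models a page that caches a migratetype in its index field. */
struct toy_page {
	struct toy_pageblock *block;
	unsigned long index;
};

/* Models the free path: snapshot the block's type into the page. */
static void toy_free_page(struct toy_page *page)
{
	assert(page->block != NULL);
	page->index = (unsigned long)page->block->migratetype;
}

/*
 * Models a later move of the block's free pages to another list.
 * Nothing here updates the per-page snapshot, which is the concern
 * raised above.
 */
static void toy_move_freepages_block(struct toy_pageblock *block,
				     enum toy_mt new_mt)
{
	block->migratetype = new_mt;
}

/* True when the cached value no longer matches the block's type. */
static bool toy_cached_type_is_stale(const struct toy_page *page)
{
	return page->index != (unsigned long)page->block->migratetype;
}
```

In this model, keeping the snapshot fresh would require the move path to
rewrite index for every page it touches.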

> ---
>  include/linux/mm.h |    6 ++++--
>  mm/page_alloc.c    |    7 ++++---
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 86d61d6..8fd32da 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -251,12 +251,14 @@ struct inode;
>  
>  static inline void set_page_migratetype(struct page *page, int migratetype)
>  {
> -	set_page_private(page, migratetype);
> +	VM_BUG_ON((unsigned int)migratetype >= MIGRATE_TYPES);

This additional bug check is not mentioned in the changelog. Not clear
if it's necessary.

> +	page->index = migratetype;
>  }
>  
>  static inline int get_page_migratetype(struct page *page)
>  {
> -	return page_private(page);
> +	VM_BUG_ON((unsigned int)page->index >= MIGRATE_TYPES);
> +	return page->index;
>  }
>  
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 103ba66..32985dd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -723,6 +723,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  {
>  	unsigned long flags;
>  	int wasMlocked = __TestClearPageMlocked(page);
> +	int migratetype;
>  
>  	if (!free_pages_prepare(page, order))
>  		return;
> @@ -731,9 +732,9 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>  	if (unlikely(wasMlocked))
>  		free_page_mlock(page);
>  	__count_vm_events(PGFREE, 1 << order);
> -	free_one_page(page_zone(page), page, order,
> -					get_pageblock_migratetype(page));
> -
> +	migratetype = get_pageblock_migratetype(page);
> +	set_page_migratetype(page, migratetype);
> +	free_one_page(page_zone(page), page, order, migratetype);
>  	local_irq_restore(flags);
>  }
>  

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-05  7:26 ` [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation Minchan Kim
@ 2012-09-05  9:40   ` Mel Gorman
  2012-09-06  4:49     ` Minchan Kim
  2012-09-07  6:26   ` jencce zhou
  1 sibling, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-05  9:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Wed, Sep 05, 2012 at 04:26:02PM +0900, Minchan Kim wrote:
> Like below, memory-hotplug makes race between page-isolation
> and page-allocation so it can hit BUG_ON in __offline_isolated_pages.
> 
> 	CPU A					CPU B
> 
> start_isolate_page_range
> set_migratetype_isolate
> spin_lock_irqsave(zone->lock)
> 
> 				free_hot_cold_page(Page A)
> 				/* without zone->lock */
> 				migratetype = get_pageblock_migratetype(Page A);
> 				/*
> 				 * Page could be moved into MIGRATE_MOVABLE
> 				 * of per_cpu_pages
> 				 */
> 				list_add_tail(&page->lru, &pcp->lists[migratetype]);
> 
> set_pageblock_isolate
> move_freepages_block
> drain_all_pages
> 
> 				/* Page A could be in MIGRATE_MOVABLE of free_list. */
> 
> check_pages_isolated
> __test_page_isolated_in_pageblock
> /*
>  * We can't catch freed page which
>  * is free_list[MIGRATE_MOVABLE]
>  */
> if (PageBuddy(page A))
> 	pfn += 1 << page_order(page A);
> 
> 				/* So, Page A could be allocated */
> 
> __offline_isolated_pages
> /*
>  * BUG_ON hit or offline page
>  * which is used by someone
>  */
> BUG_ON(!PageBuddy(page A));
> 

offline_page calling BUG_ON because someone allocated the page is
ridiculous. I did not spot where that check is but it should be changed. The
correct action is to retry the isolation.

> Signed-off-by: Minchan Kim <minchan@kernel.org>

At no point in the changelog do you actually say what the patch does :/

> ---
>  mm/page_isolation.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index acf65a7..4699d1f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>  			continue;
>  		}
>  		page = pfn_to_page(pfn);
> -		if (PageBuddy(page))
> +		if (PageBuddy(page)) {
> +			if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> +				break;
>  			pfn += 1 << page_order(page);
> +		}

It is possible the page is moved to the MIGRATE_ISOLATE list between when
the page was freed to the buddy allocator and this check was made. The
page->index information is stale and the impact is that the hotplug
operation fails when it could have succeeded. That said, I think it is a
very unlikely race that will never happen in practice.

More importantly, the effect of this path is that EBUSY gets bubbled all
the way up and the hotplug operation fails. This is fine, but as the page
is free at the time this problem is detected, you also have the option
of moving the PageBuddy page to the MIGRATE_ISOLATE list at this time
if you take the zone lock. This will mean you need to change the name of
test_pages_isolated() of course.
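
The alternative suggested above might look roughly like the following
userspace sketch. It is an assumption-laden model, not kernel code: the zone
lock is reduced to a boolean, the free lists to per-type counters, and every
name (toy_zone, toy_page, toy_test_or_isolate()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mt { TOY_MOVABLE, TOY_ISOLATE, TOY_NR_TYPES };

struct toy_page {
	bool buddy;		/* models PageBuddy(page) */
	enum toy_mt cached_mt;	/* models the per-page cached migratetype */
};

struct toy_zone {
	bool locked;			/* models holding zone->lock */
	size_t nr_free[TOY_NR_TYPES];	/* models free list lengths */
};

static void toy_zone_lock(struct toy_zone *zone)
{
	assert(!zone->locked);
	zone->locked = true;
}

static void toy_zone_unlock(struct toy_zone *zone)
{
	assert(zone->locked);
	zone->locked = false;
}

/*
 * Instead of failing when a free page sits on the wrong list, move it
 * to the isolate list while holding the lock. Returns true if the page
 * is free and (now) accounted as isolated, false if it is in use.
 */
static bool toy_test_or_isolate(struct toy_zone *zone, struct toy_page *page)
{
	bool is_free;

	toy_zone_lock(zone);
	is_free = page->buddy;	/* recheck under the lock */
	if (is_free && page->cached_mt != TOY_ISOLATE) {
		zone->nr_free[page->cached_mt]--;
		zone->nr_free[TOY_ISOLATE]++;
		page->cached_mt = TOY_ISOLATE;
	}
	toy_zone_unlock(zone);
	return is_free;
}
```

Because the function now modifies state rather than only testing it, a name
that begins with "test" no longer describes it, which is the renaming point
made above.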

>  		else if (page_count(page) == 0 &&
>  				get_page_migratetype(page) == MIGRATE_ISOLATE)
>  			pfn += 1;
> -- 
> 1.7.9.5
> 

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 1/3] mm: use get_page_migratetype instead of page_private
  2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
  2012-09-05  9:09   ` Mel Gorman
@ 2012-09-06  2:02   ` Kamezawa Hiroyuki
  2012-09-06  2:19     ` Minchan Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Kamezawa Hiroyuki @ 2012-09-06  2:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel

(2012/09/05 16:26), Minchan Kim wrote:
> page allocator uses set_page_private and page_private for handling
> migratetype when it frees page. Let's replace them with [set|get]
> _page_migratetype to make it more clear.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>

Hmm. one request from me.

> ---
>   include/linux/mm.h  |   10 ++++++++++
>   mm/page_alloc.c     |   11 +++++++----
>   mm/page_isolation.c |    2 +-
>   3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5c76634..86d61d6 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -249,6 +249,16 @@ struct inode;
>   #define page_private(page)		((page)->private)
>   #define set_page_private(page, v)	((page)->private = (v))
>   
> +static inline void set_page_migratetype(struct page *page, int migratetype)
> +{
> +	set_page_private(page, migratetype);
> +}
> +
> +static inline int get_page_migratetype(struct page *page)
> +{
> +	return page_private(page);
> +}
> +

Could you add comments to explain when these functions return the expected
value? They work correctly only in a very restricted area of the code.

By the way, should these functions be static inline?

Thanks,
-Kame

>   /*
>    * FIXME: take this include out, include page-flags.h in
>    * files which need it (119 of them)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 710d91c..103ba66 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>   			/* must delete as __free_one_page list manipulates */
>   			list_del(&page->lru);
>   			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> -			__free_one_page(page, zone, 0, page_private(page));
> -			trace_mm_page_pcpu_drain(page, 0, page_private(page));
> +			__free_one_page(page, zone, 0,
> +				get_page_migratetype(page));
> +			trace_mm_page_pcpu_drain(page, 0,
> +				get_page_migratetype(page));
>   		} while (--to_free && --batch_free && !list_empty(list));
>   	}
>   	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
> @@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
>   	__count_vm_events(PGFREE, 1 << order);
>   	free_one_page(page_zone(page), page, order,
>   					get_pageblock_migratetype(page));
> +
>   	local_irq_restore(flags);
>   }
>   
> @@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>   			if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
>   				mt = migratetype;
>   		}
> -		set_page_private(page, mt);
> +		set_page_migratetype(page, mt);
>   		list = &page->lru;
>   	}
>   	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
> @@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
>   		return;
>   
>   	migratetype = get_pageblock_migratetype(page);
> -	set_page_private(page, migratetype);
> +	set_page_migratetype(page, migratetype);
>   	local_irq_save(flags);
>   	if (unlikely(wasMlocked))
>   		free_page_mlock(page);
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 64abb33..acf65a7 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>   		if (PageBuddy(page))
>   			pfn += 1 << page_order(page);
>   		else if (page_count(page) == 0 &&
> -				page_private(page) == MIGRATE_ISOLATE)
> +				get_page_migratetype(page) == MIGRATE_ISOLATE)
>   			pfn += 1;
>   		else
>   			break;
> 




* Re: [PATCH 1/3] mm: use get_page_migratetype instead of page_private
  2012-09-05  9:09   ` Mel Gorman
@ 2012-09-06  2:17     ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-06  2:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

Hi Mel,

On Wed, Sep 05, 2012 at 10:09:55AM +0100, Mel Gorman wrote:
> On Wed, Sep 05, 2012 at 04:26:00PM +0900, Minchan Kim wrote:
> > page allocator uses set_page_private and page_private for handling
> > migratetype when it frees page. Let's replace them with [set|get]
> > _page_migratetype to make it more clear.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Maybe it's because I'm used of setting set_page_private() in the page
> allocator and what it means but I fear that it'll be very easy to confuse
> get_page_migratetype() with get_pageblock_migratetype(). The former only
> works while the page is in the buddy allocator. The latter can be called
> at any time. I'm not against the patch as such but I'm not convinced
> either :)

How about using the name "get_buddypage_migratetype" instead of "get_page_migratetype"?

> 
> One nit below
> 
> > ---
> >  include/linux/mm.h  |   10 ++++++++++
> >  mm/page_alloc.c     |   11 +++++++----
> >  mm/page_isolation.c |    2 +-
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 5c76634..86d61d6 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -249,6 +249,16 @@ struct inode;
> >  #define page_private(page)		((page)->private)
> >  #define set_page_private(page, v)	((page)->private = (v))
> >  
> > +static inline void set_page_migratetype(struct page *page, int migratetype)
> > +{
> > +	set_page_private(page, migratetype);
> > +}
> > +
> > +static inline int get_page_migratetype(struct page *page)
> > +{
> > +	return page_private(page);
> > +}
> > +
> >  /*
> >   * FIXME: take this include out, include page-flags.h in
> >   * files which need it (119 of them)
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 710d91c..103ba66 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			/* must delete as __free_one_page list manipulates */
> >  			list_del(&page->lru);
> >  			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
> > -			__free_one_page(page, zone, 0, page_private(page));
> > -			trace_mm_page_pcpu_drain(page, 0, page_private(page));
> > +			__free_one_page(page, zone, 0,
> > +				get_page_migratetype(page));
> > +			trace_mm_page_pcpu_drain(page, 0,
> > +				get_page_migratetype(page));
> >  		} while (--to_free && --batch_free && !list_empty(list));
> >  	}
> >  	__mod_zone_page_state(zone, NR_FREE_PAGES, count);
> > @@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	__count_vm_events(PGFREE, 1 << order);
> >  	free_one_page(page_zone(page), page, order,
> >  					get_pageblock_migratetype(page));
> > +
> >  	local_irq_restore(flags);
> >  }
> >  
> 
> Unnecessary whitespace change.

Will fix.
Thanks!

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 1/3] mm: use get_page_migratetype instead of page_private
  2012-09-06  2:02   ` Kamezawa Hiroyuki
@ 2012-09-06  2:19     ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-06  2:19 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, Yasuaki Ishimatsu, Xishi Qiu, Mel Gorman,
	linux-mm, linux-kernel

Hi Kame,

On Thu, Sep 06, 2012 at 11:02:47AM +0900, Kamezawa Hiroyuki wrote:
> (2012/09/05 16:26), Minchan Kim wrote:
> > page allocator uses set_page_private and page_private for handling
> > migratetype when it frees page. Let's replace them with [set|get]
> > _page_migratetype to make it more clear.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> Hmm. one request from me.
> 
> > ---
> >   include/linux/mm.h  |   10 ++++++++++
> >   mm/page_alloc.c     |   11 +++++++----
> >   mm/page_isolation.c |    2 +-
> >   3 files changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 5c76634..86d61d6 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -249,6 +249,16 @@ struct inode;
> >   #define page_private(page)		((page)->private)
> >   #define set_page_private(page, v)	((page)->private = (v))
> >   
> > +static inline void set_page_migratetype(struct page *page, int migratetype)
> > +{
> > +	set_page_private(page, migratetype);
> > +}
> > +
> > +static inline int get_page_migratetype(struct page *page)
> > +{
> > +	return page_private(page);
> > +}
> > +
> 
> Could you add comments to explain "when this function returns expected value" ?
> These functions can work well only in very restricted area of codes.

Yes. It works only if the page is on a free_list.
I will add a comment about that, and I would like to rename
get_page_migratetype to get_buddypage_migratetype.
That would be less confusing.
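
A documented accessor along those lines could look like the sketch below.
It is a userspace illustration only: struct toy_page, the on_free_list flag
and the -1 return value are assumptions made for this example, and the toy_
names merely echo the rename proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; on_free_list is an illustrative flag. */
struct toy_page {
	unsigned long private;
	bool on_free_list;
};

/*
 * toy_set_buddypage_migratetype - record the migratetype of a page that
 * is being placed on a free list.
 */
static void toy_set_buddypage_migratetype(struct toy_page *page,
					  int migratetype)
{
	assert(migratetype >= 0);
	page->private = (unsigned long)migratetype;
	page->on_free_list = true;
}

/*
 * toy_get_buddypage_migratetype - read back the recorded migratetype.
 *
 * Only meaningful while the page is on a free list. For any other page
 * the field may hold unrelated data, so this model returns -1 instead
 * of guessing.
 */
static int toy_get_buddypage_migratetype(const struct toy_page *page)
{
	if (!page->on_free_list)
		return -1;
	return (int)page->private;
}
```

Returning -1 is a choice made for this sketch so that misuse is visible; the
thread itself only asks for a comment that states the restriction.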

Thanks.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/3] mm: remain migratetype in freed page
  2012-09-05  9:25   ` Mel Gorman
@ 2012-09-06  2:28     ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-06  2:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Wed, Sep 05, 2012 at 10:25:34AM +0100, Mel Gorman wrote:
> On Wed, Sep 05, 2012 at 04:26:01PM +0900, Minchan Kim wrote:
> > Page allocator doesn't keep migratetype information to page
> > when the page is freed. This patch remains the information
> > to freed page's index field which isn't used by free/alloc
> > preparing so it shouldn't change any behavir except below one.
> > 
> 
> This explanation could have been a *LOT* more helpful.
> 
> The page allocator caches the pageblock information in page->private while
> it is in the PCP freelists but this is overwritten with the order of the
> page when freed to the buddy allocator. This patch stores the migratetype
> of the page in the page->index field so that it is available at all times.

I will add your comment in my description.

> 
> > This patch adds a new call site in __free_pages_ok, so it might add
> > a bit of overhead, but that path is for high-order allocations,
> > so I believe it won't hurt.
> > 
> 
> The additional call to set_page_migratetype() is not heavy. If you were
> adding a new call to get_pageblock_migratetype() or something equally
> expensive I would be more concerned.

I'm lucky to avoid your keen eye. ;)

> 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> The information you store in the page->index becomes stale if the page
> gets moved to another free list by move_freepages(). Not sure if that is
> a problem for you or not but it is possible that
> get_page_migratetype(page) != get_pageblock_migratetype(page)

Thanks for spotting that. I have to fix it.
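
One possible shape for that fix, sketched in userspace C under stated assumptions: whenever free pages are moved to another list, the cached value in page->index is rewritten in the same step. `move_freepages_sketch` is a hypothetical toy version of move_freepages() (the actual list manipulation is elided), and `struct page` is reduced to the fields the example needs.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel definitions (illustration only). */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

struct page {
	unsigned long index;	/* cached migratetype of a freed page */
	int is_buddy;		/* stands in for PageBuddy(page) */
	unsigned int order;	/* stands in for page_order(page) */
};

static void set_page_migratetype(struct page *page, int migratetype)
{
	assert((unsigned int)migratetype < MIGRATE_TYPES);
	page->index = (unsigned long)migratetype;
}

static int get_page_migratetype(const struct page *page)
{
	assert(page->index < MIGRATE_TYPES);
	return (int)page->index;
}

/*
 * Toy version of move_freepages(): walk [start, end) and retag every
 * free buddy block with the destination migratetype, so the cached
 * value cannot go stale. Returns the number of base pages moved.
 */
static unsigned long move_freepages_sketch(struct page *start,
					   struct page *end, int migratetype)
{
	unsigned long moved = 0;
	struct page *page = start;

	while (page < end) {
		if (!page->is_buddy) {
			page++;
			continue;
		}
		/* The real function would list_move() the block here. */
		set_page_migratetype(page, migratetype);
		moved += 1UL << page->order;
		page += 1UL << page->order;
	}
	return moved;
}
```

With the retag done at move time, get_page_migratetype(page) and the list the page actually sits on stay in agreement for free pages, which is the property [3/3] relies on.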

> 
> > ---
> >  include/linux/mm.h |    6 ++++--
> >  mm/page_alloc.c    |    7 ++++---
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 86d61d6..8fd32da 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -251,12 +251,14 @@ struct inode;
> >  
> >  static inline void set_page_migratetype(struct page *page, int migratetype)
> >  {
> > -	set_page_private(page, migratetype);
> > +	VM_BUG_ON((unsigned int)migratetype >= MIGRATE_TYPES);
> 
> This additional bug check is not mentioned in the changelog. Not clear
> if it's necessary.

I don't feel strongly about it, so if anyone thinks it's not necessary, I will drop it.

> 
> > +	page->index = migratetype;
> >  }
> >  
> >  static inline int get_page_migratetype(struct page *page)
> >  {
> > -	return page_private(page);
> > +	VM_BUG_ON((unsigned int)page->index >= MIGRATE_TYPES);
> > +	return page->index;
> >  }
> >  
> >  /*
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 103ba66..32985dd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -723,6 +723,7 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  {
> >  	unsigned long flags;
> >  	int wasMlocked = __TestClearPageMlocked(page);
> > +	int migratetype;
> >  
> >  	if (!free_pages_prepare(page, order))
> >  		return;
> > @@ -731,9 +732,9 @@ static void __free_pages_ok(struct page *page, unsigned int order)
> >  	if (unlikely(wasMlocked))
> >  		free_page_mlock(page);
> >  	__count_vm_events(PGFREE, 1 << order);
> > -	free_one_page(page_zone(page), page, order,
> > -					get_pageblock_migratetype(page));
> > -
> > +	migratetype = get_pageblock_migratetype(page);
> > +	set_page_migratetype(page, migratetype);
> > +	free_one_page(page_zone(page), page, order, migratetype);
> >  	local_irq_restore(flags);
> >  }
> >  
> 
> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-05  9:40   ` Mel Gorman
@ 2012-09-06  4:49     ` Minchan Kim
  2012-09-06  9:24       ` Mel Gorman
  0 siblings, 1 reply; 15+ messages in thread
From: Minchan Kim @ 2012-09-06  4:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Wed, Sep 05, 2012 at 10:40:41AM +0100, Mel Gorman wrote:
> On Wed, Sep 05, 2012 at 04:26:02PM +0900, Minchan Kim wrote:
> > As shown below, memory hotplug has a race between page isolation
> > and page allocation, so it can hit the BUG_ON in __offline_isolated_pages.
> > 
> > 	CPU A					CPU B
> > 
> > start_isolate_page_range
> > set_migratetype_isolate
> > spin_lock_irqsave(zone->lock)
> > 
> > 				free_hot_cold_page(Page A)
> > 				/* without zone->lock */
> > 				migratetype = get_pageblock_migratetype(Page A);
> > 				/*
> > 				 * Page could be moved into MIGRATE_MOVABLE
> > 				 * of per_cpu_pages
> > 				 */
> > 				list_add_tail(&page->lru, &pcp->lists[migratetype]);
> > 
> > set_pageblock_isolate
> > move_freepages_block
> > drain_all_pages
> > 
> > 				/* Page A could be in MIGRATE_MOVABLE of free_list. */
> > 
> > check_pages_isolated
> > __test_page_isolated_in_pageblock
> > /*
> >  * We can't catch freed page which
> >  * is free_list[MIGRATE_MOVABLE]
> >  */
> > if (PageBuddy(page A))
> > 	pfn += 1 << page_order(page A);
> > 
> > 				/* So, Page A could be allocated */
> > 
> > __offline_isolated_pages
> > /*
> >  * BUG_ON hit or offline page
> >  * which is used by someone
> >  */
> > BUG_ON(!PageBuddy(page A));
> > 
> 
> offline_page calling BUG_ON because someone allocated the page is
> ridiculous. I did not spot where that check is but it should be changed. The
> correct action is to retry the isolation.

It is in __offline_isolated_pages.

..
        while (pfn < end_pfn) {
                if (!pfn_valid(pfn)) {
                        pfn++;
                        continue;
                }    
                page = pfn_to_page(pfn);
                BUG_ON(page_count(page));
                BUG_ON(!PageBuddy(page)); <---- HERE
                order = page_order(page);
...

The comment on offline_isolated_pages says the following:

        We cannot do rollback at this point

So if the comment is true, BUG_ON does make sense to me.
But looking through the code, I don't see why we can't retry it.
Anyway, that's another story which isn't related to this patch.

> 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> 
> At no point in the changelog do you actually say what the patch does :/

Argh, I will do that.

> 
> > ---
> >  mm/page_isolation.c |    5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > index acf65a7..4699d1f 100644
> > --- a/mm/page_isolation.c
> > +++ b/mm/page_isolation.c
> > @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
> >  			continue;
> >  		}
> >  		page = pfn_to_page(pfn);
> > -		if (PageBuddy(page))
> > +		if (PageBuddy(page)) {
> > +			if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> > +				break;
> >  			pfn += 1 << page_order(page);
> > +		}
> 
> It is possible the page is moved to the MIGRATE_ISOLATE list between when
> the page was freed to the buddy allocator and this check was made. The
> page->index information is stale and the impact is that the hotplug
> operation fails when it could have succeeded. That said, I think it is a
> very unlikely race that will never happen in practice.

I understand you mean move_freepages which I have missed. Right?
Then, I will fix it, too.

> 
> More importantly, the effect of this path is that EBUSY gets bubbled all
> the way up and the hotplug operation fails. This is fine but as the page
> is free at the time this problem is detected you also have the option
> of moving the PageBuddy page to the MIGRATE_ISOLATE list at this time
> if you take the zone lock. This will mean you need to change the name of
> test_pages_isolated() of course.

Sorry, I can't get your point. Could you elaborate it more?
Is it related to this patch?


> 
> >  		else if (page_count(page) == 0 &&
> >  				get_page_migratetype(page) == MIGRATE_ISOLATE)
> >  			pfn += 1;
> > -- 
> > 1.7.9.5
> > 
> 
> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-06  4:49     ` Minchan Kim
@ 2012-09-06  9:24       ` Mel Gorman
  2012-09-06 23:32         ` Minchan Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2012-09-06  9:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Thu, Sep 06, 2012 at 01:49:03PM +0900, Minchan Kim wrote:
> > > __offline_isolated_pages
> > > /*
> > >  * BUG_ON hit or offline page
> > >  * which is used by someone
> > >  */
> > > BUG_ON(!PageBuddy(page A));
> > > 
> > 
> > offline_page calling BUG_ON because someone allocated the page is
> > ridiculous. I did not spot where that check is but it should be changed. The
> > correct action is to retry the isolation.
> 
> It is in __offline_isolated_pages.
> 
> ..
>         while (pfn < end_pfn) {
>                 if (!pfn_valid(pfn)) {
>                         pfn++;
>                         continue;
>                 }    
>                 page = pfn_to_page(pfn);
>                 BUG_ON(page_count(page));
>                 BUG_ON(!PageBuddy(page)); <---- HERE
>                 order = page_order(page);
> ...
> 
> The comment on offline_isolated_pages says the following:
> 
>         We cannot do rollback at this point
> 
> So if the comment is true, BUG_ON does make sense to me.

It's massive overkill. I see no reason why it cannot return EBUSY all the
way back up to offline_pages() and retry with the migration step.  It would
both remove that BUG_ON and improve reliability of memory hot-remove.
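
A minimal userspace sketch of that idea, not the actual kernel code: the check that currently BUG_ONs returns -EBUSY instead, and the caller retries after another migration pass. All names ending in `_sketch` are hypothetical, `toy_migrate` simply pretends migration freed every page, and the retry limit is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Reduced stand-in for struct page (illustration only). */
struct page {
	int count;		/* stands in for page_count(page) */
	bool buddy;		/* stands in for PageBuddy(page) */
	unsigned int order;	/* stands in for page_order(page) */
};

/*
 * Verify that every page in the range is a free buddy page.
 * Returns 0 on success, or -EBUSY where the current code would BUG_ON.
 */
static int offline_isolated_pages_sketch(const struct page *pages,
					 unsigned long nr)
{
	unsigned long i = 0;

	while (i < nr) {
		const struct page *page = &pages[i];

		if (page->count != 0 || !page->buddy)
			return -EBUSY;
		i += 1UL << page->order;
	}
	return 0;
}

/* Toy migration step: pretend every page in the range was freed. */
static void toy_migrate(struct page *pages, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		pages[i].count = 0;
		pages[i].buddy = true;
		pages[i].order = 0;
	}
}

/* Retry loop in the caller: migrate again whenever the check reports -EBUSY. */
static int offline_pages_sketch(struct page *pages, unsigned long nr,
				void (*migrate)(struct page *, unsigned long),
				int max_retries)
{
	int ret = -EBUSY;

	for (int attempt = 0; attempt <= max_retries; attempt++) {
		ret = offline_isolated_pages_sketch(pages, nr);
		if (ret != -EBUSY)
			break;
		if (attempt < max_retries)
			migrate(pages, nr);
	}
	return ret;
}
```

The sketch only checks the range; a real implementation would have to make sure nothing has been torn down before the point where -EBUSY can still be returned, which is presumably what the "cannot do rollback" comment is worried about.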

> But looking through the code, I don't see why we can't retry it.
> Anyway, that's another story which isn't related to this patch.
> 

True.

> > 
> > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > 
> > At no point in the changelog do you actually say what the patch does :/
> 
> Argh, I will do that.
> 
> > 
> > > ---
> > >  mm/page_isolation.c |    5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > index acf65a7..4699d1f 100644
> > > --- a/mm/page_isolation.c
> > > +++ b/mm/page_isolation.c
> > > @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
> > >  			continue;
> > >  		}
> > >  		page = pfn_to_page(pfn);
> > > -		if (PageBuddy(page))
> > > +		if (PageBuddy(page)) {
> > > +			if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> > > +				break;
> > >  			pfn += 1 << page_order(page);
> > > +		}
> > 
> > It is possible the page is moved to the MIGRATE_ISOLATE list between when
> > the page was freed to the buddy allocator and this check was made. The
> > page->index information is stale and the impact is that the hotplug
> > operation fails when it could have succeeded. That said, I think it is a
> > very unlikely race that will never happen in practice.
> 
> I understand you mean move_freepages which I have missed. Right?

Yes.

> Then, I will fix it, too.
> 
> > 
> > More importantly, the effect of this path is that EBUSY gets bubbled all
> > the way up and the hotplug operation fails. This is fine but as the page
> > is free at the time this problem is detected you also have the option
> > of moving the PageBuddy page to the MIGRATE_ISOLATE list at this time
> > if you take the zone lock. This will mean you need to change the name of
> > test_pages_isolated() of course.
> 
> Sorry, I can't get your point. Could you elaborate it more?

You detect a PageBuddy page but it's on the wrong list. Instead of returning
and failing memory-hotremove, move the free page to the correct list at
the time it is detected.
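
Roughly, as a userspace sketch under simplifying assumptions (per-list block counters instead of real free lists, a `locked` flag instead of zone->lock, and the hypothetical name `test_and_fix_pages_isolated_sketch`):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (illustration only). */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

struct page {
	bool buddy;		/* stands in for PageBuddy(page) */
	unsigned int order;	/* stands in for page_order(page) */
	int migratetype;	/* free list the block currently sits on */
};

struct zone {
	int locked;				/* stands in for zone->lock */
	unsigned long nr_blocks[MIGRATE_TYPES];	/* free blocks per list */
};

/*
 * Walk the range under the zone lock. A free block found on the wrong
 * list is moved to MIGRATE_ISOLATE on the spot instead of failing the
 * whole operation; only a page that is really in use yields -EBUSY.
 */
static int test_and_fix_pages_isolated_sketch(struct zone *zone,
					      struct page *pages,
					      unsigned long nr)
{
	unsigned long i = 0;
	int ret = 0;

	zone->locked = 1;	/* spin_lock_irqsave(&zone->lock, flags) */
	while (i < nr) {
		struct page *page = &pages[i];

		if (!page->buddy) {
			ret = -EBUSY;
			break;
		}
		if (page->migratetype != MIGRATE_ISOLATE) {
			/* The kernel would list_move() the block here. */
			zone->nr_blocks[page->migratetype]--;
			zone->nr_blocks[MIGRATE_ISOLATE]++;
			page->migratetype = MIGRATE_ISOLATE;
		}
		i += 1UL << page->order;
	}
	zone->locked = 0;	/* spin_unlock_irqrestore(&zone->lock, flags) */
	return ret;
}
```

Because the function now modifies the free lists, "test" no longer describes it, hence the need for a new name.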

> Is it related to this patch?

No, it's not important and was a suggestion on how it could be made
better. However, retrying hot-remove would be even better again. I'm not
suggesting it be done as part of this series.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-06  9:24       ` Mel Gorman
@ 2012-09-06 23:32         ` Minchan Kim
  0 siblings, 0 replies; 15+ messages in thread
From: Minchan Kim @ 2012-09-06 23:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	linux-mm, linux-kernel

On Thu, Sep 06, 2012 at 10:24:24AM +0100, Mel Gorman wrote:
> On Thu, Sep 06, 2012 at 01:49:03PM +0900, Minchan Kim wrote:
> > > > __offline_isolated_pages
> > > > /*
> > > >  * BUG_ON hit or offline page
> > > >  * which is used by someone
> > > >  */
> > > > BUG_ON(!PageBuddy(page A));
> > > > 
> > > 
> > > offline_page calling BUG_ON because someone allocated the page is
> > > ridiculous. I did not spot where that check is but it should be changed. The
> > > correct action is to retry the isolation.
> > 
> > It is in __offline_isolated_pages.
> > 
> > ..
> >         while (pfn < end_pfn) {
> >                 if (!pfn_valid(pfn)) {
> >                         pfn++;
> >                         continue;
> >                 }    
> >                 page = pfn_to_page(pfn);
> >                 BUG_ON(page_count(page));
> >                 BUG_ON(!PageBuddy(page)); <---- HERE
> >                 order = page_order(page);
> > ...
> > 
> > The comment on offline_isolated_pages says the following:
> > 
> >         We cannot do rollback at this point
> > 
> > So if the comment is true, BUG_ON does make sense to me.
> 
> It's massive overkill. I see no reason why it cannot return EBUSY all the
> way back up to offline_pages() and retry with the migration step.  It would
> both remove that BUG_ON and improve reliability of memory hot-remove.
> 
> > But looking through the code, I don't see why we can't retry it.
> > Anyway, that's another story which isn't related to this patch.
> > 
> 
> True.
> 
> > > 
> > > > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > > 
> > > At no point in the changelog do you actually say what the patch does :/
> > 
> > Argh, I will do that.
> > 
> > > 
> > > > ---
> > > >  mm/page_isolation.c |    5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> > > > index acf65a7..4699d1f 100644
> > > > --- a/mm/page_isolation.c
> > > > +++ b/mm/page_isolation.c
> > > > @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
> > > >  			continue;
> > > >  		}
> > > >  		page = pfn_to_page(pfn);
> > > > -		if (PageBuddy(page))
> > > > +		if (PageBuddy(page)) {
> > > > +			if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> > > > +				break;
> > > >  			pfn += 1 << page_order(page);
> > > > +		}
> > > 
> > > It is possible the page is moved to the MIGRATE_ISOLATE list between when
> > > the page was freed to the buddy allocator and this check was made. The
> > > page->index information is stale and the impact is that the hotplug
> > > operation fails when it could have succeeded. That said, I think it is a
> > > very unlikely race that will never happen in practice.
> > 
> > I understand you mean move_freepages which I have missed. Right?
> 
> Yes.
> 
> > Then, I will fix it, too.
> > 
> > > 
> > > More importantly, the effect of this path is that EBUSY gets bubbled all
> > > the way up and the hotplug operation fails. This is fine but as the page
> > > is free at the time this problem is detected you also have the option
> > > of moving the PageBuddy page to the MIGRATE_ISOLATE list at this time
> > > if you take the zone lock. This will mean you need to change the name of
> > > test_pages_isolated() of course.
> > 
> > Sorry, I can't get your point. Could you elaborate it more?
> 
> You detect a PageBuddy page but it's on the wrong list. Instead of returning
> and failing memory-hotremove, move the free page to the correct list at
> the time it is detected.

Good idea.

> 
> > Is it related to this patch?
> 
> No, it's not important and was a suggestion on how it could be made
> better. However, retrying hot-remove would be even better again. I'm not
> suggesting it be done as part of this series.

Mel, Thanks for your review.

-- 
Kind regards,
Minchan Kim


* Re: [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation
  2012-09-05  7:26 ` [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation Minchan Kim
  2012-09-05  9:40   ` Mel Gorman
@ 2012-09-07  6:26   ` jencce zhou
  1 sibling, 0 replies; 15+ messages in thread
From: jencce zhou @ 2012-09-07  6:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Kamezawa Hiroyuki, Yasuaki Ishimatsu, Xishi Qiu,
	Mel Gorman, linux-mm, linux-kernel

2012/9/5 Minchan Kim <minchan@kernel.org>:
> As shown below, memory hotplug has a race between page isolation
> and page allocation, so it can hit the BUG_ON in __offline_isolated_pages.
>
>         CPU A                                   CPU B
>
> start_isolate_page_range
> set_migratetype_isolate
> spin_lock_irqsave(zone->lock)
>
>                                 free_hot_cold_page(Page A)
>                                 /* without zone->lock */
>                                 migratetype = get_pageblock_migratetype(Page A);
>                                 /*
>                                  * Page could be moved into MIGRATE_MOVABLE
>                                  * of per_cpu_pages
>                                  */
>                                 list_add_tail(&page->lru, &pcp->lists[migratetype]);
>
> set_pageblock_isolate
here
> move_freepages_block
> drain_all_pages
>
>                                 /* Page A could be in MIGRATE_MOVABLE of free_list. */
             Why? Shouldn't it have been moved to the MIGRATE_ISOLATE list?
>
> check_pages_isolated
> __test_page_isolated_in_pageblock
> /*
>  * We can't catch freed page which
>  * is free_list[MIGRATE_MOVABLE]
>  */
> if (PageBuddy(page A))
>         pfn += 1 << page_order(page A);
>
>                                 /* So, Page A could be allocated */
>
> __offline_isolated_pages
> /*
>  * BUG_ON hit or offline page
>  * which is used by someone
>  */
> BUG_ON(!PageBuddy(page A));
>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  mm/page_isolation.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index acf65a7..4699d1f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn)
>                         continue;
>                 }
>                 page = pfn_to_page(pfn);
> -               if (PageBuddy(page))
> +               if (PageBuddy(page)) {
> +                       if (get_page_migratetype(page) != MIGRATE_ISOLATE)
> +                               break;
>                         pfn += 1 << page_order(page);
> +               }
>                 else if (page_count(page) == 0 &&
>                                 get_page_migratetype(page) == MIGRATE_ISOLATE)
>                         pfn += 1;
> --
> 1.7.9.5
>



Thread overview: 15+ messages
2012-09-05  7:25 [PATCH 0/3] memory-hotplug: handle page race between allocation and isolation Minchan Kim
2012-09-05  7:26 ` [PATCH 1/3] mm: use get_page_migratetype instead of page_private Minchan Kim
2012-09-05  9:09   ` Mel Gorman
2012-09-06  2:17     ` Minchan Kim
2012-09-06  2:02   ` Kamezawa Hiroyuki
2012-09-06  2:19     ` Minchan Kim
2012-09-05  7:26 ` [PATCH 2/3] mm: remain migratetype in freed page Minchan Kim
2012-09-05  9:25   ` Mel Gorman
2012-09-06  2:28     ` Minchan Kim
2012-09-05  7:26 ` [PATCH 3/3] memory-hotplug: bug fix race between isolation and allocation Minchan Kim
2012-09-05  9:40   ` Mel Gorman
2012-09-06  4:49     ` Minchan Kim
2012-09-06  9:24       ` Mel Gorman
2012-09-06 23:32         ` Minchan Kim
2012-09-07  6:26   ` jencce zhou
