linux-kernel.vger.kernel.org archive mirror
* [PATCH V3 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages
@ 2013-02-21 11:01 Lin Feng
  2013-02-21 11:01 ` [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
  2013-02-21 11:01 ` [PATCH V3 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
  0 siblings, 2 replies; 6+ messages in thread
From: Lin Feng @ 2013-02-21 11:01 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, tangchen, guz.fnst, jiang.liu,
	zab, jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel,
	Lin Feng

Currently get_user_pages() always tries to allocate pages from the movable
zone. As discussed in the thread https://lkml.org/lkml/2012/11/29/69, some
users of get_user_pages() can easily end up pinning user pages for a long
time (so far we have found that pages pinned as aio ring pages are such a
case), which is fatal for the memory hotplug/remove framework.

So the 1st patch introduces a new library function called
get_user_pages_non_movable() to pin pages only from non-movable zones.
It is a wrapper around get_user_pages() that tries its best to make sure
all pages come from non-movable zones via additional page migration; if
migration fails, it still keeps the base functionality of get_user_pages().

The 2nd patch uses the new function to work around the bug that aio ring
pages pinned by get_user_pages() cannot be migrated. It only takes effect
when the kernel is configured with CONFIG_MEMORY_HOTREMOVE; otherwise it
falls back to the plain get_user_pages().
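
For reference, here is a minimal caller sketch showing how a long-term
pinner is expected to use the new helper. The function and variable names
below are made up for illustration and are not part of this series; when
CONFIG_MEMORY_HOTREMOVE is not set, the static inline fallback in
include/linux/mm.h makes the call behave exactly like get_user_pages():

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/sched.h>

/*
 * Hypothetical caller, for illustration only: pin nr_pages user pages
 * starting at uaddr so that they do not stay in ZONE_MOVABLE.
 */
static int pin_user_buffer(unsigned long uaddr, int nr_pages,
			   struct page **pages)
{
	struct mm_struct *mm = current->mm;
	int pinned, i;

	down_read(&mm->mmap_sem);
	pinned = get_user_pages_non_movable(current, mm, uaddr, nr_pages,
					    1 /* write */, 0 /* force */,
					    pages, NULL);
	up_read(&mm->mmap_sem);

	if (pinned < nr_pages) {
		/* Short pin: release what we did get and report failure. */
		for (i = 0; i < pinned; i++)
			put_page(pages[i]);
		return -EFAULT;
	}

	return 0;
}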

---
ChangeLog v2->v3:
 Patch1:
 - Handle the hugepage and THP migration bug pointed out by Mel.
 - In response to the other review comments suggested by Mel.
 - Updated according to mm-tree.

 Patch2:
 - Updated according to mm-tree.

ChangeLog v1->v2:
 Patch1:
 - Fix the negative return value bug pointed out by Andrew and other
   suggestions pointed out by Andrew and Jeff.

 Patch2:
 - Kill the CONFIG_MEMORY_HOTREMOVE dependence suggested by Jeff.
---
Lin Feng (2):
  mm: hotplug: implement non-movable version of get_user_pages() called
    get_user_pages_non_movable()
  fs/aio.c: use get_user_pages_non_movable() to pin ring pages when
    support memory hotremove

 fs/aio.c               |    4 +-
 include/linux/mm.h     |   14 ++++++
 include/linux/mmzone.h |    4 ++
 mm/memory.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_isolation.c    |    8 ++++
 5 files changed, 131 insertions(+), 2 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-21 11:01 [PATCH V3 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
@ 2013-02-21 11:01 ` Lin Feng
  2013-02-26  9:10   ` Lin Feng
  2013-02-21 11:01 ` [PATCH V3 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
  1 sibling, 1 reply; 6+ messages in thread
From: Lin Feng @ 2013-02-21 11:01 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, tangchen, guz.fnst, jiang.liu,
	zab, jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel,
	Lin Feng

get_user_pages() always tries to allocate pages from the movable zone, which
is not reliable for the memory hotremove framework in some cases.

This patch introduces a new library function called
get_user_pages_non_movable() to pin pages only from non-movable zones.
It is a wrapper around get_user_pages() that makes sure all pages come from
non-movable zones via additional page migration; if migration fails it will
at least keep the base functionality of get_user_pages().

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Zach Brown <zab@redhat.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 include/linux/mm.h     |   14 ++++++
 include/linux/mmzone.h |    4 ++
 mm/memory.c            |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_isolation.c    |    8 ++++
 4 files changed, 129 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5625c1c..737dc39 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1025,6 +1025,20 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		    struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas);
+#else
+static inline
+int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+				vmas);
+}
+#endif
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ab20a60..c31007e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -851,6 +851,10 @@ static inline int is_normal_idx(enum zone_type idx)
 	return (idx == ZONE_NORMAL);
 }
 
+static inline int zone_is_movable(struct zone *zone)
+{
+	return zone_idx(zone) == ZONE_MOVABLE;
+}
 /**
  * is_highmem - helper function to quickly check if a struct zone is a 
  *              highmem zone or not.  This is an attempt to keep references
diff --git a/mm/memory.c b/mm/memory.c
index 16ca5d0..83db7dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -58,6 +58,8 @@
 #include <linux/elf.h>
 #include <linux/gfp.h>
 #include <linux/migrate.h>
+#include <linux/page-isolation.h>
+#include <linux/mm_inline.h>
 #include <linux/string.h>
 
 #include <asm/io.h>
@@ -2014,6 +2016,107 @@ long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/**
+ * It is a wrapper around get_user_pages() that makes sure all pages come
+ * from non-movable zones via additional page migration.  It is designed
+ * for the memory hotremove framework.
+ *
+ * Currently get_user_pages() always tries to allocate pages from the
+ * movable zone, and some users of get_user_pages() can easily pin user
+ * pages for a long time (so far pages pinned as aio ring pages are such
+ * a case), which is fatal for the memory hotremove framework.
+ *
+ * This function first calls get_user_pages() to get the candidate pages,
+ * then checks that all of them come from non-movable zones.  Pages found in
+ * a movable zone are migrated to a non-movable zone and the lookup retried,
+ * at most once.  If migration fails, it keeps the base functionality of
+ * get_user_pages() and issues a WARN message for memory hot-remove people.
+ *
+ * FIXME: the non-movable version of GUP is not yet supported for hugepages.
+ */
+int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	int ret, i, tried = 0;
+	bool isolate_err, migrate_prepped;
+	LIST_HEAD(pagelist);
+
+retry:
+	BUG_ON(tried == 2);
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+				vmas);
+	/* No ZONE_MOVABLE populated, all pages are from non movable zone */
+	if (movable_zone == ZONE_MOVABLE || ret <= 0)
+		return ret;
+
+	isolate_err = false;
+	migrate_prepped = false;
+
+	for (i = 0; i < ret; i++) {
+		if (zone_is_movable(page_zone(pages[i]))) {
+			/* Fixme: improve for hugepage non movable support */
+			if (PageHuge(pages[i])) {
+				WARN_ONCE(1, "Non movable GUP for hugepages "
+					"has not been implemented yet, it may "
+					"lead to memory hot-remove failure.\n");
+				continue;
+			}
+
+			/* The head page of a hugepage or THP covers its tail pages */
+			if (PageTail(pages[i]) && (page_count(pages[i]) == 1))
+				continue;
+
+			if (!migrate_prepped) {
+				BUG_ON(migrate_prep());
+				migrate_prepped = true;
+			}
+
+			/* Fixme: isolate_lru_page() takes the LRU lock every
+			 * time, batching the lock could avoid potential lock
+			 * contention problems. -Mel Gorman
+			 */
+			if (!isolate_lru_page(pages[i])) {
+				inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
+						 page_is_file_cache(pages[i]));
+				list_add(&pages[i]->lru, &pagelist);
+			} else {
+				isolate_err = true;
+				break;
+			}
+		}
+	}
+
+	/* All pages are non movable, we are done :) */
+	if (i == ret && list_empty(&pagelist))
+		return ret;
+
+	/* Undo the effects of former get_user_pages(), ready for another try */
+	release_pages(pages, ret, 1);
+
+	if (!isolate_err) {
+		ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
+					MIGRATE_SYNC, MR_SYSCALL);
+		/* Steal pages from non-movable zone successfully? */
+		if (!ret) {
+			tried++;
+			goto retry;
+		}
+	}
+
+	putback_lru_pages(&pagelist);
+	/* Migration failed; to keep at least the base functionality of
+	 * get_user_pages(), pin the pages again, but issue a WARN to remind
+	 * memory hot-remove people.  This is a trade-off.
+	 */
+	WARN_ONCE(1, "Non movable zone migration failed, "
+		"it may lead to memory hot-remove failure.\n");
+	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+				vmas);
+}
+EXPORT_SYMBOL(get_user_pages_non_movable);
+#endif
 /**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 383bdbb..7823ea5 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -247,6 +247,9 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	return ret ? 0 : -EBUSY;
 }
 
+/**
+ * @private: 0 means the page can be allocated from a movable zone, otherwise forbidden
+ */
 struct page *alloc_migrate_target(struct page *page, unsigned long private,
 				  int **resultp)
 {
@@ -254,6 +257,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
 
 	if (PageHighMem(page))
 		gfp_mask |= __GFP_HIGHMEM;
+#if defined(CONFIG_MEMORY_HOTREMOVE) && defined(CONFIG_HIGHMEM)
+	BUILD_BUG_ON(1);
+#endif
+	if (unlikely(private != 0))
+		gfp_mask &= ~__GFP_HIGHMEM;
 
 	return alloc_page(gfp_mask);
 }
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V3 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove
  2013-02-21 11:01 [PATCH V3 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
  2013-02-21 11:01 ` [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
@ 2013-02-21 11:01 ` Lin Feng
  1 sibling, 0 replies; 6+ messages in thread
From: Lin Feng @ 2013-02-21 11:01 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, tangchen, guz.fnst, jiang.liu,
	zab, jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel,
	Lin Feng

This patch uses the new function to work around the bug that aio ring pages
pinned by get_user_pages() cannot be migrated. It only takes effect when the
kernel is configured with CONFIG_MEMORY_HOTREMOVE; otherwise it falls back
to the plain get_user_pages().

Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Zach Brown <zab@redhat.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 fs/aio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2512232..193e145 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -210,8 +210,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	}
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
-	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
-				       1, 0, ctx->ring_pages, NULL);
+	ctx->nr_pages = get_user_pages_non_movable(current, mm, ctx->mmap_base,
+					nr_pages, 1, 0, ctx->ring_pages, NULL);
 	up_write(&mm->mmap_sem);
 
 	if (unlikely(ctx->nr_pages != nr_pages)) {
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-21 11:01 ` [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
@ 2013-02-26  9:10   ` Lin Feng
  2013-03-06  7:48     ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 6+ messages in thread
From: Lin Feng @ 2013-02-26  9:10 UTC (permalink / raw)
  To: akpm, mgorman
  Cc: Lin Feng, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	minchan, riel, rientjes, isimatu.yasuaki, wency, laijs, tangchen,
	guz.fnst, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel

Hi Andrew, Mel and other guys,

How about this V3 patch, any comments?

thanks,
linfeng

On 02/21/2013 07:01 PM, Lin Feng wrote:
> [full patch quoted here; snipped]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-26  9:10   ` Lin Feng
@ 2013-03-06  7:48     ` Yasuaki Ishimatsu
  2013-03-06 11:29       ` Lin Feng
  0 siblings, 1 reply; 6+ messages in thread
From: Yasuaki Ishimatsu @ 2013-03-06  7:48 UTC (permalink / raw)
  To: Lin Feng
  Cc: akpm, mgorman, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	minchan, riel, rientjes, wency, laijs, tangchen, guz.fnst,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, m.szyprowski

Hi Lin,

IMHO, the current implementation depends on luck. So even if the system
has plenty of non-movable memory, get_user_pages_non_movable() may not
allocate non-movable memory.

At the following thread, Marek Szyprowski implemented a similar feature
which allocates non-movable memory via get_user_pages():

https://lkml.org/lkml/2013/3/5/44

I think Marek's way is better.

Thanks,
Yasuaki Ishimatsu

2013/02/26 18:10, Lin Feng wrote:
> Hi Andrew, Mel and other guys,
>
> How about this V3 patch, any comments?
>
> thanks,
> linfeng
>
> On 02/21/2013 07:01 PM, Lin Feng wrote:
>> [full patch quoted here; snipped]


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-03-06  7:48     ` Yasuaki Ishimatsu
@ 2013-03-06 11:29       ` Lin Feng
  0 siblings, 0 replies; 6+ messages in thread
From: Lin Feng @ 2013-03-06 11:29 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: akpm, mgorman, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	minchan, riel, rientjes, wency, laijs, tangchen, guz.fnst,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, m.szyprowski

Hi Yasuaki,

On 03/06/2013 03:48 PM, Yasuaki Ishimatsu wrote:
> Hi Lin,
> 
> IMHO, the current implementation depends on luck. So even if the system
> has plenty of non-movable memory, get_user_pages_non_movable() may not
> allocate non-movable memory.

Sorry, I don't quite understand: since the to-be-pinned pages have never
been pinned before, the probability that the migration succeeds is quite
high; it does not just depend on luck.

> 
> At the following thread, Marek Szyprowski implemented a similar feature
> which allocates non-movable memory via get_user_pages():
> 
> https://lkml.org/lkml/2013/3/5/44
> 
> I think Marek's way is better.

I think neither of the two approaches to get_user_pages() is that great.
One difference I can see is that Marek's patchset adds a new flag to direct
the page allocation, and I'm not sure that approach is proper since it
touches the core page allocation code.

Besides, with Marek's version we would also have to patch every call site
if we want to get around migration failures caused by long-pinned GUP
pages, unless we want GUP to always take all pages from non-movable zones.

As Mel suggested before, adding a migrate callback for such pages may be
another approach, but then we would also have to distinguish long-time pins
from short-time pins, which is not known beforehand.
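
Just to make the idea concrete, here is a purely hypothetical sketch of
what such a per-pinner migrate callback could look like. None of these
structures or functions exist in the kernel today; they are only meant to
illustrate the discussion:

#include <linux/list.h>
#include <linux/mm_types.h>

/* Hypothetical interface, for discussion only. */
struct pinned_range_migrator {
	struct list_head	list;
	unsigned long		start;		/* first pinned user address */
	int			nr_pages;
	/*
	 * Called by the hotplug/CMA path after it has migrated the pinned
	 * pages, so the owner (e.g. aio) can switch to the new pages and
	 * drop its references on the old ones.
	 */
	int (*remap)(struct pinned_range_migrator *m, struct page **new_pages);
};

int register_pinned_range_migrator(struct pinned_range_migrator *m);
void unregister_pinned_range_migrator(struct pinned_range_migrator *m);

With something like this, the hotplug/CMA path would know whom to call back
for a long-time pinned page, but it still leaves the question below open.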

So what we could do to improve the situation is to migrate the GUP-pinned
pages at the time memory hotplug or CMA actually performs the migration,
so that nobody cares whether a page is long-time pinned or not :)

But is it feasible to migrate a page while it is pinned?
  
thanks,
linfeng

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread

Thread overview: 6+ messages
2013-02-21 11:01 [PATCH V3 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
2013-02-21 11:01 ` [PATCH V3 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
2013-02-26  9:10   ` Lin Feng
2013-03-06  7:48     ` Yasuaki Ishimatsu
2013-03-06 11:29       ` Lin Feng
2013-02-21 11:01 ` [PATCH V3 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
