linux-kernel.vger.kernel.org archive mirror
* [PATCH V2 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages
@ 2013-02-05  9:21 Lin Feng
  2013-02-05  9:21 ` [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
  2013-02-05  9:21 ` [PATCH V2 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
  0 siblings, 2 replies; 37+ messages in thread
From: Lin Feng @ 2013-02-05  9:21 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Lin Feng

Currently get_user_pages() always tries to allocate pages from the movable
zone. As discussed in the thread https://lkml.org/lkml/2012/11/29/69, in some
cases users of get_user_pages() can easily pin user pages for a long time (so
far we have found that pages pinned as aio ring pages are such a case), which
is fatal for the memory hotplug/remove framework.

So the 1st patch introduces a new library function called
get_user_pages_non_movable() to pin pages only from the non-movable zones in
memory. It is a wrapper of get_user_pages(), but it makes sure that all pages
come from non-movable zones via additional page migration.

The 2nd patch uses the new function to work around the bug that aio ring
pages pinned by get_user_pages() cannot be migrated. The migration logic only
takes effect when the kernel is configured with CONFIG_MEMORY_HOTREMOVE;
otherwise it falls back to the plain get_user_pages().
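For readers following along, the retry-once flow described above (pin, check
each page's zone, migrate any movable-zone pages, then pin again) can be
sketched as a plain userspace C simulation. sim_gup(), sim_migrate() and
struct sim_page below are invented stand-ins for the kernel primitives, not
real kernel APIs:

```c
#include <assert.h>

/* Hypothetical stand-in for a page; only tracks which zone it sits in. */
struct sim_page { int movable; };

/* "Pin" nr pages out of a fake address space; returns the number pinned. */
static int sim_gup(struct sim_page *space, int nr, struct sim_page **pages)
{
	int i;

	for (i = 0; i < nr; i++)
		pages[i] = &space[i];
	return nr;
}

/* "Migrate" one page out of the movable zone; returns 0 on success. */
static int sim_migrate(struct sim_page *p)
{
	p->movable = 0;
	return 0;
}

/*
 * Mirrors the patch's control flow: pin, scan for movable-zone pages,
 * migrate them to non-movable zones, then retry the pin at most once.
 */
static int sim_gup_non_movable(struct sim_page *space, int nr,
			       struct sim_page **pages)
{
	int ret, i, stray, retried = 0;

retry:
	ret = sim_gup(space, nr, pages);
	if (ret <= 0)
		return ret;

	stray = 0;
	for (i = 0; i < ret; i++)
		if (pages[i]->movable)
			stray++;
	if (stray == 0)
		return ret;		/* all pages non-movable: done */
	if (retried)
		return 0;		/* still movable after one retry: give up */

	for (i = 0; i < ret; i++)
		if (pages[i]->movable && sim_migrate(pages[i]) != 0)
			return 0;	/* migration failed: pin nothing */
	retried = 1;
	goto retry;
}
```

In the real patch the migration step is migrate_pages() on an isolated LRU
list, and the failure paths must also undo the pin with release_pages().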

---
ChangeLog v1->v2:
 Patch1:
 - Fix the negative return value bug pointed out by Andrew and other
   suggestions pointed out by Andrew and Jeff.

 Patch2:
 - Kill the CONFIG_MEMORY_HOTREMOVE dependence suggested by Jeff.
---
Lin Feng (2):
  mm: hotplug: implement non-movable version of get_user_pages() called
    get_user_pages_non_movable()
  fs/aio.c: use get_user_pages_non_movable() to pin ring pages when
    support memory hotremove

 fs/aio.c               |    4 +-
 include/linux/mm.h     |    3 ++
 include/linux/mmzone.h |    4 ++
 mm/memory.c            |   83 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_isolation.c    |    5 +++
 5 files changed, 97 insertions(+), 2 deletions(-)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-05  9:21 [PATCH V2 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
@ 2013-02-05  9:21 ` Lin Feng
  2013-02-05 12:01   ` Mel Gorman
       [not found]   ` <20130220113757.GA10124@hacker.(null)>
  2013-02-05  9:21 ` [PATCH V2 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
  1 sibling, 2 replies; 37+ messages in thread
From: Lin Feng @ 2013-02-05  9:21 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Lin Feng

get_user_pages() always tries to allocate pages from the movable zone, which
is not reliable for the memory hotremove framework in some cases.

This patch introduces a new library function called
get_user_pages_non_movable() to pin pages only from the non-movable zones in
memory. It is a wrapper of get_user_pages(), but it makes sure that all pages
come from non-movable zones via additional page migration.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Zach Brown <zab@redhat.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 include/linux/mm.h     |    3 ++
 include/linux/mmzone.h |    4 ++
 mm/memory.c            |   83 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_isolation.c    |    5 +++
 4 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 12f5a09..3ff9eba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1049,6 +1049,9 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			struct page **pages, struct vm_area_struct **vmas);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
+int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas);
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e25ab6f..1506351 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -841,6 +841,10 @@ static inline int is_normal_idx(enum zone_type idx)
 	return (idx == ZONE_NORMAL);
 }
 
+static inline int zone_is_movable(struct zone *zone)
+{
+	return zone_idx(zone) == ZONE_MOVABLE;
+}
 /**
  * is_highmem - helper function to quickly check if a struct zone is a 
  *              highmem zone or not.  This is an attempt to keep references
diff --git a/mm/memory.c b/mm/memory.c
index bb1369f..ede53cc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -58,6 +58,8 @@
 #include <linux/elf.h>
 #include <linux/gfp.h>
 #include <linux/migrate.h>
+#include <linux/page-isolation.h>
+#include <linux/mm_inline.h>
 #include <linux/string.h>
 
 #include <asm/io.h>
@@ -1995,6 +1997,87 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/*
+ * It is a wrapper of get_user_pages(), but it makes sure that all pages
+ * come from non-movable zones via additional page migration. It is
+ * designed for the memory hotremove framework.
+ *
+ * Currently get_user_pages() always tries to allocate pages from the
+ * movable zone. In some cases its users can easily pin user pages for a
+ * long time (so far pages pinned as aio ring pages are such a case),
+ * which is fatal for the memory hotremove framework.
+ *
+ * This function first calls get_user_pages() to get the candidate pages,
+ * then checks that all pages come from non-movable zones; otherwise it
+ * migrates them to non-movable zones and retries. It retries at most once.
+ */
+int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	int ret, i, isolate_err, migrate_pre_flag;
+	LIST_HEAD(pagelist);
+
+retry:
+	ret = get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+				vmas);
+	if (ret <= 0)
+		return ret;
+
+	isolate_err = 0;
+	migrate_pre_flag = 0;
+
+	for (i = 0; i < ret; i++) {
+		if (zone_is_movable(page_zone(pages[i]))) {
+			if (!migrate_pre_flag) {
+				if (migrate_prep())
+					goto release_page;
+				migrate_pre_flag = 1;
+			}
+
+			if (!isolate_lru_page(pages[i])) {
+				inc_zone_page_state(pages[i], NR_ISOLATED_ANON +
+						 page_is_file_cache(pages[i]));
+				list_add_tail(&pages[i]->lru, &pagelist);
+			} else {
+				isolate_err = 1;
+				goto release_page;
+			}
+		}
+	}
+
+	/* All pages are non-movable, we are done :) */
+	if (i == ret && list_empty(&pagelist))
+		return ret;
+
+release_page:
+	/* Undo the effects of the earlier get_user_pages(): nothing stays pinned */
+	release_pages(pages, ret, 1);
+
+	if (migrate_pre_flag && !isolate_err) {
+		ret = migrate_pages(&pagelist, alloc_migrate_target, 1,
+					false, MIGRATE_SYNC, MR_SYSCALL);
+		/* Steal pages from non-movable zone successfully? */
+		if (!ret)
+			goto retry;
+	}
+
+	putback_lru_pages(&pagelist);
+	/* Migration failed: no pages pinned, tell the caller the truth */
+	return 0;
+}
+#else
+inline int get_user_pages_non_movable(struct task_struct *tsk, struct mm_struct *mm,
+		unsigned long start, int nr_pages, int write, int force,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	return get_user_pages(tsk, mm, start, nr_pages, write, force, pages,
+				vmas);
+}
+#endif
+EXPORT_SYMBOL(get_user_pages_non_movable);
+
 /**
  * get_dump_page() - pin user page in memory while writing it to core dump
  * @addr: user address
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 383bdbb..1b7bd17 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -247,6 +247,9 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	return ret ? 0 : -EBUSY;
 }
 
+/**
+ * @private: 0 means the page can be allocated from the movable zone; non-zero forbids it
+ */
 struct page *alloc_migrate_target(struct page *page, unsigned long private,
 				  int **resultp)
 {
@@ -254,6 +257,8 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
 
 	if (PageHighMem(page))
 		gfp_mask |= __GFP_HIGHMEM;
+	if (unlikely(private != 0))
+		gfp_mask &= ~__GFP_MOVABLE;
 
 	return alloc_page(gfp_mask);
 }
-- 
1.7.1



* [PATCH V2 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove
  2013-02-05  9:21 [PATCH V2 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
  2013-02-05  9:21 ` [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
@ 2013-02-05  9:21 ` Lin Feng
  1 sibling, 0 replies; 37+ messages in thread
From: Lin Feng @ 2013-02-05  9:21 UTC (permalink / raw)
  To: akpm, mgorman, bcrl, viro
  Cc: khlebnikov, walken, kamezawa.hiroyu, minchan, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Lin Feng

This patch uses the new function to work around the bug that aio ring pages
pinned by get_user_pages() cannot be migrated. The migration logic only takes
effect when the kernel is configured with CONFIG_MEMORY_HOTREMOVE; otherwise
it falls back to the plain get_user_pages().

Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Zach Brown <zab@redhat.com>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
---
 fs/aio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 71f613c..f7a0d5c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -138,8 +138,8 @@ static int aio_setup_ring(struct kioctx *ctx)
 	}
 
 	dprintk("mmap address: 0x%08lx\n", info->mmap_base);
-	info->nr_pages = get_user_pages(current, ctx->mm,
-					info->mmap_base, nr_pages, 
+	info->nr_pages = get_user_pages_non_movable(current, ctx->mm,
+					info->mmap_base, nr_pages,
 					1, 0, info->ring_pages, NULL);
 	up_write(&ctx->mm->mmap_sem);
 
-- 
1.7.1



* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-05  9:21 ` [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
@ 2013-02-05 12:01   ` Mel Gorman
  2013-02-06  0:42     ` Minchan Kim
       [not found]   ` <20130220113757.GA10124@hacker.(null)>
  1 sibling, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2013-02-05 12:01 UTC (permalink / raw)
  To: Lin Feng
  Cc: akpm, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu, minchan,
	riel, rientjes, isimatu.yasuaki, wency, laijs, jiang.liu, zab,
	jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel

On Tue, Feb 05, 2013 at 05:21:52PM +0800, Lin Feng wrote:
> get_user_pages() always tries to allocate pages from movable zone, which is not
>  reliable to memory hotremove framework in some case.
> 
> This patch introduces a new library function called get_user_pages_non_movable()
>  to pin pages only from zone non-movable in memory.
> It's a wrapper of get_user_pages() but it makes sure that all pages come from
> non-movable zone via additional page migration.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Zach Brown <zab@redhat.com>
> Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>

I already had started the review of V1 before this was sent
unfortunately. However, I think the feedback I gave for V1 is still
valid so I'll wait for comments on that review before digging further.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-05 12:01   ` Mel Gorman
@ 2013-02-06  0:42     ` Minchan Kim
  2013-02-06  0:52       ` Benjamin LaHaise
  2013-02-06  9:56       ` Mel Gorman
  0 siblings, 2 replies; 37+ messages in thread
From: Minchan Kim @ 2013-02-06  0:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Lin Feng, akpm, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	riel, rientjes, isimatu.yasuaki, wency, laijs, jiang.liu, zab,
	jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel

On Tue, Feb 05, 2013 at 12:01:37PM +0000, Mel Gorman wrote:
> On Tue, Feb 05, 2013 at 05:21:52PM +0800, Lin Feng wrote:
> > get_user_pages() always tries to allocate pages from movable zone, which is not
> >  reliable to memory hotremove framework in some case.
> > 
> > This patch introduces a new library function called get_user_pages_non_movable()
> >  to pin pages only from zone non-movable in memory.
> > It's a wrapper of get_user_pages() but it makes sure that all pages come from
> > non-movable zone via additional page migration.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> > Cc: Jeff Moyer <jmoyer@redhat.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Zach Brown <zab@redhat.com>
> > Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> > Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
> 
> I already had started the review of V1 before this was sent
> unfortunately. However, I think the feedback I gave for V1 is still
> valid so I'll wait for comments on that review before digging further.

Mel, Andrew

Sorry for making noise if you have already settled on the direction, but I
have a concern about it. IMHO, we can't expect most MEMORY_HOTPLUG users to
release pinned pages immediately. In addition, MEMORY_HOTPLUG could be used
on embedded systems to reduce power via PASR, and some embedded drivers could
use GUP at any time and anywhere. They can't know in advance whether they
will hold pinned pages for a long time or release them quickly, because it
depends on some event, like a user's response, which is not predetermined at
all.
To work around that, we could add some WARN_ON in the CMA/MEMORY_HOTPLUG
paths for the case where migration fails because of the page count, then
investigate whether the caller is really using GUP and is REALLY the culprit.
If so, yell at them: "Please use GUP_NM instead"?
Yes, it could be done, but it would be a rather troublesome job. It might not
even be triggered during the QE phase, so the trouble wouldn't end until
everybody uses GUP_NM.
Let's consider another case. Some driver pins pages for a very short time, so
it decides to use GUP instead of GUP_NM; but someday some user starts calling
the driver very often, so although each pinning is very short, it could have
a forever-pinning effect. In the end, we would have to change it to GUP_NM
again. IMHO, in the future we will end up converting most GUP users to
GUP_NM once CMA and MEMORY_HOTPLUG are available all over the world.

So, what's wrong if we replace get_user_pages() with
get_user_pages_non_movable() in MEMORY_HOTPLUG/CMA configurations, without
exposing get_user_pages_non_movable()?

I mean this

#ifdef CONFIG_MIGRATE_ISOLATE
int get_user_pages()
{
        return __get_user_pages_non_movable();
}
#else
int get_user_pages()
{
        return old_get_user_pages();
}
#endif

IMHO, get_user_pages() isn't a performance-sensitive function; if a user were
sensitive about it, they should have tried get_user_pages_fast().
THP degradation by increasing MIGRATE_UNMOVABLE? Lin said most GUP users
release the pages quickly, so is it really a problem? Even in embedded we
don't use THP yet, and CMA plus GUP calls would not be too frequent, but a
failure of CMA would be critical.

I'd like to hear opinions.

> 
> -- 
> Mel Gorman
> SUSE Labs
> 

-- 
Kind regards,
Minchan Kim


* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-06  0:42     ` Minchan Kim
@ 2013-02-06  0:52       ` Benjamin LaHaise
  2013-02-06  9:56       ` Mel Gorman
  1 sibling, 0 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2013-02-06  0:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Lin Feng, akpm, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel

On Wed, Feb 06, 2013 at 09:42:34AM +0900, Minchan Kim wrote:
> THP degradation by increasing MIGRATE_UNMOVABLE?
> Lin said most of GUP pages release the page in short so is it really problem?
> Even in embedded, we don't use THP yet but CMA and GUP call would be not too often
> but failing of CMA would be critical.
> 
> I'd like to hear opinions.

If aio was given a callback to migrate the pages on, it could just migrate 
the pages as needed.  There's nothing fundamental preventing that approach.

		-ben
-- 
"Thought is the essence of where you are now."


* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-06  0:42     ` Minchan Kim
  2013-02-06  0:52       ` Benjamin LaHaise
@ 2013-02-06  9:56       ` Mel Gorman
  2013-02-08  2:32         ` Minchan Kim
  2013-05-13  9:11         ` Tang Chen
  1 sibling, 2 replies; 37+ messages in thread
From: Mel Gorman @ 2013-02-06  9:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Lin Feng, akpm, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	riel, rientjes, isimatu.yasuaki, wency, laijs, jiang.liu, zab,
	jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel

On Wed, Feb 06, 2013 at 09:42:34AM +0900, Minchan Kim wrote:
> On Tue, Feb 05, 2013 at 12:01:37PM +0000, Mel Gorman wrote:
> > On Tue, Feb 05, 2013 at 05:21:52PM +0800, Lin Feng wrote:
> > > get_user_pages() always tries to allocate pages from movable zone, which is not
> > >  reliable to memory hotremove framework in some case.
> > > 
> > > This patch introduces a new library function called get_user_pages_non_movable()
> > >  to pin pages only from zone non-movable in memory.
> > > It's a wrapper of get_user_pages() but it makes sure that all pages come from
> > > non-movable zone via additional page migration.
> > > 
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> > > Cc: Jeff Moyer <jmoyer@redhat.com>
> > > Cc: Minchan Kim <minchan@kernel.org>
> > > Cc: Zach Brown <zab@redhat.com>
> > > Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> > > Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > > Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
> > 
> > I already had started the review of V1 before this was sent
> > unfortunately. However, I think the feedback I gave for V1 is still
> > valid so I'll wait for comments on that review before digging further.
> 
> Mel, Andrew
> 
> Sorry for making noise if you already confirmed the direction but I have a concern
> about that.

I haven't confirmed any sort of direction, nor do I determine the
direction for memory hot-remove which I'm only paying vague attention to.
I stated a while ago that I think the use of ZONE_MOVABLE is a bad idea
for "guaranteeing" memory hot-remove and is already going the "wrong"
direction. That's just my opinion.

This patch is about mitigating (but not solving) the problem of long-lived
pins. In the general case, about all I could think of for that is that the
kernel would have to warn the administrator what applications had pinned
the memory and wait for the user to shut them down. To guarantee anything,
it would be necessary for subsystems to implement a callback for migration
to unpin pages, barrier operations until migration completes and pin the
new pfns.

> Because IMHO, we can't expect most of user for MEMORY_HOTPLUG will release
> pinned pages immediately.

Indeed not, but it's not really what this patch is about. This patch is
about moving the pages before they get permanently pinned. It mitigates
the problem but does not solve it because there is no guarantee that the
driver pinning a page will flag it properly.

> In addtion, MEMORY_HOTPLUG could be used for embedded system
> for reducing power by PASR and some drivers in embedded could use GUP anytime and anywhere.
> They can't know in advance they will use pinned pages long time or release in short time
> because it depends on some event like user's response which is very not predetermined.

True. This patch does not solve that problem.

> So for solving it, we can add some WARN_ON in CMA/MEMORY_HOTPLUG part just in case of
> failing migration by page count and then, investigate they are really using GUP and
> it's REALLY a culprit. If so, yell to them "Please use GUP_NM instead"?

Within the context of this patch, that is their main option. Finding who is
holding the pin is a problem. For userspace-pinned buffers it's
straightforward, as rmap will identify what processes are holding the pin
(page->list vmas->mm, look up all tasks until p->mm == mm) and report that.
For driver-related pins, it's not as straightforward. I guess there could be
a callback to give meaningful information on it, but there is no guarantee
that drivers pinning pages will implement it. In that case all you could do
would be to dump page->mapping and punt it to a kernel developer to figure
out the responsible driver. This might be manageable for memory hot-remove,
where there is an administrator, but it may not work at all for embedded
users.

There is the possibility that callbacks could be introduced for
migrate_unpin() and migrate_pin() that takes a list of PFN pairs
(old,new). The unpin callback should release the old PFNs and barrier
against any operations until the migrate_pin() callback is called with
the updated pfns to be repinned. Again it would fully depend on subsystems
implementing it properly.

The callback interface would be more robust but puts a lot more work on
the driver side, where your mileage will vary.
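As a rough illustration only, the unpin/repin callback interface suggested
above might look something like the following userspace C sketch. struct
migrate_pin_ops, the migrate_unpin/migrate_repin members and the toy
subsystem are invented names for illustration, not an existing kernel API:

```c
#include <assert.h>

/* An (old_pfn, new_pfn) pair handed to the subsystem during migration. */
struct pfn_pair {
	unsigned long old_pfn;
	unsigned long new_pfn;
};

/*
 * Hypothetical per-subsystem callbacks: release the old pfns and barrier
 * further operations, then repin once migration has installed new pfns.
 */
struct migrate_pin_ops {
	void *private;
	int (*migrate_unpin)(void *private, const struct pfn_pair *pairs, int nr);
	int (*migrate_repin)(void *private, const struct pfn_pair *pairs, int nr);
};

/* Toy subsystem state standing in for e.g. an aio ring's pinned pages. */
struct toy_subsys {
	unsigned long pinned[8];
	int nr;
	int frozen;	/* "barrier": no operations while set */
};

static int toy_unpin(void *private, const struct pfn_pair *pairs, int nr)
{
	struct toy_subsys *s = private;

	s->frozen = 1;		/* block further operations until repin */
	return 0;
}

static int toy_repin(void *private, const struct pfn_pair *pairs, int nr)
{
	struct toy_subsys *s = private;
	int i;

	for (i = 0; i < nr && i < s->nr; i++)
		if (s->pinned[i] == pairs[i].old_pfn)
			s->pinned[i] = pairs[i].new_pfn;
	s->frozen = 0;		/* migration done, resume */
	return 0;
}

/* What the migration core would do around the actual page copy. */
static int migrate_with_callbacks(struct migrate_pin_ops *ops,
				  const struct pfn_pair *pairs, int nr)
{
	if (ops->migrate_unpin(ops->private, pairs, nr))
		return -1;
	/* ... core copies page contents old_pfn -> new_pfn here ... */
	return ops->migrate_repin(ops->private, pairs, nr);
}
```

The point of the barrier is that the subsystem must not touch the old pfns
between the two callbacks; how each driver implements that freeze is where
the per-driver effort lies.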

> Yes. it could be done but it would be rather trobulesome job. Even it couldn't be triggered
> during QE phase so that trouble doesn't end until all guys uses GUP_NM.
> Let's consider another case. Some driver pin the page in very short time
> so he decide to use GUP instead of GUP_NM but someday, someuser start to use the driver
> very often so although pinning time is very short, it could be forever pinning effect
> if the use calls it very often. In the end, we should change it with GUP_NM, again.
> IMHO, In future, we ends up changing most of GUP user with GUP_NM if CMA and MEMORY_HOTPLUG
> is available all over the world.
> 

Same thing, callbacks to unpin and barrier would handle such a case by
effectively freezing the driver or subsystem responsible for the page.

> So, what's wrong if we replace get_user_pages with get_user_pages_non_movable
> in MEMORY_HOTPLUG/CMA without exposing get_user_pages_non_movable?
> 
> I mean this
> 
> #ifdef CONFIG_MIGRATE_ISOLATE
> int get_user_pages()
> {
>         return __get_user_pages_non_movable();
> }
> #else
> int get_user_pages()
> {
>         return old_get_user_pages();
> }
> #endif
> 

That will migrate everything out of ZONE_MOVABLE every time it's pinned.
One consequence is that direct IO can never use ZONE_MOVABLE on these
systems. It'll create a variation of the lowmem exhaustion problem.

> IMHO, get_user_pages isn't performance sensitive function. If user was sensitive
> about it, he should have tried get_user_pages_fast.

That opens a different can of worms. get_user_pages() is part of the
gup_fast slow path.

> THP degradation by increasing MIGRATE_UNMOVABLE?

The patch should not be converting MIGRATE_MOVABLE requests to
MIGRATE_UNMOVABLE. I covered this in the review of v1.

> Lin said most of GUP pages release the page in short so is it really problem?
> Even in embedded, we don't use THP yet but CMA and GUP call would be not too often
> but failing of CMA would be critical.
> 

To guarantee that CMA can migrate pages pinned by drivers, I think you need
migrate-related callbacks to unpin, barrier the driver until migration
completes, and repin.

I do not know of, or at least have not heard of, anyone working on such a
scheme.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-06  9:56       ` Mel Gorman
@ 2013-02-08  2:32         ` Minchan Kim
  2013-05-13  9:11         ` Tang Chen
  1 sibling, 0 replies; 37+ messages in thread
From: Minchan Kim @ 2013-02-08  2:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Lin Feng, akpm, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	riel, rientjes, isimatu.yasuaki, wency, laijs, jiang.liu, zab,
	jmoyer, linux-mm, linux-aio, linux-fsdevel, linux-kernel

Hi Mel,

On Wed, Feb 06, 2013 at 09:56:17AM +0000, Mel Gorman wrote:
> On Wed, Feb 06, 2013 at 09:42:34AM +0900, Minchan Kim wrote:
> > On Tue, Feb 05, 2013 at 12:01:37PM +0000, Mel Gorman wrote:
> > > On Tue, Feb 05, 2013 at 05:21:52PM +0800, Lin Feng wrote:
> > > > get_user_pages() always tries to allocate pages from movable zone, which is not
> > > >  reliable to memory hotremove framework in some case.
> > > > 
> > > > This patch introduces a new library function called get_user_pages_non_movable()
> > > >  to pin pages only from zone non-movable in memory.
> > > > It's a wrapper of get_user_pages() but it makes sure that all pages come from
> > > > non-movable zone via additional page migration.
> > > > 
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: Mel Gorman <mgorman@suse.de>
> > > > Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > > > Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> > > > Cc: Jeff Moyer <jmoyer@redhat.com>
> > > > Cc: Minchan Kim <minchan@kernel.org>
> > > > Cc: Zach Brown <zab@redhat.com>
> > > > Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
> > > > Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > > > Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
> > > 
> > > I already had started the review of V1 before this was sent
> > > unfortunately. However, I think the feedback I gave for V1 is still
> > > valid so I'll wait for comments on that review before digging further.
> > 
> > Mel, Andrew
> > 
> > Sorry for making noise if you already confirmed the direction but I have a concern
> > about that.
> 
> I haven't confirmed any sort of direction, nor do I determine the
> direction for memory hot-remove which I'm only paying vague attention to.
> I stated a while ago that I think the use of ZONE_MOVABLE is a bad idea
> for "guaranteeing" memory hot-remove and is already going the "wrong"
> direction. That's just my opinion.
> 
> This patch is about mitigating (but not solving) the problem of long-lived
> pins. In the general case, about all I could think of for that is that the

Agreed.

> kernel would have to warn the administrator what applications had pinned
> the memory and wait for the user to shut them down. To guarantee anything,
> it would be necessary for subsystems to implement a callback for migration
> to unpin pages, barrier operations until migration completes and pin the
> new pfns.

It could be applied to a SUBSYSTEM, but it's very hard for every DRIVER
developer, and I doubt we can give them a common template that most driver
developers can reuse.

> 
> > Because IMHO, we can't expect most of user for MEMORY_HOTPLUG will release
> > pinned pages immediately.
> 
> Indeed not, but it's not really what this patch is about. This patch is
> about moving the pages before they get permanently pinned. It mitigates
> the problem but does not solve it because there is no guarantee that the
> driver pinning a page will flag it properly.

True.
And I doubt that what the memory-hotplug guys really want is best effort
rather than a guarantee. Anyway, CMA wants a guarantee, and even low latency,
and I hope this patch lets both memory-hotplug and CMA solve the problem.

> 
> > In addtion, MEMORY_HOTPLUG could be used for embedded system
> > for reducing power by PASR and some drivers in embedded could use GUP anytime and anywhere.
> > They can't know in advance they will use pinned pages long time or release in short time
> > because it depends on some event like user's response which is very not predetermined.
> 
> True. This patch does not solve that problem.
> 
> > So for solving it, we can add some WARN_ON in CMA/MEMORY_HOTPLUG part just in case of
> > failing migration by page count and then, investigate they are really using GUP and
> > it's REALLY a culprit. If so, yell to them "Please use GUP_NM instead"?
> 
> Within the context of this patch, that is their main option. Finding
> who is holding the pin is a problem. For userspace-pinned buffers it's
> straight-forward as rmap will identify what processes are holding the
> pin (page->list vmas->mm, lookup all tasks until p->mm == mm) and report
> that. For driver-related pins, it's not as straight-forward. I guess there

True.

> could be callback to give meaningful information on it but no guarantee
> that drivers pinning pages will implement it. In that case all you could do

Nod.

> was dump page->mapping and punt it at a kernel developer to figure out the
> responsible driver. This might be managable for memory hot-remove where
> there is an administator but may not work at all for embedded users.

Yeah. There are even proprietary modules in embedded, whose source code we
can't see.

> 
> There is the possibility that callbacks could be introduced for
> migrate_unpin() and migrate_pin() that takes a list of PFN pairs
> (old,new). The unpin callback should release the old PFNs and barrier
> against any operations until the migrate_pfn() callback is called with
> the updated pfns to be repinned. Again it would fully depend on subsystems
> implementing it properly.
> 
> The callback interface would be more robust but puts a lot more work on
> the driver side where your milage will vary.

True.

> 
> > Yes, it could be done but it would be a rather troublesome job. It might not even be triggered
> > during the QE phase, so the trouble doesn't end until everyone uses GUP_NM.
> > Let's consider another case. Some driver pins the page for a very short time,
> > so he decides to use GUP instead of GUP_NM, but someday some user starts to use the driver
> > very often, so although the pinning time is very short, it could have a forever-pinning effect
> > if the user calls it very often. In the end, we should change it to GUP_NM, again.
> > IMHO, in the future we end up changing most GUP users to GUP_NM if CMA and MEMORY_HOTPLUG
> > are available all over the world.
> > 
> 
> Same thing, callbacks to unpin and barrier would handle such a case by
> effectively freezing the driver or subsystem responsible for the page.
> 
> > So, what's wrong if we replace get_user_pages with get_user_pages_non_movable
> > in MEMORY_HOTPLUG/CMA without exposing get_user_pages_non_movable?
> > 
> > I mean this
> > 
> > #ifdef CONFIG_MIGRATE_ISOLATE
> > int get_user_pages()
> > {
> >         return __get_user_pages_non_movable();
> > }
> > #else
> > int get_user_pages()
> > {
> >         return old_get_user_pages();
> > }
> > #endif
> > 
> 
> That will migrate everything out of ZONE_MOVABLE every time it's pinned.
> One consequence is that direct IO can never use ZONE_MOVABLE on these
> systems. It'll create a variation of the lowmem exhaustion problem.

For example, say there is a 4G highmem zone and half of it is the movable
zone. In this case, we can use the extra 2G of highmem zone space instead of
lowmem. But I agree it could end up pinning many pages of lowmem, so the
problem would happen. IMHO, it should be a trade-off for using MEMORY-HOTPLUG/CMA?

> 
> > IMHO, get_user_pages isn't a performance-sensitive function. If a user were sensitive
> > about it, they should have tried get_user_pages_fast.
> 
> That opens a different can of worms. get_user_pages is part of the
> gup_fast slowpath.
> 
> > THP degradation by increasing MIGRATE_UNMOVABLE?
> 
> The patch should not be converting MIGRATE_MOVABLE requests to
> MIGRATE_UNMOVABLE. I covered this in the review of v1.

I guess the memory-hotplug guys want to use GUP_NM for long-time pin users.
So doesn't it make sense to migrate the page to MIGRATE_UNMOVABLE?
But I'm not sure about GUP_NM's semantics.

> 
> > Lin said most GUP users release the page quickly, so is it really a problem?
> > Even in embedded we don't use THP yet, and CMA and GUP calls would not be too frequent,
> > but a failure of CMA would be critical.
> > 
> 
> To guarantee CMA can migrate pages pinned by drivers I think you need
> migrate-related callbacks to unpin, barrier the driver until migration
> completes and repin.

I agree it's an ideal solution when we consider the future, but as you already
mentioned, it's not easy for all drivers.
In fact, I don't want to insist on my opinion for CMA because I guess the CMA
design was not good from the beginning.

I just posted my concern and wanted to discuss how to solve the problem, but
if there is no plain solution now, let me pass the decision to the maintainers.

Thanks for sharing your opinion, Mel!

> 
> I do not know, or at least have no heard, of anyone working on such a
> scheme.
> 
> -- 
> Mel Gorman
> SUSE Labs
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
       [not found]   ` <20130220113757.GA10124@hacker.(null)>
@ 2013-02-20 12:39     ` Lin Feng
  0 siblings, 0 replies; 37+ messages in thread
From: Lin Feng @ 2013-02-20 12:39 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: akpm, mgorman, bcrl, viro, khlebnikov, walken, kamezawa.hiroyu,
	minchan, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel

Hi  Wanpeng,

On 02/20/2013 07:37 PM, Wanpeng Li wrote:
>> + * This function first calls get_user_pages() to get the candidate pages, and
>> >+ * then check to ensure all pages are from non movable zone. Otherwise migrate
> How about "Otherwise migrate candidate pages which have already been 
> isolated to non movable zone."?
> 

That is just what the code does; I feel it's too detailed to be a proper comment :(
Do we have to comment it in that much detail?

thanks,
linfeng

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-02-06  9:56       ` Mel Gorman
  2013-02-08  2:32         ` Minchan Kim
@ 2013-05-13  9:11         ` Tang Chen
  2013-05-13  9:19           ` Mel Gorman
  1 sibling, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-13  9:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Lin Feng, akpm, bcrl, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

Hi Mel,

On 02/06/2013 05:56 PM, Mel Gorman wrote:
>
> There is the possibility that callbacks could be introduced for
> migrate_unpin() and migrate_pin() that takes a list of PFN pairs
> (old,new). The unpin callback should release the old PFNs and barrier
> against any operations until the migrate_pfn() callback is called with
> the updated pfns to be repinned. Again it would fully depend on subsystems
> implementing it properly.
>
> The callback interface would be more robust but puts a lot more work on
> the driver side where your milage will vary.
>

I'm very interested in the "callback" way you said.

For the memory hot-remove case, the aio pages are pinned in memory, so the
pages cannot be offlined and, furthermore, cannot be removed.

IIUC, you mean implement migrate_unpin() and migrate_pin() callbacks in the aio
subsystem, and call them when the hot-remove code tries to offline pages,
right ?

If so, I'm wondering where we should put these callback pointers ?
In struct page ?


It has been a long time since this topic was discussed. But to solve this
problem cleanly for the hotplug guys and the CMA guys, please give some more
comments.

Thanks. :)

>
> To guarantee CMA can migrate pages pinned by drivers I think you need
> migrate-related callsbacks to unpin, barrier the driver until migration
> completes and repin.
>
> I do not know, or at least have no heard, of anyone working on such a
> scheme.
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13  9:11         ` Tang Chen
@ 2013-05-13  9:19           ` Mel Gorman
  2013-05-13 14:37             ` Benjamin LaHaise
  2013-05-14  3:55             ` Tang Chen
  0 siblings, 2 replies; 37+ messages in thread
From: Mel Gorman @ 2013-05-13  9:19 UTC (permalink / raw)
  To: Tang Chen
  Cc: Minchan Kim, Lin Feng, akpm, bcrl, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

On Mon, May 13, 2013 at 05:11:43PM +0800, Tang Chen wrote:
> Hi Mel,
> 
> On 02/06/2013 05:56 PM, Mel Gorman wrote:
> >
> >There is the possibility that callbacks could be introduced for
> >migrate_unpin() and migrate_pin() that takes a list of PFN pairs
> >(old,new). The unpin callback should release the old PFNs and barrier
> >against any operations until the migrate_pfn() callback is called with
> >the updated pfns to be repinned. Again it would fully depend on subsystems
> >implementing it properly.
> >
> >The callback interface would be more robust but puts a lot more work on
> >the driver side where your milage will vary.
> >
> 
> I'm very interested in the "callback" way you said.
> 
> For memory hot-remove case, the aio pages are pined in memory and making
> the pages cannot be offlined, furthermore, the pages cannot be removed.
> 
> IIUC, you mean implement migrate_unpin() and migrate_pin() callbacks in aio
> subsystem, and call them when hot-remove code tries to offline
> pages, right ?
> 
> If so, I'm wondering where should we put this callback pointers ?
> In struct page ?
> 

No, I would expect the callbacks to be part of the address space operations
which can be found via page->mapping.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13  9:19           ` Mel Gorman
@ 2013-05-13 14:37             ` Benjamin LaHaise
  2013-05-13 14:54               ` Jeff Moyer
  2013-05-14  3:55             ` Tang Chen
  1 sibling, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-13 14:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Tang Chen, Minchan Kim, Lin Feng, akpm, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

On Mon, May 13, 2013 at 10:19:02AM +0100, Mel Gorman wrote:
> On Mon, May 13, 2013 at 05:11:43PM +0800, Tang Chen wrote:
...
> > If so, I'm wondering where should we put this callback pointers ?
> > In struct page ?
> > 
> 
> No, I would expect the callbacks to be part of the address space operations
> which can be found via page->mapping.

If someone adds those callbacks and provides a means for testing them, 
it would be pretty trivial to change the aio code to migrate its pinned 
pages on demand.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13 14:37             ` Benjamin LaHaise
@ 2013-05-13 14:54               ` Jeff Moyer
  2013-05-13 15:01                 ` Benjamin LaHaise
  0 siblings, 1 reply; 37+ messages in thread
From: Jeff Moyer @ 2013-05-13 14:54 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Mel Gorman, Tang Chen, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

Benjamin LaHaise <bcrl@kvack.org> writes:

> On Mon, May 13, 2013 at 10:19:02AM +0100, Mel Gorman wrote:
>> On Mon, May 13, 2013 at 05:11:43PM +0800, Tang Chen wrote:
> ...
>> > If so, I'm wondering where should we put this callback pointers ?
>> > In struct page ?
>> > 
>> 
>> No, I would expect the callbacks to be part of the address space operations
>> which can be found via page->mapping.
>
> If someone adds those callbacks and provides a means for testing them, 
> it would be pretty trivial to change the aio code to migrate its pinned 
> pages on demand.

How do you propose to move the ring pages?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13 14:54               ` Jeff Moyer
@ 2013-05-13 15:01                 ` Benjamin LaHaise
  2013-05-14  1:24                   ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-13 15:01 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Mel Gorman, Tang Chen, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

On Mon, May 13, 2013 at 10:54:03AM -0400, Jeff Moyer wrote:
> How do you propose to move the ring pages?

It's the same problem as doing a TLB shootdown: flush the old pages from 
userspace's mapping, copy any existing data to the new pages, then 
repopulate the page tables.  It will likely require the addition of 
address_space_operations for the mapping, but that's not too hard to do.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13 15:01                 ` Benjamin LaHaise
@ 2013-05-14  1:24                   ` Tang Chen
  2013-05-14 13:58                     ` Benjamin LaHaise
  0 siblings, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-14  1:24 UTC (permalink / raw)
  To: Benjamin LaHaise, Jeff Moyer, Mel Gorman
  Cc: Minchan Kim, Lin Feng, akpm, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, linux-mm, linux-aio, linux-fsdevel, linux-kernel,
	Marek Szyprowski

Hi Mel, Benjamin, Jeff,

On 05/13/2013 11:01 PM, Benjamin LaHaise wrote:
> On Mon, May 13, 2013 at 10:54:03AM -0400, Jeff Moyer wrote:
>> How do you propose to move the ring pages?
>
> It's the same problem as doing a TLB shootdown: flush the old pages from
> userspace's mapping, copy any existing data to the new pages, then
> repopulate the page tables.  It will likely require the addition of
> address_space_operations for the mapping, but that's not too hard to do.
>

I think we add a migrate_unpin() callback to decrease page->count if
necessary, migrate the page to a new page, and add a migrate_pin()
callback to pin the new page again.

The migrate procedure will work just as before. We use callbacks to
decrease the page->count before migration starts, and increase it when
the migration is done.

And the migrate_pin() and migrate_unpin() callbacks will be added to
struct address_space_operations.

Is that right ?

If so, I'll be working on it.

Thanks. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-13  9:19           ` Mel Gorman
  2013-05-13 14:37             ` Benjamin LaHaise
@ 2013-05-14  3:55             ` Tang Chen
  2013-05-15 13:24               ` Mel Gorman
  1 sibling, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-14  3:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Lin Feng, akpm, bcrl, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

Hi Mel,

On 05/13/2013 05:19 PM, Mel Gorman wrote:
>> For memory hot-remove case, the aio pages are pined in memory and making
>> the pages cannot be offlined, furthermore, the pages cannot be removed.
>>
>> IIUC, you mean implement migrate_unpin() and migrate_pin() callbacks in aio
>> subsystem, and call them when hot-remove code tries to offline
>> pages, right ?
>>
>> If so, I'm wondering where should we put this callback pointers ?
>> In struct page ?
>>
>
> No, I would expect the callbacks to be part of the address space operations
> which can be found via page->mapping.
>

Two more problems I don't quite understand.

1. For an anonymous page, it has no address_space, and no address space
   operations. But the aio ring problem happened exactly when dealing
   with anonymous pages. Please refer to:
   (https://lkml.org/lkml/2012/11/29/69)

   If we put the callbacks in page->mapping->a_ops, anonymous pages
   won't be able to use them.

   And we cannot provide a default callback, because the situation we
   are dealing with is a special one.

   So where do we put the callback for anonymous pages ?


2. How do we find out the reason why page->count != 1 in
   migrate_page_move_mapping() ?

   In the problem we are dealing with, get_user_pages() is called to
   pin the pages in memory. And the pages are migratable. So we want
   to decrease the page->count.

   But get_user_pages() is not the only reason leading to an increased
   page->count. How can I know when I should decrease the page->count
   and when I should not ?

   The way I can figure out is to assign the callback pointer in
   get_user_pages(), because it is get_user_pages() who pins the pages.


Thanks. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-14  1:24                   ` Tang Chen
@ 2013-05-14 13:58                     ` Benjamin LaHaise
  2013-05-15  2:09                       ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-14 13:58 UTC (permalink / raw)
  To: Tang Chen
  Cc: Jeff Moyer, Mel Gorman, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

On Tue, May 14, 2013 at 09:24:58AM +0800, Tang Chen wrote:
> Hi Mel, Benjamin, Jeff,
> 
> On 05/13/2013 11:01 PM, Benjamin LaHaise wrote:
> >On Mon, May 13, 2013 at 10:54:03AM -0400, Jeff Moyer wrote:
> >>How do you propose to move the ring pages?
> >
> >It's the same problem as doing a TLB shootdown: flush the old pages from
> >userspace's mapping, copy any existing data to the new pages, then
> >repopulate the page tables.  It will likely require the addition of
> >address_space_operations for the mapping, but that's not too hard to do.
> >
> 
> I think we add migrate_unpin() callback to decrease page->count if 
> necessary,
> and migrate the page to a new page, and add migrate_pin() callback to pin
> the new page again.

You can't just decrease the page count for this to work.  The pages are 
pinned because aio_complete() can occur at any time and needs to have a 
place to write the completion events.  When changing pages, aio has to 
take the appropriate lock when changing one page for another.

> The migrate procedure will work just as before. We use callbacks to 
> decrease
> the page->count before migration starts, and increase it when the migration
> is done.
> 
> And migrate_pin() and migrate_unpin() callbacks will be added to
> struct address_space_operations.

I think the existing migratepage operation in address_space_operations can 
be used.  Does it get called when hot unplug occurs?  That is: is testing 
with the migrate_pages syscall similar enough to the memory removal case?

		-ben

> Is that right ?
> 
> If so, I'll be working on it.
> 
> Thanks. :)

-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-14 13:58                     ` Benjamin LaHaise
@ 2013-05-15  2:09                       ` Tang Chen
  2013-05-15  7:21                         ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-15  2:09 UTC (permalink / raw)
  To: Benjamin LaHaise, Mel Gorman
  Cc: Jeff Moyer, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

Hi Benjamin, Mel,

Please see below.

On 05/14/2013 09:58 PM, Benjamin LaHaise wrote:
> On Tue, May 14, 2013 at 09:24:58AM +0800, Tang Chen wrote:
>> Hi Mel, Benjamin, Jeff,
>>
>> On 05/13/2013 11:01 PM, Benjamin LaHaise wrote:
>>> On Mon, May 13, 2013 at 10:54:03AM -0400, Jeff Moyer wrote:
>>>> How do you propose to move the ring pages?
>>>
>>> It's the same problem as doing a TLB shootdown: flush the old pages from
>>> userspace's mapping, copy any existing data to the new pages, then
>>> repopulate the page tables.  It will likely require the addition of
>>> address_space_operations for the mapping, but that's not too hard to do.
>>>
>>
>> I think we add migrate_unpin() callback to decrease page->count if
>> necessary,
>> and migrate the page to a new page, and add migrate_pin() callback to pin
>> the new page again.
>
> You can't just decrease the page count for this to work.  The pages are
> pinned because aio_complete() can occur at any time and needs to have a
> place to write the completion events.  When changing pages, aio has to
> take the appropriate lock when changing one page for another.

In aio_complete(),

aio_complete() {
	......
	spin_lock_irqsave(&ctx->completion_lock, flags);
	//write the completion event.
	spin_unlock_irqrestore(&ctx->completion_lock, flags);
	......
}

So for this problem, I think we can hold ctx->completion_lock in the aio
callbacks to prevent the aio subsystem from accessing pages that are being
migrated.

>
>> The migrate procedure will work just as before. We use callbacks to
>> decrease
>> the page->count before migration starts, and increase it when the migration
>> is done.
>>
>> And migrate_pin() and migrate_unpin() callbacks will be added to
>> struct address_space_operations.
>
> I think the existing migratepage operation in address_space_operations can
> be used.  Does it get called when hot unplug occurs?  That is: is testing
> with the migrate_pages syscall similar enough to the memory removal case?
>

But as I said, anonymous pages such as the aio ring buffer don't have
address_space_operations. So where should we put the callback pointers ?

Add something like address_space_operations to struct anon_vma ?

Thanks. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-15  2:09                       ` Tang Chen
@ 2013-05-15  7:21                         ` Tang Chen
  0 siblings, 0 replies; 37+ messages in thread
From: Tang Chen @ 2013-05-15  7:21 UTC (permalink / raw)
  To: Benjamin LaHaise, Mel Gorman
  Cc: Jeff Moyer, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

Hi Benjamin, Mel,

On 05/15/2013 10:09 AM, Tang Chen wrote:
> Hi Benjamin, Mel,
>
> Please see below.
>
> On 05/14/2013 09:58 PM, Benjamin LaHaise wrote:
>> On Tue, May 14, 2013 at 09:24:58AM +0800, Tang Chen wrote:
>>> Hi Mel, Benjamin, Jeff,
>>>
>>> On 05/13/2013 11:01 PM, Benjamin LaHaise wrote:
>>>> On Mon, May 13, 2013 at 10:54:03AM -0400, Jeff Moyer wrote:
>>>>> How do you propose to move the ring pages?
>>>>
>>>> It's the same problem as doing a TLB shootdown: flush the old pages
>>>> from
>>>> userspace's mapping, copy any existing data to the new pages, then
>>>> repopulate the page tables. It will likely require the addition of
>>>> address_space_operations for the mapping, but that's not too hard to
>>>> do.
>>>>
>>>
>>> I think we add migrate_unpin() callback to decrease page->count if
>>> necessary,
>>> and migrate the page to a new page, and add migrate_pin() callback to
>>> pin
>>> the new page again.
>>
>> You can't just decrease the page count for this to work. The pages are
>> pinned because aio_complete() can occur at any time and needs to have a
>> place to write the completion events. When changing pages, aio has to
>> take the appropriate lock when changing one page for another.
>
> In aio_complete(),
>
> aio_complete() {
> ......
> spin_lock_irqsave(&ctx->completion_lock, flags);
> //write the completion event.
> spin_unlock_irqrestore(&ctx->completion_lock, flags);
> ......
> }
>
> So for this problem, I think we can hold kioctx->completion_lock in the aio
> callbacks to prevent the aio subsystem from accessing pages that are being
> migrated.
>

Another problem here is:

We intend to call these callbacks in the page migrate path, and we need to
know which lock to hold. But there is no way for the migrate path to know
this info.

The migrate path is common for all kinds of pages, so we cannot pass any
subsystem-specific parameter to the callbacks in the migrate path.

When we get a page, we cannot get any kioctx info from the page. So how can
the callback know which lock to acquire without any parameter ? Or do we
have any other way to do so ?

Would you please give some more advice about this ?

BTW, we also need to update kioctx->ring_pages.

Thanks. :)

>>
>>> The migrate procedure will work just as before. We use callbacks to
>>> decrease
>>> the page->count before migration starts, and increase it when the
>>> migration
>>> is done.
>>>
>>> And migrate_pin() and migrate_unpin() callbacks will be added to
>>> struct address_space_operations.
>>
>> I think the existing migratepage operation in address_space_operations
>> can
>> be used. Does it get called when hot unplug occurs? That is: is testing
>> with the migrate_pages syscall similar enough to the memory removal case?
>>
>
> But as I said, for anonymous pages such as aio ring buffer, they don't have
> address_space_operations. So where should we put the callbacks' pointers ?
>
> Add something like address_space_operations to struct anon_vma ?
>
> Thanks. :)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-14  3:55             ` Tang Chen
@ 2013-05-15 13:24               ` Mel Gorman
  2013-05-16  5:54                 ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2013-05-15 13:24 UTC (permalink / raw)
  To: Tang Chen
  Cc: Minchan Kim, Lin Feng, akpm, bcrl, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

On Tue, May 14, 2013 at 11:55:31AM +0800, Tang Chen wrote:
> Hi Mel,
> 
> On 05/13/2013 05:19 PM, Mel Gorman wrote:
> >>For memory hot-remove case, the aio pages are pined in memory and making
> >>the pages cannot be offlined, furthermore, the pages cannot be removed.
> >>
> >>IIUC, you mean implement migrate_unpin() and migrate_pin() callbacks in aio
> >>subsystem, and call them when hot-remove code tries to offline
> >>pages, right ?
> >>
> >>If so, I'm wondering where should we put this callback pointers ?
> >>In struct page ?
> >>
> >
> >No, I would expect the callbacks to be part of the address space operations
> >which can be found via page->mapping.
> >
> 
> Two more problems I don't quite understand.
> 

Bear in mind I've done no research on this particular problem. At best,
the migrate pin/unpin is the direction that I'd start with if I was tasked
with fixing this (which I'm not). Hence, I cannot answer your questions
at the level of detail you are looking for.

> 1. For an anonymous page, it has no address_space, and no address space
>    operation. But the aio ring problem just happened when dealing with
>    anonymous pages. Please refer to:
>    (https://lkml.org/lkml/2012/11/29/69)
> 

If it is to be an address space operations structure then you'll need a
pseudo mapping structure for anonymous pages that are pinned by aio --
similar in principle to how swapper_space is used for managing PageSwapCache
or how anon_vma structures can be associated with a page.

However, I warn you that you may find that the address_space is the
wrong level to register such callbacks; it just seemed like the obvious
first choice. A potential alternative implementation is to create a 1:1
association between pages and a long-lived holder that is stored on a hash
table (similar style of arrangement as page_waitqueue).  A page is looked up
in the hash table and if an entry exists, it points to a callback structure
for the subsystem holding the pin. It's up to the subsystem to register the
callbacks when it is about to pin a page (get_user_pages_longlived(....,
&release_ops)) and figure out how to release the pin safely.

> 2. How to find out the reason why page->count != 1 in
> migrate_page_move_mapping() ?
> 
>    In the problem we are dealing with, get_user_pages() is called to
> pin the pages
>    in memory. And the pages are migratable. So we want to decrease
> the page->count.
> 
>    But get_user_pages() is not the only reason leading to
> page->count increased.
> How can I know when I should decrease the page->count and when I should not ?
> 

You cannot just arbitrarily drop the page->count without causing problems. It
has to be released by the subsystem holding the pin because only it can
know when it's safe.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()
  2013-05-15 13:24               ` Mel Gorman
@ 2013-05-16  5:54                 ` Tang Chen
  2013-05-17  0:23                   ` [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()) Benjamin LaHaise
  0 siblings, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-16  5:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Lin Feng, akpm, bcrl, viro, khlebnikov, walken,
	kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency, laijs,
	jiang.liu, zab, jmoyer, linux-mm, linux-aio, linux-fsdevel,
	linux-kernel, Marek Szyprowski

Hi Mel,

On 05/15/2013 09:24 PM, Mel Gorman wrote:
> If it is to be an address space operations sturcture then you'll need a
> pseudo mapping structure for anonymous pages that are pinned by aio --
> similar in principal to how swapper_space is used for managing PageSwapCache
> or how anon_vma structures can be associated with a page.
>
> However, I warn you that you may find that the address_space is the
> wrong level to register such callbacks, it just seemed like the obvious
> first choice. A potential alternative implementation is to create a 1:1
> association between pages and a long-lived holder that is stored on a hash
> table (similar style of arrangement as page_waitqueue).  A page is looked up
> in the hash table and if an entry exists, it points to an callback structure
> to the subsystem holding the pin. It's up to the subsystem to register the
> callbacks when it is about to pin a page (get_user_pages_longlived(....,
> &release_ops) and figure out how to release the pin safely.
>

OK, I'll try to figure out a proper place to put the callbacks.
But I think we need to add something new to struct page. I'm just
not sure if that is OK. Maybe we can discuss it more when I send
an RFC patch.

Thanks for the advice, and I'll try these ideas.

Thanks. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-16  5:54                 ` Tang Chen
@ 2013-05-17  0:23                   ` Benjamin LaHaise
  2013-05-17  3:28                     ` Tang Chen
  2013-05-17 18:17                     ` Zach Brown
  0 siblings, 2 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-17  0:23 UTC (permalink / raw)
  To: Tang Chen
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On Thu, May 16, 2013 at 01:54:18PM +0800, Tang Chen wrote:
...
> OK, I'll try to figure out a proper place to put the callbacks.
> But I think we need to add something new to struct page; I'm just
> not sure whether that is acceptable. Maybe we can discuss it further
> when I send an RFC patch.
...

I ended up working on this a bit today, and managed to cobble together 
something that somewhat works -- please see the patch below.  It still is 
not completely tested, and it has a rather nasty bug owing to the fact 
that the file descriptors returned by anon_inode_getfile() all share the 
same inode (read: more than one instance of aio does not work), but it 
shows the basic idea.  Also, bad things probably happen if someone does 
an mremap() on the aio ring buffer.  I'll polish this off sometime next 
week after the long weekend if no one beats me to it.

		-ben
-- 
"Thought is the essence of where you are now."

 fs/aio.c                |   95 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/migrate.h |    3 +
 mm/migrate.c            |    2 -
 3 files changed, 96 insertions(+), 4 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index c5b1a8c..dbad23e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -35,6 +35,9 @@
 #include <linux/eventfd.h>
 #include <linux/blkdev.h>
 #include <linux/compat.h>
+#include <linux/anon_inodes.h>
+#include <linux/migrate.h>
+#include <linux/ramfs.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -108,6 +111,7 @@ struct kioctx {
 	} ____cacheline_aligned_in_smp;
 
 	struct page		*internal_pages[AIO_RING_PAGES];
+	struct file		*ctx_file;
 };
 
 /*------ sysctl variables----*/
@@ -146,8 +150,59 @@ static void aio_free_ring(struct kioctx *ctx)
 
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
 		kfree(ctx->ring_pages);
+
+	if (ctx->ctx_file) {
+		truncate_setsize(ctx->ctx_file->f_inode, 0);
+		fput(ctx->ctx_file);
+		ctx->ctx_file = NULL;
+	}
+}
+
+static int aio_ctx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+static const struct file_operations aio_ctx_fops = {
+	.mmap	= aio_ctx_mmap,
+};
+
+static int aio_set_page_dirty(struct page *page)
+{
+	return 0;
+}
+
+static int aio_migratepage(struct address_space *mapping, struct page *new,
+			   struct page *old, enum migrate_mode mode)
+{
+	struct kioctx *ctx = mapping->private_data;
+	unsigned long flags;
+	unsigned idx = old->index;
+	int rc;
+
+	BUG_ON(PageWriteback(old));    /* Writeback must be complete */
+	put_page(old);
+	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
+	if (rc != MIGRATEPAGE_SUCCESS) {
+		get_page(old);
+		return rc;
+	}
+	get_page(new);
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+	migrate_page_copy(new, old);
+	ctx->ring_pages[idx] = new;
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
+	return MIGRATEPAGE_SUCCESS;
 }
 
+static const struct address_space_operations aio_ctx_aops = {
+	.set_page_dirty = aio_set_page_dirty,
+	.migratepage	= aio_migratepage,
+};
+
 static int aio_setup_ring(struct kioctx *ctx)
 {
 	struct aio_ring *ring;
@@ -155,6 +210,7 @@ static int aio_setup_ring(struct kioctx *ctx)
 	struct mm_struct *mm = current->mm;
 	unsigned long size, populate;
 	int nr_pages;
+	int i;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
@@ -166,6 +222,31 @@ static int aio_setup_ring(struct kioctx *ctx)
 	if (nr_pages < 0)
 		return -EINVAL;
 
+	ctx->ctx_file = anon_inode_getfile("[aio]", &aio_ctx_fops, ctx, O_RDWR);
+	if (IS_ERR(ctx->ctx_file)) {
+		ctx->ctx_file = NULL;
+		return -EAGAIN;
+	}
+	ctx->ctx_file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
+	ctx->ctx_file->f_inode->i_mapping->private_data = ctx;
+	ctx->ctx_file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
+
+	for (i=0; i<nr_pages; i++) {
+		struct page *page;
+		void *ptr;
+		page = find_or_create_page(ctx->ctx_file->f_inode->i_mapping,
+					   i, GFP_KERNEL);
+		if (!page) {
+			break;
+		}
+		ptr = kmap(page);
+		clear_page(ptr);
+		kunmap(page);
+		SetPageUptodate(page);
+		SetPageDirty(page);
+		unlock_page(page);
+	}
+
 	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
 
 	ctx->nr_events = 0;
@@ -180,20 +261,25 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 	down_write(&mm->mmap_sem);
-	ctx->mmap_base = do_mmap_pgoff(NULL, 0, ctx->mmap_size,
+	ctx->mmap_base = do_mmap_pgoff(ctx->ctx_file, 0, ctx->mmap_size,
 				       PROT_READ|PROT_WRITE,
-				       MAP_ANONYMOUS|MAP_PRIVATE, 0, &populate);
+				       MAP_SHARED|MAP_POPULATE, 0,
+				       &populate);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		up_write(&mm->mmap_sem);
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
 		return -EAGAIN;
 	}
+	up_write(&mm->mmap_sem);
+	mm_populate(ctx->mmap_base, populate);
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
 	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
 				       1, 0, ctx->ring_pages, NULL);
-	up_write(&mm->mmap_sem);
+	for (i=0; i<ctx->nr_pages; i++) {
+		put_page(ctx->ring_pages[i]);
+	}
 
 	if (unlikely(ctx->nr_pages != nr_pages)) {
 		aio_free_ring(ctx);
@@ -403,6 +489,8 @@ out_cleanup:
 	err = -EAGAIN;
 	aio_free_ring(ctx);
 out_freectx:
+	if (ctx->ctx_file)
+		fput(ctx->ctx_file);
 	kmem_cache_free(kioctx_cachep, ctx);
 	pr_debug("error allocating ioctx %d\n", err);
 	return ERR_PTR(err);
@@ -852,6 +940,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 	ioctx = ioctx_alloc(nr_events);
 	ret = PTR_ERR(ioctx);
 	if (!IS_ERR(ioctx)) {
+		ctx = ioctx->user_id;
 		ret = put_user(ioctx->user_id, ctxp);
 		if (ret)
 			kill_ioctx(ioctx);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index a405d3dc..b6f3289 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -55,6 +55,9 @@ extern int migrate_vmas(struct mm_struct *mm,
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page);
+extern int migrate_page_move_mapping(struct address_space *mapping,
+                struct page *newpage, struct page *page,
+                struct buffer_head *head, enum migrate_mode mode);
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
diff --git a/mm/migrate.c b/mm/migrate.c
index 27ed225..ac9c3a9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -294,7 +294,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
  * 2 for pages with a mapping
  * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
  */
-static int migrate_page_move_mapping(struct address_space *mapping,
+int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page,
 		struct buffer_head *head, enum migrate_mode mode)
 {

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-17  0:23                   ` [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()) Benjamin LaHaise
@ 2013-05-17  3:28                     ` Tang Chen
  2013-05-17 14:37                       ` Benjamin LaHaise
  2013-05-17 18:17                     ` Zach Brown
  1 sibling, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-17  3:28 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

Hi Benjamin,

Thank you very much for your idea. :)

I have no objection to your idea, but judging from your patch, this only
works for the aio subsystem, because you changed the way the aio ring
pages are allocated, using a file mapping.

So far as I know, not only aio but also other subsystems, such as CMA,
have this kind of problem: the page cannot be migrated because it is
pinned in memory. So I think we should work out a common way to migrate
pinned pages.

I'm working on the approach Mel suggested, the migrate_unpin() and
migrate_pin() callbacks. But as you saw, I ran into some problems, such as
not knowing where to put these two callbacks. Having discussed it with you
guys, I want to try this:

1. Add a new member to struct page, used to remember the pin holders of
    this page, including the pin and unpin callbacks and the necessary data.
    This is more like a callback chain.
    (I'm worried about this step; I'm not sure it is good enough. After all,
     we need a good place to put the callbacks.)

And then, like Mel said,

2. Implement the callbacks in the subsystems, and register them to the
    new member in struct page.

3. Call these callbacks before and after migration.


I think I'll send an RFC patch next week when I finish the outline. I'm
just hoping to find a common way to solve this problem, one from which all
the other subsystems will benefit.

Thanks. :)


On 05/17/2013 08:23 AM, Benjamin LaHaise wrote:
> On Thu, May 16, 2013 at 01:54:18PM +0800, Tang Chen wrote:
> ...
>> OK, I'll try to figure out a proper place to put the callbacks.
>> But I think we need to add something new to struct page; I'm just
>> not sure whether that is acceptable. Maybe we can discuss it further
>> when I send an RFC patch.
> ...
>
> I ended up working on this a bit today, and managed to cobble together
> something that somewhat works -- please see the patch below.  It still is
> not completely tested, and it has a rather nasty bug owing to the fact
> that the file descriptors returned by anon_inode_getfile() all share the
> same inode (read: more than one instance of aio does not work), but it
> shows the basic idea.  Also, bad things probably happen if someone does
> an mremap() on the aio ring buffer.  I'll polish this off sometime next
> week after the long weekend if no one beats me to it.
>
> 		-ben

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-17  3:28                     ` Tang Chen
@ 2013-05-17 14:37                       ` Benjamin LaHaise
  2013-05-21  2:07                         ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-17 14:37 UTC (permalink / raw)
  To: Tang Chen
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On Fri, May 17, 2013 at 11:28:52AM +0800, Tang Chen wrote:
> Hi Benjamin,
> 
> Thank you very much for your idea. :)
> 
> I have no objection to your idea, but judging from your patch, this only
> works for the aio subsystem, because you changed the way the aio ring
> pages are allocated, using a file mapping.

That is correct.  There is no way you're going to be able to solve this 
problem without dealing with the issue on a subsystem by subsystem basis.

> So far as I know, not only aio but also other subsystems, such as CMA,
> have this kind of problem: the page cannot be migrated because it is
> pinned in memory. So I think we should work out a common way to migrate
> pinned pages.

A generic approach would require hardware support, but I doubt that is 
going to happen.

> I'm working on the approach Mel suggested, the migrate_unpin() and
> migrate_pin() callbacks. But as you saw, I ran into some problems, such as
> not knowing where to put these two callbacks. Having discussed it with you
> guys, I want to try this:
> 
> 1. Add a new member to struct page, used to remember the pin holders of
>    this page, including the pin and unpin callbacks and the necessary data.
>    This is more like a callback chain.
>    (I'm worried about this step; I'm not sure it is good enough. After all,
>     we need a good place to put the callbacks.)

Putting function pointers into struct page is not going to happen.  You'd 
be adding a significant amount of memory overhead for something that is 
never going to be used on the vast majority of systems (2 function pointers 
would be 16 bytes per page on a 64 bit system).  Keep in mind that distro 
kernels tend to enable almost all config options on their kernels, so the 
overhead of any approach has to make sense for the users of the kernel that 
will never make use of this kind of migration.

> And then, like Mel said,
> 
> 2. Implement the callbacks in the subsystems, and register them to the
>    new member in struct page.

No, the hook should be in the address_space_operations.  We already have 
a pointer to an address space in struct page.  This avoids adding more 
overhead to struct page.

> 3. Call these callbacks before and after migration.

How is that better than using the existing hook in address_space_operations?

> I think I'll send an RFC patch next week when I finish the outline. I'm
> just hoping to find a common way to solve this problem, one from which all
> the other subsystems will benefit.

Before pursuing this approach, make sure you've got buy-in for all of the 
overhead you're adding to the system.  I don't think that growing struct 
page is going to be an acceptable design choice given the amount of 
overhead it will incur.

> Thanks. :)

Cheers,

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-17  0:23                   ` [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()) Benjamin LaHaise
  2013-05-17  3:28                     ` Tang Chen
@ 2013-05-17 18:17                     ` Zach Brown
  2013-05-17 18:30                       ` Benjamin LaHaise
  1 sibling, 1 reply; 37+ messages in thread
From: Zach Brown @ 2013-05-17 18:17 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

> I ended up working on this a bit today, and managed to cobble together 
> something that somewhat works -- please see the patch below.

Just some quick observations:

> +	ctx->ctx_file = anon_inode_getfile("[aio]", &aio_ctx_fops, ctx, O_RDWR);
> +	if (IS_ERR(ctx->ctx_file)) {
> +		ctx->ctx_file = NULL;
> +		return -EAGAIN;
> +	}

It's too bad that aio contexts will now be accounted against the filp
limits (get_empty_filp -> files_stat.max_files, etc). 

> +	for (i=0; i<nr_pages; i++) {
> +		struct page *page;
> +		void *ptr;
> +		page = find_or_create_page(ctx->ctx_file->f_inode->i_mapping,
> +					   i, GFP_KERNEL);
> +		if (!page) {
> +			break;
> +		}
> +		ptr = kmap(page);
> +		clear_page(ptr);
> +		kunmap(page);
> +		SetPageUptodate(page);
> +		SetPageDirty(page);
> +		unlock_page(page);
> +	}

If they're GFP_KERNEL then you don't need to kmap them.  But we probably
want to allocate with GFP_HIGHUSER and then use clear_user_highpage() to
zero them?

- z

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-17 18:17                     ` Zach Brown
@ 2013-05-17 18:30                       ` Benjamin LaHaise
  0 siblings, 0 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-17 18:30 UTC (permalink / raw)
  To: Zach Brown
  Cc: Tang Chen, Mel Gorman, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

On Fri, May 17, 2013 at 11:17:08AM -0700, Zach Brown wrote:
> > I ended up working on this a bit today, and managed to cobble together 
> > something that somewhat works -- please see the patch below.
> 
> Just some quick observations:
> 
> > +	ctx->ctx_file = anon_inode_getfile("[aio]", &aio_ctx_fops, ctx, O_RDWR);
> > +	if (IS_ERR(ctx->ctx_file)) {
> > +		ctx->ctx_file = NULL;
> > +		return -EAGAIN;
> > +	}
> 
> It's too bad that aio contexts will now be accounted against the filp
> limits (get_empty_filp -> files_stat.max_files, etc). 

Yeah, that is a downside of this approach.  It would be possible to do it 
with only an inode/address_space, but that would mean bypassing 
do_mmap(), which is not worth considering.  If it is really an issue, we 
could add a flag to bypass that limit, since aio has its own.  
anon_inode_getfile() as it stands is a major problem.

> > +	for (i=0; i<nr_pages; i++) {
> > +		struct page *page;
> > +		void *ptr;
> > +		page = find_or_create_page(ctx->ctx_file->f_inode->i_mapping,
> > +					   i, GFP_KERNEL);
> > +		if (!page) {
> > +			break;
> > +		}
> > +		ptr = kmap(page);
> > +		clear_page(ptr);
> > +		kunmap(page);
> > +		SetPageUptodate(page);
> > +		SetPageDirty(page);
> > +		unlock_page(page);
> > +	}
> 
> If they're GFP_KERNEL then you don't need to kmap them.  But we probably
> want to allocate with GFP_HIGHUSER and then use clear_user_highpage() to
> zero them?

Adding __GFP_ZERO would fix that too; the next respin will include that 
change.  I also have to properly handle the mremap() case.

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-17 14:37                       ` Benjamin LaHaise
@ 2013-05-21  2:07                         ` Tang Chen
  2013-05-21  2:27                           ` Benjamin LaHaise
  0 siblings, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-05-21  2:07 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

Hi Benjamin,

Sorry for the late reply. Please see below.

On 05/17/2013 10:37 PM, Benjamin LaHaise wrote:
> On Fri, May 17, 2013 at 11:28:52AM +0800, Tang Chen wrote:
>> Hi Benjamin,
>>
>> Thank you very much for your idea. :)
>>
>> I have no objection to your idea, but judging from your patch, this only
>> works for the aio subsystem, because you changed the way the aio ring
>> pages are allocated, using a file mapping.
>
> That is correct.  There is no way you're going to be able to solve this
> problem without dealing with the issue on a subsystem by subsystem basis.
>

Yes, I understand that. We need subsystem work anyway.


>> I'm working on the approach Mel suggested, the migrate_unpin() and
>> migrate_pin() callbacks. But as you saw, I ran into some problems, such as
>> not knowing where to put these two callbacks. Having discussed it with you
>> guys, I want to try this:
>>
>> 1. Add a new member to struct page, used to remember the pin holders of
>>     this page, including the pin and unpin callbacks and the necessary data.
>>     This is more like a callback chain.
>>     (I'm worried about this step; I'm not sure it is good enough. After all,
>>      we need a good place to put the callbacks.)
>
> Putting function pointers into struct page is not going to happen.  You'd
> be adding a significant amount of memory overhead for something that is
> never going to be used on the vast majority of systems (2 function pointers
> would be 16 bytes per page on a 64 bit system).  Keep in mind that distro
> kernels tend to enable almost all config options on their kernels, so the
> overhead of any approach has to make sense for the users of the kernel that
> will never make use of this kind of migration.

True. But I just cannot find a place to hold the callbacks.

>
>> 3. Call these callbacks before and after migration.
>
> How is that better than using the existing hook in address_space_operations?

I'm not saying that using two callbacks before and after migration is
better. The reason I don't want to use address_space_operations is that
there is no such member for anonymous pages.

In your idea, using a file mapping will create an address_space_operations.
But I really don't think we can modify the way memory is allocated for all
the subsystems that have this problem; maybe it's not just aio and CMA. It
means that if you want to pin pages in memory, you have to use a file
mapping. That makes the memory allocation more complicated, and the idea
would have to be known by all the subsystem developers. Is that going to
happen?


I also thought about reuse one field of struct page. But as you said, there
may not be many users of this functionality. Reusing a field of struct page
will make things more complicated and lead to high coupling.


So, how about the other idea that Mel mentioned ?

We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
callbacks and data), maybe a global list or a hash table. And then, we can
find the callbacks.


Thanks. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-21  2:07                         ` Tang Chen
@ 2013-05-21  2:27                           ` Benjamin LaHaise
  2013-06-11  9:42                             ` Tang Chen
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-05-21  2:27 UTC (permalink / raw)
  To: Tang Chen
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On Tue, May 21, 2013 at 10:07:52AM +0800, Tang Chen wrote:
....
> I'm not saying that using two callbacks before and after migration is
> better. The reason I don't want to use address_space_operations is that
> there is no such member for anonymous pages.

That depends on the nature of the pinning.  For the general case of 
get_user_pages(), you're correct that it won't work for anonymous memory.

> In your idea, using a file mapping will create an address_space_operations.
> But I really don't think we can modify the way memory is allocated for all
> the subsystems that have this problem; maybe it's not just aio and CMA. It
> means that if you want to pin pages in memory, you have to use a file
> mapping. That makes the memory allocation more complicated, and the idea
> would have to be known by all the subsystem developers. Is that going to
> happen?

Different subsystems will need to use different approaches to fixing the 
issue.  I doubt any single approach will work for everything.

> I also thought about reuse one field of struct page. But as you said, there
> may not be many users of this functionality. Reusing a field of struct page
> will make things more complicated and lead to high coupling.

What happens when more than one subsystem tries to pin a particular page?  
What if it's a shared page rather than an anonymous page?

> So, how about the other idea that Mel mentioned ?
> 
> We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
> callbacks and data), maybe a global list or a hash table. And then, we can
> find the callbacks.

Maybe that is the simplest approach, but it's going to make get_user_pages() 
slower and more complicated (as if it wasn't already).  Maybe with all the 
bells and whistles of per-cpu data structures and such you can make it work, 
but I'm pretty sure someone running the large unmentionable benchmark will 
complain about the performance regressions you're going to introduce.  At 
least in the case of the AIO ring buffer, using the address_space approach 
doesn't introduce any new performance issues.  There's also the bigger 
question of whether you can exclude get_user_pages_fast() from this.  
In short: you've got a lot more work on your hands.

> Thanks. :)

Cheers,

		-ben
-- 
"Thought is the essence of where you are now."

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-05-21  2:27                           ` Benjamin LaHaise
@ 2013-06-11  9:42                             ` Tang Chen
  2013-06-11 14:45                               ` Benjamin LaHaise
  0 siblings, 1 reply; 37+ messages in thread
From: Tang Chen @ 2013-06-11  9:42 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

Hi Benjamin,

Are you still working on this problem?

Thanks. :)

On 05/21/2013 10:27 AM, Benjamin LaHaise wrote:
> On Tue, May 21, 2013 at 10:07:52AM +0800, Tang Chen wrote:
> ....
>> I'm not saying that using two callbacks before and after migration is
>> better. The reason I don't want to use address_space_operations is that
>> there is no such member for anonymous pages.
>
> That depends on the nature of the pinning.  For the general case of
> get_user_pages(), you're correct that it won't work for anonymous memory.
>
>> In your idea, using a file mapping will create an address_space_operations.
>> But I really don't think we can modify the way memory is allocated for all
>> the subsystems that have this problem; maybe it's not just aio and CMA. It
>> means that if you want to pin pages in memory, you have to use a file
>> mapping. That makes the memory allocation more complicated, and the idea
>> would have to be known by all the subsystem developers. Is that going to
>> happen?
>
> Different subsystems will need to use different approaches to fixing the
> issue.  I doubt any single approach will work for everything.
>
>> I also thought about reuse one field of struct page. But as you said, there
>> may not be many users of this functionality. Reusing a field of struct page
>> will make things more complicated and lead to high coupling.
>
> What happens when more than one subsystem tries to pin a particular page?
> What if it's a shared page rather than an anonymous page?
>
>> So, how about the other idea that Mel mentioned ?
>>
>> We create a 1-1 mapping of pinned page ranges and the pinner (subsystem
>> callbacks and data), maybe a global list or a hash table. And then, we can
>> find the callbacks.
>
> Maybe that is the simplest approach, but it's going to make get_user_pages()
> slower and more complicated (as if it wasn't already).  Maybe with all the
> bells and whistles of per-cpu data structures and such you can make it work,
> but I'm pretty sure someone running the large unmentionable benchmark will
> complain about the performance regressions you're going to introduce.  At
> least in the case of the AIO ring buffer, using the address_space approach
> doesn't introduce any new performance issues.  There's also the bigger
> question of whether you can exclude get_user_pages_fast() from this.
> In short: you've got a lot more work on your hands.
>
>> Thanks. :)
>
> Cheers,
>
> 		-ben

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-06-11  9:42                             ` Tang Chen
@ 2013-06-11 14:45                               ` Benjamin LaHaise
  2013-06-28  9:24                                 ` Gu Zheng
  2013-07-01  7:23                                 ` Gu Zheng
  0 siblings, 2 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2013-06-11 14:45 UTC (permalink / raw)
  To: Tang Chen
  Cc: Mel Gorman, Minchan Kim, Lin Feng, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

Hi Tang,

On Tue, Jun 11, 2013 at 05:42:31PM +0800, Tang Chen wrote:
> Hi Benjamin,
> 
> Are you still working on this problem?
> 
> Thanks. :)

Below is a copy of the most recent version of this patch that I have worked 
on.  This version works and stands up to my testing using move_pages() to 
force migration of the aio ring buffer.  A test program is available 
at http://www.kvack.org/~bcrl/aio/aio-numa-test.c .  Please note that 
this version is not suitable for mainline, as the modifications to the 
anon inode code are undesirable; that part needs reworking.

		-ben


 fs/aio.c                |  113 ++++++++++++++++++++++++++++++++++++++++++++----
 fs/anon_inodes.c        |   14 ++++-
 include/linux/migrate.h |    3 +
 mm/migrate.c            |    2 
 mm/swap.c               |    1 
 5 files changed, 121 insertions(+), 12 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index c5b1a8c..a951690 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -35,6 +35,9 @@
 #include <linux/eventfd.h>
 #include <linux/blkdev.h>
 #include <linux/compat.h>
+#include <linux/anon_inodes.h>
+#include <linux/migrate.h>
+#include <linux/ramfs.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -108,6 +111,7 @@ struct kioctx {
 	} ____cacheline_aligned_in_smp;
 
 	struct page		*internal_pages[AIO_RING_PAGES];
+	struct file		*ctx_file;
 };
 
 /*------ sysctl variables----*/
@@ -136,18 +140,80 @@ __initcall(aio_setup);
 
 static void aio_free_ring(struct kioctx *ctx)
 {
-	long i;
-
-	for (i = 0; i < ctx->nr_pages; i++)
-		put_page(ctx->ring_pages[i]);
+	int i;
 
 	if (ctx->mmap_size)
 		vm_munmap(ctx->mmap_base, ctx->mmap_size);
 
+	if (ctx->ctx_file)
+		truncate_setsize(ctx->ctx_file->f_inode, 0);
+
+	for (i = 0; i < ctx->nr_pages; i++) {
+		pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i,
+			 page_count(ctx->ring_pages[i]));
+		put_page(ctx->ring_pages[i]);
+	}
+
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
 		kfree(ctx->ring_pages);
+
+	if (ctx->ctx_file) {
+		truncate_setsize(ctx->ctx_file->f_inode, 0);
+		pr_debug("pid(%d) i_nlink=%u d_count=%d, d_unhashed=%d i_count=%d\n",
+			 current->pid, ctx->ctx_file->f_inode->i_nlink,
+			 ctx->ctx_file->f_path.dentry->d_count,
+			 d_unhashed(ctx->ctx_file->f_path.dentry),
+			 atomic_read(&ctx->ctx_file->f_path.dentry->d_inode->i_count));
+		fput(ctx->ctx_file);
+		ctx->ctx_file = NULL;
+	}
+}
+
+static int aio_ctx_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+static const struct file_operations aio_ctx_fops = {
+	.mmap	= aio_ctx_mmap,
+};
+
+static int aio_set_page_dirty(struct page *page)
+{
+	return 0;
+}
+
+static int aio_migratepage(struct address_space *mapping, struct page *new,
+			   struct page *old, enum migrate_mode mode)
+{
+	struct kioctx *ctx = mapping->private_data;
+	unsigned long flags;
+	unsigned idx = old->index;
+	int rc;
+
+	BUG_ON(PageWriteback(old));    /* Writeback must be complete */
+	put_page(old);
+	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode);
+	if (rc != MIGRATEPAGE_SUCCESS) {
+		get_page(old);
+		return rc;
+	}
+	get_page(new);
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+	migrate_page_copy(new, old);
+	ctx->ring_pages[idx] = new;
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
+	return MIGRATEPAGE_SUCCESS;
 }
 
+static const struct address_space_operations aio_ctx_aops = {
+	.set_page_dirty = aio_set_page_dirty,
+	.migratepage	= aio_migratepage,
+};
+
 static int aio_setup_ring(struct kioctx *ctx)
 {
 	struct aio_ring *ring;
@@ -155,6 +221,7 @@ static int aio_setup_ring(struct kioctx *ctx)
 	struct mm_struct *mm = current->mm;
 	unsigned long size, populate;
 	int nr_pages;
+	int i;
 
 	/* Compensate for the ring buffer's head/tail overlap entry */
 	nr_events += 2;	/* 1 is required, 2 for good luck */
@@ -166,6 +233,28 @@ static int aio_setup_ring(struct kioctx *ctx)
 	if (nr_pages < 0)
 		return -EINVAL;
 
+	ctx->ctx_file = anon_inode_getfile("[aio]", &aio_ctx_fops, ctx, O_RDWR);
+	if (IS_ERR(ctx->ctx_file)) {
+		ctx->ctx_file = NULL;
+		return -EAGAIN;
+	}
+	ctx->ctx_file->f_inode->i_mapping->a_ops = &aio_ctx_aops;
+	ctx->ctx_file->f_inode->i_mapping->private_data = ctx;
+	ctx->ctx_file->f_inode->i_size = PAGE_SIZE * (loff_t)nr_pages;
+
+	for (i=0; i<nr_pages; i++) {
+		struct page *page;
+		page = find_or_create_page(ctx->ctx_file->f_inode->i_mapping,
+					   i, GFP_HIGHUSER | __GFP_ZERO);
+		if (!page)
+			break;
+		pr_debug("pid(%d) page[%d]->count=%d\n",
+			 current->pid, i, page_count(page));
+		SetPageUptodate(page);
+		SetPageDirty(page);
+		unlock_page(page);
+	}
+
 	nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) / sizeof(struct io_event);
 
 	ctx->nr_events = 0;
@@ -180,20 +269,25 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 	down_write(&mm->mmap_sem);
-	ctx->mmap_base = do_mmap_pgoff(NULL, 0, ctx->mmap_size,
-				       PROT_READ|PROT_WRITE,
-				       MAP_ANONYMOUS|MAP_PRIVATE, 0, &populate);
+	ctx->mmap_base = do_mmap_pgoff(ctx->ctx_file, 0, ctx->mmap_size,
+				       PROT_READ | PROT_WRITE,
+				       MAP_SHARED | MAP_POPULATE, 0,
+				       &populate);
 	if (IS_ERR((void *)ctx->mmap_base)) {
 		up_write(&mm->mmap_sem);
 		ctx->mmap_size = 0;
 		aio_free_ring(ctx);
 		return -EAGAIN;
 	}
+	up_write(&mm->mmap_sem);
+	mm_populate(ctx->mmap_base, populate);
 
 	pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base);
 	ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages,
 				       1, 0, ctx->ring_pages, NULL);
-	up_write(&mm->mmap_sem);
+	for (i=0; i<ctx->nr_pages; i++) {
+		put_page(ctx->ring_pages[i]);
+	}
 
 	if (unlikely(ctx->nr_pages != nr_pages)) {
 		aio_free_ring(ctx);
@@ -403,6 +497,8 @@ out_cleanup:
 	err = -EAGAIN;
 	aio_free_ring(ctx);
 out_freectx:
+	if (ctx->ctx_file)
+		fput(ctx->ctx_file);
 	kmem_cache_free(kioctx_cachep, ctx);
 	pr_debug("error allocating ioctx %d\n", err);
 	return ERR_PTR(err);
@@ -852,6 +948,7 @@ SYSCALL_DEFINE2(io_setup, unsigned, nr_events, aio_context_t __user *, ctxp)
 	ioctx = ioctx_alloc(nr_events);
 	ret = PTR_ERR(ioctx);
 	if (!IS_ERR(ioctx)) {
+		ctx = ioctx->user_id;
 		ret = put_user(ioctx->user_id, ctxp);
 		if (ret)
 			kill_ioctx(ioctx);
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 47a65df..376d289 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -131,6 +131,7 @@ struct file *anon_inode_getfile(const char *name,
 	struct qstr this;
 	struct path path;
 	struct file *file;
+	struct inode *inode;
 
 	if (IS_ERR(anon_inode_inode))
 		return ERR_PTR(-ENODEV);
@@ -138,6 +139,12 @@ struct file *anon_inode_getfile(const char *name,
 	if (fops->owner && !try_module_get(fops->owner))
 		return ERR_PTR(-ENOENT);
 
+	inode = anon_inode_mkinode(anon_inode_inode->i_sb);
+	if (IS_ERR(inode)) {
+		file = ERR_PTR(-ENOMEM);
+		goto err_module;
+	}
+
 	/*
 	 * Link the inode to a directory entry by creating a unique name
 	 * using the inode sequence number.
@@ -155,17 +162,18 @@ struct file *anon_inode_getfile(const char *name,
 	 * We know the anon_inode inode count is always greater than zero,
 	 * so ihold() is safe.
 	 */
-	ihold(anon_inode_inode);
+	//ihold(inode);
 
-	d_instantiate(path.dentry, anon_inode_inode);
+	d_instantiate(path.dentry, inode);
 
 	file = alloc_file(&path, OPEN_FMODE(flags), fops);
 	if (IS_ERR(file))
 		goto err_dput;
-	file->f_mapping = anon_inode_inode->i_mapping;
+	file->f_mapping = inode->i_mapping;
 
 	file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
 	file->private_data = priv;
+	drop_nlink(inode);
 
 	return file;
 
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index a405d3dc..b6f3289 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -55,6 +55,9 @@ extern int migrate_vmas(struct mm_struct *mm,
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page);
+extern int migrate_page_move_mapping(struct address_space *mapping,
+                struct page *newpage, struct page *page,
+                struct buffer_head *head, enum migrate_mode mode);
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
diff --git a/mm/migrate.c b/mm/migrate.c
index 27ed225..ac9c3a9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -294,7 +294,7 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head,
  * 2 for pages with a mapping
  * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
  */
-static int migrate_page_move_mapping(struct address_space *mapping,
+int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page,
 		struct buffer_head *head, enum migrate_mode mode)
 {
diff --git a/mm/swap.c b/mm/swap.c
index dfd7d71..bbfba0a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -160,6 +160,7 @@ skip_lock_tail:
 
 void put_page(struct page *page)
 {
+	BUG_ON(page_count(page) <= 0);
 	if (unlikely(PageCompound(page)))
 		put_compound_page(page);
 	else if (put_page_testzero(page))

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-06-11 14:45                               ` Benjamin LaHaise
@ 2013-06-28  9:24                                 ` Gu Zheng
  2013-07-01  7:23                                 ` Gu Zheng
  1 sibling, 0 replies; 37+ messages in thread
From: Gu Zheng @ 2013-06-28  9:24 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, Lin Feng, akpm, viro,
	khlebnikov, walken, kamezawa.hiroyu, riel, rientjes,
	isimatu.yasuaki, wency, laijs, jiang.liu, zab, jmoyer, linux-mm,
	linux-aio, linux-fsdevel, linux-kernel, Marek Szyprowski

On 06/11/2013 10:45 PM, Benjamin LaHaise wrote:

> Hi Tang,
> 
> On Tue, Jun 11, 2013 at 05:42:31PM +0800, Tang Chen wrote:
>> Hi Benjamin,
>>
>> Are you still working on this problem ?
>>
>> Thanks. :)
> 
> Below is a copy of the most recent version of this patch I have worked 
> on.  This version works and stands up to my testing using move_pages() to 
> force the migration of the aio ring buffer.  A test program is available 
> at http://www.kvack.org/~bcrl/aio/aio-numa-test.c .  Please note that 
> this version is not suitable for mainline as the modifications to the 
> anon inode code are undesirable, so that part needs reworking.

Hi Ben,
Are you still working on this patch?
As you know, using the current shared anon inode means that more than one aio
instance cannot work at the same time. Have you found a way to fix this issue?
Or can we replace the anon inode with something else?

Thanks,
Gu

> 
> 		-ben
> [full patch quoted verbatim; trimmed -- see the copy earlier in the thread]


* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-06-11 14:45                               ` Benjamin LaHaise
  2013-06-28  9:24                                 ` Gu Zheng
@ 2013-07-01  7:23                                 ` Gu Zheng
  2013-07-02 18:00                                   ` Benjamin LaHaise
  1 sibling, 1 reply; 37+ messages in thread
From: Gu Zheng @ 2013-07-01  7:23 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On 06/11/2013 10:45 PM, Benjamin LaHaise wrote:

> Hi Tang,
> 
> On Tue, Jun 11, 2013 at 05:42:31PM +0800, Tang Chen wrote:
>> Hi Benjamin,
>>
>> Are you still working on this problem ?
>>
>> Thanks. :)
> 
> Below is a copy of the most recent version of this patch I have worked 
> on.  This version works and stands up to my testing using move_pages() to 
> force the migration of the aio ring buffer.  A test program is available 
> at http://www.kvack.org/~bcrl/aio/aio-numa-test.c .  Please note that 
> this version is not suitable for mainline as the modifications to the 
> anon inode code are undesirable, so that part needs reworking.



Hi Ben,
Are you still working on this patch?
As you know, using the current shared anon inode means that more than one aio
instance cannot work at the same time. Have you found a way to fix this issue?
Or can we replace the anon inode with something else?

Thanks,
Gu

> 
> 		-ben
> [full patch quoted verbatim; trimmed -- see the copy earlier in the thread]


* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-07-01  7:23                                 ` Gu Zheng
@ 2013-07-02 18:00                                   ` Benjamin LaHaise
  2013-07-03  1:53                                     ` Gu Zheng
  2013-07-04  6:51                                     ` Gu Zheng
  0 siblings, 2 replies; 37+ messages in thread
From: Benjamin LaHaise @ 2013-07-02 18:00 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On Mon, Jul 01, 2013 at 03:23:39PM +0800, Gu Zheng wrote:
> Hi Ben,
> Are you still working on this patch?
> As you know, using the current shared anon inode means that more than one aio
> instance cannot work at the same time. Have you found a way to fix this issue?
> Or can we replace the anon inode with something else?

This patch hasn't been a high priority for me.  I would really appreciate 
it if someone could confirm that this patch does indeed fix the hotplug 
page migration issue by testing it in a system that hits the bug.  Removing 
the anon_inode bits isn't too much work, but I'd just like to have some 
confirmation that this fix is considered to be "good enough" for the 
problem at hand before spending any further time on it.  There was talk of 
using another approach, but it's not clear if there was any progress.

		-ben
-- 
"Thought is the essence of where you are now."


* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-07-02 18:00                                   ` Benjamin LaHaise
@ 2013-07-03  1:53                                     ` Gu Zheng
  2013-07-04  6:51                                     ` Gu Zheng
  1 sibling, 0 replies; 37+ messages in thread
From: Gu Zheng @ 2013-07-03  1:53 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On 07/03/2013 02:00 AM, Benjamin LaHaise wrote:

> On Mon, Jul 01, 2013 at 03:23:39PM +0800, Gu Zheng wrote:
>> Hi Ben,
>> Are you still working on this patch?
>> As you know, using the current shared anon inode means that more than one aio
>> instance cannot work at the same time. Have you found a way to fix this issue?
>> Or can we replace the anon inode with something else?
> 
> This patch hasn't been a high priority for me.  I would really appreciate 
> it if someone could confirm that this patch does indeed fix the hotplug 
> page migration issue by testing it in a system that hits the bug.  Removing 
> the anon_inode bits isn't too much work, but I'd just like to have some 
> confirmation that this fix is considered to be "good enough" for the 
> problem at hand before spending any further time on it.  There was talk of 
> using another approach, but it's not clear if there was any progress.

Yeah, we have not seen anyone try to fix this issue using the other approach we
talked about. I'm not sure whether your patch can indeed fix the problem, but I'll
carry out a complete test to confirm it, and I'll be glad to continue this
work based on your patch if you do not have enough time to spend on it. :)

Thanks,
Gu

> 
> 		-ben



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-07-02 18:00                                   ` Benjamin LaHaise
  2013-07-03  1:53                                     ` Gu Zheng
@ 2013-07-04  6:51                                     ` Gu Zheng
  2013-07-04 11:41                                       ` Benjamin LaHaise
  1 sibling, 1 reply; 37+ messages in thread
From: Gu Zheng @ 2013-07-04  6:51 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On 07/03/2013 02:00 AM, Benjamin LaHaise wrote:

> On Mon, Jul 01, 2013 at 03:23:39PM +0800, Gu Zheng wrote:
>> Hi Ben,
>> Are you still working on this patch?
>> As you know, using the current anon inode means that more than one instance of
>> aio cannot work. Have you found a way to fix this issue? Or can we use
>> something else to replace the anon inode?
> 
> This patch hasn't been a high priority for me.  I would really appreciate 
> it if someone could confirm that this patch does indeed fix the hotplug 
> page migration issue by testing it in a system that hits the bug.  Removing 
> the anon_inode bits isn't too much work, but I'd just like to have some 
> confirmation that this fix is considered to be "good enough" for the 
> problem at hand before spending any further time on it.  There was talk of 
> using another approach, but it's not clear if there was any progress.

Hi Ben,
      When I tested your patch on kernel 3.10, the kernel panicked when the aio
job completed or exited, specifically in aio_free_ring(). The following is part
of the dmesg.

Thanks,
Gu

kernel BUG at mm/swap.c:163!
invalid opcode: 0000 [#1] SMP
Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net
macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table
mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich
mfd_core ioatdma i7core_edac edac_core e1000e igb dca i2c_algo_bit i2c_core ptp
pps_core ext4(F) jbd2(F) mbcache(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F)
mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
CPU: 4 PID: 100 Comm: kworker/4:1 Tainted: GF            3.10.0-aio-migrate+ #107
Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS
Version 89.32 DP Proto 08/16/2012
Workqueue: events kill_ioctx_work
task: ffff8807dda974e0 ti: ffff8807dda98000 task.ti: ffff8807dda98000
RIP: 0010:[<ffffffff8111a9a8>]  [<ffffffff8111a9a8>] put_page+0x48/0x60
RSP: 0018:ffff8807dda99cd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8807be1f1e00 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffea001b196c80
RBP: ffff8807dda99cd8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8807ffbb5f00 R11: 000000000000005a R12: 0000000000000001
R13: 0000000000000000 R14: ffff8807dda974e0 R15: ffff8807be1f1ec8
FS:  0000000000000000(0000) GS:ffff8807fd680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b826dc7d0 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8807dda99d18 ffffffff811b11f6 0000000000000000 0000000200000000
 ffff8807be1f1e00 ffff8807be1f1e80 000000000000000c 0000000000000000
 ffff8807dda99dc8 ffffffff811b21a2 00000001000438ec ffff8807fd692d00
Call Trace:
 [<ffffffff811b11f6>] aio_free_ring+0x96/0x1c0
 [<ffffffff811b21a2>] free_ioctx+0x1f2/0x250
 [<ffffffff81081a5d>] ? idle_balance+0xed/0x140
 [<ffffffff811b221a>] put_ioctx+0x1a/0x30
 [<ffffffff811b24af>] kill_ioctx_work+0x2f/0x40
 [<ffffffff81060933>] process_one_work+0x183/0x490
 [<ffffffff81061ac0>] worker_thread+0x120/0x3a0
 [<ffffffff810619a0>] ? manage_workers+0x160/0x160
 [<ffffffff8106786e>] kthread+0xce/0xe0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8154b79c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 07 00 c0 75 1f f0 ff 4f 1c 0f 94 c0 84 c0 75 0b c9 66 90 c3 0f 1f 80 00 00
00 00 e8 53 fe ff ff c9 66 90 c3 e8 7a fe ff ff c9 c3 <0f> 0b 66 0f 1f 44 00 00
eb f8 48 8b 47 30 eb bc 0f 1f 84 00 00
RIP  [<ffffffff8111a9a8>] put_page+0x48/0x60
 RSP <ffff8807dda99cd8>
---[ end trace b5e2c17407c840d8 ]---

Jul  4 15:49:50 BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff81067140>] kthread_data+0x10/0x20
PGD 1a0c067 PUD 1a0e067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4
nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net
macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table
mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich
mfd_core ioatdma i7core_edac edac_core e1000e igb dca i2c_algo_bit i2c_core ptp
pps_core ext4(F) jbd2(F) mbcache(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F)
mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
CPU: 4 PID: 100 Comm: kworker/4:1 Tainted: GF     D      3.10.0-aio-migrate+ #107
Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS
Version 89.32 DP Proto 08/16/2012
task: ffff8807dda974e0 ti: ffff8807dda98000 task.ti: ffff8807dda98000
RIP: 0010:[<ffffffff81067140>]  [<ffffffff81067140>] kthread_data+0x10/0x20
RSP: 0018:ffff8807dda999b8  EFLAGS: 00010092
RAX: 0000000000000000 RBX: 0000000000000004 RCX: ffffffff81da3ea0
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8807dda974e0
RBP: ffff8807dda999b8 R08: ffff8807dda97550 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
R13: ffff8807dda97ab8 R14: 0000000000000001 R15: 0000000000000006
FS:  0000000000000000(0000) GS:ffff8807fd680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8807dda999d8 ffffffff8105e155 ffff8807dda999d8 ffff8807fd692d00
 ffff8807dda99a68 ffffffff8154168b ffff8807dda99fd8 0000000000012d00
 ffff8807dda98010 0000000000012d00 0000000000012d00 0000000000012d00
Call Trace:
 [<ffffffff8105e155>] wq_worker_sleeping+0x15/0xa0
 [<ffffffff8154168b>] __schedule+0x5ab/0x6f0
 [<ffffffff81239992>] ? put_io_context_active+0xc2/0xf0
 [<ffffffff815419a9>] schedule+0x29/0x70
 [<ffffffff81047795>] do_exit+0x2d5/0x480
 [<ffffffff81544029>] oops_end+0xa9/0xf0
 [<ffffffff810058eb>] die+0x5b/0x90
 [<ffffffff81543b8b>] do_trap+0xcb/0x170
 [<ffffffff81546e22>] ? __atomic_notifier_call_chain+0x12/0x20
 [<ffffffff81003565>] do_invalid_op+0x95/0xb0
 [<ffffffff8111a9a8>] ? put_page+0x48/0x60
 [<ffffffff8111c411>] ? truncate_inode_pages_range+0x201/0x500
 [<ffffffff8154c8e8>] invalid_op+0x18/0x20
 [<ffffffff8111a9a8>] ? put_page+0x48/0x60
 [<ffffffff8111c829>] ? truncate_setsize+0x19/0x20
 [<ffffffff811b11f6>] aio_free_ring+0x96/0x1c0
 [<ffffffff811b21a2>] free_ioctx+0x1f2/0x250
 [<ffffffff81081a5d>] ? idle_balance+0xed/0x140
 [<ffffffff811b221a>] put_ioctx+0x1a/0x30
 [<ffffffff811b24af>] kill_ioctx_work+0x2f/0x40
 [<ffffffff81060933>] process_one_work+0x183/0x490
 [<ffffffff81061ac0>] worker_thread+0x120/0x3a0
 [<ffffffff810619a0>] ? manage_workers+0x160/0x160
 [<ffffffff8106786e>] kthread+0xce/0xe0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8154b79c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 80 05 00 00 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00
00 00 55 48 89 e5 66 66 66 66 90 48 8b 87 80 05 00 00 <48> 8b 40 d8 c9 c3 66 2e
0f 1f 84 00 00 00 00 00 55 48 89 e5 66
RIP  [<ffffffff81067140>] kthread_data+0x10/0x20
 RSP <ffff8807dda999b8>
CR2: ffffffffffffffd8
---[ end trace b5e2c17407c840d9 ]---

DP kernel: -----Fixing recursive fault but reboot is needed!


INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 9, t=21056
jiffies, g=4158, c=4157, q=1040)
sending NMI to all CPUs:
NMI backtrace for cpu 4
CPU: 4 PID: 100 Comm: kworker/4:1 Tainted: GF     D      3.10.0-aio-migrate+ #107
Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS
Version 89.32 DP Proto 08/16/2012
task: ffff8807dda974e0 ti: ffff8807dda98000 task.ti: ffff8807dda98000
RIP: 0010:[<ffffffff81542cd2>]  [<ffffffff81542cd2>] _raw_spin_lock_irq+0x22/0x30
RSP: 0018:ffff8807dda99618  EFLAGS: 00000002
RAX: 000000000000497c RBX: ffff8807fd692d00 RCX: ffff8807dda98010
RDX: 000000000000497e RSI: ffffffff815419a9 RDI: ffff8807fd692d00
RBP: ffff8807dda99618 R08: 0000000000000004 R09: 0000000000000100
R10: 00000000000009fe R11: 00000000000009fe R12: 0000000000000004
R13: 0000000000000009 R14: 0000000000000009 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8807fd680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff8807dda996a8 ffffffff815411b6 ffff8807dda99fd8 0000000000012d00
 ffff8807dda98010 0000000000012d00 0000000000012d00 0000000000012d00
 ffff8807dda99fd8 0000000000012d00 ffff8807dda974e0 ffff8807dda996c8
Call Trace:
 [<ffffffff815411b6>] __schedule+0xd6/0x6f0
 [<ffffffff815419a9>] schedule+0x29/0x70
 [<ffffffff810478ea>] do_exit+0x42a/0x480
 [<ffffffff81544029>] oops_end+0xa9/0xf0
 [<ffffffff81035c3e>] no_context+0x11e/0x1f0
 [<ffffffff81035e2d>] __bad_area_nosemaphore+0x11d/0x220
 [<ffffffff81035f43>] bad_area_nosemaphore+0x13/0x20
 [<ffffffff815469b5>] __do_page_fault+0xc5/0x490
 [<ffffffff810d4a97>] ? call_rcu_sched+0x17/0x20
 [<ffffffff8125d3da>] ? strlcpy+0x4a/0x60
 [<ffffffff81546d8e>] do_page_fault+0xe/0x10
 [<ffffffff815434f2>] page_fault+0x22/0x30
 [<ffffffff81067140>] ? kthread_data+0x10/0x20
 [<ffffffff8105e155>] wq_worker_sleeping+0x15/0xa0
 [<ffffffff8154168b>] __schedule+0x5ab/0x6f0
 [<ffffffff81239992>] ? put_io_context_active+0xc2/0xf0
 [<ffffffff815419a9>] schedule+0x29/0x70
 [<ffffffff81047795>] do_exit+0x2d5/0x480
 [<ffffffff81544029>] oops_end+0xa9/0xf0
 [<ffffffff810058eb>] die+0x5b/0x90
 [<ffffffff81543b8b>] do_trap+0xcb/0x170
 [<ffffffff81546e22>] ? __atomic_notifier_call_chain+0x12/0x20
 [<ffffffff81003565>] do_invalid_op+0x95/0xb0
 [<ffffffff8111a9a8>] ? put_page+0x48/0x60
 [<ffffffff8111c411>] ? truncate_inode_pages_range+0x201/0x500
 [<ffffffff8154c8e8>] invalid_op+0x18/0x20
 [<ffffffff8111a9a8>] ? put_page+0x48/0x60
 [<ffffffff8111c829>] ? truncate_setsize+0x19/0x20
 [<ffffffff811b11f6>] aio_free_ring+0x96/0x1c0
 [<ffffffff811b21a2>] free_ioctx+0x1f2/0x250
 [<ffffffff81081a5d>] ? idle_balance+0xed/0x140
 [<ffffffff811b221a>] put_ioctx+0x1a/0x30
 [<ffffffff811b24af>] kill_ioctx_work+0x2f/0x40
 [<ffffffff81060933>] process_one_work+0x183/0x490
 [<ffffffff81061ac0>] worker_thread+0x120/0x3a0
 [<ffffffff810619a0>] ? manage_workers+0x160/0x160
 [<ffffffff8106786e>] kthread+0xce/0xe0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8154b79c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810677a0>] ? kthread_freezable_should_stop+0x70/0x70
Code: 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 fa b8 00 00 01 00 f0
0f c1 07 89 c2 c1 ea 10 66 39 c2 74 0d 0f 1f 00 f3 90 <0f> b7 07 66 39 c2 75 f6
c9 c3 0f 1f 40 00 55 48 89 e5 66 66 66
NMI backtrace for cpu 1


> 
> 		-ben



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-07-04  6:51                                     ` Gu Zheng
@ 2013-07-04 11:41                                       ` Benjamin LaHaise
  2013-07-05  3:21                                         ` Gu Zheng
  0 siblings, 1 reply; 37+ messages in thread
From: Benjamin LaHaise @ 2013-07-04 11:41 UTC (permalink / raw)
  To: Gu Zheng
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On Thu, Jul 04, 2013 at 02:51:18PM +0800, Gu Zheng wrote:
> Hi Ben,
>       When I tested your patch on kernel 3.10, the kernel panicked when the aio
> job completed or exited, specifically in aio_free_ring(). The following is part
> of the dmesg.

What is your test case?

		-ben

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())
  2013-07-04 11:41                                       ` Benjamin LaHaise
@ 2013-07-05  3:21                                         ` Gu Zheng
  0 siblings, 0 replies; 37+ messages in thread
From: Gu Zheng @ 2013-07-05  3:21 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Tang Chen, Mel Gorman, Minchan Kim, akpm, viro, khlebnikov,
	walken, kamezawa.hiroyu, riel, rientjes, isimatu.yasuaki, wency,
	laijs, jiang.liu, zab, jmoyer, linux-mm, linux-aio,
	linux-fsdevel, linux-kernel, Marek Szyprowski

On 07/04/2013 07:41 PM, Benjamin LaHaise wrote:

> On Thu, Jul 04, 2013 at 02:51:18PM +0800, Gu Zheng wrote:
>> Hi Ben,
>>       When I tested your patch on kernel 3.10, the kernel panicked when the aio
>> job completed or exited, specifically in aio_free_ring(). The following is part
>> of the dmesg.
> 
> What is your test case?

Just the one you mentioned in the previous mail:
 http://www.kvack.org/~bcrl/aio/aio-numa-test.c

Thanks,
Gu

> 
> 		-ben
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2013-07-05  3:24 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-05  9:21 [PATCH V2 0/2] mm: hotplug: implement non-movable version of get_user_pages() to kill long-time pin pages Lin Feng
2013-02-05  9:21 ` [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
2013-02-05 12:01   ` Mel Gorman
2013-02-06  0:42     ` Minchan Kim
2013-02-06  0:52       ` Benjamin LaHaise
2013-02-06  9:56       ` Mel Gorman
2013-02-08  2:32         ` Minchan Kim
2013-05-13  9:11         ` Tang Chen
2013-05-13  9:19           ` Mel Gorman
2013-05-13 14:37             ` Benjamin LaHaise
2013-05-13 14:54               ` Jeff Moyer
2013-05-13 15:01                 ` Benjamin LaHaise
2013-05-14  1:24                   ` Tang Chen
2013-05-14 13:58                     ` Benjamin LaHaise
2013-05-15  2:09                       ` Tang Chen
2013-05-15  7:21                         ` Tang Chen
2013-05-14  3:55             ` Tang Chen
2013-05-15 13:24               ` Mel Gorman
2013-05-16  5:54                 ` Tang Chen
2013-05-17  0:23                   ` [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable()) Benjamin LaHaise
2013-05-17  3:28                     ` Tang Chen
2013-05-17 14:37                       ` Benjamin LaHaise
2013-05-21  2:07                         ` Tang Chen
2013-05-21  2:27                           ` Benjamin LaHaise
2013-06-11  9:42                             ` Tang Chen
2013-06-11 14:45                               ` Benjamin LaHaise
2013-06-28  9:24                                 ` Gu Zheng
2013-07-01  7:23                                 ` Gu Zheng
2013-07-02 18:00                                   ` Benjamin LaHaise
2013-07-03  1:53                                     ` Gu Zheng
2013-07-04  6:51                                     ` Gu Zheng
2013-07-04 11:41                                       ` Benjamin LaHaise
2013-07-05  3:21                                         ` Gu Zheng
2013-05-17 18:17                     ` Zach Brown
2013-05-17 18:30                       ` Benjamin LaHaise
     [not found]   ` <20130220113757.GA10124@hacker.(null)>
2013-02-20 12:39     ` [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable() Lin Feng
2013-02-05  9:21 ` [PATCH V2 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove Lin Feng
