* [RFC PATCH 00/14] Accelerating page migrations
@ 2017-02-17 15:05 Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 01/14] mm/migrate: Add new mode parameter to migrate_page_copy() function Zi Yan
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <zi.yan@cs.rutgers.edu>

Hi all,

I was asked to post my complete patchset for comments, so this patchset includes
the parallel page migration patches I sent before. It is rebased on
mmotm-2017-02-07-15-20.

This RFC aims to accelerate huge page migration, so it relies on the THP
migration patchset from Naoya (https://lwn.net/Articles/713667/). There are a
lot of rough edges in the code, but I would like to get comments on how these
ideas look to you.

You can find kernel code with this patchset and THP migration patchset here:
https://github.com/x-y-z/linux-thp-migration/tree/page_migration_opt_linux-mm

General description:
=======================================================

There are four parts:
1. Parallel page migration (Patch 1-6). It uses multiple threads instead of the
   existing single-threaded copy to transfer a huge page, where each thread
   copies a chunk of the page. This makes better use of the available memory
   bandwidth.

2. Concurrent page migration (Patch 7,8). Linux currently transfers a list of
   pages sequentially. This is good for error handling, but bad for utilizing
   memory bandwidth. This part batches page migration: it first unmaps all the
   pages, then copies them all together, and finally reinstates the PTEs after
   the data copy is done. Only anonymous pages are supported.

3. Exchange page migration (Patch 9-11). This avoids new page allocations when
   two-way page migrations are performed between two memory nodes. Instead of
   allocating new pages and then migrating, it simply exchanges the contents of
   the two peer pages. Only anonymous pages are supported.

4. DMA page migration (Patch 12-14). This frees CPUs from the data copy. It uses
   the DMA engine in the system to copy page data instead of CPU threads.
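
As a quick usage sketch of the new syscall flags (illustrative only, not part of
the patchset): the flag values below are assumptions mirroring the ones added to
include/uapi/linux/mempolicy.h in Patches 5 and 8, and move_pages() is the
libnuma wrapper from <numaif.h> (link with -lnuma). It moves one page to node 1
using the multi-threaded, concurrent path:

	/* Hypothetical example; the MPOL_MF_MOVE_MT/MPOL_MF_MOVE_CONCUR values
	 * are taken from this series and are not in distribution headers. */
	#include <numaif.h>		/* move_pages(), MPOL_MF_MOVE */
	#include <stdio.h>
	#include <stdlib.h>

	#ifndef MPOL_MF_MOVE_MT
	#define MPOL_MF_MOVE_MT     (1 << 6)	/* Patch 5: multi-threaded copy */
	#endif
	#ifndef MPOL_MF_MOVE_CONCUR
	#define MPOL_MF_MOVE_CONCUR (1 << 7)	/* Patch 8: concurrent migration */
	#endif

	int main(void)
	{
		void *page;
		int node = 1, status = -1;

		/* 2MB-aligned, 2MB-sized buffer so it may be backed by a THP. */
		if (posix_memalign(&page, 2UL << 20, 2UL << 20))
			return 1;
		((char *)page)[0] = 1;		/* fault the page in */

		if (move_pages(0, 1, &page, &node, &status,
			       MPOL_MF_MOVE | MPOL_MF_MOVE_MT | MPOL_MF_MOVE_CONCUR))
			perror("move_pages");
		printf("page now on node %d\n", status);
		return 0;
	}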

Experiment results:
=======================================================

I ran a page migration micro-benchmark with these changes on a two-socket Intel
E5-2640v3 with DDR4 at 1866MHz and 32.0GB/s cross-socket bandwidth. This machine
also has a 16-channel IOAT DMA engine, which provides ~11GB/s data copy throughput.

1. Parallel page migration: it increases 2MB page migration throughput from
   3.0GB/s (1 thread) to 8.6GB/s (8 threads).
2. Concurrent page migration: it increases the throughput of migrating 16 2MB
   pages from 3.3GB/s (1 thread) to 14.4GB/s (8 threads).
3. Exchange page migration: it increases the throughput of 16 two-way 2MB page
   migrations from 3.3GB/s (1 thread) to 17.8GB/s (8 threads).
4. DMA page migration: it can saturate the Intel DMA engine's 11GB/s data copy
   throughput.

The improvements (except for DMA) were also tested on IBM Power8 and NVIDIA
ARM64 systems, and the range of improvements is very similar.

Best Regards,
Yan Zi


Zi Yan (14):
  mm/migrate: Add new mode parameter to migrate_page_copy() function
  mm/migrate: Make migrate_mode types non-exclusive
  mm/migrate: Add copy_pages_mthread function
  mm/migrate: Add new migrate mode MIGRATE_MT
  mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls
  sysctl: Add global tunable mt_page_copy
  migrate: Add copy_page_lists_mthread() function.
  mm: migrate: Add concurrent page migration into move_pages syscall.
  mm: migrate: Add exchange_page_mthread() and
    exchange_page_lists_mthread() to exchange two pages or two page
    lists.
  mm: Add exchange_pages and exchange_pages_concur functions to exchange
    two lists of pages instead of two migrate_pages().
  mm: migrate: Add exchange_pages syscall to exchange two page lists.
  migrate: Add copy_page_dma to use DMA Engine to copy pages.
  mm: migrate: Add copy_page_dma into migrate_page_copy.
  mm: Add copy_page_lists_dma_always to support copy a list of pages.

 arch/x86/entry/syscalls/syscall_64.tbl |    2 +
 fs/aio.c                               |    2 +-
 fs/f2fs/data.c                         |    2 +-
 fs/hugetlbfs/inode.c                   |    2 +-
 fs/ubifs/file.c                        |    2 +-
 include/linux/highmem.h                |    3 +
 include/linux/ksm.h                    |    5 +
 include/linux/migrate.h                |    6 +-
 include/linux/migrate_mode.h           |   10 +-
 include/linux/sched/sysctl.h           |    4 +
 include/linux/syscalls.h               |    5 +
 include/uapi/linux/mempolicy.h         |    6 +-
 kernel/sysctl.c                        |   32 +
 mm/Makefile                            |    3 +
 mm/compaction.c                        |   20 +-
 mm/copy_pages.c                        |  720 ++++++++++++++++++
 mm/exchange.c                          | 1257 ++++++++++++++++++++++++++++++++
 mm/internal.h                          |   11 +
 mm/ksm.c                               |   35 +
 mm/mempolicy.c                         |    7 +-
 mm/migrate.c                           |  573 ++++++++++++++-
 mm/shmem.c                             |    2 +-
 22 files changed, 2662 insertions(+), 47 deletions(-)
 create mode 100644 mm/copy_pages.c
 create mode 100644 mm/exchange.c

-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC PATCH 01/14] mm/migrate: Add new mode parameter to migrate_page_copy() function
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 02/14] mm/migrate: Make migrate_mode types non-exclusive Zi Yan
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <zi.yan@cs.rutgers.edu>

This is a prerequisite change required to make the page migration framework
copy pages in different modes, such as the default single-threaded mode or the
new multi-threaded one introduced in follow-up patches. This does not change
any existing functionality. Only the signatures of the migrate_page_copy()
and copy_huge_page() functions are affected.

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 fs/aio.c                     |  2 +-
 fs/f2fs/data.c               |  2 +-
 fs/hugetlbfs/inode.c         |  2 +-
 fs/ubifs/file.c              |  2 +-
 include/linux/migrate.h      |  6 ++++--
 include/linux/migrate_mode.h |  1 +
 mm/migrate.c                 | 14 ++++++++------
 mm/shmem.c                   |  2 +-
 8 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 5517794add2d..6e46c2887aa8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -418,7 +418,7 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
 	 * events from being lost.
 	 */
 	spin_lock_irqsave(&ctx->completion_lock, flags);
-	migrate_page_copy(new, old);
+	migrate_page_copy(new, old, MIGRATE_ST);
 	BUG_ON(ctx->ring_pages[idx] != old);
 	ctx->ring_pages[idx] = new;
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9f0ba90b92e4..4ac32954b59c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1937,7 +1937,7 @@ int f2fs_migrate_page(struct address_space *mapping,
 		SetPagePrivate(newpage);
 	set_page_private(newpage, page_private(page));
 
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, MIGRATE_ST);
 
 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4fb7b10f3a05..62b740524d54 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -850,7 +850,7 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, MIGRATE_ST);
 
 	return MIGRATEPAGE_SUCCESS;
 }
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 957ec7c90b43..8a8980ceb9cc 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1468,7 +1468,7 @@ static int ubifs_migrate_page(struct address_space *mapping,
 		SetPagePrivate(newpage);
 	}
 
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, MIGRATE_ST);
 	return MIGRATEPAGE_SUCCESS;
 }
 #endif
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index fa76b516fa47..63cfcb7ddbb5 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -42,7 +42,8 @@ extern void putback_movable_page(struct page *page);
 
 extern int migrate_prep(void);
 extern int migrate_prep_local(void);
-extern void migrate_page_copy(struct page *newpage, struct page *page);
+extern void migrate_page_copy(struct page *newpage, struct page *page,
+			enum migrate_mode mode);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page);
 extern int migrate_page_move_mapping(struct address_space *mapping,
@@ -63,7 +64,8 @@ static inline int migrate_prep(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
 
 static inline void migrate_page_copy(struct page *newpage,
-				     struct page *page) {}
+				     struct page *page,
+				     enum migrate_mode mode) {}
 
 static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page)
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89a3919..b3b9acbff444 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -11,6 +11,7 @@ enum migrate_mode {
 	MIGRATE_ASYNC,
 	MIGRATE_SYNC_LIGHT,
 	MIGRATE_SYNC,
+	MIGRATE_ST
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index e9de317820f8..5913f5b54832 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -596,7 +596,8 @@ static void __copy_gigantic_page(struct page *dst, struct page *src,
 	}
 }
 
-static void copy_huge_page(struct page *dst, struct page *src)
+static void copy_huge_page(struct page *dst, struct page *src,
+				enum migrate_mode mode)
 {
 	int i;
 	int nr_pages;
@@ -625,12 +626,13 @@ static void copy_huge_page(struct page *dst, struct page *src)
 /*
  * Copy the page to its new location
  */
-void migrate_page_copy(struct page *newpage, struct page *page)
+void migrate_page_copy(struct page *newpage, struct page *page,
+					   enum migrate_mode mode)
 {
 	int cpupid;
 
 	if (PageHuge(page) || PageTransHuge(page))
-		copy_huge_page(newpage, page);
+		copy_huge_page(newpage, page, mode);
 	else
 		copy_highpage(newpage, page);
 
@@ -712,7 +714,7 @@ int migrate_page(struct address_space *mapping,
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
 
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, mode);
 	return MIGRATEPAGE_SUCCESS;
 }
 EXPORT_SYMBOL(migrate_page);
@@ -762,7 +764,7 @@ int buffer_migrate_page(struct address_space *mapping,
 
 	SetPagePrivate(newpage);
 
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, MIGRATE_ST);
 
 	bh = head;
 	do {
@@ -1994,7 +1996,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	/* anon mapping, we can simply copy page->mapping to the new page: */
 	new_page->mapping = page->mapping;
 	new_page->index = page->index;
-	migrate_page_copy(new_page, page);
+	migrate_page_copy(new_page, page, MIGRATE_ST);
 	WARN_ON(PageLRU(new_page));
 
 	/* Recheck the target PMD */
diff --git a/mm/shmem.c b/mm/shmem.c
index c6ea40d78028..dd04932dc97d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2522,7 +2522,7 @@ static int shmem_migrate_page(struct address_space *mapping,
 	rc = shmem_migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
-	migrate_page_copy(newpage, page);
+	migrate_page_copy(newpage, page, MIGRATE_ST);
 
 	return MIGRATEPAGE_SUCCESS;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 02/14] mm/migrate: Make migrate_mode types non-exclusive
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 01/14] mm/migrate: Add new mode parameter to migrate_page_copy() function Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function Zi Yan
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This changes the enum declaration from sequential values to bit positions so
that the modes can be used in combination, which was not possible earlier.
No functionality has been changed.
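
For illustration only, here is a standalone userspace sketch (not kernel code)
of what the bit-position values buy us: modes can be OR-ed together and tested
with '&' rather than compared with '==':

	#include <stdio.h>

	/* Mirrors include/linux/migrate_mode.h after this patch. */
	enum migrate_mode {
		MIGRATE_ASYNC		= 1 << 0,
		MIGRATE_SYNC_LIGHT	= 1 << 1,
		MIGRATE_SYNC		= 1 << 2,
		MIGRATE_ST		= 1 << 3,
	};

	int main(void)
	{
		/* A combined request, impossible with the old sequential values. */
		enum migrate_mode mode = MIGRATE_SYNC | MIGRATE_ST;

		/* Membership tests use '&'; 'mode == MIGRATE_SYNC' would now be false. */
		printf("sync: %d, async: %d\n",
		       !!(mode & MIGRATE_SYNC), !!(mode & MIGRATE_ASYNC));
		return 0;
	}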

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/migrate_mode.h |  8 ++++----
 mm/compaction.c              | 20 ++++++++++----------
 mm/migrate.c                 | 14 +++++++-------
 3 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index b3b9acbff444..89c170060e5b 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -8,10 +8,10 @@
  * MIGRATE_SYNC will block when migrating pages
  */
 enum migrate_mode {
-	MIGRATE_ASYNC,
-	MIGRATE_SYNC_LIGHT,
-	MIGRATE_SYNC,
-	MIGRATE_ST
+	MIGRATE_ASYNC		= 1<<0,
+	MIGRATE_SYNC_LIGHT	= 1<<1,
+	MIGRATE_SYNC		= 1<<2,
+	MIGRATE_ST		= 1<<3,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/compaction.c b/mm/compaction.c
index 5657a75ea6a8..de4634c60cca 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -296,7 +296,7 @@ static void update_pageblock_skip(struct compact_control *cc,
 	if (migrate_scanner) {
 		if (pfn > zone->compact_cached_migrate_pfn[0])
 			zone->compact_cached_migrate_pfn[0] = pfn;
-		if (cc->mode != MIGRATE_ASYNC &&
+		if (!(cc->mode & MIGRATE_ASYNC) &&
 		    pfn > zone->compact_cached_migrate_pfn[1])
 			zone->compact_cached_migrate_pfn[1] = pfn;
 	} else {
@@ -329,7 +329,7 @@ static void update_pageblock_skip(struct compact_control *cc,
 static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
 						struct compact_control *cc)
 {
-	if (cc->mode == MIGRATE_ASYNC) {
+	if (cc->mode & MIGRATE_ASYNC) {
 		if (!spin_trylock_irqsave(lock, *flags)) {
 			cc->contended = true;
 			return false;
@@ -370,7 +370,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
 	}
 
 	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
+		if (cc->mode & MIGRATE_ASYNC) {
 			cc->contended = true;
 			return true;
 		}
@@ -393,7 +393,7 @@ static inline bool compact_should_abort(struct compact_control *cc)
 {
 	/* async compaction aborts if contended */
 	if (need_resched()) {
-		if (cc->mode == MIGRATE_ASYNC) {
+		if (cc->mode & MIGRATE_ASYNC) {
 			cc->contended = true;
 			return true;
 		}
@@ -688,7 +688,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (cc->mode == MIGRATE_ASYNC)
+		if (cc->mode & MIGRATE_ASYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -700,7 +700,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	if (compact_should_abort(cc))
 		return 0;
 
-	if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
+	if (cc->direct_compaction && (cc->mode & MIGRATE_ASYNC)) {
 		skip_on_failure = true;
 		next_skip_pfn = block_end_pfn(low_pfn, cc->order);
 	}
@@ -1195,7 +1195,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 	struct page *page;
 	const isolate_mode_t isolate_mode =
 		(sysctl_compact_unevictable_allowed ? ISOLATE_UNEVICTABLE : 0) |
-		(cc->mode != MIGRATE_SYNC ? ISOLATE_ASYNC_MIGRATE : 0);
+		(!(cc->mode & MIGRATE_SYNC) ? ISOLATE_ASYNC_MIGRATE : 0);
 
 	/*
 	 * Start at where we last stopped, or beginning of the zone as
@@ -1241,7 +1241,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
 		 * Async compaction is optimistic to see if the minimum amount
 		 * of work satisfies the allocation.
 		 */
-		if (cc->mode == MIGRATE_ASYNC &&
+		if ((cc->mode & MIGRATE_ASYNC) &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page)))
 			continue;
 
@@ -1481,7 +1481,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	unsigned long start_pfn = zone->zone_start_pfn;
 	unsigned long end_pfn = zone_end_pfn(zone);
 	const int migratetype = gfpflags_to_migratetype(cc->gfp_mask);
-	const bool sync = cc->mode != MIGRATE_ASYNC;
+	const bool sync = !(cc->mode & MIGRATE_ASYNC);
 
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
@@ -1577,7 +1577,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 			 * order-aligned block, so skip the rest of it.
 			 */
 			if (cc->direct_compaction &&
-						(cc->mode == MIGRATE_ASYNC)) {
+						(cc->mode & MIGRATE_ASYNC)) {
 				cc->migrate_pfn = block_end_pfn(
 						cc->migrate_pfn - 1, cc->order);
 				/* Draining pcplists is useless in this case */
diff --git a/mm/migrate.c b/mm/migrate.c
index 5913f5b54832..87253cb9b50a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -359,7 +359,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 	struct buffer_head *bh = head;
 
 	/* Simple case, sync compaction */
-	if (mode != MIGRATE_ASYNC) {
+	if (!(mode & MIGRATE_ASYNC)) {
 		do {
 			get_bh(bh);
 			lock_buffer(bh);
@@ -460,7 +460,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 * the mapping back due to an elevated page count, we would have to
 	 * block waiting on other references to be dropped.
 	 */
-	if (mode == MIGRATE_ASYNC && head &&
+	if ((mode & MIGRATE_ASYNC) && head &&
 			!buffer_migrate_lock_buffers(head, mode)) {
 		page_ref_unfreeze(page, expected_count);
 		spin_unlock_irq(&mapping->tree_lock);
@@ -746,7 +746,7 @@ int buffer_migrate_page(struct address_space *mapping,
 	 * with an IRQ-safe spinlock held. In the sync case, the buffers
 	 * need to be locked now
 	 */
-	if (mode != MIGRATE_ASYNC)
+	if (!(mode & MIGRATE_ASYNC))
 		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
 
 	ClearPagePrivate(page);
@@ -828,7 +828,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 {
 	if (PageDirty(page)) {
 		/* Only writeback pages in full synchronous migration */
-		if (mode != MIGRATE_SYNC)
+		if (!(mode & MIGRATE_SYNC))
 			return -EBUSY;
 		return writeout(mapping, page);
 	}
@@ -937,7 +937,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	bool is_lru = !__PageMovable(page);
 
 	if (!trylock_page(page)) {
-		if (!force || mode == MIGRATE_ASYNC)
+		if (!force || (mode & MIGRATE_ASYNC))
 			goto out;
 
 		/*
@@ -966,7 +966,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * the retry loop is too short and in the sync-light case,
 		 * the overhead of stalling is too much
 		 */
-		if (mode != MIGRATE_SYNC) {
+		if (!(mode & MIGRATE_SYNC)) {
 			rc = -EBUSY;
 			goto out_unlock;
 		}
@@ -1236,7 +1236,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (!trylock_page(hpage)) {
-		if (!force || mode != MIGRATE_SYNC)
+		if (!force || !(mode & MIGRATE_SYNC))
 			goto out;
 		lock_page(hpage);
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 01/14] mm/migrate: Add new mode parameter to migrate_page_copy() function Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 02/14] mm/migrate: Make migrate_mode types non-exclusive Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-23  6:06   ` Naoya Horiguchi
  2017-02-17 15:05 ` [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT Zi Yan
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This change adds a new function, copy_pages_mthread(), to enable multi-threaded
page copy, which can be utilized during migration. The function splits the page
copy request into chunks, each handled by its own work item, and queues them on
the system_highpri_wq workqueue.

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/highmem.h |  2 ++
 mm/Makefile             |  2 ++
 mm/copy_pages.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+)
 create mode 100644 mm/copy_pages.c

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index bb3f3297062a..e1f4f1b82812 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 
 #endif
 
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
+
 static inline void copy_highpage(struct page *to, struct page *from)
 {
 	char *vfrom, *vto;
diff --git a/mm/Makefile b/mm/Makefile
index aa0aa17cb413..cdd4bab9cc66 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -43,6 +43,8 @@ obj-y			:= filemap.o mempool.o oom_kill.o \
 
 obj-y += init-mm.o
 
+obj-y += copy_pages.o
+
 ifdef CONFIG_NO_BOOTMEM
 	obj-y		+= nobootmem.o
 else
diff --git a/mm/copy_pages.c b/mm/copy_pages.c
new file mode 100644
index 000000000000..c357e7b01042
--- /dev/null
+++ b/mm/copy_pages.c
@@ -0,0 +1,86 @@
+/*
+ * This implements parallel page copy function through multi threaded
+ * work queues.
+ *
+ * Zi Yan <ziy@nvidia.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include <linux/highmem.h>
+#include <linux/workqueue.h>
+#include <linux/slab.h>
+#include <linux/freezer.h>
+
+/*
+ * nr_copythreads can be the highest number of threads for given node
+ * on any architecture. The actual number of copy threads will be
+ * limited by the cpumask weight of the target node.
+ */
+unsigned int nr_copythreads = 8;
+
+struct copy_info {
+	struct work_struct copy_work;
+	char *to;
+	char *from;
+	unsigned long chunk_size;
+};
+
+static void copy_pages(char *vto, char *vfrom, unsigned long size)
+{
+	memcpy(vto, vfrom, size);
+}
+
+static void copythread(struct work_struct *work)
+{
+	struct copy_info *info = (struct copy_info *) work;
+
+	copy_pages(info->to, info->from, info->chunk_size);
+}
+
+int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
+{
+	unsigned int node = page_to_nid(to);
+	const struct cpumask *cpumask = cpumask_of_node(node);
+	struct copy_info *work_items;
+	char *vto, *vfrom;
+	unsigned long i, cthreads, cpu, chunk_size;
+	int cpu_id_list[32] = {0};
+
+	cthreads = nr_copythreads;
+	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
+	cthreads = (cthreads / 2) * 2;
+	work_items = kcalloc(cthreads, sizeof(struct copy_info), GFP_KERNEL);
+	if (!work_items)
+		return -ENOMEM;
+
+	i = 0;
+	for_each_cpu(cpu, cpumask) {
+		if (i >= cthreads)
+			break;
+		cpu_id_list[i] = cpu;
+		++i;
+	}
+
+	vfrom = kmap(from);
+	vto = kmap(to);
+	chunk_size = PAGE_SIZE * nr_pages / cthreads;
+
+	for (i = 0; i < cthreads; ++i) {
+		INIT_WORK((struct work_struct *) &work_items[i], copythread);
+
+		work_items[i].to = vto + i * chunk_size;
+		work_items[i].from = vfrom + i * chunk_size;
+		work_items[i].chunk_size = chunk_size;
+
+		queue_work_on(cpu_id_list[i], system_highpri_wq,
+					  (struct work_struct *) &work_items[i]);
+	}
+
+	for (i = 0; i < cthreads; ++i)
+		flush_work((struct work_struct *) &work_items[i]);
+
+	kunmap(to);
+	kunmap(from);
+	kfree(work_items);
+	return 0;
+}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (2 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-23  6:54   ` Naoya Horiguchi
  2017-02-17 15:05 ` [RFC PATCH 05/14] mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls Zi Yan
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This change adds a new migration mode, MIGRATE_MT, to enable the multi-threaded
page copy implementation inside copy_huge_page() by selectively calling
copy_pages_mthread() when requested. It still falls back to the regular page
copy mechanism if the multi-threaded attempt fails. It also attempts
multi-threaded copy for regular pages.

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/migrate_mode.h |  1 +
 mm/migrate.c                 | 25 ++++++++++++++++++-------
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 89c170060e5b..d344ad60f499 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -12,6 +12,7 @@ enum migrate_mode {
 	MIGRATE_SYNC_LIGHT	= 1<<1,
 	MIGRATE_SYNC		= 1<<2,
 	MIGRATE_ST		= 1<<3,
+	MIGRATE_MT		= 1<<4,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/migrate.c b/mm/migrate.c
index 87253cb9b50a..21307219428d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -601,6 +601,7 @@ static void copy_huge_page(struct page *dst, struct page *src,
 {
 	int i;
 	int nr_pages;
+	int rc = -EFAULT;
 
 	if (PageHuge(src)) {
 		/* hugetlbfs page */
@@ -617,10 +618,14 @@ static void copy_huge_page(struct page *dst, struct page *src,
 		nr_pages = hpage_nr_pages(src);
 	}
 
-	for (i = 0; i < nr_pages; i++) {
-		cond_resched();
-		copy_highpage(dst + i, src + i);
-	}
+	if (mode & MIGRATE_MT)
+		rc = copy_pages_mthread(dst, src, nr_pages);
+
+	if (rc)
+		for (i = 0; i < nr_pages; i++) {
+			cond_resched();
+			copy_highpage(dst + i, src + i);
+		}
 }
 
 /*
@@ -631,10 +636,16 @@ void migrate_page_copy(struct page *newpage, struct page *page,
 {
 	int cpupid;
 
-	if (PageHuge(page) || PageTransHuge(page))
+	if (PageHuge(page) || PageTransHuge(page)) {
 		copy_huge_page(newpage, page, mode);
-	else
-		copy_highpage(newpage, page);
+	} else {
+		if (mode & MIGRATE_MT) {
+			if (copy_pages_mthread(newpage, page, 1))
+				copy_highpage(newpage, page);
+		} else {
+			copy_highpage(newpage, page);
+		}
+	}
 
 	if (PageError(page))
 		SetPageError(newpage);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 05/14] mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (3 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 06/14] sysctl: Add global tunable mt_page_copy Zi Yan
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This change adds a new mode flag, MPOL_MF_MOVE_MT, for migration system calls
like move_pages() and mbind(), which requests the multi-threaded page copy
method.
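
As a hypothetical userspace sketch (not part of the patch), an mbind() caller
could combine the new flag with MPOL_MF_MOVE; the MPOL_MF_MOVE_MT value is
assumed from the hunk below, and mbind() is the libnuma wrapper from <numaif.h>
(link with -lnuma):

	#include <numaif.h>
	#include <stdlib.h>
	#include <string.h>

	#ifndef MPOL_MF_MOVE_MT
	#define MPOL_MF_MOVE_MT (1 << 6)	/* assumption: value from this patch */
	#endif

	int main(void)
	{
		size_t len = 4UL << 20;			/* 4MB region */
		unsigned long nodemask = 1UL << 1;	/* target node 1 */
		void *buf;

		if (posix_memalign(&buf, 4096, len))
			return 1;
		memset(buf, 0, len);			/* fault the pages in first */

		/* Bind to node 1 and move existing pages with the MT copy path. */
		return mbind(buf, len, MPOL_BIND, &nodemask,
			     8 * sizeof(nodemask),
			     MPOL_MF_MOVE | MPOL_MF_MOVE_MT) ? 1 : 0;
	}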

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/uapi/linux/mempolicy.h |  4 +++-
 mm/mempolicy.c                 |  7 ++++++-
 mm/migrate.c                   | 14 ++++++++++----
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 9cd8b21dddbe..8f1db2e2d677 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -53,10 +53,12 @@ enum mpol_rebind_step {
 #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
 #define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
 #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
+#define MPOL_MF_MOVE_MT  (1<<6)	/* Use multi-threaded page copy routine */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
 			 MPOL_MF_MOVE     | 	\
-			 MPOL_MF_MOVE_ALL)
+			 MPOL_MF_MOVE_ALL |	\
+			 MPOL_MF_MOVE_MT)
 
 /*
  * Internal flags that share the struct mempolicy flags word with
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 435bb7bec0a5..fc714840538e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1300,9 +1300,14 @@ static long do_mbind(unsigned long start, unsigned long len,
 		int nr_failed = 0;
 
 		if (!list_empty(&pagelist)) {
+			enum migrate_mode mode = MIGRATE_SYNC;
+
+			if (flags & MPOL_MF_MOVE_MT)
+				mode |= MIGRATE_MT;
+
 			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
 			nr_failed = migrate_pages(&pagelist, new_page, NULL,
-				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
+					start, mode, MR_MEMPOLICY_MBIND);
 			if (nr_failed)
 				putback_movable_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index 21307219428d..2e58aad7c96f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1446,11 +1446,16 @@ static struct page *new_page_node(struct page *p, unsigned long private,
  */
 static int do_move_page_to_node_array(struct mm_struct *mm,
 				      struct page_to_node *pm,
-				      int migrate_all)
+				      int migrate_all,
+					  int migrate_use_mt)
 {
 	int err;
 	struct page_to_node *pp;
 	LIST_HEAD(pagelist);
+	enum migrate_mode mode = MIGRATE_SYNC;
+
+	if (migrate_use_mt)
+		mode |= MIGRATE_MT;
 
 	down_read(&mm->mmap_sem);
 
@@ -1527,7 +1532,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node, NULL,
-				(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
+				(unsigned long)pm, mode, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
@@ -1604,7 +1609,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 
 		/* Migrate this chunk */
 		err = do_move_page_to_node_array(mm, pm,
-						 flags & MPOL_MF_MOVE_ALL);
+						 flags & MPOL_MF_MOVE_ALL,
+						 flags & MPOL_MF_MOVE_MT);
 		if (err < 0)
 			goto out_pm;
 
@@ -1711,7 +1717,7 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	nodemask_t task_nodes;
 
 	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL))
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_MOVE_MT))
 		return -EINVAL;
 
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 06/14] sysctl: Add global tunable mt_page_copy
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (4 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 05/14] mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function Zi Yan
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

A new global sysctl tunable 'mt_page_copy' is added which overrides
syscall-specific requests and enables multi-threaded page copy for all
migrations on the system. The tunable is disabled by default.
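
A minimal sketch of enabling the tunable from userspace (assuming the entry
appears as /proc/sys/vm/mt_page_copy, since it is added to vm_table; requires
root):

	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/proc/sys/vm/mt_page_copy", "w");

		if (!f)
			return 1;
		fputs("1\n", f);	/* 1 = always use multi-threaded copy, 0 = default */
		return fclose(f) ? 1 : 0;
	}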

Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 kernel/sysctl.c | 11 +++++++++++
 mm/migrate.c    |  5 +++++
 2 files changed, 16 insertions(+)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5b8c0fb3f0ea..70a654146519 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -97,6 +97,8 @@
 
 #if defined(CONFIG_SYSCTL)
 
+extern int mt_page_copy;
+
 /* External variables not in a header file. */
 extern int suid_dumpable;
 #ifdef CONFIG_COREDUMP
@@ -1360,6 +1362,15 @@ static struct ctl_table vm_table[] = {
 		.proc_handler   = &hugetlb_mempolicy_sysctl_handler,
 	},
 #endif
+	{
+		.procname	= "mt_page_copy",
+		.data		= &mt_page_copy,
+		.maxlen		= sizeof(mt_page_copy),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	 {
 		.procname	= "hugetlb_shm_group",
 		.data		= &sysctl_hugetlb_shm_group,
diff --git a/mm/migrate.c b/mm/migrate.c
index 2e58aad7c96f..0e9b1f17cf8b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -48,6 +48,8 @@
 
 #include "internal.h"
 
+int mt_page_copy = 0;
+
 /*
  * migrate_prep() needs to be called before we start compiling a list of pages
  * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -618,6 +620,9 @@ static void copy_huge_page(struct page *dst, struct page *src,
 		nr_pages = hpage_nr_pages(src);
 	}
 
+	if (mt_page_copy)
+		mode |= MIGRATE_MT;
+
 	if (mode & MIGRATE_MT)
 		rc = copy_pages_mthread(dst, src, nr_pages);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (5 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 06/14] sysctl: Add global tunable mt_page_copy Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-23  8:54   ` Naoya Horiguchi
  2017-02-17 15:05 ` [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall Zi Yan
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This adds support for copying a list of pages using multiple threads. It evenly
distributes the pages across a group of worker threads and uses the same copy
subroutine as copy_pages_mthread().

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/copy_pages.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h   |  3 +++
 2 files changed, 65 insertions(+)

diff --git a/mm/copy_pages.c b/mm/copy_pages.c
index c357e7b01042..516c0a1a57f3 100644
--- a/mm/copy_pages.c
+++ b/mm/copy_pages.c
@@ -84,3 +84,65 @@ int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
 	kfree(work_items);
 	return 0;
 }
+
+int copy_page_lists_mthread(struct page **to, struct page **from, int nr_pages) 
+{
+	int err = 0;
+	unsigned int cthreads, node = page_to_nid(*to);
+	int i;
+	struct copy_info *work_items;
+	int nr_pages_per_page = hpage_nr_pages(*from);
+	const struct cpumask *cpumask = cpumask_of_node(node);
+	int cpu_id_list[32] = {0};
+	int cpu;
+
+	cthreads = nr_copythreads;
+	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
+	cthreads = (cthreads / 2) * 2;
+	cthreads = min_t(unsigned int, nr_pages, cthreads);
+
+	work_items = kzalloc(sizeof(struct copy_info)*nr_pages,
+						 GFP_KERNEL);
+	if (!work_items)
+		return -ENOMEM;
+
+	i = 0;
+	for_each_cpu(cpu, cpumask) {
+		if (i >= cthreads)
+			break;
+		cpu_id_list[i] = cpu;
+		++i;
+	}
+
+	for (i = 0; i < nr_pages; ++i) {
+		int thread_idx = i % cthreads;
+
+		INIT_WORK((struct work_struct *)&work_items[i], 
+				  copythread);
+
+		work_items[i].to = kmap(to[i]);
+		work_items[i].from = kmap(from[i]);
+		work_items[i].chunk_size = PAGE_SIZE * hpage_nr_pages(from[i]);
+
+		BUG_ON(nr_pages_per_page != hpage_nr_pages(from[i]));
+		BUG_ON(nr_pages_per_page != hpage_nr_pages(to[i]));
+
+
+		queue_work_on(cpu_id_list[thread_idx], 
+					  system_highpri_wq, 
+					  (struct work_struct *)&work_items[i]);
+	}
+
+	/* Wait until it finishes  */
+	for (i = 0; i < cthreads; ++i)
+		flush_work((struct work_struct *) &work_items[i]);
+
+	for (i = 0; i < nr_pages; ++i) {
+			kunmap(to[i]);
+			kunmap(from[i]);
+	}
+
+	kfree(work_items);
+
+	return err;
+}
diff --git a/mm/internal.h b/mm/internal.h
index ccfc2a2969f4..175e08ed524a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -498,4 +498,7 @@ extern const struct trace_print_flags pageflag_names[];
 extern const struct trace_print_flags vmaflag_names[];
 extern const struct trace_print_flags gfpflag_names[];
 
+extern int copy_page_lists_mthread(struct page **to,
+			struct page **from, int nr_pages);
+
 #endif	/* __MM_INTERNAL_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (6 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-24  8:25   ` Naoya Horiguchi
  2017-02-17 15:05 ` [RFC PATCH 09/14] mm: migrate: Add exchange_page_mthread() and exchange_page_lists_mthread() to exchange two pages or two page lists Zi Yan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

Concurrent page migration moves a list of pages all together using multiple
threads, unlike the existing page migration process, which migrates pages
sequentially. The current implementation only migrates anonymous pages.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/migrate_mode.h   |   1 +
 include/uapi/linux/mempolicy.h |   1 +
 mm/migrate.c                   | 495 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 492 insertions(+), 5 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index d344ad60f499..2bd849d89122 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -13,6 +13,7 @@ enum migrate_mode {
 	MIGRATE_SYNC		= 1<<2,
 	MIGRATE_ST		= 1<<3,
 	MIGRATE_MT		= 1<<4,
+	MIGRATE_CONCUR		= 1<<5,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 8f1db2e2d677..6d9758a32053 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -54,6 +54,7 @@ enum mpol_rebind_step {
 #define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
 #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
 #define MPOL_MF_MOVE_MT  (1<<6)	/* Use multi-threaded page copy routine */
+#define MPOL_MF_MOVE_CONCUR  (1<<7)	/* Migrate a list of pages concurrently */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
 			 MPOL_MF_MOVE     | 	\
diff --git a/mm/migrate.c b/mm/migrate.c
index 0e9b1f17cf8b..a35e6fd43a50 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -50,6 +50,14 @@
 
 int mt_page_copy = 0;
 
+
+struct page_migration_work_item {
+	struct page *old_page;
+	struct page *new_page;
+	struct anon_vma *anon_vma;
+	struct list_head list;
+};
+
 /*
  * migrate_prep() needs to be called before we start compiling a list of pages
  * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
@@ -1312,6 +1320,471 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	return rc;
 }
 
+static int __unmap_page_concur(struct page *page, struct page *newpage,
+				struct anon_vma **anon_vma,
+				int force, enum migrate_mode mode)
+{
+	int rc = -EAGAIN;
+
+	if (!trylock_page(page)) {
+		if (!force || mode == MIGRATE_ASYNC)
+			goto out;
+
+		/*
+		 * It's not safe for direct compaction to call lock_page.
+		 * For example, during page readahead pages are added locked
+		 * to the LRU. Later, when the IO completes the pages are
+		 * marked uptodate and unlocked. However, the queueing
+		 * could be merging multiple pages for one bio (e.g.
+		 * mpage_readpages). If an allocation happens for the
+		 * second or third page, the process can end up locking
+		 * the same page twice and deadlocking. Rather than
+		 * trying to be clever about what pages can be locked,
+		 * avoid the use of lock_page for direct compaction
+		 * altogether.
+		 */
+		if (current->flags & PF_MEMALLOC)
+			goto out;
+
+		lock_page(page);
+	}
+
+	/* We are working on page_mapping(page) == NULL */
+	VM_BUG_ON_PAGE(PageWriteback(page), page);
+
+	/*
+	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
+	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * of migration. File cache pages are no problem because of page_lock()
+	 * File Caches may use write_page() or lock_page() in migration, then,
+	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
+	 */
+	if (PageAnon(page) && !PageKsm(page))
+		*anon_vma = page_get_anon_vma(page);
+
+	/*
+	 * Block others from accessing the new page when we get around to
+	 * establishing additional references. We are usually the only one
+	 * holding a reference to newpage at this point. We used to have a BUG
+	 * here if trylock_page(newpage) fails, but would like to allow for
+	 * cases where there might be a race with the previous use of newpage.
+	 * This is much like races on refcount of oldpage: just don't BUG().
+	 */
+	if (unlikely(!trylock_page(newpage)))
+		goto out_unlock;
+
+	/*
+	 * Corner case handling:
+	 * 1. When a new swap-cache page is read into, it is added to the LRU
+	 * and treated as swapcache but it has no rmap yet.
+	 * Calling try_to_unmap() against a page->mapping==NULL page will
+	 * trigger a BUG.  So handle it here.
+	 * 2. An orphaned page (see truncate_complete_page) might have
+	 * fs-private metadata. The page can be picked up due to memory
+	 * offlining.  Everywhere else except page reclaim, the page is
+	 * invisible to the vm, so the page can not be migrated.  So try to
+	 * free the metadata, so the page can be freed.
+	 */
+	if (!page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(page), page);
+		if (page_has_private(page)) {
+			try_to_free_buffers(page);
+			goto out_unlock_both;
+		}
+	} else {
+		VM_BUG_ON_PAGE(!page_mapped(page), page);
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !*anon_vma,
+				page);
+		rc = try_to_unmap(page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+	}
+
+	return rc;
+
+out_unlock_both:
+	unlock_page(newpage);
+out_unlock:
+	/* Drop an anon_vma reference if we took one */
+	if (*anon_vma)
+		put_anon_vma(*anon_vma);
+	unlock_page(page);
+out:
+	return rc;
+}
+
+static int unmap_pages_and_get_new_concur(new_page_t get_new_page,
+				free_page_t put_new_page, unsigned long private,
+				struct page_migration_work_item *item,
+				int force,
+				enum migrate_mode mode, int reason)
+{
+	int rc = MIGRATEPAGE_SUCCESS;
+	int *result = NULL;
+
+	
+	item->new_page = get_new_page(item->old_page, private, &result);
+
+	if (!item->new_page) {
+		rc = -ENOMEM;
+		return rc;
+	}
+
+	if (page_count(item->old_page) == 1) {
+		rc = -ECANCELED;
+		goto out;
+	}
+
+	if (unlikely(PageTransHuge(item->old_page) &&
+		!PageTransHuge(item->new_page))) {
+		lock_page(item->old_page);
+		rc = split_huge_page(item->old_page);
+		unlock_page(item->old_page);
+		if (rc)
+			goto out;
+	}
+
+	rc = __unmap_page_concur(item->old_page, item->new_page, &item->anon_vma,
+							force, mode);
+	if (rc == MIGRATEPAGE_SUCCESS) {
+		put_new_page = NULL;
+		return rc;
+	}
+
+out:
+	if (rc != -EAGAIN) {
+		list_del(&item->old_page->lru);
+		dec_zone_page_state(item->old_page, NR_ISOLATED_ANON +
+				page_is_file_cache(item->old_page));
+
+		putback_lru_page(item->old_page);
+	}
+
+	/*
+	 * If migration was not successful and there's a freeing callback, use
+	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
+	 * during isolation.
+	 */
+	if (put_new_page)
+		put_new_page(item->new_page, private);
+	else
+		putback_lru_page(item->new_page);
+
+	if (result) {
+		if (rc)
+			*result = rc;
+		else
+			*result = page_to_nid(item->new_page);
+	}
+
+	return rc;
+}
+
+static int move_mapping_concurr(struct list_head *unmapped_list_ptr,
+					   struct list_head *wip_list_ptr,
+					   enum migrate_mode mode)
+{
+	struct page_migration_work_item *iterator, *iterator2;
+	struct address_space *mapping;
+
+	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
+		VM_BUG_ON_PAGE(!PageLocked(iterator->old_page), iterator->old_page);
+		VM_BUG_ON_PAGE(!PageLocked(iterator->new_page), iterator->new_page);
+
+		mapping = page_mapping(iterator->old_page);
+
+		VM_BUG_ON(mapping);
+
+		VM_BUG_ON(PageWriteback(iterator->old_page));
+
+		if (page_count(iterator->old_page) != 1) {
+			list_move(&iterator->list, wip_list_ptr);
+			continue;
+		}
+
+		iterator->new_page->index = iterator->old_page->index;
+		iterator->new_page->mapping = iterator->old_page->mapping;
+		if (PageSwapBacked(iterator->old_page))
+			SetPageSwapBacked(iterator->new_page);
+	}
+
+	return 0;
+}
+
+static void migrate_page_copy_page_flags(struct page *newpage, struct page *page)
+{
+	int cpupid;
+
+	if (PageError(page))
+		SetPageError(newpage);
+	if (PageReferenced(page))
+		SetPageReferenced(newpage);
+	if (PageUptodate(page))
+		SetPageUptodate(newpage);
+	if (TestClearPageActive(page)) {
+		VM_BUG_ON_PAGE(PageUnevictable(page), page);
+		SetPageActive(newpage);
+	} else if (TestClearPageUnevictable(page))
+		SetPageUnevictable(newpage);
+	if (PageChecked(page))
+		SetPageChecked(newpage);
+	if (PageMappedToDisk(page))
+		SetPageMappedToDisk(newpage);
+
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (PageDirty(page))
+		SetPageDirty(newpage);
+
+	if (page_is_young(page))
+		set_page_young(newpage);
+	if (page_is_idle(page))
+		set_page_idle(newpage);
+
+	/*
+	 * Copy NUMA information to the new page, to prevent over-eager
+	 * future migrations of this same page.
+	 */
+	cpupid = page_cpupid_xchg_last(page, -1);
+	page_cpupid_xchg_last(newpage, cpupid);
+
+	ksm_migrate_page(newpage, page);
+	/*
+	 * Please do not reorder this without considering how mm/ksm.c's
+	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
+	 */
+	if (PageSwapCache(page))
+		ClearPageSwapCache(page);
+	ClearPagePrivate(page);
+	set_page_private(page, 0);
+
+	/*
+	 * If any waiters have accumulated on the new page then
+	 * wake them up.
+	 */
+	if (PageWriteback(newpage))
+		end_page_writeback(newpage);
+
+	copy_page_owner(page, newpage);
+
+	mem_cgroup_migrate(page, newpage);
+}
+
+
+static int copy_to_new_pages_concur(struct list_head *unmapped_list_ptr,
+				enum migrate_mode mode)
+{
+	struct page_migration_work_item *iterator;
+	int num_pages = 0, idx = 0;
+	struct page **src_page_list = NULL, **dst_page_list = NULL;
+	unsigned long size = 0;
+	int rc = -EFAULT;
+
+	list_for_each_entry(iterator, unmapped_list_ptr, list) {
+		++num_pages;
+		size += PAGE_SIZE * hpage_nr_pages(iterator->old_page);
+	}
+
+	src_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
+	if (!src_page_list)
+		return -ENOMEM;
+	dst_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
+	if (!dst_page_list)
+		return -ENOMEM;
+
+	list_for_each_entry(iterator, unmapped_list_ptr, list) {
+		src_page_list[idx] = iterator->old_page;
+		dst_page_list[idx] = iterator->new_page;
+		++idx;
+	}
+
+	BUG_ON(idx != num_pages);
+	
+	if (mode & MIGRATE_MT)
+		rc = copy_page_lists_mthread(dst_page_list, src_page_list,
+							num_pages);
+
+	if (rc)
+		list_for_each_entry(iterator, unmapped_list_ptr, list) {
+			if (PageHuge(iterator->old_page) ||
+				PageTransHuge(iterator->old_page))
+				copy_huge_page(iterator->new_page, iterator->old_page, 0);
+			else
+				copy_highpage(iterator->new_page, iterator->old_page);
+		}
+
+	kfree(src_page_list);
+	kfree(dst_page_list);
+
+	list_for_each_entry(iterator, unmapped_list_ptr, list) {
+		migrate_page_copy_page_flags(iterator->new_page, iterator->old_page);
+	}
+
+	return 0;
+}
+
+static int remove_migration_ptes_concurr(struct list_head *unmapped_list_ptr)
+{
+	struct page_migration_work_item *iterator, *iterator2;
+
+	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
+		remove_migration_ptes(iterator->old_page, iterator->new_page, false);
+
+		unlock_page(iterator->new_page);
+
+		if (iterator->anon_vma)
+			put_anon_vma(iterator->anon_vma);
+
+		unlock_page(iterator->old_page);
+
+		list_del(&iterator->old_page->lru);
+		dec_zone_page_state(iterator->old_page, NR_ISOLATED_ANON +
+				page_is_file_cache(iterator->old_page));
+
+		putback_lru_page(iterator->old_page);
+		iterator->old_page = NULL;
+
+		putback_lru_page(iterator->new_page);
+		iterator->new_page = NULL;
+	}
+
+	return 0;
+}
+
+int migrate_pages_concur(struct list_head *from, new_page_t get_new_page,
+		free_page_t put_new_page, unsigned long private,
+		enum migrate_mode mode, int reason)
+{
+	int retry = 1;
+	int nr_failed = 0;
+	int nr_succeeded = 0;
+	int pass = 0;
+	struct page *page;
+	int swapwrite = current->flags & PF_SWAPWRITE;
+	int rc;
+	int total_num_pages = 0, idx;
+	struct page_migration_work_item *item_list;
+	struct page_migration_work_item *iterator, *iterator2;
+	int item_list_order = 0;
+
+	LIST_HEAD(wip_list);
+	LIST_HEAD(unmapped_list);
+	LIST_HEAD(serialized_list);
+	LIST_HEAD(failed_list);
+
+	if (!swapwrite)
+		current->flags |= PF_SWAPWRITE;
+
+	list_for_each_entry(page, from, lru)
+		++total_num_pages;
+
+	item_list_order = get_order(total_num_pages *
+		sizeof(struct page_migration_work_item));
+
+	if (item_list_order > MAX_ORDER) {
+		item_list = alloc_pages_exact(total_num_pages *
+			sizeof(struct page_migration_work_item), GFP_ATOMIC);
+		memset(item_list, 0, total_num_pages *
+			sizeof(struct page_migration_work_item));
+	} else {
+		item_list = (struct page_migration_work_item *)__get_free_pages(GFP_ATOMIC,
+						item_list_order);
+		memset(item_list, 0, PAGE_SIZE<<item_list_order);
+	}
+
+	idx = 0;
+	list_for_each_entry(page, from, lru) {
+		item_list[idx].old_page = page;
+		item_list[idx].new_page = NULL;
+		INIT_LIST_HEAD(&item_list[idx].list);
+		list_add_tail(&item_list[idx].list, &wip_list);
+		idx += 1;
+	}
+
+	for(pass = 0; pass < 1 && retry; pass++) {
+		retry = 0;
+
+		/* unmap and get new page for page_mapping(page) == NULL */
+		list_for_each_entry_safe(iterator, iterator2, &wip_list, list) {
+			cond_resched();
+
+			if (iterator->new_page)
+				continue;
+
+			/* We do not migrate huge pages, file-backed, or swapcached pages */
+			if (PageHuge(iterator->old_page))
+				rc = -ENODEV;
+			else if ((page_mapping(iterator->old_page) != NULL))
+				rc = -ENODEV;
+			else
+				rc = unmap_pages_and_get_new_concur(get_new_page, put_new_page,
+						private, iterator, pass > 2, mode,
+						reason);
+
+			switch(rc) {
+			case -ENODEV:
+				list_move(&iterator->list, &serialized_list);
+				break;
+			case -ENOMEM:
+				goto out;
+			case -EAGAIN:
+				retry++;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				list_move(&iterator->list, &unmapped_list);
+				nr_succeeded++;
+				break;
+			default:
+				/*
+				 * Permanent failure (-EBUSY, -ENOSYS, etc.):
+				 * unlike -EAGAIN case, the failed page is
+				 * removed from migration page list and not
+				 * retried in the next outer loop.
+				 */
+				list_move(&iterator->list, &failed_list);
+				nr_failed++;
+				break;
+			}
+		}
+		/* move page->mapping to new page, only -EAGAIN could happen  */
+		move_mapping_concurr(&unmapped_list, &wip_list, mode);
+		/* copy pages in unmapped_list */
+		copy_to_new_pages_concur(&unmapped_list, mode);
+		/* remove migration pte, if old_page is NULL?, unlock old and new
+		 * pages, put anon_vma, put old and new pages */
+		remove_migration_ptes_concurr(&unmapped_list);
+	}
+	nr_failed += retry;
+	rc = nr_failed;
+
+	if (!list_empty(&serialized_list))
+		rc = migrate_pages(from, get_new_page, put_new_page,
+				private, mode, reason);
+out:
+	if (nr_succeeded)
+		count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
+	if (nr_failed)
+		count_vm_events(PGMIGRATE_FAIL, nr_failed);
+	trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
+
+	if (item_list_order >= MAX_ORDER)
+		free_pages_exact(item_list, total_num_pages *
+			sizeof(struct page_migration_work_item));
+	else
+		free_pages((unsigned long)item_list, item_list_order);
+
+	if (!swapwrite)
+		current->flags &= ~PF_SWAPWRITE;
+
+	return rc;
+}
+
 /*
  * migrate_pages - migrate the pages specified in a list, to the free pages
  *		   supplied as the target for the page migration
@@ -1452,7 +1925,8 @@ static struct page *new_page_node(struct page *p, unsigned long private,
 static int do_move_page_to_node_array(struct mm_struct *mm,
 				      struct page_to_node *pm,
 				      int migrate_all,
-					  int migrate_use_mt)
+					  int migrate_use_mt,
+					  int migrate_concur)
 {
 	int err;
 	struct page_to_node *pp;
@@ -1536,8 +2010,16 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 
 	err = 0;
 	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, new_page_node, NULL,
-				(unsigned long)pm, mode, MR_SYSCALL);
+		if (migrate_concur)
+			err = migrate_pages_concur(&pagelist, new_page_node, NULL,
+					(unsigned long)pm,
+					mode,
+					MR_SYSCALL);
+		else
+			err = migrate_pages(&pagelist, new_page_node, NULL,
+					(unsigned long)pm,
+					mode,
+					MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
@@ -1615,7 +2097,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		/* Migrate this chunk */
 		err = do_move_page_to_node_array(mm, pm,
 						 flags & MPOL_MF_MOVE_ALL,
-						 flags & MPOL_MF_MOVE_MT);
+						 flags & MPOL_MF_MOVE_MT,
+						 flags & MPOL_MF_MOVE_CONCUR);
 		if (err < 0)
 			goto out_pm;
 
@@ -1722,7 +2205,9 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	nodemask_t task_nodes;
 
 	/* Check flags */
-	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_MOVE_MT))
+	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|
+				  MPOL_MF_MOVE_MT|
+				  MPOL_MF_MOVE_CONCUR))
 		return -EINVAL;
 
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
-- 
2.11.0


* [RFC PATCH 09/14] mm: migrate: Add exchange_page_mthread() and exchange_page_lists_mthread() to exchange two pages or two page lists.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (7 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 10/14] mm: Add exchange_pages and exchange_pages_concur functions to exchange two lists of pages instead of two migrate_pages() Zi Yan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

When pages are going to be migrated into a full memory node, instead of
a two-step migrate_pages(), we use exchange_page_mthread() to exchange
the contents of two pages directly. This saves two unnecessary page
allocations.

The current implementation only supports anonymous pages.
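
The per-chunk work item boils down to an in-place, word-by-word swap, so
no third buffer is needed. A stand-alone user-space sketch of that idea
(illustrative only, not part of this patch):

#include <stdint.h>
#include <stddef.h>

/* Swap two equal-sized buffers in place; size must be a multiple of 8. */
static void exchange_chunk(void *a, void *b, size_t size)
{
	uint64_t *pa = a, *pb = b, tmp;
	size_t i;

	for (i = 0; i < size / sizeof(uint64_t); i++) {
		tmp = pa[i];
		pa[i] = pb[i];
		pb[i] = tmp;
	}
}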

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/copy_pages.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h   |   5 +++
 2 files changed, 138 insertions(+)

diff --git a/mm/copy_pages.c b/mm/copy_pages.c
index 516c0a1a57f3..879e2d944ad0 100644
--- a/mm/copy_pages.c
+++ b/mm/copy_pages.c
@@ -146,3 +146,136 @@ int copy_page_lists_mthread(struct page **to, struct page **from, int nr_pages)
 
 	return err;
 }
+static void exchange_page_routine(char *to, char *from, unsigned long chunk_size)
+{
+	u64 tmp;
+	int i;
+
+	for (i = 0; i < chunk_size; i += sizeof(tmp)) {
+		tmp = *((u64*)(from + i));
+		*((u64*)(from + i)) = *((u64*)(to + i));
+		*((u64*)(to + i)) = tmp;
+	}
+}
+
+static void exchange_page_work_queue_thread(struct work_struct *work)
+{
+	struct copy_info *my_work = (struct copy_info*)work;
+
+	exchange_page_routine(my_work->to,
+					  my_work->from,
+					  my_work->chunk_size);
+}
+
+int exchange_page_mthread(struct page *to, struct page *from, int nr_pages)
+{
+	int total_mt_num = nr_copythreads;
+	int to_node = page_to_nid(to);
+	int i;
+	struct copy_info *work_items;
+	char *vto, *vfrom;
+	unsigned long chunk_size;
+	const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);
+	int cpu_id_list[32] = {0};
+	int cpu;
+
+	work_items = kzalloc(sizeof(struct copy_info)*total_mt_num,
+						 GFP_KERNEL);
+	if (!work_items)
+		return -ENOMEM;
+
+	i = 0;
+	for_each_cpu(cpu, per_node_cpumask) {
+		if (i >= total_mt_num)
+			break;
+		cpu_id_list[i] = cpu;
+		++i;
+	}
+
+	vfrom = kmap(from);
+	vto = kmap(to);
+	chunk_size = PAGE_SIZE*nr_pages / total_mt_num;
+
+	for (i = 0; i < total_mt_num; ++i) {
+		INIT_WORK((struct work_struct *)&work_items[i],
+				exchange_page_work_queue_thread);
+
+		work_items[i].to = vto + i * chunk_size;
+		work_items[i].from = vfrom + i * chunk_size;
+		work_items[i].chunk_size = chunk_size;
+
+		queue_work_on(cpu_id_list[i],
+					  system_highpri_wq,
+					  (struct work_struct *)&work_items[i]);
+	}
+
+	/* Wait until it finishes  */
+	for (i = 0; i < total_mt_num; ++i)
+		flush_work((struct work_struct *) &work_items[i]);
+
+	kunmap(to);
+	kunmap(from);
+
+	kfree(work_items);
+
+	return 0;
+}
+
+int exchange_page_lists_mthread(struct page **to, struct page **from,
+		int nr_pages)
+{
+	int err = 0;
+	int total_mt_num = nr_copythreads;
+	int to_node = page_to_nid(*to);
+	int i;
+	struct copy_info *work_items;
+	int nr_pages_per_page = hpage_nr_pages(*from);
+	const struct cpumask *per_node_cpumask = cpumask_of_node(to_node);
+	int cpu_id_list[32] = {0};
+	int cpu;
+
+	work_items = kzalloc(sizeof(struct copy_info)*nr_pages,
+						 GFP_KERNEL);
+	if (!work_items)
+		return -ENOMEM;
+
+	total_mt_num = min_t(int, nr_pages, total_mt_num);
+
+	i = 0;
+	for_each_cpu(cpu, per_node_cpumask) {
+		if (i >= total_mt_num)
+			break;
+		cpu_id_list[i] = cpu;
+		++i;
+	}
+
+	for (i = 0; i < nr_pages; ++i) {
+		int thread_idx = i % total_mt_num;
+
+		INIT_WORK((struct work_struct *)&work_items[i],
+				exchange_page_work_queue_thread);
+
+		work_items[i].to = kmap(to[i]);
+		work_items[i].from = kmap(from[i]);
+		work_items[i].chunk_size = PAGE_SIZE * hpage_nr_pages(from[i]);
+
+		BUG_ON(nr_pages_per_page != hpage_nr_pages(from[i]));
+		BUG_ON(nr_pages_per_page != hpage_nr_pages(to[i]));
+
+
+		queue_work_on(cpu_id_list[thread_idx], system_highpri_wq, (struct work_struct *)&work_items[i]);
+	}
+
+	/* Wait until all queued work items finish */
+	for (i = 0; i < nr_pages; ++i)
+		flush_work((struct work_struct *) &work_items[i]);
+
+	for (i = 0; i < nr_pages; ++i) {
+			kunmap(to[i]);
+			kunmap(from[i]);
+	}
+
+	kfree(work_items);
+
+	return err;
+}
diff --git a/mm/internal.h b/mm/internal.h
index 175e08ed524a..b99a634b4d09 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -501,4 +501,9 @@ extern const struct trace_print_flags gfpflag_names[];
 extern int copy_page_lists_mthread(struct page **to,
 			struct page **from, int nr_pages);
 
+extern int exchange_page_mthread(struct page *to, struct page *from,
+			int nr_pages);
+extern int exchange_page_lists_mthread(struct page **to,
+						  struct page **from, 
+						  int nr_pages);
 #endif	/* __MM_INTERNAL_H */
-- 
2.11.0


* [RFC PATCH 10/14] mm: Add exchange_pages and exchange_pages_concur functions to exchange two lists of pages instead of two migrate_pages().
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (8 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 09/14] mm: migrate: Add exchange_page_mthread() and exchange_page_lists_mthread() to exchange two pages or two page lists Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 11/14] mm: migrate: Add exchange_pages syscall to exchange two page lists Zi Yan
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This aims to reduce the overhead of calling migrate_pages() twice when two
lists of pages are exchanged between memory nodes.
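
A hedged sketch of how a kernel-side caller might drive the new interface:
build a list of exchange_page_info pairs from already-isolated pages and
hand it to exchange_pages(). The helper exchange_two_isolated_pages() below
is illustrative only and not part of this patch:

static int exchange_two_isolated_pages(struct page *a, struct page *b)
{
	struct exchange_page_info pair = {
		.from_page = a,
		.to_page = b,
	};
	LIST_HEAD(pairs);
	int failed;

	list_add_tail(&pair.list, &pairs);

	/* exchange_pages() returns the number of pairs it failed to exchange */
	failed = exchange_pages(&pairs, MIGRATE_SYNC, MR_SYSCALL);
	return failed ? -EBUSY : 0;
}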

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/ksm.h |   5 +
 mm/Makefile         |   1 +
 mm/exchange.c       | 888 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/ksm.c            |  35 +++
 4 files changed, 929 insertions(+)
 create mode 100644 mm/exchange.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 481c8c4627ca..f2659be1984e 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -62,6 +62,7 @@ struct page *ksm_might_need_to_copy(struct page *page,
 
 int rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc);
 void ksm_migrate_page(struct page *newpage, struct page *oldpage);
+void ksm_exchange_page(struct page *to_page, struct page *from_page);
 
 #else  /* !CONFIG_KSM */
 
@@ -102,6 +103,10 @@ static inline int rmap_walk_ksm(struct page *page,
 static inline void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 {
 }
+static inline void ksm_exchange_page(struct page *to_page,
+				struct page *from_page)
+{
+}
 #endif /* CONFIG_MMU */
 #endif /* !CONFIG_KSM */
 
diff --git a/mm/Makefile b/mm/Makefile
index cdd4bab9cc66..56afe2210746 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -44,6 +44,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o \
 obj-y += init-mm.o
 
 obj-y += copy_pages.o
+obj-y += exchange.o
 
 ifdef CONFIG_NO_BOOTMEM
 	obj-y		+= nobootmem.o
diff --git a/mm/exchange.c b/mm/exchange.c
new file mode 100644
index 000000000000..dfed26ebff47
--- /dev/null
+++ b/mm/exchange.c
@@ -0,0 +1,888 @@
+/*
+ * Exchange two in-use pages. Page flags and page->mapping are exchanged
+ * as well. Only anonymous pages are supported.
+ *
+ * Copyright (C) 2016 NVIDIA, Zi Yan <ziy@nvidia.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include <linux/syscalls.h>
+#include <linux/migrate.h>
+#include <linux/security.h>
+#include <linux/cpuset.h>
+#include <linux/hugetlb.h>
+#include <linux/mm_inline.h>
+#include <linux/page_idle.h>
+#include <linux/page-flags.h>
+#include <linux/ksm.h>
+#include <linux/memcontrol.h>
+#include <linux/balloon_compaction.h>
+#include <linux/buffer_head.h>
+
+
+#include "internal.h"
+
+/*
+ * A pair of user addresses whose backing pages will be exchanged,
+ * together with the per-side status returned to user space.
+ */
+struct pages_to_node {
+	unsigned long from_addr;
+	int from_status;
+
+	unsigned long to_addr;
+	int to_status;
+};
+
+struct exchange_page_info {
+	struct page *from_page;
+	struct page *to_page;
+
+	struct anon_vma *from_anon_vma;
+	struct anon_vma *to_anon_vma;
+
+	struct list_head list;
+};
+
+struct page_flags {
+	unsigned int page_error :1;
+	unsigned int page_referenced:1;
+	unsigned int page_uptodate:1;
+	unsigned int page_active:1;
+	unsigned int page_unevictable:1;
+	unsigned int page_checked:1;
+	unsigned int page_mappedtodisk:1;
+	unsigned int page_dirty:1;
+	unsigned int page_is_young:1;
+	unsigned int page_is_idle:1;
+	unsigned int page_swapcache:1;
+	unsigned int page_writeback:1;
+	unsigned int page_private:1;
+	unsigned int __pad:3;
+};
+
+
+static void exchange_page(char *to, char *from)
+{
+	u64 tmp;
+	int i;
+
+	for (i = 0; i < PAGE_SIZE; i += sizeof(tmp)) {
+		tmp = *((u64*)(from + i));
+		*((u64*)(from + i)) = *((u64*)(to + i));
+		*((u64*)(to + i)) = tmp;
+	}
+}
+
+static inline void exchange_highpage(struct page *to, struct page *from)
+{
+	char *vfrom, *vto;
+
+	vfrom = kmap_atomic(from);
+	vto = kmap_atomic(to);
+	exchange_page(vto, vfrom);
+	kunmap_atomic(vto);
+	kunmap_atomic(vfrom);
+}
+
+static void __exchange_gigantic_page(struct page *dst, struct page *src,
+				int nr_pages)
+{
+	int i;
+	struct page *dst_base = dst;
+	struct page *src_base = src;
+
+	for (i = 0; i < nr_pages; ) {
+		cond_resched();
+		exchange_highpage(dst, src);
+
+		i++;
+		dst = mem_map_next(dst, dst_base, i);
+		src = mem_map_next(src, src_base, i);
+	}
+}
+
+static void exchange_huge_page(struct page *dst, struct page *src)
+{
+	int i;
+	int nr_pages;
+
+	if (PageHuge(src)) {
+		/* hugetlbfs page */
+		struct hstate *h = page_hstate(src);
+		nr_pages = pages_per_huge_page(h);
+
+		if (unlikely(nr_pages > MAX_ORDER_NR_PAGES)) {
+			__exchange_gigantic_page(dst, src, nr_pages);
+			return;
+		}
+	} else {
+		/* thp page */
+		BUG_ON(!PageTransHuge(src));
+		nr_pages = hpage_nr_pages(src);
+	}
+
+	for (i = 0; i < nr_pages; i++) {
+		cond_resched();
+		exchange_highpage(dst + i, src + i);
+	}
+}
+
+/*
+ * Exchange page flags and related metadata between the two pages.
+ */
+static void exchange_page_flags(struct page *to_page, struct page *from_page)
+{
+	int from_cpupid, to_cpupid;
+	struct page_flags from_page_flags, to_page_flags;
+	struct mem_cgroup *to_memcg = page_memcg(to_page),
+					  *from_memcg = page_memcg(from_page);
+
+	from_cpupid = page_cpupid_xchg_last(from_page, -1);
+
+	from_page_flags.page_error = TestClearPageError(from_page);
+	from_page_flags.page_referenced = TestClearPageReferenced(from_page);
+	from_page_flags.page_uptodate = PageUptodate(from_page);
+	ClearPageUptodate(from_page);
+	from_page_flags.page_active = TestClearPageActive(from_page);
+	from_page_flags.page_unevictable = TestClearPageUnevictable(from_page);
+	from_page_flags.page_checked = PageChecked(from_page);
+	ClearPageChecked(from_page);
+	from_page_flags.page_mappedtodisk = PageMappedToDisk(from_page);
+	ClearPageMappedToDisk(from_page);
+	from_page_flags.page_dirty = PageDirty(from_page);
+	ClearPageDirty(from_page);
+	from_page_flags.page_is_young = test_and_clear_page_young(from_page);
+	from_page_flags.page_is_idle = page_is_idle(from_page);
+	clear_page_idle(from_page);
+	from_page_flags.page_swapcache = PageSwapCache(from_page);
+	from_page_flags.page_private = PagePrivate(from_page);
+	ClearPagePrivate(from_page);
+	from_page_flags.page_writeback = test_clear_page_writeback(from_page);
+
+
+	to_cpupid = page_cpupid_xchg_last(to_page, -1);
+
+	to_page_flags.page_error = TestClearPageError(to_page);
+	to_page_flags.page_referenced = TestClearPageReferenced(to_page);
+	to_page_flags.page_uptodate = PageUptodate(to_page);
+	ClearPageUptodate(to_page);
+	to_page_flags.page_active = TestClearPageActive(to_page);
+	to_page_flags.page_unevictable = TestClearPageUnevictable(to_page);
+	to_page_flags.page_checked = PageChecked(to_page);
+	ClearPageChecked(to_page);
+	to_page_flags.page_mappedtodisk = PageMappedToDisk(to_page);
+	ClearPageMappedToDisk(to_page);
+	to_page_flags.page_dirty = PageDirty(to_page);
+	ClearPageDirty(to_page);
+	to_page_flags.page_is_young = test_and_clear_page_young(to_page);
+	to_page_flags.page_is_idle = page_is_idle(to_page);
+	clear_page_idle(to_page);
+	to_page_flags.page_swapcache = PageSwapCache(to_page);
+	to_page_flags.page_private = PagePrivate(to_page);
+	ClearPagePrivate(to_page);
+	to_page_flags.page_writeback = test_clear_page_writeback(to_page);
+
+	/* set to_page */
+	if (from_page_flags.page_error)
+		SetPageError(to_page);
+	if (from_page_flags.page_referenced)
+		SetPageReferenced(to_page);
+	if (from_page_flags.page_uptodate)
+		SetPageUptodate(to_page);
+	if (from_page_flags.page_active) {
+		VM_BUG_ON_PAGE(from_page_flags.page_unevictable, from_page);
+		SetPageActive(to_page);
+	} else if (from_page_flags.page_unevictable)
+		SetPageUnevictable(to_page);
+	if (from_page_flags.page_checked)
+		SetPageChecked(to_page);
+	if (from_page_flags.page_mappedtodisk)
+		SetPageMappedToDisk(to_page);
+
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (from_page_flags.page_dirty)
+		SetPageDirty(to_page);
+
+	if (from_page_flags.page_is_young)
+		set_page_young(to_page);
+	if (from_page_flags.page_is_idle)
+		set_page_idle(to_page);
+
+	/* set from_page */
+	if (to_page_flags.page_error)
+		SetPageError(from_page);
+	if (to_page_flags.page_referenced)
+		SetPageReferenced(from_page);
+	if (to_page_flags.page_uptodate)
+		SetPageUptodate(from_page);
+	if (to_page_flags.page_active) {
+		VM_BUG_ON_PAGE(to_page_flags.page_unevictable, from_page);
+		SetPageActive(from_page);
+	} else if (to_page_flags.page_unevictable)
+		SetPageUnevictable(from_page);
+	if (to_page_flags.page_checked)
+		SetPageChecked(from_page);
+	if (to_page_flags.page_mappedtodisk)
+		SetPageMappedToDisk(from_page);
+
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (to_page_flags.page_dirty)
+		SetPageDirty(from_page);
+
+	if (to_page_flags.page_is_young)
+		set_page_young(from_page);
+	if (to_page_flags.page_is_idle)
+		set_page_idle(from_page);
+
+	/*
+	 * Copy NUMA information to the new page, to prevent over-eager
+	 * future migrations of this same page.
+	 */
+	page_cpupid_xchg_last(to_page, from_cpupid);
+	page_cpupid_xchg_last(from_page, to_cpupid);
+
+	ksm_exchange_page(to_page, from_page);
+	/*
+	 * Please do not reorder this without considering how mm/ksm.c's
+	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
+	 */
+	ClearPageSwapCache(to_page);
+	ClearPageSwapCache(from_page);
+	if (from_page_flags.page_swapcache)
+		SetPageSwapCache(to_page);
+	if (to_page_flags.page_swapcache)
+		SetPageSwapCache(from_page);
+
+
+#ifdef CONFIG_PAGE_OWNER
+	/* exchange page owner  */
+	BUG();
+#endif
+	/* exchange mem cgroup  */
+	to_page->mem_cgroup = from_memcg;
+	from_page->mem_cgroup = to_memcg;
+
+}
+
+/*
+ * Replace the page in the mapping.
+ *
+ * The number of remaining references must be:
+ * 1 for anonymous pages without a mapping
+ * 2 for pages with a mapping
+ * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+ */
+
+static int exchange_page_move_mapping(struct address_space *to_mapping,
+			struct address_space *from_mapping,
+			struct page *to_page, struct page *from_page,
+			enum migrate_mode mode,
+			int to_extra_count, int from_extra_count)
+{
+	int to_expected_count = 1 + to_extra_count,
+		from_expected_count = 1 + from_extra_count;
+	unsigned long from_page_index = page_index(from_page),
+				  to_page_index = page_index(to_page);
+	int to_swapbacked = PageSwapBacked(to_page),
+		from_swapbacked = PageSwapBacked(from_page);
+	struct address_space *to_mapping_value = to_page->mapping,
+						 *from_mapping_value = from_page->mapping;
+
+
+	if (!to_mapping) {
+		/* Anonymous page without mapping */
+		if (page_count(to_page) != to_expected_count)
+			return -EAGAIN;
+	}
+
+	if (!from_mapping) {
+		/* Anonymous page without mapping */
+		if (page_count(from_page) != from_expected_count)
+			return -EAGAIN;
+	}
+
+	/*
+	 * Now we know that no one else is looking at the page:
+	 * no turning back from here.
+	 */
+	/* from_page  */
+	from_page->index = to_page_index;
+	from_page->mapping = to_mapping_value;
+
+	ClearPageSwapBacked(from_page);
+	if (to_swapbacked)
+		SetPageSwapBacked(from_page);
+
+
+	/* to_page  */
+	to_page->index = from_page_index;
+	to_page->mapping = from_mapping_value;
+
+	ClearPageSwapBacked(to_page);
+	if (from_swapbacked)
+		SetPageSwapBacked(to_page);
+
+	return MIGRATEPAGE_SUCCESS;
+}
+
+static int exchange_from_to_pages(struct page *to_page, struct page *from_page,
+				enum migrate_mode mode)
+{
+	int rc = -EBUSY;
+	struct address_space *to_page_mapping, *from_page_mapping;
+
+	VM_BUG_ON_PAGE(!PageLocked(from_page), from_page);
+	VM_BUG_ON_PAGE(!PageLocked(to_page), to_page);
+
+	/* both pages must be plain anonymous pages: page_mapping() must be NULL */
+	to_page_mapping = page_mapping(to_page);
+	from_page_mapping = page_mapping(from_page);
+
+	BUG_ON(from_page_mapping);
+	BUG_ON(to_page_mapping);
+
+	BUG_ON(PageWriteback(from_page));
+	BUG_ON(PageWriteback(to_page));
+
+	/* actual page mapping exchange */
+	rc = exchange_page_move_mapping(to_page_mapping, from_page_mapping,
+						to_page, from_page, mode, 0, 0);
+	/* actual page data exchange  */
+	if (rc != MIGRATEPAGE_SUCCESS)
+		return rc;
+
+	rc = -EFAULT;
+
+	if (mode & MIGRATE_MT)
+		rc = exchange_page_mthread(to_page, from_page,
+				hpage_nr_pages(from_page));
+	if (rc) {
+		if (PageHuge(from_page) || PageTransHuge(from_page))
+			exchange_huge_page(to_page, from_page);
+		else
+			exchange_highpage(to_page, from_page);
+		rc = 0;
+	}
+
+	exchange_page_flags(to_page, from_page);
+
+	return rc;
+}
+
+static int unmap_and_exchange_anon(struct page *from_page, struct page *to_page,
+				enum migrate_mode mode)
+{
+	int rc = -EAGAIN;
+	int from_page_was_mapped = 0, to_page_was_mapped = 0;
+	struct anon_vma *anon_vma_from_page = NULL, *anon_vma_to_page = NULL;
+
+	/* from_page lock down  */
+	if (!trylock_page(from_page)) {
+		if (mode & MIGRATE_ASYNC)
+			goto out;
+
+		lock_page(from_page);
+	}
+
+	BUG_ON(PageWriteback(from_page));
+
+	/*
+	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
+	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * of migration. File cache pages are no problem because of page_lock()
+	 * File Caches may use write_page() or lock_page() in migration, then,
+	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
+	 */
+	if (PageAnon(from_page) && !PageKsm(from_page))
+		anon_vma_from_page = page_get_anon_vma(from_page);
+
+	/* to_page lock down  */
+	if (!trylock_page(to_page)) {
+		if (mode & MIGRATE_ASYNC)
+			goto out_unlock;
+
+		lock_page(to_page);
+	}
+
+	BUG_ON(PageWriteback(to_page));
+
+	/*
+	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
+	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * of migration. File cache pages are no problem because of page_lock()
+	 * File Caches may use write_page() or lock_page() in migration, then,
+	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
+	 */
+	if (PageAnon(to_page) && !PageKsm(to_page))
+		anon_vma_to_page = page_get_anon_vma(to_page);
+
+	/*
+	 * Corner case handling:
+	 * 1. When a new swap-cache page is read into, it is added to the LRU
+	 * and treated as swapcache but it has no rmap yet.
+	 * Calling try_to_unmap() against a page->mapping==NULL page will
+	 * trigger a BUG.  So handle it here.
+	 * 2. An orphaned page (see truncate_complete_page) might have
+	 * fs-private metadata. The page can be picked up due to memory
+	 * offlining.  Everywhere else except page reclaim, the page is
+	 * invisible to the vm, so the page can not be migrated.  So try to
+	 * free the metadata, so the page can be freed.
+	 */
+	if (!from_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(from_page), from_page);
+		if (page_has_private(from_page)) {
+			try_to_free_buffers(from_page);
+			goto out_unlock_both;
+		}
+	} else if (page_mapped(from_page)) {
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(from_page) && !PageKsm(from_page) &&
+					   !anon_vma_from_page, from_page);
+		try_to_unmap(from_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+		from_page_was_mapped = 1;
+	}
+
+	if (!to_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(to_page), to_page);
+		if (page_has_private(to_page)) {
+			try_to_free_buffers(to_page);
+			goto out_unlock_both;
+		}
+	} else if (page_mapped(to_page)) {
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(to_page) && !PageKsm(to_page) &&
+					   !anon_vma_to_page, to_page);
+		try_to_unmap(to_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+		to_page_was_mapped = 1;
+	}
+
+	if (!page_mapped(from_page) && !page_mapped(to_page))
+		rc = exchange_from_to_pages(to_page, from_page, mode);
+
+	if (from_page_was_mapped)
+		remove_migration_ptes(from_page,
+			rc == MIGRATEPAGE_SUCCESS ? to_page : from_page, false);
+
+	if (to_page_was_mapped)
+		remove_migration_ptes(to_page,
+			rc == MIGRATEPAGE_SUCCESS ? from_page : to_page, false);
+
+
+out_unlock_both:
+	if (anon_vma_to_page)
+		put_anon_vma(anon_vma_to_page);
+	unlock_page(to_page);
+out_unlock:
+	/* Drop an anon_vma reference if we took one */
+	if (anon_vma_from_page)
+		put_anon_vma(anon_vma_from_page);
+	unlock_page(from_page);
+out:
+
+	return rc;
+}
+
+/*
+ * Exchange pages in the exchange_list
+ *
+ * Caller should release the exchange_list resource.
+ *
+ */
+static int exchange_pages(struct list_head *exchange_list,
+			enum migrate_mode mode,
+			int reason)
+{
+	struct exchange_page_info *one_pair, *one_pair2;
+	int failed = 0;
+
+	list_for_each_entry_safe(one_pair, one_pair2, exchange_list, list) {
+		struct page *from_page = one_pair->from_page;
+		struct page *to_page = one_pair->to_page;
+		int rc;
+
+		if ((page_mapping(from_page) != NULL) ||
+			(page_mapping(to_page) != NULL)) {
+			++failed;
+			goto putback;
+		}
+
+		
+		rc = unmap_and_exchange_anon(from_page, to_page, mode);
+
+		if (rc != MIGRATEPAGE_SUCCESS)
+			++failed;
+
+putback:
+		putback_lru_page(from_page);
+		putback_lru_page(to_page);
+
+	}
+	return failed;
+}
+
+
+static int unmap_pair_pages_concur(struct exchange_page_info *one_pair,
+				int force, enum migrate_mode mode)
+{
+	int rc = -EAGAIN;
+	struct anon_vma *anon_vma_from_page = NULL, *anon_vma_to_page = NULL;
+	struct page *from_page = one_pair->from_page;
+	struct page *to_page = one_pair->to_page;
+
+	/* from_page lock down  */
+	if (!trylock_page(from_page)) {
+		if (!force || (mode & MIGRATE_ASYNC))
+			goto out;
+
+		lock_page(from_page);
+	}
+
+	BUG_ON(PageWriteback(from_page));
+
+	/*
+	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
+	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * of migration. File cache pages are no problem because of page_lock()
+	 * File Caches may use write_page() or lock_page() in migration, then,
+	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
+	 */
+	if (PageAnon(from_page) && !PageKsm(from_page))
+		one_pair->from_anon_vma = anon_vma_from_page
+					= page_get_anon_vma(from_page);
+
+	/* to_page lock down  */
+	if (!trylock_page(to_page)) {
+		if (!force || (mode & MIGRATE_ASYNC))
+			goto out_unlock;
+
+		lock_page(to_page);
+	}
+
+	BUG_ON(PageWriteback(to_page));
+
+	/*
+	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
+	 * we cannot notice that anon_vma is freed while we migrates a page.
+	 * This get_anon_vma() delays freeing anon_vma pointer until the end
+	 * of migration. File cache pages are no problem because of page_lock()
+	 * File Caches may use write_page() or lock_page() in migration, then,
+	 * just care Anon page here.
+	 *
+	 * Only page_get_anon_vma() understands the subtleties of
+	 * getting a hold on an anon_vma from outside one of its mms.
+	 * But if we cannot get anon_vma, then we won't need it anyway,
+	 * because that implies that the anon page is no longer mapped
+	 * (and cannot be remapped so long as we hold the page lock).
+	 */
+	if (PageAnon(to_page) && !PageKsm(to_page))
+		one_pair->to_anon_vma = anon_vma_to_page = page_get_anon_vma(to_page);
+
+	/*
+	 * Corner case handling:
+	 * 1. When a new swap-cache page is read into, it is added to the LRU
+	 * and treated as swapcache but it has no rmap yet.
+	 * Calling try_to_unmap() against a page->mapping==NULL page will
+	 * trigger a BUG.  So handle it here.
+	 * 2. An orphaned page (see truncate_complete_page) might have
+	 * fs-private metadata. The page can be picked up due to memory
+	 * offlining.  Everywhere else except page reclaim, the page is
+	 * invisible to the vm, so the page can not be migrated.  So try to
+	 * free the metadata, so the page can be freed.
+	 */
+	if (!from_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(from_page), from_page);
+		if (page_has_private(from_page)) {
+			try_to_free_buffers(from_page);
+			goto out_unlock_both;
+		}
+	} else {
+		VM_BUG_ON_PAGE(!page_mapped(from_page), from_page);
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(from_page) && !PageKsm(from_page) &&
+					   !anon_vma_from_page, from_page);
+		rc = try_to_unmap(from_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+	}
+
+	if (!to_page->mapping) {
+		VM_BUG_ON_PAGE(PageAnon(to_page), to_page);
+		if (page_has_private(to_page)) {
+			try_to_free_buffers(to_page);
+			goto out_unlock_both;
+		}
+	} else {
+		VM_BUG_ON_PAGE(!page_mapped(to_page), to_page);
+		/* Establish migration ptes */
+		VM_BUG_ON_PAGE(PageAnon(to_page) && !PageKsm(to_page) &&
+					   !anon_vma_to_page, to_page);
+		rc = try_to_unmap(to_page,
+			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
+	}
+
+	return rc;
+
+out_unlock_both:
+	if (anon_vma_to_page)
+		put_anon_vma(anon_vma_to_page);
+	unlock_page(to_page);
+out_unlock:
+	/* Drop an anon_vma reference if we took one */
+	if (anon_vma_from_page)
+		put_anon_vma(anon_vma_from_page);
+	unlock_page(from_page);
+out:
+
+	return rc;
+}
+
+static int exchange_page_mapping_concur(struct list_head *unmapped_list_ptr,
+					   struct list_head *exchange_list_ptr,
+						enum migrate_mode mode)
+{
+	int rc = -EBUSY;
+	int nr_failed = 0;
+	struct address_space *to_page_mapping, *from_page_mapping;
+	struct exchange_page_info *one_pair, *one_pair2;
+
+	list_for_each_entry_safe(one_pair, one_pair2, unmapped_list_ptr, list) {
+		struct page *from_page = one_pair->from_page;
+		struct page *to_page = one_pair->to_page;
+
+		VM_BUG_ON_PAGE(!PageLocked(from_page), from_page);
+		VM_BUG_ON_PAGE(!PageLocked(to_page), to_page);
+
+		/* both pages must be plain anonymous pages: page_mapping() must be NULL */
+		to_page_mapping = page_mapping(to_page);
+		from_page_mapping = page_mapping(from_page);
+
+		BUG_ON(from_page_mapping);
+		BUG_ON(to_page_mapping);
+
+		BUG_ON(PageWriteback(from_page));
+		BUG_ON(PageWriteback(to_page));
+
+		/* actual page mapping exchange */
+		rc = exchange_page_move_mapping(to_page_mapping, from_page_mapping,
+							to_page, from_page, mode, 0, 0);
+
+		if (rc) {
+			list_move(&one_pair->list, exchange_list_ptr);
+			++nr_failed;
+		}
+	}
+
+	return nr_failed;
+}
+
+static int exchange_page_data_concur(struct list_head *unmapped_list_ptr,
+									enum migrate_mode mode)
+{
+	struct exchange_page_info *one_pair;
+	int num_pages = 0, idx = 0;
+	struct page **src_page_list = NULL, **dst_page_list = NULL;
+	unsigned long size = 0;
+	int rc = -EFAULT;
+
+	/* form page list  */
+	list_for_each_entry(one_pair, unmapped_list_ptr, list) {
+		++num_pages;
+		size += PAGE_SIZE * hpage_nr_pages(one_pair->from_page);
+	}
+
+	src_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
+	if (!src_page_list)
+		return -ENOMEM;
+	dst_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
+	if (!dst_page_list) {
+		kfree(src_page_list);
+		return -ENOMEM;
+	}
+
+	list_for_each_entry(one_pair, unmapped_list_ptr, list) {
+		src_page_list[idx] = one_pair->from_page;
+		dst_page_list[idx] = one_pair->to_page;
+		++idx;
+	}
+
+	BUG_ON(idx != num_pages);
+
+
+	if (mode & MIGRATE_MT)
+		rc = exchange_page_lists_mthread(dst_page_list, src_page_list,
+				num_pages);
+
+	if (rc) {
+		list_for_each_entry(one_pair, unmapped_list_ptr, list) {
+			if (PageHuge(one_pair->from_page) ||
+				PageTransHuge(one_pair->from_page)) {
+				exchange_huge_page(one_pair->to_page, one_pair->from_page);
+			} else {
+				exchange_highpage(one_pair->to_page, one_pair->from_page);
+			}
+		}
+	}
+
+	kfree(src_page_list);
+	kfree(dst_page_list);
+
+	list_for_each_entry(one_pair, unmapped_list_ptr, list) {
+		exchange_page_flags(one_pair->to_page, one_pair->from_page);
+	}
+	
+	return rc;
+}
+
+static int remove_migration_ptes_concur(struct list_head *unmapped_list_ptr)
+{
+	struct exchange_page_info *iterator;
+
+	list_for_each_entry(iterator, unmapped_list_ptr, list) {
+		remove_migration_ptes(iterator->from_page, iterator->to_page, false);
+		remove_migration_ptes(iterator->to_page, iterator->from_page, false);
+
+		unlock_page(iterator->from_page);
+
+		if (iterator->from_anon_vma)
+			put_anon_vma(iterator->from_anon_vma);
+
+		unlock_page(iterator->to_page);
+
+		if (iterator->to_anon_vma)
+			put_anon_vma(iterator->to_anon_vma);
+
+
+		putback_lru_page(iterator->from_page);
+		iterator->from_page = NULL;
+
+		putback_lru_page(iterator->to_page);
+		iterator->to_page = NULL;
+	}
+
+	return 0;
+}
+
+static int exchange_pages_concur(struct list_head *exchange_list,
+		enum migrate_mode mode, int reason)
+{
+	struct exchange_page_info *one_pair, *one_pair2;
+	int pass = 0;
+	int retry = 1;
+	int nr_failed = 0;
+	int nr_succeeded = 0;
+	int rc = 0;
+	LIST_HEAD(serialized_list);
+	LIST_HEAD(unmapped_list);
+
+	for(pass = 0; pass < 10 && retry; pass++) {
+		retry = 0;
+
+		/* unmap and get new page for page_mapping(page) == NULL */
+		list_for_each_entry_safe(one_pair, one_pair2, exchange_list, list) {
+			cond_resched();
+
+			/* We do not exchange huge pages or file-backed pages concurrently */
+			if (PageHuge(one_pair->from_page) || PageHuge(one_pair->to_page))
+				rc = -ENODEV;
+			else if ((page_mapping(one_pair->from_page) != NULL) ||
+					 (page_mapping(one_pair->to_page) != NULL))
+				rc = -ENODEV;
+			else
+				rc = unmap_pair_pages_concur(one_pair,
+											pass > 2,
+											mode);
+
+			switch(rc) {
+			case -ENODEV:
+				list_move(&one_pair->list, &serialized_list);
+				break;
+			case -ENOMEM:
+				goto out;
+			case -EAGAIN:
+				retry++;
+				break;
+			case MIGRATEPAGE_SUCCESS:
+				list_move(&one_pair->list, &unmapped_list);
+				nr_succeeded++;
+				break;
+			default:
+				/*
+				 * Permanent failure (-EBUSY, -ENOSYS, etc.):
+				 * unlike -EAGAIN case, the failed page is
+				 * removed from migration page list and not
+				 * retried in the next outer loop.
+				 */
+				list_move(&one_pair->list, &serialized_list);
+				nr_failed++;
+				break;
+			}
+		}
+
+		/* move page->mapping to new page, only -EAGAIN could happen  */
+		exchange_page_mapping_concur(&unmapped_list, exchange_list, mode);
+
+
+		/* copy pages in unmapped_list */
+		exchange_page_data_concur(&unmapped_list, mode);
+
+
+		/* remove migration ptes, unlock old and new pages,
+		 * put anon_vma, and put back old and new pages */
+		remove_migration_ptes_concur(&unmapped_list);
+	}
+
+	nr_failed += retry;
+	rc = nr_failed;
+
+	list_for_each_entry_safe(one_pair, one_pair2, &serialized_list, list) {
+		struct page *from_page = one_pair->from_page;
+		struct page *to_page = one_pair->to_page;
+		int rc;
+
+		if ((page_mapping(from_page) != NULL) ||
+			(page_mapping(to_page) != NULL)) {
+			++nr_failed;
+			goto putback;
+		}
+
+		
+		rc = unmap_and_exchange_anon(from_page, to_page, mode);
+
+		if (rc != MIGRATEPAGE_SUCCESS)
+			++nr_failed;
+
+putback:
+
+		putback_lru_page(from_page);
+		putback_lru_page(to_page);
+
+	}
+out:
+	list_splice(&unmapped_list, exchange_list);
+	list_splice(&serialized_list, exchange_list);
+
+	return nr_failed?-EFAULT:0;
+}
diff --git a/mm/ksm.c b/mm/ksm.c
index 2e129f0e1919..5dc47f630d57 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2013,6 +2013,41 @@ void ksm_migrate_page(struct page *newpage, struct page *oldpage)
 		set_page_stable_node(oldpage, NULL);
 	}
 }
+
+void ksm_exchange_page(struct page *to_page, struct page *from_page)
+{
+	struct stable_node *to_stable_node, *from_stable_node;
+
+	VM_BUG_ON_PAGE(!PageLocked(to_page), to_page);
+	VM_BUG_ON_PAGE(!PageLocked(from_page), from_page);
+
+	to_stable_node = page_stable_node(to_page);
+	from_stable_node = page_stable_node(from_page);
+	if (to_stable_node) {
+		VM_BUG_ON_PAGE(to_stable_node->kpfn != page_to_pfn(from_page),
+					from_page);
+		to_stable_node->kpfn = page_to_pfn(from_page);
+		/*
+		 * newpage->mapping was set in advance; now we need smp_wmb()
+		 * to make sure that the new stable_node->kpfn is visible
+		 * to get_ksm_page() before it can see that oldpage->mapping
+		 * has gone stale (or that PageSwapCache has been cleared).
+		 */
+		smp_wmb();
+	}
+	if (from_stable_node) {
+		VM_BUG_ON_PAGE(from_stable_node->kpfn != page_to_pfn(to_page),
+					to_page);
+		from_stable_node->kpfn = page_to_pfn(to_page);
+		/*
+		 * newpage->mapping was set in advance; now we need smp_wmb()
+		 * to make sure that the new stable_node->kpfn is visible
+		 * to get_ksm_page() before it can see that oldpage->mapping
+		 * has gone stale (or that PageSwapCache has been cleared).
+		 */
+		smp_wmb();
+	}
+}
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-- 
2.11.0


* [RFC PATCH 11/14] mm: migrate: Add exchange_pages syscall to exchange two page lists.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (9 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 10/14] mm: Add exchange_pages and exchange_pages_concur functions to exchange two lists of pages instead of two migrate_pages() Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 12/14] migrate: Add copy_page_dma to use DMA Engine to copy pages Zi Yan
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

This saves calling move_pages() twice to exchange two lists of pages.
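
A hedged user-space sketch of how the new syscall could be invoked. The
syscall number (332 below) follows this RFC's x86-64 table and is
provisional; the wrapper and addresses are illustrative only:

#include <sys/syscall.h>
#include <unistd.h>
#include <numaif.h>

#define __NR_exchange_pages 332	/* provisional, from this RFC */

static long exchange_pages(pid_t pid, unsigned long nr_pages,
			   void **from_pages, void **to_pages,
			   int *status, int flags)
{
	return syscall(__NR_exchange_pages, pid, nr_pages,
		       from_pages, to_pages, status, flags);
}

/*
 * Example: exchange one page of the calling process (pid 0 means current).
 * addr_a and addr_b are assumed to be page-aligned anonymous memory.
 */
static int exchange_one_page(void *addr_a, void *addr_b)
{
	void *from[1] = { addr_a };
	void *to[1] = { addr_b };
	int status[1];

	return exchange_pages(0, 1, from, to, status, MPOL_MF_MOVE);
}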

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 arch/x86/entry/syscalls/syscall_64.tbl |   2 +
 include/linux/syscalls.h               |   5 +
 mm/exchange.c                          | 369 +++++++++++++++++++++++++++++++++
 3 files changed, 376 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index e93ef0b38db8..944f94781f18 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,8 @@
 330	common	pkey_alloc		sys_pkey_alloc
 331	common	pkey_free		sys_pkey_free
 
+332	64	exchange_pages		sys_exchange_pages
+
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 91a740f6b884..c87310440228 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -736,6 +736,11 @@ asmlinkage long sys_move_pages(pid_t pid, unsigned long nr_pages,
 				const int __user *nodes,
 				int __user *status,
 				int flags);
+asmlinkage long sys_exchange_pages(pid_t pid, unsigned long nr_pages,
+				const void __user * __user *from_pages,
+				const void __user * __user *to_pages,
+				int __user *status,
+				int flags);
 asmlinkage long sys_mbind(unsigned long start, unsigned long len,
 				unsigned long mode,
 				const unsigned long __user *nmask,
diff --git a/mm/exchange.c b/mm/exchange.c
index dfed26ebff47..c513fb502725 100644
--- a/mm/exchange.c
+++ b/mm/exchange.c
@@ -886,3 +886,372 @@ static int exchange_pages_concur(struct list_head *exchange_list,
 
 	return nr_failed?-EFAULT:0;
 }
+
+/*
+ * Exchange the page pairs given in the pm array. The from_addr and to_addr
+ * fields must be set to the virtual addresses of the pages to be exchanged.
+ * The pm array ends with from_addr == to_addr == 0.
+ */
+static int do_exchange_page_array(struct mm_struct *mm,
+				      struct pages_to_node *pm,
+					  int migrate_all,
+					  int migrate_use_mt,
+					  int migrate_batch)
+{
+	int err;
+	struct pages_to_node *pp;
+	LIST_HEAD(err_page_list);
+	LIST_HEAD(exchange_page_list);
+	enum migrate_mode mode = MIGRATE_SYNC;
+
+	if (migrate_use_mt)
+		mode |= MIGRATE_MT;
+
+
+	down_read(&mm->mmap_sem);
+
+	/*
+	 * Build a list of page pairs to exchange
+	 */
+	for (pp = pm; pp->from_addr != 0 && pp->to_addr != 0; pp++) {
+		struct vm_area_struct *from_vma, *to_vma;
+		struct page *from_page, *to_page;
+		unsigned int follflags;
+		bool isolated = false;
+
+		err = -EFAULT;
+		from_vma = find_vma(mm, pp->from_addr);
+		if (!from_vma || 
+			pp->from_addr < from_vma->vm_start || 
+			!vma_migratable(from_vma))
+			goto set_from_status;
+
+		/* FOLL_DUMP to ignore special (like zero) pages */
+		follflags = FOLL_GET | FOLL_SPLIT | FOLL_DUMP;
+		if (thp_migration_supported())
+			follflags &= ~FOLL_SPLIT;
+		from_page = follow_page(from_vma, pp->from_addr, follflags);
+
+		err = PTR_ERR(from_page);
+		if (IS_ERR(from_page))
+			goto set_from_status;
+
+		err = -ENOENT;
+		if (!from_page)
+			goto set_from_status;
+
+		err = -EACCES;
+		if (page_mapcount(from_page) > 1 &&
+				!migrate_all)
+			goto put_and_set_from_page;
+
+		if (PageHuge(from_page)) {
+			if (PageHead(from_page)) 
+				if (isolate_huge_page(from_page, &err_page_list)) {
+					err = 0;
+					isolated = true;
+				}
+			goto put_and_set_from_page;
+		} else if (PageTransCompound(from_page)) {
+			if (PageTail(from_page)) {
+				err = -EACCES;
+				goto put_and_set_from_page;
+			}
+		}
+
+		err = isolate_lru_page(from_page);
+		if (!err) {
+			list_add_tail(&from_page->lru, &err_page_list);
+			inc_zone_page_state(from_page, NR_ISOLATED_ANON +
+					    page_is_file_cache(from_page));
+			isolated = true;
+		}
+put_and_set_from_page:
+		/*
+		 * Either remove the duplicate refcount from
+		 * isolate_lru_page() or drop the page ref if it was
+		 * not isolated.
+		 *
+		 * Since FOLL_GET calls get_page(), and isolate_lru_page()
+		 * also calls get_page()
+		 */
+		put_page(from_page);
+set_from_status:
+		pp->from_status = err;
+
+		if (err)
+			continue;
+
+		/* to pages  */
+		isolated = false;
+		err = -EFAULT;
+		to_vma = find_vma(mm, pp->to_addr);
+		if (!to_vma || 
+			pp->to_addr < to_vma->vm_start || 
+			!vma_migratable(to_vma))
+			goto set_to_status;
+
+		/* FOLL_DUMP to ignore special (like zero) pages */
+		follflags = FOLL_GET | FOLL_SPLIT | FOLL_DUMP;
+		if (thp_migration_supported())
+			follflags &= ~FOLL_SPLIT;
+		to_page = follow_page(to_vma, pp->to_addr, follflags);
+
+		err = PTR_ERR(to_page);
+		if (IS_ERR(to_page))
+			goto set_to_status;
+
+		err = -ENOENT;
+		if (!to_page)
+			goto set_to_status;
+
+		err = -EACCES;
+		if (page_mapcount(to_page) > 1 &&
+				!migrate_all)
+			goto put_and_set_to_page;
+
+		if (PageHuge(to_page)) {
+			if (PageHead(to_page)) 
+				if (isolate_huge_page(to_page, &err_page_list)) {
+					err = 0;
+					isolated = true;
+				}
+			goto put_and_set_to_page;
+		} else if (PageTransCompound(to_page)) {
+			if (PageTail(to_page)) {
+				err = -EACCES;
+				goto put_and_set_to_page;
+			}
+		}
+
+		err = isolate_lru_page(to_page);
+		if (!err) {
+			list_add_tail(&to_page->lru, &err_page_list);
+			inc_zone_page_state(to_page, NR_ISOLATED_ANON +
+					    page_is_file_cache(to_page));
+			isolated = true;
+		}
+put_and_set_to_page:
+		/*
+		 * Either remove the duplicate refcount from
+		 * isolate_lru_page() or drop the page ref if it was
+		 * not isolated.
+		 *
+		 * Since FOLL_GET calls get_page(), and isolate_lru_page()
+		 * also calls get_page()
+		 */
+		put_page(to_page);
+set_to_status:
+		pp->to_status = err;
+
+
+		if (!err) {
+			if ((PageHuge(from_page) != PageHuge(to_page)) ||
+				(PageTransHuge(from_page) != PageTransHuge(to_page))) {
+				pp->to_status = -EFAULT;
+				continue;
+			} else {
+				struct exchange_page_info *one_pair = 
+					kzalloc(sizeof(struct exchange_page_info), GFP_ATOMIC);
+				if (!one_pair) {
+					err = -ENOMEM;
+					break;
+				}
+
+
+				list_del(&from_page->lru);
+				list_del(&to_page->lru);
+
+				one_pair->from_page = from_page;
+				one_pair->to_page = to_page;
+
+				list_add_tail(&one_pair->list, &exchange_page_list);
+			}
+		}
+
+	}
+
+	/*
+	 * Put previously isolated pages back on the LRU.
+	 *
+	 * For those not isolated, put_page() should take care of them.
+	 */
+	if (!list_empty(&err_page_list)) {
+		putback_movable_pages(&err_page_list);
+	}
+
+	err = 0;
+	if (!list_empty(&exchange_page_list)) {
+		if (migrate_batch) 
+			err = exchange_pages_concur(&exchange_page_list, mode, MR_SYSCALL);
+		else
+			err = exchange_pages(&exchange_page_list, mode, MR_SYSCALL);
+	}
+
+	while (!list_empty(&exchange_page_list)) {
+		struct exchange_page_info *one_pair = 
+			list_first_entry(&exchange_page_list, 
+							 struct exchange_page_info, list);
+
+		list_del(&one_pair->list);
+		kfree(one_pair);
+	}
+
+	up_read(&mm->mmap_sem);
+
+	return err;
+}
+/*
+ * Exchange an array of page addresses with another array of page addresses
+ * and fill the corresponding array of status.
+ */
+static int do_pages_exchange(struct mm_struct *mm, nodemask_t task_nodes,
+			 unsigned long nr_pages,
+			 const void __user * __user *from_pages,
+			 const void __user * __user *to_pages,
+			 int __user *status, int flags)
+{
+	struct pages_to_node *pm;
+	unsigned long chunk_nr_pages;
+	unsigned long chunk_start;
+	int err;
+
+	err = -ENOMEM;
+	pm = (struct pages_to_node *)__get_free_page(GFP_KERNEL);
+	if (!pm)
+		goto out;
+
+	migrate_prep();
+
+	/*
+	 * Store a chunk of pages_to_node array in a page,
+	 * but keep the last one as a marker
+	 */
+	chunk_nr_pages = (PAGE_SIZE / sizeof(struct pages_to_node)) - 1;
+
+	for (chunk_start = 0;
+	     chunk_start < nr_pages;
+	     chunk_start += chunk_nr_pages) {
+		int j;
+
+		if (chunk_start + chunk_nr_pages > nr_pages)
+			chunk_nr_pages = nr_pages - chunk_start;
+
+		/* fill the chunk pm with from/to addrs from user-space */
+		for (j = 0; j < chunk_nr_pages; j++) {
+			const void __user *p;
+
+			err = -EFAULT;
+			if (get_user(p, from_pages + j + chunk_start))
+				goto out_pm;
+			pm[j].from_addr = (unsigned long) p;
+
+			if (get_user(p, to_pages + j + chunk_start))
+				goto out_pm;
+			pm[j].to_addr = (unsigned long) p;
+
+		}
+
+
+		/* End marker for this chunk */
+		pm[chunk_nr_pages].from_addr = pm[chunk_nr_pages].to_addr = 0;
+
+		/* Exchange this chunk */
+		err = do_exchange_page_array(mm, pm,
+						 flags & MPOL_MF_MOVE_ALL,
+						 flags & MPOL_MF_MOVE_MT,
+						 flags & MPOL_MF_MOVE_CONCUR);
+		if (err < 0)
+			goto out_pm;
+
+		/* Return status information */
+		for (j = 0; j < chunk_nr_pages; j++)
+			if (put_user(pm[j].to_status, status + j + chunk_start)) {
+				err = -EFAULT;
+				goto out_pm;
+			}
+	}
+	err = 0;
+
+out_pm:
+	free_page((unsigned long)pm);
+
+out:
+	return err;
+}
+
+
+
+
+SYSCALL_DEFINE6(exchange_pages, pid_t, pid, unsigned long, nr_pages,
+		const void __user * __user *, from_pages,
+		const void __user * __user *, to_pages,
+		int __user *, status, int, flags)
+{
+	const struct cred *cred = current_cred(), *tcred;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	int err;
+	nodemask_t task_nodes;
+
+	/* Check flags */
+	if (flags & ~(MPOL_MF_MOVE|
+				  MPOL_MF_MOVE_ALL|
+				  MPOL_MF_MOVE_MT|
+				  MPOL_MF_MOVE_CONCUR))
+		return -EINVAL;
+
+	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
+		return -EPERM;
+
+	/* Find the mm_struct */
+	rcu_read_lock();
+	task = pid ? find_task_by_vpid(pid) : current;
+	if (!task) {
+		rcu_read_unlock();
+		return -ESRCH;
+	}
+	get_task_struct(task);
+
+	/*
+	 * Check if this process has the right to modify the specified
+	 * process. The right exists if the process has administrative
+	 * capabilities, superuser privileges or the same
+	 * userid as the target process.
+	 */
+	tcred = __task_cred(task);
+	if (!uid_eq(cred->euid, tcred->suid) && !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->uid,  tcred->suid) && !uid_eq(cred->uid,  tcred->uid) &&
+	    !capable(CAP_SYS_NICE)) {
+		rcu_read_unlock();
+		err = -EPERM;
+		goto out;
+	}
+	rcu_read_unlock();
+
+	err = security_task_movememory(task);
+	if (err)
+		goto out;
+
+	task_nodes = cpuset_mems_allowed(task);
+	mm = get_task_mm(task);
+	put_task_struct(task);
+
+	if (!mm)
+		return -EINVAL;
+
+
+	err = do_pages_exchange(mm, task_nodes, nr_pages, from_pages,
+				    to_pages, status, flags);
+
+	mmput(mm);
+
+	return err;
+
+out:
+	put_task_struct(task);
+
+	return err;
+}
-- 
2.11.0


* [RFC PATCH 12/14] migrate: Add copy_page_dma to use DMA Engine to copy pages.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (10 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 11/14] mm: migrate: Add exchange_pages syscall to exchange two page lists Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 13/14] mm: migrate: Add copy_page_dma into migrate_page_copy Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 14/14] mm: Add copy_page_lists_dma_always to support copy a list of pages Zi Yan
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

Setting vm.use_all_dma_chans grabs all usable DMA channels.
vm.limit_dma_chans limits how many DMA channels are used.
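
A hedged sketch of how these sysctls could be set from user space (the
helper below is illustrative; the paths follow the vm_table entries added
by this patch):

#include <stdio.h>

/* Write a value to a sysctl file; returns 0 on success, -1 on error. */
static int write_sysctl(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	fclose(f);
	return 0;
}

int main(void)
{
	/* grab all usable DMA channels, but copy with at most 4 of them */
	write_sysctl("/proc/sys/vm/use_all_dma_chans", "1");
	write_sysctl("/proc/sys/vm/limit_dma_chans", "4");
	return 0;
}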

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/highmem.h      |   1 +
 include/linux/sched/sysctl.h |   4 +
 kernel/sysctl.c              |  21 ++++
 mm/copy_pages.c              | 281 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 307 insertions(+)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index e1f4f1b82812..1388ff5d0e53 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -237,6 +237,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
 #endif
 
 int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
+int copy_page_dma(struct page *to, struct page *from, int nr_pages);
 
 static inline void copy_highpage(struct page *to, struct page *from)
 {
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 22db1e63707e..d5efb4093386 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -78,4 +78,8 @@ extern int sysctl_schedstats(struct ctl_table *table, int write,
 				 void __user *buffer, size_t *lenp,
 				 loff_t *ppos);
 
+extern int sysctl_dma_page_migration(struct ctl_table *table, int write,
+				 void __user *buffer, size_t *lenp,
+				 loff_t *ppos);
+
 #endif /* _SCHED_SYSCTL_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 70a654146519..55c812c313b8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -99,6 +99,10 @@
 
 extern int mt_page_copy;
 
+extern int use_all_dma_chans;
+extern int limit_dma_chans;
+
+
 /* External variables not in a header file. */
 extern int suid_dumpable;
 #ifdef CONFIG_COREDUMP
@@ -1372,6 +1376,23 @@ static struct ctl_table vm_table[] = {
 		.extra2		= &one,
 	},
 	 {
+		.procname	= "use_all_dma_chans",
+		.data		= &use_all_dma_chans,
+		.maxlen		= sizeof(use_all_dma_chans),
+		.mode		= 0644,
+		.proc_handler	= sysctl_dma_page_migration,
+		.extra1		= &zero,
+		.extra2		= &one,
+	 },
+	 {
+		.procname	= "limit_dma_chans",
+		.data		= &limit_dma_chans,
+		.maxlen		= sizeof(limit_dma_chans),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+	 },
+	 {
 		.procname	= "hugetlb_shm_group",
 		.data		= &sysctl_hugetlb_shm_group,
 		.maxlen		= sizeof(gid_t),
diff --git a/mm/copy_pages.c b/mm/copy_pages.c
index 879e2d944ad0..f135bf505183 100644
--- a/mm/copy_pages.c
+++ b/mm/copy_pages.c
@@ -10,7 +10,16 @@
 #include <linux/workqueue.h>
 #include <linux/slab.h>
 #include <linux/freezer.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
 
+#define NUM_AVAIL_DMA_CHAN 16
+
+int use_all_dma_chans = 0;
+int limit_dma_chans = NUM_AVAIL_DMA_CHAN;
+
+struct dma_chan *copy_chan[NUM_AVAIL_DMA_CHAN] = {0};
+struct dma_device *copy_dev[NUM_AVAIL_DMA_CHAN] = {0};
 /*
  * nr_copythreads can be the highest number of threads for given node
  * on any architecture. The actual number of copy threads will be
@@ -279,3 +288,275 @@ int exchange_page_lists_mthread(struct page **to, struct page **from,
 
 	return err;
 }
+
+#ifdef CONFIG_PROC_SYSCTL
+int sysctl_dma_page_migration(struct ctl_table *table, int write,
+				 void __user *buffer, size_t *lenp,
+				 loff_t *ppos)
+{
+	int err = 0;
+	int use_all_dma_chans_prior_val = use_all_dma_chans;
+	dma_cap_mask_t copy_mask;
+
+	if (write && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (err < 0)
+		return err;
+	if (write) {
+		/* Grab all DMA channels  */
+		if (use_all_dma_chans_prior_val == 0 && use_all_dma_chans == 1) {
+			int i;
+
+			dma_cap_zero(copy_mask);
+			dma_cap_set(DMA_MEMCPY, copy_mask);
+
+			dmaengine_get();
+			for (i = 0; i < NUM_AVAIL_DMA_CHAN; ++i) {
+				if (!copy_chan[i])
+					copy_chan[i] = dma_request_channel(copy_mask, NULL, NULL);
+				if (!copy_chan[i]) {
+					pr_err("%s: cannot grab channel: %d\n", __func__, i);
+					continue;
+				}
+
+				copy_dev[i] = copy_chan[i]->device;
+
+				if (!copy_dev[i]) {
+					pr_err("%s: no device: %d\n", __func__, i);
+					continue;
+				}
+			}
+
+		} 
+		/* Release all DMA channels  */
+		else if (use_all_dma_chans_prior_val == 1 && use_all_dma_chans == 0) {
+			int i;
+
+			for (i = 0; i < NUM_AVAIL_DMA_CHAN; ++i) {
+				if (copy_chan[i]) {
+					dma_release_channel(copy_chan[i]);
+					copy_chan[i] = NULL;
+					copy_dev[i] = NULL;
+				}
+			}
+
+			dmaengine_put();
+		}
+
+		if (err)
+			use_all_dma_chans = use_all_dma_chans_prior_val;
+	}
+	return err;
+}
+
+#endif
+
+static int copy_page_dma_once(struct page *to, struct page *from, int nr_pages)
+{
+	static struct dma_chan *copy_chan = NULL;
+	struct dma_device *device = NULL;
+	struct dma_async_tx_descriptor *tx = NULL;
+	dma_cookie_t cookie;
+	enum dma_ctrl_flags flags = 0;
+	struct dmaengine_unmap_data *unmap = NULL;
+	dma_cap_mask_t mask;
+	int ret_val = 0;
+
+	
+	dma_cap_zero(mask);
+	dma_cap_set(DMA_MEMCPY, mask);
+
+	dmaengine_get();
+
+	copy_chan = dma_request_channel(mask, NULL, NULL);
+
+	if (!copy_chan) {
+		pr_err("%s: cannot get a channel\n", __func__);
+		ret_val = -1;
+		goto no_chan;
+	}
+
+	device = copy_chan->device;
+
+	if (!device) {
+		pr_err("%s: cannot get a device\n", __func__);
+		ret_val = -2;
+		goto release;
+	}
+		
+	unmap = dmaengine_get_unmap_data(device->dev, 2, GFP_NOWAIT);
+
+	if (!unmap) {
+		pr_err("%s: cannot get unmap data\n", __func__);
+		ret_val = -3;
+		goto release;
+	}
+
+	unmap->to_cnt = 1;
+	unmap->addr[0] = dma_map_page(device->dev, from, 0, PAGE_SIZE*nr_pages,
+					  DMA_TO_DEVICE);
+	unmap->from_cnt = 1;
+	unmap->addr[1] = dma_map_page(device->dev, to, 0, PAGE_SIZE*nr_pages,
+					  DMA_FROM_DEVICE);
+	unmap->len = PAGE_SIZE*nr_pages;
+
+	tx = device->device_prep_dma_memcpy(copy_chan, 
+						unmap->addr[1],
+						unmap->addr[0], unmap->len,
+						flags);
+
+	if (!tx) {
+		pr_err("%s: null tx descriptor\n", __func__);
+		ret_val = -4;
+		goto unmap_dma;
+	}
+
+	cookie = tx->tx_submit(tx);
+
+	if (dma_submit_error(cookie)) {
+		pr_err("%s: submission error\n", __func__);
+		ret_val = -5;
+		goto unmap_dma;
+	}
+
+	if (dma_sync_wait(copy_chan, cookie) != DMA_COMPLETE) {
+		pr_err("%s: dma does not complete properly\n", __func__);
+		ret_val = -6;
+	}
+
+unmap_dma:
+	dmaengine_unmap_put(unmap);
+release:
+	if (copy_chan) {
+		dma_release_channel(copy_chan);
+	}
+no_chan:
+	dmaengine_put();
+
+	return ret_val;
+}
+
+static int copy_page_dma_always(struct page *to, struct page *from, int nr_pages)
+{
+	struct dma_async_tx_descriptor *tx[NUM_AVAIL_DMA_CHAN] = {0};
+	dma_cookie_t cookie[NUM_AVAIL_DMA_CHAN];
+	enum dma_ctrl_flags flags[NUM_AVAIL_DMA_CHAN] = {0};
+	struct dmaengine_unmap_data *unmap[NUM_AVAIL_DMA_CHAN] = {0};
+	int ret_val = 0;
+	int total_available_chans = NUM_AVAIL_DMA_CHAN;
+	int i;
+	size_t page_offset;
+
+	for (i = 0; i < NUM_AVAIL_DMA_CHAN; ++i) {
+		if (!copy_chan[i]) {
+			total_available_chans = i;
+			break;
+		}
+	}
+	if (total_available_chans != NUM_AVAIL_DMA_CHAN) {
+		pr_err("%d channels are missing\n", NUM_AVAIL_DMA_CHAN - total_available_chans);
+	}
+
+	total_available_chans = min_t(int, total_available_chans, limit_dma_chans);
+
+	/* round down to closest 2^x value  */
+	total_available_chans = 1<<ilog2(total_available_chans);
+
+	if ((nr_pages != 1) && (nr_pages % total_available_chans != 0))
+		return -EFAULT;
+	
+	for (i = 0; i < total_available_chans; ++i) {
+		unmap[i] = dmaengine_get_unmap_data(copy_dev[i]->dev, 2, GFP_NOWAIT);
+		if (!unmap[i]) {
+			pr_err("%s: no unmap data at chan %d\n", __func__, i);
+			ret_val = -EFAULT;
+			goto unmap_dma;
+		}
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		if (nr_pages == 1) {
+			page_offset = PAGE_SIZE / total_available_chans;
+
+			unmap[i]->to_cnt = 1;
+			unmap[i]->addr[0] = dma_map_page(copy_dev[i]->dev, from, page_offset*i,
+							  page_offset,
+							  DMA_TO_DEVICE);
+			unmap[i]->from_cnt = 1;
+			unmap[i]->addr[1] = dma_map_page(copy_dev[i]->dev, to, page_offset*i,
+							  page_offset,
+							  DMA_FROM_DEVICE);
+			unmap[i]->len = page_offset;
+		} else {
+			page_offset = nr_pages / total_available_chans;
+
+			unmap[i]->to_cnt = 1;
+			unmap[i]->addr[0] = dma_map_page(copy_dev[i]->dev, 
+								from + page_offset*i, 
+								0,
+								PAGE_SIZE*page_offset,
+								DMA_TO_DEVICE);
+			unmap[i]->from_cnt = 1;
+			unmap[i]->addr[1] = dma_map_page(copy_dev[i]->dev, 
+								to + page_offset*i, 
+								0,
+								PAGE_SIZE*page_offset,
+								DMA_FROM_DEVICE);
+			unmap[i]->len = PAGE_SIZE*page_offset;
+		}
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		tx[i] = copy_dev[i]->device_prep_dma_memcpy(copy_chan[i], 
+							unmap[i]->addr[1],
+							unmap[i]->addr[0], 
+							unmap[i]->len,
+							flags[i]);
+		if (!tx[i]) {
+			pr_err("%s: no tx descriptor at chan %d\n", __func__, i);
+			ret_val = -EFAULT;
+			goto unmap_dma;
+		}
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		cookie[i] = tx[i]->tx_submit(tx[i]);
+
+		if (dma_submit_error(cookie[i])) {
+			pr_err("%s: submission error at chan %d\n", __func__, i);
+			ret_val = -EFAULT;
+			goto unmap_dma;
+		}
+					
+		dma_async_issue_pending(copy_chan[i]);
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		if (dma_sync_wait(copy_chan[i], cookie[i]) != DMA_COMPLETE) {
+			ret_val = -EFAULT;
+			pr_err("%s: dma does not complete at chan %d\n", __func__, i);
+		}
+	}
+
+unmap_dma:
+
+	for (i = 0; i < total_available_chans; ++i) {
+		if (unmap[i])
+			dmaengine_unmap_put(unmap[i]);
+	}
+
+	return ret_val;
+}
+
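+/*
+ * Entry point for DMA-based page copy: use a single on-demand channel when
+ * use_all_dma_chans is clear, otherwise spread the copy across the channels
+ * in copy_chan[].
+ */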
+int copy_page_dma(struct page *to, struct page *from, int nr_pages)
+{
+	BUG_ON(hpage_nr_pages(from) != nr_pages);
+	BUG_ON(hpage_nr_pages(to) != nr_pages);
+
+	if (!use_all_dma_chans) {
+		return copy_page_dma_once(to, from, nr_pages);
+	} 
+
+	return copy_page_dma_always(to, from, nr_pages);
+}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 13/14] mm: migrate: Add copy_page_dma into migrate_page_copy.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (11 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 12/14] migrate: Add copy_page_dma to use DMA Engine to copy pages Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  2017-02-17 15:05 ` [RFC PATCH 14/14] mm: Add copy_page_lists_dma_always to support copying a list of pages Zi Yan
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

Fall back to copy_highpage() when the DMA copy fails.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/migrate_mode.h   |  1 +
 include/uapi/linux/mempolicy.h |  1 +
 mm/migrate.c                   | 27 +++++++++++++++++++--------
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 2bd849d89122..798737d0a0bc 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -14,6 +14,7 @@ enum migrate_mode {
 	MIGRATE_ST		= 1<<3,
 	MIGRATE_MT		= 1<<4,
 	MIGRATE_CONCUR		= 1<<5,
+	MIGRATE_DMA			= 1<<6,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 6d9758a32053..bf40534cc93a 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -55,6 +55,7 @@ enum mpol_rebind_step {
 #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
 #define MPOL_MF_MOVE_MT  (1<<6)	/* Use multi-threaded page copy routine */
 #define MPOL_MF_MOVE_CONCUR  (1<<7)	/* Migrate a list of pages concurrently */
+#define MPOL_MF_MOVE_DMA (1<<8)	/* Use DMA based page copy routine */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
 			 MPOL_MF_MOVE     | 	\
diff --git a/mm/migrate.c b/mm/migrate.c
index a35e6fd43a50..464bc9ba8083 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -634,6 +634,9 @@ static void copy_huge_page(struct page *dst, struct page *src,
 	if (mode & MIGRATE_MT)
 		rc = copy_pages_mthread(dst, src, nr_pages);
 
+	if (rc && (mode & MIGRATE_DMA))
+		rc = copy_page_dma(dst, src, nr_pages);
+
 	if (rc)
 		for (i = 0; i < nr_pages; i++) {
 			cond_resched();
@@ -648,16 +651,18 @@ void migrate_page_copy(struct page *newpage, struct page *page,
 					   enum migrate_mode mode)
 {
 	int cpupid;
+	int rc = -EFAULT;
 
 	if (PageHuge(page) || PageTransHuge(page)) {
 		copy_huge_page(newpage, page, mode);
 	} else {
-		if (mode & MIGRATE_MT) {
-			if (copy_pages_mthread(newpage, page, 1))
-				copy_highpage(newpage, page);
-		} else {
+		if (mode & MIGRATE_DMA)
+			rc = copy_page_dma(newpage, page, 1);
+		else if (mode & MIGRATE_MT)
+			rc = copy_pages_mthread(newpage, page, 1);
+
+		if (rc)
 			copy_highpage(newpage, page);
-		}
 	}
 
 	if (PageError(page))
@@ -1926,7 +1931,8 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 				      struct page_to_node *pm,
 				      int migrate_all,
 					  int migrate_use_mt,
-					  int migrate_concur)
+					  int migrate_concur,
+					  int migrate_use_dma)
 {
 	int err;
 	struct page_to_node *pp;
@@ -1936,6 +1942,9 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	if (migrate_use_mt)
 		mode |= MIGRATE_MT;
 
+	if (migrate_use_dma)
+		mode |= MIGRATE_DMA;
+
 	down_read(&mm->mmap_sem);
 
 	/*
@@ -2098,7 +2107,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		err = do_move_page_to_node_array(mm, pm,
 						 flags & MPOL_MF_MOVE_ALL,
 						 flags & MPOL_MF_MOVE_MT,
-						 flags & MPOL_MF_MOVE_CONCUR);
+						 flags & MPOL_MF_MOVE_CONCUR,
+						 flags & MPOL_MF_MOVE_DMA);
 		if (err < 0)
 			goto out_pm;
 
@@ -2207,7 +2217,8 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
 	/* Check flags */
 	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|
 				  MPOL_MF_MOVE_MT|
-				  MPOL_MF_MOVE_CONCUR))
+				  MPOL_MF_MOVE_CONCUR|
+				  MPOL_MF_MOVE_DMA))
 		return -EINVAL;
 
 	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
-- 
2.11.0
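
[For context, a hedged sketch of how the new flag could be driven from user
space through move_pages(2). The MPOL_MF_MOVE_DMA value (1<<8) comes from this
RFC and is not part of mainline; the program below is only an illustration,
not part of the patch series.]

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define MPOL_MF_MOVE		(1 << 1)
#define MPOL_MF_MOVE_DMA	(1 << 8)	/* from this RFC, not mainline */

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *buf = aligned_alloc(page_size, page_size);
	void *pages[1] = { buf };
	int nodes[1] = { 1 };			/* destination NUMA node */
	int status[1] = { -1 };

	memset(buf, 0, page_size);		/* fault the page in first */

	/* pid 0 == current process; request DMA-assisted migration */
	if (syscall(SYS_move_pages, 0, 1UL, pages, nodes, status,
		    MPOL_MF_MOVE | MPOL_MF_MOVE_DMA))
		perror("move_pages");
	else
		printf("page now on node %d\n", status[0]);

	return 0;
}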


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC PATCH 14/14] mm: Add copy_page_lists_dma_always to support copying a list of pages.
  2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
                   ` (12 preceding siblings ...)
  2017-02-17 15:05 ` [RFC PATCH 13/14] mm: migrate: Add copy_page_dma into migrate_page_copy Zi Yan
@ 2017-02-17 15:05 ` Zi Yan
  13 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-17 15:05 UTC (permalink / raw)
  To: linux-mm; +Cc: dnellans, apopple, paulmck, khandual, zi.yan

From: Zi Yan <ziy@nvidia.com>

The src and dst page lists must have matching page sizes at each index,
and both lists have the same length.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/copy_pages.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/internal.h   |   3 ++
 mm/migrate.c    |   5 +-
 3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/mm/copy_pages.c b/mm/copy_pages.c
index f135bf505183..cf674840a830 100644
--- a/mm/copy_pages.c
+++ b/mm/copy_pages.c
@@ -560,3 +560,161 @@ int copy_page_dma(struct page *to, struct page *from, int nr_pages)
 
 	return copy_page_dma_always(to, from, nr_pages);
 }
+
+/*
+ * Use DMA engines to copy a list of pages to their new locations.
+ *
+ * Pages are distributed round-robin across the available DMA channels.
+ */
+int copy_page_lists_dma_always(struct page **to, struct page **from, int nr_pages)
+{
+	struct dma_async_tx_descriptor **tx = NULL;
+	dma_cookie_t *cookie = NULL;
+	enum dma_ctrl_flags flags[NUM_AVAIL_DMA_CHAN] = {0};
+	struct dmaengine_unmap_data *unmap[NUM_AVAIL_DMA_CHAN] = {0};
+	int ret_val = 0;
+	int total_available_chans = NUM_AVAIL_DMA_CHAN;
+	int i;
+
+	for (i = 0; i < NUM_AVAIL_DMA_CHAN; ++i) {
+		if (!copy_chan[i])
+			total_available_chans = i;
+	}
+	if (total_available_chans != NUM_AVAIL_DMA_CHAN)
+		pr_err("%d channels are missing\n", NUM_AVAIL_DMA_CHAN - total_available_chans);
+
+	if (limit_dma_chans < total_available_chans)
+		total_available_chans = limit_dma_chans;
+
+	/* round down to closest 2^x value  */
+	total_available_chans = 1<<ilog2(total_available_chans);
+
+	total_available_chans = min_t(int, total_available_chans, nr_pages);
+
+
+	tx = kzalloc(sizeof(struct dma_async_tx_descriptor*)*nr_pages, GFP_KERNEL);
+	if (!tx) {
+		ret_val = -ENOMEM;
+		goto out;
+	}
+	cookie = kzalloc(sizeof(dma_cookie_t)*nr_pages, GFP_KERNEL);
+	if (!cookie) {
+		ret_val = -ENOMEM;
+		goto out_free_tx;
+	}
+
+	
+	for (i = 0; i < total_available_chans; ++i) {
+		int num_xfer_per_dev = nr_pages / total_available_chans;
+		
+		if (i < (nr_pages % total_available_chans))
+			num_xfer_per_dev += 1;
+
+		unmap[i] = dmaengine_get_unmap_data(copy_dev[i]->dev, 
+						2*num_xfer_per_dev, GFP_NOWAIT);
+		if (!unmap[i]) {
+			pr_err("%s: no unmap data at chan %d\n", __func__, i);
+			ret_val = -ENODEV;
+			goto unmap_dma;
+		}
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		int num_xfer_per_dev = nr_pages / total_available_chans;
+		int xfer_idx;
+		
+		if (i < (nr_pages % total_available_chans))
+			num_xfer_per_dev += 1;
+
+		unmap[i]->to_cnt = num_xfer_per_dev;
+		unmap[i]->from_cnt = num_xfer_per_dev;
+		unmap[i]->len = hpage_nr_pages(from[i]) * PAGE_SIZE; 
+
+		for (xfer_idx = 0; xfer_idx < num_xfer_per_dev; ++xfer_idx) {
+			int page_idx = i + xfer_idx * total_available_chans;
+			size_t page_len = hpage_nr_pages(from[page_idx]) * PAGE_SIZE;
+
+			BUG_ON(page_len != hpage_nr_pages(to[page_idx]) * PAGE_SIZE);
+			BUG_ON(unmap[i]->len != page_len);
+
+			unmap[i]->addr[xfer_idx] = 
+				 dma_map_page(copy_dev[i]->dev, from[page_idx], 
+							  0,
+							  page_len,
+							  DMA_TO_DEVICE);
+
+			unmap[i]->addr[xfer_idx+num_xfer_per_dev] = 
+				 dma_map_page(copy_dev[i]->dev, to[page_idx], 
+							  0,
+							  page_len,
+							  DMA_FROM_DEVICE);
+		}
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		int num_xfer_per_dev = nr_pages / total_available_chans;
+		int xfer_idx;
+		
+		if (i < (nr_pages % total_available_chans))
+			num_xfer_per_dev += 1;
+
+		for (xfer_idx = 0; xfer_idx < num_xfer_per_dev; ++xfer_idx) {
+			int page_idx = i + xfer_idx * total_available_chans;
+
+			tx[page_idx] = copy_dev[i]->device_prep_dma_memcpy(copy_chan[i], 
+								unmap[i]->addr[xfer_idx + num_xfer_per_dev],
+								unmap[i]->addr[xfer_idx], 
+								unmap[i]->len,
+								flags[i]);
+			if (!tx[page_idx]) {
+				pr_err("%s: no tx descriptor at chan %d xfer %d\n", 
+					   __func__, i, xfer_idx);
+				ret_val = -ENODEV;
+				goto unmap_dma;
+			}
+
+			cookie[page_idx] = tx[page_idx]->tx_submit(tx[page_idx]);
+
+			if (dma_submit_error(cookie[page_idx])) {
+				pr_err("%s: submission error at chan %d xfer %d\n",
+					   __func__, i, xfer_idx);
+				ret_val = -ENODEV;
+				goto unmap_dma;
+			}
+		}
+
+		dma_async_issue_pending(copy_chan[i]);
+	}
+
+	for (i = 0; i < total_available_chans; ++i) {
+		int num_xfer_per_dev = nr_pages / total_available_chans;
+		int xfer_idx;
+		
+		if (i < (nr_pages % total_available_chans))
+			num_xfer_per_dev += 1;
+
+		for (xfer_idx = 0; xfer_idx < num_xfer_per_dev; ++xfer_idx) {
+			int page_idx = i + xfer_idx * total_available_chans;
+
+			if (dma_sync_wait(copy_chan[i], cookie[page_idx]) != DMA_COMPLETE) {
+				ret_val = -EFAULT;
+				pr_err("%s: dma does not complete at chan %d, xfer %d\n",
+					   __func__, i, xfer_idx);
+			}
+		}
+	}
+
+unmap_dma:
+	for (i = 0; i < total_available_chans; ++i) {
+		if (unmap[i])
+			dmaengine_unmap_put(unmap[i]);
+	}
+
+	kfree(cookie);
+out_free_tx:
+	kfree(tx);
+out:
+
+	return ret_val;
+}
diff --git a/mm/internal.h b/mm/internal.h
index b99a634b4d09..32048e89bfda 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -500,10 +500,13 @@ extern const struct trace_print_flags gfpflag_names[];
 
 extern int copy_page_lists_mthread(struct page **to,
 			struct page **from, int nr_pages);
+extern int copy_page_lists_dma_always(struct page **to,
+			struct page **from, int nr_pages);
 
 extern int exchange_page_mthread(struct page *to, struct page *from,
 			int nr_pages);
 extern int exchange_page_lists_mthread(struct page **to,
 						  struct page **from, 
 						  int nr_pages);
+
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 464bc9ba8083..63e44ac65184 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1611,7 +1611,10 @@ static int copy_to_new_pages_concur(struct list_head *unmapped_list_ptr,
 
 	BUG_ON(idx != num_pages);
 	
-	if (mode & MIGRATE_MT)
+	if (mode & MIGRATE_DMA)
+		rc = copy_page_lists_dma_always(dst_page_list, src_page_list,
+							num_pages);
+	else if (mode & MIGRATE_MT)
 		rc = copy_page_lists_mthread(dst_page_list, src_page_list,
 							num_pages);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function
  2017-02-17 15:05 ` [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function Zi Yan
@ 2017-02-23  6:06   ` Naoya Horiguchi
  2017-02-23  7:50     ` Anshuman Khandual
  0 siblings, 1 reply; 25+ messages in thread
From: Naoya Horiguchi @ 2017-02-23  6:06 UTC (permalink / raw)
  To: Zi Yan; +Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On Fri, Feb 17, 2017 at 10:05:40AM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> This change adds a new function, copy_pages_mthread, to enable multi-threaded
> page copy which can be utilized during migration. This function splits the
> page copy request across multiple threads, each handling an individual chunk,
> and sends them as jobs to the system_highpri_wq work queue.
> 
> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  include/linux/highmem.h |  2 ++
>  mm/Makefile             |  2 ++
>  mm/copy_pages.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 90 insertions(+)
>  create mode 100644 mm/copy_pages.c
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index bb3f3297062a..e1f4f1b82812 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
>  
>  #endif
>  
> +int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
> +
>  static inline void copy_highpage(struct page *to, struct page *from)
>  {
>  	char *vfrom, *vto;
> diff --git a/mm/Makefile b/mm/Makefile
> index aa0aa17cb413..cdd4bab9cc66 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -43,6 +43,8 @@ obj-y			:= filemap.o mempool.o oom_kill.o \
>  
>  obj-y += init-mm.o
>  
> +obj-y += copy_pages.o
> +
>  ifdef CONFIG_NO_BOOTMEM
>  	obj-y		+= nobootmem.o
>  else
> diff --git a/mm/copy_pages.c b/mm/copy_pages.c
> new file mode 100644
> index 000000000000..c357e7b01042
> --- /dev/null
> +++ b/mm/copy_pages.c
> @@ -0,0 +1,86 @@
> +/*
> + * This implements parallel page copy function through multi threaded
> + * work queues.
> + *
> + * Zi Yan <ziy@nvidia.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + */
> +#include <linux/highmem.h>
> +#include <linux/workqueue.h>
> +#include <linux/slab.h>
> +#include <linux/freezer.h>
> +
> +/*
> + * nr_copythreads can be the highest number of threads for given node
> + * on any architecture. The actual number of copy threads will be
> + * limited by the cpumask weight of the target node.
> + */
> +unsigned int nr_copythreads = 8;

If this is given as a constant, how about defining it as a macro?
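
For example, something like the following (the macro name here is only a
suggestion):

	#define NR_COPYTHREADS	8	/* upper bound on copy worker threads */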

> +
> +struct copy_info {
> +	struct work_struct copy_work;
> +	char *to;
> +	char *from;
> +	unsigned long chunk_size;
> +};
> +
> +static void copy_pages(char *vto, char *vfrom, unsigned long size)
> +{
> +	memcpy(vto, vfrom, size);
> +}
> +
> +static void copythread(struct work_struct *work)
> +{
> +	struct copy_info *info = (struct copy_info *) work;
> +
> +	copy_pages(info->to, info->from, info->chunk_size);
> +}
> +
> +int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
> +{
> +	unsigned int node = page_to_nid(to);
> +	const struct cpumask *cpumask = cpumask_of_node(node);
> +	struct copy_info *work_items;
> +	char *vto, *vfrom;
> +	unsigned long i, cthreads, cpu, chunk_size;
> +	int cpu_id_list[32] = {0};

Why 32? Maybe you can set the array size with nr_copythreads (macro version.)

> +
> +	cthreads = nr_copythreads;
> +	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));

nitpick, but looks a little wordy, can it be simply like below?

  cthreads = min_t(unsigned int, nr_copythreads, cpumask_weight(cpumask));

> +	cthreads = (cthreads / 2) * 2;

I'm not sure of the intention here. Should the # of threads be an even number?
If cpumask_weight() is 1, cthreads becomes 0, which could cause a division by zero.
So you had better make sure to prevent that.

Thanks,
Naoya Horiguchi

> +	work_items = kcalloc(cthreads, sizeof(struct copy_info), GFP_KERNEL);
> +	if (!work_items)
> +		return -ENOMEM;
> +
> +	i = 0;
> +	for_each_cpu(cpu, cpumask) {
> +		if (i >= cthreads)
> +			break;
> +		cpu_id_list[i] = cpu;
> +		++i;
> +	}
> +
> +	vfrom = kmap(from);
> +	vto = kmap(to);
> +	chunk_size = PAGE_SIZE * nr_pages / cthreads;
> +
> +	for (i = 0; i < cthreads; ++i) {
> +		INIT_WORK((struct work_struct *) &work_items[i], copythread);
> +
> +		work_items[i].to = vto + i * chunk_size;
> +		work_items[i].from = vfrom + i * chunk_size;
> +		work_items[i].chunk_size = chunk_size;
> +
> +		queue_work_on(cpu_id_list[i], system_highpri_wq,
> +					  (struct work_struct *) &work_items[i]);
> +	}
> +
> +	for (i = 0; i < cthreads; ++i)
> +		flush_work((struct work_struct *) &work_items[i]);
> +
> +	kunmap(to);
> +	kunmap(from);
> +	kfree(work_items);
> +	return 0;
> +}
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT
  2017-02-17 15:05 ` [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT Zi Yan
@ 2017-02-23  6:54   ` Naoya Horiguchi
  2017-02-23  7:54     ` Anshuman Khandual
  0 siblings, 1 reply; 25+ messages in thread
From: Naoya Horiguchi @ 2017-02-23  6:54 UTC (permalink / raw)
  To: Zi Yan; +Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On Fri, Feb 17, 2017 at 10:05:41AM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> This change adds a new migration mode called MIGRATE_MT to enable a
> multi-threaded page copy implementation inside the copy_huge_page() function
> by selectively calling copy_pages_mthread() when requested. It still falls
> back to the regular page copy mechanism in case the multi-threaded attempt
> fails. It also attempts multi-threaded copy for regular pages.
> 
> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  include/linux/migrate_mode.h |  1 +
>  mm/migrate.c                 | 25 ++++++++++++++++++-------
>  2 files changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index 89c170060e5b..d344ad60f499 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -12,6 +12,7 @@ enum migrate_mode {
>  	MIGRATE_SYNC_LIGHT	= 1<<1,
>  	MIGRATE_SYNC		= 1<<2,
>  	MIGRATE_ST		= 1<<3,
> +	MIGRATE_MT		= 1<<4,

Could you update the comment above this definition to cover the new flags.

Thanks,
Naoya Horiguchi

>  };
>  
>  #endif		/* MIGRATE_MODE_H_INCLUDED */
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 87253cb9b50a..21307219428d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -601,6 +601,7 @@ static void copy_huge_page(struct page *dst, struct page *src,
>  {
>  	int i;
>  	int nr_pages;
> +	int rc = -EFAULT;
>  
>  	if (PageHuge(src)) {
>  		/* hugetlbfs page */
> @@ -617,10 +618,14 @@ static void copy_huge_page(struct page *dst, struct page *src,
>  		nr_pages = hpage_nr_pages(src);
>  	}
>  
> -	for (i = 0; i < nr_pages; i++) {
> -		cond_resched();
> -		copy_highpage(dst + i, src + i);
> -	}
> +	if (mode & MIGRATE_MT)
> +		rc = copy_pages_mthread(dst, src, nr_pages);
> +
> +	if (rc)
> +		for (i = 0; i < nr_pages; i++) {
> +			cond_resched();
> +			copy_highpage(dst + i, src + i);
> +		}
>  }
>  
>  /*
> @@ -631,10 +636,16 @@ void migrate_page_copy(struct page *newpage, struct page *page,
>  {
>  	int cpupid;
>  
> -	if (PageHuge(page) || PageTransHuge(page))
> +	if (PageHuge(page) || PageTransHuge(page)) {
>  		copy_huge_page(newpage, page, mode);
> -	else
> -		copy_highpage(newpage, page);
> +	} else {
> +		if (mode & MIGRATE_MT) {
> +			if (copy_pages_mthread(newpage, page, 1))
> +				copy_highpage(newpage, page);
> +		} else {
> +			copy_highpage(newpage, page);
> +		}
> +	}
>  
>  	if (PageError(page))
>  		SetPageError(newpage);
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function
  2017-02-23  6:06   ` Naoya Horiguchi
@ 2017-02-23  7:50     ` Anshuman Khandual
  2017-02-23  8:02       ` Naoya Horiguchi
  0 siblings, 1 reply; 25+ messages in thread
From: Anshuman Khandual @ 2017-02-23  7:50 UTC (permalink / raw)
  To: Naoya Horiguchi, Zi Yan
  Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On 02/23/2017 11:36 AM, Naoya Horiguchi wrote:
> On Fri, Feb 17, 2017 at 10:05:40AM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> This change adds a new function, copy_pages_mthread, to enable multi-threaded
>> page copy which can be utilized during migration. This function splits the
>> page copy request across multiple threads, each handling an individual chunk,
>> and sends them as jobs to the system_highpri_wq work queue.
>>
>> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  include/linux/highmem.h |  2 ++
>>  mm/Makefile             |  2 ++
>>  mm/copy_pages.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 90 insertions(+)
>>  create mode 100644 mm/copy_pages.c
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index bb3f3297062a..e1f4f1b82812 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -236,6 +236,8 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
>>  
>>  #endif
>>  
>> +int copy_pages_mthread(struct page *to, struct page *from, int nr_pages);
>> +
>>  static inline void copy_highpage(struct page *to, struct page *from)
>>  {
>>  	char *vfrom, *vto;
>> diff --git a/mm/Makefile b/mm/Makefile
>> index aa0aa17cb413..cdd4bab9cc66 100644
>> --- a/mm/Makefile
>> +++ b/mm/Makefile
>> @@ -43,6 +43,8 @@ obj-y			:= filemap.o mempool.o oom_kill.o \
>>  
>>  obj-y += init-mm.o
>>  
>> +obj-y += copy_pages.o
>> +
>>  ifdef CONFIG_NO_BOOTMEM
>>  	obj-y		+= nobootmem.o
>>  else
>> diff --git a/mm/copy_pages.c b/mm/copy_pages.c
>> new file mode 100644
>> index 000000000000..c357e7b01042
>> --- /dev/null
>> +++ b/mm/copy_pages.c
>> @@ -0,0 +1,86 @@
>> +/*
>> + * This implements parallel page copy function through multi threaded
>> + * work queues.
>> + *
>> + * Zi Yan <ziy@nvidia.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.
>> + */
>> +#include <linux/highmem.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/slab.h>
>> +#include <linux/freezer.h>
>> +
>> +/*
>> + * nr_copythreads can be the highest number of threads for given node
>> + * on any architecture. The actual number of copy threads will be
>> + * limited by the cpumask weight of the target node.
>> + */
>> +unsigned int nr_copythreads = 8;
> 
> If this is given as a constant, how about defining it as a macro?

Sure, will change it up next time around.

> 
>> +
>> +struct copy_info {
>> +	struct work_struct copy_work;
>> +	char *to;
>> +	char *from;
>> +	unsigned long chunk_size;
>> +};
>> +
>> +static void copy_pages(char *vto, char *vfrom, unsigned long size)
>> +{
>> +	memcpy(vto, vfrom, size);
>> +}
>> +
>> +static void copythread(struct work_struct *work)
>> +{
>> +	struct copy_info *info = (struct copy_info *) work;
>> +
>> +	copy_pages(info->to, info->from, info->chunk_size);
>> +}
>> +
>> +int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
>> +{
>> +	unsigned int node = page_to_nid(to);
>> +	const struct cpumask *cpumask = cpumask_of_node(node);
>> +	struct copy_info *work_items;
>> +	char *vto, *vfrom;
>> +	unsigned long i, cthreads, cpu, chunk_size;
>> +	int cpu_id_list[32] = {0};
> 
> Why 32? Maybe you can set the array size with nr_copythreads (macro version.)

Sure, will do.

> 
>> +
>> +	cthreads = nr_copythreads;
>> +	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
> 
> nitpick, but looks a little wordy, can it be simply like below?
> 
>   cthreads = min_t(unsigned int, nr_copythreads, cpumask_weight(cpumask));
> 
>> +	cthreads = (cthreads / 2) * 2;
> 
> I'm not sure of the intention here. Should the # of threads be an even number?

Yes.

> If cpumask_weight() is 1, cthreads becomes 0, which could cause a division by zero.
> So you had better make sure to prevent that.

If cpumask_weight() is 1, then min_t(unsigned int, 8, 1) should be
greater than or equal to 1. Then cthreads can end up as 0. That is
possible. But how is there a chance of a division by zero? Maybe it's
possible if we are trying to move into a CPU-less, memory-only node
where cpumask_weight() is 0?




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT
  2017-02-23  6:54   ` Naoya Horiguchi
@ 2017-02-23  7:54     ` Anshuman Khandual
  0 siblings, 0 replies; 25+ messages in thread
From: Anshuman Khandual @ 2017-02-23  7:54 UTC (permalink / raw)
  To: Naoya Horiguchi, Zi Yan
  Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On 02/23/2017 12:24 PM, Naoya Horiguchi wrote:
> On Fri, Feb 17, 2017 at 10:05:41AM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> This change adds a new migration mode called MIGRATE_MT to enable a
>> multi-threaded page copy implementation inside the copy_huge_page() function
>> by selectively calling copy_pages_mthread() when requested. It still falls
>> back to the regular page copy mechanism in case the multi-threaded attempt
>> fails. It also attempts multi-threaded copy for regular pages.
>>
>> Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  include/linux/migrate_mode.h |  1 +
>>  mm/migrate.c                 | 25 ++++++++++++++++++-------
>>  2 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
>> index 89c170060e5b..d344ad60f499 100644
>> --- a/include/linux/migrate_mode.h
>> +++ b/include/linux/migrate_mode.h
>> @@ -12,6 +12,7 @@ enum migrate_mode {
>>  	MIGRATE_SYNC_LIGHT	= 1<<1,
>>  	MIGRATE_SYNC		= 1<<2,
>>  	MIGRATE_ST		= 1<<3,
>> +	MIGRATE_MT		= 1<<4,
> 
> Could you update the comment above this definition to cover the new flags.

Sure, will do.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function
  2017-02-23  7:50     ` Anshuman Khandual
@ 2017-02-23  8:02       ` Naoya Horiguchi
  2017-03-09  5:35         ` Anshuman Khandual
  0 siblings, 1 reply; 25+ messages in thread
From: Naoya Horiguchi @ 2017-02-23  8:02 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: Zi Yan, linux-mm, dnellans, apopple, paulmck, zi.yan

On Thu, Feb 23, 2017 at 01:20:16PM +0530, Anshuman Khandual wrote:
...
> > 
> >> +
> >> +	cthreads = nr_copythreads;
> >> +	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
> > 
> > nitpick, but looks a little wordy, can it be simply like below?
> > 
> >   cthreads = min_t(unsigned int, nr_copythreads, cpumask_weight(cpumask));
> > 
> >> +	cthreads = (cthreads / 2) * 2;
> > 
> > I'm not sure of the intention here. Should the # of threads be an even number?
> 
> Yes.
> 
> > If cpumask_weight() is 1, cthreads becomes 0, which could cause a division by zero.
> > So you had better make sure to prevent that.
> 
> If cpumask_weight() is 1, then min_t(unsigned int, 8, 1) should be
> greater than or equal to 1. Then cthreads can end up as 0. That is
> possible. But how is there a chance of a division by zero?

Hi Anshuman,

I was just thinking of that possibility when reading the line your patch introduces:

       chunk_size = PAGE_SIZE * nr_pages / cthreads
                                           ~~~~~~~~
                                           (this can be 0?)
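
For illustration, one possible guard (a sketch only, not taken from the
posted series):

	cthreads = min_t(unsigned int, nr_copythreads, cpumask_weight(cpumask));
	/* keep the count even and non-zero so chunk_size never divides by zero */
	cthreads = max_t(unsigned int, (cthreads / 2) * 2, 2);
	chunk_size = PAGE_SIZE * nr_pages / cthreads;

(Whether queueing copy work near a CPU-less node makes sense at all is a
separate question.)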

- Naoya

> Maybe it's
> possible if we are trying to move into a CPU-less, memory-only node
> where cpumask_weight() is 0?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function.
  2017-02-17 15:05 ` [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function Zi Yan
@ 2017-02-23  8:54   ` Naoya Horiguchi
  2017-03-09 13:02     ` Anshuman Khandual
  0 siblings, 1 reply; 25+ messages in thread
From: Naoya Horiguchi @ 2017-02-23  8:54 UTC (permalink / raw)
  To: Zi Yan; +Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On Fri, Feb 17, 2017 at 10:05:44AM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> It supports copying a list of pages via multiple threads.
> It evenly distributes the pages to a group of threads and
> uses the same subroutine as copy_pages_mthread().

The new function duplicates many lines of copy_pages_mthread(), so please
consider factoring them out into a common routine. That would make your code
more readable and maintainable.
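
As a rough illustration, the shared part could become a helper along these
lines (a sketch only; the helper name is made up here, not taken from the
series):

	/* pick the copy worker CPUs on @node; returns the number of threads */
	static unsigned int get_copy_cpus(int node, int *cpu_id_list,
					  unsigned int max_threads)
	{
		const struct cpumask *cpumask = cpumask_of_node(node);
		unsigned int cthreads, i = 0;
		int cpu;

		cthreads = min_t(unsigned int, max_threads, cpumask_weight(cpumask));
		cthreads = (cthreads / 2) * 2;

		for_each_cpu(cpu, cpumask) {
			if (i >= cthreads)
				break;
			cpu_id_list[i++] = cpu;
		}
		return cthreads;
	}

Both copy_pages_mthread() and copy_page_lists_mthread() could then call it
instead of open-coding the same loop.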

Thanks,
Naoya Horiguchi

> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/copy_pages.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/internal.h   |  3 +++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/mm/copy_pages.c b/mm/copy_pages.c
> index c357e7b01042..516c0a1a57f3 100644
> --- a/mm/copy_pages.c
> +++ b/mm/copy_pages.c
> @@ -84,3 +84,65 @@ int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)
>  	kfree(work_items);
>  	return 0;
>  }
> +
> +int copy_page_lists_mthread(struct page **to, struct page **from, int nr_pages) 
> +{
> +	int err = 0;
> +	unsigned int cthreads, node = page_to_nid(*to);
> +	int i;
> +	struct copy_info *work_items;
> +	int nr_pages_per_page = hpage_nr_pages(*from);
> +	const struct cpumask *cpumask = cpumask_of_node(node);
> +	int cpu_id_list[32] = {0};
> +	int cpu;
> +
> +	cthreads = nr_copythreads;
> +	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
> +	cthreads = (cthreads / 2) * 2;
> +	cthreads = min_t(unsigned int, nr_pages, cthreads);
> +
> +	work_items = kzalloc(sizeof(struct copy_info)*nr_pages,
> +						 GFP_KERNEL);
> +	if (!work_items)
> +		return -ENOMEM;
> +
> +	i = 0;
> +	for_each_cpu(cpu, cpumask) {
> +		if (i >= cthreads)
> +			break;
> +		cpu_id_list[i] = cpu;
> +		++i;
> +	}
> +
> +	for (i = 0; i < nr_pages; ++i) {
> +		int thread_idx = i % cthreads;
> +
> +		INIT_WORK((struct work_struct *)&work_items[i], 
> +				  copythread);
> +
> +		work_items[i].to = kmap(to[i]);
> +		work_items[i].from = kmap(from[i]);
> +		work_items[i].chunk_size = PAGE_SIZE * hpage_nr_pages(from[i]);
> +
> +		BUG_ON(nr_pages_per_page != hpage_nr_pages(from[i]));
> +		BUG_ON(nr_pages_per_page != hpage_nr_pages(to[i]));
> +
> +
> +		queue_work_on(cpu_id_list[thread_idx], 
> +					  system_highpri_wq, 
> +					  (struct work_struct *)&work_items[i]);
> +	}
> +
> +	/* Wait until it finishes  */
> +	for (i = 0; i < cthreads; ++i)
> +		flush_work((struct work_struct *) &work_items[i]);
> +
> +	for (i = 0; i < nr_pages; ++i) {
> +			kunmap(to[i]);
> +			kunmap(from[i]);
> +	}
> +
> +	kfree(work_items);
> +
> +	return err;
> +}
> diff --git a/mm/internal.h b/mm/internal.h
> index ccfc2a2969f4..175e08ed524a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -498,4 +498,7 @@ extern const struct trace_print_flags pageflag_names[];
>  extern const struct trace_print_flags vmaflag_names[];
>  extern const struct trace_print_flags gfpflag_names[];
>  
> +extern int copy_page_lists_mthread(struct page **to,
> +			struct page **from, int nr_pages);
> +
>  #endif	/* __MM_INTERNAL_H */
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall.
  2017-02-17 15:05 ` [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall Zi Yan
@ 2017-02-24  8:25   ` Naoya Horiguchi
  2017-02-24 15:05     ` Zi Yan
  0 siblings, 1 reply; 25+ messages in thread
From: Naoya Horiguchi @ 2017-02-24  8:25 UTC (permalink / raw)
  To: Zi Yan; +Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On Fri, Feb 17, 2017 at 10:05:45AM -0500, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
>
> Concurrent page migration moves a list of pages all together and
> concurrently, using multiple threads. This is different from the
> existing page migration process, which migrates pages sequentially.
> The current implementation only migrates anonymous pages.

Please explain more about your new migration scheme; in particular, the
difference from the original page migration code is very important for
reviewers and other developers to understand your work quickly.

>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/migrate_mode.h   |   1 +
>  include/uapi/linux/mempolicy.h |   1 +
>  mm/migrate.c                   | 495 ++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 492 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index d344ad60f499..2bd849d89122 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -13,6 +13,7 @@ enum migrate_mode {
>  	MIGRATE_SYNC		= 1<<2,
>  	MIGRATE_ST		= 1<<3,
>  	MIGRATE_MT		= 1<<4,
> +	MIGRATE_CONCUR		= 1<<5,

This new flag MIGRATE_CONCUR seems unused by any other code, so is it unneeded
now, or is there a typo somewhere?

>  };
>
>  #endif		/* MIGRATE_MODE_H_INCLUDED */
> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> index 8f1db2e2d677..6d9758a32053 100644
> --- a/include/uapi/linux/mempolicy.h
> +++ b/include/uapi/linux/mempolicy.h
> @@ -54,6 +54,7 @@ enum mpol_rebind_step {
>  #define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
>  #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
>  #define MPOL_MF_MOVE_MT  (1<<6)	/* Use multi-threaded page copy routine */
> +#define MPOL_MF_MOVE_CONCUR  (1<<7)	/* Migrate a list of pages concurrently */
>
>  #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
>  			 MPOL_MF_MOVE     | 	\
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 0e9b1f17cf8b..a35e6fd43a50 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -50,6 +50,14 @@
>
>  int mt_page_copy = 0;
>
> +
> +struct page_migration_work_item {
> +	struct page *old_page;
> +	struct page *new_page;
> +	struct anon_vma *anon_vma;
> +	struct list_head list;
> +};
> +
>  /*
>   * migrate_prep() needs to be called before we start compiling a list of pages
>   * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
> @@ -1312,6 +1320,471 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>  	return rc;
>  }
>
> +static int __unmap_page_concur(struct page *page, struct page *newpage,

Most of this function is a copy of __unmap_and_move(), so please define
a new subfunction and make both __unmap_page_concur() and __unmap_and_move()
call it.

> +				struct anon_vma **anon_vma,
> +				int force, enum migrate_mode mode)
> +{
> +	int rc = -EAGAIN;
> +
> +	if (!trylock_page(page)) {
> +		if (!force || mode == MIGRATE_ASYNC)
> +			goto out;
> +
> +		/*
> +		 * It's not safe for direct compaction to call lock_page.
> +		 * For example, during page readahead pages are added locked
> +		 * to the LRU. Later, when the IO completes the pages are
> +		 * marked uptodate and unlocked. However, the queueing
> +		 * could be merging multiple pages for one bio (e.g.
> +		 * mpage_readpages). If an allocation happens for the
> +		 * second or third page, the process can end up locking
> +		 * the same page twice and deadlocking. Rather than
> +		 * trying to be clever about what pages can be locked,
> +		 * avoid the use of lock_page for direct compaction
> +		 * altogether.
> +		 */
> +		if (current->flags & PF_MEMALLOC)
> +			goto out;
> +
> +		lock_page(page);
> +	}
> +
> +	/* We are working on page_mapping(page) == NULL */
> +	VM_BUG_ON_PAGE(PageWriteback(page), page);

Although an anonymous page shouldn't have PageWriteback set, the existing
migration code (below) doesn't call VM_BUG_ON_PAGE even in that case. Is there
any special reason to do this differently for concurrent migration?

        if (PageWriteback(page)) {
                /*
                 * Only in the case of a full synchronous migration is it
                 * necessary to wait for PageWriteback. In the async case,
                 * the retry loop is too short and in the sync-light case,
                 * the overhead of stalling is too much
                 */
                if (mode != MIGRATE_SYNC) {
                        rc = -EBUSY;
                        goto out_unlock;
                }
                if (!force)
                        goto out_unlock;
                wait_on_page_writeback(page);
        }

> +
> +	/*
> +	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
> +	 * we cannot notice that anon_vma is freed while we migrates a page.
> +	 * This get_anon_vma() delays freeing anon_vma pointer until the end
> +	 * of migration. File cache pages are no problem because of page_lock()
> +	 * File Caches may use write_page() or lock_page() in migration, then,
> +	 * just care Anon page here.
> +	 *
> +	 * Only page_get_anon_vma() understands the subtleties of
> +	 * getting a hold on an anon_vma from outside one of its mms.
> +	 * But if we cannot get anon_vma, then we won't need it anyway,
> +	 * because that implies that the anon page is no longer mapped
> +	 * (and cannot be remapped so long as we hold the page lock).
> +	 */
> +	if (PageAnon(page) && !PageKsm(page))
> +		*anon_vma = page_get_anon_vma(page);
> +
> +	/*
> +	 * Block others from accessing the new page when we get around to
> +	 * establishing additional references. We are usually the only one
> +	 * holding a reference to newpage at this point. We used to have a BUG
> +	 * here if trylock_page(newpage) fails, but would like to allow for
> +	 * cases where there might be a race with the previous use of newpage.
> +	 * This is much like races on refcount of oldpage: just don't BUG().
> +	 */
> +	if (unlikely(!trylock_page(newpage)))
> +		goto out_unlock;
> +
> +	/*
> +	 * Corner case handling:
> +	 * 1. When a new swap-cache page is read into, it is added to the LRU
> +	 * and treated as swapcache but it has no rmap yet.
> +	 * Calling try_to_unmap() against a page->mapping==NULL page will
> +	 * trigger a BUG.  So handle it here.
> +	 * 2. An orphaned page (see truncate_complete_page) might have
> +	 * fs-private metadata. The page can be picked up due to memory
> +	 * offlining.  Everywhere else except page reclaim, the page is
> +	 * invisible to the vm, so the page can not be migrated.  So try to
> +	 * free the metadata, so the page can be freed.
> +	 */
> +	if (!page->mapping) {
> +		VM_BUG_ON_PAGE(PageAnon(page), page);
> +		if (page_has_private(page)) {
> +			try_to_free_buffers(page);
> +			goto out_unlock_both;
> +		}
> +	} else {
> +		VM_BUG_ON_PAGE(!page_mapped(page), page);
> +		/* Establish migration ptes */
> +		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !*anon_vma,
> +				page);
> +		rc = try_to_unmap(page,
> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> +	}
> +
> +	return rc;
> +
> +out_unlock_both:
> +	unlock_page(newpage);
> +out_unlock:
> +	/* Drop an anon_vma reference if we took one */
> +	if (*anon_vma)
> +		put_anon_vma(*anon_vma);
> +	unlock_page(page);
> +out:
> +	return rc;
> +}
> +
> +static int unmap_pages_and_get_new_concur(new_page_t get_new_page,
> +				free_page_t put_new_page, unsigned long private,
> +				struct page_migration_work_item *item,
> +				int force,
> +				enum migrate_mode mode, int reason)

There are many duplicates here too, but you pass struct page_migration_work_item
as an argument, so the duplication might be OK.

> +{
> +	int rc = MIGRATEPAGE_SUCCESS;
> +	int *result = NULL;
> +
> +
> +	item->new_page = get_new_page(item->old_page, private, &result);
> +
> +	if (!item->new_page) {
> +		rc = -ENOMEM;
> +		return rc;
> +	}
> +
> +	if (page_count(item->old_page) == 1) {
> +		rc = -ECANCELED;
> +		goto out;
> +	}
> +
> +	if (unlikely(PageTransHuge(item->old_page) &&
> +		!PageTransHuge(item->new_page))) {
> +		lock_page(item->old_page);
> +		rc = split_huge_page(item->old_page);
> +		unlock_page(item->old_page);
> +		if (rc)
> +			goto out;
> +	}
> +
> +	rc = __unmap_page_concur(item->old_page, item->new_page, &item->anon_vma,
> +							force, mode);
> +	if (rc == MIGRATEPAGE_SUCCESS) {
> +		put_new_page = NULL;
> +		return rc;
> +	}
> +
> +out:
> +	if (rc != -EAGAIN) {
> +		list_del(&item->old_page->lru);
> +		dec_zone_page_state(item->old_page, NR_ISOLATED_ANON +
> +				page_is_file_cache(item->old_page));
> +
> +		putback_lru_page(item->old_page);
> +	}
> +
> +	/*
> +	 * If migration was not successful and there's a freeing callback, use
> +	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
> +	 * during isolation.
> +	 */
> +	if (put_new_page)
> +		put_new_page(item->new_page, private);
> +	else
> +		putback_lru_page(item->new_page);
> +
> +	if (result) {
> +		if (rc)
> +			*result = rc;
> +		else
> +			*result = page_to_nid(item->new_page);
> +	}
> +
> +	return rc;
> +}
> +
> +static int move_mapping_concurr(struct list_head *unmapped_list_ptr,
> +					   struct list_head *wip_list_ptr,
> +					   enum migrate_mode mode)
> +{
> +	struct page_migration_work_item *iterator, *iterator2;
> +	struct address_space *mapping;
> +
> +	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
> +		VM_BUG_ON_PAGE(!PageLocked(iterator->old_page), iterator->old_page);
> +		VM_BUG_ON_PAGE(!PageLocked(iterator->new_page), iterator->new_page);
> +
> +		mapping = page_mapping(iterator->old_page);
> +
> +		VM_BUG_ON(mapping);
> +
> +		VM_BUG_ON(PageWriteback(iterator->old_page));
> +
> +		if (page_count(iterator->old_page) != 1) {
> +			list_move(&iterator->list, wip_list_ptr);
> +			continue;
> +		}
> +
> +		iterator->new_page->index = iterator->old_page->index;
> +		iterator->new_page->mapping = iterator->old_page->mapping;
> +		if (PageSwapBacked(iterator->old_page))
> +			SetPageSwapBacked(iterator->new_page);
> +	}
> +
> +	return 0;
> +}
> +
> +static void migrate_page_copy_page_flags(struct page *newpage, struct page *page)

This function is nearly identical to migrate_page_copy(), so please have
migrate_page_copy() call this function internally instead of duplicating the code.
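
That is, roughly (a sketch of the suggestion only):

	void migrate_page_copy(struct page *newpage, struct page *page,
			       enum migrate_mode mode)
	{
		/* ... copy the page data as before ... */
		migrate_page_copy_page_flags(newpage, page);
	}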

> +{
> +	int cpupid;
> +
> +	if (PageError(page))
> +		SetPageError(newpage);
> +	if (PageReferenced(page))
> +		SetPageReferenced(newpage);
> +	if (PageUptodate(page))
> +		SetPageUptodate(newpage);
> +	if (TestClearPageActive(page)) {
> +		VM_BUG_ON_PAGE(PageUnevictable(page), page);
> +		SetPageActive(newpage);
> +	} else if (TestClearPageUnevictable(page))
> +		SetPageUnevictable(newpage);
> +	if (PageChecked(page))
> +		SetPageChecked(newpage);
> +	if (PageMappedToDisk(page))
> +		SetPageMappedToDisk(newpage);
> +
> +	/* Move dirty on pages not done by migrate_page_move_mapping() */
> +	if (PageDirty(page))
> +		SetPageDirty(newpage);
> +
> +	if (page_is_young(page))
> +		set_page_young(newpage);
> +	if (page_is_idle(page))
> +		set_page_idle(newpage);
> +
> +	/*
> +	 * Copy NUMA information to the new page, to prevent over-eager
> +	 * future migrations of this same page.
> +	 */
> +	cpupid = page_cpupid_xchg_last(page, -1);
> +	page_cpupid_xchg_last(newpage, cpupid);
> +
> +	ksm_migrate_page(newpage, page);
> +	/*
> +	 * Please do not reorder this without considering how mm/ksm.c's
> +	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
> +	 */
> +	if (PageSwapCache(page))
> +		ClearPageSwapCache(page);
> +	ClearPagePrivate(page);
> +	set_page_private(page, 0);
> +
> +	/*
> +	 * If any waiters have accumulated on the new page then
> +	 * wake them up.
> +	 */
> +	if (PageWriteback(newpage))
> +		end_page_writeback(newpage);
> +
> +	copy_page_owner(page, newpage);
> +
> +	mem_cgroup_migrate(page, newpage);
> +}
> +
> +
> +static int copy_to_new_pages_concur(struct list_head *unmapped_list_ptr,
> +				enum migrate_mode mode)
> +{
> +	struct page_migration_work_item *iterator;
> +	int num_pages = 0, idx = 0;
> +	struct page **src_page_list = NULL, **dst_page_list = NULL;
> +	unsigned long size = 0;
> +	int rc = -EFAULT;
> +
> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
> +		++num_pages;
> +		size += PAGE_SIZE * hpage_nr_pages(iterator->old_page);
> +	}
> +
> +	src_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
> +	if (!src_page_list)
> +		return -ENOMEM;
> +	dst_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
> +	if (!dst_page_list)
> +		return -ENOMEM;
> +
> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
> +		src_page_list[idx] = iterator->old_page;
> +		dst_page_list[idx] = iterator->new_page;
> +		++idx;
> +	}
> +
> +	BUG_ON(idx != num_pages);
> +
> +	if (mode & MIGRATE_MT)

Just a guess, but do you mean MIGRATE_CONCUR here?

> +		rc = copy_page_lists_mthread(dst_page_list, src_page_list,
> +							num_pages);
> +
> +	if (rc)
> +		list_for_each_entry(iterator, unmapped_list_ptr, list) {
> +			if (PageHuge(iterator->old_page) ||
> +				PageTransHuge(iterator->old_page))
> +				copy_huge_page(iterator->new_page, iterator->old_page, 0);
> +			else
> +				copy_highpage(iterator->new_page, iterator->old_page);
> +		}
> +
> +	kfree(src_page_list);
> +	kfree(dst_page_list);
> +
> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
> +		migrate_page_copy_page_flags(iterator->new_page, iterator->old_page);
> +	}
> +
> +	return 0;
> +}
> +
> +static int remove_migration_ptes_concurr(struct list_head *unmapped_list_ptr)
> +{
> +	struct page_migration_work_item *iterator, *iterator2;
> +
> +	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
> +		remove_migration_ptes(iterator->old_page, iterator->new_page, false);
> +
> +		unlock_page(iterator->new_page);
> +
> +		if (iterator->anon_vma)
> +			put_anon_vma(iterator->anon_vma);
> +
> +		unlock_page(iterator->old_page);
> +
> +		list_del(&iterator->old_page->lru);
> +		dec_zone_page_state(iterator->old_page, NR_ISOLATED_ANON +
> +				page_is_file_cache(iterator->old_page));
> +
> +		putback_lru_page(iterator->old_page);
> +		iterator->old_page = NULL;
> +
> +		putback_lru_page(iterator->new_page);
> +		iterator->new_page = NULL;
> +	}
> +
> +	return 0;
> +}
> +
> +int migrate_pages_concur(struct list_head *from, new_page_t get_new_page,
> +		free_page_t put_new_page, unsigned long private,
> +		enum migrate_mode mode, int reason)
> +{
> +	int retry = 1;
> +	int nr_failed = 0;
> +	int nr_succeeded = 0;
> +	int pass = 0;
> +	struct page *page;
> +	int swapwrite = current->flags & PF_SWAPWRITE;
> +	int rc;
> +	int total_num_pages = 0, idx;
> +	struct page_migration_work_item *item_list;
> +	struct page_migration_work_item *iterator, *iterator2;
> +	int item_list_order = 0;
> +
> +	LIST_HEAD(wip_list);
> +	LIST_HEAD(unmapped_list);
> +	LIST_HEAD(serialized_list);
> +	LIST_HEAD(failed_list);
> +
> +	if (!swapwrite)
> +		current->flags |= PF_SWAPWRITE;
> +
> +	list_for_each_entry(page, from, lru)
> +		++total_num_pages;
> +
> +	item_list_order = get_order(total_num_pages *
> +		sizeof(struct page_migration_work_item));
> +
> +	if (item_list_order > MAX_ORDER) {
> +		item_list = alloc_pages_exact(total_num_pages *
> +			sizeof(struct page_migration_work_item), GFP_ATOMIC);
> +		memset(item_list, 0, total_num_pages *
> +			sizeof(struct page_migration_work_item));
> +	} else {
> +		item_list = (struct page_migration_work_item *)__get_free_pages(GFP_ATOMIC,
> +						item_list_order);

The allocation could fail, so error handling is needed here.

> +		memset(item_list, 0, PAGE_SIZE<<item_list_order);
> +	}
> +
> +	idx = 0;
> +	list_for_each_entry(page, from, lru) {
> +		item_list[idx].old_page = page;
> +		item_list[idx].new_page = NULL;
> +		INIT_LIST_HEAD(&item_list[idx].list);
> +		list_add_tail(&item_list[idx].list, &wip_list);
> +		idx += 1;
> +	}

At this point, all migration target pages have been moved to wip_list, so
the list 'from' (passed in from and returned back to the caller) becomes empty.
When all migration trials are done and pages still remain on wip_list
and/or serialized_list, the remaining pages should be moved back to 'from'.

> +
> +	for(pass = 0; pass < 1 && retry; pass++) {
> +		retry = 0;
> +
> +		/* unmap and get new page for page_mapping(page) == NULL */
> +		list_for_each_entry_safe(iterator, iterator2, &wip_list, list) {
> +			cond_resched();
> +
> +			if (iterator->new_page)
> +				continue;
> +
> +			/* We do not migrate huge pages, file-backed, or swapcached pages */

Saying just "huge pages" is confusing; maybe you mean hugetlb pages.

> +			if (PageHuge(iterator->old_page))
> +				rc = -ENODEV;
> +			else if ((page_mapping(iterator->old_page) != NULL))
> +				rc = -ENODEV;
> +			else
> +				rc = unmap_pages_and_get_new_concur(get_new_page, put_new_page,
> +						private, iterator, pass > 2, mode,
> +						reason);
> +
> +			switch(rc) {
> +			case -ENODEV:
> +				list_move(&iterator->list, &serialized_list);
> +				break;
> +			case -ENOMEM:
> +				goto out;
> +			case -EAGAIN:
> +				retry++;
> +				break;
> +			case MIGRATEPAGE_SUCCESS:
> +				list_move(&iterator->list, &unmapped_list);
> +				nr_succeeded++;
> +				break;
> +			default:
> +				/*
> +				 * Permanent failure (-EBUSY, -ENOSYS, etc.):
> +				 * unlike -EAGAIN case, the failed page is
> +				 * removed from migration page list and not
> +				 * retried in the next outer loop.
> +				 */
> +				list_move(&iterator->list, &failed_list);
> +				nr_failed++;
> +				break;
> +			}
> +		}
> +		/* move page->mapping to new page, only -EAGAIN could happen  */
> +		move_mapping_concurr(&unmapped_list, &wip_list, mode);
> +		/* copy pages in unmapped_list */
> +		copy_to_new_pages_concur(&unmapped_list, mode);
> +		/* remove migration pte, if old_page is NULL?, unlock old and new
> +		 * pages, put anon_vma, put old and new pages */
> +		remove_migration_ptes_concurr(&unmapped_list);
> +	}
> +	nr_failed += retry;
> +	rc = nr_failed;
> +
> +	if (!list_empty(&serialized_list))
> +		rc = migrate_pages(from, get_new_page, put_new_page,

You should pass &serialized_list instead of from here, right?

Thanks,
Naoya Horiguchi

> +				private, mode, reason);
> +out:
> +	if (nr_succeeded)
> +		count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
> +	if (nr_failed)
> +		count_vm_events(PGMIGRATE_FAIL, nr_failed);
> +	trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
> +
> +	if (item_list_order >= MAX_ORDER)
> +		free_pages_exact(item_list, total_num_pages *
> +			sizeof(struct page_migration_work_item));
> +	else
> +		free_pages((unsigned long)item_list, item_list_order);
> +
> +	if (!swapwrite)
> +		current->flags &= ~PF_SWAPWRITE;
> +
> +	return rc;
> +}
> +
>  /*
>   * migrate_pages - migrate the pages specified in a list, to the free pages
>   *		   supplied as the target for the page migration
> @@ -1452,7 +1925,8 @@ static struct page *new_page_node(struct page *p, unsigned long private,
>  static int do_move_page_to_node_array(struct mm_struct *mm,
>  				      struct page_to_node *pm,
>  				      int migrate_all,
> -					  int migrate_use_mt)
> +					  int migrate_use_mt,
> +					  int migrate_concur)
>  {
>  	int err;
>  	struct page_to_node *pp;
> @@ -1536,8 +2010,16 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
>
>  	err = 0;
>  	if (!list_empty(&pagelist)) {
> -		err = migrate_pages(&pagelist, new_page_node, NULL,
> -				(unsigned long)pm, mode, MR_SYSCALL);
> +		if (migrate_concur)
> +			err = migrate_pages_concur(&pagelist, new_page_node, NULL,
> +					(unsigned long)pm,
> +					mode,
> +					MR_SYSCALL);
> +		else
> +			err = migrate_pages(&pagelist, new_page_node, NULL,
> +					(unsigned long)pm,
> +					mode,
> +					MR_SYSCALL);
>  		if (err)
>  			putback_movable_pages(&pagelist);
>  	}
> @@ -1615,7 +2097,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>  		/* Migrate this chunk */
>  		err = do_move_page_to_node_array(mm, pm,
>  						 flags & MPOL_MF_MOVE_ALL,
> -						 flags & MPOL_MF_MOVE_MT);
> +						 flags & MPOL_MF_MOVE_MT,
> +						 flags & MPOL_MF_MOVE_CONCUR);
>  		if (err < 0)
>  			goto out_pm;
>
> @@ -1722,7 +2205,9 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
>  	nodemask_t task_nodes;
>
>  	/* Check flags */
> -	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_MOVE_MT))
> +	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|
> +				  MPOL_MF_MOVE_MT|
> +				  MPOL_MF_MOVE_CONCUR))
>  		return -EINVAL;
>
>  	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall.
  2017-02-24  8:25   ` Naoya Horiguchi
@ 2017-02-24 15:05     ` Zi Yan
  0 siblings, 0 replies; 25+ messages in thread
From: Zi Yan @ 2017-02-24 15:05 UTC (permalink / raw)
  To: Naoya Horiguchi; +Cc: linux-mm, dnellans, apopple, paulmck, khandual


On 24 Feb 2017, at 2:25, Naoya Horiguchi wrote:

> On Fri, Feb 17, 2017 at 10:05:45AM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Concurrent page migration moves a list of pages all together,
>> concurrently via multiple threads. This is different from the
>> existing page migration process, which migrates pages sequentially.
>> The current implementation only migrates anonymous pages.
>
> Please explain more about your new migration scheme; in particular, the
> difference from the original page migration code is very important
> for reviewers and other developers to understand your work quickly.

Sure.

Current migrate_pages() accepts a list of pages and migrates them
one after another sequentially, looping through a process of
1) getting a new page, 2) unmapping the old page, 3) copying the content
of the old page to the new page, and 4) mapping the new page. Thus, in
step 3, at most 4KB of data (2MB if THP migration is enabled) is copied
at a time. Such a small amount of data per copy limits copy throughput.

This concurrent page migration patch aggregates all the data copy steps
while migrating a list of pages, to increase data copy throughput.
Combining this with parallel page migration, I am able to reach the peak
memory bandwidth while copying 16 2MB THPs on both an Intel two-socket
machine and an IBM Power8 two-socket machine. The data copy throughput is
~4x and ~2.6x that of single, sequential THP migration on the Intel
machine and the Power8 machine respectively.

This at least provides an option for people who want to migrate pages at
full speed.
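
In essence, each pass over the list in migrate_pages_concur() (quoted
below) boils down to three phases; this is only a simplified outline of
the code in this patch, not a verbatim excerpt:

	/* phase 1: allocate a target page and unmap each source page */
	list_for_each_entry_safe(iterator, iterator2, &wip_list, list)
		rc = unmap_pages_and_get_new_concur(get_new_page,
					put_new_page, private, iterator,
					pass > 2, mode, reason);

	/* phase 2: transfer page->mapping, then copy all unmapped pages
	 * in one batched (optionally multi-threaded) pass
	 */
	move_mapping_concurr(&unmapped_list, &wip_list, mode);
	copy_to_new_pages_concur(&unmapped_list, mode);

	/* phase 3: reinstate the PTEs, then unlock and put back the pages */
	remove_migration_ptes_concurr(&unmapped_list);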


>
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/migrate_mode.h   |   1 +
>>  include/uapi/linux/mempolicy.h |   1 +
>>  mm/migrate.c                   | 495 ++++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 492 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
>> index d344ad60f499..2bd849d89122 100644
>> --- a/include/linux/migrate_mode.h
>> +++ b/include/linux/migrate_mode.h
>> @@ -13,6 +13,7 @@ enum migrate_mode {
>>  	MIGRATE_SYNC		= 1<<2,
>>  	MIGRATE_ST		= 1<<3,
>>  	MIGRATE_MT		= 1<<4,
>> +	MIGRATE_CONCUR		= 1<<5,
>
> This new flag MIGRATE_CONCUR seems to be unused by any other code, so is it
> unneeded now, or is there a typo somewhere?

It is not used at the moment. My original purpose was to rename the
existing migrate_pages() to migrate_pages_sequential(), make
migrate_pages() use migrate_pages_sequential() by default, and use
migrate_pages_concur() when MIGRATE_CONCUR is set.

I will make the changes in the next version.
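
Roughly like this (a sketch of the planned dispatch for the next version,
not code in this series; migrate_pages_sequential() is the to-be-renamed
existing routine):

	int migrate_pages(struct list_head *from, new_page_t get_new_page,
			free_page_t put_new_page, unsigned long private,
			enum migrate_mode mode, int reason)
	{
		if (mode & MIGRATE_CONCUR)
			return migrate_pages_concur(from, get_new_page,
					put_new_page, private, mode, reason);

		return migrate_pages_sequential(from, get_new_page,
					put_new_page, private, mode, reason);
	}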

>
>>  };
>>
>>  #endif		/* MIGRATE_MODE_H_INCLUDED */
>> diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
>> index 8f1db2e2d677..6d9758a32053 100644
>> --- a/include/uapi/linux/mempolicy.h
>> +++ b/include/uapi/linux/mempolicy.h
>> @@ -54,6 +54,7 @@ enum mpol_rebind_step {
>>  #define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE:  lazy migrate on fault */
>>  #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
>>  #define MPOL_MF_MOVE_MT  (1<<6)	/* Use multi-threaded page copy routine */
>> +#define MPOL_MF_MOVE_CONCUR  (1<<7)	/* Migrate a list of pages concurrently */
>>
>>  #define MPOL_MF_VALID	(MPOL_MF_STRICT   | 	\
>>  			 MPOL_MF_MOVE     | 	\
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 0e9b1f17cf8b..a35e6fd43a50 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -50,6 +50,14 @@
>>
>>  int mt_page_copy = 0;
>>
>> +
>> +struct page_migration_work_item {
>> +	struct page *old_page;
>> +	struct page *new_page;
>> +	struct anon_vma *anon_vma;
>> +	struct list_head list;
>> +};
>> +
>>  /*
>>   * migrate_prep() needs to be called before we start compiling a list of pages
>>   * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
>> @@ -1312,6 +1320,471 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
>>  	return rc;
>>  }
>>
>> +static int __unmap_page_concur(struct page *page, struct page *newpage,
>
> Most of the code in this function is a copy of __unmap_and_move(), so please
> define a new subfunction, and make __unmap_page_concur() and
> __unmap_and_move() call it.
>

Sure.
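
Roughly, a shared helper along these lines, called from both
__unmap_and_move() and __unmap_page_concur() (the name below is only a
placeholder, not final):

	static int __unmap_page_common(struct page *page, struct page *newpage,
				struct anon_vma **anon_vma,
				int force, enum migrate_mode mode);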

>> +				struct anon_vma **anon_vma,
>> +				int force, enum migrate_mode mode)
>> +{
>> +	int rc = -EAGAIN;
>> +
>> +	if (!trylock_page(page)) {
>> +		if (!force || mode == MIGRATE_ASYNC)
>> +			goto out;
>> +
>> +		/*
>> +		 * It's not safe for direct compaction to call lock_page.
>> +		 * For example, during page readahead pages are added locked
>> +		 * to the LRU. Later, when the IO completes the pages are
>> +		 * marked uptodate and unlocked. However, the queueing
>> +		 * could be merging multiple pages for one bio (e.g.
>> +		 * mpage_readpages). If an allocation happens for the
>> +		 * second or third page, the process can end up locking
>> +		 * the same page twice and deadlocking. Rather than
>> +		 * trying to be clever about what pages can be locked,
>> +		 * avoid the use of lock_page for direct compaction
>> +		 * altogether.
>> +		 */
>> +		if (current->flags & PF_MEMALLOC)
>> +			goto out;
>> +
>> +		lock_page(page);
>> +	}
>> +
>> +	/* We are working on page_mapping(page) == NULL */
>> +	VM_BUG_ON_PAGE(PageWriteback(page), page);
>
> Although an anonymous page shouldn't have PageWriteback set, the existing
> migration code (below) doesn't call VM_BUG_ON_PAGE even in that case. Any
> special reason to do it differently for concurrent migration?
>
>         if (PageWriteback(page)) {
>                 /*
>                  * Only in the case of a full synchronous migration is it
>                  * necessary to wait for PageWriteback. In the async case,
>                  * the retry loop is too short and in the sync-light case,
>                  * the overhead of stalling is too much
>                  */
>                 if (mode != MIGRATE_SYNC) {
>                         rc = -EBUSY;
>                         goto out_unlock;
>                 }
>                 if (!force)
>                         goto out_unlock;
>                 wait_on_page_writeback(page);
>         }
>

Thanks for pointing it out. I must have forgotten to change this.
I will use VM_BUG_ON_PAGE(PageWriteback(page), page) here.


>> +
>> +	/*
>> +	 * By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
>> +	 * we cannot notice that anon_vma is freed while we migrates a page.
>> +	 * This get_anon_vma() delays freeing anon_vma pointer until the end
>> +	 * of migration. File cache pages are no problem because of page_lock()
>> +	 * File Caches may use write_page() or lock_page() in migration, then,
>> +	 * just care Anon page here.
>> +	 *
>> +	 * Only page_get_anon_vma() understands the subtleties of
>> +	 * getting a hold on an anon_vma from outside one of its mms.
>> +	 * But if we cannot get anon_vma, then we won't need it anyway,
>> +	 * because that implies that the anon page is no longer mapped
>> +	 * (and cannot be remapped so long as we hold the page lock).
>> +	 */
>> +	if (PageAnon(page) && !PageKsm(page))
>> +		*anon_vma = page_get_anon_vma(page);
>> +
>> +	/*
>> +	 * Block others from accessing the new page when we get around to
>> +	 * establishing additional references. We are usually the only one
>> +	 * holding a reference to newpage at this point. We used to have a BUG
>> +	 * here if trylock_page(newpage) fails, but would like to allow for
>> +	 * cases where there might be a race with the previous use of newpage.
>> +	 * This is much like races on refcount of oldpage: just don't BUG().
>> +	 */
>> +	if (unlikely(!trylock_page(newpage)))
>> +		goto out_unlock;
>> +
>> +	/*
>> +	 * Corner case handling:
>> +	 * 1. When a new swap-cache page is read into, it is added to the LRU
>> +	 * and treated as swapcache but it has no rmap yet.
>> +	 * Calling try_to_unmap() against a page->mapping==NULL page will
>> +	 * trigger a BUG.  So handle it here.
>> +	 * 2. An orphaned page (see truncate_complete_page) might have
>> +	 * fs-private metadata. The page can be picked up due to memory
>> +	 * offlining.  Everywhere else except page reclaim, the page is
>> +	 * invisible to the vm, so the page can not be migrated.  So try to
>> +	 * free the metadata, so the page can be freed.
>> +	 */
>> +	if (!page->mapping) {
>> +		VM_BUG_ON_PAGE(PageAnon(page), page);
>> +		if (page_has_private(page)) {
>> +			try_to_free_buffers(page);
>> +			goto out_unlock_both;
>> +		}
>> +	} else {
>> +		VM_BUG_ON_PAGE(!page_mapped(page), page);
>> +		/* Establish migration ptes */
>> +		VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !*anon_vma,
>> +				page);
>> +		rc = try_to_unmap(page,
>> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>> +	}
>> +
>> +	return rc;
>> +
>> +out_unlock_both:
>> +	unlock_page(newpage);
>> +out_unlock:
>> +	/* Drop an anon_vma reference if we took one */
>> +	if (*anon_vma)
>> +		put_anon_vma(*anon_vma);
>> +	unlock_page(page);
>> +out:
>> +	return rc;
>> +}
>> +
>> +static int unmap_pages_and_get_new_concur(new_page_t get_new_page,
>> +				free_page_t put_new_page, unsigned long private,
>> +				struct page_migration_work_item *item,
>> +				int force,
>> +				enum migrate_mode mode, int reason)
>
> There is a lot of duplication here too, but you pass struct
> page_migration_work_item as an argument, so this duplication might be OK.
>
>> +{
>> +	int rc = MIGRATEPAGE_SUCCESS;
>> +	int *result = NULL;
>> +
>> +
>> +	item->new_page = get_new_page(item->old_page, private, &result);
>> +
>> +	if (!item->new_page) {
>> +		rc = -ENOMEM;
>> +		return rc;
>> +	}
>> +
>> +	if (page_count(item->old_page) == 1) {
>> +		rc = -ECANCELED;
>> +		goto out;
>> +	}
>> +
>> +	if (unlikely(PageTransHuge(item->old_page) &&
>> +		!PageTransHuge(item->new_page))) {
>> +		lock_page(item->old_page);
>> +		rc = split_huge_page(item->old_page);
>> +		unlock_page(item->old_page);
>> +		if (rc)
>> +			goto out;
>> +	}
>> +
>> +	rc = __unmap_page_concur(item->old_page, item->new_page, &item->anon_vma,
>> +							force, mode);
>> +	if (rc == MIGRATEPAGE_SUCCESS) {
>> +		put_new_page = NULL;
>> +		return rc;
>> +	}
>> +
>> +out:
>> +	if (rc != -EAGAIN) {
>> +		list_del(&item->old_page->lru);
>> +		dec_zone_page_state(item->old_page, NR_ISOLATED_ANON +
>> +				page_is_file_cache(item->old_page));
>> +
>> +		putback_lru_page(item->old_page);
>> +	}
>> +
>> +	/*
>> +	 * If migration was not successful and there's a freeing callback, use
>> +	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
>> +	 * during isolation.
>> +	 */
>> +	if (put_new_page)
>> +		put_new_page(item->new_page, private);
>> +	else
>> +		putback_lru_page(item->new_page);
>> +
>> +	if (result) {
>> +		if (rc)
>> +			*result = rc;
>> +		else
>> +			*result = page_to_nid(item->new_page);
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static int move_mapping_concurr(struct list_head *unmapped_list_ptr,
>> +					   struct list_head *wip_list_ptr,
>> +					   enum migrate_mode mode)
>> +{
>> +	struct page_migration_work_item *iterator, *iterator2;
>> +	struct address_space *mapping;
>> +
>> +	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
>> +		VM_BUG_ON_PAGE(!PageLocked(iterator->old_page), iterator->old_page);
>> +		VM_BUG_ON_PAGE(!PageLocked(iterator->new_page), iterator->new_page);
>> +
>> +		mapping = page_mapping(iterator->old_page);
>> +
>> +		VM_BUG_ON(mapping);
>> +
>> +		VM_BUG_ON(PageWriteback(iterator->old_page));
>> +
>> +		if (page_count(iterator->old_page) != 1) {
>> +			list_move(&iterator->list, wip_list_ptr);
>> +			continue;
>> +		}
>> +
>> +		iterator->new_page->index = iterator->old_page->index;
>> +		iterator->new_page->mapping = iterator->old_page->mapping;
>> +		if (PageSwapBacked(iterator->old_page))
>> +			SetPageSwapBacked(iterator->new_page);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void migrate_page_copy_page_flags(struct page *newpage, struct page *page)
>
> This function is nearly identical with migrate_page_copy(), so please make
> it call this function inside it.

Sure.

>
>> +{
>> +	int cpupid;
>> +
>> +	if (PageError(page))
>> +		SetPageError(newpage);
>> +	if (PageReferenced(page))
>> +		SetPageReferenced(newpage);
>> +	if (PageUptodate(page))
>> +		SetPageUptodate(newpage);
>> +	if (TestClearPageActive(page)) {
>> +		VM_BUG_ON_PAGE(PageUnevictable(page), page);
>> +		SetPageActive(newpage);
>> +	} else if (TestClearPageUnevictable(page))
>> +		SetPageUnevictable(newpage);
>> +	if (PageChecked(page))
>> +		SetPageChecked(newpage);
>> +	if (PageMappedToDisk(page))
>> +		SetPageMappedToDisk(newpage);
>> +
>> +	/* Move dirty on pages not done by migrate_page_move_mapping() */
>> +	if (PageDirty(page))
>> +		SetPageDirty(newpage);
>> +
>> +	if (page_is_young(page))
>> +		set_page_young(newpage);
>> +	if (page_is_idle(page))
>> +		set_page_idle(newpage);
>> +
>> +	/*
>> +	 * Copy NUMA information to the new page, to prevent over-eager
>> +	 * future migrations of this same page.
>> +	 */
>> +	cpupid = page_cpupid_xchg_last(page, -1);
>> +	page_cpupid_xchg_last(newpage, cpupid);
>> +
>> +	ksm_migrate_page(newpage, page);
>> +	/*
>> +	 * Please do not reorder this without considering how mm/ksm.c's
>> +	 * get_ksm_page() depends upon ksm_migrate_page() and PageSwapCache().
>> +	 */
>> +	if (PageSwapCache(page))
>> +		ClearPageSwapCache(page);
>> +	ClearPagePrivate(page);
>> +	set_page_private(page, 0);
>> +
>> +	/*
>> +	 * If any waiters have accumulated on the new page then
>> +	 * wake them up.
>> +	 */
>> +	if (PageWriteback(newpage))
>> +		end_page_writeback(newpage);
>> +
>> +	copy_page_owner(page, newpage);
>> +
>> +	mem_cgroup_migrate(page, newpage);
>> +}
>> +
>> +
>> +static int copy_to_new_pages_concur(struct list_head *unmapped_list_ptr,
>> +				enum migrate_mode mode)
>> +{
>> +	struct page_migration_work_item *iterator;
>> +	int num_pages = 0, idx = 0;
>> +	struct page **src_page_list = NULL, **dst_page_list = NULL;
>> +	unsigned long size = 0;
>> +	int rc = -EFAULT;
>> +
>> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
>> +		++num_pages;
>> +		size += PAGE_SIZE * hpage_nr_pages(iterator->old_page);
>> +	}
>> +
>> +	src_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
>> +	if (!src_page_list)
>> +		return -ENOMEM;
>> +	dst_page_list = kzalloc(sizeof(struct page *)*num_pages, GFP_KERNEL);
>> +	if (!dst_page_list)
>> +		return -ENOMEM;
>> +
>> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
>> +		src_page_list[idx] = iterator->old_page;
>> +		dst_page_list[idx] = iterator->new_page;
>> +		++idx;
>> +	}
>> +
>> +	BUG_ON(idx != num_pages);
>> +
>> +	if (mode & MIGRATE_MT)
>
> just my guessing, you mean MIGRATE_CONCUR?

No. This is for the multi-threaded case. As I mentioned above,
MIGRATE_CONCUR will be used to select between migrate_pages_sequential()
and migrate_pages_concur().
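
To spell out the distinction as it stands in the quoted code (the same code
as below, with comments added here only for illustration):

	if (mode & MIGRATE_MT)
		/* copy all unmapped pages at once with a pool of threads */
		rc = copy_page_lists_mthread(dst_page_list, src_page_list,
						num_pages);

	if (rc)
		/* MIGRATE_MT unset, or the threaded copy failed: fall back
		 * to the sequential per-page copy loop
		 */
		list_for_each_entry(iterator, unmapped_list_ptr, list) {
			if (PageHuge(iterator->old_page) ||
				PageTransHuge(iterator->old_page))
				copy_huge_page(iterator->new_page,
						iterator->old_page, 0);
			else
				copy_highpage(iterator->new_page,
						iterator->old_page);
		}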

>
>> +		rc = copy_page_lists_mthread(dst_page_list, src_page_list,
>> +							num_pages);
>> +
>> +	if (rc)
>> +		list_for_each_entry(iterator, unmapped_list_ptr, list) {
>> +			if (PageHuge(iterator->old_page) ||
>> +				PageTransHuge(iterator->old_page))
>> +				copy_huge_page(iterator->new_page, iterator->old_page, 0);
>> +			else
>> +				copy_highpage(iterator->new_page, iterator->old_page);
>> +		}
>> +
>> +	kfree(src_page_list);
>> +	kfree(dst_page_list);
>> +
>> +	list_for_each_entry(iterator, unmapped_list_ptr, list) {
>> +		migrate_page_copy_page_flags(iterator->new_page, iterator->old_page);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int remove_migration_ptes_concurr(struct list_head *unmapped_list_ptr)
>> +{
>> +	struct page_migration_work_item *iterator, *iterator2;
>> +
>> +	list_for_each_entry_safe(iterator, iterator2, unmapped_list_ptr, list) {
>> +		remove_migration_ptes(iterator->old_page, iterator->new_page, false);
>> +
>> +		unlock_page(iterator->new_page);
>> +
>> +		if (iterator->anon_vma)
>> +			put_anon_vma(iterator->anon_vma);
>> +
>> +		unlock_page(iterator->old_page);
>> +
>> +		list_del(&iterator->old_page->lru);
>> +		dec_zone_page_state(iterator->old_page, NR_ISOLATED_ANON +
>> +				page_is_file_cache(iterator->old_page));
>> +
>> +		putback_lru_page(iterator->old_page);
>> +		iterator->old_page = NULL;
>> +
>> +		putback_lru_page(iterator->new_page);
>> +		iterator->new_page = NULL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int migrate_pages_concur(struct list_head *from, new_page_t get_new_page,
>> +		free_page_t put_new_page, unsigned long private,
>> +		enum migrate_mode mode, int reason)
>> +{
>> +	int retry = 1;
>> +	int nr_failed = 0;
>> +	int nr_succeeded = 0;
>> +	int pass = 0;
>> +	struct page *page;
>> +	int swapwrite = current->flags & PF_SWAPWRITE;
>> +	int rc;
>> +	int total_num_pages = 0, idx;
>> +	struct page_migration_work_item *item_list;
>> +	struct page_migration_work_item *iterator, *iterator2;
>> +	int item_list_order = 0;
>> +
>> +	LIST_HEAD(wip_list);
>> +	LIST_HEAD(unmapped_list);
>> +	LIST_HEAD(serialized_list);
>> +	LIST_HEAD(failed_list);
>> +
>> +	if (!swapwrite)
>> +		current->flags |= PF_SWAPWRITE;
>> +
>> +	list_for_each_entry(page, from, lru)
>> +		++total_num_pages;
>> +
>> +	item_list_order = get_order(total_num_pages *
>> +		sizeof(struct page_migration_work_item));
>> +
>> +	if (item_list_order > MAX_ORDER) {
>> +		item_list = alloc_pages_exact(total_num_pages *
>> +			sizeof(struct page_migration_work_item), GFP_ATOMIC);
>> +		memset(item_list, 0, total_num_pages *
>> +			sizeof(struct page_migration_work_item));
>> +	} else {
>> +		item_list = (struct page_migration_work_item *)__get_free_pages(GFP_ATOMIC,
>> +						item_list_order);
>
> The allocation could fail, so error handling is needed here.
>

Got it.

>> +		memset(item_list, 0, PAGE_SIZE<<item_list_order);
>> +	}
>> +
>> +	idx = 0;
>> +	list_for_each_entry(page, from, lru) {
>> +		item_list[idx].old_page = page;
>> +		item_list[idx].new_page = NULL;
>> +		INIT_LIST_HEAD(&item_list[idx].list);
>> +		list_add_tail(&item_list[idx].list, &wip_list);
>> +		idx += 1;
>> +	}
>
> At this point, all migration target pages are moved to wip_list, so
> the list 'from' (passed from and returned back to the caller) becomes empty.
> When all migration trials are done and there still remain pages on wip_list
> and/or serialized_list, the remaining pages should be moved back to 'from'.

Right. Thanks for pointing this out.



>
>> +
>> +	for(pass = 0; pass < 1 && retry; pass++) {
>> +		retry = 0;
>> +
>> +		/* unmap and get new page for page_mapping(page) == NULL */
>> +		list_for_each_entry_safe(iterator, iterator2, &wip_list, list) {
>> +			cond_resched();
>> +
>> +			if (iterator->new_page)
>> +				continue;
>> +
>> +			/* We do not migrate huge pages, file-backed, or swapcached pages */
>
> Just "huge page" is confusing; maybe you mean hugetlb.

Right. Will change it to hugetlb.

>
>> +			if (PageHuge(iterator->old_page))
>> +				rc = -ENODEV;
>> +			else if ((page_mapping(iterator->old_page) != NULL))
>> +				rc = -ENODEV;
>> +			else
>> +				rc = unmap_pages_and_get_new_concur(get_new_page, put_new_page,
>> +						private, iterator, pass > 2, mode,
>> +						reason);
>> +
>> +			switch(rc) {
>> +			case -ENODEV:
>> +				list_move(&iterator->list, &serialized_list);
>> +				break;
>> +			case -ENOMEM:
>> +				goto out;
>> +			case -EAGAIN:
>> +				retry++;
>> +				break;
>> +			case MIGRATEPAGE_SUCCESS:
>> +				list_move(&iterator->list, &unmapped_list);
>> +				nr_succeeded++;
>> +				break;
>> +			default:
>> +				/*
>> +				 * Permanent failure (-EBUSY, -ENOSYS, etc.):
>> +				 * unlike -EAGAIN case, the failed page is
>> +				 * removed from migration page list and not
>> +				 * retried in the next outer loop.
>> +				 */
>> +				list_move(&iterator->list, &failed_list);
>> +				nr_failed++;
>> +				break;
>> +			}
>> +		}
>> +		/* move page->mapping to new page, only -EAGAIN could happen  */
>> +		move_mapping_concurr(&unmapped_list, &wip_list, mode);
>> +		/* copy pages in unmapped_list */
>> +		copy_to_new_pages_concur(&unmapped_list, mode);
>> +		/* remove migration pte, if old_page is NULL?, unlock old and new
>> +		 * pages, put anon_vma, put old and new pages */
>> +		remove_migration_ptes_concurr(&unmapped_list);
>> +	}
>> +	nr_failed += retry;
>> +	rc = nr_failed;
>> +
>> +	if (!list_empty(&serialized_list))
>> +		rc = migrate_pages(from, get_new_page, put_new_page,
>
> You should give &serialized_list instead of from, right?

Right. Thanks.


>
> Thanks,
> Naoya Horiguchi
>
>> +				private, mode, reason);
>> +out:
>> +	if (nr_succeeded)
>> +		count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
>> +	if (nr_failed)
>> +		count_vm_events(PGMIGRATE_FAIL, nr_failed);
>> +	trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
>> +
>> +	if (item_list_order >= MAX_ORDER)
>> +		free_pages_exact(item_list, total_num_pages *
>> +			sizeof(struct page_migration_work_item));
>> +	else
>> +		free_pages((unsigned long)item_list, item_list_order);
>> +
>> +	if (!swapwrite)
>> +		current->flags &= ~PF_SWAPWRITE;
>> +
>> +	return rc;
>> +}
>> +
>>  /*
>>   * migrate_pages - migrate the pages specified in a list, to the free pages
>>   *		   supplied as the target for the page migration
>> @@ -1452,7 +1925,8 @@ static struct page *new_page_node(struct page *p, unsigned long private,
>>  static int do_move_page_to_node_array(struct mm_struct *mm,
>>  				      struct page_to_node *pm,
>>  				      int migrate_all,
>> -					  int migrate_use_mt)
>> +					  int migrate_use_mt,
>> +					  int migrate_concur)
>>  {
>>  	int err;
>>  	struct page_to_node *pp;
>> @@ -1536,8 +2010,16 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
>>
>>  	err = 0;
>>  	if (!list_empty(&pagelist)) {
>> -		err = migrate_pages(&pagelist, new_page_node, NULL,
>> -				(unsigned long)pm, mode, MR_SYSCALL);
>> +		if (migrate_concur)
>> +			err = migrate_pages_concur(&pagelist, new_page_node, NULL,
>> +					(unsigned long)pm,
>> +					mode,
>> +					MR_SYSCALL);
>> +		else
>> +			err = migrate_pages(&pagelist, new_page_node, NULL,
>> +					(unsigned long)pm,
>> +					mode,
>> +					MR_SYSCALL);
>>  		if (err)
>>  			putback_movable_pages(&pagelist);
>>  	}
>> @@ -1615,7 +2097,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>  		/* Migrate this chunk */
>>  		err = do_move_page_to_node_array(mm, pm,
>>  						 flags & MPOL_MF_MOVE_ALL,
>> -						 flags & MPOL_MF_MOVE_MT);
>> +						 flags & MPOL_MF_MOVE_MT,
>> +						 flags & MPOL_MF_MOVE_CONCUR);
>>  		if (err < 0)
>>  			goto out_pm;
>>
>> @@ -1722,7 +2205,9 @@ SYSCALL_DEFINE6(move_pages, pid_t, pid, unsigned long, nr_pages,
>>  	nodemask_t task_nodes;
>>
>>  	/* Check flags */
>> -	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|MPOL_MF_MOVE_MT))
>> +	if (flags & ~(MPOL_MF_MOVE|MPOL_MF_MOVE_ALL|
>> +				  MPOL_MF_MOVE_MT|
>> +				  MPOL_MF_MOVE_CONCUR))
>>  		return -EINVAL;
>>
>>  	if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE))
>> --
>> 2.11.0
>>


--
Best Regards
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 496 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function
  2017-02-23  8:02       ` Naoya Horiguchi
@ 2017-03-09  5:35         ` Anshuman Khandual
  0 siblings, 0 replies; 25+ messages in thread
From: Anshuman Khandual @ 2017-03-09  5:35 UTC (permalink / raw)
  To: Naoya Horiguchi, Anshuman Khandual
  Cc: Zi Yan, linux-mm, dnellans, apopple, paulmck, zi.yan

On 02/23/2017 01:32 PM, Naoya Horiguchi wrote:
> On Thu, Feb 23, 2017 at 01:20:16PM +0530, Anshuman Khandual wrote:
> ...
>>>
>>>> +
>>>> +	cthreads = nr_copythreads;
>>>> +	cthreads = min_t(unsigned int, cthreads, cpumask_weight(cpumask));
>>>
>>> Nitpick, but this looks a little wordy; can it simply be like below?
>>>
>>>   cthreads = min_t(unsigned int, nr_copythreads, cpumask_weight(cpumask));
>>>
>>>> +	cthreads = (cthreads / 2) * 2;
>>>
>>> I'm not sure of the intention here. Should the # of threads be an even number?
>>
>> Yes.
>>
>>> If cpumask_weight() is 1, cthreads is 0, and that could cause a zero division.
>>> So you had better make sure to prevent it.
>>
>> If cpumask_weight() is 1, then min_t(unsigned int, 8, 1) should be
>> greater than or equal to 1. Then cthreads can end up as 0. That is
>> possible. But how is there a chance of a zero division?
> 
> Hi Anshuman,
> 
> I just had the above thought when reading the line your patch introduces:
> 
>        chunk_size = PAGE_SIZE * nr_pages / cthreads
>                                            ~~~~~~~~
>                                            (this can be 0?)

Right, cthreads can be 0. I am changing it like this:

cthreads = min_t(unsigned int, NR_COPYTHREADS, cpumask_weight(cpumask));
cthreads = (cthreads / 2) * 2;
if (!cthreads)
      cthreads = 1;

After the first two statements cthreads can be 0 if cpumask_weight() turns
out to be 1 or 0, in which case we force it to 1. Then with this

        i = 0;
        for_each_cpu(cpu, cpumask) {
                if (i >= cthreads)
                        break;
                cpu_id_list[i] = cpu;
                ++i;
        }

cpu_id_list[0] will hold the single cpu (in case the node's cpumask has
a cpu), or it will hold 0 in case it is a memory-only, CPU-less node. In
either case the page copy happens in a single-threaded manner. This also
removes the possibility of a divide-by-zero here.

chunk_size = PAGE_SIZE * nr_pages / cthreads;
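
For a concrete feel of the split (x86-64 numbers assumed):

	/* a 2MB THP: nr_pages = 512, PAGE_SIZE = 4096
	 *   cthreads = 8  ->  chunk_size = 4096 * 512 / 8 = 256KB per thread
	 *   cthreads = 1  ->  chunk_size = 2MB, i.e. the whole page is
	 *                     copied by the single thread
	 */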


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function.
  2017-02-23  8:54   ` Naoya Horiguchi
@ 2017-03-09 13:02     ` Anshuman Khandual
  0 siblings, 0 replies; 25+ messages in thread
From: Anshuman Khandual @ 2017-03-09 13:02 UTC (permalink / raw)
  To: Naoya Horiguchi, Zi Yan
  Cc: linux-mm, dnellans, apopple, paulmck, khandual, zi.yan

On 02/23/2017 02:24 PM, Naoya Horiguchi wrote:
> On Fri, Feb 17, 2017 at 10:05:44AM -0500, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> It supports copying a list of pages via a multi-threaded process.
>> It evenly distributes a list of pages to a group of threads and
>> uses the same subroutine as copy_page_mthread()
> The new function has many lines duplicated from copy_page_mthread(),
> so please consider factoring them out into a common routine.
> That would make your code more readable/maintainable.

Though the two look very similar to each other, there are some
subtle differences which make it harder to factor them out
into common functions.

int copy_pages_mthread(struct page *to, struct page *from, int nr_pages)

* This takes a single source page and a single destination
  page and copies contiguous data between these two pages.
  The size of the copy can be a single page for a normal page,
  or multiple pages if it is a huge page.

* The work is split into chunks of PAGE_SIZE * nr_pages / threads
  and assigned to individual threads, whose count is decided based
  on the number of CPUs present on the target node. A single thread
  takes a single work queue job and executes it.

int copy_page_list_mt(struct page **to, struct page **from, int nr_pages)

* This takes multiple source pages and multiple destination
  pages and copies contiguous data between one page pair
  in a single work queue job. The size of each copy is decided
  based on the type of the page, normal or huge.

* Each job copies a single source page to its destination
  page, and we create as many jobs as there are pages, though
  they are spread over a number of threads based on the number
  of CPUs present on the destination node. So one CPU can
  get more than one page copy job scheduled (a rough sketch of
  this per-page dispatch follows below).
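
An illustrative sketch of that per-page dispatch, assuming the kernel
workqueue API; struct copy_item, copy_item_fn() and copy_page_list_sketch()
are made-up names for illustration, not the patch's actual identifiers:

	struct copy_item {
		struct work_struct work;
		struct page *to;
		struct page *from;
		int nr_pages;	/* 1 for a base page, HPAGE_PMD_NR for a THP */
	};

	static void copy_item_fn(struct work_struct *work)
	{
		struct copy_item *item = container_of(work, struct copy_item,
							work);
		int i;

		/* one job copies one (possibly huge) page */
		for (i = 0; i < item->nr_pages; i++)
			copy_highpage(item->to + i, item->from + i);
	}

	static int copy_page_list_sketch(struct page **to, struct page **from,
				int nr_items, int *cpu_id_list, int cthreads)
	{
		struct copy_item *items;
		int i;

		items = kcalloc(nr_items, sizeof(*items), GFP_KERNEL);
		if (!items)
			return -ENOMEM;

		/* one job per page pair, spread round-robin over the
		 * CPUs of the destination node
		 */
		for (i = 0; i < nr_items; i++) {
			items[i].to = to[i];
			items[i].from = from[i];
			items[i].nr_pages = hpage_nr_pages(from[i]);
			INIT_WORK(&items[i].work, copy_item_fn);
			queue_work_on(cpu_id_list[i % cthreads],
					system_highpri_wq, &items[i].work);
		}

		for (i = 0; i < nr_items; i++)
			flush_work(&items[i].work);

		kfree(items);
		return 0;
	}

So one CPU can indeed end up with several page copy jobs, unlike
copy_pages_mthread() where the per-thread chunks of a single page map 1:1
onto the selected CPUs.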

- Anshuman

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-03-09 13:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 15:05 [RFC PATCH 00/14] Accelerating page migrations Zi Yan
2017-02-17 15:05 ` [RFC PATCH 01/14] mm/migrate: Add new mode parameter to migrate_page_copy() function Zi Yan
2017-02-17 15:05 ` [RFC PATCH 02/14] mm/migrate: Make migrate_mode types non-exclussive Zi Yan
2017-02-17 15:05 ` [RFC PATCH 03/14] mm/migrate: Add copy_pages_mthread function Zi Yan
2017-02-23  6:06   ` Naoya Horiguchi
2017-02-23  7:50     ` Anshuman Khandual
2017-02-23  8:02       ` Naoya Horiguchi
2017-03-09  5:35         ` Anshuman Khandual
2017-02-17 15:05 ` [RFC PATCH 04/14] mm/migrate: Add new migrate mode MIGRATE_MT Zi Yan
2017-02-23  6:54   ` Naoya Horiguchi
2017-02-23  7:54     ` Anshuman Khandual
2017-02-17 15:05 ` [RFC PATCH 05/14] mm/migrate: Add new migration flag MPOL_MF_MOVE_MT for syscalls Zi Yan
2017-02-17 15:05 ` [RFC PATCH 06/14] sysctl: Add global tunable mt_page_copy Zi Yan
2017-02-17 15:05 ` [RFC PATCH 07/14] migrate: Add copy_page_lists_mthread() function Zi Yan
2017-02-23  8:54   ` Naoya Horiguchi
2017-03-09 13:02     ` Anshuman Khandual
2017-02-17 15:05 ` [RFC PATCH 08/14] mm: migrate: Add concurrent page migration into move_pages syscall Zi Yan
2017-02-24  8:25   ` Naoya Horiguchi
2017-02-24 15:05     ` Zi Yan
2017-02-17 15:05 ` [RFC PATCH 09/14] mm: migrate: Add exchange_page_mthread() and exchange_page_lists_mthread() to exchange two pages or two page lists Zi Yan
2017-02-17 15:05 ` [RFC PATCH 10/14] mm: Add exchange_pages and exchange_pages_concur functions to exchange two lists of pages instead of two migrate_pages() Zi Yan
2017-02-17 15:05 ` [RFC PATCH 11/14] mm: migrate: Add exchange_pages syscall to exchange two page lists Zi Yan
2017-02-17 15:05 ` [RFC PATCH 12/14] migrate: Add copy_page_dma to use DMA Engine to copy pages Zi Yan
2017-02-17 15:05 ` [RFC PATCH 13/14] mm: migrate: Add copy_page_dma into migrate_page_copy Zi Yan
2017-02-17 15:05 ` [RFC PATCH 14/14] mm: Add copy_page_lists_dma_always to support copy a list of pages Zi Yan
