linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/15] HMM anonymous memory migration.
@ 2015-08-13 19:37 Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 01/15] fork: pass the dst vma to copy_page_range() and its sub-functions Jérôme Glisse
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Linda Wang, Kevin E Martin, Jeff Law,
	Or Gerlitz, Sagi Grimberg

Minor fixes since the last post (1); the series applies on top of rc6
because conflicts in infiniband are harder to solve than conflicts
with the mm tree.

Tree with the patchset:
git://people.freedesktop.org/~glisse/linux hmm-v10 branch

This part of the patchset implements anonymous memory migration.
It allows anonymous memory to be migrated seamlessly to device
memory, and migrated back to system memory whenever the CPU tries
to access it.

For the rationale behind HMM please refer to the core HMM patchset:

https://lkml.org/lkml/2015/8/13/623

Cheers,
Jérôme

To: "Andrew Morton" <akpm@linux-foundation.org>,
To: <linux-kernel@vger.kernel.org>,
To: linux-mm <linux-mm@kvack.org>,
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
Cc: "Mel Gorman" <mgorman@suse.de>,
Cc: "H. Peter Anvin" <hpa@zytor.com>,
Cc: "Peter Zijlstra" <peterz@infradead.org>,
Cc: "Linda Wang" <lwang@redhat.com>,
Cc: "Kevin E Martin" <kem@redhat.com>,
Cc: "Andrea Arcangeli" <aarcange@redhat.com>,
Cc: "Johannes Weiner" <jweiner@redhat.com>,
Cc: "Larry Woodman" <lwoodman@redhat.com>,
Cc: "Rik van Riel" <riel@redhat.com>,
Cc: "Dave Airlie" <airlied@redhat.com>,
Cc: "Jeff Law" <law@redhat.com>,
Cc: "Brendan Conoboy" <blc@redhat.com>,
Cc: "Joe Donohue" <jdonohue@redhat.com>,
Cc: "Christophe Harle" <charle@nvidia.com>,
Cc: "Duncan Poole" <dpoole@nvidia.com>,
Cc: "Sherry Cheung" <SCheung@nvidia.com>,
Cc: "Subhash Gutti" <sgutti@nvidia.com>,
Cc: "John Hubbard" <jhubbard@nvidia.com>,
Cc: "Mark Hairgrove" <mhairgrove@nvidia.com>,
Cc: "Lucien Dunning" <ldunning@nvidia.com>,
Cc: "Cameron Buschardt" <cabuschardt@nvidia.com>,
Cc: "Arvind Gopalakrishnan" <arvindg@nvidia.com>,
Cc: "Haggai Eran" <haggaie@mellanox.com>,
Cc: "Or Gerlitz" <ogerlitz@mellanox.com>,
Cc: "Sagi Grimberg" <sagig@mellanox.com>
Cc: "Shachar Raindel" <raindel@mellanox.com>,
Cc: "Liran Liss" <liranl@mellanox.com>,
Cc: "Roland Dreier" <roland@purestorage.com>,
Cc: "Sander, Ben" <ben.sander@amd.com>,
Cc: "Stoner, Greg" <Greg.Stoner@amd.com>,
Cc: "Bridgman, John" <John.Bridgman@amd.com>,
Cc: "Mantor, Michael" <Michael.Mantor@amd.com>,
Cc: "Blinzer, Paul" <Paul.Blinzer@amd.com>,
Cc: "Morichetti, Laurent" <Laurent.Morichetti@amd.com>,
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
Cc: "Leonid Shamis" <Leonid.Shamis@amd.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 01/15] fork: pass the dst vma to copy_page_range() and its sub-functions.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 02/15] HMM: add special swap filetype for memory migrated to device v2 Jérôme Glisse
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

For HMM we will need to resort to the old way of allocating a new page
for anonymous memory when that anonymous memory has been migrated to
device memory.

This does not impact any process that does not use HMM through some
device driver. Only processes that migrate anonymous memory to device
memory with HMM will have to copy the migrated pages on fork.

We do not expect this to be a common or advised thing to do, so we
resort to the simpler solution of allocating a new page. If this kind
of usage turns out to be important we will revisit ways to achieve
COW even for remote memory.
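
For illustration only, a minimal sketch of the new interface; it simply
restates the hunks below (the destination vma is threaded from dup_mmap()
down so a later patch can hand it to the HMM fork helper):

/*
 * Sketch: copy_page_range() now takes both vmas.
 */
int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		    struct vm_area_struct *dst_vma,	/* new: child side */
		    struct vm_area_struct *vma);	/* parent side */

/* dup_mmap() now calls: copy_page_range(mm, oldmm, tmp, mpnt); */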

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/mm.h |  5 +++--
 kernel/fork.c      |  2 +-
 mm/memory.c        | 33 +++++++++++++++++++++------------
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b5bf210..580fe65 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1124,8 +1124,9 @@ int walk_page_range(unsigned long addr, unsigned long end,
 int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk);
 void free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
-int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
-			struct vm_area_struct *vma);
+int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
+		    struct vm_area_struct *dst_vma,
+		    struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
diff --git a/kernel/fork.c b/kernel/fork.c
index bf2dcb6..2d32a4b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -497,7 +497,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 		rb_parent = &tmp->vm_rb;
 
 		mm->map_count++;
-		retval = copy_page_range(mm, oldmm, mpnt);
+		retval = copy_page_range(mm, oldmm, tmp, mpnt);
 
 		if (tmp->vm_ops && tmp->vm_ops->open)
 			tmp->vm_ops->open(tmp);
diff --git a/mm/memory.c b/mm/memory.c
index d784e35..71b5c35 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -885,8 +885,10 @@ out_set_pte:
 }
 
 static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		   pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
-		   unsigned long addr, unsigned long end)
+			  pmd_t *dst_pmd, pmd_t *src_pmd,
+			  struct vm_area_struct *dst_vma,
+			  struct vm_area_struct *vma,
+			  unsigned long addr, unsigned long end)
 {
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
@@ -947,9 +949,12 @@ again:
 	return 0;
 }
 
-static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end)
+static inline int copy_pmd_range(struct mm_struct *dst_mm,
+				 struct mm_struct *src_mm,
+				 pud_t *dst_pud, pud_t *src_pud,
+				 struct vm_area_struct *dst_vma,
+				 struct vm_area_struct *vma,
+				 unsigned long addr, unsigned long end)
 {
 	pmd_t *src_pmd, *dst_pmd;
 	unsigned long next;
@@ -974,15 +979,18 @@ static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src
 		if (pmd_none_or_clear_bad(src_pmd))
 			continue;
 		if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
-						vma, addr, next))
+				   dst_vma, vma, addr, next))
 			return -ENOMEM;
 	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
 	return 0;
 }
 
-static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
-		unsigned long addr, unsigned long end)
+static inline int copy_pud_range(struct mm_struct *dst_mm,
+				 struct mm_struct *src_mm,
+				 pgd_t *dst_pgd, pgd_t *src_pgd,
+				 struct vm_area_struct *dst_vma,
+				 struct vm_area_struct *vma,
+				 unsigned long addr, unsigned long end)
 {
 	pud_t *src_pud, *dst_pud;
 	unsigned long next;
@@ -996,14 +1004,15 @@ static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src
 		if (pud_none_or_clear_bad(src_pud))
 			continue;
 		if (copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
-						vma, addr, next))
+				   dst_vma, vma, addr, next))
 			return -ENOMEM;
 	} while (dst_pud++, src_pud++, addr = next, addr != end);
 	return 0;
 }
 
 int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		struct vm_area_struct *vma)
+		    struct vm_area_struct *dst_vma,
+		    struct vm_area_struct *vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
 	unsigned long next;
@@ -1057,7 +1066,7 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (pgd_none_or_clear_bad(src_pgd))
 			continue;
 		if (unlikely(copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd,
-					    vma, addr, next))) {
+					    dst_vma, vma, addr, next))) {
 			ret = -ENOMEM;
 			break;
 		}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/15] HMM: add special swap filetype for memory migrated to device v2.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 01/15] fork: pass the dst vma to copy_page_range() and its sub-functions Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 03/15] HMM: add new HMM page table flag (valid device memory) Jérôme Glisse
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jerome Glisse, Jatin Kumar

From: Jerome Glisse <jglisse@redhat.com>

When migrating anonymous memory from system memory to device memory,
CPU ptes are replaced with special HMM swap entries so that page fault,
get_user_pages() (gup), fork, ... are properly redirected to HMM helpers.

This patch only adds the new swap type entry and hooks the HMM helper
functions into the page fault and fork code paths.
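
As an illustration only (not part of the patch), the do_swap_page() path
is expected to dispatch such an entry roughly as follows, using only the
helpers added here:

#include <linux/hmm.h>
#include <linux/swapops.h>

/* Sketch: redirect a fault on a non-present pte to HMM if it is ours. */
static int fault_on_hmm_entry(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      pmd_t *pmd, unsigned long address,
			      unsigned flags, pte_t orig_pte)
{
	swp_entry_t entry = pte_to_swp_entry(orig_pte);

	if (!is_hmm_entry(entry))
		return VM_FAULT_SIGBUS;	/* not an HMM entry */

	/* The memory lives on the device: ask HMM to migrate it back. */
	return hmm_handle_cpu_fault(mm, vma, pmd, address, flags, orig_pte);
}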

Changed since v1:
  - Fix name of the HMM CPU page fault function.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
---
 include/linux/hmm.h     | 34 ++++++++++++++++++++++++++++++++++
 include/linux/swap.h    | 13 ++++++++++++-
 include/linux/swapops.h | 43 ++++++++++++++++++++++++++++++++++++++++++-
 mm/hmm.c                | 21 +++++++++++++++++++++
 mm/memory.c             | 22 ++++++++++++++++++++++
 5 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 4bc132a..7c66513 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -272,6 +272,40 @@ void hmm_mirror_range_dirty(struct hmm_mirror *mirror,
 			    unsigned long start,
 			    unsigned long end);
 
+int hmm_handle_cpu_fault(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			pmd_t *pmdp, unsigned long addr,
+			unsigned flags, pte_t orig_pte);
+
+int hmm_mm_fork(struct mm_struct *src_mm,
+		struct mm_struct *dst_mm,
+		struct vm_area_struct *dst_vma,
+		pmd_t *dst_pmd,
+		unsigned long start,
+		unsigned long end);
+
+#else /* CONFIG_HMM */
+
+static inline int hmm_handle_cpu_fault(struct mm_struct *mm,
+				       struct vm_area_struct *vma,
+				       pmd_t *pmdp, unsigned long addr,
+				       unsigned flags, pte_t orig_pte)
+{
+	return VM_FAULT_SIGBUS;
+}
+
+static inline int hmm_mm_fork(struct mm_struct *src_mm,
+			      struct mm_struct *dst_mm,
+			      struct vm_area_struct *dst_vma,
+			      pmd_t *dst_pmd,
+			      unsigned long start,
+			      unsigned long end)
+{
+	BUG();
+	return -ENOMEM;
+}
 
 #endif /* CONFIG_HMM */
+
+
 #endif
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3887472..f98053b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -70,8 +70,19 @@ static inline int current_is_kswapd(void)
 #define SWP_HWPOISON_NUM 0
 #endif
 
+/*
+ * HMM (heterogeneous memory management) used when data is in remote memory.
+ */
+#ifdef CONFIG_HMM
+#define SWP_HMM_NUM 1
+#define SWP_HMM		(MAX_SWAPFILES + SWP_MIGRATION_NUM + SWP_HWPOISON_NUM)
+#else
+#define SWP_HMM_NUM 0
+#endif
+
 #define MAX_SWAPFILES \
-	((1 << MAX_SWAPFILES_SHIFT) - SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
+	((1 << MAX_SWAPFILES_SHIFT) - SWP_MIGRATION_NUM - \
+	 SWP_HWPOISON_NUM - SWP_HMM_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index cedf3d3..934359f 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -190,7 +190,7 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
 }
 #endif
 
-#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION)
+#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || defined(CONFIG_HMM)
 static inline int non_swap_entry(swp_entry_t entry)
 {
 	return swp_type(entry) >= MAX_SWAPFILES;
@@ -202,4 +202,45 @@ static inline int non_swap_entry(swp_entry_t entry)
 }
 #endif
 
+#ifdef CONFIG_HMM
+static inline swp_entry_t make_hmm_entry(void)
+{
+	/* We do not store anything inside the CPU page table entry (pte). */
+	return swp_entry(SWP_HMM, 0);
+}
+
+static inline swp_entry_t make_hmm_entry_locked(void)
+{
+	/* We do not store anything inside the CPU page table entry (pte). */
+	return swp_entry(SWP_HMM, 1);
+}
+
+static inline swp_entry_t make_hmm_entry_poisonous(void)
+{
+	/* We do not store anything inside the CPU page table entry (pte). */
+	return swp_entry(SWP_HMM, 2);
+}
+
+static inline int is_hmm_entry(swp_entry_t entry)
+{
+	return (swp_type(entry) == SWP_HMM);
+}
+
+static inline int is_hmm_entry_locked(swp_entry_t entry)
+{
+	return (swp_type(entry) == SWP_HMM) && (swp_offset(entry) == 1);
+}
+
+static inline int is_hmm_entry_poisonous(swp_entry_t entry)
+{
+	return (swp_type(entry) == SWP_HMM) && (swp_offset(entry) == 2);
+}
+#else /* CONFIG_HMM */
+static inline int is_hmm_entry(swp_entry_t swp)
+{
+	return 0;
+}
+#endif /* CONFIG_HMM */
+
+
 #endif /* _LINUX_SWAPOPS_H */
diff --git a/mm/hmm.c b/mm/hmm.c
index e5d5f29..d44c54f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -416,6 +416,27 @@ static struct mmu_notifier_ops hmm_notifier_ops = {
 };
 
 
+int hmm_handle_cpu_fault(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			pmd_t *pmdp, unsigned long addr,
+			unsigned flags, pte_t orig_pte)
+{
+	return VM_FAULT_SIGBUS;
+}
+EXPORT_SYMBOL(hmm_handle_cpu_fault);
+
+int hmm_mm_fork(struct mm_struct *src_mm,
+		struct mm_struct *dst_mm,
+		struct vm_area_struct *dst_vma,
+		pmd_t *dst_pmd,
+		unsigned long start,
+		unsigned long end)
+{
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(hmm_mm_fork);
+
+
 struct mm_pt_iter {
 	struct mm_struct	*mm;
 	pte_t			*ptep;
diff --git a/mm/memory.c b/mm/memory.c
index 71b5c35..33994a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -53,6 +53,7 @@
 #include <linux/writeback.h>
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
+#include <linux/hmm.h>
 #include <linux/kallsyms.h>
 #include <linux/swapops.h>
 #include <linux/elf.h>
@@ -893,9 +894,11 @@ static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	pte_t *orig_src_pte, *orig_dst_pte;
 	pte_t *src_pte, *dst_pte;
 	spinlock_t *src_ptl, *dst_ptl;
+	unsigned cnt_hmm_entry = 0;
 	int progress = 0;
 	int rss[NR_MM_COUNTERS];
 	swp_entry_t entry = (swp_entry_t){0};
+	unsigned long start;
 
 again:
 	init_rss_vec(rss);
@@ -909,6 +912,7 @@ again:
 	orig_src_pte = src_pte;
 	orig_dst_pte = dst_pte;
 	arch_enter_lazy_mmu_mode();
+	start = addr;
 
 	do {
 		/*
@@ -925,6 +929,12 @@ again:
 			progress++;
 			continue;
 		}
+		if (unlikely(!pte_present(*src_pte))) {
+			entry = pte_to_swp_entry(*src_pte);
+
+			if (is_hmm_entry(entry))
+				cnt_hmm_entry++;
+		}
 		entry.val = copy_one_pte(dst_mm, src_mm, dst_pte, src_pte,
 							vma, addr, rss);
 		if (entry.val)
@@ -939,6 +949,15 @@ again:
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
 	cond_resched();
 
+	if (cnt_hmm_entry) {
+		int ret;
+
+		ret = hmm_mm_fork(src_mm, dst_mm, dst_vma,
+				  dst_pmd, start, end);
+		if (ret)
+			return ret;
+	}
+
 	if (entry.val) {
 		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0)
 			return -ENOMEM;
@@ -2488,6 +2507,9 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			migration_entry_wait(mm, pmd, address);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_hmm_entry(entry)) {
+			ret = hmm_handle_cpu_fault(mm, vma, pmd, address,
+						   flags, orig_pte);
 		} else {
 			print_bad_pte(vma, address, orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 03/15] HMM: add new HMM page table flag (valid device memory).
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 01/15] fork: pass the dst vma to copy_page_range() and its sub-functions Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 02/15] HMM: add special swap filetype for memory migrated to device v2 Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 04/15] HMM: add new HMM page table flag (select flag) Jérôme Glisse
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse, Jatin Kumar

For memory migrated to the device we need a new type of memory entry.
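
For illustration only, a minimal sketch of how such an entry is built and
read back with the helpers added below:

#include <linux/hmm_pt.h>

/* Sketch: encode a device address as a "valid device memory" HMM pte. */
static dma_addr_t encode_dev_entry(dma_addr_t dev_addr)
{
	dma_addr_t pte = hmm_pte_from_dev_addr(dev_addr);

	/* The new bit distinguishes device memory from DMA/pfn entries. */
	BUG_ON(!hmm_pte_test_valid_dev(&pte));

	/* Returns the device address, or (dma_addr_t)-1UL if not valid. */
	return hmm_pte_dev_addr(pte);
}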

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
---
 include/linux/hmm_pt.h | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
index 8a59a75..b017aa7 100644
--- a/include/linux/hmm_pt.h
+++ b/include/linux/hmm_pt.h
@@ -74,10 +74,11 @@ static inline unsigned long hmm_pde_pfn(dma_addr_t pde)
  * In the first case the device driver must ignore any pfn entry as they might
  * show as transient state while HMM is mapping the page.
  */
-#define HMM_PTE_VALID_DMA_BIT	0
-#define HMM_PTE_VALID_PFN_BIT	1
-#define HMM_PTE_WRITE_BIT	2
-#define HMM_PTE_DIRTY_BIT	3
+#define HMM_PTE_VALID_DEV_BIT	0
+#define HMM_PTE_VALID_DMA_BIT	1
+#define HMM_PTE_VALID_PFN_BIT	2
+#define HMM_PTE_WRITE_BIT	3
+#define HMM_PTE_DIRTY_BIT	4
 /*
  * Reserve some bits for device driver private flags. Note that thus can only
  * be manipulated using the hmm_pte_*_bit() sets of helpers.
@@ -85,7 +86,7 @@ static inline unsigned long hmm_pde_pfn(dma_addr_t pde)
  * WARNING ONLY SET/CLEAR THOSE FLAG ON PTE ENTRY THAT HAVE THE VALID BIT SET
  * AS OTHERWISE ANY BIT SET BY THE DRIVER WILL BE OVERWRITTEN BY HMM.
  */
-#define HMM_PTE_HW_SHIFT	4
+#define HMM_PTE_HW_SHIFT	8
 
 #define HMM_PTE_PFN_MASK	(~((dma_addr_t)((1 << PAGE_SHIFT) - 1)))
 #define HMM_PTE_DMA_MASK	(~((dma_addr_t)((1 << PAGE_SHIFT) - 1)))
@@ -166,6 +167,7 @@ static inline bool hmm_pte_test_and_set_bit(dma_addr_t *ptep,
 	HMM_PTE_TEST_AND_CLEAR_BIT(name, bit)\
 	HMM_PTE_TEST_AND_SET_BIT(name, bit)
 
+HMM_PTE_BIT_HELPER(valid_dev, HMM_PTE_VALID_DEV_BIT)
 HMM_PTE_BIT_HELPER(valid_dma, HMM_PTE_VALID_DMA_BIT)
 HMM_PTE_BIT_HELPER(valid_pfn, HMM_PTE_VALID_PFN_BIT)
 HMM_PTE_BIT_HELPER(dirty, HMM_PTE_DIRTY_BIT)
@@ -176,11 +178,23 @@ static inline dma_addr_t hmm_pte_from_pfn(dma_addr_t pfn)
 	return (pfn << PAGE_SHIFT) | (1 << HMM_PTE_VALID_PFN_BIT);
 }
 
+static inline dma_addr_t hmm_pte_from_dev_addr(dma_addr_t dma_addr)
+{
+	return (dma_addr & HMM_PTE_DMA_MASK) | (1 << HMM_PTE_VALID_DEV_BIT);
+}
+
 static inline dma_addr_t hmm_pte_from_dma_addr(dma_addr_t dma_addr)
 {
 	return (dma_addr & HMM_PTE_DMA_MASK) | (1 << HMM_PTE_VALID_DMA_BIT);
 }
 
+static inline dma_addr_t hmm_pte_dev_addr(dma_addr_t pte)
+{
+	/* FIXME Use max dma addr instead of 0 ? */
+	return hmm_pte_test_valid_dev(&pte) ? (pte & HMM_PTE_DMA_MASK) :
+					      (dma_addr_t)-1UL;
+}
+
 static inline dma_addr_t hmm_pte_dma_addr(dma_addr_t pte)
 {
 	/* FIXME Use max dma addr instead of 0 ? */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 04/15] HMM: add new HMM page table flag (select flag).
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (2 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 03/15] HMM: add new HMM page table flag (valid device memory) Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 05/15] HMM: handle HMM device page table entry on mirror page table fault and update Jérôme Glisse
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

When migrating memory, the same array of HMM page table entries might be
used with several different devices. Add a new select flag so the current
device driver callback can know which entries are selected for its device.
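
For illustration only, a driver callback walking a shared hmm_pte array is
expected to skip entries that are not selected for it, roughly as below
(the loop is hypothetical driver code):

#include <linux/hmm_pt.h>

/* Sketch: only act on entries selected for this device. */
static void foo_handle_range(dma_addr_t *hmm_pte, unsigned long npages)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (!hmm_pte_test_select(&hmm_pte[i]))
			continue;	/* entry belongs to another device */
		/* ... program this device's page table / DMA engine ... */
	}
}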

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/hmm_pt.h | 6 ++++--
 mm/hmm.c               | 5 ++++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
index b017aa7..f745d6c 100644
--- a/include/linux/hmm_pt.h
+++ b/include/linux/hmm_pt.h
@@ -77,8 +77,9 @@ static inline unsigned long hmm_pde_pfn(dma_addr_t pde)
 #define HMM_PTE_VALID_DEV_BIT	0
 #define HMM_PTE_VALID_DMA_BIT	1
 #define HMM_PTE_VALID_PFN_BIT	2
-#define HMM_PTE_WRITE_BIT	3
-#define HMM_PTE_DIRTY_BIT	4
+#define HMM_PTE_SELECT		3
+#define HMM_PTE_WRITE_BIT	4
+#define HMM_PTE_DIRTY_BIT	5
 /*
  * Reserve some bits for device driver private flags. Note that thus can only
  * be manipulated using the hmm_pte_*_bit() sets of helpers.
@@ -170,6 +171,7 @@ static inline bool hmm_pte_test_and_set_bit(dma_addr_t *ptep,
 HMM_PTE_BIT_HELPER(valid_dev, HMM_PTE_VALID_DEV_BIT)
 HMM_PTE_BIT_HELPER(valid_dma, HMM_PTE_VALID_DMA_BIT)
 HMM_PTE_BIT_HELPER(valid_pfn, HMM_PTE_VALID_PFN_BIT)
+HMM_PTE_BIT_HELPER(select, HMM_PTE_SELECT)
 HMM_PTE_BIT_HELPER(dirty, HMM_PTE_DIRTY_BIT)
 HMM_PTE_BIT_HELPER(write, HMM_PTE_WRITE_BIT)
 
diff --git a/mm/hmm.c b/mm/hmm.c
index d44c54f..08c0160 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -743,6 +743,7 @@ static int hmm_mirror_fault_hpmd(struct hmm_mirror *mirror,
 			BUG_ON(hmm_pte_pfn(hmm_pte[i]) != pfn);
 			if (pmd_write(*pmdp))
 				hmm_pte_set_write(&hmm_pte[i]);
+			hmm_pte_set_select(&hmm_pte[i]);
 		} while (addr += PAGE_SIZE, pfn++, i++, addr != next);
 		hmm_pt_iter_directory_unlock(iter);
 		mirror_fault->addr = addr;
@@ -811,6 +812,7 @@ static int hmm_mirror_fault_pmd(pmd_t *pmdp,
 			BUG_ON(hmm_pte_pfn(hmm_pte[i]) != pte_pfn(*ptep));
 			if (pte_write(*ptep))
 				hmm_pte_set_write(&hmm_pte[i]);
+			hmm_pte_set_select(&hmm_pte[i]);
 		} while (addr += PAGE_SIZE, ptep++, i++, addr != next);
 		hmm_pt_iter_directory_unlock(iter);
 		pte_unmap(ptep - 1);
@@ -843,7 +845,8 @@ static int hmm_mirror_dma_map(struct hmm_mirror *mirror,
 
 again:
 			pte = ACCESS_ONCE(hmm_pte[i]);
-			if (!hmm_pte_test_valid_pfn(&pte)) {
+			if (!hmm_pte_test_valid_pfn(&pte) ||
+			    !hmm_pte_test_select(&pte)) {
 				if (!hmm_pte_test_valid_dma(&pte)) {
 					ret = -ENOENT;
 					break;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/15] HMM: handle HMM device page table entry on mirror page table fault and update.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (3 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 04/15] HMM: add new HMM page table flag (select flag) Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 06/15] HMM: mm add helper to update page table when migrating memory back v2 Jérôme Glisse
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

When faulting or updating the device page table, properly handle the case
of a device memory entry.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 mm/hmm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/hmm.c b/mm/hmm.c
index 08c0160..8b1003a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -607,6 +607,13 @@ static void hmm_mirror_update_pte(struct hmm_mirror *mirror,
 		goto out;
 	}
 
+	if (hmm_pte_test_valid_dev(hmm_pte)) {
+		*hmm_pte &= event->pte_mask;
+		if (!hmm_pte_test_valid_dev(hmm_pte))
+			hmm_pt_iter_directory_unref(iter);
+		return;
+	}
+
 	if (!hmm_pte_test_valid_dma(hmm_pte))
 		return;
 
@@ -795,6 +802,12 @@ static int hmm_mirror_fault_pmd(pmd_t *pmdp,
 		ptep = pte_offset_map(pmdp, start);
 		hmm_pt_iter_directory_lock(iter);
 		do {
+			if (hmm_pte_test_valid_dev(&hmm_pte[i])) {
+				if (write)
+					hmm_pte_set_write(&hmm_pte[i]);
+				continue;
+			}
+
 			if (!pte_present(*ptep) ||
 			    (write && !pte_write(*ptep))) {
 				ret = -ENOENT;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/15] HMM: mm add helper to update page table when migrating memory back v2.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (4 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 05/15] HMM: handle HMM device page table entry on mirror page table fault and update Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 07/15] HMM: mm add helper to update page table when migrating memory v2 Jérôme Glisse
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

To migrate memory back we first need to lock the special HMM CPU page
table entries, so we know no one else might try to migrate those entries
back. The helper also allocates the new pages into which data will be
copied back from the device. Then we can proceed with the device DMA
operation.

Once the DMA is done we can update the CPU page table again to point to
the new pages that hold the content copied back from device memory.

Note that we do not need to invalidate the range as we are only
modifying non-present CPU page table entries.
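
Put together, the expected calling sequence on the HMM side looks roughly
like the sketch below (illustration only; step 2 is driver specific and
not shown):

static int migrate_range_back(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      pte_t *new_pte, dma_addr_t *hmm_pte,
			      unsigned long start, unsigned long end)
{
	int ret;

	/* 1. Lock the HMM entries and allocate the destination pages. */
	ret = mm_hmm_migrate_back(mm, vma, new_pte, start, end);
	if (ret)
		return ret;

	/*
	 * 2. The device driver schedules its DMA here, copying device
	 *    memory into the pages referenced by new_pte and marking the
	 *    entries it copied as valid in hmm_pte.
	 */

	/* 3. Point the CPU page table at the new pages, poison on failure. */
	mm_hmm_migrate_back_cleanup(mm, vma, new_pte, hmm_pte, start, end);
	return 0;
}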

Changed since v1:
  - Save the memcg against which each page is precharged, as it might
    change along the way.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/mm.h |  12 +++
 mm/memory.c        | 257 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 269 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 580fe65..eb1e9b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2249,6 +2249,18 @@ static inline void hmm_mm_init(struct mm_struct *mm)
 {
 	mm->hmm = NULL;
 }
+
+int mm_hmm_migrate_back(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			pte_t *new_pte,
+			unsigned long start,
+			unsigned long end);
+void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
+				 struct vm_area_struct *vma,
+				 pte_t *new_pte,
+				 dma_addr_t *hmm_pte,
+				 unsigned long start,
+				 unsigned long end);
 #else /* !CONFIG_HMM */
 static inline void hmm_mm_init(struct mm_struct *mm)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 33994a7..b2b3677 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3469,6 +3469,263 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
 
+
+#ifdef CONFIG_HMM
+/* mm_hmm_migrate_back() - lock HMM CPU page table entry and allocate new page.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @new_pte: Array of new CPU page table entry value.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ *
+ * This function will lock HMM page table entry and allocate new page for entry
+ * it successfully locked.
+ */
+int mm_hmm_migrate_back(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			pte_t *new_pte,
+			unsigned long start,
+			unsigned long end)
+{
+	pte_t hmm_entry = swp_entry_to_pte(make_hmm_entry_locked());
+	unsigned long addr, i;
+	int ret = 0;
+
+	VM_BUG_ON(vma->vm_ops || (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
+
+	if (unlikely(anon_vma_prepare(vma)))
+		return -ENOMEM;
+
+	start &= PAGE_MASK;
+	end = PAGE_ALIGN(end);
+	memset(new_pte, 0, sizeof(pte_t) * ((end - start) >> PAGE_SHIFT));
+
+	for (addr = start; addr < end;) {
+		unsigned long cstart, next;
+		spinlock_t *ptl;
+		pgd_t *pgdp;
+		pud_t *pudp;
+		pmd_t *pmdp;
+		pte_t *ptep;
+
+		pgdp = pgd_offset(mm, addr);
+		pudp = pud_offset(pgdp, addr);
+		/*
+		 * Some other thread might already have migrated back the entry
+		 * and freed the page table. Unlikely thought.
+		 */
+		if (unlikely(!pudp)) {
+			addr = min((addr + PUD_SIZE) & PUD_MASK, end);
+			continue;
+		}
+		pmdp = pmd_offset(pudp, addr);
+		if (unlikely(!pmdp || pmd_bad(*pmdp) || pmd_none(*pmdp) ||
+			     pmd_trans_huge(*pmdp))) {
+			addr = min((addr + PMD_SIZE) & PMD_MASK, end);
+			continue;
+		}
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (cstart = addr, i = (addr - start) >> PAGE_SHIFT,
+		     next = min((addr + PMD_SIZE) & PMD_MASK, end);
+		     addr < next; addr += PAGE_SIZE, ptep++, i++) {
+			swp_entry_t entry;
+
+			entry = pte_to_swp_entry(*ptep);
+			if (pte_none(*ptep) || pte_present(*ptep) ||
+			    !is_hmm_entry(entry) ||
+			    is_hmm_entry_locked(entry))
+				continue;
+
+			set_pte_at(mm, addr, ptep, hmm_entry);
+			new_pte[i] = pte_mkspecial(pfn_pte(my_zero_pfn(addr),
+						   vma->vm_page_prot));
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+
+		for (addr = cstart, i = (addr - start) >> PAGE_SHIFT;
+		     addr < next; addr += PAGE_SIZE, i++) {
+			struct mem_cgroup *memcg;
+			struct page *page;
+
+			if (!pte_present(new_pte[i]))
+				continue;
+
+			page = alloc_zeroed_user_highpage_movable(vma, addr);
+			if (!page) {
+				ret = -ENOMEM;
+				break;
+			}
+			__SetPageUptodate(page);
+			if (mem_cgroup_try_charge(page, mm, GFP_KERNEL,
+						  &memcg)) {
+				page_cache_release(page);
+				ret = -ENOMEM;
+				break;
+			}
+			/*
+			 * We can safely reuse the s_mem/mapping field of page
+			 * struct to store the memcg as the page is only seen
+			 * by HMM at this point and we can clear it before it
+			 * is public see mm_hmm_migrate_back_cleanup().
+			 */
+			page->s_mem = memcg;
+			new_pte[i] = mk_pte(page, vma->vm_page_prot);
+			if (vma->vm_flags & VM_WRITE) {
+				new_pte[i] = pte_mkdirty(new_pte[i]);
+				new_pte[i] = pte_mkwrite(new_pte[i]);
+			}
+		}
+
+		if (!ret)
+			continue;
+
+		hmm_entry = swp_entry_to_pte(make_hmm_entry());
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (addr = cstart, i = (addr - start) >> PAGE_SHIFT;
+		     addr < next; addr += PAGE_SIZE, ptep++, i++) {
+			unsigned long pfn = pte_pfn(new_pte[i]);
+
+			if (!pte_present(new_pte[i]) || !is_zero_pfn(pfn))
+				continue;
+
+			set_pte_at(mm, addr, ptep, hmm_entry);
+			pte_clear(mm, addr, &new_pte[i]);
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(mm_hmm_migrate_back);
+
+/* mm_hmm_migrate_back_cleanup() - set CPU page table entry to new page.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @new_pte: Array of new CPU page table entry value.
+ * @hmm_pte: Array of HMM table entry indicating if migration was successful.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ *
+ * This is call after mm_hmm_migrate_back() and after effective migration. It
+ * will set CPU page table entry to new value pointing to newly allocated page
+ * where the data was effectively copied back from device memory.
+ *
+ * Any failure will trigger a bug on.
+ *
+ * TODO: For copy failure we might simply set a new value for the HMM special
+ * entry indicating poisonous entry.
+ */
+void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
+				 struct vm_area_struct *vma,
+				 pte_t *new_pte,
+				 dma_addr_t *hmm_pte,
+				 unsigned long start,
+				 unsigned long end)
+{
+	pte_t hmm_poison = swp_entry_to_pte(make_hmm_entry_poisonous());
+	unsigned long addr, i;
+
+	for (addr = start; addr < end;) {
+		unsigned long cstart, next, free_pages;
+		spinlock_t *ptl;
+		pgd_t *pgdp;
+		pud_t *pudp;
+		pmd_t *pmdp;
+		pte_t *ptep;
+
+		/*
+		 * We know for certain that we did set special swap entry for
+		 * the range and HMM entry are mark as locked so it means that
+		 * no one beside us can modify them which apply that all level
+		 * of the CPU page table are valid.
+		 */
+		pgdp = pgd_offset(mm, addr);
+		pudp = pud_offset(pgdp, addr);
+		VM_BUG_ON(!pudp);
+		pmdp = pmd_offset(pudp, addr);
+		VM_BUG_ON(!pmdp || pmd_bad(*pmdp) || pmd_none(*pmdp) ||
+			  pmd_trans_huge(*pmdp));
+
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (next = min((addr + PMD_SIZE) & PMD_MASK, end),
+		     cstart = addr, i = (addr - start) >> PAGE_SHIFT,
+		     free_pages = 0; addr < next; addr += PAGE_SIZE,
+		     ptep++, i++) {
+			struct mem_cgroup *memcg;
+			swp_entry_t entry;
+			struct page *page;
+
+			if (!pte_present(new_pte[i]))
+				continue;
+
+			entry = pte_to_swp_entry(*ptep);
+
+			/*
+			 * Sanity catch all the things that could go wrong but
+			 * should not, no plan B here.
+			 */
+			VM_BUG_ON(pte_none(*ptep));
+			VM_BUG_ON(pte_present(*ptep));
+			VM_BUG_ON(!is_hmm_entry_locked(entry));
+
+			if (!hmm_pte_test_valid_dma(&hmm_pte[i]) &&
+			    !hmm_pte_test_valid_pfn(&hmm_pte[i])) {
+				set_pte_at(mm, addr, ptep, hmm_poison);
+				free_pages++;
+				continue;
+			}
+
+			page = pte_page(new_pte[i]);
+
+			/*
+			 * Up to now the s_mem/mapping field stored the memcg
+			 * against which the page was pre-charged. Save it and
+			 * clear field so PageAnon() return false.
+			 */
+			memcg = page->s_mem;
+			page->s_mem = NULL;
+
+			inc_mm_counter_fast(mm, MM_ANONPAGES);
+			page_add_new_anon_rmap(page, vma, addr);
+			mem_cgroup_commit_charge(page, memcg, false);
+			lru_cache_add_active_or_unevictable(page, vma);
+			set_pte_at(mm, addr, ptep, new_pte[i]);
+			update_mmu_cache(vma, addr, ptep);
+			pte_clear(mm, addr, &new_pte[i]);
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+
+		if (!free_pages)
+			continue;
+
+		for (addr = cstart, i = (addr - start) >> PAGE_SHIFT;
+		     addr < next; addr += PAGE_SIZE, i++) {
+			struct mem_cgroup *memcg;
+			struct page *page;
+
+			if (!pte_present(new_pte[i]))
+				continue;
+
+			page = pte_page(new_pte[i]);
+
+			/*
+			 * Up to now the s_mem/mapping field stored the memcg
+			 * against which the page was pre-charged.
+			 */
+			memcg = page->s_mem;
+			page->s_mem = NULL;
+
+			mem_cgroup_cancel_charge(page, memcg);
+			page_cache_release(page);
+		}
+	}
+}
+EXPORT_SYMBOL(mm_hmm_migrate_back_cleanup);
+#endif
+
+
 #ifndef __PAGETABLE_PUD_FOLDED
 /*
  * Allocate page upper directory.
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/15] HMM: mm add helper to update page table when migrating memory v2.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (5 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 06/15] HMM: mm add helper to update page table when migrating memory back v2 Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 08/15] HMM: new callback for copying memory from and to device " Jérôme Glisse
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

To migrate memory to remote memory we need to unmap a range of
anonymous memory from the CPU page table and replace the page table
entries with a special HMM entry.

This is a multi-stage process. First we save and replace the page table
entries with the special HMM entry, flushing the TLB in the process. If
we run into a non-allocated entry we either use the zero page or
allocate a new page. For swapped entries we try to swap them in.

Once we have set the page table entries to the special entry, we check
the page backing each address to make sure that only page table
mappings hold a reference on it, which means we can safely migrate the
page to device memory. Because the CPU page table entries are special
entries, no get_user_pages() can reference the page any longer, so we
are safe from races on that front. Note that the page can still be
referenced by get_user_pages() from another process, but in that case
the page is write protected, and as we drop neither the mapcount nor
the page count we know that all users of get_user_pages() are only
doing read-only accesses (on a write access they would allocate a new
page).

Once we have identified all the pages that are safe to migrate, the
first function returns and lets HMM schedule the migration with the
device driver.

Finally there is a cleanup function that will drop the mapcount and
reference count on all pages that have been successfully migrated, or
restore the page table entries otherwise.
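
The intended usage is symmetric to the migrate-back helpers; a rough
sketch (illustration only; step 2 is driver specific and not shown):

static int migrate_range_to_device(struct mm_struct *mm,
				   struct vm_area_struct *vma,
				   pte_t *save_pte, dma_addr_t *hmm_pte,
				   unsigned long start, unsigned long end)
{
	int ret;

	/*
	 * 1. Replace CPU ptes with locked HMM entries; only pages with no
	 *    extra references remain candidates for migration.
	 */
	ret = mm_hmm_migrate(mm, vma, save_pte, NULL /* backoff */,
			     NULL /* mmu_notifier to exclude */, start, end);
	if (ret)
		return ret;

	/*
	 * 2. The device driver copies the pages to device memory, filling
	 *    hmm_pte with valid device entries for pages that made it.
	 */

	/* 3. Free the migrated pages, or restore the saved ptes otherwise. */
	mm_hmm_migrate_cleanup(mm, vma, save_pte, hmm_pte, start, end);
	return 0;
}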

Changed since v1:
  - Fix pmd/pte allocation when migrating.
  - Fix reverse logic on mm_forbids_zeropage().
  - Add comment on why we add new pages to the lru list.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/mm.h |  14 ++
 mm/memory.c        | 471 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 485 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index eb1e9b2..0a6a292 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2250,6 +2250,20 @@ static inline void hmm_mm_init(struct mm_struct *mm)
 	mm->hmm = NULL;
 }
 
+int mm_hmm_migrate(struct mm_struct *mm,
+		   struct vm_area_struct *vma,
+		   pte_t *save_pte,
+		   bool *backoff,
+		   const void *mmu_notifier_exclude,
+		   unsigned long start,
+		   unsigned long end);
+void mm_hmm_migrate_cleanup(struct mm_struct *mm,
+			    struct vm_area_struct *vma,
+			    pte_t *save_pte,
+			    dma_addr_t *hmm_pte,
+			    unsigned long start,
+			    unsigned long end);
+
 int mm_hmm_migrate_back(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			pte_t *new_pte,
diff --git a/mm/memory.c b/mm/memory.c
index b2b3677..d34e610 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -54,6 +54,7 @@
 #include <linux/memcontrol.h>
 #include <linux/mmu_notifier.h>
 #include <linux/hmm.h>
+#include <linux/hmm_pt.h>
 #include <linux/kallsyms.h>
 #include <linux/swapops.h>
 #include <linux/elf.h>
@@ -3723,6 +3724,476 @@ void mm_hmm_migrate_back_cleanup(struct mm_struct *mm,
 	}
 }
 EXPORT_SYMBOL(mm_hmm_migrate_back_cleanup);
+
+/* mm_hmm_migrate() - unmap range and set special HMM pte for it.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @save_pte: array where to save current CPU page table entry value.
+ * @backoff: Pointer toward a boolean indicating that we need to stop.
+ * @exclude: The mmu_notifier listener to exclude from mmu_notifier callback.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ * Returns: 0 on success, -EINVAL if some argument where invalid, -ENOMEM if
+ * it failed allocating memory for performing the operation, -EFAULT if some
+ * memory backing the range is in bad state, -EAGAIN if backoff flag turned
+ * to true.
+ *
+ * The process of memory migration is bit involve, first we must set all CPU
+ * page table entry to the special HMM locked entry ensuring us exclusive
+ * control over the page table entry (ie no other process can change the page
+ * table but us).
+ *
+ * While doing that we must handle empty and swaped entry. For empty entry we
+ * either use the zero page or allocate a new page. For swap entry we call
+ * __handle_mm_fault() to try to faultin the page (swap entry can be a number
+ * of thing).
+ *
+ * Once we have unmapped we need to check that we can effectively migrate the
+ * page, by testing that no one is holding a reference on the page beside the
+ * reference taken by each page mapping.
+ *
+ * On success every valid entry inside save_pte array is an entry that can be
+ * migrated.
+ *
+ * Note that this function does not free any of the page, nor does it updates
+ * the various memcg counter (exception being for accounting new allocation).
+ * This happen inside the mm_hmm_migrate_cleanup() function.
+ *
+ */
+int mm_hmm_migrate(struct mm_struct *mm,
+		   struct vm_area_struct *vma,
+		   pte_t *save_pte,
+		   bool *backoff,
+		   const void *mmu_notifier_exclude,
+		   unsigned long start,
+		   unsigned long end)
+{
+	pte_t hmm_entry = swp_entry_to_pte(make_hmm_entry_locked());
+	struct mmu_notifier_range range = {
+		.start = start,
+		.end = end,
+		.event = MMU_MIGRATE,
+	};
+	unsigned long addr = start, i;
+	struct mmu_gather tlb;
+	int ret = 0;
+
+	/* Only allow anonymous mapping and sanity check arguments. */
+	if (vma->vm_ops || unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)))
+		return -EINVAL;
+	start &= PAGE_MASK;
+	end = PAGE_ALIGN(end);
+	if (start >= end || end > vma->vm_end)
+		return -EINVAL;
+
+	/* Only need to test on the last address of the range. */
+	if (check_stack_guard_page(vma, end) < 0)
+		return -EFAULT;
+
+	/* Try to fail early on. */
+	if (unlikely(anon_vma_prepare(vma)))
+		return -ENOMEM;
+
+retry:
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, range.start, range.end);
+	update_hiwater_rss(mm);
+	mmu_notifier_invalidate_range_start_excluding(mm, &range,
+						      mmu_notifier_exclude);
+	tlb_start_vma(&tlb, vma);
+	for (addr = range.start, i = 0; addr < end && !ret;) {
+		unsigned long cstart, next, npages = 0;
+		spinlock_t *ptl;
+		pgd_t *pgdp;
+		pud_t *pudp;
+		pmd_t *pmdp;
+		pte_t *ptep;
+
+		/*
+		 * Pretty much the exact same logic as __handle_mm_fault(),
+		 * exception being the handling of huge pmd.
+		 */
+		pgdp = pgd_offset(mm, addr);
+		pudp = pud_alloc(mm, pgdp, addr);
+		if (!pudp) {
+			ret = -ENOMEM;
+			break;
+		}
+		pmdp = pmd_alloc(mm, pudp, addr);
+		if (!pmdp) {
+			ret = -ENOMEM;
+			break;
+		}
+		if (unlikely(pmd_trans_splitting(*pmdp))) {
+			wait_split_huge_page(vma->anon_vma, pmdp);
+			ret = -EAGAIN;
+			break;
+		}
+		if (unlikely(pmd_none(*pmdp)) &&
+		    unlikely(__pte_alloc(mm, vma, pmdp, addr))) {
+			ret = -ENOMEM;
+			break;
+		}
+		/*
+		 * If an huge pmd materialized from under us split it and break
+		 * out of the loop to retry.
+		 */
+		if (unlikely(pmd_trans_huge(*pmdp))) {
+			split_huge_page_pmd(vma, addr, pmdp);
+			ret = -EAGAIN;
+			break;
+		}
+
+		/*
+		 * A regular pmd is established and it can't morph into a huge
+		 * pmd from under us anymore at this point because we hold the
+		 * mmap_sem read mode and khugepaged takes it in write mode. So
+		 * now it's safe to run pte_offset_map().
+		 */
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (i = (addr - start) >> PAGE_SHIFT, cstart = addr,
+		     next = min((addr + PMD_SIZE) & PMD_MASK, end);
+		     addr < next; addr += PAGE_SIZE, ptep++, i++) {
+			save_pte[i] = ptep_get_and_clear(mm, addr, ptep);
+			tlb_remove_tlb_entry(&tlb, ptep, addr);
+			set_pte_at(mm, addr, ptep, hmm_entry);
+
+			if (pte_present(save_pte[i]))
+				continue;
+
+			if (!pte_none(save_pte[i])) {
+				set_pte_at(mm, addr, ptep, save_pte[i]);
+				ret = -ENOENT;
+				ptep++;
+				break;
+			}
+			/*
+			 * TODO: This mm_forbids_zeropage() really does not
+			 * apply to us. First it seems only S390 have it set,
+			 * second we are not even using the zero page entry
+			 * to populate the CPU page table, thought on error
+			 * we might use the save_pte entry to set the CPU
+			 * page table entry.
+			 *
+			 * Live with that oddity for now.
+			 */
+			if (mm_forbids_zeropage(mm)) {
+				pte_clear(mm, addr, &save_pte[i]);
+				npages++;
+				continue;
+			}
+			save_pte[i] = pte_mkspecial(pfn_pte(my_zero_pfn(addr),
+						    vma->vm_page_prot));
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+
+		/*
+		 * So we must allocate pages before checking for error, which
+		 * here indicate that one entry is a swap entry. We need to
+		 * allocate first because otherwise there is no easy way to
+		 * know on retry or in error code path wether the CPU page
+		 * table locked HMM entry is ours or from some other thread.
+		 */
+
+		if (!npages)
+			continue;
+
+		for (next = addr, addr = cstart,
+		     i = (addr - start) >> PAGE_SHIFT;
+		     addr < next; addr += PAGE_SIZE, i++) {
+			struct mem_cgroup *memcg;
+			struct page *page;
+
+			if (pte_present(save_pte[i]) || !pte_none(save_pte[i]))
+				continue;
+
+			page = alloc_zeroed_user_highpage_movable(vma, addr);
+			if (!page) {
+				ret = -ENOMEM;
+				break;
+			}
+			__SetPageUptodate(page);
+			if (mem_cgroup_try_charge(page, mm, GFP_KERNEL,
+						  &memcg)) {
+				page_cache_release(page);
+				ret = -ENOMEM;
+				break;
+			}
+			save_pte[i] = mk_pte(page, vma->vm_page_prot);
+			if (vma->vm_flags & VM_WRITE)
+				save_pte[i] = pte_mkwrite(save_pte[i]);
+			inc_mm_counter_fast(mm, MM_ANONPAGES);
+			/*
+			 * Because we set the page table entry to the special
+			 * HMM locked entry we know no other process might do
+			 * anything with it and thus we can safely account the
+			 * page without holding any lock at this point.
+			 */
+			page_add_new_anon_rmap(page, vma, addr);
+			mem_cgroup_commit_charge(page, memcg, false);
+			/*
+			 * Add to active list so we know vmscan will not waste
+			 * its time with that page while we are still using it.
+			 */
+			lru_cache_add_active_or_unevictable(page, vma);
+		}
+	}
+	tlb_end_vma(&tlb, vma);
+	mmu_notifier_invalidate_range_end_excluding(mm, &range,
+						    mmu_notifier_exclude);
+	tlb_finish_mmu(&tlb, range.start, range.end);
+
+	if (backoff && *backoff) {
+		/* Stick to the range we updated. */
+		ret = -EAGAIN;
+		end = addr;
+		goto out;
+	}
+
+	/* Check if something is missing or something went wrong. */
+	if (ret == -ENOENT) {
+		int flags = FAULT_FLAG_ALLOW_RETRY;
+
+		do {
+			/*
+			 * Using __handle_mm_fault() as current->mm != mm ie we
+			 * might have been call from a kernel thread on behalf
+			 * of a driver and all accounting handle_mm_fault() is
+			 * pointless in our case.
+			 */
+			ret = __handle_mm_fault(mm, vma, addr, flags);
+			flags |= FAULT_FLAG_TRIED;
+		} while ((ret & VM_FAULT_RETRY));
+		if ((ret & VM_FAULT_ERROR)) {
+			/* Stick to the range we updated. */
+			end = addr;
+			ret = -EFAULT;
+			goto out;
+		}
+		range.start = addr;
+		goto retry;
+	}
+	if (ret == -EAGAIN) {
+		range.start = addr;
+		goto retry;
+	}
+	if (ret)
+		/* Stick to the range we updated. */
+		end = addr;
+
+	/*
+	 * At this point no one else can take a reference on the page from this
+	 * process CPU page table. So we can safely check wether we can migrate
+	 * or not the page.
+	 */
+
+out:
+	for (addr = start, i = 0; addr < end;) {
+		unsigned long next;
+		spinlock_t *ptl;
+		pgd_t *pgdp;
+		pud_t *pudp;
+		pmd_t *pmdp;
+		pte_t *ptep;
+
+		/*
+		 * We know for certain that we did set special swap entry for
+		 * the range and HMM entry are mark as locked so it means that
+		 * no one beside us can modify them which apply that all level
+		 * of the CPU page table are valid.
+		 */
+		pgdp = pgd_offset(mm, addr);
+		pudp = pud_offset(pgdp, addr);
+		VM_BUG_ON(!pudp);
+		pmdp = pmd_offset(pudp, addr);
+		VM_BUG_ON(!pmdp || pmd_bad(*pmdp) || pmd_none(*pmdp) ||
+			  pmd_trans_huge(*pmdp));
+
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (next = min((addr + PMD_SIZE) & PMD_MASK, end),
+		     i = (addr - start) >> PAGE_SHIFT; addr < next;
+		     addr += PAGE_SIZE, ptep++, i++) {
+			struct page *page;
+			swp_entry_t entry;
+			int swapped;
+
+			entry = pte_to_swp_entry(save_pte[i]);
+			if (is_hmm_entry(entry)) {
+				/*
+				 * Logic here is pretty involve. If save_pte is
+				 * an HMM special swap entry then it means that
+				 * we failed to swap in that page so error must
+				 * be set.
+				 *
+				 * If that's not the case than it means we are
+				 * seriously screw.
+				 */
+				VM_BUG_ON(!ret);
+				continue;
+			}
+
+			/*
+			 * This can not happen, no one else can replace our
+			 * special entry and as range end is re-ajusted on
+			 * error.
+			 */
+			entry = pte_to_swp_entry(*ptep);
+			VM_BUG_ON(!is_hmm_entry_locked(entry));
+
+			/* On error or backoff restore all the saved pte. */
+			if (ret)
+				goto restore;
+
+			page = vm_normal_page(vma, addr, save_pte[i]);
+			/* The zero page is fine to migrate. */
+			if (!page)
+				continue;
+
+			/*
+			 * Check that only CPU mapping hold a reference on the
+			 * page. To make thing simpler we just refuse bail out
+			 * if page_mapcount() != page_count() (also accounting
+			 * for swap cache).
+			 *
+			 * There is a small window here where wp_page_copy()
+			 * might have decremented mapcount but have not yet
+			 * decremented the page count. This is not an issue as
+			 * we backoff in that case.
+			 */
+			swapped = PageSwapCache(page);
+			if (page_mapcount(page) + swapped == page_count(page))
+				continue;
+
+restore:
+			/* Ok we have to restore that page. */
+			set_pte_at(mm, addr, ptep, save_pte[i]);
+			/*
+			 * No need to invalidate - it was non-present
+			 * before.
+			 */
+			update_mmu_cache(vma, addr, ptep);
+			pte_clear(mm, addr, &save_pte[i]);
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+	}
+	return ret;
+}
+EXPORT_SYMBOL(mm_hmm_migrate);
+
+/* mm_hmm_migrate_cleanup() - unmap range cleanup.
+ *
+ * @mm: The mm struct.
+ * @vma: The vm area struct the range is in.
+ * @save_pte: Array where to save current CPU page table entry value.
+ * @hmm_pte: Array of HMM table entry indicating if migration was successful.
+ * @start: Start address of the range (inclusive).
+ * @end: End address of the range (exclusive).
+ *
+ * This is call after mm_hmm_migrate() and after effective migration. It will
+ * restore CPU page table entry for page that not been migrated or in case of
+ * failure.
+ *
+ * It will free pages that have been migrated and updates appropriate counters,
+ * it will also "unlock" special HMM pte entry.
+ */
+void mm_hmm_migrate_cleanup(struct mm_struct *mm,
+			    struct vm_area_struct *vma,
+			    pte_t *save_pte,
+			    dma_addr_t *hmm_pte,
+			    unsigned long start,
+			    unsigned long end)
+{
+	pte_t hmm_entry = swp_entry_to_pte(make_hmm_entry());
+	struct page *pages[MMU_GATHER_BUNDLE];
+	unsigned long addr, c, i;
+
+	for (addr = start, i = 0; addr < end;) {
+		unsigned long next;
+		spinlock_t *ptl;
+		pgd_t *pgdp;
+		pud_t *pudp;
+		pmd_t *pmdp;
+		pte_t *ptep;
+
+		/*
+		 * We know for certain that we did set special swap entry for
+		 * the range and HMM entry are mark as locked so it means that
+		 * no one beside us can modify them which apply that all level
+		 * of the CPU page table are valid.
+		 */
+		pgdp = pgd_offset(mm, addr);
+		pudp = pud_offset(pgdp, addr);
+		VM_BUG_ON(!pudp);
+		pmdp = pmd_offset(pudp, addr);
+		VM_BUG_ON(!pmdp || pmd_bad(*pmdp) || pmd_none(*pmdp) ||
+			  pmd_trans_huge(*pmdp));
+
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		for (next = min((addr + PMD_SIZE) & PMD_MASK, end),
+		     i = (addr - start) >> PAGE_SHIFT; addr < next;
+		     addr += PAGE_SIZE, ptep++, i++) {
+			struct page *page;
+			swp_entry_t entry;
+
+			/*
+			 * This can't happen no one else can replace our
+			 * precious special entry.
+			 */
+			entry = pte_to_swp_entry(*ptep);
+			VM_BUG_ON(!is_hmm_entry_locked(entry));
+
+			if (!hmm_pte_test_valid_dev(&hmm_pte[i])) {
+				/* Ok we have to restore that page. */
+				set_pte_at(mm, addr, ptep, save_pte[i]);
+				/*
+				 * No need to invalidate - it was non-present
+				 * before.
+				 */
+				update_mmu_cache(vma, addr, ptep);
+				pte_clear(mm, addr, &save_pte[i]);
+				continue;
+			}
+
+			/* Set unlocked entry. */
+			set_pte_at(mm, addr, ptep, hmm_entry);
+			/*
+			 * No need to invalidate - it was non-present
+			 * before.
+			 */
+			update_mmu_cache(vma, addr, ptep);
+
+			page = vm_normal_page(vma, addr, save_pte[i]);
+			/* The zero page is fine to migrate. */
+			if (!page)
+				continue;
+
+			page_remove_rmap(page);
+			dec_mm_counter_fast(mm, MM_ANONPAGES);
+		}
+		pte_unmap_unlock(ptep - 1, ptl);
+	}
+
+	/* Free pages. */
+	for (addr = start, i = 0, c = 0; addr < end; i++, addr += PAGE_SIZE) {
+		if (pte_none(save_pte[i]))
+			continue;
+		if (c >= MMU_GATHER_BUNDLE) {
+			/*
+			 * TODO: What we really want to do is keep the memory
+			 * accounted inside the memory group and inside rss
+			 * while still freeing the page. So that migration
+			 * back from device memory will not fail because we
+			 * go over memory group limit.
+			 */
+			free_pages_and_swap_cache(pages, c);
+			c = 0;
+		}
+		pages[c] = vm_normal_page(vma, addr, save_pte[i]);
+		c = pages[c] ? c + 1 : c;
+	}
+	if (c)
+		free_pages_and_swap_cache(pages, c);
+}
+EXPORT_SYMBOL(mm_hmm_migrate_cleanup);
 #endif
 
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/15] HMM: new callback for copying memory from and to device memory v2.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (6 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 07/15] HMM: mm add helper to update page table when migrating memory v2 Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 09/15] HMM: allow to get pointer to spinlock protecting a directory Jérôme Glisse
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jerome Glisse, Jatin Kumar

From: Jerome Glisse <jglisse@redhat.com>

This patch only adds the new callbacks a device driver must implement
to copy memory from and to device memory.
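
For illustration only, a rough sketch of the driver side; my_dev_copy_page()
and my_device_ops are made-up names, not part of this patchset, and a real
driver would batch the copies through its DMA engine:

  /* Stand-in for the driver's real copy engine (hypothetical). */
  static int my_dev_copy_page(struct hmm_mirror *mirror, unsigned long addr,
                              dma_addr_t dst_entry);

  static int my_copy_from_device(struct hmm_mirror *mirror,
                                 const struct hmm_event *event,
                                 dma_addr_t *dst,
                                 unsigned long start,
                                 unsigned long end)
  {
          unsigned long addr, i;

          for (addr = start, i = 0; addr < end; addr += PAGE_SIZE, i++) {
                  /* Holes in the range have no valid destination entry. */
                  if (!hmm_pte_test_valid_pfn(&dst[i]) &&
                      !hmm_pte_test_valid_dma(&dst[i]))
                          continue;
                  /*
                   * Copy the device page backing addr into dst[i]. A real
                   * driver would clear the valid bit of entries it failed
                   * to copy instead of aborting the whole range.
                   */
                  if (my_dev_copy_page(mirror, addr, dst[i]))
                          return -EIO;
                  /* Be conservative about dirtiness. */
                  hmm_pte_set_dirty(&dst[i]);
          }
          return 0;
  }

  static const struct hmm_device_ops my_device_ops = {
          /* .release, .free, .update, .copy_to_device, ... */
          .copy_from_device = &my_copy_from_device,
  };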

Changed since v1:
  - Pass down the vma to the copy function.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
---
 include/linux/hmm.h | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/hmm.c            |   2 +
 2 files changed, 107 insertions(+)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 7c66513..9fbfc07 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -65,6 +65,8 @@ enum hmm_etype {
 	HMM_DEVICE_RFAULT,
 	HMM_DEVICE_WFAULT,
 	HMM_WRITE_PROTECT,
+	HMM_COPY_FROM_DEVICE,
+	HMM_COPY_TO_DEVICE,
 };
 
 /* struct hmm_event - memory event information.
@@ -170,6 +172,109 @@ struct hmm_device_ops {
 	 */
 	int (*update)(struct hmm_mirror *mirror,
 		      struct hmm_event *event);
+
+	/* copy_from_device() - copy from device memory to system memory.
+	 *
+	 * @mirror: The mirror that links the process address space with the device.
+	 * @event: The event that triggered the copy.
+	 * @dst: Array containing hmm_pte of destination memory.
+	 * @start: Start address of the range (sub-range of event) to copy.
+	 * @end: End address of the range (sub-range of event) to copy.
+	 * Returns: 0 on success, error code otherwise {-ENOMEM, -EIO}.
+	 *
+	 * Called when migrating memory from device memory to system memory.
+	 * The dst array contains, for the device, valid DMA addresses of the
+	 * pages to copy to (or the pfn of the pages if hmm_device.device == NULL).
+	 *
+	 * If event.etype == HMM_FORK then the device driver only needs to
+	 * schedule a copy to the system pages given in the dst hmm_pte array.
+	 * Do not update the device page, and do not pause/stop the device
+	 * threads that are using this address space. Just copy memory.
+	 *
+	 * If event.etype == HMM_COPY_FROM_DEVICE then the device driver must
+	 * first write protect the range, then schedule the copy, then update
+	 * its page table to use the new system memory given in the dst array.
+	 * Some devices can perform all of this atomically from the device
+	 * point of view. The device driver must also free the device memory
+	 * once the copy is done.
+	 *
+	 * The device driver must not fail lightly: any failure results in the
+	 * device process being killed and the CPU page table set to HWPOISON.
+	 *
+	 * Note that the device driver must clear the valid bit of any dst
+	 * entry it failed to copy.
+	 *
+	 * On failure the mirror will be killed by HMM, which will do an
+	 * HMM_MUNMAP invalidation of all the memory; when this happens the
+	 * device driver can free the device memory.
+	 *
+	 * Note also that there can be holes in the range being copied, ie some
+	 * entries of the dst array will not have the valid bit set; the device
+	 * driver must simply ignore non-valid entries.
+	 *
+	 * Finally the device driver must set the dirty bit for each page that
+	 * was modified since it was copied into device memory. This must be
+	 * conservative, ie if the device can not determine that with certainty
+	 * then it must set the dirty bit unconditionally.
+	 *
+	 * Return 0 on success, error value otherwise:
+	 * -ENOMEM Not enough memory for performing the operation.
+	 * -EIO    Some input/output error with the device.
+	 *
+	 * All other return values trigger a warning and are transformed to -EIO.
+	 */
+	int (*copy_from_device)(struct hmm_mirror *mirror,
+				const struct hmm_event *event,
+				dma_addr_t *dst,
+				unsigned long start,
+				unsigned long end);
+
+	/* copy_to_device() - copy to device memory from system memory.
+	 *
+	 * @mirror: The mirror that links the process address space with the device.
+	 * @event: The event that triggered the copy.
+	 * @vma: The vma corresponding to the range.
+	 * @dst: Array containing hmm_pte of destination memory.
+	 * @start: Start address of the range (sub-range of event) to copy.
+	 * @end: End address of the range (sub-range of event) to copy.
+	 * Returns: 0 on success, error code otherwise {-ENOMEM, -EIO}.
+	 *
+	 * Called when migrating memory from system memory to device memory.
+	 * The dst array is empty, all of its entries are equal to zero. The
+	 * device driver must allocate the device memory and populate each
+	 * entry using hmm_pte_from_device_pfn(); only the valid device bit and
+	 * hardware specific bits will be preserved (write and dirty will be
+	 * taken from the original entry inside the mirror page table). It is
+	 * advised to set the device pfn to match the physical address of the
+	 * device memory being used. The event.etype will equal HMM_COPY_TO_DEVICE.
+	 *
+	 * A device driver that can atomically copy a page and update its page
+	 * table entry to point to the device memory can do so. Partial
+	 * failure is allowed; entries that have not been migrated must have
+	 * the HMM_PTE_VALID_DEV bit clear inside the dst array. HMM will
+	 * update the CPU page table of failed entries to point back to the
+	 * system pages.
+	 *
+	 * Note that the device driver is responsible for allocating and
+	 * freeing the device memory and for properly updating the dst array
+	 * entries with the allocated device memory.
+	 *
+	 * Return 0 on success, error value otherwise:
+	 * -ENOMEM Not enough memory for performing the operation.
+	 * -EIO    Some input/output error with the device.
+	 *
+	 * All other return values trigger a warning and are transformed to
+	 * -EIO. An error means that the migration is aborted, so in case of
+	 * partial failure, if the device does not want to fully abort, it must
+	 * return 0. The device driver can update its device page table only
+	 * if it knows it will not return failure.
+	 */
+	int (*copy_to_device)(struct hmm_mirror *mirror,
+			      const struct hmm_event *event,
+			      struct vm_area_struct *vma,
+			      dma_addr_t *dst,
+			      unsigned long start,
+			      unsigned long end);
 };
 
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 8b1003a..3944b2a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -78,6 +78,8 @@ static inline int hmm_event_init(struct hmm_event *event,
 	switch (etype) {
 	case HMM_DEVICE_RFAULT:
 	case HMM_DEVICE_WFAULT:
+	case HMM_COPY_TO_DEVICE:
+	case HMM_COPY_FROM_DEVICE:
 		break;
 	case HMM_FORK:
 	case HMM_WRITE_PROTECT:
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/15] HMM: allow to get pointer to spinlock protecting a directory.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (7 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 08/15] HMM: new callback for copying memory from and to device " Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 10/15] HMM: split DMA mapping function in two Jérôme Glisse
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

There are several use cases for getting a pointer to the spinlock protecting
a directory.
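
For instance (illustrative only, example_update_range() is a made-up caller),
a helper that receives both the entries and the lock pointer can decide for
itself when to take the lock:

  static void example_update_range(struct hmm_pt_iter *iter,
                                   dma_addr_t *hmm_pte,
                                   unsigned long npages)
  {
          spinlock_t *lock = hmm_pt_iter_directory_lock_ptr(iter);
          unsigned long i;

          for (i = 0; i < npages; i++) {
                  spin_lock(lock);
                  /* ... inspect or update hmm_pte[i] under the directory lock ... */
                  spin_unlock(lock);
          }
  }

The next patch uses exactly this pattern so the DMA mapping helper can also
be called with no lock at all.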

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 include/linux/hmm_pt.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/include/linux/hmm_pt.h b/include/linux/hmm_pt.h
index f745d6c..22100a6 100644
--- a/include/linux/hmm_pt.h
+++ b/include/linux/hmm_pt.h
@@ -255,6 +255,16 @@ static inline void hmm_pt_directory_lock(struct hmm_pt *pt,
 		spin_lock(&pt->lock);
 }
 
+static inline spinlock_t *hmm_pt_directory_lock_ptr(struct hmm_pt *pt,
+						    struct page *ptd,
+						    unsigned level)
+{
+	if (level)
+		return &ptd->ptl;
+	else
+		return &pt->lock;
+}
+
 static inline void hmm_pt_directory_unlock(struct hmm_pt *pt,
 					   struct page *ptd,
 					   unsigned level)
@@ -272,6 +282,13 @@ static inline void hmm_pt_directory_lock(struct hmm_pt *pt,
 	spin_lock(&pt->lock);
 }
 
+static inline spinlock_t *hmm_pt_directory_lock_ptr(struct hmm_pt *pt,
+						    struct page *ptd,
+						    unsigned level)
+{
+	return &pt->lock;
+}
+
 static inline void hmm_pt_directory_unlock(struct hmm_pt *pt,
 					   struct page *ptd,
 					   unsigned level)
@@ -358,6 +375,14 @@ static inline void hmm_pt_iter_directory_lock(struct hmm_pt_iter *iter)
 	hmm_pt_directory_lock(pt, iter->ptd[pt->llevel - 1], pt->llevel);
 }
 
+static inline spinlock_t *hmm_pt_iter_directory_lock_ptr(struct hmm_pt_iter *i)
+{
+	struct hmm_pt *pt = i->pt;
+
+	return hmm_pt_directory_lock_ptr(pt, i->ptd[pt->llevel - 1],
+					 pt->llevel);
+}
+
 static inline void hmm_pt_iter_directory_unlock(struct hmm_pt_iter *iter)
 {
 	struct hmm_pt *pt = iter->pt;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/15] HMM: split DMA mapping function in two.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (8 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 09/15] HMM: allow to get pointer to spinlock protecting a directory Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 11/15] HMM: add helpers for migration back to system memory v3 Jérôme Glisse
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

To be able to reuse the DMA mapping logic, split it into two functions.
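
Illustrative sketch of the two call patterns this split allows
(example_dma_map() is a made-up wrapper; the range helper itself stays
internal to mm/hmm.c):

  static int example_dma_map(struct hmm_mirror *mirror,
                             struct hmm_pt_iter *iter,
                             dma_addr_t *hmm_pte,
                             dma_addr_t *dst,
                             unsigned long npages)
  {
          spinlock_t *lock;
          int ret;

          /* Entries shared with concurrent updaters: pass the directory lock. */
          lock = hmm_pt_iter_directory_lock_ptr(iter);
          ret = hmm_mirror_dma_map_range(mirror, hmm_pte, lock, npages);
          if (ret)
                  return ret;

          /* Private scratch array (eg a migration destination): no lock needed. */
          return hmm_mirror_dma_map_range(mirror, dst, NULL, npages);
  }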

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 mm/hmm.c | 120 ++++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 65 insertions(+), 55 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 3944b2a..72187ca 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -837,76 +837,86 @@ static int hmm_mirror_fault_pmd(pmd_t *pmdp,
 	return ret;
 }
 
+static int hmm_mirror_dma_map_range(struct hmm_mirror *mirror,
+				    dma_addr_t *hmm_pte,
+				    spinlock_t *lock,
+				    unsigned long npages)
+{
+	struct device *dev = mirror->device->dev;
+	unsigned long i;
+	int ret = 0;
+
+	for (i = 0; i < npages; i++) {
+		dma_addr_t dma_addr, pte;
+		struct page *page;
+
+again:
+		pte = ACCESS_ONCE(hmm_pte[i]);
+		if (!hmm_pte_test_valid_pfn(&pte) || !hmm_pte_test_select(&pte))
+			continue;
+
+		page = pfn_to_page(hmm_pte_pfn(pte));
+		VM_BUG_ON(!page);
+		dma_addr = dma_map_page(dev, page, 0, PAGE_SIZE,
+					DMA_BIDIRECTIONAL);
+		if (dma_mapping_error(dev, dma_addr)) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		/*
+		 * Make sure we transfer the dirty bit. Note that there
+		 * might still be a window for another thread to set
+		 * the dirty bit before we check for pte equality. This
+		 * will just lead to a useless retry so it is not the
+		 * end of the world here.
+		 */
+		if (lock)
+			spin_lock(lock);
+		if (hmm_pte_test_dirty(&hmm_pte[i]))
+			hmm_pte_set_dirty(&pte);
+		if (ACCESS_ONCE(hmm_pte[i]) != pte) {
+			if (lock)
+				spin_unlock(lock);
+			dma_unmap_page(dev, dma_addr, PAGE_SIZE,
+				       DMA_BIDIRECTIONAL);
+			if (hmm_pte_test_valid_pfn(&hmm_pte[i]))
+				goto again;
+			continue;
+		}
+		hmm_pte[i] = hmm_pte_from_dma_addr(dma_addr);
+		if (hmm_pte_test_write(&pte))
+			hmm_pte_set_write(&hmm_pte[i]);
+		if (hmm_pte_test_dirty(&pte))
+			hmm_pte_set_dirty(&hmm_pte[i]);
+		if (lock)
+			spin_unlock(lock);
+	}
+
+	return ret;
+}
+
 static int hmm_mirror_dma_map(struct hmm_mirror *mirror,
 			      struct hmm_pt_iter *iter,
 			      unsigned long start,
 			      unsigned long end)
 {
-	struct device *dev = mirror->device->dev;
 	unsigned long addr;
 	int ret;
 
 	for (ret = 0, addr = start; !ret && addr < end;) {
-		unsigned long i = 0, next = end;
+		unsigned long next = end, npages;
 		dma_addr_t *hmm_pte;
+		spinlock_t *lock;
 
 		hmm_pte = hmm_pt_iter_populate(iter, addr, &next);
 		if (!hmm_pte)
 			return -ENOENT;
 
-		do {
-			dma_addr_t dma_addr, pte;
-			struct page *page;
-
-again:
-			pte = ACCESS_ONCE(hmm_pte[i]);
-			if (!hmm_pte_test_valid_pfn(&pte) ||
-			    !hmm_pte_test_select(&pte)) {
-				if (!hmm_pte_test_valid_dma(&pte)) {
-					ret = -ENOENT;
-					break;
-				}
-				continue;
-			}
-
-			page = pfn_to_page(hmm_pte_pfn(pte));
-			VM_BUG_ON(!page);
-			dma_addr = dma_map_page(dev, page, 0, PAGE_SIZE,
-						DMA_BIDIRECTIONAL);
-			if (dma_mapping_error(dev, dma_addr)) {
-				ret = -ENOMEM;
-				break;
-			}
-
-			hmm_pt_iter_directory_lock(iter);
-			/*
-			 * Make sure we transfer the dirty bit. Note that there
-			 * might still be a window for another thread to set
-			 * the dirty bit before we check for pte equality. This
-			 * will just lead to a useless retry so it is not the
-			 * end of the world here.
-			 */
-			if (hmm_pte_test_dirty(&hmm_pte[i]))
-				hmm_pte_set_dirty(&pte);
-			if (ACCESS_ONCE(hmm_pte[i]) != pte) {
-				hmm_pt_iter_directory_unlock(iter);
-				dma_unmap_page(dev, dma_addr, PAGE_SIZE,
-					       DMA_BIDIRECTIONAL);
-				if (hmm_pte_test_valid_pfn(&pte))
-					goto again;
-				if (!hmm_pte_test_valid_dma(&pte)) {
-					ret = -ENOENT;
-					break;
-				}
-			} else {
-				hmm_pte[i] = hmm_pte_from_dma_addr(dma_addr);
-				if (hmm_pte_test_write(&pte))
-					hmm_pte_set_write(&hmm_pte[i]);
-				if (hmm_pte_test_dirty(&pte))
-					hmm_pte_set_dirty(&hmm_pte[i]);
-				hmm_pt_iter_directory_unlock(iter);
-			}
-		} while (addr += PAGE_SIZE, i++, addr != next && !ret);
+		npages = (next - addr) >> PAGE_SHIFT;
+		lock = hmm_pt_iter_directory_lock_ptr(iter);
+		ret = hmm_mirror_dma_map_range(mirror, hmm_pte, lock, npages);
+		addr = next;
 	}
 
 	return ret;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 11/15] HMM: add helpers for migration back to system memory v3.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (9 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 10/15] HMM: split DMA mapping function in two Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 12/15] HMM: fork copy migrated memory into system memory for child process Jérôme Glisse
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse, Jatin Kumar

This patch adds all the necessary functions and helpers for migration
from device memory back to system memory. There are 3 different cases
that use this code:
  - CPU page fault
  - fork
  - device driver request

Note that this patch uses regular memory accounting, which means that
migration can fail as a result of memory cgroup resource exhaustion.
Later patches will modify memcg to keep remote memory accounted as
regular memory, thus removing this point of failure.
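
All three cases funnel into the same helper; a rough sketch of the common
calling pattern (example_migrate_back() is illustrative only, the real
callers are in the following patches):

  static int example_migrate_back(struct hmm *hmm, struct mm_struct *mm,
                                  struct vm_area_struct *vma,
                                  unsigned long start, unsigned long end)
  {
          unsigned long npages = (end - start) >> PAGE_SHIFT;
          struct hmm_event event;
          dma_addr_t *dst;
          pte_t *new_pte;
          int ret;

          dst = kcalloc(npages, sizeof(*dst), GFP_KERNEL);
          new_pte = kcalloc(npages, sizeof(*new_pte), GFP_KERNEL);
          if (!dst || !new_pte) {
                  ret = -ENOMEM;
                  goto out;
          }

          hmm_event_init(&event, hmm, start, end, HMM_COPY_FROM_DEVICE);
          ret = hmm_migrate_back(hmm, &event, mm, vma, new_pte, dst, start, end);
  out:
          kfree(new_pte);
          kfree(dst);
          return ret;
  }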

Changed since v1:
  - Fixed logic in dma unmap code path on migration error.

Changed since v2:
  - Adapt to HMM page table changes.
  - Fix bug in migration failure code path.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
---
 mm/hmm.c | 151 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/mm/hmm.c b/mm/hmm.c
index 72187ca..5400dfb 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -47,6 +47,12 @@
 
 static struct mmu_notifier_ops hmm_notifier_ops;
 static void hmm_mirror_kill(struct hmm_mirror *mirror);
+static int hmm_mirror_migrate_back(struct hmm_mirror *mirror,
+				   struct hmm_event *event,
+				   pte_t *new_pte,
+				   dma_addr_t *dst,
+				   unsigned long start,
+				   unsigned long end);
 static inline int hmm_mirror_update(struct hmm_mirror *mirror,
 				    struct hmm_event *event,
 				    struct page *page);
@@ -418,6 +424,46 @@ static struct mmu_notifier_ops hmm_notifier_ops = {
 };
 
 
+static int hmm_migrate_back(struct hmm *hmm,
+			    struct hmm_event *event,
+			    struct mm_struct *mm,
+			    struct vm_area_struct *vma,
+			    pte_t *new_pte,
+			    dma_addr_t *dst,
+			    unsigned long start,
+			    unsigned long end)
+{
+	struct hmm_mirror *mirror;
+	int r, ret;
+
+	/*
+	 * Do not return right away on error, as there might be valid pages we
+	 * can still migrate.
+	 */
+	ret = mm_hmm_migrate_back(mm, vma, new_pte, start, end);
+
+again:
+	down_read(&hmm->rwsem);
+	hlist_for_each_entry(mirror, &hmm->mirrors, mlist) {
+		r = hmm_mirror_migrate_back(mirror, event, new_pte,
+					    dst, start, end);
+		if (r) {
+			ret = ret ? ret : r;
+			mirror = hmm_mirror_ref(mirror);
+			BUG_ON(!mirror);
+			up_read(&hmm->rwsem);
+			hmm_mirror_kill(mirror);
+			hmm_mirror_unref(&mirror);
+			goto again;
+		}
+	}
+	up_read(&hmm->rwsem);
+
+	mm_hmm_migrate_back_cleanup(mm, vma, new_pte, dst, start, end);
+
+	return ret;
+}
+
 int hmm_handle_cpu_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			pmd_t *pmdp, unsigned long addr,
@@ -1068,6 +1114,111 @@ out:
 }
 EXPORT_SYMBOL(hmm_mirror_fault);
 
+static int hmm_mirror_migrate_back(struct hmm_mirror *mirror,
+				   struct hmm_event *event,
+				   pte_t *new_pte,
+				   dma_addr_t *dst,
+				   unsigned long start,
+				   unsigned long end)
+{
+	unsigned long addr, i, npages = (end - start) >> PAGE_SHIFT;
+	struct hmm_device *device = mirror->device;
+	struct device *dev = mirror->device->dev;
+	struct hmm_pt_iter iter;
+	int r, ret = 0;
+
+	hmm_pt_iter_init(&iter, &mirror->pt);
+	for (addr = start, i = 0; addr < end; addr += PAGE_SIZE, ++i) {
+		unsigned long next = end;
+		dma_addr_t *hmm_pte;
+
+		hmm_pte_clear_select(&dst[i]);
+
+		if (!pte_present(new_pte[i]))
+			continue;
+		hmm_pte = hmm_pt_iter_lookup(&iter, addr, &next);
+		if (!hmm_pte)
+			continue;
+
+		if (!hmm_pte_test_valid_dev(hmm_pte))
+			continue;
+
+		dst[i] = hmm_pte_from_pfn(pte_pfn(new_pte[i]));
+		hmm_pte_set_select(&dst[i]);
+		hmm_pte_set_write(&dst[i]);
+	}
+
+	if (dev) {
+		ret = hmm_mirror_dma_map_range(mirror, dst, NULL, npages);
+		if (ret) {
+			for (i = 0; i < npages; ++i) {
+				if (!hmm_pte_test_select(&dst[i]))
+					continue;
+				if (hmm_pte_test_valid_dma(&dst[i]))
+					continue;
+				dst[i] = 0;
+			}
+		}
+	}
+
+	r = device->ops->copy_from_device(mirror, event, dst, start, end);
+
+	/* Update mirror page table with successfully migrated entry. */
+	for (addr = start; addr < end;) {
+		unsigned long idx, next = end, npages;
+		dma_addr_t *hmm_pte;
+
+		hmm_pte = hmm_pt_iter_walk(&iter, &addr, &next);
+		if (!hmm_pte)
+			continue;
+		idx = (addr - event->start) >> PAGE_SHIFT;
+		npages = (next - addr) >> PAGE_SHIFT;
+		hmm_pt_iter_directory_lock(&iter);
+		for (i = 0; i < npages; i++, idx++) {
+			if (!hmm_pte_test_valid_pfn(&dst[idx]) &&
+			    !hmm_pte_test_valid_dma(&dst[idx])) {
+				if (hmm_pte_test_valid_dev(&hmm_pte[i])) {
+					hmm_pte[i] = 0;
+					hmm_pt_iter_directory_unref(&iter);
+				}
+				continue;
+			}
+
+			VM_BUG_ON(!hmm_pte_test_select(&dst[idx]));
+			VM_BUG_ON(!hmm_pte_test_valid_dev(&hmm_pte[i]));
+			hmm_pte[i] = dst[idx];
+		}
+		hmm_pt_iter_directory_unlock(&iter);
+
+		/* DMA unmap failed migrate entry. */
+		if (dev) {
+			idx = (addr - event->start) >> PAGE_SHIFT;
+			for (i = 0; i < npages; i++, idx++) {
+				dma_addr_t dma_addr;
+
+				/*
+				 * Failed entries have the valid bit clear
+				 * but the select bit remains set.
+				 */
+				if (!hmm_pte_test_select(&dst[idx]) ||
+				    hmm_pte_test_valid_dma(&dst[idx]))
+					continue;
+
+				hmm_pte_set_valid_dma(&dst[idx]);
+				dma_addr = hmm_pte_dma_addr(dst[idx]);
+				dma_unmap_page(dev, dma_addr, PAGE_SIZE,
+					       DMA_BIDIRECTIONAL);
+				dst[idx] = 0;
+			}
+		}
+
+		addr = next;
+	}
+	hmm_pt_iter_fini(&iter);
+
+	return ret ? ret : r;
+}
+
 /* hmm_mirror_range_discard() - discard a range of address.
  *
  * @mirror: The mirror struct.
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 12/15] HMM: fork copy migrated memory into system memory for child process.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (10 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 11/15] HMM: add helpers for migration back to system memory v3 Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 13/15] HMM: CPU page fault on migrated memory Jérôme Glisse
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

When forking, if the process being forked had any memory migrated to
device memory, we need to make a system copy for the child process.
Later patches can revisit this and use the same COW semantics for
device memory.
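
Roughly, the intent on the fork side looks like the sketch below. The exact
hmm_mm_fork() parameter list is only partly visible in this hunk and
is_hmm_entry() comes from the special swap filetype patch, so treat both as
assumptions here:

  /*
   * Illustrative only: called from the fork copy path when a swap pte
   * turns out to be an HMM migration entry.
   */
  static int example_copy_hmm_range(struct mm_struct *dst_mm,
                                    struct mm_struct *src_mm,
                                    struct vm_area_struct *dst_vma,
                                    pte_t pte,
                                    unsigned long start, unsigned long end)
  {
          swp_entry_t entry = pte_to_swp_entry(pte);

          if (!is_hmm_entry(entry))
                  return 0;
          /* Materialize a system-memory copy of the range for the child. */
          return hmm_mm_fork(src_mm, dst_mm, dst_vma, start, end);
  }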

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 mm/hmm.c | 38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 5400dfb..e23b264 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -480,7 +480,37 @@ int hmm_mm_fork(struct mm_struct *src_mm,
 		unsigned long start,
 		unsigned long end)
 {
-	return -ENOMEM;
+	unsigned long npages = (end - start) >> PAGE_SHIFT;
+	struct hmm_event event;
+	dma_addr_t *dst;
+	struct hmm *hmm;
+	pte_t *new_pte;
+	int ret;
+
+	hmm = hmm_ref(src_mm->hmm);
+	if (!hmm)
+		return -EINVAL;
+
+
+	dst = kcalloc(npages, sizeof(*dst), GFP_KERNEL);
+	if (!dst) {
+		hmm_unref(hmm);
+		return -ENOMEM;
+	}
+	new_pte = kcalloc(npages, sizeof(*new_pte), GFP_KERNEL);
+	if (!new_pte) {
+		kfree(dst);
+		hmm_unref(hmm);
+		return -ENOMEM;
+	}
+
+	hmm_event_init(&event, hmm, start, end, HMM_FORK);
+	ret = hmm_migrate_back(hmm, &event, dst_mm, dst_vma, new_pte,
+			       dst, start, end);
+	hmm_unref(hmm);
+	kfree(new_pte);
+	kfree(dst);
+	return ret;
 }
 EXPORT_SYMBOL(hmm_mm_fork);
 
@@ -656,6 +686,12 @@ static void hmm_mirror_update_pte(struct hmm_mirror *mirror,
 	}
 
 	if (hmm_pte_test_valid_dev(hmm_pte)) {
+		/*
+		 * On fork device memory is duplicated so no need to write
+		 * protect it.
+		 */
+		if (event->etype == HMM_FORK)
+			return;
 		*hmm_pte &= event->pte_mask;
 		if (!hmm_pte_test_valid_dev(hmm_pte))
 			hmm_pt_iter_directory_unref(iter);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 13/15] HMM: CPU page fault on migrated memory.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (11 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 12/15] HMM: fork copy migrated memory into system memory for child process Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 14/15] HMM: add mirror fault support for system to device memory migration v3 Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 15/15] HMM/dummy: add fake device memory to dummy HMM device driver Jérôme Glisse
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

When the CPU tries to access memory that has been migrated to device memory
we have to copy it back to system memory. This patch implements the CPU
page fault handler for the special HMM pte swap entries.
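
For reference, a sketch of the dispatch expected in the generic fault path
(is_hmm_entry() is assumed from the special swap filetype patch and the
wrapper name is made up):

  /*
   * Returns a VM_FAULT_* code, or -1 when the pte is not an HMM entry and
   * the caller should fall back to the regular swap-in path.
   */
  static int example_handle_hmm_swap_pte(struct mm_struct *mm,
                                         struct vm_area_struct *vma,
                                         pmd_t *pmdp, unsigned long addr,
                                         unsigned int flags, pte_t orig_pte)
  {
          swp_entry_t entry = pte_to_swp_entry(orig_pte);

          if (!is_hmm_entry(entry) && !is_hmm_entry_poisonous(entry))
                  return -1;

          /* Copies the page back to system memory, or SIGBUS/poison on failure. */
          return hmm_handle_cpu_fault(mm, vma, pmdp, addr, flags, orig_pte);
  }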

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 mm/hmm.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index e23b264..97193e6 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -469,7 +469,59 @@ int hmm_handle_cpu_fault(struct mm_struct *mm,
 			pmd_t *pmdp, unsigned long addr,
 			unsigned flags, pte_t orig_pte)
 {
-	return VM_FAULT_SIGBUS;
+	unsigned long start, end;
+	struct hmm_event event;
+	swp_entry_t entry;
+	struct hmm *hmm;
+	dma_addr_t dst;
+	pte_t new_pte;
+	int ret;
+
+	/* First check for poisonous entry. */
+	entry = pte_to_swp_entry(orig_pte);
+	if (is_hmm_entry_poisonous(entry))
+		return VM_FAULT_SIGBUS;
+
+	hmm = hmm_ref(mm->hmm);
+	if (!hmm) {
+		pte_t poison = swp_entry_to_pte(make_hmm_entry_poisonous());
+		spinlock_t *ptl;
+		pte_t *ptep;
+
+		/* Check if cpu pte is already updated. */
+		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+		if (!pte_same(*ptep, orig_pte)) {
+			pte_unmap_unlock(ptep, ptl);
+			return 0;
+		}
+		set_pte_at(mm, addr, ptep, poison);
+		pte_unmap_unlock(ptep, ptl);
+		return VM_FAULT_SIGBUS;
+	}
+
+	/*
+	 * TODO: we likely want to migrate more than one page at a time, so we
+	 * need to call into the device driver to get a good hint on the range
+	 * to copy back to system memory.
+	 *
+	 * For now just live with the one page at a time solution.
+	 */
+	start = addr & PAGE_MASK;
+	end = start + PAGE_SIZE;
+	hmm_event_init(&event, hmm, start, end, HMM_COPY_FROM_DEVICE);
+
+	ret = hmm_migrate_back(hmm, &event, mm, vma, &new_pte,
+			       &dst, start, end);
+	hmm_unref(hmm);
+	switch (ret) {
+	case 0:
+		return VM_FAULT_MAJOR;
+	case -ENOMEM:
+		return VM_FAULT_OOM;
+	case -EINVAL:
+	default:
+		return VM_FAULT_SIGBUS;
+	}
 }
 EXPORT_SYMBOL(hmm_handle_cpu_fault);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 14/15] HMM: add mirror fault support for system to device memory migration v3.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (12 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 13/15] HMM: CPU page fault on migrated memory Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  2015-08-13 19:37 ` [PATCH 15/15] HMM/dummy: add fake device memory to dummy HMM device driver Jérôme Glisse
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse, Jatin Kumar

Migration to device memory is done as a special kind of device mirror
fault. Memory migration is initiated by the device driver and never by
HMM (unless it is a migration back to system memory).
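
A minimal sketch of how a driver kicks this off (the dummy driver in the
last patch of this series does essentially the same thing; the wrapper name
is made up):

  static int example_migrate_to_device(struct hmm_mirror *mirror,
                                       unsigned long start, unsigned long end)
  {
          struct hmm_event event;

          memset(&event, 0, sizeof(event));
          event.start = start & PAGE_MASK;
          event.end = PAGE_ALIGN(end);
          event.etype = HMM_COPY_TO_DEVICE;

          /* HMM calls back into ->copy_to_device() for the selected range. */
          return hmm_mirror_fault(mirror, &event);
  }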

Changed since v1:
  - Adapt to HMM page table changes.

Changed since v2:
  - Fix error code path for migration, calling mm_hmm_migrate_cleanup()
    is wrong.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jatin Kumar <jakumar@nvidia.com>
---
 mm/hmm.c | 170 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/mm/hmm.c b/mm/hmm.c
index 97193e6..863aa3a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -53,6 +53,10 @@ static int hmm_mirror_migrate_back(struct hmm_mirror *mirror,
 				   dma_addr_t *dst,
 				   unsigned long start,
 				   unsigned long end);
+static int hmm_mirror_migrate(struct hmm_mirror *mirror,
+			      struct hmm_event *event,
+			      struct vm_area_struct *vma,
+			      struct hmm_pt_iter *iter);
 static inline int hmm_mirror_update(struct hmm_mirror *mirror,
 				    struct hmm_event *event,
 				    struct page *page);
@@ -101,6 +105,12 @@ static inline int hmm_event_init(struct hmm_event *event,
 	return 0;
 }
 
+static inline unsigned long hmm_event_npages(const struct hmm_event *event)
+{
+	return (PAGE_ALIGN(event->end) - (event->start & PAGE_MASK)) >>
+	       PAGE_SHIFT;
+}
+
 
 /* hmm - core HMM functions.
  *
@@ -1180,6 +1190,9 @@ retry:
 	}
 
 	switch (event->etype) {
+	case HMM_COPY_TO_DEVICE:
+		ret = hmm_mirror_migrate(mirror, event, vma, &iter);
+		break;
 	case HMM_DEVICE_RFAULT:
 	case HMM_DEVICE_WFAULT:
 		ret = hmm_mirror_handle_fault(mirror, event, vma, &iter);
@@ -1307,6 +1320,163 @@ static int hmm_mirror_migrate_back(struct hmm_mirror *mirror,
 	return ret ? ret : r;
 }
 
+static int hmm_mirror_migrate(struct hmm_mirror *mirror,
+			      struct hmm_event *event,
+			      struct vm_area_struct *vma,
+			      struct hmm_pt_iter *iter)
+{
+	struct hmm_device *device = mirror->device;
+	struct hmm *hmm = mirror->hmm;
+	struct hmm_event invalidate;
+	unsigned long addr, npages;
+	struct hmm_mirror *tmp;
+	dma_addr_t *dst;
+	pte_t *save_pte;
+	int r = 0, ret;
+
+	/* Only allow migration of private anonymous memory. */
+	if (vma->vm_ops || unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)))
+		return -EINVAL;
+
+	/*
+	 * TODO: a more advanced loop for splitting the migration into several
+	 * chunks. For now limit the amount that can be migrated in one shot.
+	 * Also we would need to see if we need rescheduling if this is
+	 * happening as part of a system call to the device driver.
+	 */
+	npages = hmm_event_npages(event);
+	if (npages * max(sizeof(*dst), sizeof(*save_pte)) > PAGE_SIZE)
+		return -EINVAL;
+	dst = kcalloc(npages, sizeof(*dst), GFP_KERNEL);
+	if (dst == NULL)
+		return -ENOMEM;
+	save_pte = kcalloc(npages, sizeof(*save_pte), GFP_KERNEL);
+	if (save_pte == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = mm_hmm_migrate(hmm->mm, vma, save_pte, &event->backoff,
+			     &hmm->mmu_notifier, event->start, event->end);
+	if (ret == -EAGAIN)
+		goto out;
+	if (ret)
+		goto out;
+
+	/*
+	 * Now invalidate for all other devices; note that they can not race
+	 * with us as the CPU page table is full of special entries.
+	 */
+	hmm_event_init(&invalidate, mirror->hmm, event->start,
+		       event->end, HMM_MIGRATE);
+again:
+	down_read(&hmm->rwsem);
+	hlist_for_each_entry(tmp, &hmm->mirrors, mlist) {
+		if (tmp == mirror)
+			continue;
+		if (hmm_mirror_update(tmp, &invalidate, NULL)) {
+			hmm_mirror_ref(tmp);
+			up_read(&hmm->rwsem);
+			hmm_mirror_kill(tmp);
+			hmm_mirror_unref(&tmp);
+			goto again;
+		}
+	}
+	up_read(&hmm->rwsem);
+
+	/*
+	 * Populate the mirror page table with the saved entries and also mark
+	 * the entries that can be migrated.
+	 */
+	for (addr = event->start; addr < event->end;) {
+		unsigned long i, idx, next = event->end, npages;
+		dma_addr_t *hmm_pte;
+
+		hmm_pte = hmm_pt_iter_populate(iter, addr, &next);
+		if (!hmm_pte) {
+			ret = -ENOMEM;
+			goto out_cleanup;
+		}
+
+		npages = (next - addr) >> PAGE_SHIFT;
+		idx = (addr - event->start) >> PAGE_SHIFT;
+		hmm_pt_iter_directory_lock(iter);
+		for (i = 0; i < npages; i++, idx++) {
+			hmm_pte_clear_select(&hmm_pte[i]);
+			if (!pte_present(save_pte[idx]))
+				continue;
+			hmm_pte_set_select(&hmm_pte[i]);
+			/* This can not be a valid device entry here. */
+			VM_BUG_ON(hmm_pte_test_valid_dev(&hmm_pte[i]));
+			if (hmm_pte_test_valid_dma(&hmm_pte[i]))
+				continue;
+
+			if (hmm_pte_test_valid_pfn(&hmm_pte[i]))
+				continue;
+
+			hmm_pt_iter_directory_ref(iter);
+			hmm_pte[i] = hmm_pte_from_pfn(pte_pfn(save_pte[idx]));
+			if (pte_write(save_pte[idx]))
+				hmm_pte_set_write(&hmm_pte[i]);
+			hmm_pte_set_select(&hmm_pte[i]);
+		}
+		hmm_pt_iter_directory_unlock(iter);
+
+		if (device->dev) {
+			spinlock_t *lock;
+
+			lock = hmm_pt_iter_directory_lock_ptr(iter);
+			ret = hmm_mirror_dma_map_range(mirror, hmm_pte,
+						       lock, npages);
+			/* Keep going only for entries that have been mapped. */
+			if (ret) {
+				for (i = 0; i < npages; ++i) {
+					if (!hmm_pte_test_select(&hmm_pte[i]))
+						continue;
+					if (hmm_pte_test_valid_dma(&hmm_pte[i]))
+						continue;
+					hmm_pte_clear_select(&hmm_pte[i]);
+				}
+			}
+		}
+		addr = next;
+	}
+
+	/* Now Waldo we can do the copy. */
+	r = device->ops->copy_to_device(mirror, event, vma, dst,
+					event->start, event->end);
+
+	/* Update mirror page table with successfully migrated entry. */
+	for (addr = event->start; addr < event->end;) {
+		unsigned long i, idx, next = event->end, npages;
+		dma_addr_t *hmm_pte;
+
+		hmm_pte = hmm_pt_iter_walk(iter, &addr, &next);
+		if (!hmm_pte)
+			continue;
+		npages = (next - addr) >> PAGE_SHIFT;
+		idx = (addr - event->start) >> PAGE_SHIFT;
+		hmm_pt_iter_directory_lock(iter);
+		for (i = 0; i < npages; i++, idx++) {
+			if (!hmm_pte_test_valid_dev(&dst[idx]))
+				continue;
+
+			VM_BUG_ON(!hmm_pte_test_select(&hmm_pte[i]));
+			hmm_pte[i] = dst[idx];
+		}
+		hmm_pt_iter_directory_unlock(iter);
+		addr = next;
+	}
+
+out_cleanup:
+	mm_hmm_migrate_cleanup(hmm->mm, vma, save_pte, dst,
+			       event->start, event->end);
+out:
+	kfree(save_pte);
+	kfree(dst);
+	return ret ? ret : r;
+}
+
 /* hmm_mirror_range_discard() - discard a range of address.
  *
  * @mirror: The mirror struct.
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 15/15] HMM/dummy: add fake device memory to dummy HMM device driver.
  2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
                   ` (13 preceding siblings ...)
  2015-08-13 19:37 ` [PATCH 14/15] HMM: add mirror fault support for system to device memory migration v3 Jérôme Glisse
@ 2015-08-13 19:37 ` Jérôme Glisse
  14 siblings, 0 replies; 16+ messages in thread
From: Jérôme Glisse @ 2015-08-13 19:37 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm
  Cc: Linus Torvalds, joro, Mel Gorman, H. Peter Anvin, Peter Zijlstra,
	Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
	Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
	Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
	Mark Hairgrove, Lucien Dunning, Cameron Buschardt,
	Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel, Liran Liss,
	Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
	Michael Mantor, Paul Blinzer, Leonid Shamis, Laurent Morichetti,
	Alexander Deucher, Jérôme Glisse

This patch adds fake device memory by simply using regular system memory
pages and pretending they are not directly accessible by the CPU. This
serves to showcase how migration to device memory can be implemented
inside a real device driver.
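
From userspace the new ioctl is driven like the sketch below (the structure
and ioctl number come from the uapi header in this patch; the open() and
HMM_DUMMY_EXPOSE_MM setup on fd is assumed to have been done already):

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/hmm_dummy.h>

  /* fd is an open dummy device file on which HMM_DUMMY_EXPOSE_MM was issued. */
  static int migrate_to_dummy_device(int fd, void *addr, uint64_t size)
  {
          struct hmm_dummy_migrate dmigrate;

          memset(&dmigrate, 0, sizeof(dmigrate));
          dmigrate.address = (uint64_t)(uintptr_t)addr;
          dmigrate.size = size;

          if (ioctl(fd, HMM_DUMMY_MIGRATE_TO, &dmigrate) < 0)
                  return -1;
          /* dmigrate.nfaulted_dev_pages reports how many pages were migrated. */
          return 0;
  }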

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/char/hmm_dummy.c       | 395 +++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/hmm_dummy.h |  17 +-
 2 files changed, 391 insertions(+), 21 deletions(-)

diff --git a/drivers/char/hmm_dummy.c b/drivers/char/hmm_dummy.c
index 52843cb..a4af5b1 100644
--- a/drivers/char/hmm_dummy.c
+++ b/drivers/char/hmm_dummy.c
@@ -43,6 +43,9 @@
 #define HMM_DUMMY_MAX_DEVICES 4
 #define HMM_DUMMY_MAX_MIRRORS 4
 
+#define HMM_DUMMY_RMEM_SIZE (32UL << 20UL)
+#define HMM_DUMMY_RMEM_NBITS (HMM_DUMMY_RMEM_SIZE >> PAGE_SHIFT)
+
 struct dummy_device;
 
 struct dummy_mirror {
@@ -70,6 +73,8 @@ struct dummy_device {
 	/* device file mapping tracking (keep track of all vma) */
 	struct dummy_mirror	*dmirrors[HMM_DUMMY_MAX_MIRRORS];
 	struct address_space	*fmapping[HMM_DUMMY_MAX_MIRRORS];
+	struct page		**rmem_pages;
+	unsigned long		*rmem_bitmap;
 };
 
 struct dummy_event {
@@ -77,11 +82,30 @@ struct dummy_event {
 	struct list_head	list;
 	uint64_t		nsys_pages;
 	uint64_t		nfaulted_sys_pages;
+	uint64_t		ndev_pages;
+	uint64_t		nfaulted_dev_pages;
+	unsigned		*dpfn;
+	unsigned		npages;
 	bool			backoff;
 };
 
 static struct dummy_device ddevices[HMM_DUMMY_MAX_DEVICES];
 
+/* dummy_device_pfn_to_page() - Return struct page of fake device memory.
+ *
+ * @ddevice: The dummy device.
+ * @pfn: The fake device page frame number.
+ * Return: The pointer to the struct page of the fake device memory.
+ *
+ * For the dummy device remote memory we simply allocate regular pages and
+ * pretend they are not accessible directly by the CPU.
+ */
+struct page *dummy_device_pfn_to_page(struct dummy_device *ddevice,
+				      unsigned pfn)
+{
+	return ddevice->rmem_pages[pfn];
+}
+
 
 static void dummy_mirror_release(struct hmm_mirror *mirror)
 {
@@ -233,9 +257,11 @@ static int dummy_mirror_pt_invalidate(struct hmm_mirror *mirror,
 	unsigned long addr = event->start;
 	struct hmm_pt_iter miter, diter;
 	struct dummy_mirror *dmirror;
+	struct dummy_device *ddevice;
 	int ret = 0;
 
 	dmirror = container_of(mirror, struct dummy_mirror, mirror);
+	ddevice = dmirror->ddevice;
 
 	hmm_pt_iter_init(&diter, &dmirror->pt);
 	hmm_pt_iter_init(&miter, &mirror->pt);
@@ -259,6 +285,24 @@ static int dummy_mirror_pt_invalidate(struct hmm_mirror *mirror,
 		 */
 		hmm_pt_iter_directory_lock(&diter);
 
+		/* Handle the fake device memory page table entry case. */
+		if (hmm_pte_test_valid_dev(dpte)) {
+			unsigned dpfn = hmm_pte_dev_addr(*dpte) >> PAGE_SHIFT;
+
+			*dpte &= event->pte_mask;
+			if (!hmm_pte_test_valid_dev(dpte)) {
+				/*
+				 * Just directly free the fake device memory.
+				 */
+				clear_bit(dpfn, ddevice->rmem_bitmap);
+				hmm_pt_iter_directory_unref(&diter);
+			}
+			hmm_pt_iter_directory_unlock(&diter);
+
+			addr += PAGE_SIZE;
+			continue;
+		}
+
 		/*
 		 * Just skip this entry if it is not valid inside the dummy
 		 * mirror page table.
@@ -341,10 +385,178 @@ static int dummy_mirror_update(struct hmm_mirror *mirror,
 	}
 }
 
+static int dummy_copy_from_device(struct hmm_mirror *mirror,
+				  const struct hmm_event *event,
+				  dma_addr_t *dst,
+				  unsigned long start,
+				  unsigned long end)
+{
+	struct hmm_pt_iter miter, diter;
+	struct dummy_device *ddevice;
+	struct dummy_mirror *dmirror;
+	struct dummy_event *devent;
+	unsigned long addr = start;
+	int ret = 0, i = 0;
+
+	dmirror = container_of(mirror, struct dummy_mirror, mirror);
+	devent = container_of(event, struct dummy_event, hevent);
+	ddevice = dmirror->ddevice;
+
+	hmm_pt_iter_init(&diter, &dmirror->pt);
+	hmm_pt_iter_init(&miter, &mirror->pt);
+
+	do {
+		struct page *spage, *dpage;
+		unsigned long dpfn, next = end;
+		dma_addr_t *mpte, *dpte;
+
+		mpte = hmm_pt_iter_lookup(&miter, addr, &next);
+		if (!mpte || !hmm_pte_test_valid_dev(mpte) ||
+		    !hmm_pte_test_select(&dst[i])) {
+			i++;
+			continue;
+		}
+
+		dpte = hmm_pt_iter_lookup(&diter, addr, &next);
+		/*
+		 * Sanity check that the device driver page table entry is a
+		 * valid entry pointing to device memory.
+		 */
+		if (!dpte || !hmm_pte_test_valid_dev(dpte) ||
+		    !hmm_pte_test_select(&dst[i])) {
+			ret = -EINVAL;
+			break;
+		}
+
+		dpfn = hmm_pte_dev_addr(*mpte) >> PAGE_SHIFT;
+		spage = dummy_device_pfn_to_page(ddevice, dpfn);
+		dpage = pfn_to_page(hmm_pte_pfn(dst[i]));
+		copy_highpage(dpage, spage);
+
+		/* Directly free the fake device memory. */
+		clear_bit(dpfn, ddevice->rmem_bitmap);
+
+		if (hmm_pte_test_and_clear_dirty(dpte))
+			hmm_pte_set_dirty(&dst[i]);
+
+		/*
+		 * It is a bit inefficient to lock the directory per entry
+		 * instead of locking it once and going over all its entries,
+		 * but this is a dummy driver so we do not care about efficiency.
+		 */
+		hmm_pt_iter_directory_lock(&diter);
+		*dpte = dst[i];
+		hmm_pte_clear_dirty(dpte);
+		hmm_pt_iter_directory_unlock(&diter);
+
+		i++;
+	} while (addr += PAGE_SIZE, addr < end);
+
+	hmm_pt_iter_fini(&miter);
+	hmm_pt_iter_fini(&diter);
+
+	return ret;
+}
+
+static int dummy_copy_to_device(struct hmm_mirror *mirror,
+				const struct hmm_event *event,
+				struct vm_area_struct *vma,
+				dma_addr_t *dst,
+				unsigned long start,
+				unsigned long end)
+{
+	struct hmm_pt_iter miter, diter;
+	struct dummy_device *ddevice;
+	struct dummy_mirror *dmirror;
+	struct dummy_event *devent;
+	unsigned long addr = start;
+	int ret = 0, i = 0;
+
+	dmirror = container_of(mirror, struct dummy_mirror, mirror);
+	devent = container_of(event, struct dummy_event, hevent);
+	ddevice = dmirror->ddevice;
+
+	hmm_pt_iter_init(&diter, &dmirror->pt);
+	hmm_pt_iter_init(&miter, &mirror->pt);
+
+	do {
+		struct page *spage, *dpage;
+		dma_addr_t *mpte, *dpte;
+		unsigned long next = end;
+
+		mpte = hmm_pt_iter_lookup(&miter, addr, &next);
+		/*
+		 * Sanity check, this is only important for debugging HMM; a
+		 * device driver can ignore these tests and assume everything
+		 * below is false (ie mpte is not NULL and it is a valid pfn
+		 * entry with the select bit set).
+		 */
+		if (!mpte || !hmm_pte_test_valid_pfn(mpte) ||
+		    !hmm_pte_test_select(mpte)) {
+			pr_debug("(%s:%4d) (HMM FATAL) empty pt at 0x%lX\n",
+				 __FILE__, __LINE__, addr);
+			ret = -EINVAL;
+			break;
+		}
+
+		dpte = hmm_pt_iter_populate(&diter, addr, &next);
+		if (!dpte) {
+			ret = -ENOMEM;
+			break;
+		}
+		/*
+		 * Sanity check, this is only important for debugging HMM; a
+		 * device driver can ignore these tests and assume everything
+		 * below is false (ie dpte is not a valid device entry).
+		 */
+		if (hmm_pte_test_valid_dev(dpte)) {
+			pr_debug("(%s:%4d) (DUMMY FATAL) existing device entry %pad at 0x%lX\n",
+				 __FILE__, __LINE__, dpte, addr);
+			ret = -EINVAL;
+			break;
+		}
+
+		spage = pfn_to_page(hmm_pte_pfn(*mpte));
+		dpage = dummy_device_pfn_to_page(ddevice, devent->dpfn[i]);
+		dst[i] = hmm_pte_from_dev_addr(devent->dpfn[i] << PAGE_SHIFT);
+		copy_highpage(dpage, spage);
+		devent->dpfn[i] = -1;
+		devent->nfaulted_dev_pages++;
+
+		/*
+		 * It is a bit inefficient to lock the directory per entry
+		 * instead of locking it once and going over all its entries,
+		 * but this is a dummy driver so we do not care about efficiency.
+		 */
+		hmm_pt_iter_directory_lock(&diter);
+		if (hmm_pte_test_and_clear_dirty(dpte))
+			hmm_pte_set_dirty(&dst[i]);
+		if (vma->vm_flags & VM_WRITE)
+			hmm_pte_set_write(&dst[i]);
+		/*
+		 * Increment ref count of dummy page table directory if the
+		 * previous entry was not valid. Note that previous entry
+		 * can not be a valid device memory entry.
+		 */
+		if (!hmm_pte_test_valid_pfn(dpte))
+			hmm_pt_iter_directory_ref(&diter);
+		*dpte = dst[i];
+		hmm_pt_iter_directory_unlock(&diter);
+
+	} while (i++, addr += PAGE_SIZE, addr < end);
+
+	hmm_pt_iter_fini(&miter);
+	hmm_pt_iter_fini(&diter);
+
+	return ret;
+}
+
 static const struct hmm_device_ops hmm_dummy_ops = {
 	.release		= &dummy_mirror_release,
 	.free			= &dummy_mirror_free,
 	.update			= &dummy_mirror_update,
+	.copy_from_device	= &dummy_copy_from_device,
+	.copy_to_device		= &dummy_copy_to_device,
 };
 
 
@@ -443,6 +655,7 @@ static int dummy_read(struct dummy_mirror *dmirror,
 		      char __user *buf,
 		      size_t size)
 {
+	struct dummy_device *ddevice = dmirror->ddevice;
 	struct hmm_event *event = &devent->hevent;
 	long r = 0;
 
@@ -483,14 +696,21 @@ static int dummy_read(struct dummy_mirror *dmirror,
 			 * coherent value for each page table entry.
 			 */
 			dpte = ACCESS_ONCE(*dptep);
-			if (!hmm_pte_test_valid_pfn(&dpte)) {
+
+			if (hmm_pte_test_valid_dev(&dpte)) {
+				dma_addr_t dpfn;
+
+				dpfn = hmm_pte_dev_addr(dpte) >> PAGE_SHIFT;
+				page = dummy_device_pfn_to_page(ddevice, dpfn);
+				devent->ndev_pages++;
+			} else if (hmm_pte_test_valid_pfn(&dpte)) {
+				page = pfn_to_page(hmm_pte_pfn(dpte));
+				devent->nsys_pages++;
+			} else {
 				dummy_mirror_access_stop(dmirror, devent);
 				break;
 			}
 
-			devent->nsys_pages++;
-
-			page = pfn_to_page(hmm_pte_pfn(dpte));
 			ptr = kmap(page);
 			r = copy_to_user(buf, ptr + offset, count);
 
@@ -515,6 +735,7 @@ static int dummy_write(struct dummy_mirror *dmirror,
 		       char __user *buf,
 		       size_t size)
 {
+	struct dummy_device *ddevice = dmirror->ddevice;
 	struct hmm_event *event = &devent->hevent;
 	long r = 0;
 
@@ -555,15 +776,25 @@ static int dummy_write(struct dummy_mirror *dmirror,
 			 * coherent value for each page table entry.
 			 */
 			dpte = ACCESS_ONCE(*dptep);
-			if (!hmm_pte_test_valid_pfn(&dpte) ||
-			    !hmm_pte_test_write(&dpte)) {
+			if (!hmm_pte_test_write(&dpte)) {
+				dummy_mirror_access_stop(dmirror, devent);
+				break;
+			}
+
+			if (hmm_pte_test_valid_dev(&dpte)) {
+				dma_addr_t dpfn;
+
+				dpfn = hmm_pte_dev_addr(dpte) >> PAGE_SHIFT;
+				page = dummy_device_pfn_to_page(ddevice, dpfn);
+				devent->ndev_pages++;
+			} else if (hmm_pte_test_valid_pfn(&dpte)) {
+				page = pfn_to_page(hmm_pte_pfn(dpte));
+				devent->nsys_pages++;
+			} else {
 				dummy_mirror_access_stop(dmirror, devent);
 				break;
 			}
 
-			devent->nsys_pages++;
-
-			page = pfn_to_page(hmm_pte_pfn(dpte));
 			ptr = kmap(page);
 			r = copy_from_user(ptr + offset, buf, count);
 
@@ -583,6 +814,58 @@ static int dummy_write(struct dummy_mirror *dmirror,
 	return r;
 }
 
+static int dummy_lmem_to_rmem(struct dummy_mirror *dmirror,
+			      struct dummy_event *devent)
+{
+	struct dummy_device *ddevice = dmirror->ddevice;
+	struct hmm_mirror *mirror = &dmirror->mirror;
+	int i, ret;
+
+	devent->hevent.start = PAGE_MASK & devent->hevent.start;
+	devent->hevent.end = PAGE_ALIGN(devent->hevent.end);
+	devent->hevent.etype = HMM_COPY_TO_DEVICE;
+
+	/* Simple bitmap allocator for fake device memory. */
+	devent->dpfn = kcalloc(devent->npages, sizeof(unsigned), GFP_KERNEL);
+	if (devent->dpfn == NULL) {
+		return -ENOMEM;
+	}
+
+	/*
+	 * Pre-allocate device memory. Device driver is free to pre-allocate
+	 * memory or to allocate it inside the copy callback.
+	 */
+	mutex_lock(&ddevice->mutex);
+	for (i = 0; i < devent->npages; ++i) {
+		int idx;
+
+		idx = find_first_zero_bit(ddevice->rmem_bitmap,
+					  HMM_DUMMY_RMEM_NBITS);
+		if (idx >= HMM_DUMMY_RMEM_NBITS) {
+			while (i--) {
+				idx = devent->dpfn[i];
+				clear_bit(idx, ddevice->rmem_bitmap);
+			}
+			mutex_unlock(&ddevice->mutex);
+			kfree(devent->dpfn);
+			return -ENOMEM;
+		}
+		devent->dpfn[i] = idx;
+		set_bit(idx, ddevice->rmem_bitmap);
+	}
+	mutex_unlock(&ddevice->mutex);
+
+	ret = hmm_mirror_fault(mirror, &devent->hevent);
+	for (i = 0; i < devent->npages; ++i) {
+		if (devent->dpfn[i] == -1U)
+			continue;
+		clear_bit(devent->dpfn[i], ddevice->rmem_bitmap);
+	}
+	kfree(devent->dpfn);
+
+	return ret;
+}
+
 
 /*
  * Below are the vm operation for the dummy device file. Sadly we can not allow
@@ -695,11 +978,26 @@ static int dummy_fops_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+struct dummy_ioctlp {
+	uint64_t		address;
+	uint64_t		size;
+};
+
+static void dummy_event_init(struct dummy_event *devent,
+			     const struct dummy_ioctlp *ioctlp)
+{
+	memset(devent, 0, sizeof(*devent));
+	devent->hevent.start = ioctlp->address;
+	devent->hevent.end = ioctlp->address + ioctlp->size;
+	devent->npages = PAGE_ALIGN(ioctlp->size) >> PAGE_SHIFT;
+}
+
 static long dummy_fops_unlocked_ioctl(struct file *filp,
 				      unsigned int command,
 				      unsigned long arg)
 {
 	void __user *uarg = (void __user *)arg;
+	struct hmm_dummy_migrate dmigrate;
 	struct dummy_device *ddevice;
 	struct dummy_mirror *dmirror;
 	struct hmm_dummy_write dwrite;
@@ -765,15 +1063,15 @@ static long dummy_fops_unlocked_ioctl(struct file *filp,
 			return -EFAULT;
 		}
 
-		memset(&devent, 0, sizeof(devent));
-		devent.hevent.start = dread.address;
-		devent.hevent.end = dread.address + dread.size;
+		dummy_event_init(&devent, (struct dummy_ioctlp*)&dread);
 		ret = dummy_read(dmirror, &devent,
 				 (void __user *)dread.ptr,
 				 dread.size);
 
 		dread.nsys_pages = devent.nsys_pages;
 		dread.nfaulted_sys_pages = devent.nfaulted_sys_pages;
+		dread.ndev_pages = devent.ndev_pages;
+		dread.nfaulted_dev_pages = devent.nfaulted_dev_pages;
 		if (copy_to_user(uarg, &dread, sizeof(dread))) {
 			dummy_mirror_worker_thread_stop(dmirror);
 			return -EFAULT;
@@ -787,15 +1085,15 @@ static long dummy_fops_unlocked_ioctl(struct file *filp,
 			return -EFAULT;
 		}
 
-		memset(&devent, 0, sizeof(devent));
-		devent.hevent.start = dwrite.address;
-		devent.hevent.end = dwrite.address + dwrite.size;
+		dummy_event_init(&devent, (struct dummy_ioctlp*)&dwrite);
 		ret = dummy_write(dmirror, &devent,
 				  (void __user *)dwrite.ptr,
 				  dwrite.size);
 
 		dwrite.nsys_pages = devent.nsys_pages;
 		dwrite.nfaulted_sys_pages = devent.nfaulted_sys_pages;
+		dwrite.ndev_pages = devent.ndev_pages;
+		dwrite.nfaulted_dev_pages = devent.nfaulted_dev_pages;
 		if (copy_to_user(uarg, &dwrite, sizeof(dwrite))) {
 			dummy_mirror_worker_thread_stop(dmirror);
 			return -EFAULT;
@@ -803,6 +1101,23 @@ static long dummy_fops_unlocked_ioctl(struct file *filp,
 
 		dummy_mirror_worker_thread_stop(dmirror);
 		return ret;
+	case HMM_DUMMY_MIGRATE_TO:
+		if (copy_from_user(&dmigrate, uarg, sizeof(dmigrate))) {
+			dummy_mirror_worker_thread_stop(dmirror);
+			return -EFAULT;
+		}
+
+		dummy_event_init(&devent, (struct dummy_ioctlp*)&dmigrate);
+		ret = dummy_lmem_to_rmem(dmirror, &devent);
+
+		dmigrate.nfaulted_dev_pages = devent.nfaulted_dev_pages;
+		if (copy_to_user(uarg, &dmigrate, sizeof(dmigrate))) {
+			dummy_mirror_worker_thread_stop(dmirror);
+			return -EFAULT;
+		}
+
+		dummy_mirror_worker_thread_stop(dmirror);
+		return ret;
 	default:
 		return -EINVAL;
 	}
@@ -826,20 +1141,44 @@ static const struct file_operations hmm_dummy_fops = {
  */
 static int dummy_device_init(struct dummy_device *ddevice)
 {
-	int ret, i;
+	struct page **pages;
+	unsigned long *bitmap;
+	int ret, i, npages;
+
+	npages = HMM_DUMMY_RMEM_SIZE >> PAGE_SHIFT;
+	bitmap = kzalloc(BITS_TO_LONGS(npages) * sizeof(long), GFP_KERNEL);
+	if (!bitmap) {
+		return -ENOMEM;
+	}
+	pages = kzalloc(npages * sizeof(void*), GFP_KERNEL);
+	if (!pages) {
+		kfree(bitmap);
+		return -ENOMEM;
+	}
+	for (i = 0; i < npages; ++i) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i]) {
+			while (i--) {
+				__free_page(pages[i]);
+			}
+			kfree(bitmap);
+			kfree(pages);
+			return -ENOMEM;
+		}
+	}
 
 	ret = alloc_chrdev_region(&ddevice->dev, 0,
 				  HMM_DUMMY_MAX_DEVICES,
 				  ddevice->name);
 	if (ret < 0)
-		return ret;
+		goto error;
 	ddevice->major = MAJOR(ddevice->dev);
 
 	cdev_init(&ddevice->cdevice, &hmm_dummy_fops);
 	ret = cdev_add(&ddevice->cdevice, ddevice->dev, HMM_DUMMY_MAX_MIRRORS);
 	if (ret) {
 		unregister_chrdev_region(ddevice->dev, HMM_DUMMY_MAX_MIRRORS);
-		return ret;
+		goto error;
 	}
 
 	/* Register the hmm device. */
@@ -853,14 +1192,25 @@ static int dummy_device_init(struct dummy_device *ddevice)
 	if (ret) {
 		cdev_del(&ddevice->cdevice);
 		unregister_chrdev_region(ddevice->dev, HMM_DUMMY_MAX_MIRRORS);
+		goto error;
 	}
+	ddevice->rmem_bitmap = bitmap;
+	ddevice->rmem_pages = pages;
+	return 0;
+
+error:
+	for (i = 0; i < npages; ++i) {
+		__free_page(pages[i]);
+	}
+	kfree(bitmap);
+	kfree(pages);
 	return ret;
 }
 
 static void dummy_device_fini(struct dummy_device *ddevice)
 {
 	struct dummy_mirror *dmirror;
-	unsigned i;
+	unsigned i, npages;
 
 	/* First unregister all mirror. */
 	do {
@@ -880,6 +1230,13 @@ static void dummy_device_fini(struct dummy_device *ddevice)
 
 	cdev_del(&ddevice->cdevice);
 	unregister_chrdev_region(ddevice->dev, HMM_DUMMY_MAX_MIRRORS);
+
+	npages = HMM_DUMMY_RMEM_SIZE >> PAGE_SHIFT;
+	for (i = 0; i < npages; ++i) {
+		__free_page(ddevice->rmem_pages[i]);
+	}
+	kfree(ddevice->rmem_bitmap);
+	kfree(ddevice->rmem_pages);
 }
 
 static int __init hmm_dummy_init(void)
diff --git a/include/uapi/linux/hmm_dummy.h b/include/uapi/linux/hmm_dummy.h
index 3af71d4..a98b03d 100644
--- a/include/uapi/linux/hmm_dummy.h
+++ b/include/uapi/linux/hmm_dummy.h
@@ -31,7 +31,9 @@ struct hmm_dummy_read {
 	uint64_t		ptr;
 	uint64_t		nsys_pages;
 	uint64_t		nfaulted_sys_pages;
-	uint64_t		reserved[11];
+	uint64_t		ndev_pages;
+	uint64_t		nfaulted_dev_pages;
+	uint64_t		reserved[9];
 };
 
 struct hmm_dummy_write {
@@ -40,12 +42,23 @@ struct hmm_dummy_write {
 	uint64_t		ptr;
 	uint64_t		nsys_pages;
 	uint64_t		nfaulted_sys_pages;
-	uint64_t		reserved[11];
+	uint64_t		ndev_pages;
+	uint64_t		nfaulted_dev_pages;
+	uint64_t		reserved[9];
+};
+
+struct hmm_dummy_migrate {
+	uint64_t		address;
+	uint64_t		size;
+	uint64_t		nfaulted_sys_pages;
+	uint64_t		nfaulted_dev_pages;
+	uint64_t		reserved[12];
 };
 
 /* Expose the address space of the calling process through hmm dummy dev file */
 #define HMM_DUMMY_EXPOSE_MM	_IO('H', 0x00)
 #define HMM_DUMMY_READ		_IOWR('H', 0x01, struct hmm_dummy_read)
 #define HMM_DUMMY_WRITE		_IOWR('H', 0x02, struct hmm_dummy_write)
+#define HMM_DUMMY_MIGRATE_TO	_IOWR('H', 0x03, struct hmm_dummy_migrate)
 
 #endif /* _UAPI_LINUX_HMM_DUMMY_H */
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2015-08-13 19:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-13 19:37 [PATCH 00/15] HMM anonymous memory migration Jérôme Glisse
2015-08-13 19:37 ` [PATCH 01/15] fork: pass the dst vma to copy_page_range() and its sub-functions Jérôme Glisse
2015-08-13 19:37 ` [PATCH 02/15] HMM: add special swap filetype for memory migrated to device v2 Jérôme Glisse
2015-08-13 19:37 ` [PATCH 03/15] HMM: add new HMM page table flag (valid device memory) Jérôme Glisse
2015-08-13 19:37 ` [PATCH 04/15] HMM: add new HMM page table flag (select flag) Jérôme Glisse
2015-08-13 19:37 ` [PATCH 05/15] HMM: handle HMM device page table entry on mirror page table fault and update Jérôme Glisse
2015-08-13 19:37 ` [PATCH 06/15] HMM: mm add helper to update page table when migrating memory back v2 Jérôme Glisse
2015-08-13 19:37 ` [PATCH 07/15] HMM: mm add helper to update page table when migrating memory v2 Jérôme Glisse
2015-08-13 19:37 ` [PATCH 08/15] HMM: new callback for copying memory from and to device " Jérôme Glisse
2015-08-13 19:37 ` [PATCH 09/15] HMM: allow to get pointer to spinlock protecting a directory Jérôme Glisse
2015-08-13 19:37 ` [PATCH 10/15] HMM: split DMA mapping function in two Jérôme Glisse
2015-08-13 19:37 ` [PATCH 11/15] HMM: add helpers for migration back to system memory v3 Jérôme Glisse
2015-08-13 19:37 ` [PATCH 12/15] HMM: fork copy migrated memory into system memory for child process Jérôme Glisse
2015-08-13 19:37 ` [PATCH 13/15] HMM: CPU page fault on migrated memory Jérôme Glisse
2015-08-13 19:37 ` [PATCH 14/15] HMM: add mirror fault support for system to device memory migration v3 Jérôme Glisse
2015-08-13 19:37 ` [PATCH 15/15] HMM/dummy: add fake device memory to dummy HMM device driver Jérôme Glisse
