* [PATCH 10/14] mm/hmm: rename HMM_PFN_DEVICE_UNADDRESSABLE to HMM_PFN_DEVICE_PRIVATE
@ 2018-03-16 20:35 ` jglisse
  0 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

Make the naming consistent across the code: DEVICE_PRIVATE is the name used
outside of HMM code, so use that one.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 4 ++--
 mm/hmm.c            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 6d2b6bf6da4b..78018b3e7a9f 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -88,13 +88,13 @@ struct hmm;
  *      result of vm_insert_pfn() or vm_insert_page(). Therefore, it should not
  *      be mirrored by a device, because the entry will never have HMM_PFN_VALID
  *      set and the pfn value is undefined.
- * HMM_PFN_DEVICE_UNADDRESSABLE: unaddressable device memory (ZONE_DEVICE)
+ * HMM_PFN_DEVICE_PRIVATE: unaddressable device memory (ZONE_DEVICE)
  */
 #define HMM_PFN_VALID (1 << 0)
 #define HMM_PFN_WRITE (1 << 1)
 #define HMM_PFN_ERROR (1 << 2)
 #define HMM_PFN_SPECIAL (1 << 3)
-#define HMM_PFN_DEVICE_UNADDRESSABLE (1 << 4)
+#define HMM_PFN_DEVICE_PRIVATE (1 << 4)
 #define HMM_PFN_SHIFT 5
 
 /*
diff --git a/mm/hmm.c b/mm/hmm.c
index 2118e42cb838..857eec622c98 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -429,7 +429,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 					pfns[i] |= HMM_PFN_WRITE;
 				} else if (write_fault)
 					goto fault;
-				pfns[i] |= HMM_PFN_DEVICE_UNADDRESSABLE;
+				pfns[i] |= HMM_PFN_DEVICE_PRIVATE;
 			} else if (is_migration_entry(entry)) {
 				if (hmm_vma_walk->fault) {
 					pte_unmap(ptep);
-- 
2.14.3

* [PATCH 11/14] mm/hmm: move hmm_pfns_clear() closer to where it is use
  2018-03-16 20:35 ` jglisse
@ 2018-03-16 20:35   ` jglisse
  -1 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

Move hmm_pfns_clear() closer to where it is used, to make it clear that it
is not used by the page table walkers.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 mm/hmm.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 857eec622c98..3a708f500b80 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -297,14 +297,6 @@ static int hmm_pfns_bad(unsigned long addr,
 	return 0;
 }
 
-static void hmm_pfns_clear(uint64_t *pfns,
-			   unsigned long addr,
-			   unsigned long end)
-{
-	for (; addr < end; addr += PAGE_SIZE, pfns++)
-		*pfns = 0;
-}
-
 /*
  * hmm_vma_walk_hole() - handle a range back by no pmd or no pte
  * @start: range virtual start address (inclusive)
@@ -463,6 +455,14 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	return 0;
 }
 
+static void hmm_pfns_clear(uint64_t *pfns,
+			   unsigned long addr,
+			   unsigned long end)
+{
+	for (; addr < end; addr += PAGE_SIZE, pfns++)
+		*pfns = 0;
+}
+
 static void hmm_pfns_special(struct hmm_range *range)
 {
 	unsigned long addr = range->start, i = 0;
-- 
2.14.3

* [PATCH 12/14] mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()
  2018-03-16 20:35 ` jglisse
@ 2018-03-16 20:35   ` jglisse
  -1 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

No functional change; just factor the pmd and pte handling out into their
own functions (hmm_vma_handle_pmd() and hmm_vma_handle_pte()).

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 mm/hmm.c | 174 +++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 102 insertions(+), 72 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 3a708f500b80..40aaa757f262 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -332,6 +332,99 @@ static int hmm_vma_walk_hole(unsigned long addr,
 	return hmm_vma_walk->fault ? -EAGAIN : 0;
 }
 
+static int hmm_vma_handle_pmd(struct mm_walk *walk,
+			      unsigned long addr,
+			      unsigned long end,
+			      uint64_t *pfns,
+			      pmd_t pmd)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	unsigned long pfn, i;
+	uint64_t flag = 0;
+
+	if (pmd_protnone(pmd))
+		return hmm_vma_walk_hole(addr, end, walk);
+
+	if ((hmm_vma_walk->fault & hmm_vma_walk->write) && !pmd_write(pmd))
+		return hmm_vma_walk_hole(addr, end, walk);
+
+	pfn = pmd_pfn(pmd) + pte_index(addr);
+	flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
+	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
+		pfns[i] = hmm_pfn_from_pfn(pfn) | flag;
+	hmm_vma_walk->last = end;
+	return 0;
+}
+
+static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
+			      unsigned long end, pmd_t *pmdp, pte_t *ptep,
+			      uint64_t *pfns)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t pte = *ptep;
+
+	*pfns = 0;
+
+	if (pte_none(pte)) {
+		*pfns = 0;
+		if (hmm_vma_walk->fault)
+			goto fault;
+		return 0;
+	}
+
+	if (!pte_present(pte)) {
+		swp_entry_t entry = pte_to_swp_entry(pte);
+
+		if (!non_swap_entry(entry)) {
+			if (hmm_vma_walk->fault)
+				goto fault;
+			return 0;
+		}
+
+		/*
+		 * This is a special swap entry, ignore migration, use
+		 * device and report anything else as error.
+		 */
+		if (is_device_private_entry(entry)) {
+			*pfns = hmm_pfn_from_pfn(swp_offset(entry));
+			if (is_write_device_private_entry(entry)) {
+				*pfns |= HMM_PFN_WRITE;
+			} else if ((hmm_vma_walk->fault & hmm_vma_walk->write))
+				goto fault;
+			*pfns |= HMM_PFN_DEVICE_PRIVATE;
+			return 0;
+		}
+
+		if (is_migration_entry(entry)) {
+			if (hmm_vma_walk->fault) {
+				pte_unmap(ptep);
+				hmm_vma_walk->last = addr;
+				migration_entry_wait(vma->vm_mm,
+						pmdp, addr);
+				return -EAGAIN;
+			}
+			return 0;
+		}
+
+		/* Report error for everything else */
+		*pfns = HMM_PFN_ERROR;
+		return -EFAULT;
+	}
+
+	if ((hmm_vma_walk->fault & hmm_vma_walk->write) && !pte_write(pte))
+		goto fault;
+
+	*pfns = hmm_pfn_from_pfn(pte_pfn(pte));
+	*pfns |= pte_write(pte) ? HMM_PFN_WRITE : 0;
+	return 0;
+
+fault:
+	pte_unmap(ptep);
+	/* Fault all pages in range if ask for */
+	return hmm_vma_walk_hole(addr, end, walk);
+}
+
 static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			    unsigned long start,
 			    unsigned long end,
@@ -339,25 +432,20 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
-	struct vm_area_struct *vma = walk->vma;
 	uint64_t *pfns = range->pfns;
 	unsigned long addr = start, i;
-	bool write_fault;
 	pte_t *ptep;
 
 	i = (addr - range->start) >> PAGE_SHIFT;
-	write_fault = hmm_vma_walk->fault & hmm_vma_walk->write;
 
 again:
 	if (pmd_none(*pmdp))
 		return hmm_vma_walk_hole(start, end, walk);
 
-	if (pmd_huge(*pmdp) && vma->vm_flags & VM_HUGETLB)
+	if (pmd_huge(*pmdp) && (range->vma->vm_flags & VM_HUGETLB))
 		return hmm_pfns_bad(start, end, walk);
 
 	if (pmd_devmap(*pmdp) || pmd_trans_huge(*pmdp)) {
-		unsigned long pfn;
-		uint64_t flag = 0;
 		pmd_t pmd;
 
 		/*
@@ -373,17 +461,8 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 		barrier();
 		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
 			goto again;
-		if (pmd_protnone(pmd))
-			return hmm_vma_walk_hole(start, end, walk);
 
-		if (write_fault && !pmd_write(pmd))
-			return hmm_vma_walk_hole(start, end, walk);
-
-		pfn = pmd_pfn(pmd) + pte_index(addr);
-		flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
-		for (; addr < end; addr += PAGE_SIZE, i++, pfn++)
-			pfns[i] = hmm_pfn_from_pfn(pfn) | flag;
-		return 0;
+		return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
 	}
 
 	if (pmd_bad(*pmdp))
@@ -391,67 +470,18 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 
 	ptep = pte_offset_map(pmdp, addr);
 	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
-		pte_t pte = *ptep;
+		int r;
 
-		pfns[i] = 0;
-
-		if (pte_none(pte)) {
-			pfns[i] = 0;
-			if (hmm_vma_walk->fault)
-				goto fault;
-			continue;
-		}
-
-		if (!pte_present(pte)) {
-			swp_entry_t entry = pte_to_swp_entry(pte);
-
-			if (!non_swap_entry(entry)) {
-				if (hmm_vma_walk->fault)
-					goto fault;
-				continue;
-			}
-
-			/*
-			 * This is a special swap entry, ignore migration, use
-			 * device and report anything else as error.
-			 */
-			if (is_device_private_entry(entry)) {
-				pfns[i] = hmm_pfn_from_pfn(swp_offset(entry));
-				if (is_write_device_private_entry(entry)) {
-					pfns[i] |= HMM_PFN_WRITE;
-				} else if (write_fault)
-					goto fault;
-				pfns[i] |= HMM_PFN_DEVICE_PRIVATE;
-			} else if (is_migration_entry(entry)) {
-				if (hmm_vma_walk->fault) {
-					pte_unmap(ptep);
-					hmm_vma_walk->last = addr;
-					migration_entry_wait(vma->vm_mm,
-							     pmdp, addr);
-					return -EAGAIN;
-				}
-				continue;
-			} else {
-				/* Report error for everything else */
-				pfns[i] = HMM_PFN_ERROR;
-			}
-			continue;
+		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
+		if (r) {
+			/* hmm_vma_handle_pte() did unmap pte directory */
+			hmm_vma_walk->last = addr;
+			return r;
 		}
-
-		if (write_fault && !pte_write(pte))
-			goto fault;
-
-		pfns[i] = hmm_pfn_from_pfn(pte_pfn(pte));
-		pfns[i] |= pte_write(pte) ? HMM_PFN_WRITE : 0;
-		continue;
-
-fault:
-		pte_unmap(ptep);
-		/* Fault all pages in range if ask for */
-		return hmm_vma_walk_hole(start, end, walk);
 	}
 	pte_unmap(ptep - 1);
 
+	hmm_vma_walk->last = addr;
 	return 0;
 }
 
-- 
2.14.3

* [PATCH 13/14] mm/hmm: change hmm_vma_fault() to allow write fault on page basis
  2018-03-16 20:35 ` jglisse
@ 2018-03-16 20:35   ` jglisse
  -1 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

This changes hmm_vma_fault() to no longer take a global write fault flag
for the whole range. Instead it relies on the caller to populate the HMM
pfns array with the proper fault flags: HMM_PFN_VALID if the driver wants
a read fault for that address, or HMM_PFN_VALID and HMM_PFN_WRITE for a
write fault.

Moreover, by setting HMM_PFN_DEVICE_PRIVATE the device driver can ask for
device private memory to be migrated back to system memory through a page
fault.

This is a more flexible API and it better reflects how devices handle and
report faults.
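
A rough sketch of the new calling convention (hypothetical driver code,
not part of this patch; device_wants_write() is an invented helper and the
range is assumed to be set up by the caller):

	struct hmm_range range;	/* assumed already initialised by the driver */
	unsigned long i, npages;
	int ret;

	npages = (range.end - range.start) >> PAGE_SHIFT;
	for (i = 0; i < npages; ++i) {
		/* Request at least a read fault for every page ... */
		range.pfns[i] = HMM_PFN_VALID;
		/* ... and a write fault where the device intends to write. */
		if (device_wants_write(i))
			range.pfns[i] |= HMM_PFN_WRITE;
		/*
		 * HMM_PFN_DEVICE_PRIVATE could also be set here to request
		 * that device private memory be migrated back to system
		 * memory through a page fault.
		 */
	}
	/* No global write flag any more; only the blocking mode is passed. */
	ret = hmm_vma_fault(&range, true);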

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h |   2 +-
 mm/hmm.c            | 150 +++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 119 insertions(+), 33 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 78018b3e7a9f..ee758c4e4bec 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -317,7 +317,7 @@ bool hmm_vma_range_done(struct hmm_range *range);
  *
  * See the function description in mm/hmm.c for further documentation.
  */
-int hmm_vma_fault(struct hmm_range *range, bool write, bool block);
+int hmm_vma_fault(struct hmm_range *range, bool block);
 #endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
 
 
diff --git a/mm/hmm.c b/mm/hmm.c
index 40aaa757f262..0ea530d0fd1d 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -256,12 +256,10 @@ struct hmm_vma_walk {
 	unsigned long		last;
 	bool			fault;
 	bool			block;
-	bool			write;
 };
 
-static int hmm_vma_do_fault(struct mm_walk *walk,
-			    unsigned long addr,
-			    uint64_t *pfn)
+static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
+			    bool write_fault, uint64_t *pfn)
 {
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
@@ -269,7 +267,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk,
 	int r;
 
 	flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
-	flags |= hmm_vma_walk->write ? FAULT_FLAG_WRITE : 0;
+	flags |= write_fault ? FAULT_FLAG_WRITE : 0;
 	r = handle_mm_fault(vma, addr, flags);
 	if (r & VM_FAULT_RETRY)
 		return -EBUSY;
@@ -301,15 +299,17 @@ static int hmm_pfns_bad(unsigned long addr,
  * hmm_vma_walk_hole() - handle a range back by no pmd or no pte
  * @start: range virtual start address (inclusive)
  * @end: range virtual end address (exclusive)
+ * @fault: should we fault or not ?
+ * @write_fault: write fault ?
  * @walk: mm_walk structure
  * Returns: 0 on success, -EAGAIN after page fault, or page fault error
  *
  * This is an helper call whenever pmd_none() or pte_none() returns true
  * or when there is no directory covering the range.
  */
-static int hmm_vma_walk_hole(unsigned long addr,
-			     unsigned long end,
-			     struct mm_walk *walk)
+static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
+			      bool fault, bool write_fault,
+			      struct mm_walk *walk)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
@@ -320,16 +320,89 @@ static int hmm_vma_walk_hole(unsigned long addr,
 	i = (addr - range->start) >> PAGE_SHIFT;
 	for (; addr < end; addr += PAGE_SIZE, i++) {
 		pfns[i] = 0;
-		if (hmm_vma_walk->fault) {
+		if (fault || write_fault) {
 			int ret;
 
-			ret = hmm_vma_do_fault(walk, addr, &pfns[i]);
+			ret = hmm_vma_do_fault(walk, addr, write_fault,
+					       &pfns[i]);
 			if (ret != -EAGAIN)
 				return ret;
 		}
 	}
 
-	return hmm_vma_walk->fault ? -EAGAIN : 0;
+	return (fault || write_fault) ? -EAGAIN : 0;
+}
+
+static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
+				      uint64_t pfns, uint64_t cpu_flags,
+				      bool *fault, bool *write_fault)
+{
+	*fault = *write_fault = false;
+	if (!hmm_vma_walk->fault)
+		return;
+
+	/* We aren't ask to do anything ... */
+	if (!(pfns & HMM_PFN_VALID))
+		return;
+	/* If CPU page table is not valid then we need to fault */
+	*fault = cpu_flags & HMM_PFN_VALID;
+	/* Need to write fault ? */
+	if ((pfns & HMM_PFN_WRITE) && !(cpu_flags & HMM_PFN_WRITE)) {
+		*fault = *write_fault = false;
+		return;
+	}
+	/* Do we fault on device memory ? */
+	if ((pfns & HMM_PFN_DEVICE_PRIVATE) &&
+	    (cpu_flags & HMM_PFN_DEVICE_PRIVATE)) {
+		*write_fault = pfns & HMM_PFN_WRITE;
+		*fault = true;
+	}
+}
+
+static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
+				 const uint64_t *pfns, unsigned long npages,
+				 uint64_t cpu_flags, bool *fault,
+				 bool *write_fault)
+{
+	unsigned long i;
+
+	if (!hmm_vma_walk->fault) {
+		*fault = *write_fault = false;
+		return;
+	}
+
+	for (i = 0; i < npages; ++i) {
+		hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags,
+				   fault, write_fault);
+		if ((*fault) || (*write_fault))
+			return;
+	}
+}
+
+static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
+			     struct mm_walk *walk)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+	bool fault, write_fault;
+	unsigned long i, npages;
+	uint64_t *pfns;
+
+	i = (addr - range->start) >> PAGE_SHIFT;
+	npages = (end - addr) >> PAGE_SHIFT;
+	pfns = &range->pfns[i];
+	hmm_range_need_fault(hmm_vma_walk, pfns, npages,
+			     0, &fault, &write_fault);
+	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
+}
+
+static inline uint64_t pmd_to_hmm_pfn_flags(pmd_t pmd)
+{
+	if (pmd_protnone(pmd))
+		return 0;
+	return pmd_write(pmd) ? HMM_PFN_VALID |
+				HMM_PFN_WRITE :
+				HMM_PFN_VALID;
 }
 
 static int hmm_vma_handle_pmd(struct mm_walk *walk,
@@ -339,14 +412,17 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 			      pmd_t pmd)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
-	unsigned long pfn, i;
-	uint64_t flag = 0;
+	unsigned long pfn, npages, i;
+	uint64_t flag = 0, cpu_flags;
+	bool fault, write_fault;
 
-	if (pmd_protnone(pmd))
-		return hmm_vma_walk_hole(addr, end, walk);
+	npages = (end - addr) >> PAGE_SHIFT;
+	cpu_flags = pmd_to_hmm_pfn_flags(pmd);
+	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
+			     &fault, &write_fault);
 
-	if ((hmm_vma_walk->fault & hmm_vma_walk->write) && !pmd_write(pmd))
-		return hmm_vma_walk_hole(addr, end, walk);
+	if (pmd_protnone(pmd) || fault || write_fault)
+		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
 	pfn = pmd_pfn(pmd) + pte_index(addr);
 	flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
@@ -356,19 +432,33 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 	return 0;
 }
 
+static inline uint64_t pte_to_hmm_pfn_flags(pte_t pte)
+{
+	if (pte_none(pte) || !pte_present(pte))
+		return 0;
+	return pte_write(pte) ? HMM_PFN_VALID |
+				HMM_PFN_WRITE :
+				HMM_PFN_VALID;
+}
+
 static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			      unsigned long end, pmd_t *pmdp, pte_t *ptep,
 			      uint64_t *pfns)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct vm_area_struct *vma = walk->vma;
+	bool fault, write_fault;
+	uint64_t cpu_flags;
 	pte_t pte = *ptep;
 
 	*pfns = 0;
+	cpu_flags = pte_to_hmm_pfn_flags(pte);
+	hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
+			   &fault, &write_fault);
 
 	if (pte_none(pte)) {
 		*pfns = 0;
-		if (hmm_vma_walk->fault)
+		if (fault || write_fault)
 			goto fault;
 		return 0;
 	}
@@ -377,7 +467,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
 		if (!non_swap_entry(entry)) {
-			if (hmm_vma_walk->fault)
+			if (fault || write_fault)
 				goto fault;
 			return 0;
 		}
@@ -387,21 +477,20 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		 * device and report anything else as error.
 		 */
 		if (is_device_private_entry(entry)) {
+			cpu_flags = HMM_PFN_VALID | HMM_PFN_DEVICE_PRIVATE;
+			cpu_flags |= is_write_device_private_entry(entry) ?
+					HMM_PFN_WRITE : 0;
 			*pfns = hmm_pfn_from_pfn(swp_offset(entry));
-			if (is_write_device_private_entry(entry)) {
-				*pfns |= HMM_PFN_WRITE;
-			} else if ((hmm_vma_walk->fault & hmm_vma_walk->write))
-				goto fault;
 			*pfns |= HMM_PFN_DEVICE_PRIVATE;
 			return 0;
 		}
 
 		if (is_migration_entry(entry)) {
-			if (hmm_vma_walk->fault) {
+			if (fault || write_fault) {
 				pte_unmap(ptep);
 				hmm_vma_walk->last = addr;
 				migration_entry_wait(vma->vm_mm,
-						pmdp, addr);
+						     pmdp, addr);
 				return -EAGAIN;
 			}
 			return 0;
@@ -412,17 +501,16 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		return -EFAULT;
 	}
 
-	if ((hmm_vma_walk->fault & hmm_vma_walk->write) && !pte_write(pte))
+	if (fault || write_fault)
 		goto fault;
 
-	*pfns = hmm_pfn_from_pfn(pte_pfn(pte));
-	*pfns |= pte_write(pte) ? HMM_PFN_WRITE : 0;
+	*pfns = hmm_pfn_from_pfn(pte_pfn(pte)) | cpu_flags;
 	return 0;
 
 fault:
 	pte_unmap(ptep);
 	/* Fault all pages in range if ask for */
-	return hmm_vma_walk_hole(addr, end, walk);
+	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 }
 
 static int hmm_vma_walk_pmd(pmd_t *pmdp,
@@ -642,7 +730,6 @@ EXPORT_SYMBOL(hmm_vma_range_done);
 /*
  * hmm_vma_fault() - try to fault some address in a virtual address range
  * @range: range being faulted and all needed informations
- * @write: is it a write fault
  * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
  * Returns: 0 success, error otherwise (-EAGAIN means mmap_sem have been drop)
  *
@@ -684,7 +771,7 @@ EXPORT_SYMBOL(hmm_vma_range_done);
  *
  * YOU HAVE BEEN WARNED !
  */
-int hmm_vma_fault(struct hmm_range *range, bool write, bool block)
+int hmm_vma_fault(struct hmm_range *range, bool block)
 {
 	struct vm_area_struct *vma = range->vma;
 	unsigned long start = range->start;
@@ -732,7 +819,6 @@ int hmm_vma_fault(struct hmm_range *range, bool write, bool block)
 	}
 
 	hmm_vma_walk.fault = true;
-	hmm_vma_walk.write = write;
 	hmm_vma_walk.block = block;
 	hmm_vma_walk.range = range;
 	mm_walk.private = &hmm_vma_walk;
-- 
2.14.3

* [PATCH 14/14] mm/hmm: use device driver encoding for HMM pfn
  2018-03-16 20:35 ` jglisse
@ 2018-03-16 20:35   ` jglisse
  -1 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

Users of hmm_vma_fault() and hmm_vma_get_pfns() now provide a flags array
and a pfn shift value, allowing them to define their own encoding for the
HMM pfns that are filled into the pfns array of the hmm_range struct. With
this, device drivers can get pfns that match their own private encoding
straight out of HMM without having to do any conversion.
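
For illustration only (hypothetical driver flag values; vma, start, end
and pfns are assumed to come from the caller), a driver could describe
its own encoding roughly like this:

	/* Driver-private bit layout for the HMM pfn entries. */
	static const uint64_t mydrv_hmm_flags[HMM_PFN_FLAG_MAX] = {
		[HMM_PFN_VALID]		 = 1UL << 0,
		[HMM_PFN_WRITE]		 = 1UL << 1,
		[HMM_PFN_ERROR]		 = 1UL << 2,
		[HMM_PFN_NONE]		 = 0,
		[HMM_PFN_SPECIAL]	 = 1UL << 3,
		[HMM_PFN_DEVICE_PRIVATE] = 1UL << 4,
	};

	struct hmm_range range = {
		.vma		= vma,
		.start		= start,
		.end		= end,
		.pfns		= pfns,
		.flags		= mydrv_hmm_flags,
		.pfn_shift	= 5,	/* low bits reserved for the flags above */
	};
	struct page *page;
	int ret;

	/* HMM fills range.pfns directly in the driver's own encoding ... */
	ret = hmm_vma_get_pfns(&range);
	if (!ret)
		/* ... so entries decode without any conversion. */
		page = hmm_pfn_to_page(&range, range.pfns[0]);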

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 91 ++++++++++++++++++++++++++++++++---------------------
 mm/hmm.c            | 83 +++++++++++++++++++++++++++---------------------
 2 files changed, 102 insertions(+), 72 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index ee758c4e4bec..cb9af99f9371 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -80,68 +80,106 @@
 struct hmm;
 
 /*
+ * hmm_pfn_flag_e - HMM uses its own pfn type to keep several flags per page
+ *
  * Flags:
  * HMM_PFN_VALID: pfn is valid
  * HMM_PFN_WRITE: CPU page table has write permission set
  * HMM_PFN_ERROR: corresponding CPU page table entry points to poisoned memory
+ * HMM_PFN_EMPTY: corresponding CPU page table entry is pte_none()
  * HMM_PFN_SPECIAL: corresponding CPU page table entry is special; i.e., the
  *      result of vm_insert_pfn() or vm_insert_page(). Therefore, it should not
  *      be mirrored by a device, because the entry will never have HMM_PFN_VALID
  *      set and the pfn value is undefined.
- * HMM_PFN_DEVICE_PRIVATE: unaddressable device memory (ZONE_DEVICE)
+ * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
+ */
+enum hmm_pfn_flag_e {
+	HMM_PFN_VALID = 0,
+	HMM_PFN_WRITE,
+	HMM_PFN_ERROR,
+	HMM_PFN_NONE,
+	HMM_PFN_SPECIAL,
+	HMM_PFN_DEVICE_PRIVATE,
+	HMM_PFN_FLAG_MAX
+};
+
+/*
+ * struct hmm_range - track invalidation lock on virtual address range
+ *
+ * @vma: the vm area struct for the range
+ * @list: all range lock are on a list
+ * @start: range virtual start address (inclusive)
+ * @end: range virtual end address (exclusive)
+ * @pfns: array of pfns (big enough for the range)
+ * @flags: pfn flags to match device driver page table
+ * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
+ * @valid: pfns array did not change since it has been fill by an HMM function
  */
-#define HMM_PFN_VALID (1 << 0)
-#define HMM_PFN_WRITE (1 << 1)
-#define HMM_PFN_ERROR (1 << 2)
-#define HMM_PFN_SPECIAL (1 << 3)
-#define HMM_PFN_DEVICE_PRIVATE (1 << 4)
-#define HMM_PFN_SHIFT 5
+struct hmm_range {
+	struct vm_area_struct	*vma;
+	struct list_head	list;
+	unsigned long		start;
+	unsigned long		end;
+	uint64_t		*pfns;
+	const uint64_t		*flags;
+	uint8_t			pfn_shift;
+	bool			valid;
+};
 
 /*
  * hmm_pfn_to_page() - return struct page pointed to by a valid HMM pfn
+ * @range: range use to decode HMM pfn value
  * @pfn: HMM pfn value to get corresponding struct page from
  * Returns: struct page pointer if pfn is a valid HMM pfn, NULL otherwise
  *
  * If the uint64_t is valid (ie valid flag set) then return the struct page
  * matching the pfn value stored in the HMM pfn. Otherwise return NULL.
  */
-static inline struct page *hmm_pfn_to_page(uint64_t pfn)
+static inline struct page *hmm_pfn_to_page(const struct hmm_range *range,
+					   uint64_t pfn)
 {
-	if (!(pfn & HMM_PFN_VALID))
+	if (!(pfn & range->flags[HMM_PFN_VALID]))
 		return NULL;
-	return pfn_to_page(pfn >> HMM_PFN_SHIFT);
+	return pfn_to_page(pfn >> range->pfn_shift);
 }
 
 /*
  * hmm_pfn_to_pfn() - return pfn value store in a HMM pfn
+ * @range: range use to decode HMM pfn value
  * @pfn: HMM pfn value to extract pfn from
  * Returns: pfn value if HMM pfn is valid, -1UL otherwise
  */
-static inline unsigned long hmm_pfn_to_pfn(uint64_t pfn)
+static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range,
+					   uint64_t pfn)
 {
-	if (!(pfn & HMM_PFN_VALID))
+	if (!(pfn & range->flags[HMM_PFN_VALID]))
 		return -1UL;
-	return (pfn >> HMM_PFN_SHIFT);
+	return (pfn >> range->pfn_shift);
 }
 
 /*
  * hmm_pfn_from_page() - create a valid HMM pfn value from struct page
+ * @range: range use to encode HMM pfn value
  * @page: struct page pointer for which to create the HMM pfn
  * Returns: valid HMM pfn for the page
  */
-static inline uint64_t hmm_pfn_from_page(struct page *page)
+static inline uint64_t hmm_pfn_from_page(const struct hmm_range *range,
+					 struct page *page)
 {
-	return (page_to_pfn(page) << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+	return (page_to_pfn(page) << range->pfn_shift) |
+		range->flags[HMM_PFN_VALID];
 }
 
 /*
  * hmm_pfn_from_pfn() - create a valid HMM pfn value from pfn
+ * @range: range use to encode HMM pfn value
  * @pfn: pfn value for which to create the HMM pfn
  * Returns: valid HMM pfn for the pfn
  */
-static inline uint64_t hmm_pfn_from_pfn(unsigned long pfn)
+static inline uint64_t hmm_pfn_from_pfn(const struct hmm_range *range,
+					unsigned long pfn)
 {
-	return (pfn << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+	return (pfn << range->pfn_shift) | range->flags[HMM_PFN_VALID];
 }
 
 
@@ -263,25 +301,6 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
 void hmm_mirror_unregister(struct hmm_mirror *mirror);
 
 
-/*
- * struct hmm_range - track invalidation lock on virtual address range
- *
- * @vma: the vm area struct for the range
- * @list: all range lock are on a list
- * @start: range virtual start address (inclusive)
- * @end: range virtual end address (exclusive)
- * @pfns: array of pfns (big enough for the range)
- * @valid: pfns array did not change since it has been fill by an HMM function
- */
-struct hmm_range {
-	struct vm_area_struct	*vma;
-	struct list_head	list;
-	unsigned long		start;
-	unsigned long		end;
-	uint64_t		*pfns;
-	bool			valid;
-};
-
 /*
  * To snapshot the CPU page table, call hmm_vma_get_pfns(), then take a device
  * driver lock that serializes device page table updates, then call
diff --git a/mm/hmm.c b/mm/hmm.c
index 0ea530d0fd1d..7ccca5478ea1 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -263,6 +263,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 {
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
 	int r;
 
@@ -272,7 +273,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 	if (r & VM_FAULT_RETRY)
 		return -EBUSY;
 	if (r & VM_FAULT_ERROR) {
-		*pfn = HMM_PFN_ERROR;
+		*pfn = range->flags[HMM_PFN_ERROR];
 		return -EFAULT;
 	}
 
@@ -290,7 +291,7 @@ static int hmm_pfns_bad(unsigned long addr,
 
 	i = (addr - range->start) >> PAGE_SHIFT;
 	for (; addr < end; addr += PAGE_SIZE, i++)
-		pfns[i] = HMM_PFN_ERROR;
+		pfns[i] = range->flags[HMM_PFN_ERROR];
 
 	return 0;
 }
@@ -319,7 +320,7 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
 	hmm_vma_walk->last = addr;
 	i = (addr - range->start) >> PAGE_SHIFT;
 	for (; addr < end; addr += PAGE_SIZE, i++) {
-		pfns[i] = 0;
+		pfns[i] = range->flags[HMM_PFN_NONE];
 		if (fault || write_fault) {
 			int ret;
 
@@ -337,24 +338,27 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
 				      uint64_t pfns, uint64_t cpu_flags,
 				      bool *fault, bool *write_fault)
 {
+	struct hmm_range *range = hmm_vma_walk->range;
+
 	*fault = *write_fault = false;
 	if (!hmm_vma_walk->fault)
 		return;
 
 	/* We aren't ask to do anything ... */
-	if (!(pfns & HMM_PFN_VALID))
+	if (!(pfns & range->flags[HMM_PFN_VALID]))
 		return;
 	/* If CPU page table is not valid then we need to fault */
-	*fault = cpu_flags & HMM_PFN_VALID;
+	*fault = cpu_flags & range->flags[HMM_PFN_VALID];
 	/* Need to write fault ? */
-	if ((pfns & HMM_PFN_WRITE) && !(cpu_flags & HMM_PFN_WRITE)) {
+	if ((pfns & range->flags[HMM_PFN_WRITE]) &&
+	    !(cpu_flags & range->flags[HMM_PFN_WRITE])) {
 		*fault = *write_fault = false;
 		return;
 	}
 	/* Do we fault on device memory ? */
-	if ((pfns & HMM_PFN_DEVICE_PRIVATE) &&
-	    (cpu_flags & HMM_PFN_DEVICE_PRIVATE)) {
-		*write_fault = pfns & HMM_PFN_WRITE;
+	if ((pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) &&
+	    (cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
+		*write_fault = pfns & range->flags[HMM_PFN_WRITE];
 		*fault = true;
 	}
 }
@@ -396,13 +400,13 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 }
 
-static inline uint64_t pmd_to_hmm_pfn_flags(pmd_t pmd)
+static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
 {
 	if (pmd_protnone(pmd))
 		return 0;
-	return pmd_write(pmd) ? HMM_PFN_VALID |
-				HMM_PFN_WRITE :
-				HMM_PFN_VALID;
+	return pmd_write(pmd) ? range->flags[HMM_PFN_VALID] |
+				range->flags[HMM_PFN_WRITE] :
+				range->flags[HMM_PFN_VALID];
 }
 
 static int hmm_vma_handle_pmd(struct mm_walk *walk,
@@ -412,12 +416,13 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 			      pmd_t pmd)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long pfn, npages, i;
-	uint64_t flag = 0, cpu_flags;
 	bool fault, write_fault;
+	uint64_t cpu_flags;
 
 	npages = (end - addr) >> PAGE_SHIFT;
-	cpu_flags = pmd_to_hmm_pfn_flags(pmd);
+	cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);
 	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
 			     &fault, &write_fault);
 
@@ -425,20 +430,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
 	pfn = pmd_pfn(pmd) + pte_index(addr);
-	flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
 	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
-		pfns[i] = hmm_pfn_from_pfn(pfn) | flag;
+		pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags;
 	hmm_vma_walk->last = end;
 	return 0;
 }
 
-static inline uint64_t pte_to_hmm_pfn_flags(pte_t pte)
+static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
 {
 	if (pte_none(pte) || !pte_present(pte))
 		return 0;
-	return pte_write(pte) ? HMM_PFN_VALID |
-				HMM_PFN_WRITE :
-				HMM_PFN_VALID;
+	return pte_write(pte) ? range->flags[HMM_PFN_VALID] |
+				range->flags[HMM_PFN_WRITE] :
+				range->flags[HMM_PFN_VALID];
 }
 
 static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
@@ -446,18 +450,18 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			      uint64_t *pfns)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
 	bool fault, write_fault;
 	uint64_t cpu_flags;
 	pte_t pte = *ptep;
 
-	*pfns = 0;
-	cpu_flags = pte_to_hmm_pfn_flags(pte);
+	*pfns = range->flags[HMM_PFN_NONE];
+	cpu_flags = pte_to_hmm_pfn_flags(range, pte);
 	hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
 			   &fault, &write_fault);
 
 	if (pte_none(pte)) {
-		*pfns = 0;
 		if (fault || write_fault)
 			goto fault;
 		return 0;
@@ -477,11 +481,16 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		 * device and report anything else as error.
 		 */
 		if (is_device_private_entry(entry)) {
-			cpu_flags = HMM_PFN_VALID | HMM_PFN_DEVICE_PRIVATE;
+			cpu_flags = range->flags[HMM_PFN_VALID] |
+				    range->flags[HMM_PFN_DEVICE_PRIVATE];
 			cpu_flags |= is_write_device_private_entry(entry) ?
-					HMM_PFN_WRITE : 0;
-			*pfns = hmm_pfn_from_pfn(swp_offset(entry));
-			*pfns |= HMM_PFN_DEVICE_PRIVATE;
+					range->flags[HMM_PFN_WRITE] : 0;
+			hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
+					   &fault, &write_fault);
+			if (fault || write_fault)
+				goto fault;
+			*pfns = hmm_pfn_from_pfn(range, swp_offset(entry));
+			*pfns |= cpu_flags;
 			return 0;
 		}
 
@@ -504,7 +513,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (fault || write_fault)
 		goto fault;
 
-	*pfns = hmm_pfn_from_pfn(pte_pfn(pte)) | cpu_flags;
+	*pfns = hmm_pfn_from_pfn(range, pte_pfn(pte)) | cpu_flags;
 	return 0;
 
 fault:
@@ -573,12 +582,13 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	return 0;
 }
 
-static void hmm_pfns_clear(uint64_t *pfns,
+static void hmm_pfns_clear(struct hmm_range *range,
+			   uint64_t *pfns,
 			   unsigned long addr,
 			   unsigned long end)
 {
 	for (; addr < end; addr += PAGE_SIZE, pfns++)
-		*pfns = 0;
+		*pfns = range->flags[HMM_PFN_NONE];
 }
 
 static void hmm_pfns_special(struct hmm_range *range)
@@ -586,7 +596,7 @@ static void hmm_pfns_special(struct hmm_range *range)
 	unsigned long addr = range->start, i = 0;
 
 	for (; addr < range->end; addr += PAGE_SIZE, i++)
-		range->pfns[i] = HMM_PFN_SPECIAL;
+		range->pfns[i] = range->flags[HMM_PFN_SPECIAL];
 }
 
 /*
@@ -644,7 +654,7 @@ int hmm_vma_get_pfns(struct hmm_range *range)
 		 * is not a case we care about (some operation like atomic no
 		 * longer make sense).
 		 */
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return 0;
 	}
 
@@ -788,7 +798,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 
 	hmm = hmm_register(vma->vm_mm);
 	if (!hmm) {
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return -ENOMEM;
 	}
 	/* Caller must have registered a mirror using hmm_mirror_register() */
@@ -814,7 +824,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 		 * is not a case we care about (some operation like atomic no
 		 * longer make sense).
 		 */
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return 0;
 	}
 
@@ -841,7 +851,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 		unsigned long i;
 
 		i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
-		hmm_pfns_clear(&range->pfns[i], hmm_vma_walk.last, range->end);
+		hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last,
+			       range->end);
 		hmm_vma_range_done(range);
 	}
 	return ret;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 14/14] mm/hmm: use device driver encoding for HMM pfn
@ 2018-03-16 20:35   ` jglisse
  0 siblings, 0 replies; 16+ messages in thread
From: jglisse @ 2018-03-16 20:35 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, linux-kernel, Jérôme Glisse,
	Evgeny Baskakov, Ralph Campbell, Mark Hairgrove, John Hubbard

From: Jérôme Glisse <jglisse@redhat.com>

Users of hmm_vma_fault() and hmm_vma_get_pfns() provide a flags array
and a pfn shift value, allowing them to define their own encoding for
the HMM pfns that are filled inside the pfns array of the hmm_range
struct. With this, device drivers can get pfns that match their own
private encoding out of HMM without having to do any conversion.
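
For illustration, a driver could plug its own encoding in along these
lines (the values below are made up and only the shape matters; vma,
start, end, pfns and ret are assumed to come from the driver):

  /* Illustrative only: any distinct bits below pfn_shift would do. */
  static const uint64_t my_flags[HMM_PFN_FLAG_MAX] = {
  	[HMM_PFN_VALID]		 = 1UL << 0,
  	[HMM_PFN_WRITE]		 = 1UL << 1,
  	[HMM_PFN_ERROR]		 = 1UL << 2,
  	[HMM_PFN_NONE]		 = 0,
  	[HMM_PFN_SPECIAL]	 = 1UL << 3,
  	[HMM_PFN_DEVICE_PRIVATE] = 1UL << 4,
  };

  struct hmm_range range = {
  	.vma	   = vma,
  	.start	   = start,
  	.end	   = end,
  	.pfns	   = pfns,
  	.flags	   = my_flags,
  	.pfn_shift = 5,		/* flag bits stay below the pfn */
  };

  ret = hmm_vma_fault(&range, true);	/* or hmm_vma_get_pfns(&range) */
  /* On success, each range.pfns[i] is already in the driver's format. */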

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/hmm.h | 91 ++++++++++++++++++++++++++++++++---------------------
 mm/hmm.c            | 83 +++++++++++++++++++++++++++---------------------
 2 files changed, 102 insertions(+), 72 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index ee758c4e4bec..cb9af99f9371 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -80,68 +80,106 @@
 struct hmm;
 
 /*
+ * hmm_pfn_flag_e - HMM uses its own pfn type to keep several flags per page
+ *
  * Flags:
  * HMM_PFN_VALID: pfn is valid
  * HMM_PFN_WRITE: CPU page table has write permission set
  * HMM_PFN_ERROR: corresponding CPU page table entry points to poisoned memory
+ * HMM_PFN_EMPTY: corresponding CPU page table entry is pte_none()
  * HMM_PFN_SPECIAL: corresponding CPU page table entry is special; i.e., the
  *      result of vm_insert_pfn() or vm_insert_page(). Therefore, it should not
  *      be mirrored by a device, because the entry will never have HMM_PFN_VALID
  *      set and the pfn value is undefined.
- * HMM_PFN_DEVICE_PRIVATE: unaddressable device memory (ZONE_DEVICE)
+ * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
+ */
+enum hmm_pfn_flag_e {
+	HMM_PFN_VALID = 0,
+	HMM_PFN_WRITE,
+	HMM_PFN_ERROR,
+	HMM_PFN_NONE,
+	HMM_PFN_SPECIAL,
+	HMM_PFN_DEVICE_PRIVATE,
+	HMM_PFN_FLAG_MAX
+};
+
+/*
+ * struct hmm_range - track invalidation lock on virtual address range
+ *
+ * @vma: the vm area struct for the range
+ * @list: all range lock are on a list
+ * @start: range virtual start address (inclusive)
+ * @end: range virtual end address (exclusive)
+ * @pfns: array of pfns (big enough for the range)
+ * @flags: pfn flags to match device driver page table
+ * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
+ * @valid: pfns array did not change since it has been fill by an HMM function
  */
-#define HMM_PFN_VALID (1 << 0)
-#define HMM_PFN_WRITE (1 << 1)
-#define HMM_PFN_ERROR (1 << 2)
-#define HMM_PFN_SPECIAL (1 << 3)
-#define HMM_PFN_DEVICE_PRIVATE (1 << 4)
-#define HMM_PFN_SHIFT 5
+struct hmm_range {
+	struct vm_area_struct	*vma;
+	struct list_head	list;
+	unsigned long		start;
+	unsigned long		end;
+	uint64_t		*pfns;
+	const uint64_t		*flags;
+	uint8_t			pfn_shift;
+	bool			valid;
+};
 
 /*
  * hmm_pfn_to_page() - return struct page pointed to by a valid HMM pfn
+ * @range: range use to decode HMM pfn value
  * @pfn: HMM pfn value to get corresponding struct page from
  * Returns: struct page pointer if pfn is a valid HMM pfn, NULL otherwise
  *
  * If the uint64_t is valid (ie valid flag set) then return the struct page
  * matching the pfn value stored in the HMM pfn. Otherwise return NULL.
  */
-static inline struct page *hmm_pfn_to_page(uint64_t pfn)
+static inline struct page *hmm_pfn_to_page(const struct hmm_range *range,
+					   uint64_t pfn)
 {
-	if (!(pfn & HMM_PFN_VALID))
+	if (!(pfn & range->flags[HMM_PFN_VALID]))
 		return NULL;
-	return pfn_to_page(pfn >> HMM_PFN_SHIFT);
+	return pfn_to_page(pfn >> range->pfn_shift);
 }
 
 /*
  * hmm_pfn_to_pfn() - return pfn value store in a HMM pfn
+ * @range: range use to decode HMM pfn value
  * @pfn: HMM pfn value to extract pfn from
  * Returns: pfn value if HMM pfn is valid, -1UL otherwise
  */
-static inline unsigned long hmm_pfn_to_pfn(uint64_t pfn)
+static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range,
+					   uint64_t pfn)
 {
-	if (!(pfn & HMM_PFN_VALID))
+	if (!(pfn & range->flags[HMM_PFN_VALID]))
 		return -1UL;
-	return (pfn >> HMM_PFN_SHIFT);
+	return (pfn >> range->pfn_shift);
 }
 
 /*
  * hmm_pfn_from_page() - create a valid HMM pfn value from struct page
+ * @range: range use to encode HMM pfn value
  * @page: struct page pointer for which to create the HMM pfn
  * Returns: valid HMM pfn for the page
  */
-static inline uint64_t hmm_pfn_from_page(struct page *page)
+static inline uint64_t hmm_pfn_from_page(const struct hmm_range *range,
+					 struct page *page)
 {
-	return (page_to_pfn(page) << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+	return (page_to_pfn(page) << range->pfn_shift) |
+		range->flags[HMM_PFN_VALID];
 }
 
 /*
  * hmm_pfn_from_pfn() - create a valid HMM pfn value from pfn
+ * @range: range use to encode HMM pfn value
  * @pfn: pfn value for which to create the HMM pfn
  * Returns: valid HMM pfn for the pfn
  */
-static inline uint64_t hmm_pfn_from_pfn(unsigned long pfn)
+static inline uint64_t hmm_pfn_from_pfn(const struct hmm_range *range,
+					unsigned long pfn)
 {
-	return (pfn << HMM_PFN_SHIFT) | HMM_PFN_VALID;
+	return (pfn << range->pfn_shift) | range->flags[HMM_PFN_VALID];
 }
 
 
@@ -263,25 +301,6 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
 void hmm_mirror_unregister(struct hmm_mirror *mirror);
 
 
-/*
- * struct hmm_range - track invalidation lock on virtual address range
- *
- * @vma: the vm area struct for the range
- * @list: all range lock are on a list
- * @start: range virtual start address (inclusive)
- * @end: range virtual end address (exclusive)
- * @pfns: array of pfns (big enough for the range)
- * @valid: pfns array did not change since it has been fill by an HMM function
- */
-struct hmm_range {
-	struct vm_area_struct	*vma;
-	struct list_head	list;
-	unsigned long		start;
-	unsigned long		end;
-	uint64_t		*pfns;
-	bool			valid;
-};
-
 /*
  * To snapshot the CPU page table, call hmm_vma_get_pfns(), then take a device
  * driver lock that serializes device page table updates, then call
diff --git a/mm/hmm.c b/mm/hmm.c
index 0ea530d0fd1d..7ccca5478ea1 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -263,6 +263,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 {
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
 	int r;
 
@@ -272,7 +273,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 	if (r & VM_FAULT_RETRY)
 		return -EBUSY;
 	if (r & VM_FAULT_ERROR) {
-		*pfn = HMM_PFN_ERROR;
+		*pfn = range->flags[HMM_PFN_ERROR];
 		return -EFAULT;
 	}
 
@@ -290,7 +291,7 @@ static int hmm_pfns_bad(unsigned long addr,
 
 	i = (addr - range->start) >> PAGE_SHIFT;
 	for (; addr < end; addr += PAGE_SIZE, i++)
-		pfns[i] = HMM_PFN_ERROR;
+		pfns[i] = range->flags[HMM_PFN_ERROR];
 
 	return 0;
 }
@@ -319,7 +320,7 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
 	hmm_vma_walk->last = addr;
 	i = (addr - range->start) >> PAGE_SHIFT;
 	for (; addr < end; addr += PAGE_SIZE, i++) {
-		pfns[i] = 0;
+		pfns[i] = range->flags[HMM_PFN_NONE];
 		if (fault || write_fault) {
 			int ret;
 
@@ -337,24 +338,27 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
 				      uint64_t pfns, uint64_t cpu_flags,
 				      bool *fault, bool *write_fault)
 {
+	struct hmm_range *range = hmm_vma_walk->range;
+
 	*fault = *write_fault = false;
 	if (!hmm_vma_walk->fault)
 		return;
 
 	/* We aren't ask to do anything ... */
-	if (!(pfns & HMM_PFN_VALID))
+	if (!(pfns & range->flags[HMM_PFN_VALID]))
 		return;
 	/* If CPU page table is not valid then we need to fault */
-	*fault = cpu_flags & HMM_PFN_VALID;
+	*fault = cpu_flags & range->flags[HMM_PFN_VALID];
 	/* Need to write fault ? */
-	if ((pfns & HMM_PFN_WRITE) && !(cpu_flags & HMM_PFN_WRITE)) {
+	if ((pfns & range->flags[HMM_PFN_WRITE]) &&
+	    !(cpu_flags & range->flags[HMM_PFN_WRITE])) {
 		*fault = *write_fault = false;
 		return;
 	}
 	/* Do we fault on device memory ? */
-	if ((pfns & HMM_PFN_DEVICE_PRIVATE) &&
-	    (cpu_flags & HMM_PFN_DEVICE_PRIVATE)) {
-		*write_fault = pfns & HMM_PFN_WRITE;
+	if ((pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) &&
+	    (cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
+		*write_fault = pfns & range->flags[HMM_PFN_WRITE];
 		*fault = true;
 	}
 }
@@ -396,13 +400,13 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 }
 
-static inline uint64_t pmd_to_hmm_pfn_flags(pmd_t pmd)
+static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
 {
 	if (pmd_protnone(pmd))
 		return 0;
-	return pmd_write(pmd) ? HMM_PFN_VALID |
-				HMM_PFN_WRITE :
-				HMM_PFN_VALID;
+	return pmd_write(pmd) ? range->flags[HMM_PFN_VALID] |
+				range->flags[HMM_PFN_WRITE] :
+				range->flags[HMM_PFN_VALID];
 }
 
 static int hmm_vma_handle_pmd(struct mm_walk *walk,
@@ -412,12 +416,13 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 			      pmd_t pmd)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long pfn, npages, i;
-	uint64_t flag = 0, cpu_flags;
 	bool fault, write_fault;
+	uint64_t cpu_flags;
 
 	npages = (end - addr) >> PAGE_SHIFT;
-	cpu_flags = pmd_to_hmm_pfn_flags(pmd);
+	cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);
 	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
 			     &fault, &write_fault);
 
@@ -425,20 +430,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
 	pfn = pmd_pfn(pmd) + pte_index(addr);
-	flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
 	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
-		pfns[i] = hmm_pfn_from_pfn(pfn) | flag;
+		pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags;
 	hmm_vma_walk->last = end;
 	return 0;
 }
 
-static inline uint64_t pte_to_hmm_pfn_flags(pte_t pte)
+static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
 {
 	if (pte_none(pte) || !pte_present(pte))
 		return 0;
-	return pte_write(pte) ? HMM_PFN_VALID |
-				HMM_PFN_WRITE :
-				HMM_PFN_VALID;
+	return pte_write(pte) ? range->flags[HMM_PFN_VALID] |
+				range->flags[HMM_PFN_WRITE] :
+				range->flags[HMM_PFN_VALID];
 }
 
 static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
@@ -446,18 +450,18 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			      uint64_t *pfns)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
 	bool fault, write_fault;
 	uint64_t cpu_flags;
 	pte_t pte = *ptep;
 
-	*pfns = 0;
-	cpu_flags = pte_to_hmm_pfn_flags(pte);
+	*pfns = range->flags[HMM_PFN_NONE];
+	cpu_flags = pte_to_hmm_pfn_flags(range, pte);
 	hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
 			   &fault, &write_fault);
 
 	if (pte_none(pte)) {
-		*pfns = 0;
 		if (fault || write_fault)
 			goto fault;
 		return 0;
@@ -477,11 +481,16 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		 * device and report anything else as error.
 		 */
 		if (is_device_private_entry(entry)) {
-			cpu_flags = HMM_PFN_VALID | HMM_PFN_DEVICE_PRIVATE;
+			cpu_flags = range->flags[HMM_PFN_VALID] |
+				    range->flags[HMM_PFN_DEVICE_PRIVATE];
 			cpu_flags |= is_write_device_private_entry(entry) ?
-					HMM_PFN_WRITE : 0;
-			*pfns = hmm_pfn_from_pfn(swp_offset(entry));
-			*pfns |= HMM_PFN_DEVICE_PRIVATE;
+					range->flags[HMM_PFN_WRITE] : 0;
+			hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
+					   &fault, &write_fault);
+			if (fault || write_fault)
+				goto fault;
+			*pfns = hmm_pfn_from_pfn(range, swp_offset(entry));
+			*pfns |= cpu_flags;
 			return 0;
 		}
 
@@ -504,7 +513,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (fault || write_fault)
 		goto fault;
 
-	*pfns = hmm_pfn_from_pfn(pte_pfn(pte)) | cpu_flags;
+	*pfns = hmm_pfn_from_pfn(range, pte_pfn(pte)) | cpu_flags;
 	return 0;
 
 fault:
@@ -573,12 +582,13 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	return 0;
 }
 
-static void hmm_pfns_clear(uint64_t *pfns,
+static void hmm_pfns_clear(struct hmm_range *range,
+			   uint64_t *pfns,
 			   unsigned long addr,
 			   unsigned long end)
 {
 	for (; addr < end; addr += PAGE_SIZE, pfns++)
-		*pfns = 0;
+		*pfns = range->flags[HMM_PFN_NONE];
 }
 
 static void hmm_pfns_special(struct hmm_range *range)
@@ -586,7 +596,7 @@ static void hmm_pfns_special(struct hmm_range *range)
 	unsigned long addr = range->start, i = 0;
 
 	for (; addr < range->end; addr += PAGE_SIZE, i++)
-		range->pfns[i] = HMM_PFN_SPECIAL;
+		range->pfns[i] = range->flags[HMM_PFN_SPECIAL];
 }
 
 /*
@@ -644,7 +654,7 @@ int hmm_vma_get_pfns(struct hmm_range *range)
 		 * is not a case we care about (some operation like atomic no
 		 * longer make sense).
 		 */
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return 0;
 	}
 
@@ -788,7 +798,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 
 	hmm = hmm_register(vma->vm_mm);
 	if (!hmm) {
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return -ENOMEM;
 	}
 	/* Caller must have registered a mirror using hmm_mirror_register() */
@@ -814,7 +824,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 		 * is not a case we care about (some operation like atomic no
 		 * longer make sense).
 		 */
-		hmm_pfns_clear(range->pfns, range->start, range->end);
+		hmm_pfns_clear(range, range->pfns, range->start, range->end);
 		return 0;
 	}
 
@@ -841,7 +851,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
 		unsigned long i;
 
 		i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
-		hmm_pfns_clear(&range->pfns[i], hmm_vma_walk.last, range->end);
+		hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last,
+			       range->end);
 		hmm_vma_range_done(range);
 	}
 	return ret;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/14] mm/hmm: rename HMM_PFN_DEVICE_UNADDRESSABLE to HMM_PFN_DEVICE_PRIVATE
  2018-03-16 20:35 ` jglisse
@ 2018-03-19 23:09   ` John Hubbard
  -1 siblings, 0 replies; 16+ messages in thread
From: John Hubbard @ 2018-03-19 23:09 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Evgeny Baskakov, Ralph Campbell,
	Mark Hairgrove

On 03/16/2018 01:35 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> Make naming consistent accross code, DEVICE_PRIVATE is the name use
> outside HMM code so use that one.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/hmm.h | 4 ++--
>  mm/hmm.c            | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

Seems entirely harmless. :)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 6d2b6bf6da4b..78018b3e7a9f 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -88,13 +88,13 @@ struct hmm;
>   *      result of vm_insert_pfn() or vm_insert_page(). Therefore, it should not
>   *      be mirrored by a device, because the entry will never have HMM_PFN_VALID
>   *      set and the pfn value is undefined.
> - * HMM_PFN_DEVICE_UNADDRESSABLE: unaddressable device memory (ZONE_DEVICE)
> + * HMM_PFN_DEVICE_PRIVATE: unaddressable device memory (ZONE_DEVICE)
>   */
>  #define HMM_PFN_VALID (1 << 0)
>  #define HMM_PFN_WRITE (1 << 1)
>  #define HMM_PFN_ERROR (1 << 2)
>  #define HMM_PFN_SPECIAL (1 << 3)
> -#define HMM_PFN_DEVICE_UNADDRESSABLE (1 << 4)
> +#define HMM_PFN_DEVICE_PRIVATE (1 << 4)
>  #define HMM_PFN_SHIFT 5
>  
>  /*
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 2118e42cb838..857eec622c98 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -429,7 +429,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  					pfns[i] |= HMM_PFN_WRITE;
>  				} else if (write_fault)
>  					goto fault;
> -				pfns[i] |= HMM_PFN_DEVICE_UNADDRESSABLE;
> +				pfns[i] |= HMM_PFN_DEVICE_PRIVATE;
>  			} else if (is_migration_entry(entry)) {
>  				if (hmm_vma_walk->fault) {
>  					pte_unmap(ptep);
> 

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 11/14] mm/hmm: move hmm_pfns_clear() closer to where it is use
  2018-03-16 20:35   ` jglisse
@ 2018-03-19 23:12     ` John Hubbard
  -1 siblings, 0 replies; 16+ messages in thread
From: John Hubbard @ 2018-03-19 23:12 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Evgeny Baskakov, Ralph Campbell,
	Mark Hairgrove

On 03/16/2018 01:35 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> Move hmm_pfns_clear() closer to where it is used, to make it clear it
> is not used by page table walkers.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>  mm/hmm.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 857eec622c98..3a708f500b80 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -297,14 +297,6 @@ static int hmm_pfns_bad(unsigned long addr,
>  	return 0;
>  }
>  
> -static void hmm_pfns_clear(uint64_t *pfns,
> -			   unsigned long addr,
> -			   unsigned long end)
> -{
> -	for (; addr < end; addr += PAGE_SIZE, pfns++)
> -		*pfns = 0;
> -}
> -
>  /*
>   * hmm_vma_walk_hole() - handle a range back by no pmd or no pte
>   * @start: range virtual start address (inclusive)
> @@ -463,6 +455,14 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  	return 0;
>  }
>  
> +static void hmm_pfns_clear(uint64_t *pfns,
> +			   unsigned long addr,
> +			   unsigned long end)
> +{
> +	for (; addr < end; addr += PAGE_SIZE, pfns++)
> +		*pfns = 0;
> +}
> +

Yep, identical, so no functional changes.

>  static void hmm_pfns_special(struct hmm_range *range)
>  {
>  	unsigned long addr = range->start, i = 0;

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 14/14] mm/hmm: use device driver encoding for HMM pfn
  2018-03-16 20:35   ` jglisse
@ 2018-03-19 23:20     ` John Hubbard
  -1 siblings, 0 replies; 16+ messages in thread
From: John Hubbard @ 2018-03-19 23:20 UTC (permalink / raw)
  To: jglisse, linux-mm
  Cc: Andrew Morton, linux-kernel, Evgeny Baskakov, Ralph Campbell,
	Mark Hairgrove

On 03/16/2018 01:35 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> Users of hmm_vma_fault() and hmm_vma_get_pfns() provide a flags array
> and a pfn shift value, allowing them to define their own encoding for
> the HMM pfns that are filled inside the pfns array of the hmm_range
> struct. With this, device drivers can get pfns that match their own
> private encoding out of HMM without having to do any conversion.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mark Hairgrove <mhairgrove@nvidia.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/hmm.h | 91 ++++++++++++++++++++++++++++++++---------------------
>  mm/hmm.c            | 83 +++++++++++++++++++++++++++---------------------
>  2 files changed, 102 insertions(+), 72 deletions(-)
> 
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index ee758c4e4bec..cb9af99f9371 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -80,68 +80,106 @@
>  struct hmm;
>  
>  /*
> + * hmm_pfn_flag_e - HMM uses its own pfn type to keep several flags per page

OK, so here's the patch that switches over from bits to enum-based flags. But it is
still mysterious to me.

Maybe this is the place to write some details about how this array of flags actually
works. At first reading it is deeply confusing.
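
Something like the following in the comment block would already help --
completely made-up values, just to spell out the indirection:

  /*
   * Hypothetical driver: bit 3 is "valid", bit 7 is "write", and the
   * pfn itself starts at bit 8.
   */
  static const uint64_t drv_flags[HMM_PFN_FLAG_MAX] = {
  	[HMM_PFN_VALID] = 1UL << 3,
  	[HMM_PFN_WRITE] = 1UL << 7,
  	/* ... */
  };

  /*
   * With range->flags = drv_flags and range->pfn_shift = 8,
   * pte_to_hmm_pfn_flags() returns (1 << 3) | (1 << 7) for a writable
   * CPU pte, and hmm_pfn_from_pfn() builds (pfn << 8) | (1 << 3), i.e.
   * an entry the driver can drop straight into its own page table.
   */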

p.s. I still need to review the large patches: #11-13. I should get to those tomorrow 
morning. 

thanks,
-- 
John Hubbard
NVIDIA

> + *
>   * Flags:
>   * HMM_PFN_VALID: pfn is valid
>   * HMM_PFN_WRITE: CPU page table has write permission set
>   * HMM_PFN_ERROR: corresponding CPU page table entry points to poisoned memory
> + * HMM_PFN_EMPTY: corresponding CPU page table entry is pte_none()
>   * HMM_PFN_SPECIAL: corresponding CPU page table entry is special; i.e., the
>   *      result of vm_insert_pfn() or vm_insert_page(). Therefore, it should not
>   *      be mirrored by a device, because the entry will never have HMM_PFN_VALID
>   *      set and the pfn value is undefined.
> - * HMM_PFN_DEVICE_PRIVATE: unaddressable device memory (ZONE_DEVICE)
> + * HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
> + */
> +enum hmm_pfn_flag_e {
> +	HMM_PFN_VALID = 0,
> +	HMM_PFN_WRITE,
> +	HMM_PFN_ERROR,
> +	HMM_PFN_NONE,
> +	HMM_PFN_SPECIAL,
> +	HMM_PFN_DEVICE_PRIVATE,
> +	HMM_PFN_FLAG_MAX
> +};
> +
> +/*
> + * struct hmm_range - track invalidation lock on virtual address range
> + *
> + * @vma: the vm area struct for the range
> + * @list: all range lock are on a list
> + * @start: range virtual start address (inclusive)
> + * @end: range virtual end address (exclusive)
> + * @pfns: array of pfns (big enough for the range)
> + * @flags: pfn flags to match device driver page table
> + * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
> + * @valid: pfns array did not change since it has been fill by an HMM function
>   */
> -#define HMM_PFN_VALID (1 << 0)
> -#define HMM_PFN_WRITE (1 << 1)
> -#define HMM_PFN_ERROR (1 << 2)
> -#define HMM_PFN_SPECIAL (1 << 3)
> -#define HMM_PFN_DEVICE_PRIVATE (1 << 4)
> -#define HMM_PFN_SHIFT 5
> +struct hmm_range {
> +	struct vm_area_struct	*vma;
> +	struct list_head	list;
> +	unsigned long		start;
> +	unsigned long		end;
> +	uint64_t		*pfns;
> +	const uint64_t		*flags;
> +	uint8_t			pfn_shift;
> +	bool			valid;
> +};
>  
>  /*
>   * hmm_pfn_to_page() - return struct page pointed to by a valid HMM pfn
> + * @range: range use to decode HMM pfn value
>   * @pfn: HMM pfn value to get corresponding struct page from
>   * Returns: struct page pointer if pfn is a valid HMM pfn, NULL otherwise
>   *
>   * If the uint64_t is valid (ie valid flag set) then return the struct page
>   * matching the pfn value stored in the HMM pfn. Otherwise return NULL.
>   */
> -static inline struct page *hmm_pfn_to_page(uint64_t pfn)
> +static inline struct page *hmm_pfn_to_page(const struct hmm_range *range,
> +					   uint64_t pfn)
>  {
> -	if (!(pfn & HMM_PFN_VALID))
> +	if (!(pfn & range->flags[HMM_PFN_VALID]))
>  		return NULL;
> -	return pfn_to_page(pfn >> HMM_PFN_SHIFT);
> +	return pfn_to_page(pfn >> range->pfn_shift);
>  }
>  
>  /*
>   * hmm_pfn_to_pfn() - return pfn value store in a HMM pfn
> + * @range: range use to decode HMM pfn value
>   * @pfn: HMM pfn value to extract pfn from
>   * Returns: pfn value if HMM pfn is valid, -1UL otherwise
>   */
> -static inline unsigned long hmm_pfn_to_pfn(uint64_t pfn)
> +static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range,
> +					   uint64_t pfn)
>  {
> -	if (!(pfn & HMM_PFN_VALID))
> +	if (!(pfn & range->flags[HMM_PFN_VALID]))
>  		return -1UL;
> -	return (pfn >> HMM_PFN_SHIFT);
> +	return (pfn >> range->pfn_shift);
>  }
>  
>  /*
>   * hmm_pfn_from_page() - create a valid HMM pfn value from struct page
> + * @range: range use to encode HMM pfn value
>   * @page: struct page pointer for which to create the HMM pfn
>   * Returns: valid HMM pfn for the page
>   */
> -static inline uint64_t hmm_pfn_from_page(struct page *page)
> +static inline uint64_t hmm_pfn_from_page(const struct hmm_range *range,
> +					 struct page *page)
>  {
> -	return (page_to_pfn(page) << HMM_PFN_SHIFT) | HMM_PFN_VALID;
> +	return (page_to_pfn(page) << range->pfn_shift) |
> +		range->flags[HMM_PFN_VALID];
>  }
>  
>  /*
>   * hmm_pfn_from_pfn() - create a valid HMM pfn value from pfn
> + * @range: range use to encode HMM pfn value
>   * @pfn: pfn value for which to create the HMM pfn
>   * Returns: valid HMM pfn for the pfn
>   */
> -static inline uint64_t hmm_pfn_from_pfn(unsigned long pfn)
> +static inline uint64_t hmm_pfn_from_pfn(const struct hmm_range *range,
> +					unsigned long pfn)
>  {
> -	return (pfn << HMM_PFN_SHIFT) | HMM_PFN_VALID;
> +	return (pfn << range->pfn_shift) | range->flags[HMM_PFN_VALID];
>  }
>  
>  
> @@ -263,25 +301,6 @@ int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm);
>  void hmm_mirror_unregister(struct hmm_mirror *mirror);
>  
>  
> -/*
> - * struct hmm_range - track invalidation lock on virtual address range
> - *
> - * @vma: the vm area struct for the range
> - * @list: all range lock are on a list
> - * @start: range virtual start address (inclusive)
> - * @end: range virtual end address (exclusive)
> - * @pfns: array of pfns (big enough for the range)
> - * @valid: pfns array did not change since it has been fill by an HMM function
> - */
> -struct hmm_range {
> -	struct vm_area_struct	*vma;
> -	struct list_head	list;
> -	unsigned long		start;
> -	unsigned long		end;
> -	uint64_t		*pfns;
> -	bool			valid;
> -};
> -
>  /*
>   * To snapshot the CPU page table, call hmm_vma_get_pfns(), then take a device
>   * driver lock that serializes device page table updates, then call
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 0ea530d0fd1d..7ccca5478ea1 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -263,6 +263,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
>  {
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_REMOTE;
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
>  	struct vm_area_struct *vma = walk->vma;
>  	int r;
>  
> @@ -272,7 +273,7 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
>  	if (r & VM_FAULT_RETRY)
>  		return -EBUSY;
>  	if (r & VM_FAULT_ERROR) {
> -		*pfn = HMM_PFN_ERROR;
> +		*pfn = range->flags[HMM_PFN_ERROR];
>  		return -EFAULT;
>  	}
>  
> @@ -290,7 +291,7 @@ static int hmm_pfns_bad(unsigned long addr,
>  
>  	i = (addr - range->start) >> PAGE_SHIFT;
>  	for (; addr < end; addr += PAGE_SIZE, i++)
> -		pfns[i] = HMM_PFN_ERROR;
> +		pfns[i] = range->flags[HMM_PFN_ERROR];
>  
>  	return 0;
>  }
> @@ -319,7 +320,7 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
>  	hmm_vma_walk->last = addr;
>  	i = (addr - range->start) >> PAGE_SHIFT;
>  	for (; addr < end; addr += PAGE_SIZE, i++) {
> -		pfns[i] = 0;
> +		pfns[i] = range->flags[HMM_PFN_NONE];
>  		if (fault || write_fault) {
>  			int ret;
>  
> @@ -337,24 +338,27 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
>  				      uint64_t pfns, uint64_t cpu_flags,
>  				      bool *fault, bool *write_fault)
>  {
> +	struct hmm_range *range = hmm_vma_walk->range;
> +
>  	*fault = *write_fault = false;
>  	if (!hmm_vma_walk->fault)
>  		return;
>  
>  	/* We aren't ask to do anything ... */
> -	if (!(pfns & HMM_PFN_VALID))
> +	if (!(pfns & range->flags[HMM_PFN_VALID]))
>  		return;
>  	/* If CPU page table is not valid then we need to fault */
> -	*fault = cpu_flags & HMM_PFN_VALID;
> +	*fault = cpu_flags & range->flags[HMM_PFN_VALID];
>  	/* Need to write fault ? */
> -	if ((pfns & HMM_PFN_WRITE) && !(cpu_flags & HMM_PFN_WRITE)) {
> +	if ((pfns & range->flags[HMM_PFN_WRITE]) &&
> +	    !(cpu_flags & range->flags[HMM_PFN_WRITE])) {
>  		*fault = *write_fault = false;
>  		return;
>  	}
>  	/* Do we fault on device memory ? */
> -	if ((pfns & HMM_PFN_DEVICE_PRIVATE) &&
> -	    (cpu_flags & HMM_PFN_DEVICE_PRIVATE)) {
> -		*write_fault = pfns & HMM_PFN_WRITE;
> +	if ((pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) &&
> +	    (cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
> +		*write_fault = pfns & range->flags[HMM_PFN_WRITE];
>  		*fault = true;
>  	}
>  }
> @@ -396,13 +400,13 @@ static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
>  	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
>  }
>  
> -static inline uint64_t pmd_to_hmm_pfn_flags(pmd_t pmd)
> +static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
>  {
>  	if (pmd_protnone(pmd))
>  		return 0;
> -	return pmd_write(pmd) ? HMM_PFN_VALID |
> -				HMM_PFN_WRITE :
> -				HMM_PFN_VALID;
> +	return pmd_write(pmd) ? range->flags[HMM_PFN_VALID] |
> +				range->flags[HMM_PFN_WRITE] :
> +				range->flags[HMM_PFN_VALID];
>  }
>  
>  static int hmm_vma_handle_pmd(struct mm_walk *walk,
> @@ -412,12 +416,13 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>  			      pmd_t pmd)
>  {
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
>  	unsigned long pfn, npages, i;
> -	uint64_t flag = 0, cpu_flags;
>  	bool fault, write_fault;
> +	uint64_t cpu_flags;
>  
>  	npages = (end - addr) >> PAGE_SHIFT;
> -	cpu_flags = pmd_to_hmm_pfn_flags(pmd);
> +	cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);
>  	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
>  			     &fault, &write_fault);
>  
> @@ -425,20 +430,19 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>  		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
>  
>  	pfn = pmd_pfn(pmd) + pte_index(addr);
> -	flag |= pmd_write(pmd) ? HMM_PFN_WRITE : 0;
>  	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
> -		pfns[i] = hmm_pfn_from_pfn(pfn) | flag;
> +		pfns[i] = hmm_pfn_from_pfn(range, pfn) | cpu_flags;
>  	hmm_vma_walk->last = end;
>  	return 0;
>  }
>  
> -static inline uint64_t pte_to_hmm_pfn_flags(pte_t pte)
> +static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
>  {
>  	if (pte_none(pte) || !pte_present(pte))
>  		return 0;
> -	return pte_write(pte) ? HMM_PFN_VALID |
> -				HMM_PFN_WRITE :
> -				HMM_PFN_VALID;
> +	return pte_write(pte) ? range->flags[HMM_PFN_VALID] |
> +				range->flags[HMM_PFN_WRITE] :
> +				range->flags[HMM_PFN_VALID];
>  }
>  
>  static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> @@ -446,18 +450,18 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  			      uint64_t *pfns)
>  {
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
>  	struct vm_area_struct *vma = walk->vma;
>  	bool fault, write_fault;
>  	uint64_t cpu_flags;
>  	pte_t pte = *ptep;
>  
> -	*pfns = 0;
> -	cpu_flags = pte_to_hmm_pfn_flags(pte);
> +	*pfns = range->flags[HMM_PFN_NONE];
> +	cpu_flags = pte_to_hmm_pfn_flags(range, pte);
>  	hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
>  			   &fault, &write_fault);
>  
>  	if (pte_none(pte)) {
> -		*pfns = 0;
>  		if (fault || write_fault)
>  			goto fault;
>  		return 0;
> @@ -477,11 +481,16 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  		 * device and report anything else as error.
>  		 */
>  		if (is_device_private_entry(entry)) {
> -			cpu_flags = HMM_PFN_VALID | HMM_PFN_DEVICE_PRIVATE;
> +			cpu_flags = range->flags[HMM_PFN_VALID] |
> +				    range->flags[HMM_PFN_DEVICE_PRIVATE];
>  			cpu_flags |= is_write_device_private_entry(entry) ?
> -					HMM_PFN_WRITE : 0;
> -			*pfns = hmm_pfn_from_pfn(swp_offset(entry));
> -			*pfns |= HMM_PFN_DEVICE_PRIVATE;
> +					range->flags[HMM_PFN_WRITE] : 0;
> +			hmm_pte_need_fault(hmm_vma_walk, *pfns, cpu_flags,
> +					   &fault, &write_fault);
> +			if (fault || write_fault)
> +				goto fault;
> +			*pfns = hmm_pfn_from_pfn(range, swp_offset(entry));
> +			*pfns |= cpu_flags;
>  			return 0;
>  		}
>  
> @@ -504,7 +513,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>  	if (fault || write_fault)
>  		goto fault;
>  
> -	*pfns = hmm_pfn_from_pfn(pte_pfn(pte)) | cpu_flags;
> +	*pfns = hmm_pfn_from_pfn(range, pte_pfn(pte)) | cpu_flags;
>  	return 0;
>  
>  fault:
> @@ -573,12 +582,13 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  	return 0;
>  }
>  
> -static void hmm_pfns_clear(uint64_t *pfns,
> +static void hmm_pfns_clear(struct hmm_range *range,
> +			   uint64_t *pfns,
>  			   unsigned long addr,
>  			   unsigned long end)
>  {
>  	for (; addr < end; addr += PAGE_SIZE, pfns++)
> -		*pfns = 0;
> +		*pfns = range->flags[HMM_PFN_NONE];
>  }
>  
>  static void hmm_pfns_special(struct hmm_range *range)
> @@ -586,7 +596,7 @@ static void hmm_pfns_special(struct hmm_range *range)
>  	unsigned long addr = range->start, i = 0;
>  
>  	for (; addr < range->end; addr += PAGE_SIZE, i++)
> -		range->pfns[i] = HMM_PFN_SPECIAL;
> +		range->pfns[i] = range->flags[HMM_PFN_SPECIAL];
>  }
>  
>  /*
> @@ -644,7 +654,7 @@ int hmm_vma_get_pfns(struct hmm_range *range)
>  		 * is not a case we care about (some operation like atomic no
>  		 * longer make sense).
>  		 */
> -		hmm_pfns_clear(range->pfns, range->start, range->end);
> +		hmm_pfns_clear(range, range->pfns, range->start, range->end);
>  		return 0;
>  	}
>  
> @@ -788,7 +798,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
>  
>  	hmm = hmm_register(vma->vm_mm);
>  	if (!hmm) {
> -		hmm_pfns_clear(range->pfns, range->start, range->end);
> +		hmm_pfns_clear(range, range->pfns, range->start, range->end);
>  		return -ENOMEM;
>  	}
>  	/* Caller must have registered a mirror using hmm_mirror_register() */
> @@ -814,7 +824,7 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
>  		 * is not a case we care about (some operation like atomic no
>  		 * longer make sense).
>  		 */
> -		hmm_pfns_clear(range->pfns, range->start, range->end);
> +		hmm_pfns_clear(range, range->pfns, range->start, range->end);
>  		return 0;
>  	}
>  
> @@ -841,7 +851,8 @@ int hmm_vma_fault(struct hmm_range *range, bool block)
>  		unsigned long i;
>  
>  		i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
> -		hmm_pfns_clear(&range->pfns[i], hmm_vma_walk.last, range->end);
> +		hmm_pfns_clear(range, &range->pfns[i], hmm_vma_walk.last,
> +			       range->end);
>  		hmm_vma_range_done(range);
>  	}
>  	return ret;
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-03-19 23:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-16 20:35 [PATCH 10/14] mm/hmm: rename HMM_PFN_DEVICE_UNADDRESSABLE to HMM_PFN_DEVICE_PRIVATE jglisse
2018-03-16 20:35 ` jglisse
2018-03-16 20:35 ` [PATCH 11/14] mm/hmm: move hmm_pfns_clear() closer to where it is use jglisse
2018-03-16 20:35   ` jglisse
2018-03-19 23:12   ` John Hubbard
2018-03-19 23:12     ` John Hubbard
2018-03-16 20:35 ` [PATCH 12/14] mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd() jglisse
2018-03-16 20:35   ` jglisse
2018-03-16 20:35 ` [PATCH 13/14] mm/hmm: change hmm_vma_fault() to allow write fault on page basis jglisse
2018-03-16 20:35   ` jglisse
2018-03-16 20:35 ` [PATCH 14/14] mm/hmm: use device driver encoding for HMM pfn jglisse
2018-03-16 20:35   ` jglisse
2018-03-19 23:20   ` John Hubbard
2018-03-19 23:20     ` John Hubbard
2018-03-19 23:09 ` [PATCH 10/14] mm/hmm: rename HMM_PFN_DEVICE_UNADDRESSABLE to HMM_PFN_DEVICE_PRIVATE John Hubbard
2018-03-19 23:09   ` John Hubbard
