All of lore.kernel.org
* [PATCH  hmm 0/8] Various error case bug fixes for hmm_range_fault()
From: Jason Gunthorpe @ 2020-03-11 18:34 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

The hmm_range_fault() flow is fairly complicated. The scheme allows the
caller to specify, for each page, whether it needs a usable result or only
the current page table status filled in. This mixture of behavior is
useful for a caller that wants to build a 'prefetch around fault'
algorithm.

Although we have no in-tree users of this capability, I am working on
having RDMA ODP work in this manner, and removing these bugs from
hmm_range_fault() is a necessary step.

The basic principles are:

 - If the caller did not ask for a VA to be valid then hmm_range_fault()
   should not fail because of that VA

 - If 0 is returned from hmm_range_fault() then the entire pfns array
   contains valid data

 - HMM_PFN_ERROR is set if faulting fails, or if asking for faulting
   would fail

 - A 0 return from hmm_range_fault() does not set HMM_PFN_ERROR on any
   VAs the caller asked to be valid
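
As a minimal illustration of these rules, a caller could sanity check the
result of a walk roughly as below (the helper and the "required" bitmap
are hypothetical; only hmm_range_fault(), range->pfns and
range->values[HMM_PFN_ERROR] are the real interfaces touched by this
series):

	static int check_range_result(struct hmm_range *range,
				      const unsigned long *required)
	{
		unsigned long npages = (range->end - range->start) >> PAGE_SHIFT;
		unsigned long i;
		long ret;

		ret = hmm_range_fault(range, 0);
		if (ret < 0)
			return ret;	/* on failure, nothing in pfns[] is usable */

		/* On success the whole pfns[] array has been filled in */
		for (i = 0; i < npages; i++) {
			/* HMM_PFN_ERROR is only legal on pages not asked to be valid */
			if (range->pfns[i] == range->values[HMM_PFN_ERROR] &&
			    test_bit(i, required))
				return -EIO;
		}
		return 0;
	}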

This series does not get there completely; I have a follow-up series
closing some of the more complex cases.

I tested this series using the hmm tester Ralph posted a while back;
other testing would be appreciated.

Jason Gunthorpe (8):
  mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
  mm/hmm: don't free the cached pgmap while scanning
  mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
  mm/hmm: add missing pfns set to hmm_vma_walk_pmd()
  mm/hmm: add missing call to hmm_range_need_fault() before returning
    EFAULT
  mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()
  mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid
    pages
  mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL
    handling

 mm/hmm.c | 166 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 79 insertions(+), 87 deletions(-)

-- 
2.25.1



* [PATCH  hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
From: Jason Gunthorpe @ 2020-03-11 18:34 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

Many of the direct error returns skipped doing the pte_unmap(). All
non-zero exit paths must unmap the pte.

The pte_unmap() is split unnaturally like this because some of the error
exit paths trigger a sleep and must release the lock before sleeping.
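
Schematically, the contract after this patch is (a condensed sketch of the
hunks below, not new code):

	/* hmm_vma_walk_pmd(): the caller maps the pte page ... */
	ptep = pte_offset_map(pmdp, addr);
	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
		if (r)
			return r;	/* hmm_vma_handle_pte() already did pte_unmap() */
	}
	pte_unmap(ptep - 1);		/* ... and unmaps it only on the success path */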

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Fixes: 53f5c3f489ec ("mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 72e5a6d9a41756..35f85424176d14 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -325,6 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		}
 
 		/* Report error for everything else */
+		pte_unmap(ptep);
 		*pfn = range->values[HMM_PFN_ERROR];
 		return -EFAULT;
 	} else {
@@ -339,10 +340,13 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (pte_devmap(pte)) {
 		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
 					      hmm_vma_walk->pgmap);
-		if (unlikely(!hmm_vma_walk->pgmap))
+		if (unlikely(!hmm_vma_walk->pgmap)) {
+			pte_unmap(ptep);
 			return -EBUSY;
+		}
 	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
 		if (!is_zero_pfn(pte_pfn(pte))) {
+			pte_unmap(ptep);
 			*pfn = range->values[HMM_PFN_SPECIAL];
 			return -EFAULT;
 		}
@@ -437,7 +441,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 
 		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
 		if (r) {
-			/* hmm_vma_handle_pte() did unmap pte directory */
+			/* hmm_vma_handle_pte() did pte_unmap() */
 			hmm_vma_walk->last = addr;
 			return r;
 		}
-- 
2.25.1



* [PATCH  hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

The pgmap is held in the hmm_vma_walk variable in the hope of speeding up
future get_dev_pagemap() calls by hitting the same pointer. The algorithm
doesn't actually care how long the pgmap is held for.

Move the put of the cached pgmap to after the walk is completed and delete
all the other now redundant puts.

This solves a possible leak of the reference in hmm_vma_walk_pmd() if
hmm_vma_handle_pte() fails while looping.

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 31 +++++++++----------------------
 1 file changed, 9 insertions(+), 22 deletions(-)

We talked about just deleting this stuff, but I think it makes a lot of
sense for hmm_range_fault() to trigger a fault on devmap pages that are not
compatible with the caller - so let's just fix the leak on the error path
for now.

diff --git a/mm/hmm.c b/mm/hmm.c
index 35f85424176d14..9e8f68eb83287a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,10 +239,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
 		}
 		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	hmm_vma_walk->last = end;
 	return 0;
 }
@@ -360,10 +356,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	return 0;
 
 fault:
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep);
 	/* Fault any virtual address we were asked to fault */
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -446,16 +438,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			return r;
 		}
 	}
-	if (hmm_vma_walk->pgmap) {
-		/*
-		 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
-		 * so that we can leverage get_dev_pagemap() optimization which
-		 * will not re-take a reference on a pgmap if we already have
-		 * one.
-		 */
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep - 1);
 
 	hmm_vma_walk->last = addr;
@@ -529,10 +511,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				  cpu_flags;
 		}
-		if (hmm_vma_walk->pgmap) {
-			put_dev_pagemap(hmm_vma_walk->pgmap);
-			hmm_vma_walk->pgmap = NULL;
-		}
 		hmm_vma_walk->last = end;
 		goto out_unlock;
 	}
@@ -694,6 +672,15 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 			return -EBUSY;
 		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
 				      &hmm_walk_ops, &hmm_vma_walk);
+		/*
+		 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
+		 * searching in the probably common case that the pgmap is the
+		 * same for the entire requested range.
+		 */
+		if (hmm_vma_walk.pgmap) {
+			put_dev_pagemap(hmm_vma_walk.pgmap);
+			hmm_vma_walk.pgmap = NULL;
+		}
 	} while (ret == -EBUSY);
 
 	if (ret)
-- 
2.25.1



* [PATCH  hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe, Steven Price

From: Jason Gunthorpe <jgg@mellanox.com>

This eventually calls into handle_mm_fault(), which is a sleeping
function. Release the lock first.

hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
need the lock.

Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 9e8f68eb83287a..32dcbfd3908315 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 
 	pud = READ_ONCE(*pudp);
 	if (pud_none(pud)) {
-		ret = hmm_vma_walk_hole(start, end, -1, walk);
-		goto out_unlock;
+		spin_unlock(ptl);
+		return hmm_vma_walk_hole(start, end, -1, walk);
 	}
 
 	if (pud_huge(pud) && pud_devmap(pud)) {
@@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		bool fault, write_fault;
 
 		if (!pud_present(pud)) {
-			ret = hmm_vma_walk_hole(start, end, -1, walk);
-			goto out_unlock;
+			spin_unlock(ptl);
+			return hmm_vma_walk_hole(start, end, -1, walk);
 		}
 
 		i = (addr - range->start) >> PAGE_SHIFT;
@@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
 				     cpu_flags, &fault, &write_fault);
 		if (fault || write_fault) {
-			ret = hmm_vma_walk_hole_(addr, end, fault,
-						 write_fault, walk);
-			goto out_unlock;
+			spin_unlock(ptl);
+			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
+						  walk);
 		}
 
 		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-- 
2.25.1



* [PATCH  hmm 4/8] mm/hmm: add missing pfns set to hmm_vma_walk_pmd()
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

All success exit paths from the walker functions must set the pfns array.

A migration entry with no required fault is an HMM_PFN_NONE return, just
like the pte case.

Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 32dcbfd3908315..5f5ccf13dd1e85 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -394,7 +394,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			pmd_migration_entry_wait(walk->mm, pmdp);
 			return -EBUSY;
 		}
-		return 0;
+		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
 	} else if (!pmd_present(pmd))
 		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
 
-- 
2.25.1



* [PATCH  hmm 5/8] mm/hmm: add missing call to hmm_range_need_fault() before returning EFAULT
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

All return paths that do EFAULT must call hmm_range_need_fault() to
determine if the user requires this page to be valid.

If the page could not be made valid even if the user later required it
(due to the vma flags in this case), then the return should be
HMM_PFN_ERROR.

Fixes: a3e0d41c2b1f ("mm/hmm: improve driver API to work and wait over a range")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 5f5ccf13dd1e85..e10cd0adba7b37 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -582,18 +582,15 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
 	struct vm_area_struct *vma = walk->vma;
 
 	/*
-	 * Skip vma ranges that don't have struct page backing them or
-	 * map I/O devices directly.
-	 */
-	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
-		return -EFAULT;
-
-	/*
+	 * Skip vma ranges that don't have struct page backing them or map I/O
+	 * devices directly.
+	 *
 	 * If the vma does not allow read access, then assume that it does not
-	 * allow write access either. HMM does not support architectures
-	 * that allow write without read.
+	 * allow write access either. HMM does not support architectures that
+	 * allow write without read.
 	 */
-	if (!(vma->vm_flags & VM_READ)) {
+	if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) ||
+	    !(vma->vm_flags & VM_READ)) {
 		bool fault, write_fault;
 
 		/*
@@ -607,7 +604,7 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
 		if (fault || write_fault)
 			return -EFAULT;
 
-		hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
+		hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
 		hmm_vma_walk->last = end;
 
 		/* Skip this vma and continue processing the next vma. */
-- 
2.25.1



* [PATCH  hmm 6/8] mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

The intention with this code is to determine if the caller required the
pages to be valid, and if so, then take some action to make them valid.
The action varies depending on the page type.

In all cases, if the caller doesn't ask for the page, then
hmm_range_fault() should not return an error.

Revise the implementation to be clearer, and fix some bugs:

 - hmm_pte_need_fault() must always be called before testing fault or
   write_fault, otherwise the defaults of false apply and the if()'s don't
   work. This was missed on the is_migration_entry() branch.

 - -EFAULT should not be returned unless hmm_pte_need_fault() indicates
   fault is required - i.e. snapshotting should not fail.

 - For !pte_present() the cpu_flags are always 0, except in the special
   case of is_device_private_entry(), so calling pte_to_hmm_pfn_flags() is
   confusing.

Reorganize the flow so that it always follows the pattern of calling
hmm_pte_need_fault() and then checking fault || write_fault.
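
Expressed as a pattern, every branch now ends up doing (condensed from the
hunks below, with cpu_flags == 0 for the !pte_present() cases):

	hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, &fault,
			   &write_fault);
	if (fault || write_fault)
		goto fault;	/* only fault when the caller asked for the page */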

Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 35 +++++++++++++++--------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index e10cd0adba7b37..bf676cfef3e8ee 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -282,15 +282,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (!pte_present(pte)) {
 		swp_entry_t entry = pte_to_swp_entry(pte);
 
-		if (!non_swap_entry(entry)) {
-			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
-			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
-					   &fault, &write_fault);
-			if (fault || write_fault)
-				goto fault;
-			return 0;
-		}
-
 		/*
 		 * This is a special swap entry, ignore migration, use
 		 * device and report anything else as error.
@@ -310,26 +301,30 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			return 0;
 		}
 
-		if (is_migration_entry(entry)) {
-			if (fault || write_fault) {
-				pte_unmap(ptep);
-				hmm_vma_walk->last = addr;
-				migration_entry_wait(walk->mm, pmdp, addr);
-				return -EBUSY;
-			}
+		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+				   &write_fault);
+		if (!fault && !write_fault)
 			return 0;
+
+		if (!non_swap_entry(entry))
+			goto fault;
+
+		if (is_migration_entry(entry)) {
+			pte_unmap(ptep);
+			hmm_vma_walk->last = addr;
+			migration_entry_wait(walk->mm, pmdp, addr);
+			return -EBUSY;
 		}
 
 		/* Report error for everything else */
 		pte_unmap(ptep);
 		*pfn = range->values[HMM_PFN_ERROR];
 		return -EFAULT;
-	} else {
-		cpu_flags = pte_to_hmm_pfn_flags(range, pte);
-		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
-				   &fault, &write_fault);
 	}
 
+	cpu_flags = pte_to_hmm_pfn_flags(range, pte);
+	hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, &fault,
+			   &write_fault);
 	if (fault || write_fault)
 		goto fault;
 
-- 
2.25.1



* [PATCH  hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

hmm_range_fault() should never return 0 when the caller requested a page
to be valid but the pfns output for that page would be HMM_PFN_ERROR.

hmm_pte_need_fault() must always be called before setting HMM_PFN_ERROR to
detect if the page is in faulting mode or not.

Fix two cases in hmm_vma_walk_pmd() and reorganize some of the duplicated
code.

Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
Fixes: da4c3c735ea4 ("mm/hmm/mirror: helper to snapshot CPU page table")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index bf676cfef3e8ee..f61fddf2ef6505 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -363,8 +363,10 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
-	uint64_t *pfns = range->pfns;
-	unsigned long addr = start, i;
+	uint64_t *pfns = &range->pfns[(start - range->start) >> PAGE_SHIFT];
+	unsigned long npages = (end - start) >> PAGE_SHIFT;
+	unsigned long addr = start;
+	bool fault, write_fault;
 	pte_t *ptep;
 	pmd_t pmd;
 
@@ -374,14 +376,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 		return hmm_vma_walk_hole(start, end, -1, walk);
 
 	if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
-		bool fault, write_fault;
-		unsigned long npages;
-		uint64_t *pfns;
-
-		i = (addr - range->start) >> PAGE_SHIFT;
-		npages = (end - addr) >> PAGE_SHIFT;
-		pfns = &range->pfns[i];
-
 		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
 				     0, &fault, &write_fault);
 		if (fault || write_fault) {
@@ -390,8 +384,15 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			return -EBUSY;
 		}
 		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
-	} else if (!pmd_present(pmd))
+	}
+
+	if (!pmd_present(pmd)) {
+		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
+				     &write_fault);
+		if (fault || write_fault)
+			return -EFAULT;
 		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+	}
 
 	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
 		/*
@@ -408,8 +409,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
 			goto again;
 
-		i = (addr - range->start) >> PAGE_SHIFT;
-		return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
+		return hmm_vma_handle_pmd(walk, addr, end, pfns, pmd);
 	}
 
 	/*
@@ -418,15 +418,19 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	 * entry pointing to pte directory or it is a bad pmd that will not
 	 * recover.
 	 */
-	if (pmd_bad(pmd))
+	if (pmd_bad(pmd)) {
+		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
+				     &write_fault);
+		if (fault || write_fault)
+			return -EFAULT;
 		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+	}
 
 	ptep = pte_offset_map(pmdp, addr);
-	i = (addr - range->start) >> PAGE_SHIFT;
-	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
+	for (; addr < end; addr += PAGE_SIZE, ptep++, pfns++) {
 		int r;
 
-		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
+		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, pfns);
 		if (r) {
 			/* hmm_vma_handle_pte() did pte_unmap() */
 			hmm_vma_walk->last = addr;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 153+ messages in thread

* [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-11 18:34 ` Jason Gunthorpe
  (?)
@ 2020-03-11 18:35   ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-11 18:35 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

From: Jason Gunthorpe <jgg@mellanox.com>

Currently, if a special PTE is encountered, hmm_range_fault() immediately
returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
uses).

EFAULT should only be returned after testing with hmm_pte_need_fault().

Also, pte_devmap() and pte_special() are exclusive, and there is no need to
check IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL); pte_special() is stubbed out
to return false on unsupported architectures.
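
For reference, on an architecture that does not select
CONFIG_ARCH_HAS_PTE_SPECIAL the fallback is roughly the following stub
(a simplified sketch, not a verbatim copy of the arch headers):

	static inline int pte_special(pte_t pte)
	{
		return 0;
	}

so the compile-time check adds nothing over calling pte_special() directly.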

Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index f61fddf2ef6505..ca33d086bdc190 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			pte_unmap(ptep);
 			return -EBUSY;
 		}
-	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
-		if (!is_zero_pfn(pte_pfn(pte))) {
+	}
+
+	/*
+	 * Since each architecture defines a struct page for the zero page, just
+	 * fall through and treat it like a normal page.
+	 */
+	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
+		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
+				   &write_fault);
+		if (fault || write_fault) {
 			pte_unmap(ptep);
-			*pfn = range->values[HMM_PFN_SPECIAL];
 			return -EFAULT;
 		}
-		/*
-		 * Since each architecture defines a struct page for the zero
-		 * page, just fall through and treat it like a normal page.
-		 */
+		*pfn = range->values[HMM_PFN_SPECIAL];
+		return 0;
 	}
 
 	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
  2020-03-11 18:34   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:28     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:34 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Many of the direct returns of error skipped doing the pte_unmap(). All non
> zero exit paths must unmap the pte.
> 
> The pte_unmap() is split unnaturally like this because some of the error
> exit paths trigger a sleep and must release the lock before sleeping.
> 
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Fixes: 53f5c3f489ec ("mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

The changes look OK to me, but there is one issue noted below.
In any case, you can add:
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 72e5a6d9a41756..35f85424176d14 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -325,6 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   		}
>   
>   		/* Report error for everything else */
> +		pte_unmap(ptep);
>   		*pfn = range->values[HMM_PFN_ERROR];
>   		return -EFAULT;
>   	} else {
> @@ -339,10 +340,13 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   	if (pte_devmap(pte)) {
>   		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
>   					      hmm_vma_walk->pgmap);
> -		if (unlikely(!hmm_vma_walk->pgmap))
> +		if (unlikely(!hmm_vma_walk->pgmap)) {
> +			pte_unmap(ptep);
>   			return -EBUSY;
> +		}
>   	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
>   		if (!is_zero_pfn(pte_pfn(pte))) {
> +			pte_unmap(ptep);
>   			*pfn = range->values[HMM_PFN_SPECIAL];
>   			return -EFAULT;
>   		}
> @@ -437,7 +441,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   
>   		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
>   		if (r) {
> -			/* hmm_vma_handle_pte() did unmap pte directory */
> +			/* hmm_vma_handle_pte() did pte_unmap() */
>   			hmm_vma_walk->last = addr;
>   			return r;
>   		}
> 

I think there is a case where hmm_vma_handle_pte() is called and a fault is
requested: pte_unmap() and hmm_vma_walk_hole_() are called, the latter returns
zero (the fault was handled OK), but now the page table is left unmapped for the
successive loop iterations, and pte_unmap(ptep - 1) is still called at the loop
end.
Since this isn't an issue on x86_64 but is on x86_32, I can see how it could be
missed.
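
To make the scenario concrete, the PTE loop in hmm_vma_walk_pmd() has roughly
this shape (a simplified sketch, not the exact code):

	ptep = pte_offset_map(pmdp, addr);
	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
		/*
		 * If this call takes the fault path it does pte_unmap(ptep)
		 * before calling hmm_vma_walk_hole_(); should that return 0,
		 * the loop keeps advancing an already unmapped ptep ...
		 */
		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
		if (r) {
			hmm_vma_walk->last = addr;
			return r;
		}
	}
	/* ... and this final unmap is then unbalanced on 32-bit highmem. */
	pte_unmap(ptep - 1);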


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:29     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> The pgmap is held in the hmm_vma_walk variable in hope of speeding up
> future get_dev_pagemap() calls by hitting the same pointer. The algorithm
> doesn't actually care about how long the pgmap is held for.
> 
> Move the put of the cached pgmap to after the walk is completed and delete
> all the other now redundant puts.
> 
> This solves a possible leak of the reference in hmm_vma_walk_pmd() if a
> hmm_vma_handle_pte() fails while looping.
> 
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 31 +++++++++----------------------
>   1 file changed, 9 insertions(+), 22 deletions(-)
> 
> We talked about just deleting this stuff, but I think it makes a lot of sense
> for hmm_range_fault() to trigger fault on devmap pages that are not compatible
> with the caller - so let's just fix the leak on the error path for now.
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 35f85424176d14..9e8f68eb83287a 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -239,10 +239,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
>   		}
>   		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
>   	}
> -	if (hmm_vma_walk->pgmap) {
> -		put_dev_pagemap(hmm_vma_walk->pgmap);
> -		hmm_vma_walk->pgmap = NULL;
> -	}
>   	hmm_vma_walk->last = end;
>   	return 0;
>   }
> @@ -360,10 +356,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   	return 0;
>   
>   fault:
> -	if (hmm_vma_walk->pgmap) {
> -		put_dev_pagemap(hmm_vma_walk->pgmap);
> -		hmm_vma_walk->pgmap = NULL;
> -	}
>   	pte_unmap(ptep);
>   	/* Fault any virtual address we were asked to fault */
>   	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
> @@ -446,16 +438,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   			return r;
>   		}
>   	}
> -	if (hmm_vma_walk->pgmap) {
> -		/*
> -		 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
> -		 * so that we can leverage get_dev_pagemap() optimization which
> -		 * will not re-take a reference on a pgmap if we already have
> -		 * one.
> -		 */
> -		put_dev_pagemap(hmm_vma_walk->pgmap);
> -		hmm_vma_walk->pgmap = NULL;
> -	}
>   	pte_unmap(ptep - 1);
>   
>   	hmm_vma_walk->last = addr;
> @@ -529,10 +511,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
>   				  cpu_flags;
>   		}
> -		if (hmm_vma_walk->pgmap) {
> -			put_dev_pagemap(hmm_vma_walk->pgmap);
> -			hmm_vma_walk->pgmap = NULL;
> -		}
>   		hmm_vma_walk->last = end;
>   		goto out_unlock;
>   	}
> @@ -694,6 +672,15 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
>   			return -EBUSY;
>   		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
>   				      &hmm_walk_ops, &hmm_vma_walk);
> +		/*
> +		 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
> +		 * searching in the probably common case that the pgmap is the
> +		 * same for the entire requested range.
> +		 */
> +		if (hmm_vma_walk.pgmap) {
> +			put_dev_pagemap(hmm_vma_walk.pgmap);
> +			hmm_vma_walk.pgmap = NULL;
> +		}
>   	} while (ret == -EBUSY);
>   
>   	if (ret)
> 


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:31     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:31 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe, Steven Price


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.
> 
> Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
> Cc: Steven Price <steven.price@arm.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9e8f68eb83287a..32dcbfd3908315 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   
>   	pud = READ_ONCE(*pudp);
>   	if (pud_none(pud)) {
> -		ret = hmm_vma_walk_hole(start, end, -1, walk);
> -		goto out_unlock;
> +		spin_unlock(ptl);
> +		return hmm_vma_walk_hole(start, end, -1, walk);
>   	}
>   
>   	if (pud_huge(pud) && pud_devmap(pud)) {
> @@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		bool fault, write_fault;
>   
>   		if (!pud_present(pud)) {
> -			ret = hmm_vma_walk_hole(start, end, -1, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole(start, end, -1, walk);
>   		}
>   
>   		i = (addr - range->start) >> PAGE_SHIFT;
> @@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
>   				     cpu_flags, &fault, &write_fault);
>   		if (fault || write_fault) {
> -			ret = hmm_vma_walk_hole_(addr, end, fault,
> -						 write_fault, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
> +						  walk);
>   		}
>   
>   		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> 


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 4/8] mm/hmm: add missing pfns set to hmm_vma_walk_pmd()
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:33     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> All success exit paths from the walker functions must set the pfns array.
> 
> A migration entry with no required fault is a HMM_PFN_NONE return, just
> like the pte case.
> 
> Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 32dcbfd3908315..5f5ccf13dd1e85 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -394,7 +394,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   			pmd_migration_entry_wait(walk->mm, pmdp);
>   			return -EBUSY;
>   		}
> -		return 0;
> +		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
>   	} else if (!pmd_present(pmd))
>   		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
>   
> 


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 5/8] mm/hmm: add missing call to hmm_range_need_fault() before returning EFAULT
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:34     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> All return paths that do EFAULT must call hmm_range_need_fault() to
> determine if the user requires this page to be valid.
> 
> If the page cannot be made valid if the user later requires it, due to vma
> flags in this case, then the return should be HMM_PFN_ERROR.
> 
> Fixes: a3e0d41c2b1f ("mm/hmm: improve driver API to work and wait over a range")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 5f5ccf13dd1e85..e10cd0adba7b37 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -582,18 +582,15 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
>   	struct vm_area_struct *vma = walk->vma;
>   
>   	/*
> -	 * Skip vma ranges that don't have struct page backing them or
> -	 * map I/O devices directly.
> -	 */
> -	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
> -		return -EFAULT;
> -
> -	/*
> +	 * Skip vma ranges that don't have struct page backing them or map I/O
> +	 * devices directly.
> +	 *
>   	 * If the vma does not allow read access, then assume that it does not
> -	 * allow write access either. HMM does not support architectures
> -	 * that allow write without read.
> +	 * allow write access either. HMM does not support architectures that
> +	 * allow write without read.
>   	 */
> -	if (!(vma->vm_flags & VM_READ)) {
> +	if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) ||
> +	    !(vma->vm_flags & VM_READ)) {
>   		bool fault, write_fault;
>   
>   		/*
> @@ -607,7 +604,7 @@ static int hmm_vma_walk_test(unsigned long start, unsigned long end,
>   		if (fault || write_fault)
>   			return -EFAULT;
>   
> -		hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
> +		hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
>   		hmm_vma_walk->last = end;
>   
>   		/* Skip this vma and continue processing the next vma. */
> 


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 6/8] mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:36     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> The intention with this code is to determine if the caller required the
> pages to be valid, and if so, then take some action to make them valid.
> The action varies depending on the page type.
> 
> In all cases, if the caller doesn't ask for the page, then
> hmm_range_fault() should not return an error.
> 
> Revise the implementation to be clearer, and fix some bugs:
> 
>   - hmm_pte_need_fault() must always be called before testing fault or
>     write_fault, otherwise the defaults of false apply and the if()'s
>     don't work. This was missed on the is_migration_entry() branch.
> 
>   - -EFAULT should not be returned unless hmm_pte_need_fault() indicates
>     a fault is required - i.e. snapshotting should not fail.
> 
>   - For !pte_present() the cpu_flags are always 0, except in the special
>     case of is_device_private_entry(), so calling pte_to_hmm_pfn_flags()
>     is confusing.
> 
> Reorganize the flow so that it always follows the pattern of calling
> hmm_pte_need_fault() and then checking fault || write_fault.
> 
> Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 35 +++++++++++++++--------------------
>   1 file changed, 15 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e10cd0adba7b37..bf676cfef3e8ee 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -282,15 +282,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   	if (!pte_present(pte)) {
>   		swp_entry_t entry = pte_to_swp_entry(pte);
>   
> -		if (!non_swap_entry(entry)) {
> -			cpu_flags = pte_to_hmm_pfn_flags(range, pte);
> -			hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
> -					   &fault, &write_fault);
> -			if (fault || write_fault)
> -				goto fault;
> -			return 0;
> -		}
> -
>   		/*
>   		 * This is a special swap entry, ignore migration, use
>   		 * device and report anything else as error.
> @@ -310,26 +301,30 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   			return 0;
>   		}
>   
> -		if (is_migration_entry(entry)) {
> -			if (fault || write_fault) {
> -				pte_unmap(ptep);
> -				hmm_vma_walk->last = addr;
> -				migration_entry_wait(walk->mm, pmdp, addr);
> -				return -EBUSY;
> -			}
> +		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
> +				   &write_fault);
> +		if (!fault && !write_fault)
>   			return 0;
> +
> +		if (!non_swap_entry(entry))
> +			goto fault;
> +
> +		if (is_migration_entry(entry)) {
> +			pte_unmap(ptep);
> +			hmm_vma_walk->last = addr;
> +			migration_entry_wait(walk->mm, pmdp, addr);
> +			return -EBUSY;
>   		}
>   
>   		/* Report error for everything else */
>   		pte_unmap(ptep);
>   		*pfn = range->values[HMM_PFN_ERROR];
>   		return -EFAULT;
> -	} else {
> -		cpu_flags = pte_to_hmm_pfn_flags(range, pte);
> -		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags,
> -				   &fault, &write_fault);
>   	}
>   
> +	cpu_flags = pte_to_hmm_pfn_flags(range, pte);
> +	hmm_pte_need_fault(hmm_vma_walk, orig_pfn, cpu_flags, &fault,
> +			   &write_fault);
>   	if (fault || write_fault)
>   		goto fault;
>   
> 
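
A small self-contained sketch of the reorganized pattern the commit message describes: compute fault/write_fault first, then decide what to do with the non-present entry. All names here are hypothetical stand-ins for the kernel helpers, and the device-private and ordinary swap-entry branches are left out for brevity.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum entry_kind { ENTRY_MIGRATION, ENTRY_OTHER };

/* Stand-in for hmm_pte_need_fault(): defaults stay false unless asked. */
static void need_fault_model(bool wants_valid, bool wants_write,
			     bool *fault, bool *write_fault)
{
	*fault = wants_valid;
	*write_fault = wants_write;
}

/* Models the reworked !pte_present() handling in hmm_vma_handle_pte(). */
static int handle_not_present_model(enum entry_kind kind,
				    bool wants_valid, bool wants_write)
{
	bool fault, write_fault;

	need_fault_model(wants_valid, wants_write, &fault, &write_fault);
	if (!fault && !write_fault)
		return 0;		/* snapshot only: never an error */

	switch (kind) {
	case ENTRY_MIGRATION:
		return -EBUSY;		/* wait for migration, then retry */
	default:
		return -EFAULT;		/* cannot be made valid */
	}
}

int main(void)
{
	/* Snapshot of a migration entry: no error, just "not present". */
	printf("%d\n", handle_not_present_model(ENTRY_MIGRATION, false, false));
	/* Faulting a migration entry: -EBUSY (-16), the caller retries. */
	printf("%d\n", handle_not_present_model(ENTRY_MIGRATION, true, false));
	/* Faulting an unknown non-present entry: -EFAULT (-14). */
	printf("%d\n", handle_not_present_model(ENTRY_OTHER, true, false));
	return 0;
}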


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:36     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> hmm_range_fault() should never return 0 if the caller requested a valid
> page but the pfns output for that page is HMM_PFN_ERROR.
> 
> hmm_pte_need_fault() must always be called before setting HMM_PFN_ERROR to
> detect if the page is in faulting mode or not.
> 
> Fix two cases in hmm_vma_walk_pmd() and reorganize some of the duplicated
> code.
> 
> Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
> Fixes: da4c3c735ea4 ("mm/hmm/mirror: helper to snapshot CPU page table")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 38 +++++++++++++++++++++-----------------
>   1 file changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index bf676cfef3e8ee..f61fddf2ef6505 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -363,8 +363,10 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   {
>   	struct hmm_vma_walk *hmm_vma_walk = walk->private;
>   	struct hmm_range *range = hmm_vma_walk->range;
> -	uint64_t *pfns = range->pfns;
> -	unsigned long addr = start, i;
> +	uint64_t *pfns = &range->pfns[(start - range->start) >> PAGE_SHIFT];
> +	unsigned long npages = (end - start) >> PAGE_SHIFT;
> +	unsigned long addr = start;
> +	bool fault, write_fault;
>   	pte_t *ptep;
>   	pmd_t pmd;
>   
> @@ -374,14 +376,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   		return hmm_vma_walk_hole(start, end, -1, walk);
>   
>   	if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
> -		bool fault, write_fault;
> -		unsigned long npages;
> -		uint64_t *pfns;
> -
> -		i = (addr - range->start) >> PAGE_SHIFT;
> -		npages = (end - addr) >> PAGE_SHIFT;
> -		pfns = &range->pfns[i];
> -
>   		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
>   				     0, &fault, &write_fault);
>   		if (fault || write_fault) {
> @@ -390,8 +384,15 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   			return -EBUSY;
>   		}
>   		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
> -	} else if (!pmd_present(pmd))
> +	}
> +
> +	if (!pmd_present(pmd)) {
> +		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
> +				     &write_fault);
> +		if (fault || write_fault)
> +			return -EFAULT;
>   		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);

Shouldn't this fill with HMM_PFN_NONE instead of HMM_PFN_ERROR?
Otherwise, when a THP is swapped out, you will get a different
value than if a PTE is swapped out and you are prefetching/snapshotting.

> +	}
>   
>   	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
>   		/*
> @@ -408,8 +409,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
>   			goto again;
>   
> -		i = (addr - range->start) >> PAGE_SHIFT;
> -		return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
> +		return hmm_vma_handle_pmd(walk, addr, end, pfns, pmd);
>   	}
>   
>   	/*
> @@ -418,15 +418,19 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>   	 * entry pointing to pte directory or it is a bad pmd that will not
>   	 * recover.
>   	 */
> -	if (pmd_bad(pmd))
> +	if (pmd_bad(pmd)) {
> +		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
> +				     &write_fault);
> +		if (fault || write_fault)
> +			return -EFAULT;
>   		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
> +	}
>   
>   	ptep = pte_offset_map(pmdp, addr);
> -	i = (addr - range->start) >> PAGE_SHIFT;
> -	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
> +	for (; addr < end; addr += PAGE_SIZE, ptep++, pfns++) {
>   		int r;
>   
> -		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
> +		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, pfns);
>   		if (r) {
>   			/* hmm_vma_handle_pte() did pte_unmap() */
>   			hmm_vma_walk->last = addr;
> 
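
As a side note on the new pfns/npages initializers above, a tiny stand-alone example of the index arithmetic; 4 KiB pages and the addresses are assumed purely for illustration, they are not values from the thread.

#include <stdio.h>

#define PAGE_SHIFT 12	/* assume 4 KiB pages for the illustration */

int main(void)
{
	unsigned long range_start = 0x100000; /* hypothetical range->start */
	unsigned long start = 0x104000;	      /* walk start for this pmd */
	unsigned long end   = 0x10a000;	      /* walk end for this pmd */

	/* &range->pfns[(start - range->start) >> PAGE_SHIFT] */
	unsigned long first_index = (start - range_start) >> PAGE_SHIFT;
	/* (end - start) >> PAGE_SHIFT */
	unsigned long npages = (end - start) >> PAGE_SHIFT;

	printf("first pfn slot = %lu, npages = %lu\n", first_index, npages);
	/* prints: first pfn slot = 4, npages = 6 */
	return 0;
}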


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  1:38     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12  1:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe


On 3/11/20 11:35 AM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL): pte_special() is stubbed out
> to return false on unsupported architectures.
> 
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 19 ++++++++++++-------
>   1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f61fddf2ef6505..ca33d086bdc190 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -335,16 +335,21 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>   			pte_unmap(ptep);
>   			return -EBUSY;
>   		}
> -	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> -		if (!is_zero_pfn(pte_pfn(pte))) {
> +	}
> +
> +	/*
> +	 * Since each architecture defines a struct page for the zero page, just
> +	 * fall through and treat it like a normal page.
> +	 */
> +	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
> +		hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault,
> +				   &write_fault);
> +		if (fault || write_fault) {
>   			pte_unmap(ptep);
> -			*pfn = range->values[HMM_PFN_SPECIAL];
>   			return -EFAULT;
>   		}
> -		/*
> -		 * Since each architecture defines a struct page for the zero
> -		 * page, just fall through and treat it like a normal page.
> -		 */
> +		*pfn = range->values[HMM_PFN_SPECIAL];
> +		return 0;
>   	}
>   
>   	*pfn = hmm_device_entry_from_pfn(range, pte_pfn(pte)) | cpu_flags;
> 
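
Sketched as a stand-alone model (pte_model, handle_special_model and friends are made-up stand-ins, not kernel code): only a special, non-zero-page PTE can produce -EFAULT, and only when the caller asked for a usable page; the zero page falls through and is treated like a normal page.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

enum pfn_value { PFN_OK, PFN_SPECIAL };

/* Hypothetical stand-in for the pte predicates used in the patch. */
struct pte_model { bool special; bool zero_pfn; };

/* Models the reworked tail of hmm_vma_handle_pte() for special PTEs. */
static int handle_special_model(struct pte_model pte, bool need_fault,
				enum pfn_value *out)
{
	if (pte.special && !pte.zero_pfn) {
		if (need_fault)
			return -EFAULT;	/* caller wanted a usable page */
		*out = PFN_SPECIAL;	/* snapshot: report, don't fail */
		return 0;
	}
	*out = PFN_OK;			/* zero page: treat as normal */
	return 0;
}

int main(void)
{
	enum pfn_value v;
	struct pte_model zero = { .special = true, .zero_pfn = true };
	struct pte_model dax  = { .special = true, .zero_pfn = false };

	printf("%d\n", handle_special_model(zero, true, &v));	/* 0 */
	printf("%d\n", handle_special_model(dax, false, &v));	/* 0, SPECIAL */
	printf("%d\n", handle_special_model(dax, true, &v));	/* -14 */
	return 0;
}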


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
  2020-03-11 18:35   ` Jason Gunthorpe
  (?)
@ 2020-03-12  8:54     ` Steven Price
  -1 siblings, 0 replies; 153+ messages in thread
From: Steven Price @ 2020-03-12  8:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang, Jason Gunthorpe

On 11/03/2020 18:35, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.
> 
> Fixes: 3afc423632a1 ("mm: pagewalk: add p4d_entry() and pgd_entry()")
> Cc: Steven Price <steven.price@arm.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Sorry about that, thanks for fixing.

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
>   mm/hmm.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9e8f68eb83287a..32dcbfd3908315 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -473,8 +473,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   
>   	pud = READ_ONCE(*pudp);
>   	if (pud_none(pud)) {
> -		ret = hmm_vma_walk_hole(start, end, -1, walk);
> -		goto out_unlock;
> +		spin_unlock(ptl);
> +		return hmm_vma_walk_hole(start, end, -1, walk);
>   	}
>   
>   	if (pud_huge(pud) && pud_devmap(pud)) {
> @@ -483,8 +483,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		bool fault, write_fault;
>   
>   		if (!pud_present(pud)) {
> -			ret = hmm_vma_walk_hole(start, end, -1, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole(start, end, -1, walk);
>   		}
>   
>   		i = (addr - range->start) >> PAGE_SHIFT;
> @@ -495,9 +495,9 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>   		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
>   				     cpu_flags, &fault, &write_fault);
>   		if (fault || write_fault) {
> -			ret = hmm_vma_walk_hole_(addr, end, fault,
> -						 write_fault, walk);
> -			goto out_unlock;
> +			spin_unlock(ptl);
> +			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
> +						  walk);
>   		}
>   
>   		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> 
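
The shape of the fix, reduced to a stand-alone sketch with a pthread spinlock standing in for the page-table lock and usleep() standing in for a function that may sleep; none of these names are the kernel's. The point is simply that the lock is dropped before returning into the path that can block, since sleeping while holding a spinlock is not allowed.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_spinlock_t ptl;	/* stands in for the pud page-table lock */

/* Stands in for a path that may sleep, like hmm_vma_walk_hole(). */
static int slow_path_that_may_sleep(void)
{
	usleep(1000);
	return 0;
}

/*
 * Models the fixed flow: the entry is inspected under the lock, but the
 * lock is released before returning into the path that can block.
 */
static int walk_pud_model(int entry_is_hole)
{
	pthread_spin_lock(&ptl);
	if (entry_is_hole) {
		pthread_spin_unlock(&ptl);
		return slow_path_that_may_sleep();
	}
	/* ... fast path that only reads data protected by the lock ... */
	pthread_spin_unlock(&ptl);
	return 0;
}

int main(void)
{
	pthread_spin_init(&ptl, PTHREAD_PROCESS_PRIVATE);
	printf("%d\n", walk_pud_model(1));
	pthread_spin_destroy(&ptl);
	return 0;
}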



^ permalink raw reply	[flat|nested] 153+ messages in thread

* [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12  8:54     ` Steven Price
  (?)
@ 2020-03-12 10:28       ` Steven Price
  -1 siblings, 0 replies; 153+ messages in thread
From: Steven Price @ 2020-03-12 10:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: Philip Yang, John Hubbard, amd-gfx, linux-mm, Jason Gunthorpe,
	dri-devel, Christoph Hellwig, Steven Price

By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
condition early, it's possible to remove the 'ret' variable and remove a
level of indentation from half the function, making the code easier to
read.

No functional change.

Signed-off-by: Steven Price <steven.price@arm.com>
---
Thanks to Jason's changes, there were only two code paths left using
the out_unlock label, so it seemed like a good opportunity to refactor.
---
 mm/hmm.c | 69 ++++++++++++++++++++++++++------------------------------
 1 file changed, 32 insertions(+), 37 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index ca33d086bdc1..0117c86426d1 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -466,8 +466,10 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long addr = start;
 	pud_t pud;
-	int ret = 0;
 	spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);
+	unsigned long i, npages, pfn;
+	uint64_t *pfns, cpu_flags;
+	bool fault, write_fault;
 
 	if (!ptl)
 		return 0;
@@ -481,50 +483,43 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		return hmm_vma_walk_hole(start, end, -1, walk);
 	}
 
-	if (pud_huge(pud) && pud_devmap(pud)) {
-		unsigned long i, npages, pfn;
-		uint64_t *pfns, cpu_flags;
-		bool fault, write_fault;
+	if (!pud_huge(pud) || !pud_devmap(pud)) {
+		/* Ask for the PUD to be split */
+		walk->action = ACTION_SUBTREE;
+		spin_unlock(ptl);
+		return 0;
+	}
 
-		if (!pud_present(pud)) {
-			spin_unlock(ptl);
-			return hmm_vma_walk_hole(start, end, -1, walk);
-		}
+	if (!pud_present(pud)) {
+		spin_unlock(ptl);
+		return hmm_vma_walk_hole(start, end, -1, walk);
+	}
 
-		i = (addr - range->start) >> PAGE_SHIFT;
-		npages = (end - addr) >> PAGE_SHIFT;
-		pfns = &range->pfns[i];
+	i = (addr - range->start) >> PAGE_SHIFT;
+	npages = (end - addr) >> PAGE_SHIFT;
+	pfns = &range->pfns[i];
 
-		cpu_flags = pud_to_hmm_pfn_flags(range, pud);
-		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
-				     cpu_flags, &fault, &write_fault);
-		if (fault || write_fault) {
-			spin_unlock(ptl);
-			return hmm_vma_walk_hole_(addr, end, fault, write_fault,
-						  walk);
-		}
+	cpu_flags = pud_to_hmm_pfn_flags(range, pud);
+	hmm_range_need_fault(hmm_vma_walk, pfns, npages,
+			     cpu_flags, &fault, &write_fault);
+	if (fault || write_fault) {
+		spin_unlock(ptl);
+		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
+	}
 
-		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-		for (i = 0; i < npages; ++i, ++pfn) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap)) {
-				ret = -EBUSY;
-				goto out_unlock;
-			}
-			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
-				  cpu_flags;
+	pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	for (i = 0; i < npages; ++i, ++pfn) {
+		hmm_vma_walk->pgmap = get_dev_pagemap(pfn, hmm_vma_walk->pgmap);
+		if (unlikely(!hmm_vma_walk->pgmap)) {
+			spin_unlock(ptl);
+			return -EBUSY;
 		}
-		hmm_vma_walk->last = end;
-		goto out_unlock;
+		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
+	hmm_vma_walk->last = end;
 
-	/* Ask for the PUD to be split */
-	walk->action = ACTION_SUBTREE;
-
-out_unlock:
 	spin_unlock(ptl);
-	return ret;
+	return 0;
 }
 #else
 #define hmm_vma_walk_pud	NULL
-- 
2.20.1
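
The same guard-clause refactor in miniature, with made-up stubs (lock, unlock, do_work, request_split) so the sketch compiles on its own; it is not the kernel code itself. Handling the "ask for a split" case first is what lets the main body sit at a single indentation level and drops the goto/label bookkeeping in the patch above.

#include <stdio.h>

/* Hypothetical stubs so the sketch compiles on its own. */
static void lock(void)          { }
static void unlock(void)        { }
static void request_split(void) { puts("split requested"); }
static int  do_work(void)       { return 42; }

/* Before: the interesting case is nested and exits via a shared label. */
static int process_nested(int huge, int devmap)
{
	int ret = 0;

	lock();
	if (huge && devmap) {
		ret = do_work();	/* long body, one level deeper */
		goto out_unlock;
	}
	request_split();
out_unlock:
	unlock();
	return ret;
}

/* After: bail out early; the body sits at the top indentation level. */
static int process_flat(int huge, int devmap)
{
	int ret;

	lock();
	if (!huge || !devmap) {
		request_split();
		unlock();
		return 0;
	}
	ret = do_work();		/* same body, one level shallower */
	unlock();
	return ret;
}

int main(void)
{
	printf("%d %d\n", process_nested(1, 1), process_flat(1, 1));
	return 0;
}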



^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
  2020-03-12  1:28     ` Ralph Campbell
  (?)
@ 2020-03-12 14:24       ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 14:24 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Jerome Glisse, Felix.Kuehling, linux-mm, John Hubbard, dri-devel,
	amd-gfx, Christoph Hellwig, Philip Yang

On Wed, Mar 11, 2020 at 06:28:30PM -0700, Ralph Campbell wrote:
> >   mm/hmm.c | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 72e5a6d9a41756..35f85424176d14 100644
> > +++ b/mm/hmm.c
> > @@ -325,6 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> >   		}
> >   		/* Report error for everything else */
> > +		pte_unmap(ptep);
> >   		*pfn = range->values[HMM_PFN_ERROR];
> >   		return -EFAULT;
> >   	} else {
> > @@ -339,10 +340,13 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> >   	if (pte_devmap(pte)) {
> >   		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
> >   					      hmm_vma_walk->pgmap);
> > -		if (unlikely(!hmm_vma_walk->pgmap))
> > +		if (unlikely(!hmm_vma_walk->pgmap)) {
> > +			pte_unmap(ptep);
> >   			return -EBUSY;
> > +		}
> >   	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> >   		if (!is_zero_pfn(pte_pfn(pte))) {
> > +			pte_unmap(ptep);
> >   			*pfn = range->values[HMM_PFN_SPECIAL];
> >   			return -EFAULT;
> >   		}
> > @@ -437,7 +441,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >   		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
> >   		if (r) {
> > -			/* hmm_vma_handle_pte() did unmap pte directory */
> > +			/* hmm_vma_handle_pte() did pte_unmap() */
> >   			hmm_vma_walk->last = addr;
> >   			return r;
> >   		}
> > 
> 
> I think there is a case where hmm_vma_handle_pte() is called, a fault is requested,
> pte_unmap() and hmm_vma_walk_hole_() are called, the latter returns zero (the fault
> was handled OK)

Not quite, hmm_vma_walk_hole_() never returns 0 if called with fault:

        return (fault || write_fault) ? -EBUSY : 0;

And all the call sites of hmm_vma_walk_hole_() in hmm_vma_handle_pte()
are structured as:

                if (fault || write_fault)
                        goto fault;
fault:
        return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);

So, it never returns 0.
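
Reduced to a compile-able sketch (model names, not the kernel functions), the point is that the fault path can only be reached with one of the flags set, so the 0 case never escapes it:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Models hmm_vma_walk_hole_()'s final statement. */
static int walk_hole_model(bool fault, bool write_fault)
{
	return (fault || write_fault) ? -EBUSY : 0;
}

/* Models the 'goto fault' call sites: only reached when a fault is needed. */
static int handle_pte_model(bool fault, bool write_fault)
{
	if (!fault && !write_fault)
		return 0;		/* never reaches the fault path */
	return walk_hole_model(fault, write_fault); /* always -EBUSY here */
}

int main(void)
{
	printf("%d\n", handle_pte_model(true, false));	/* -16 (-EBUSY) */
	printf("%d\n", handle_pte_model(false, false));	/* 0, no fault path */
	return 0;
}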

I already made a patch making this less twisty while fixing something
else:

https://github.com/jgunthorpe/linux/commit/078e10ca5919f2c263c245784fb5fe63ddbb61f4

Thanks,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
@ 2020-03-12 14:24       ` Jason Gunthorpe
  0 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 14:24 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Philip Yang, John Hubbard, Felix.Kuehling, amd-gfx, linux-mm,
	Jerome Glisse, dri-devel, Christoph Hellwig

On Wed, Mar 11, 2020 at 06:28:30PM -0700, Ralph Campbell wrote:
> >   mm/hmm.c | 8 ++++++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 72e5a6d9a41756..35f85424176d14 100644
> > +++ b/mm/hmm.c
> > @@ -325,6 +325,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> >   		}
> >   		/* Report error for everything else */
> > +		pte_unmap(ptep);
> >   		*pfn = range->values[HMM_PFN_ERROR];
> >   		return -EFAULT;
> >   	} else {
> > @@ -339,10 +340,13 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> >   	if (pte_devmap(pte)) {
> >   		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
> >   					      hmm_vma_walk->pgmap);
> > -		if (unlikely(!hmm_vma_walk->pgmap))
> > +		if (unlikely(!hmm_vma_walk->pgmap)) {
> > +			pte_unmap(ptep);
> >   			return -EBUSY;
> > +		}
> >   	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> >   		if (!is_zero_pfn(pte_pfn(pte))) {
> > +			pte_unmap(ptep);
> >   			*pfn = range->values[HMM_PFN_SPECIAL];
> >   			return -EFAULT;
> >   		}
> > @@ -437,7 +441,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >   		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
> >   		if (r) {
> > -			/* hmm_vma_handle_pte() did unmap pte directory */
> > +			/* hmm_vma_handle_pte() did pte_unmap() */
> >   			hmm_vma_walk->last = addr;
> >   			return r;
> >   		}
> > 
> 
> I think there is a case where hmm_vma_handle_pte() is called, a fault is requested,
> pte_unmap() and hmm_vma_walk_hole_() are called, the latter returns zero (the fault
> was handled OK)

Not quite, hmm_vma_walk_hole_() never returns 0 if called with fault:

        return (fault || write_fault) ? -EBUSY : 0;

And all the call sites of hmm_vma_walk_hole_() in hmm_vma_handle_pte()
are structured as:

                if (fault || write_fault)
                        goto fault;
fault:
        return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);

So, it never returns 0.

I already made a patch making this less twisty while fixing something
else:

https://github.com/jgunthorpe/linux/commit/078e10ca5919f2c263c245784fb5fe63ddbb61f4

Thanks,
Jason
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 10:28       ` Steven Price
  (?)
@ 2020-03-12 14:27         ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 14:27 UTC (permalink / raw)
  To: Steven Price
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On Thu, Mar 12, 2020 at 10:28:13AM +0000, Steven Price wrote:
> By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
> condition early it's possible to remove the 'ret' variable and remove a
> level of indentation from half the function making the code easier to
> read.
> 
> No functional change.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Thanks to Jason's changes there were only two code paths left using
> the out_unlock label so it seemed like a good opportunity to
> refactor.

Yes, I made something very similar, what do you think of this:

https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36

From 93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@mellanox.com>
Date: Wed, 4 Mar 2020 17:11:10 -0400
Subject: [PATCH] mm/hmm: rework hmm_vma_walk_pud()

At least since commit 3afc423632a1 ("mm: pagewalk: add p4d_entry() and
pgd_entry()") this code has developed a number of strange control flows.

The purpose of the routine is to copy the pfns of a huge devmap PUD into
the pfns output array, without splitting the PUD. Everything that is not a
huge devmap PUD should go back to the walker for splitting.

Rework the logic to show this goal and remove redundant stuff:

- If pud_trans_huge_lock returns !NULL then this is already
  'pud_trans_huge() || pud_devmap()' and 'pud_huge() || pud_devmap()'
  so some of the logic is redundant.

- Hitting pud_none() is a race, treat it as such and return back to the
  walker using ACTION_AGAIN

- !pud_present() gives 0 cpu_flags, so the extra checks are redundant

- Once the *pudp is read there is no need to continue holding the pud
  lock, so drop it. The only thing the following code cares about is the
  pfn from the devmap, and if there is racing then the notifiers will
  resolve everything. Perhaps the unlocked READ_ONCE in an ealier version
  was correct

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 79 +++++++++++++++++++++++---------------------------------
 1 file changed, 33 insertions(+), 46 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 8fec801a33c9e2..87a376659b5ad4 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -455,66 +455,53 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
-	unsigned long addr = start;
+	unsigned long i, npages, pfn;
+	unsigned int required_fault;
+	uint64_t cpu_flags;
+	uint64_t *pfns;
+	spinlock_t *ptl;
 	pud_t pud;
-	int ret = 0;
-	spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);
 
+	/*
+	 * This only handles huge devmap pages, the default return is
+	 * ACTION_SUBTREE, so everything else is split by the walker and passed
+	 * to the other routines.
+	 */
+	ptl = pud_trans_huge_lock(pudp, walk->vma);
 	if (!ptl)
 		return 0;
+	pud = *pudp;
+	spin_unlock(ptl);
 
-	/* Normally we don't want to split the huge page */
-	walk->action = ACTION_CONTINUE;
-
-	pud = READ_ONCE(*pudp);
 	if (pud_none(pud)) {
-		spin_unlock(ptl);
-		return hmm_vma_walk_hole(start, end, -1, walk);
+		walk->action = ACTION_AGAIN;
+		return 0;
 	}
 
-	if (pud_huge(pud) && pud_devmap(pud)) {
-		unsigned long i, npages, pfn;
-		unsigned int required_flags;
-		uint64_t *pfns, cpu_flags;
-
-		if (!pud_present(pud)) {
-			spin_unlock(ptl);
-			return hmm_vma_walk_hole(start, end, -1, walk);
-		}
-
-		i = (addr - range->start) >> PAGE_SHIFT;
-		npages = (end - addr) >> PAGE_SHIFT;
-		pfns = &range->pfns[i];
+	if (!pud_devmap(pud))
+		return 0;
 
-		cpu_flags = pud_to_hmm_pfn_flags(range, pud);
+	pfns = &range->pfns[(start - range->start) >> PAGE_SHIFT];
+	cpu_flags = pud_to_hmm_pfn_flags(range, pud);
+	required_fault =
 		hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags);
-		if (required_flags) {
-			spin_unlock(ptl);
-			return hmm_vma_walk_hole_(addr, end, required_flags,
-						  walk);
-		}
+	if (required_fault)
+		return hmm_vma_walk_hole_(start, end, required_fault, walk);
 
-		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-		for (i = 0; i < npages; ++i, ++pfn) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap)) {
-				ret = -EBUSY;
-				goto out_unlock;
-			}
-			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
-				  cpu_flags;
-		}
-		hmm_vma_walk->last = end;
-		goto out_unlock;
+	pfn = pud_pfn(pud) + ((start & ~PUD_MASK) >> PAGE_SHIFT);
+	npages = (end - start) >> PAGE_SHIFT;
+	for (i = 0; i < npages; ++i, ++pfn) {
+		hmm_vma_walk->pgmap = get_dev_pagemap(pfn, hmm_vma_walk->pgmap);
+		if (unlikely(!hmm_vma_walk->pgmap))
+			return -EBUSY;
+		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
 
-	/* Ask for the PUD to be split */
-	walk->action = ACTION_SUBTREE;
+	hmm_vma_walk->last = end;
 
-out_unlock:
-	spin_unlock(ptl);
-	return ret;
+	/* Do not split the pud */
+	walk->action = ACTION_CONTINUE;
+	return 0;
 }
 #else
 #define hmm_vma_walk_pud	NULL
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
  2020-03-12  1:36     ` Ralph Campbell
  (?)
@ 2020-03-12 14:35       ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 14:35 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Jerome Glisse, Felix.Kuehling, linux-mm, John Hubbard, dri-devel,
	amd-gfx, Christoph Hellwig, Philip Yang

On Wed, Mar 11, 2020 at 06:36:47PM -0700, Ralph Campbell wrote:
> > @@ -390,8 +384,15 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
> >   			return -EBUSY;
> >   		}
> >   		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
> > -	} else if (!pmd_present(pmd))
> > +	}
> > +
> > +	if (!pmd_present(pmd)) {
> > +		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
> > +				     &write_fault);
> > +		if (fault || write_fault)
> > +			return -EFAULT;
> >   		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
> 
> Shouldn't this fill with HMM_PFN_NONE instead of HMM_PFN_ERROR?
> Otherwise, when a THP is swapped out, you will get a different
> value than if a PTE is swapped out and you are prefetching/snapshotting.

If this is the case then the problem is that the return -EFAULT path
needs to do something else, i.e. since the above code can't trigger
swap-in, it is correct to return PFN_ERROR.

I'm completely guessing, but do we need to call pmd_to_swp_entry() and
handle things similarly to the pte? What swp_entries are valid for a
pmd?

Do you understand this better, or know how to trigger a !pmd_present
for test?

I suppose another option would be this:

	if (!pmd_present(pmd)) {
		hmm_range_need_fault(hmm_vma_walk, pfns, npages, 0, &fault,
				     &write_fault);
                /* We can't handle this. Cause the PMD to be split and
		 * handle it in the pte handler. */
		if (fault || write_fault)
 		        return 0;
  		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
        }

Which, I think, must be correct, but inefficient?
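
A very rough sketch of the pmd_to_swp_entry() idea, reusing the fault
flags computed above and assuming migration entries are the only swap
entries that show up at PMD level (untested, just to show the shape):

	if (!pmd_present(pmd)) {
		swp_entry_t entry = pmd_to_swp_entry(pmd);

		if (is_migration_entry(entry) && (fault || write_fault)) {
			/* Wait for the migration to finish, then retry */
			pmd_migration_entry_wait(walk->mm, pmdp);
			return -EBUSY;
		}
		/* Nothing usable to report for prefetch/snapshot */
		return hmm_pfns_fill(start, end, range, HMM_PFN_NONE);
	}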

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 14:27         ` Jason Gunthorpe
  (?)
@ 2020-03-12 14:40           ` Steven Price
  -1 siblings, 0 replies; 153+ messages in thread
From: Steven Price @ 2020-03-12 14:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On 12/03/2020 14:27, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2020 at 10:28:13AM +0000, Steven Price wrote:
>> By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
>> condition early it's possible to remove the 'ret' variable and remove a
>> level of indentation from half the function making the code easier to
>> read.
>>
>> No functional change.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Thanks to Jason's changes there were only two code paths left using
>> the out_unlock label so it seemed like a good opportunity to
>> refactor.
> 
> Yes, I made something very similar, what do you think of this:
> 
> https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36

Even better! Sorry I didn't realise you'd already done this. I just saw 
that the function was needlessly complicated after your fix, so I 
thought I'd do a drive-by cleanup since part of the mess was my fault! :)

Thanks,

Steve


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 14:40           ` Steven Price
  (?)
@ 2020-03-12 15:11             ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 15:11 UTC (permalink / raw)
  To: Steven Price
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On Thu, Mar 12, 2020 at 02:40:08PM +0000, Steven Price wrote:
> On 12/03/2020 14:27, Jason Gunthorpe wrote:
> > On Thu, Mar 12, 2020 at 10:28:13AM +0000, Steven Price wrote:
> > > By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
> > > condition early it's possible to remove the 'ret' variable and remove a
> > > level of indentation from half the function making the code easier to
> > > read.
> > > 
> > > No functional change.
> > > 
> > > Signed-off-by: Steven Price <steven.price@arm.com>
> > > Thanks to Jason's changes there were only two code paths left using
> > > the out_unlock label so it seemed like a good opportunity to
> > > refactor.
> > 
> > Yes, I made something very similar, what do you think of this:
> > 
> > https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36
> 
> Even better! Sorry I didn't realise you'd already done this. I just saw that
> the function was needlessly complicated after your fix, so I thought I'd do
> a drive-by cleanup since part of the mess was my fault! :)

No worries, I've got a lot of patches for hmm_range_fault right now,
just trying to organize them, test them and post them. Haven't posted
that one yet.

Actually, while you are looking at this, do you think we should be
adding at least READ_ONCE in the pagewalk.c walk_* functions? The
multiple references of pmd, pud, etc without locking seems sketchy to
me.
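
For illustration (untested), walk_pmd_range() could take one snapshot
per iteration and test only that:

	do {
		pmd_t entry = READ_ONCE(*pmd);

		next = pmd_addr_end(addr, end);
		if (pmd_none(entry)) {
			/* report the hole as today */
			continue;
		}
		/* later tests and ops->pmd_entry() would then operate on
		 * 'entry' rather than re-reading *pmd */
	} while (pmd++, addr = next, addr != end);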

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 15:11             ` Jason Gunthorpe
  (?)
@ 2020-03-12 16:16               ` Steven Price
  -1 siblings, 0 replies; 153+ messages in thread
From: Steven Price @ 2020-03-12 16:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On 12/03/2020 15:11, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2020 at 02:40:08PM +0000, Steven Price wrote:
>> On 12/03/2020 14:27, Jason Gunthorpe wrote:
>>> On Thu, Mar 12, 2020 at 10:28:13AM +0000, Steven Price wrote:
>>>> By refactoring to deal with the !pud_huge(pud) || !pud_devmap(pud)
>>>> condition early it's possible to remove the 'ret' variable and remove a
>>>> level of indentation from half the function making the code easier to
>>>> read.
>>>>
>>>> No functional change.
>>>>
>>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>>> Thanks to Jason's changes there were only two code paths left using
>>>> the out_unlock label so it seemed like a good opportunity to
>>>> refactor.
>>>
>>> Yes, I made something very similar, what do you think of this:
>>>
>>> https://github.com/jgunthorpe/linux/commit/93f0ed42ab3f9ceb27b58fb7c7c3ecaf60f16b36
>>
>> Even better! Sorry I didn't realise you'd already done this. I just saw that
>> the function was needlessly complicated after your fix, so I thought I'd do
>> a drive-by cleanup since part of the mess was my fault! :)
> 
> No worries, I've got a lot of patches for hmm_range_fault right now,
> just trying to organize them, test them and post them. Haven't posted
> that one yet.
> 
> Actually, while you are looking at this, do you think we should be
> adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> multiple references of pmd, pud, etc without locking seems sketchy to
> me.

I agree it seems worrying. I'm not entirely sure whether the holding of 
mmap_sem is sufficient; this isn't something that I changed, so I've just 
been hoping that it's sufficient since it seems to have been working 
(whether that's by chance because the compiler didn't generate multiple 
reads I've no idea). For walking the kernel's page tables the lack of 
READ_ONCE is also not great, but at least for PTDUMP we don't care too 
much about accuracy and it should be crash proof because there's no RCU 
grace period. And again the code I was replacing didn't have any special 
protection.

I can't see any harm in updating the code to include READ_ONCE and I'm 
happy to review a patch.

Thanks,

Steve


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 16:16               ` Steven Price
  (?)
@ 2020-03-12 16:37                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 16:37 UTC (permalink / raw)
  To: Steven Price
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > Actually, while you are looking at this, do you think we should be
> > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > multiple references of pmd, pud, etc without locking seems sketchy to
> > me.
> 
> I agree it seems worrying. I'm not entirely sure whether the holding of
> mmap_sem is sufficient,

I looked at this question, and at least for PMD, mmap_sem is not
sufficient. I didn't easily figure it out for the other ones

I'm guessing if PMD is not safe then none of them are.

> this isn't something that I changed so I've just
> been hoping that it's sufficient since it seems to have been working
> (whether that's by chance because the compiler didn't generate multiple
> reads I've no idea). For walking the kernel's page tables the lack of
> READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> about accuracy and it should be crash proof because there's no RCU grace
> period. And again the code I was replacing didn't have any special
> protection.
>
> I can't see any harm in updating the code to include READ_ONCE and I'm happy
> to review a patch.

The reason I ask is because hmm's walkers often have this pattern
where they get the pointer and then de-ref it (again) then
immediately have to recheck the 'again' conditions of the walker
itself because the re-read may have given a different value.
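
The pattern in question, roughly, from hmm_vma_walk_pmd():

	pmd = READ_ONCE(*pmdp);
	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
		/* Re-read and re-check, the first read was unlocked */
		pmd = pmd_read_atomic(pmdp);
		barrier();
		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
			goto again;
		return hmm_vma_handle_pmd(walk, addr, end, &pfns[i], pmd);
	}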

Having the walker deref the pointer and pass the value into the ops
for use rather than repeatedly de-refing an unlocked value seems like
a much safer design to me.
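
Concretely, one way to do that (signature made up purely to illustrate):

	int (*pmd_entry)(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
			 unsigned long next, struct mm_walk *walk);

where 'pmd' is the single snapshot the walker read and 'pmdp' is only
there for a callback that really needs to re-check or modify the entry
under its own lock.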

If this also implicitly relies on a RCU grace period then it is also
missing RCU locking...

I also didn't quite understand why walk_pte_range() skipped locking
the pte in the no_vma case - I don't get why vma would be related to
locking here.

I also saw that hmm open coded the pte walk, presumably for
performance, so I was thinking of adding some kind of pte_range()
callback to avoid the expensive indirect function call per pte, but
hmm also can't have the pmd locked...
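
One possible shape for such a callback (made up, not a concrete
proposal):

	int (*pte_range)(pte_t *ptep, unsigned long addr, unsigned long end,
			 struct mm_walk *walk);

ie the walker maps the PTE page once, optionally without taking the pte
lock for users like hmm, and hands the whole run to the callback, so
there is one indirect call per PMD instead of one per PTE.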

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 16:37                 ` Jason Gunthorpe
  (?)
@ 2020-03-12 17:02                   ` Steven Price
  -1 siblings, 0 replies; 153+ messages in thread
From: Steven Price @ 2020-03-12 17:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On 12/03/2020 16:37, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
>>> Actually, while you are looking at this, do you think we should be
>>> adding at least READ_ONCE in the pagewalk.c walk_* functions? The
>>> multiple references of pmd, pud, etc without locking seems sketchy to
>>> me.
>>
>> I agree it seems worrying. I'm not entirely sure whether the holding of
>> mmap_sem is sufficient,
> 
> I looked at this question, and at least for PMD, mmap_sem is not
> sufficient. I didn't easily figure it out for the other ones
> 
> I'm guessing if PMD is not safe then none of them are.
> 
>> this isn't something that I changed so I've just
>> been hoping that it's sufficient since it seems to have been working
>> (whether that's by chance because the compiler didn't generate multiple
>> reads I've no idea). For walking the kernel's page tables the lack of
>> READ_ONCE is also not great, but at least for PTDUMP we don't care too much
>> about accuracy and it should be crash proof because there's no RCU grace
>> period. And again the code I was replacing didn't have any special
>> protection.
>>
>> I can't see any harm in updating the code to include READ_ONCE and I'm happy
>> to review a patch.
> 
> The reason I ask is because hmm's walkers often have this pattern
> where they get the pointer and then de-ref it (again) then
> immediately have to recheck the 'again' conditions of the walker
> itself because the re-read may have given a different value.
> 
> Having the walker deref the pointer and pass the value into the ops
> for use rather than repeatedly de-refing an unlocked value seems like
> a much safer design to me.

Yeah that sounds like a good idea.

> If this also implicitly relies on a RCU grace period then it is also
> missing RCU locking...

True - I'm not 100% sure in what situations a page table entry can be 
freed. Anshuman has added locking to deal with memory hotplug[1]. I 
believe this is sufficient.

[1] bf2b59f60ee1 ("arm64/mm: Hold memory hotplug lock while walking for 
kernel page table dump")

> I also didn't quite understand why walk_pte_range() skipped locking
> the pte in the no_vma case - I don't get why vma would be related to
> locking here.

The no_vma case is for walking the kernel's page tables and they may 
have entries that are not backed by struct page, so there isn't 
(reliably) a PTE lock to take.
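
To illustrate, a simplified sketch of the two shapes (not the mm/pagewalk.c
code verbatim):

static void sketch_pte_access(struct mm_struct *mm, pmd_t *pmdp,
			      unsigned long addr, bool kernel_tables)
{
	spinlock_t *ptl;
	pte_t *ptep;

	if (!kernel_tables) {
		/* user VMA: the PTE table page has a struct page -> a ptl */
		ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
		/* the entry is stable while ptl is held */
		pte_unmap_unlock(ptep, ptl);
	} else {
		/* kernel tables: no reliable struct page, so no PTE lock */
		ptep = pte_offset_map(pmdp, addr);
		/* read-only peeking only; nothing here is stable */
		pte_unmap(ptep);
	}
}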

> I also saw that hmm open coded the pte walk, presumably for
> performance, so I was thinking of adding some kind of pte_range()
> callback to avoid the expensive indirect function call per pte, but
> hmm also can't have the pmd locked...

Yeah the callback per PTE is a bit heavy because of the indirect 
function call. I'm not sure how to optimise it beyond open coding at the 
PMD level. One option would be to provide helper functions to make it a 
bit more generic.

Do you have an idea of what pte_range() would look like?

Steve


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 17:02                   ` Steven Price
  (?)
@ 2020-03-12 17:17                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 17:17 UTC (permalink / raw)
  To: Steven Price
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On Thu, Mar 12, 2020 at 05:02:18PM +0000, Steven Price wrote:

> > Having the walker deref the pointer and pass the value it into the ops
> > for use rather than repeatedly de-refing an unlocked value seems like
> > a much safer design to me.
> 
> Yeah that sounds like a good idea.

Ok.. let's see when I get this hmm & odp stuff cleared off
 
> > I also didn't quite understand why walk_pte_range() skipped locking
> > the pte in the no_vma case - I don't get why vma would be related to
> > locking here.
> 
> The no_vma case is for walking the kernel's page tables and they may have
> entries that are not backed by struct page, so there isn't (reliably) a PTE
> lock to take.

Oh, that is an interesting bit of insight..

> > I also saw that hmm open coded the pte walk, presumably for
> > performance, so I was thinking of adding some kind of pte_range()
> > callback to avoid the expensive indirect function call per pte, but
> > hmm also can't have the pmd locked...
> 
> Yeah the callback per PTE is a bit heavy because of the indirect function
> call. I'm not sure how to optimise it beyond open coding at the PMD level.
> One option would be to provide helper functions to make it a bit more
> generic.
> 
> Do you have an idea of what pte_range() would look like?

Basically just pass in the already mapped pte array just like is
already done at the tail of the pmd

The reason to do it like this is so that the common code in the walker
can correctly prove the pmd is pointing at a pte before trying to map
it.
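
Rough sketch of the idea - pte_range_fn_t and sketch_walk_pte_table() are
invented for illustration, not existing API:

typedef int (*pte_range_fn_t)(pte_t *ptep, unsigned long addr,
			      unsigned long end, struct mm_walk *walk);

static int sketch_walk_pte_table(pmd_t *pmdp, unsigned long addr,
				 unsigned long end, struct mm_walk *walk,
				 pte_range_fn_t pte_range)
{
	pmd_t pmd = READ_ONCE(*pmdp);
	pte_t *ptep;
	int ret;

	/* common code proves the pmd really points at a PTE table */
	if (!pmd_present(pmd) || pmd_trans_huge(pmd))
		return -EAGAIN;		/* caller re-evaluates the entry */

	/*
	 * Map once, one indirect call for the whole range.  Note that
	 * pte_offset_map() still re-reads *pmdp internally, which is part
	 * of what makes getting this right hard.
	 */
	ptep = pte_offset_map(pmdp, addr);
	ret = pte_range(ptep, addr, end, walk);
	pte_unmap(ptep);
	return ret;
}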

This is complicated, and hmm at least already got it wrong when trying
to open code at the PMD level.

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 9/8] mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
  2020-03-11 18:34 ` Jason Gunthorpe
  (?)
@ 2020-03-12 19:33   ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-12 19:33 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang

pmd_to_hmm_pfn_flags() already checks it and makes the cpu flags 0. If no
fault is requested then the pfns should be returned with the not valid
flags.

It should not unconditionally fault if faulting is not requested.

Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Bonus patch, this one got found after I made the series..

diff --git a/mm/hmm.c b/mm/hmm.c
index ca33d086bdc190..6d9da4b0f0a9f8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -226,7 +226,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
 	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
 			     &fault, &write_fault);
 
-	if (pmd_protnone(pmd) || fault || write_fault)
+	if (fault || write_fault)
 		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
 	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [PATCH hmm 9/8] mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
  2020-03-12 19:33   ` Jason Gunthorpe
  (?)
@ 2020-03-12 23:50     ` Ralph Campbell
  -1 siblings, 0 replies; 153+ messages in thread
From: Ralph Campbell @ 2020-03-12 23:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Jerome Glisse, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang


On 3/12/20 12:33 PM, Jason Gunthorpe wrote:
> pmd_to_hmm_pfn_flags() already checks it and makes the cpu flags 0. If no
> fault is requested then the pfns should be returned with the not valid
> flags.
> 
> It should not unconditionally fault if faulting is not requested.
> 
> Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Looks good to me.
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

> ---
>   mm/hmm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Bonus patch, this one got found after I made the series..
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index ca33d086bdc190..6d9da4b0f0a9f8 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -226,7 +226,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
>   	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
>   			     &fault, &write_fault);
>   
> -	if (pmd_protnone(pmd) || fault || write_fault)
> +	if (fault || write_fault)
>   		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
>   
>   	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> 


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-12 17:02                   ` Steven Price
  (?)
@ 2020-03-13 19:55                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-13 19:55 UTC (permalink / raw)
  To: Steven Price, Matthew Wilcox
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, Philip Yang,
	John Hubbard, amd-gfx, linux-mm, dri-devel, Christoph Hellwig

On Thu, Mar 12, 2020 at 05:02:18PM +0000, Steven Price wrote:
> On 12/03/2020 16:37, Jason Gunthorpe wrote:
> > On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > > > Actually, while you are looking at this, do you think we should be
> > > > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > > > multiple references of pmd, pud, etc without locking seems sketchy to
> > > > me.
> > > 
> > > I agree it seems worrying. I'm not entirely sure whether the holding of
> > > mmap_sem is sufficient,
> > 
> > I looked at this question, and at least for PMD, mmap_sem is not
> > sufficient. I didn't easilly figure it out for the other ones
> > 
> > I'm guessing if PMD is not safe then none of them are.
> > 
> > > this isn't something that I changed so I've just
> > > been hoping that it's sufficient since it seems to have been working
> > > (whether that's by chance because the compiler didn't generate multiple
> > > reads I've no idea). For walking the kernel's page tables the lack of
> > > READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> > > about accuracy and it should be crash proof because there's no RCU grace
> > > period. And again the code I was replacing didn't have any special
> > > protection.
> > > 
> > > I can't see any harm in updating the code to include READ_ONCE and I'm happy
> > > to review a patch.
> > 
> > The reason I ask is because hmm's walkers often have this pattern
> > where they get the pointer and then de-ref it (again) then
> > immediately have to recheck the 'again' conditions of the walker
> > itself because the re-read may have given a different value.
> > 
> > Having the walker deref the pointer and pass the value it into the ops
> > for use rather than repeatedly de-refing an unlocked value seems like
> > a much safer design to me.
> 
> Yeah that sounds like a good idea.

I'm looking at this now.. The PUD is also changing under the read
mmap_sem - and I was able to think up some race conditiony bugs
related to this. Have some patches now..

However, I haven't been able to understand why walk_page_range()
doesn't check pud_present() or pmd_present() before calling
pmd_offset_map() or pte_offset_map().

As far as I can see a non-present entry has a swap entry encoded in
it, and thus it seems like it is a bad idea to pass a non-present
entry to the two map functions. I think those should only be called
when the entry points to the next level in the page table  (so there
is something to map?)
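
i.e. I would have expected a check shaped like this before descending
(sketch only; huge/devmap leaf handling elided, function name is made up):

static pte_t *sketch_descend_to_pte(pud_t *pudp, unsigned long addr)
{
	pud_t pud = READ_ONCE(*pudp);
	pmd_t *pmdp, pmd;

	/*
	 * A non-present entry encodes a swap/migration entry, not a
	 * pointer to the next level, so do not walk through it.
	 */
	if (!pud_present(pud))
		return NULL;

	pmdp = pmd_offset(pudp, addr);
	pmd = READ_ONCE(*pmdp);
	if (!pmd_present(pmd) || pmd_trans_huge(pmd))
		return NULL;

	return pte_offset_map(pmdp, addr);	/* caller must pte_unmap() */
}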

I see you added !present tests for the !vma case, but why only there?

Is this a bug? Do you know how it works?

Is it something that was missed when people added non-present PUD and
PMD's?

Thanks,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-13 19:55                     ` Jason Gunthorpe
  (?)
@ 2020-03-13 21:04                       ` Matthew Wilcox
  -1 siblings, 0 replies; 153+ messages in thread
From: Matthew Wilcox @ 2020-03-13 21:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Steven Price, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	Philip Yang, John Hubbard, amd-gfx, linux-mm, dri-devel,
	Christoph Hellwig

On Fri, Mar 13, 2020 at 04:55:50PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 12, 2020 at 05:02:18PM +0000, Steven Price wrote:
> > On 12/03/2020 16:37, Jason Gunthorpe wrote:
> > > On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > > > > Actually, while you are looking at this, do you think we should be
> > > > > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > > > > multiple references of pmd, pud, etc without locking seems sketchy to
> > > > > me.
> > > > 
> > > > I agree it seems worrying. I'm not entirely sure whether the holding of
> > > > mmap_sem is sufficient,
> > > 
> > > I looked at this question, and at least for PMD, mmap_sem is not
> > > sufficient. I didn't easilly figure it out for the other ones
> > > 
> > > I'm guessing if PMD is not safe then none of them are.
> > > 
> > > > this isn't something that I changed so I've just
> > > > been hoping that it's sufficient since it seems to have been working
> > > > (whether that's by chance because the compiler didn't generate multiple
> > > > reads I've no idea). For walking the kernel's page tables the lack of
> > > > READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> > > > about accuracy and it should be crash proof because there's no RCU grace
> > > > period. And again the code I was replacing didn't have any special
> > > > protection.
> > > > 
> > > > I can't see any harm in updating the code to include READ_ONCE and I'm happy
> > > > to review a patch.
> > > 
> > > The reason I ask is because hmm's walkers often have this pattern
> > > where they get the pointer and then de-ref it (again) then
> > > immediately have to recheck the 'again' conditions of the walker
> > > itself because the re-read may have given a different value.
> > > 
> > > Having the walker deref the pointer and pass the value it into the ops
> > > for use rather than repeatedly de-refing an unlocked value seems like
> > > a much safer design to me.
> > 
> > Yeah that sounds like a good idea.
> 
> I'm looking at this now.. The PUD is also changing under the read
> mmap_sem - and I was able to think up some race conditiony bugs
> related to this. Have some patches now..
> 
> However, I haven't been able to understand why walk_page_range()
> doesn't check pud_present() or pmd_present() before calling
> pmd_offset_map() or pte_offset_map().
> 
> As far as I can see a non-present entry has a swap entry encoded in
> it, and thus it seems like it is a bad idea to pass a non-present
> entry to the two map functions. I think those should only be called
> when the entry points to the next level in the page table  (so there
> is something to map?)
> 
> I see you added !present tests for the !vma case, but why only there?
> 
> Is this a bug? Do you know how it works?
> 
> Is it something that was missed when people added non-present PUD and
> PMD's?

... I'm sorry, I did what now?  As far as I can tell, you're talking
about mm/pagewalk.c, and the only commit I have in that file is
a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517 ("mm, x86: add support for
PUD-sized transparent hugepages"), which I think I was pretty clear
from the commit message is basically copy-and-paste from the PMD code.
I have no clue why most of the decisions in the MM were made.


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly
  2020-03-13 21:04                       ` Matthew Wilcox
  (?)
@ 2020-03-13 22:51                         ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-13 22:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Steven Price, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	Philip Yang, John Hubbard, amd-gfx, linux-mm, dri-devel,
	Christoph Hellwig

On Fri, Mar 13, 2020 at 02:04:46PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 13, 2020 at 04:55:50PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 12, 2020 at 05:02:18PM +0000, Steven Price wrote:
> > > On 12/03/2020 16:37, Jason Gunthorpe wrote:
> > > > On Thu, Mar 12, 2020 at 04:16:33PM +0000, Steven Price wrote:
> > > > > > Actually, while you are looking at this, do you think we should be
> > > > > > adding at least READ_ONCE in the pagewalk.c walk_* functions? The
> > > > > > multiple references of pmd, pud, etc without locking seems sketchy to
> > > > > > me.
> > > > > 
> > > > > I agree it seems worrying. I'm not entirely sure whether the holding of
> > > > > mmap_sem is sufficient,
> > > > 
> > > > I looked at this question, and at least for PMD, mmap_sem is not
> > > > sufficient. I didn't easilly figure it out for the other ones
> > > > 
> > > > I'm guessing if PMD is not safe then none of them are.
> > > > 
> > > > > this isn't something that I changed so I've just
> > > > > been hoping that it's sufficient since it seems to have been working
> > > > > (whether that's by chance because the compiler didn't generate multiple
> > > > > reads I've no idea). For walking the kernel's page tables the lack of
> > > > > READ_ONCE is also not great, but at least for PTDUMP we don't care too much
> > > > > about accuracy and it should be crash proof because there's no RCU grace
> > > > > period. And again the code I was replacing didn't have any special
> > > > > protection.
> > > > > 
> > > > > I can't see any harm in updating the code to include READ_ONCE and I'm happy
> > > > > to review a patch.
> > > > 
> > > > The reason I ask is because hmm's walkers often have this pattern
> > > > where they get the pointer and then de-ref it (again) then
> > > > immediately have to recheck the 'again' conditions of the walker
> > > > itself because the re-read may have given a different value.
> > > > 
> > > > Having the walker deref the pointer and pass the value it into the ops
> > > > for use rather than repeatedly de-refing an unlocked value seems like
> > > > a much safer design to me.
> > > 
> > > Yeah that sounds like a good idea.
> > 
> > I'm looking at this now.. The PUD is also changing under the read
> > mmap_sem - and I was able to think up some race conditiony bugs
> > related to this. Have some patches now..
> > 
> > However, I haven't been able to understand why walk_page_range()
> > doesn't check pud_present() or pmd_present() before calling
> > pmd_offset_map() or pte_offset_map().
> > 
> > As far as I can see a non-present entry has a swap entry encoded in
> > it, and thus it seems like it is a bad idea to pass a non-present
> > entry to the two map functions. I think those should only be called
> > when the entry points to the next level in the page table  (so there
> > is something to map?)
> > 
> > I see you added !present tests for the !vma case, but why only there?
> > 
> > Is this a bug? Do you know how it works?
> > 
> > Is it something that was missed when people added non-present PUD and
> > PMD's?
> 
> ... I'm sorry, I did what now? 

No, no, just widening to see if someone knows

> As far as I can tell, you're talking
> about mm/pagewalk.c, and the only commit I have in that file is
> a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517 ("mm, x86: add support for
> PUD-sized transparent hugepages", which I think I was pretty clear
> from the commit message is basically copy-and-paste from the PMD
> code.

Right, which added the split_huge_pud() which seems maybe related to
pud_present, or maybe not, I don't know.

> I have no clue why most of the decisions in the MM were made.

Fun!

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
  2020-03-11 18:34   ` Jason Gunthorpe
@ 2020-03-16  8:58     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  8:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:34:59PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Many of the direct error returns skipped doing the pte_unmap(). All
> non-zero exit paths must unmap the pte.
> 
> The pte_unmap() is split unnaturally like this because some of the error
> exit paths trigger a sleep and must release the lock before sleeping.
> 
> Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
> Fixes: 53f5c3f489ec ("mm/hmm: factor out pte and pmd handling to simplify hmm_vma_walk_pmd()")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Probably papered over by the fact that hmm doesn't work on architectures
where pte_unmap() isn't a no-op right now.
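
To make the rule concrete, here is a minimal sketch of the shape every
exit path has to take (simplified, only standard page table helpers;
sketch_fault() and the two conditions are hypothetical stand-ins, this
is not the actual mm/hmm.c code):

	static int handle_pte_sketch(pmd_t *pmdp, unsigned long addr)
	{
		pte_t *ptep = pte_offset_map(pmdp, addr);

		if (pte_none(*ptep)) {
			pte_unmap(ptep);	/* unmap before the sleeping fault path */
			return sketch_fault(addr);	/* hypothetical, returns non-zero */
		}
		if (!pte_present(*ptep)) {
			pte_unmap(ptep);	/* every error return unmaps too */
			return -EFAULT;
		}
		pte_unmap(ptep);
		return 0;
	}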


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:02     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:00PM -0300, Jason Gunthorpe wrote:
> @@ -694,6 +672,15 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
>  			return -EBUSY;
>  		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
>  				      &hmm_walk_ops, &hmm_vma_walk);
> +		/*
> +		 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
> +		 * searching in the probably common case that the pgmap is the
> +		 * same for the entire requested range.
> +		 */
> +		if (hmm_vma_walk.pgmap) {
> +			put_dev_pagemap(hmm_vma_walk.pgmap);
> +			hmm_vma_walk.pgmap = NULL;
> +		}
>  	} while (ret == -EBUSY);

In which case it should only be put on return, and not for every loop.

I still think the right fix is to just delete all the unused and broken
pgmap handling code.  If we ever need to add it back it can be added
in a properly understood and tested way.


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:05     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe, Steven Price

On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This eventually calls into handle_mm_fault() which is a sleeping function.
> Release the lock first.
> 
> hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> need the lock.

So how did this manage to not be noticed before?

The fix looks fine assuming we want something backportable before
starting the cleanups:

Reviewed-by: Christoph Hellwig <hch@lst.de>
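
For context, the shape of the problem is roughly the following (a
simplified sketch; need_to_fault and fault_the_range() are hypothetical
stand-ins for the real checks and for the path that eventually reaches
handle_mm_fault()):

	spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);

	if (!ptl)
		return 0;
	if (need_to_fault) {
		spin_unlock(ptl);	/* must drop the spinlock ... */
		return fault_the_range(start, end, walk);	/* ... because this can sleep */
	}
	/* ... use the PUD contents under the lock ... */
	spin_unlock(ptl);
	return 0;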


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 4/8] mm/hmm: add missing pfns set to hmm_vma_walk_pmd()
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:06     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:02PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> All success exit paths from the walker functions must set the pfns array.
> 
> A migration entry with no required fault is a HMM_PFN_NONE return, just
> like the pte case.
> 
> Fixes: d08faca018c4 ("mm/hmm: properly handle migration pmd")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
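
As an illustration, a success path that finds a migration entry the
caller does not need faulted has to fill the output array before
returning (a sketch assuming the 5.6-era pfns/values layout):

	/* migration entry, no fault requested: report the range as
	 * HMM_PFN_NONE instead of leaving pfns[] untouched */
	for (i = 0; addr < end; addr += PAGE_SIZE, i++)
		pfns[i] = range->values[HMM_PFN_NONE];
	hmm_vma_walk->last = end;
	return 0;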


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 5/8] mm/hmm: add missing call to hmm_range_need_fault() before returning EFAULT
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:07     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:03PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> All return paths that do EFAULT must call hmm_range_need_fault() to
> determine if the user requires this page to be valid.
> 
> If the page cannot be made valid when the user later requires it (due to
> vma flags in this case), then the returned pfn should be HMM_PFN_ERROR.
> 
> Fixes: a3e0d41c2b1f ("mm/hmm: improve driver API to work and wait over a range")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
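
For illustration, the ordering the commit message asks for looks
roughly like this (a sketch; range_needs_fault() is a hypothetical
wrapper around the real hmm_range_need_fault() call):

	/* vma can never hold valid pages, e.g. VM_IO or missing VM_READ */
	if (range_needs_fault(range, pfns, npages))
		return -EFAULT;		/* the caller asked for these pages */
	/* caller did not ask for them: mark as error and continue the walk */
	for (i = 0; i < npages; i++)
		pfns[i] = range->values[HMM_PFN_ERROR];
	return 0;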


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 6/8] mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:11     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:12     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:05PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> hmm_range_fault() should never return 0 when the caller requested a valid
> page but the pfns output for that page is HMM_PFN_ERROR.
> 
> hmm_pte_need_fault() must always be called before setting HMM_PFN_ERROR to
> detect if the page is in faulting mode or not.
> 
> Fix two cases in hmm_vma_walk_pmd() and reorganize some of the duplicated
> code.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
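
A sketch of the pattern (based on the 5.6-era hmm_pte_need_fault()
helper; simplified):

	/* this entry can only ever be an error: ask if the caller wanted it */
	hmm_pte_need_fault(hmm_vma_walk, orig_pfn, 0, &fault, &write_fault);
	if (fault || write_fault)
		return -EFAULT;	/* never return 0 with ERROR on a requested page */
	*pfn = range->values[HMM_PFN_ERROR];
	return 0;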


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16  9:13     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
fault on special ptes that aren't the zero page as an error.
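
A sketch of what that suggestion would look like (caller_wants_page is
a hypothetical stand-in for the hmm_pte_need_fault() check):

	/* special pte that is not the zero page: a device can never use it */
	if (pte_special(pte) && !is_zero_pfn(pte_pfn(pte))) {
		if (caller_wants_page)
			return -EFAULT;
		*pfn = range->values[HMM_PFN_ERROR];
		return 0;
	}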


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH hmm 9/8] mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()
  2020-03-12 19:33   ` Jason Gunthorpe
@ 2020-03-16  9:14     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16  9:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang

On Thu, Mar 12, 2020 at 04:33:10PM -0300, Jason Gunthorpe wrote:
> pmd_to_hmm_pfn_flags() already checks it and makes the cpu flags 0. If no
> fault is requested then the pfns should be returned without the valid
> flag set.
> 
> It should not unconditionally fault if faulting is not requested.
> 
> Fixes: 2aee09d8c116 ("mm/hmm: change hmm_vma_fault() to allow write fault on page basis")
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  mm/hmm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Bonus patch, this one got found after I made the series..

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
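
Roughly, the flow the commit message relies on is (a simplified sketch
of the 5.6-era hmm_vma_handle_pmd() shape, from memory):

	cpu_flags = pmd_to_hmm_pfn_flags(range, pmd);	/* 0 for pmd_protnone() */
	hmm_range_need_fault(hmm_vma_walk, pfns, npages, cpu_flags,
			     &fault, &write_fault);
	if (fault || write_fault)
		return hmm_vma_fault(addr, end, fault, write_fault, walk);
	/* no fault requested: fill the pfns without the valid flag */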


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-16  9:13     ` Christoph Hellwig
  (?)
@ 2020-03-16 12:10       ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 12:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > Currently if a special PTE is encountered hmm_range_fault() immediately
> > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > uses).
> > 
> > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > 
> > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > check IS_ENABLED, pte_special() is stubbed out to return false on
> > unsupported architectures.
> 
> I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> fault on special ptes that aren't the zero page as an error.
 
I have another series that is doing that - this change is to make the
next series make sense and not introduce new control logic too.

Even when this is switched to ERROR it still needs to have the
hmm_range_fault() logic this patch introduces.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-16 12:10       ` Jason Gunthorpe
@ 2020-03-16 12:49         ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16 12:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	linux-mm, John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > 
> > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > uses).
> > > 
> > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > > 
> > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > unsupported architectures.
> > 
> > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > fault on special ptes that aren't the zero page as an error.
>  
> I have another series that is doing that - this change is to make the
> next series make sense and not introduce new control logic too.

Ok.  I had some cleanups like this based off older trees, but if you are
active in this area I think I'll let you handle it.

> Even when this is switched to ERROR it still needs to have the
> hmm_range_fault() logic this patch introduces.

True.


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-11 18:35   ` Jason Gunthorpe
@ 2020-03-16 12:51     ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16 12:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Christoph Hellwig, Philip Yang,
	Jason Gunthorpe

On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> Currently if a special PTE is encountered hmm_range_fault() immediately
> returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> uses).
> 
> EFAULT should only be returned after testing with hmm_pte_need_fault().
> 
> Also pte_devmap() and pte_special() are exclusive, and there is no need to
> check IS_ENABLED, pte_special() is stubbed out to return false on
> unsupported architectures.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
  2020-03-16  9:05     ` Christoph Hellwig
  (?)
@ 2020-03-16 12:56       ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 12:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang, Steven Price

On Mon, Mar 16, 2020 at 10:05:03AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:01PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > This eventually calls into handle_mm_fault() which is a sleeping function.
> > Release the lock first.
> > 
> > hmm_vma_walk_hole() does not touch the contents of the PUD, so it does not
> > need the lock.
> 
> So how did this manage to not be noticed before?

I'm still struggling a bit with how this PUD stuff works.. However
AFAICT:

1) The first hunk around pud_none() is actually covering a race. In the
   non-race case the page walker will directly call
   hmm_vma_walk_hole() for a pud_none() entry, so it would be very hard
   to hit this (see the sketch after this list)

2) I'm not 100% sure, but I think pud_present() == !pud_none(), as
   there is no swap entry for PUD so I don't know what a non-present
   but non-none entry is or how to create one. This is possibly dead
   code due to #1

3) To hit hmm_range_need_fault() requesting fault you would need
   a read-only huge PUD and a fault requesting write. I suspect
   creating a read only huge PUD entry is very rare - not something
   someone would deliberately construct.

4) To even trigger the PUD flow at all you need the 1G THP or the
   special 1G DAX pages which I strongly suspect people are not
   testing with.
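
The race in (1) is easier to see in code; a rough sketch of the
pattern (handle_hole() is a hypothetical stand-in for the
hmm_vma_walk_hole() path, locking details omitted):

	pud_t pud = READ_ONCE(*pudp);	/* walk_page_range() saw a huge entry ... */

	if (pud_none(pud))		/* ... but it may have been zapped since */
		return handle_hole(start, end, walk);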

> The fix looks fine assuming we want something backportable before
> starting the cleanups:

I found it easier to make the other cleanup patches make sense if they
didn't introduce all kinds of new logic too..

Thanks,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-16 12:49         ` Christoph Hellwig
  (?)
@ 2020-03-16 13:04           ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 13:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 01:49:53PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 09:10:53AM -0300, Jason Gunthorpe wrote:
> > On Mon, Mar 16, 2020 at 10:13:47AM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 11, 2020 at 03:35:06PM -0300, Jason Gunthorpe wrote:
> > > > From: Jason Gunthorpe <jgg@mellanox.com>
> > > > 
> > > > Currently if a special PTE is encountered hmm_range_fault() immediately
> > > > returns EFAULT and sets the HMM_PFN_SPECIAL error output (which nothing
> > > > uses).
> > > > 
> > > > EFAULT should only be returned after testing with hmm_pte_need_fault().
> > > > 
> > > > Also pte_devmap() and pte_special() are exclusive, and there is no need to
> > > > check IS_ENABLED, pte_special() is stubbed out to return false on
> > > > unsupported architectures.
> > > 
> > > I think the right fix is to just kill HMM_PFN_SPECIAL and treat any
> > > fault on special ptes that aren't the zero page as an error.
> >  
> > I have another series that is doing that - this change is to make the
> > next series make sense and not introduce new control logic too.
> 
> Ok.  I had some cleanups like this based of older trees, but if you are
> active in this area I think I'll let you handle it.

You once said you wanted to lose the weird pfn flags scheme, so
before putting hmm_range_fault() in ODP I planned to do that.

If you have your series someplace, send me a URL and I'll look at it

Thanks,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-16 13:04           ` Jason Gunthorpe
@ 2020-03-16 13:12             ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16 13:12 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	linux-mm, John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > Ok.  I had some cleanups like this based of older trees, but if you are
> > active in this area I think I'll let you handle it.
> 
> You once said you wanted to loose the weird pfn flags scheme, so
> before putting hmm_range_fault in ODP I planned to do that.
> 
> If you have your series someplace send me a URL and I'll look on it

I have a local branch I just started hacking on, but it is rather broken
based on various discussions we had.  But for a basic one I'd suggest
something like:

 - kill HMM_PFN_SPECIAL as it serves no purpose
 - split the ->pfns array into an input flags (optional) and an output
   pfn (mandatory) one, using new flags for the input side
 - replace the output flags/values indirection with a bunch of values
   encoded in the high bits of a u64, with the rest used for the pfn
   (roughly along the lines of the sketch below)
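
A sketch of the kind of encoding meant by the last point (all names
made up for illustration, not an existing API):

	#define SKETCH_PFN_VALID	(1ULL << 63)
	#define SKETCH_PFN_WRITE	(1ULL << 62)
	#define SKETCH_PFN_ERROR	(1ULL << 61)
	#define SKETCH_PFN_FLAGS	(0x7ULL << 61)

	static inline struct page *sketch_entry_to_page(u64 entry)
	{
		return pfn_to_page(entry & ~SKETCH_PFN_FLAGS);
	}

	static inline u64 sketch_entry_from_pfn(unsigned long pfn, u64 flags)
	{
		return (u64)pfn | flags;
	}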



^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
  2020-03-16  9:02     ` Christoph Hellwig
  (?)
@ 2020-03-16 18:07       ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 18:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 10:02:50AM +0100, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 03:35:00PM -0300, Jason Gunthorpe wrote:
> > @@ -694,6 +672,15 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
> >  			return -EBUSY;
> >  		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
> >  				      &hmm_walk_ops, &hmm_vma_walk);
> > +		/*
> > +		 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
> > +		 * searching in the probably common case that the pgmap is the
> > +		 * same for the entire requested range.
> > +		 */
> > +		if (hmm_vma_walk.pgmap) {
> > +			put_dev_pagemap(hmm_vma_walk.pgmap);
> > +			hmm_vma_walk.pgmap = NULL;
> > +		}
> >  	} while (ret == -EBUSY);
> 
> In which case it should only be put on return, and not for every loop.

I chose this to be simple, without having to add a goto to unwind it.

So, instead like this:

@@ -683,21 +661,33 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 		.flags = flags,
 	};
 	struct mm_struct *mm = range->notifier->mm;
-	int ret;
+	long ret;
 
 	lockdep_assert_held(&mm->mmap_sem);
 
 	do {
 		/* If range is no longer valid force retry. */
 		if (mmu_interval_check_retry(range->notifier,
-					     range->notifier_seq))
-			return -EBUSY;
+					     range->notifier_seq)) {
+			ret = -EBUSY;
+			goto out;
+		}
 		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
 				      &hmm_walk_ops, &hmm_vma_walk);
 	} while (ret == -EBUSY);
 
 	if (ret)
-		return ret;
-	return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
+		goto out;
+	ret = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
+
+out:
+	/*
+	 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
+	 * searching in the probably common case that the pgmap is the
+	 * same for the entire requested range.
+	 */
+	if (hmm_vma_walk.pgmap)
+		put_dev_pagemap(hmm_vma_walk.pgmap);
+	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);

?

> I still think the right fix is to just delete all the unused and broken
> pgmap handling code.  If we ever need to add it back it can be added
> in a proper understood and tested way.

What I want to add is something like

 if (pgmap != walk->required_pgmap)
     cpu_flags = 0
 hmm_range_need_fault(..., cpu_flags, ...)

Which will fix a bug in nouveau where it blindly assumes any device
pages are its own, IIRC.

I think Ralph observed it needs to be here, because if the pgmap
doesn't match then it should trigger migration, in a single call,
rather than iterating.

I'm mostly expecting to replace all the other pgmap code, but keep the
pgmap caching. The existing pgmap stuff seems approximately right, but
useless.
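
For reference, the caching being kept is just the usual
get_dev_pagemap() idiom (sketch):

	/* get_dev_pagemap() hands back the cached pgmap when the pfn still
	 * falls inside it (reusing the reference already held) and only
	 * does the real lookup when the walk crosses into a new pgmap */
	hmm_vma_walk->pgmap = get_dev_pagemap(pfn, hmm_vma_walk->pgmap);
	if (!hmm_vma_walk->pgmap)
		return -EBUSY;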

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread


* Re: [PATCH  hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
  2020-03-16 18:07       ` Jason Gunthorpe
@ 2020-03-16 18:13         ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-16 18:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	linux-mm, John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 03:07:13PM -0300, Jason Gunthorpe wrote:
> I chose this to be simple without having to goto unwind it.
> 
> So, instead like this:

As said, and per the previous discussion:  I think just removing the
pgmap lookup is the right thing to do here.  Something like this patch:

diff --git a/mm/hmm.c b/mm/hmm.c
index 3d10485bf323..9f1049815d44 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -28,7 +28,6 @@
 
 struct hmm_vma_walk {
 	struct hmm_range	*range;
-	struct dev_pagemap	*pgmap;
 	unsigned long		last;
 	unsigned int		flags;
 };
@@ -198,15 +197,8 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
 		return hmm_vma_fault(addr, end, fault, write_fault, walk);
 
 	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
-		if (pmd_devmap(pmd)) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap))
-				return -EBUSY;
-		}
+	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++)
 		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
-	}
 	hmm_vma_walk->last = end;
 	return 0;
 }
@@ -277,15 +269,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	if (fault || write_fault)
 		goto fault;
 
-	if (pte_devmap(pte)) {
-		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
-					      hmm_vma_walk->pgmap);
-		if (unlikely(!hmm_vma_walk->pgmap)) {
-			pte_unmap(ptep);
-			return -EBUSY;
-		}
-	}
-
 	/*
 	 * Since each architecture defines a struct page for the zero page, just
 	 * fall through and treat it like a normal page.
@@ -455,12 +438,6 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 
 		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 		for (i = 0; i < npages; ++i, ++pfn) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap)) {
-				ret = -EBUSY;
-				goto out_unlock;
-			}
 			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				  cpu_flags;
 		}
@@ -614,15 +591,6 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 			return -EBUSY;
 		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
 				      &hmm_walk_ops, &hmm_vma_walk);
-		/*
-		 * A pgmap is kept cached in the hmm_vma_walk to avoid expensive
-		 * searching in the probably common case that the pgmap is the
-		 * same for the entire requested range.
-		 */
-		if (hmm_vma_walk.pgmap) {
-			put_dev_pagemap(hmm_vma_walk.pgmap);
-			hmm_vma_walk.pgmap = NULL;
-		}
 	} while (ret == -EBUSY);
 
 	if (ret)


^ permalink raw reply related	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 0/8] Various error case bug fixes for hmm_range_fault()
  2020-03-11 18:34 ` Jason Gunthorpe
  (?)
@ 2020-03-16 18:25   ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 18:25 UTC (permalink / raw)
  To: Jerome Glisse, Ralph Campbell, Felix.Kuehling
  Cc: linux-mm, John Hubbard, dri-devel, amd-gfx, Christoph Hellwig,
	Philip Yang

On Wed, Mar 11, 2020 at 03:34:58PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> The hmm_range_fault() flow is fairly complicated. The scheme allows the
> caller to specify if it needs a usable result for each page, or if it only
> needs the current page table status filled in. This mixture of behavior is
> useful for a caller that wants to build a 'prefetch around fault'
> algorithm.
> 
> Although we have no in-tree users of this capability, I am working on
> having RDMA ODP work in this manner, and removing these bugs from
> hmm_range_fault() is a necessary step.
> 
> The basic principles are:
> 
>  - If the caller did not ask for a VA to be valid then hmm_range_fault()
>    should not fail because of that VA
> 
>  - If 0 is returned from hmm_range_fault() then the entire pfns array
>    contains valid data
> 
>  - HMM_PFN_ERROR is set if faulting fails, or if asking for faulting
>    would fail
> 
>  - A 0 return from hmm_range_fault() does not have HMM_PFN_ERROR in any
>    VA's the caller asked to be valid
> 
> This series does not get there completely, I have a followup series
> closing some more complex cases.
> 
> I tested this series using Ralph's hmm tester he posted a while back,
> other testing would be appreciated.
> 
> Jason Gunthorpe (8):
>   mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte()
>   mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock
>   mm/hmm: add missing pfns set to hmm_vma_walk_pmd()
>   mm/hmm: add missing call to hmm_range_need_fault() before returning
>     EFAULT
>   mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte()
>   mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid
>     pages
>   mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL
>     handling
>   mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd()

I moved these toward linux-next; if others have remarks or tags, please
feel free to continue.

>   mm/hmm: don't free the cached pgmap while scanning

I will respin

Thank you all for the reviews!

Regards,
Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 2/8] mm/hmm: don't free the cached pgmap while scanning
  2020-03-16 18:13         ` Christoph Hellwig
  (?)
@ 2020-03-16 19:23           ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-16 19:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 07:13:24PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 03:07:13PM -0300, Jason Gunthorpe wrote:
> > I chose this to be simple without having to goto unwind it.
> > 
> > So, instead like this:
> 
> As said, and per the previous discussion:  I think just removing the
> pgmap lookup is the right thing to do here.  Something like this patch:

OK. I think I get it now.

We don't even signal to the caller that the pfn comes from a pgmap, so
the caller must assume the pfn is CPU memory and can be DMA mapped. At
that point it doesn't matter what kind of pgmap it is.

Races here are resolved by the notifiers, as we can't destroy the pgmap
without triggering invalidation of the pte.
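
To make that concrete, the caller-side pattern looks roughly like this
(a sketch with hypothetical driver locking and helpers, not any real
driver):

	struct mm_struct *mm = range.notifier->mm;
	long ret;

again:
	range.notifier_seq = mmu_interval_read_begin(range.notifier);

	down_read(&mm->mmap_sem);
	ret = hmm_range_fault(&range, 0);
	up_read(&mm->mmap_sem);
	if (ret < 0) {
		if (ret == -EBUSY)
			goto again;
		return ret;
	}

	mutex_lock(&drv->pt_lock);		/* driver page table lock */
	/*
	 * If the pgmap (or anything else backing these ptes) was torn
	 * down after the walk, the invalidation callback has already run
	 * and bumped the notifier sequence, so the snapshot is stale and
	 * we retry instead of using it.
	 */
	if (mmu_interval_read_retry(range.notifier, range.notifier_seq)) {
		mutex_unlock(&drv->pt_lock);
		goto again;
	}
	program_device_ptes(drv, &range);	/* hypothetical helper */
	mutex_unlock(&drv->pt_lock);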

So removing it is the right thing to do, and the fix for the
device_private case is closer to the hunk I just sent.

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-16 13:12             ` Christoph Hellwig
@ 2020-03-17 12:32               ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-17 12:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	linux-mm, John Hubbard, dri-devel, amd-gfx, Philip Yang

On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > Ok.  I had some cleanups like this based of older trees, but if you are
> > > active in this area I think I'll let you handle it.
> > 
> > You once said you wanted to loose the weird pfn flags scheme, so
> > before putting hmm_range_fault in ODP I planned to do that.
> > 
> > If you have your series someplace send me a URL and I'll look on it
> 
> I have a local branch I just started hacking on, but it is rather broken
> based on various discussions we had.  But for a basic one I'd suggest
> something like:
> 
>  - kill HMM_PFN_SPECIAL as it serves no purpose
>  - split the ->pfns array into an input flags (optional) and an output
>    pfn (mandtory) one, using new flags for the input side
>  - replace the output flags/values indirection with a bunch of values
>    encoded in the high bits of a u64, with the rest used for the pfn

Thinking out loud a bit more:

 - do we really need HMM_PFN_ERROR, or is just a return value from
   hmm_range_fault enough?
 - because if it is we don't need output flags at all, and the output
   array could be struct pages, which would make for a much easier
   to use API (rough sketch below)
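
Purely as a sketch of that shape (nothing like this exists in the tree;
the field and helper names are made up):

	struct hmm_range {
		struct mmu_interval_notifier	*notifier;
		unsigned long			notifier_seq;
		unsigned long			start, end;
		uint64_t			*request_flags;	/* optional, input only */
		struct page			**pages;	/* output, one per PAGE_SIZE */
	};

	/* caller side: per-page errors come back as ERR_PTR() entries */
	for (i = 0; i < npages; i++) {
		if (IS_ERR(range.pages[i]))
			continue;	/* would have been HMM_PFN_ERROR */
		pfn = page_to_pfn(range.pages[i]);
		/* ... program the device page table with pfn ... */
	}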


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-17 12:32               ` Christoph Hellwig
  (?)
@ 2020-03-17 12:53                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-17 12:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Tue, Mar 17, 2020 at 01:32:10PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 16, 2020 at 02:12:01PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 16, 2020 at 10:04:58AM -0300, Jason Gunthorpe wrote:
> > > > Ok.  I had some cleanups like this based of older trees, but if you are
> > > > active in this area I think I'll let you handle it.
> > > 
> > > You once said you wanted to loose the weird pfn flags scheme, so
> > > before putting hmm_range_fault in ODP I planned to do that.
> > > 
> > > If you have your series someplace send me a URL and I'll look on it
> > 
> > I have a local branch I just started hacking on, but it is rather broken
> > based on various discussions we had.  But for a basic one I'd suggest
> > something like:
> > 
> >  - kill HMM_PFN_SPECIAL as it serves no purpose
> >  - split the ->pfns array into an input flags (optional) and an output
> >    pfn (mandtory) one, using new flags for the input side
> >  - replace the output flags/values indirection with a bunch of values
> >    encoded in the high bits of a u64, with the rest used for the pfn
> 
> Thinking out loud a bit more:
> 
>  - do we really need HMM_PFN_ERROR, or is just a return value from
>    hmm_range_fault enough?

I'm not totally clear on this. The only use for ERROR is to signal to a
non-faulting hmm_range_fault (ie snapshot) that the page should generate a
device fault (ie device SIGSEGV).

We can also handle ERROR by having the device take the fault to CPU,
then fail during a faulting hmm_range_fault and then dropping the
ERROR indication toward the device.

If we already know the page cannot be faulted when we read it, it feels
natural to return that too.

I have a patch, which now needs rebasing, that removes PFN_ERROR from
the faulting cases, so only non-faulting hmm_range_fault users will see
it. Faulting users will always see an errno return.

>  - because if it is we don't need output flags at all, and the output
>    array could be struct pages, which would make for a much easier
>    to use API

Valid and write are required for the non-faulting case, so I don't
think the flags can go away.

Being able to do a non-faulting hmm_range_fault is going to be a big
performance win for ODP; that is my interest here.
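
For reference, with the current flags/values indirection a snapshot
call would look roughly like this (from memory, so the hmm_range field
names may be slightly off; the bit values in the tables are arbitrary
driver-side choices and my_device_map_page() is hypothetical):

	static const uint64_t my_flags[HMM_PFN_FLAG_MAX] = {
		[HMM_PFN_VALID] = 1UL << 0,
		[HMM_PFN_WRITE] = 1UL << 1,
	};
	static const uint64_t my_values[HMM_PFN_VALUE_MAX] = {
		[HMM_PFN_ERROR]   = ~0UL,
		[HMM_PFN_NONE]    = 0,
		[HMM_PFN_SPECIAL] = ~0UL,
	};

	/* request nothing: all input flags are 0, so this only snapshots */
	range.flags = my_flags;
	range.values = my_values;
	range.default_flags = 0;
	range.pfn_flags_mask = 0;
	memset(range.pfns, 0, npages * sizeof(*range.pfns));

	ret = hmm_range_fault(&range, 0);
	if (ret < 0)
		return ret;

	for (i = 0; i < npages; i++) {
		if (!(range.pfns[i] & my_flags[HMM_PFN_VALID]))
			continue;	/* not populated, skip the prefetch */
		my_device_map_page(dev, i, range.pfns[i],
				   range.pfns[i] & my_flags[HMM_PFN_WRITE]);
	}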

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-17 12:53                 ` Jason Gunthorpe
@ 2020-03-17 13:06                   ` Christoph Hellwig
  -1 siblings, 0 replies; 153+ messages in thread
From: Christoph Hellwig @ 2020-03-17 13:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Ralph Campbell, Felix.Kuehling,
	linux-mm, John Hubbard, dri-devel, amd-gfx, Philip Yang

On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > Thinking out loud a bit more:
> > 
> >  - do we really need HMM_PFN_ERROR, or is just a return value from
> >    hmm_range_fault enough?
> 
> I'm not totally clear on this. The only use for ERROR is to signal to a
> non-faulting hmm_range_fault (ie shapshot) that the page should generate a 
> device fault (ie device SIGSEGV).
> 
> We can also handle ERROR by having the device take the fault to CPU,
> then fail during a faulting hmm_range_fault and then dropping the
> ERROR indication toward the device.
> 
> If we already know the page cannot be faulted when we read it it feels
> natural to return that too.

True.  Of course we could just stick an ERR_PTR into the page array
in this case.

> >  - because if it is we don't need output flags at all, and the output
> >    array could be struct pages, which would make for a much easier
> >    to use API
> 
> valid and write is required for the non-faulting case, I don't
> think flags can go away.

Do we have any cases where these aren't simply kept from the input array?
I couldn't find any, but I've not understood some of the subtle details
before.


^ permalink raw reply	[flat|nested] 153+ messages in thread

* Re: [PATCH  hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling
  2020-03-17 13:06                   ` Christoph Hellwig
  (?)
@ 2020-03-17 13:25                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 153+ messages in thread
From: Jason Gunthorpe @ 2020-03-17 13:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Ralph Campbell, Felix.Kuehling, linux-mm,
	John Hubbard, dri-devel, amd-gfx, Philip Yang

On Tue, Mar 17, 2020 at 02:06:08PM +0100, Christoph Hellwig wrote:
> On Tue, Mar 17, 2020 at 09:53:17AM -0300, Jason Gunthorpe wrote:
> > > Thinking out loud a bit more:
> > > 
> > >  - do we really need HMM_PFN_ERROR, or is just a return value from
> > >    hmm_range_fault enough?
> > 
> > I'm not totally clear on this. The only use for ERROR is to signal to a
> > non-faulting hmm_range_fault (ie shapshot) that the page should generate a 
> > device fault (ie device SIGSEGV).
> > 
> > We can also handle ERROR by having the device take the fault to CPU,
> > then fail during a faulting hmm_range_fault and then dropping the
> > ERROR indication toward the device.
> > 
> > If we already know the page cannot be faulted when we read it it feels
> > natural to return that too.
> 
> True.  Of course we could just stick an ERR_PTR into the page array
> in this case.

If we make the output here a struct page *[] then there is also no
reason for hmm_range_fault to exist; get_user_pages already works like
that.

I see nearly the entire point of hmm_range_fault as being able to
return the flags.
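
Put differently (a rough comparison, error handling omitted):

	/* GUP: pinned struct pages out, no per-page permission state */
	got = get_user_pages(start, npages, FOLL_WRITE, pages, NULL);

	/*
	 * hmm_range_fault(): pfns plus flag bits (valid/write/...) out,
	 * no page references taken, with the mmu interval notifier
	 * providing the synchronization instead of a pin.
	 */
	ret = hmm_range_fault(&range, 0);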

> > >  - because if it is we don't need output flags at all, and the output
> > >    array could be struct pages, which would make for a much easier
> > >    to use API
> > 
> > valid and write is required for the non-faulting case, I don't
> > think flags can go away.
> 
> Do we have any cases where these aren't simply kept from the input array?
> I couldn't find any, but I've not understood some of the subtle details
> before.

Not today, I think

All this stuff becomes interesting when we have a non-faulting caller
of hmm_range_fault(). Then the input request flags will all be 0 and
the output flags will indicate the current state of the page table.

Jason


^ permalink raw reply	[flat|nested] 153+ messages in thread

end of thread, other threads:[~2020-03-17 14:18 UTC | newest]

Thread overview: 153+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 18:34 [PATCH hmm 0/8] Various error case bug fixes for hmm_range_fault() Jason Gunthorpe
2020-03-11 18:34 ` [PATCH hmm 1/8] mm/hmm: add missing unmaps of the ptep during hmm_vma_handle_pte() Jason Gunthorpe
2020-03-12  1:28   ` Ralph Campbell
2020-03-12 14:24     ` Jason Gunthorpe
2020-03-16  8:58   ` Christoph Hellwig
2020-03-11 18:35 ` [PATCH hmm 2/8] mm/hmm: don't free the cached pgmap while scanning Jason Gunthorpe
2020-03-12  1:29   ` Ralph Campbell
2020-03-16  9:02   ` Christoph Hellwig
2020-03-16 18:07     ` Jason Gunthorpe
2020-03-16 18:13       ` Christoph Hellwig
2020-03-16 19:23         ` Jason Gunthorpe
2020-03-11 18:35 ` [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock Jason Gunthorpe
2020-03-12  1:31   ` Ralph Campbell
2020-03-12  8:54   ` Steven Price
2020-03-12 10:28     ` [PATCH] mm/hmm: Simplify hmm_vma_walk_pud slightly Steven Price
2020-03-12 14:27       ` Jason Gunthorpe
2020-03-12 14:40         ` Steven Price
2020-03-12 15:11           ` Jason Gunthorpe
2020-03-12 16:16             ` Steven Price
2020-03-12 16:37               ` Jason Gunthorpe
2020-03-12 17:02                 ` Steven Price
2020-03-12 17:17                   ` Jason Gunthorpe
2020-03-13 19:55                   ` Jason Gunthorpe
2020-03-13 21:04                     ` Matthew Wilcox
2020-03-13 22:51                       ` Jason Gunthorpe
2020-03-16  9:05   ` [PATCH hmm 3/8] mm/hmm: do not call hmm_vma_walk_hole() while holding a spinlock Christoph Hellwig
2020-03-16 12:56     ` Jason Gunthorpe
2020-03-11 18:35 ` [PATCH hmm 4/8] mm/hmm: add missing pfns set to hmm_vma_walk_pmd() Jason Gunthorpe
2020-03-12  1:33   ` Ralph Campbell
2020-03-16  9:06   ` Christoph Hellwig
2020-03-11 18:35 ` [PATCH hmm 5/8] mm/hmm: add missing call to hmm_range_need_fault() before returning EFAULT Jason Gunthorpe
2020-03-12  1:34   ` Ralph Campbell
2020-03-16  9:07   ` Christoph Hellwig
2020-03-11 18:35 ` [PATCH hmm 6/8] mm/hmm: reorganize how !pte_present is handled in hmm_vma_handle_pte() Jason Gunthorpe
2020-03-12  1:36   ` Ralph Campbell
2020-03-16  9:11   ` Christoph Hellwig
2020-03-11 18:35 ` [PATCH hmm 7/8] mm/hmm: return -EFAULT when setting HMM_PFN_ERROR on requested valid pages Jason Gunthorpe
2020-03-12  1:36   ` Ralph Campbell
2020-03-12 14:35     ` Jason Gunthorpe
2020-03-16  9:12   ` Christoph Hellwig
2020-03-11 18:35 ` [PATCH hmm 8/8] mm/hmm: add missing call to hmm_pte_need_fault in HMM_PFN_SPECIAL handling Jason Gunthorpe
2020-03-12  1:38   ` Ralph Campbell
2020-03-16  9:13   ` Christoph Hellwig
2020-03-16 12:10     ` Jason Gunthorpe
2020-03-16 12:49       ` Christoph Hellwig
2020-03-16 13:04         ` Jason Gunthorpe
2020-03-16 13:12           ` Christoph Hellwig
2020-03-17 12:32             ` Christoph Hellwig
2020-03-17 12:53               ` Jason Gunthorpe
2020-03-17 13:06                 ` Christoph Hellwig
2020-03-17 13:25                   ` Jason Gunthorpe
2020-03-16 12:51   ` Christoph Hellwig
2020-03-12 19:33 ` [PATCH hmm 9/8] mm/hmm: do not check pmd_protnone twice in hmm_vma_handle_pmd() Jason Gunthorpe
2020-03-12 23:50   ` Ralph Campbell
2020-03-16  9:14   ` Christoph Hellwig
2020-03-16 18:25 ` [PATCH hmm 0/8] Various error case bug fixes for hmm_range_fault() Jason Gunthorpe
