* hmm_range_fault related fixes and legacy API removal v2
@ 2019-07-22  9:44 Christoph Hellwig
  2019-07-22  9:44 ` [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

Hi Jérôme, Ben and Jason,

below is a series against the hmm tree which fixes up the mmap_sem
locking in nouveau and while at it also removes leftover legacy HMM APIs
only used by nouveau.

The first 4 patches are a bug fix for nouveau, which I suspect should
go into this merge window even if the code is marked as staging, just
to avoid people copying the breakage.

Changes since v1:
 - don't return the valid state from hmm_range_unregister
 - additional nouveau cleanups


* [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-22 14:37   ` Souptick Joarder
  2019-07-22  9:44 ` [PATCH 2/6] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel,
	Felix Kuehling

We should not have two different error codes for the same condition.  In
addition, the special handling of EAGAIN (where the mmap_sem is dropped
because of the FAULT_FLAG_ALLOW_RETRY logic in the core vm) really
complicates the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 Documentation/vm/hmm.rst |  2 +-
 mm/hmm.c                 | 10 ++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 7d90964abbb0..710ce1c701bf 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -237,7 +237,7 @@ The usage pattern is::
       ret = hmm_range_snapshot(&range);
       if (ret) {
           up_read(&mm->mmap_sem);
-          if (ret == -EAGAIN) {
+          if (ret == -EBUSY) {
             /*
              * No need to check hmm_range_wait_until_valid() return value
              * on retry we will get proper error with hmm_range_snapshot()
diff --git a/mm/hmm.c b/mm/hmm.c
index e1eedef129cf..16b6731a34db 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -946,7 +946,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
  * @range: range
  * Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
  *          permission (for instance asking for write and range is read only),
- *          -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
+ *          -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
  *          vma or it is illegal to access that range), number of valid pages
  *          in range->pfns[] (from range start address).
  *
@@ -967,7 +967,7 @@ long hmm_range_snapshot(struct hmm_range *range)
 	do {
 		/* If range is no longer valid force retry. */
 		if (!range->valid)
-			return -EAGAIN;
+			return -EBUSY;
 
 		vma = find_vma(hmm->mm, start);
 		if (vma == NULL || (vma->vm_flags & device_vma))
@@ -1062,10 +1062,8 @@ long hmm_range_fault(struct hmm_range *range, bool block)
 
 	do {
 		/* If range is no longer valid force retry. */
-		if (!range->valid) {
-			up_read(&hmm->mm->mmap_sem);
-			return -EAGAIN;
-		}
+		if (!range->valid)
+			return -EBUSY;
 
 		vma = find_vma(hmm->mm, start);
 		if (vma == NULL || (vma->vm_flags & device_vma))
-- 
2.20.1
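
For reference, a condensed sketch of the caller pattern described in
Documentation/vm/hmm.rst once this change is in place (setup, teardown and
the final validity check under the driver lock are elided, and
TIMEOUT_IN_MSEC stands in for whatever timeout the driver picks; purely
illustrative):

	/* range is already initialized and registered via hmm_range_register() */
again:
	down_read(&mm->mmap_sem);
	ret = hmm_range_snapshot(&range);
	if (ret) {
		up_read(&mm->mmap_sem);
		if (ret == -EBUSY) {
			/* the range was invalidated; wait for it to become valid and retry */
			hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);
			goto again;
		}
		hmm_range_unregister(&range);
		return ret;
	}
	/* recheck range validity under the driver lock, then consume range.pfns[] */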



* [PATCH 2/6] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
  2019-07-22  9:44 ` [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-23 14:55   ` Jason Gunthorpe
  2019-07-22  9:44 ` [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

These two functions are marked as legacy APIs to get rid of, but seem
to suit the current nouveau flow.  Move them to their only user in
preparation for fixing a locking bug involving caller and callee.
All comments referring to the old API have been removed, as these now
are driver private helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 45 +++++++++++++++++++++-
 include/linux/hmm.h                   | 54 ---------------------------
 2 files changed, 43 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 8c92374afcf2..cde09003c06b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -475,6 +475,47 @@ nouveau_svm_fault_cache(struct nouveau_svm *svm,
 		fault->inst, fault->addr, fault->access);
 }
 
+static inline bool nouveau_range_done(struct hmm_range *range)
+{
+	bool ret = hmm_range_valid(range);
+
+	hmm_range_unregister(range);
+	return ret;
+}
+
+static int
+nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
+		    bool block)
+{
+	long ret;
+
+	range->default_flags = 0;
+	range->pfn_flags_mask = -1UL;
+
+	ret = hmm_range_register(range, mirror,
+				 range->start, range->end,
+				 PAGE_SHIFT);
+	if (ret)
+		return (int)ret;
+
+	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
+		up_read(&range->vma->vm_mm->mmap_sem);
+		return -EAGAIN;
+	}
+
+	ret = hmm_range_fault(range, block);
+	if (ret <= 0) {
+		if (ret == -EBUSY || !ret) {
+			up_read(&range->vma->vm_mm->mmap_sem);
+			ret = -EBUSY;
+		} else if (ret == -EAGAIN)
+			ret = -EBUSY;
+		hmm_range_unregister(range);
+		return ret;
+	}
+	return 0;
+}
+
 static int
 nouveau_svm_fault(struct nvif_notify *notify)
 {
@@ -649,10 +690,10 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		range.values = nouveau_svm_pfn_values;
 		range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
 again:
-		ret = hmm_vma_fault(&svmm->mirror, &range, true);
+		ret = nouveau_range_fault(&svmm->mirror, &range, true);
 		if (ret == 0) {
 			mutex_lock(&svmm->mutex);
-			if (!hmm_vma_range_done(&range)) {
+			if (!nouveau_range_done(&range)) {
 				mutex_unlock(&svmm->mutex);
 				goto again;
 			}
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index b8a08b2a10ca..7ef56dc18050 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -484,60 +484,6 @@ long hmm_range_dma_unmap(struct hmm_range *range,
  */
 #define HMM_RANGE_DEFAULT_TIMEOUT 1000
 
-/* This is a temporary helper to avoid merge conflict between trees. */
-static inline bool hmm_vma_range_done(struct hmm_range *range)
-{
-	bool ret = hmm_range_valid(range);
-
-	hmm_range_unregister(range);
-	return ret;
-}
-
-/* This is a temporary helper to avoid merge conflict between trees. */
-static inline int hmm_vma_fault(struct hmm_mirror *mirror,
-				struct hmm_range *range, bool block)
-{
-	long ret;
-
-	/*
-	 * With the old API the driver must set each individual entries with
-	 * the requested flags (valid, write, ...). So here we set the mask to
-	 * keep intact the entries provided by the driver and zero out the
-	 * default_flags.
-	 */
-	range->default_flags = 0;
-	range->pfn_flags_mask = -1UL;
-
-	ret = hmm_range_register(range, mirror,
-				 range->start, range->end,
-				 PAGE_SHIFT);
-	if (ret)
-		return (int)ret;
-
-	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
-		/*
-		 * The mmap_sem was taken by driver we release it here and
-		 * returns -EAGAIN which correspond to mmap_sem have been
-		 * drop in the old API.
-		 */
-		up_read(&range->vma->vm_mm->mmap_sem);
-		return -EAGAIN;
-	}
-
-	ret = hmm_range_fault(range, block);
-	if (ret <= 0) {
-		if (ret == -EBUSY || !ret) {
-			/* Same as above, drop mmap_sem to match old API. */
-			up_read(&range->vma->vm_mm->mmap_sem);
-			ret = -EBUSY;
-		} else if (ret == -EAGAIN)
-			ret = -EBUSY;
-		hmm_range_unregister(range);
-		return ret;
-	}
-	return 0;
-}
-
 /* Below are for HMM internal use only! Not to be used by device driver! */
 static inline void hmm_mm_init(struct mm_struct *mm)
 {
-- 
2.20.1
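
For context, the call site in nouveau_svm_fault() (partially visible in the
hunk above) pairs the two moved helpers roughly like this; a sketch with the
surrounding fault processing trimmed:

	again:
		ret = nouveau_range_fault(&svmm->mirror, &range, true);
		if (ret == 0) {
			mutex_lock(&svmm->mutex);
			if (!nouveau_range_done(&range)) {
				/* a concurrent invalidation hit the range, retry */
				mutex_unlock(&svmm->mutex);
				goto again;
			}
			/* ... update the GPU page tables under svmm->mutex ... */
		}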



* [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
  2019-07-22  9:44 ` [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} Christoph Hellwig
  2019-07-22  9:44 ` [PATCH 2/6] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-23 14:56   ` Jason Gunthorpe
  2019-07-22  9:44 ` [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

The parameter is always true, so remove it as well as the -EAGAIN
handling that can only happen for the non-blocking case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index cde09003c06b..5dd83a46578f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -484,8 +484,7 @@ static inline bool nouveau_range_done(struct hmm_range *range)
 }
 
 static int
-nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
-		    bool block)
+nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 {
 	long ret;
 
@@ -503,7 +502,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
 		return -EAGAIN;
 	}
 
-	ret = hmm_range_fault(range, block);
+	ret = hmm_range_fault(range, true);
 	if (ret <= 0) {
 		if (ret == -EBUSY || !ret) {
 			up_read(&range->vma->vm_mm->mmap_sem);
@@ -690,7 +689,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		range.values = nouveau_svm_pfn_values;
 		range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
 again:
-		ret = nouveau_range_fault(&svmm->mirror, &range, true);
+		ret = nouveau_range_fault(&svmm->mirror, &range);
 		if (ret == 0) {
 			mutex_lock(&svmm->mutex);
 			if (!nouveau_range_done(&range)) {
-- 
2.20.1



* [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-07-22  9:44 ` [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-23 15:18   ` Jason Gunthorpe
  2019-07-22  9:44 ` [PATCH 5/6] nouveau: return -EBUSY when hmm_range_wait_until_valid fails Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

Currently nouveau_svm_fault expects nouveau_range_fault to never unlock
mmap_sem, but the latter unlocks it for a random selection of error
codes. Fix this up by always unlocking mmap_sem for non-zero return
values in nouveau_range_fault, and only unlocking it in the caller
for successful returns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 5dd83a46578f..5de2d54b9782 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -494,8 +494,10 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 	ret = hmm_range_register(range, mirror,
 				 range->start, range->end,
 				 PAGE_SHIFT);
-	if (ret)
+	if (ret) {
+		up_read(&range->vma->vm_mm->mmap_sem);
 		return (int)ret;
+	}
 
 	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
 		up_read(&range->vma->vm_mm->mmap_sem);
@@ -504,11 +506,9 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 
 	ret = hmm_range_fault(range, true);
 	if (ret <= 0) {
-		if (ret == -EBUSY || !ret) {
-			up_read(&range->vma->vm_mm->mmap_sem);
-			ret = -EBUSY;
-		} else if (ret == -EAGAIN)
+		if (ret == 0)
 			ret = -EBUSY;
+		up_read(&range->vma->vm_mm->mmap_sem);
 		hmm_range_unregister(range);
 		return ret;
 	}
@@ -706,8 +706,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
 						NULL);
 			svmm->vmm->vmm.object.client->super = false;
 			mutex_unlock(&svmm->mutex);
+			up_read(&svmm->mm->mmap_sem);
 		}
-		up_read(&svmm->mm->mmap_sem);
 
 		/* Cancel any faults in the window whose pages didn't manage
 		 * to keep their valid bit, or stay writeable when required.
-- 
2.20.1
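
In other words, after this patch the mmap_sem ownership rule for the helper
becomes the following (a minimal sketch of the contract, not the literal
nouveau_svm_fault() flow):

	down_read(&mm->mmap_sem);
	ret = nouveau_range_fault(&svmm->mirror, &range);
	if (ret)
		return ret;	/* the helper already dropped mmap_sem */

	mutex_lock(&svmm->mutex);
	/* ... consume the range and update the GPU page tables ... */
	mutex_unlock(&svmm->mutex);
	up_read(&mm->mmap_sem);	/* the success path unlocks in the caller */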



* [PATCH 5/6] nouveau: return -EBUSY when hmm_range_wait_until_valid fails
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-07-22  9:44 ` [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-23 15:18   ` Jason Gunthorpe
  2019-07-22  9:44 ` [PATCH 6/6] mm: remove the legacy hmm_pfn_* APIs Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

-EAGAIN has a magic meaning for non-blocking faults, so don't overload
it.  Given that the caller doesn't check for specific error codes this
change is purely cosmetic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 5de2d54b9782..a9c5c58d425b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -501,7 +501,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 
 	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
 		up_read(&range->vma->vm_mm->mmap_sem);
-		return -EAGAIN;
+		return -EBUSY;
 	}
 
 	ret = hmm_range_fault(range, true);
-- 
2.20.1



* [PATCH 6/6] mm: remove the legacy hmm_pfn_* APIs
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-07-22  9:44 ` [PATCH 5/6] nouveau: return -EBUSY when hmm_range_wait_until_valid fails Christoph Hellwig
@ 2019-07-22  9:44 ` Christoph Hellwig
  2019-07-23  1:11 ` hmm_range_fault related fixes and legacy API removal v2 Ralph Campbell
  2019-07-23 15:27 ` Jason Gunthorpe
  7 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-22  9:44 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

Switch the one remaining user in nouveau over to its replacement,
and remove all the wrappers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 +-
 include/linux/hmm.h                    | 34 --------------------------
 2 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1333220787a1..345c63cb752a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -845,7 +845,7 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
 		struct page *page;
 		uint64_t addr;
 
-		page = hmm_pfn_to_page(range, range->pfns[i]);
+		page = hmm_device_entry_to_page(range, range->pfns[i]);
 		if (page == NULL)
 			continue;
 
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 7ef56dc18050..9f32586684c9 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -290,40 +290,6 @@ static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
 		range->flags[HMM_PFN_VALID];
 }
 
-/*
- * Old API:
- * hmm_pfn_to_page()
- * hmm_pfn_to_pfn()
- * hmm_pfn_from_page()
- * hmm_pfn_from_pfn()
- *
- * This are the OLD API please use new API, it is here to avoid cross-tree
- * merge painfullness ie we convert things to new API in stages.
- */
-static inline struct page *hmm_pfn_to_page(const struct hmm_range *range,
-					   uint64_t pfn)
-{
-	return hmm_device_entry_to_page(range, pfn);
-}
-
-static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range,
-					   uint64_t pfn)
-{
-	return hmm_device_entry_to_pfn(range, pfn);
-}
-
-static inline uint64_t hmm_pfn_from_page(const struct hmm_range *range,
-					 struct page *page)
-{
-	return hmm_device_entry_from_page(range, page);
-}
-
-static inline uint64_t hmm_pfn_from_pfn(const struct hmm_range *range,
-					unsigned long pfn)
-{
-	return hmm_device_entry_from_pfn(range, pfn);
-}
-
 /*
  * Mirroring: how to synchronize device page table with CPU page table.
  *
-- 
2.20.1
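
With the wrappers gone, the converted loop in nouveau_dmem_convert_pfn()
reads roughly as follows (a sketch with the surrounding code trimmed; npages
just stands in for the actual loop bound):

	for (i = 0; i < npages; ++i) {
		struct page *page;

		page = hmm_device_entry_to_page(range, range->pfns[i]);
		if (page == NULL)
			continue;
		/* ... translate the page to a device address ... */
	}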



* Re: [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
  2019-07-22  9:44 ` [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot} Christoph Hellwig
@ 2019-07-22 14:37   ` Souptick Joarder
  2019-07-23 14:54     ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Souptick Joarder @ 2019-07-22 14:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs,
	Ralph Campbell, Linux-MM, nouveau, dri-devel, linux-kernel,
	Felix Kuehling

On Mon, Jul 22, 2019 at 3:14 PM Christoph Hellwig <hch@lst.de> wrote:
>
> We should not have two different error codes for the same condition.  In
> addition this really complicates the code due to the special handling of
> EAGAIN that drops the mmap_sem due to the FAULT_FLAG_ALLOW_RETRY logic
> in the core vm.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  Documentation/vm/hmm.rst |  2 +-
>  mm/hmm.c                 | 10 ++++------
>  2 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index 7d90964abbb0..710ce1c701bf 100644
> --- a/Documentation/vm/hmm.rst
> +++ b/Documentation/vm/hmm.rst
> @@ -237,7 +237,7 @@ The usage pattern is::
>        ret = hmm_range_snapshot(&range);
>        if (ret) {
>            up_read(&mm->mmap_sem);
> -          if (ret == -EAGAIN) {
> +          if (ret == -EBUSY) {
>              /*
>               * No need to check hmm_range_wait_until_valid() return value
>               * on retry we will get proper error with hmm_range_snapshot()
> diff --git a/mm/hmm.c b/mm/hmm.c
> index e1eedef129cf..16b6731a34db 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -946,7 +946,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
>   * @range: range
>   * Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
>   *          permission (for instance asking for write and range is read only),
> - *          -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
> + *          -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
>   *          vma or it is illegal to access that range), number of valid pages
>   *          in range->pfns[] (from range start address).
>   *
> @@ -967,7 +967,7 @@ long hmm_range_snapshot(struct hmm_range *range)
>         do {
>                 /* If range is no longer valid force retry. */
>                 if (!range->valid)
> -                       return -EAGAIN;
> +                       return -EBUSY;
>
>                 vma = find_vma(hmm->mm, start);
>                 if (vma == NULL || (vma->vm_flags & device_vma))
> @@ -1062,10 +1062,8 @@ long hmm_range_fault(struct hmm_range *range, bool block)
>
>         do {
>                 /* If range is no longer valid force retry. */
> -               if (!range->valid) {
> -                       up_read(&hmm->mm->mmap_sem);
> -                       return -EAGAIN;
> -               }
> +               if (!range->valid)
> +                       return -EBUSY;

Is it fine to remove up_read(&hmm->mm->mmap_sem)?

>
>                 vma = find_vma(hmm->mm, start);
>                 if (vma == NULL || (vma->vm_flags & device_vma))
> --
> 2.20.1
>


* Re: hmm_range_fault related fixes and legacy API removal v2
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-07-22  9:44 ` [PATCH 6/6] mm: remove the legacy hmm_pfn_* APIs Christoph Hellwig
@ 2019-07-23  1:11 ` Ralph Campbell
  2019-07-23 15:20   ` Jason Gunthorpe
  2019-07-23 15:27 ` Jason Gunthorpe
  7 siblings, 1 reply; 25+ messages in thread
From: Ralph Campbell @ 2019-07-23  1:11 UTC (permalink / raw)
  To: Christoph Hellwig, Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: linux-mm, nouveau, dri-devel, linux-kernel


On 7/22/19 2:44 AM, Christoph Hellwig wrote:
> Hi Jérôme, Ben and Jason,
> 
> below is a series against the hmm tree which fixes up the mmap_sem
> locking in nouveau and while at it also removes leftover legacy HMM APIs
> only used by nouveau.
> 
> The first 4 patches are a bug fix for nouveau, which I suspect should
> go into this merge window even if the code is marked as staging, just
> to avoid people copying the breakage.
> 
> Changes since v1:
>   - don't return the valid state from hmm_range_unregister
>   - additional nouveau cleanups
> 

I ran some OpenCL tests from Jerome with nouveau and this series,
5.3.0-rc1, and my two HMM fixes:
("mm/hmm: fix ZONE_DEVICE anon page mapping reuse")
("mm/hmm: Fix bad subpage pointer in try_to_unmap_one")

You can add for the series:
Tested-by: Ralph Campbell <rcampbell@nvidia.com>


* Re: [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
  2019-07-22 14:37   ` Souptick Joarder
@ 2019-07-23 14:54     ` Jason Gunthorpe
  2019-07-23 16:19       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 14:54 UTC (permalink / raw)
  To: Souptick Joarder
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, Linux-MM, nouveau, dri-devel, linux-kernel,
	Felix Kuehling

On Mon, Jul 22, 2019 at 08:07:33PM +0530, Souptick Joarder wrote:
> On Mon, Jul 22, 2019 at 3:14 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > We should not have two different error codes for the same condition.  In
> > addition this really complicates the code due to the special handling of
> > EAGAIN that drops the mmap_sem due to the FAULT_FLAG_ALLOW_RETRY logic
> > in the core vm.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
> > Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >  Documentation/vm/hmm.rst |  2 +-
> >  mm/hmm.c                 | 10 ++++------
> >  2 files changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> > index 7d90964abbb0..710ce1c701bf 100644
> > +++ b/Documentation/vm/hmm.rst
> > @@ -237,7 +237,7 @@ The usage pattern is::
> >        ret = hmm_range_snapshot(&range);
> >        if (ret) {
> >            up_read(&mm->mmap_sem);
> > -          if (ret == -EAGAIN) {
> > +          if (ret == -EBUSY) {
> >              /*
> >               * No need to check hmm_range_wait_until_valid() return value
> >               * on retry we will get proper error with hmm_range_snapshot()
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index e1eedef129cf..16b6731a34db 100644
> > +++ b/mm/hmm.c
> > @@ -946,7 +946,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
> >   * @range: range
> >   * Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
> >   *          permission (for instance asking for write and range is read only),
> > - *          -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
> > + *          -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
> >   *          vma or it is illegal to access that range), number of valid pages
> >   *          in range->pfns[] (from range start address).
> >   *
> > @@ -967,7 +967,7 @@ long hmm_range_snapshot(struct hmm_range *range)
> >         do {
> >                 /* If range is no longer valid force retry. */
> >                 if (!range->valid)
> > -                       return -EAGAIN;
> > +                       return -EBUSY;
> >
> >                 vma = find_vma(hmm->mm, start);
> >                 if (vma == NULL || (vma->vm_flags & device_vma))
> > @@ -1062,10 +1062,8 @@ long hmm_range_fault(struct hmm_range *range, bool block)
> >
> >         do {
> >                 /* If range is no longer valid force retry. */
> > -               if (!range->valid) {
> > -                       up_read(&hmm->mm->mmap_sem);
> > -                       return -EAGAIN;
> > -               }
> > +               if (!range->valid)
> > +                       return -EBUSY;
> 
> Is it fine to remove  up_read(&hmm->mm->mmap_sem) ?

It seems very subtle, but under the covers this calls
handle_mm_fault() with FAULT_FLAG_ALLOW_RETRY, which will cause the
mmap_sem to become unlocked along the -EAGAIN return path.

I think without the commit message I wouldn't have been able to
understand that, so Christoph, could you also add the comment below
please?

Otherwise

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Frankly, I find the 'int *locked' scheme of GUP easier to understand..

diff --git a/mm/hmm.c b/mm/hmm.c
index 16b6731a34db79..54b3a4162ae949 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -301,8 +301,10 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
 	flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
 	flags |= write_fault ? FAULT_FLAG_WRITE : 0;
 	ret = handle_mm_fault(vma, addr, flags);
-	if (ret & VM_FAULT_RETRY)
+	if (ret & VM_FAULT_RETRY) {
+		/* Note, handle_mm_fault did up_read(&mm->mmap_sem) */
 		return -EAGAIN;
+	}
 	if (ret & VM_FAULT_ERROR) {
 		*pfn = range->values[HMM_PFN_ERROR];
 		return -EFAULT;
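
For comparison, the "int *locked" convention in GUP that Jason mentions above
works roughly like this; a sketch from memory of the get_user_pages_locked()
calling pattern, not code from this series:

	int locked = 1;
	long nr;

	down_read(&current->mm->mmap_sem);
	nr = get_user_pages_locked(start, npages, FOLL_WRITE, pages, &locked);
	/*
	 * If the fault path had to drop mmap_sem, that is reported explicitly
	 * through *locked instead of being encoded in the return value.
	 */
	if (locked)
		up_read(&current->mm->mmap_sem);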


* Re: [PATCH 2/6] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau
  2019-07-22  9:44 ` [PATCH 2/6] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau Christoph Hellwig
@ 2019-07-23 14:55   ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 14:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 11:44:22AM +0200, Christoph Hellwig wrote:
> These two functions are marked as a legacy APIs to get rid of, but seem
> to suit the current nouveau flow.  Move it to the only user in
> preparation for fixing a locking bug involving caller and callee.
> All comments referring to the old API have been removed as this now
> is a driver private helper.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 45 +++++++++++++++++++++-
>  include/linux/hmm.h                   | 54 ---------------------------
>  2 files changed, 43 insertions(+), 56 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


* Re: [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault
  2019-07-22  9:44 ` [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault Christoph Hellwig
@ 2019-07-23 14:56   ` Jason Gunthorpe
  2019-07-23 16:23     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 14:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 11:44:23AM +0200, Christoph Hellwig wrote:
> The parameter is always false, so remove it as well as the -EAGAIN
> handling that can only happen for the non-blocking case.

? Did the EAGAIN handling get removed in this patch?

> Signed-off-by: Christoph Hellwig <hch@lst.de>
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Code seems fine

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


* Re: [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-22  9:44 ` [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault Christoph Hellwig
@ 2019-07-23 15:18   ` Jason Gunthorpe
  2019-07-23 16:30     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 15:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 11:44:24AM +0200, Christoph Hellwig wrote:
> Currently nouveau_svm_fault expects nouveau_range_fault to never unlock
> mmap_sem, but the latter unlocks it for a random selection of error
> codes. Fix this up by always unlocking mmap_sem for non-zero return
> values in nouveau_range_fault, and only unlocking it in the caller
> for successful returns.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 5dd83a46578f..5de2d54b9782 100644
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -494,8 +494,10 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
>  	ret = hmm_range_register(range, mirror,
>  				 range->start, range->end,
>  				 PAGE_SHIFT);
> -	if (ret)
> +	if (ret) {
> +		up_read(&range->vma->vm_mm->mmap_sem);
>  		return (int)ret;
> +	}
>  
>  	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>  		up_read(&range->vma->vm_mm->mmap_sem);
> @@ -504,11 +506,9 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
>  
>  	ret = hmm_range_fault(range, true);
>  	if (ret <= 0) {
> -		if (ret == -EBUSY || !ret) {
> -			up_read(&range->vma->vm_mm->mmap_sem);
> -			ret = -EBUSY;
> -		} else if (ret == -EAGAIN)
> +		if (ret == 0)
>  			ret = -EBUSY;
> +		up_read(&range->vma->vm_mm->mmap_sem);
>  		hmm_range_unregister(range);
>  		return ret;

Hum..

The caller does this:

again:
		ret = nouveau_range_fault(&svmm->mirror, &range);
		if (ret == 0) {
			mutex_lock(&svmm->mutex);
			if (!nouveau_range_done(&range)) {
				mutex_unlock(&svmm->mutex);
				goto again;

And we can't call nouveau_range_fault() -> hmm_range_fault() without
holding the mmap_sem, so we can't allow nouveau_range_fault to unlock
it.

Maybe this instead? 

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a9c5c58d425b3d..92cf760a9bcc5d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -494,21 +494,16 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 	ret = hmm_range_register(range, mirror,
 				 range->start, range->end,
 				 PAGE_SHIFT);
-	if (ret) {
-		up_read(&range->vma->vm_mm->mmap_sem);
+	if (ret)
 		return (int)ret;
-	}
 
-	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
-		up_read(&range->vma->vm_mm->mmap_sem);
+	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT))
 		return -EBUSY;
-	}
 
 	ret = hmm_range_fault(range, true);
 	if (ret <= 0) {
 		if (ret == 0)
 			ret = -EBUSY;
-		up_read(&range->vma->vm_mm->mmap_sem);
 		hmm_range_unregister(range);
 		return ret;
 	}
@@ -706,8 +701,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
 						NULL);
 			svmm->vmm->vmm.object.client->super = false;
 			mutex_unlock(&svmm->mutex);
-			up_read(&svmm->mm->mmap_sem);
 		}
+		up_read(&svmm->mm->mmap_sem);
 
 		/* Cancel any faults in the window whose pages didn't manage
 		 * to keep their valid bit, or stay writeable when required.


* Re: [PATCH 5/6] nouveau: return -EBUSY when hmm_range_wait_until_valid fails
  2019-07-22  9:44 ` [PATCH 5/6] nouveau: return -EBUSY when hmm_range_wait_until_valid fails Christoph Hellwig
@ 2019-07-23 15:18   ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 15:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 11:44:25AM +0200, Christoph Hellwig wrote:
> -EAGAIN has a magic meaning for non-blocking faults, so don't overload
> it.  Given that the caller doesn't check for specific error codes this
> change is purely cosmetic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Agree

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


* Re: hmm_range_fault related fixes and legacy API removal v2
  2019-07-23  1:11 ` hmm_range_fault related fixes and legacy API removal v2 Ralph Campbell
@ 2019-07-23 15:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 15:20 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 06:11:04PM -0700, Ralph Campbell wrote:
> 
> On 7/22/19 2:44 AM, Christoph Hellwig wrote:
> > Hi Jérôme, Ben and Jason,
> > 
> > below is a series against the hmm tree which fixes up the mmap_sem
> > locking in nouveau and while at it also removes leftover legacy HMM APIs
> > only used by nouveau.
> > 
> > The first 4 patches are a bug fix for nouveau, which I suspect should
> > go into this merge window even if the code is marked as staging, just
> > to avoid people copying the breakage.
> > 
> > Changes since v1:
> >   - don't return the valid state from hmm_range_unregister
> >   - additional nouveau cleanups
> > 
> 
> I ran some OpenCL tests from Jerome with nouveau and this series,
> 5.3.0-rc1, and my two HMM fixes:
> ("mm/hmm: fix ZONE_DEVICE anon page mapping reuse")
> ("mm/hmm: Fix bad subpage pointer in try_to_unmap_one")
> 
> You can add for the series:
> Tested-by: Ralph Campbell <rcampbell@nvidia.com>

Thanks, would you also rebase and resend the patch deleting
struct hmm_update?

Jason


* Re: hmm_range_fault related fixes and legacy API removal v2
  2019-07-22  9:44 hmm_range_fault related fixes and legacy API removal v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-07-23  1:11 ` hmm_range_fault related fixes and legacy API removal v2 Ralph Campbell
@ 2019-07-23 15:27 ` Jason Gunthorpe
  2019-07-23 16:32   ` Christoph Hellwig
  7 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 15:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Mon, Jul 22, 2019 at 11:44:20AM +0200, Christoph Hellwig wrote:
> Hi Jérôme, Ben and Jason,
> 
> below is a series against the hmm tree which fixes up the mmap_sem
> locking in nouveau and while at it also removes leftover legacy HMM APIs
> only used by nouveau.
> 
> The first 4 patches are a bug fix for nouveau, which I suspect should
> go into this merge window even if the code is marked as staging, just
> to avoid people copying the breakage.

Ignoring the STAGING issue, I've tried to apply the same guidelines to
-rc as to -stable...

So this is a real problem, we definitely hit the locking bugs if we
retry/etc under stress, so I would be OK to send it to Linus for
early-rc.

However, it doesn't look like the 1st patch is fixing a current bug:
the only caller uses blocking = true, so just the middle three are -rc
material?

Thanks,
Jason


* Re: [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
  2019-07-23 14:54     ` Jason Gunthorpe
@ 2019-07-23 16:19       ` Christoph Hellwig
  2019-07-23 17:18         ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-23 16:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Souptick Joarder, Christoph Hellwig, Jérôme Glisse,
	Ben Skeggs, Ralph Campbell, Linux-MM, nouveau, dri-devel,
	linux-kernel, Felix Kuehling

On Tue, Jul 23, 2019 at 02:54:45PM +0000, Jason Gunthorpe wrote:
> I think without the commit message I wouldn't have been able to
> understand that, so Christoph, could you also add the comment below
> please?

I don't think this belongs in this patch.  I can add it as a separate
patch under your name and with your signoff if you are ok with that.


* Re: [PATCH 3/6] nouveau: remove the block parameter to nouveau_range_fault
  2019-07-23 14:56   ` Jason Gunthorpe
@ 2019-07-23 16:23     ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-23 16:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 02:56:24PM +0000, Jason Gunthorpe wrote:
> On Mon, Jul 22, 2019 at 11:44:23AM +0200, Christoph Hellwig wrote:
> > The parameter is always false, so remove it as well as the -EAGAIN
> > handling that can only happen for the non-blocking case.
> 
> ? Did the EAGAIN handling get removed in this patch?

No.  The next revision will remove it.


* Re: [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-23 15:18   ` Jason Gunthorpe
@ 2019-07-23 16:30     ` Christoph Hellwig
  2019-07-23 17:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-23 16:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 03:18:28PM +0000, Jason Gunthorpe wrote:
> Hum..
> 
> The caller does this:
> 
> again:
> 		ret = nouveau_range_fault(&svmm->mirror, &range);
> 		if (ret == 0) {
> 			mutex_lock(&svmm->mutex);
> 			if (!nouveau_range_done(&range)) {
> 				mutex_unlock(&svmm->mutex);
> 				goto again;
> 
> And we can't call nouveau_range_fault() -> hmm_range_fault() without
> holding the mmap_sem, so we can't allow nouveau_range_fault to unlock
> it.

Goto again can only happen if nouveau_range_fault was successful,
in which case we did not drop mmap_sem.

Also:

>  	ret = hmm_range_fault(range, true);
>  	if (ret <= 0) {
>  		if (ret == 0)
>  			ret = -EBUSY;
> -		up_read(&range->vma->vm_mm->mmap_sem);
>  		hmm_range_unregister(range);

This would hold mmap_sem over hmm_range_unregister, which can lead
to deadlock if we call exit_mmap and then acquire mmap_sem again.


* Re: hmm_range_fault related fixes and legacy API removal v2
  2019-07-23 15:27 ` Jason Gunthorpe
@ 2019-07-23 16:32   ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-23 16:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 03:27:41PM +0000, Jason Gunthorpe wrote:
> Ignoring the STAGING issue I've tried to use the same guideline as for
> -stable for -rc .. 
> 
> So this is a real problem, we definitely hit the locking bugs if we
> retry/etc under stress, so I would be OK to send it to Linus for
> early-rc.
> 
> However, it doesn't look like the 1st patch is fixing a current bug
> though, the only callers uses blocking = true, so just the middle
> three are -rc?

nonblocking isn't used anywhere, but it is a major, major API bug.
Your call, but if it were my tree I'd probably send it to Linus.


* Re: [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-23 16:30     ` Christoph Hellwig
@ 2019-07-23 17:17       ` Jason Gunthorpe
  2019-07-23 17:23         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 17:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 06:30:48PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 03:18:28PM +0000, Jason Gunthorpe wrote:
> > Hum..
> > 
> > The caller does this:
> > 
> > again:
> > 		ret = nouveau_range_fault(&svmm->mirror, &range);
> > 		if (ret == 0) {
> > 			mutex_lock(&svmm->mutex);
> > 			if (!nouveau_range_done(&range)) {
> > 				mutex_unlock(&svmm->mutex);
> > 				goto again;
> > 
> > And we can't call nouveau_range_fault() -> hmm_range_fault() without
> > holding the mmap_sem, so we can't allow nouveau_range_fault to unlock
> > it.
> 
> Goto again can only happen if nouveau_range_fault was successful,
> in which case we did not drop mmap_sem.

Oh, right, we switch from success = number of pages to success = 0.

However, the reason this looks so weird to me is that the locking
pattern isn't being followed: any result of hmm_range_fault should be
ignored until we enter the svmm->mutex and check whether there was a
colliding invalidation.

So the 'goto again' *should* be possible even if range_fault failed.

But that is not for this patch..

> >  	ret = hmm_range_fault(range, true);
> >  	if (ret <= 0) {
> >  		if (ret == 0)
> >  			ret = -EBUSY;
> > -		up_read(&range->vma->vm_mm->mmap_sem);
> >  		hmm_range_unregister(range);
> 
> This would hold mmap_sem over hmm_range_unregister, which can lead
> to deadlock if we call exit_mmap and then acquire mmap_sem again.

That reminds me, this code is also leaking hmm_range_unregister() in
the success path, right?

I think the right way to structure this is to move the goto again and
related into the nouveau_range_fault() so the whole retry algorithm is
sensibly self contained.

Jason


* Re: [PATCH 1/6] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}
  2019-07-23 16:19       ` Christoph Hellwig
@ 2019-07-23 17:18         ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 17:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Souptick Joarder, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, Linux-MM, nouveau, dri-devel, linux-kernel,
	Felix Kuehling

On Tue, Jul 23, 2019 at 06:19:07PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 02:54:45PM +0000, Jason Gunthorpe wrote:
> > I think without the commit message I wouldn't have been able to
> > understand that, so Christoph, could you also add the comment below
> > please?
> 
> I don't think this belongs into this patch.  I can add it as a separate
> patch under your name and with your signoff if you are ok with that.

Yep, thanks

Jason


* Re: [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-23 17:17       ` Jason Gunthorpe
@ 2019-07-23 17:23         ` Christoph Hellwig
  2019-07-23 17:30           ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-23 17:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 02:17:31PM -0300, Jason Gunthorpe wrote:
> That reminds me, this code is also leaking hmm_range_unregister() in
> the success path, right?

No, that is done by hmm_vma_range_done / nouveau_range_done for the
success path.

> 
> I think the right way to structure this is to move the goto again and
> related into the nouveau_range_fault() so the whole retry algorithm is
> sensibly self contained.

Then we'd take svmm->mutex inside the helper and let the caller
unlock that.  Either way it is a bit of a mess, and I'd prefer it if
someone who has the hardware would do a grand rewrite of this path
eventually.  Alternatively, if no one signs up to maintain this code
we should eventually drop it given the staging status.


* Re: [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-23 17:23         ` Christoph Hellwig
@ 2019-07-23 17:30           ` Jason Gunthorpe
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2019-07-23 17:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, linux-kernel

On Tue, Jul 23, 2019 at 07:23:35PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 02:17:31PM -0300, Jason Gunthorpe wrote:
> > That reminds me, this code is also leaking hmm_range_unregister() in
> > the success path, right?
> 
> No, that is done by hmm_vma_range_done / nouveau_range_done for the
> success path.

.. which is done with the mmap_sem held :(

> > I think the right way to structure this is to move the goto again and
> > related into the nouveau_range_fault() so the whole retry algorithm is
> > sensibly self contained.
> 
> Then we'd take svmm->mutex inside the helper and let the caller
> unlock that.  Either way it is a bit of a mess, and I'd rather prefer
> if someone has the hardware would do a grand rewrite of this path
> eventually.  Alternatively if no one signs up to mainain this code
> we should eventually drop it given the staging status.

I tend to agree with the sentiment; it just makes me sad that all the
examples we have of these APIs are so troubled.

Jason


* [PATCH 4/6] nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  2019-07-03 22:02 Christoph Hellwig
@ 2019-07-03 22:02 ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2019-07-03 22:02 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, linux-kernel

Currently nouveau_svm_fault expects nouveau_range_fault to never unlock
mmap_sem, but the latter unlocks it for a random selection of error
codes. Fix this up by always unlocking mmap_sem for non-zero return
values in nouveau_range_fault, and only unlocking it in the caller
for successful returns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 9a9f71e4be29..d97d862e8b7d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -501,8 +501,10 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 	ret = hmm_range_register(range, mirror,
 				 range->start, range->end,
 				 PAGE_SHIFT);
-	if (ret)
+	if (ret) {
+		up_read(&range->vma->vm_mm->mmap_sem);
 		return (int)ret;
+	}
 
 	if (!hmm_range_wait_until_valid(range, NOUVEAU_RANGE_FAULT_TIMEOUT)) {
 		up_read(&range->vma->vm_mm->mmap_sem);
@@ -511,11 +513,9 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 
 	ret = hmm_range_fault(range, true);
 	if (ret <= 0) {
-		if (ret == -EBUSY || !ret) {
-			up_read(&range->vma->vm_mm->mmap_sem);
-			ret = -EBUSY;
-		} else if (ret == -EAGAIN)
+		if (ret == 0)
 			ret = -EBUSY;
+		up_read(&range->vma->vm_mm->mmap_sem);
 		hmm_range_unregister(range);
 		return ret;
 	}
@@ -713,8 +713,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
 						NULL);
 			svmm->vmm->vmm.object.client->super = false;
 			mutex_unlock(&svmm->mutex);
+			up_read(&svmm->mm->mmap_sem);
 		}
-		up_read(&svmm->mm->mmap_sem);
 
 		/* Cancel any faults in the window whose pages didn't manage
 		 * to keep their valid bit, or stay writeable when required.
-- 
2.20.1


