Linux-mm Archive on lore.kernel.org
* hmm cleanups, v2
@ 2019-08-06 16:05 Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 01/15] amdgpu: remove -EAGAIN handling for hmm_range_fault Christoph Hellwig
                   ` (15 more replies)
  0 siblings, 16 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel


Hi Jérôme, Ben, Felix and Jason,

below is a series against the hmm tree that cleans up various minor
bits and allows HMM_MIRROR to be built on all architectures.

Diffstat:

    11 files changed, 94 insertions(+), 210 deletions(-)

A git tree is also available at:

    git://git.infradead.org/users/hch/misc.git hmm-cleanups.2

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-cleanups.2

Changes since v1:
 - fix the cover letter subject
 - improve various patch descriptions
 - use svmm->mm in nouveau_range_fault
 - invert the hmask field when using it
 - select HMM_MIRROR instead of making it a user visible option


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 01/15] amdgpu: remove -EAGAIN handling for hmm_range_fault
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 02/15] amdgpu: don't initialize range->list in amdgpu_hmm_init_range Christoph Hellwig
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

hmm_range_fault can only return -EAGAIN if called with the
HMM_FAULT_ALLOW_RETRY flag, which amdgpu never does.  Remove the
handling for the -EAGAIN case with its non-standard locking scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 12a59ac83f72..f0821638bbc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -778,7 +778,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 	struct hmm_range *range;
 	unsigned long i;
 	uint64_t *pfns;
-	int retry = 0;
 	int r = 0;
 
 	if (!mm) /* Happens during process shutdown */
@@ -822,7 +821,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 	hmm_range_register(range, mirror, start,
 			   start + ttm->num_pages * PAGE_SIZE, PAGE_SHIFT);
 
-retry:
 	/*
 	 * Just wait for range to be valid, safe to ignore return value as we
 	 * will use the return value of hmm_range_fault() below under the
@@ -831,24 +829,12 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 	hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
 
 	down_read(&mm->mmap_sem);
-
 	r = hmm_range_fault(range, 0);
-	if (unlikely(r < 0)) {
-		if (likely(r == -EAGAIN)) {
-			/*
-			 * return -EAGAIN, mmap_sem is dropped
-			 */
-			if (retry++ < MAX_RETRY_HMM_RANGE_FAULT)
-				goto retry;
-			else
-				pr_err("Retry hmm fault too many times\n");
-		}
-
-		goto out_up_read;
-	}
-
 	up_read(&mm->mmap_sem);
 
+	if (unlikely(r < 0))
+		goto out_free_pfns;
+
 	for (i = 0; i < ttm->num_pages; i++) {
 		pages[i] = hmm_device_entry_to_page(range, pfns[i]);
 		if (unlikely(!pages[i])) {
@@ -864,9 +850,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 
 	return 0;
 
-out_up_read:
-	if (likely(r != -EAGAIN))
-		up_read(&mm->mmap_sem);
 out_free_pfns:
 	hmm_range_unregister(range);
 	kvfree(pfns);
-- 
2.20.1



* [PATCH 02/15] amdgpu: don't initialize range->list in amdgpu_hmm_init_range
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 01/15] amdgpu: remove -EAGAIN handling for hmm_range_fault Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault Christoph Hellwig
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

The core hmm code uses this list to queue the range on its internal
list; it is intended as a private member that is not exposed to
drivers, so there is no need to initialize it in a driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index b698b423b25d..60b9fc9561d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -484,6 +484,5 @@ void amdgpu_hmm_init_range(struct hmm_range *range)
 		range->flags = hmm_range_flags;
 		range->values = hmm_range_values;
 		range->pfn_shift = PAGE_SHIFT;
-		INIT_LIST_HEAD(&range->list);
 	}
 }
-- 
2.20.1



* [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 01/15] amdgpu: remove -EAGAIN handling for hmm_range_fault Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 02/15] amdgpu: don't initialize range->list in amdgpu_hmm_init_range Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 18:02   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk Christoph Hellwig
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

We'll need the nouveau_svmm structure to improve the function soon.
For now this allows unlocking the mmap_sem through the svmm->mm
reference, which is the same dereference chain that the caller uses
to lock and unlock it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a74530b5a523..98072fd48cf7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -485,23 +485,23 @@ nouveau_range_done(struct hmm_range *range)
 }
 
 static int
-nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
+nouveau_range_fault(struct nouveau_svmm *svmm, struct hmm_range *range)
 {
 	long ret;
 
 	range->default_flags = 0;
 	range->pfn_flags_mask = -1UL;
 
-	ret = hmm_range_register(range, mirror,
+	ret = hmm_range_register(range, &svmm->mirror,
 				 range->start, range->end,
 				 PAGE_SHIFT);
 	if (ret) {
-		up_read(&range->hmm->mm->mmap_sem);
+		up_read(&svmm->mm->mmap_sem);
 		return (int)ret;
 	}
 
 	if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
-		up_read(&range->hmm->mm->mmap_sem);
+		up_read(&svmm->mm->mmap_sem);
 		return -EBUSY;
 	}
 
@@ -509,7 +509,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
 	if (ret <= 0) {
 		if (ret == 0)
 			ret = -EBUSY;
-		up_read(&range->hmm->mm->mmap_sem);
+		up_read(&svmm->mm->mmap_sem);
 		hmm_range_unregister(range);
 		return ret;
 	}
@@ -689,7 +689,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		range.values = nouveau_svm_pfn_values;
 		range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
 again:
-		ret = nouveau_range_fault(&svmm->mirror, &range);
+		ret = nouveau_range_fault(svmm, &range);
 		if (ret == 0) {
 			mutex_lock(&svmm->mutex);
 			if (!nouveau_range_done(&range)) {
-- 
2.20.1



* [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-07 17:45   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 05/15] mm: remove the unused vma argument to hmm_range_dma_unmap Christoph Hellwig
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

The pgmap is only passed across a single function call, so replace
the struct member with local variables in the places where we deal
with the pgmap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/hmm.c | 62 ++++++++++++++++++++++++--------------------------------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 9a908902e4cc..d66fa29b42e0 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
 
 struct hmm_vma_walk {
 	struct hmm_range	*range;
-	struct dev_pagemap	*pgmap;
 	unsigned long		last;
 	unsigned int		flags;
 };
@@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
+	struct dev_pagemap *pgmap = NULL;
 	unsigned long pfn, npages, i;
 	bool fault, write_fault;
 	uint64_t cpu_flags;
@@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 	pfn = pmd_pfn(pmd) + pte_index(addr);
 	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
 		if (pmd_devmap(pmd)) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap))
+			pgmap = get_dev_pagemap(pfn, pgmap);
+			if (unlikely(!pgmap))
 				return -EBUSY;
 		}
 		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	hmm_vma_walk->last = end;
 	return 0;
 #else
@@ -520,7 +517,7 @@ static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
 
 static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 			      unsigned long end, pmd_t *pmdp, pte_t *ptep,
-			      uint64_t *pfn)
+			      uint64_t *pfn, struct dev_pagemap **pgmap)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
@@ -591,9 +588,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 		goto fault;
 
 	if (pte_devmap(pte)) {
-		hmm_vma_walk->pgmap = get_dev_pagemap(pte_pfn(pte),
-					      hmm_vma_walk->pgmap);
-		if (unlikely(!hmm_vma_walk->pgmap))
+		*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
+		if (unlikely(!*pgmap))
 			return -EBUSY;
 	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
 		*pfn = range->values[HMM_PFN_SPECIAL];
@@ -604,10 +600,10 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	return 0;
 
 fault:
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
+	if (*pgmap)
+		put_dev_pagemap(*pgmap);
+	*pgmap = NULL;
+
 	pte_unmap(ptep);
 	/* Fault any virtual address we were asked to fault */
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -620,6 +616,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
+	struct dev_pagemap *pgmap = NULL;
 	uint64_t *pfns = range->pfns;
 	unsigned long addr = start, i;
 	pte_t *ptep;
@@ -683,23 +680,21 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	for (; addr < end; addr += PAGE_SIZE, ptep++, i++) {
 		int r;
 
-		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i]);
+		r = hmm_vma_handle_pte(walk, addr, end, pmdp, ptep, &pfns[i],
+				&pgmap);
 		if (r) {
 			/* hmm_vma_handle_pte() did unmap pte directory */
 			hmm_vma_walk->last = addr;
 			return r;
 		}
 	}
-	if (hmm_vma_walk->pgmap) {
-		/*
-		 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
-		 * so that we can leverage get_dev_pagemap() optimization which
-		 * will not re-take a reference on a pgmap if we already have
-		 * one.
-		 */
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
+	/*
+	 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte() so that
+	 * we can leverage the get_dev_pagemap() optimization which will not
+	 * re-take a reference on a pgmap if we already have one.
+	 */
+	if (pgmap)
+		put_dev_pagemap(pgmap);
 	pte_unmap(ptep - 1);
 
 	hmm_vma_walk->last = addr;
@@ -714,6 +709,7 @@ static int hmm_vma_walk_pud(pud_t *pudp,
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	unsigned long addr = start, next;
+	struct dev_pagemap *pgmap = NULL;
 	pmd_t *pmdp;
 	pud_t pud;
 	int ret;
@@ -744,17 +740,14 @@ static int hmm_vma_walk_pud(pud_t *pudp,
 
 		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
 		for (i = 0; i < npages; ++i, ++pfn) {
-			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
-					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap))
+			pgmap = get_dev_pagemap(pfn, pgmap);
+			if (unlikely(!pgmap))
 				return -EBUSY;
 			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				  cpu_flags;
 		}
-		if (hmm_vma_walk->pgmap) {
-			put_dev_pagemap(hmm_vma_walk->pgmap);
-			hmm_vma_walk->pgmap = NULL;
-		}
+		if (pgmap)
+			put_dev_pagemap(pgmap);
 		hmm_vma_walk->last = end;
 		return 0;
 	}
@@ -1002,7 +995,6 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 			return -EPERM;
 		}
 
-		hmm_vma_walk.pgmap = NULL;
 		hmm_vma_walk.last = start;
 		hmm_vma_walk.flags = flags;
 		hmm_vma_walk.range = range;
-- 
2.20.1



* [PATCH 05/15] mm: remove the unused vma argument to hmm_range_dma_unmap
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 06/15] mm: remove superfluous arguments from hmm_range_register Christoph Hellwig
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
---
 include/linux/hmm.h | 1 -
 mm/hmm.c            | 2 --
 2 files changed, 3 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 82265118d94a..59be0aa2476d 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -422,7 +422,6 @@ long hmm_range_dma_map(struct hmm_range *range,
 		       dma_addr_t *daddrs,
 		       unsigned int flags);
 long hmm_range_dma_unmap(struct hmm_range *range,
-			 struct vm_area_struct *vma,
 			 struct device *device,
 			 dma_addr_t *daddrs,
 			 bool dirty);
diff --git a/mm/hmm.c b/mm/hmm.c
index d66fa29b42e0..3a3852660757 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -1121,7 +1121,6 @@ EXPORT_SYMBOL(hmm_range_dma_map);
 /**
  * hmm_range_dma_unmap() - unmap range of that was map with hmm_range_dma_map()
  * @range: range being unmapped
- * @vma: the vma against which the range (optional)
  * @device: device against which dma map was done
  * @daddrs: dma address of mapped pages
  * @dirty: dirty page if it had the write flag set
@@ -1133,7 +1132,6 @@ EXPORT_SYMBOL(hmm_range_dma_map);
  * concurrent mmu notifier or sync_cpu_device_pagetables() to make progress.
  */
 long hmm_range_dma_unmap(struct hmm_range *range,
-			 struct vm_area_struct *vma,
 			 struct device *device,
 			 dma_addr_t *daddrs,
 			 bool dirty)
-- 
2.20.1



* [PATCH 06/15] mm: remove superfluous arguments from hmm_range_register
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 05/15] mm: remove the unused vma argument to hmm_range_dma_unmap Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 07/15] mm: remove the page_shift member from struct hmm_range Christoph Hellwig
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

The start, end and page_shift values are all saved in the range
structure, so we might as well use that for argument passing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 Documentation/vm/hmm.rst                |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  7 +++++--
 drivers/gpu/drm/nouveau/nouveau_svm.c   |  5 ++---
 include/linux/hmm.h                     |  6 +-----
 mm/hmm.c                                | 20 +++++---------------
 5 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index ddcb5ca8b296..e63c11f7e0e0 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -222,7 +222,7 @@ The usage pattern is::
       range.flags = ...;
       range.values = ...;
       range.pfn_shift = ...;
-      hmm_range_register(&range);
+      hmm_range_register(&range, mirror);
 
       /*
        * Just wait for range to be valid, safe to ignore return value as we
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index f0821638bbc6..71d6e7087b0b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -818,8 +818,11 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 				0 : range->flags[HMM_PFN_WRITE];
 	range->pfn_flags_mask = 0;
 	range->pfns = pfns;
-	hmm_range_register(range, mirror, start,
-			   start + ttm->num_pages * PAGE_SIZE, PAGE_SHIFT);
+	range->page_shift = PAGE_SHIFT;
+	range->start = start;
+	range->end = start + ttm->num_pages * PAGE_SIZE;
+
+	hmm_range_register(range, mirror);
 
 	/*
 	 * Just wait for range to be valid, safe to ignore return value as we
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 98072fd48cf7..41fad4719ac6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -492,9 +492,7 @@ nouveau_range_fault(struct nouveau_svmm *svmm, struct hmm_range *range)
 	range->default_flags = 0;
 	range->pfn_flags_mask = -1UL;
 
-	ret = hmm_range_register(range, &svmm->mirror,
-				 range->start, range->end,
-				 PAGE_SHIFT);
+	ret = hmm_range_register(range, &svmm->mirror);
 	if (ret) {
 		up_read(&svmm->mm->mmap_sem);
 		return (int)ret;
@@ -682,6 +680,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
 			 args.i.p.addr + args.i.p.size, fn - fi);
 
 		/* Have HMM fault pages within the fault window to the GPU. */
+		range.page_shift = PAGE_SHIFT;
 		range.start = args.i.p.addr;
 		range.end = args.i.p.addr + args.i.p.size;
 		range.pfns = args.phys;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 59be0aa2476d..c5b51376b453 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -400,11 +400,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
 /*
  * Please see Documentation/vm/hmm.rst for how to use the range API.
  */
-int hmm_range_register(struct hmm_range *range,
-		       struct hmm_mirror *mirror,
-		       unsigned long start,
-		       unsigned long end,
-		       unsigned page_shift);
+int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror);
 void hmm_range_unregister(struct hmm_range *range);
 
 /*
diff --git a/mm/hmm.c b/mm/hmm.c
index 3a3852660757..926735a3aef9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -843,35 +843,25 @@ static void hmm_pfns_clear(struct hmm_range *range,
  * hmm_range_register() - start tracking change to CPU page table over a range
  * @range: range
  * @mm: the mm struct for the range of virtual address
- * @start: start virtual address (inclusive)
- * @end: end virtual address (exclusive)
- * @page_shift: expect page shift for the range
+ *
  * Return: 0 on success, -EFAULT if the address space is no longer valid
  *
  * Track updates to the CPU page table see include/linux/hmm.h
  */
-int hmm_range_register(struct hmm_range *range,
-		       struct hmm_mirror *mirror,
-		       unsigned long start,
-		       unsigned long end,
-		       unsigned page_shift)
+int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror)
 {
-	unsigned long mask = ((1UL << page_shift) - 1UL);
+	unsigned long mask = ((1UL << range->page_shift) - 1UL);
 	struct hmm *hmm = mirror->hmm;
 	unsigned long flags;
 
 	range->valid = false;
 	range->hmm = NULL;
 
-	if ((start & mask) || (end & mask))
+	if ((range->start & mask) || (range->end & mask))
 		return -EINVAL;
-	if (start >= end)
+	if (range->start >= range->end)
 		return -EINVAL;
 
-	range->page_shift = page_shift;
-	range->start = start;
-	range->end = end;
-
 	/* Prevent hmm_release() from running while the range is valid */
 	if (!mmget_not_zero(hmm->mm))
 		return -EFAULT;
-- 
2.20.1



* [PATCH 07/15] mm: remove the page_shift member from struct hmm_range
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 06/15] mm: remove superfluous arguments from hmm_range_register Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-07 17:51   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry Christoph Hellwig
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

All users pass PAGE_SHIFT here.  If we ever want to support single
entries for huge pages, we should add a HMM_FAULT_HUGEPAGE flag that
uses the huge page size, instead of having the caller calculate that
size once just for the hmm code to verify it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  1 -
 drivers/gpu/drm/nouveau/nouveau_svm.c   |  1 -
 include/linux/hmm.h                     | 22 -------------
 mm/hmm.c                                | 42 ++++++-------------------
 4 files changed, 9 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 71d6e7087b0b..8bf79288c4e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -818,7 +818,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 				0 : range->flags[HMM_PFN_WRITE];
 	range->pfn_flags_mask = 0;
 	range->pfns = pfns;
-	range->page_shift = PAGE_SHIFT;
 	range->start = start;
 	range->end = start + ttm->num_pages * PAGE_SIZE;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 41fad4719ac6..668d4bd0c118 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -680,7 +680,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
 			 args.i.p.addr + args.i.p.size, fn - fi);
 
 		/* Have HMM fault pages within the fault window to the GPU. */
-		range.page_shift = PAGE_SHIFT;
 		range.start = args.i.p.addr;
 		range.end = args.i.p.addr + args.i.p.size;
 		range.pfns = args.phys;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index c5b51376b453..51e18fbb8953 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -158,7 +158,6 @@ enum hmm_pfn_value_e {
  * @values: pfn value for some special case (none, special, error, ...)
  * @default_flags: default flags for the range (write, read, ... see hmm doc)
  * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
- * @page_shift: device virtual address shift value (should be >= PAGE_SHIFT)
  * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
  * @valid: pfns array did not change since it has been fill by an HMM function
  */
@@ -172,31 +171,10 @@ struct hmm_range {
 	const uint64_t		*values;
 	uint64_t		default_flags;
 	uint64_t		pfn_flags_mask;
-	uint8_t			page_shift;
 	uint8_t			pfn_shift;
 	bool			valid;
 };
 
-/*
- * hmm_range_page_shift() - return the page shift for the range
- * @range: range being queried
- * Return: page shift (page size = 1 << page shift) for the range
- */
-static inline unsigned hmm_range_page_shift(const struct hmm_range *range)
-{
-	return range->page_shift;
-}
-
-/*
- * hmm_range_page_size() - return the page size for the range
- * @range: range being queried
- * Return: page size for the range in bytes
- */
-static inline unsigned long hmm_range_page_size(const struct hmm_range *range)
-{
-	return 1UL << hmm_range_page_shift(range);
-}
-
 /*
  * hmm_range_wait_until_valid() - wait for range to be valid
  * @range: range affected by invalidation to wait on
diff --git a/mm/hmm.c b/mm/hmm.c
index 926735a3aef9..f26d6abc4ed2 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -344,13 +344,12 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	uint64_t *pfns = range->pfns;
-	unsigned long i, page_size;
+	unsigned long i;
 
 	hmm_vma_walk->last = addr;
-	page_size = hmm_range_page_size(range);
-	i = (addr - range->start) >> range->page_shift;
+	i = (addr - range->start) >> PAGE_SHIFT;
 
-	for (; addr < end; addr += page_size, i++) {
+	for (; addr < end; addr += PAGE_SIZE, i++) {
 		pfns[i] = range->values[HMM_PFN_NONE];
 		if (fault || write_fault) {
 			int ret;
@@ -772,7 +771,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 				      struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
-	unsigned long addr = start, i, pfn, mask, size, pfn_inc;
+	unsigned long addr = start, i, pfn, mask;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
@@ -783,24 +782,12 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	pte_t entry;
 	int ret = 0;
 
-	size = huge_page_size(h);
-	mask = size - 1;
-	if (range->page_shift != PAGE_SHIFT) {
-		/* Make sure we are looking at a full page. */
-		if (start & mask)
-			return -EINVAL;
-		if (end < (start + size))
-			return -EINVAL;
-		pfn_inc = size >> PAGE_SHIFT;
-	} else {
-		pfn_inc = 1;
-		size = PAGE_SIZE;
-	}
+	mask = huge_page_size(h) - 1;
 
 	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
 	entry = huge_ptep_get(pte);
 
-	i = (start - range->start) >> range->page_shift;
+	i = (start - range->start) >> PAGE_SHIFT;
 	orig_pfn = range->pfns[i];
 	range->pfns[i] = range->values[HMM_PFN_NONE];
 	cpu_flags = pte_to_hmm_pfn_flags(range, entry);
@@ -812,8 +799,8 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 		goto unlock;
 	}
 
-	pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift);
-	for (; addr < end; addr += size, i++, pfn += pfn_inc)
+	pfn = pte_pfn(entry) + ((start & mask) >> PAGE_SHIFT);
+	for (; addr < end; addr += PAGE_SIZE, i++, pfn++)
 		range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				 cpu_flags;
 	hmm_vma_walk->last = end;
@@ -850,14 +837,13 @@ static void hmm_pfns_clear(struct hmm_range *range,
  */
 int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror)
 {
-	unsigned long mask = ((1UL << range->page_shift) - 1UL);
 	struct hmm *hmm = mirror->hmm;
 	unsigned long flags;
 
 	range->valid = false;
 	range->hmm = NULL;
 
-	if ((range->start & mask) || (range->end & mask))
+	if ((range->start & (PAGE_SIZE - 1)) || (range->end & (PAGE_SIZE - 1)))
 		return -EINVAL;
 	if (range->start >= range->end)
 		return -EINVAL;
@@ -964,16 +950,6 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 		if (vma == NULL || (vma->vm_flags & device_vma))
 			return -EFAULT;
 
-		if (is_vm_hugetlb_page(vma)) {
-			if (huge_page_shift(hstate_vma(vma)) !=
-			    range->page_shift &&
-			    range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		} else {
-			if (range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		}
-
 		if (!(vma->vm_flags & VM_READ)) {
 			/*
 			 * If vma do not allow read access, then assume that it
-- 
2.20.1



* [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 07/15] mm: remove the page_shift member from struct hmm_range Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 18:02   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd Christoph Hellwig
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

The pagewalk code already passes the value as the hmask parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/hmm.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index f26d6abc4ed2..03d37e102e3b 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -771,19 +771,16 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 				      struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
-	unsigned long addr = start, i, pfn, mask;
+	unsigned long addr = start, i, pfn;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct vm_area_struct *vma = walk->vma;
-	struct hstate *h = hstate_vma(vma);
 	uint64_t orig_pfn, cpu_flags;
 	bool fault, write_fault;
 	spinlock_t *ptl;
 	pte_t entry;
 	int ret = 0;
 
-	mask = huge_page_size(h) - 1;
-
 	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
 	entry = huge_ptep_get(pte);
 
@@ -799,7 +796,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 		goto unlock;
 	}
 
-	pfn = pte_pfn(entry) + ((start & mask) >> PAGE_SHIFT);
+	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
 	for (; addr < end; addr += PAGE_SIZE, i++, pfn++)
 		range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				 cpu_flags;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-07 17:18   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 10/15] mm: only define hmm_vma_walk_pud if needed Christoph Hellwig
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

pte_index is an internal arch helper in various architectures, without
consistent semantics.  Open-code the calculation of the PMD index based
on the virtual address instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 03d37e102e3b..2083e4db46f5 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -486,7 +486,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 	if (pmd_protnone(pmd) || fault || write_fault)
 		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
-	pfn = pmd_pfn(pmd) + pte_index(addr);
+	pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
 	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
 		if (pmd_devmap(pmd)) {
 			pgmap = get_dev_pagemap(pfn, pgmap);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 10/15] mm: only define hmm_vma_walk_pud if needed
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub Christoph Hellwig
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

We only need the special pud_entry walker if PUD-sized hugepages and
pte mappings are supported, otherwise the common pagewalk code will
take care of the iteration.  Not implementing this callback reduces the
amount of code compiled for non-x86 platforms, and also fixes compile
failures on other architectures when helpers like pud_pfn are not
implemented.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 2083e4db46f5..5e7afe685213 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -455,15 +455,6 @@ static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
 				range->flags[HMM_PFN_VALID];
 }
 
-static inline uint64_t pud_to_hmm_pfn_flags(struct hmm_range *range, pud_t pud)
-{
-	if (!pud_present(pud))
-		return 0;
-	return pud_write(pud) ? range->flags[HMM_PFN_VALID] |
-				range->flags[HMM_PFN_WRITE] :
-				range->flags[HMM_PFN_VALID];
-}
-
 static int hmm_vma_handle_pmd(struct mm_walk *walk,
 			      unsigned long addr,
 			      unsigned long end,
@@ -700,10 +691,19 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 	return 0;
 }
 
-static int hmm_vma_walk_pud(pud_t *pudp,
-			    unsigned long start,
-			    unsigned long end,
-			    struct mm_walk *walk)
+#if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && \
+    defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
+static inline uint64_t pud_to_hmm_pfn_flags(struct hmm_range *range, pud_t pud)
+{
+	if (!pud_present(pud))
+		return 0;
+	return pud_write(pud) ? range->flags[HMM_PFN_VALID] |
+				range->flags[HMM_PFN_WRITE] :
+				range->flags[HMM_PFN_VALID];
+}
+
+static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
+		struct mm_walk *walk)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
@@ -765,6 +765,9 @@ static int hmm_vma_walk_pud(pud_t *pudp,
 
 	return 0;
 }
+#else
+#define hmm_vma_walk_pud	NULL
+#endif
 
 static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 				      unsigned long start, unsigned long end,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 10/15] mm: only define hmm_vma_walk_pud if needed Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 18:00   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 12/15] mm: cleanup the hmm_vma_walk_hugetlb_entry stub Christoph Hellwig
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

Stub out the whole function when CONFIG_TRANSPARENT_HUGEPAGE is not set
to make the function easier to read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/hmm.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 5e7afe685213..4aa7135f1094 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -455,13 +455,10 @@ static inline uint64_t pmd_to_hmm_pfn_flags(struct hmm_range *range, pmd_t pmd)
 				range->flags[HMM_PFN_VALID];
 }
 
-static int hmm_vma_handle_pmd(struct mm_walk *walk,
-			      unsigned long addr,
-			      unsigned long end,
-			      uint64_t *pfns,
-			      pmd_t pmd)
-{
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
+		unsigned long end, uint64_t *pfns, pmd_t pmd)
+{
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
 	struct dev_pagemap *pgmap = NULL;
@@ -490,11 +487,12 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 		put_dev_pagemap(pgmap);
 	hmm_vma_walk->last = end;
 	return 0;
-#else
-	/* If THP is not enabled then we should never reach this code ! */
-	return -EINVAL;
-#endif
 }
+#else /* CONFIG_TRANSPARENT_HUGEPAGE */
+/* stub to allow the code below to compile */
+int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
+		unsigned long end, uint64_t *pfns, pmd_t pmd);
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 static inline uint64_t pte_to_hmm_pfn_flags(struct hmm_range *range, pte_t pte)
 {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 12/15] mm: cleanup the hmm_vma_walk_hugetlb_entry stub
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 13/15] mm: allow HMM_MIRROR on all architectures with MMU Christoph Hellwig
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

Stub out the whole function and assign NULL to the .hugetlb_entry method
if CONFIG_HUGETLB_PAGE is not set, as the method won't ever be called in
that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/hmm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 4aa7135f1094..dee99d0cc856 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -767,11 +767,11 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 #define hmm_vma_walk_pud	NULL
 #endif
 
+#ifdef CONFIG_HUGETLB_PAGE
 static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 				      unsigned long start, unsigned long end,
 				      struct mm_walk *walk)
 {
-#ifdef CONFIG_HUGETLB_PAGE
 	unsigned long addr = start, i, pfn;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
@@ -810,10 +810,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 		return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
 
 	return ret;
-#else /* CONFIG_HUGETLB_PAGE */
-	return -EINVAL;
-#endif
 }
+#else
+#define hmm_vma_walk_hugetlb_entry NULL
+#endif /* CONFIG_HUGETLB_PAGE */
 
 static void hmm_pfns_clear(struct hmm_range *range,
 			   uint64_t *pfns,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 13/15] mm: allow HMM_MIRROR on all architectures with MMU
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 12/15] mm: cleanup the hmm_vma_walk_hugetlb_entry stub Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 16:05 ` [PATCH 14/15] mm: make HMM_MIRROR an implicit option Christoph Hellwig
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

There isn't really any architecture-specific code in this page table
walk implementation, so drop the dependencies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
---
 mm/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 56cec636a1fc..b18782be969c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -677,8 +677,7 @@ config DEV_PAGEMAP_OPS
 
 config HMM_MIRROR
 	bool "HMM mirror CPU page table into a device page table"
-	depends on (X86_64 || PPC64)
-	depends on MMU && 64BIT
+	depends on MMU
 	select MMU_NOTIFIER
 	help
 	  Select HMM_MIRROR if you want to mirror range of the CPU page table of a
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 14/15] mm: make HMM_MIRROR an implicit option
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 13/15] mm: allow HMM_MIRROR on all architectures with MMU Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 17:44   ` Jason Gunthorpe
  2019-08-06 16:05 ` [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR Christoph Hellwig
  2019-08-07 18:17 ` hmm cleanups, v2 Jason Gunthorpe
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

Make HMM_MIRROR an option that is selected by drivers wanting to use it
instead of a user visible option as it is just a low-level
implementation detail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/amd/amdgpu/Kconfig |  4 +++-
 drivers/gpu/drm/nouveau/Kconfig    |  4 +++-
 mm/Kconfig                         | 14 ++++++--------
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index f6e5c0282fc1..2e98c016cb47 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -27,7 +27,9 @@ config DRM_AMDGPU_CIK
 config DRM_AMDGPU_USERPTR
 	bool "Always enable userptr write support"
 	depends on DRM_AMDGPU
-	depends on HMM_MIRROR
+	depends on MMU
+	select HMM_MIRROR
+	select MMU_NOTIFIER
 	help
 	  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
 	  isn't already selected to enabled full userptr support.
diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index 96b9814e6d06..df4352c279ba 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -86,9 +86,11 @@ config DRM_NOUVEAU_SVM
 	bool "(EXPERIMENTAL) Enable SVM (Shared Virtual Memory) support"
 	depends on DEVICE_PRIVATE
 	depends on DRM_NOUVEAU
-	depends on HMM_MIRROR
+	depends on MMU
 	depends on STAGING
+	select HMM_MIRROR
 	select MIGRATE_VMA_HELPER
+	select MMU_NOTIFIER
 	default n
 	help
 	  Say Y here if you want to enable experimental support for
diff --git a/mm/Kconfig b/mm/Kconfig
index b18782be969c..563436dc1f24 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -675,16 +675,14 @@ config MIGRATE_VMA_HELPER
 config DEV_PAGEMAP_OPS
 	bool
 
+#
+# Helpers to mirror range of the CPU page tables of a process into device page
+# tables.
+#
 config HMM_MIRROR
-	bool "HMM mirror CPU page table into a device page table"
+	bool
 	depends on MMU
-	select MMU_NOTIFIER
-	help
-	  Select HMM_MIRROR if you want to mirror range of the CPU page table of a
-	  process into a device page table. Here, mirror means "keep synchronized".
-	  Prerequisites: the device must provide the ability to write-protect its
-	  page tables (at PAGE_SIZE granularity), and must be able to recover from
-	  the resulting potential page faults.
+	depends on MMU_NOTIFIER
 
 config DEVICE_PRIVATE
 	bool "Unaddressable device memory (GPU memory, ...)"
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread
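The Kconfig pattern applied in this patch can be sketched as follows; the driver symbol name is a hypothetical example, not one of the symbols in the patch:

```kconfig
# The helper symbol loses its prompt, so it can no longer be toggled
# by the user and is only turned on by drivers that select it.
config HMM_MIRROR
	bool
	depends on MMU
	depends on MMU_NOTIFIER

config DRM_EXAMPLE_FEATURE
	bool "Enable the example feature"
	depends on MMU
	select HMM_MIRROR
	select MMU_NOTIFIER
```

Note that `select` forces a symbol on without checking that symbol's own dependencies, so the selecting option has to guarantee them itself — which is why the driver options in this patch gain `depends on MMU` and `select MMU_NOTIFIER` alongside `select HMM_MIRROR`.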

* [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 14/15] mm: make HMM_MIRROR an implicit option Christoph Hellwig
@ 2019-08-06 16:05 ` Christoph Hellwig
  2019-08-06 17:44   ` Jason Gunthorpe
  2019-08-07 18:17 ` hmm cleanups, v2 Jason Gunthorpe
  15 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-06 16:05 UTC (permalink / raw)
  To: Jérôme Glisse, Jason Gunthorpe, Ben Skeggs, Felix Kuehling
  Cc: Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel

The option is just used to select HMM mirror support and has a very
confusing help text.  Just pull in the HMM mirror code by default
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/gpu/drm/Kconfig                 |  2 ++
 drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
 4 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 1d80222587ad..319c1da2e74e 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -226,9 +226,11 @@ config DRM_AMDGPU
 	select DRM_SCHED
         select DRM_TTM
 	select POWER_SUPPLY
+	select HMM_MIRROR
 	select HWMON
 	select BACKLIGHT_CLASS_DEVICE
 	select INTERVAL_TREE
+	select MMU_NOTIFIER
 	select CHASH
 	help
 	  Choose this option if you have a recent AMD Radeon graphics card.
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 2e98c016cb47..c5c963164f5e 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -24,16 +24,6 @@ config DRM_AMDGPU_CIK
 
 	  radeon.cik_support=0 amdgpu.cik_support=1
 
-config DRM_AMDGPU_USERPTR
-	bool "Always enable userptr write support"
-	depends on DRM_AMDGPU
-	depends on MMU
-	select HMM_MIRROR
-	select MMU_NOTIFIER
-	help
-	  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
-	  isn't already selected to enabled full userptr support.
-
 config DRM_AMDGPU_GART_DEBUGFS
 	bool "Allow GART access through debugfs"
 	depends on DRM_AMDGPU
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8bf79288c4e2..00b74adbd790 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -751,9 +751,7 @@ struct amdgpu_ttm_tt {
 	uint64_t		userptr;
 	struct task_struct	*usertask;
 	uint32_t		userflags;
-#if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
 	struct hmm_range	*range;
-#endif
 };
 
 /**
@@ -763,7 +761,6 @@ struct amdgpu_ttm_tt {
  * Calling function must call amdgpu_ttm_tt_userptr_range_done() once and only
  * once afterwards to stop HMM tracking
  */
-#if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
 
 #define MAX_RETRY_HMM_RANGE_FAULT	16
 
@@ -892,7 +889,6 @@ bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
 
 	return r;
 }
-#endif
 
 /**
  * amdgpu_ttm_tt_set_user_pages - Copy pages in, putting old pages as necessary.
@@ -970,12 +966,10 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
 
 	sg_free_table(ttm->sg);
 
-#if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
 	if (gtt->range &&
 	    ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
 						      gtt->range->pfns[0]))
 		WARN_ONCE(1, "Missing get_user_page_done\n");
-#endif
 }
 
 int amdgpu_ttm_gart_bind(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index caa76c693700..406b1c5e6dd4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -101,20 +101,8 @@ int amdgpu_mmap(struct file *filp, struct vm_area_struct *vma);
 int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
 int amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
 
-#if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
 int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages);
 bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm);
-#else
-static inline int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo,
-					       struct page **pages)
-{
-	return -EPERM;
-}
-static inline bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
-{
-	return false;
-}
-#endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
 int amdgpu_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 16:05 ` [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR Christoph Hellwig
@ 2019-08-06 17:44   ` Jason Gunthorpe
  2019-08-06 17:51     ` Kuehling, Felix
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
> The option is just used to select HMM mirror support and has a very
> confusing help text.  Just pull in the HMM mirror code by default
> instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/gpu/drm/Kconfig                 |  2 ++
>  drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
>  4 files changed, 2 insertions(+), 28 deletions(-)

Felix, was this an effort to avoid the arch restriction on hmm or
something? Also can't see why this was like this.

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 14/15] mm: make HMM_MIRROR an implicit option
  2019-08-06 16:05 ` [PATCH 14/15] mm: make HMM_MIRROR an implicit option Christoph Hellwig
@ 2019-08-06 17:44   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:52PM +0300, Christoph Hellwig wrote:
> Make HMM_MIRROR an option that is selected by drivers wanting to use it
> instead of a user visible option as it is just a low-level
> implementation detail.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/gpu/drm/amd/amdgpu/Kconfig |  4 +++-
>  drivers/gpu/drm/nouveau/Kconfig    |  4 +++-
>  mm/Kconfig                         | 14 ++++++--------
>  3 files changed, 12 insertions(+), 10 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 17:44   ` Jason Gunthorpe
@ 2019-08-06 17:51     ` Kuehling, Felix
  2019-08-06 18:58       ` Alex Deucher
  0 siblings, 1 reply; 56+ messages in thread
From: Kuehling, Felix @ 2019-08-06 17:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig, Deucher, Alexander, Koenig,
	Christian
  Cc: Jérôme Glisse, Ben Skeggs, Ralph Campbell, linux-mm,
	nouveau, dri-devel, amd-gfx, linux-kernel

On 2019-08-06 13:44, Jason Gunthorpe wrote:
> On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
>> The option is just used to select HMM mirror support and has a very
>> confusing help text.  Just pull in the HMM mirror code by default
>> instead.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/gpu/drm/Kconfig                 |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
>>   4 files changed, 2 insertions(+), 28 deletions(-)
> Felix, was this an effort to avoid the arch restriction on hmm or
> something? Also can't see why this was like this.

This option predates KFD's support of userptrs, which in turn predates 
HMM. Radeon has the same kind of option, though it doesn't affect HMM in 
that case.

Alex, Christian, can you think of a good reason to maintain userptr 
support as an option in amdgpu? I suspect it was originally meant as a 
way to allow kernels with amdgpu without MMU notifiers. Now it would 
allow a kernel with amdgpu without HMM or MMU notifiers. I don't know if 
this is a useful thing to have.

Regards,
   Felix

>
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>
> Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub
  2019-08-06 16:05 ` [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub Christoph Hellwig
@ 2019-08-06 18:00   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 18:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:49PM +0300, Christoph Hellwig wrote:
> Stub out the whole function when CONFIG_TRANSPARENT_HUGEPAGE is not set
> to make the function easier to read.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/hmm.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry
  2019-08-06 16:05 ` [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry Christoph Hellwig
@ 2019-08-06 18:02   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 18:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:46PM +0300, Christoph Hellwig wrote:
> The pagewalk code already passes the value as the hmask parameter.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>  mm/hmm.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault
  2019-08-06 16:05 ` [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault Christoph Hellwig
@ 2019-08-06 18:02   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 18:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:41PM +0300, Christoph Hellwig wrote:
> We'll need the nouveau_svmm structure to improve the function soon.
> For now this allows using the svmm->mm reference to unlock the
> mmap_sem, and thus the same dereference chain that the caller uses
> to lock and unlock it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 17:51     ` Kuehling, Felix
@ 2019-08-06 18:58       ` Alex Deucher
  2019-08-06 20:03         ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Deucher @ 2019-08-06 18:58 UTC (permalink / raw)
  To: Kuehling, Felix
  Cc: Jason Gunthorpe, Christoph Hellwig, Deucher, Alexander, Koenig,
	Christian, Ralph Campbell, amd-gfx, nouveau, linux-kernel,
	dri-devel, linux-mm, Jérôme Glisse, Ben Skeggs

On Tue, Aug 6, 2019 at 1:51 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
>
> On 2019-08-06 13:44, Jason Gunthorpe wrote:
> > On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
> >> The option is just used to select HMM mirror support and has a very
> >> confusing help text.  Just pull in the HMM mirror code by default
> >> instead.
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> ---
> >>   drivers/gpu/drm/Kconfig                 |  2 ++
> >>   drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
> >>   4 files changed, 2 insertions(+), 28 deletions(-)
> > Felix, was this an effort to avoid the arch restriction on hmm or
> > something? Also can't see why this was like this.
>
> This option predates KFD's support of userptrs, which in turn predates
> HMM. Radeon has the same kind of option, though it doesn't affect HMM in
> that case.
>
> Alex, Christian, can you think of a good reason to maintain userptr
> support as an option in amdgpu? I suspect it was originally meant as a
> way to allow kernels with amdgpu without MMU notifiers. Now it would
> allow a kernel with amdgpu without HMM or MMU notifiers. I don't know if
> this is a useful thing to have.

Right.  There were people that didn't have MMU notifiers that wanted
support for the GPU.  For a lot of older APIs, a lack of userptr
support was not a big deal (it just disabled some optimizations and
API extensions), but as it becomes more relevant it may make sense to
just make it a requirement.

Alex

>
> Regards,
>    Felix
>
> >
> > Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
> >
> > Jason
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 18:58       ` Alex Deucher
@ 2019-08-06 20:03         ` Jason Gunthorpe
  2019-08-07  6:57           ` Koenig, Christian
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-06 20:03 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Kuehling, Felix, Christoph Hellwig, Deucher, Alexander, Koenig,
	Christian, Ralph Campbell, amd-gfx, nouveau, linux-kernel,
	dri-devel, linux-mm, Jérôme Glisse, Ben Skeggs

On Tue, Aug 06, 2019 at 02:58:58PM -0400, Alex Deucher wrote:
> On Tue, Aug 6, 2019 at 1:51 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
> >
> > On 2019-08-06 13:44, Jason Gunthorpe wrote:
> > > On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
> > >> The option is just used to select HMM mirror support and has a very
> > >> confusing help text.  Just pull in the HMM mirror code by default
> > >> instead.
> > >>
> > >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> > >>   drivers/gpu/drm/Kconfig                 |  2 ++
> > >>   drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
> > >>   4 files changed, 2 insertions(+), 28 deletions(-)
> > > Felix, was this an effort to avoid the arch restriction on hmm or
> > > something? Also can't see why this was like this.
> >
> > This option predates KFD's support of userptrs, which in turn predates
> > HMM. Radeon has the same kind of option, though it doesn't affect HMM in
> > that case.
> >
> > Alex, Christian, can you think of a good reason to maintain userptr
> > support as an option in amdgpu? I suspect it was originally meant as a
> > way to allow kernels with amdgpu without MMU notifiers. Now it would
> > allow a kernel with amdgpu without HMM or MMU notifiers. I don't know if
> > this is a useful thing to have.
> 
> Right.  There were people that didn't have MMU notifiers that wanted
> support for the GPU.

?? Is that even a real thing? mmu_notifier does not have much kconfig
dependency.

Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-06 20:03         ` Jason Gunthorpe
@ 2019-08-07  6:57           ` Koenig, Christian
  2019-08-07 11:46             ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Koenig, Christian @ 2019-08-07  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Deucher
  Cc: Kuehling, Felix, Christoph Hellwig, Deucher, Alexander,
	Ralph Campbell, amd-gfx, nouveau, linux-kernel, dri-devel,
	linux-mm, Jérôme Glisse, Ben Skeggs

Am 06.08.19 um 22:03 schrieb Jason Gunthorpe:
> On Tue, Aug 06, 2019 at 02:58:58PM -0400, Alex Deucher wrote:
>> On Tue, Aug 6, 2019 at 1:51 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
>>> On 2019-08-06 13:44, Jason Gunthorpe wrote:
>>>> On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
>>>>> The option is just used to select HMM mirror support and has a very
>>>>> confusing help text.  Just pull in the HMM mirror code by default
>>>>> instead.
>>>>>
>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>    drivers/gpu/drm/Kconfig                 |  2 ++
>>>>>    drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
>>>>>    4 files changed, 2 insertions(+), 28 deletions(-)
>>>> Felix, was this an effort to avoid the arch restriction on hmm or
>>>> something? Also can't see why this was like this.
>>> This option predates KFD's support of userptrs, which in turn predates
>>> HMM. Radeon has the same kind of option, though it doesn't affect HMM in
>>> that case.
>>>
>>> Alex, Christian, can you think of a good reason to maintain userptr
>>> support as an option in amdgpu? I suspect it was originally meant as a
>>> way to allow kernels with amdgpu without MMU notifiers. Now it would
>>> allow a kernel with amdgpu without HMM or MMU notifiers. I don't know if
>>> this is a useful thing to have.
>> Right.  There were people who didn't have MMU notifiers but wanted
>> support for the GPU.
> ?? Is that even a real thing? mmu_notifier does not have much kconfig
> dependency.

Yes, that used to be a very real thing.

Initially a lot of users didn't want mmu notifiers to be enabled 
because of the performance overhead they cost.

Then we had the problem that HMM mirror wasn't available on a lot of 
architectures.

Not sure if we still need it separately, but I think it shouldn't 
be removed without asking around whether somebody still needs it.

Christian.

>
> Jason



* Re: [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR
  2019-08-07  6:57           ` Koenig, Christian
@ 2019-08-07 11:46             ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 11:46 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Alex Deucher, Kuehling, Felix, Christoph Hellwig, Deucher,
	Alexander, Ralph Campbell, amd-gfx, nouveau, linux-kernel,
	dri-devel, linux-mm, Jérôme Glisse, Ben Skeggs

On Wed, Aug 07, 2019 at 06:57:24AM +0000, Koenig, Christian wrote:
> Am 06.08.19 um 22:03 schrieb Jason Gunthorpe:
> > On Tue, Aug 06, 2019 at 02:58:58PM -0400, Alex Deucher wrote:
> >> On Tue, Aug 6, 2019 at 1:51 PM Kuehling, Felix <Felix.Kuehling@amd.com> wrote:
> >>> On 2019-08-06 13:44, Jason Gunthorpe wrote:
> >>>> On Tue, Aug 06, 2019 at 07:05:53PM +0300, Christoph Hellwig wrote:
> >>>>> The option is just used to select HMM mirror support and has a very
> >>>>> confusing help text.  Just pull in the HMM mirror code by default
> >>>>> instead.
> >>>>>
> >>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >>>>>    drivers/gpu/drm/Kconfig                 |  2 ++
> >>>>>    drivers/gpu/drm/amd/amdgpu/Kconfig      | 10 ----------
> >>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  6 ------
> >>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 12 ------------
> >>>>>    4 files changed, 2 insertions(+), 28 deletions(-)
> >>>> Felix, was this an effort to avoid the arch restriction on hmm or
> >>>> something? Also can't see why this was like this.
> >>> This option predates KFD's support of userptrs, which in turn predates
> >>> HMM. Radeon has the same kind of option, though it doesn't affect HMM in
> >>> that case.
> >>>
> >>> Alex, Christian, can you think of a good reason to maintain userptr
> >>> support as an option in amdgpu? I suspect it was originally meant as a
> >>> way to allow kernels with amdgpu without MMU notifiers. Now it would
> >>> allow a kernel with amdgpu without HMM or MMU notifiers. I don't know if
> >>> this is a useful thing to have.
> >> Right.  There were people who didn't have MMU notifiers but wanted
> >> support for the GPU.
> > ?? Is that even a real thing? mmu_notifier does not have much kconfig
> > dependency.
> 
> Yes, that used to be a very real thing.
> 
> Initially a lot of users didn't want mmu notifiers to be enabled 
> because of the performance overhead they cost.

Seems strange to hear these days, every distro ships with it on, it is
needed for kvm.

> Then we had the problem that HMM mirror wasn't available on a lot of 
> architectures.

Some patches for hmm that will fix this are ready now.

Jason



* Re: [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd
  2019-08-06 16:05 ` [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd Christoph Hellwig
@ 2019-08-07 17:18   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 17:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:47PM +0300, Christoph Hellwig wrote:
> pte_index is an internal arch helper in various architectures,
> without consistent semantics.  Open code that calculation of a PMD
> index based on the virtual address instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/hmm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

There sure are a lot of different ways to express this, but this one
looks OK to me; at least the switch from the PTRS_PER_PTE expression
in the x86 implementation to PMD_MASK looks equivalent
 
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason
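The equivalence discussed above can be checked with a small userspace sketch. Everything below is a simplified model, not kernel code: the constants assume x86-64 with 4 KiB pages and 2 MiB PMDs, and `pte_index_x86`/`pmd_page_index` are hypothetical names for the two expressions being compared.

```c
#include <assert.h>

#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PMD_SIZE     (1UL << PMD_SHIFT)
#define PMD_MASK     (~(PMD_SIZE - 1))
#define PTRS_PER_PTE 512UL

/* Simplified model of the x86 pte_index(): index of the pte within
 * its page table, i.e. the page number modulo PTRS_PER_PTE. */
unsigned long pte_index_x86(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* The open-coded replacement from the patch: the offset of addr
 * within its PMD, expressed in pages. */
unsigned long pmd_page_index(unsigned long addr)
{
	return (addr & ~PMD_MASK) >> PAGE_SHIFT;
}

/* Returns 1 when both expressions agree over the given span. */
int expressions_agree(unsigned long span)
{
	unsigned long addr;

	for (addr = 0; addr < span; addr += 1UL << PAGE_SHIFT)
		if (pte_index_x86(addr) != pmd_page_index(addr))
			return 0;
	return 1;
}
```

Since PMD_SIZE >> PAGE_SHIFT equals PTRS_PER_PTE here, masking with `PMD_SIZE - 1` before shifting and masking with `PTRS_PER_PTE - 1` after shifting select the same bits.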



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-06 16:05 ` [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk Christoph Hellwig
@ 2019-08-07 17:45   ` Jason Gunthorpe
  2019-08-07 18:47     ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 17:45 UTC (permalink / raw)
  To: Christoph Hellwig, Dan Williams
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote:
> There is only a single place where the pgmap is passed over a function
> call, so replace it with local variables in the places where we deal
> with the pgmap.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>  mm/hmm.c | 62 ++++++++++++++++++++++++--------------------------------
>  1 file changed, 27 insertions(+), 35 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9a908902e4cc..d66fa29b42e0 100644
> +++ b/mm/hmm.c
> @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
>  
>  struct hmm_vma_walk {
>  	struct hmm_range	*range;
> -	struct dev_pagemap	*pgmap;
>  	unsigned long		last;
>  	unsigned int		flags;
>  };
> @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
>  	struct hmm_range *range = hmm_vma_walk->range;
> +	struct dev_pagemap *pgmap = NULL;
>  	unsigned long pfn, npages, i;
>  	bool fault, write_fault;
>  	uint64_t cpu_flags;
> @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>  	pfn = pmd_pfn(pmd) + pte_index(addr);
>  	for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
>  		if (pmd_devmap(pmd)) {
> -			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
> -					      hmm_vma_walk->pgmap);
> -			if (unlikely(!hmm_vma_walk->pgmap))
> +			pgmap = get_dev_pagemap(pfn, pgmap);
> +			if (unlikely(!pgmap))
>  				return -EBUSY;

Unrelated to this patch, but what is the point of checking that the
pgmap exists for the page and then immediately releasing it?
This code has this pattern in several places.

It feels racy

>  		}
>  		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
>  	}
> -	if (hmm_vma_walk->pgmap) {
> -		put_dev_pagemap(hmm_vma_walk->pgmap);
> -		hmm_vma_walk->pgmap = NULL;

Putting the value in the hmm_vma_walk would have made some sense to me
if the pgmap was not set to NULL all over the place. Then most of the
xa_loads would be eliminated, as I would expect the pgmap to be
mostly uniform for these use cases.

Is there some reason the pgmap ref can't be held across
faulting/sleeping? ie like below.

Anyhow, I looked over this pretty carefully and the change looks
functionally OK, I just don't know why the code is like this in the
first place.

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

diff --git a/mm/hmm.c b/mm/hmm.c
index 9a908902e4cc38..4e30128c23a505 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 		}
 		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	hmm_vma_walk->last = end;
 	return 0;
 #else
@@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	return 0;
 
 fault:
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep);
 	/* Fault any virtual address we were asked to fault */
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			return r;
 		}
 	}
-	if (hmm_vma_walk->pgmap) {
-		/*
-		 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
-		 * so that we can leverage get_dev_pagemap() optimization which
-		 * will not re-take a reference on a pgmap if we already have
-		 * one.
-		 */
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep - 1);
 
 	hmm_vma_walk->last = addr;
@@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
 			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				  cpu_flags;
 		}
-		if (hmm_vma_walk->pgmap) {
-			put_dev_pagemap(hmm_vma_walk->pgmap);
-			hmm_vma_walk->pgmap = NULL;
-		}
 		hmm_vma_walk->last = end;
 		return 0;
 	}
@@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 			/* Keep trying while the range is valid. */
 		} while (ret == -EBUSY && range->valid);
 
+		/*
+		 * We do put_dev_pagemap() here so that we can leverage
+		 * get_dev_pagemap() optimization which will not re-take a
+		 * reference on a pgmap if we already have one.
+		 */
+		if (hmm_vma_walk->pgmap)
+			put_dev_pagemap(hmm_vma_walk->pgmap);
+
 		if (ret) {
 			unsigned long i;
 



* Re: [PATCH 07/15] mm: remove the page_shift member from struct hmm_range
  2019-08-06 16:05 ` [PATCH 07/15] mm: remove the page_shift member from struct hmm_range Christoph Hellwig
@ 2019-08-07 17:51   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 17:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:45PM +0300, Christoph Hellwig wrote:
> All users pass PAGE_SIZE here, and if we wanted to support single
> entries for huge pages we should really just add a HMM_FAULT_HUGEPAGE
> flag that uses the huge page size, instead of having the
> caller calculate that size once, just for the hmm code to verify it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  1 -
>  drivers/gpu/drm/nouveau/nouveau_svm.c   |  1 -
>  include/linux/hmm.h                     | 22 -------------
>  mm/hmm.c                                | 42 ++++++-------------------
>  4 files changed, 9 insertions(+), 57 deletions(-)

Having looked at ODP more closely, this doesn't seem to match what it
needs anyhow. It can keep using its checking algorithm.
 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 71d6e7087b0b..8bf79288c4e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -818,7 +818,6 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>  				0 : range->flags[HMM_PFN_WRITE];
>  	range->pfn_flags_mask = 0;
>  	range->pfns = pfns;
> -	range->page_shift = PAGE_SHIFT;
>  	range->start = start;
>  	range->end = start + ttm->num_pages * PAGE_SIZE;
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 41fad4719ac6..668d4bd0c118 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -680,7 +680,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
>  			 args.i.p.addr + args.i.p.size, fn - fi);
>  
>  		/* Have HMM fault pages within the fault window to the GPU. */
> -		range.page_shift = PAGE_SHIFT;
>  		range.start = args.i.p.addr;
>  		range.end = args.i.p.addr + args.i.p.size;
>  		range.pfns = args.phys;
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index c5b51376b453..51e18fbb8953 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -158,7 +158,6 @@ enum hmm_pfn_value_e {
>   * @values: pfn value for some special case (none, special, error, ...)
>   * @default_flags: default flags for the range (write, read, ... see hmm doc)
>   * @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
> - * @page_shift: device virtual address shift value (should be >= PAGE_SHIFT)
>   * @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
>   * @valid: pfns array did not change since it has been fill by an HMM function
>   */
> @@ -172,31 +171,10 @@ struct hmm_range {
>  	const uint64_t		*values;
>  	uint64_t		default_flags;
>  	uint64_t		pfn_flags_mask;
> -	uint8_t			page_shift;
>  	uint8_t			pfn_shift;
>  	bool			valid;
>  };
>  
> -/*
> - * hmm_range_page_shift() - return the page shift for the range
> - * @range: range being queried
> - * Return: page shift (page size = 1 << page shift) for the range
> - */
> -static inline unsigned hmm_range_page_shift(const struct hmm_range *range)
> -{
> -	return range->page_shift;
> -}
> -
> -/*
> - * hmm_range_page_size() - return the page size for the range
> - * @range: range being queried
> - * Return: page size for the range in bytes
> - */
> -static inline unsigned long hmm_range_page_size(const struct hmm_range *range)
> -{
> -	return 1UL << hmm_range_page_shift(range);
> -}
> -
>  /*
>   * hmm_range_wait_until_valid() - wait for range to be valid
>   * @range: range affected by invalidation to wait on
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 926735a3aef9..f26d6abc4ed2 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -344,13 +344,12 @@ static int hmm_vma_walk_hole_(unsigned long addr, unsigned long end,
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
>  	struct hmm_range *range = hmm_vma_walk->range;
>  	uint64_t *pfns = range->pfns;
> -	unsigned long i, page_size;
> +	unsigned long i;
>  
>  	hmm_vma_walk->last = addr;
> -	page_size = hmm_range_page_size(range);
> -	i = (addr - range->start) >> range->page_shift;
> +	i = (addr - range->start) >> PAGE_SHIFT;
>  
> -	for (; addr < end; addr += page_size, i++) {
> +	for (; addr < end; addr += PAGE_SIZE, i++) {
>  		pfns[i] = range->values[HMM_PFN_NONE];
>  		if (fault || write_fault) {
>  			int ret;
> @@ -772,7 +771,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>  				      struct mm_walk *walk)
>  {
>  #ifdef CONFIG_HUGETLB_PAGE
> -	unsigned long addr = start, i, pfn, mask, size, pfn_inc;
> +	unsigned long addr = start, i, pfn, mask;
>  	struct hmm_vma_walk *hmm_vma_walk = walk->private;
>  	struct hmm_range *range = hmm_vma_walk->range;
>  	struct vm_area_struct *vma = walk->vma;
> @@ -783,24 +782,12 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>  	pte_t entry;
>  	int ret = 0;
>  
> -	size = huge_page_size(h);
> -	mask = size - 1;
> -	if (range->page_shift != PAGE_SHIFT) {
> -		/* Make sure we are looking at a full page. */
> -		if (start & mask)
> -			return -EINVAL;
> -		if (end < (start + size))
> -			return -EINVAL;
> -		pfn_inc = size >> PAGE_SHIFT;
> -	} else {
> -		pfn_inc = 1;
> -		size = PAGE_SIZE;
> -	}
> +	mask = huge_page_size(h) - 1;
>  
>  	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
>  	entry = huge_ptep_get(pte);
>  
> -	i = (start - range->start) >> range->page_shift;
> +	i = (start - range->start) >> PAGE_SHIFT;
>  	orig_pfn = range->pfns[i];
>  	range->pfns[i] = range->values[HMM_PFN_NONE];
>  	cpu_flags = pte_to_hmm_pfn_flags(range, entry);
> @@ -812,8 +799,8 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>  		goto unlock;
>  	}
>  
> -	pfn = pte_pfn(entry) + ((start & mask) >> range->page_shift);
> -	for (; addr < end; addr += size, i++, pfn += pfn_inc)
> +	pfn = pte_pfn(entry) + ((start & mask) >> PAGE_SHIFT);
> +	for (; addr < end; addr += PAGE_SIZE, i++, pfn++)
>  		range->pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
>  				 cpu_flags;
>  	hmm_vma_walk->last = end;
> @@ -850,14 +837,13 @@ static void hmm_pfns_clear(struct hmm_range *range,
>   */
>  int hmm_range_register(struct hmm_range *range, struct hmm_mirror *mirror)
>  {
> -	unsigned long mask = ((1UL << range->page_shift) - 1UL);
>  	struct hmm *hmm = mirror->hmm;
>  	unsigned long flags;
>  
>  	range->valid = false;
>  	range->hmm = NULL;
>  
> -	if ((range->start & mask) || (range->end & mask))
> +	if ((range->start & (PAGE_SIZE - 1)) || (range->end & (PAGE_SIZE - 1)))
>  		return -EINVAL;

PAGE_SIZE - 1 == ~PAGE_MASK ? If yes I can fix it

Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>

Jason
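For reference, PAGE_MASK in the kernel headers is defined as the complement of PAGE_SIZE - 1, so the alignment check in the patch could equally be written with ~PAGE_MASK or a PAGE_ALIGNED()-style helper. A minimal model (assuming 4 KiB pages; `page_aligned` and `misaligned_bits` are illustrative names, not kernel functions):

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Model of a PAGE_ALIGNED()-style check: true when addr sits on a
 * page boundary. */
int page_aligned(unsigned long addr)
{
	return (addr & ~PAGE_MASK) == 0;
}

/* The form used in the patch: non-zero when addr is misaligned,
 * returning the offset into the page. */
unsigned long misaligned_bits(unsigned long addr)
{
	return addr & (PAGE_SIZE - 1);
}
```

Both forms test the same low bits, since `PAGE_SIZE - 1` and `~PAGE_MASK` are the same value.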



* Re: hmm cleanups, v2
  2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2019-08-06 16:05 ` [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR Christoph Hellwig
@ 2019-08-07 18:17 ` Jason Gunthorpe
  15 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-07 18:17 UTC (permalink / raw)
  To: Christoph Hellwig, John Hubbard
  Cc: Jérôme Glisse, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Tue, Aug 06, 2019 at 07:05:38PM +0300, Christoph Hellwig wrote:
> 
> Hi Jérôme, Ben, Felix and Jason,
> 
> below is a series against the hmm tree which cleans up various minor
> bits and allows HMM_MIRROR to be built on all architectures.
> 
> Diffstat:
> 
>     11 files changed, 94 insertions(+), 210 deletions(-)
> 
> A git tree is also available at:
> 
>     git://git.infradead.org/users/hch/misc.git hmm-cleanups.2
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/hmm-cleanups.2
> 
> Changes since v1:
>  - fix the cover letter subject
>  - improve various patch descriptions
>  - use svmm->mm in nouveau_range_fault
>  - inverse the hmask field when using it
>  - select HMM_MIRROR instead of making it a user visible option
 
I think it is straightforward enough to move into -next, so I applied it
to the hmm.git; let's get some more reviewed-bys/tested-bys, though.

For now I dropped 'remove the pgmap field from struct hmm_vma_walk'
just to hear the followup and 'amdgpu: remove
CONFIG_DRM_AMDGPU_USERPTR' until the AMD team Acks

Thanks,
Jason


* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-07 17:45   ` Jason Gunthorpe
@ 2019-08-07 18:47     ` Dan Williams
  2019-08-08  6:59       ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-07 18:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Wed, Aug 7, 2019 at 10:45 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote:
> > There is only a single place where the pgmap is passed over a function
> > call, so replace it with local variables in the places where we deal
> > with the pgmap.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >  mm/hmm.c | 62 ++++++++++++++++++++++++--------------------------------
> >  1 file changed, 27 insertions(+), 35 deletions(-)
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 9a908902e4cc..d66fa29b42e0 100644
> > +++ b/mm/hmm.c
> > @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
> >
> >  struct hmm_vma_walk {
> >       struct hmm_range        *range;
> > -     struct dev_pagemap      *pgmap;
> >       unsigned long           last;
> >       unsigned int            flags;
> >  };
> > @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >       struct hmm_vma_walk *hmm_vma_walk = walk->private;
> >       struct hmm_range *range = hmm_vma_walk->range;
> > +     struct dev_pagemap *pgmap = NULL;
> >       unsigned long pfn, npages, i;
> >       bool fault, write_fault;
> >       uint64_t cpu_flags;
> > @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >       pfn = pmd_pfn(pmd) + pte_index(addr);
> >       for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
> >               if (pmd_devmap(pmd)) {
> > -                     hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
> > -                                           hmm_vma_walk->pgmap);
> > -                     if (unlikely(!hmm_vma_walk->pgmap))
> > +                     pgmap = get_dev_pagemap(pfn, pgmap);
> > +                     if (unlikely(!pgmap))
> >                               return -EBUSY;
>
> Unrelated to this patch, but what is the point of checking that the
> pgmap exists for the page and then immediately releasing it?
> This code has this pattern in several places.
>
> It feels racy

Agree, not sure what the intent is here. The only other reason to call
get_dev_pagemap() is to check in general whether the pfn is indeed
owned by some ZONE_DEVICE instance, but if the intent is to make sure
the device is still attached/enabled, that check is invalidated at
put_dev_pagemap().

If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
do something cheaper with a helper that is on the order of the same
cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
or something similar.

>
> >               }
> >               pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
> >       }
> > -     if (hmm_vma_walk->pgmap) {
> > -             put_dev_pagemap(hmm_vma_walk->pgmap);
> > -             hmm_vma_walk->pgmap = NULL;
>
> Putting the value in the hmm_vma_walk would have made some sense to me
> if the pgmap was not set to NULL all over the place. Then most of the
> xa_loads would be eliminated, as I would expect the pgmap to be
> mostly uniform for these use cases.
>
> Is there some reason the pgmap ref can't be held across
> faulting/sleeping? ie like below.

No restriction on holding refs over faulting / sleeping.

>
> Anyhow, I looked over this pretty carefully and the change looks
> functionally OK, I just don't know why the code is like this in the
> first place.
>
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9a908902e4cc38..4e30128c23a505 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>                 }
>                 pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
>         }
> -       if (hmm_vma_walk->pgmap) {
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         hmm_vma_walk->last = end;
>         return 0;
>  #else
> @@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>         return 0;
>
>  fault:
> -       if (hmm_vma_walk->pgmap) {
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         pte_unmap(ptep);
>         /* Fault any virtual address we were asked to fault */
>         return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
> @@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>                         return r;
>                 }
>         }
> -       if (hmm_vma_walk->pgmap) {
> -               /*
> -                * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
> -                * so that we can leverage get_dev_pagemap() optimization which
> -                * will not re-take a reference on a pgmap if we already have
> -                * one.
> -                */
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         pte_unmap(ptep - 1);
>
>         hmm_vma_walk->last = addr;
> @@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
>                         pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
>                                   cpu_flags;
>                 }
> -               if (hmm_vma_walk->pgmap) {
> -                       put_dev_pagemap(hmm_vma_walk->pgmap);
> -                       hmm_vma_walk->pgmap = NULL;
> -               }
>                 hmm_vma_walk->last = end;
>                 return 0;
>         }
> @@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
>                         /* Keep trying while the range is valid. */
>                 } while (ret == -EBUSY && range->valid);
>
> +               /*
> +                * We do put_dev_pagemap() here so that we can leverage
> +                * get_dev_pagemap() optimization which will not re-take a
> +                * reference on a pgmap if we already have one.
> +                */
> +               if (hmm_vma_walk->pgmap)
> +                       put_dev_pagemap(hmm_vma_walk->pgmap);
> +

Seems ok, but only if the caller is guaranteeing that the range does
not span outside of a single pagemap instance. If that guarantee is
met why not just have the caller pass in a pinned pagemap? If that
guarantee is not met, then I think we're back to your race concern.
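Dan's earlier idea of a ZONE_DEVICE test on the order of pfn_valid() might look something like the following userspace model. This is entirely hypothetical: `sections`, `SECTION_IS_DEVICE` and `pfn_is_zone_device` are invented names sketching the idea of a per-mem_section flag, not existing kernel interfaces, and the section size assumes x86-64's 128 MiB sections.

```c
#define PAGE_SHIFT       12
#define SECTION_SHIFT    27	/* 128 MiB sections, as on x86-64 */
#define PFNS_PER_SECTION (1UL << (SECTION_SHIFT - PAGE_SHIFT))
#define NR_SECTIONS      64

#define SECTION_IS_DEVICE (1U << 0)	/* hypothetical flag bit */

/* Hypothetical per-section metadata, analogous to mem_section flags. */
unsigned int sections[NR_SECTIONS];

/* A pfn_valid()-cost check: one array index plus a bit test, with no
 * pgmap lookup and no refcounting. */
int pfn_is_zone_device(unsigned long pfn)
{
	return sections[pfn / PFNS_PER_SECTION] & SECTION_IS_DEVICE;
}
```

Note this only answers "is this pfn ZONE_DEVICE"; it says nothing about the pgmap still being alive, which is exactly the liveness question debated above.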



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-07 18:47     ` Dan Williams
@ 2019-08-08  6:59       ` Christoph Hellwig
  2019-08-14  1:36         ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-08  6:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, Christoph Hellwig, Jérôme Glisse,
	Ben Skeggs, Felix Kuehling, Ralph Campbell, linux-mm, nouveau,
	dri-devel, amd-gfx, linux-kernel

On Wed, Aug 07, 2019 at 11:47:22AM -0700, Dan Williams wrote:
> > Unrelated to this patch, but what is the point of checking that the
> > pgmap exists for the page and then immediately releasing it?
> > This code has this pattern in several places.
> >
> > It feels racy
> 
> Agree, not sure what the intent is here. The only other reason to call
> get_dev_pagemap() is to check in general whether the pfn is indeed
> owned by some ZONE_DEVICE instance, but if the intent is to make sure
> the device is still attached/enabled, that check is invalidated at
> put_dev_pagemap().
> 
> If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
> do something cheaper with a helper that is on the order of the same
> cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
> or something similar.

The hmm code literally never dereferences the pgmap, so validity
checking is the only explanation for it.

> > +               /*
> > +                * We do put_dev_pagemap() here so that we can leverage
> > +                * get_dev_pagemap() optimization which will not re-take a
> > +                * reference on a pgmap if we already have one.
> > +                */
> > +               if (hmm_vma_walk->pgmap)
> > +                       put_dev_pagemap(hmm_vma_walk->pgmap);
> > +
> 
> Seems ok, but only if the caller is guaranteeing that the range does
> not span outside of a single pagemap instance. If that guarantee is
> met why not just have the caller pass in a pinned pagemap? If that
> guarantee is not met, then I think we're back to your race concern.

It iterates over multiple ptes in a non-huge pmd.  Is there any kind of
limitation on different pgmap instances inside a pmd?  I can't think
of one, so this might actually be a bug.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-08  6:59       ` Christoph Hellwig
@ 2019-08-14  1:36         ` Dan Williams
  2019-08-14  7:38           ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-14  1:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, Jérôme Glisse, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Wed, Aug 7, 2019 at 11:59 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 07, 2019 at 11:47:22AM -0700, Dan Williams wrote:
> > > Unrelated to this patch, but what is the point of checking that the
> > > pgmap exists for the page and then immediately releasing it?
> > > This code has this pattern in several places.
> > >
> > > It feels racy
> >
> > Agree, not sure what the intent is here. The only other reason to call
> > get_dev_pagemap() is to check in general whether the pfn is indeed
> > owned by some ZONE_DEVICE instance, but if the intent is to make sure
> > the device is still attached/enabled, that check is invalidated at
> > put_dev_pagemap().
> >
> > If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
> > do something cheaper with a helper that is on the order of the same
> > cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
> > or something similar.
>
> The hmm code literally never dereferences the pgmap, so validity
> checking is the only explanation for it.
>
> > > +               /*
> > > +                * We do put_dev_pagemap() here so that we can leverage
> > > +                * get_dev_pagemap() optimization which will not re-take a
> > > +                * reference on a pgmap if we already have one.
> > > +                */
> > > +               if (hmm_vma_walk->pgmap)
> > > +                       put_dev_pagemap(hmm_vma_walk->pgmap);
> > > +
> >
> > Seems ok, but only if the caller is guaranteeing that the range does
> > not span outside of a single pagemap instance. If that guarantee is
> > met why not just have the caller pass in a pinned pagemap? If that
> > guarantee is not met, then I think we're back to your race concern.
>
> It iterates over multiple ptes in a non-huge pmd.  Is there any kind of
> limitation on different pgmap instances inside a pmd?  I can't think
> of one, so this might actually be a bug.

Section alignment constraints somewhat save us here. The only example
I can think of where a PMD does not contain a uniform pgmap association
for each pte is when the pgmap overlaps normal dram, i.e. shares
the same 'struct memory_section' for a given span. Otherwise, distinct
pgmaps arrange to manage their own exclusive sections (and now
subsections as of v5.3); the implementation could not otherwise
guarantee different mapping lifetimes.

That said, this seems to want a better mechanism to determine "pfn is
ZONE_DEVICE".



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-14  1:36         ` Dan Williams
@ 2019-08-14  7:38           ` Christoph Hellwig
  2019-08-14 13:27             ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-14  7:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Christoph Hellwig, Jason Gunthorpe, Jérôme Glisse,
	Ben Skeggs, Felix Kuehling, Ralph Campbell, linux-mm, nouveau,
	dri-devel, amd-gfx, linux-kernel

On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> Section alignment constraints somewhat save us here. The only example
> I can think of where a PMD does not contain a uniform pgmap association
> for each pte is when the pgmap overlaps normal dram, i.e. shares
> the same 'struct memory_section' for a given span. Otherwise, distinct
> pgmaps arrange to manage their own exclusive sections (and now
> subsections as of v5.3); the implementation could not otherwise
> guarantee different mapping lifetimes.
> 
> That said, this seems to want a better mechanism to determine "pfn is
> ZONE_DEVICE".

So I guess this patch is fine for now, and once you provide a better
mechanism we can switch over to it?



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-14  7:38           ` Christoph Hellwig
@ 2019-08-14 13:27             ` Jason Gunthorpe
  2019-08-14 14:48               ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-14 13:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dan Williams, Jérôme Glisse, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > Section alignment constraints somewhat save us here. The only example
> > I can think of a PMD not containing a uniform pgmap association for
> > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > the same 'struct memory_section' for a given span. Otherwise, distinct
> > pgmaps arrange to manage their own exclusive sections (and now
> > subsections as of v5.3). Otherwise the implementation could not
> > guarantee different mapping lifetimes.
> > 
> > That said, this seems to want a better mechanism to determine "pfn is
> > ZONE_DEVICE".
> 
> So I guess this patch is fine for now, and once you provide a better
> mechanism we can switch over to it?

What about the version I sent to just get rid of all the strange
put_dev_pagemaps while scanning? Odds are good we will work with only
a single pagemap, so it makes some sense to cache it once we find it?

diff --git a/mm/hmm.c b/mm/hmm.c
index 9a908902e4cc38..4e30128c23a505 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
 		}
 		pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
 	}
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	hmm_vma_walk->last = end;
 	return 0;
 #else
@@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	return 0;
 
 fault:
-	if (hmm_vma_walk->pgmap) {
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep);
 	/* Fault any virtual address we were asked to fault */
 	return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
@@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			return r;
 		}
 	}
-	if (hmm_vma_walk->pgmap) {
-		/*
-		 * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
-		 * so that we can leverage get_dev_pagemap() optimization which
-		 * will not re-take a reference on a pgmap if we already have
-		 * one.
-		 */
-		put_dev_pagemap(hmm_vma_walk->pgmap);
-		hmm_vma_walk->pgmap = NULL;
-	}
 	pte_unmap(ptep - 1);
 
 	hmm_vma_walk->last = addr;
@@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
 			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
 				  cpu_flags;
 		}
-		if (hmm_vma_walk->pgmap) {
-			put_dev_pagemap(hmm_vma_walk->pgmap);
-			hmm_vma_walk->pgmap = NULL;
-		}
 		hmm_vma_walk->last = end;
 		return 0;
 	}
@@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
 			/* Keep trying while the range is valid. */
 		} while (ret == -EBUSY && range->valid);
 
+		/*
+		 * We do put_dev_pagemap() here so that we can leverage
+		 * get_dev_pagemap() optimization which will not re-take a
+		 * reference on a pgmap if we already have one.
+		 */
+		if (hmm_vma_walk->pgmap)
+			put_dev_pagemap(hmm_vma_walk->pgmap);
+
 		if (ret) {
 			unsigned long i;
 




* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-14 13:27             ` Jason Gunthorpe
@ 2019-08-14 14:48               ` Dan Williams
  2019-08-15 18:03                 ` Jerome Glisse
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-14 14:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jérôme Glisse, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > Section alignment constraints somewhat save us here. The only example
> > > I can think of a PMD not containing a uniform pgmap association for
> > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > pgmaps arrange to manage their own exclusive sections (and now
> > > subsections as of v5.3). Otherwise the implementation could not
> > > guarantee different mapping lifetimes.
> > >
> > > That said, this seems to want a better mechanism to determine "pfn is
> > > ZONE_DEVICE".
> >
> > So I guess this patch is fine for now, and once you provide a better
> > mechanism we can switch over to it?
>
> What about the version I sent to just get rid of all the strange
> put_dev_pagemaps while scanning? Odds are good we will work with only
> a single pagemap, so it makes some sense to cache it once we find it?

Yes, if the scan is over a single pmd then caching it makes sense.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-14 14:48               ` Dan Williams
@ 2019-08-15 18:03                 ` Jerome Glisse
  2019-08-15 19:22                   ` Jason Gunthorpe
  2019-08-15 19:36                   ` Dan Williams
  0 siblings, 2 replies; 56+ messages in thread
From: Jerome Glisse @ 2019-08-15 18:03 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > Section alignment constraints somewhat save us here. The only example
> > > > I can think of a PMD not containing a uniform pgmap association for
> > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > subsections as of v5.3). Otherwise the implementation could not
> > > > guarantee different mapping lifetimes.
> > > >
> > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > ZONE_DEVICE".
> > >
> > > So I guess this patch is fine for now, and once you provide a better
> > > mechanism we can switch over to it?
> >
> > What about the version I sent to just get rid of all the strange
> > put_dev_pagemaps while scanning? Odds are good we will work with only
> > a single pagemap, so it makes some sense to cache it once we find it?
> 
> Yes, if the scan is over a single pmd then caching it makes sense.

Quite frankly an easier and better solution is to remove the pagemap
lookup entirely. Since HMM users abide by mmu notifiers, we will not
make use of or dereference the struct page, so we are safe from any
racing hotunplug of dax memory (as long as device drivers using hmm
do not have a bug).

Cheers,
Jérôme



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 18:03                 ` Jerome Glisse
@ 2019-08-15 19:22                   ` Jason Gunthorpe
  2019-08-15 19:36                   ` Dan Williams
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-15 19:22 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dan Williams, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 02:03:25PM -0400, Jerome Glisse wrote:
> On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > Section alignment constraints somewhat save us here. The only example
> > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > guarantee different mapping lifetimes.
> > > > >
> > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > ZONE_DEVICE".
> > > >
> > > > So I guess this patch is fine for now, and once you provide a better
> > > > mechanism we can switch over to it?
> > >
> > > What about the version I sent to just get rid of all the strange
> > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > a single pagemap, so it makes some sense to cache it once we find it?
> > 
> > Yes, if the scan is over a single pmd then caching it makes sense.
> 
> Quite frankly an easier an better solution is to remove the pagemap
> lookup as HMM user abide by mmu notifier it means we will not make
> use or dereference the struct page so that we are safe from any
> racing hotunplug of dax memory (as long as device driver using hmm
> do not have a bug).

Yes, I also would prefer to drop the confusing checks entirely -
Christoph can you resend this patch?

Thanks,
Jason 



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 18:03                 ` Jerome Glisse
  2019-08-15 19:22                   ` Jason Gunthorpe
@ 2019-08-15 19:36                   ` Dan Williams
  2019-08-15 19:43                     ` Jerome Glisse
  1 sibling, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-15 19:36 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jason Gunthorpe, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > Section alignment constraints somewhat save us here. The only example
> > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > guarantee different mapping lifetimes.
> > > > >
> > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > ZONE_DEVICE".
> > > >
> > > > So I guess this patch is fine for now, and once you provide a better
> > > > mechanism we can switch over to it?
> > >
> > > What about the version I sent to just get rid of all the strange
> > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > a single pagemap, so it makes some sense to cache it once we find it?
> >
> > Yes, if the scan is over a single pmd then caching it makes sense.
>
> Quite frankly an easier an better solution is to remove the pagemap
> lookup as HMM user abide by mmu notifier it means we will not make
> use or dereference the struct page so that we are safe from any
> racing hotunplug of dax memory (as long as device driver using hmm
> do not have a bug).

Yes, as long as the driver remove is synchronized against HMM
operations via another mechanism then there is no need to take pagemap
references. Can you briefly describe what that other mechanism is?



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 19:36                   ` Dan Williams
@ 2019-08-15 19:43                     ` Jerome Glisse
  2019-08-15 20:12                       ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Jerome Glisse @ 2019-08-15 19:43 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > Section alignment constraints somewhat save us here. The only example
> > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > guarantee different mapping lifetimes.
> > > > > >
> > > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > > ZONE_DEVICE".
> > > > >
> > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > mechanism we can switch over to it?
> > > >
> > > > What about the version I sent to just get rid of all the strange
> > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > a single pagemap, so it makes some sense to cache it once we find it?
> > >
> > > Yes, if the scan is over a single pmd then caching it makes sense.
> >
> > Quite frankly an easier an better solution is to remove the pagemap
> > lookup as HMM user abide by mmu notifier it means we will not make
> > use or dereference the struct page so that we are safe from any
> > racing hotunplug of dax memory (as long as device driver using hmm
> > do not have a bug).
> 
> Yes, as long as the driver remove is synchronized against HMM
> operations via another mechanism then there is no need to take pagemap
> references. Can you briefly describe what that other mechanism is?

So if you hotunplug some dax memory, I assume that this can only
happen once all the pages are unmapped (as they must have a zero
refcount, well 1 because of the bias), and any unmap will trigger
an mmu notifier callback. Users of hmm mirror abiding by the API
will never make use of information they get through the fault or
snapshot functions until checking for a racing notifier under lock.

Cheers,
Jérôme



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 19:43                     ` Jerome Glisse
@ 2019-08-15 20:12                       ` Dan Williams
  2019-08-15 20:33                         ` Jerome Glisse
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-15 20:12 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Jason Gunthorpe, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > > > >
> > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > Section alignment constraints somewhat save us here. The only example
> > > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > guarantee different mapping lifetimes.
> > > > > > >
> > > > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > > > ZONE_DEVICE".
> > > > > >
> > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > mechanism we can switch over to it?
> > > > >
> > > > > What about the version I sent to just get rid of all the strange
> > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > >
> > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > >
> > > Quite frankly an easier an better solution is to remove the pagemap
> > > lookup as HMM user abide by mmu notifier it means we will not make
> > > use or dereference the struct page so that we are safe from any
> > > racing hotunplug of dax memory (as long as device driver using hmm
> > > do not have a bug).
> >
> > Yes, as long as the driver remove is synchronized against HMM
> > operations via another mechanism then there is no need to take pagemap
> > references. Can you briefly describe what that other mechanism is?
>
> So if you hotunplug some dax memory i assume that this can only
> happens once all the pages are unmapped (as it must have the
> zero refcount, well 1 because of the bias) and any unmap will
> trigger a mmu notifier callback. User of hmm mirror abiding by
> the API will never make use of information they get through the
> fault or snapshot function until checking for racing notifier
> under lock.

Hmm, that first assumption is not guaranteed by the dev_pagemap core.
The dev_pagemap end-of-life model is "disable, invalidate, drain", so
it's possible to call devm_memunmap_pages() while pages are still
mapped; it just won't complete the teardown of the pagemap until the
last reference is dropped. New references are blocked during this
teardown.

However, if the driver is validating the liveness of the mapping in
the mmu-notifier path and blocking new references it sounds like it
should be ok. Might there be GPU driver unit tests that cover this
racing teardown case?
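The "disable, invalidate, drain" end-of-life sequence can be sketched
as a simplified userspace model. This is a stand-in for the percpu_ref
that guards the real pagemap (percpu_ref_tryget_live()/percpu_ref_put());
function and field names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the dev_pagemap end-of-life sequence:
 * "disable, invalidate, drain".  After the kill, new references
 * are refused, but teardown only completes once existing
 * references drain. */
struct pagemap_ref {
	int count;		/* outstanding references */
	bool dying;		/* set by the "disable" step */
	bool torn_down;		/* set once the last reference drains */
};

/* Models percpu_ref_tryget_live(): fails once teardown has begun. */
static bool ref_tryget_live(struct pagemap_ref *r)
{
	if (r->dying)
		return false;
	r->count++;
	return true;
}

static void ref_put(struct pagemap_ref *r)
{
	if (--r->count == 0 && r->dying)
		r->torn_down = true;	/* drain complete */
}

/* Models the unmap call: blocks new refs, then waits for drain. */
static void pagemap_kill(struct pagemap_ref *r)
{
	r->dying = true;
	if (r->count == 0)
		r->torn_down = true;
}

int teardown_model(void)
{
	struct pagemap_ref r = { 0, false, false };

	assert(ref_tryget_live(&r));	/* a mapping still holds a ref  */
	pagemap_kill(&r);		/* teardown starts while mapped */
	assert(!r.torn_down);		/* ...but cannot complete yet   */
	assert(!ref_tryget_live(&r));	/* new references are blocked   */
	ref_put(&r);			/* last reference drains        */
	return r.torn_down ? 1 : 0;
}
```

The key property for the discussion above: a driver that validates
liveness in its mmu-notifier path never takes a new reference after
the "disable" step, so it cannot outlive the pagemap.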



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:12                       ` Dan Williams
@ 2019-08-15 20:33                         ` Jerome Glisse
  2019-08-15 20:41                           ` Jason Gunthorpe
  2019-08-16  4:41                           ` Christoph Hellwig
  0 siblings, 2 replies; 56+ messages in thread
From: Jerome Glisse @ 2019-08-15 20:33 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 01:12:22PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse <jglisse@redhat.com> wrote:
> >
> > On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > > Section alignment constraints somewhat save us here. The only example
> > > > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > > guarantee different mapping lifetimes.
> > > > > > > >
> > > > > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > > > > ZONE_DEVICE".
> > > > > > >
> > > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > > mechanism we can switch over to it?
> > > > > >
> > > > > > What about the version I sent to just get rid of all the strange
> > > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > > >
> > > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > > >
> > > > Quite frankly an easier an better solution is to remove the pagemap
> > > > lookup as HMM user abide by mmu notifier it means we will not make
> > > > use or dereference the struct page so that we are safe from any
> > > > racing hotunplug of dax memory (as long as device driver using hmm
> > > > do not have a bug).
> > >
> > > Yes, as long as the driver remove is synchronized against HMM
> > > operations via another mechanism then there is no need to take pagemap
> > > references. Can you briefly describe what that other mechanism is?
> >
> > So if you hotunplug some dax memory i assume that this can only
> > happens once all the pages are unmapped (as it must have the
> > zero refcount, well 1 because of the bias) and any unmap will
> > trigger a mmu notifier callback. User of hmm mirror abiding by
> > the API will never make use of information they get through the
> > fault or snapshot function until checking for racing notifier
> > under lock.
> 
> Hmm that first assumption is not guaranteed by the dev_pagemap core.
> The dev_pagemap end of life model is "disable, invalidate, drain" so
> it's possible to call devm_munmap_pages() while pages are still mapped
> it just won't complete the teardown of the pagemap until the last
> reference is dropped. New references are blocked during this teardown.
> 
> However, if the driver is validating the liveness of the mapping in
> the mmu-notifier path and blocking new references it sounds like it
> should be ok. Might there be GPU driver unit tests that cover this
> racing teardown case?

So neither HMM nor the driver should dereference the struct page (I
do not think any iommu driver would either); they only care about the
pfn. So even if we race with a teardown, as soon as we get the mmu
notifier callback to invalidate the mapping we will do so. The
pattern is:

    mydriver_populate_vaddr_range(start, end) {
        hmm_range_register(range, start, end)
    again:
        ret = hmm_range_fault(start, end)
        if (ret < 0)
            return ret;

        take_driver_page_table_lock();
        if (range.valid) {
            populate_device_page_table();
            release_driver_page_table_lock();
        } else {
            release_driver_page_table_lock();
            goto again;
        }
    }

The mmu notifier callback does use the same page table lock, and we
also have the range tracking going on. So either we populate the
device page table before racing with teardown, in which case the
device page table entries are cleared through the mmu notifier
callback; or, if we race, we can see the racing mmu notifier calls
and retry again, which will trigger a regular page fault that will,
I assume, return an error.

So in the end we have the exact same behavior as if a CPU were trying
to access that virtual address. This is the whole point of HMM: to
behave exactly as if it were a CPU access. It fails in the same way
and races in the same way. So if DAX teardown is safe versus racing
CPU access to some vma that has that memory mapped, it will be the
same for HMM users.
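The retry semantics of the driver pattern above can be modeled as a
runnable sketch: a validity flag that the notifier clears and the
driver checks under the same lock before committing. Names are
illustrative, not the real nouveau/hmm API:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the collision-retry pattern above: the mmu
 * notifier marks the range invalid, and the driver only commits
 * its device page-table update while holding the shared lock and
 * seeing a still-valid range; otherwise it loops and faults again. */
struct range { bool valid; };

static bool device_pt_populated;

/* Models the mmu notifier invalidation callback (runs under the
 * driver page-table lock in the real pattern). */
static void notifier_invalidate(struct range *r)
{
	r->valid = false;
	device_pt_populated = false;	/* device entries are cleared */
}

/* Models hmm_range_fault() plus the locked commit.  'race_once'
 * injects one invalidation between the fault and the commit. */
static int populate(struct range *r, bool race_once)
{
	int retries = 0;

	for (;;) {
		r->valid = true;		/* fault/snapshot succeeds */
		if (race_once && retries == 0)
			notifier_invalidate(r);	/* teardown/unmap races in */
		/* take_driver_page_table_lock(); */
		if (r->valid) {
			device_pt_populated = true;
			return retries;		/* committed under the lock */
		}
		retries++;			/* goto again */
	}
}

int retry_model(void)
{
	struct range r = { false };
	int a = populate(&r, false);	/* no race: commits first try */
	int b = populate(&r, true);	/* race: exactly one retry */

	return (a == 0 && b == 1 && device_pt_populated) ? 1 : 0;
}
```

As with a racing CPU access, the stale snapshot is never committed;
the loser of the race simply faults again.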


GPU driver test suites are not good at testing this. They are geared
to test the GPU itself, not the interaction of the GPU driver with
the rest of the kernel.

Cheers,
Jérôme



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:33                         ` Jerome Glisse
@ 2019-08-15 20:41                           ` Jason Gunthorpe
  2019-08-15 20:47                             ` Dan Williams
  2019-08-15 20:51                             ` Jerome Glisse
  2019-08-16  4:41                           ` Christoph Hellwig
  1 sibling, 2 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-15 20:41 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dan Williams, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:

> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Er, they do technically deref the struct page:

nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
			 struct hmm_range *range)
		struct page *page;
		page = hmm_pfn_to_page(range, range->pfns[i]);
		if (!nouveau_dmem_page(drm, page)) {


nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
{
	return is_device_private_page(page) && drm->dmem == page_to_dmem(page)


Which does touch 'page->pgmap'

Is this OK without having a get_dev_pagemap() ?

Noting that the collision-retry scheme doesn't protect anything here
as we can have a concurrent invalidation while doing the above deref.

Jason



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:41                           ` Jason Gunthorpe
@ 2019-08-15 20:47                             ` Dan Williams
  2019-08-16  0:40                               ` Jason Gunthorpe
  2019-08-15 20:51                             ` Jerome Glisse
  1 sibling, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-15 20:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
>
> > So nor HMM nor driver should dereference the struct page (i do not
> > think any iommu driver would either),
>
> Er, they do technically deref the struct page:
>
> nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
>                          struct hmm_range *range)
>                 struct page *page;
>                 page = hmm_pfn_to_page(range, range->pfns[i]);
>                 if (!nouveau_dmem_page(drm, page)) {
>
>
> nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> {
>         return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
>
>
> Which does touch 'page->pgmap'
>
> Is this OK without having a get_dev_pagemap() ?
>
> Noting that the collision-retry scheme doesn't protect anything here
> as we can have a concurrent invalidation while doing the above deref.

As long as take_driver_page_table_lock() in Jerome's flow can replace
percpu_ref_tryget_live() on the pagemap reference, yes. It seems
nouveau_dmem_convert_pfn() happens after:

                        mutex_lock(&svmm->mutex);
                        if (!nouveau_range_done(&range)) {

...so I would expect that to be functionally equivalent to validating
the reference count.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:41                           ` Jason Gunthorpe
  2019-08-15 20:47                             ` Dan Williams
@ 2019-08-15 20:51                             ` Jerome Glisse
  2019-08-16  0:43                               ` Jason Gunthorpe
  1 sibling, 1 reply; 56+ messages in thread
From: Jerome Glisse @ 2019-08-15 20:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 08:41:33PM +0000, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> 
> > So nor HMM nor driver should dereference the struct page (i do not
> > think any iommu driver would either),
> 
> Er, they do technically deref the struct page:
> 
> nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> 			 struct hmm_range *range)
> 		struct page *page;
> 		page = hmm_pfn_to_page(range, range->pfns[i]);
> 		if (!nouveau_dmem_page(drm, page)) {
> 
> 
> nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> {
> 	return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
> 
> 
> Which does touch 'page->pgmap'
> 
> Is this OK without having a get_dev_pagemap() ?
>
> Noting that the collision-retry scheme doesn't protect anything here
> as we can have a concurrent invalidation while doing the above deref.

Uh? How so? We are not reading the same code, I think.

My read is that the function is called while holding the device
lock, which excludes any racing mmu notifier from making forward
progress, and it is also protected by the range, so at the time
this happens it is safe to dereference the struct page. In any
case we can update nouveau_dmem_page() to check that
page->pgmap == the expected pgmap.

Cheers,
Jérôme



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:47                             ` Dan Williams
@ 2019-08-16  0:40                               ` Jason Gunthorpe
  2019-08-16  3:54                                 ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-16  0:40 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 01:47:12PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> >
> > > So nor HMM nor driver should dereference the struct page (i do not
> > > think any iommu driver would either),
> >
> > Er, they do technically deref the struct page:
> >
> > nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> >                          struct hmm_range *range)
> >                 struct page *page;
> >                 page = hmm_pfn_to_page(range, range->pfns[i]);
> >                 if (!nouveau_dmem_page(drm, page)) {
> >
> >
> > nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> > {
> >         return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
> >
> >
> > Which does touch 'page->pgmap'
> >
> > Is this OK without having a get_dev_pagemap() ?
> >
> > Noting that the collision-retry scheme doesn't protect anything here
> > as we can have a concurrent invalidation while doing the above deref.
> 
> As long take_driver_page_table_lock() in Jerome's flow can replace
> percpu_ref_tryget_live() on the pagemap reference. It seems
> nouveau_dmem_convert_pfn() happens after:
>
>                         mutex_lock(&svmm->mutex);
>                         if (!nouveau_range_done(&range)) {
> 
> ...so I would expect that to be functionally equivalent to validating
> the reference count.

Yes, OK, that makes sense; I was mostly surprised by the statement that
the driver doesn't touch the struct page.

I suppose "doesn't touch the struct page outside the driver lock" is
the case.

However, this means we cannot do any processing of ZONE_DEVICE pages
outside the driver lock, so e.g. doing any DMA map that might rely on
MEMORY_DEVICE_PCI_P2PDMA has to be done under the driver lock, which is
a bit unfortunate.

Jason



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:51                             ` Jerome Glisse
@ 2019-08-16  0:43                               ` Jason Gunthorpe
  2019-08-16  4:44                                 ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-16  0:43 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dan Williams, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 04:51:33PM -0400, Jerome Glisse wrote:

> struct page. In this case any way we can update the
> nouveau_dmem_page() to check that page page->pgmap == the
> expected pgmap.

I was also wondering if that is a problem: just blindly doing a
container_of on the page->pgmap does seem to assume that only
this driver is using DEVICE_PRIVATE.

It seems like something is missing in hmm_range_fault: it should be told
which DEVICE_PRIVATE pages are acceptable to trigger HMM_PFN_DEVICE_PRIVATE,
and fault all others?

Jason 



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16  0:40                               ` Jason Gunthorpe
@ 2019-08-16  3:54                                 ` Dan Williams
  2019-08-16 12:24                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-16  3:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 5:41 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Aug 15, 2019 at 01:47:12PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> > >
> > > > So nor HMM nor driver should dereference the struct page (i do not
> > > > think any iommu driver would either),
> > >
> > > Er, they do technically deref the struct page:
> > >
> > > nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> > >                          struct hmm_range *range)
> > >                 struct page *page;
> > >                 page = hmm_pfn_to_page(range, range->pfns[i]);
> > >                 if (!nouveau_dmem_page(drm, page)) {
> > >
> > >
> > > nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> > > {
> > >         return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
> > >
> > >
> > > Which does touch 'page->pgmap'
> > >
> > > Is this OK without having a get_dev_pagemap() ?
> > >
> > > Noting that the collision-retry scheme doesn't protect anything here
> > > as we can have a concurrent invalidation while doing the above deref.
> >
> > As long take_driver_page_table_lock() in Jerome's flow can replace
> > percpu_ref_tryget_live() on the pagemap reference. It seems
> > nouveau_dmem_convert_pfn() happens after:
> >
> >                         mutex_lock(&svmm->mutex);
> >                         if (!nouveau_range_done(&range)) {
> >
> > ...so I would expect that to be functionally equivalent to validating
> > the reference count.
>
> Yes, OK, that makes sense, I was mostly surprised by the statement the
> driver doesn't touch the struct page..
>
> I suppose "doesn't touch the struct page out of the driver lock" is
> the case.
>
> However, this means we cannot do any processing of ZONE_DEVICE pages
> outside the driver lock, so eg, doing any DMA map that might rely on
> MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> a bit unfortunate.

Wouldn't P2PDMA use page pins? Not needing to hold a lock over
ZONE_DEVICE page operations was one of the motivations for plumbing
get_dev_pagemap() with a percpu-ref.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-15 20:33                         ` Jerome Glisse
  2019-08-15 20:41                           ` Jason Gunthorpe
@ 2019-08-16  4:41                           ` Christoph Hellwig
  1 sibling, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-16  4:41 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Dan Williams, Jason Gunthorpe, Christoph Hellwig, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Both current hmm drivers convert the hmm pfn back to a page and
eventually call dma_map_page on it.  As do the ODP patches from you.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16  0:43                               ` Jason Gunthorpe
@ 2019-08-16  4:44                                 ` Christoph Hellwig
  2019-08-16 12:30                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-16  4:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Dan Williams, Christoph Hellwig, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Fri, Aug 16, 2019 at 12:43:07AM +0000, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 04:51:33PM -0400, Jerome Glisse wrote:
> 
> > struct page. In this case any way we can update the
> > nouveau_dmem_page() to check that page page->pgmap == the
> > expected pgmap.
> 
> I was also wondering if that is a problem.. just blindly doing a
> container_of on the page->pgmap does seem like it assumes that only
> this driver is using DEVICE_PRIVATE.
> 
> It seems like something missing in hmm_range_fault, it should be told
> what DEVICE_PRIVATE is acceptable to trigger HMM_PFN_DEVICE_PRIVATE
> and fault all others?

The whole device private handling in hmm and migrate_vma seems pretty
broken as far as I can tell, and I have some WIP patches.  Basically we
should not touch (or possibly, eventually, migrate to ram) device
private pages not owned by the caller, where I try to define the
caller by the dev_pagemap_ops instance.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16  3:54                                 ` Dan Williams
@ 2019-08-16 12:24                                   ` Jason Gunthorpe
  2019-08-16 17:21                                     ` Dan Williams
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-16 12:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Thu, Aug 15, 2019 at 08:54:46PM -0700, Dan Williams wrote:

> > However, this means we cannot do any processing of ZONE_DEVICE pages
> > outside the driver lock, so eg, doing any DMA map that might rely on
> > MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> > a bit unfortunate.
> 
> Wouldn't P2PDMA use page pins? Not needing to hold a lock over
> ZONE_DEVICE page operations was one of the motivations for plumbing
> get_dev_pagemap() with a percpu-ref.

hmm_range_fault() doesn't use page pins at all, so if a ZONE_DEVICE
page comes out of it then it needs to use another locking pattern.

If I follow it all right:

We can do a get_dev_pagemap inside the page_walk and touch the pgmap,
or we can do the 'device mutex && retry' pattern and touch the pgmap
in the driver, under that lock.

However in all cases the current get_dev_pagemap()'s in the page walk
are not necessary, and we can delete them.

?

Jason



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16  4:44                                 ` Christoph Hellwig
@ 2019-08-16 12:30                                   ` Jason Gunthorpe
  2019-08-16 12:34                                     ` Christoph Hellwig
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-16 12:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jerome Glisse, Dan Williams, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Fri, Aug 16, 2019 at 06:44:48AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 16, 2019 at 12:43:07AM +0000, Jason Gunthorpe wrote:
> > On Thu, Aug 15, 2019 at 04:51:33PM -0400, Jerome Glisse wrote:
> > 
> > > struct page. In this case any way we can update the
> > > nouveau_dmem_page() to check that page page->pgmap == the
> > > expected pgmap.
> > 
> > I was also wondering if that is a problem.. just blindly doing a
> > container_of on the page->pgmap does seem like it assumes that only
> > this driver is using DEVICE_PRIVATE.
> > 
> > It seems like something missing in hmm_range_fault, it should be told
> > what DEVICE_PRIVATE is acceptable to trigger HMM_PFN_DEVICE_PRIVATE
> > and fault all others?
> 
> The whole device private handling in hmm and migrate_vma seems pretty
> broken as far as I can tell, and I have some WIP patches.  Basically we
> should not touch (or possibly eventually call migrate to ram eventually
> in the future) device private pages not owned by the caller, where I
> try to defined the caller by the dev_pagemap_ops instance.  

I think it needs to be more elaborate.

For instance, a system may have multiple DEVICE_PRIVATE maps owned by
the same driver - but multiple physical devices using that driver.

Each physical device's driver should only ever get DEVICE_PRIVATE
pages for its own on-device memory, never a DEVICE_PRIVATE for
another device's memory.

The dev_pagemap_ops would not be unique enough, right?

Probably clusters of same-driver struct devices can also share a
DEVICE_PRIVATE; at least high-end GPUs now have private memory
coherency busses between their devices.

Since we want to trigger migration to CPU on incompatible
DEVICE_PRIVATE pages, it seems best to sort this out in the
hmm_range_fault?

Maybe some sort of unique ID inside the page->pgmap and passed as
input?

Jason



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16 12:30                                   ` Jason Gunthorpe
@ 2019-08-16 12:34                                     ` Christoph Hellwig
  0 siblings, 0 replies; 56+ messages in thread
From: Christoph Hellwig @ 2019-08-16 12:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Jerome Glisse, Dan Williams, Ben Skeggs,
	Felix Kuehling, Ralph Campbell, linux-mm, nouveau, dri-devel,
	amd-gfx, linux-kernel

On Fri, Aug 16, 2019 at 12:30:41PM +0000, Jason Gunthorpe wrote:
> 
> For instance, a system may have multiple DEVICE_PRIVATE map's owned by
> the same driver - but multiple physical devices using that driver.
> 
> Each physical device's driver should only ever get DEVICE_PRIVATE
> pages for it's own on-device memory. Never a DEVICE_PRIVATE for
> another device's memory.
> 
> The dev_pagemap_ops would not be unique enough, right?

True.

> 
> Probably also clusters of same-driver struct device can share a
> DEVICE_PRIVATE, at least high end GPU's now have private memory
> coherency busses between their devices.
> 
> Since we want to trigger migration to CPU on incompatible
> DEVICE_PRIVATE pages, it seems best to sort this out in the
> hmm_range_fault?
> 
> Maybe some sort of unique ID inside the page->pgmap and passed as
> input?

Yes, we'll probably need an owner field.  And it's not just
hmm_range_fault: the migrate_vma_* routines are affected in the same
way.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16 12:24                                   ` Jason Gunthorpe
@ 2019-08-16 17:21                                     ` Dan Williams
  2019-08-16 17:28                                       ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Dan Williams @ 2019-08-16 17:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Fri, Aug 16, 2019 at 5:24 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Thu, Aug 15, 2019 at 08:54:46PM -0700, Dan Williams wrote:
>
> > > However, this means we cannot do any processing of ZONE_DEVICE pages
> > > outside the driver lock, so eg, doing any DMA map that might rely on
> > > MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> > > a bit unfortunate.
> >
> > Wouldn't P2PDMA use page pins? Not needing to hold a lock over
> > ZONE_DEVICE page operations was one of the motivations for plumbing
> > get_dev_pagemap() with a percpu-ref.
>
> hmm_range_fault() doesn't use page pins at all, so if a ZONE_DEVICE
> page comes out of it then it needs to use another locking pattern.
>
> If I follow it all right:
>
> We can do a get_dev_pagemap inside the page_walk and touch the pgmap,
> or we can do the 'device mutex && retry' pattern and touch the pgmap
> in the driver, under that lock.
>
> However in all cases the current get_dev_pagemap()'s in the page walk
> are not necessary, and we can delete them.

Yes, as long as 'struct page' instances resulting from that lookup are
not passed outside of that lock.



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16 17:21                                     ` Dan Williams
@ 2019-08-16 17:28                                       ` Jason Gunthorpe
  2019-08-16 21:10                                         ` Ralph Campbell
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2019-08-16 17:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	Ralph Campbell, linux-mm, nouveau, dri-devel, amd-gfx,
	linux-kernel

On Fri, Aug 16, 2019 at 10:21:41AM -0700, Dan Williams wrote:

> > We can do a get_dev_pagemap inside the page_walk and touch the pgmap,
> > or we can do the 'device mutex && retry' pattern and touch the pgmap
> > in the driver, under that lock.
> >
> > However in all cases the current get_dev_pagemap()'s in the page walk
> > are not necessary, and we can delete them.
> 
> Yes, as long as 'struct page' instances resulting from that lookup are
> not passed outside of that lock.

Indeed.

Also, I was reflecting over lunch that hmm_range_fault should only
return DEVICE_PRIVATE pages for the caller's device (see the other thread
with HCH), and in this case the caller should also be responsible for
ensuring that the driver is not calling hmm_range_fault at the same time
it is deleting its own DEVICE_PRIVATE mapping - i.e. by fencing its
page fault handler.

This does not apply to PCI_P2PDMA, but, lets see how that looks when
we get there.

So the whole thing seems pretty safe.

Jason



* Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
  2019-08-16 17:28                                       ` Jason Gunthorpe
@ 2019-08-16 21:10                                         ` Ralph Campbell
  0 siblings, 0 replies; 56+ messages in thread
From: Ralph Campbell @ 2019-08-16 21:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Dan Williams
  Cc: Jerome Glisse, Christoph Hellwig, Ben Skeggs, Felix Kuehling,
	linux-mm, nouveau, dri-devel, amd-gfx, linux-kernel


On 8/16/19 10:28 AM, Jason Gunthorpe wrote:
> On Fri, Aug 16, 2019 at 10:21:41AM -0700, Dan Williams wrote:
> 
>>> We can do a get_dev_pagemap inside the page_walk and touch the pgmap,
>>> or we can do the 'device mutex && retry' pattern and touch the pgmap
>>> in the driver, under that lock.
>>>
>>> However in all cases the current get_dev_pagemap()'s in the page walk
>>> are not necessary, and we can delete them.
>>
>> Yes, as long as 'struct page' instances resulting from that lookup are
>> not passed outside of that lock.
> 
> Indeed.
> 
> Also, I was reflecting over lunch that the hmm_range_fault should only
> return DEVICE_PRIVATE pages for the caller's device (see other thread
> with HCH), and in this case, the caller should also be responsible to
> ensure that the driver is not calling hmm_range_fault at the same time
> it is deleting it's own DEVICE_PRIVATE mapping - ie by fencing its
> page fault handler.

Yes, that would make it a one-step process to access another
device's migrated memory pages.
Right now it has to be a two-step process: the caller calls
hmm_range_fault, checks the struct page to see if it is device
private and not owned, then calls hmm_range_fault again with
range->pfns[i] |= range->flags[HMM_PFN_DEVICE_PRIVATE] to cause
the other device to migrate the page back to system memory.



end of thread

Thread overview: 56+ messages
2019-08-06 16:05 hmm cleanups, v2 Christoph Hellwig
2019-08-06 16:05 ` [PATCH 01/15] amdgpu: remove -EAGAIN handling for hmm_range_fault Christoph Hellwig
2019-08-06 16:05 ` [PATCH 02/15] amdgpu: don't initialize range->list in amdgpu_hmm_init_range Christoph Hellwig
2019-08-06 16:05 ` [PATCH 03/15] nouveau: pass struct nouveau_svmm to nouveau_range_fault Christoph Hellwig
2019-08-06 18:02   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk Christoph Hellwig
2019-08-07 17:45   ` Jason Gunthorpe
2019-08-07 18:47     ` Dan Williams
2019-08-08  6:59       ` Christoph Hellwig
2019-08-14  1:36         ` Dan Williams
2019-08-14  7:38           ` Christoph Hellwig
2019-08-14 13:27             ` Jason Gunthorpe
2019-08-14 14:48               ` Dan Williams
2019-08-15 18:03                 ` Jerome Glisse
2019-08-15 19:22                   ` Jason Gunthorpe
2019-08-15 19:36                   ` Dan Williams
2019-08-15 19:43                     ` Jerome Glisse
2019-08-15 20:12                       ` Dan Williams
2019-08-15 20:33                         ` Jerome Glisse
2019-08-15 20:41                           ` Jason Gunthorpe
2019-08-15 20:47                             ` Dan Williams
2019-08-16  0:40                               ` Jason Gunthorpe
2019-08-16  3:54                                 ` Dan Williams
2019-08-16 12:24                                   ` Jason Gunthorpe
2019-08-16 17:21                                     ` Dan Williams
2019-08-16 17:28                                       ` Jason Gunthorpe
2019-08-16 21:10                                         ` Ralph Campbell
2019-08-15 20:51                             ` Jerome Glisse
2019-08-16  0:43                               ` Jason Gunthorpe
2019-08-16  4:44                                 ` Christoph Hellwig
2019-08-16 12:30                                   ` Jason Gunthorpe
2019-08-16 12:34                                     ` Christoph Hellwig
2019-08-16  4:41                           ` Christoph Hellwig
2019-08-06 16:05 ` [PATCH 05/15] mm: remove the unused vma argument to hmm_range_dma_unmap Christoph Hellwig
2019-08-06 16:05 ` [PATCH 06/15] mm: remove superflous arguments from hmm_range_register Christoph Hellwig
2019-08-06 16:05 ` [PATCH 07/15] mm: remove the page_shift member from struct hmm_range Christoph Hellwig
2019-08-07 17:51   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 08/15] mm: remove the mask variable in hmm_vma_walk_hugetlb_entry Christoph Hellwig
2019-08-06 18:02   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 09/15] mm: don't abuse pte_index() in hmm_vma_handle_pmd Christoph Hellwig
2019-08-07 17:18   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 10/15] mm: only define hmm_vma_walk_pud if needed Christoph Hellwig
2019-08-06 16:05 ` [PATCH 11/15] mm: cleanup the hmm_vma_handle_pmd stub Christoph Hellwig
2019-08-06 18:00   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 12/15] mm: cleanup the hmm_vma_walk_hugetlb_entry stub Christoph Hellwig
2019-08-06 16:05 ` [PATCH 13/15] mm: allow HMM_MIRROR on all architectures with MMU Christoph Hellwig
2019-08-06 16:05 ` [PATCH 14/15] mm: make HMM_MIRROR an implicit option Christoph Hellwig
2019-08-06 17:44   ` Jason Gunthorpe
2019-08-06 16:05 ` [PATCH 15/15] amdgpu: remove CONFIG_DRM_AMDGPU_USERPTR Christoph Hellwig
2019-08-06 17:44   ` Jason Gunthorpe
2019-08-06 17:51     ` Kuehling, Felix
2019-08-06 18:58       ` Alex Deucher
2019-08-06 20:03         ` Jason Gunthorpe
2019-08-07  6:57           ` Koenig, Christian
2019-08-07 11:46             ` Jason Gunthorpe
2019-08-07 18:17 ` hmm cleanups, v2 Jason Gunthorpe
