linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping
@ 2020-07-01 22:53 Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 1/5] nouveau/hmm: fault one page at a time Ralph Campbell
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

The goal for this series is to introduce the hmm_pfn_to_map_order()
function. This allows a device driver to know that a given 4K PFN is
actually mapped by the CPU using a larger sized CPU page table entry and
therefore the device driver can safely map system memory using larger
device MMU PTEs.
The series is based on 5.8.0-rc3 and is intended for Jason Gunthorpe's
hmm tree. These were originally part of a larger series:
https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell@nvidia.com/

Changes in v3:
Replaced the HMM_PFN_P[MU]D flags with hmm_pfn_to_map_order() to
indicate the size of the CPU mapping.

Changes in v2:
Make the hmm_range_fault() API changes into a separate series and add
  two output flags for PMD/PUD instead of a single compund page flag as
  suggested by Jason Gunthorpe.
Make the nouveau page table changes a separate patch as suggested by
  Ben Skeggs.
Only add support for 2MB nouveau mappings initially since changing the
1:1 CPU/GPU page table size assumptions requires a bigger set of changes.
Rebase to 5.8.0-rc3.

Ralph Campbell (5):
  nouveau/hmm: fault one page at a time
  mm/hmm: add hmm_mapping order
  nouveau: fix mapping 2MB sysmem pages
  nouveau/hmm: support mapping large sysmem pages
  hmm: add tests for HMM_PFN_PMD flag

 drivers/gpu/drm/nouveau/nouveau_svm.c         | 236 ++++++++----------
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |   5 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |  82 ++++++
 include/linux/hmm.h                           |  24 +-
 lib/test_hmm.c                                |   4 +
 lib/test_hmm_uapi.h                           |   4 +
 mm/hmm.c                                      |  14 +-
 tools/testing/selftests/vm/hmm-tests.c        |  76 ++++++
 8 files changed, 299 insertions(+), 146 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/5] nouveau/hmm: fault one page at a time
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
@ 2020-07-01 22:53 ` Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 2/5] mm/hmm: add hmm_mapping order Ralph Campbell
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

The SVM page fault handler groups faults into a range of contiguous
virtual addresses and requests hmm_range_fault() to populate and
return the page frame number of system memory mapped by the CPU.
In preparation for supporting large pages to be mapped by the GPU,
process faults one page at a time. In addition, use the hmm_range
default_flags to fix a corner case where the input hmm_pfns array
is not reinitialized after hmm_range_fault() returns -EBUSY and must
be called again.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 199 +++++++++-----------------
 1 file changed, 66 insertions(+), 133 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index ba9f9359c30e..665dede69bd1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -516,7 +516,7 @@ static const struct mmu_interval_notifier_ops nouveau_svm_mni_ops = {
 static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
 				    struct hmm_range *range, u64 *ioctl_addr)
 {
-	unsigned long i, npages;
+	struct page *page;
 
 	/*
 	 * The ioctl_addr prepared here is passed through nvif_object_ioctl()
@@ -525,42 +525,38 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
 	 * This is all just encoding the internal hmm representation into a
 	 * different nouveau internal representation.
 	 */
-	npages = (range->end - range->start) >> PAGE_SHIFT;
-	for (i = 0; i < npages; ++i) {
-		struct page *page;
-
-		if (!(range->hmm_pfns[i] & HMM_PFN_VALID)) {
-			ioctl_addr[i] = 0;
-			continue;
-		}
-
-		page = hmm_pfn_to_page(range->hmm_pfns[i]);
-		if (is_device_private_page(page))
-			ioctl_addr[i] = nouveau_dmem_page_addr(page) |
-					NVIF_VMM_PFNMAP_V0_V |
-					NVIF_VMM_PFNMAP_V0_VRAM;
-		else
-			ioctl_addr[i] = page_to_phys(page) |
-					NVIF_VMM_PFNMAP_V0_V |
-					NVIF_VMM_PFNMAP_V0_HOST;
-		if (range->hmm_pfns[i] & HMM_PFN_WRITE)
-			ioctl_addr[i] |= NVIF_VMM_PFNMAP_V0_W;
+	if (!(range->hmm_pfns[0] & HMM_PFN_VALID)) {
+		ioctl_addr[0] = 0;
+		return;
 	}
+
+	page = hmm_pfn_to_page(range->hmm_pfns[0]);
+	if (is_device_private_page(page))
+		ioctl_addr[0] = nouveau_dmem_page_addr(page) |
+				NVIF_VMM_PFNMAP_V0_V |
+				NVIF_VMM_PFNMAP_V0_VRAM;
+	else
+		ioctl_addr[0] = page_to_phys(page) |
+				NVIF_VMM_PFNMAP_V0_V |
+				NVIF_VMM_PFNMAP_V0_HOST;
+	if (range->hmm_pfns[0] & HMM_PFN_WRITE)
+		ioctl_addr[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
 static int nouveau_range_fault(struct nouveau_svmm *svmm,
 			       struct nouveau_drm *drm, void *data, u32 size,
-			       unsigned long hmm_pfns[], u64 *ioctl_addr,
+			       u64 *ioctl_addr, unsigned long hmm_flags,
 			       struct svm_notifier *notifier)
 {
 	unsigned long timeout =
 		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
 	/* Have HMM fault pages within the fault window to the GPU. */
+	unsigned long hmm_pfns[1];
 	struct hmm_range range = {
 		.notifier = &notifier->notifier,
 		.start = notifier->notifier.interval_tree.start,
 		.end = notifier->notifier.interval_tree.last + 1,
-		.pfn_flags_mask = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
+		.default_flags = hmm_flags,
 		.hmm_pfns = hmm_pfns,
 	};
 	struct mm_struct *mm = notifier->notifier.mm;
@@ -575,11 +571,6 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
 		ret = hmm_range_fault(&range);
 		mmap_read_unlock(mm);
 		if (ret) {
-			/*
-			 * FIXME: the input PFN_REQ flags are destroyed on
-			 * -EBUSY, we need to regenerate them, also for the
-			 * other continue below
-			 */
 			if (ret == -EBUSY)
 				continue;
 			return ret;
@@ -614,17 +605,12 @@ nouveau_svm_fault(struct nvif_notify *notify)
 	struct nvif_object *device = &svm->drm->client.device.object;
 	struct nouveau_svmm *svmm;
 	struct {
-		struct {
-			struct nvif_ioctl_v0 i;
-			struct nvif_ioctl_mthd_v0 m;
-			struct nvif_vmm_pfnmap_v0 p;
-		} i;
-		u64 phys[16];
+		struct nouveau_pfnmap_args i;
+		u64 phys[1];
 	} args;
-	unsigned long hmm_pfns[ARRAY_SIZE(args.phys)];
-	struct vm_area_struct *vma;
+	unsigned long hmm_flags;
 	u64 inst, start, limit;
-	int fi, fn, pi, fill;
+	int fi, fn;
 	int replay = 0, ret;
 
 	/* Parse available fault buffer entries into a cache, and update
@@ -691,66 +677,53 @@ nouveau_svm_fault(struct nvif_notify *notify)
 		 * window into a single update.
 		 */
 		start = buffer->fault[fi]->addr;
-		limit = start + (ARRAY_SIZE(args.phys) << PAGE_SHIFT);
+		limit = start + PAGE_SIZE;
 		if (start < svmm->unmanaged.limit)
 			limit = min_t(u64, limit, svmm->unmanaged.start);
-		SVMM_DBG(svmm, "wndw %016llx-%016llx", start, limit);
 
-		mm = svmm->notifier.mm;
-		if (!mmget_not_zero(mm)) {
-			nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]);
-			continue;
-		}
-
-		/* Intersect fault window with the CPU VMA, cancelling
-		 * the fault if the address is invalid.
+		/*
+		 * Prepare the GPU-side update of all pages within the
+		 * fault window, determining required pages and access
+		 * permissions based on pending faults.
 		 */
-		mmap_read_lock(mm);
-		vma = find_vma_intersection(mm, start, limit);
-		if (!vma) {
-			SVMM_ERR(svmm, "wndw %016llx-%016llx", start, limit);
-			mmap_read_unlock(mm);
-			mmput(mm);
-			nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]);
-			continue;
+		args.i.p.addr = start;
+		args.i.p.page = PAGE_SHIFT;
+		args.i.p.size = PAGE_SIZE;
+		/*
+		 * Determine required permissions based on GPU fault
+		 * access flags.
+		 * XXX: atomic?
+		 */
+		switch (buffer->fault[fi]->access) {
+		case 0: /* READ. */
+			hmm_flags = HMM_PFN_REQ_FAULT;
+			break;
+		case 3: /* PREFETCH. */
+			hmm_flags = 0;
+			break;
+		default:
+			hmm_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
+			break;
 		}
-		start = max_t(u64, start, vma->vm_start);
-		limit = min_t(u64, limit, vma->vm_end);
-		mmap_read_unlock(mm);
-		SVMM_DBG(svmm, "wndw %016llx-%016llx", start, limit);
 
-		if (buffer->fault[fi]->addr != start) {
-			SVMM_ERR(svmm, "addr %016llx", buffer->fault[fi]->addr);
-			mmput(mm);
+		mm = svmm->notifier.mm;
+		if (!mmget_not_zero(mm)) {
 			nouveau_svm_fault_cancel_fault(svm, buffer->fault[fi]);
 			continue;
 		}
 
-		/* Prepare the GPU-side update of all pages within the
-		 * fault window, determining required pages and access
-		 * permissions based on pending faults.
-		 */
-		args.i.p.page = PAGE_SHIFT;
-		args.i.p.addr = start;
-		for (fn = fi, pi = 0;;) {
-			/* Determine required permissions based on GPU fault
-			 * access flags.
-			 *XXX: atomic?
-			 */
-			switch (buffer->fault[fn]->access) {
-			case 0: /* READ. */
-				hmm_pfns[pi++] = HMM_PFN_REQ_FAULT;
-				break;
-			case 3: /* PREFETCH. */
-				hmm_pfns[pi++] = 0;
-				break;
-			default:
-				hmm_pfns[pi++] = HMM_PFN_REQ_FAULT |
-						 HMM_PFN_REQ_WRITE;
-				break;
-			}
-			args.i.p.size = pi << PAGE_SHIFT;
+		notifier.svmm = svmm;
+		ret = mmu_interval_notifier_insert(&notifier.notifier, mm,
+						   args.i.p.addr, args.i.p.size,
+						   &nouveau_svm_mni_ops);
+		if (!ret) {
+			ret = nouveau_range_fault(svmm, svm->drm, &args,
+				sizeof(args), args.phys, hmm_flags, &notifier);
+			mmu_interval_notifier_remove(&notifier.notifier);
+		}
+		mmput(mm);
 
+		for (fn = fi; ++fn < buffer->fault_nr; ) {
 			/* It's okay to skip over duplicate addresses from the
 			 * same SVMM as faults are ordered by access type such
 			 * that only the first one needs to be handled.
@@ -758,61 +731,21 @@ nouveau_svm_fault(struct nvif_notify *notify)
 			 * ie. WRITE faults appear first, thus any handling of
 			 * pending READ faults will already be satisfied.
 			 */
-			while (++fn < buffer->fault_nr &&
-			       buffer->fault[fn]->svmm == svmm &&
-			       buffer->fault[fn    ]->addr ==
-			       buffer->fault[fn - 1]->addr);
-
-			/* If the next fault is outside the window, or all GPU
-			 * faults have been dealt with, we're done here.
-			 */
-			if (fn >= buffer->fault_nr ||
-			    buffer->fault[fn]->svmm != svmm ||
+			if (buffer->fault[fn]->svmm != svmm ||
 			    buffer->fault[fn]->addr >= limit)
 				break;
-
-			/* Fill in the gap between this fault and the next. */
-			fill = (buffer->fault[fn    ]->addr -
-				buffer->fault[fn - 1]->addr) >> PAGE_SHIFT;
-			while (--fill)
-				hmm_pfns[pi++] = 0;
 		}
 
-		SVMM_DBG(svmm, "wndw %016llx-%016llx covering %d fault(s)",
-			 args.i.p.addr,
-			 args.i.p.addr + args.i.p.size, fn - fi);
-
-		notifier.svmm = svmm;
-		ret = mmu_interval_notifier_insert(&notifier.notifier,
-						   svmm->notifier.mm,
-						   args.i.p.addr, args.i.p.size,
-						   &nouveau_svm_mni_ops);
-		if (!ret) {
-			ret = nouveau_range_fault(
-				svmm, svm->drm, &args,
-				sizeof(args.i) + pi * sizeof(args.phys[0]),
-				hmm_pfns, args.phys, &notifier);
-			mmu_interval_notifier_remove(&notifier.notifier);
-		}
-		mmput(mm);
+		/* If handling failed completely, cancel all faults. */
+		if (ret) {
+			while (fi < fn) {
+				struct nouveau_svm_fault *fault =
+					buffer->fault[fi++];
 
-		/* Cancel any faults in the window whose pages didn't manage
-		 * to keep their valid bit, or stay writeable when required.
-		 *
-		 * If handling failed completely, cancel all faults.
-		 */
-		while (fi < fn) {
-			struct nouveau_svm_fault *fault = buffer->fault[fi++];
-			pi = (fault->addr - args.i.p.addr) >> PAGE_SHIFT;
-			if (ret ||
-			     !(args.phys[pi] & NVIF_VMM_PFNMAP_V0_V) ||
-			    (!(args.phys[pi] & NVIF_VMM_PFNMAP_V0_W) &&
-			     fault->access != 0 && fault->access != 3)) {
 				nouveau_svm_fault_cancel_fault(svm, fault);
-				continue;
 			}
+		} else
 			replay++;
-		}
 	}
 
 	/* Issue fault replay to the GPU. */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/5] mm/hmm: add hmm_mapping order
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 1/5] nouveau/hmm: fault one page at a time Ralph Campbell
@ 2020-07-01 22:53 ` Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages Ralph Campbell
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

hmm_range_fault() returns an array of page frame numbers and flags for
how the pages are mapped in the requested process' page tables. The PFN
can be used to get the struct page with hmm_pfn_to_page() and the page
size order can be determined with compound_order(page) but if the page
is larger than order 0 (PAGE_SIZE), there is no indication that a
compound page is mapped by the CPU using a larger page size. Without
this information, the caller can't safely use a large device PTE to map
the compound page because the CPU might be using smaller PTEs with
different read/write permissions.

Add a new function hmm_pfn_to_map_order() to return the mapping size
order so that callers know the pages are being mapped with consistent
permissions and a large device page table mapping can be used if one is
available.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 include/linux/hmm.h | 24 ++++++++++++++++++++++--
 mm/hmm.c            | 14 +++++++++++---
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index f4a09ed223ac..e7a21a21f11f 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -37,16 +37,17 @@
  *                     will fail. Must be combined with HMM_PFN_REQ_FAULT.
  */
 enum hmm_pfn_flags {
-	/* Output flags */
+	/* Output fields and flags */
 	HMM_PFN_VALID = 1UL << (BITS_PER_LONG - 1),
 	HMM_PFN_WRITE = 1UL << (BITS_PER_LONG - 2),
 	HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3),
+	HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8),
 
 	/* Input flags */
 	HMM_PFN_REQ_FAULT = HMM_PFN_VALID,
 	HMM_PFN_REQ_WRITE = HMM_PFN_WRITE,
 
-	HMM_PFN_FLAGS = HMM_PFN_VALID | HMM_PFN_WRITE | HMM_PFN_ERROR,
+	HMM_PFN_FLAGS = 0xFFUL << HMM_PFN_ORDER_SHIFT,
 };
 
 /*
@@ -61,6 +62,25 @@ static inline struct page *hmm_pfn_to_page(unsigned long hmm_pfn)
 	return pfn_to_page(hmm_pfn & ~HMM_PFN_FLAGS);
 }
 
+/*
+ * hmm_pfn_to_map_order() - return the CPU mapping size order
+ *
+ * The hmm_pfn entry returned by hmm_range_fault() is for a PAGE_SIZE
+ * address range. hmm_pfn_to_map_order() lets the caller know that the
+ * underlying physical page order is at least as large as the return value and
+ * that the CPU has mapped that physical range with the same permissions so
+ * that a device MMU mapping of up to the size of the return value can be
+ * used without giving the device more access than the CPU process.
+ *
+ * This must be called under the caller 'user_lock' after a successful
+ * mmu_interval_read_begin(). The caller must have tested for HMM_PFN_VALID
+ * already.
+ */
+static inline unsigned int hmm_pfn_to_map_order(unsigned long hmm_pfn)
+{
+	return (hmm_pfn >> HMM_PFN_ORDER_SHIFT) & 0x1F;
+}
+
 /*
  * struct hmm_range - track invalidation lock on virtual address range
  *
diff --git a/mm/hmm.c b/mm/hmm.c
index e9a545751108..de04bbed47b3 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -170,7 +170,10 @@ static inline unsigned long pmd_to_hmm_pfn_flags(struct hmm_range *range,
 {
 	if (pmd_protnone(pmd))
 		return 0;
-	return pmd_write(pmd) ? (HMM_PFN_VALID | HMM_PFN_WRITE) : HMM_PFN_VALID;
+	return ((unsigned long)(PMD_SHIFT - PAGE_SHIFT) <<
+			HMM_PFN_ORDER_SHIFT) |
+		pmd_write(pmd) ? (HMM_PFN_VALID | HMM_PFN_WRITE) :
+				 HMM_PFN_VALID;
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -389,7 +392,10 @@ static inline unsigned long pud_to_hmm_pfn_flags(struct hmm_range *range,
 {
 	if (!pud_present(pud))
 		return 0;
-	return pud_write(pud) ? (HMM_PFN_VALID | HMM_PFN_WRITE) : HMM_PFN_VALID;
+	return ((unsigned long)(PUD_SHIFT - PAGE_SHIFT) <<
+			HMM_PFN_ORDER_SHIFT) |
+		pud_write(pud) ? (HMM_PFN_VALID | HMM_PFN_WRITE) :
+				 HMM_PFN_VALID;
 }
 
 static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
@@ -468,13 +474,15 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	unsigned long cpu_flags;
 	spinlock_t *ptl;
 	pte_t entry;
+	unsigned long horder = huge_page_order(hstate_vma(vma));
 
 	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
 	entry = huge_ptep_get(pte);
 
 	i = (start - range->start) >> PAGE_SHIFT;
 	pfn_req_flags = range->hmm_pfns[i];
-	cpu_flags = pte_to_hmm_pfn_flags(range, entry);
+	cpu_flags = pte_to_hmm_pfn_flags(range, entry) |
+			(horder << HMM_PFN_ORDER_SHIFT);
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 1/5] nouveau/hmm: fault one page at a time Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 2/5] mm/hmm: add hmm_mapping order Ralph Campbell
@ 2020-07-01 22:53 ` Ralph Campbell
  2020-07-08  3:19   ` [Nouveau] " Ben Skeggs
  2020-07-01 22:53 ` [PATCH v3 4/5] nouveau/hmm: support mapping large " Ralph Campbell
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

The nvif_object_ioctl() method NVIF_VMM_V0_PFNMAP wasn't correctly
setting the hardware specific GPU page table entries for 2MB sized
pages. Fix this by adding functions to set and clear PD0 GPU page
table entries.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  5 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    | 82 +++++++++++++++++++
 2 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
index 199f94e15c5f..19a6804e3989 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
@@ -1204,7 +1204,6 @@ nvkm_vmm_pfn_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size)
 /*TODO:
  * - Avoid PT readback (for dma_unmap etc), this might end up being dealt
  *   with inside HMM, which would be a lot nicer for us to deal with.
- * - Multiple page sizes (particularly for huge page support).
  * - Support for systems without a 4KiB page size.
  */
 int
@@ -1220,8 +1219,8 @@ nvkm_vmm_pfn_map(struct nvkm_vmm *vmm, u8 shift, u64 addr, u64 size, u64 *pfn)
 	/* Only support mapping where the page size of the incoming page
 	 * array matches a page size available for direct mapping.
 	 */
-	while (page->shift && page->shift != shift &&
-	       page->desc->func->pfn == NULL)
+	while (page->shift && (page->shift != shift ||
+	       page->desc->func->pfn == NULL))
 		page++;
 
 	if (!page->shift || !IS_ALIGNED(addr, 1ULL << shift) ||
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index d86287565542..ed37fddd063f 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -258,12 +258,94 @@ gp100_vmm_pd0_unmap(struct nvkm_vmm *vmm,
 	VMM_FO128(pt, vmm, pdei * 0x10, 0ULL, 0ULL, pdes);
 }
 
+static void
+gp100_vmm_pd0_pfn_unmap(struct nvkm_vmm *vmm,
+			struct nvkm_mmu_pt *pt, u32 ptei, u32 ptes)
+{
+	struct device *dev = vmm->mmu->subdev.device->dev;
+	dma_addr_t addr;
+
+	nvkm_kmap(pt->memory);
+	while (ptes--) {
+		u32 datalo = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 0);
+		u32 datahi = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 4);
+		u64 data   = (u64)datahi << 32 | datalo;
+
+		if ((data & (3ULL << 1)) != 0) {
+			addr = (data >> 8) << 12;
+			dma_unmap_page(dev, addr, 1UL << 21, DMA_BIDIRECTIONAL);
+		}
+		ptei++;
+	}
+	nvkm_done(pt->memory);
+}
+
+static bool
+gp100_vmm_pd0_pfn_clear(struct nvkm_vmm *vmm,
+			struct nvkm_mmu_pt *pt, u32 ptei, u32 ptes)
+{
+	bool dma = false;
+
+	nvkm_kmap(pt->memory);
+	while (ptes--) {
+		u32 datalo = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 0);
+		u32 datahi = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 4);
+		u64 data   = (u64)datahi << 32 | datalo;
+
+		if ((data & BIT_ULL(0)) && (data & (3ULL << 1)) != 0) {
+			VMM_WO064(pt, vmm, ptei * 16, data & ~BIT_ULL(0));
+			dma = true;
+		}
+		ptei++;
+	}
+	nvkm_done(pt->memory);
+	return dma;
+}
+
+static void
+gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
+		  u32 ptei, u32 ptes, struct nvkm_vmm_map *map)
+{
+	struct device *dev = vmm->mmu->subdev.device->dev;
+	dma_addr_t addr;
+
+	nvkm_kmap(pt->memory);
+	while (ptes--) {
+		u64 data = 0;
+
+		if (!(*map->pfn & NVKM_VMM_PFN_W))
+			data |= BIT_ULL(6); /* RO. */
+
+		if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) {
+			addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT;
+			addr = dma_map_page(dev, pfn_to_page(addr), 0,
+					    1UL << 21, DMA_BIDIRECTIONAL);
+			if (!WARN_ON(dma_mapping_error(dev, addr))) {
+				data |= addr >> 4;
+				data |= 2ULL << 1; /* SYSTEM_COHERENT_MEMORY. */
+				data |= BIT_ULL(3); /* VOL. */
+				data |= BIT_ULL(0); /* VALID. */
+			}
+		} else {
+			data |= (*map->pfn & NVKM_VMM_PFN_ADDR) >> 4;
+			data |= BIT_ULL(0); /* VALID. */
+		}
+
+		VMM_WO064(pt, vmm, ptei++ * 16, data);
+		map->pfn++;
+	}
+	nvkm_done(pt->memory);
+}
+
 static const struct nvkm_vmm_desc_func
 gp100_vmm_desc_pd0 = {
 	.unmap = gp100_vmm_pd0_unmap,
 	.sparse = gp100_vmm_pd0_sparse,
 	.pde = gp100_vmm_pd0_pde,
 	.mem = gp100_vmm_pd0_mem,
+	.pfn = gp100_vmm_pd0_pfn,
+	.pfn_clear = gp100_vmm_pd0_pfn_clear,
+	.pfn_unmap = gp100_vmm_pd0_pfn_unmap,
 };
 
 static void
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 4/5] nouveau/hmm: support mapping large sysmem pages
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
                   ` (2 preceding siblings ...)
  2020-07-01 22:53 ` [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages Ralph Campbell
@ 2020-07-01 22:53 ` Ralph Campbell
  2020-07-01 22:53 ` [PATCH v3 5/5] hmm: add tests for hmm_pfn_to_map_order() Ralph Campbell
  2020-07-10 19:27 ` [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Jason Gunthorpe
  5 siblings, 0 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

Nouveau currently only supports mapping PAGE_SIZE sized pages of system
memory when shared virtual memory (SVM) is enabled. Use the new
hmm_pfn_to_map_order() function to support mapping system memory pages
that are PMD_SIZE.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 53 ++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 665dede69bd1..c5f8ca6fb2e3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -514,38 +514,57 @@ static const struct mmu_interval_notifier_ops nouveau_svm_mni_ops = {
 };
 
 static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
-				    struct hmm_range *range, u64 *ioctl_addr)
+				    struct hmm_range *range,
+				    struct nouveau_pfnmap_args *args)
 {
 	struct page *page;
 
 	/*
-	 * The ioctl_addr prepared here is passed through nvif_object_ioctl()
+	 * The address prepared here is passed through nvif_object_ioctl()
 	 * to an eventual DMA map in something like gp100_vmm_pgt_pfn()
 	 *
 	 * This is all just encoding the internal hmm representation into a
 	 * different nouveau internal representation.
 	 */
 	if (!(range->hmm_pfns[0] & HMM_PFN_VALID)) {
-		ioctl_addr[0] = 0;
+		args->p.phys[0] = 0;
 		return;
 	}
 
 	page = hmm_pfn_to_page(range->hmm_pfns[0]);
+	/*
+	 * Only map compound pages to the GPU if the CPU is also mapping the
+	 * page as a compound page. Otherwise, the PTE protections might not be
+	 * consistent (e.g., CPU only maps part of a compound page).
+	 * Note that the underlying page might still be larger than the
+	 * CPU mapping (e.g., a PUD sized compound page partially mapped with
+	 * a PMD sized page table entry).
+	 */
+	if (hmm_pfn_to_map_order(range->hmm_pfns[0])) {
+		unsigned long addr = args->p.addr;
+
+		args->p.page = hmm_pfn_to_map_order(range->hmm_pfns[0]) +
+				PAGE_SHIFT;
+		args->p.size = 1UL << args->p.page;
+		args->p.addr &= ~(args->p.size - 1);
+		page -= (addr - args->p.addr) >> PAGE_SHIFT;
+	}
 	if (is_device_private_page(page))
-		ioctl_addr[0] = nouveau_dmem_page_addr(page) |
+		args->p.phys[0] = nouveau_dmem_page_addr(page) |
 				NVIF_VMM_PFNMAP_V0_V |
 				NVIF_VMM_PFNMAP_V0_VRAM;
 	else
-		ioctl_addr[0] = page_to_phys(page) |
+		args->p.phys[0] = page_to_phys(page) |
 				NVIF_VMM_PFNMAP_V0_V |
 				NVIF_VMM_PFNMAP_V0_HOST;
 	if (range->hmm_pfns[0] & HMM_PFN_WRITE)
-		ioctl_addr[0] |= NVIF_VMM_PFNMAP_V0_W;
+		args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
 static int nouveau_range_fault(struct nouveau_svmm *svmm,
-			       struct nouveau_drm *drm, void *data, u32 size,
-			       u64 *ioctl_addr, unsigned long hmm_flags,
+			       struct nouveau_drm *drm,
+			       struct nouveau_pfnmap_args *args, u32 size,
+			       unsigned long hmm_flags,
 			       struct svm_notifier *notifier)
 {
 	unsigned long timeout =
@@ -585,10 +604,10 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
 		break;
 	}
 
-	nouveau_hmm_convert_pfn(drm, &range, ioctl_addr);
+	nouveau_hmm_convert_pfn(drm, &range, args);
 
 	svmm->vmm->vmm.object.client->super = true;
-	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, data, size, NULL);
+	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
 	svmm->vmm->vmm.object.client->super = false;
 	mutex_unlock(&svmm->mutex);
 
@@ -717,12 +736,13 @@ nouveau_svm_fault(struct nvif_notify *notify)
 						   args.i.p.addr, args.i.p.size,
 						   &nouveau_svm_mni_ops);
 		if (!ret) {
-			ret = nouveau_range_fault(svmm, svm->drm, &args,
-				sizeof(args), args.phys, hmm_flags, &notifier);
+			ret = nouveau_range_fault(svmm, svm->drm, &args.i,
+				sizeof(args), hmm_flags, &notifier);
 			mmu_interval_notifier_remove(&notifier.notifier);
 		}
 		mmput(mm);
 
+		limit = args.i.p.addr + args.i.p.size;
 		for (fn = fi; ++fn < buffer->fault_nr; ) {
 			/* It's okay to skip over duplicate addresses from the
 			 * same SVMM as faults are ordered by access type such
@@ -730,9 +750,16 @@ nouveau_svm_fault(struct nvif_notify *notify)
 			 *
 			 * ie. WRITE faults appear first, thus any handling of
 			 * pending READ faults will already be satisfied.
+			 * But if a large page is mapped, make sure subsequent
+			 * fault addresses have sufficient access permission.
 			 */
 			if (buffer->fault[fn]->svmm != svmm ||
-			    buffer->fault[fn]->addr >= limit)
+			    buffer->fault[fn]->addr >= limit ||
+			    (buffer->fault[fi]->access == 0 /* READ. */ &&
+			     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_V)) ||
+			    (buffer->fault[fi]->access != 0 /* READ. */ &&
+			     buffer->fault[fi]->access != 3 /* PREFETCH. */ &&
+			     !(args.phys[0] & NVIF_VMM_PFNMAP_V0_W)))
 				break;
 		}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 5/5] hmm: add tests for hmm_pfn_to_map_order()
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
                   ` (3 preceding siblings ...)
  2020-07-01 22:53 ` [PATCH v3 4/5] nouveau/hmm: support mapping large " Ralph Campbell
@ 2020-07-01 22:53 ` Ralph Campbell
  2020-07-10 19:27 ` [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Jason Gunthorpe
  5 siblings, 0 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-01 22:53 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Ralph Campbell

Add a sanity test for hmm_range_fault() returning the page mapping size
order.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 lib/test_hmm.c                         |  4 ++
 lib/test_hmm_uapi.h                    |  4 ++
 tools/testing/selftests/vm/hmm-tests.c | 76 ++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index a2a82262b97b..9aa577afc269 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -766,6 +766,10 @@ static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *range,
 		*perm |= HMM_DMIRROR_PROT_WRITE;
 	else
 		*perm |= HMM_DMIRROR_PROT_READ;
+	if (hmm_pfn_to_map_order(entry) + PAGE_SHIFT == PMD_SHIFT)
+		*perm |= HMM_DMIRROR_PROT_PMD;
+	else if (hmm_pfn_to_map_order(entry) + PAGE_SHIFT == PUD_SHIFT)
+		*perm |= HMM_DMIRROR_PROT_PUD;
 }
 
 static bool dmirror_snapshot_invalidate(struct mmu_interval_notifier *mni,
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 67b3b2e6ff5d..670b4ef2a5b6 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -40,6 +40,8 @@ struct hmm_dmirror_cmd {
  * HMM_DMIRROR_PROT_NONE: unpopulated PTE or PTE with no access
  * HMM_DMIRROR_PROT_READ: read-only PTE
  * HMM_DMIRROR_PROT_WRITE: read/write PTE
+ * HMM_DMIRROR_PROT_PMD: PMD sized page is fully mapped by same permissions
+ * HMM_DMIRROR_PROT_PUD: PUD sized page is fully mapped by same permissions
  * HMM_DMIRROR_PROT_ZERO: special read-only zero page
  * HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL: Migrated device private page on the
  *					device the ioctl() is made
@@ -51,6 +53,8 @@ enum {
 	HMM_DMIRROR_PROT_NONE			= 0x00,
 	HMM_DMIRROR_PROT_READ			= 0x01,
 	HMM_DMIRROR_PROT_WRITE			= 0x02,
+	HMM_DMIRROR_PROT_PMD			= 0x04,
+	HMM_DMIRROR_PROT_PUD			= 0x08,
 	HMM_DMIRROR_PROT_ZERO			= 0x10,
 	HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL	= 0x20,
 	HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE	= 0x30,
diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 79db22604019..b533dd08da1d 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -1291,6 +1291,82 @@ TEST_F(hmm2, snapshot)
 	hmm_buffer_free(buffer);
 }
 
+/*
+ * Test the hmm_range_fault() HMM_PFN_PMD flag for large pages that
+ * should be mapped by a large page table entry.
+ */
+TEST_F(hmm, compound)
+{
+	struct hmm_buffer *buffer;
+	unsigned long npages;
+	unsigned long size;
+	int *ptr;
+	unsigned char *m;
+	int ret;
+	long pagesizes[4];
+	int n, idx;
+	unsigned long i;
+
+	/* Skip test if we can't allocate a hugetlbfs page. */
+
+	n = gethugepagesizes(pagesizes, 4);
+	if (n <= 0)
+		return;
+	for (idx = 0; --n > 0; ) {
+		if (pagesizes[n] < pagesizes[idx])
+			idx = n;
+	}
+	size = ALIGN(TWOMEG, pagesizes[idx]);
+	npages = size >> self->page_shift;
+
+	buffer = malloc(sizeof(*buffer));
+	ASSERT_NE(buffer, NULL);
+
+	buffer->ptr = get_hugepage_region(size, GHR_STRICT);
+	if (buffer->ptr == NULL) {
+		free(buffer);
+		return;
+	}
+
+	buffer->size = size;
+	buffer->mirror = malloc(npages);
+	ASSERT_NE(buffer->mirror, NULL);
+
+	/* Initialize the pages the device will snapshot in buffer->ptr. */
+	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	/* Simulate a device snapshotting CPU pagetables. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device saw. */
+	m = buffer->mirror;
+	for (i = 0; i < npages; ++i)
+		ASSERT_EQ(m[i], HMM_DMIRROR_PROT_WRITE |
+				HMM_DMIRROR_PROT_PMD);
+
+	/* Make the region read-only. */
+	ret = mprotect(buffer->ptr, size, PROT_READ);
+	ASSERT_EQ(ret, 0);
+
+	/* Simulate a device snapshotting CPU pagetables. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device saw. */
+	m = buffer->mirror;
+	for (i = 0; i < npages; ++i)
+		ASSERT_EQ(m[i], HMM_DMIRROR_PROT_READ |
+				HMM_DMIRROR_PROT_PMD);
+
+	free_hugepage_region(buffer->ptr);
+	buffer->ptr = NULL;
+	hmm_buffer_free(buffer);
+}
+
 /*
  * Test two devices reading the same memory (double mapped).
  */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Nouveau] [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages
  2020-07-01 22:53 ` [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages Ralph Campbell
@ 2020-07-08  3:19   ` Ben Skeggs
  0 siblings, 0 replies; 9+ messages in thread
From: Ben Skeggs @ 2020-07-08  3:19 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, ML nouveau, linux-kselftest, LKML,
	Jason Gunthorpe, Ben Skeggs, Andrew Morton, Shuah Khan,
	Christoph Hellwig

On Thu, 2 Jul 2020 at 08:54, Ralph Campbell <rcampbell@nvidia.com> wrote:
>
> The nvif_object_ioctl() method NVIF_VMM_V0_PFNMAP wasn't correctly
> setting the hardware specific GPU page table entries for 2MB sized
> pages. Fix this by adding functions to set and clear PD0 GPU page
> table entries.
I can take this one in my tree now, it's fairly independent of the rest.

Ben.

>
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  5 +-
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    | 82 +++++++++++++++++++
>  2 files changed, 84 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> index 199f94e15c5f..19a6804e3989 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c
> @@ -1204,7 +1204,6 @@ nvkm_vmm_pfn_unmap(struct nvkm_vmm *vmm, u64 addr, u64 size)
>  /*TODO:
>   * - Avoid PT readback (for dma_unmap etc), this might end up being dealt
>   *   with inside HMM, which would be a lot nicer for us to deal with.
> - * - Multiple page sizes (particularly for huge page support).
>   * - Support for systems without a 4KiB page size.
>   */
>  int
> @@ -1220,8 +1219,8 @@ nvkm_vmm_pfn_map(struct nvkm_vmm *vmm, u8 shift, u64 addr, u64 size, u64 *pfn)
>         /* Only support mapping where the page size of the incoming page
>          * array matches a page size available for direct mapping.
>          */
> -       while (page->shift && page->shift != shift &&
> -              page->desc->func->pfn == NULL)
> +       while (page->shift && (page->shift != shift ||
> +              page->desc->func->pfn == NULL))
>                 page++;
>
>         if (!page->shift || !IS_ALIGNED(addr, 1ULL << shift) ||
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
> index d86287565542..ed37fddd063f 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
> @@ -258,12 +258,94 @@ gp100_vmm_pd0_unmap(struct nvkm_vmm *vmm,
>         VMM_FO128(pt, vmm, pdei * 0x10, 0ULL, 0ULL, pdes);
>  }
>
> +static void
> +gp100_vmm_pd0_pfn_unmap(struct nvkm_vmm *vmm,
> +                       struct nvkm_mmu_pt *pt, u32 ptei, u32 ptes)
> +{
> +       struct device *dev = vmm->mmu->subdev.device->dev;
> +       dma_addr_t addr;
> +
> +       nvkm_kmap(pt->memory);
> +       while (ptes--) {
> +               u32 datalo = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 0);
> +               u32 datahi = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 4);
> +               u64 data   = (u64)datahi << 32 | datalo;
> +
> +               if ((data & (3ULL << 1)) != 0) {
> +                       addr = (data >> 8) << 12;
> +                       dma_unmap_page(dev, addr, 1UL << 21, DMA_BIDIRECTIONAL);
> +               }
> +               ptei++;
> +       }
> +       nvkm_done(pt->memory);
> +}
> +
> +static bool
> +gp100_vmm_pd0_pfn_clear(struct nvkm_vmm *vmm,
> +                       struct nvkm_mmu_pt *pt, u32 ptei, u32 ptes)
> +{
> +       bool dma = false;
> +
> +       nvkm_kmap(pt->memory);
> +       while (ptes--) {
> +               u32 datalo = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 0);
> +               u32 datahi = nvkm_ro32(pt->memory, pt->base + ptei * 16 + 4);
> +               u64 data   = (u64)datahi << 32 | datalo;
> +
> +               if ((data & BIT_ULL(0)) && (data & (3ULL << 1)) != 0) {
> +                       VMM_WO064(pt, vmm, ptei * 16, data & ~BIT_ULL(0));
> +                       dma = true;
> +               }
> +               ptei++;
> +       }
> +       nvkm_done(pt->memory);
> +       return dma;
> +}
> +
> +static void
> +gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
> +                 u32 ptei, u32 ptes, struct nvkm_vmm_map *map)
> +{
> +       struct device *dev = vmm->mmu->subdev.device->dev;
> +       dma_addr_t addr;
> +
> +       nvkm_kmap(pt->memory);
> +       while (ptes--) {
> +               u64 data = 0;
> +
> +               if (!(*map->pfn & NVKM_VMM_PFN_W))
> +                       data |= BIT_ULL(6); /* RO. */
> +
> +               if (!(*map->pfn & NVKM_VMM_PFN_VRAM)) {
> +                       addr = *map->pfn >> NVKM_VMM_PFN_ADDR_SHIFT;
> +                       addr = dma_map_page(dev, pfn_to_page(addr), 0,
> +                                           1UL << 21, DMA_BIDIRECTIONAL);
> +                       if (!WARN_ON(dma_mapping_error(dev, addr))) {
> +                               data |= addr >> 4;
> +                               data |= 2ULL << 1; /* SYSTEM_COHERENT_MEMORY. */
> +                               data |= BIT_ULL(3); /* VOL. */
> +                               data |= BIT_ULL(0); /* VALID. */
> +                       }
> +               } else {
> +                       data |= (*map->pfn & NVKM_VMM_PFN_ADDR) >> 4;
> +                       data |= BIT_ULL(0); /* VALID. */
> +               }
> +
> +               VMM_WO064(pt, vmm, ptei++ * 16, data);
> +               map->pfn++;
> +       }
> +       nvkm_done(pt->memory);
> +}
> +
>  static const struct nvkm_vmm_desc_func
>  gp100_vmm_desc_pd0 = {
>         .unmap = gp100_vmm_pd0_unmap,
>         .sparse = gp100_vmm_pd0_sparse,
>         .pde = gp100_vmm_pd0_pde,
>         .mem = gp100_vmm_pd0_mem,
> +       .pfn = gp100_vmm_pd0_pfn,
> +       .pfn_clear = gp100_vmm_pd0_pfn_clear,
> +       .pfn_unmap = gp100_vmm_pd0_pfn_unmap,
>  };
>
>  static void
> --
> 2.20.1
>
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping
  2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
                   ` (4 preceding siblings ...)
  2020-07-01 22:53 ` [PATCH v3 5/5] hmm: add tests for hmm_pfn_to_map_order() Ralph Campbell
@ 2020-07-10 19:27 ` Jason Gunthorpe
  2020-07-10 20:13   ` Ralph Campbell
  5 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2020-07-10 19:27 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel,
	Jerome Glisse, John Hubbard, Christoph Hellwig, Andrew Morton,
	Shuah Khan, Ben Skeggs

On Wed, Jul 01, 2020 at 03:53:47PM -0700, Ralph Campbell wrote:
> The goal for this series is to introduce the hmm_pfn_to_map_order()
> function. This allows a device driver to know that a given 4K PFN is
> actually mapped by the CPU using a larger sized CPU page table entry and
> therefore the device driver can safely map system memory using larger
> device MMU PTEs.
> The series is based on 5.8.0-rc3 and is intended for Jason Gunthorpe's
> hmm tree. These were originally part of a larger series:
> https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell@nvidia.com/
> 
> Changes in v3:
> Replaced the HMM_PFN_P[MU]D flags with hmm_pfn_to_map_order() to
> indicate the size of the CPU mapping.
> 
> Changes in v2:
> Make the hmm_range_fault() API changes into a separate series and add
>   two output flags for PMD/PUD instead of a single compund page flag as
>   suggested by Jason Gunthorpe.
> Make the nouveau page table changes a separate patch as suggested by
>   Ben Skeggs.
> Only add support for 2MB nouveau mappings initially since changing the
> 1:1 CPU/GPU page table size assumptions requires a bigger set of changes.
> Rebase to 5.8.0-rc3.
> 
> Ralph Campbell (5):
>   nouveau/hmm: fault one page at a time
>   mm/hmm: add hmm_mapping order
>   nouveau: fix mapping 2MB sysmem pages
>   nouveau/hmm: support mapping large sysmem pages
>   hmm: add tests for HMM_PFN_PMD flag

Applied to hmm.git. 

I edited the comment for hmm_pfn_to_map_order() and added a function
to compute the field.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping
  2020-07-10 19:27 ` [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Jason Gunthorpe
@ 2020-07-10 20:13   ` Ralph Campbell
  0 siblings, 0 replies; 9+ messages in thread
From: Ralph Campbell @ 2020-07-10 20:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, linux-kselftest, linux-kernel,
	Jerome Glisse, John Hubbard, Christoph Hellwig, Andrew Morton,
	Shuah Khan, Ben Skeggs


On 7/10/20 12:27 PM, Jason Gunthorpe wrote:
> On Wed, Jul 01, 2020 at 03:53:47PM -0700, Ralph Campbell wrote:
>> The goal for this series is to introduce the hmm_pfn_to_map_order()
>> function. This allows a device driver to know that a given 4K PFN is
>> actually mapped by the CPU using a larger sized CPU page table entry and
>> therefore the device driver can safely map system memory using larger
>> device MMU PTEs.
>> The series is based on 5.8.0-rc3 and is intended for Jason Gunthorpe's
>> hmm tree. These were originally part of a larger series:
>> https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell@nvidia.com/
>>
>> Changes in v3:
>> Replaced the HMM_PFN_P[MU]D flags with hmm_pfn_to_map_order() to
>> indicate the size of the CPU mapping.
>>
>> Changes in v2:
>> Make the hmm_range_fault() API changes into a separate series and add
>>    two output flags for PMD/PUD instead of a single compund page flag as
>>    suggested by Jason Gunthorpe.
>> Make the nouveau page table changes a separate patch as suggested by
>>    Ben Skeggs.
>> Only add support for 2MB nouveau mappings initially since changing the
>> 1:1 CPU/GPU page table size assumptions requires a bigger set of changes.
>> Rebase to 5.8.0-rc3.
>>
>> Ralph Campbell (5):
>>    nouveau/hmm: fault one page at a time
>>    mm/hmm: add hmm_mapping order
>>    nouveau: fix mapping 2MB sysmem pages
>>    nouveau/hmm: support mapping large sysmem pages
>>    hmm: add tests for HMM_PFN_PMD flag
> 
> Applied to hmm.git.
> 
> I edited the comment for hmm_pfn_to_map_order() and added a function
> to compute the field.
> 
> Thanks,
> Jason

Looks good, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-07-10 20:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-01 22:53 [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Ralph Campbell
2020-07-01 22:53 ` [PATCH v3 1/5] nouveau/hmm: fault one page at a time Ralph Campbell
2020-07-01 22:53 ` [PATCH v3 2/5] mm/hmm: add hmm_mapping order Ralph Campbell
2020-07-01 22:53 ` [PATCH v3 3/5] nouveau: fix mapping 2MB sysmem pages Ralph Campbell
2020-07-08  3:19   ` [Nouveau] " Ben Skeggs
2020-07-01 22:53 ` [PATCH v3 4/5] nouveau/hmm: support mapping large " Ralph Campbell
2020-07-01 22:53 ` [PATCH v3 5/5] hmm: add tests for hmm_pfn_to_map_order() Ralph Campbell
2020-07-10 19:27 ` [PATCH v3 0/5] mm/hmm/nouveau: add PMD system memory mapping Jason Gunthorpe
2020-07-10 20:13   ` Ralph Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).