Linux-mm Archive on lore.kernel.org
* [PATCH v4 0/6] mm/migrate: avoid device private invalidations
@ 2020-07-23 22:29 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:29 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

The goal of this series is to avoid device private memory TLB
invalidations when migrating a range of addresses from system
memory to device private memory and some of those pages have already
been migrated. The approach taken is to introduce a new mmu notifier
invalidation event type and use it in the device driver to skip
invalidation callbacks from migrate_vma_setup(). The device driver is
then expected to handle device MMU invalidations itself as part of the
migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize()
sequence, as sketched below. Note that this is opt-in: a device driver
can simply invalidate its MMU in the mmu notifier callback and not
handle MMU invalidations in the migration sequence.
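
In driver terms, the opt-in sequence looks roughly like the sketch below
(error handling and the device copy are elided; dev_owner, src_pfns,
dst_pfns and npages are placeholder names, not part of the API):

	struct migrate_vma args = {
		.vma		= vma,
		.start		= start,
		.end		= start + (npages << PAGE_SHIFT),
		.src		= src_pfns,
		.dst		= dst_pfns,
		.pgmap_owner	= dev_owner,	/* matches pgmap->owner */
		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
	};

	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;
	/*
	 * The driver's notifier callback ignored the MMU_NOTIFY_MIGRATE
	 * event fired by migrate_vma_setup(), so allocate device pages,
	 * copy the data, and invalidate the device MMU for the migrating
	 * pages here.
	 */
	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);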

This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).

Also, this replaces the need for the following two patches I sent:
("mm: fix migrate_vma_setup() src_owner and normal pages")
https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
("nouveau: fix mixed normal and device private page migration")
https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com

Changes in v4:
Added reviewed-by from Bharata B Rao.
Removed dead code checking for a source device private page in
  lib/test_hmm.c dmirror_migrate_alloc_and_copy() since the source filter
  flag already guarantees that.
Added patch 6 to remove a redundant invalidation in migrate_vma_pages().

Changes in v3:
Changed the direction field "dir" to a "flags" field and renamed
  src_owner to pgmap_owner.
Fixed a locking issue in nouveau for the migration invalidation.
Added a HMM selftest test case to exercise the HMM test driver
  invalidation changes.
Removed the reviewed-by from Bharata B Rao since this version is
  moderately changed.

Changes in v2:
Rebase to Jason Gunthorpe's HMM tree.
Added reviewed-by from Bharata B Rao.
Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
  suggested by Jason Gunthorpe.

Ralph Campbell (6):
  nouveau: fix storing invalid ptes
  mm/migrate: add a flags parameter to migrate_vma
  mm/notifier: add migration invalidation type
  nouveau/svm: use the new migration invalidation
  mm/hmm/test: use the new migration invalidation
  mm/migrate: remove range invalidation in migrate_vma_pages()

 arch/powerpc/kvm/book3s_hv_uvmem.c            |  4 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c        | 19 ++++++--
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 21 ++++-----
 drivers/gpu/drm/nouveau/nouveau_svm.h         | 13 +++++-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    | 13 ++++--
 include/linux/migrate.h                       | 16 +++++--
 include/linux/mmu_notifier.h                  |  7 +++
 lib/test_hmm.c                                | 43 +++++++++----------
 mm/migrate.c                                  | 34 +++++----------
 tools/testing/selftests/vm/hmm-tests.c        | 18 ++++++--
 10 files changed, 112 insertions(+), 76 deletions(-)

-- 
2.20.1




* [PATCH v4 1/6] nouveau: fix storing invalid ptes
@ 2020-07-23 22:29 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:29 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

When migrating a range of system memory to device private memory, some
of the pages in the address range may not be migrating. In this case,
the non-migrating pages have no new GPU MMU entry to store, but the
nvif_object_ioctl() NVIF_VMM_V0_PFNMAP method doesn't check its input
and stores a bad (but valid-flagged) GPU page table entry.
Fix this by skipping the invalid input PTEs when updating the GPU page
tables.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index ed37fddd063f..7eabe9fe0d2b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -79,8 +79,12 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	dma_addr_t addr;
 
 	nvkm_kmap(pt->memory);
-	while (ptes--) {
+	for (; ptes; ptes--, map->pfn++) {
 		u64 data = 0;
+
+		if (!(*map->pfn & NVKM_VMM_PFN_V))
+			continue;
+
 		if (!(*map->pfn & NVKM_VMM_PFN_W))
 			data |= BIT_ULL(6); /* RO. */
 
@@ -100,7 +104,6 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 		}
 
 		VMM_WO064(pt, vmm, ptei++ * 8, data);
-		map->pfn++;
 	}
 	nvkm_done(pt->memory);
 }
@@ -310,9 +313,12 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	dma_addr_t addr;
 
 	nvkm_kmap(pt->memory);
-	while (ptes--) {
+	for (; ptes; ptes--, map->pfn++) {
 		u64 data = 0;
 
+		if (!(*map->pfn & NVKM_VMM_PFN_V))
+			continue;
+
 		if (!(*map->pfn & NVKM_VMM_PFN_W))
 			data |= BIT_ULL(6); /* RO. */
 
@@ -332,7 +338,6 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 		}
 
 		VMM_WO064(pt, vmm, ptei++ * 16, data);
-		map->pfn++;
 	}
 	nvkm_done(pt->memory);
 }
-- 
2.20.1




* [PATCH v4 2/6] mm/migrate: add a flags parameter to migrate_vma
@ 2020-07-23 22:30 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:30 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

The src_owner field in struct migrate_vma is being used for two purposes:
it acts as a selection filter for which types of pages are to be migrated,
and it identifies device private pages owned by the caller. Split this
into separate parameters so the src_owner field can be used just to
identify device private pages owned by the caller of migrate_vma_setup().
Rename the src_owner field to pgmap_owner to reflect that it is now used
only to identify which device private pages to migrate.
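
As a sketch of the new usage (mydev is a placeholder for the value the
driver stores in pgmap->owner), a caller migrating its own device
private pages back to system memory now sets:

	args.pgmap_owner = mydev;	/* was: args.src_owner = mydev; */
	args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;

while a caller migrating normal system memory to the device sets:

	args.flags = MIGRATE_VMA_SELECT_SYSTEM;

as the hunks below show for kvmppc, nouveau and the HMM test driver.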

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c     |  4 +++-
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  4 +++-
 include/linux/migrate.h                | 13 +++++++++----
 lib/test_hmm.c                         | 15 ++++-----------
 mm/migrate.c                           |  6 ++++--
 5 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 09d8119024db..6850bd04bcb9 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -400,6 +400,7 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	mig.end = end;
 	mig.src = &src_pfn;
 	mig.dst = &dst_pfn;
+	mig.flags = MIGRATE_VMA_SELECT_SYSTEM;
 
 	/*
 	 * We come here with mmap_lock write lock held just for
@@ -577,7 +578,8 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	mig.end = end;
 	mig.src = &src_pfn;
 	mig.dst = &dst_pfn;
-	mig.src_owner = &kvmppc_uvmem_pgmap;
+	mig.pgmap_owner = &kvmppc_uvmem_pgmap;
+	mig.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 
 	mutex_lock(&kvm->arch.uvmem_lock);
 	/* The requested page is already paged-out, nothing to do */
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index e5c230d9ae24..78b9e3c2a5b3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -182,7 +182,8 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 		.end		= vmf->address + PAGE_SIZE,
 		.src		= &src,
 		.dst		= &dst,
-		.src_owner	= drm->dev,
+		.pgmap_owner	= drm->dev,
+		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
 	};
 
 	/*
@@ -615,6 +616,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 	struct migrate_vma args = {
 		.vma		= vma,
 		.start		= start,
+		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
 	};
 	unsigned long i;
 	u64 *pfns;
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cbf03dd..aafec0ca7b41 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
 	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
 }
 
+enum migrate_vma_direction {
+	MIGRATE_VMA_SELECT_SYSTEM         = (1UL << 0),
+	MIGRATE_VMA_SELECT_DEVICE_PRIVATE = (1UL << 1),
+};
+
 struct migrate_vma {
 	struct vm_area_struct	*vma;
 	/*
@@ -199,11 +204,11 @@ struct migrate_vma {
 
 	/*
 	 * Set to the owner value also stored in page->pgmap->owner for
-	 * migrating out of device private memory.  If set only device
-	 * private pages with this owner are migrated.  If not set
-	 * device private pages are not migrated at all.
+	 * migrating out of device private memory. The flags also need to
+	 * be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE.
 	 */
-	void			*src_owner;
+	void			*pgmap_owner;
+	unsigned long		flags;
 };
 
 int migrate_vma_setup(struct migrate_vma *args);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 9aa577afc269..e78a1414f58e 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -585,15 +585,6 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
 		 */
 		spage = migrate_pfn_to_page(*src);
 
-		/*
-		 * Don't migrate device private pages from our own driver or
-		 * others. For our own we would do a device private memory copy
-		 * not a migration and for others, we would need to fault the
-		 * other device's page into system memory first.
-		 */
-		if (spage && is_zone_device_page(spage))
-			continue;
-
 		dpage = dmirror_devmem_alloc_page(mdevice);
 		if (!dpage)
 			continue;
@@ -702,7 +693,8 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		args.dst = dst_pfns;
 		args.start = addr;
 		args.end = next;
-		args.src_owner = NULL;
+		args.pgmap_owner = NULL;
+		args.flags = MIGRATE_VMA_SELECT_SYSTEM;
 		ret = migrate_vma_setup(&args);
 		if (ret)
 			goto out;
@@ -1053,7 +1045,8 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	args.end = args.start + PAGE_SIZE;
 	args.src = &src_pfns;
 	args.dst = &dst_pfns;
-	args.src_owner = dmirror->mdevice;
+	args.pgmap_owner = dmirror->mdevice;
+	args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 
 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
diff --git a/mm/migrate.c b/mm/migrate.c
index f37729673558..e3ea68e3a08b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2287,7 +2287,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				goto next;
 
 			page = device_private_entry_to_page(entry);
-			if (page->pgmap->owner != migrate->src_owner)
+			if (!(migrate->flags &
+				MIGRATE_VMA_SELECT_DEVICE_PRIVATE) ||
+			    page->pgmap->owner != migrate->pgmap_owner)
 				goto next;
 
 			mpfn = migrate_pfn(page_to_pfn(page)) |
@@ -2295,7 +2297,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (is_write_device_private_entry(entry))
 				mpfn |= MIGRATE_PFN_WRITE;
 		} else {
-			if (migrate->src_owner)
+			if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
 				goto next;
 			pfn = pte_pfn(pte);
 			if (is_zero_pfn(pfn)) {
-- 
2.20.1




* [PATCH v4 3/6] mm/notifier: add migration invalidation type
@ 2020-07-23 22:30 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:30 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start(),
which flushes all device private page mappings whether or not a page
is being migrated to/from device private memory. In order to not disrupt
device mappings that are not being migrated, shift the responsibility
for clearing device private mappings to the device driver and leave
CPU page table unmapping handled by migrate_vma_setup(). To support
this, the caller of migrate_vma_setup() should always set struct
migrate_vma::pgmap_owner to a non-NULL value that matches the device
private page->pgmap->owner. This value is then passed in the struct
mmu_notifier_range with a new event type, which the driver's invalidation
callback can use to avoid device MMU invalidations.
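
On the driver side, the check against the new event type looks like the
sketch below; this mirrors the interval notifier callback patch 5 adds
to the HMM test driver, and nouveau's range_start callback (patch 4)
does the equivalent check. my_owner is a placeholder for the driver's
pgmap owner, and the driver's locking is elided:

	static bool my_invalidate(struct mmu_interval_notifier *mni,
				  const struct mmu_notifier_range *range,
				  unsigned long cur_seq)
	{
		/*
		 * Ignore invalidation callbacks for device private pages
		 * owned by this driver since the invalidation is handled
		 * as part of the migration process.
		 */
		if (range->event == MMU_NOTIFY_MIGRATE &&
		    range->migrate_pgmap_owner == my_owner)
			return true;

		mmu_interval_set_seq(mni, cur_seq);
		/* invalidate the device MMU for range->start .. range->end */
		return true;
	}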

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 include/linux/migrate.h      | 3 +++
 include/linux/mmu_notifier.h | 7 +++++++
 mm/migrate.c                 | 8 +++++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index aafec0ca7b41..4cc7097d0271 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -206,6 +206,9 @@ struct migrate_vma {
 	 * Set to the owner value also stored in page->pgmap->owner for
 	 * migrating out of device private memory. The flags also need to
 	 * be set to MIGRATE_VMA_SELECT_DEVICE_PRIVATE.
+	 * The caller should always set this field when using mmu notifier
+	 * callbacks to avoid device MMU invalidations for device private
+	 * pages that are not being migrated.
 	 */
 	void			*pgmap_owner;
 	unsigned long		flags;
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index fc68f3570e19..1921fcf6be5b 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -38,6 +38,10 @@ struct mmu_interval_notifier;
  *
  * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
  * that the mm refcount is zero and the range is no longer accessible.
+ *
+ * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal
+ * a device driver to possibly ignore the invalidation if the
+ * migrate_pgmap_owner field matches the driver's device private pgmap owner.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
@@ -46,6 +50,7 @@ enum mmu_notifier_event {
 	MMU_NOTIFY_PROTECTION_PAGE,
 	MMU_NOTIFY_SOFT_DIRTY,
 	MMU_NOTIFY_RELEASE,
+	MMU_NOTIFY_MIGRATE,
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
@@ -264,6 +269,7 @@ struct mmu_notifier_range {
 	unsigned long end;
 	unsigned flags;
 	enum mmu_notifier_event event;
+	void *migrate_pgmap_owner;
 };
 
 static inline int mm_has_notifiers(struct mm_struct *mm)
@@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
 	range->start = start;
 	range->end = end;
 	range->flags = flags;
+	range->migrate_pgmap_owner = NULL;
 }
 
 #define ptep_clear_flush_young_notify(__vma, __address, __ptep)		\
diff --git a/mm/migrate.c b/mm/migrate.c
index e3ea68e3a08b..96e1f41a991e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2392,8 +2392,14 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 {
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL,
+	/*
+	 * Note that the pgmap_owner is passed to the mmu notifier callback so
+	 * that the registered device driver can skip invalidating device
+	 * private page mappings that won't be migrated.
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_MIGRATE, 0, migrate->vma,
 			migrate->vma->vm_mm, migrate->start, migrate->end);
+	range.migrate_pgmap_owner = migrate->pgmap_owner;
 	mmu_notifier_invalidate_range_start(&range);
 
 	walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end,
-- 
2.20.1




* [PATCH v4 4/6] nouveau/svm: use the new migration invalidation
@ 2020-07-23 22:30 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:30 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Use the new MMU_NOTIFY_MIGRATE event to skip GPU MMU invalidations of
device private memory and handle the invalidation in the driver as part
of migrating device private memory.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 15 ++++++++++++---
 drivers/gpu/drm/nouveau/nouveau_svm.c  | 21 +++++++++------------
 drivers/gpu/drm/nouveau/nouveau_svm.h  | 13 ++++++++++++-
 3 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 78b9e3c2a5b3..511fe04cd680 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -140,6 +140,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
 {
 	struct device *dev = drm->dev->dev;
 	struct page *dpage, *spage;
+	struct nouveau_svmm *svmm;
 
 	spage = migrate_pfn_to_page(args->src[0]);
 	if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
@@ -154,14 +155,19 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
 	if (dma_mapping_error(dev, *dma_addr))
 		goto error_free_page;
 
+	svmm = spage->zone_device_data;
+	mutex_lock(&svmm->mutex);
+	nouveau_svmm_invalidate(svmm, args->start, args->end);
 	if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
 			NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
 		goto error_dma_unmap;
+	mutex_unlock(&svmm->mutex);
 
 	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 	return 0;
 
 error_dma_unmap:
+	mutex_unlock(&svmm->mutex);
 	dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
 error_free_page:
 	__free_page(dpage);
@@ -531,7 +537,8 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 }
 
 static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
-		unsigned long src, dma_addr_t *dma_addr, u64 *pfn)
+		struct nouveau_svmm *svmm, unsigned long src,
+		dma_addr_t *dma_addr, u64 *pfn)
 {
 	struct device *dev = drm->dev->dev;
 	struct page *dpage, *spage;
@@ -561,6 +568,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
 			goto out_free_page;
 	}
 
+	dpage->zone_device_data = svmm;
 	*pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
 		((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
 	if (src & MIGRATE_PFN_WRITE)
@@ -584,8 +592,8 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 	unsigned long addr = args->start, nr_dma = 0, i;
 
 	for (i = 0; addr < args->end; i++) {
-		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, args->src[i],
-				dma_addrs + nr_dma, pfns + i);
+		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm,
+				args->src[i], dma_addrs + nr_dma, pfns + i);
 		if (!dma_mapping_error(drm->dev->dev, dma_addrs[nr_dma]))
 			nr_dma++;
 		addr += PAGE_SIZE;
@@ -616,6 +624,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 	struct migrate_vma args = {
 		.vma		= vma,
 		.start		= start,
+		.pgmap_owner	= drm->dev,
 		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
 	};
 	unsigned long i;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index c5f8ca6fb2e3..67b068ac57b2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -93,17 +93,6 @@ nouveau_ivmm_find(struct nouveau_svm *svm, u64 inst)
 	return NULL;
 }
 
-struct nouveau_svmm {
-	struct mmu_notifier notifier;
-	struct nouveau_vmm *vmm;
-	struct {
-		unsigned long start;
-		unsigned long limit;
-	} unmanaged;
-
-	struct mutex mutex;
-};
-
 #define SVMM_DBG(s,f,a...)                                                     \
 	NV_DEBUG((s)->vmm->cli->drm, "svm-%p: "f"\n", (s), ##a)
 #define SVMM_ERR(s,f,a...)                                                     \
@@ -246,7 +235,7 @@ nouveau_svmm_join(struct nouveau_svmm *svmm, u64 inst)
 }
 
 /* Invalidate SVMM address-range on GPU. */
-static void
+void
 nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit)
 {
 	if (limit > start) {
@@ -279,6 +268,14 @@ nouveau_svmm_invalidate_range_start(struct mmu_notifier *mn,
 	if (unlikely(!svmm->vmm))
 		goto out;
 
+	/*
+	 * Ignore invalidation callbacks for device private pages since
+	 * the invalidation is handled as part of the migration process.
+	 */
+	if (update->event == MMU_NOTIFY_MIGRATE &&
+	    update->migrate_pgmap_owner == svmm->vmm->cli->drm->dev)
+		goto out;
+
 	if (limit > svmm->unmanaged.start && start < svmm->unmanaged.limit) {
 		if (start < svmm->unmanaged.start) {
 			nouveau_svmm_invalidate(svmm, start,
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.h b/drivers/gpu/drm/nouveau/nouveau_svm.h
index f0fcd1b72e8b..e7d63d7f0c2d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.h
@@ -1,11 +1,21 @@
 #ifndef __NOUVEAU_SVM_H__
 #define __NOUVEAU_SVM_H__
 #include <nvif/os.h>
+#include <linux/mmu_notifier.h>
 struct drm_device;
 struct drm_file;
 struct nouveau_drm;
 
-struct nouveau_svmm;
+struct nouveau_svmm {
+	struct mmu_notifier notifier;
+	struct nouveau_vmm *vmm;
+	struct {
+		unsigned long start;
+		unsigned long limit;
+	} unmanaged;
+
+	struct mutex mutex;
+};
 
 #if IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM)
 void nouveau_svm_init(struct nouveau_drm *);
@@ -19,6 +29,7 @@ int nouveau_svmm_join(struct nouveau_svmm *, u64 inst);
 void nouveau_svmm_part(struct nouveau_svmm *, u64 inst);
 int nouveau_svmm_bind(struct drm_device *, void *, struct drm_file *);
 
+void nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit);
 u64 *nouveau_pfns_alloc(unsigned long npages);
 void nouveau_pfns_free(u64 *pfns);
 void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
-- 
2.20.1




* [PATCH v4 5/6] mm/hmm/test: use the new migration invalidation
@ 2020-07-23 22:30 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:30 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Use the new MMU_NOTIFY_MIGRATE event to skip MMU invalidations of device
private memory and handle the invalidation in the driver as part of
migrating device private memory.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 lib/test_hmm.c                         | 30 +++++++++++++++-----------
 tools/testing/selftests/vm/hmm-tests.c | 18 ++++++++++++----
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index e78a1414f58e..e7dc3de355b7 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -214,6 +214,14 @@ static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
 {
 	struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);
 
+	/*
+	 * Ignore invalidation callbacks for device private pages since
+	 * the invalidation is handled as part of the migration process.
+	 */
+	if (range->event == MMU_NOTIFY_MIGRATE &&
+	    range->migrate_pgmap_owner == dmirror->mdevice)
+		return true;
+
 	if (mmu_notifier_range_blockable(range))
 		mutex_lock(&dmirror->mutex);
 	else if (!mutex_trylock(&dmirror->mutex))
@@ -693,7 +701,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		args.dst = dst_pfns;
 		args.start = addr;
 		args.end = next;
-		args.pgmap_owner = NULL;
+		args.pgmap_owner = dmirror->mdevice;
 		args.flags = MIGRATE_VMA_SELECT_SYSTEM;
 		ret = migrate_vma_setup(&args);
 		if (ret)
@@ -983,7 +991,7 @@ static void dmirror_devmem_free(struct page *page)
 }
 
 static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
-						struct dmirror_device *mdevice)
+						      struct dmirror *dmirror)
 {
 	const unsigned long *src = args->src;
 	unsigned long *dst = args->dst;
@@ -1005,6 +1013,7 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
 			continue;
 
 		lock_page(dpage);
+		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
 		copy_highpage(dpage, spage);
 		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 		if (*src & MIGRATE_PFN_WRITE)
@@ -1013,15 +1022,6 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
 	return 0;
 }
 
-static void dmirror_devmem_fault_finalize_and_map(struct migrate_vma *args,
-						  struct dmirror *dmirror)
-{
-	/* Invalidate the device's page table mapping. */
-	mutex_lock(&dmirror->mutex);
-	dmirror_do_update(dmirror, args->start, args->end);
-	mutex_unlock(&dmirror->mutex);
-}
-
 static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 {
 	struct migrate_vma args;
@@ -1051,11 +1051,15 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
 
-	ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror->mdevice);
+	ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror);
 	if (ret)
 		return ret;
 	migrate_vma_pages(&args);
-	dmirror_devmem_fault_finalize_and_map(&args, dmirror);
+	/*
+	 * No device finalize step is needed since
+	 * dmirror_devmem_fault_alloc_and_copy() will have already
+	 * invalidated the device page table.
+	 */
 	migrate_vma_finalize(&args);
 	return 0;
 }
diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index b533dd08da1d..91d38a29956b 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -881,8 +881,9 @@ TEST_F(hmm, migrate)
 }
 
 /*
- * Migrate anonymous memory to device private memory and fault it back to system
- * memory.
+ * Migrate anonymous memory to device private memory and fault some of it back
+ * to system memory, then try migrating the resulting mix of system and device
+ * private memory to the device.
  */
 TEST_F(hmm, migrate_fault)
 {
@@ -924,8 +925,17 @@ TEST_F(hmm, migrate_fault)
 	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);
 
-	/* Fault pages back to system memory and check them. */
-	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
+	/* Fault half the pages back to system memory and check them. */
+	for (i = 0, ptr = buffer->ptr; i < size / (2 * sizeof(*ptr)); ++i)
+		ASSERT_EQ(ptr[i], i);
+
+	/* Migrate memory to the device again. */
+	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(buffer->cpages, npages);
+
+	/* Check what the device read. */
+	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);
 
 	hmm_buffer_free(buffer);
-- 
2.20.1




* [PATCH v4 6/6] mm/migrate: remove range invalidation in migrate_vma_pages()
@ 2020-07-23 22:30 Ralph Campbell
From: Ralph Campbell @ 2020-07-23 22:30 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

When migrating the special zero page, migrate_vma_pages() calls
mmu_notifier_invalidate_range_start() before replacing the zero page
PFN in the CPU page tables. This is unnecessary since the range was
invalidated in migrate_vma_setup() and the page table entry is checked
to be sure it hasn't changed between migrate_vma_setup() and
migrate_vma_pages(). Therefore, remove the redundant invalidation.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 mm/migrate.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 96e1f41a991e..36076ba2f51a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2877,9 +2877,7 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 {
 	const unsigned long npages = migrate->npages;
 	const unsigned long start = migrate->start;
-	struct mmu_notifier_range range;
 	unsigned long addr, i;
-	bool notified = false;
 
 	for (i = 0, addr = start; i < npages; addr += PAGE_SIZE, i++) {
 		struct page *newpage = migrate_pfn_to_page(migrate->dst[i]);
@@ -2895,16 +2893,6 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 		if (!page) {
 			if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE))
 				continue;
-			if (!notified) {
-				notified = true;
-
-				mmu_notifier_range_init(&range,
-							MMU_NOTIFY_CLEAR, 0,
-							NULL,
-							migrate->vma->vm_mm,
-							addr, migrate->end);
-				mmu_notifier_invalidate_range_start(&range);
-			}
 			migrate_vma_insert_page(migrate, addr, newpage,
 						&migrate->src[i],
 						&migrate->dst[i]);
@@ -2937,14 +2925,6 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 		if (r != MIGRATEPAGE_SUCCESS)
 			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
 	}
-
-	/*
-	 * No need to double call mmu_notifier->invalidate_range() callback as
-	 * the above ptep_clear_flush_notify() inside migrate_vma_insert_page()
-	 * did already call it.
-	 */
-	if (notified)
-		mmu_notifier_invalidate_range_only_end(&range);
 }
 EXPORT_SYMBOL(migrate_vma_pages);
 
-- 
2.20.1




* Re: [PATCH v4 3/6] mm/notifier: add migration invalidation type
@ 2020-07-28 19:15 Jason Gunthorpe
From: Jason Gunthorpe @ 2020-07-28 19:15 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Thu, Jul 23, 2020 at 03:30:01PM -0700, Ralph Campbell wrote:
>  static inline int mm_has_notifiers(struct mm_struct *mm)
> @@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
>  	range->start = start;
>  	range->end = end;
>  	range->flags = flags;
> +	range->migrate_pgmap_owner = NULL;
>  }

Since this function is commonly called and nobody should read
migrate_pgmap_owner unless MMU_NOTIFY_MIGRATE is set as the event,
this assignment can be dropped.

Jason



* Re: [PATCH v4 6/6] mm/migrate: remove range invalidation in migrate_vma_pages()
@ 2020-07-28 19:19 Jason Gunthorpe
From: Jason Gunthorpe @ 2020-07-28 19:19 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Thu, Jul 23, 2020 at 03:30:04PM -0700, Ralph Campbell wrote:
> When migrating the special zero page, migrate_vma_pages() calls
> mmu_notifier_invalidate_range_start() before replacing the zero page
> PFN in the CPU page tables. This is unnecessary since the range was
> invalidated in migrate_vma_setup() and the page table entry is checked
> to be sure it hasn't changed between migrate_vma_setup() and
> migrate_vma_pages(). Therefore, remove the redundant invalidation.

I don't follow this logic, the purpose of the invalidation is also to
clear out anything that may be mirroring this VA, and "the page hasn't
changed" doesn't seem to rule out that case?

I'm also not sure I follow where the zero page came from?

Jason



* Re: [PATCH v4 0/6] mm/migrate: avoid device private invalidations
@ 2020-07-28 19:22 Jason Gunthorpe
From: Jason Gunthorpe @ 2020-07-28 19:22 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Thu, Jul 23, 2020 at 03:29:58PM -0700, Ralph Campbell wrote:
> The goal for this series is to avoid device private memory TLB
> invalidations when migrating a range of addresses from system
> memory to device private memory and some of those pages have already
> been migrated. The approach taken is to introduce a new mmu notifier
> invalidation event type and use that in the device driver to skip
> invalidation callbacks from migrate_vma_setup(). The device driver is
> also then expected to handle device MMU invalidations as part of the
> migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize() process.
> Note that this is opt-in. A device driver can simply invalidate its MMU
> in the mmu notifier callback and not handle MMU invalidations in the
> migration sequence.
> 
> This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).
> 
> Also, this replaces the need for the following two patches I sent:
> ("mm: fix migrate_vma_setup() src_owner and normal pages")
> https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
> ("nouveau: fix mixed normal and device private page migration")
> https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com
> 
> Changes in v4:
> Added reviewed-by from Bharata B Rao.
> Removed dead code checking for source device private page in lib/test_hmm.c
>   dmirror_migrate_alloc_and_copy() since the source filter flag guarantees
>   that.
> Added patch 6 to remove a redundant invalidation in migrate_vma_pages().
> 
> Changes in v3:
> Changed the direction field "dir" to a "flags" field and renamed
>   src_owner to pgmap_owner.
> Fixed a locking issue in nouveau for the migration invalidation.
> Added a HMM selftest test case to exercise the HMM test driver
>   invalidation changes.
> Removed reviewed-by Bharata B Rao since this version is moderately
>   changed.
> 
> Changes in v2:
> Rebase to Jason Gunthorpe's HMM tree.
> Added reviewed-by from Bharata B Rao.
> Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
>   suggested by Jason Gunthorpe.
> 
> Ralph Campbell (6):
>   nouveau: fix storing invalid ptes
>   mm/migrate: add a flags parameter to migrate_vma
>   mm/notifier: add migration invalidation type
>   nouveau/svm: use the new migration invalidation
>   mm/hmm/test: use the new migration invalidation

Applied to the hmm tree with the modification I noted; I think all the
comments on the past versions were addressed. I will accumulate more
reviews if any come.

>   mm/migrate: remove range invalidation in migrate_vma_pages()

Let's have some discussion on this new patch please, at least I don't
follow it yet.

Thanks,
Jason



* Re: [PATCH v4 3/6] mm/notifier: add migration invalidation type
@ 2020-07-28 20:57 Ralph Campbell
From: Ralph Campbell @ 2020-07-28 20:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/28/20 12:15 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2020 at 03:30:01PM -0700, Ralph Campbell wrote:
>>   static inline int mm_has_notifiers(struct mm_struct *mm)
>> @@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
>>   	range->start = start;
>>   	range->end = end;
>>   	range->flags = flags;
>> +	range->migrate_pgmap_owner = NULL;
>>   }
> 
> Since this function is commonly called and nobody should read
> migrate_pgmap_owner unless MMU_NOTIFY_MIGRATE is set as the event,
> this assignment can be dropped.
> 
> Jason

I agree.
Acked-by: Ralph Campbell <rcampbell@nvidia.com>



* Re: [PATCH v4 6/6] mm/migrate: remove range invalidation in migrate_vma_pages()
@ 2020-07-28 22:04 Ralph Campbell
From: Ralph Campbell @ 2020-07-28 22:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/28/20 12:19 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2020 at 03:30:04PM -0700, Ralph Campbell wrote:
>> When migrating the special zero page, migrate_vma_pages() calls
>> mmu_notifier_invalidate_range_start() before replacing the zero page
>> PFN in the CPU page tables. This is unnecessary since the range was
>> invalidated in migrate_vma_setup() and the page table entry is checked
>> to be sure it hasn't changed between migrate_vma_setup() and
>> migrate_vma_pages(). Therefore, remove the redundant invalidation.
> 
> I don't follow this logic, the purpose of the invalidation is also to
> clear out anything that may be mirroring this VA, and "the page hasn't
> changed" doesn't seem to rule out that case?
> 
> I'm also not sure I follow where the zero page came from?

The zero page comes from an anonymous private VMA that is read-only,
where the user level CPU process tries to read the page data (or takes
any other read page fault).

> Jason
> 

The overall migration process is:

mmap_read_lock()

migrate_vma_setup()
       // invalidates range, locks/isolates pages, puts migration entry in page table

<driver allocates destination pages and copies source to dest>

migrate_vma_pages()
       // moves source struct page info to destination struct page info.
       // clears migration flag for pages that can't be migrated.

<driver updates device page tables for pages still migrating, rollback pages not migrating>

migrate_vma_finalize()
       // replaces migration page table entry with destination page PFN.

mmap_read_unlock()

Since the address range is invalidated in the migrate_vma_setup() stage,
the page is isolated from the LRU cache, locked, unmapped, the page table
holds a migration entry (so the page can't be faulted and the CPU page
table set valid again), and there are no extra page references (pins),
the page "should not be modified".

For pte_none()/is_zero_pfn() entries, migrate_vma_setup() leaves the
pte_none()/is_zero_pfn() entry in place but does still call
mmu_notifier_invalidate_range_start() for the whole range being migrated.

In the migrate_vma_pages() step, the pte page table is locked and the
pte entry checked to be sure it is still pte_none/is_zero_pfn(). If not,
the new page isn't inserted. If it is still none/zero, the new device private
struct page is inserted into the page table, replacing the pte_none()/is_zero_pfn()
page table entry. The secondary MMUs were already invalidated in the migrate_vma_setup()
step and a pte_none() or zero page can't be modified so the only invalidation needed
is the CPU TLB(s) for clearing the special zero page PTE entry.

Two devices could both try to do the migrate_vma_*() sequence and proceed
in parallel up to the migrate_vma_pages() step and try to install a new
page for the hole/zero PTE, but only one will win and the other will fail.



* Re: [PATCH v4 6/6] mm/migrate: remove range invalidation in migrate_vma_pages()
@ 2020-07-31 19:15 Jason Gunthorpe
From: Jason Gunthorpe @ 2020-07-31 19:15 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Tue, Jul 28, 2020 at 03:04:07PM -0700, Ralph Campbell wrote:
> 
> On 7/28/20 12:19 PM, Jason Gunthorpe wrote:
> > On Thu, Jul 23, 2020 at 03:30:04PM -0700, Ralph Campbell wrote:
> > > When migrating the special zero page, migrate_vma_pages() calls
> > > mmu_notifier_invalidate_range_start() before replacing the zero page
> > > PFN in the CPU page tables. This is unnecessary since the range was
> > > invalidated in migrate_vma_setup() and the page table entry is checked
> > > to be sure it hasn't changed between migrate_vma_setup() and
> > > migrate_vma_pages(). Therefore, remove the redundant invalidation.
> > 
> > I don't follow this logic, the purpose of the invalidation is also to
> > clear out anything that may be mirroring this VA, and "the page hasn't
> > changed" doesn't seem to rule out that case?
> > 
> > I'm also not sure I follow where the zero page came from?
> 
> The zero page comes from an anonymous private VMA that is read-only
> and the user level CPU process tries to read the page data (or any
> other read page fault).
> 
> > Jason
> > 
> 
> The overall migration process is:
> 
> mmap_read_lock()
> 
> migrate_vma_setup()
>       // invalidates range, locks/isolates pages, puts migration entry in page table
> 
> <driver allocates destination pages and copies source to dest>
> 
> migrate_vma_pages()
>       // moves source struct page info to destination struct page info.
>       // clears migration flag for pages that can't be migrated.
> 
> <driver updates device page tables for pages still migrating, rollback pages not migrating>
> 
> migrate_vma_finalize()
>       // replaces migration page table entry with destination page PFN.
> 
> mmap_read_unlock()
> 
> Since the address range is invalidated in the migrate_vma_setup() stage,
> and the page is isolated from the LRU cache, locked, unmapped, and the page table
> holds a migration entry (so the page can't be faulted and the CPU page table set
> valid again), and there are no extra page references (pins), the page
> "should not be modified".

That is the physical page though, it doesn't prove nobody else is
reading the PTE.
 
> For pte_none()/is_zero_pfn() entries, migrate_vma_setup() leaves the
> pte_none()/is_zero_pfn() entry in place but does still call
> mmu_notifier_invalidate_range_start() for the whole range being migrated.

Ok..

> In the migrate_vma_pages() step, the pte page table is locked and the
> pte entry checked to be sure it is still pte_none/is_zero_pfn(). If not,
> the new page isn't inserted. If it is still none/zero, the new device private
> struct page is inserted into the page table, replacing the pte_none()/is_zero_pfn()
> page table entry. The secondary MMUs were already invalidated in the migrate_vma_setup()
> step and a pte_none() or zero page can't be modified so the only invalidation needed
> is the CPU TLB(s) for clearing the special zero page PTE entry.

No, the secondary MMU was invalidated but the invalidation start/end
range was exited. That means a secondary MMU is immediately able to
reload the zero page into its MMU cache.

When this code replaces the PTE that has a zero page it also has to
invalidate again so that secondary MMU's are guaranteed to pick up the
new PTE value.

So, I still don't understand how this is safe?

Jason



* Re: [PATCH v4 6/6] mm/migrate: remove range invalidation in migrate_vma_pages()
@ 2020-07-31 19:31 Ralph Campbell
From: Ralph Campbell @ 2020-07-31 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/31/20 12:15 PM, Jason Gunthorpe wrote:
> On Tue, Jul 28, 2020 at 03:04:07PM -0700, Ralph Campbell wrote:
>>
>> On 7/28/20 12:19 PM, Jason Gunthorpe wrote:
>>> On Thu, Jul 23, 2020 at 03:30:04PM -0700, Ralph Campbell wrote:
>>>> When migrating the special zero page, migrate_vma_pages() calls
>>>> mmu_notifier_invalidate_range_start() before replacing the zero page
>>>> PFN in the CPU page tables. This is unnecessary since the range was
>>>> invalidated in migrate_vma_setup() and the page table entry is checked
>>>> to be sure it hasn't changed between migrate_vma_setup() and
>>>> migrate_vma_pages(). Therefore, remove the redundant invalidation.
>>>
>>> I don't follow this logic, the purpose of the invalidation is also to
>>> clear out anything that may be mirroring this VA, and "the page hasn't
>>> changed" doesn't seem to rule out that case?
>>>
>>> I'm also not sure I follow where the zero page came from?
>>
>> The zero page comes from an anonymous private VMA that is read-only
>> and the user level CPU process tries to read the page data (or any
>> other read page fault).
>>
>>> Jason
>>>
>>
>> The overall migration process is:
>>
>> mmap_read_lock()
>>
>> migrate_vma_setup()
>>        // invalidates range, locks/isolates pages, puts migration entry in page table
>>
>> <driver allocates destination pages and copies source to dest>
>>
>> migrate_vma_pages()
>>        // moves source struct page info to destination struct page info.
>>        // clears migration flag for pages that can't be migrated.
>>
>> <driver updates device page tables for pages still migrating, rollback pages not migrating>
>>
>> migrate_vma_finalize()
>>        // replaces migration page table entry with destination page PFN.
>>
>> mmap_read_unlock()
>>
>> Since the address range is invalidated in the migrate_vma_setup() stage,
>> and the page is isolated from the LRU cache, locked, unmapped, and the page table
>> holds a migration entry (so the page can't be faulted and the CPU page table set
>> valid again), and there are no extra page references (pins), the page
>> "should not be modified".
> 
> That is the physical page though, it doesn't prove nobody else is
> reading the PTE.
>   
>> For pte_none()/is_zero_pfn() entries, migrate_vma_setup() leaves the
>> pte_none()/is_zero_pfn() entry in place but does still call
>> mmu_notifier_invalidate_range_start() for the whole range being migrated.
> 
> Ok..
> 
>> In the migrate_vma_pages() step, the pte page table is locked and the
>> pte entry checked to be sure it is still pte_none/is_zero_pfn(). If not,
>> the new page isn't inserted. If it is still none/zero, the new device private
>> struct page is inserted into the page table, replacing the pte_none()/is_zero_pfn()
>> page table entry. The secondary MMUs were already invalidated in the migrate_vma_setup()
>> step and a pte_none() or zero page can't be modified so the only invalidation needed
>> is the CPU TLB(s) for clearing the special zero page PTE entry.
> 
> No, the secondary MMU was invalidated but the invalidation start/end
> range was exited. That means a secondary MMU is immeidately able to
> reload the zero page into its MMU cache.
> 
> When this code replaces the PTE that has a zero page it also has to
> invalidate again so that secondary MMU's are guaranteed to pick up the
> new PTE value.
> 
> So, I still don't understand how this is safe?
> 
> Jason

Oops, you are right of course. I was only thinking of the device doing
the migration and forgot about a second device faulting on the same page.
You can drop the patch from the series.


