Linux-RDMA Archive on lore.kernel.org
* [PATCH v2 0/5] mm/migrate: avoid device private invalidations
@ 2020-07-13 17:21 Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 1/5] nouveau: fix storing invalid ptes Ralph Campbell
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

The goal of this series is to avoid device private memory TLB
invalidations when migrating a range of addresses from system
memory to device private memory and some of the pages in that range
have already been migrated. The approach is to introduce a new mmu
notifier invalidation event type that the device driver can use to skip
invalidation callbacks from migrate_vma_setup(). The device driver is
then expected to handle device MMU invalidations itself as part of the
migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize() sequence.
Note that this is opt-in: a device driver can simply invalidate its MMU
in the mmu notifier callback and not handle MMU invalidations in the
migration sequence.
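
To illustrate the idea of invalidating only what is actually moving, here
is a minimal user-space C model. It is not kernel code: the struct, the
flag, and the function name are all hypothetical stand-ins, and the device
page table is reduced to one "mapped" flag per page. Under those
assumptions, a driver that opts in might clear device MMU entries only for
pages selected for migration and leave the rest alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MIGRATE_PFN_MIGRATE flag. */
#define MODEL_PFN_MIGRATE 0x1UL

/* Hypothetical device page table: one "mapped" flag per page. */
struct model_dev_pt {
	bool *mapped;
	size_t npages;
};

/*
 * Invalidate only the device entries whose source page was selected for
 * migration. Entries for pages that are not migrating keep their mapping.
 * Returns the number of entries that were invalidated.
 */
static size_t model_invalidate_migrating(struct model_dev_pt *pt,
					 const unsigned long *src)
{
	size_t i, n = 0;

	assert(pt && pt->mapped && src);
	for (i = 0; i < pt->npages; i++) {
		if (!(src[i] & MODEL_PFN_MIGRATE))
			continue;	/* not migrating: leave mapping intact */
		if (pt->mapped[i]) {
			pt->mapped[i] = false;
			n++;
		}
	}
	return n;
}
```

In the real series the per-driver details differ (see the nouveau and
test_hmm patches); the model only shows why a range-wide flush is not
required when the driver tracks which pages are moving.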

This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).

This series also removes the need for the following two patches I sent earlier:
("mm: fix migrate_vma_setup() src_owner and normal pages")
https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
("nouveau: fix mixed normal and device private page migration")
https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com

Changes in v2:
Rebase to Jason Gunthorpe's HMM tree.
Added reviewed-by from Bharata B Rao.
Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
  suggested by Jason Gunthorpe.

Ralph Campbell (5):
  nouveau: fix storing invalid ptes
  mm/migrate: add a direction parameter to migrate_vma
  mm/notifier: add migration invalidation type
  nouveau/svm: use the new migration invalidation
  mm/hmm/test: use the new migration invalidation

 arch/powerpc/kvm/book3s_hv_uvmem.c            |  2 ++
 drivers/gpu/drm/nouveau/nouveau_dmem.c        | 13 ++++++--
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 10 +++++-
 drivers/gpu/drm/nouveau/nouveau_svm.h         |  1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    | 13 +++++---
 include/linux/migrate.h                       | 12 +++++--
 include/linux/mmu_notifier.h                  |  7 ++++
 lib/test_hmm.c                                | 33 +++++++++++--------
 mm/migrate.c                                  | 13 ++++++--
 9 files changed, 77 insertions(+), 27 deletions(-)

-- 
2.20.1



* [PATCH v2 1/5] nouveau: fix storing invalid ptes
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
@ 2020-07-13 17:21 ` Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma Ralph Campbell
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

When migrating a range of system memory to device private memory, some
of the pages in the address range may not be migrating. In this case,
the non-migrating pages won't have a new GPU MMU entry to store, but
the nvif_object_ioctl() NVIF_VMM_V0_PFNMAP method doesn't check the input
and stores a bad GPU page table entry that is marked valid.
Fix this by skipping the invalid input PTEs when updating the GPU page
tables.
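
The skip-invalid-input loop described above can be sketched in user-space
C as follows. This is only a model under stated assumptions: the flag
names, bit positions, and PTE encoding are hypothetical stand-ins for
NVKM_VMM_PFN_V, NVKM_VMM_PFN_W and the real GP100 PTE layout, and the page
table is a plain array. Note that this sketch advances the slot index for
skipped inputs as well, so that later entries land in their own slots;
that indexing is a choice made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input flags (stand-ins for NVKM_VMM_PFN_V / _W). */
#define MODEL_PFN_V		(1ULL << 0)
#define MODEL_PFN_W		(1ULL << 1)
#define MODEL_PFN_ADDR_SHIFT	12

/* Hypothetical output PTE bits; bit 6 is read-only, as in the patch. */
#define MODEL_PTE_VALID		(1ULL << 0)
#define MODEL_PTE_RO		(1ULL << 6)
#define MODEL_PTE_ADDR_SHIFT	8

/*
 * Write PTEs for 'ptes' inputs starting at slot 'ptei'. Inputs without
 * the valid flag are skipped and their slots are left untouched.
 * Returns the number of slots written.
 */
static size_t model_pgt_pfn(uint64_t *pt, size_t ptei,
			    const uint64_t *pfn, size_t ptes)
{
	size_t written = 0;

	assert(pt && pfn);
	for (; ptes; ptes--, pfn++, ptei++) {
		uint64_t data = MODEL_PTE_VALID;

		if (!(*pfn & MODEL_PFN_V))
			continue;	/* nothing to store for this page */

		if (!(*pfn & MODEL_PFN_W))
			data |= MODEL_PTE_RO;

		data |= (*pfn >> MODEL_PFN_ADDR_SHIFT) << MODEL_PTE_ADDR_SHIFT;
		pt[ptei] = data;
		written++;
	}
	return written;
}
```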

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
index ed37fddd063f..7eabe9fe0d2b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmmgp100.c
@@ -79,8 +79,12 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	dma_addr_t addr;
 
 	nvkm_kmap(pt->memory);
-	while (ptes--) {
+	for (; ptes; ptes--, map->pfn++) {
 		u64 data = 0;
+
+		if (!(*map->pfn & NVKM_VMM_PFN_V))
+			continue;
+
 		if (!(*map->pfn & NVKM_VMM_PFN_W))
 			data |= BIT_ULL(6); /* RO. */
 
@@ -100,7 +104,6 @@ gp100_vmm_pgt_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 		}
 
 		VMM_WO064(pt, vmm, ptei++ * 8, data);
-		map->pfn++;
 	}
 	nvkm_done(pt->memory);
 }
@@ -310,9 +313,12 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 	dma_addr_t addr;
 
 	nvkm_kmap(pt->memory);
-	while (ptes--) {
+	for (; ptes; ptes--, map->pfn++) {
 		u64 data = 0;
 
+		if (!(*map->pfn & NVKM_VMM_PFN_V))
+			continue;
+
 		if (!(*map->pfn & NVKM_VMM_PFN_W))
 			data |= BIT_ULL(6); /* RO. */
 
@@ -332,7 +338,6 @@ gp100_vmm_pd0_pfn(struct nvkm_vmm *vmm, struct nvkm_mmu_pt *pt,
 		}
 
 		VMM_WO064(pt, vmm, ptei++ * 16, data);
-		map->pfn++;
 	}
 	nvkm_done(pt->memory);
 }
-- 
2.20.1



* [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 1/5] nouveau: fix storing invalid ptes Ralph Campbell
@ 2020-07-13 17:21 ` Ralph Campbell
  2020-07-20 18:36   ` Jason Gunthorpe
  2020-07-13 17:21 ` [PATCH v2 3/5] mm/notifier: add migration invalidation type Ralph Campbell
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

The src_owner field in struct migrate_vma is being used for two purposes:
it implies the direction of the migration and it identifies device private
pages owned by the caller. Split this into separate parameters so the
src_owner field can be used just to identify device private pages owned
by the caller of migrate_vma_setup().
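
As a rough user-space illustration of the split described above, the
collection decision can be modeled as a small predicate. The names below
(model_dir, model_migrate, model_should_collect) are hypothetical and
exist only for this sketch; the logic follows the two conditions changed
in migrate_vma_collect_pmd() in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of enum migrate_vma_direction. */
enum model_dir {
	MODEL_FROM_SYSTEM,
	MODEL_FROM_DEVICE_PRIVATE,
};

/* Only the two fields relevant to the decision are modeled. */
struct model_migrate {
	const void *src_owner;
	enum model_dir dir;
};

/*
 * Decide whether a page is collected for migration.
 * A device private page is collected only when migrating from device
 * private memory and the page's pgmap owner matches src_owner.
 * A normal system page is collected only when migrating from system
 * memory, regardless of src_owner.
 */
static bool model_should_collect(const struct model_migrate *m,
				 bool is_device_private,
				 const void *page_owner)
{
	assert(m);
	if (is_device_private)
		return m->dir == MODEL_FROM_DEVICE_PRIVATE &&
		       page_owner == m->src_owner;
	return m->dir == MODEL_FROM_SYSTEM;
}
```

With the direction carried separately, src_owner can be non-NULL in both
directions without changing which pages are collected.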

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 ++
 drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 ++
 include/linux/migrate.h                | 12 +++++++++---
 lib/test_hmm.c                         |  2 ++
 mm/migrate.c                           |  5 +++--
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 09d8119024db..acbf14cd2d72 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -400,6 +400,7 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	mig.end = end;
 	mig.src = &src_pfn;
 	mig.dst = &dst_pfn;
+	mig.dir = MIGRATE_VMA_FROM_SYSTEM;
 
 	/*
 	 * We come here with mmap_lock write lock held just for
@@ -578,6 +579,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	mig.src = &src_pfn;
 	mig.dst = &dst_pfn;
 	mig.src_owner = &kvmppc_uvmem_pgmap;
+	mig.dir = MIGRATE_VMA_FROM_DEVICE_PRIVATE;
 
 	mutex_lock(&kvm->arch.uvmem_lock);
 	/* The requested page is already paged-out, nothing to do */
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index e5c230d9ae24..e5c83b8ee82e 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -183,6 +183,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 		.src		= &src,
 		.dst		= &dst,
 		.src_owner	= drm->dev,
+		.dir		= MIGRATE_VMA_FROM_DEVICE_PRIVATE,
 	};
 
 	/*
@@ -615,6 +616,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 	struct migrate_vma args = {
 		.vma		= vma,
 		.start		= start,
+		.dir		= MIGRATE_VMA_FROM_SYSTEM,
 	};
 	unsigned long i;
 	u64 *pfns;
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cbf03dd..620f2235d7d4 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
 	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
 }
 
+enum migrate_vma_direction {
+	MIGRATE_VMA_FROM_SYSTEM,
+	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
+};
+
 struct migrate_vma {
 	struct vm_area_struct	*vma;
 	/*
@@ -199,11 +204,12 @@ struct migrate_vma {
 
 	/*
 	 * Set to the owner value also stored in page->pgmap->owner for
-	 * migrating out of device private memory.  If set only device
-	 * private pages with this owner are migrated.  If not set
-	 * device private pages are not migrated at all.
+	 * migrating device private memory. The direction also needs to
+	 * be set to MIGRATE_VMA_FROM_DEVICE_PRIVATE.
 	 */
 	void			*src_owner;
+
+	enum migrate_vma_direction dir;
 };
 
 int migrate_vma_setup(struct migrate_vma *args);
diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 9aa577afc269..1bd60cfb5a25 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -703,6 +703,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		args.start = addr;
 		args.end = next;
 		args.src_owner = NULL;
+		args.dir = MIGRATE_VMA_FROM_SYSTEM;
 		ret = migrate_vma_setup(&args);
 		if (ret)
 			goto out;
@@ -1054,6 +1055,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	args.src = &src_pfns;
 	args.dst = &dst_pfns;
 	args.src_owner = dmirror->mdevice;
+	args.dir = MIGRATE_VMA_FROM_DEVICE_PRIVATE;
 
 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
diff --git a/mm/migrate.c b/mm/migrate.c
index f37729673558..2bbc5c4c672e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2287,7 +2287,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				goto next;
 
 			page = device_private_entry_to_page(entry);
-			if (page->pgmap->owner != migrate->src_owner)
+			if (migrate->dir != MIGRATE_VMA_FROM_DEVICE_PRIVATE ||
+			    page->pgmap->owner != migrate->src_owner)
 				goto next;
 
 			mpfn = migrate_pfn(page_to_pfn(page)) |
@@ -2295,7 +2296,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (is_write_device_private_entry(entry))
 				mpfn |= MIGRATE_PFN_WRITE;
 		} else {
-			if (migrate->src_owner)
+			if (migrate->dir != MIGRATE_VMA_FROM_SYSTEM)
 				goto next;
 			pfn = pte_pfn(pte);
 			if (is_zero_pfn(pfn)) {
-- 
2.20.1



* [PATCH v2 3/5] mm/notifier: add migration invalidation type
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 1/5] nouveau: fix storing invalid ptes Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma Ralph Campbell
@ 2020-07-13 17:21 ` Ralph Campbell
  2020-07-20 18:40   ` Jason Gunthorpe
  2020-07-13 17:21 ` [PATCH v2 4/5] nouveau/svm: use the new migration invalidation Ralph Campbell
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start()
which flushes all device private page mappings whether or not a page
is being migrated to/from device private memory. In order to not disrupt
device mappings that are not being migrated, shift the responsibility
for clearing device private mappings to the device driver and leave
CPU page table unmapping handled by migrate_vma_setup(). To support
this, the caller of migrate_vma_setup() should always set struct
migrate_vma::src_owner to a non-NULL value that matches the device
private page->pgmap->owner. This value is then passed to the struct
mmu_notifier_range with a new event type which the driver's invalidation
function can use to avoid device MMU invalidations.
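
A driver-side filter along these lines can be sketched in user-space C.
This is a model, not the kernel API: model_range, model_driver and the
event constants are hypothetical stand-ins for struct mmu_notifier_range,
driver state, and enum mmu_notifier_event. It assumes the driver compares
the owner carried in the range against its own pgmap owner and skips the
device MMU invalidation only on an exact match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of enum mmu_notifier_event. */
enum model_event {
	MODEL_NOTIFY_CLEAR,
	MODEL_NOTIFY_MIGRATE,
};

/* Hypothetical subset of struct mmu_notifier_range. */
struct model_range {
	enum model_event event;
	const void *migrate_pgmap_owner;
};

/* Hypothetical driver state; 'invalidations' counts device MMU flushes. */
struct model_driver {
	const void *pgmap_owner;
	unsigned long invalidations;
};

/*
 * Model of an invalidate_range_start callback. Returns true if the
 * device MMU was invalidated, false if the callback was skipped because
 * the driver itself started this migration and will invalidate later.
 */
static bool model_invalidate_range_start(struct model_driver *drv,
					 const struct model_range *range)
{
	assert(drv && range);
	if (range->event == MODEL_NOTIFY_MIGRATE &&
	    range->migrate_pgmap_owner == drv->pgmap_owner)
		return false;

	drv->invalidations++;
	return true;
}
```

Any other event, or a migration started on behalf of a different owner,
still takes the normal invalidation path in this model.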

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 include/linux/mmu_notifier.h | 7 +++++++
 mm/migrate.c                 | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index fc68f3570e19..1921fcf6be5b 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -38,6 +38,10 @@ struct mmu_interval_notifier;
  *
  * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
  * that the mm refcount is zero and the range is no longer accessible.
+ *
+ * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal
+ * a device driver to possibly ignore the invalidation if the
+ * migrate_pgmap_owner field matches the driver's device private pgmap owner.
  */
 enum mmu_notifier_event {
 	MMU_NOTIFY_UNMAP = 0,
@@ -46,6 +50,7 @@ enum mmu_notifier_event {
 	MMU_NOTIFY_PROTECTION_PAGE,
 	MMU_NOTIFY_SOFT_DIRTY,
 	MMU_NOTIFY_RELEASE,
+	MMU_NOTIFY_MIGRATE,
 };
 
 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
@@ -264,6 +269,7 @@ struct mmu_notifier_range {
 	unsigned long end;
 	unsigned flags;
 	enum mmu_notifier_event event;
+	void *migrate_pgmap_owner;
 };
 
 static inline int mm_has_notifiers(struct mm_struct *mm)
@@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
 	range->start = start;
 	range->end = end;
 	range->flags = flags;
+	range->migrate_pgmap_owner = NULL;
 }
 
 #define ptep_clear_flush_young_notify(__vma, __address, __ptep)		\
diff --git a/mm/migrate.c b/mm/migrate.c
index 2bbc5c4c672e..9b3dcb81be5f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2391,8 +2391,14 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 {
 	struct mmu_notifier_range range;
 
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL,
+	/*
+	 * Note that the src_owner is passed to the mmu notifier callback so
+	 * that the registered device driver can skip invalidating device
+	 * private page mappings that won't be migrated.
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_MIGRATE, 0, migrate->vma,
 			migrate->vma->vm_mm, migrate->start, migrate->end);
+	range.migrate_pgmap_owner = migrate->src_owner;
 	mmu_notifier_invalidate_range_start(&range);
 
 	walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end,
-- 
2.20.1



* [PATCH v2 4/5] nouveau/svm: use the new migration invalidation
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
                   ` (2 preceding siblings ...)
  2020-07-13 17:21 ` [PATCH v2 3/5] mm/notifier: add migration invalidation type Ralph Campbell
@ 2020-07-13 17:21 ` Ralph Campbell
  2020-07-13 17:21 ` [PATCH v2 5/5] mm/hmm/test: " Ralph Campbell
  2020-07-20 18:41 ` [PATCH v2 0/5] mm/migrate: avoid device private invalidations Jason Gunthorpe
  5 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Use the new MMU_NOTIFY_MIGRATE event to skip GPU MMU invalidations of
device private memory and handle the invalidation in the driver as part
of migrating device private memory.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 11 ++++++++---
 drivers/gpu/drm/nouveau/nouveau_svm.c  | 10 +++++++++-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |  1 +
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index e5c83b8ee82e..8f2683ebd8c0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -154,6 +154,8 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
 	if (dma_mapping_error(dev, *dma_addr))
 		goto error_free_page;
 
+	nouveau_svmm_invalidate(spage->zone_device_data, args->start,
+				args->end);
 	if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
 			NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
 		goto error_dma_unmap;
@@ -531,7 +533,8 @@ nouveau_dmem_init(struct nouveau_drm *drm)
 }
 
 static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
-		unsigned long src, dma_addr_t *dma_addr, u64 *pfn)
+		struct nouveau_svmm *svmm, unsigned long src,
+		dma_addr_t *dma_addr, u64 *pfn)
 {
 	struct device *dev = drm->dev->dev;
 	struct page *dpage, *spage;
@@ -561,6 +564,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
 			goto out_free_page;
 	}
 
+	dpage->zone_device_data = svmm;
 	*pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
 		((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
 	if (src & MIGRATE_PFN_WRITE)
@@ -584,8 +588,8 @@ static void nouveau_dmem_migrate_chunk(struct nouveau_drm *drm,
 	unsigned long addr = args->start, nr_dma = 0, i;
 
 	for (i = 0; addr < args->end; i++) {
-		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, args->src[i],
-				dma_addrs + nr_dma, pfns + i);
+		args->dst[i] = nouveau_dmem_migrate_copy_one(drm, svmm,
+				args->src[i], dma_addrs + nr_dma, pfns + i);
 		if (!dma_mapping_error(drm->dev->dev, dma_addrs[nr_dma]))
 			nr_dma++;
 		addr += PAGE_SIZE;
@@ -616,6 +620,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
 	struct migrate_vma args = {
 		.vma		= vma,
 		.start		= start,
+		.src_owner	= drm->dev,
 		.dir		= MIGRATE_VMA_FROM_SYSTEM,
 	};
 	unsigned long i;
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index c5f8ca6fb2e3..2ba7a8a2348c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -246,7 +246,7 @@ nouveau_svmm_join(struct nouveau_svmm *svmm, u64 inst)
 }
 
 /* Invalidate SVMM address-range on GPU. */
-static void
+void
 nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit)
 {
 	if (limit > start) {
@@ -279,6 +279,14 @@ nouveau_svmm_invalidate_range_start(struct mmu_notifier *mn,
 	if (unlikely(!svmm->vmm))
 		goto out;
 
+	/*
+	 * Ignore invalidation callbacks for device private pages since
+	 * the invalidation is handled as part of the migration process.
+	 */
+	if (update->event == MMU_NOTIFY_MIGRATE &&
+	    update->migrate_pgmap_owner == svmm->vmm->cli->drm->dev)
+		goto out;
+
 	if (limit > svmm->unmanaged.start && start < svmm->unmanaged.limit) {
 		if (start < svmm->unmanaged.start) {
 			nouveau_svmm_invalidate(svmm, start,
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.h b/drivers/gpu/drm/nouveau/nouveau_svm.h
index f0fcd1b72e8b..bb2d56e50e0c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.h
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.h
@@ -19,6 +19,7 @@ int nouveau_svmm_join(struct nouveau_svmm *, u64 inst);
 void nouveau_svmm_part(struct nouveau_svmm *, u64 inst);
 int nouveau_svmm_bind(struct drm_device *, void *, struct drm_file *);
 
+void nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit);
 u64 *nouveau_pfns_alloc(unsigned long npages);
 void nouveau_pfns_free(u64 *pfns);
 void nouveau_pfns_map(struct nouveau_svmm *svmm, struct mm_struct *mm,
-- 
2.20.1



* [PATCH v2 5/5] mm/hmm/test: use the new migration invalidation
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
                   ` (3 preceding siblings ...)
  2020-07-13 17:21 ` [PATCH v2 4/5] nouveau/svm: use the new migration invalidation Ralph Campbell
@ 2020-07-13 17:21 ` Ralph Campbell
  2020-07-20 18:41 ` [PATCH v2 0/5] mm/migrate: avoid device private invalidations Jason Gunthorpe
  5 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-13 17:21 UTC (permalink / raw)
  To: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest, linux-kernel
  Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao,
	Ralph Campbell

Use the new MMU_NOTIFY_MIGRATE event to skip MMU invalidations of device
private memory and handle the invalidation in the driver as part of
migrating device private memory.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
---
 lib/test_hmm.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 1bd60cfb5a25..77875fc4e7c1 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -214,6 +214,14 @@ static bool dmirror_interval_invalidate(struct mmu_interval_notifier *mni,
 {
 	struct dmirror *dmirror = container_of(mni, struct dmirror, notifier);
 
+	/*
+	 * Ignore invalidation callbacks for device private pages since
+	 * the invalidation is handled as part of the migration process.
+	 */
+	if (range->event == MMU_NOTIFY_MIGRATE &&
+	    range->migrate_pgmap_owner == dmirror->mdevice)
+		return true;
+
 	if (mmu_notifier_range_blockable(range))
 		mutex_lock(&dmirror->mutex);
 	else if (!mutex_trylock(&dmirror->mutex))
@@ -702,7 +710,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		args.dst = dst_pfns;
 		args.start = addr;
 		args.end = next;
-		args.src_owner = NULL;
+		args.src_owner = dmirror->mdevice;
 		args.dir = MIGRATE_VMA_FROM_SYSTEM;
 		ret = migrate_vma_setup(&args);
 		if (ret)
@@ -992,7 +1000,7 @@ static void dmirror_devmem_free(struct page *page)
 }
 
 static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
-						struct dmirror_device *mdevice)
+						      struct dmirror *dmirror)
 {
 	const unsigned long *src = args->src;
 	unsigned long *dst = args->dst;
@@ -1014,6 +1022,7 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
 			continue;
 
 		lock_page(dpage);
+		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
 		copy_highpage(dpage, spage);
 		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
 		if (*src & MIGRATE_PFN_WRITE)
@@ -1022,15 +1031,6 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
 	return 0;
 }
 
-static void dmirror_devmem_fault_finalize_and_map(struct migrate_vma *args,
-						  struct dmirror *dmirror)
-{
-	/* Invalidate the device's page table mapping. */
-	mutex_lock(&dmirror->mutex);
-	dmirror_do_update(dmirror, args->start, args->end);
-	mutex_unlock(&dmirror->mutex);
-}
-
 static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 {
 	struct migrate_vma args;
@@ -1060,11 +1060,16 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
 
-	ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror->mdevice);
+	ret = dmirror_devmem_fault_alloc_and_copy(&args, dmirror);
 	if (ret)
 		return ret;
 	migrate_vma_pages(&args);
-	dmirror_devmem_fault_finalize_and_map(&args, dmirror);
+	/*
+	 * No device finalize step is needed since
+	 * dmirror_devmem_fault_alloc_and_copy() will have already
+	 * invalidated the device page table. We could reinstate device MMU
+	 * entries for pages that didn't migrate but that should be rare.
+	 */
 	migrate_vma_finalize(&args);
 	return 0;
 }
-- 
2.20.1



* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-13 17:21 ` [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma Ralph Campbell
@ 2020-07-20 18:36   ` Jason Gunthorpe
  2020-07-20 19:54     ` Ralph Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-07-20 18:36 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Mon, Jul 13, 2020 at 10:21:46AM -0700, Ralph Campbell wrote:
> The src_owner field in struct migrate_vma is being used for two purposes,
> it implies the direction of the migration and it identifies device private
> pages owned by the caller. Split this into separate parameters so the
> src_owner field can be used just to identify device private pages owned
> by the caller of migrate_vma_setup().
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
>  arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 ++
>  drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 ++
>  include/linux/migrate.h                | 12 +++++++++---
>  lib/test_hmm.c                         |  2 ++
>  mm/migrate.c                           |  5 +++--
>  5 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 09d8119024db..acbf14cd2d72 100644
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -400,6 +400,7 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>  	mig.end = end;
>  	mig.src = &src_pfn;
>  	mig.dst = &dst_pfn;
> +	mig.dir = MIGRATE_VMA_FROM_SYSTEM;
>  
>  	/*
>  	 * We come here with mmap_lock write lock held just for
> @@ -578,6 +579,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
>  	mig.src = &src_pfn;
>  	mig.dst = &dst_pfn;
>  	mig.src_owner = &kvmppc_uvmem_pgmap;
> +	mig.dir = MIGRATE_VMA_FROM_DEVICE_PRIVATE;
>  
>  	mutex_lock(&kvm->arch.uvmem_lock);
>  	/* The requested page is already paged-out, nothing to do */
> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> index e5c230d9ae24..e5c83b8ee82e 100644
> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> @@ -183,6 +183,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>  		.src		= &src,
>  		.dst		= &dst,
>  		.src_owner	= drm->dev,
> +		.dir		= MIGRATE_VMA_FROM_DEVICE_PRIVATE,
>  	};
>  
>  	/*
> @@ -615,6 +616,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
>  	struct migrate_vma args = {
>  		.vma		= vma,
>  		.start		= start,
> +		.dir		= MIGRATE_VMA_FROM_SYSTEM,
>  	};
>  	unsigned long i;
>  	u64 *pfns;
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 3e546cbf03dd..620f2235d7d4 100644
> +++ b/include/linux/migrate.h
> @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
>  	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
>  }
>  
> +enum migrate_vma_direction {
> +	MIGRATE_VMA_FROM_SYSTEM,
> +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
> +};

I would have guessed this is more natural as _FROM_DEVICE_ and
TO_DEVICE_ ?

All the callers of this API are device drivers managing their
DEVICE_PRIVATE, right?

Jason


* Re: [PATCH v2 3/5] mm/notifier: add migration invalidation type
  2020-07-13 17:21 ` [PATCH v2 3/5] mm/notifier: add migration invalidation type Ralph Campbell
@ 2020-07-20 18:40   ` Jason Gunthorpe
  2020-07-20 19:56     ` Ralph Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-07-20 18:40 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Mon, Jul 13, 2020 at 10:21:47AM -0700, Ralph Campbell wrote:
> Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start()
> which flushes all device private page mappings whether or not a page
> is being migrated to/from device private memory. In order to not disrupt
> device mappings that are not being migrated, shift the responsibility
> for clearing device private mappings to the device driver and leave
> CPU page table unmapping handled by migrate_vma_setup(). To support
> this, the caller of migrate_vma_setup() should always set struct
> migrate_vma::src_owner to a non NULL value that matches the device
> private page->pgmap->owner. This value is then passed to the struct
> mmu_notifier_range with a new event type which the driver's invalidation
> function can use to avoid device MMU invalidations.
> 
> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>  include/linux/mmu_notifier.h | 7 +++++++
>  mm/migrate.c                 | 8 +++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index fc68f3570e19..1921fcf6be5b 100644
> +++ b/include/linux/mmu_notifier.h
> @@ -38,6 +38,10 @@ struct mmu_interval_notifier;
>   *
>   * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
>   * that the mm refcount is zero and the range is no longer accessible.
> + *
> + * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal
> + * a device driver to possibly ignore the invalidation if the
> + * migrate_pgmap_owner field matches the driver's device private pgmap owner.
>   */
>  enum mmu_notifier_event {
>  	MMU_NOTIFY_UNMAP = 0,
> @@ -46,6 +50,7 @@ enum mmu_notifier_event {
>  	MMU_NOTIFY_PROTECTION_PAGE,
>  	MMU_NOTIFY_SOFT_DIRTY,
>  	MMU_NOTIFY_RELEASE,
> +	MMU_NOTIFY_MIGRATE,
>  };
>  
>  #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> @@ -264,6 +269,7 @@ struct mmu_notifier_range {
>  	unsigned long end;
>  	unsigned flags;
>  	enum mmu_notifier_event event;
> +	void *migrate_pgmap_owner;
>  };
>  
>  static inline int mm_has_notifiers(struct mm_struct *mm)
> @@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
>  	range->start = start;
>  	range->end = end;
>  	range->flags = flags;
> +	range->migrate_pgmap_owner = NULL;
>  }
>  
>  #define ptep_clear_flush_young_notify(__vma, __address, __ptep)		\
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2bbc5c4c672e..9b3dcb81be5f 100644
> +++ b/mm/migrate.c
> @@ -2391,8 +2391,14 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
>  {
>  	struct mmu_notifier_range range;
>  
> -	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL,
> +	/*
> +	 * Note that the src_owner is passed to the mmu notifier callback so
> +	 * that the registered device driver can skip invalidating device
> +	 * private page mappings that won't be migrated.
> +	 */
> +	mmu_notifier_range_init(&range, MMU_NOTIFY_MIGRATE, 0, migrate->vma,
>  			migrate->vma->vm_mm, migrate->start, migrate->end);

So the idea is that src_owner is always set to the pgmap owner when
working with DEVICE_PRIVATE?

But then the comment in the prior patch should be fixed:

@@ -199,11 +204,12 @@  struct migrate_vma {
 
 	/*
 	 * Set to the owner value also stored in page->pgmap->owner for
+	 * migrating device private memory. The direction also needs to
+	 * be set to MIGRATE_VMA_FROM_DEVICE_PRIVATE.

To say the caller must always provide src_owner.

And that field should probably be renamed at this point, as there is
nothing "src" about it. It is just the pgmap_owner of the
DEVICE_PRIVATE pages the TO/FROM DEVICE migration is working on.

Jason


* Re: [PATCH v2 0/5] mm/migrate: avoid device private invalidations
  2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
                   ` (4 preceding siblings ...)
  2020-07-13 17:21 ` [PATCH v2 5/5] mm/hmm/test: " Ralph Campbell
@ 2020-07-20 18:41 ` Jason Gunthorpe
  2020-07-20 19:58   ` Ralph Campbell
  5 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-07-20 18:41 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Mon, Jul 13, 2020 at 10:21:44AM -0700, Ralph Campbell wrote:
> The goal for this series is to avoid device private memory TLB
> invalidations when migrating a range of addresses from system
> memory to device private memory and some of those pages have already
> been migrated. The approach taken is to introduce a new mmu notifier
> invalidation event type and use that in the device driver to skip
> invalidation callbacks from migrate_vma_setup(). The device driver is
> also then expected to handle device MMU invalidations as part of the
> migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize() process.
> Note that this is opt-in. A device driver can simply invalidate its MMU
> in the mmu notifier callback and not handle MMU invalidations in the
> migration sequence.
> 
> This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).
> 
> Also, this replaces the need for the following two patches I sent:
> ("mm: fix migrate_vma_setup() src_owner and normal pages")
> https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
> ("nouveau: fix mixed normal and device private page migration")
> https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com
> 
> Changes in v2:
> Rebase to Jason Gunthorpe's HMM tree.
> Added reviewed-by from Bharata B Rao.
> Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
>   suggested by Jason Gunthorpe.

I didn't see anything stand out in this at this point, did you intend
this to go to the HMM tree?

Thanks,
Jason

* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-20 18:36   ` Jason Gunthorpe
@ 2020-07-20 19:54     ` Ralph Campbell
  2020-07-20 19:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Ralph Campbell @ 2020-07-20 19:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/20/20 11:36 AM, Jason Gunthorpe wrote:
> On Mon, Jul 13, 2020 at 10:21:46AM -0700, Ralph Campbell wrote:
>> The src_owner field in struct migrate_vma is being used for two purposes,
>> it implies the direction of the migration and it identifies device private
>> pages owned by the caller. Split this into separate parameters so the
>> src_owner field can be used just to identify device private pages owned
>> by the caller of migrate_vma_setup().
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com>
>>   arch/powerpc/kvm/book3s_hv_uvmem.c     |  2 ++
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c |  2 ++
>>   include/linux/migrate.h                | 12 +++++++++---
>>   lib/test_hmm.c                         |  2 ++
>>   mm/migrate.c                           |  5 +++--
>>   5 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 09d8119024db..acbf14cd2d72 100644
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -400,6 +400,7 @@ kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
>>   	mig.end = end;
>>   	mig.src = &src_pfn;
>>   	mig.dst = &dst_pfn;
>> +	mig.dir = MIGRATE_VMA_FROM_SYSTEM;
>>   
>>   	/*
>>   	 * We come here with mmap_lock write lock held just for
>> @@ -578,6 +579,7 @@ kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
>>   	mig.src = &src_pfn;
>>   	mig.dst = &dst_pfn;
>>   	mig.src_owner = &kvmppc_uvmem_pgmap;
>> +	mig.dir = MIGRATE_VMA_FROM_DEVICE_PRIVATE;
>>   
>>   	mutex_lock(&kvm->arch.uvmem_lock);
>>   	/* The requested page is already paged-out, nothing to do */
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index e5c230d9ae24..e5c83b8ee82e 100644
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -183,6 +183,7 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
>>   		.src		= &src,
>>   		.dst		= &dst,
>>   		.src_owner	= drm->dev,
>> +		.dir		= MIGRATE_VMA_FROM_DEVICE_PRIVATE,
>>   	};
>>   
>>   	/*
>> @@ -615,6 +616,7 @@ nouveau_dmem_migrate_vma(struct nouveau_drm *drm,
>>   	struct migrate_vma args = {
>>   		.vma		= vma,
>>   		.start		= start,
>> +		.dir		= MIGRATE_VMA_FROM_SYSTEM,
>>   	};
>>   	unsigned long i;
>>   	u64 *pfns;
>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>> index 3e546cbf03dd..620f2235d7d4 100644
>> +++ b/include/linux/migrate.h
>> @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
>>   	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
>>   }
>>   
>> +enum migrate_vma_direction {
>> +	MIGRATE_VMA_FROM_SYSTEM,
>> +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
>> +};
> 
> I would have guessed this is more natural as _FROM_DEVICE_ and
> TO_DEVICE_ ?

The caller controls where the destination memory is allocated, so it isn't
necessarily device private memory; it could be from system to system.
The use case for system to system memory migration is hardware
like ARM SMMU or PCIe ATS where a single set of page tables is shared by
the device and a CPU process over a coherent system memory bus.
Many integrated GPUs in SoCs fall into this category too.

So to me, it makes more sense to specify the direction based on the
source location.

> All the callers of this API are device drivers managing their
> DEVICE_PRIVATE, right?

True for now, yes.

> Jason
> 

* Re: [PATCH v2 3/5] mm/notifier: add migration invalidation type
  2020-07-20 18:40   ` Jason Gunthorpe
@ 2020-07-20 19:56     ` Ralph Campbell
  0 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-20 19:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/20/20 11:40 AM, Jason Gunthorpe wrote:
> On Mon, Jul 13, 2020 at 10:21:47AM -0700, Ralph Campbell wrote:
>> Currently migrate_vma_setup() calls mmu_notifier_invalidate_range_start()
>> which flushes all device private page mappings whether or not a page
>> is being migrated to/from device private memory. In order to not disrupt
>> device mappings that are not being migrated, shift the responsibility
>> for clearing device private mappings to the device driver and leave
>> CPU page table unmapping handled by migrate_vma_setup(). To support
>> this, the caller of migrate_vma_setup() should always set struct
>> migrate_vma::src_owner to a non NULL value that matches the device
>> private page->pgmap->owner. This value is then passed to the struct
>> mmu_notifier_range with a new event type which the driver's invalidation
>> function can use to avoid device MMU invalidations.
>>
>> Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
>>   include/linux/mmu_notifier.h | 7 +++++++
>>   mm/migrate.c                 | 8 +++++++-
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
>> index fc68f3570e19..1921fcf6be5b 100644
>> +++ b/include/linux/mmu_notifier.h
>> @@ -38,6 +38,10 @@ struct mmu_interval_notifier;
>>    *
>>    * @MMU_NOTIFY_RELEASE: used during mmu_interval_notifier invalidate to signal
>>    * that the mm refcount is zero and the range is no longer accessible.
>> + *
>> + * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal
>> + * a device driver to possibly ignore the invalidation if the
>> + * migrate_pgmap_owner field matches the driver's device private pgmap owner.
>>    */
>>   enum mmu_notifier_event {
>>   	MMU_NOTIFY_UNMAP = 0,
>> @@ -46,6 +50,7 @@ enum mmu_notifier_event {
>>   	MMU_NOTIFY_PROTECTION_PAGE,
>>   	MMU_NOTIFY_SOFT_DIRTY,
>>   	MMU_NOTIFY_RELEASE,
>> +	MMU_NOTIFY_MIGRATE,
>>   };
>>   
>>   #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
>> @@ -264,6 +269,7 @@ struct mmu_notifier_range {
>>   	unsigned long end;
>>   	unsigned flags;
>>   	enum mmu_notifier_event event;
>> +	void *migrate_pgmap_owner;
>>   };
>>   
>>   static inline int mm_has_notifiers(struct mm_struct *mm)
>> @@ -513,6 +519,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
>>   	range->start = start;
>>   	range->end = end;
>>   	range->flags = flags;
>> +	range->migrate_pgmap_owner = NULL;
>>   }
>>   
>>   #define ptep_clear_flush_young_notify(__vma, __address, __ptep)		\
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 2bbc5c4c672e..9b3dcb81be5f 100644
>> +++ b/mm/migrate.c
>> @@ -2391,8 +2391,14 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
>>   {
>>   	struct mmu_notifier_range range;
>>   
>> -	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL,
>> +	/*
>> +	 * Note that the src_owner is passed to the mmu notifier callback so
>> +	 * that the registered device driver can skip invalidating device
>> +	 * private page mappings that won't be migrated.
>> +	 */
>> +	mmu_notifier_range_init(&range, MMU_NOTIFY_MIGRATE, 0, migrate->vma,
>>   			migrate->vma->vm_mm, migrate->start, migrate->end);
> 
> So the idea is that src_owner is always set to the pgmap owner when
> working with DEVICE_PRIVATE?
> 
> But then the comment in the prior patch should be fixed:
> 
> @@ -199,11 +204,12 @@  struct migrate_vma {
>   
>   	/*
>   	 * Set to the owner value also stored in page->pgmap->owner for
> +	 * migrating device private memory. The direction also needs to
> +	 * be set to MIGRATE_VMA_FROM_DEVICE_PRIVATE.
> 
> To say the caller must always provide src_owner.
> 
> And that field should probably be renamed at this point, as there is
> nothing "src" about it. It is just the pgmap_owner of the
> DEVICE_PRIVATE pages the TO/FROM DEVICE migration is working on.
> 
> Jason

Good point. I'll send a v3 with your suggested changes.
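
For reference, the check a driver's invalidation callback might perform
with the new event type can be sketched roughly as below. This is a
userspace model only, not the nouveau code: the structs are cut-down
stand-ins for the kernel ones, and struct demo_driver and
demo_invalidate() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel definitions quoted in the patch. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_MIGRATE,
};

struct mmu_notifier_range {
	unsigned long start;
	unsigned long end;
	enum mmu_notifier_event event;
	void *migrate_pgmap_owner;
};

/* Hypothetical driver state: the owner cookie plus a counter for testing. */
struct demo_driver {
	void *pgmap_owner;
	unsigned int invalidations;
};

/*
 * Returns true if the device MMU was invalidated, false if the callback
 * skipped it because the range comes from a migration started by the same
 * pgmap owner (the driver then handles invalidation in its own migrate
 * sequence).
 */
static bool demo_invalidate(struct demo_driver *drv,
			    const struct mmu_notifier_range *range)
{
	if (range->event == MMU_NOTIFY_MIGRATE &&
	    range->migrate_pgmap_owner == drv->pgmap_owner)
		return false;

	drv->invalidations++;	/* stand-in for the real device TLB flush */
	return true;
}
```

Any other event, or a MMU_NOTIFY_MIGRATE range whose owner is NULL or
belongs to a different driver, still takes the normal invalidation path,
which is what keeps the scheme opt-in.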

* Re: [PATCH v2 0/5] mm/migrate: avoid device private invalidations
  2020-07-20 18:41 ` [PATCH v2 0/5] mm/migrate: avoid device private invalidations Jason Gunthorpe
@ 2020-07-20 19:58   ` Ralph Campbell
  0 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-20 19:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/20/20 11:41 AM, Jason Gunthorpe wrote:
> On Mon, Jul 13, 2020 at 10:21:44AM -0700, Ralph Campbell wrote:
>> The goal for this series is to avoid device private memory TLB
>> invalidations when migrating a range of addresses from system
>> memory to device private memory and some of those pages have already
>> been migrated. The approach taken is to introduce a new mmu notifier
>> invalidation event type and use that in the device driver to skip
>> invalidation callbacks from migrate_vma_setup(). The device driver is
>> also then expected to handle device MMU invalidations as part of the
>> migrate_vma_setup(), migrate_vma_pages(), migrate_vma_finalize() process.
>> Note that this is opt-in. A device driver can simply invalidate its MMU
>> in the mmu notifier callback and not handle MMU invalidations in the
>> migration sequence.
>>
>> This series is based on Jason Gunthorpe's HMM tree (linux-5.8.0-rc4).
>>
>> Also, this replaces the need for the following two patches I sent:
>> ("mm: fix migrate_vma_setup() src_owner and normal pages")
>> https://lore.kernel.org/linux-mm/20200622222008.9971-1-rcampbell@nvidia.com
>> ("nouveau: fix mixed normal and device private page migration")
>> https://lore.kernel.org/lkml/20200622233854.10889-3-rcampbell@nvidia.com
>>
>> Changes in v2:
>> Rebase to Jason Gunthorpe's HMM tree.
>> Added reviewed-by from Bharata B Rao.
>> Rename the mmu_notifier_range::data field to migrate_pgmap_owner as
>>    suggested by Jason Gunthorpe.
> 
> I didn't see anything stand out in this at this point, did you intend
> this to go to the HMM tree?
> 
> Thanks,
> Jason

Yes, please.

* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-20 19:54     ` Ralph Campbell
@ 2020-07-20 19:59       ` Jason Gunthorpe
  2020-07-20 20:49         ` Ralph Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-07-20 19:59 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Mon, Jul 20, 2020 at 12:54:53PM -0700, Ralph Campbell wrote:
> > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > index 3e546cbf03dd..620f2235d7d4 100644
> > > +++ b/include/linux/migrate.h
> > > @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
> > >   	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
> > >   }
> > > +enum migrate_vma_direction {
> > > +	MIGRATE_VMA_FROM_SYSTEM,
> > > +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
> > > +};
> > 
> > I would have guessed this is more natural as _FROM_DEVICE_ and
> > TO_DEVICE_ ?
> 
> The caller controls where the destination memory is allocated so it isn't
> necessarily device private memory, it could be from system to system.
> The use case for system to system memory migration is for hardware
> like ARM SMMU or PCIe ATS where a single set of page tables is shared by
> the device and a CPU process over a coherent system memory bus.
> Also many integrated GPUs in SOCs fall into this category too.

Maybe just TO/FROM_DEVICE then? Even though the memory is not
DEVICE_PRIVATE it is still device owned pages right?

> So to me, it makes more sense to specify the direction based on the
> source location.

It feels strange because the driver doesn't always know or control the
source?

Jason

* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-20 19:59       ` Jason Gunthorpe
@ 2020-07-20 20:49         ` Ralph Campbell
  2020-07-20 23:16           ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Ralph Campbell @ 2020-07-20 20:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/20/20 12:59 PM, Jason Gunthorpe wrote:
> On Mon, Jul 20, 2020 at 12:54:53PM -0700, Ralph Campbell wrote:
>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>> index 3e546cbf03dd..620f2235d7d4 100644
>>>> +++ b/include/linux/migrate.h
>>>> @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
>>>>    	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
>>>>    }
>>>> +enum migrate_vma_direction {
>>>> +	MIGRATE_VMA_FROM_SYSTEM,
>>>> +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
>>>> +};
>>>
>>> I would have guessed this is more natural as _FROM_DEVICE_ and
>>> TO_DEVICE_ ?
>>
>> The caller controls where the destination memory is allocated so it isn't
>> necessarily device private memory, it could be from system to system.
>> The use case for system to system memory migration is for hardware
>> like ARM SMMU or PCIe ATS where a single set of page tables is shared by
>> the device and a CPU process over a coherent system memory bus.
>> Also many integrated GPUs in SOCs fall into this category too.
> 
> Maybe just TO/FROM_DEVICE then? Even though the memory is not
> DEVICE_PRIVATE it is still device owned pages right?
> 
>> So to me, it makes more sense to specify the direction based on the
>> source location.
> 
> It feels strange because the driver doesn't always know or control the
> source?
> 
> Jason
> 

The driver can't really know where the source is currently located because the
API is designed to not initially hold the page locks. migrate_vma_setup() only
knows the source once it holds the page table locks and isolates/locks the pages
being migrated. The direction and pgmap_owner are supposed to filter which pages
the caller is interested in migrating.
Perhaps the direction should instead be a flags field with separate bits for
system memory and device private memory selecting source candidates for
migration. I can imagine use cases for all 4 combinations of
d->d, d->s, s->d, and s->s being valid.

I didn't really think a direction was needed, this was something that
Christoph Hellwig seemed to think made the API safer.
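
To illustrate the difference between the enum and a flags field, here is
a small sketch. All names (DEMO_SRC_*, demo_page_kind,
demo_src_is_candidate()) are hypothetical and exist only for this
example. With one bit per source type, a caller can select system
pages, device private pages, or both, which a two-value enum cannot
express.

```c
#include <stdbool.h>

/* Hypothetical flag bits: each one selects a kind of source page. */
enum demo_src_flags {
	DEMO_SRC_SYSTEM         = 1u << 0,
	DEMO_SRC_DEVICE_PRIVATE = 1u << 1,
};

/* Hypothetical classification of a page found while walking the range. */
enum demo_page_kind {
	DEMO_PAGE_SYSTEM,
	DEMO_PAGE_DEVICE_PRIVATE,
};

/* Returns true if a source page of this kind is a migration candidate. */
static bool demo_src_is_candidate(unsigned int flags, enum demo_page_kind kind)
{
	switch (kind) {
	case DEMO_PAGE_SYSTEM:
		return (flags & DEMO_SRC_SYSTEM) != 0;
	case DEMO_PAGE_DEVICE_PRIVATE:
		return (flags & DEMO_SRC_DEVICE_PRIVATE) != 0;
	}
	return false;
}
```

The destination is not encoded at all in this sketch, since the caller
decides where to allocate it. That is how the d->d, d->s, s->d, and s->s
cases would all fall out of the same two source bits.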

* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-20 20:49         ` Ralph Campbell
@ 2020-07-20 23:16           ` Jason Gunthorpe
  2020-07-20 23:53             ` Ralph Campbell
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-07-20 23:16 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao

On Mon, Jul 20, 2020 at 01:49:09PM -0700, Ralph Campbell wrote:
> 
> On 7/20/20 12:59 PM, Jason Gunthorpe wrote:
> > On Mon, Jul 20, 2020 at 12:54:53PM -0700, Ralph Campbell wrote:
> > > > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > > > > index 3e546cbf03dd..620f2235d7d4 100644
> > > > > +++ b/include/linux/migrate.h
> > > > > @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
> > > > >    	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
> > > > >    }
> > > > > +enum migrate_vma_direction {
> > > > > +	MIGRATE_VMA_FROM_SYSTEM,
> > > > > +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
> > > > > +};
> > > > 
> > > > I would have guessed this is more natural as _FROM_DEVICE_ and
> > > > TO_DEVICE_ ?
> > > 
> > > The caller controls where the destination memory is allocated so it isn't
> > > necessarily device private memory, it could be from system to system.
> > > The use case for system to system memory migration is for hardware
> > > like ARM SMMU or PCIe ATS where a single set of page tables is shared by
> > > the device and a CPU process over a coherent system memory bus.
> > > Also many integrated GPUs in SOCs fall into this category too.
> > 
> > > Maybe just TO/FROM_DEVICE then? Even though the memory is not
> > DEVICE_PRIVATE it is still device owned pages right?
> > 
> > > So to me, it makes more sense to specify the direction based on the
> > > source location.
> > 
> > It feels strange because the driver doesn't always know or control the
> > source?
> 
> The driver can't really know where the source is currently located because the
> API is designed to not initially hold the page locks, migrate_vma_setup() only knows
> the source once it holds the page table locks and isolates/locks the pages being
> migrated. The direction and pgmap_owner are supposed to filter which pages
> the caller is interested in migrating.
> Perhaps the direction should instead be a flags field with separate bits for
> system memory and device private memory selecting source candidates for
> migration. I can imagine use cases for all 4 combinations of
> d->d, d->s, s->d, and s->s being valid.
> 
> I didn't really think a direction was needed, this was something that
> Christoph Hellwig seemed to think made the API safer.

If it is a filter then just using those names would make sense

MIGRATE_VMA_SELECT_SYSTEM
MIGRATE_VMA_SELECT_DEVICE_PRIVATE

SYSTEM feels like the wrong name too, doesn't linux have a formal name
for RAM struct pages?

In your future coherent design how would the migrate select 'device'
pages that are fully coherent? Are they still zone something pages
that are OK for CPU usage?

Jason

* Re: [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma
  2020-07-20 23:16           ` Jason Gunthorpe
@ 2020-07-20 23:53             ` Ralph Campbell
  0 siblings, 0 replies; 16+ messages in thread
From: Ralph Campbell @ 2020-07-20 23:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-mm, nouveau, kvm-ppc, linux-kselftest,
	linux-kernel, Jerome Glisse, John Hubbard, Christoph Hellwig,
	Andrew Morton, Shuah Khan, Ben Skeggs, Bharata B Rao


On 7/20/20 4:16 PM, Jason Gunthorpe wrote:
> On Mon, Jul 20, 2020 at 01:49:09PM -0700, Ralph Campbell wrote:
>>
>> On 7/20/20 12:59 PM, Jason Gunthorpe wrote:
>>> On Mon, Jul 20, 2020 at 12:54:53PM -0700, Ralph Campbell wrote:
>>>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>>>> index 3e546cbf03dd..620f2235d7d4 100644
>>>>>> +++ b/include/linux/migrate.h
>>>>>> @@ -180,6 +180,11 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
>>>>>>     	return (pfn << MIGRATE_PFN_SHIFT) | MIGRATE_PFN_VALID;
>>>>>>     }
>>>>>> +enum migrate_vma_direction {
>>>>>> +	MIGRATE_VMA_FROM_SYSTEM,
>>>>>> +	MIGRATE_VMA_FROM_DEVICE_PRIVATE,
>>>>>> +};
>>>>>
>>>>> I would have guessed this is more natural as _FROM_DEVICE_ and
>>>>> TO_DEVICE_ ?
>>>>
>>>> The caller controls where the destination memory is allocated so it isn't
>>>> necessarily device private memory, it could be from system to system.
>>>> The use case for system to system memory migration is for hardware
>>>> like ARM SMMU or PCIe ATS where a single set of page tables is shared by
>>>> the device and a CPU process over a coherent system memory bus.
>>>> Also many integrated GPUs in SOCs fall into this category too.
>>>
>>> Maybe just TO/FROM_DEVICE then? Even though the memory is not
>>> DEVICE_PRIVATE it is still device owned pages right?
>>>
>>>> So to me, it makes more sense to specify the direction based on the
>>>> source location.
>>>
>>> It feels strange because the driver doesn't always know or control the
>>> source?
>>
>> The driver can't really know where the source is currently located because the
>> API is designed to not initially hold the page locks, migrate_vma_setup() only knows
>> the source once it holds the page table locks and isolates/locks the pages being
>> migrated. The direction and pgmap_owner are supposed to filter which pages
>> the caller is interested in migrating.
>> Perhaps the direction should instead be a flags field with separate bits for
>> system memory and device private memory selecting source candidates for
>> migration. I can imagine use cases for all 4 combinations of
>> d->d, d->s, s->d, and s->s being valid.
>>
>> I didn't really think a direction was needed, this was something that
>> Christoph Hellwig seemed to think made the API safer.
> 
> If it is a filter then just using those names would make sense
> 
> MIGRATE_VMA_SELECT_SYSTEM
> MIGRATE_VMA_SELECT_DEVICE_PRIVATE
> 
> SYSTEM feels like the wrong name too, doesn't linux have a formal name
> for RAM struct pages?

Highmem? Movable? Zone normal?
There are quite a few :-)
At the moment, only anonymous pages are being migrated but I expect
file backed pages to be supported at some point (but not DAX).
VM_PFNMAP and VM_MIXEDMAP might make sense some day with peer-to-peer
copies.

So MIGRATE_VMA_SELECT_SYSTEM seems OK to me.

> In your future coherent design how would the migrate select 'device'
> pages that are fully coherent? Are they still zone something pages
> that are OK for CPU usage?
> 
> Jason
> 

For pages that are device private, the pgmap_owner selects them (plus the
MIGRATE_VMA_SELECT_DEVICE_PRIVATE flag).
For pages that are migrating from system memory to system memory, I expect
the pages to be in different NUMA zones. Otherwise, there wouldn't be much
point in migrating them. And yes, the CPU can access them.
It might be useful to have a filter saying "migrate system memory not already
in NUMA zone X" if the MIGRATE_VMA_SELECT_SYSTEM flag is set.
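
As a rough model of the selection rule described above (not the actual
mm/migrate.c logic), the filter could look something like this. The flag
values are assumed, and struct demo_page, struct demo_migrate, and
demo_page_selected() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag names from this thread; the bit values are an assumption. */
#define MIGRATE_VMA_SELECT_SYSTEM		(1u << 0)
#define MIGRATE_VMA_SELECT_DEVICE_PRIVATE	(1u << 1)

/* Hypothetical, cut-down view of a source page. */
struct demo_page {
	bool is_device_private;
	void *pgmap_owner;	/* only meaningful for device private pages */
};

/* Hypothetical, cut-down view of the caller's migrate request. */
struct demo_migrate {
	unsigned int flags;
	void *pgmap_owner;
};

/*
 * Device private pages need both the select flag and a matching owner;
 * system memory pages are selected by the flag alone.
 */
static bool demo_page_selected(const struct demo_migrate *mig,
			       const struct demo_page *page)
{
	if (page->is_device_private)
		return (mig->flags & MIGRATE_VMA_SELECT_DEVICE_PRIVATE) &&
		       page->pgmap_owner == mig->pgmap_owner;

	return (mig->flags & MIGRATE_VMA_SELECT_SYSTEM) != 0;
}
```

A NUMA-based filter like the one suggested above would be one more
condition on the system memory branch.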

Also, in support of the flags field, I'm looking at THP migration and I can
picture defining some request flags like hmm_range_fault() to say "migrate
THPs if they exist, otherwise split THPs".
A default_flags MIGRATE_PFN_REQ_FAULT would be useful if the source page is
swapped out. Currently, migrate_vma_setup() just skips these pages without
any indication to the caller why the page isn't being migrated or if retrying
is worth attempting.

Thread overview: 16+ messages
2020-07-13 17:21 [PATCH v2 0/5] mm/migrate: avoid device private invalidations Ralph Campbell
2020-07-13 17:21 ` [PATCH v2 1/5] nouveau: fix storing invalid ptes Ralph Campbell
2020-07-13 17:21 ` [PATCH v2 2/5] mm/migrate: add a direction parameter to migrate_vma Ralph Campbell
2020-07-20 18:36   ` Jason Gunthorpe
2020-07-20 19:54     ` Ralph Campbell
2020-07-20 19:59       ` Jason Gunthorpe
2020-07-20 20:49         ` Ralph Campbell
2020-07-20 23:16           ` Jason Gunthorpe
2020-07-20 23:53             ` Ralph Campbell
2020-07-13 17:21 ` [PATCH v2 3/5] mm/notifier: add migration invalidation type Ralph Campbell
2020-07-20 18:40   ` Jason Gunthorpe
2020-07-20 19:56     ` Ralph Campbell
2020-07-13 17:21 ` [PATCH v2 4/5] nouveau/svm: use the new migration invalidation Ralph Campbell
2020-07-13 17:21 ` [PATCH v2 5/5] mm/hmm/test: " Ralph Campbell
2020-07-20 18:41 ` [PATCH v2 0/5] mm/migrate: avoid device private invalidations Jason Gunthorpe
2020-07-20 19:58   ` Ralph Campbell
