linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
@ 2022-01-10 22:31 Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
                   ` (11 more replies)
  0 siblings, 12 replies; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
owned by a device that can be mapped into CPU page tables like
MEMORY_DEVICE_GENERIC and can also be migrated like
MEMORY_DEVICE_PRIVATE.

Christoph, the suggestion to incorporate Ralph Campbell’s refcount
cleanup patch into our hardware page migration patchset originally came
from you, but it proved impractical to do things in that order because
the refcount cleanup introduced a bug with wide ranging structural
implications. Instead, we amended Ralph’s patch so that it could be
applied after merging the migration work. As we saw from the recent
discussion, merging the refcount work is going to take some time and
cooperation between multiple development groups, while the migration
work is ready now and is needed now. So we propose to merge this
patchset first and continue to work with Ralph and others to merge the
refcount cleanup separately, when it is ready.

This patch series is mostly self-contained except for a few places where
it needs to update other subsystems to handle the new memory type.
System stability and performance are not affected according to our
ongoing testing, including xfstests.

How it works: The system BIOS advertises the GPU device memory
(aka VRAM) as SPM (special purpose memory) in the UEFI system address
map.

The amdgpu driver registers the memory with devmap as
MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
this hardware page migration capability is the Frontier supercomputer
project. This functionality is not AMD-specific. We expect other GPU
vendors to find this functionality useful, and possibly other hardware
types in the future.

Our test nodes in the lab are similar to the Frontier configuration,
with .5 TB of system memory plus 256 GB of device memory split across
4 GPUs, all in a single coherent address space. Page migration is
expected to improve application efficiency significantly. We will
report empirical results as they become available.

We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
patch set builds on HMM and our SVM memory manager already merged in
5.15.

v2:
- test_hmm is now able to create private and coherent device mirror
instances in the same driver probe. This adds more usability to the hmm
test by not having to remove the kernel module for each device type
test (private/coherent type). This is done by passing the module
parameters spm_addr_dev0 & spm_addr_dev1. In this case, it will create
four instances of device_mirror. The first two correspond to private
device type, the last two to coherent type. Then, they can be easily
accessed from user space through /dev/hmm_mirror<num_device>. Usually
num_device 0 and 1 are for private, and 2 and 3 for coherent types.

- Coherent device type pages at gup are now migrated back to system
memory if they have been long term pinned (FOLL_LONGTERM). The reason
is these pages could eventually interfere with their own device memory
manager. A new hmm_gup_test has been added to the hmm-test to test this
functionality. It makes use of the gup_test module to long term pin
user pages that have been migrate to device memory first.

- Other patch corrections made by Felix, Alistair and Christoph.

v3:
- Based on last v2 feedback we got from Alistair, we've decided to
remove migration logic for FOLL_LONGTERM coherent device type pages at
gup for now. Ideally, this should be done through the kernel mm,
instead of calling the device driver to do it. Currently, there's no
support for migrating device pages based on pfn, mainly because
migrate_pages() relies on pages being LRU pages. Alistair mentioned, he
has started to work on adding this migrate device pages logic. For now,
we fail on get_user_pages call with FOLL_LONGTERM for DEVICE_COHERENT
pages.

- Also, hmm_gup_test has been removed from hmm-test. We plan to include
it again after this migration work is ready.

- Addressed Liam Howlett's feedback changes.

Alex Sierra (10):
  mm: add zone device coherent type memory support
  mm: add device coherent vma selection for memory migration
  mm/gup: fail get_user_pages for LONGTERM dev coherent type
  drm/amdkfd: add SPM support for SVM
  drm/amdkfd: coherent type as sys mem on migration to ram
  lib: test_hmm add ioctl to get zone device type
  lib: test_hmm add module param for zone device type
  lib: add support for device coherent type in test_hmm
  tools: update hmm-test to support device coherent type
  tools: update test_hmm script to support SP config

 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
 include/linux/memremap.h                 |   8 +
 include/linux/migrate.h                  |   1 +
 include/linux/mm.h                       |  16 ++
 lib/test_hmm.c                           | 333 +++++++++++++++++------
 lib/test_hmm_uapi.h                      |  22 +-
 mm/gup.c                                 |   7 +
 mm/memcontrol.c                          |   6 +-
 mm/memory-failure.c                      |   8 +-
 mm/memremap.c                            |   5 +-
 mm/migrate.c                             |  30 +-
 tools/testing/selftests/vm/hmm-tests.c   | 122 +++++++--
 tools/testing/selftests/vm/test_hmm.sh   |  24 +-
 13 files changed, 475 insertions(+), 141 deletions(-)

-- 
2.32.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 01/10] mm: add zone device coherent type memory support
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-20  4:08   ` Alistair Popple
  2022-01-10 22:31 ` [PATCH v3 02/10] mm: add device coherent vma selection for memory migration Alex Sierra
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 include/linux/memremap.h |  8 ++++++++
 include/linux/mm.h       | 16 ++++++++++++++++
 mm/memcontrol.c          |  6 +++---
 mm/memory-failure.c      |  8 ++++++--
 mm/memremap.c            |  5 ++++-
 mm/migrate.c             | 21 +++++++++++++--------
 6 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index c0e9d35889e8..ff4d398edf35 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -39,6 +39,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE and with that memory
+ * type. Any page of a process can be migrated to such memory. However no one
+ * should be allowed to pin such memory so that it can always be evicted.
+ *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM i.e. DMA
  * coherent and supports page pinning. In support of coordinating page
@@ -59,6 +66,7 @@ struct vmem_altmap {
 enum memory_type {
 	/* 0 is reserved to catch uninitialized type fields */
 	MEMORY_DEVICE_PRIVATE = 1,
+	MEMORY_DEVICE_COHERENT,
 	MEMORY_DEVICE_FS_DAX,
 	MEMORY_DEVICE_GENERIC,
 	MEMORY_DEVICE_PCI_P2PDMA,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 73a52aba448f..fcf96c0fc918 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1162,6 +1162,7 @@ static inline bool page_is_devmap_managed(struct page *page)
 		return false;
 	switch (page->pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
 	case MEMORY_DEVICE_FS_DAX:
 		return true;
 	default:
@@ -1191,6 +1192,21 @@ static inline bool is_device_private_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_coherent_page(const struct page *page)
+{
+	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+		is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_COHERENT;
+}
+
+static inline bool is_device_page(const struct page *page)
+{
+	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
+		is_zone_device_page(page) &&
+		(page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
+		page->pgmap->type == MEMORY_DEVICE_COHERENT);
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
 	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6da5020a8656..d0bab0747c73 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5695,8 +5695,8 @@ static int mem_cgroup_move_account(struct page *page,
  *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
  *     target for charge migration. if @target is not NULL, the entry is stored
  *     in target->ent.
- *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PRIVATE
- *     (so ZONE_DEVICE page and thus not on the lru).
+ *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is device memory and
+ *   thus not on the lru.
  *     For now we such page is charge like a regular page would be as for all
  *     intent and purposes it is just special memory taking the place of a
  *     regular page.
@@ -5730,7 +5730,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		 */
 		if (page_memcg(page) == mc.from) {
 			ret = MC_TARGET_PAGE;
-			if (is_device_private_page(page))
+			if (is_device_page(page))
 				ret = MC_TARGET_DEVICE;
 			if (target)
 				target->page = page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3e6449f2102a..4cf212e5f432 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1554,12 +1554,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		goto unlock;
 	}
 
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
 		/*
-		 * TODO: Handle HMM pages which may need coordination
+		 * TODO: Handle device pages which may need coordination
 		 * with device-side memory.
 		 */
 		goto unlock;
+	default:
+		break;
 	}
 
 	/*
diff --git a/mm/memremap.c b/mm/memremap.c
index ed593bf87109..94d6a1e01d42 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -44,6 +44,7 @@ EXPORT_SYMBOL(devmap_managed_key);
 static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
 {
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+	    pgmap->type == MEMORY_DEVICE_COHERENT ||
 	    pgmap->type == MEMORY_DEVICE_FS_DAX)
 		static_branch_dec(&devmap_managed_key);
 }
@@ -51,6 +52,7 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
 static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
+	    pgmap->type == MEMORY_DEVICE_COHERENT ||
 	    pgmap->type == MEMORY_DEVICE_FS_DAX)
 		static_branch_inc(&devmap_managed_key);
 }
@@ -328,6 +330,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 
 	switch (pgmap->type) {
 	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
 		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
 			WARN(1, "Device private memory not supported\n");
 			return ERR_PTR(-EINVAL);
@@ -498,7 +501,7 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
 void free_devmap_managed_page(struct page *page)
 {
 	/* notify page idle for dax */
-	if (!is_device_private_page(page)) {
+	if (!is_device_page(page)) {
 		wake_up_var(&page->_refcount);
 		return;
 	}
diff --git a/mm/migrate.c b/mm/migrate.c
index 1852d787e6ab..91018880dc7f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page)
 	 * Device private pages have an extra refcount as they are
 	 * ZONE_DEVICE pages.
 	 */
-	expected_count += is_device_private_page(page);
+	expected_count += is_device_page(page);
 	if (mapping)
 		expected_count += thp_nr_pages(page) + page_has_private(page);
 
@@ -2503,7 +2503,7 @@ static bool migrate_vma_check_page(struct page *page)
 		 * FIXME proper solution is to rework migration_entry_wait() so
 		 * it does not need to take a reference on page.
 		 */
-		return is_device_private_page(page);
+		return is_device_page(page);
 	}
 
 	/* For file back page */
@@ -2791,7 +2791,7 @@ EXPORT_SYMBOL(migrate_vma_setup);
  *     handle_pte_fault()
  *       do_anonymous_page()
  * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
- * private page.
+ * private or coherent page.
  */
 static void migrate_vma_insert_page(struct migrate_vma *migrate,
 				    unsigned long addr,
@@ -2867,10 +2867,15 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 				swp_entry = make_readable_device_private_entry(
 							page_to_pfn(page));
 			entry = swp_entry_to_pte(swp_entry);
+		} else if (is_device_coherent_page(page)) {
+			entry = pte_mkold(mk_pte(page,
+						 READ_ONCE(vma->vm_page_prot)));
+			if (vma->vm_flags & VM_WRITE)
+				entry = pte_mkwrite(pte_mkdirty(entry));
 		} else {
 			/*
-			 * For now we only support migrating to un-addressable
-			 * device memory.
+			 * We support migrating to private and coherent types
+			 * for device zone memory.
 			 */
 			pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
 			goto abort;
@@ -2976,10 +2981,10 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 		mapping = page_mapping(page);
 
 		if (is_zone_device_page(newpage)) {
-			if (is_device_private_page(newpage)) {
+			if (is_device_page(newpage)) {
 				/*
-				 * For now only support private anonymous when
-				 * migrating to un-addressable device memory.
+				 * For now only support private and coherent
+				 * anonymous when migrating to device memory.
 				 */
 				if (mapping) {
 					migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 02/10] mm: add device coherent vma selection for memory migration
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type Alex Sierra
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

This case is used to migrate pages from device memory, back to system
memory. Device coherent type memory is cache coherent from device and CPU
point of view.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
v2:
condition added when migrations from device coherent pages.
---
 include/linux/migrate.h | 1 +
 mm/migrate.c            | 9 +++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c8077e936691..e74bb0978f6f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -138,6 +138,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn)
 enum migrate_vma_direction {
 	MIGRATE_VMA_SELECT_SYSTEM = 1 << 0,
 	MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1,
+	MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2,
 };
 
 struct migrate_vma {
diff --git a/mm/migrate.c b/mm/migrate.c
index 91018880dc7f..0367f471211a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2340,8 +2340,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			if (is_writable_device_private_entry(entry))
 				mpfn |= MIGRATE_PFN_WRITE;
 		} else {
-			if (!(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
-				goto next;
 			pfn = pte_pfn(pte);
 			if (is_zero_pfn(pfn)) {
 				mpfn = MIGRATE_PFN_MIGRATE;
@@ -2349,6 +2347,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				goto next;
 			}
 			page = vm_normal_page(migrate->vma, addr, pte);
+			if (page && !is_zone_device_page(page) &&
+			    !(migrate->flags & MIGRATE_VMA_SELECT_SYSTEM))
+				goto next;
+			if (page && is_device_coherent_page(page) &&
+			    (!(migrate->flags & MIGRATE_VMA_SELECT_DEVICE_COHERENT) ||
+			     page->pgmap->owner != migrate->pgmap_owner))
+				goto next;
 			mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
 			mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
 		}
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 02/10] mm: add device coherent vma selection for memory migration Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-20 12:36   ` Joao Martins
  2022-01-10 22:31 ` [PATCH v3 04/10] drm/amdkfd: add SPM support for SVM Alex Sierra
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Avoid long term pinning for Coherent device type pages. This could
interfere with their own device memory manager. For now, we are just
returning error for PIN_LONGTERM Coherent device type pages. Eventually,
these type of pages will get migrated to system memory, once the device
migration pages support is added.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 mm/gup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 886d6148d3d0..9c8a075d862d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1720,6 +1720,12 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 		 * If we get a movable page, since we are going to be pinning
 		 * these entries, try to move them out if possible.
 		 */
+		if (is_device_page(head)) {
+			WARN_ON_ONCE(is_device_private_page(head));
+			ret = -EFAULT;
+			goto unpin_pages;
+		}
+
 		if (!is_pinnable_page(head)) {
 			if (PageHuge(head)) {
 				if (!isolate_huge_page(head, &movable_page_list))
@@ -1750,6 +1756,7 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 	if (list_empty(&movable_page_list) && !isolation_error_count)
 		return nr_pages;
 
+unpin_pages:
 	if (gup_flags & FOLL_PIN) {
 		unpin_user_pages(pages, nr_pages);
 	} else {
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 04/10] drm/amdkfd: add SPM support for SVM
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (2 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 05/10] drm/amdkfd: coherent type as sys mem on migration to ram Alex Sierra
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

When CPU is connected throug XGMI, it has coherent
access to VRAM resource. In this case that resource
is taken from a table in the device gmc aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_COHERENT to create the device
page map region.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
v7:
Remove lookup_resource call, so export symbol for this function
is not longer required. Patch dropped "kernel: resource:
lookup_resource as exported symbol"
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 29 +++++++++++++++---------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index aeade32ec298..9e36fe8aea0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -935,7 +935,7 @@ int svm_migrate_init(struct amdgpu_device *adev)
 {
 	struct kfd_dev *kfddev = adev->kfd.dev;
 	struct dev_pagemap *pgmap;
-	struct resource *res;
+	struct resource *res = NULL;
 	unsigned long size;
 	void *r;
 
@@ -950,28 +950,34 @@ int svm_migrate_init(struct amdgpu_device *adev)
 	 * should remove reserved size
 	 */
 	size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-	res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
-	if (IS_ERR(res))
-		return -ENOMEM;
+	if (adev->gmc.xgmi.connected_to_cpu) {
+		pgmap->range.start = adev->gmc.aper_base;
+		pgmap->range.end = adev->gmc.aper_base + adev->gmc.aper_size - 1;
+		pgmap->type = MEMORY_DEVICE_COHERENT;
+	} else {
+		res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+		if (IS_ERR(res))
+			return -ENOMEM;
+		pgmap->range.start = res->start;
+		pgmap->range.end = res->end;
+		pgmap->type = MEMORY_DEVICE_PRIVATE;
+	}
 
-	pgmap->type = MEMORY_DEVICE_PRIVATE;
 	pgmap->nr_range = 1;
-	pgmap->range.start = res->start;
-	pgmap->range.end = res->end;
 	pgmap->ops = &svm_migrate_pgmap_ops;
 	pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-	pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
-
+	pgmap->flags = 0;
 	/* Device manager releases device-specific resources, memory region and
 	 * pgmap when driver disconnects from device.
 	 */
 	r = devm_memremap_pages(adev->dev, pgmap);
 	if (IS_ERR(r)) {
 		pr_err("failed to register HMM device memory\n");
-
 		/* Disable SVM support capability */
 		pgmap->type = 0;
-		devm_release_mem_region(adev->dev, res->start, resource_size(res));
+		if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+			devm_release_mem_region(adev->dev, res->start,
+						res->end - res->start + 1);
 		return PTR_ERR(r);
 	}
 
@@ -984,3 +990,4 @@ int svm_migrate_init(struct amdgpu_device *adev)
 
 	return 0;
 }
+
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 05/10] drm/amdkfd: coherent type as sys mem on migration to ram
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (3 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 04/10] drm/amdkfd: add SPM support for SVM Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-10 22:31 ` [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type Alex Sierra
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Coherent device type memory on VRAM to RAM migration, has similar access
as System RAM from the CPU. This flag sets the source from the sender.
Which in Coherent type case, should be set as
MIGRATE_VMA_SELECT_DEVICE_COHERENT.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 9e36fe8aea0f..3e405f078ade 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -661,9 +661,12 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 	migrate.vma = vma;
 	migrate.start = start;
 	migrate.end = end;
-	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
 
+	if (adev->gmc.xgmi.connected_to_cpu)
+		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_COHERENT;
+	else
+		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
 	size = 2 * sizeof(*migrate.src) + sizeof(uint64_t) + sizeof(dma_addr_t);
 	size *= npages;
 	buf = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (4 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 05/10] drm/amdkfd: coherent type as sys mem on migration to ram Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-20  5:01   ` Alistair Popple
  2022-01-10 22:31 ` [PATCH v3 07/10] lib: test_hmm add module param for " Alex Sierra
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

new ioctl cmd added to query zone device type. This will be
used once the test_hmm adds zone device coherent type.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 lib/test_hmm.c      | 14 ++++++++++++++
 lib/test_hmm_uapi.h |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index c259842f6d44..97e48164d56a 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -84,6 +84,7 @@ struct dmirror_chunk {
 struct dmirror_device {
 	struct cdev		cdevice;
 	struct hmm_devmem	*devmem;
+	unsigned int            zone_device_type;
 
 	unsigned int		devmem_capacity;
 	unsigned int		devmem_count;
@@ -470,6 +471,7 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 	if (IS_ERR(res))
 		goto err_devmem;
 
+	mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
 	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
 	devmem->pagemap.range.start = res->start;
 	devmem->pagemap.range.end = res->end;
@@ -1025,6 +1027,15 @@ static int dmirror_snapshot(struct dmirror *dmirror,
 	return ret;
 }
 
+static int dmirror_get_device_type(struct dmirror *dmirror,
+			    struct hmm_dmirror_cmd *cmd)
+{
+	mutex_lock(&dmirror->mutex);
+	cmd->zone_device_type = dmirror->mdevice->zone_device_type;
+	mutex_unlock(&dmirror->mutex);
+
+	return 0;
+}
 static long dmirror_fops_unlocked_ioctl(struct file *filp,
 					unsigned int command,
 					unsigned long arg)
@@ -1075,6 +1086,9 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
 		ret = dmirror_snapshot(dmirror, &cmd);
 		break;
 
+	case HMM_DMIRROR_GET_MEM_DEV_TYPE:
+		ret = dmirror_get_device_type(dmirror, &cmd);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index f14dea5dcd06..17f842f1aa02 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -19,6 +19,7 @@
  * @npages: (in) number of pages to read/write
  * @cpages: (out) number of pages copied
  * @faults: (out) number of device page faults seen
+ * @zone_device_type: (out) zone device memory type
  */
 struct hmm_dmirror_cmd {
 	__u64		addr;
@@ -26,6 +27,7 @@ struct hmm_dmirror_cmd {
 	__u64		npages;
 	__u64		cpages;
 	__u64		faults;
+	__u64		zone_device_type;
 };
 
 /* Expose the address space of the calling process through hmm device file */
@@ -35,6 +37,7 @@ struct hmm_dmirror_cmd {
 #define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x03, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x04, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x05, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_GET_MEM_DEV_TYPE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
 
 /*
  * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
@@ -62,4 +65,9 @@ enum {
 	HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE	= 0x30,
 };
 
+enum {
+	/* 0 is reserved to catch uninitialized type fields */
+	HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
+};
+
 #endif /* _LIB_TEST_HMM_UAPI_H */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 07/10] lib: test_hmm add module param for zone device type
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (5 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-20  5:23   ` Alistair Popple
  2022-01-10 22:31 ` [PATCH v3 08/10] lib: add support for device coherent type in test_hmm Alex Sierra
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

In order to configure device coherent in test_hmm, two module parameters
should be passed, which correspond to the SP start address of each
device (2) spm_addr_dev0 & spm_addr_dev1. If no parameters are passed,
private device type is configured.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 lib/test_hmm.c      | 74 +++++++++++++++++++++++++++++++--------------
 lib/test_hmm_uapi.h |  1 +
 2 files changed, 53 insertions(+), 22 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 97e48164d56a..9edeff52302e 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -34,6 +34,16 @@
 #define DEVMEM_CHUNK_SIZE		(256 * 1024 * 1024U)
 #define DEVMEM_CHUNKS_RESERVE		16
 
+static unsigned long spm_addr_dev0;
+module_param(spm_addr_dev0, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev0,
+		"Specify start address for SPM (special purpose memory) used for device 0. By setting this Coherent device type will be used. Make sure spm_addr_dev1 is set too");
+
+static unsigned long spm_addr_dev1;
+module_param(spm_addr_dev1, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev1,
+		"Specify start address for SPM (special purpose memory) used for device 1. By setting this Coherent device type will be used. Make sure spm_addr_dev0 is set too");
+
 static const struct dev_pagemap_ops dmirror_devmem_ops;
 static const struct mmu_interval_notifier_ops dmirror_min_ops;
 static dev_t dmirror_dev;
@@ -452,29 +462,44 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
 	return ret;
 }
 
-static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
+static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
 				   struct page **ppage)
 {
 	struct dmirror_chunk *devmem;
-	struct resource *res;
+	struct resource *res = NULL;
 	unsigned long pfn;
 	unsigned long pfn_first;
 	unsigned long pfn_last;
 	void *ptr;
+	int ret = -ENOMEM;
 
 	devmem = kzalloc(sizeof(*devmem), GFP_KERNEL);
 	if (!devmem)
-		return false;
+		return ret;
 
-	res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
-				      "hmm_dmirror");
-	if (IS_ERR(res))
+	switch (mdevice->zone_device_type) {
+	case HMM_DMIRROR_MEMORY_DEVICE_PRIVATE:
+		res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
+					      "hmm_dmirror");
+		if (IS_ERR_OR_NULL(res))
+			goto err_devmem;
+		devmem->pagemap.range.start = res->start;
+		devmem->pagemap.range.end = res->end;
+		devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
+		break;
+	case HMM_DMIRROR_MEMORY_DEVICE_COHERENT:
+		devmem->pagemap.range.start = (MINOR(mdevice->cdevice.dev) - 2) ?
+							spm_addr_dev0 :
+							spm_addr_dev1;
+		devmem->pagemap.range.end = devmem->pagemap.range.start +
+					    DEVMEM_CHUNK_SIZE - 1;
+		devmem->pagemap.type = MEMORY_DEVICE_COHERENT;
+		break;
+	default:
+		ret = -EINVAL;
 		goto err_devmem;
+	}
 
-	mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
-	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
-	devmem->pagemap.range.start = res->start;
-	devmem->pagemap.range.end = res->end;
 	devmem->pagemap.nr_range = 1;
 	devmem->pagemap.ops = &dmirror_devmem_ops;
 	devmem->pagemap.owner = mdevice;
@@ -495,10 +520,14 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 		mdevice->devmem_capacity = new_capacity;
 		mdevice->devmem_chunks = new_chunks;
 	}
-
 	ptr = memremap_pages(&devmem->pagemap, numa_node_id());
-	if (IS_ERR(ptr))
+	if (IS_ERR_OR_NULL(ptr)) {
+		if (ptr)
+			ret = PTR_ERR(ptr);
+		else
+			ret = -EFAULT;
 		goto err_release;
+	}
 
 	devmem->mdevice = mdevice;
 	pfn_first = devmem->pagemap.range.start >> PAGE_SHIFT;
@@ -527,15 +556,17 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
 	}
 	spin_unlock(&mdevice->lock);
 
-	return true;
+	return 0;
 
 err_release:
 	mutex_unlock(&mdevice->devmem_lock);
-	release_mem_region(devmem->pagemap.range.start, range_len(&devmem->pagemap.range));
+	if (res && devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+		release_mem_region(devmem->pagemap.range.start,
+				   range_len(&devmem->pagemap.range));
 err_devmem:
 	kfree(devmem);
 
-	return false;
+	return ret;
 }
 
 static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
@@ -560,7 +591,7 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
 		spin_unlock(&mdevice->lock);
 	} else {
 		spin_unlock(&mdevice->lock);
-		if (!dmirror_allocate_chunk(mdevice, &dpage))
+		if (dmirror_allocate_chunk(mdevice, &dpage))
 			goto error;
 	}
 
@@ -1220,10 +1251,8 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id)
 	if (ret)
 		return ret;
 
-	/* Build a list of free ZONE_DEVICE private struct pages */
-	dmirror_allocate_chunk(mdevice, NULL);
-
-	return 0;
+	/* Build a list of free ZONE_DEVICE struct pages */
+	return dmirror_allocate_chunk(mdevice, NULL);
 }
 
 static void dmirror_device_remove(struct dmirror_device *mdevice)
@@ -1236,8 +1265,9 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
 				mdevice->devmem_chunks[i];
 
 			memunmap_pages(&devmem->pagemap);
-			release_mem_region(devmem->pagemap.range.start,
-					   range_len(&devmem->pagemap.range));
+			if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+				release_mem_region(devmem->pagemap.range.start,
+						   range_len(&devmem->pagemap.range));
 			kfree(devmem);
 		}
 		kfree(mdevice->devmem_chunks);
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 17f842f1aa02..625f3690d086 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -68,6 +68,7 @@ enum {
 enum {
 	/* 0 is reserved to catch uninitialized type fields */
 	HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
+	HMM_DMIRROR_MEMORY_DEVICE_COHERENT,
 };
 
 #endif /* _LIB_TEST_HMM_UAPI_H */
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 08/10] lib: add support for device coherent type in test_hmm
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (6 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 07/10] lib: test_hmm add module param for " Alex Sierra
@ 2022-01-10 22:31 ` Alex Sierra
  2022-01-20  6:00   ` Alistair Popple
  2022-01-10 22:32 ` [PATCH v3 09/10] tools: update hmm-test to support device coherent type Alex Sierra
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:31 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Device Coherent type uses device memory that is coherently accesible by
the CPU. This could be shown as SP (special purpose) memory range
at the BIOS-e820 memory enumeration. If no SP memory is supported in
system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP.

Currently, test_hmm only supports two different SP ranges of at least
256MB size. This could be specified in the kernel parameter variable
efi_fake_mem. Ex. Two SP ranges of 1GB starting at 0x100000000 &
0x140000000 physical address. Ex.
efi_fake_mem=1G@0x100000000:0x40000,1G@0x140000000:0x40000

Private and coherent device mirror instances can be created in the same
probed. This is done by passing the module parameters spm_addr_dev0 &
spm_addr_dev1. In this case, it will create four instances of
device_mirror. The first two correspond to private device type, the
last two to coherent type. Then, they can be easily accessed from user
space through /dev/hmm_mirror<num_device>. Usually num_device 0 and 1
are for private, and 2 and 3 for coherent types. If no module
parameters are passed, two instances of private type device_mirror will
be created only.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 lib/test_hmm.c      | 247 ++++++++++++++++++++++++++++++++------------
 lib/test_hmm_uapi.h |  15 ++-
 2 files changed, 193 insertions(+), 69 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 9edeff52302e..7c641c5a9cfa 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -29,11 +29,22 @@
 
 #include "test_hmm_uapi.h"
 
-#define DMIRROR_NDEVICES		2
+#define DMIRROR_NDEVICES		4
 #define DMIRROR_RANGE_FAULT_TIMEOUT	1000
 #define DEVMEM_CHUNK_SIZE		(256 * 1024 * 1024U)
 #define DEVMEM_CHUNKS_RESERVE		16
 
+/*
+ * For device_private pages, dpage is just a dummy struct page
+ * representing a piece of device memory. dmirror_devmem_alloc_page
+ * allocates a real system memory page as backing storage to fake a
+ * real device. zone_device_data points to that backing page. But
+ * for device_coherent memory, the struct page represents real
+ * physical CPU-accessible memory that we can use directly.
+ */
+#define BACKING_PAGE(page) (is_device_private_page((page)) ? \
+			   (page)->zone_device_data : (page))
+
 static unsigned long spm_addr_dev0;
 module_param(spm_addr_dev0, long, 0644);
 MODULE_PARM_DESC(spm_addr_dev0,
@@ -122,6 +133,21 @@ static int dmirror_bounce_init(struct dmirror_bounce *bounce,
 	return 0;
 }
 
+static bool dmirror_is_private_zone(struct dmirror_device *mdevice)
+{
+	return (mdevice->zone_device_type ==
+		HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ? true : false;
+}
+
+static enum migrate_vma_direction
+	dmirror_select_device(struct dmirror *dmirror)
+{
+	return (dmirror->mdevice->zone_device_type ==
+		HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ?
+		MIGRATE_VMA_SELECT_DEVICE_PRIVATE :
+		MIGRATE_VMA_SELECT_DEVICE_COHERENT;
+}
+
 static void dmirror_bounce_fini(struct dmirror_bounce *bounce)
 {
 	vfree(bounce->ptr);
@@ -572,16 +598,19 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
 static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
 {
 	struct page *dpage = NULL;
-	struct page *rpage;
+	struct page *rpage = NULL;
 
 	/*
-	 * This is a fake device so we alloc real system memory to store
-	 * our device memory.
+	 * For ZONE_DEVICE private type, this is a fake device so we alloc real
+	 * system memory to store our device memory.
+	 * For ZONE_DEVICE coherent type we use the actual dpage to store the data
+	 * and ignore rpage.
 	 */
-	rpage = alloc_page(GFP_HIGHUSER);
-	if (!rpage)
-		return NULL;
-
+	if (dmirror_is_private_zone(mdevice)) {
+		rpage = alloc_page(GFP_HIGHUSER);
+		if (!rpage)
+			return NULL;
+	}
 	spin_lock(&mdevice->lock);
 
 	if (mdevice->free_pages) {
@@ -601,7 +630,8 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
 	return dpage;
 
 error:
-	__free_page(rpage);
+	if (rpage)
+		__free_page(rpage);
 	return NULL;
 }
 
@@ -627,12 +657,15 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
 		 * unallocated pte_none() or read-only zero page.
 		 */
 		spage = migrate_pfn_to_page(*src);
+		WARN(spage && is_zone_device_page(spage),
+		     "page already in device spage pfn: 0x%lx\n",
+		     page_to_pfn(spage));
 
 		dpage = dmirror_devmem_alloc_page(mdevice);
 		if (!dpage)
 			continue;
 
-		rpage = dpage->zone_device_data;
+		rpage = BACKING_PAGE(dpage);
 		if (spage)
 			copy_highpage(rpage, spage);
 		else
@@ -646,6 +679,8 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
 		 */
 		rpage->zone_device_data = dmirror;
 
+		pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 0x%lx\n",
+			 page_to_pfn(spage), page_to_pfn(dpage));
 		*dst = migrate_pfn(page_to_pfn(dpage)) |
 			    MIGRATE_PFN_LOCKED;
 		if ((*src & MIGRATE_PFN_WRITE) ||
@@ -724,11 +759,7 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
 		if (!dpage)
 			continue;
 
-		/*
-		 * Store the page that holds the data so the page table
-		 * doesn't have to deal with ZONE_DEVICE private pages.
-		 */
-		entry = dpage->zone_device_data;
+		entry = BACKING_PAGE(dpage);
 		if (*dst & MIGRATE_PFN_WRITE)
 			entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE);
 		entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
@@ -808,8 +839,101 @@ static int dmirror_exclusive(struct dmirror *dmirror,
 	return ret;
 }
 
-static int dmirror_migrate(struct dmirror *dmirror,
-			   struct hmm_dmirror_cmd *cmd)
+static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
+						      struct dmirror *dmirror)
+{
+	const unsigned long *src = args->src;
+	unsigned long *dst = args->dst;
+	unsigned long start = args->start;
+	unsigned long end = args->end;
+	unsigned long addr;
+
+	for (addr = start; addr < end; addr += PAGE_SIZE,
+				       src++, dst++) {
+		struct page *dpage, *spage;
+
+		spage = migrate_pfn_to_page(*src);
+		if (!spage || !(*src & MIGRATE_PFN_MIGRATE))
+			continue;
+
+		WARN_ON(!is_device_page(spage));
+		spage = BACKING_PAGE(spage);
+		dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr);
+		if (!dpage)
+			continue;
+		pr_debug("migrating from dev to sys pfn src: 0x%lx pfn dst: 0x%lx\n",
+			 page_to_pfn(spage), page_to_pfn(dpage));
+
+		lock_page(dpage);
+		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
+		copy_highpage(dpage, spage);
+		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+		if (*src & MIGRATE_PFN_WRITE)
+			*dst |= MIGRATE_PFN_WRITE;
+	}
+	return 0;
+}
+
+static int dmirror_migrate_to_system(struct dmirror *dmirror,
+				     struct hmm_dmirror_cmd *cmd)
+{
+	unsigned long start, end, addr;
+	unsigned long size = cmd->npages << PAGE_SHIFT;
+	struct mm_struct *mm = dmirror->notifier.mm;
+	struct vm_area_struct *vma;
+	unsigned long src_pfns[64];
+	unsigned long dst_pfns[64];
+	struct migrate_vma args;
+	unsigned long next;
+	int ret;
+
+	start = cmd->addr;
+	end = start + size;
+	if (end < start)
+		return -EINVAL;
+
+	/* Since the mm is for the mirrored process, get a reference first. */
+	if (!mmget_not_zero(mm))
+		return -EINVAL;
+
+	mmap_read_lock(mm);
+	for (addr = start; addr < end; addr = next) {
+		vma = vma_lookup(mm, addr);
+		if (!vma || !(vma->vm_flags & VM_READ)) {
+			ret = -EINVAL;
+			goto out;
+		}
+		next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT));
+		if (next > vma->vm_end)
+			next = vma->vm_end;
+
+		args.vma = vma;
+		args.src = src_pfns;
+		args.dst = dst_pfns;
+		args.start = addr;
+		args.end = next;
+		args.pgmap_owner = dmirror->mdevice;
+		args.flags = dmirror_select_device(dmirror);
+
+		ret = migrate_vma_setup(&args);
+		if (ret)
+			goto out;
+
+		pr_debug("Migrating from device mem to sys mem\n");
+		dmirror_devmem_fault_alloc_and_copy(&args, dmirror);
+
+		migrate_vma_pages(&args);
+		migrate_vma_finalize(&args);
+	}
+out:
+	mmap_read_unlock(mm);
+	mmput(mm);
+
+	return ret;
+}
+
+static int dmirror_migrate_to_device(struct dmirror *dmirror,
+				struct hmm_dmirror_cmd *cmd)
 {
 	unsigned long start, end, addr;
 	unsigned long size = cmd->npages << PAGE_SHIFT;
@@ -853,6 +977,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 		if (ret)
 			goto out;
 
+		pr_debug("Migrating from sys mem to device mem\n");
 		dmirror_migrate_alloc_and_copy(&args, dmirror);
 		migrate_vma_pages(&args);
 		dmirror_migrate_finalize_and_map(&args, dmirror);
@@ -861,7 +986,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
 	mmap_read_unlock(mm);
 	mmput(mm);
 
-	/* Return the migrated data for verification. */
+	/* Return the migrated data for verification. only for pages in device zone */
 	ret = dmirror_bounce_init(&bounce, start, size);
 	if (ret)
 		return ret;
@@ -898,12 +1023,22 @@ static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *range,
 	}
 
 	page = hmm_pfn_to_page(entry);
-	if (is_device_private_page(page)) {
-		/* Is the page migrated to this device or some other? */
-		if (dmirror->mdevice == dmirror_page_to_device(page))
+	if (is_device_page(page)) {
+		/* Is page ZONE_DEVICE coherent? */
+		if (!is_device_private_page(page)) {
+			if (dmirror->mdevice == dmirror_page_to_device(page))
+				*perm = HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL;
+			else
+				*perm = HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE;
+		/*
+		 * Is page ZONE_DEVICE private migrated to
+		 * this device or some other?
+		 */
+		} else if (dmirror->mdevice == dmirror_page_to_device(page)) {
 			*perm = HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL;
-		else
+		} else {
 			*perm = HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE;
+		}
 	} else if (is_zero_pfn(page_to_pfn(page)))
 		*perm = HMM_DMIRROR_PROT_ZERO;
 	else
@@ -1100,8 +1235,12 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
 		ret = dmirror_write(dmirror, &cmd);
 		break;
 
-	case HMM_DMIRROR_MIGRATE:
-		ret = dmirror_migrate(dmirror, &cmd);
+	case HMM_DMIRROR_MIGRATE_TO_DEV:
+		ret = dmirror_migrate_to_device(dmirror, &cmd);
+		break;
+
+	case HMM_DMIRROR_MIGRATE_TO_SYS:
+		ret = dmirror_migrate_to_system(dmirror, &cmd);
 		break;
 
 	case HMM_DMIRROR_EXCLUSIVE:
@@ -1142,14 +1281,13 @@ static const struct file_operations dmirror_fops = {
 
 static void dmirror_devmem_free(struct page *page)
 {
-	struct page *rpage = page->zone_device_data;
+	struct page *rpage = BACKING_PAGE(page);
 	struct dmirror_device *mdevice;
 
-	if (rpage)
+	if (rpage != page)
 		__free_page(rpage);
 
 	mdevice = dmirror_page_to_device(page);
-
 	spin_lock(&mdevice->lock);
 	mdevice->cfree++;
 	page->zone_device_data = mdevice->free_pages;
@@ -1157,38 +1295,6 @@ static void dmirror_devmem_free(struct page *page)
 	spin_unlock(&mdevice->lock);
 }
 
-static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
-						      struct dmirror *dmirror)
-{
-	const unsigned long *src = args->src;
-	unsigned long *dst = args->dst;
-	unsigned long start = args->start;
-	unsigned long end = args->end;
-	unsigned long addr;
-
-	for (addr = start; addr < end; addr += PAGE_SIZE,
-				       src++, dst++) {
-		struct page *dpage, *spage;
-
-		spage = migrate_pfn_to_page(*src);
-		if (!spage || !(*src & MIGRATE_PFN_MIGRATE))
-			continue;
-		spage = spage->zone_device_data;
-
-		dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr);
-		if (!dpage)
-			continue;
-
-		lock_page(dpage);
-		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
-		copy_highpage(dpage, spage);
-		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
-		if (*src & MIGRATE_PFN_WRITE)
-			*dst |= MIGRATE_PFN_WRITE;
-	}
-	return 0;
-}
-
 static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 {
 	struct migrate_vma args;
@@ -1203,7 +1309,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	 * the mirror but here we use it to hold the page for the simulated
 	 * device memory and that page holds the pointer to the mirror.
 	 */
-	rpage = vmf->page->zone_device_data;
+	rpage = BACKING_PAGE(vmf->page);
 	dmirror = rpage->zone_device_data;
 
 	/* FIXME demonstrate how we can adjust migrate range */
@@ -1213,7 +1319,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
 	args.src = &src_pfns;
 	args.dst = &dst_pfns;
 	args.pgmap_owner = dmirror->mdevice;
-	args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+	args.flags = dmirror_select_device(dmirror);
 
 	if (migrate_vma_setup(&args))
 		return VM_FAULT_SIGBUS;
@@ -1279,14 +1385,26 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
 static int __init hmm_dmirror_init(void)
 {
 	int ret;
-	int id;
+	int id = 0;
+	int ndevices = 0;
 
 	ret = alloc_chrdev_region(&dmirror_dev, 0, DMIRROR_NDEVICES,
 				  "HMM_DMIRROR");
 	if (ret)
 		goto err_unreg;
 
-	for (id = 0; id < DMIRROR_NDEVICES; id++) {
+	memset(dmirror_devices, 0, DMIRROR_NDEVICES * sizeof(dmirror_devices[0]));
+	dmirror_devices[ndevices++].zone_device_type =
+				HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
+	dmirror_devices[ndevices++].zone_device_type =
+				HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
+	if (spm_addr_dev0 && spm_addr_dev1) {
+		dmirror_devices[ndevices++].zone_device_type =
+					HMM_DMIRROR_MEMORY_DEVICE_COHERENT;
+		dmirror_devices[ndevices++].zone_device_type =
+					HMM_DMIRROR_MEMORY_DEVICE_COHERENT;
+	}
+	for (id = 0; id < ndevices; id++) {
 		ret = dmirror_device_init(dmirror_devices + id, id);
 		if (ret)
 			goto err_chrdev;
@@ -1308,7 +1426,8 @@ static void __exit hmm_dmirror_exit(void)
 	int id;
 
 	for (id = 0; id < DMIRROR_NDEVICES; id++)
-		dmirror_device_remove(dmirror_devices + id);
+		if (dmirror_devices[id].zone_device_type)
+			dmirror_device_remove(dmirror_devices + id);
 	unregister_chrdev_region(dmirror_dev, DMIRROR_NDEVICES);
 }
 
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 625f3690d086..e190b2ab6f19 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -33,11 +33,12 @@ struct hmm_dmirror_cmd {
 /* Expose the address space of the calling process through hmm device file */
 #define HMM_DMIRROR_READ		_IOWR('H', 0x00, struct hmm_dmirror_cmd)
 #define HMM_DMIRROR_WRITE		_IOWR('H', 0x01, struct hmm_dmirror_cmd)
-#define HMM_DMIRROR_MIGRATE		_IOWR('H', 0x02, struct hmm_dmirror_cmd)
-#define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x03, struct hmm_dmirror_cmd)
-#define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x04, struct hmm_dmirror_cmd)
-#define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x05, struct hmm_dmirror_cmd)
-#define HMM_DMIRROR_GET_MEM_DEV_TYPE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_MIGRATE_TO_DEV	_IOWR('H', 0x02, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_MIGRATE_TO_SYS	_IOWR('H', 0x03, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x04, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x05, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
+#define HMM_DMIRROR_GET_MEM_DEV_TYPE	_IOWR('H', 0x07, struct hmm_dmirror_cmd)
 
 /*
  * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
@@ -52,6 +53,8 @@ struct hmm_dmirror_cmd {
  *					device the ioctl() is made
  * HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE: Migrated device private page on some
  *					other device
+ * HMM_DMIRROR_PROT_DEV_COHERENT: Migrate device coherent page on the device
+ *				  the ioctl() is made
  */
 enum {
 	HMM_DMIRROR_PROT_ERROR			= 0xFF,
@@ -63,6 +66,8 @@ enum {
 	HMM_DMIRROR_PROT_ZERO			= 0x10,
 	HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL	= 0x20,
 	HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE	= 0x30,
+	HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL	= 0x40,
+	HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE	= 0x50,
 };
 
 enum {
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 09/10] tools: update hmm-test to support device coherent type
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (7 preceding siblings ...)
  2022-01-10 22:31 ` [PATCH v3 08/10] lib: add support for device coherent type in test_hmm Alex Sierra
@ 2022-01-10 22:32 ` Alex Sierra
  2022-01-20  6:14   ` Alistair Popple
  2022-01-10 22:32 ` [PATCH v3 10/10] tools: update test_hmm script to support SP config Alex Sierra
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:32 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Test cases such as migrate_fault and migrate_multiple, were modified to
explicit migrate from device to sys memory without the need of page
faults, when using device coherent type.

Snapshot test case updated to read memory device type first and based
on that, get the proper returned results migrate_ping_pong test case
added to test explicit migration from device to sys memory for both
private and coherent zone types.

Helpers to migrate from device to sys memory and vicerversa
were also added.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
v2:
Set FIXTURE_VARIANT to add multiple device types to the FIXTURE. This
will run all the tests for each device type (private and coherent) in
case both existed during hmm-test driver probed.
---
 tools/testing/selftests/vm/hmm-tests.c | 122 ++++++++++++++++++++-----
 1 file changed, 101 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
index 864f126ffd78..8eb81dfba4b3 100644
--- a/tools/testing/selftests/vm/hmm-tests.c
+++ b/tools/testing/selftests/vm/hmm-tests.c
@@ -44,6 +44,14 @@ struct hmm_buffer {
 	int		fd;
 	uint64_t	cpages;
 	uint64_t	faults;
+	int		zone_device_type;
+};
+
+enum {
+	HMM_PRIVATE_DEVICE_ONE,
+	HMM_PRIVATE_DEVICE_TWO,
+	HMM_COHERENCE_DEVICE_ONE,
+	HMM_COHERENCE_DEVICE_TWO,
 };
 
 #define TWOMEG		(1 << 21)
@@ -60,6 +68,21 @@ FIXTURE(hmm)
 	unsigned int	page_shift;
 };
 
+FIXTURE_VARIANT(hmm)
+{
+	int     device_number;
+};
+
+FIXTURE_VARIANT_ADD(hmm, hmm_device_private)
+{
+	.device_number = HMM_PRIVATE_DEVICE_ONE,
+};
+
+FIXTURE_VARIANT_ADD(hmm, hmm_device_coherent)
+{
+	.device_number = HMM_COHERENCE_DEVICE_ONE,
+};
+
 FIXTURE(hmm2)
 {
 	int		fd0;
@@ -68,6 +91,24 @@ FIXTURE(hmm2)
 	unsigned int	page_shift;
 };
 
+FIXTURE_VARIANT(hmm2)
+{
+	int     device_number0;
+	int     device_number1;
+};
+
+FIXTURE_VARIANT_ADD(hmm2, hmm2_device_private)
+{
+	.device_number0 = HMM_PRIVATE_DEVICE_ONE,
+	.device_number1 = HMM_PRIVATE_DEVICE_TWO,
+};
+
+FIXTURE_VARIANT_ADD(hmm2, hmm2_device_coherent)
+{
+	.device_number0 = HMM_COHERENCE_DEVICE_ONE,
+	.device_number1 = HMM_COHERENCE_DEVICE_TWO,
+};
+
 static int hmm_open(int unit)
 {
 	char pathname[HMM_PATH_MAX];
@@ -81,12 +122,19 @@ static int hmm_open(int unit)
 	return fd;
 }
 
+static bool hmm_is_coherent_type(int dev_num)
+{
+	return (dev_num >= HMM_COHERENCE_DEVICE_ONE);
+}
+
 FIXTURE_SETUP(hmm)
 {
 	self->page_size = sysconf(_SC_PAGE_SIZE);
 	self->page_shift = ffs(self->page_size) - 1;
 
-	self->fd = hmm_open(0);
+	self->fd = hmm_open(variant->device_number);
+	if (self->fd < 0 && hmm_is_coherent_type(variant->device_number))
+		SKIP(exit(0), "DEVICE_COHERENT not available");
 	ASSERT_GE(self->fd, 0);
 }
 
@@ -95,9 +143,11 @@ FIXTURE_SETUP(hmm2)
 	self->page_size = sysconf(_SC_PAGE_SIZE);
 	self->page_shift = ffs(self->page_size) - 1;
 
-	self->fd0 = hmm_open(0);
+	self->fd0 = hmm_open(variant->device_number0);
+	if (self->fd0 < 0 && hmm_is_coherent_type(variant->device_number0))
+		SKIP(exit(0), "DEVICE_COHERENT not available");
 	ASSERT_GE(self->fd0, 0);
-	self->fd1 = hmm_open(1);
+	self->fd1 = hmm_open(variant->device_number1);
 	ASSERT_GE(self->fd1, 0);
 }
 
@@ -144,6 +194,7 @@ static int hmm_dmirror_cmd(int fd,
 	}
 	buffer->cpages = cmd.cpages;
 	buffer->faults = cmd.faults;
+	buffer->zone_device_type = cmd.zone_device_type;
 
 	return 0;
 }
@@ -211,6 +262,20 @@ static void hmm_nanosleep(unsigned int n)
 	nanosleep(&t, NULL);
 }
 
+static int hmm_migrate_sys_to_dev(int fd,
+				   struct hmm_buffer *buffer,
+				   unsigned long npages)
+{
+	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages);
+}
+
+static int hmm_migrate_dev_to_sys(int fd,
+				   struct hmm_buffer *buffer,
+				   unsigned long npages)
+{
+	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_SYS, buffer, npages);
+}
+
 /*
  * Simple NULL test of device open/close.
  */
@@ -875,7 +940,7 @@ TEST_F(hmm, migrate)
 		ptr[i] = i;
 
 	/* Migrate memory to device. */
-	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, npages);
 
@@ -923,7 +988,7 @@ TEST_F(hmm, migrate_fault)
 		ptr[i] = i;
 
 	/* Migrate memory to device. */
-	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, npages);
 
@@ -936,7 +1001,7 @@ TEST_F(hmm, migrate_fault)
 		ASSERT_EQ(ptr[i], i);
 
 	/* Migrate memory to the device again. */
-	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, npages);
 
@@ -976,7 +1041,7 @@ TEST_F(hmm, migrate_shared)
 	ASSERT_NE(buffer->ptr, MAP_FAILED);
 
 	/* Migrate memory to device. */
-	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 	ASSERT_EQ(ret, -ENOENT);
 
 	hmm_buffer_free(buffer);
@@ -1015,7 +1080,7 @@ TEST_F(hmm2, migrate_mixed)
 	p = buffer->ptr;
 
 	/* Migrating a protected area should be an error. */
-	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, npages);
+	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, npages);
 	ASSERT_EQ(ret, -EINVAL);
 
 	/* Punch a hole after the first page address. */
@@ -1023,7 +1088,7 @@ TEST_F(hmm2, migrate_mixed)
 	ASSERT_EQ(ret, 0);
 
 	/* We expect an error if the vma doesn't cover the range. */
-	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 3);
+	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 3);
 	ASSERT_EQ(ret, -EINVAL);
 
 	/* Page 2 will be a read-only zero page. */
@@ -1055,13 +1120,13 @@ TEST_F(hmm2, migrate_mixed)
 
 	/* Now try to migrate pages 2-5 to device 1. */
 	buffer->ptr = p + 2 * self->page_size;
-	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 4);
+	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 4);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, 4);
 
 	/* Page 5 won't be migrated to device 0 because it's on device 1. */
 	buffer->ptr = p + 5 * self->page_size;
-	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
+	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
 	ASSERT_EQ(ret, -ENOENT);
 	buffer->ptr = p;
 
@@ -1070,8 +1135,12 @@ TEST_F(hmm2, migrate_mixed)
 }
 
 /*
- * Migrate anonymous memory to device private memory and fault it back to system
- * memory multiple times.
+ * Migrate anonymous memory to device memory and back to system memory
+ * multiple times. In case of private zone configuration, this is done
+ * through fault pages accessed by CPU. In case of coherent zone configuration,
+ * the pages from the device should be explicitly migrated back to system memory.
+ * The reason is Coherent device zone has coherent access by CPU, therefore
+ * it will not generate any page fault.
  */
 TEST_F(hmm, migrate_multiple)
 {
@@ -1107,8 +1176,7 @@ TEST_F(hmm, migrate_multiple)
 			ptr[i] = i;
 
 		/* Migrate memory to device. */
-		ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer,
-				      npages);
+		ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 		ASSERT_EQ(ret, 0);
 		ASSERT_EQ(buffer->cpages, npages);
 
@@ -1116,7 +1184,12 @@ TEST_F(hmm, migrate_multiple)
 		for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 			ASSERT_EQ(ptr[i], i);
 
-		/* Fault pages back to system memory and check them. */
+		/* Migrate back to system memory and check them. */
+		if (hmm_is_coherent_type(variant->device_number)) {
+			ret = hmm_migrate_dev_to_sys(self->fd, buffer, npages);
+			ASSERT_EQ(ret, 0);
+		}
+
 		for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
 			ASSERT_EQ(ptr[i], i);
 
@@ -1312,13 +1385,13 @@ TEST_F(hmm2, snapshot)
 
 	/* Page 5 will be migrated to device 0. */
 	buffer->ptr = p + 5 * self->page_size;
-	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
+	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, 1);
 
 	/* Page 6 will be migrated to device 1. */
 	buffer->ptr = p + 6 * self->page_size;
-	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 1);
+	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 1);
 	ASSERT_EQ(ret, 0);
 	ASSERT_EQ(buffer->cpages, 1);
 
@@ -1335,9 +1408,16 @@ TEST_F(hmm2, snapshot)
 	ASSERT_EQ(m[2], HMM_DMIRROR_PROT_ZERO | HMM_DMIRROR_PROT_READ);
 	ASSERT_EQ(m[3], HMM_DMIRROR_PROT_READ);
 	ASSERT_EQ(m[4], HMM_DMIRROR_PROT_WRITE);
-	ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
-			HMM_DMIRROR_PROT_WRITE);
-	ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
+	if (!hmm_is_coherent_type(variant->device_number0)) {
+		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
+				HMM_DMIRROR_PROT_WRITE);
+		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
+	} else {
+		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL |
+				HMM_DMIRROR_PROT_WRITE);
+		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE |
+				HMM_DMIRROR_PROT_WRITE);
+	}
 
 	hmm_buffer_free(buffer);
 }
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 10/10] tools: update test_hmm script to support SP config
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (8 preceding siblings ...)
  2022-01-10 22:32 ` [PATCH v3 09/10] tools: update hmm-test to support device coherent type Alex Sierra
@ 2022-01-10 22:32 ` Alex Sierra
  2022-01-20  6:17   ` Alistair Popple
  2022-01-12 11:06 ` [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alistair Popple
  2022-01-12 11:16 ` David Hildenbrand
  11 siblings, 1 reply; 24+ messages in thread
From: Alex Sierra @ 2022-01-10 22:32 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Add two more parameters to set spm_addr_dev0 & spm_addr_dev1
addresses. These two parameters configure the start SP
addresses for each device in test_hmm driver.
Consequently, this configures zone device type as coherent.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
v2:
Add more mknods for device coherent type. These are represented under
/dev/hmm_mirror2 and /dev/hmm_mirror3, only in case they have created
at probing the hmm-test driver.
---
 tools/testing/selftests/vm/test_hmm.sh | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/test_hmm.sh b/tools/testing/selftests/vm/test_hmm.sh
index 0647b525a625..539c9371e592 100755
--- a/tools/testing/selftests/vm/test_hmm.sh
+++ b/tools/testing/selftests/vm/test_hmm.sh
@@ -40,11 +40,26 @@ check_test_requirements()
 
 load_driver()
 {
-	modprobe $DRIVER > /dev/null 2>&1
+	if [ $# -eq 0 ]; then
+		modprobe $DRIVER > /dev/null 2>&1
+	else
+		if [ $# -eq 2 ]; then
+			modprobe $DRIVER spm_addr_dev0=$1 spm_addr_dev1=$2
+				> /dev/null 2>&1
+		else
+			echo "Missing module parameters. Make sure pass"\
+			"spm_addr_dev0 and spm_addr_dev1"
+			usage
+		fi
+	fi
 	if [ $? == 0 ]; then
 		major=$(awk "\$2==\"HMM_DMIRROR\" {print \$1}" /proc/devices)
 		mknod /dev/hmm_dmirror0 c $major 0
 		mknod /dev/hmm_dmirror1 c $major 1
+		if [ $# -eq 2 ]; then
+			mknod /dev/hmm_dmirror2 c $major 2
+			mknod /dev/hmm_dmirror3 c $major 3
+		fi
 	fi
 }
 
@@ -58,7 +73,7 @@ run_smoke()
 {
 	echo "Running smoke test. Note, this test provides basic coverage."
 
-	load_driver
+	load_driver $1 $2
 	$(dirname "${BASH_SOURCE[0]}")/hmm-tests
 	unload_driver
 }
@@ -75,6 +90,9 @@ usage()
 	echo "# Smoke testing"
 	echo "./${TEST_NAME}.sh smoke"
 	echo
+	echo "# Smoke testing with SPM enabled"
+	echo "./${TEST_NAME}.sh smoke <spm_addr_dev0> <spm_addr_dev1>"
+	echo
 	exit 0
 }
 
@@ -84,7 +102,7 @@ function run_test()
 		usage
 	else
 		if [ "$1" = "smoke" ]; then
-			run_smoke
+			run_smoke $2 $3
 		else
 			usage
 		fi
-- 
2.32.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (9 preceding siblings ...)
  2022-01-10 22:32 ` [PATCH v3 10/10] tools: update test_hmm script to support SP config Alex Sierra
@ 2022-01-12 11:06 ` Alistair Popple
  2022-01-20  6:33   ` Alistair Popple
  2022-01-12 11:16 ` David Hildenbrand
  11 siblings, 1 reply; 24+ messages in thread
From: Alistair Popple @ 2022-01-12 11:06 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

I have been looking at this in relation to the migration code and noticed we
have the following in try_to_migrate():

        if (is_zone_device_page(page) && !is_device_private_page(page))
                return;

Which if I'm understanding correctly means that migration of device coherent
pages will always fail. Given that I do wonder how hmm-tests are passing, but
I assume you must always be hitting this fast path in
migrate_vma_collect_pmd():

                /*
                 * Optimize for the common case where page is only mapped once
                 * in one process. If we can lock the page, then we can safely
                 * set up a special migration page table entry now.
                 */

Meaning that try_to_migrate() never gets called from migrate_vma_unmap(). So
you will also need some changes to try_to_migrate() and possibly
try_to_migrate_one() to make this reliable.

 - Alistair

On Tuesday, 11 January 2022 9:31:51 AM AEDT Alex Sierra wrote:
> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like
> MEMORY_DEVICE_PRIVATE.
> 
> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
> cleanup patch into our hardware page migration patchset originally came
> from you, but it proved impractical to do things in that order because
> the refcount cleanup introduced a bug with wide ranging structural
> implications. Instead, we amended Ralph’s patch so that it could be
> applied after merging the migration work. As we saw from the recent
> discussion, merging the refcount work is going to take some time and
> cooperation between multiple development groups, while the migration
> work is ready now and is needed now. So we propose to merge this
> patchset first and continue to work with Ralph and others to merge the
> refcount cleanup separately, when it is ready.
> 
> This patch series is mostly self-contained except for a few places where
> it needs to update other subsystems to handle the new memory type.
> System stability and performance are not affected according to our
> ongoing testing, including xfstests.
> 
> How it works: The system BIOS advertises the GPU device memory
> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
> map.
> 
> The amdgpu driver registers the memory with devmap as
> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
> this hardware page migration capability is the Frontier supercomputer
> project. This functionality is not AMD-specific. We expect other GPU
> vendors to find this functionality useful, and possibly other hardware
> types in the future.
> 
> Our test nodes in the lab are similar to the Frontier configuration,
> with .5 TB of system memory plus 256 GB of device memory split across
> 4 GPUs, all in a single coherent address space. Page migration is
> expected to improve application efficiency significantly. We will
> report empirical results as they become available.
> 
> We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
> patch set builds on HMM and our SVM memory manager already merged in
> 5.15.
> 
> v2:
> - test_hmm is now able to create private and coherent device mirror
> instances in the same driver probe. This adds more usability to the hmm
> test by not having to remove the kernel module for each device type
> test (private/coherent type). This is done by passing the module
> parameters spm_addr_dev0 & spm_addr_dev1. In this case, it will create
> four instances of device_mirror. The first two correspond to private
> device type, the last two to coherent type. Then, they can be easily
> accessed from user space through /dev/hmm_mirror<num_device>. Usually
> num_device 0 and 1 are for private, and 2 and 3 for coherent types.
> 
> - Coherent device type pages at gup are now migrated back to system
> memory if they have been long term pinned (FOLL_LONGTERM). The reason
> is these pages could eventually interfere with their own device memory
> manager. A new hmm_gup_test has been added to the hmm-test to test this
> functionality. It makes use of the gup_test module to long term pin
> user pages that have been migrate to device memory first.
> 
> - Other patch corrections made by Felix, Alistair and Christoph.
> 
> v3:
> - Based on last v2 feedback we got from Alistair, we've decided to
> remove migration logic for FOLL_LONGTERM coherent device type pages at
> gup for now. Ideally, this should be done through the kernel mm,
> instead of calling the device driver to do it. Currently, there's no
> support for migrating device pages based on pfn, mainly because
> migrate_pages() relies on pages being LRU pages. Alistair mentioned, he
> has started to work on adding this migrate device pages logic. For now,
> we fail on get_user_pages call with FOLL_LONGTERM for DEVICE_COHERENT
> pages.
> 
> - Also, hmm_gup_test has been removed from hmm-test. We plan to include
> it again after this migration work is ready.
> 
> - Addressed Liam Howlett's feedback changes.
> 
> Alex Sierra (10):
>   mm: add zone device coherent type memory support
>   mm: add device coherent vma selection for memory migration
>   mm/gup: fail get_user_pages for LONGTERM dev coherent type
>   drm/amdkfd: add SPM support for SVM
>   drm/amdkfd: coherent type as sys mem on migration to ram
>   lib: test_hmm add ioctl to get zone device type
>   lib: test_hmm add module param for zone device type
>   lib: add support for device coherent type in test_hmm
>   tools: update hmm-test to support device coherent type
>   tools: update test_hmm script to support SP config
> 
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
>  include/linux/memremap.h                 |   8 +
>  include/linux/migrate.h                  |   1 +
>  include/linux/mm.h                       |  16 ++
>  lib/test_hmm.c                           | 333 +++++++++++++++++------
>  lib/test_hmm_uapi.h                      |  22 +-
>  mm/gup.c                                 |   7 +
>  mm/memcontrol.c                          |   6 +-
>  mm/memory-failure.c                      |   8 +-
>  mm/memremap.c                            |   5 +-
>  mm/migrate.c                             |  30 +-
>  tools/testing/selftests/vm/hmm-tests.c   | 122 +++++++--
>  tools/testing/selftests/vm/test_hmm.sh   |  24 +-
>  13 files changed, 475 insertions(+), 141 deletions(-)
> 
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
  2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
                   ` (10 preceding siblings ...)
  2022-01-12 11:06 ` [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alistair Popple
@ 2022-01-12 11:16 ` David Hildenbrand
  2022-01-12 16:08   ` Felix Kuehling
  11 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2022-01-12 11:16 UTC (permalink / raw)
  To: Alex Sierra, akpm, Felix.Kuehling, linux-mm, rcampbell,
	linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

On 10.01.22 23:31, Alex Sierra wrote:
> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like
> MEMORY_DEVICE_PRIVATE.
> 
> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
> cleanup patch into our hardware page migration patchset originally came
> from you, but it proved impractical to do things in that order because
> the refcount cleanup introduced a bug with wide ranging structural
> implications. Instead, we amended Ralph’s patch so that it could be
> applied after merging the migration work. As we saw from the recent
> discussion, merging the refcount work is going to take some time and
> cooperation between multiple development groups, while the migration
> work is ready now and is needed now. So we propose to merge this
> patchset first and continue to work with Ralph and others to merge the
> refcount cleanup separately, when it is ready.
> 
> This patch series is mostly self-contained except for a few places where
> it needs to update other subsystems to handle the new memory type.
> System stability and performance are not affected according to our
> ongoing testing, including xfstests.
> 
> How it works: The system BIOS advertises the GPU device memory
> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
> map.
> 
> The amdgpu driver registers the memory with devmap as
> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
> this hardware page migration capability is the Frontier supercomputer
> project. This functionality is not AMD-specific. We expect other GPU
> vendors to find this functionality useful, and possibly other hardware
> types in the future.
> 
> Our test nodes in the lab are similar to the Frontier configuration,
> with .5 TB of system memory plus 256 GB of device memory split across
> 4 GPUs, all in a single coherent address space. Page migration is
> expected to improve application efficiency significantly. We will
> report empirical results as they become available.

Hi,

might be a dumb question because I'm not too familiar with
MEMORY_DEVICE_COHERENT, but who's in charge of migrating *to* that
memory? Or how does a process ever get a grab on such pages?

And where does migration come into play? I assume migration is only
required to migrate off of that device memory to ordinary system RAM
when required because the device memory has to be freed up, correct?

(a high level description on how this is exploited from users space
would be great)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
  2022-01-12 11:16 ` David Hildenbrand
@ 2022-01-12 16:08   ` Felix Kuehling
  0 siblings, 0 replies; 24+ messages in thread
From: Felix Kuehling @ 2022-01-12 16:08 UTC (permalink / raw)
  To: David Hildenbrand, Alex Sierra, akpm, linux-mm, rcampbell,
	linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, apopple, willy

Am 2022-01-12 um 6:16 a.m. schrieb David Hildenbrand:
> On 10.01.22 23:31, Alex Sierra wrote:
>> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
>> owned by a device that can be mapped into CPU page tables like
>> MEMORY_DEVICE_GENERIC and can also be migrated like
>> MEMORY_DEVICE_PRIVATE.
>>
>> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
>> cleanup patch into our hardware page migration patchset originally came
>> from you, but it proved impractical to do things in that order because
>> the refcount cleanup introduced a bug with wide ranging structural
>> implications. Instead, we amended Ralph’s patch so that it could be
>> applied after merging the migration work. As we saw from the recent
>> discussion, merging the refcount work is going to take some time and
>> cooperation between multiple development groups, while the migration
>> work is ready now and is needed now. So we propose to merge this
>> patchset first and continue to work with Ralph and others to merge the
>> refcount cleanup separately, when it is ready.
>>
>> This patch series is mostly self-contained except for a few places where
>> it needs to update other subsystems to handle the new memory type.
>> System stability and performance are not affected according to our
>> ongoing testing, including xfstests.
>>
>> How it works: The system BIOS advertises the GPU device memory
>> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
>> map.
>>
>> The amdgpu driver registers the memory with devmap as
>> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
>> this hardware page migration capability is the Frontier supercomputer
>> project. This functionality is not AMD-specific. We expect other GPU
>> vendors to find this functionality useful, and possibly other hardware
>> types in the future.
>>
>> Our test nodes in the lab are similar to the Frontier configuration,
>> with .5 TB of system memory plus 256 GB of device memory split across
>> 4 GPUs, all in a single coherent address space. Page migration is
>> expected to improve application efficiency significantly. We will
>> report empirical results as they become available.
> Hi,
>
> might be a dumb question because I'm not too familiar with
> MEMORY_DEVICE_COHERENT, but who's in charge of migrating *to* that
> memory? Or how does a process ever get a grab on such pages?

Device memory management and migration to device memory work the same as
MEMORY_DEVICE_PRIVATE. The device driver is in charge of managing the
memory and migrating data to it in response to application requests
(e.g. hipMemPrefetchAsync) or device page faults.

The nice thing about MEMORY_DEVICE_COHERENT is, that the CPU, or a 3rd
party device (e.g. a NIC) can access the memory without migrations
disrupting execution of high performance application code on the GPU.


>
> And where does migration come into play? I assume migration is only
> required to migrate off of that device memory to ordinary system RAM
> when required because the device memory has to be freed up, correct?

That's one case. For example memory pressure can force the GPU driver to
evict some device-coherent memory back to system memory. Also,
applications can request a migration to system memory explicitly (again
with something like hipMemPrefetchAsync).

Regards,
  Felix


>
> (a high level description on how this is exploited from users space
> would be great)
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 01/10] mm: add zone device coherent type memory support
  2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
@ 2022-01-20  4:08   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  4:08 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

On Tuesday, 11 January 2022 9:31:52 AM AEDT Alex Sierra wrote:
> Device memory that is cache coherent from device and CPU point of view.
> This is used on platforms that have an advanced system bus (like CAPI
> or CXL). Any page of a process can be migrated to such memory. However,
> no one should be allowed to pin such memory so that it can always be
> evicted.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>  include/linux/memremap.h |  8 ++++++++
>  include/linux/mm.h       | 16 ++++++++++++++++
>  mm/memcontrol.c          |  6 +++---
>  mm/memory-failure.c      |  8 ++++++--
>  mm/memremap.c            |  5 ++++-
>  mm/migrate.c             | 21 +++++++++++++--------
>  6 files changed, 50 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index c0e9d35889e8..ff4d398edf35 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -39,6 +39,13 @@ struct vmem_altmap {
>   * A more complete discussion of unaddressable memory may be found in
>   * include/linux/hmm.h and Documentation/vm/hmm.rst.
>   *
> + * MEMORY_DEVICE_COHERENT:
> + * Device memory that is cache coherent from device and CPU point of view. This
> + * is used on platforms that have an advanced system bus (like CAPI or CXL). A
> + * driver can hotplug the device memory using ZONE_DEVICE and with that memory
> + * type. Any page of a process can be migrated to such memory. However no one
> + * should be allowed to pin such memory so that it can always be evicted.
> + *
>   * MEMORY_DEVICE_FS_DAX:
>   * Host memory that has similar access semantics as System RAM i.e. DMA
>   * coherent and supports page pinning. In support of coordinating page
> @@ -59,6 +66,7 @@ struct vmem_altmap {
>  enum memory_type {
>  	/* 0 is reserved to catch uninitialized type fields */
>  	MEMORY_DEVICE_PRIVATE = 1,
> +	MEMORY_DEVICE_COHERENT,
>  	MEMORY_DEVICE_FS_DAX,
>  	MEMORY_DEVICE_GENERIC,
>  	MEMORY_DEVICE_PCI_P2PDMA,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 73a52aba448f..fcf96c0fc918 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,7 @@ static inline bool page_is_devmap_managed(struct page *page)
>  		return false;
>  	switch (page->pgmap->type) {
>  	case MEMORY_DEVICE_PRIVATE:
> +	case MEMORY_DEVICE_COHERENT:
>  	case MEMORY_DEVICE_FS_DAX:
>  		return true;
>  	default:
> @@ -1191,6 +1192,21 @@ static inline bool is_device_private_page(const struct page *page)
>  		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
>  }
>  
> +static inline bool is_device_coherent_page(const struct page *page)
> +{
> +	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> +		is_zone_device_page(page) &&
> +		page->pgmap->type == MEMORY_DEVICE_COHERENT;
> +}
> +
> +static inline bool is_device_page(const struct page *page)

I wish we could think of a better name for this - it's too similar to
is_zone_device_page() so I can never remember if it includes FS_DAX pages or
not. Unfortunately I don't have any better suggestions though.

> +{
> +	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> +		is_zone_device_page(page) &&
> +		(page->pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +		page->pgmap->type == MEMORY_DEVICE_COHERENT);
> +}
> +
>  static inline bool is_pci_p2pdma_page(const struct page *page)
>  {
>  	return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6da5020a8656..d0bab0747c73 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5695,8 +5695,8 @@ static int mem_cgroup_move_account(struct page *page,
>   *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
>   *     target for charge migration. if @target is not NULL, the entry is stored
>   *     in target->ent.
> - *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PRIVATE
> - *     (so ZONE_DEVICE page and thus not on the lru).
> + *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is device memory and
> + *   thus not on the lru.
>   *     For now we such page is charge like a regular page would be as for all
>   *     intent and purposes it is just special memory taking the place of a
>   *     regular page.
> @@ -5730,7 +5730,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
>  		 */
>  		if (page_memcg(page) == mc.from) {
>  			ret = MC_TARGET_PAGE;
> -			if (is_device_private_page(page))
> +			if (is_device_page(page))
>  				ret = MC_TARGET_DEVICE;
>  			if (target)
>  				target->page = page;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3e6449f2102a..4cf212e5f432 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1554,12 +1554,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  		goto unlock;
>  	}
>  
> -	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> +	switch (pgmap->type) {
> +	case MEMORY_DEVICE_PRIVATE:
> +	case MEMORY_DEVICE_COHERENT:
>  		/*
> -		 * TODO: Handle HMM pages which may need coordination
> +		 * TODO: Handle device pages which may need coordination
>  		 * with device-side memory.
>  		 */
>  		goto unlock;
> +	default:
> +		break;
>  	}
>  
>  	/*
> diff --git a/mm/memremap.c b/mm/memremap.c
> index ed593bf87109..94d6a1e01d42 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -44,6 +44,7 @@ EXPORT_SYMBOL(devmap_managed_key);
>  static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  {
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_COHERENT ||
>  	    pgmap->type == MEMORY_DEVICE_FS_DAX)
>  		static_branch_dec(&devmap_managed_key);
>  }
> @@ -51,6 +52,7 @@ static void devmap_managed_enable_put(struct dev_pagemap *pgmap)
>  static void devmap_managed_enable_get(struct dev_pagemap *pgmap)
>  {
>  	if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
> +	    pgmap->type == MEMORY_DEVICE_COHERENT ||
>  	    pgmap->type == MEMORY_DEVICE_FS_DAX)
>  		static_branch_inc(&devmap_managed_key);
>  }
> @@ -328,6 +330,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  
>  	switch (pgmap->type) {
>  	case MEMORY_DEVICE_PRIVATE:
> +	case MEMORY_DEVICE_COHERENT:
>  		if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {

This will fail if device private support isn't enabled. Does device coherent
need to depend on that?

>  			WARN(1, "Device private memory not supported\n");
>  			return ERR_PTR(-EINVAL);

Also there is this check further down:

                if (!pgmap->ops || !pgmap->ops->migrate_to_ram) {

Device coherent pages don't use the migrate_to_ram() callback, so this should
check for migrate_to_ram == NULL in that case.

                        WARN(1, "Missing migrate_to_ram method\n");
                        return ERR_PTR(-EINVAL);
                }


> @@ -498,7 +501,7 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
>  void free_devmap_managed_page(struct page *page)
>  {
>  	/* notify page idle for dax */
> -	if (!is_device_private_page(page)) {
> +	if (!is_device_page(page)) {
>  		wake_up_var(&page->_refcount);
>  		return;
>  	}
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1852d787e6ab..91018880dc7f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -362,7 +362,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page)
>  	 * Device private pages have an extra refcount as they are
>  	 * ZONE_DEVICE pages.
>  	 */
> -	expected_count += is_device_private_page(page);
> +	expected_count += is_device_page(page);
>  	if (mapping)
>  		expected_count += thp_nr_pages(page) + page_has_private(page);
>  
> @@ -2503,7 +2503,7 @@ static bool migrate_vma_check_page(struct page *page)
>  		 * FIXME proper solution is to rework migration_entry_wait() so
>  		 * it does not need to take a reference on page.
>  		 */
> -		return is_device_private_page(page);
> +		return is_device_page(page);
>  	}
>  	/* For file back page */
> @@ -2791,7 +2791,7 @@ EXPORT_SYMBOL(migrate_vma_setup);
>   *     handle_pte_fault()
>   *       do_anonymous_page()
>   * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
> - * private page.
> + * private or coherent page.
>   */
>  static void migrate_vma_insert_page(struct migrate_vma *migrate,
>  				    unsigned long addr,
> @@ -2867,10 +2867,15 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
>  				swp_entry = make_readable_device_private_entry(
>  							page_to_pfn(page));
>  			entry = swp_entry_to_pte(swp_entry);
> +		} else if (is_device_coherent_page(page)) {
> +			entry = pte_mkold(mk_pte(page,
> +						 READ_ONCE(vma->vm_page_prot)));
> +			if (vma->vm_flags & VM_WRITE)
> +				entry = pte_mkwrite(pte_mkdirty(entry));

As I understand things device coherent pages use normal PTEs, so it would be
good if you could consolidate this to use the same code path as for normal
pages.

>  		} else {
>  			/*
> -			 * For now we only support migrating to un-addressable
> -			 * device memory.
> +			 * We support migrating to private and coherent types
> +			 * for device zone memory.
>  			 */
>  			pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
>  			goto abort;
> @@ -2976,10 +2981,10 @@ void migrate_vma_pages(struct migrate_vma *migrate)
>  		mapping = page_mapping(page);
>  
>  		if (is_zone_device_page(newpage)) {
> -			if (is_device_private_page(newpage)) {
> +			if (is_device_page(newpage)) {
>  				/*
> -				 * For now only support private anonymous when
> -				 * migrating to un-addressable device memory.
> +				 * For now only support private and coherent
> +				 * anonymous when migrating to device memory.
>  				 */
>  				if (mapping) {
>  					migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type
  2022-01-10 22:31 ` [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type Alex Sierra
@ 2022-01-20  5:01   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  5:01 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

On Tuesday, 11 January 2022 9:31:57 AM AEDT Alex Sierra wrote:

[...]

> +enum {
> +	/* 0 is reserved to catch uninitialized type fields */

This seems unnecessary and can be dropped to start at zero.

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> +	HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
> +};
> +
>  #endif /* _LIB_TEST_HMM_UAPI_H */
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 07/10] lib: test_hmm add module param for zone device type
  2022-01-10 22:31 ` [PATCH v3 07/10] lib: test_hmm add module param for " Alex Sierra
@ 2022-01-20  5:23   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  5:23 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

Thanks for splitting the coherent devices into separate device nodes. Couple of
comments below.

On Tuesday, 11 January 2022 9:31:58 AM AEDT Alex Sierra wrote:
> In order to configure device coherent in test_hmm, two module parameters
> should be passed, which correspond to the SP start address of each
> device (2) spm_addr_dev0 & spm_addr_dev1. If no parameters are passed,
> private device type is configured.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>  lib/test_hmm.c      | 74 +++++++++++++++++++++++++++++++--------------
>  lib/test_hmm_uapi.h |  1 +
>  2 files changed, 53 insertions(+), 22 deletions(-)
> 
> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> index 97e48164d56a..9edeff52302e 100644
> --- a/lib/test_hmm.c
> +++ b/lib/test_hmm.c
> @@ -34,6 +34,16 @@
>  #define DEVMEM_CHUNK_SIZE		(256 * 1024 * 1024U)
>  #define DEVMEM_CHUNKS_RESERVE		16
>  
> +static unsigned long spm_addr_dev0;
> +module_param(spm_addr_dev0, long, 0644);
> +MODULE_PARM_DESC(spm_addr_dev0,
> +		"Specify start address for SPM (special purpose memory) used for device 0. By setting this Coherent device type will be used. Make sure spm_addr_dev1 is set too");

It would be useful if you could mention the required size for this region
(ie. DEVMEM_CHUNK_SIZE).

> +
> +static unsigned long spm_addr_dev1;
> +module_param(spm_addr_dev1, long, 0644);
> +MODULE_PARM_DESC(spm_addr_dev1,
> +		"Specify start address for SPM (special purpose memory) used for device 1. By setting this Coherent device type will be used. Make sure spm_addr_dev0 is set too");
> +
>  static const struct dev_pagemap_ops dmirror_devmem_ops;
>  static const struct mmu_interval_notifier_ops dmirror_min_ops;
>  static dev_t dmirror_dev;
> @@ -452,29 +462,44 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
>  	return ret;
>  }
>  
> -static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
> +static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
>  				   struct page **ppage)
>  {
>  	struct dmirror_chunk *devmem;
> -	struct resource *res;
> +	struct resource *res = NULL;
>  	unsigned long pfn;
>  	unsigned long pfn_first;
>  	unsigned long pfn_last;
>  	void *ptr;
> +	int ret = -ENOMEM;
>  
>  	devmem = kzalloc(sizeof(*devmem), GFP_KERNEL);
>  	if (!devmem)
> -		return false;
> +		return ret;
>  
> -	res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
> -				      "hmm_dmirror");
> -	if (IS_ERR(res))
> +	switch (mdevice->zone_device_type) {
> +	case HMM_DMIRROR_MEMORY_DEVICE_PRIVATE:
> +		res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
> +					      "hmm_dmirror");
> +		if (IS_ERR_OR_NULL(res))
> +			goto err_devmem;
> +		devmem->pagemap.range.start = res->start;
> +		devmem->pagemap.range.end = res->end;
> +		devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
> +		break;
> +	case HMM_DMIRROR_MEMORY_DEVICE_COHERENT:
> +		devmem->pagemap.range.start = (MINOR(mdevice->cdevice.dev) - 2) ?
> +							spm_addr_dev0 :
> +							spm_addr_dev1;
> +		devmem->pagemap.range.end = devmem->pagemap.range.start +
> +					    DEVMEM_CHUNK_SIZE - 1;
> +		devmem->pagemap.type = MEMORY_DEVICE_COHERENT;
> +		break;
> +	default:
> +		ret = -EINVAL;
>  		goto err_devmem;
> +	}
>  
> -	mdevice->zone_device_type = HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;

What initialises mdevice->zone_device_type now? It looks like it needs to get
initialised in hmm_dmirror_init(), which would be easier to do in the previous
patch rather than adding it here in the first place.

> -	devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
> -	devmem->pagemap.range.start = res->start;
> -	devmem->pagemap.range.end = res->end;
>  	devmem->pagemap.nr_range = 1;
>  	devmem->pagemap.ops = &dmirror_devmem_ops;
>  	devmem->pagemap.owner = mdevice;
> @@ -495,10 +520,14 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
>  		mdevice->devmem_capacity = new_capacity;
>  		mdevice->devmem_chunks = new_chunks;
>  	}
> -
>  	ptr = memremap_pages(&devmem->pagemap, numa_node_id());
> -	if (IS_ERR(ptr))
> +	if (IS_ERR_OR_NULL(ptr)) {
> +		if (ptr)
> +			ret = PTR_ERR(ptr);
> +		else
> +			ret = -EFAULT;
>  		goto err_release;
> +	}
>  
>  	devmem->mdevice = mdevice;
>  	pfn_first = devmem->pagemap.range.start >> PAGE_SHIFT;
> @@ -527,15 +556,17 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
>  	}
>  	spin_unlock(&mdevice->lock);
>  
> -	return true;
> +	return 0;
>  
>  err_release:
>  	mutex_unlock(&mdevice->devmem_lock);
> -	release_mem_region(devmem->pagemap.range.start, range_len(&devmem->pagemap.range));
> +	if (res && devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
> +		release_mem_region(devmem->pagemap.range.start,
> +				   range_len(&devmem->pagemap.range));
>  err_devmem:
>  	kfree(devmem);
>  
> -	return false;
> +	return ret;
>  }
>  
>  static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
> @@ -560,7 +591,7 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
>  		spin_unlock(&mdevice->lock);
>  	} else {
>  		spin_unlock(&mdevice->lock);
> -		if (!dmirror_allocate_chunk(mdevice, &dpage))
> +		if (dmirror_allocate_chunk(mdevice, &dpage))
>  			goto error;
>  	}
>  
> @@ -1220,10 +1251,8 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id)
>  	if (ret)
>  		return ret;
>  
> -	/* Build a list of free ZONE_DEVICE private struct pages */
> -	dmirror_allocate_chunk(mdevice, NULL);
> -
> -	return 0;
> +	/* Build a list of free ZONE_DEVICE struct pages */
> +	return dmirror_allocate_chunk(mdevice, NULL);
>  }
>  
>  static void dmirror_device_remove(struct dmirror_device *mdevice)
> @@ -1236,8 +1265,9 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
>  				mdevice->devmem_chunks[i];
>  
>  			memunmap_pages(&devmem->pagemap);
> -			release_mem_region(devmem->pagemap.range.start,
> -					   range_len(&devmem->pagemap.range));
> +			if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
> +				release_mem_region(devmem->pagemap.range.start,
> +						   range_len(&devmem->pagemap.range));
>  			kfree(devmem);
>  		}
>  		kfree(mdevice->devmem_chunks);
> diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
> index 17f842f1aa02..625f3690d086 100644
> --- a/lib/test_hmm_uapi.h
> +++ b/lib/test_hmm_uapi.h
> @@ -68,6 +68,7 @@ enum {
>  enum {
>  	/* 0 is reserved to catch uninitialized type fields */
>  	HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
> +	HMM_DMIRROR_MEMORY_DEVICE_COHERENT,
>  };
>  
>  #endif /* _LIB_TEST_HMM_UAPI_H */
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 08/10] lib: add support for device coherent type in test_hmm
  2022-01-10 22:31 ` [PATCH v3 08/10] lib: add support for device coherent type in test_hmm Alex Sierra
@ 2022-01-20  6:00   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  6:00 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

On Tuesday, 11 January 2022 9:31:59 AM AEDT Alex Sierra wrote:
> Device Coherent type uses device memory that is coherently accesible by
> the CPU. This could be shown as SP (special purpose) memory range
> at the BIOS-e820 memory enumeration. If no SP memory is supported in
> system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP.
> 
> Currently, test_hmm only supports two different SP ranges of at least
> 256MB size. This could be specified in the kernel parameter variable
> efi_fake_mem. Ex. Two SP ranges of 1GB starting at 0x100000000 &
> 0x140000000 physical address. Ex.
> efi_fake_mem=1G@0x100000000:0x40000,1G@0x140000000:0x40000
> 
> Private and coherent device mirror instances can be created in the same
> probed. This is done by passing the module parameters spm_addr_dev0 &
> spm_addr_dev1. In this case, it will create four instances of
> device_mirror. The first two correspond to private device type, the
> last two to coherent type. Then, they can be easily accessed from user
> space through /dev/hmm_mirror<num_device>. Usually num_device 0 and 1
> are for private, and 2 and 3 for coherent types. If no module
> parameters are passed, two instances of private type device_mirror will
> be created only.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>  lib/test_hmm.c      | 247 ++++++++++++++++++++++++++++++++------------
>  lib/test_hmm_uapi.h |  15 ++-
>  2 files changed, 193 insertions(+), 69 deletions(-)
> 
> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> index 9edeff52302e..7c641c5a9cfa 100644
> --- a/lib/test_hmm.c
> +++ b/lib/test_hmm.c
> @@ -29,11 +29,22 @@
>  
>  #include "test_hmm_uapi.h"
>  
> -#define DMIRROR_NDEVICES		2
> +#define DMIRROR_NDEVICES		4
>  #define DMIRROR_RANGE_FAULT_TIMEOUT	1000
>  #define DEVMEM_CHUNK_SIZE		(256 * 1024 * 1024U)
>  #define DEVMEM_CHUNKS_RESERVE		16
>  
> +/*
> + * For device_private pages, dpage is just a dummy struct page
> + * representing a piece of device memory. dmirror_devmem_alloc_page
> + * allocates a real system memory page as backing storage to fake a
> + * real device. zone_device_data points to that backing page. But
> + * for device_coherent memory, the struct page represents real
> + * physical CPU-accessible memory that we can use directly.
> + */
> +#define BACKING_PAGE(page) (is_device_private_page((page)) ? \
> +			   (page)->zone_device_data : (page))
> +
>  static unsigned long spm_addr_dev0;
>  module_param(spm_addr_dev0, long, 0644);
>  MODULE_PARM_DESC(spm_addr_dev0,
> @@ -122,6 +133,21 @@ static int dmirror_bounce_init(struct dmirror_bounce *bounce,
>  	return 0;
>  }
>  
> +static bool dmirror_is_private_zone(struct dmirror_device *mdevice)
> +{
> +	return (mdevice->zone_device_type ==
> +		HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ? true : false;
> +}
> +
> +static enum migrate_vma_direction
> +	dmirror_select_device(struct dmirror *dmirror)
> +{
> +	return (dmirror->mdevice->zone_device_type ==
> +		HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ?
> +		MIGRATE_VMA_SELECT_DEVICE_PRIVATE :
> +		MIGRATE_VMA_SELECT_DEVICE_COHERENT;
> +}
> +
>  static void dmirror_bounce_fini(struct dmirror_bounce *bounce)
>  {
>  	vfree(bounce->ptr);
> @@ -572,16 +598,19 @@ static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
>  static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
>  {
>  	struct page *dpage = NULL;
> -	struct page *rpage;
> +	struct page *rpage = NULL;
>  
>  	/*
> -	 * This is a fake device so we alloc real system memory to store
> -	 * our device memory.
> +	 * For ZONE_DEVICE private type, this is a fake device so we alloc real
> +	 * system memory to store our device memory.
> +	 * For ZONE_DEVICE coherent type we use the actual dpage to store the data
> +	 * and ignore rpage.
>  	 */
> -	rpage = alloc_page(GFP_HIGHUSER);
> -	if (!rpage)
> -		return NULL;
> -
> +	if (dmirror_is_private_zone(mdevice)) {
> +		rpage = alloc_page(GFP_HIGHUSER);
> +		if (!rpage)
> +			return NULL;
> +	}
>  	spin_lock(&mdevice->lock);
>  
>  	if (mdevice->free_pages) {
> @@ -601,7 +630,8 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
>  	return dpage;
>  
>  error:
> -	__free_page(rpage);
> +	if (rpage)
> +		__free_page(rpage);
>  	return NULL;
>  }
>  
> @@ -627,12 +657,15 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
>  		 * unallocated pte_none() or read-only zero page.
>  		 */
>  		spage = migrate_pfn_to_page(*src);
> +		WARN(spage && is_zone_device_page(spage),
> +		     "page already in device spage pfn: 0x%lx\n",
> +		     page_to_pfn(spage));

This should also lead to test failure because we are only supposed to be
selecting system pages. Ie.

if (WARN_ON_ONCE()) {
	dpage = NULL;
	continue;
}

>  		dpage = dmirror_devmem_alloc_page(mdevice);
>  		if (!dpage)
>  			continue;
>  
> -		rpage = dpage->zone_device_data;
> +		rpage = BACKING_PAGE(dpage);
>  		if (spage)
>  			copy_highpage(rpage, spage);
>  		else
> @@ -646,6 +679,8 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
>  		 */
>  		rpage->zone_device_data = dmirror;
>  
> +		pr_debug("migrating from sys to dev pfn src: 0x%lx pfn dst: 0x%lx\n",
> +			 page_to_pfn(spage), page_to_pfn(dpage));
>  		*dst = migrate_pfn(page_to_pfn(dpage)) |
>  			    MIGRATE_PFN_LOCKED;
>  		if ((*src & MIGRATE_PFN_WRITE) ||
> @@ -724,11 +759,7 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
>  		if (!dpage)
>  			continue;
>  
> -		/*
> -		 * Store the page that holds the data so the page table
> -		 * doesn't have to deal with ZONE_DEVICE private pages.
> -		 */
> -		entry = dpage->zone_device_data;
> +		entry = BACKING_PAGE(dpage);
>  		if (*dst & MIGRATE_PFN_WRITE)
>  			entry = xa_tag_pointer(entry, DPT_XA_TAG_WRITE);
>  		entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
> @@ -808,8 +839,101 @@ static int dmirror_exclusive(struct dmirror *dmirror,
>  	return ret;
>  }
>  
> -static int dmirror_migrate(struct dmirror *dmirror,
> -			   struct hmm_dmirror_cmd *cmd)
> +static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
> +						      struct dmirror *dmirror)
> +{
> +	const unsigned long *src = args->src;
> +	unsigned long *dst = args->dst;
> +	unsigned long start = args->start;
> +	unsigned long end = args->end;
> +	unsigned long addr;
> +
> +	for (addr = start; addr < end; addr += PAGE_SIZE,
> +				       src++, dst++) {
> +		struct page *dpage, *spage;
> +
> +		spage = migrate_pfn_to_page(*src);
> +		if (!spage || !(*src & MIGRATE_PFN_MIGRATE))
> +			continue;
> +
> +		WARN_ON(!is_device_page(spage));

Same - the test should fail in this case (ie. by not migrating the page).

> +		spage = BACKING_PAGE(spage);
> +		dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr);
> +		if (!dpage)
> +			continue;
> +		pr_debug("migrating from dev to sys pfn src: 0x%lx pfn dst: 0x%lx\n",
> +			 page_to_pfn(spage), page_to_pfn(dpage));
> +
> +		lock_page(dpage);
> +		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
> +		copy_highpage(dpage, spage);
> +		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> +		if (*src & MIGRATE_PFN_WRITE)
> +			*dst |= MIGRATE_PFN_WRITE;
> +	}
> +	return 0;
> +}
> +
> +static int dmirror_migrate_to_system(struct dmirror *dmirror,
> +				     struct hmm_dmirror_cmd *cmd)
> +{
> +	unsigned long start, end, addr;
> +	unsigned long size = cmd->npages << PAGE_SHIFT;
> +	struct mm_struct *mm = dmirror->notifier.mm;
> +	struct vm_area_struct *vma;
> +	unsigned long src_pfns[64];
> +	unsigned long dst_pfns[64];
> +	struct migrate_vma args;
> +	unsigned long next;
> +	int ret;
> +
> +	start = cmd->addr;
> +	end = start + size;
> +	if (end < start)
> +		return -EINVAL;
> +
> +	/* Since the mm is for the mirrored process, get a reference first. */
> +	if (!mmget_not_zero(mm))
> +		return -EINVAL;
> +
> +	mmap_read_lock(mm);
> +	for (addr = start; addr < end; addr = next) {
> +		vma = vma_lookup(mm, addr);
> +		if (!vma || !(vma->vm_flags & VM_READ)) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		next = min(end, addr + (ARRAY_SIZE(src_pfns) << PAGE_SHIFT));
> +		if (next > vma->vm_end)
> +			next = vma->vm_end;
> +
> +		args.vma = vma;
> +		args.src = src_pfns;
> +		args.dst = dst_pfns;
> +		args.start = addr;
> +		args.end = next;
> +		args.pgmap_owner = dmirror->mdevice;
> +		args.flags = dmirror_select_device(dmirror);
> +
> +		ret = migrate_vma_setup(&args);
> +		if (ret)
> +			goto out;
> +
> +		pr_debug("Migrating from device mem to sys mem\n");
> +		dmirror_devmem_fault_alloc_and_copy(&args, dmirror);
> +
> +		migrate_vma_pages(&args);
> +		migrate_vma_finalize(&args);
> +	}
> +out:
> +	mmap_read_unlock(mm);
> +	mmput(mm);
> +	return ret;

I think you need to return the number of pages successfully migrated otherwise
failures migrating back to system memory will go unnoticed. This isn't a
problem for device private pages because in that case pages are migrated back
in response to a fault, and if they fail the process will get killed with
SIGBUS.

> +}
> +
> +static int dmirror_migrate_to_device(struct dmirror *dmirror,
> +				struct hmm_dmirror_cmd *cmd)
>  {
>  	unsigned long start, end, addr;
>  	unsigned long size = cmd->npages << PAGE_SHIFT;
> @@ -853,6 +977,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
>  		if (ret)
>  			goto out;
>  
> +		pr_debug("Migrating from sys mem to device mem\n");
>  		dmirror_migrate_alloc_and_copy(&args, dmirror);
>  		migrate_vma_pages(&args);
>  		dmirror_migrate_finalize_and_map(&args, dmirror);
> @@ -861,7 +986,7 @@ static int dmirror_migrate(struct dmirror *dmirror,
>  	mmap_read_unlock(mm);
>  	mmput(mm);
>  
> -	/* Return the migrated data for verification. */
> +	/* Return the migrated data for verification. only for pages in device zone */
>  	ret = dmirror_bounce_init(&bounce, start, size);
>  	if (ret)
>  		return ret;
> @@ -898,12 +1023,22 @@ static void dmirror_mkentry(struct dmirror *dmirror, struct hmm_range *range,
>  	}
>  
>  	page = hmm_pfn_to_page(entry);
> -	if (is_device_private_page(page)) {
> -		/* Is the page migrated to this device or some other? */
> -		if (dmirror->mdevice == dmirror_page_to_device(page))
> +	if (is_device_page(page)) {
> +		/* Is page ZONE_DEVICE coherent? */
> +		if (!is_device_private_page(page)) {

This would read more clearly without the negation -
ie. `if (is_device_coherent_page(page))`.

> +			if (dmirror->mdevice == dmirror_page_to_device(page))
> +				*perm = HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL;
> +			else
> +				*perm = HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE;
> +		/*
> +		 * Is page ZONE_DEVICE private migrated to
> +		 * this device or some other?
> +		 */
> +		} else if (dmirror->mdevice == dmirror_page_to_device(page)) {
>  			*perm = HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL;
> -		else
> +		} else {
>  			*perm = HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE;
> +		}
>  	} else if (is_zero_pfn(page_to_pfn(page)))
>  		*perm = HMM_DMIRROR_PROT_ZERO;
>  	else
> @@ -1100,8 +1235,12 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
>  		ret = dmirror_write(dmirror, &cmd);
>  		break;
>  
> -	case HMM_DMIRROR_MIGRATE:
> -		ret = dmirror_migrate(dmirror, &cmd);
> +	case HMM_DMIRROR_MIGRATE_TO_DEV:
> +		ret = dmirror_migrate_to_device(dmirror, &cmd);
> +		break;
> +
> +	case HMM_DMIRROR_MIGRATE_TO_SYS:
> +		ret = dmirror_migrate_to_system(dmirror, &cmd);
>  		break;
>  
>  	case HMM_DMIRROR_EXCLUSIVE:
> @@ -1142,14 +1281,13 @@ static const struct file_operations dmirror_fops = {
>  
>  static void dmirror_devmem_free(struct page *page)
>  {
> -	struct page *rpage = page->zone_device_data;
> +	struct page *rpage = BACKING_PAGE(page);
>  	struct dmirror_device *mdevice;
>  
> -	if (rpage)
> +	if (rpage != page)
>  		__free_page(rpage);
>  
>  	mdevice = dmirror_page_to_device(page);
> -
>  	spin_lock(&mdevice->lock);
>  	mdevice->cfree++;
>  	page->zone_device_data = mdevice->free_pages;
> @@ -1157,38 +1295,6 @@ static void dmirror_devmem_free(struct page *page)
>  	spin_unlock(&mdevice->lock);
>  }
>  
> -static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
> -						      struct dmirror *dmirror)
> -{
> -	const unsigned long *src = args->src;
> -	unsigned long *dst = args->dst;
> -	unsigned long start = args->start;
> -	unsigned long end = args->end;
> -	unsigned long addr;
> -
> -	for (addr = start; addr < end; addr += PAGE_SIZE,
> -				       src++, dst++) {
> -		struct page *dpage, *spage;
> -
> -		spage = migrate_pfn_to_page(*src);
> -		if (!spage || !(*src & MIGRATE_PFN_MIGRATE))
> -			continue;
> -		spage = spage->zone_device_data;
> -
> -		dpage = alloc_page_vma(GFP_HIGHUSER_MOVABLE, args->vma, addr);
> -		if (!dpage)
> -			continue;
> -
> -		lock_page(dpage);
> -		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
> -		copy_highpage(dpage, spage);
> -		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> -		if (*src & MIGRATE_PFN_WRITE)
> -			*dst |= MIGRATE_PFN_WRITE;
> -	}
> -	return 0;
> -}
> -
>  static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
>  {
>  	struct migrate_vma args;
> @@ -1203,7 +1309,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
>  	 * the mirror but here we use it to hold the page for the simulated
>  	 * device memory and that page holds the pointer to the mirror.
>  	 */
> -	rpage = vmf->page->zone_device_data;
> +	rpage = BACKING_PAGE(vmf->page);

Note that as I understand things this isn't strictly required because this will
never be called for device coherent pages.

>  	dmirror = rpage->zone_device_data;
>  
>  	/* FIXME demonstrate how we can adjust migrate range */
> @@ -1213,7 +1319,7 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
>  	args.src = &src_pfns;
>  	args.dst = &dst_pfns;
>  	args.pgmap_owner = dmirror->mdevice;
> -	args.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
> +	args.flags = dmirror_select_device(dmirror);
>  
>  	if (migrate_vma_setup(&args))
>  		return VM_FAULT_SIGBUS;
> @@ -1279,14 +1385,26 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
>  static int __init hmm_dmirror_init(void)
>  {
>  	int ret;
> -	int id;
> +	int id = 0;
> +	int ndevices = 0;
>  
>  	ret = alloc_chrdev_region(&dmirror_dev, 0, DMIRROR_NDEVICES,
>  				  "HMM_DMIRROR");
>  	if (ret)
>  		goto err_unreg;
>  
> -	for (id = 0; id < DMIRROR_NDEVICES; id++) {
> +	memset(dmirror_devices, 0, DMIRROR_NDEVICES * sizeof(dmirror_devices[0]));
> +	dmirror_devices[ndevices++].zone_device_type =
> +				HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
> +	dmirror_devices[ndevices++].zone_device_type =
> +				HMM_DMIRROR_MEMORY_DEVICE_PRIVATE;
> +	if (spm_addr_dev0 && spm_addr_dev1) {
> +		dmirror_devices[ndevices++].zone_device_type =
> +					HMM_DMIRROR_MEMORY_DEVICE_COHERENT;
> +		dmirror_devices[ndevices++].zone_device_type =
> +					HMM_DMIRROR_MEMORY_DEVICE_COHERENT;
> +	}
> +	for (id = 0; id < ndevices; id++) {
>  		ret = dmirror_device_init(dmirror_devices + id, id);
>  		if (ret)
>  			goto err_chrdev;
> @@ -1308,7 +1426,8 @@ static void __exit hmm_dmirror_exit(void)
>  	int id;
>  
>  	for (id = 0; id < DMIRROR_NDEVICES; id++)
> -		dmirror_device_remove(dmirror_devices + id);
> +		if (dmirror_devices[id].zone_device_type)

Oh, I understand why you reserved 0 for zone_device_type now. But I still
thinks it's unnecessary - hmm_dmirror_exit() getting called implies
hmm_dmirror_init() was successful which implies zone_device_type was
initialised for all devices.

> +			dmirror_device_remove(dmirror_devices + id);
>  	unregister_chrdev_region(dmirror_dev, DMIRROR_NDEVICES);
>  }
>  
> diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
> index 625f3690d086..e190b2ab6f19 100644
> --- a/lib/test_hmm_uapi.h
> +++ b/lib/test_hmm_uapi.h
> @@ -33,11 +33,12 @@ struct hmm_dmirror_cmd {
>  /* Expose the address space of the calling process through hmm device file */
>  #define HMM_DMIRROR_READ		_IOWR('H', 0x00, struct hmm_dmirror_cmd)
>  #define HMM_DMIRROR_WRITE		_IOWR('H', 0x01, struct hmm_dmirror_cmd)
> -#define HMM_DMIRROR_MIGRATE		_IOWR('H', 0x02, struct hmm_dmirror_cmd)
> -#define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x03, struct hmm_dmirror_cmd)
> -#define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x04, struct hmm_dmirror_cmd)
> -#define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x05, struct hmm_dmirror_cmd)
> -#define HMM_DMIRROR_GET_MEM_DEV_TYPE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_MIGRATE_TO_DEV	_IOWR('H', 0x02, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_MIGRATE_TO_SYS	_IOWR('H', 0x03, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_SNAPSHOT		_IOWR('H', 0x04, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_EXCLUSIVE		_IOWR('H', 0x05, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_CHECK_EXCLUSIVE	_IOWR('H', 0x06, struct hmm_dmirror_cmd)
> +#define HMM_DMIRROR_GET_MEM_DEV_TYPE	_IOWR('H', 0x07, struct hmm_dmirror_cmd)
>  
>  /*
>   * Values returned in hmm_dmirror_cmd.ptr for HMM_DMIRROR_SNAPSHOT.
> @@ -52,6 +53,8 @@ struct hmm_dmirror_cmd {
>   *					device the ioctl() is made
>   * HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE: Migrated device private page on some
>   *					other device
> + * HMM_DMIRROR_PROT_DEV_COHERENT: Migrate device coherent page on the device
> + *				  the ioctl() is made
>   */
>  enum {
>  	HMM_DMIRROR_PROT_ERROR			= 0xFF,
> @@ -63,6 +66,8 @@ enum {
>  	HMM_DMIRROR_PROT_ZERO			= 0x10,
>  	HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL	= 0x20,
>  	HMM_DMIRROR_PROT_DEV_PRIVATE_REMOTE	= 0x30,
> +	HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL	= 0x40,
> +	HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE	= 0x50,
>  };
>  
>  enum {
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 09/10] tools: update hmm-test to support device coherent type
  2022-01-10 22:32 ` [PATCH v3 09/10] tools: update hmm-test to support device coherent type Alex Sierra
@ 2022-01-20  6:14   ` Alistair Popple
  2022-01-27  3:22     ` Sierra Guiza, Alejandro (Alex)
  0 siblings, 1 reply; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  6:14 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

On Tuesday, 11 January 2022 9:32:00 AM AEDT Alex Sierra wrote:
> Test cases such as migrate_fault and migrate_multiple, were modified to
> explicit migrate from device to sys memory without the need of page
> faults, when using device coherent type.
> 
> Snapshot test case updated to read memory device type first and based
> on that, get the proper returned results migrate_ping_pong test case

Where is the migrate_ping_pong test? Did you perhaps forget to add it? :-)

> added to test explicit migration from device to sys memory for both
> private and coherent zone types.
> 
> Helpers to migrate from device to sys memory and vicerversa
> were also added.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> v2:
> Set FIXTURE_VARIANT to add multiple device types to the FIXTURE. This
> will run all the tests for each device type (private and coherent) in
> case both existed during hmm-test driver probed.
> ---
>  tools/testing/selftests/vm/hmm-tests.c | 122 ++++++++++++++++++++-----
>  1 file changed, 101 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
> index 864f126ffd78..8eb81dfba4b3 100644
> --- a/tools/testing/selftests/vm/hmm-tests.c
> +++ b/tools/testing/selftests/vm/hmm-tests.c
> @@ -44,6 +44,14 @@ struct hmm_buffer {
>  	int		fd;
>  	uint64_t	cpages;
>  	uint64_t	faults;
> +	int		zone_device_type;
> +};
> +
> +enum {
> +	HMM_PRIVATE_DEVICE_ONE,
> +	HMM_PRIVATE_DEVICE_TWO,
> +	HMM_COHERENCE_DEVICE_ONE,
> +	HMM_COHERENCE_DEVICE_TWO,
>  };
>  
>  #define TWOMEG		(1 << 21)
> @@ -60,6 +68,21 @@ FIXTURE(hmm)
>  	unsigned int	page_shift;
>  };
>  
> +FIXTURE_VARIANT(hmm)
> +{
> +	int     device_number;
> +};
> +
> +FIXTURE_VARIANT_ADD(hmm, hmm_device_private)
> +{
> +	.device_number = HMM_PRIVATE_DEVICE_ONE,
> +};
> +
> +FIXTURE_VARIANT_ADD(hmm, hmm_device_coherent)
> +{
> +	.device_number = HMM_COHERENCE_DEVICE_ONE,
> +};
> +
>  FIXTURE(hmm2)
>  {
>  	int		fd0;
> @@ -68,6 +91,24 @@ FIXTURE(hmm2)
>  	unsigned int	page_shift;
>  };
>  
> +FIXTURE_VARIANT(hmm2)
> +{
> +	int     device_number0;
> +	int     device_number1;
> +};
> +
> +FIXTURE_VARIANT_ADD(hmm2, hmm2_device_private)
> +{
> +	.device_number0 = HMM_PRIVATE_DEVICE_ONE,
> +	.device_number1 = HMM_PRIVATE_DEVICE_TWO,
> +};
> +
> +FIXTURE_VARIANT_ADD(hmm2, hmm2_device_coherent)
> +{
> +	.device_number0 = HMM_COHERENCE_DEVICE_ONE,
> +	.device_number1 = HMM_COHERENCE_DEVICE_TWO,
> +};
> +
>  static int hmm_open(int unit)
>  {
>  	char pathname[HMM_PATH_MAX];
> @@ -81,12 +122,19 @@ static int hmm_open(int unit)
>  	return fd;
>  }
>  
> +static bool hmm_is_coherent_type(int dev_num)
> +{
> +	return (dev_num >= HMM_COHERENCE_DEVICE_ONE);
> +}
> +
>  FIXTURE_SETUP(hmm)
>  {
>  	self->page_size = sysconf(_SC_PAGE_SIZE);
>  	self->page_shift = ffs(self->page_size) - 1;
>  
> -	self->fd = hmm_open(0);
> +	self->fd = hmm_open(variant->device_number);
> +	if (self->fd < 0 && hmm_is_coherent_type(variant->device_number))
> +		SKIP(exit(0), "DEVICE_COHERENT not available");
>  	ASSERT_GE(self->fd, 0);
>  }
>  
> @@ -95,9 +143,11 @@ FIXTURE_SETUP(hmm2)
>  	self->page_size = sysconf(_SC_PAGE_SIZE);
>  	self->page_shift = ffs(self->page_size) - 1;
>  
> -	self->fd0 = hmm_open(0);
> +	self->fd0 = hmm_open(variant->device_number0);
> +	if (self->fd0 < 0 && hmm_is_coherent_type(variant->device_number0))
> +		SKIP(exit(0), "DEVICE_COHERENT not available");
>  	ASSERT_GE(self->fd0, 0);
> -	self->fd1 = hmm_open(1);
> +	self->fd1 = hmm_open(variant->device_number1);
>  	ASSERT_GE(self->fd1, 0);
>  }
>  
> @@ -144,6 +194,7 @@ static int hmm_dmirror_cmd(int fd,
>  	}
>  	buffer->cpages = cmd.cpages;
>  	buffer->faults = cmd.faults;
> +	buffer->zone_device_type = cmd.zone_device_type;
>  
>  	return 0;
>  }
> @@ -211,6 +262,20 @@ static void hmm_nanosleep(unsigned int n)
>  	nanosleep(&t, NULL);
>  }
>  
> +static int hmm_migrate_sys_to_dev(int fd,
> +				   struct hmm_buffer *buffer,
> +				   unsigned long npages)
> +{
> +	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages);
> +}
> +
> +static int hmm_migrate_dev_to_sys(int fd,
> +				   struct hmm_buffer *buffer,
> +				   unsigned long npages)
> +{
> +	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_SYS, buffer, npages);
> +}
> +
>  /*
>   * Simple NULL test of device open/close.
>   */
> @@ -875,7 +940,7 @@ TEST_F(hmm, migrate)
>  		ptr[i] = i;
>  
>  	/* Migrate memory to device. */
> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, npages);
>  
> @@ -923,7 +988,7 @@ TEST_F(hmm, migrate_fault)
>  		ptr[i] = i;
>  
>  	/* Migrate memory to device. */
> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, npages);
>  
> @@ -936,7 +1001,7 @@ TEST_F(hmm, migrate_fault)
>  		ASSERT_EQ(ptr[i], i);
>  
>  	/* Migrate memory to the device again. */
> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, npages);
>  
> @@ -976,7 +1041,7 @@ TEST_F(hmm, migrate_shared)
>  	ASSERT_NE(buffer->ptr, MAP_FAILED);
>  
>  	/* Migrate memory to device. */
> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>  	ASSERT_EQ(ret, -ENOENT);
>  
>  	hmm_buffer_free(buffer);
> @@ -1015,7 +1080,7 @@ TEST_F(hmm2, migrate_mixed)
>  	p = buffer->ptr;
>  
>  	/* Migrating a protected area should be an error. */
> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, npages);
> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, npages);
>  	ASSERT_EQ(ret, -EINVAL);
>  
>  	/* Punch a hole after the first page address. */
> @@ -1023,7 +1088,7 @@ TEST_F(hmm2, migrate_mixed)
>  	ASSERT_EQ(ret, 0);
>  
>  	/* We expect an error if the vma doesn't cover the range. */
> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 3);
> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 3);
>  	ASSERT_EQ(ret, -EINVAL);
>  
>  	/* Page 2 will be a read-only zero page. */
> @@ -1055,13 +1120,13 @@ TEST_F(hmm2, migrate_mixed)
>  
>  	/* Now try to migrate pages 2-5 to device 1. */
>  	buffer->ptr = p + 2 * self->page_size;
> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 4);
> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 4);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, 4);
>  
>  	/* Page 5 won't be migrated to device 0 because it's on device 1. */
>  	buffer->ptr = p + 5 * self->page_size;
> -	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
> +	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
>  	ASSERT_EQ(ret, -ENOENT);
>  	buffer->ptr = p;
>  
> @@ -1070,8 +1135,12 @@ TEST_F(hmm2, migrate_mixed)
>  }
>  
>  /*
> - * Migrate anonymous memory to device private memory and fault it back to system
> - * memory multiple times.
> + * Migrate anonymous memory to device memory and back to system memory
> + * multiple times. In case of private zone configuration, this is done
> + * through fault pages accessed by CPU. In case of coherent zone configuration,
> + * the pages from the device should be explicitly migrated back to system memory.
> + * The reason is Coherent device zone has coherent access by CPU, therefore
> + * it will not generate any page fault.
>   */
>  TEST_F(hmm, migrate_multiple)
>  {
> @@ -1107,8 +1176,7 @@ TEST_F(hmm, migrate_multiple)
>  			ptr[i] = i;
>  
>  		/* Migrate memory to device. */
> -		ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer,
> -				      npages);
> +		ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>  		ASSERT_EQ(ret, 0);
>  		ASSERT_EQ(buffer->cpages, npages);
>  
> @@ -1116,7 +1184,12 @@ TEST_F(hmm, migrate_multiple)
>  		for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
>  			ASSERT_EQ(ptr[i], i);
>  
> -		/* Fault pages back to system memory and check them. */
> +		/* Migrate back to system memory and check them. */
> +		if (hmm_is_coherent_type(variant->device_number)) {
> +			ret = hmm_migrate_dev_to_sys(self->fd, buffer, npages);

So I think this will still pass even if nothing migrates so as mentioned on
the previous patch I think we need to check for the number of pages that
actually migrated. Alternatively I suppose you could do a snapshot and check
that, but that seems like it would be harder. Otherwise I think this looks
good.

> +			ASSERT_EQ(ret, 0);
> +		}
> +
>  		for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
>  			ASSERT_EQ(ptr[i], i);
>  
> @@ -1312,13 +1385,13 @@ TEST_F(hmm2, snapshot)
>  
>  	/* Page 5 will be migrated to device 0. */
>  	buffer->ptr = p + 5 * self->page_size;
> -	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
> +	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, 1);
>  
>  	/* Page 6 will be migrated to device 1. */
>  	buffer->ptr = p + 6 * self->page_size;
> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 1);
> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 1);
>  	ASSERT_EQ(ret, 0);
>  	ASSERT_EQ(buffer->cpages, 1);
>  
> @@ -1335,9 +1408,16 @@ TEST_F(hmm2, snapshot)
>  	ASSERT_EQ(m[2], HMM_DMIRROR_PROT_ZERO | HMM_DMIRROR_PROT_READ);
>  	ASSERT_EQ(m[3], HMM_DMIRROR_PROT_READ);
>  	ASSERT_EQ(m[4], HMM_DMIRROR_PROT_WRITE);
> -	ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
> -			HMM_DMIRROR_PROT_WRITE);
> -	ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
> +	if (!hmm_is_coherent_type(variant->device_number0)) {
> +		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
> +				HMM_DMIRROR_PROT_WRITE);
> +		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
> +	} else {
> +		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL |
> +				HMM_DMIRROR_PROT_WRITE);
> +		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE |
> +				HMM_DMIRROR_PROT_WRITE);
> +	}
>  
>  	hmm_buffer_free(buffer);
>  }
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/10] tools: update test_hmm script to support SP config
  2022-01-10 22:32 ` [PATCH v3 10/10] tools: update test_hmm script to support SP config Alex Sierra
@ 2022-01-20  6:17   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  6:17 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

Looks good,

Reviewed-by: Alistair Popple <apopple@nvidia.com>

On Tuesday, 11 January 2022 9:32:01 AM AEDT Alex Sierra wrote:
> Add two more parameters to set spm_addr_dev0 & spm_addr_dev1
> addresses. These two parameters configure the start SP
> addresses for each device in test_hmm driver.
> Consequently, this configures zone device type as coherent.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
> v2:
> Add more mknods for device coherent type. These are represented under
> /dev/hmm_mirror2 and /dev/hmm_mirror3, only in case they have created
> at probing the hmm-test driver.
> ---
>  tools/testing/selftests/vm/test_hmm.sh | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/test_hmm.sh b/tools/testing/selftests/vm/test_hmm.sh
> index 0647b525a625..539c9371e592 100755
> --- a/tools/testing/selftests/vm/test_hmm.sh
> +++ b/tools/testing/selftests/vm/test_hmm.sh
> @@ -40,11 +40,26 @@ check_test_requirements()
>  
>  load_driver()
>  {
> -	modprobe $DRIVER > /dev/null 2>&1
> +	if [ $# -eq 0 ]; then
> +		modprobe $DRIVER > /dev/null 2>&1
> +	else
> +		if [ $# -eq 2 ]; then
> +			modprobe $DRIVER spm_addr_dev0=$1 spm_addr_dev1=$2
> +				> /dev/null 2>&1
> +		else
> +			echo "Missing module parameters. Make sure pass"\
> +			"spm_addr_dev0 and spm_addr_dev1"
> +			usage
> +		fi
> +	fi
>  	if [ $? == 0 ]; then
>  		major=$(awk "\$2==\"HMM_DMIRROR\" {print \$1}" /proc/devices)
>  		mknod /dev/hmm_dmirror0 c $major 0
>  		mknod /dev/hmm_dmirror1 c $major 1
> +		if [ $# -eq 2 ]; then
> +			mknod /dev/hmm_dmirror2 c $major 2
> +			mknod /dev/hmm_dmirror3 c $major 3
> +		fi
>  	fi
>  }
>  
> @@ -58,7 +73,7 @@ run_smoke()
>  {
>  	echo "Running smoke test. Note, this test provides basic coverage."
>  
> -	load_driver
> +	load_driver $1 $2
>  	$(dirname "${BASH_SOURCE[0]}")/hmm-tests
>  	unload_driver
>  }
> @@ -75,6 +90,9 @@ usage()
>  	echo "# Smoke testing"
>  	echo "./${TEST_NAME}.sh smoke"
>  	echo
> +	echo "# Smoke testing with SPM enabled"
> +	echo "./${TEST_NAME}.sh smoke <spm_addr_dev0> <spm_addr_dev1>"
> +	echo
>  	exit 0
>  }
>  
> @@ -84,7 +102,7 @@ function run_test()
>  		usage
>  	else
>  		if [ "$1" = "smoke" ]; then
> -			run_smoke
> +			run_smoke $2 $3
>  		else
>  			usage
>  		fi
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
  2022-01-12 11:06 ` [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alistair Popple
@ 2022-01-20  6:33   ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20  6:33 UTC (permalink / raw)
  To: akpm, Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs,
	Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy

On Wednesday, 12 January 2022 10:06:03 PM AEDT Alistair Popple wrote:
> I have been looking at this in relation to the migration code and noticed we
> have the following in try_to_migrate():
> 
>         if (is_zone_device_page(page) && !is_device_private_page(page))
>                 return;
> 
> Which if I'm understanding correctly means that migration of device coherent
> pages will always fail. Given that I do wonder how hmm-tests are passing, but
> I assume you must always be hitting this fast path in
> migrate_vma_collect_pmd():
> 
>                 /*
>                  * Optimize for the common case where page is only mapped once
>                  * in one process. If we can lock the page, then we can safely
>                  * set up a special migration page table entry now.
>                  */
> 
> Meaning that try_to_migrate() never gets called from migrate_vma_unmap(). So
> you will also need some changes to try_to_migrate() and possibly
> try_to_migrate_one() to make this reliable.

I have been running the hmm tests with the changes below. I'm pretty sure these
are correct because the only zone device pages try_to_migrate_one() should be
called on are device coherent/private, and coherent pages can be treated just
the same as a normal pages for migration. However it would be worth checking I
haven't missed anything.

 - Alistair

---

diff --git a/mm/rmap.c b/mm/rmap.c
index 163ac4e6bcee..15f56c27daab 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1806,7 +1806,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (is_zone_device_page(page)) {
+		if (is_device_private_page(page)) {
 			unsigned long pfn = page_to_pfn(page);
 			swp_entry_t entry;
 			pte_t swp_pte;
@@ -1947,7 +1947,7 @@ void try_to_migrate(struct page *page, enum ttu_flags flags)
 					TTU_SYNC)))
 		return;
 
-	if (is_zone_device_page(page) && !is_device_private_page(page))
+	if (is_zone_device_page(page) && !is_device_page(page))
 		return;
 
 	/*





^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type
  2022-01-10 22:31 ` [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type Alex Sierra
@ 2022-01-20 12:36   ` Joao Martins
  2022-01-20 13:18     ` Alistair Popple
  0 siblings, 1 reply; 24+ messages in thread
From: Joao Martins @ 2022-01-20 12:36 UTC (permalink / raw)
  To: Alex Sierra
  Cc: amd-gfx, dri-devel, hch, jglisse, apopple, willy, akpm,
	Felix.Kuehling, linux-mm, rcampbell, linux-ext4, linux-xfs, jgg

On 1/10/22 22:31, Alex Sierra wrote:
> Avoid long term pinning for Coherent device type pages. This could
> interfere with their own device memory manager. For now, we are just
> returning error for PIN_LONGTERM Coherent device type pages. Eventually,
> these type of pages will get migrated to system memory, once the device
> migration pages support is added.
> 
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>  mm/gup.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 886d6148d3d0..9c8a075d862d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1720,6 +1720,12 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  		 * If we get a movable page, since we are going to be pinning
>  		 * these entries, try to move them out if possible.
>  		 */
> +		if (is_device_page(head)) {
> +			WARN_ON_ONCE(is_device_private_page(head));
> +			ret = -EFAULT;
> +			goto unpin_pages;
> +		}
> +

Wouldn't be more efficient for you failing earlier instead of after all the pages are pinned?

Filesystem DAX suffers from a somewhat similar issue[0] -- albeit it's more related to
blocking FOLL_LONGTERM in gup-fast while gup-slow can still do it. Coherent devmap appears
to want to block it in all gup.

On another thread Jason was suggesting about having different pgmap::flags to capture
these special cases[1] instead of selecting what different pgmap types can do in various
different places.

[0] https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10fef0@oracle.com/
[1] https://lore.kernel.org/linux-mm/20211019160136.GH3686969@ziepe.ca/


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type
  2022-01-20 12:36   ` Joao Martins
@ 2022-01-20 13:18     ` Alistair Popple
  0 siblings, 0 replies; 24+ messages in thread
From: Alistair Popple @ 2022-01-20 13:18 UTC (permalink / raw)
  To: Alex Sierra, Joao Martins
  Cc: amd-gfx, dri-devel, hch, jglisse, willy, akpm, Felix.Kuehling,
	linux-mm, rcampbell, linux-ext4, linux-xfs, jgg

On Thursday, 20 January 2022 11:36:21 PM AEDT Joao Martins wrote:
> On 1/10/22 22:31, Alex Sierra wrote:
> > Avoid long term pinning for Coherent device type pages. This could
> > interfere with their own device memory manager. For now, we are just
> > returning error for PIN_LONGTERM Coherent device type pages. Eventually,
> > these type of pages will get migrated to system memory, once the device
> > migration pages support is added.
> > 
> > Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> > ---
> >  mm/gup.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 886d6148d3d0..9c8a075d862d 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1720,6 +1720,12 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
> >  		 * If we get a movable page, since we are going to be pinning
> >  		 * these entries, try to move them out if possible.
> >  		 */
> > +		if (is_device_page(head)) {
> > +			WARN_ON_ONCE(is_device_private_page(head));
> > +			ret = -EFAULT;
> > +			goto unpin_pages;
> > +		}
> > +
> 
> Wouldn't be more efficient for you failing earlier instead of after all the pages are pinned?

Rather than failing I think the plan is to migrate the device coherent pages
like we do for ZONE_MOVABLE, so leaving this here is a good place holder until
that is done. Currently we are missing some functionality required to do that
but I am hoping to post a series fixing that soon.

> Filesystem DAX suffers from a somewhat similar issue[0] -- albeit it's more related to
> blocking FOLL_LONGTERM in gup-fast while gup-slow can still do it. Coherent devmap appears
> to want to block it in all gup.
> 
> On another thread Jason was suggesting about having different pgmap::flags to capture
> these special cases[1] instead of selecting what different pgmap types can do in various
> different places.
> 
> [0] https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10fef0@oracle.com/
> [1] https://lore.kernel.org/linux-mm/20211019160136.GH3686969@ziepe.ca/
> 






^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 09/10] tools: update hmm-test to support device coherent type
  2022-01-20  6:14   ` Alistair Popple
@ 2022-01-27  3:22     ` Sierra Guiza, Alejandro (Alex)
  0 siblings, 0 replies; 24+ messages in thread
From: Sierra Guiza, Alejandro (Alex) @ 2022-01-27  3:22 UTC (permalink / raw)
  To: Alistair Popple, akpm, Felix.Kuehling, linux-mm, rcampbell,
	linux-ext4, linux-xfs
  Cc: amd-gfx, dri-devel, hch, jgg, jglisse, willy


On 1/20/2022 12:14 AM, Alistair Popple wrote:
> On Tuesday, 11 January 2022 9:32:00 AM AEDT Alex Sierra wrote:
>> Test cases such as migrate_fault and migrate_multiple, were modified to
>> explicit migrate from device to sys memory without the need of page
>> faults, when using device coherent type.
>>
>> Snapshot test case updated to read memory device type first and based
>> on that, get the proper returned results migrate_ping_pong test case
> Where is the migrate_ping_pong test? Did you perhaps forget to add it? :-)

Migration from device coherent to system is tested with migrate_multiple 
too. Therefore,
I've removed migrate_ping_pong test. BTW, I just added the "number of 
pages migrated"
checker after migrate from coherent to system on v4 series.

Regards,
Alejandro Sierra

>
>> added to test explicit migration from device to sys memory for both
>> private and coherent zone types.
>>
>> Helpers to migrate from device to sys memory and vicerversa
>> were also added.
>>
>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>> ---
>> v2:
>> Set FIXTURE_VARIANT to add multiple device types to the FIXTURE. This
>> will run all the tests for each device type (private and coherent) in
>> case both existed during hmm-test driver probed.
>> ---
>>   tools/testing/selftests/vm/hmm-tests.c | 122 ++++++++++++++++++++-----
>>   1 file changed, 101 insertions(+), 21 deletions(-)
>>
>> diff --git a/tools/testing/selftests/vm/hmm-tests.c b/tools/testing/selftests/vm/hmm-tests.c
>> index 864f126ffd78..8eb81dfba4b3 100644
>> --- a/tools/testing/selftests/vm/hmm-tests.c
>> +++ b/tools/testing/selftests/vm/hmm-tests.c
>> @@ -44,6 +44,14 @@ struct hmm_buffer {
>>   	int		fd;
>>   	uint64_t	cpages;
>>   	uint64_t	faults;
>> +	int		zone_device_type;
>> +};
>> +
>> +enum {
>> +	HMM_PRIVATE_DEVICE_ONE,
>> +	HMM_PRIVATE_DEVICE_TWO,
>> +	HMM_COHERENCE_DEVICE_ONE,
>> +	HMM_COHERENCE_DEVICE_TWO,
>>   };
>>   
>>   #define TWOMEG		(1 << 21)
>> @@ -60,6 +68,21 @@ FIXTURE(hmm)
>>   	unsigned int	page_shift;
>>   };
>>   
>> +FIXTURE_VARIANT(hmm)
>> +{
>> +	int     device_number;
>> +};
>> +
>> +FIXTURE_VARIANT_ADD(hmm, hmm_device_private)
>> +{
>> +	.device_number = HMM_PRIVATE_DEVICE_ONE,
>> +};
>> +
>> +FIXTURE_VARIANT_ADD(hmm, hmm_device_coherent)
>> +{
>> +	.device_number = HMM_COHERENCE_DEVICE_ONE,
>> +};
>> +
>>   FIXTURE(hmm2)
>>   {
>>   	int		fd0;
>> @@ -68,6 +91,24 @@ FIXTURE(hmm2)
>>   	unsigned int	page_shift;
>>   };
>>   
>> +FIXTURE_VARIANT(hmm2)
>> +{
>> +	int     device_number0;
>> +	int     device_number1;
>> +};
>> +
>> +FIXTURE_VARIANT_ADD(hmm2, hmm2_device_private)
>> +{
>> +	.device_number0 = HMM_PRIVATE_DEVICE_ONE,
>> +	.device_number1 = HMM_PRIVATE_DEVICE_TWO,
>> +};
>> +
>> +FIXTURE_VARIANT_ADD(hmm2, hmm2_device_coherent)
>> +{
>> +	.device_number0 = HMM_COHERENCE_DEVICE_ONE,
>> +	.device_number1 = HMM_COHERENCE_DEVICE_TWO,
>> +};
>> +
>>   static int hmm_open(int unit)
>>   {
>>   	char pathname[HMM_PATH_MAX];
>> @@ -81,12 +122,19 @@ static int hmm_open(int unit)
>>   	return fd;
>>   }
>>   
>> +static bool hmm_is_coherent_type(int dev_num)
>> +{
>> +	return (dev_num >= HMM_COHERENCE_DEVICE_ONE);
>> +}
>> +
>>   FIXTURE_SETUP(hmm)
>>   {
>>   	self->page_size = sysconf(_SC_PAGE_SIZE);
>>   	self->page_shift = ffs(self->page_size) - 1;
>>   
>> -	self->fd = hmm_open(0);
>> +	self->fd = hmm_open(variant->device_number);
>> +	if (self->fd < 0 && hmm_is_coherent_type(variant->device_number))
>> +		SKIP(exit(0), "DEVICE_COHERENT not available");
>>   	ASSERT_GE(self->fd, 0);
>>   }
>>   
>> @@ -95,9 +143,11 @@ FIXTURE_SETUP(hmm2)
>>   	self->page_size = sysconf(_SC_PAGE_SIZE);
>>   	self->page_shift = ffs(self->page_size) - 1;
>>   
>> -	self->fd0 = hmm_open(0);
>> +	self->fd0 = hmm_open(variant->device_number0);
>> +	if (self->fd0 < 0 && hmm_is_coherent_type(variant->device_number0))
>> +		SKIP(exit(0), "DEVICE_COHERENT not available");
>>   	ASSERT_GE(self->fd0, 0);
>> -	self->fd1 = hmm_open(1);
>> +	self->fd1 = hmm_open(variant->device_number1);
>>   	ASSERT_GE(self->fd1, 0);
>>   }
>>   
>> @@ -144,6 +194,7 @@ static int hmm_dmirror_cmd(int fd,
>>   	}
>>   	buffer->cpages = cmd.cpages;
>>   	buffer->faults = cmd.faults;
>> +	buffer->zone_device_type = cmd.zone_device_type;
>>   
>>   	return 0;
>>   }
>> @@ -211,6 +262,20 @@ static void hmm_nanosleep(unsigned int n)
>>   	nanosleep(&t, NULL);
>>   }
>>   
>> +static int hmm_migrate_sys_to_dev(int fd,
>> +				   struct hmm_buffer *buffer,
>> +				   unsigned long npages)
>> +{
>> +	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_DEV, buffer, npages);
>> +}
>> +
>> +static int hmm_migrate_dev_to_sys(int fd,
>> +				   struct hmm_buffer *buffer,
>> +				   unsigned long npages)
>> +{
>> +	return hmm_dmirror_cmd(fd, HMM_DMIRROR_MIGRATE_TO_SYS, buffer, npages);
>> +}
>> +
>>   /*
>>    * Simple NULL test of device open/close.
>>    */
>> @@ -875,7 +940,7 @@ TEST_F(hmm, migrate)
>>   		ptr[i] = i;
>>   
>>   	/* Migrate memory to device. */
>> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
>> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, npages);
>>   
>> @@ -923,7 +988,7 @@ TEST_F(hmm, migrate_fault)
>>   		ptr[i] = i;
>>   
>>   	/* Migrate memory to device. */
>> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
>> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, npages);
>>   
>> @@ -936,7 +1001,7 @@ TEST_F(hmm, migrate_fault)
>>   		ASSERT_EQ(ptr[i], i);
>>   
>>   	/* Migrate memory to the device again. */
>> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
>> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, npages);
>>   
>> @@ -976,7 +1041,7 @@ TEST_F(hmm, migrate_shared)
>>   	ASSERT_NE(buffer->ptr, MAP_FAILED);
>>   
>>   	/* Migrate memory to device. */
>> -	ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer, npages);
>> +	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>>   	ASSERT_EQ(ret, -ENOENT);
>>   
>>   	hmm_buffer_free(buffer);
>> @@ -1015,7 +1080,7 @@ TEST_F(hmm2, migrate_mixed)
>>   	p = buffer->ptr;
>>   
>>   	/* Migrating a protected area should be an error. */
>> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, npages);
>> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, npages);
>>   	ASSERT_EQ(ret, -EINVAL);
>>   
>>   	/* Punch a hole after the first page address. */
>> @@ -1023,7 +1088,7 @@ TEST_F(hmm2, migrate_mixed)
>>   	ASSERT_EQ(ret, 0);
>>   
>>   	/* We expect an error if the vma doesn't cover the range. */
>> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 3);
>> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 3);
>>   	ASSERT_EQ(ret, -EINVAL);
>>   
>>   	/* Page 2 will be a read-only zero page. */
>> @@ -1055,13 +1120,13 @@ TEST_F(hmm2, migrate_mixed)
>>   
>>   	/* Now try to migrate pages 2-5 to device 1. */
>>   	buffer->ptr = p + 2 * self->page_size;
>> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 4);
>> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 4);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, 4);
>>   
>>   	/* Page 5 won't be migrated to device 0 because it's on device 1. */
>>   	buffer->ptr = p + 5 * self->page_size;
>> -	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
>> +	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
>>   	ASSERT_EQ(ret, -ENOENT);
>>   	buffer->ptr = p;
>>   
>> @@ -1070,8 +1135,12 @@ TEST_F(hmm2, migrate_mixed)
>>   }
>>   
>>   /*
>> - * Migrate anonymous memory to device private memory and fault it back to system
>> - * memory multiple times.
>> + * Migrate anonymous memory to device memory and back to system memory
>> + * multiple times. In case of private zone configuration, this is done
>> + * through fault pages accessed by CPU. In case of coherent zone configuration,
>> + * the pages from the device should be explicitly migrated back to system memory.
>> + * The reason is Coherent device zone has coherent access by CPU, therefore
>> + * it will not generate any page fault.
>>    */
>>   TEST_F(hmm, migrate_multiple)
>>   {
>> @@ -1107,8 +1176,7 @@ TEST_F(hmm, migrate_multiple)
>>   			ptr[i] = i;
>>   
>>   		/* Migrate memory to device. */
>> -		ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_MIGRATE, buffer,
>> -				      npages);
>> +		ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
>>   		ASSERT_EQ(ret, 0);
>>   		ASSERT_EQ(buffer->cpages, npages);
>>   
>> @@ -1116,7 +1184,12 @@ TEST_F(hmm, migrate_multiple)
>>   		for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
>>   			ASSERT_EQ(ptr[i], i);
>>   
>> -		/* Fault pages back to system memory and check them. */
>> +		/* Migrate back to system memory and check them. */
>> +		if (hmm_is_coherent_type(variant->device_number)) {
>> +			ret = hmm_migrate_dev_to_sys(self->fd, buffer, npages);
> So I think this will still pass even if nothing migrates so as mentioned on
> the previous patch I think we need to check for the number of pages that
> actually migrated. Alternatively I suppose you could do a snapshot and check
> that, but that seems like it would be harder. Otherwise I think this looks
> good.
>
>> +			ASSERT_EQ(ret, 0);
>> +		}
>> +
>>   		for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
>>   			ASSERT_EQ(ptr[i], i);
>>   
>> @@ -1312,13 +1385,13 @@ TEST_F(hmm2, snapshot)
>>   
>>   	/* Page 5 will be migrated to device 0. */
>>   	buffer->ptr = p + 5 * self->page_size;
>> -	ret = hmm_dmirror_cmd(self->fd0, HMM_DMIRROR_MIGRATE, buffer, 1);
>> +	ret = hmm_migrate_sys_to_dev(self->fd0, buffer, 1);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, 1);
>>   
>>   	/* Page 6 will be migrated to device 1. */
>>   	buffer->ptr = p + 6 * self->page_size;
>> -	ret = hmm_dmirror_cmd(self->fd1, HMM_DMIRROR_MIGRATE, buffer, 1);
>> +	ret = hmm_migrate_sys_to_dev(self->fd1, buffer, 1);
>>   	ASSERT_EQ(ret, 0);
>>   	ASSERT_EQ(buffer->cpages, 1);
>>   
>> @@ -1335,9 +1408,16 @@ TEST_F(hmm2, snapshot)
>>   	ASSERT_EQ(m[2], HMM_DMIRROR_PROT_ZERO | HMM_DMIRROR_PROT_READ);
>>   	ASSERT_EQ(m[3], HMM_DMIRROR_PROT_READ);
>>   	ASSERT_EQ(m[4], HMM_DMIRROR_PROT_WRITE);
>> -	ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
>> -			HMM_DMIRROR_PROT_WRITE);
>> -	ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
>> +	if (!hmm_is_coherent_type(variant->device_number0)) {
>> +		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
>> +				HMM_DMIRROR_PROT_WRITE);
>> +		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_NONE);
>> +	} else {
>> +		ASSERT_EQ(m[5], HMM_DMIRROR_PROT_DEV_COHERENT_LOCAL |
>> +				HMM_DMIRROR_PROT_WRITE);
>> +		ASSERT_EQ(m[6], HMM_DMIRROR_PROT_DEV_COHERENT_REMOTE |
>> +				HMM_DMIRROR_PROT_WRITE);
>> +	}
>>   
>>   	hmm_buffer_free(buffer);
>>   }
>>
>
>
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2022-01-27  3:22 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-10 22:31 [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alex Sierra
2022-01-10 22:31 ` [PATCH v3 01/10] mm: add zone device coherent type memory support Alex Sierra
2022-01-20  4:08   ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 02/10] mm: add device coherent vma selection for memory migration Alex Sierra
2022-01-10 22:31 ` [PATCH v3 03/10] mm/gup: fail get_user_pages for LONGTERM dev coherent type Alex Sierra
2022-01-20 12:36   ` Joao Martins
2022-01-20 13:18     ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 04/10] drm/amdkfd: add SPM support for SVM Alex Sierra
2022-01-10 22:31 ` [PATCH v3 05/10] drm/amdkfd: coherent type as sys mem on migration to ram Alex Sierra
2022-01-10 22:31 ` [PATCH v3 06/10] lib: test_hmm add ioctl to get zone device type Alex Sierra
2022-01-20  5:01   ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 07/10] lib: test_hmm add module param for " Alex Sierra
2022-01-20  5:23   ` Alistair Popple
2022-01-10 22:31 ` [PATCH v3 08/10] lib: add support for device coherent type in test_hmm Alex Sierra
2022-01-20  6:00   ` Alistair Popple
2022-01-10 22:32 ` [PATCH v3 09/10] tools: update hmm-test to support device coherent type Alex Sierra
2022-01-20  6:14   ` Alistair Popple
2022-01-27  3:22     ` Sierra Guiza, Alejandro (Alex)
2022-01-10 22:32 ` [PATCH v3 10/10] tools: update test_hmm script to support SP config Alex Sierra
2022-01-20  6:17   ` Alistair Popple
2022-01-12 11:06 ` [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping Alistair Popple
2022-01-20  6:33   ` Alistair Popple
2022-01-12 11:16 ` David Hildenbrand
2022-01-12 16:08   ` Felix Kuehling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).