dri-devel.lists.freedesktop.org archive mirror
* [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD
@ 2021-04-22  1:30 Felix Kuehling
  2021-04-22  1:30 ` [PATCH v2 01/10] rock-dbg_defconfig: Enable Intel IOMMU Felix Kuehling
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

This patch series fixes DMA-mappings of system memory (GTT and userptr)
for KFD running on multi-GPU systems with IOMMU enabled. One SG-BO per
GPU is needed to maintain the DMA mappings of each BO.
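
For illustration only (not part of the series): DMA addresses are specific
to the device they were mapped for, so a single mapping of a BO's system
pages cannot serve several GPUs behind an IOMMU. A minimal sketch of the
per-device mapping that each SG BO encapsulates, using standard kernel
APIs (map_pages_for_gpu is an illustrative name):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* The same system pages need a separate sg_table and a separate
 * dma_map_sgtable() call per GPU, because the resulting DMA addresses
 * are only valid for the device that mapped them.
 */
static int map_pages_for_gpu(struct device *dev, struct page **pages,
			     unsigned int npages, struct sg_table *sgt)
{
	int ret;

	ret = sg_alloc_table_from_pages(sgt, pages, npages, 0,
					(u64)npages << PAGE_SHIFT, GFP_KERNEL);
	if (ret)
		return ret;

	return dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
}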

Changes in v2:
- Made the original BO the parent of the SG BO to fix BO destruction order
- Removed the individualization hack, which is not needed with the parent BO
- Removed the resv locking hack in amdgpu_ttm_unpopulate, not needed without
  the individualization hack
- Added a patch to enable the Intel IOMMU driver in rock-dbg_defconfig
- Added a patch to move dmabuf attach/detach into backend_(un)bind

I'm still seeing some IOMMU access faults in the eviction test. They seem
to be related to userptr handling. They happen even on a single-GPU system
without this patch series, where the series is not needed. I believe this
is an old problem in KFD or amdgpu that is being exposed by device
isolation from the IOMMU. I'm debugging it, but it should not hold up this
patch series.

"drm/ttm: Don't count pages in SG BOs against pages_limit" was already
applied to drm-misc (I think). I'm still including it here because my
patches depend on it. Without that, the SG BOs created for DMA mappings
cause many tests to fail because TTM incorrectly thinks it's out of memory.

Felix Kuehling (10):
  rock-dbg_defconfig: Enable Intel IOMMU
  drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  drm/amdgpu: Keep a bo-reference per-attachment
  drm/amdgpu: Simplify AQL queue mapping
  drm/amdgpu: Add multi-GPU DMA mapping helpers
  drm/amdgpu: DMA map/unmap when updating GPU mappings
  drm/amdgpu: Move kfd_mem_attach outside reservation
  drm/amdgpu: Add DMA mapping of GTT BOs
  drm/ttm: Don't count pages in SG BOs against pages_limit
  drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind

 arch/x86/configs/rock-dbg_defconfig           |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  18 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 530 ++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  51 +-
 drivers/gpu/drm/ttm/ttm_tt.c                  |  27 +-
 5 files changed, 437 insertions(+), 200 deletions(-)

-- 
2.31.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [PATCH v2 01/10] rock-dbg_defconfig: Enable Intel IOMMU
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-22  1:30 ` [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment Felix Kuehling
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

Enable the Intel IOMMU driver in the rock-dbg_defconfig. This enables
testing of DMA mappings on systems with an Intel IOMMU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 arch/x86/configs/rock-dbg_defconfig | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig b/arch/x86/configs/rock-dbg_defconfig
index 54688993d6e2..9f7d93307754 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -296,6 +296,7 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y
 CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
 CONFIG_ZONE_DMA32=y
 CONFIG_AUDIT_ARCH=y
+CONFIG_HAVE_INTEL_TXT=y
 CONFIG_X86_64_SMP=y
 CONFIG_ARCH_SUPPORTS_UPROBES=y
 CONFIG_FIX_EARLYCON_MEM=y
@@ -3112,6 +3113,7 @@ CONFIG_DRM_AMD_DC_DCN=y
 # end of Display Engine Configuration
 
 CONFIG_HSA_AMD=y
+CONFIG_HSA_AMD_SVM=y
 # CONFIG_DRM_NOUVEAU is not set
 # CONFIG_DRM_I915 is not set
 # CONFIG_DRM_VGEM is not set
@@ -3770,6 +3772,7 @@ CONFIG_MAILBOX=y
 CONFIG_PCC=y
 # CONFIG_ALTERA_MBOX is not set
 CONFIG_IOMMU_IOVA=y
+CONFIG_IOASID=y
 CONFIG_IOMMU_API=y
 CONFIG_IOMMU_SUPPORT=y
 
@@ -3783,7 +3786,12 @@ CONFIG_IOMMU_SUPPORT=y
 CONFIG_IOMMU_DMA=y
 CONFIG_AMD_IOMMU=y
 CONFIG_AMD_IOMMU_V2=m
-# CONFIG_INTEL_IOMMU is not set
+CONFIG_DMAR_TABLE=y
+CONFIG_INTEL_IOMMU=y
+# CONFIG_INTEL_IOMMU_SVM is not set
+CONFIG_INTEL_IOMMU_DEFAULT_ON=y
+CONFIG_INTEL_IOMMU_FLOPPY_WA=y
+# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
 # CONFIG_IRQ_REMAP is not set
 
 #
@@ -4184,6 +4192,7 @@ CONFIG_SECURITY_NETWORK=y
 CONFIG_PAGE_TABLE_ISOLATION=y
 # CONFIG_SECURITY_NETWORK_XFRM is not set
 # CONFIG_SECURITY_PATH is not set
+# CONFIG_INTEL_TXT is not set
 CONFIG_LSM_MMAP_MIN_ADDR=65536
 CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
 # CONFIG_HARDENED_USERCOPY is not set
-- 
2.31.1

* [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
  2021-04-22  1:30 ` [PATCH v2 01/10] rock-dbg_defconfig: Enable Intel IOMMU Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-05-10 22:00   ` Errabolu, Ramesh
  2021-04-22  1:30 ` [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment Felix Kuehling
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

This name is more fitting, especially for the changes coming next to
support multi-GPU systems with proper DMA mappings. Cleaned up the code
and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 209 +++++++++---------
 2 files changed, 104 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 313ee49b9f17..c24b2478f445 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,10 +38,10 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
 
-struct kfd_bo_va_list {
-	struct list_head bo_list;
+struct kfd_mem_attachment {
+	struct list_head list;
 	struct amdgpu_bo_va *bo_va;
-	void *kgd_dev;
+	struct amdgpu_device *adev;
 	bool is_mapped;
 	uint64_t va;
 	uint64_t pte_flags;
@@ -50,7 +50,7 @@ struct kfd_bo_va_list {
 struct kgd_mem {
 	struct mutex lock;
 	struct amdgpu_bo *bo;
-	struct list_head bo_va_list;
+	struct list_head attachments;
 	/* protected by amdkfd_process_info.lock */
 	struct ttm_validate_buffer validate_list;
 	struct ttm_validate_buffer resv_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index dfa025d694f8..fee4c64dd051 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -72,16 +72,16 @@ static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
 	return (struct amdgpu_device *)kgd;
 }
 
-static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
+static bool kfd_mem_is_attached(struct amdgpu_vm *avm,
 		struct kgd_mem *mem)
 {
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list)
+	list_for_each_entry(entry, &mem->attachments, list)
 		if (entry->bo_va->base.vm == avm)
-			return false;
+			return true;
 
-	return true;
+	return false;
 }
 
 /* Set memory usage limits. Current, limits are
@@ -473,7 +473,7 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
 	return pte_flags;
 }
 
-/* add_bo_to_vm - Add a BO to a VM
+/* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
  * to a VM. It can later be mapped and unmapped many times without
@@ -485,15 +485,14 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  * 4. Alloc page tables and directories if needed
  * 4a.  Validate new page tables and directories
  */
-static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
+static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		struct amdgpu_vm *vm, bool is_aql,
-		struct kfd_bo_va_list **p_bo_va_entry)
+		struct kfd_mem_attachment **p_attachment)
 {
 	int ret;
-	struct kfd_bo_va_list *bo_va_entry;
+	struct kfd_mem_attachment *attachment;
 	struct amdgpu_bo *bo = mem->bo;
 	uint64_t va = mem->va;
-	struct list_head *list_bo_va = &mem->bo_va_list;
 	unsigned long bo_size = bo->tbo.base.size;
 
 	if (!va) {
@@ -504,29 +503,29 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	if (is_aql)
 		va += bo_size;
 
-	bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
-	if (!bo_va_entry)
+	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
+	if (!attachment)
 		return -ENOMEM;
 
 	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
 			va + bo_size, vm);
 
 	/* Add BO to VM internal data structures*/
-	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
-	if (!bo_va_entry->bo_va) {
+	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
+	if (!attachment->bo_va) {
 		ret = -EINVAL;
 		pr_err("Failed to add BO object to VM. ret == %d\n",
 				ret);
 		goto err_vmadd;
 	}
 
-	bo_va_entry->va = va;
-	bo_va_entry->pte_flags = get_pte_flags(adev, mem);
-	bo_va_entry->kgd_dev = (void *)adev;
-	list_add(&bo_va_entry->bo_list, list_bo_va);
+	attachment->va = va;
+	attachment->pte_flags = get_pte_flags(adev, mem);
+	attachment->adev = adev;
+	list_add(&attachment->list, &mem->attachments);
 
-	if (p_bo_va_entry)
-		*p_bo_va_entry = bo_va_entry;
+	if (p_attachment)
+		*p_attachment = attachment;
 
 	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
@@ -538,22 +537,20 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	return 0;
 
 err_alloc_pts:
-	amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
-	list_del(&bo_va_entry->bo_list);
+	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
+	list_del(&attachment->list);
 err_vmadd:
-	kfree(bo_va_entry);
+	kfree(attachment);
 	return ret;
 }
 
-static void remove_bo_from_vm(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry, unsigned long size)
+static void kfd_mem_detach(struct kfd_mem_attachment *attachment)
 {
-	pr_debug("\t remove VA 0x%llx - 0x%llx in entry %p\n",
-			entry->va,
-			entry->va + size, entry);
-	amdgpu_vm_bo_rmv(adev, entry->bo_va);
-	list_del(&entry->bo_list);
-	kfree(entry);
+	pr_debug("\t remove VA 0x%llx in entry %p\n",
+			attachment->va, attachment);
+	amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
+	list_del(&attachment->list);
+	kfree(attachment);
 }
 
 static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
@@ -728,7 +725,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 				struct bo_vm_reservation_context *ctx)
 {
 	struct amdgpu_bo *bo = mem->bo;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	unsigned int i;
 	int ret;
 
@@ -740,7 +737,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 	INIT_LIST_HEAD(&ctx->list);
 	INIT_LIST_HEAD(&ctx->duplicates);
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+	list_for_each_entry(entry, &mem->attachments, list) {
 		if ((vm && vm != entry->bo_va->base.vm) ||
 			(entry->is_mapped != map_type
 			&& map_type != BO_VM_ALL))
@@ -762,7 +759,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
 
 	i = 0;
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+	list_for_each_entry(entry, &mem->attachments, list) {
 		if ((vm && vm != entry->bo_va->base.vm) ||
 			(entry->is_mapped != map_type
 			&& map_type != BO_VM_ALL))
@@ -817,7 +814,7 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 }
 
 static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
-				struct kfd_bo_va_list *entry,
+				struct kfd_mem_attachment *entry,
 				struct amdgpu_sync *sync)
 {
 	struct amdgpu_bo_va *bo_va = entry->bo_va;
@@ -833,7 +830,7 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 }
 
 static int update_gpuvm_pte(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry,
+		struct kfd_mem_attachment *entry,
 		struct amdgpu_sync *sync)
 {
 	int ret;
@@ -850,7 +847,7 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
 }
 
 static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync,
+		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
 		bool no_update_pte)
 {
 	int ret;
@@ -1194,7 +1191,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		ret = -ENOMEM;
 		goto err;
 	}
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	mutex_init(&(*mem)->lock);
 	(*mem)->aql_queue = !!(flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM);
 
@@ -1283,7 +1280,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 {
 	struct amdkfd_process_info *process_info = mem->process_info;
 	unsigned long bo_size = mem->bo->tbo.base.size;
-	struct kfd_bo_va_list *entry, *tmp;
+	struct kfd_mem_attachment *entry, *tmp;
 	struct bo_vm_reservation_context ctx;
 	struct ttm_validate_buffer *bo_list_entry;
 	unsigned int mapped_to_gpu_memory;
@@ -1327,9 +1324,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 		mem->va + bo_size * (1 + mem->aql_queue));
 
 	/* Remove from VM internal data structures */
-	list_for_each_entry_safe(entry, tmp, &mem->bo_va_list, bo_list)
-		remove_bo_from_vm((struct amdgpu_device *)entry->kgd_dev,
-				entry, bo_size);
+	list_for_each_entry_safe(entry, tmp, &mem->attachments, list)
+		kfd_mem_detach(entry);
 
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
@@ -1372,10 +1368,10 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	int ret;
 	struct amdgpu_bo *bo;
 	uint32_t domain;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	struct bo_vm_reservation_context ctx;
-	struct kfd_bo_va_list *bo_va_entry = NULL;
-	struct kfd_bo_va_list *bo_va_entry_aql = NULL;
+	struct kfd_mem_attachment *attachment = NULL;
+	struct kfd_mem_attachment *attachment_aql = NULL;
 	unsigned long bo_size;
 	bool is_invalid_userptr = false;
 
@@ -1424,21 +1420,20 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	    bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
 		is_invalid_userptr = true;
 
-	if (check_if_add_bo_to_vm(avm, mem)) {
-		ret = add_bo_to_vm(adev, mem, avm, false,
-				&bo_va_entry);
+	if (!kfd_mem_is_attached(avm, mem)) {
+		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
 		if (ret)
-			goto add_bo_to_vm_failed;
+			goto attach_failed;
 		if (mem->aql_queue) {
-			ret = add_bo_to_vm(adev, mem, avm,
-					true, &bo_va_entry_aql);
+			ret = kfd_mem_attach(adev, mem, avm, true,
+					     &attachment_aql);
 			if (ret)
-				goto add_bo_to_vm_failed_aql;
+				goto attach_failed_aql;
 		}
 	} else {
 		ret = vm_validate_pt_pd_bos(avm);
 		if (unlikely(ret))
-			goto add_bo_to_vm_failed;
+			goto attach_failed;
 	}
 
 	if (mem->mapped_to_gpu_memory == 0 &&
@@ -1454,30 +1449,30 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		}
 	}
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
-		if (entry->bo_va->base.vm == avm && !entry->is_mapped) {
-			pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
-					entry->va, entry->va + bo_size,
-					entry);
+	list_for_each_entry(entry, &mem->attachments, list) {
+		if (entry->bo_va->base.vm != avm || entry->is_mapped)
+			continue;
 
-			ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
-					      is_invalid_userptr);
-			if (ret) {
-				pr_err("Failed to map bo to gpuvm\n");
-				goto map_bo_to_gpuvm_failed;
-			}
+		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
+			 entry->va, entry->va + bo_size, entry);
 
-			ret = vm_update_pds(avm, ctx.sync);
-			if (ret) {
-				pr_err("Failed to update page directories\n");
-				goto map_bo_to_gpuvm_failed;
-			}
+		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+				      is_invalid_userptr);
+		if (ret) {
+			pr_err("Failed to map bo to gpuvm\n");
+			goto map_bo_to_gpuvm_failed;
+		}
 
-			entry->is_mapped = true;
-			mem->mapped_to_gpu_memory++;
-			pr_debug("\t INC mapping count %d\n",
-					mem->mapped_to_gpu_memory);
+		ret = vm_update_pds(avm, ctx.sync);
+		if (ret) {
+			pr_err("Failed to update page directories\n");
+			goto map_bo_to_gpuvm_failed;
 		}
+
+		entry->is_mapped = true;
+		mem->mapped_to_gpu_memory++;
+		pr_debug("\t INC mapping count %d\n",
+			 mem->mapped_to_gpu_memory);
 	}
 
 	if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->tbo.pin_count)
@@ -1489,12 +1484,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	goto out;
 
 map_bo_to_gpuvm_failed:
-	if (bo_va_entry_aql)
-		remove_bo_from_vm(adev, bo_va_entry_aql, bo_size);
-add_bo_to_vm_failed_aql:
-	if (bo_va_entry)
-		remove_bo_from_vm(adev, bo_va_entry, bo_size);
-add_bo_to_vm_failed:
+	if (attachment_aql)
+		kfd_mem_detach(attachment_aql);
+attach_failed_aql:
+	if (attachment)
+		kfd_mem_detach(attachment);
+attach_failed:
 	unreserve_bo_and_vms(&ctx, false, false);
 out:
 	mutex_unlock(&mem->process_info->lock);
@@ -1509,7 +1504,7 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
 	struct amdkfd_process_info *process_info = avm->process_info;
 	unsigned long bo_size = mem->bo->tbo.base.size;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	struct bo_vm_reservation_context ctx;
 	int ret;
 
@@ -1533,26 +1528,24 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		mem->va + bo_size * (1 + mem->aql_queue),
 		avm);
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
-		if (entry->bo_va->base.vm == avm && entry->is_mapped) {
-			pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
-					entry->va,
-					entry->va + bo_size,
-					entry);
+	list_for_each_entry(entry, &mem->attachments, list) {
+		if (entry->bo_va->base.vm != avm || !entry->is_mapped)
+			continue;
 
-			ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
-			if (ret == 0) {
-				entry->is_mapped = false;
-			} else {
-				pr_err("failed to unmap VA 0x%llx\n",
-						mem->va);
-				goto unreserve_out;
-			}
+		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
+			 entry->va, entry->va + bo_size, entry);
 
-			mem->mapped_to_gpu_memory--;
-			pr_debug("\t DEC mapping count %d\n",
-					mem->mapped_to_gpu_memory);
+		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
+		if (ret == 0) {
+			entry->is_mapped = false;
+		} else {
+			pr_err("failed to unmap VA 0x%llx\n", mem->va);
+			goto unreserve_out;
 		}
+
+		mem->mapped_to_gpu_memory--;
+		pr_debug("\t DEC mapping count %d\n",
+			 mem->mapped_to_gpu_memory);
 	}
 
 	/* If BO is unmapped from all VMs, unfence it. It can be evicted if
@@ -1701,7 +1694,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
 	if (mmap_offset)
 		*mmap_offset = amdgpu_bo_mmap_offset(bo);
 
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	mutex_init(&(*mem)->lock);
 
 	(*mem)->alloc_flags =
@@ -1898,7 +1891,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_inval_list,
 				 validate_list.head) {
-		struct kfd_bo_va_list *bo_va_entry;
+		struct kfd_mem_attachment *attachment;
 
 		bo = mem->bo;
 
@@ -1921,13 +1914,13 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 		 * VM faults if the GPU tries to access the invalid
 		 * memory.
 		 */
-		list_for_each_entry(bo_va_entry, &mem->bo_va_list, bo_list) {
-			if (!bo_va_entry->is_mapped)
+		list_for_each_entry(attachment, &mem->attachments, list) {
+			if (!attachment->is_mapped)
 				continue;
 
 			ret = update_gpuvm_pte((struct amdgpu_device *)
-					       bo_va_entry->kgd_dev,
-					       bo_va_entry, &sync);
+					       attachment->adev,
+					       attachment, &sync);
 			if (ret) {
 				pr_err("%s: update PTE failed\n", __func__);
 				/* make sure this gets validated again */
@@ -2108,7 +2101,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 
 		struct amdgpu_bo *bo = mem->bo;
 		uint32_t domain = mem->domain;
-		struct kfd_bo_va_list *bo_va_entry;
+		struct kfd_mem_attachment *attachment;
 
 		total_size += amdgpu_bo_size(bo);
 
@@ -2128,11 +2121,9 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 			pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
 			goto validate_map_fail;
 		}
-		list_for_each_entry(bo_va_entry, &mem->bo_va_list,
-				    bo_list) {
+		list_for_each_entry(attachment, &mem->attachments, list) {
 			ret = update_gpuvm_pte((struct amdgpu_device *)
-					      bo_va_entry->kgd_dev,
-					      bo_va_entry,
+					      attachment->adev, attachment,
 					      &sync_obj);
 			if (ret) {
 				pr_debug("Memory eviction: update PTE failed. Try again\n");
@@ -2208,7 +2199,7 @@ int amdgpu_amdkfd_add_gws_to_process(void *info, void *gws, struct kgd_mem **mem
 		return -ENOMEM;
 
 	mutex_init(&(*mem)->lock);
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	(*mem)->bo = amdgpu_bo_ref(gws_bo);
 	(*mem)->domain = AMDGPU_GEM_DOMAIN_GWS;
 	(*mem)->process_info = process_info;
-- 
2.31.1

* [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
  2021-04-22  1:30 ` [PATCH v2 01/10] rock-dbg_defconfig: Enable Intel IOMMU Felix Kuehling
  2021-04-22  1:30 ` [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-05-10 22:00   ` Errabolu, Ramesh
  2021-04-22  1:30 ` [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping Felix Kuehling
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

For now they all reference the same BO. For correct DMA mappings they will
need to refer to different per-GPU BOs.
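
Condensed sketch of the detach path after this patch (names from the diff
below; detach_sketch is an illustrative name). The BO is looked up through
the attachment because it may no longer equal kgd_mem->bo once per-GPU BOs
exist, and each attachment drops its own GEM reference:

static void detach_sketch(struct kfd_mem_attachment *attachment)
{
	struct amdgpu_bo *bo = attachment->bo_va->base.bo;

	amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
	drm_gem_object_put(&bo->tbo.base);	/* this attachment's reference */
	list_del(&attachment->list);
	kfree(attachment);
}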

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fee4c64dd051..34c9a2d0028e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -489,11 +489,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		struct amdgpu_vm *vm, bool is_aql,
 		struct kfd_mem_attachment **p_attachment)
 {
-	int ret;
-	struct kfd_mem_attachment *attachment;
-	struct amdgpu_bo *bo = mem->bo;
+	unsigned long bo_size = mem->bo->tbo.base.size;
 	uint64_t va = mem->va;
-	unsigned long bo_size = bo->tbo.base.size;
+	struct kfd_mem_attachment *attachment;
+	struct amdgpu_bo *bo;
+	int ret;
 
 	if (!va) {
 		pr_err("Invalid VA when adding BO to VM\n");
@@ -510,6 +510,14 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
 			va + bo_size, vm);
 
+	/* FIXME: For now all attachments use the same BO. This is incorrect
+	 * because one BO can only have one DMA mapping for one GPU. We need
+	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
+	 * will be addressed one BO-type at a time in subsequent patches.
+	 */
+	bo = mem->bo;
+	drm_gem_object_get(&bo->tbo.base);
+
 	/* Add BO to VM internal data structures*/
 	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
 	if (!attachment->bo_va) {
@@ -529,7 +537,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 
 	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
-	if (ret) {
+	if (unlikely(ret)) {
 		pr_err("validate_pt_pd_bos() failed\n");
 		goto err_alloc_pts;
 	}
@@ -540,15 +548,19 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
 	list_del(&attachment->list);
 err_vmadd:
+	drm_gem_object_put(&bo->tbo.base);
 	kfree(attachment);
 	return ret;
 }
 
 static void kfd_mem_detach(struct kfd_mem_attachment *attachment)
 {
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
 	pr_debug("\t remove VA 0x%llx in entry %p\n",
 			attachment->va, attachment);
 	amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
+	drm_gem_object_put(&bo->tbo.base);
 	list_del(&attachment->list);
 	kfree(attachment);
 }
-- 
2.31.1

* [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (2 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-23  1:33   ` Zeng, Oak
  2021-04-22  1:30 ` [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers Felix Kuehling
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

Do AQL queue double-mapping with a single attach call. That will make it
easier to create per-GPU BOs later, to be shared between the two BO VA
mappings on the same GPU.

Freeing the attachments is not necessary if map_to_gpu fails. These will be
cleaned up when the kgd_mem object is destroyed in
amdgpu_amdkfd_gpuvm_free_memory_of_gpu.
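
For reference, the VA layout this produces (kfd_attachment_va is a
hypothetical helper, following the `va += bo_size` step in the attach loop
below): an AQL queue BO is mapped twice, back to back, in the same GPUVM.

/* attachment[0] covers [va, va + size), attachment[1] covers
 * [va + size, va + 2*size). i is 0 for the first mapping, 1 for the
 * second AQL mapping.
 */
static u64 kfd_attachment_va(u64 va, u64 bo_size, int i)
{
	return va + (u64)i * bo_size;
}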

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 103 ++++++++----------
 1 file changed, 48 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 34c9a2d0028e..fbd7e786b54e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -486,70 +486,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  * 4a.  Validate new page tables and directories
  */
 static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
-		struct amdgpu_vm *vm, bool is_aql,
-		struct kfd_mem_attachment **p_attachment)
+		struct amdgpu_vm *vm, bool is_aql)
 {
 	unsigned long bo_size = mem->bo->tbo.base.size;
 	uint64_t va = mem->va;
-	struct kfd_mem_attachment *attachment;
-	struct amdgpu_bo *bo;
-	int ret;
+	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
+	struct amdgpu_bo *bo[2] = {NULL, NULL};
+	int i, ret;
 
 	if (!va) {
 		pr_err("Invalid VA when adding BO to VM\n");
 		return -EINVAL;
 	}
 
-	if (is_aql)
-		va += bo_size;
-
-	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
-	if (!attachment)
-		return -ENOMEM;
+	for (i = 0; i <= is_aql; i++) {
+		attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL);
+		if (unlikely(!attachment[i])) {
+			ret = -ENOMEM;
+			goto unwind;
+		}
 
-	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
-			va + bo_size, vm);
+		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
+			 va + bo_size, vm);
 
-	/* FIXME: For now all attachments use the same BO. This is incorrect
-	 * because one BO can only have one DMA mapping for one GPU. We need
-	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
-	 * will be addressed one BO-type at a time in subsequent patches.
-	 */
-	bo = mem->bo;
-	drm_gem_object_get(&bo->tbo.base);
+		/* FIXME: For now all attachments use the same BO. This is
+		 * incorrect because one BO can only have one DMA mapping
+		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
+		 * import with dynamic attachment. This will be addressed
+		 * one BO-type at a time in subsequent patches.
+		 */
+		bo[i] = mem->bo;
+		drm_gem_object_get(&bo[i]->tbo.base);
 
-	/* Add BO to VM internal data structures*/
-	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
-	if (!attachment->bo_va) {
-		ret = -EINVAL;
-		pr_err("Failed to add BO object to VM. ret == %d\n",
-				ret);
-		goto err_vmadd;
-	}
+		/* Add BO to VM internal data structures */
+		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
+		if (unlikely(!attachment[i]->bo_va)) {
+			ret = -ENOMEM;
+			pr_err("Failed to add BO object to VM. ret == %d\n",
+			       ret);
+			goto unwind;
+		}
 
-	attachment->va = va;
-	attachment->pte_flags = get_pte_flags(adev, mem);
-	attachment->adev = adev;
-	list_add(&attachment->list, &mem->attachments);
+		attachment[i]->va = va;
+		attachment[i]->pte_flags = get_pte_flags(adev, mem);
+		attachment[i]->adev = adev;
+		list_add(&attachment[i]->list, &mem->attachments);
 
-	if (p_attachment)
-		*p_attachment = attachment;
+		va += bo_size;
+	}
 
 	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
 	if (unlikely(ret)) {
 		pr_err("validate_pt_pd_bos() failed\n");
-		goto err_alloc_pts;
+		goto unwind;
 	}
 
 	return 0;
 
-err_alloc_pts:
-	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
-	list_del(&attachment->list);
-err_vmadd:
-	drm_gem_object_put(&bo->tbo.base);
-	kfree(attachment);
+unwind:
+	for (; i >= 0; i--) {
+		if (!attachment[i])
+			continue;
+		if (attachment[i]->bo_va) {
+			amdgpu_vm_bo_rmv(adev, attachment[i]->bo_va);
+			list_del(&attachment[i]->list);
+		}
+		if (bo[i])
+			drm_gem_object_put(&bo[i]->tbo.base);
+		kfree(attachment[i]);
+	}
 	return ret;
 }
 
@@ -1382,8 +1388,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	uint32_t domain;
 	struct kfd_mem_attachment *entry;
 	struct bo_vm_reservation_context ctx;
-	struct kfd_mem_attachment *attachment = NULL;
-	struct kfd_mem_attachment *attachment_aql = NULL;
 	unsigned long bo_size;
 	bool is_invalid_userptr = false;
 
@@ -1433,15 +1437,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		is_invalid_userptr = true;
 
 	if (!kfd_mem_is_attached(avm, mem)) {
-		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
+		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
 		if (ret)
 			goto attach_failed;
-		if (mem->aql_queue) {
-			ret = kfd_mem_attach(adev, mem, avm, true,
-					     &attachment_aql);
-			if (ret)
-				goto attach_failed_aql;
-		}
 	} else {
 		ret = vm_validate_pt_pd_bos(avm);
 		if (unlikely(ret))
@@ -1496,11 +1494,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	goto out;
 
 map_bo_to_gpuvm_failed:
-	if (attachment_aql)
-		kfd_mem_detach(attachment_aql);
-attach_failed_aql:
-	if (attachment)
-		kfd_mem_detach(attachment);
 attach_failed:
 	unreserve_bo_and_vms(&ctx, false, false);
 out:
-- 
2.31.1

* [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (3 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-27  0:09   ` Zeng, Oak
  2021-04-22  1:30 ` [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings Felix Kuehling
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

Add BO-type-specific helper functions to DMA-map and unmap
kfd_mem_attachments. Implement this functionality for userptrs by creating
one SG BO per GPU and filling it with a DMA mapping of the pages from the
original mem->bo.
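
The interesting design point is when a peer GPU can keep sharing the
original BO instead of needing its own SG BO. Condensed into one predicate
(illustrative sketch with a hypothetical name; the real branching is in
kfd_mem_attach below):

/* Sharing the original BO's mapping is only safe on the BO's own GPU, or
 * for VRAM within the same XGMI hive, where no system-memory DMA mapping
 * is involved.
 */
static bool can_share_original_bo(struct amdgpu_device *adev,
				  struct amdgpu_device *bo_adev,
				  struct kgd_mem *mem)
{
	return adev == bo_adev ||
	       (mem->domain == AMDGPU_GEM_DOMAIN_VRAM &&
		amdgpu_xgmi_same_hive(adev, bo_adev));
}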

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +++++++++++++++++-
 2 files changed, 145 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c24b2478f445..63668433f5a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
 
+enum kfd_mem_attachment_type {
+	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
+	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
+};
+
 struct kfd_mem_attachment {
 	struct list_head list;
+	enum kfd_mem_attachment_type type;
+	bool is_mapped;
 	struct amdgpu_bo_va *bo_va;
 	struct amdgpu_device *adev;
-	bool is_mapped;
 	uint64_t va;
 	uint64_t pte_flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fbd7e786b54e..49d1af4aa5f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
 	return pte_flags;
 }
 
+static int
+kfd_mem_dmamap_userptr(struct kgd_mem *mem,
+		       struct kfd_mem_attachment *attachment)
+{
+	enum dma_data_direction direction =
+		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+	struct ttm_operation_ctx ctx = {.interruptible = true};
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+	struct amdgpu_device *adev = attachment->adev;
+	struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
+	struct ttm_tt *ttm = bo->tbo.ttm;
+	int ret;
+
+	ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
+	if (unlikely(!ttm->sg))
+		return -ENOMEM;
+
+	if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
+		return -EINVAL;
+
+	/* Same sequence as in amdgpu_ttm_tt_pin_userptr */
+	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
+					ttm->num_pages, 0,
+					(u64)ttm->num_pages << PAGE_SHIFT,
+					GFP_KERNEL);
+	if (unlikely(ret))
+		goto release_sg;
+
+	ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
+	if (unlikely(ret))
+		goto release_sg;
+
+	drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
+				       ttm->num_pages);
+
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (ret)
+		goto release_sg;
+
+	return 0;
+
+release_sg:
+	pr_err("DMA map userptr failed: %d\n", ret);
+	sg_free_table(ttm->sg);
+	kfree(ttm->sg);
+	ttm->sg = NULL;
+	return ret;
+}
+
+static int
+kfd_mem_dmamap_attachment(struct kgd_mem *mem,
+			  struct kfd_mem_attachment *attachment)
+{
+	switch (attachment->type) {
+	case KFD_MEM_ATT_SHARED:
+		return 0;
+	case KFD_MEM_ATT_USERPTR:
+		return kfd_mem_dmamap_userptr(mem, attachment);
+	default:
+		WARN_ON_ONCE(1);
+	}
+	return -EINVAL;
+}
+
+static void
+kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
+			 struct kfd_mem_attachment *attachment)
+{
+	enum dma_data_direction direction =
+		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
+	struct ttm_operation_ctx ctx = {.interruptible = false};
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+	struct amdgpu_device *adev = attachment->adev;
+	struct ttm_tt *ttm = bo->tbo.ttm;
+
+	if (unlikely(!ttm->sg))
+		return;
+
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+
+	dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
+	sg_free_table(ttm->sg);
+	ttm->sg = NULL;
+}
+
+static void
+kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
+			    struct kfd_mem_attachment *attachment)
+{
+	switch (attachment->type) {
+	case KFD_MEM_ATT_SHARED:
+		break;
+	case KFD_MEM_ATT_USERPTR:
+		kfd_mem_dmaunmap_userptr(mem, attachment);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
  * to a VM. It can later be mapped and unmapped many times without
  * repeating these steps.
  *
+ * 0. Create BO for DMA mapping, if needed
  * 1. Allocate and initialize BO VA entry data structure
  * 2. Add BO to the VM
  * 3. Determine ASIC-specific PTE flags
@@ -488,10 +593,12 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
 static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		struct amdgpu_vm *vm, bool is_aql)
 {
+	struct amdgpu_device *bo_adev = amdgpu_ttm_adev(mem->bo->tbo.bdev);
 	unsigned long bo_size = mem->bo->tbo.base.size;
 	uint64_t va = mem->va;
 	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
 	struct amdgpu_bo *bo[2] = {NULL, NULL};
+	struct drm_gem_object *gobj;
 	int i, ret;
 
 	if (!va) {
@@ -509,14 +616,37 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
 			 va + bo_size, vm);
 
-		/* FIXME: For now all attachments use the same BO. This is
-		 * incorrect because one BO can only have one DMA mapping
-		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
-		 * import with dynamic attachment. This will be addressed
-		 * one BO-type at a time in subsequent patches.
-		 */
-		bo[i] = mem->bo;
-		drm_gem_object_get(&bo[i]->tbo.base);
+		if (adev == bo_adev || (mem->domain == AMDGPU_GEM_DOMAIN_VRAM &&
+					amdgpu_xgmi_same_hive(adev, bo_adev))) {
+			/* Mappings on the local GPU and VRAM mappings in the
+			 * local hive share the original BO
+			 */
+			attachment[i]->type = KFD_MEM_ATT_SHARED;
+			bo[i] = mem->bo;
+			drm_gem_object_get(&bo[i]->tbo.base);
+		} else if (i > 0) {
+			/* Multiple mappings on the same GPU share the BO */
+			attachment[i]->type = KFD_MEM_ATT_SHARED;
+			bo[i] = bo[0];
+			drm_gem_object_get(&bo[i]->tbo.base);
+		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
+			/* Create an SG BO to DMA-map userptrs on other GPUs */
+			attachment[i]->type = KFD_MEM_ATT_USERPTR;
+			ret = amdgpu_gem_object_create(adev, bo_size, 1,
+						       AMDGPU_GEM_DOMAIN_CPU,
+						       0, ttm_bo_type_sg,
+						       mem->bo->tbo.base.resv,
+						       &gobj);
+			if (ret)
+				goto unwind;
+			bo[i] = gem_to_amdgpu_bo(gobj);
+			bo[i]->parent = amdgpu_bo_ref(mem->bo);
+		} else {
+			/* FIXME: Need to DMA-map other BO types */
+			attachment[i]->type = KFD_MEM_ATT_SHARED;
+			bo[i] = mem->bo;
+			drm_gem_object_get(&bo[i]->tbo.base);
+		}
 
 		/* Add BO to VM internal data structures */
 		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
-- 
2.31.1

* [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (4 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-27  0:23   ` Zeng, Oak
  2021-04-22  1:30 ` [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation Felix Kuehling
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
with the BO and page tables reserved, so we can safely update the DMA
mapping.

DMA unmap when a BO is unmapped from a GPU and before updating mappings
in restore workers.
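
The resulting ordering invariant, sketched as two illustrative helpers
(sketch_map/sketch_unmap are not in the patch; the calls they make are):
the DMA mapping is created immediately before PTEs are written from it,
and torn down only after the GPUVM mapping is removed, all under the same
reservation.

static int sketch_map(struct kgd_mem *mem, struct kfd_mem_attachment *entry)
{
	int ret = kfd_mem_dmamap_attachment(mem, entry);	/* DMA map first */

	return ret ? ret : amdgpu_vm_bo_update(entry->adev, entry->bo_va,
					       false);		/* then PTEs */
}

static void sketch_unmap(struct kgd_mem *mem, struct kfd_mem_attachment *entry)
{
	amdgpu_vm_bo_unmap(entry->adev, entry->bo_va, entry->va);
	kfd_mem_dmaunmap_attachment(mem, entry);	/* DMA unmap last */
}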

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++++++++++---------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 49d1af4aa5f1..7d25d886b98c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 	return ret;
 }
 
-static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
+static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 				struct kfd_mem_attachment *entry,
 				struct amdgpu_sync *sync)
 {
 	struct amdgpu_bo_va *bo_va = entry->bo_va;
+	struct amdgpu_device *adev = entry->adev;
 	struct amdgpu_vm *vm = bo_va->base.vm;
 
 	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
@@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 
 	amdgpu_sync_fence(sync, bo_va->last_pt_update);
 
-	return 0;
+	kfd_mem_dmaunmap_attachment(mem, entry);
 }
 
-static int update_gpuvm_pte(struct amdgpu_device *adev,
-		struct kfd_mem_attachment *entry,
-		struct amdgpu_sync *sync)
+static int update_gpuvm_pte(struct kgd_mem *mem,
+			    struct kfd_mem_attachment *entry,
+			    struct amdgpu_sync *sync)
 {
-	int ret;
 	struct amdgpu_bo_va *bo_va = entry->bo_va;
+	struct amdgpu_device *adev = entry->adev;
+	int ret;
+
+	ret = kfd_mem_dmamap_attachment(mem, entry);
+	if (ret)
+		return ret;
 
 	/* Update the page tables  */
 	ret = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
 	return amdgpu_sync_fence(sync, bo_va->last_pt_update);
 }
 
-static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
-		bool no_update_pte)
+static int map_bo_to_gpuvm(struct kgd_mem *mem,
+			   struct kfd_mem_attachment *entry,
+			   struct amdgpu_sync *sync,
+			   bool no_update_pte)
 {
 	int ret;
 
 	/* Set virtual address for the allocation */
-	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
+	ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
 			       amdgpu_bo_size(entry->bo_va->base.bo),
 			       entry->pte_flags);
 	if (ret) {
@@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
 	if (no_update_pte)
 		return 0;
 
-	ret = update_gpuvm_pte(adev, entry, sync);
+	ret = update_gpuvm_pte(mem, entry, sync);
 	if (ret) {
 		pr_err("update_gpuvm_pte() failed\n");
 		goto update_gpuvm_pte_failed;
@@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
 	return 0;
 
 update_gpuvm_pte_failed:
-	unmap_bo_from_gpuvm(adev, entry, sync);
+	unmap_bo_from_gpuvm(mem, entry, sync);
 	return ret;
 }
 
@@ -1596,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
 			 entry->va, entry->va + bo_size, entry);
 
-		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+		ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
 				      is_invalid_userptr);
 		if (ret) {
 			pr_err("Failed to map bo to gpuvm\n");
@@ -1635,7 +1642,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv)
 {
-	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
 	struct amdkfd_process_info *process_info = avm->process_info;
 	unsigned long bo_size = mem->bo->tbo.base.size;
@@ -1670,13 +1676,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
 			 entry->va, entry->va + bo_size, entry);
 
-		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
-		if (ret == 0) {
-			entry->is_mapped = false;
-		} else {
-			pr_err("failed to unmap VA 0x%llx\n", mem->va);
-			goto unreserve_out;
-		}
+		unmap_bo_from_gpuvm(mem, entry, ctx.sync);
+		entry->is_mapped = false;
 
 		mem->mapped_to_gpu_memory--;
 		pr_debug("\t DEC mapping count %d\n",
@@ -2053,9 +2054,8 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 			if (!attachment->is_mapped)
 				continue;
 
-			ret = update_gpuvm_pte((struct amdgpu_device *)
-					       attachment->adev,
-					       attachment, &sync);
+			kfd_mem_dmaunmap_attachment(mem, attachment);
+			ret = update_gpuvm_pte(mem, attachment, &sync);
 			if (ret) {
 				pr_err("%s: update PTE failed\n", __func__);
 				/* make sure this gets validated again */
@@ -2257,9 +2257,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 			goto validate_map_fail;
 		}
 		list_for_each_entry(attachment, &mem->attachments, list) {
-			ret = update_gpuvm_pte((struct amdgpu_device *)
-					      attachment->adev, attachment,
-					      &sync_obj);
+			if (!attachment->is_mapped)
+				continue;
+
+			kfd_mem_dmaunmap_attachment(mem, attachment);
+			ret = update_gpuvm_pte(mem, attachment, &sync_obj);
 			if (ret) {
 				pr_debug("Memory eviction: update PTE failed. Try again\n");
 				goto validate_map_fail;
-- 
2.31.1

* [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (5 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-05-10 22:06   ` Errabolu, Ramesh
  2021-04-22  1:30 ` [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs Felix Kuehling
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

This is needed to avoid deadlocks with DMA buf import in the next patch.
Also move PT/PD validation out of kfd_mem_attach so that the caller can do
this unconditionally.
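
The deadlock being avoided, sketched below (hypothetical nesting; the
dmabuf import that triggers it lands in the next patch): kfd_mem_attach()
may need to reserve mem->bo itself, so it must not run while
reserve_bo_and_vm() already holds that reservation.

/* Old call order (would deadlock once attach needs the reservation):
 *
 *   reserve_bo_and_vm(mem, avm, &ctx);    // takes mem->bo's resv
 *   kfd_mem_attach(adev, mem, avm, ...);  // needs mem->bo's resv again
 *
 * New call order, condensed from the diff below:
 */
if (!kfd_mem_is_attached(avm, mem)) {
	ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
	if (ret)
		goto out;			/* nothing reserved yet */
}
ret = reserve_bo_and_vm(mem, avm, &ctx);
if (unlikely(ret))
	goto out;
ret = vm_validate_pt_pd_bos(avm);	/* moved out of kfd_mem_attach */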

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 75 +++++++++++--------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d25d886b98c..9eeedd0c7920 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -577,6 +577,34 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
 	}
 }
 
+static int
+kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
+		       struct amdgpu_bo **bo)
+{
+	unsigned long bo_size = mem->bo->tbo.base.size;
+	struct drm_gem_object *gobj;
+	int ret;
+
+	ret = amdgpu_bo_reserve(mem->bo, false);
+	if (ret)
+		return ret;
+
+	ret = amdgpu_gem_object_create(adev, bo_size, 1,
+				       AMDGPU_GEM_DOMAIN_CPU,
+				       0, ttm_bo_type_sg,
+				       mem->bo->tbo.base.resv,
+				       &gobj);
+	if (ret)
+		return ret;
+
+	amdgpu_bo_unreserve(mem->bo);
+
+	*bo = gem_to_amdgpu_bo(gobj);
+	(*bo)->parent = amdgpu_bo_ref(mem->bo);
+
+	return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -598,7 +626,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	uint64_t va = mem->va;
 	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
 	struct amdgpu_bo *bo[2] = {NULL, NULL};
-	struct drm_gem_object *gobj;
 	int i, ret;
 
 	if (!va) {
@@ -632,15 +659,9 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
 			/* Create an SG BO to DMA-map userptrs on other GPUs */
 			attachment[i]->type = KFD_MEM_ATT_USERPTR;
-			ret = amdgpu_gem_object_create(adev, bo_size, 1,
-						       AMDGPU_GEM_DOMAIN_CPU,
-						       0, ttm_bo_type_sg,
-						       mem->bo->tbo.base.resv,
-						       &gobj);
+			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
 			if (ret)
 				goto unwind;
-			bo[i] = gem_to_amdgpu_bo(gobj);
-			bo[i]->parent = amdgpu_bo_ref(mem->bo);
 		} else {
 			/* FIXME: Need to DMA-map other BO types */
 			attachment[i]->type = KFD_MEM_ATT_SHARED;
@@ -665,13 +686,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		va += bo_size;
 	}
 
-	/* Allocate validate page tables if needed */
-	ret = vm_validate_pt_pd_bos(vm);
-	if (unlikely(ret)) {
-		pr_err("validate_pt_pd_bos() failed\n");
-		goto unwind;
-	}
-
 	return 0;
 
 unwind:
@@ -1478,12 +1492,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
 		mem->va + bo_size * (1 + mem->aql_queue));
 
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
 	/* Remove from VM internal data structures */
 	list_for_each_entry_safe(entry, tmp, &mem->attachments, list)
 		kfd_mem_detach(entry);
 
-	ret = unreserve_bo_and_vms(&ctx, false, false);
-
 	/* Free the sync object */
 	amdgpu_sync_free(&mem->sync);
 
@@ -1560,6 +1574,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 			mem->va + bo_size * (1 + mem->aql_queue),
 			avm, domain_string(domain));
 
+	if (!kfd_mem_is_attached(avm, mem)) {
+		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
+		if (ret)
+			goto out;
+	}
+
 	ret = reserve_bo_and_vm(mem, avm, &ctx);
 	if (unlikely(ret))
 		goto out;
@@ -1573,15 +1593,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	    bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
 		is_invalid_userptr = true;
 
-	if (!kfd_mem_is_attached(avm, mem)) {
-		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
-		if (ret)
-			goto attach_failed;
-	} else {
-		ret = vm_validate_pt_pd_bos(avm);
-		if (unlikely(ret))
-			goto attach_failed;
-	}
+	ret = vm_validate_pt_pd_bos(avm);
+	if (unlikely(ret))
+		goto out_unreserve;
 
 	if (mem->mapped_to_gpu_memory == 0 &&
 	    !amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
@@ -1592,7 +1606,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		ret = amdgpu_amdkfd_bo_validate(bo, domain, true);
 		if (ret) {
 			pr_debug("Validate failed\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 	}
 
@@ -1607,13 +1621,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				      is_invalid_userptr);
 		if (ret) {
 			pr_err("Failed to map bo to gpuvm\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 
 		ret = vm_update_pds(avm, ctx.sync);
 		if (ret) {
 			pr_err("Failed to update page directories\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 
 		entry->is_mapped = true;
@@ -1630,8 +1644,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 
 	goto out;
 
-map_bo_to_gpuvm_failed:
-attach_failed:
+out_unreserve:
 	unreserve_bo_and_vms(&ctx, false, false);
 out:
 	mutex_unlock(&mem->process_info->lock);
-- 
2.31.1

* [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (6 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-27  0:35   ` Zeng, Oak
  2021-04-22  1:30 ` [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit Felix Kuehling
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC
  To: amd-gfx, dri-devel

Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
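
Condensed sketch of the pattern (names from the diff below; error handling
and the writable-flag selection omitted): the original BO is exported once
and the dma_buf cached in kgd_mem; each peer GPU then imports it as a
dynamically attached BO whose DMA mapping is maintained by the dmabuf
framework.

/* Export once, import per peer GPU: */
if (!mem->dmabuf)
	mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base, DRM_RDWR);
gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
dma_buf_put(mem->dmabuf);		/* import took an extra reference */
*bo = gem_to_amdgpu_bo(gobj);
(*bo)->parent = amdgpu_bo_ref(mem->bo);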

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 63668433f5a6..b706e5a54782 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -41,6 +41,7 @@ struct amdgpu_device;
 enum kfd_mem_attachment_type {
 	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
 	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
+	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
 };
 
 struct kfd_mem_attachment {
@@ -56,6 +57,7 @@ struct kfd_mem_attachment {
 struct kgd_mem {
 	struct mutex lock;
 	struct amdgpu_bo *bo;
+	struct dma_buf *dmabuf;
 	struct list_head attachments;
 	/* protected by amdkfd_process_info.lock */
 	struct ttm_validate_buffer validate_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 9eeedd0c7920..18a1f9222a59 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
 	return ret;
 }
 
+static int
+kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+	struct ttm_operation_ctx ctx = {.interruptible = true};
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
+	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+}
+
 static int
 kfd_mem_dmamap_attachment(struct kgd_mem *mem,
 			  struct kfd_mem_attachment *attachment)
@@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
 		return 0;
 	case KFD_MEM_ATT_USERPTR:
 		return kfd_mem_dmamap_userptr(mem, attachment);
+	case KFD_MEM_ATT_DMABUF:
+		return kfd_mem_dmamap_dmabuf(attachment);
 	default:
 		WARN_ON_ONCE(1);
 	}
@@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
 	ttm->sg = NULL;
 }
 
+static void
+kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
+{
+	struct ttm_operation_ctx ctx = {.interruptible = true};
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
+	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
+	 * called
+	 */
+}
+
 static void
 kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
 			    struct kfd_mem_attachment *attachment)
@@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
 	case KFD_MEM_ATT_USERPTR:
 		kfd_mem_dmaunmap_userptr(mem, attachment);
 		break;
+	case KFD_MEM_ATT_DMABUF:
+		kfd_mem_dmaunmap_dmabuf(attachment);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 	}
@@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
 	return 0;
 }
 
+static int
+kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
+		      struct amdgpu_bo **bo)
+{
+	struct drm_gem_object *gobj;
+
+	if (!mem->dmabuf) {
+		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
+			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
+				DRM_RDWR : 0);
+		if (IS_ERR(mem->dmabuf)) {
+			mem->dmabuf = NULL;
+			return PTR_ERR(mem->dmabuf);
+		}
+	}
+
+	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
+	if (IS_ERR(gobj))
+		return PTR_ERR(gobj);
+
+	/* Import takes an extra reference on the dmabuf. Drop it now to
+	 * avoid leaking it. We only need the one reference in
+	 * kgd_mem->dmabuf.
+	 */
+	dma_buf_put(mem->dmabuf);
+
+	*bo = gem_to_amdgpu_bo(gobj);
+	(*bo)->parent = amdgpu_bo_ref(mem->bo);
+
+	return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
 			if (ret)
 				goto unwind;
+		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
+			   mem->bo->tbo.type != ttm_bo_type_sg) {
+			/* GTT BOs use DMA-mapping ability of dynamic-attach
+			 * DMA bufs. TODO: The same should work for VRAM on
+			 * large-BAR GPUs.
+			 */
+			attachment[i]->type = KFD_MEM_ATT_DMABUF;
+			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
+			if (ret)
+				goto unwind;
 		} else {
-			/* FIXME: Need to DMA-map other BO types */
+			/* FIXME: Need to DMA-map other BO types:
+			 * large-BAR VRAM, doorbells, MMIO remap
+			 */
 			attachment[i]->type = KFD_MEM_ATT_SHARED;
 			bo[i] = mem->bo;
 			drm_gem_object_get(&bo[i]->tbo.base);
@@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 
 	/* Free the BO*/
 	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
+	if (mem->dmabuf)
+		dma_buf_put(mem->dmabuf);
 	drm_gem_object_put(&mem->bo->tbo.base);
 	mutex_destroy(&mem->lock);
 	kfree(mem);
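
For reference, this is the dma-buf reference counting that results from
the hunks above (an illustrative summary, not additional code):

	/* first attachment: the export's reference is kept in kgd_mem */
	mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base, ...);

	/* each import takes an extra reference; drop it immediately */
	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
	dma_buf_put(mem->dmabuf);

	/* BO destruction: drop the one remaining reference */
	dma_buf_put(mem->dmabuf);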
-- 
2.31.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (7 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-05-10 22:08   ` Errabolu, Ramesh
  2021-04-22  1:30 ` [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind Felix Kuehling
  2021-04-27 15:16 ` [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Zeng, Oak
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC (permalink / raw)
  To: amd-gfx, dri-devel

Pages in SG BOs were not allocated by TTM, so don't count them against
TTM's pages limit.
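
SG-backed TTs get their pages from a driver-provided sg_table instead of
from ttm_pool_alloc. For illustration, this is the driver-side pattern
that goes with this change (a sketch along the lines of
amdgpu_ttm_tt_populate, see patch 10 in this series):

	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
		return 0;	/* pages come from ttm->sg, not the TTM pool */

	return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);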

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5d8820725b75..e8b8c3257392 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	if (ttm_tt_is_populated(ttm))
 		return 0;
 
-	atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_add(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 
 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
 	       atomic_long_read(&ttm_dma32_pages_allocated) >
@@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	return 0;
 
 error:
-	atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_sub(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(ttm_tt_populate);
@@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 	else
 		ttm_pool_free(&bdev->pool, ttm);
 
-	atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_sub(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 
 	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
 }
-- 
2.31.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (8 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit Felix Kuehling
@ 2021-04-22  1:30 ` Felix Kuehling
  2021-04-22 11:20   ` Christian König
  2021-04-27 15:16 ` [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Zeng, Oak
  10 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-22  1:30 UTC (permalink / raw)
  To: amd-gfx, dri-devel

The dmabuf attachment should be updated by moving the SG BO to DOMAIN_CPU
and back to DOMAIN_GTT. Those domain moves do not necessarily invoke the
populate/unpopulate callbacks, so do the dma-buf map/unmap in
backend_bind/unbind instead.
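
With that, the KFD dmamap/dmaunmap helpers reach the dma-buf attachment
through a plain domain round-trip (an illustrative call trace, assuming
the helpers added earlier in this series):

	kfd_mem_dmamap_dmabuf()
	  ttm_bo_validate(..., GTT)  -> amdgpu_ttm_backend_bind()
	                                  dma_buf_map_attachment()
	kfd_mem_dmaunmap_dmabuf()
	  ttm_bo_validate(..., CPU)  -> amdgpu_ttm_backend_unbind()
	                                  dma_buf_unmap_attachment()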

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  3 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 51 +++++++++----------
 2 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 18a1f9222a59..68e6ce8dcf33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -582,9 +582,6 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
 
 	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
 	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
-	 * called
-	 */
 }
 
 static void
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 7e7d8330d64b..fc2a8d681dbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -910,7 +910,23 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
 			DRM_ERROR("failed to pin userptr\n");
 			return r;
 		}
+	} else if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
+		if (!ttm->sg) {
+			struct dma_buf_attachment *attach;
+			struct sg_table *sgt;
+
+			attach = gtt->gobj->import_attach;
+			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
+			if (IS_ERR(sgt))
+				return PTR_ERR(sgt);
+
+			ttm->sg = sgt;
+		}
+
+		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
+					       ttm->num_pages);
 	}
+
 	if (!ttm->num_pages) {
 		WARN(1, "nothing to bind %u pages for mreg %p back %p!\n",
 		     ttm->num_pages, bo_mem, ttm);
@@ -1037,8 +1053,15 @@ static void amdgpu_ttm_backend_unbind(struct ttm_device *bdev,
 	int r;
 
 	/* if the pages have userptr pinning then clear that first */
-	if (gtt->userptr)
+	if (gtt->userptr) {
 		amdgpu_ttm_tt_unpin_userptr(bdev, ttm);
+	} else if (ttm->sg && gtt->gobj->import_attach) {
+		struct dma_buf_attachment *attach;
+
+		attach = gtt->gobj->import_attach;
+		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
+		ttm->sg = NULL;
+	}
 
 	if (!gtt->bound)
 		return;
@@ -1125,23 +1148,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
 		return 0;
 	}
 
-	if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
-		if (!ttm->sg) {
-			struct dma_buf_attachment *attach;
-			struct sg_table *sgt;
-
-			attach = gtt->gobj->import_attach;
-			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
-			if (IS_ERR(sgt))
-				return PTR_ERR(sgt);
-
-			ttm->sg = sgt;
-		}
-
-		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
-					       ttm->num_pages);
+	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
 		return 0;
-	}
 
 	return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
 }
@@ -1165,15 +1173,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
 		return;
 	}
 
-	if (ttm->sg && gtt->gobj->import_attach) {
-		struct dma_buf_attachment *attach;
-
-		attach = gtt->gobj->import_attach;
-		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
-		ttm->sg = NULL;
-		return;
-	}
-
 	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
 		return;
 
-- 
2.31.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind
  2021-04-22  1:30 ` [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind Felix Kuehling
@ 2021-04-22 11:20   ` Christian König
  2021-05-10 22:09     ` Errabolu, Ramesh
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2021-04-22 11:20 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx, dri-devel

Am 22.04.21 um 03:30 schrieb Felix Kuehling:
> The dmabuf attachment should be updated by moving the SG BO to DOMAIN_CPU
> and back to DOMAIN_GTT. Those domain moves do not necessarily invoke the
> populate/unpopulate callbacks, so do the dma-buf map/unmap in
> backend_bind/unbind instead.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  3 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 51 +++++++++----------
>   2 files changed, 25 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 18a1f9222a59..68e6ce8dcf33 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -582,9 +582,6 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
>   
>   	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>   	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> -	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
> -	 * called
> -	 */
>   }
>   
>   static void
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 7e7d8330d64b..fc2a8d681dbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -910,7 +910,23 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
>   			DRM_ERROR("failed to pin userptr\n");
>   			return r;
>   		}
> +	} else if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
> +		if (!ttm->sg) {
> +			struct dma_buf_attachment *attach;
> +			struct sg_table *sgt;
> +
> +			attach = gtt->gobj->import_attach;
> +			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> +			if (IS_ERR(sgt))
> +				return PTR_ERR(sgt);
> +
> +			ttm->sg = sgt;
> +		}
> +
> +		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
> +					       ttm->num_pages);
>   	}
> +
>   	if (!ttm->num_pages) {
>   		WARN(1, "nothing to bind %u pages for mreg %p back %p!\n",
>   		     ttm->num_pages, bo_mem, ttm);
> @@ -1037,8 +1053,15 @@ static void amdgpu_ttm_backend_unbind(struct ttm_device *bdev,
>   	int r;
>   
>   	/* if the pages have userptr pinning then clear that first */
> -	if (gtt->userptr)
> +	if (gtt->userptr) {
>   		amdgpu_ttm_tt_unpin_userptr(bdev, ttm);
> +	} else if (ttm->sg && gtt->gobj->import_attach) {
> +		struct dma_buf_attachment *attach;
> +
> +		attach = gtt->gobj->import_attach;
> +		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
> +		ttm->sg = NULL;
> +	}
>   
>   	if (!gtt->bound)
>   		return;
> @@ -1125,23 +1148,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
>   		return 0;
>   	}
>   
> -	if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
> -		if (!ttm->sg) {
> -			struct dma_buf_attachment *attach;
> -			struct sg_table *sgt;
> -
> -			attach = gtt->gobj->import_attach;
> -			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> -			if (IS_ERR(sgt))
> -				return PTR_ERR(sgt);
> -
> -			ttm->sg = sgt;
> -		}
> -
> -		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
> -					       ttm->num_pages);
> +	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
>   		return 0;
> -	}
>   
>   	return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
>   }
> @@ -1165,15 +1173,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
>   		return;
>   	}
>   
> -	if (ttm->sg && gtt->gobj->import_attach) {
> -		struct dma_buf_attachment *attach;
> -
> -		attach = gtt->gobj->import_attach;
> -		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
> -		ttm->sg = NULL;
> -		return;
> -	}
> -
>   	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
>   		return;
>   

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping
  2021-04-22  1:30 ` [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping Felix Kuehling
@ 2021-04-23  1:33   ` Zeng, Oak
  2021-04-23  7:23     ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Zeng, Oak @ 2021-04-23  1:33 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel



Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:

    Do AQL queue double-mapping with a single attach call. That will make it
    easier to create per-GPU BOs later, to be shared between the two BO VA
    mappings on the same GPU.

    Freeing the attachments is not necessary if map_to_gpu fails. These will be
     cleaned up when the kgd_mem object is destroyed in
    amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    ---
     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 103 ++++++++----------
     1 file changed, 48 insertions(+), 55 deletions(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    index 34c9a2d0028e..fbd7e786b54e 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    @@ -486,70 +486,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
      * 4a.  Validate new page tables and directories
      */
     static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
    -		struct amdgpu_vm *vm, bool is_aql,
    -		struct kfd_mem_attachment **p_attachment)
    +		struct amdgpu_vm *vm, bool is_aql)
     {
     	unsigned long bo_size = mem->bo->tbo.base.size;
     	uint64_t va = mem->va;
    -	struct kfd_mem_attachment *attachment;
    -	struct amdgpu_bo *bo;
    -	int ret;
    +	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
    +	struct amdgpu_bo *bo[2] = {NULL, NULL};
    +	int i, ret;

     	if (!va) {
     		pr_err("Invalid VA when adding BO to VM\n");
     		return -EINVAL;
     	}

    -	if (is_aql)
    -		va += bo_size;
    -
    -	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
    -	if (!attachment)
    -		return -ENOMEM;
    +	for (i = 0; i <= is_aql; i++) {
    +		attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL);
    +		if (unlikely(!attachment[i])) {
    +			ret = -ENOMEM;
    +			goto unwind;
    +		}

    -	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
    -			va + bo_size, vm);
    +		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
    +			 va + bo_size, vm);

    -	/* FIXME: For now all attachments use the same BO. This is incorrect
    -	 * because one BO can only have one DMA mapping for one GPU. We need
    -	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
    -	 * will be addressed one BO-type at a time in subsequent patches.
    -	 */
    -	bo = mem->bo;
    -	drm_gem_object_get(&bo->tbo.base);
    +		/* FIXME: For now all attachments use the same BO. This is
    +		 * incorrect because one BO can only have one DMA mapping
    +		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
    +		 * import with dynamic attachment. This will be addressed
    +		 * one BO-type at a time in subsequent patches.
    +		 */
    +		bo[i] = mem->bo;
    +		drm_gem_object_get(&bo[i]->tbo.base);

    -	/* Add BO to VM internal data structures*/
    -	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
    -	if (!attachment->bo_va) {
    -		ret = -EINVAL;
    -		pr_err("Failed to add BO object to VM. ret == %d\n",
    -				ret);
    -		goto err_vmadd;
    -	}
    +		/* Add BO to VM internal data structures */
    +		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
Just for discussion: are we allowed to add one bo to a vm twice? When I looked at amdgpu_vm_bo_base_init (called by amdgpu_vm_bo_add), at the line:
bo->vm_bo = base;
it looks like bo->vm_bo gets overwritten when the same bo is added to the vm a second time. I am not sure whether this will cause an issue later.
This is not introduced by your code; the original code (calling kfd_mem_attach twice for aql) has the same problem.
    +		if (unlikely(!attachment[i]->bo_va)) {
    +			ret = -ENOMEM;
    +			pr_err("Failed to add BO object to VM. ret == %d\n",
    +			       ret);
    +			goto unwind;
    +		}

    -	attachment->va = va;
    -	attachment->pte_flags = get_pte_flags(adev, mem);
    -	attachment->adev = adev;
    -	list_add(&attachment->list, &mem->attachments);
    +		attachment[i]->va = va;
    +		attachment[i]->pte_flags = get_pte_flags(adev, mem);
    +		attachment[i]->adev = adev;
    +		list_add(&attachment[i]->list, &mem->attachments);

    -	if (p_attachment)
    -		*p_attachment = attachment;
    +		va += bo_size;
    +	}

     	/* Allocate and validate page tables if needed */
     	ret = vm_validate_pt_pd_bos(vm);
     	if (unlikely(ret)) {
     		pr_err("validate_pt_pd_bos() failed\n");
    -		goto err_alloc_pts;
    +		goto unwind;
     	}

     	return 0;

    -err_alloc_pts:
    -	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
    -	list_del(&attachment->list);
    -err_vmadd:
    -	drm_gem_object_put(&bo->tbo.base);
    -	kfree(attachment);
    +unwind:
    +	for (; i >= 0; i--) {
    +		if (!attachment[i])
    +			continue;
    +		if (attachment[i]->bo_va) {
    +			amdgpu_vm_bo_rmv(adev, attachment[i]->bo_va);
    +			list_del(&attachment[i]->list);
    +		}
    +		if (bo[i])
    +			drm_gem_object_put(&bo[i]->tbo.base);
    +		kfree(attachment[i]);
    +	}
     	return ret;
     }

    @@ -1382,8 +1388,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
     	uint32_t domain;
     	struct kfd_mem_attachment *entry;
     	struct bo_vm_reservation_context ctx;
    -	struct kfd_mem_attachment *attachment = NULL;
    -	struct kfd_mem_attachment *attachment_aql = NULL;
     	unsigned long bo_size;
     	bool is_invalid_userptr = false;

    @@ -1433,15 +1437,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
     		is_invalid_userptr = true;

     	if (!kfd_mem_is_attached(avm, mem)) {
    -		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
    +		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
     		if (ret)
     			goto attach_failed;
    -		if (mem->aql_queue) {
    -			ret = kfd_mem_attach(adev, mem, avm, true,
    -					     &attachment_aql);
    -			if (ret)
    -				goto attach_failed_aql;
    -		}
     	} else {
     		ret = vm_validate_pt_pd_bos(avm);
     		if (unlikely(ret))
    @@ -1496,11 +1494,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
     	goto out;

     map_bo_to_gpuvm_failed:
    -	if (attachment_aql)
    -		kfd_mem_detach(attachment_aql);
    -attach_failed_aql:
    -	if (attachment)
    -		kfd_mem_detach(attachment);
     attach_failed:
     	unreserve_bo_and_vms(&ctx, false, false);
     out:
    -- 
    2.31.1

    _______________________________________________
    amd-gfx mailing list
    amd-gfx@lists.freedesktop.org
     https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping
  2021-04-23  1:33   ` Zeng, Oak
@ 2021-04-23  7:23     ` Felix Kuehling
  2021-05-10 22:03       ` Errabolu, Ramesh
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-23  7:23 UTC (permalink / raw)
  To: Zeng, Oak, amd-gfx, dri-devel

Am 2021-04-22 um 9:33 p.m. schrieb Zeng, Oak:
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     Do AQL queue double-mapping with a single attach call. That will make it
>     easier to create per-GPU BOs later, to be shared between the two BO VA
>     mappings on the same GPU.
>
>     Freeing the attachments is not necessary if map_to_gpu fails. These will be
>     cleaned up when the kgd_mem object is destroyed in
>     amdgpu_amdkfd_gpuvm_free_memory_of_gpu.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 103 ++++++++----------
>      1 file changed, 48 insertions(+), 55 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index 34c9a2d0028e..fbd7e786b54e 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -486,70 +486,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>       * 4a.  Validate new page tables and directories
>       */
>      static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>     -		struct amdgpu_vm *vm, bool is_aql,
>     -		struct kfd_mem_attachment **p_attachment)
>     +		struct amdgpu_vm *vm, bool is_aql)
>      {
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>      	uint64_t va = mem->va;
>     -	struct kfd_mem_attachment *attachment;
>     -	struct amdgpu_bo *bo;
>     -	int ret;
>     +	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
>     +	struct amdgpu_bo *bo[2] = {NULL, NULL};
>     +	int i, ret;
>
>      	if (!va) {
>      		pr_err("Invalid VA when adding BO to VM\n");
>      		return -EINVAL;
>      	}
>
>     -	if (is_aql)
>     -		va += bo_size;
>     -
>     -	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
>     -	if (!attachment)
>     -		return -ENOMEM;
>     +	for (i = 0; i <= is_aql; i++) {
>     +		attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL);
>     +		if (unlikely(!attachment[i])) {
>     +			ret = -ENOMEM;
>     +			goto unwind;
>     +		}
>
>     -	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>     -			va + bo_size, vm);
>     +		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>     +			 va + bo_size, vm);
>
>     -	/* FIXME: For now all attachments use the same BO. This is incorrect
>     -	 * because one BO can only have one DMA mapping for one GPU. We need
>     -	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
>     -	 * will be addressed one BO-type at a time in subsequent patches.
>     -	 */
>     -	bo = mem->bo;
>     -	drm_gem_object_get(&bo->tbo.base);
>     +		/* FIXME: For now all attachments use the same BO. This is
>     +		 * incorrect because one BO can only have one DMA mapping
>     +		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
>     +		 * import with dynamic attachment. This will be addressed
>     +		 * one BO-type at a time in subsequent patches.
>     +		 */
>     +		bo[i] = mem->bo;
>     +		drm_gem_object_get(&bo[i]->tbo.base);
>
>     -	/* Add BO to VM internal data structures*/
>     -	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
>     -	if (!attachment->bo_va) {
>     -		ret = -EINVAL;
>     -		pr_err("Failed to add BO object to VM. ret == %d\n",
>     -				ret);
>     -		goto err_vmadd;
>     -	}
>     +		/* Add BO to VM internal data structures */
>     +		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
> Just for discussion: are we allowed to add one bo to a vm twice? When I looked at amdgpu_vm_bo_base_init (called by amdgpu_vm_bo_add), at the line:
> bo->vm_bo = base;
> it looks like bo->vm_bo gets overwritten when the same bo is added to the vm a second time. I am not sure whether this will cause an issue later.
> This is not introduced by your code; the original code (calling kfd_mem_attach twice for aql) has the same problem.

If you just add one more line of context, you'll see that bo->vm_bo is
the head of a singly linked list of struct amdgpu_vm_bo_base. So adding
a BO to a VM multiple times just extends that singly linked list:

        base->next = bo->vm_bo;
        bo->vm_bo = base;
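
A lookup then just walks that list. Roughly (a simplified sketch of
amdgpu_vm_bo_find, locking and checks omitted):

	struct amdgpu_vm_bo_base *base;

	for (base = bo->vm_bo; base; base = base->next) {
		if (base->vm == vm)
			return container_of(base, struct amdgpu_bo_va, base);
	}
	return NULL;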

Regards,
  Felix


>     +		if (unlikely(!attachment[i]->bo_va)) {
>     +			ret = -ENOMEM;
>     +			pr_err("Failed to add BO object to VM. ret == %d\n",
>     +			       ret);
>     +			goto unwind;
>     +		}
>
>     -	attachment->va = va;
>     -	attachment->pte_flags = get_pte_flags(adev, mem);
>     -	attachment->adev = adev;
>     -	list_add(&attachment->list, &mem->attachments);
>     +		attachment[i]->va = va;
>     +		attachment[i]->pte_flags = get_pte_flags(adev, mem);
>     +		attachment[i]->adev = adev;
>     +		list_add(&attachment[i]->list, &mem->attachments);
>
>     -	if (p_attachment)
>     -		*p_attachment = attachment;
>     +		va += bo_size;
>     +	}
>
>      	/* Allocate and validate page tables if needed */
>      	ret = vm_validate_pt_pd_bos(vm);
>      	if (unlikely(ret)) {
>      		pr_err("validate_pt_pd_bos() failed\n");
>     -		goto err_alloc_pts;
>     +		goto unwind;
>      	}
>
>      	return 0;
>
>     -err_alloc_pts:
>     -	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
>     -	list_del(&attachment->list);
>     -err_vmadd:
>     -	drm_gem_object_put(&bo->tbo.base);
>     -	kfree(attachment);
>     +unwind:
>     +	for (; i >= 0; i--) {
>     +		if (!attachment[i])
>     +			continue;
>     +		if (attachment[i]->bo_va) {
>     +			amdgpu_vm_bo_rmv(adev, attachment[i]->bo_va);
>     +			list_del(&attachment[i]->list);
>     +		}
>     +		if (bo[i])
>     +			drm_gem_object_put(&bo[i]->tbo.base);
>     +		kfree(attachment[i]);
>     +	}
>      	return ret;
>      }
>
>     @@ -1382,8 +1388,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      	uint32_t domain;
>      	struct kfd_mem_attachment *entry;
>      	struct bo_vm_reservation_context ctx;
>     -	struct kfd_mem_attachment *attachment = NULL;
>     -	struct kfd_mem_attachment *attachment_aql = NULL;
>      	unsigned long bo_size;
>      	bool is_invalid_userptr = false;
>
>     @@ -1433,15 +1437,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      		is_invalid_userptr = true;
>
>      	if (!kfd_mem_is_attached(avm, mem)) {
>     -		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
>     +		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
>      		if (ret)
>      			goto attach_failed;
>     -		if (mem->aql_queue) {
>     -			ret = kfd_mem_attach(adev, mem, avm, true,
>     -					     &attachment_aql);
>     -			if (ret)
>     -				goto attach_failed_aql;
>     -		}
>      	} else {
>      		ret = vm_validate_pt_pd_bos(avm);
>      		if (unlikely(ret))
>     @@ -1496,11 +1494,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      	goto out;
>
>      map_bo_to_gpuvm_failed:
>     -	if (attachment_aql)
>     -		kfd_mem_detach(attachment_aql);
>     -attach_failed_aql:
>     -	if (attachment)
>     -		kfd_mem_detach(attachment);
>      attach_failed:
>      	unreserve_bo_and_vms(&ctx, false, false);
>      out:
>     -- 
>     2.31.1
>
>     _______________________________________________
>     amd-gfx mailing list
>     amd-gfx@lists.freedesktop.org
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers
  2021-04-22  1:30 ` [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers Felix Kuehling
@ 2021-04-27  0:09   ` Zeng, Oak
  2021-04-27  3:41     ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Zeng, Oak @ 2021-04-27  0:09 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

As I understand it, when one GPU maps another GPU's VRAM, that VRAM should also be mapped in the IOMMU page table. Normal GTT memory (versus userptr) also needs to be mapped in the IOMMU. But I don't see that code below. I only see userptr being mapped in the IOMMU. Maybe you map them in the IOMMU at some time other than memory attachment?

Also see a nit-pick inline

Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:

    Add BO-type specific helpers functions to DMA-map and unmap
    kfd_mem_attachments. Implement this functionality for userptrs by creating
    one SG BO per GPU and filling it with a DMA mapping of the pages from the
    original mem->bo.

    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +++++++++++++++++-
     2 files changed, 145 insertions(+), 9 deletions(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    index c24b2478f445..63668433f5a6 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    @@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;

     struct amdgpu_device;

    +enum kfd_mem_attachment_type {
    +	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
    +	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
    +};
    +
     struct kfd_mem_attachment {
     	struct list_head list;
    +	enum kfd_mem_attachment_type type;
    +	bool is_mapped;
     	struct amdgpu_bo_va *bo_va;
     	struct amdgpu_device *adev;
    -	bool is_mapped;
     	uint64_t va;
     	uint64_t pte_flags;
     };
    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    index fbd7e786b54e..49d1af4aa5f1 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    @@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
     	return pte_flags;
     }

    +static int
    +kfd_mem_dmamap_userptr(struct kgd_mem *mem,
    +		       struct kfd_mem_attachment *attachment)
    +{
    +	enum dma_data_direction direction =
    +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
    +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
    +	struct ttm_operation_ctx ctx = {.interruptible = true};
    +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    +	struct amdgpu_device *adev = attachment->adev;
    +	struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
    +	struct ttm_tt *ttm = bo->tbo.ttm;
    +	int ret;
    +
    +	ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
    +	if (unlikely(!ttm->sg))
    +		return -ENOMEM;
    +
    +	if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
    +		return -EINVAL;
    +
    +	/* Same sequence as in amdgpu_ttm_tt_pin_userptr */
    +	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
    +					ttm->num_pages, 0,
    +					(u64)ttm->num_pages << PAGE_SHIFT,
    +					GFP_KERNEL);
    +	if (unlikely(ret))
    +		goto release_sg;
Should this go to a label that starts with the kfree below?
    +
    +	ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
    +	if (unlikely(ret))
    +		goto release_sg;
    +
    +	drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
    +				       ttm->num_pages);
    +
    +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
    +	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
    +	if (ret)
    +		goto release_sg;
    +
    +	return 0;
    +
    +release_sg:
    +	pr_err("DMA map userptr failed: %d\n", ret);
    +	sg_free_table(ttm->sg);
    +	kfree(ttm->sg);
    +	ttm->sg = NULL;
    +	return ret;
    +}
    +
    +static int
    +kfd_mem_dmamap_attachment(struct kgd_mem *mem,
    +			  struct kfd_mem_attachment *attachment)
    +{
    +	switch (attachment->type) {
    +	case KFD_MEM_ATT_SHARED:
    +		return 0;
    +	case KFD_MEM_ATT_USERPTR:
    +		return kfd_mem_dmamap_userptr(mem, attachment);
    +	default:
    +		WARN_ON_ONCE(1);
    +	}
    +	return -EINVAL;
    +}
    +
    +static void
    +kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
    +			 struct kfd_mem_attachment *attachment)
    +{
    +	enum dma_data_direction direction =
    +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
    +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
    +	struct ttm_operation_ctx ctx = {.interruptible = false};
    +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    +	struct amdgpu_device *adev = attachment->adev;
    +	struct ttm_tt *ttm = bo->tbo.ttm;
    +
    +	if (unlikely(!ttm->sg))
    +		return;
    +
    +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
    +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
    +
    +	dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
    +	sg_free_table(ttm->sg);
    +	ttm->sg = NULL;
    +}
    +
    +static void
    +kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
    +			    struct kfd_mem_attachment *attachment)
    +{
    +	switch (attachment->type) {
    +	case KFD_MEM_ATT_SHARED:
    +		break;
    +	case KFD_MEM_ATT_USERPTR:
    +		kfd_mem_dmaunmap_userptr(mem, attachment);
    +		break;
    +	default:
    +		WARN_ON_ONCE(1);
    +	}
    +}
    +
     /* kfd_mem_attach - Add a BO to a VM
      *
      * Everything that needs to be done only once when a BO is first added
      * to a VM. It can later be mapped and unmapped many times without
      * repeating these steps.
      *
    + * 0. Create BO for DMA mapping, if needed
      * 1. Allocate and initialize BO VA entry data structure
      * 2. Add BO to the VM
      * 3. Determine ASIC-specific PTE flags
    @@ -488,10 +593,12 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
     static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
     		struct amdgpu_vm *vm, bool is_aql)
     {
    +	struct amdgpu_device *bo_adev = amdgpu_ttm_adev(mem->bo->tbo.bdev);
     	unsigned long bo_size = mem->bo->tbo.base.size;
     	uint64_t va = mem->va;
     	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
     	struct amdgpu_bo *bo[2] = {NULL, NULL};
    +	struct drm_gem_object *gobj;
     	int i, ret;

     	if (!va) {
    @@ -509,14 +616,37 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
     		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
     			 va + bo_size, vm);

    -		/* FIXME: For now all attachments use the same BO. This is
    -		 * incorrect because one BO can only have one DMA mapping
    -		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
    -		 * import with dynamic attachment. This will be addressed
    -		 * one BO-type at a time in subsequent patches.
    -		 */
    -		bo[i] = mem->bo;
    -		drm_gem_object_get(&bo[i]->tbo.base);
    +		if (adev == bo_adev || (mem->domain == AMDGPU_GEM_DOMAIN_VRAM &&
    +					amdgpu_xgmi_same_hive(adev, bo_adev))) {
    +			/* Mappings on the local GPU and VRAM mappings in the
    +			 * local hive share the original BO
    +			 */
    +			attachment[i]->type = KFD_MEM_ATT_SHARED;
    +			bo[i] = mem->bo;
    +			drm_gem_object_get(&bo[i]->tbo.base);
    +		} else if (i > 0) {
    +			/* Multiple mappings on the same GPU share the BO */
    +			attachment[i]->type = KFD_MEM_ATT_SHARED;
    +			bo[i] = bo[0];
    +			drm_gem_object_get(&bo[i]->tbo.base);
    +		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
    +			/* Create an SG BO to DMA-map userptrs on other GPUs */
    +			attachment[i]->type = KFD_MEM_ATT_USERPTR;
    +			ret = amdgpu_gem_object_create(adev, bo_size, 1,
    +						       AMDGPU_GEM_DOMAIN_CPU,
    +						       0, ttm_bo_type_sg,
    +						       mem->bo->tbo.base.resv,
    +						       &gobj);
    +			if (ret)
    +				goto unwind;
    +			bo[i] = gem_to_amdgpu_bo(gobj);
    +			bo[i]->parent = amdgpu_bo_ref(mem->bo);
    +		} else {
    +			/* FIXME: Need to DMA-map other BO types */
    +			attachment[i]->type = KFD_MEM_ATT_SHARED;
    +			bo[i] = mem->bo;
    +			drm_gem_object_get(&bo[i]->tbo.base);
    +		}

     		/* Add BO to VM internal data structures */
     		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
    -- 
    2.31.1

    _______________________________________________
    dri-devel mailing list
    dri-devel@lists.freedesktop.org
     https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings
  2021-04-22  1:30 ` [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings Felix Kuehling
@ 2021-04-27  0:23   ` Zeng, Oak
  2021-04-27  3:47     ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Zeng, Oak @ 2021-04-27  0:23 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel



Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:

    DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
    with the BO and page tables reserved, so we can safely update the DMA
    mapping.

    DMA unmap when a BO is unmapped from a GPU and before updating mappings
    in restore workers.

    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    ---
     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++++++++++---------
     1 file changed, 29 insertions(+), 27 deletions(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    index 49d1af4aa5f1..7d25d886b98c 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    @@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
     	return ret;
     }

    -static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
    +static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
     				struct kfd_mem_attachment *entry,
     				struct amdgpu_sync *sync)
     {
     	struct amdgpu_bo_va *bo_va = entry->bo_va;
    +	struct amdgpu_device *adev = entry->adev;
     	struct amdgpu_vm *vm = bo_va->base.vm;

     	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
    @@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,

     	amdgpu_sync_fence(sync, bo_va->last_pt_update);

    -	return 0;
    +	kfd_mem_dmaunmap_attachment(mem, entry);
     }

    -static int update_gpuvm_pte(struct amdgpu_device *adev,
    -		struct kfd_mem_attachment *entry,
    -		struct amdgpu_sync *sync)
    +static int update_gpuvm_pte(struct kgd_mem *mem,
    +			    struct kfd_mem_attachment *entry,
    +			    struct amdgpu_sync *sync)
     {
    -	int ret;
     	struct amdgpu_bo_va *bo_va = entry->bo_va;
    +	struct amdgpu_device *adev = entry->adev;
    +	int ret;
    +
    +	ret = kfd_mem_dmamap_attachment(mem, entry);
Should the DMA mapping be done in the kfd_mem_attach function, when a memory object is attached to a VM for the first time? Since each memory object can be mapped to many GPUs or many VMs, doing the DMA mapping the first time it is attached would simplify the logic. Or, even simpler, maybe we can just DMA map when a memory object is created - it wastes some IOMMU page table entries but really simplifies the logic in this patch series. I found this series not very easy to understand.
    +	if (ret)
    +		return ret;

     	/* Update the page tables  */
     	ret = amdgpu_vm_bo_update(adev, bo_va, false);
    @@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
     	return amdgpu_sync_fence(sync, bo_va->last_pt_update);
     }

    -static int map_bo_to_gpuvm(struct amdgpu_device *adev,
    -		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
    -		bool no_update_pte)
    +static int map_bo_to_gpuvm(struct kgd_mem *mem,
    +			   struct kfd_mem_attachment *entry,
    +			   struct amdgpu_sync *sync,
    +			   bool no_update_pte)
     {
     	int ret;

     	/* Set virtual address for the allocation */
    -	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
    +	ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
     			       amdgpu_bo_size(entry->bo_va->base.bo),
     			       entry->pte_flags);
     	if (ret) {
    @@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
     	if (no_update_pte)
     		return 0;

    -	ret = update_gpuvm_pte(adev, entry, sync);
    +	ret = update_gpuvm_pte(mem, entry, sync);
     	if (ret) {
     		pr_err("update_gpuvm_pte() failed\n");
     		goto update_gpuvm_pte_failed;
    @@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
     	return 0;

     update_gpuvm_pte_failed:
    -	unmap_bo_from_gpuvm(adev, entry, sync);
    +	unmap_bo_from_gpuvm(mem, entry, sync);
     	return ret;
     }

    @@ -1596,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
     		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
     			 entry->va, entry->va + bo_size, entry);

    -		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
    +		ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
     				      is_invalid_userptr);
     		if (ret) {
     			pr_err("Failed to map bo to gpuvm\n");
    @@ -1635,7 +1642,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
     int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
     		struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv)
     {
    -	struct amdgpu_device *adev = get_amdgpu_device(kgd);
     	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
     	struct amdkfd_process_info *process_info = avm->process_info;
     	unsigned long bo_size = mem->bo->tbo.base.size;
    @@ -1670,13 +1676,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
     		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
     			 entry->va, entry->va + bo_size, entry);

    -		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
    -		if (ret == 0) {
    -			entry->is_mapped = false;
    -		} else {
    -			pr_err("failed to unmap VA 0x%llx\n", mem->va);
    -			goto unreserve_out;
    -		}
    +		unmap_bo_from_gpuvm(mem, entry, ctx.sync);
    +		entry->is_mapped = false;

     		mem->mapped_to_gpu_memory--;
     		pr_debug("\t DEC mapping count %d\n",
    @@ -2053,9 +2054,8 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
     			if (!attachment->is_mapped)
     				continue;

    -			ret = update_gpuvm_pte((struct amdgpu_device *)
    -					       attachment->adev,
    -					       attachment, &sync);
    +			kfd_mem_dmaunmap_attachment(mem, attachment);
    +			ret = update_gpuvm_pte(mem, attachment, &sync);
     			if (ret) {
     				pr_err("%s: update PTE failed\n", __func__);
     				/* make sure this gets validated again */
    @@ -2257,9 +2257,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
     			goto validate_map_fail;
     		}
     		list_for_each_entry(attachment, &mem->attachments, list) {
    -			ret = update_gpuvm_pte((struct amdgpu_device *)
    -					      attachment->adev, attachment,
    -					      &sync_obj);
    +			if (!attachment->is_mapped)
    +				continue;
    +
    +			kfd_mem_dmaunmap_attachment(mem, attachment);
    +			ret = update_gpuvm_pte(mem, attachment, &sync_obj);
     			if (ret) {
     				pr_debug("Memory eviction: update PTE failed. Try again\n");
     				goto validate_map_fail;
    -- 
    2.31.1

    _______________________________________________
    dri-devel mailing list
    dri-devel@lists.freedesktop.org
     https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-22  1:30 ` [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs Felix Kuehling
@ 2021-04-27  0:35   ` Zeng, Oak
  2021-04-27  3:56     ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Zeng, Oak @ 2021-04-27  0:35 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel



Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:

    Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.

    Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
     2 files changed, 77 insertions(+), 1 deletion(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    index 63668433f5a6..b706e5a54782 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    @@ -41,6 +41,7 @@ struct amdgpu_device;
     enum kfd_mem_attachment_type {
     	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
     	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
    +	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
     };

     struct kfd_mem_attachment {
    @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
     struct kgd_mem {
     	struct mutex lock;
     	struct amdgpu_bo *bo;
    +	struct dma_buf *dmabuf;
     	struct list_head attachments;
     	/* protected by amdkfd_process_info.lock */
     	struct ttm_validate_buffer validate_list;
    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    index 9eeedd0c7920..18a1f9222a59 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
     	return ret;
     }

    +static int
    +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
    +{
    +	struct ttm_operation_ctx ctx = {.interruptible = true};
    +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    +
    +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
    +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
How does this work? The function name says this is DMA-mapping a buffer, but the implementation is just placement and validation.
    +}
    +
     static int
     kfd_mem_dmamap_attachment(struct kgd_mem *mem,
     			  struct kfd_mem_attachment *attachment)
    @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
     		return 0;
     	case KFD_MEM_ATT_USERPTR:
     		return kfd_mem_dmamap_userptr(mem, attachment);
    +	case KFD_MEM_ATT_DMABUF:
    +		return kfd_mem_dmamap_dmabuf(attachment);
     	default:
     		WARN_ON_ONCE(1);
     	}
    @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
     	ttm->sg = NULL;
     }

    +static void
    +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
    +{
    +	struct ttm_operation_ctx ctx = {.interruptible = true};
    +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    +
    +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
    +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
    +	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
    +	 * called
    +	 */
    +}
    +
     static void
     kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
     			    struct kfd_mem_attachment *attachment)
    @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
     	case KFD_MEM_ATT_USERPTR:
     		kfd_mem_dmaunmap_userptr(mem, attachment);
     		break;
    +	case KFD_MEM_ATT_DMABUF:
    +		kfd_mem_dmaunmap_dmabuf(attachment);
    +		break;
     	default:
     		WARN_ON_ONCE(1);
     	}
    @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
     	return 0;
     }

    +static int
    +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
    +		      struct amdgpu_bo **bo)
    +{
    +	struct drm_gem_object *gobj;
    +
    +	if (!mem->dmabuf) {
    +		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
    +			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
    +				DRM_RDWR : 0);
    +		if (IS_ERR(mem->dmabuf)) {
    +			mem->dmabuf = NULL;
    +			return PTR_ERR(mem->dmabuf);
    +		}
    +	}
    +
    +	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
    +	if (IS_ERR(gobj))
    +		return PTR_ERR(gobj);
    +
    +	/* Import takes an extra reference on the dmabuf. Drop it now to
    +	 * avoid leaking it. We only need the one reference in
    +	 * kgd_mem->dmabuf.
    +	 */
    +	dma_buf_put(mem->dmabuf);
    +
    +	*bo = gem_to_amdgpu_bo(gobj);
    +	(*bo)->parent = amdgpu_bo_ref(mem->bo);
    +
    +	return 0;
    +}
    +
     /* kfd_mem_attach - Add a BO to a VM
      *
      * Everything that needs to be done only once when a BO is first added
    @@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
     			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
     			if (ret)
     				goto unwind;
    +		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
    +			   mem->bo->tbo.type != ttm_bo_type_sg) {
    +			/* GTT BOs use DMA-mapping ability of dynamic-attach
    +			 * DMA bufs. TODO: The same should work for VRAM on
    +			 * large-BAR GPUs.
    +			 */
    +			attachment[i]->type = KFD_MEM_ATT_DMABUF;
    +			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
    +			if (ret)
    +				goto unwind;
     		} else {
    -			/* FIXME: Need to DMA-map other BO types */
    +			/* FIXME: Need to DMA-map other BO types:
    +			 * large-BAR VRAM, doorbells, MMIO remap
    +			 */
     			attachment[i]->type = KFD_MEM_ATT_SHARED;
     			bo[i] = mem->bo;
     			drm_gem_object_get(&bo[i]->tbo.base);
    @@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(

     	/* Free the BO*/
     	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
    +	if (mem->dmabuf)
    +		dma_buf_put(mem->dmabuf);
     	drm_gem_object_put(&mem->bo->tbo.base);
     	mutex_destroy(&mem->lock);
     	kfree(mem);
    -- 
    2.31.1

    _______________________________________________
    amd-gfx mailing list
    amd-gfx@lists.freedesktop.org
     https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers
  2021-04-27  0:09   ` Zeng, Oak
@ 2021-04-27  3:41     ` Felix Kuehling
  2021-05-10 22:05       ` Errabolu, Ramesh
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-27  3:41 UTC (permalink / raw)
  To: Zeng, Oak, amd-gfx, dri-devel

Am 2021-04-26 um 8:09 p.m. schrieb Zeng, Oak:
> As I understand it, when one GPU maps another GPU's VRAM, that VRAM should also be mapped in the IOMMU page table. Normal GTT memory (versus userptr) also needs to be mapped in the IOMMU. But I don't see that code below.

Right, I'm not solving all problems at once. The next patch is there to
handle GTT BOs.

Peer mappings of doorbells, MMIO and VRAM still need to be handled in
the future. I'm trying to fix the worst issues first. This series should
get 99% of real world tests working.


>  I only see userptr being mapped in the IOMMU. Maybe you map them in the IOMMU at some time other than memory attachment?
>
> Also see a nit-pick inline
>
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     Add BO-type specific helpers functions to DMA-map and unmap
>     kfd_mem_attachments. Implement this functionality for userptrs by creating
>     one SG BO per GPU and filling it with a DMA mapping of the pages from the
>     original mem->bo.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +++++++++++++++++-
>      2 files changed, 145 insertions(+), 9 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     index c24b2478f445..63668433f5a6 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     @@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
>
>      struct amdgpu_device;
>
>     +enum kfd_mem_attachment_type {
>     +	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
>     +	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
>     +};
>     +
>      struct kfd_mem_attachment {
>      	struct list_head list;
>     +	enum kfd_mem_attachment_type type;
>     +	bool is_mapped;
>      	struct amdgpu_bo_va *bo_va;
>      	struct amdgpu_device *adev;
>     -	bool is_mapped;
>      	uint64_t va;
>      	uint64_t pte_flags;
>      };
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index fbd7e786b54e..49d1af4aa5f1 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>      	return pte_flags;
>      }
>
>     +static int
>     +kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>     +		       struct kfd_mem_attachment *attachment)
>     +{
>     +	enum dma_data_direction direction =
>     +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +	struct amdgpu_device *adev = attachment->adev;
>     +	struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
>     +	struct ttm_tt *ttm = bo->tbo.ttm;
>     +	int ret;
>     +
>     +	ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
>     +	if (unlikely(!ttm->sg))
>     +		return -ENOMEM;
>     +
>     +	if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
>     +		return -EINVAL;
>     +
>     +	/* Same sequence as in amdgpu_ttm_tt_pin_userptr */
>     +	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
>     +					ttm->num_pages, 0,
>     +					(u64)ttm->num_pages << PAGE_SHIFT,
>     +					GFP_KERNEL);
>     +	if (unlikely(ret))
>     +		goto release_sg;
> Shouldn't this go to a label starting at the kfree below?

Thanks, I'll fix that.
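
Something along these lines (untested sketch): split the cleanup into
two labels, so that sg_free_table is only called once the table was
actually allocated, and set ret in the WARN_ON path so that it no
longer leaks ttm->sg either:

	if (WARN_ON(ttm->num_pages != src_ttm->num_pages)) {
		ret = -EINVAL;
		goto free_sg;
	}

	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
					ttm->num_pages, 0,
					(u64)ttm->num_pages << PAGE_SHIFT,
					GFP_KERNEL);
	if (unlikely(ret))
		goto free_sg;

	ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
	if (unlikely(ret))
		goto release_sg;
	...
release_sg:
	sg_free_table(ttm->sg);	/* only reached after a successful alloc */
free_sg:
	kfree(ttm->sg);
	ttm->sg = NULL;
	return ret;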

Regards,
  Felix


>     +
>     +	ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
>     +	if (unlikely(ret))
>     +		goto release_sg;
>     +
>     +	drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
>     +				       ttm->num_pages);
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>     +	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     +	if (ret)
>     +		goto release_sg;
>     +
>     +	return 0;
>     +
>     +release_sg:
>     +	pr_err("DMA map userptr failed: %d\n", ret);
>     +	sg_free_table(ttm->sg);
>     +	kfree(ttm->sg);
>     +	ttm->sg = NULL;
>     +	return ret;
>     +}
>     +
>     +static int
>     +kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     +			  struct kfd_mem_attachment *attachment)
>     +{
>     +	switch (attachment->type) {
>     +	case KFD_MEM_ATT_SHARED:
>     +		return 0;
>     +	case KFD_MEM_ATT_USERPTR:
>     +		return kfd_mem_dmamap_userptr(mem, attachment);
>     +	default:
>     +		WARN_ON_ONCE(1);
>     +	}
>     +	return -EINVAL;
>     +}
>     +
>     +static void
>     +kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>     +			 struct kfd_mem_attachment *attachment)
>     +{
>     +	enum dma_data_direction direction =
>     +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>     +	struct ttm_operation_ctx ctx = {.interruptible = false};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +	struct amdgpu_device *adev = attachment->adev;
>     +	struct ttm_tt *ttm = bo->tbo.ttm;
>     +
>     +	if (unlikely(!ttm->sg))
>     +		return;
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     +
>     +	dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
>     +	sg_free_table(ttm->sg);
>     +	ttm->sg = NULL;
>     +}
>     +
>     +static void
>     +kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     +			    struct kfd_mem_attachment *attachment)
>     +{
>     +	switch (attachment->type) {
>     +	case KFD_MEM_ATT_SHARED:
>     +		break;
>     +	case KFD_MEM_ATT_USERPTR:
>     +		kfd_mem_dmaunmap_userptr(mem, attachment);
>     +		break;
>     +	default:
>     +		WARN_ON_ONCE(1);
>     +	}
>     +}
>     +
>      /* kfd_mem_attach - Add a BO to a VM
>       *
>       * Everything that needs to bo done only once when a BO is first added
>       * to a VM. It can later be mapped and unmapped many times without
>       * repeating these steps.
>       *
>     + * 0. Create BO for DMA mapping, if needed
>       * 1. Allocate and initialize BO VA entry data structure
>       * 2. Add BO to the VM
>       * 3. Determine ASIC-specific PTE flags
>     @@ -488,10 +593,12 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>      static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>      		struct amdgpu_vm *vm, bool is_aql)
>      {
>     +	struct amdgpu_device *bo_adev = amdgpu_ttm_adev(mem->bo->tbo.bdev);
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>      	uint64_t va = mem->va;
>      	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
>      	struct amdgpu_bo *bo[2] = {NULL, NULL};
>     +	struct drm_gem_object *gobj;
>      	int i, ret;
>
>      	if (!va) {
>     @@ -509,14 +616,37 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>      		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>      			 va + bo_size, vm);
>
>     -		/* FIXME: For now all attachments use the same BO. This is
>     -		 * incorrect because one BO can only have one DMA mapping
>     -		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
>     -		 * import with dynamic attachment. This will be addressed
>     -		 * one BO-type at a time in subsequent patches.
>     -		 */
>     -		bo[i] = mem->bo;
>     -		drm_gem_object_get(&bo[i]->tbo.base);
>     +		if (adev == bo_adev || (mem->domain == AMDGPU_GEM_DOMAIN_VRAM &&
>     +					amdgpu_xgmi_same_hive(adev, bo_adev))) {
>     +			/* Mappings on the local GPU and VRAM mappings in the
>     +			 * local hive share the original BO
>     +			 */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = mem->bo;
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		} else if (i > 0) {
>     +			/* Multiple mappings on the same GPU share the BO */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = bo[0];
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
>     +			/* Create an SG BO to DMA-map userptrs on other GPUs */
>     +			attachment[i]->type = KFD_MEM_ATT_USERPTR;
>     +			ret = amdgpu_gem_object_create(adev, bo_size, 1,
>     +						       AMDGPU_GEM_DOMAIN_CPU,
>     +						       0, ttm_bo_type_sg,
>     +						       mem->bo->tbo.base.resv,
>     +						       &gobj);
>     +			if (ret)
>     +				goto unwind;
>     +			bo[i] = gem_to_amdgpu_bo(gobj);
>     +			bo[i]->parent = amdgpu_bo_ref(mem->bo);
>     +		} else {
>     +			/* FIXME: Need to DMA-map other BO types */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = mem->bo;
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		}
>
>      		/* Add BO to VM internal data structures */
>      		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
>     -- 
>     2.31.1
>
>     _______________________________________________
>     dri-devel mailing list
>     dri-devel@lists.freedesktop.org
>     https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings
  2021-04-27  0:23   ` Zeng, Oak
@ 2021-04-27  3:47     ` Felix Kuehling
  2021-05-10 22:06       ` Errabolu, Ramesh
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-27  3:47 UTC (permalink / raw)
  To: Zeng, Oak, amd-gfx, dri-devel

On 2021-04-26 at 8:23 p.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
>     with the BO and page tables reserved, so we can safely update the DMA
>     mapping.
>
>     DMA unmap when a BO is unmapped from a GPU and before updating mappings
>     in restore workers.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++++++++++---------
>      1 file changed, 29 insertions(+), 27 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index 49d1af4aa5f1..7d25d886b98c 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
>      	return ret;
>      }
>
>     -static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
>     +static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>      				struct kfd_mem_attachment *entry,
>      				struct amdgpu_sync *sync)
>      {
>      	struct amdgpu_bo_va *bo_va = entry->bo_va;
>     +	struct amdgpu_device *adev = entry->adev;
>      	struct amdgpu_vm *vm = bo_va->base.vm;
>
>      	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
>     @@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
>
>      	amdgpu_sync_fence(sync, bo_va->last_pt_update);
>
>     -	return 0;
>     +	kfd_mem_dmaunmap_attachment(mem, entry);
>      }
>
>     -static int update_gpuvm_pte(struct amdgpu_device *adev,
>     -		struct kfd_mem_attachment *entry,
>     -		struct amdgpu_sync *sync)
>     +static int update_gpuvm_pte(struct kgd_mem *mem,
>     +			    struct kfd_mem_attachment *entry,
>     +			    struct amdgpu_sync *sync)
>      {
>     -	int ret;
>      	struct amdgpu_bo_va *bo_va = entry->bo_va;
>     +	struct amdgpu_device *adev = entry->adev;
>     +	int ret;
>     +
>     +	ret = kfd_mem_dmamap_attachment(mem, entry);
> Should the DMA mapping be done in the kfd_mem_attach function when a memory object is attached to a VM for the first time? Since each memory object can be mapped to many GPUs or many VMs, doing the DMA mapping the first time it is attached could simplify the logic. Or even simpler, maybe we could just DMA-map when a memory object is created - it wastes some IOMMU page table entries but would really simplify the logic in this patch series. I found this series is not very easy to understand.

The DMA mapping must be updated every time the physical memory
allocation changes, e.g. after a BO was evicted and restored. Basically,
if the physical pages of the BO change, we need to update the DMA
mapping to point to those new pages. Therefore I added this in the
update_gpuvm_pte function, which is called after a BO has been
validated the first time, or revalidated after an eviction.

You'll also see that I call dmaunmap in the re-validation cases (in the
restore workers below) to ensure that we don't leak DMA mappings.
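
In pseudo-code, each re-validation then boils down to this pattern
(condensed from the hunks quoted below, not new code):

	list_for_each_entry(attachment, &mem->attachments, list) {
		if (!attachment->is_mapped)
			continue;

		/* drop the DMA mapping of the old physical pages ... */
		kfd_mem_dmaunmap_attachment(mem, attachment);
		/* ... then map the current pages and update the GPU PTEs */
		ret = update_gpuvm_pte(mem, attachment, &sync);
	}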

Regards,
  Felix


>     +	if (ret)
>     +		return ret;
>
>      	/* Update the page tables  */
>      	ret = amdgpu_vm_bo_update(adev, bo_va, false);
>     @@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
>      	return amdgpu_sync_fence(sync, bo_va->last_pt_update);
>      }
>
>     -static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>     -		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
>     -		bool no_update_pte)
>     +static int map_bo_to_gpuvm(struct kgd_mem *mem,
>     +			   struct kfd_mem_attachment *entry,
>     +			   struct amdgpu_sync *sync,
>     +			   bool no_update_pte)
>      {
>      	int ret;
>
>      	/* Set virtual address for the allocation */
>     -	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
>     +	ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
>      			       amdgpu_bo_size(entry->bo_va->base.bo),
>      			       entry->pte_flags);
>      	if (ret) {
>     @@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>      	if (no_update_pte)
>      		return 0;
>
>     -	ret = update_gpuvm_pte(adev, entry, sync);
>     +	ret = update_gpuvm_pte(mem, entry, sync);
>      	if (ret) {
>      		pr_err("update_gpuvm_pte() failed\n");
>      		goto update_gpuvm_pte_failed;
>     @@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>      	return 0;
>
>      update_gpuvm_pte_failed:
>     -	unmap_bo_from_gpuvm(adev, entry, sync);
>     +	unmap_bo_from_gpuvm(mem, entry, sync);
>      	return ret;
>      }
>
>     @@ -1596,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
>      			 entry->va, entry->va + bo_size, entry);
>
>     -		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
>     +		ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
>      				      is_invalid_userptr);
>      		if (ret) {
>      			pr_err("Failed to map bo to gpuvm\n");
>     @@ -1635,7 +1642,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
>      		struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv)
>      {
>     -	struct amdgpu_device *adev = get_amdgpu_device(kgd);
>      	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
>      	struct amdkfd_process_info *process_info = avm->process_info;
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>     @@ -1670,13 +1676,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
>      		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
>      			 entry->va, entry->va + bo_size, entry);
>
>     -		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
>     -		if (ret == 0) {
>     -			entry->is_mapped = false;
>     -		} else {
>     -			pr_err("failed to unmap VA 0x%llx\n", mem->va);
>     -			goto unreserve_out;
>     -		}
>     +		unmap_bo_from_gpuvm(mem, entry, ctx.sync);
>     +		entry->is_mapped = false;
>
>      		mem->mapped_to_gpu_memory--;
>      		pr_debug("\t DEC mapping count %d\n",
>     @@ -2053,9 +2054,8 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>      			if (!attachment->is_mapped)
>      				continue;
>
>     -			ret = update_gpuvm_pte((struct amdgpu_device *)
>     -					       attachment->adev,
>     -					       attachment, &sync);
>     +			kfd_mem_dmaunmap_attachment(mem, attachment);
>     +			ret = update_gpuvm_pte(mem, attachment, &sync);
>      			if (ret) {
>      				pr_err("%s: update PTE failed\n", __func__);
>      				/* make sure this gets validated again */
>     @@ -2257,9 +2257,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>      			goto validate_map_fail;
>      		}
>      		list_for_each_entry(attachment, &mem->attachments, list) {
>     -			ret = update_gpuvm_pte((struct amdgpu_device *)
>     -					      attachment->adev, attachment,
>     -					      &sync_obj);
>     +			if (!attachment->is_mapped)
>     +				continue;
>     +
>     +			kfd_mem_dmaunmap_attachment(mem, attachment);
>     +			ret = update_gpuvm_pte(mem, attachment, &sync_obj);
>      			if (ret) {
>      				pr_debug("Memory eviction: update PTE failed. Try again\n");
>      				goto validate_map_fail;
>     -- 
>     2.31.1
>
>     _______________________________________________
>     dri-devel mailing list
>     dri-devel@lists.freedesktop.org
>     https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-27  0:35   ` Zeng, Oak
@ 2021-04-27  3:56     ` Felix Kuehling
  2021-04-27 14:29       ` Zeng, Oak
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-27  3:56 UTC (permalink / raw)
  To: Zeng, Oak, amd-gfx, dri-devel

On 2021-04-26 at 8:35 p.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
>      2 files changed, 77 insertions(+), 1 deletion(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     index 63668433f5a6..b706e5a54782 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     @@ -41,6 +41,7 @@ struct amdgpu_device;
>      enum kfd_mem_attachment_type {
>      	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
>      	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
>     +	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
>      };
>
>      struct kfd_mem_attachment {
>     @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
>      struct kgd_mem {
>      	struct mutex lock;
>      	struct amdgpu_bo *bo;
>     +	struct dma_buf *dmabuf;
>      	struct list_head attachments;
>      	/* protected by amdkfd_process_info.lock */
>      	struct ttm_validate_buffer validate_list;
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index 9eeedd0c7920..18a1f9222a59 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>      	return ret;
>      }
>
>     +static int
>     +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
>     +{
>     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>     +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> How does this work? The function name says this is DMA-mapping a buffer, but from the implementation it is just a placement and a validation

Conceptually, calling ttm_bo_validate ensures that the BO is in the
specified domain, in this case GTT. Before calling validate, it can be
in the CPU domain, which means it may be swapped to disk so it's not GPU
accessible. For a DMABuf attachment, the CPU domain means that the
DMABuf is not attached because the underlying memory object may be on
the move or swapped out.

The actual implementation of the dmabuf attachment is currently in
amdgpu_ttm_tt_populate/unpopulate. This is incorrect. Patch 10 in this
series fixes that to move the actual dmabuf attachment into
amdgpu_ttm_backend_bind/unbind, which is called from amdgpu_bo_move when
a BO is moved between the CPU and GTT domains.
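
Roughly, after patch 10 the bind side does something like this
(simplified sketch of the idea, not the exact patch; gtt is the
amdgpu_ttm_tt backing the BO):

	/* inside amdgpu_ttm_backend_bind() */
	if (gtt->gobj->import_attach) {
		/* dynamic attachment: DMA-map the dmabuf for this device */
		struct sg_table *sgt;

		sgt = dma_buf_map_attachment(gtt->gobj->import_attach,
					     DMA_BIDIRECTIONAL);
		if (IS_ERR(sgt))
			return PTR_ERR(sgt);
		ttm->sg = sgt;
	}

The unbind side does the matching dma_buf_unmap_attachment when the BO
moves back to the CPU domain.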

Regards,
  Felix


>     +}
>     +
>      static int
>      kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>      			  struct kfd_mem_attachment *attachment)
>     @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>      		return 0;
>      	case KFD_MEM_ATT_USERPTR:
>      		return kfd_mem_dmamap_userptr(mem, attachment);
>     +	case KFD_MEM_ATT_DMABUF:
>     +		return kfd_mem_dmamap_dmabuf(attachment);
>      	default:
>      		WARN_ON_ONCE(1);
>      	}
>     @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>      	ttm->sg = NULL;
>      }
>
>     +static void
>     +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
>     +{
>     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     +	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
>     +	 * called
>     +	 */
>     +}
>     +
>      static void
>      kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>      			    struct kfd_mem_attachment *attachment)
>     @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>      	case KFD_MEM_ATT_USERPTR:
>      		kfd_mem_dmaunmap_userptr(mem, attachment);
>      		break;
>     +	case KFD_MEM_ATT_DMABUF:
>     +		kfd_mem_dmaunmap_dmabuf(attachment);
>     +		break;
>      	default:
>      		WARN_ON_ONCE(1);
>      	}
>     @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
>      	return 0;
>      }
>
>     +static int
>     +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
>     +		      struct amdgpu_bo **bo)
>     +{
>     +	struct drm_gem_object *gobj;
>     +
>     +	if (!mem->dmabuf) {
>     +		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
>     +			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     +				DRM_RDWR : 0);
>     +		if (IS_ERR(mem->dmabuf)) {
>     +			mem->dmabuf = NULL;
>     +			return PTR_ERR(mem->dmabuf);
>     +		}
>     +	}
>     +
>     +	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
>     +	if (IS_ERR(gobj))
>     +		return PTR_ERR(gobj);
>     +
>     +	/* Import takes an extra reference on the dmabuf. Drop it now to
>     +	 * avoid leaking it. We only need the one reference in
>     +	 * kgd_mem->dmabuf.
>     +	 */
>     +	dma_buf_put(mem->dmabuf);
>     +
>     +	*bo = gem_to_amdgpu_bo(gobj);
>     +	(*bo)->parent = amdgpu_bo_ref(mem->bo);
>     +
>     +	return 0;
>     +}
>     +
>      /* kfd_mem_attach - Add a BO to a VM
>       *
>       * Everything that needs to bo done only once when a BO is first added
>     @@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>      			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
>      			if (ret)
>      				goto unwind;
>     +		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
>     +			   mem->bo->tbo.type != ttm_bo_type_sg) {
>     +			/* GTT BOs use DMA-mapping ability of dynamic-attach
>     +			 * DMA bufs. TODO: The same should work for VRAM on
>     +			 * large-BAR GPUs.
>     +			 */
>     +			attachment[i]->type = KFD_MEM_ATT_DMABUF;
>     +			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
>     +			if (ret)
>     +				goto unwind;
>      		} else {
>     -			/* FIXME: Need to DMA-map other BO types */
>     +			/* FIXME: Need to DMA-map other BO types:
>     +			 * large-BAR VRAM, doorbells, MMIO remap
>     +			 */
>      			attachment[i]->type = KFD_MEM_ATT_SHARED;
>      			bo[i] = mem->bo;
>      			drm_gem_object_get(&bo[i]->tbo.base);
>     @@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>
>      	/* Free the BO*/
>      	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
>     +	if (mem->dmabuf)
>     +		dma_buf_put(mem->dmabuf);
>      	drm_gem_object_put(&mem->bo->tbo.base);
>      	mutex_destroy(&mem->lock);
>      	kfree(mem);
>     -- 
>     2.31.1
>
>     _______________________________________________
>     amd-gfx mailing list
>     amd-gfx@lists.freedesktop.org
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-27  3:56     ` Felix Kuehling
@ 2021-04-27 14:29       ` Zeng, Oak
  2021-04-27 15:08         ` Felix Kuehling
  0 siblings, 1 reply; 32+ messages in thread
From: Zeng, Oak @ 2021-04-27 14:29 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel



Regards,
Oak 

 

On 2021-04-26, 11:56 PM, "Kuehling, Felix" <Felix.Kuehling@amd.com> wrote:

    On 2021-04-26 at 8:35 p.m., Zeng, Oak wrote:
    > Regards,
    > Oak 
    >
    >  
    >
    > On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
    >
    >     Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
    >
    >     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
    >     ---
    >      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
    >      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
    >      2 files changed, 77 insertions(+), 1 deletion(-)
    >
    >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    >     index 63668433f5a6..b706e5a54782 100644
    >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
    >     @@ -41,6 +41,7 @@ struct amdgpu_device;
    >      enum kfd_mem_attachment_type {
    >      	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
    >      	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
    >     +	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
    >      };
    >
    >      struct kfd_mem_attachment {
    >     @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
    >      struct kgd_mem {
    >      	struct mutex lock;
    >      	struct amdgpu_bo *bo;
    >     +	struct dma_buf *dmabuf;
    >      	struct list_head attachments;
    >      	/* protected by amdkfd_process_info.lock */
    >      	struct ttm_validate_buffer validate_list;
    >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    >     index 9eeedd0c7920..18a1f9222a59 100644
    >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
    >     @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
    >      	return ret;
    >      }
    >
    >     +static int
    >     +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
    >     +{
    >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
    >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    >     +
    >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
    >     +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
    > How does this work? The function name says this is DMA-mapping a buffer, but from the implementation it is just a placement and a validation

    Conceptually, calling ttm_bo_validate ensures that the BO is in the
    specified domain, in this case GTT. Before calling validate, it can be
    in the CPU domain, which means it may be swapped to disk so it's not GPU
    accessible. For a DMABuf attachment, the CPU domain means that the
    DMABuf is not attached because the underlying memory object may be on
    the move or swapped out.

    The actual implementation of the dmabuf attachment is currently in
    amdgpu_ttm_tt_populate/unpopulate. This is incorrect. Patch 10 in this
    series fixes that to move the actual dmabuf attachment into
    amdgpu_ttm_backend_bind/unbind, which is called from amdgpu_bo_move when
    a BO is moved between the CPU and GTT domains.

Thanks for the explanation. One more thing I don't quite understand: before this series, GTT memory should already have been validated somewhere before it is mapped to the GPU. You added GTT memory validation here - will this validation be duplicated?

The function name kfd_mem_dmamap_dmabuf is still confusing, since it seems to me it only does some preparation work before dynamically DMA-mapping a GTT BO. But I understand that from this series' perspective, compared to userptr (where you actually do the DMA-mapping in kfd_mem_dmamap_userptr), for GTT memory you leveraged the amdgpu TTM machinery for dynamic DMA-mapping. So maybe the naming here makes sense from that perspective.

Another thing related, but not directly, to this series: GTT memory is DMA-mapped when it is allocated - see ttm_populate_and_map_pages calling dma_map_page. The question is, will GTT memory first be DMA-unmapped before it is mapped in amdgpu_ttm_backend_bind? This is existing behavior, not from your series. Maybe there is no issue, but I just want to make sure while we are looking at this area.

    Regards,
      Felix


    >     +}
    >     +
    >      static int
    >      kfd_mem_dmamap_attachment(struct kgd_mem *mem,
    >      			  struct kfd_mem_attachment *attachment)
    >     @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
    >      		return 0;
    >      	case KFD_MEM_ATT_USERPTR:
    >      		return kfd_mem_dmamap_userptr(mem, attachment);
    >     +	case KFD_MEM_ATT_DMABUF:
    >     +		return kfd_mem_dmamap_dmabuf(attachment);
    >      	default:
    >      		WARN_ON_ONCE(1);
    >      	}
    >     @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
    >      	ttm->sg = NULL;
    >      }
    >
    >     +static void
    >     +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
    >     +{
    >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
    >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
    >     +
    >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
    >     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
    >     +	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
    >     +	 * called
    >     +	 */
    >     +}
    >     +
    >      static void
    >      kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
    >      			    struct kfd_mem_attachment *attachment)
    >     @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
    >      	case KFD_MEM_ATT_USERPTR:
    >      		kfd_mem_dmaunmap_userptr(mem, attachment);
    >      		break;
    >     +	case KFD_MEM_ATT_DMABUF:
    >     +		kfd_mem_dmaunmap_dmabuf(attachment);
    >     +		break;
    >      	default:
    >      		WARN_ON_ONCE(1);
    >      	}
    >     @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
    >      	return 0;
    >      }
    >
    >     +static int
    >     +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
    >     +		      struct amdgpu_bo **bo)
    >     +{
    >     +	struct drm_gem_object *gobj;
    >     +
    >     +	if (!mem->dmabuf) {
    >     +		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
    >     +			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
    >     +				DRM_RDWR : 0);
    >     +		if (IS_ERR(mem->dmabuf)) {
    >     +			mem->dmabuf = NULL;
    >     +			return PTR_ERR(mem->dmabuf);
    >     +		}
    >     +	}
    >     +
    >     +	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
    >     +	if (IS_ERR(gobj))
    >     +		return PTR_ERR(gobj);
    >     +
    >     +	/* Import takes an extra reference on the dmabuf. Drop it now to
    >     +	 * avoid leaking it. We only need the one reference in
    >     +	 * kgd_mem->dmabuf.
    >     +	 */
    >     +	dma_buf_put(mem->dmabuf);
    >     +
    >     +	*bo = gem_to_amdgpu_bo(gobj);
    >     +	(*bo)->parent = amdgpu_bo_ref(mem->bo);
    >     +
    >     +	return 0;
    >     +}
    >     +
    >      /* kfd_mem_attach - Add a BO to a VM
    >       *
    >       * Everything that needs to bo done only once when a BO is first added
    >     @@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
    >      			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
    >      			if (ret)
    >      				goto unwind;
    >     +		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
    >     +			   mem->bo->tbo.type != ttm_bo_type_sg) {
    >     +			/* GTT BOs use DMA-mapping ability of dynamic-attach
    >     +			 * DMA bufs. TODO: The same should work for VRAM on
    >     +			 * large-BAR GPUs.
    >     +			 */
    >     +			attachment[i]->type = KFD_MEM_ATT_DMABUF;
    >     +			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
    >     +			if (ret)
    >     +				goto unwind;
    >      		} else {
    >     -			/* FIXME: Need to DMA-map other BO types */
    >     +			/* FIXME: Need to DMA-map other BO types:
    >     +			 * large-BAR VRAM, doorbells, MMIO remap
    >     +			 */
    >      			attachment[i]->type = KFD_MEM_ATT_SHARED;
    >      			bo[i] = mem->bo;
    >      			drm_gem_object_get(&bo[i]->tbo.base);
    >     @@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
    >
    >      	/* Free the BO*/
    >      	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
    >     +	if (mem->dmabuf)
    >     +		dma_buf_put(mem->dmabuf);
    >      	drm_gem_object_put(&mem->bo->tbo.base);
    >      	mutex_destroy(&mem->lock);
    >      	kfree(mem);
    >     -- 
    >     2.31.1
    >
    >     _______________________________________________
    >     amd-gfx mailing list
    >     amd-gfx@lists.freedesktop.org
    >     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
    >

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-27 14:29       ` Zeng, Oak
@ 2021-04-27 15:08         ` Felix Kuehling
  2021-05-10 22:07           ` Errabolu, Ramesh
  0 siblings, 1 reply; 32+ messages in thread
From: Felix Kuehling @ 2021-04-27 15:08 UTC (permalink / raw)
  To: Zeng, Oak, amd-gfx, dri-devel

On 2021-04-27 at 10:29 a.m., Zeng, Oak wrote:
> Regards,
> Oak 
>
>  
>
> On 2021-04-26, 11:56 PM, "Kuehling, Felix" <Felix.Kuehling@amd.com> wrote:
>
>     On 2021-04-26 at 8:35 p.m., Zeng, Oak wrote:
>     > Regards,
>     > Oak 
>     >
>     >  
>     >
>     > On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>     >
>     >     Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
>     >
>     >     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     >     ---
>     >      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
>     >      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
>     >      2 files changed, 77 insertions(+), 1 deletion(-)
>     >
>     >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     index 63668433f5a6..b706e5a54782 100644
>     >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     @@ -41,6 +41,7 @@ struct amdgpu_device;
>     >      enum kfd_mem_attachment_type {
>     >      	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
>     >      	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
>     >     +	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
>     >      };
>     >
>     >      struct kfd_mem_attachment {
>     >     @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
>     >      struct kgd_mem {
>     >      	struct mutex lock;
>     >      	struct amdgpu_bo *bo;
>     >     +	struct dma_buf *dmabuf;
>     >      	struct list_head attachments;
>     >      	/* protected by amdkfd_process_info.lock */
>     >      	struct ttm_validate_buffer validate_list;
>     >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     index 9eeedd0c7920..18a1f9222a59 100644
>     >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>     >      	return ret;
>     >      }
>     >
>     >     +static int
>     >     +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
>     >     +{
>     >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     >     +
>     >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>     >     +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     > How does this work? The function name says this is DMA-mapping a buffer, but from the implementation it is just a placement and a validation
>
>     Conceptually, calling ttm_bo_validate ensures that the BO is in the
>     specified domain, in this case GTT. Before calling validate, it can be
>     in the CPU domain, which means it may be swapped to disk so it's not GPU
>     accessible. For a DMABuf attachment, the CPU domain means that the
>     DMABuf is not attached because the underlying memory object may be on
>     the move or swapped out.
>
>     The actual implementation of the dmabuf attachment is currently in
>     amdgpu_ttm_tt_populate/unpopulate. This is incorrect. Patch 10 in this
>     series fixes that to move the actual dmabuf attachment into
>     amdgpu_ttm_backend_bind/unbind, which is called from amdgpu_bo_move when
>     a BO is moved between the CPU and GTT domains.
>
> Thanks for the explanation. One more thing I don't quite understand: before this series, GTT memory should already have been validated somewhere before it is mapped to the GPU. You added GTT memory validation here - will this validation be duplicated?

When you have N GPUs there are now N BOs involved. Each GPU needs its
own BO because it needs its own DMA mapping. There will be one actual
GTT BO that allocates physical pages in TTM. The other BOs are dmabuf
imports that DMA-map the same physical pages for access by the other GPUs.

The validate call here validates one of the dmabuf imports. This does
not duplicate the validation of the underlying TTM BO with the actual
physical memory allocation.
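
As a concrete sketch for one GTT BO mapped on GPUs A, B and C (the
import_for() helper is made up, just to illustrate the structure):

	/* mem->bo lives on GPU A and owns the physical pages */
	attachment[A]->bo_va->base.bo == mem->bo;        /* KFD_MEM_ATT_SHARED */
	attachment[B]->bo_va->base.bo == import_for(B);  /* KFD_MEM_ATT_DMABUF */
	attachment[C]->bo_va->base.bo == import_for(C);  /* KFD_MEM_ATT_DMABUF */

Each import carries its own ttm_tt and therefore its own DMA addresses,
valid only for that GPU.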


>
> The function name kfd_mem_dmamap_dmabuf is still confusing, since it seems to me it only does some preparation work before dynamically DMA-mapping a GTT BO.

No, this series is not just preparation. It implements DMA mapping of
BOs for multiple GPUs. TTM already handles DMA mapping of the memory for
the device where the memory was allocated. (Yes, even GTT memory is
associated with a specific GPU even though it's physically in system
memory). What this patch series adds is additional DMA mappings for the
other GPUs. Without this patch, we were using the DMA mapping for GPU-1
in the page table of GPU-X, which is incorrect. It works in many cases
where the DMA mapping is a direct mapping:

  * IOMMU disabled
  * IOMMU in passthrough mode

But it breaks when you have multiple GPUs with an IOMMU that's not
disabled or in passthrough mode.


>  But I understand that from this series' perspective, compared to userptr (where you actually do the DMA-mapping in kfd_mem_dmamap_userptr), for GTT memory you leveraged the amdgpu TTM machinery for dynamic DMA-mapping. So maybe the naming here makes sense from that perspective.

Yes.


>
> Another thing related, but not directly, to this series: GTT memory is DMA-mapped when it is allocated - see ttm_populate_and_map_pages calling dma_map_page. The question is, will GTT memory first be DMA-unmapped before it is mapped in amdgpu_ttm_backend_bind? This is existing behavior, not from your series. Maybe there is no issue, but I just want to make sure while we are looking at this area.

Right. The problem is that the DMA mappings only work for a specific
device. Using the same DMA mapping on multiple devices is broken. The
reason we got away with it for a long time is that we were running with
IOMMU disabled or in passthrough mode.
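
That is inherent in the DMA API: the dma_addr_t values returned by a map
call are only valid for the device that mapped them. Illustrative sketch
(num_gpus, gpu_dev[] and sgt[] are made-up names; dma_map_sgtable is the
real API):

	int i, ret;

	/* the same physical pages need one mapping per GPU */
	for (i = 0; i < num_gpus; i++) {
		ret = dma_map_sgtable(gpu_dev[i], sgt[i],
				      DMA_BIDIRECTIONAL, 0);
		if (ret)
			return ret;
		/* program the addresses from sgt[i] into GPU i's page table */
	}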

Regards,
  Felix


>
>     Regards,
>       Felix
>
>
>     >     +}
>     >     +
>     >      static int
>     >      kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     >      			  struct kfd_mem_attachment *attachment)
>     >     @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     >      		return 0;
>     >      	case KFD_MEM_ATT_USERPTR:
>     >      		return kfd_mem_dmamap_userptr(mem, attachment);
>     >     +	case KFD_MEM_ATT_DMABUF:
>     >     +		return kfd_mem_dmamap_dmabuf(attachment);
>     >      	default:
>     >      		WARN_ON_ONCE(1);
>     >      	}
>     >     @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>     >      	ttm->sg = NULL;
>     >      }
>     >
>     >     +static void
>     >     +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
>     >     +{
>     >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     >     +
>     >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>     >     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     >     +	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
>     >     +	 * called
>     >     +	 */
>     >     +}
>     >     +
>     >      static void
>     >      kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     >      			    struct kfd_mem_attachment *attachment)
>     >     @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     >      	case KFD_MEM_ATT_USERPTR:
>     >      		kfd_mem_dmaunmap_userptr(mem, attachment);
>     >      		break;
>     >     +	case KFD_MEM_ATT_DMABUF:
>     >     +		kfd_mem_dmaunmap_dmabuf(attachment);
>     >     +		break;
>     >      	default:
>     >      		WARN_ON_ONCE(1);
>     >      	}
>     >     @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >      	return 0;
>     >      }
>     >
>     >     +static int
>     >     +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >     +		      struct amdgpu_bo **bo)
>     >     +{
>     >     +	struct drm_gem_object *gobj;
>     >     +
>     >     +	if (!mem->dmabuf) {
>     >     +		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
>     >     +			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     >     +				DRM_RDWR : 0);
>     >     +		if (IS_ERR(mem->dmabuf)) {
>     >     +			mem->dmabuf = NULL;
>     >     +			return PTR_ERR(mem->dmabuf);
>     >     +		}
>     >     +	}
>     >     +
>     >     +	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
>     >     +	if (IS_ERR(gobj))
>     >     +		return PTR_ERR(gobj);
>     >     +
>     >     +	/* Import takes an extra reference on the dmabuf. Drop it now to
>     >     +	 * avoid leaking it. We only need the one reference in
>     >     +	 * kgd_mem->dmabuf.
>     >     +	 */
>     >     +	dma_buf_put(mem->dmabuf);
>     >     +
>     >     +	*bo = gem_to_amdgpu_bo(gobj);
>     >     +	(*bo)->parent = amdgpu_bo_ref(mem->bo);
>     >     +
>     >     +	return 0;
>     >     +}
>     >     +
>     >      /* kfd_mem_attach - Add a BO to a VM
>     >       *
>     >       * Everything that needs to bo done only once when a BO is first added
>     >     @@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >      			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
>     >      			if (ret)
>     >      				goto unwind;
>     >     +		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
>     >     +			   mem->bo->tbo.type != ttm_bo_type_sg) {
>     >     +			/* GTT BOs use DMA-mapping ability of dynamic-attach
>     >     +			 * DMA bufs. TODO: The same should work for VRAM on
>     >     +			 * large-BAR GPUs.
>     >     +			 */
>     >     +			attachment[i]->type = KFD_MEM_ATT_DMABUF;
>     >     +			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
>     >     +			if (ret)
>     >     +				goto unwind;
>     >      		} else {
>     >     -			/* FIXME: Need to DMA-map other BO types */
>     >     +			/* FIXME: Need to DMA-map other BO types:
>     >     +			 * large-BAR VRAM, doorbells, MMIO remap
>     >     +			 */
>     >      			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     >      			bo[i] = mem->bo;
>     >      			drm_gem_object_get(&bo[i]->tbo.base);
>     >     @@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>     >
>     >      	/* Free the BO*/
>     >      	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
>     >     +	if (mem->dmabuf)
>     >     +		dma_buf_put(mem->dmabuf);
>     >      	drm_gem_object_put(&mem->bo->tbo.base);
>     >      	mutex_destroy(&mem->lock);
>     >      	kfree(mem);
>     >     -- 
>     >     2.31.1
>     >
>     >     _______________________________________________
>     >     amd-gfx mailing list
>     >     amd-gfx@lists.freedesktop.org
>     >     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>     >
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD
  2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
                   ` (9 preceding siblings ...)
  2021-04-22  1:30 ` [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind Felix Kuehling
@ 2021-04-27 15:16 ` Zeng, Oak
  10 siblings, 0 replies; 32+ messages in thread
From: Zeng, Oak @ 2021-04-27 15:16 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

This series is Acked-by: Oak Zeng <Oak.Zeng@amd.com> 

Regards,
Oak 

 

On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:

    This patch series fixes DMA-mappings of system memory (GTT and userptr)
    for KFD running on multi-GPU systems with IOMMU enabled. One SG-BO per
    GPU is needed to maintain the DMA mappings of each BO.

    Changes in v2:
    - Made the original BO parent of the SG BO to fix bo destruction order
    - Removed individualization hack that is not needed with parent BO
    - Removed resv locking hack in amdgpu_ttm_unpopulate, not needed without
      the individualization hack
    - Added a patch to enable the Intel IOMMU driver in rock-dbg_defconfig
    - Added a patch to move dmabuf attach/detach into backend_(un)bind

    I'm still seeing some IOMMU access faults in the eviction test. They seem
    to be related to userptr handling. They happen even without this patch
    series on a single-GPU system, where this patch series is not needed. I
    believe this is an old problem in KFD or amdgpu that is being exposed by
    device isolation from the IOMMU. I'm debugging it, but it should not hold
    up this patch series.

    "drm/ttm: Don't count pages in SG BOs against pages_limit" was already
    applied to drm-misc (I think). I'm still including it here because my
    patches depend on it. Without that, the SG BOs created for DMA mappings
    cause many tests to fail because TTM incorrectly thinks it's out of memory.

    Felix Kuehling (10):
      rock-dbg_defconfig: Enable Intel IOMMU
      drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
      drm/amdgpu: Keep a bo-reference per-attachment
      drm/amdgpu: Simplify AQL queue mapping
      drm/amdgpu: Add multi-GPU DMA mapping helpers
      drm/amdgpu: DMA map/unmap when updating GPU mappings
      drm/amdgpu: Move kfd_mem_attach outside reservation
      drm/amdgpu: Add DMA mapping of GTT BOs
      drm/ttm: Don't count pages in SG BOs against pages_limit
      drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind

     arch/x86/configs/rock-dbg_defconfig           |  11 +-
     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  18 +-
     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 530 ++++++++++++------
     drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  51 +-
     drivers/gpu/drm/ttm/ttm_tt.c                  |  27 +-
     5 files changed, 437 insertions(+), 200 deletions(-)

    -- 
    2.31.1

    _______________________________________________
    dri-devel mailing list
    dri-devel@lists.freedesktop.org
    https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment
  2021-04-22  1:30 ` [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment Felix Kuehling
@ 2021-05-10 22:00   ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:00 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

[AMD Official Use Only - Internal Distribution Only]

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Wednesday, April 21, 2021 8:31 PM
To: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment

This name is more fitting, especially for the changes coming next to support multi-GPU systems with proper DMA mappings. Cleaned up the code and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 209 +++++++++---------
 2 files changed, 104 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 313ee49b9f17..c24b2478f445 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -38,10 +38,10 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
 
 struct amdgpu_device;
 
-struct kfd_bo_va_list {
-	struct list_head bo_list;
+struct kfd_mem_attachment {
+	struct list_head list;
 	struct amdgpu_bo_va *bo_va;
-	void *kgd_dev;
+	struct amdgpu_device *adev;
 	bool is_mapped;
 	uint64_t va;
 	uint64_t pte_flags;
@@ -50,7 +50,7 @@ struct kfd_bo_va_list {
 struct kgd_mem {
 	struct mutex lock;
 	struct amdgpu_bo *bo;
-	struct list_head bo_va_list;
+	struct list_head attachments;
 	/* protected by amdkfd_process_info.lock */
 	struct ttm_validate_buffer validate_list;
 	struct ttm_validate_buffer resv_list;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index dfa025d694f8..fee4c64dd051 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -72,16 +72,16 @@ static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
 	return (struct amdgpu_device *)kgd;
 }
 
-static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
+static bool kfd_mem_is_attached(struct amdgpu_vm *avm,
 		struct kgd_mem *mem)
 {
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list)
+	list_for_each_entry(entry, &mem->attachments, list)
 		if (entry->bo_va->base.vm == avm)
-			return false;
+			return true;
 
-	return true;
+	return false;
 }
 
 /* Set memory usage limits. Currently, limits are
@@ -473,7 +473,7 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
 	return pte_flags;
 }
 
-/* add_bo_to_vm - Add a BO to a VM
+/* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
  * to a VM. It can later be mapped and unmapped many times without @@ -485,15 +485,14 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
  * 4. Alloc page tables and directories if needed
  * 4a.  Validate new page tables and directories
  */
-static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
+static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		struct amdgpu_vm *vm, bool is_aql,
-		struct kfd_bo_va_list **p_bo_va_entry)
+		struct kfd_mem_attachment **p_attachment)
 {
 	int ret;
-	struct kfd_bo_va_list *bo_va_entry;
+	struct kfd_mem_attachment *attachment;
 	struct amdgpu_bo *bo = mem->bo;
 	uint64_t va = mem->va;
-	struct list_head *list_bo_va = &mem->bo_va_list;
 	unsigned long bo_size = bo->tbo.base.size;
 
 	if (!va) {
@@ -504,29 +503,29 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	if (is_aql)
 		va += bo_size;
 
-	bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
-	if (!bo_va_entry)
+	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
+	if (!attachment)
 		return -ENOMEM;
 
 	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
 			va + bo_size, vm);
 
 	/* Add BO to VM internal data structures*/
-	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
-	if (!bo_va_entry->bo_va) {
+	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
+	if (!attachment->bo_va) {
 		ret = -EINVAL;
 		pr_err("Failed to add BO object to VM. ret == %d\n",
 				ret);
 		goto err_vmadd;
 	}
 
-	bo_va_entry->va = va;
-	bo_va_entry->pte_flags = get_pte_flags(adev, mem);
-	bo_va_entry->kgd_dev = (void *)adev;
-	list_add(&bo_va_entry->bo_list, list_bo_va);
+	attachment->va = va;
+	attachment->pte_flags = get_pte_flags(adev, mem);
+	attachment->adev = adev;
+	list_add(&attachment->list, &mem->attachments);
 
-	if (p_bo_va_entry)
-		*p_bo_va_entry = bo_va_entry;
+	if (p_attachment)
+		*p_attachment = attachment;
 
 	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
@@ -538,22 +537,20 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	return 0;
 
 err_alloc_pts:
-	amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
-	list_del(&bo_va_entry->bo_list);
+	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
+	list_del(&attachment->list);
 err_vmadd:
-	kfree(bo_va_entry);
+	kfree(attachment);
 	return ret;
 }
 
-static void remove_bo_from_vm(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry, unsigned long size)
+static void kfd_mem_detach(struct kfd_mem_attachment *attachment)
 {
-	pr_debug("\t remove VA 0x%llx - 0x%llx in entry %p\n",
-			entry->va,
-			entry->va + size, entry);
-	amdgpu_vm_bo_rmv(adev, entry->bo_va);
-	list_del(&entry->bo_list);
-	kfree(entry);
+	pr_debug("\t remove VA 0x%llx in entry %p\n",
+			attachment->va, attachment);
+	amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
+	list_del(&attachment->list);
+	kfree(attachment);
 }
 
 static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
@@ -728,7 +725,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 				struct bo_vm_reservation_context *ctx)
 {
 	struct amdgpu_bo *bo = mem->bo;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	unsigned int i;
 	int ret;
 
@@ -740,7 +737,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 	INIT_LIST_HEAD(&ctx->list);
 	INIT_LIST_HEAD(&ctx->duplicates);
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+	list_for_each_entry(entry, &mem->attachments, list) {
 		if ((vm && vm != entry->bo_va->base.vm) ||
 			(entry->is_mapped != map_type
 			&& map_type != BO_VM_ALL))
@@ -762,7 +759,7 @@ static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
 	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
 
 	i = 0;
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+	list_for_each_entry(entry, &mem->attachments, list) {
 		if ((vm && vm != entry->bo_va->base.vm) ||
 			(entry->is_mapped != map_type
 			&& map_type != BO_VM_ALL))
@@ -817,7 +814,7 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
 }
 
 static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
-				struct kfd_bo_va_list *entry,
+				struct kfd_mem_attachment *entry,
 				struct amdgpu_sync *sync)
 {
 	struct amdgpu_bo_va *bo_va = entry->bo_va;
@@ -833,7 +830,7 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 }
 
 static int update_gpuvm_pte(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry,
+		struct kfd_mem_attachment *entry,
 		struct amdgpu_sync *sync)
 {
 	int ret;
@@ -850,7 +847,7 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
 }
 
 static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync,
+		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
 		bool no_update_pte)
 {
 	int ret;
@@ -1194,7 +1191,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		ret = -ENOMEM;
 		goto err;
 	}
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	mutex_init(&(*mem)->lock);
 	(*mem)->aql_queue = !!(flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM);
 
@@ -1283,7 +1280,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 {
 	struct amdkfd_process_info *process_info = mem->process_info;
 	unsigned long bo_size = mem->bo->tbo.base.size;
-	struct kfd_bo_va_list *entry, *tmp;
+	struct kfd_mem_attachment *entry, *tmp;
 	struct bo_vm_reservation_context ctx;
 	struct ttm_validate_buffer *bo_list_entry;
 	unsigned int mapped_to_gpu_memory;
@@ -1327,9 +1324,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 		mem->va + bo_size * (1 + mem->aql_queue));
 
 	/* Remove from VM internal data structures */
-	list_for_each_entry_safe(entry, tmp, &mem->bo_va_list, bo_list)
-		remove_bo_from_vm((struct amdgpu_device *)entry->kgd_dev,
-				entry, bo_size);
+	list_for_each_entry_safe(entry, tmp, &mem->attachments, list)
+		kfd_mem_detach(entry);
 
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
@@ -1372,10 +1368,10 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	int ret;
 	struct amdgpu_bo *bo;
 	uint32_t domain;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	struct bo_vm_reservation_context ctx;
-	struct kfd_bo_va_list *bo_va_entry = NULL;
-	struct kfd_bo_va_list *bo_va_entry_aql = NULL;
+	struct kfd_mem_attachment *attachment = NULL;
+	struct kfd_mem_attachment *attachment_aql = NULL;
 	unsigned long bo_size;
 	bool is_invalid_userptr = false;
 
@@ -1424,21 +1420,20 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	    bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
 		is_invalid_userptr = true;
 
-	if (check_if_add_bo_to_vm(avm, mem)) {
-		ret = add_bo_to_vm(adev, mem, avm, false,
-				&bo_va_entry);
+	if (!kfd_mem_is_attached(avm, mem)) {
+		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
 		if (ret)
-			goto add_bo_to_vm_failed;
+			goto attach_failed;
 		if (mem->aql_queue) {
-			ret = add_bo_to_vm(adev, mem, avm,
-					true, &bo_va_entry_aql);
+			ret = kfd_mem_attach(adev, mem, avm, true,
+					     &attachment_aql);
 			if (ret)
-				goto add_bo_to_vm_failed_aql;
+				goto attach_failed_aql;
 		}
 	} else {
 		ret = vm_validate_pt_pd_bos(avm);
 		if (unlikely(ret))
-			goto add_bo_to_vm_failed;
+			goto attach_failed;
 	}
 
 	if (mem->mapped_to_gpu_memory == 0 &&
@@ -1454,30 +1449,30 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		}
 	}
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
-		if (entry->bo_va->base.vm == avm && !entry->is_mapped) {
-			pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
-					entry->va, entry->va + bo_size,
-					entry);
+	list_for_each_entry(entry, &mem->attachments, list) {
+		if (entry->bo_va->base.vm != avm || entry->is_mapped)
+			continue;
 
-			ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
-					      is_invalid_userptr);
-			if (ret) {
-				pr_err("Failed to map bo to gpuvm\n");
-				goto map_bo_to_gpuvm_failed;
-			}
+		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
+			 entry->va, entry->va + bo_size, entry);
 
-			ret = vm_update_pds(avm, ctx.sync);
-			if (ret) {
-				pr_err("Failed to update page directories\n");
-				goto map_bo_to_gpuvm_failed;
-			}
+		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+				      is_invalid_userptr);
+		if (ret) {
+			pr_err("Failed to map bo to gpuvm\n");
+			goto map_bo_to_gpuvm_failed;
+		}
 
-			entry->is_mapped = true;
-			mem->mapped_to_gpu_memory++;
-			pr_debug("\t INC mapping count %d\n",
-					mem->mapped_to_gpu_memory);
+		ret = vm_update_pds(avm, ctx.sync);
+		if (ret) {
+			pr_err("Failed to update page directories\n");
+			goto map_bo_to_gpuvm_failed;
 		}
+
+		entry->is_mapped = true;
+		mem->mapped_to_gpu_memory++;
+		pr_debug("\t INC mapping count %d\n",
+			 mem->mapped_to_gpu_memory);
 	}
 
 	if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->tbo.pin_count)
@@ -1489,12 +1484,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	goto out;
 
 map_bo_to_gpuvm_failed:
-	if (bo_va_entry_aql)
-		remove_bo_from_vm(adev, bo_va_entry_aql, bo_size);
-add_bo_to_vm_failed_aql:
-	if (bo_va_entry)
-		remove_bo_from_vm(adev, bo_va_entry, bo_size);
-add_bo_to_vm_failed:
+	if (attachment_aql)
+		kfd_mem_detach(attachment_aql);
+attach_failed_aql:
+	if (attachment)
+		kfd_mem_detach(attachment);
+attach_failed:
 	unreserve_bo_and_vms(&ctx, false, false);
 out:
 	mutex_unlock(&mem->process_info->lock);
@@ -1509,7 +1504,7 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
 	struct amdkfd_process_info *process_info = avm->process_info;
 	unsigned long bo_size = mem->bo->tbo.base.size;
-	struct kfd_bo_va_list *entry;
+	struct kfd_mem_attachment *entry;
 	struct bo_vm_reservation_context ctx;
 	int ret;
 
@@ -1533,26 +1528,24 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		mem->va + bo_size * (1 + mem->aql_queue),
 		avm);
 
-	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
-		if (entry->bo_va->base.vm == avm && entry->is_mapped) {
-			pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
-					entry->va,
-					entry->va + bo_size,
-					entry);
+	list_for_each_entry(entry, &mem->attachments, list) {
+		if (entry->bo_va->base.vm != avm || !entry->is_mapped)
+			continue;
 
-			ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
-			if (ret == 0) {
-				entry->is_mapped = false;
-			} else {
-				pr_err("failed to unmap VA 0x%llx\n",
-						mem->va);
-				goto unreserve_out;
-			}
+		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
+			 entry->va, entry->va + bo_size, entry);
 
-			mem->mapped_to_gpu_memory--;
-			pr_debug("\t DEC mapping count %d\n",
-					mem->mapped_to_gpu_memory);
+		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
+		if (ret == 0) {
+			entry->is_mapped = false;
+		} else {
+			pr_err("failed to unmap VA 0x%llx\n", mem->va);
+			goto unreserve_out;
 		}
+
+		mem->mapped_to_gpu_memory--;
+		pr_debug("\t DEC mapping count %d\n",
+			 mem->mapped_to_gpu_memory);
 	}
 
 	/* If BO is unmapped from all VMs, unfence it. It can be evicted if
@@ -1701,7 +1694,7 @@ int amdgpu_amdkfd_gpuvm_import_dmabuf(struct kgd_dev *kgd,
 	if (mmap_offset)
 		*mmap_offset = amdgpu_bo_mmap_offset(bo);
 
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	mutex_init(&(*mem)->lock);
 
 	(*mem)->alloc_flags =
@@ -1898,7 +1891,7 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 	list_for_each_entry_safe(mem, tmp_mem,
 				 &process_info->userptr_inval_list,
 				 validate_list.head) {
-		struct kfd_bo_va_list *bo_va_entry;
+		struct kfd_mem_attachment *attachment;
 
 		bo = mem->bo;
 
@@ -1921,13 +1914,13 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
 		 * VM faults if the GPU tries to access the invalid
 		 * memory.
 		 */
-		list_for_each_entry(bo_va_entry, &mem->bo_va_list, bo_list) {
-			if (!bo_va_entry->is_mapped)
+		list_for_each_entry(attachment, &mem->attachments, list) {
+			if (!attachment->is_mapped)
 				continue;
 
 			ret = update_gpuvm_pte((struct amdgpu_device *)
-					       bo_va_entry->kgd_dev,
-					       bo_va_entry, &sync);
+					       attachment->adev,
+					       attachment, &sync);
 			if (ret) {
 				pr_err("%s: update PTE failed\n", __func__);
 				/* make sure this gets validated again */
@@ -2108,7 +2101,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 
 		struct amdgpu_bo *bo = mem->bo;
 		uint32_t domain = mem->domain;
-		struct kfd_bo_va_list *bo_va_entry;
+		struct kfd_mem_attachment *attachment;
 
 		total_size += amdgpu_bo_size(bo);
 
@@ -2128,11 +2121,9 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 			pr_debug("Memory eviction: Sync BO fence failed. Try again\n");
 			goto validate_map_fail;
 		}
-		list_for_each_entry(bo_va_entry, &mem->bo_va_list,
-				    bo_list) {
+		list_for_each_entry(attachment, &mem->attachments, list) {
 			ret = update_gpuvm_pte((struct amdgpu_device *)
-					      bo_va_entry->kgd_dev,
-					      bo_va_entry,
+					      attachment->adev, attachment,
 					      &sync_obj);
 			if (ret) {
 				pr_debug("Memory eviction: update PTE failed. Try again\n"); @@ -2208,7 +2199,7 @@ int amdgpu_amdkfd_add_gws_to_process(void *info, void *gws, struct kgd_mem **mem
 		return -ENOMEM;
 
 	mutex_init(&(*mem)->lock);
-	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	INIT_LIST_HEAD(&(*mem)->attachments);
 	(*mem)->bo = amdgpu_bo_ref(gws_bo);
 	(*mem)->domain = AMDGPU_GEM_DOMAIN_GWS;
 	(*mem)->process_info = process_info;
--
2.31.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 32+ messages in thread
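
For readers following the rename: the per-GPU bookkeeping this series converges
on looks roughly like the structure below. This is a condensed sketch assembled
from the hunks quoted in this thread (patch 05 later adds a "type" field), not
a verbatim copy of the amdgpu headers; the comments are editorial.

	struct kfd_mem_attachment {
		struct list_head list;		/* links into kgd_mem->attachments */
		struct amdgpu_bo_va *bo_va;	/* VM mapping of this GPU's BO */
		struct amdgpu_device *adev;	/* the GPU this attachment is for */
		bool is_mapped;
		uint64_t va;
		uint64_t pte_flags;
	};

	/* typical iteration, as in the rewritten hunks above */
	list_for_each_entry(entry, &mem->attachments, list) {
		if (entry->bo_va->base.vm != avm || !entry->is_mapped)
			continue;
		/* operate on one GPU's attachment of this kgd_mem */
	}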

* RE: [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment
  2021-04-22  1:30 ` [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment Felix Kuehling
@ 2021-05-10 22:00   ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:00 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Wednesday, April 21, 2021 8:31 PM
To: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment

For now they all reference the same BO. For correct DMA mappings they will refer to different BOs per-GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fee4c64dd051..34c9a2d0028e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -489,11 +489,11 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		struct amdgpu_vm *vm, bool is_aql,
 		struct kfd_mem_attachment **p_attachment)
 {
-	int ret;
-	struct kfd_mem_attachment *attachment;
-	struct amdgpu_bo *bo = mem->bo;
+	unsigned long bo_size = mem->bo->tbo.base.size;
 	uint64_t va = mem->va;
-	unsigned long bo_size = bo->tbo.base.size;
+	struct kfd_mem_attachment *attachment;
+	struct amdgpu_bo *bo;
+	int ret;
 
 	if (!va) {
 		pr_err("Invalid VA when adding BO to VM\n"); @@ -510,6 +510,14 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
 			va + bo_size, vm);
 
+	/* FIXME: For now all attachments use the same BO. This is incorrect
+	 * because one BO can only have one DMA mapping for one GPU. We need
+	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
+	 * will be addressed one BO-type at a time in subsequent patches.
+	 */
+	bo = mem->bo;
+	drm_gem_object_get(&bo->tbo.base);
+
 	/* Add BO to VM internal data structures*/
 	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
 	if (!attachment->bo_va) {
@@ -529,7 +537,7 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 
 	/* Allocate validate page tables if needed */
 	ret = vm_validate_pt_pd_bos(vm);
-	if (ret) {
+	if (unlikely(ret)) {
 		pr_err("validate_pt_pd_bos() failed\n");
 		goto err_alloc_pts;
 	}
@@ -540,15 +548,19 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
 	list_del(&attachment->list);
 err_vmadd:
+	drm_gem_object_put(&bo->tbo.base);
 	kfree(attachment);
 	return ret;
 }
 
 static void kfd_mem_detach(struct kfd_mem_attachment *attachment)
 {
+	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
+
 	pr_debug("\t remove VA 0x%llx in entry %p\n",
 			attachment->va, attachment);
 	amdgpu_vm_bo_rmv(attachment->adev, attachment->bo_va);
+	drm_gem_object_put(&bo->tbo.base);
 	list_del(&attachment->list);
 	kfree(attachment);
 }
--
2.31.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping
  2021-04-23  7:23     ` Felix Kuehling
@ 2021-05-10 22:03       ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:03 UTC (permalink / raw)
  To: Kuehling, Felix, Zeng, Oak, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Friday, April 23, 2021 2:23 AM
To: Zeng, Oak <Oak.Zeng@amd.com>; amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping

Am 2021-04-22 um 9:33 p.m. schrieb Zeng, Oak:
> Regards,
> Oak
>
>  
>
> On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     Do AQL queue double-mapping with a single attach call. That will make it
>     easier to create per-GPU BOs later, to be shared between the two BO VA
>     mappings on the same GPU.
>
>     Freeing the attachments is not necessary if map_to_gpu fails. These will be
>     cleaned up when the kgd_mem object is destroyed in
>     amdgpu_amdkfd_gpuvm_free_memory_of_gpu.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 103 ++++++++----------
>      1 file changed, 48 insertions(+), 55 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index 34c9a2d0028e..fbd7e786b54e 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -486,70 +486,76 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>       * 4a.  Validate new page tables and directories
>       */
>      static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>     -		struct amdgpu_vm *vm, bool is_aql,
>     -		struct kfd_mem_attachment **p_attachment)
>     +		struct amdgpu_vm *vm, bool is_aql)
>      {
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>      	uint64_t va = mem->va;
>     -	struct kfd_mem_attachment *attachment;
>     -	struct amdgpu_bo *bo;
>     -	int ret;
>     +	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
>     +	struct amdgpu_bo *bo[2] = {NULL, NULL};
>     +	int i, ret;
>
>      	if (!va) {
>      		pr_err("Invalid VA when adding BO to VM\n");
>      		return -EINVAL;
>      	}
>
>     -	if (is_aql)
>     -		va += bo_size;
>     -
>     -	attachment = kzalloc(sizeof(*attachment), GFP_KERNEL);
>     -	if (!attachment)
>     -		return -ENOMEM;
>     +	for (i = 0; i <= is_aql; i++) {
>     +		attachment[i] = kzalloc(sizeof(*attachment[i]), GFP_KERNEL);
>     +		if (unlikely(!attachment[i])) {
>     +			ret = -ENOMEM;
>     +			goto unwind;
>     +		}
>
>     -	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>     -			va + bo_size, vm);
>     +		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>     +			 va + bo_size, vm);
>
>     -	/* FIXME: For now all attachments use the same BO. This is incorrect
>     -	 * because one BO can only have one DMA mapping for one GPU. We need
>     -	 * one BO per GPU, e.g. a DMABuf import with dynamic attachment. This
>     -	 * will be addressed one BO-type at a time in subsequent patches.
>     -	 */
>     -	bo = mem->bo;
>     -	drm_gem_object_get(&bo->tbo.base);
>     +		/* FIXME: For now all attachments use the same BO. This is
>     +		 * incorrect because one BO can only have one DMA mapping
>     +		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
>     +		 * import with dynamic attachment. This will be addressed
>     +		 * one BO-type at a time in subsequent patches.
>     +		 */
>     +		bo[i] = mem->bo;
>     +		drm_gem_object_get(&bo[i]->tbo.base);
>
>     -	/* Add BO to VM internal data structures*/
>     -	attachment->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
>     -	if (!attachment->bo_va) {
>     -		ret = -EINVAL;
>     -		pr_err("Failed to add BO object to VM. ret == %d\n",
>     -				ret);
>     -		goto err_vmadd;
>     -	}
>     +		/* Add BO to VM internal data structures */
>     +		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
> Just for discussion. Are we allowed to add one bo twice to a vm? When I looked at amdgpu_vm_bo_base_init (called by amdgpu_vm_bo_add), line:
> bo->vm_bo = base;
> when you add the same bo to a vm a second time, bo->vm_bo will be overwritten. I am not sure whether this will cause an issue later.
> This is not introduced by your code. The original code (calling kfd_mem_attach twice for aql) has the same problem.

If you just add one more line of context, you'll see that bo->vm_bo is the start of a single linked list of struct amdgpu_vm_bo_base. So adding a BO to a VM multiple times just extends that single-linked list:

        base->next = bo->vm_bo;
        bo->vm_bo = base;

Regards,
  Felix
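
A minimal sketch of that list, with the field layout simplified; only the two
assignment lines above are taken from amdgpu_vm_bo_base_init:

	struct amdgpu_vm_bo_base {
		struct amdgpu_vm_bo_base *next;	/* next VM mapping of the same BO */
		struct amdgpu_bo *bo;
		/* other fields omitted */
	};

	/* adding the same BO to a VM a second time prepends another node */
	base->next = bo->vm_bo;
	bo->vm_bo = base;

	/* so every mapping stays reachable */
	for (base = bo->vm_bo; base; base = base->next)
		/* one amdgpu_vm_bo_base per amdgpu_vm_bo_add() call */;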


>     +		if (unlikely(!attachment[i]->bo_va)) {
>     +			ret = -ENOMEM;
>     +			pr_err("Failed to add BO object to VM. ret == %d\n",
>     +			       ret);
>     +			goto unwind;
>     +		}
>
>     -	attachment->va = va;
>     -	attachment->pte_flags = get_pte_flags(adev, mem);
>     -	attachment->adev = adev;
>     -	list_add(&attachment->list, &mem->attachments);
>     +		attachment[i]->va = va;
>     +		attachment[i]->pte_flags = get_pte_flags(adev, mem);
>     +		attachment[i]->adev = adev;
>     +		list_add(&attachment[i]->list, &mem->attachments);
>
>     -	if (p_attachment)
>     -		*p_attachment = attachment;
>     +		va += bo_size;
>     +	}
>
>      	/* Allocate validate page tables if needed */
>      	ret = vm_validate_pt_pd_bos(vm);
>      	if (unlikely(ret)) {
>      		pr_err("validate_pt_pd_bos() failed\n");
>     -		goto err_alloc_pts;
>     +		goto unwind;
>      	}
>
>      	return 0;
>
>     -err_alloc_pts:
>     -	amdgpu_vm_bo_rmv(adev, attachment->bo_va);
>     -	list_del(&attachment->list);
>     -err_vmadd:
>     -	drm_gem_object_put(&bo->tbo.base);
>     -	kfree(attachment);
>     +unwind:
>     +	for (; i >= 0; i--) {
>     +		if (!attachment[i])
>     +			continue;
>     +		if (attachment[i]->bo_va) {
>     +			amdgpu_vm_bo_rmv(adev, attachment[i]->bo_va);
>     +			list_del(&attachment[i]->list);
>     +		}
>     +		if (bo[i])
>     +			drm_gem_object_put(&bo[i]->tbo.base);
>     +		kfree(attachment[i]);
>     +	}
>      	return ret;
>      }
>
>     @@ -1382,8 +1388,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      	uint32_t domain;
>      	struct kfd_mem_attachment *entry;
>      	struct bo_vm_reservation_context ctx;
>     -	struct kfd_mem_attachment *attachment = NULL;
>     -	struct kfd_mem_attachment *attachment_aql = NULL;
>      	unsigned long bo_size;
>      	bool is_invalid_userptr = false;
>
>     @@ -1433,15 +1437,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      		is_invalid_userptr = true;
>
>      	if (!kfd_mem_is_attached(avm, mem)) {
>     -		ret = kfd_mem_attach(adev, mem, avm, false, &attachment);
>     +		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
>      		if (ret)
>      			goto attach_failed;
>     -		if (mem->aql_queue) {
>     -			ret = kfd_mem_attach(adev, mem, avm, true,
>     -					     &attachment_aql);
>     -			if (ret)
>     -				goto attach_failed_aql;
>     -		}
>      	} else {
>      		ret = vm_validate_pt_pd_bos(avm);
>      		if (unlikely(ret))
>     @@ -1496,11 +1494,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      	goto out;
>
>      map_bo_to_gpuvm_failed:
>     -	if (attachment_aql)
>     -		kfd_mem_detach(attachment_aql);
>     -attach_failed_aql:
>     -	if (attachment)
>     -		kfd_mem_detach(attachment);
>      attach_failed:
>      	unreserve_bo_and_vms(&ctx, false, false);
>      out:
>     -- 
>     2.31.1
>
>     _______________________________________________
>     amd-gfx mailing list
>     amd-gfx@lists.freedesktop.org
>     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers
  2021-04-27  3:41     ` Felix Kuehling
@ 2021-05-10 22:05       ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:05 UTC (permalink / raw)
  To: Kuehling, Felix, Zeng, Oak, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Monday, April 26, 2021 10:41 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers

Am 2021-04-26 um 8:09 p.m. schrieb Zeng, Oak:
> As I understand it, when one GPU maps another GPU's VRAM, that VRAM should also be mapped in the IOMMU page table. Normal GTT memory (versus userptr) also needs to be mapped in the IOMMU. But I don't see this code below.

Right, I'm not solving all problems at once. The next patch is there to handle GTT BOs.

Peer mappings of doorbells, MMIO and VRAM still need to be handled in the future. I'm trying to fix the worst issues first. This series should get 99% of real world tests working.


>  I only see you map userptrs in the IOMMU. Maybe you map them in the IOMMU at some point other than memory attachment time?
>
> Also see a nit-pick inline
>
> Regards,
> Oak
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     Add BO-type specific helper functions to DMA-map and unmap
>     kfd_mem_attachments. Implement this functionality for userptrs by creating
>     one SG BO per GPU and filling it with a DMA mapping of the pages from the
>     original mem->bo.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 146 +++++++++++++++++-
>      2 files changed, 145 insertions(+), 9 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     index c24b2478f445..63668433f5a6 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     @@ -38,11 +38,17 @@ extern uint64_t amdgpu_amdkfd_total_mem_size;
>
>      struct amdgpu_device;
>
>     +enum kfd_mem_attachment_type {
>     +	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
>     +	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
>     +};
>     +
>      struct kfd_mem_attachment {
>      	struct list_head list;
>     +	enum kfd_mem_attachment_type type;
>     +	bool is_mapped;
>      	struct amdgpu_bo_va *bo_va;
>      	struct amdgpu_device *adev;
>     -	bool is_mapped;
>      	uint64_t va;
>      	uint64_t pte_flags;
>      };
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index fbd7e786b54e..49d1af4aa5f1 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -473,12 +473,117 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>      	return pte_flags;
>      }
>
>     +static int
>     +kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>     +		       struct kfd_mem_attachment *attachment)
>     +{
>     +	enum dma_data_direction direction =
>     +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +	struct amdgpu_device *adev = attachment->adev;
>     +	struct ttm_tt *src_ttm = mem->bo->tbo.ttm;
>     +	struct ttm_tt *ttm = bo->tbo.ttm;
>     +	int ret;
>     +
>     +	ttm->sg = kmalloc(sizeof(*ttm->sg), GFP_KERNEL);
>     +	if (unlikely(!ttm->sg))
>     +		return -ENOMEM;
>     +
>     +	if (WARN_ON(ttm->num_pages != src_ttm->num_pages))
>     +		return -EINVAL;
>     +
>     +	/* Same sequence as in amdgpu_ttm_tt_pin_userptr */
>     +	ret = sg_alloc_table_from_pages(ttm->sg, src_ttm->pages,
>     +					ttm->num_pages, 0,
>     +					(u64)ttm->num_pages << PAGE_SHIFT,
>     +					GFP_KERNEL);
>     +	if (unlikely(ret))
>     +		goto release_sg;
> Should go to a label starting from kfree below?

Thanks, I'll fix that.

Regards,
  Felix


>     +
>     +	ret = dma_map_sgtable(adev->dev, ttm->sg, direction, 0);
>     +	if (unlikely(ret))
>     +		goto release_sg;
>     +
>     +	drm_prime_sg_to_dma_addr_array(ttm->sg, ttm->dma_address,
>     +				       ttm->num_pages);
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>     +	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     +	if (ret)
>     +		goto release_sg;
>     +
>     +	return 0;
>     +
>     +release_sg:
>     +	pr_err("DMA map userptr failed: %d\n", ret);
>     +	sg_free_table(ttm->sg);
>     +	kfree(ttm->sg);
>     +	ttm->sg = NULL;
>     +	return ret;
>     +}
>     +
>     +static int
>     +kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     +			  struct kfd_mem_attachment *attachment)
>     +{
>     +	switch (attachment->type) {
>     +	case KFD_MEM_ATT_SHARED:
>     +		return 0;
>     +	case KFD_MEM_ATT_USERPTR:
>     +		return kfd_mem_dmamap_userptr(mem, attachment);
>     +	default:
>     +		WARN_ON_ONCE(1);
>     +	}
>     +	return -EINVAL;
>     +}
>     +
>     +static void
>     +kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>     +			 struct kfd_mem_attachment *attachment)
>     +{
>     +	enum dma_data_direction direction =
>     +		mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     +		DMA_BIDIRECTIONAL : DMA_TO_DEVICE;
>     +	struct ttm_operation_ctx ctx = {.interruptible = false};
>     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     +	struct amdgpu_device *adev = attachment->adev;
>     +	struct ttm_tt *ttm = bo->tbo.ttm;
>     +
>     +	if (unlikely(!ttm->sg))
>     +		return;
>     +
>     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     +
>     +	dma_unmap_sgtable(adev->dev, ttm->sg, direction, 0);
>     +	sg_free_table(ttm->sg);
>     +	ttm->sg = NULL;
>     +}
>     +
>     +static void
>     +kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     +			    struct kfd_mem_attachment *attachment)
>     +{
>     +	switch (attachment->type) {
>     +	case KFD_MEM_ATT_SHARED:
>     +		break;
>     +	case KFD_MEM_ATT_USERPTR:
>     +		kfd_mem_dmaunmap_userptr(mem, attachment);
>     +		break;
>     +	default:
>     +		WARN_ON_ONCE(1);
>     +	}
>     +}
>     +
>      /* kfd_mem_attach - Add a BO to a VM
>       *
>       * Everything that needs to bo done only once when a BO is first added
>       * to a VM. It can later be mapped and unmapped many times without
>       * repeating these steps.
>       *
>     + * 0. Create BO for DMA mapping, if needed
>       * 1. Allocate and initialize BO VA entry data structure
>       * 2. Add BO to the VM
>       * 3. Determine ASIC-specific PTE flags
>     @@ -488,10 +593,12 @@ static uint64_t get_pte_flags(struct amdgpu_device *adev, struct kgd_mem *mem)
>      static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>      		struct amdgpu_vm *vm, bool is_aql)
>      {
>     +	struct amdgpu_device *bo_adev = amdgpu_ttm_adev(mem->bo->tbo.bdev);
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>      	uint64_t va = mem->va;
>      	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
>      	struct amdgpu_bo *bo[2] = {NULL, NULL};
>     +	struct drm_gem_object *gobj;
>      	int i, ret;
>
>      	if (!va) {
>     @@ -509,14 +616,37 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>      		pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
>      			 va + bo_size, vm);
>
>     -		/* FIXME: For now all attachments use the same BO. This is
>     -		 * incorrect because one BO can only have one DMA mapping
>     -		 * for one GPU. We need one BO per GPU, e.g. a DMABuf
>     -		 * import with dynamic attachment. This will be addressed
>     -		 * one BO-type at a time in subsequent patches.
>     -		 */
>     -		bo[i] = mem->bo;
>     -		drm_gem_object_get(&bo[i]->tbo.base);
>     +		if (adev == bo_adev || (mem->domain == AMDGPU_GEM_DOMAIN_VRAM &&
>     +					amdgpu_xgmi_same_hive(adev, bo_adev))) {
>     +			/* Mappings on the local GPU and VRAM mappings in the
>     +			 * local hive share the original BO
>     +			 */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = mem->bo;
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		} else if (i > 0) {
>     +			/* Multiple mappings on the same GPU share the BO */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = bo[0];
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
>     +			/* Create an SG BO to DMA-map userptrs on other GPUs */
>     +			attachment[i]->type = KFD_MEM_ATT_USERPTR;
>     +			ret = amdgpu_gem_object_create(adev, bo_size, 1,
>     +						       AMDGPU_GEM_DOMAIN_CPU,
>     +						       0, ttm_bo_type_sg,
>     +						       mem->bo->tbo.base.resv,
>     +						       &gobj);
>     +			if (ret)
>     +				goto unwind;
>     +			bo[i] = gem_to_amdgpu_bo(gobj);
>     +			bo[i]->parent = amdgpu_bo_ref(mem->bo);
>     +		} else {
>     +			/* FIXME: Need to DMA-map other BO types */
>     +			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     +			bo[i] = mem->bo;
>     +			drm_gem_object_get(&bo[i]->tbo.base);
>     +		}
>
>      		/* Add BO to VM internal data structures */
>      		attachment[i]->bo_va = amdgpu_vm_bo_add(adev, vm, bo[i]);
>     -- 
>     2.31.1
>
>     _______________________________________________
>     dri-devel mailing list
>     dri-devel@lists.freedesktop.org
>     
>     https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings
  2021-04-27  3:47     ` Felix Kuehling
@ 2021-05-10 22:06       ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:06 UTC (permalink / raw)
  To: Kuehling, Felix, Zeng, Oak, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Monday, April 26, 2021 10:48 PM
To: Zeng, Oak <Oak.Zeng@amd.com>; amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings

Am 2021-04-26 um 8:23 p.m. schrieb Zeng, Oak:
> Regards,
> Oak
>
>  
>
> On 2021-04-21, 9:31 PM, "dri-devel on behalf of Felix Kuehling" <dri-devel-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>
>     DMA map kfd_mem_attachments in update_gpuvm_pte. This function is called
>     with the BO and page tables reserved, so we can safely update the DMA
>     mapping.
>
>     DMA unmap when a BO is unmapped from a GPU and before updating mappings
>     in restore workers.
>
>     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     ---
>      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 56 ++++++++++---------
>      1 file changed, 29 insertions(+), 27 deletions(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     index 49d1af4aa5f1..7d25d886b98c 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     @@ -961,11 +961,12 @@ static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
>      	return ret;
>      }
>
>     -static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
>     +static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
>      				struct kfd_mem_attachment *entry,
>      				struct amdgpu_sync *sync)
>      {
>      	struct amdgpu_bo_va *bo_va = entry->bo_va;
>     +	struct amdgpu_device *adev = entry->adev;
>      	struct amdgpu_vm *vm = bo_va->base.vm;
>
>      	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
>     @@ -974,15 +975,20 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
>
>      	amdgpu_sync_fence(sync, bo_va->last_pt_update);
>
>     -	return 0;
>     +	kfd_mem_dmaunmap_attachment(mem, entry);
>      }
>
>     -static int update_gpuvm_pte(struct amdgpu_device *adev,
>     -		struct kfd_mem_attachment *entry,
>     -		struct amdgpu_sync *sync)
>     +static int update_gpuvm_pte(struct kgd_mem *mem,
>     +			    struct kfd_mem_attachment *entry,
>     +			    struct amdgpu_sync *sync)
>      {
>     -	int ret;
>      	struct amdgpu_bo_va *bo_va = entry->bo_va;
>     +	struct amdgpu_device *adev = entry->adev;
>     +	int ret;
>     +
>     +	ret = kfd_mem_dmamap_attachment(mem, entry);
> Should the DMA mapping be done in the kfd_mem_attach function when a memory object is attached to a VM for the first time? Since each memory object can be mapped to many GPUs or many VMs, doing the DMA mapping the first time it is attached could simplify the logic. Or even simpler, maybe we could just DMA map when a memory object is created - it wastes some IOMMU page table entries but really simplifies the logic in this patch series. I found this series is not very easy to understand.

The DMA mapping must be updated every time the physical memory allocation changes, e.g. after a BO was evicted and restored. Basically, if the physical pages of the BO change, we need to update the DMA mapping to point to those new pages. Therefore I added this in the update_gpuvm_pte function, which is called after a BO has been validated the first time, or revalidated after an eviction.

You'll also see that I call dmaunmap in the re-validation cases (in the restore workers below) to ensure that we don't leak DMA mappings.

Regards,
  Felix
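
The resulting pattern in the restore paths, condensed from the hunks quoted
below (error handling abbreviated):

	list_for_each_entry(attachment, &mem->attachments, list) {
		if (!attachment->is_mapped)
			continue;

		/* drop the DMA mapping of the old pages ... */
		kfd_mem_dmaunmap_attachment(mem, attachment);
		/* ... and map the new pages before updating the PTEs */
		ret = update_gpuvm_pte(mem, attachment, &sync_obj);
		if (ret)
			goto validate_map_fail;
	}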


>     +	if (ret)
>     +		return ret;
>
>      	/* Update the page tables  */
>      	ret = amdgpu_vm_bo_update(adev, bo_va, false);
>     @@ -994,14 +1000,15 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
>      	return amdgpu_sync_fence(sync, bo_va->last_pt_update);
>      }
>
>     -static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>     -		struct kfd_mem_attachment *entry, struct amdgpu_sync *sync,
>     -		bool no_update_pte)
>     +static int map_bo_to_gpuvm(struct kgd_mem *mem,
>     +			   struct kfd_mem_attachment *entry,
>     +			   struct amdgpu_sync *sync,
>     +			   bool no_update_pte)
>      {
>      	int ret;
>
>      	/* Set virtual address for the allocation */
>     -	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
>     +	ret = amdgpu_vm_bo_map(entry->adev, entry->bo_va, entry->va, 0,
>      			       amdgpu_bo_size(entry->bo_va->base.bo),
>      			       entry->pte_flags);
>      	if (ret) {
>     @@ -1013,7 +1020,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>      	if (no_update_pte)
>      		return 0;
>
>     -	ret = update_gpuvm_pte(adev, entry, sync);
>     +	ret = update_gpuvm_pte(mem, entry, sync);
>      	if (ret) {
>      		pr_err("update_gpuvm_pte() failed\n");
>      		goto update_gpuvm_pte_failed;
>     @@ -1022,7 +1029,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
>      	return 0;
>
>      update_gpuvm_pte_failed:
>     -	unmap_bo_from_gpuvm(adev, entry, sync);
>     +	unmap_bo_from_gpuvm(mem, entry, sync);
>      	return ret;
>      }
>
>     @@ -1596,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      		pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
>      			 entry->va, entry->va + bo_size, entry);
>
>     -		ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
>     +		ret = map_bo_to_gpuvm(mem, entry, ctx.sync,
>      				      is_invalid_userptr);
>      		if (ret) {
>      			pr_err("Failed to map bo to gpuvm\n");
>     @@ -1635,7 +1642,6 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>      int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
>      		struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv)
>      {
>     -	struct amdgpu_device *adev = get_amdgpu_device(kgd);
>      	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
>      	struct amdkfd_process_info *process_info = avm->process_info;
>      	unsigned long bo_size = mem->bo->tbo.base.size;
>     @@ -1670,13 +1676,8 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
>      		pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
>      			 entry->va, entry->va + bo_size, entry);
>
>     -		ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
>     -		if (ret == 0) {
>     -			entry->is_mapped = false;
>     -		} else {
>     -			pr_err("failed to unmap VA 0x%llx\n", mem->va);
>     -			goto unreserve_out;
>     -		}
>     +		unmap_bo_from_gpuvm(mem, entry, ctx.sync);
>     +		entry->is_mapped = false;
>
>      		mem->mapped_to_gpu_memory--;
>      		pr_debug("\t DEC mapping count %d\n",
>     @@ -2053,9 +2054,8 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
>      			if (!attachment->is_mapped)
>      				continue;
>
>     -			ret = update_gpuvm_pte((struct amdgpu_device *)
>     -					       attachment->adev,
>     -					       attachment, &sync);
>     +			kfd_mem_dmaunmap_attachment(mem, attachment);
>     +			ret = update_gpuvm_pte(mem, attachment, &sync);
>      			if (ret) {
>      				pr_err("%s: update PTE failed\n", __func__);
>      				/* make sure this gets validated again */
>     @@ -2257,9 +2257,11 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
>      			goto validate_map_fail;
>      		}
>      		list_for_each_entry(attachment, &mem->attachments, list) {
>     -			ret = update_gpuvm_pte((struct amdgpu_device *)
>     -					      attachment->adev, attachment,
>     -					      &sync_obj);
>     +			if (!attachment->is_mapped)
>     +				continue;
>     +
>     +			kfd_mem_dmaunmap_attachment(mem, attachment);
>     +			ret = update_gpuvm_pte(mem, attachment, &sync_obj);
>      			if (ret) {
>      				pr_debug("Memory eviction: update PTE failed. Try again\n");
>      				goto validate_map_fail;
>     -- 
>     2.31.1
>
>     _______________________________________________
>     dri-devel mailing list
>     dri-devel@lists.freedesktop.org
>     
>     https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation
  2021-04-22  1:30 ` [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation Felix Kuehling
@ 2021-05-10 22:06   ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:06 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Wednesday, April 21, 2021 8:31 PM
To: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation

This is needed to avoid deadlocks with DMA buf import in the next patch.
Also move PT/PD validation out of kfd_mem_attach; that way the caller can do this unconditionally.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 75 +++++++++++--------
 1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d25d886b98c..9eeedd0c7920 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -577,6 +577,34 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
 	}
 }
 
+static int
+kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
+		       struct amdgpu_bo **bo)
+{
+	unsigned long bo_size = mem->bo->tbo.base.size;
+	struct drm_gem_object *gobj;
+	int ret;
+
+	ret = amdgpu_bo_reserve(mem->bo, false);
+	if (ret)
+		return ret;
+
+	ret = amdgpu_gem_object_create(adev, bo_size, 1,
+				       AMDGPU_GEM_DOMAIN_CPU,
+				       0, ttm_bo_type_sg,
+				       mem->bo->tbo.base.resv,
+				       &gobj);
+	if (ret)
+		return ret;
+
+	amdgpu_bo_unreserve(mem->bo);
+
+	*bo = gem_to_amdgpu_bo(gobj);
+	(*bo)->parent = amdgpu_bo_ref(mem->bo);
+
+	return 0;
+}
+
 /* kfd_mem_attach - Add a BO to a VM
  *
  * Everything that needs to bo done only once when a BO is first added
@@ -598,7 +626,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 	uint64_t va = mem->va;
 	struct kfd_mem_attachment *attachment[2] = {NULL, NULL};
 	struct amdgpu_bo *bo[2] = {NULL, NULL};
-	struct drm_gem_object *gobj;
 	int i, ret;
 
 	if (!va) {
@@ -632,15 +659,9 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		} else if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
 			/* Create an SG BO to DMA-map userptrs on other GPUs */
 			attachment[i]->type = KFD_MEM_ATT_USERPTR;
-			ret = amdgpu_gem_object_create(adev, bo_size, 1,
-						       AMDGPU_GEM_DOMAIN_CPU,
-						       0, ttm_bo_type_sg,
-						       mem->bo->tbo.base.resv,
-						       &gobj);
+			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
 			if (ret)
 				goto unwind;
-			bo[i] = gem_to_amdgpu_bo(gobj);
-			bo[i]->parent = amdgpu_bo_ref(mem->bo);
 		} else {
 			/* FIXME: Need to DMA-map other BO types */
 			attachment[i]->type = KFD_MEM_ATT_SHARED;
@@ -665,13 +686,6 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
 		va += bo_size;
 	}
 
-	/* Allocate validate page tables if needed */
-	ret = vm_validate_pt_pd_bos(vm);
-	if (unlikely(ret)) {
-		pr_err("validate_pt_pd_bos() failed\n");
-		goto unwind;
-	}
-
 	return 0;
 
 unwind:
@@ -1478,12 +1492,12 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
 		mem->va + bo_size * (1 + mem->aql_queue));
 
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
 	/* Remove from VM internal data structures */
 	list_for_each_entry_safe(entry, tmp, &mem->attachments, list)
 		kfd_mem_detach(entry);
 
-	ret = unreserve_bo_and_vms(&ctx, false, false);
-
 	/* Free the sync object */
 	amdgpu_sync_free(&mem->sync);
 
@@ -1560,6 +1574,12 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 			mem->va + bo_size * (1 + mem->aql_queue),
 			avm, domain_string(domain));
 
+	if (!kfd_mem_is_attached(avm, mem)) {
+		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
+		if (ret)
+			goto out;
+	}
+
 	ret = reserve_bo_and_vm(mem, avm, &ctx);
 	if (unlikely(ret))
 		goto out;
@@ -1573,15 +1593,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	    bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
 		is_invalid_userptr = true;
 
-	if (!kfd_mem_is_attached(avm, mem)) {
-		ret = kfd_mem_attach(adev, mem, avm, mem->aql_queue);
-		if (ret)
-			goto attach_failed;
-	} else {
-		ret = vm_validate_pt_pd_bos(avm);
-		if (unlikely(ret))
-			goto attach_failed;
-	}
+	ret = vm_validate_pt_pd_bos(avm);
+	if (unlikely(ret))
+		goto out_unreserve;
 
 	if (mem->mapped_to_gpu_memory == 0 &&
 	    !amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
@@ -1592,7 +1606,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		ret = amdgpu_amdkfd_bo_validate(bo, domain, true);
 		if (ret) {
 			pr_debug("Validate failed\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 	}
 
@@ -1607,13 +1621,13 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 				      is_invalid_userptr);
 		if (ret) {
 			pr_err("Failed to map bo to gpuvm\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 
 		ret = vm_update_pds(avm, ctx.sync);
 		if (ret) {
 			pr_err("Failed to update page directories\n");
-			goto map_bo_to_gpuvm_failed;
+			goto out_unreserve;
 		}
 
 		entry->is_mapped = true;
@@ -1630,8 +1644,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 
 	goto out;
 
-map_bo_to_gpuvm_failed:
-attach_failed:
+out_unreserve:
 	unreserve_bo_and_vms(&ctx, false, false);
 out:
 	mutex_unlock(&mem->process_info->lock);
--
2.31.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs
  2021-04-27 15:08         ` Felix Kuehling
@ 2021-05-10 22:07           ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:07 UTC (permalink / raw)
  To: Kuehling, Felix, Zeng, Oak, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Tuesday, April 27, 2021 10:09 AM
To: Zeng, Oak <Oak.Zeng@amd.com>; amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs

Am 2021-04-27 um 10:29 a.m. schrieb Zeng, Oak:
> Regards,
> Oak
>
>  
>
> On 2021-04-26, 11:56 PM, "Kuehling, Felix" <Felix.Kuehling@amd.com> wrote:
>
>     Am 2021-04-26 um 8:35 p.m. schrieb Zeng, Oak:
>     > Regards,
>     > Oak 
>     >
>     >  
>     >
>     > On 2021-04-21, 9:31 PM, "amd-gfx on behalf of Felix Kuehling" <amd-gfx-bounces@lists.freedesktop.org on behalf of Felix.Kuehling@amd.com> wrote:
>     >
>     >     Use DMABufs with dynamic attachment to DMA-map GTT BOs on other GPUs.
>     >
>     >     Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>     >     ---
>     >      drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +
>     >      .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 76 ++++++++++++++++++-
>     >      2 files changed, 77 insertions(+), 1 deletion(-)
>     >
>     >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     index 63668433f5a6..b706e5a54782 100644
>     >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>     >     @@ -41,6 +41,7 @@ struct amdgpu_device;
>     >      enum kfd_mem_attachment_type {
>     >      	KFD_MEM_ATT_SHARED,	/* Share kgd_mem->bo or another attachment's */
>     >      	KFD_MEM_ATT_USERPTR,	/* SG bo to DMA map pages from a userptr bo */
>     >     +	KFD_MEM_ATT_DMABUF,	/* DMAbuf to DMA map TTM BOs */
>     >      };
>     >
>     >      struct kfd_mem_attachment {
>     >     @@ -56,6 +57,7 @@ struct kfd_mem_attachment {
>     >      struct kgd_mem {
>     >      	struct mutex lock;
>     >      	struct amdgpu_bo *bo;
>     >     +	struct dma_buf *dmabuf;
>     >      	struct list_head attachments;
>     >      	/* protected by amdkfd_process_info.lock */
>     >      	struct ttm_validate_buffer validate_list;
>     >     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     index 9eeedd0c7920..18a1f9222a59 100644
>     >     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>     >     @@ -524,6 +524,16 @@ kfd_mem_dmamap_userptr(struct kgd_mem *mem,
>     >      	return ret;
>     >      }
>     >
>     >     +static int
>     >     +kfd_mem_dmamap_dmabuf(struct kfd_mem_attachment *attachment)
>     >     +{
>     >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     >     +
>     >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_GTT);
>     >     +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     > How does this work? The function name says this is DMA-mapping a buffer, but from the implementation it is just a placement and validation.
>
>     Conceptually, calling ttm_bo_validate ensures that the BO is in the
>     specified domain, in this case GTT. Before calling validate, it can be
>     in the CPU domain, which means it may be swapped to disk so it's not GPU
>     accessible. For a DMABuf attachment, the CPU domain means, that the
>     DMABuf is not attached because the underlying memory object may be on
>     the move or swapped out.
>
>     The actual implementation of the dmabuf attachment is currently in
>     amdgpu_ttm_populate/unpopulate. This is incorrect. Patch 10 in this
>     series fixes that to move the actual dmabuf attachment into
>     amdgpu_ttm_backend_bind/unbind, which is called from amdgpu_bo_move when
>     a BO is moved between the CPU and GTT domains.
>
> Thanks for the explanation. One more thing I don't quite understand: before this series, GTT memory should already have been validated somewhere before it is mapped to the GPU. You added GTT memory validation here - will this validation be duplicated?

When you have N GPUs there are now N BOs involved. Each GPU needs its own BO because it needs its own DMA mapping. There will be one actual GTT BO that allocates physical pages in TTM. The other BOs are dmabuf imports that DMA-map the same physical pages for access by the other GPUs.

The validate call here validates one of the dmabuf imports. This does not duplicate the validation of the underlying TTM BO with the actual physical memory allocation.


>
> The function name kfd_mem_dmamap_dmabuf is still confusing since it seems to me it is only some preparation work before dynamically DMA-mapping GTT memory.

No, this series is not just preparation. It implements DMA mapping of BOs for multiple GPUs. TTM already handles DMA mapping of the memory for the device where the memory was allocated. (Yes, even GTT memory is associated with a specific GPU even though it's physically in system memory). What this patch series adds, is additional DMA mappings for the other GPUs. Without this patch, we were using the DMA mapping for GPU-1 in the page table of GPU-X, which is incorrect. It works in many cases where the DMA mapping is a direct mapping:

  * IOMMU disabled
  * IOMMU in passthrough mode

But it breaks when you have multiple GPUs with an IOMMU that's not disabled or in passthrough mode.


>  But I understand that from this series' perspective, compared to userptr (where you actually do the DMA mapping in kfd_mem_dmamap_userptr), for GTT memory you leveraged the amdgpu TTM machinery for dynamic DMA mapping. So maybe the naming here makes sense from that perspective.

Yes.


>
> Another thing related but not directly to this series: GTT memory is DMA-mapped when it is allocated. See function ttm_populate_and_map_pages calling dma_map_page. The question is, will GTT be DMA-unmapped first before it is mapped in amdgpu_ttm_backend_bind? This is existing work, not from your series. Maybe there is no issue, but I just want to make sure while we are looking at this area.

Right. The problem is that the DMA mappings only work for a specific device. Using the same DMA mapping on multiple devices is broken. The reason we got away with it for a long time is that we were running with IOMMU disabled or in passthrough mode.

Regards,
  Felix


>
>     Regards,
>       Felix
>
>
>     >     +}
>     >     +
>     >      static int
>     >      kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     >      			  struct kfd_mem_attachment *attachment)
>     >     @@ -533,6 +543,8 @@ kfd_mem_dmamap_attachment(struct kgd_mem *mem,
>     >      		return 0;
>     >      	case KFD_MEM_ATT_USERPTR:
>     >      		return kfd_mem_dmamap_userptr(mem, attachment);
>     >     +	case KFD_MEM_ATT_DMABUF:
>     >     +		return kfd_mem_dmamap_dmabuf(attachment);
>     >      	default:
>     >      		WARN_ON_ONCE(1);
>     >      	}
>     >     @@ -562,6 +574,19 @@ kfd_mem_dmaunmap_userptr(struct kgd_mem *mem,
>     >      	ttm->sg = NULL;
>     >      }
>     >
>     >     +static void
>     >     +kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
>     >     +{
>     >     +	struct ttm_operation_ctx ctx = {.interruptible = true};
>     >     +	struct amdgpu_bo *bo = attachment->bo_va->base.bo;
>     >     +
>     >     +	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>     >     +	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>     >     +	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
>     >     +	 * called
>     >     +	 */
>     >     +}
>     >     +
>     >      static void
>     >      kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     >      			    struct kfd_mem_attachment *attachment)
>     >     @@ -572,6 +597,9 @@ kfd_mem_dmaunmap_attachment(struct kgd_mem *mem,
>     >      	case KFD_MEM_ATT_USERPTR:
>     >      		kfd_mem_dmaunmap_userptr(mem, attachment);
>     >      		break;
>     >     +	case KFD_MEM_ATT_DMABUF:
>     >     +		kfd_mem_dmaunmap_dmabuf(attachment);
>     >     +		break;
>     >      	default:
>     >      		WARN_ON_ONCE(1);
>     >      	}
>     >     @@ -605,6 +633,38 @@ kfd_mem_attach_userptr(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >      	return 0;
>     >      }
>     >
>     >     +static int
>     >     +kfd_mem_attach_dmabuf(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >     +		      struct amdgpu_bo **bo)
>     >     +{
>     >     +	struct drm_gem_object *gobj;
>     >     +
>     >     +	if (!mem->dmabuf) {
>     >     +		mem->dmabuf = amdgpu_gem_prime_export(&mem->bo->tbo.base,
>     >     +			mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ?
>     >     +				DRM_RDWR : 0);
>     >     +		if (IS_ERR(mem->dmabuf)) {
>     >     +			mem->dmabuf = NULL;
>     >     +			return PTR_ERR(mem->dmabuf);
>     >     +		}
>     >     +	}
>     >     +
>     >     +	gobj = amdgpu_gem_prime_import(&adev->ddev, mem->dmabuf);
>     >     +	if (IS_ERR(gobj))
>     >     +		return PTR_ERR(gobj);
>     >     +
>     >     +	/* Import takes an extra reference on the dmabuf. Drop it now to
>     >     +	 * avoid leaking it. We only need the one reference in
>     >     +	 * kgd_mem->dmabuf.
>     >     +	 */
>     >     +	dma_buf_put(mem->dmabuf);
>     >     +
>     >     +	*bo = gem_to_amdgpu_bo(gobj);
>     >     +	(*bo)->parent = amdgpu_bo_ref(mem->bo);
>     >     +
>     >     +	return 0;
>     >     +}
>     >     +
>     >      /* kfd_mem_attach - Add a BO to a VM
>     >       *
>     >       * Everything that needs to be done only once when a BO is first added
>     >     @@ -662,8 +722,20 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem,
>     >      			ret = kfd_mem_attach_userptr(adev, mem, &bo[i]);
>     >      			if (ret)
>     >      				goto unwind;
>     >     +		} else if (mem->domain == AMDGPU_GEM_DOMAIN_GTT &&
>     >     +			   mem->bo->tbo.type != ttm_bo_type_sg) {
>     >     +			/* GTT BOs use DMA-mapping ability of dynamic-attach
>     >     +			 * DMA bufs. TODO: The same should work for VRAM on
>     >     +			 * large-BAR GPUs.
>     >     +			 */
>     >     +			attachment[i]->type = KFD_MEM_ATT_DMABUF;
>     >     +			ret = kfd_mem_attach_dmabuf(adev, mem, &bo[i]);
>     >     +			if (ret)
>     >     +				goto unwind;
>     >      		} else {
>     >     -			/* FIXME: Need to DMA-map other BO types */
>     >     +			/* FIXME: Need to DMA-map other BO types:
>     >     +			 * large-BAR VRAM, doorbells, MMIO remap
>     >     +			 */
>     >      			attachment[i]->type = KFD_MEM_ATT_SHARED;
>     >      			bo[i] = mem->bo;
>     >      			drm_gem_object_get(&bo[i]->tbo.base);
>     >     @@ -1522,6 +1594,8 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
>     >
>     >      	/* Free the BO*/
>     >      	drm_vma_node_revoke(&mem->bo->tbo.base.vma_node, drm_priv);
>     >     +	if (mem->dmabuf)
>     >     +		dma_buf_put(mem->dmabuf);
>     >      	drm_gem_object_put(&mem->bo->tbo.base);
>     >      	mutex_destroy(&mem->lock);
>     >      	kfree(mem);
>     >     -- 
>     >     2.31.1
>     >
>     >     _______________________________________________
>     >     amd-gfx mailing list
>     >     amd-gfx@lists.freedesktop.org
>     >     https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>     >
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit
  2021-04-22  1:30 ` [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit Felix Kuehling
@ 2021-05-10 22:08   ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:08 UTC (permalink / raw)
  To: Kuehling, Felix, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Kuehling, Felix
Sent: Wednesday, April 21, 2021 8:31 PM
To: amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit

Pages in SG BOs were not allocated by TTM. So don't count them against TTM's pages limit.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5d8820725b75..e8b8c3257392 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -317,9 +317,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	if (ttm_tt_is_populated(ttm))
 		return 0;
 
-	atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_add(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_add(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 
 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
 	       atomic_long_read(&ttm_dma32_pages_allocated) >
@@ -350,9 +353,12 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	return 0;
 
 error:
-	atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_sub(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 	return ret;
 }
 EXPORT_SYMBOL(ttm_tt_populate);
@@ -382,9 +388,12 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 	else
 		ttm_pool_free(&bdev->pool, ttm);
 
-	atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-	if (bdev->pool.use_dma32)
-		atomic_long_sub(ttm->num_pages, &ttm_dma32_pages_allocated);
+	if (!(ttm->page_flags & TTM_PAGE_FLAG_SG)) {
+		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
+		if (bdev->pool.use_dma32)
+			atomic_long_sub(ttm->num_pages,
+					&ttm_dma32_pages_allocated);
+	}
 
 	ttm->page_flags &= ~TTM_PAGE_FLAG_PRIV_POPULATED;
 }
--
2.31.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind
  2021-04-22 11:20   ` Christian König
@ 2021-05-10 22:09     ` Errabolu, Ramesh
  0 siblings, 0 replies; 32+ messages in thread
From: Errabolu, Ramesh @ 2021-05-10 22:09 UTC (permalink / raw)
  To: Christian König, Kuehling, Felix, amd-gfx, dri-devel

Acked-by: Ramesh Errabolu <ramesh.errabolu@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Thursday, April 22, 2021 6:20 AM
To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind

Am 22.04.21 um 03:30 schrieb Felix Kuehling:
> The dmabuf attachment should be updated by moving the SG BO to 
> DOMAIN_CPU and back to DOMAIN_GTT. This does not necessarily invoke 
> the populate/unpopulate callbacks. Do this in backend_bind/unbind instead.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  3 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 51 +++++++++----------
>   2 files changed, 25 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 18a1f9222a59..68e6ce8dcf33 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -582,9 +582,6 @@ kfd_mem_dmaunmap_dmabuf(struct kfd_mem_attachment *attachment)
>   
>   	amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>   	ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> -	/* FIXME: This does not guarantee that amdgpu_ttm_tt_unpopulate is
> -	 * called
> -	 */
>   }
>   
>   static void
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 7e7d8330d64b..fc2a8d681dbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -910,7 +910,23 @@ static int amdgpu_ttm_backend_bind(struct ttm_device *bdev,
>   			DRM_ERROR("failed to pin userptr\n");
>   			return r;
>   		}
> +	} else if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
> +		if (!ttm->sg) {
> +			struct dma_buf_attachment *attach;
> +			struct sg_table *sgt;
> +
> +			attach = gtt->gobj->import_attach;
> +			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> +			if (IS_ERR(sgt))
> +				return PTR_ERR(sgt);
> +
> +			ttm->sg = sgt;
> +		}
> +
> +		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
> +					       ttm->num_pages);
>   	}
> +
>   	if (!ttm->num_pages) {
>   		WARN(1, "nothing to bind %u pages for mreg %p back %p!\n",
>   		     ttm->num_pages, bo_mem, ttm);
> @@ -1037,8 +1053,15 @@ static void amdgpu_ttm_backend_unbind(struct ttm_device *bdev,
>   	int r;
>   
>   	/* if the pages have userptr pinning then clear that first */
> -	if (gtt->userptr)
> +	if (gtt->userptr) {
>   		amdgpu_ttm_tt_unpin_userptr(bdev, ttm);
> +	} else if (ttm->sg && gtt->gobj->import_attach) {
> +		struct dma_buf_attachment *attach;
> +
> +		attach = gtt->gobj->import_attach;
> +		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
> +		ttm->sg = NULL;
> +	}
>   
>   	if (!gtt->bound)
>   		return;
> @@ -1125,23 +1148,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
>   		return 0;
>   	}
>   
> -	if (ttm->page_flags & TTM_PAGE_FLAG_SG) {
> -		if (!ttm->sg) {
> -			struct dma_buf_attachment *attach;
> -			struct sg_table *sgt;
> -
> -			attach = gtt->gobj->import_attach;
> -			sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> -			if (IS_ERR(sgt))
> -				return PTR_ERR(sgt);
> -
> -			ttm->sg = sgt;
> -		}
> -
> -		drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
> -					       ttm->num_pages);
> +	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
>   		return 0;
> -	}
>   
>   	return ttm_pool_alloc(&adev->mman.bdev.pool, ttm, ctx);
>   }
> @@ -1165,15 +1173,6 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
>   		return;
>   	}
>   
> -	if (ttm->sg && gtt->gobj->import_attach) {
> -		struct dma_buf_attachment *attach;
> -
> -		attach = gtt->gobj->import_attach;
> -		dma_buf_unmap_attachment(attach, ttm->sg, DMA_BIDIRECTIONAL);
> -		ttm->sg = NULL;
> -		return;
> -	}
> -
>   	if (ttm->page_flags & TTM_PAGE_FLAG_SG)
>   		return;
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-05-10 22:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22  1:30 [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Felix Kuehling
2021-04-22  1:30 ` [PATCH v2 01/10] rock-dbg_defconfig: Enable Intel IOMMU Felix Kuehling
2021-04-22  1:30 ` [PATCH v2 02/10] drm/amdgpu: Rename kfd_bo_va_list to kfd_mem_attachment Felix Kuehling
2021-05-10 22:00   ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 03/10] drm/amdgpu: Keep a bo-reference per-attachment Felix Kuehling
2021-05-10 22:00   ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 04/10] drm/amdgpu: Simplify AQL queue mapping Felix Kuehling
2021-04-23  1:33   ` Zeng, Oak
2021-04-23  7:23     ` Felix Kuehling
2021-05-10 22:03       ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 05/10] drm/amdgpu: Add multi-GPU DMA mapping helpers Felix Kuehling
2021-04-27  0:09   ` Zeng, Oak
2021-04-27  3:41     ` Felix Kuehling
2021-05-10 22:05       ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 06/10] drm/amdgpu: DMA map/unmap when updating GPU mappings Felix Kuehling
2021-04-27  0:23   ` Zeng, Oak
2021-04-27  3:47     ` Felix Kuehling
2021-05-10 22:06       ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 07/10] drm/amdgpu: Move kfd_mem_attach outside reservation Felix Kuehling
2021-05-10 22:06   ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 08/10] drm/amdgpu: Add DMA mapping of GTT BOs Felix Kuehling
2021-04-27  0:35   ` Zeng, Oak
2021-04-27  3:56     ` Felix Kuehling
2021-04-27 14:29       ` Zeng, Oak
2021-04-27 15:08         ` Felix Kuehling
2021-05-10 22:07           ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 09/10] drm/ttm: Don't count pages in SG BOs against pages_limit Felix Kuehling
2021-05-10 22:08   ` Errabolu, Ramesh
2021-04-22  1:30 ` [PATCH v2 10/10] drm/amdgpu: Move dmabuf attach/detach to backend_(un)bind Felix Kuehling
2021-04-22 11:20   ` Christian König
2021-05-10 22:09     ` Errabolu, Ramesh
2021-04-27 15:16 ` [PATCH v2 00/10] Implement multi-GPU DMA mappings for KFD Zeng, Oak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).