amd-gfx.lists.freedesktop.org archive mirror
* [PATCH 0/6] Best effort contiguous VRAM allocation
@ 2024-04-12 20:12 Philip Yang
  2024-04-12 20:12 ` [PATCH 1/6] drm/amdgpu: Support " Philip Yang
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:12 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

This patch series implements a new KFD memory allocation flag for best-effort
contiguous VRAM allocation, to support peer-direct access by RDMA devices with
limited scatter-gather DMA capability.

Philip Yang (6):
  drm/amdgpu: Support contiguous VRAM allocation
  drm/amdgpu: Evict BOs from same process for contiguous allocation
  drm/amdkfd: Evict BO itself for contiguous allocation
  drm/amdkfd: Increase KFD bo restore wait time
  drm/amdgpu: Skip dma map resource for null RDMA device
  drm/amdkfd: Bump kfd version for contiguous VRAM allocation

 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 20 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 42 ++++++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 +-
 include/uapi/drm/amdgpu_drm.h                 |  5 +++
 include/uapi/linux/kfd_ioctl.h                |  4 +-
 6 files changed, 57 insertions(+), 19 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
@ 2024-04-12 20:12 ` Philip Yang
  2024-04-15 12:02   ` Christian König
  2024-04-12 20:12 ` [PATCH 2/6] drm/amdgpu: Evict BOs from same process for contiguous allocation Philip Yang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:12 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

An RDMA device with limited scatter-gather capability requires a physically
contiguous VRAM buffer for RDMA peer-direct access.

Add a new KFD memory allocation flag and store it as a new GEM BO allocation
flag. When this buffer object is pinned for export for RDMA peer-direct
access, set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag; vram_mgr then sets
the TTM_PL_FLAG_CONTIGUOUS flag to ask the VRAM buddy allocator for
contiguous VRAM.

Remove the 2 GiB maximum memory block size limit for contiguous allocation.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c     | 9 +++++++--
 include/uapi/drm/amdgpu_drm.h                    | 5 +++++
 include/uapi/linux/kfd_ioctl.h                   | 1 +
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..3523b91f8add 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1470,6 +1470,9 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)
 	if (unlikely(ret))
 		return ret;
 
+	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT)
+		bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
+
 	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
 	if (ret)
 		pr_err("Error in Pinning BO to domain: %d\n", domain);
@@ -1712,6 +1715,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 			alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
 			alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
 			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
+
+			/* For contiguous VRAM allocation */
+			if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
+				alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT;
 		}
 		xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
 					0 : fpriv->xcp_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 8db880244324..1d6e45e238e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -516,8 +516,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 
 		BUG_ON(min_block_size < mm->chunk_size);
 
-		/* Limit maximum size to 2GiB due to SG table limitations */
-		size = min(remaining_size, 2ULL << 30);
+		if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
+			size = remaining_size;
+		else
+			/* Limit maximum size to 2GiB due to SG table limitations
+			 * for non-contiguous allocation.
+			 */
+			size = min(remaining_size, 2ULL << 30);
 
 		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
 				!(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index ad21c613fec8..13645abb8e46 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -171,6 +171,11 @@ extern "C" {
  * may override the MTYPE selected in AMDGPU_VA_OP_MAP.
  */
 #define AMDGPU_GEM_CREATE_EXT_COHERENT		(1 << 15)
+/* Flag to allocate the BO with best effort for contiguous VRAM.
+ * If no contiguous VRAM is available, fall back to scattered allocation.
+ * Pinning the BO for peerdirect RDMA triggers VRAM defragmentation.
+ */
+#define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT	(1 << 16)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 2040a470ddb4..c1394c162d4e 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
 #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
 #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED	(1 << 25)
 #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT	(1 << 24)
+#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT	(1 << 23)
 
 /* Allocate memory for later SVM (shared virtual memory) mapping.
  *
-- 
2.43.2
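
Editor's note: the size selection in the amdgpu_vram_mgr.c hunk above can be
modeled in userspace as a minimal sketch. MODEL_PL_FLAG_CONTIGUOUS,
MODEL_SG_LIMIT and model_alloc_size() are hypothetical names invented for
this illustration; they only stand in for TTM_PL_FLAG_CONTIGUOUS, the
2ULL << 30 cap and the per-iteration size computation, and are not kernel
symbols.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's TTM_PL_FLAG_CONTIGUOUS bit. */
#define MODEL_PL_FLAG_CONTIGUOUS	(1u << 0)

/* 2 GiB cap, written the same way as in the patch. */
#define MODEL_SG_LIMIT			(2ULL << 30)

/*
 * Model of the size picked for one allocation step: a contiguous request
 * asks for everything that is still missing, any other request is capped
 * at 2 GiB.
 */
static uint64_t model_alloc_size(uint64_t remaining_size, uint32_t place_flags)
{
	if (place_flags & MODEL_PL_FLAG_CONTIGUOUS)
		return remaining_size;

	return remaining_size < MODEL_SG_LIMIT ? remaining_size : MODEL_SG_LIMIT;
}
```

For a 3 GiB request, model_alloc_size(3ULL << 30, 0) yields 2 GiB, while
passing MODEL_PL_FLAG_CONTIGUOUS yields the full 3 GiB in a single step.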


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/6] drm/amdgpu: Evict BOs from same process for contiguous allocation
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
  2024-04-12 20:12 ` [PATCH 1/6] drm/amdgpu: Support " Philip Yang
@ 2024-04-12 20:12 ` Philip Yang
  2024-04-12 20:12 ` [PATCH 3/6] drm/amdkfd: Evict BO itself " Philip Yang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:12 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

When TTM fails to allocate VRAM, it evicts BOs from VRAM to system memory
and then retries the allocation. This currently skips KFD BOs from the same
process, because KFD requires all BOs to be resident for user queues.

If a TTM BO requests contiguous VRAM with the TTM_PL_FLAG_CONTIGUOUS flag,
allow TTM to evict KFD BOs from the same process. This evicts the user
queues first and restores them later, after the contiguous VRAM
allocation.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a5ceec7820cf..00b8603d73e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1383,7 +1383,8 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	 */
 	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
 				DMA_RESV_USAGE_BOOKKEEP, f) {
-		if (amdkfd_fence_check_mm(f, current->mm))
+		if (amdkfd_fence_check_mm(f, current->mm) &&
+		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
 			return false;
 	}
 
-- 
2.43.2
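
Editor's note: the changed condition can be read as a small predicate. The
sketch below is a userspace model, not driver code. The name
model_fence_blocks_eviction() and the flag value are assumptions made for
this illustration, and the boolean argument stands in for the result of
amdkfd_fence_check_mm(f, current->mm) on a single fence. It covers only the
fence check touched by this patch, not the rest of
amdgpu_ttm_bo_eviction_valuable().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's TTM_PL_FLAG_CONTIGUOUS bit. */
#define MODEL_PL_FLAG_CONTIGUOUS	(1u << 0)

/*
 * Returns true when a fence makes the BO "not valuable to evict".
 * A KFD fence from the current process blocks eviction unless the
 * placement being satisfied asks for contiguous VRAM.
 */
static bool model_fence_blocks_eviction(bool fence_from_current_mm,
					uint32_t place_flags)
{
	return fence_from_current_mm &&
	       !(place_flags & MODEL_PL_FLAG_CONTIGUOUS);
}
```

Only one of the four input combinations blocks eviction: a fence from the
current process combined with a non-contiguous placement.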


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/6] drm/amdkfd: Evict BO itself for contiguous allocation
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
  2024-04-12 20:12 ` [PATCH 1/6] drm/amdgpu: Support " Philip Yang
  2024-04-12 20:12 ` [PATCH 2/6] drm/amdgpu: Evict BOs from same process for contiguous allocation Philip Yang
@ 2024-04-12 20:12 ` Philip Yang
  2024-04-12 20:12 ` [PATCH 4/6] drm/amdkfd: Increase KFD bo restore wait time Philip Yang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:12 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

If the BO pinned for RDMA is not contiguous in VRAM, evict it to system
memory first to free its VRAM space, then allocate contiguous VRAM and
move it back from system memory to VRAM.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 3523b91f8add..9506de1094ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1470,8 +1470,21 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)
 	if (unlikely(ret))
 		return ret;
 
-	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT)
+	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT) {
+		/*
+		 * If bo is not contiguous on VRAM, move to system memory first to ensure
+		 * we can get contiguous VRAM space after evicting other BOs.
+		 */
+		if (!(bo->tbo.resource->placement & TTM_PL_FLAG_CONTIGUOUS)) {
+			ret = amdgpu_amdkfd_bo_validate(bo, AMDGPU_GEM_DOMAIN_GTT, false);
+			if (unlikely(ret)) {
+				pr_debug("validate bo 0x%p to GTT failed %d\n", &bo->tbo, ret);
+				return ret;
+			}
+		}
+
 		bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
+	}
 
 	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
 	if (ret)
-- 
2.43.2
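
Editor's note: the pin-time decision in the hunk above can be summarized by
the following userspace sketch. All MODEL_* names, struct model_pin_plan
and model_plan_pin() are hypothetical and exist only for this illustration;
the flag bit values are arbitrary and do not match the kernel's. The sketch
records what the patched code decides before calling
amdgpu_bo_pin_restricted(); it does not move or pin anything.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT,
 * AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS and TTM_PL_FLAG_CONTIGUOUS. */
#define MODEL_BO_BEST_EFFORT	(1u << 0)
#define MODEL_BO_CONTIGUOUS	(1u << 1)
#define MODEL_RES_CONTIGUOUS	(1u << 0)

struct model_pin_plan {
	bool move_to_gtt_first;	/* validate to GTT before pinning */
	uint32_t bo_flags;	/* BO flags used for the pin */
};

static struct model_pin_plan model_plan_pin(uint32_t bo_flags,
					     uint32_t res_placement)
{
	struct model_pin_plan plan = { false, bo_flags };

	if (bo_flags & MODEL_BO_BEST_EFFORT) {
		/* A scattered BO is moved out of VRAM first. */
		plan.move_to_gtt_first = !(res_placement & MODEL_RES_CONTIGUOUS);
		/* The pin itself then requests contiguous VRAM. */
		plan.bo_flags |= MODEL_BO_CONTIGUOUS;
	}

	return plan;
}
```

A BO without the best-effort flag is pinned exactly as before; a
best-effort BO always gains the contiguous flag, and takes the detour
through GTT only when its current resource is not already contiguous.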


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/6] drm/amdkfd: Increase KFD bo restore wait time
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (2 preceding siblings ...)
  2024-04-12 20:12 ` [PATCH 3/6] drm/amdkfd: Evict BO itself " Philip Yang
@ 2024-04-12 20:12 ` Philip Yang
  2024-04-12 20:13 ` [PATCH 5/6] drm/amdgpu: Skip dma map resource for null RDMA device Philip Yang
  2024-04-12 20:13 ` [PATCH 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation Philip Yang
  5 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:12 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

Allocating contiguous VRAM may take TTM more than 1 second to evict BOs
for a large RDMA buffer. Because the KFD restore BO worker reserves all
KFD BOs, TTM cannot take the locks of the remaining KFD BOs to evict them,
which may cause TTM to fail to allocate contiguous VRAM.

Increase the KFD restore BO wait time to 2 seconds, long enough for the
RDMA BO pin to finish the contiguous VRAM allocation.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index a81ef232fdef..c205e2d3acf9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -698,7 +698,7 @@ struct qcm_process_device {
 /* KFD Memory Eviction */
 
 /* Approx. wait time before attempting to restore evicted BOs */
-#define PROCESS_RESTORE_TIME_MS 100
+#define PROCESS_RESTORE_TIME_MS 2000
 /* Approx. back off time if restore fails due to lack of memory */
 #define PROCESS_BACK_OFF_TIME_MS 100
 /* Approx. time before evicting the process again */
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/6] drm/amdgpu: Skip dma map resource for null RDMA device
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (3 preceding siblings ...)
  2024-04-12 20:12 ` [PATCH 4/6] drm/amdkfd: Increase KFD bo restore wait time Philip Yang
@ 2024-04-12 20:13 ` Philip Yang
  2024-04-12 20:13 ` [PATCH 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation Philip Yang
  5 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:13 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

To test RDMA with a dummy driver on a system without a NIC/RDMA device,
the get-DMA-pages path passes in a null device pointer. Skip the DMA
resource mapping in that case to avoid a null device pointer access.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 33 +++++++++++---------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 1d6e45e238e1..93fb63f4dae5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -674,12 +674,15 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 		size_t size = cursor.size;
 		dma_addr_t addr;
 
-		addr = dma_map_resource(dev, phys, size, dir,
-					DMA_ATTR_SKIP_CPU_SYNC);
-		r = dma_mapping_error(dev, addr);
-		if (r)
-			goto error_unmap;
-
+		if (dev) {
+			addr = dma_map_resource(dev, phys, size, dir,
+						DMA_ATTR_SKIP_CPU_SYNC);
+			r = dma_mapping_error(dev, addr);
+			if (r)
+				goto error_unmap;
+		} else {
+			addr = phys;
+		}
 		sg_set_page(sg, NULL, size, 0);
 		sg_dma_address(sg) = addr;
 		sg_dma_len(sg) = size;
@@ -693,10 +696,10 @@ int amdgpu_vram_mgr_alloc_sgt(struct amdgpu_device *adev,
 	for_each_sgtable_sg((*sgt), sg, i) {
 		if (!sg->length)
 			continue;
-
-		dma_unmap_resource(dev, sg->dma_address,
-				   sg->length, dir,
-				   DMA_ATTR_SKIP_CPU_SYNC);
+		if (dev)
+			dma_unmap_resource(dev, sg->dma_address,
+					   sg->length, dir,
+					   DMA_ATTR_SKIP_CPU_SYNC);
 	}
 	sg_free_table(*sgt);
 
@@ -721,10 +724,12 @@ void amdgpu_vram_mgr_free_sgt(struct device *dev,
 	struct scatterlist *sg;
 	int i;
 
-	for_each_sgtable_sg(sgt, sg, i)
-		dma_unmap_resource(dev, sg->dma_address,
-				   sg->length, dir,
-				   DMA_ATTR_SKIP_CPU_SYNC);
+	if (dev) {
+		for_each_sgtable_sg(sgt, sg, i)
+			dma_unmap_resource(dev, sg->dma_address,
+					   sg->length, dir,
+					   DMA_ATTR_SKIP_CPU_SYNC);
+	}
 	sg_free_table(sgt);
 	kfree(sgt);
 }
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 6/6] drm/amdkfd: Bump kfd version for contiguous VRAM allocation
  2024-04-12 20:12 [PATCH 0/6] Best effort contiguous VRAM allocation Philip Yang
                   ` (4 preceding siblings ...)
  2024-04-12 20:13 ` [PATCH 5/6] drm/amdgpu: Skip dma map resource for null RDMA device Philip Yang
@ 2024-04-12 20:13 ` Philip Yang
  5 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-12 20:13 UTC (permalink / raw)
  To: amd-gfx
  Cc: Felix.Kuehling, christian.koenig, Arunpravin.PaneerSelvam, Philip Yang

Bump the KFD ioctl minor version to declare support for the contiguous
VRAM allocation flag.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index c1394c162d4e..a0af2ef696ea 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add contiguous VRAM allocation flag for RDMA
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.43.2
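
Editor's note: a userspace caller would presumably gate the new flag on the
version reported by the KFD. The sketch below assumes the values proposed in
this series (minor version 16, flag bit 23); both were still under review in
this thread. The model_* helper names are hypothetical, and treating any
major version other than 1 as unsupported is a conservative assumption of
this example, not something the series states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in this series; not final at the time of the thread. */
#define MODEL_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT	(1u << 23)
#define MODEL_CONTIG_MAJOR				1u
#define MODEL_CONTIG_MIN_MINOR				16u

/* True if the reported ioctl version advertises the new flag. */
static bool model_kfd_has_contiguous_flag(uint32_t major, uint32_t minor)
{
	return major == MODEL_CONTIG_MAJOR && minor >= MODEL_CONTIG_MIN_MINOR;
}

/* Add the best-effort flag only when the kernel advertises it. */
static uint32_t model_kfd_alloc_flags(uint32_t base_flags,
				      uint32_t major, uint32_t minor)
{
	if (model_kfd_has_contiguous_flag(major, minor))
		base_flags |= MODEL_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT;

	return base_flags;
}
```

The major and minor arguments would come from struct
kfd_ioctl_get_version_args; with version 1.15 the base flags pass through
unchanged.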


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-12 20:12 ` [PATCH 1/6] drm/amdgpu: Support " Philip Yang
@ 2024-04-15 12:02   ` Christian König
  2024-04-16 14:39     ` Philip Yang
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2024-04-15 12:02 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: Felix.Kuehling, Arunpravin.PaneerSelvam

On 12.04.24 at 22:12, Philip Yang wrote:
> An RDMA device with limited scatter-gather capability requires a physically
> contiguous VRAM buffer for RDMA peer-direct access.
>
> Add a new KFD memory allocation flag and store it as a new GEM BO allocation
> flag. When this buffer object is pinned for export for RDMA peer-direct
> access, set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flag; vram_mgr then sets
> the TTM_PL_FLAG_CONTIGUOUS flag to ask the VRAM buddy allocator for
> contiguous VRAM.
>
> Remove the 2 GiB maximum memory block size limit for contiguous allocation.

I'm going to sync up with Arun on this once more, but I think we won't 
even need the new flag.

We will just downgrade the existing flag to be a best effort allocation 
for contiguous buffers and only use the TTM flag internally to signal 
that we need to alter it while pinning.

Regards,
Christian.

>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 7 +++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c     | 9 +++++++--
>   include/uapi/drm/amdgpu_drm.h                    | 5 +++++
>   include/uapi/linux/kfd_ioctl.h                   | 1 +
>   4 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 0ae9fd844623..3523b91f8add 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -1470,6 +1470,9 @@ static int amdgpu_amdkfd_gpuvm_pin_bo(struct amdgpu_bo *bo, u32 domain)
>   	if (unlikely(ret))
>   		return ret;
>   
> +	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT)
> +		bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
> +
>   	ret = amdgpu_bo_pin_restricted(bo, domain, 0, 0);
>   	if (ret)
>   		pr_err("Error in Pinning BO to domain: %d\n", domain);
> @@ -1712,6 +1715,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   			alloc_flags = AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>   			alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ?
>   			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0;
> +
> +			/* For contiguous VRAM allocation */
> +			if (flags & KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT)
> +				alloc_flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT;
>   		}
>   		xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ?
>   					0 : fpriv->xcp_id;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 8db880244324..1d6e45e238e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -516,8 +516,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>   
>   		BUG_ON(min_block_size < mm->chunk_size);
>   
> -		/* Limit maximum size to 2GiB due to SG table limitations */
> -		size = min(remaining_size, 2ULL << 30);
> +		if (place->flags & TTM_PL_FLAG_CONTIGUOUS)
> +			size = remaining_size;
> +		else
> +			/* Limit maximum size to 2GiB due to SG table limitations
> +			 * for non-contiguous allocation.
> +			 */
> +			size = min(remaining_size, 2ULL << 30);
>   
>   		if ((size >= (u64)pages_per_block << PAGE_SHIFT) &&
>   				!(size & (((u64)pages_per_block << PAGE_SHIFT) - 1)))
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index ad21c613fec8..13645abb8e46 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -171,6 +171,11 @@ extern "C" {
>    * may override the MTYPE selected in AMDGPU_VA_OP_MAP.
>    */
>   #define AMDGPU_GEM_CREATE_EXT_COHERENT		(1 << 15)
> +/* Flag to allocate the BO with best effort for contiguous VRAM.
> + * If no contiguous VRAM is available, fall back to scattered allocation.
> + * Pinning the BO for peerdirect RDMA triggers VRAM defragmentation.
> + */
> +#define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS_BEST_EFFORT	(1 << 16)
>   
>   struct drm_amdgpu_gem_create_in  {
>   	/** the requested memory size */
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 2040a470ddb4..c1394c162d4e 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -407,6 +407,7 @@ struct kfd_ioctl_acquire_vm_args {
>   #define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
>   #define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED	(1 << 25)
>   #define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT	(1 << 24)
> +#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS_BEST_EFFORT	(1 << 23)
>   
>   /* Allocate memory for later SVM (shared virtual memory) mapping.
>    *


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/6] drm/amdgpu: Support contiguous VRAM allocation
  2024-04-15 12:02   ` Christian König
@ 2024-04-16 14:39     ` Philip Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Philip Yang @ 2024-04-16 14:39 UTC (permalink / raw)
  To: Christian König, Philip Yang, amd-gfx
  Cc: Felix.Kuehling, Arunpravin.PaneerSelvam

[-- Attachment #1: Type: text/html, Size: 9620 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread
