All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem
@ 2022-05-19 16:21 Alex Sierra
  2022-05-19 16:21 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
  2022-05-19 23:09 ` [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Felix Kuehling
  0 siblings, 2 replies; 15+ messages in thread
From: Alex Sierra @ 2022-05-19 16:21 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Sierra, Philip Yang

TTM used to track the "acc_size" of all BOs internally. We needed to
keep track of it in our memory reservation to avoid TTM running out
of memory in its own accounting. However, that "acc_size" accounting
has since been removed from TTM. Therefore we don't really need to
track it any more.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 59 ++++++-------------
 1 file changed, 17 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index fada3b149361..966714dd764b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -108,21 +108,12 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
  * compromise that should work in most cases without reserving too
  * much memory for page tables unnecessarily (factor 16K, >> 14).
  */
-#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
-
-static size_t amdgpu_amdkfd_acc_size(uint64_t size)
-{
-	size >>= PAGE_SHIFT;
-	size *= sizeof(dma_addr_t) + sizeof(void *);
 
-	return __roundup_pow_of_two(sizeof(struct amdgpu_bo)) +
-		__roundup_pow_of_two(sizeof(struct ttm_tt)) +
-		PAGE_ALIGN(size);
-}
+#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
 
 /**
  * amdgpu_amdkfd_reserve_mem_limit() - Decrease available memory by size
- * of buffer including any reserved for control structures
+ * of buffer.
  *
  * @adev: Device to which allocated BO belongs to
  * @size: Size of buffer, in bytes, encapsulated by B0. This should be
@@ -136,28 +127,22 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 {
 	uint64_t reserved_for_pt =
 		ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
-	size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
+	size_t system_mem_needed, ttm_mem_needed, vram_needed;
 	int ret = 0;
 
-	acc_size = amdgpu_amdkfd_acc_size(size);
-
+	system_mem_needed = 0;
+	ttm_mem_needed = 0;
 	vram_needed = 0;
 	if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
-		system_mem_needed = acc_size + size;
-		ttm_mem_needed = acc_size + size;
+		system_mem_needed = size;
+		ttm_mem_needed = size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
-		system_mem_needed = acc_size;
-		ttm_mem_needed = acc_size;
 		vram_needed = size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
-		system_mem_needed = acc_size + size;
-		ttm_mem_needed = acc_size;
-	} else if (alloc_flag &
-		   (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
-		    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
-		system_mem_needed = acc_size;
-		ttm_mem_needed = acc_size;
-	} else {
+		system_mem_needed = size;
+	} else if (!(alloc_flag &
+				(KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+				 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
 		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
 		return -ENOMEM;
 	}
@@ -193,28 +178,18 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 static void unreserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
-	size_t acc_size;
-
-	acc_size = amdgpu_amdkfd_acc_size(size);
-
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
 
 	if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
-		kfd_mem_limit.system_mem_used -= (acc_size + size);
-		kfd_mem_limit.ttm_mem_used -= (acc_size + size);
+		kfd_mem_limit.system_mem_used -= size;
+		kfd_mem_limit.ttm_mem_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
-		kfd_mem_limit.system_mem_used -= acc_size;
-		kfd_mem_limit.ttm_mem_used -= acc_size;
 		adev->kfd.vram_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
-		kfd_mem_limit.system_mem_used -= (acc_size + size);
-		kfd_mem_limit.ttm_mem_used -= acc_size;
-	} else if (alloc_flag &
-		   (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
-		    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
-		kfd_mem_limit.system_mem_used -= acc_size;
-		kfd_mem_limit.ttm_mem_used -= acc_size;
-	} else {
+		kfd_mem_limit.system_mem_used -= size;
+	} else if (!(alloc_flag &
+				(KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
+				 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
 		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
 		goto release;
 	}
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-19 16:21 [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Alex Sierra
@ 2022-05-19 16:21 ` Alex Sierra
  2022-05-19 19:07   ` philip yang
  2022-05-19 23:08   ` Felix Kuehling
  2022-05-19 23:09 ` [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Felix Kuehling
  1 sibling, 2 replies; 15+ messages in thread
From: Alex Sierra @ 2022-05-19 16:21 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Sierra

[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 ++++++++++++++-----
 3 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..f55f34af6480 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
 
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 966714dd764b..615e2b34fe12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
  *
  * Return: returns -ENOMEM in case of error, ZERO otherwise
  */
-static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	uint64_t reserved_for_pt =
@@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
 	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
 	     kfd_mem_limit.max_ttm_mem_limit) ||
-	    (adev->kfd.vram_used + vram_needed >
-	     adev->gmc.real_vram_size - reserved_for_pt)) {
+	    (adev && (adev->kfd.vram_used + vram_needed >
+	     adev->gmc.real_vram_size - reserved_for_pt))) {
 		ret = -ENOMEM;
 		goto release;
 	}
@@ -166,7 +166,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	/* Update memory accounting by decreasing available system
 	 * memory, TTM memory and GPU memory as computed above
 	 */
-	adev->kfd.vram_used += vram_needed;
+	if (adev)
+		adev->kfd.vram_used += vram_needed;
 	kfd_mem_limit.system_mem_used += system_mem_needed;
 	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
 
@@ -175,7 +176,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	return ret;
 }
 
-static void unreserve_mem_limit(struct amdgpu_device *adev,
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
@@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		kfd_mem_limit.system_mem_used -= size;
 		kfd_mem_limit.ttm_mem_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
-		adev->kfd.vram_used -= size;
+		if (adev)
+			adev->kfd.vram_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
 		kfd_mem_limit.system_mem_used -= size;
 	} else if (!(alloc_flag &
@@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
 		goto release;
 	}
-
-	WARN_ONCE(adev->kfd.vram_used < 0,
-		  "KFD VRAM memory accounting unbalanced");
+	if (adev)
+		WARN_ONCE(adev->kfd.vram_used < 0,
+			"KFD VRAM memory accounting unbalanced");
 	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
 		  "KFD TTM memory accounting unbalanced");
 	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
@@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 	u32 alloc_flags = bo->kfd_bo->alloc_flags;
 	u64 size = amdgpu_bo_size(bo);
 
-	unreserve_mem_limit(adev, size, alloc_flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
 
 	kfree(bo->kfd_bo);
 }
@@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	/* Don't unreserve system mem limit twice */
 	goto err_reserve_limit;
 err_bo_create:
-	unreserve_mem_limit(adev, size, flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
 err_reserve_limit:
 	mutex_destroy(&(*mem)->lock);
 	if (gobj)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 835b5187f0b8..1380c2fee0dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
 
 static void svm_range_free(struct svm_range *prange)
 {
+	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+	struct kfd_process *p;
+
 	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
 		 prange->start, prange->last);
 
 	svm_range_vram_node_free(prange);
 	svm_range_free_dma_mappings(prange);
+
+	p = container_of(prange->svms, struct kfd_process, svms);
+	if (!p->xnack_enabled) {
+		pr_debug("unreserve mem limit: %lld\n", size);
+		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+	}
 	mutex_destroy(&prange->lock);
 	mutex_destroy(&prange->migrate_mutex);
 	kfree(prange);
@@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
 
 static struct
 svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
-			 uint64_t last)
+			 uint64_t last, bool is_new_alloc)
 {
 	uint64_t size = last - start + 1;
 	struct svm_range *prange;
@@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
 	if (!prange)
 		return NULL;
+
+	p = container_of(svms, struct kfd_process, svms);
+	if (!p->xnack_enabled && is_new_alloc &&
+	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
+					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
+		pr_info("SVM mapping failed, exceeds resident system memory limit\n");
+		kfree(prange);
+		return NULL;
+	}
 	prange->npages = size;
 	prange->svms = svms;
 	prange->start = start;
@@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	mutex_init(&prange->migrate_mutex);
 	mutex_init(&prange->lock);
 
-	p = container_of(svms, struct kfd_process, svms);
 	if (p->xnack_enabled)
 		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
 			    MAX_GPU_INSTANCE);
@@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
 
 	svms = prange->svms;
 	if (old_start == start)
-		*new = svm_range_new(svms, last + 1, old_last);
+		*new = svm_range_new(svms, last + 1, old_last, false);
 	else
-		*new = svm_range_new(svms, old_start, start - 1);
+		*new = svm_range_new(svms, old_start, start - 1, false);
 	if (!*new)
 		return -ENOMEM;
 
@@ -1825,7 +1843,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
 {
 	struct svm_range *new;
 
-	new = svm_range_new(old->svms, old->start, old->last);
+	new = svm_range_new(old->svms, old->start, old->last, false);
 	if (!new)
 		return NULL;
 
@@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 		/* insert a new node if needed */
 		if (node->start > start) {
-			prange = svm_range_new(svms, start, node->start - 1);
+			prange = svm_range_new(svms, start, node->start - 1, true);
 			if (!prange) {
 				r = -ENOMEM;
 				goto out;
@@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 	/* add a final range at the end if needed */
 	if (start <= last) {
-		prange = svm_range_new(svms, start, last);
+		prange = svm_range_new(svms, start, last, true);
 		if (!prange) {
 			r = -ENOMEM;
 			goto out;
@@ -2585,7 +2603,7 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
 		last = addr;
 	}
 
-	prange = svm_range_new(&p->svms, start, last);
+	prange = svm_range_new(&p->svms, start, last, true);
 	if (!prange) {
 		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
 		return NULL;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-19 16:21 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
@ 2022-05-19 19:07   ` philip yang
  2022-05-19 23:08   ` Felix Kuehling
  1 sibling, 0 replies; 15+ messages in thread
From: philip yang @ 2022-05-19 19:07 UTC (permalink / raw)
  To: Alex Sierra, amd-gfx

[-- Attachment #1: Type: text/html, Size: 9594 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-19 16:21 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
  2022-05-19 19:07   ` philip yang
@ 2022-05-19 23:08   ` Felix Kuehling
  2022-05-20 12:45     ` philip yang
  1 sibling, 1 reply; 15+ messages in thread
From: Felix Kuehling @ 2022-05-19 23:08 UTC (permalink / raw)
  To: Alex Sierra, amd-gfx

Am 2022-05-19 um 12:21 schrieb Alex Sierra:
> [WHY]
> Unified memory with xnack off should be tracked, as userptr mappings
> and legacy allocations do. To avoid oversuscribe system memory when
> xnack off.
> [How]
> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
> API and call them on every prange creation and free.
>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 ++++++++++++++-----
>   3 files changed, 43 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index f8b9f27adcf5..f55f34af6480 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>   int amdgpu_amdkfd_criu_resume(void *p);
>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
> +		uint64_t size, u32 alloc_flag);
> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
> +		uint64_t size, u32 alloc_flag);
>   
>   #if IS_ENABLED(CONFIG_HSA_AMD)
>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 966714dd764b..615e2b34fe12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>    *
>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>    */
> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   		uint64_t size, u32 alloc_flag)
>   {
>   	uint64_t reserved_for_pt =
> @@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
>   	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>   	     kfd_mem_limit.max_ttm_mem_limit) ||
> -	    (adev->kfd.vram_used + vram_needed >
> -	     adev->gmc.real_vram_size - reserved_for_pt)) {
> +	    (adev && (adev->kfd.vram_used + vram_needed >
> +	     adev->gmc.real_vram_size - reserved_for_pt))) {
>   		ret = -ENOMEM;
>   		goto release;
>   	}
> @@ -166,7 +166,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	/* Update memory accounting by decreasing available system
>   	 * memory, TTM memory and GPU memory as computed above
>   	 */
> -	adev->kfd.vram_used += vram_needed;
> +	if (adev)
> +		adev->kfd.vram_used += vram_needed;

Add a WARN_ONCE(vram_needed && !adev).


>   	kfd_mem_limit.system_mem_used += system_mem_needed;
>   	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>   
> @@ -175,7 +176,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	return ret;
>   }
>   
> -static void unreserve_mem_limit(struct amdgpu_device *adev,
> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>   		uint64_t size, u32 alloc_flag)
>   {
>   	spin_lock(&kfd_mem_limit.mem_limit_lock);
> @@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>   		kfd_mem_limit.system_mem_used -= size;
>   		kfd_mem_limit.ttm_mem_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
> -		adev->kfd.vram_used -= size;
> +		if (adev)

Add a WARN_ONCE(!adev) here.


> +			adev->kfd.vram_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>   		kfd_mem_limit.system_mem_used -= size;
>   	} else if (!(alloc_flag &
> @@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>   		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>   		goto release;
>   	}
> -
> -	WARN_ONCE(adev->kfd.vram_used < 0,
> -		  "KFD VRAM memory accounting unbalanced");
> +	if (adev)
> +		WARN_ONCE(adev->kfd.vram_used < 0,
> +			"KFD VRAM memory accounting unbalanced");

This could be simplified to WARN_ONCE(adev && adev->kfd.vram_used < 0, ...).


>   	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
>   		  "KFD TTM memory accounting unbalanced");
>   	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
> @@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
>   	u32 alloc_flags = bo->kfd_bo->alloc_flags;
>   	u64 size = amdgpu_bo_size(bo);
>   
> -	unreserve_mem_limit(adev, size, alloc_flags);
> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>   
>   	kfree(bo->kfd_bo);
>   }
> @@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   	/* Don't unreserve system mem limit twice */
>   	goto err_reserve_limit;
>   err_bo_create:
> -	unreserve_mem_limit(adev, size, flags);
> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>   err_reserve_limit:
>   	mutex_destroy(&(*mem)->lock);
>   	if (gobj)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 835b5187f0b8..1380c2fee0dc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
>   
>   static void svm_range_free(struct svm_range *prange)
>   {
> +	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
> +	struct kfd_process *p;
> +
>   	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
>   		 prange->start, prange->last);
>   
>   	svm_range_vram_node_free(prange);
>   	svm_range_free_dma_mappings(prange);
> +
> +	p = container_of(prange->svms, struct kfd_process, svms);

You could initialize p in the variable declaration above.


> +	if (!p->xnack_enabled) {
> +		pr_debug("unreserve mem limit: %lld\n", size);
> +		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
> +					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
> +	}
>   	mutex_destroy(&prange->lock);
>   	mutex_destroy(&prange->migrate_mutex);
>   	kfree(prange);
> @@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
>   
>   static struct
>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
> -			 uint64_t last)
> +			 uint64_t last, bool is_new_alloc)
>   {
>   	uint64_t size = last - start + 1;
>   	struct svm_range *prange;
> @@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>   	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>   	if (!prange)
>   		return NULL;
> +
> +	p = container_of(svms, struct kfd_process, svms);
> +	if (!p->xnack_enabled && is_new_alloc &&
> +	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
> +					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
> +		pr_info("SVM mapping failed, exceeds resident system memory limit\n");
> +		kfree(prange);
> +		return NULL;
> +	}
>   	prange->npages = size;
>   	prange->svms = svms;
>   	prange->start = start;
> @@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>   	mutex_init(&prange->migrate_mutex);
>   	mutex_init(&prange->lock);
>   
> -	p = container_of(svms, struct kfd_process, svms);
>   	if (p->xnack_enabled)
>   		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>   			    MAX_GPU_INSTANCE);
> @@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
>   
>   	svms = prange->svms;
>   	if (old_start == start)
> -		*new = svm_range_new(svms, last + 1, old_last);
> +		*new = svm_range_new(svms, last + 1, old_last, false);
>   	else
> -		*new = svm_range_new(svms, old_start, start - 1);
> +		*new = svm_range_new(svms, old_start, start - 1, false);
>   	if (!*new)
>   		return -ENOMEM;
>   
> @@ -1825,7 +1843,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>   {
>   	struct svm_range *new;
>   
> -	new = svm_range_new(old->svms, old->start, old->last);
> +	new = svm_range_new(old->svms, old->start, old->last, false);

This won't work as intended. When a range gets cloned, one of the clones 
will be freed later. So when freeing that clone, you also need to skip 
unreserving its space, because the other clone is still holding it.

Regards,
   Felix


>   	if (!new)
>   		return NULL;
>   
> @@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>   
>   		/* insert a new node if needed */
>   		if (node->start > start) {
> -			prange = svm_range_new(svms, start, node->start - 1);
> +			prange = svm_range_new(svms, start, node->start - 1, true);
>   			if (!prange) {
>   				r = -ENOMEM;
>   				goto out;
> @@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>   
>   	/* add a final range at the end if needed */
>   	if (start <= last) {
> -		prange = svm_range_new(svms, start, last);
> +		prange = svm_range_new(svms, start, last, true);
>   		if (!prange) {
>   			r = -ENOMEM;
>   			goto out;
> @@ -2585,7 +2603,7 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>   		last = addr;
>   	}
>   
> -	prange = svm_range_new(&p->svms, start, last);
> +	prange = svm_range_new(&p->svms, start, last, true);
>   	if (!prange) {
>   		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
>   		return NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem
  2022-05-19 16:21 [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Alex Sierra
  2022-05-19 16:21 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
@ 2022-05-19 23:09 ` Felix Kuehling
  1 sibling, 0 replies; 15+ messages in thread
From: Felix Kuehling @ 2022-05-19 23:09 UTC (permalink / raw)
  To: Alex Sierra, amd-gfx; +Cc: Philip Yang


Am 2022-05-19 um 12:21 schrieb Alex Sierra:
> TTM used to track the "acc_size" of all BOs internally. We needed to
> keep track of it in our memory reservation to avoid TTM running out
> of memory in its own accounting. However, that "acc_size" accounting
> has since been removed from TTM. Therefore we don't really need to
> track it any more.
>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
> Reviewed-by: Philip Yang <philip.yang@amd.com>
> ---
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 59 ++++++-------------
>   1 file changed, 17 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index fada3b149361..966714dd764b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -108,21 +108,12 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>    * compromise that should work in most cases without reserving too
>    * much memory for page tables unnecessarily (factor 16K, >> 14).
>    */
> -#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)
> -
> -static size_t amdgpu_amdkfd_acc_size(uint64_t size)
> -{
> -	size >>= PAGE_SHIFT;
> -	size *= sizeof(dma_addr_t) + sizeof(void *);
>   
> -	return __roundup_pow_of_two(sizeof(struct amdgpu_bo)) +
> -		__roundup_pow_of_two(sizeof(struct ttm_tt)) +
> -		PAGE_ALIGN(size);
> -}
> +#define ESTIMATE_PT_SIZE(mem_size) ((mem_size) >> 14)

This is weird, ESTIMATE_PT_SIZE looks unchanged, but the diff is 
removing and adding it anyway.

Anyway, the patch is
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


>   
>   /**
>    * amdgpu_amdkfd_reserve_mem_limit() - Decrease available memory by size
> - * of buffer including any reserved for control structures
> + * of buffer.
>    *
>    * @adev: Device to which allocated BO belongs to
>    * @size: Size of buffer, in bytes, encapsulated by B0. This should be
> @@ -136,28 +127,22 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   {
>   	uint64_t reserved_for_pt =
>   		ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
> -	size_t acc_size, system_mem_needed, ttm_mem_needed, vram_needed;
> +	size_t system_mem_needed, ttm_mem_needed, vram_needed;
>   	int ret = 0;
>   
> -	acc_size = amdgpu_amdkfd_acc_size(size);
> -
> +	system_mem_needed = 0;
> +	ttm_mem_needed = 0;
>   	vram_needed = 0;
>   	if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
> -		system_mem_needed = acc_size + size;
> -		ttm_mem_needed = acc_size + size;
> +		system_mem_needed = size;
> +		ttm_mem_needed = size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
> -		system_mem_needed = acc_size;
> -		ttm_mem_needed = acc_size;
>   		vram_needed = size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
> -		system_mem_needed = acc_size + size;
> -		ttm_mem_needed = acc_size;
> -	} else if (alloc_flag &
> -		   (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> -		    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> -		system_mem_needed = acc_size;
> -		ttm_mem_needed = acc_size;
> -	} else {
> +		system_mem_needed = size;
> +	} else if (!(alloc_flag &
> +				(KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> +				 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
>   		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>   		return -ENOMEM;
>   	}
> @@ -193,28 +178,18 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   static void unreserve_mem_limit(struct amdgpu_device *adev,
>   		uint64_t size, u32 alloc_flag)
>   {
> -	size_t acc_size;
> -
> -	acc_size = amdgpu_amdkfd_acc_size(size);
> -
>   	spin_lock(&kfd_mem_limit.mem_limit_lock);
>   
>   	if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
> -		kfd_mem_limit.system_mem_used -= (acc_size + size);
> -		kfd_mem_limit.ttm_mem_used -= (acc_size + size);
> +		kfd_mem_limit.system_mem_used -= size;
> +		kfd_mem_limit.ttm_mem_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
> -		kfd_mem_limit.system_mem_used -= acc_size;
> -		kfd_mem_limit.ttm_mem_used -= acc_size;
>   		adev->kfd.vram_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
> -		kfd_mem_limit.system_mem_used -= (acc_size + size);
> -		kfd_mem_limit.ttm_mem_used -= acc_size;
> -	} else if (alloc_flag &
> -		   (KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> -		    KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) {
> -		kfd_mem_limit.system_mem_used -= acc_size;
> -		kfd_mem_limit.ttm_mem_used -= acc_size;
> -	} else {
> +		kfd_mem_limit.system_mem_used -= size;
> +	} else if (!(alloc_flag &
> +				(KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL |
> +				 KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP))) {
>   		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>   		goto release;
>   	}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-19 23:08   ` Felix Kuehling
@ 2022-05-20 12:45     ` philip yang
  2022-05-20 15:12       ` Felix Kuehling
  0 siblings, 1 reply; 15+ messages in thread
From: philip yang @ 2022-05-20 12:45 UTC (permalink / raw)
  To: Felix Kuehling, Alex Sierra, amd-gfx

[-- Attachment #1: Type: text/html, Size: 20861 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-20 12:45     ` philip yang
@ 2022-05-20 15:12       ` Felix Kuehling
  2022-05-20 17:30         ` Sierra Guiza, Alejandro (Alex)
  0 siblings, 1 reply; 15+ messages in thread
From: Felix Kuehling @ 2022-05-20 15:12 UTC (permalink / raw)
  To: philip yang, Alex Sierra, amd-gfx


Am 2022-05-20 um 08:45 schrieb philip yang:
>
>
> On 2022-05-19 19:08, Felix Kuehling wrote:
>> Am 2022-05-19 um 12:21 schrieb Alex Sierra:
>>> [WHY]
>>> Unified memory with xnack off should be tracked, as userptr mappings
>>> and legacy allocations do. To avoid oversuscribe system memory when
>>> xnack off.
>>> [How]
>>> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
>>> API and call them on every prange creation and free.
>>>
>>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
>>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 
>>> ++++++++++++++-----
>>>   3 files changed, 43 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> index f8b9f27adcf5..f55f34af6480 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct 
>>> amdgpu_device *adev, struct kgd_mem *
>>>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>>>   int amdgpu_amdkfd_criu_resume(void *p);
>>>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct 
>>> amdgpu_device *adev);
>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>> +        uint64_t size, u32 alloc_flag);
>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>> +        uint64_t size, u32 alloc_flag);
>>>     #if IS_ENABLED(CONFIG_HSA_AMD)
>>>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> index 966714dd764b..615e2b34fe12 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t 
>>> size)
>>>    *
>>>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>>>    */
>>> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>           uint64_t size, u32 alloc_flag)
>>>   {
>>>       uint64_t reserved_for_pt =
>>> @@ -157,8 +157,8 @@ static int 
>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>            kfd_mem_limit.max_system_mem_limit && 
>>> !no_system_mem_limit) ||
>>>           (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>>>            kfd_mem_limit.max_ttm_mem_limit) ||
>>> -        (adev->kfd.vram_used + vram_needed >
>>> -         adev->gmc.real_vram_size - reserved_for_pt)) {
>>> +        (adev && (adev->kfd.vram_used + vram_needed >
>>> +         adev->gmc.real_vram_size - reserved_for_pt))) {
>>>           ret = -ENOMEM;
>>>           goto release;
>>>       }
>>> @@ -166,7 +166,8 @@ static int 
>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>       /* Update memory accounting by decreasing available system
>>>        * memory, TTM memory and GPU memory as computed above
>>>        */
>>> -    adev->kfd.vram_used += vram_needed;
>>> +    if (adev)
>>> +        adev->kfd.vram_used += vram_needed;
>>
>> Add a WARN_ONCE(vram_needed && !adev).
>>
>>
>>>       kfd_mem_limit.system_mem_used += system_mem_needed;
>>>       kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>>>   @@ -175,7 +176,7 @@ static int 
>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>       return ret;
>>>   }
>>>   -static void unreserve_mem_limit(struct amdgpu_device *adev,
>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>>           uint64_t size, u32 alloc_flag)
>>>   {
>>>       spin_lock(&kfd_mem_limit.mem_limit_lock);
>>> @@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct 
>>> amdgpu_device *adev,
>>>           kfd_mem_limit.system_mem_used -= size;
>>>           kfd_mem_limit.ttm_mem_used -= size;
>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>>> -        adev->kfd.vram_used -= size;
>>> +        if (adev)
>>
>> Add a WARN_ONCE(!adev) here.
>>
>>
>>> +            adev->kfd.vram_used -= size;
>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>>>           kfd_mem_limit.system_mem_used -= size;
>>>       } else if (!(alloc_flag &
>>> @@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct 
>>> amdgpu_device *adev,
>>>           pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>>>           goto release;
>>>       }
>>> -
>>> -    WARN_ONCE(adev->kfd.vram_used < 0,
>>> -          "KFD VRAM memory accounting unbalanced");
>>> +    if (adev)
>>> +        WARN_ONCE(adev->kfd.vram_used < 0,
>>> +            "KFD VRAM memory accounting unbalanced");
>>
>> This could be simplified to WARN_ONCE(adev && adev->kfd.vram_used < 
>> 0, ...).
>>
>>
>>> WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
>>>             "KFD TTM memory accounting unbalanced");
>>>       WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
>>> @@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct 
>>> amdgpu_bo *bo)
>>>       u32 alloc_flags = bo->kfd_bo->alloc_flags;
>>>       u64 size = amdgpu_bo_size(bo);
>>>   -    unreserve_mem_limit(adev, size, alloc_flags);
>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>>>         kfree(bo->kfd_bo);
>>>   }
>>> @@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>       /* Don't unreserve system mem limit twice */
>>>       goto err_reserve_limit;
>>>   err_bo_create:
>>> -    unreserve_mem_limit(adev, size, flags);
>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>>>   err_reserve_limit:
>>>       mutex_destroy(&(*mem)->lock);
>>>       if (gobj)
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 835b5187f0b8..1380c2fee0dc 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct 
>>> svm_range *prange)
>>>     static void svm_range_free(struct svm_range *prange)
>>>   {
>>> +    uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
>>> +    struct kfd_process *p;
>>> +
>>>       pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", 
>>> prange->svms, prange,
>>>            prange->start, prange->last);
>>>         svm_range_vram_node_free(prange);
>>>       svm_range_free_dma_mappings(prange);
>>> +
>>> +    p = container_of(prange->svms, struct kfd_process, svms);
>>
>> You could initialize p in the variable declaration above.
>>
>>
>>> +    if (!p->xnack_enabled) {
>>> +        pr_debug("unreserve mem limit: %lld\n", size);
>>> +        amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
>>> +                    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
>>> +    }
>>>       mutex_destroy(&prange->lock);
>>>       mutex_destroy(&prange->migrate_mutex);
>>>       kfree(prange);
>>> @@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t 
>>> *location, int32_t *prefetch_loc,
>>>     static struct
>>>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>>> -             uint64_t last)
>>> +             uint64_t last, bool is_new_alloc)
>>>   {
>>>       uint64_t size = last - start + 1;
>>>       struct svm_range *prange;
>>> @@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list 
>>> *svms, uint64_t start,
>>>       prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>>>       if (!prange)
>>>           return NULL;
>>> +
>>> +    p = container_of(svms, struct kfd_process, svms);
>>> +    if (!p->xnack_enabled && is_new_alloc &&
>>> +        amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
>>> +                        KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
>>> +        pr_info("SVM mapping failed, exceeds resident system memory 
>>> limit\n");
>>> +        kfree(prange);
>>> +        return NULL;
>>> +    }
>>>       prange->npages = size;
>>>       prange->svms = svms;
>>>       prange->start = start;
>>> @@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list 
>>> *svms, uint64_t start,
>>>       mutex_init(&prange->migrate_mutex);
>>>       mutex_init(&prange->lock);
>>>   -    p = container_of(svms, struct kfd_process, svms);
>>>       if (p->xnack_enabled)
>>>           bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>>>                   MAX_GPU_INSTANCE);
>>> @@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, 
>>> uint64_t start, uint64_t last,
>>>         svms = prange->svms;
>>>       if (old_start == start)
>>> -        *new = svm_range_new(svms, last + 1, old_last);
>>> +        *new = svm_range_new(svms, last + 1, old_last, false);
>>>       else
>>> -        *new = svm_range_new(svms, old_start, start - 1);
>>> +        *new = svm_range_new(svms, old_start, start - 1, false);
>>>       if (!*new)
>>>           return -ENOMEM;
>>>   @@ -1825,7 +1843,7 @@ static struct svm_range 
>>> *svm_range_clone(struct svm_range *old)
>>>   {
>>>       struct svm_range *new;
>>>   -    new = svm_range_new(old->svms, old->start, old->last);
>>> +    new = svm_range_new(old->svms, old->start, old->last, false);
>>
>> This won't work as intended. When a range gets cloned, one of the 
>> clones will be freed later. So when freeing that clone, you also need 
>> to skip unreserving its space, because the other clone is still 
>> holding it.
>
> Thanks Felix for catching this, svm_range_free should skip unreserving 
> the cloned ranges from remove_list, otherwise we don't account 
> overlapped head or tail range size now.
>
Also, in the error handling case of svm_range_add, we have to skip 
unreserving the cloned ranges being freed from the insert_list.

Regards,
   Felix


> Regards,
>
> Philip
>
>>
>> Regards,
>>   Felix
>>
>>
>>>       if (!new)
>>>           return NULL;
>>>   @@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, 
>>> uint64_t start, uint64_t size,
>>>             /* insert a new node if needed */
>>>           if (node->start > start) {
>>> -            prange = svm_range_new(svms, start, node->start - 1);
>>> +            prange = svm_range_new(svms, start, node->start - 1, 
>>> true);
>>>               if (!prange) {
>>>                   r = -ENOMEM;
>>>                   goto out;
>>> @@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t 
>>> start, uint64_t size,
>>>         /* add a final range at the end if needed */
>>>       if (start <= last) {
>>> -        prange = svm_range_new(svms, start, last);
>>> +        prange = svm_range_new(svms, start, last, true);
>>>           if (!prange) {
>>>               r = -ENOMEM;
>>>               goto out;
>>> @@ -2585,7 +2603,7 @@ svm_range 
>>> *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>>>           last = addr;
>>>       }
>>>   -    prange = svm_range_new(&p->svms, start, last);
>>> +    prange = svm_range_new(&p->svms, start, last, true);
>>>       if (!prange) {
>>>           pr_debug("Failed to create prange in address [0x%llx]\n", 
>>> addr);
>>>           return NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-20 15:12       ` Felix Kuehling
@ 2022-05-20 17:30         ` Sierra Guiza, Alejandro (Alex)
  0 siblings, 0 replies; 15+ messages in thread
From: Sierra Guiza, Alejandro (Alex) @ 2022-05-20 17:30 UTC (permalink / raw)
  To: Felix Kuehling, philip yang, amd-gfx


On 5/20/2022 10:12 AM, Felix Kuehling wrote:
>
> Am 2022-05-20 um 08:45 schrieb philip yang:
>>
>>
>> On 2022-05-19 19:08, Felix Kuehling wrote:
>>> Am 2022-05-19 um 12:21 schrieb Alex Sierra:
>>>> [WHY]
>>>> Unified memory with xnack off should be tracked, as userptr mappings
>>>> and legacy allocations do. To avoid oversuscribe system memory when
>>>> xnack off.
>>>> [How]
>>>> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
>>>> API and call them on every prange creation and free.
>>>>
>>>> Signed-off-by: Alex Sierra <alex.sierra@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
>>>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 
>>>> ++++++++++++++-----
>>>>   3 files changed, 43 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> index f8b9f27adcf5..f55f34af6480 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct 
>>>> amdgpu_device *adev, struct kgd_mem *
>>>>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>>>>   int amdgpu_amdkfd_criu_resume(void *p);
>>>>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct 
>>>> amdgpu_device *adev);
>>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>> +        uint64_t size, u32 alloc_flag);
>>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>>> +        uint64_t size, u32 alloc_flag);
>>>>     #if IS_ENABLED(CONFIG_HSA_AMD)
>>>>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index 966714dd764b..615e2b34fe12 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t 
>>>> size)
>>>>    *
>>>>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>>>>    */
>>>> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
>>>> *adev,
>>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>           uint64_t size, u32 alloc_flag)
>>>>   {
>>>>       uint64_t reserved_for_pt =
>>>> @@ -157,8 +157,8 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>            kfd_mem_limit.max_system_mem_limit && 
>>>> !no_system_mem_limit) ||
>>>>           (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>>>>            kfd_mem_limit.max_ttm_mem_limit) ||
>>>> -        (adev->kfd.vram_used + vram_needed >
>>>> -         adev->gmc.real_vram_size - reserved_for_pt)) {
>>>> +        (adev && (adev->kfd.vram_used + vram_needed >
>>>> +         adev->gmc.real_vram_size - reserved_for_pt))) {
>>>>           ret = -ENOMEM;
>>>>           goto release;
>>>>       }
>>>> @@ -166,7 +166,8 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>       /* Update memory accounting by decreasing available system
>>>>        * memory, TTM memory and GPU memory as computed above
>>>>        */
>>>> -    adev->kfd.vram_used += vram_needed;
>>>> +    if (adev)
>>>> +        adev->kfd.vram_used += vram_needed;
>>>
>>> Add a WARN_ONCE(vram_needed && !adev).
>>>
>>>
>>>>       kfd_mem_limit.system_mem_used += system_mem_needed;
>>>>       kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>>>>   @@ -175,7 +176,7 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>       return ret;
>>>>   }
>>>>   -static void unreserve_mem_limit(struct amdgpu_device *adev,
>>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>>>           uint64_t size, u32 alloc_flag)
>>>>   {
>>>>       spin_lock(&kfd_mem_limit.mem_limit_lock);
>>>> @@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct 
>>>> amdgpu_device *adev,
>>>>           kfd_mem_limit.system_mem_used -= size;
>>>>           kfd_mem_limit.ttm_mem_used -= size;
>>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>>>> -        adev->kfd.vram_used -= size;
>>>> +        if (adev)
>>>
>>> Add a WARN_ONCE(!adev) here.
>>>
>>>
>>>> +            adev->kfd.vram_used -= size;
>>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>>>>           kfd_mem_limit.system_mem_used -= size;
>>>>       } else if (!(alloc_flag &
>>>> @@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct 
>>>> amdgpu_device *adev,
>>>>           pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>>>>           goto release;
>>>>       }
>>>> -
>>>> -    WARN_ONCE(adev->kfd.vram_used < 0,
>>>> -          "KFD VRAM memory accounting unbalanced");
>>>> +    if (adev)
>>>> +        WARN_ONCE(adev->kfd.vram_used < 0,
>>>> +            "KFD VRAM memory accounting unbalanced");
>>>
>>> This could be simplified to WARN_ONCE(adev && adev->kfd.vram_used < 
>>> 0, ...).
>>>
>>>
>>>> WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
>>>>             "KFD TTM memory accounting unbalanced");
>>>>       WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
>>>> @@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct 
>>>> amdgpu_bo *bo)
>>>>       u32 alloc_flags = bo->kfd_bo->alloc_flags;
>>>>       u64 size = amdgpu_bo_size(bo);
>>>>   -    unreserve_mem_limit(adev, size, alloc_flags);
>>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>>>>         kfree(bo->kfd_bo);
>>>>   }
>>>> @@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>>       /* Don't unreserve system mem limit twice */
>>>>       goto err_reserve_limit;
>>>>   err_bo_create:
>>>> -    unreserve_mem_limit(adev, size, flags);
>>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>>>>   err_reserve_limit:
>>>>       mutex_destroy(&(*mem)->lock);
>>>>       if (gobj)
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> index 835b5187f0b8..1380c2fee0dc 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> @@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct 
>>>> svm_range *prange)
>>>>     static void svm_range_free(struct svm_range *prange)
>>>>   {
>>>> +    uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
>>>> +    struct kfd_process *p;
>>>> +
>>>>       pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", 
>>>> prange->svms, prange,
>>>>            prange->start, prange->last);
>>>>         svm_range_vram_node_free(prange);
>>>>       svm_range_free_dma_mappings(prange);
>>>> +
>>>> +    p = container_of(prange->svms, struct kfd_process, svms);
>>>
>>> You could initialize p in the variable declaration above.
>>>
>>>
>>>> +    if (!p->xnack_enabled) {
>>>> +        pr_debug("unreserve mem limit: %lld\n", size);
>>>> +        amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
>>>> +                    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
>>>> +    }
>>>>       mutex_destroy(&prange->lock);
>>>>       mutex_destroy(&prange->migrate_mutex);
>>>>       kfree(prange);
>>>> @@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t 
>>>> *location, int32_t *prefetch_loc,
>>>>     static struct
>>>>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t 
>>>> start,
>>>> -             uint64_t last)
>>>> +             uint64_t last, bool is_new_alloc)
>>>>   {
>>>>       uint64_t size = last - start + 1;
>>>>       struct svm_range *prange;
>>>> @@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list 
>>>> *svms, uint64_t start,
>>>>       prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>>>>       if (!prange)
>>>>           return NULL;
>>>> +
>>>> +    p = container_of(svms, struct kfd_process, svms);
>>>> +    if (!p->xnack_enabled && is_new_alloc &&
>>>> +        amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
>>>> +                        KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
>>>> +        pr_info("SVM mapping failed, exceeds resident system 
>>>> memory limit\n");
>>>> +        kfree(prange);
>>>> +        return NULL;
>>>> +    }
>>>>       prange->npages = size;
>>>>       prange->svms = svms;
>>>>       prange->start = start;
>>>> @@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list 
>>>> *svms, uint64_t start,
>>>>       mutex_init(&prange->migrate_mutex);
>>>>       mutex_init(&prange->lock);
>>>>   -    p = container_of(svms, struct kfd_process, svms);
>>>>       if (p->xnack_enabled)
>>>>           bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>>>>                   MAX_GPU_INSTANCE);
>>>> @@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, 
>>>> uint64_t start, uint64_t last,
>>>>         svms = prange->svms;
>>>>       if (old_start == start)
>>>> -        *new = svm_range_new(svms, last + 1, old_last);
>>>> +        *new = svm_range_new(svms, last + 1, old_last, false);
>>>>       else
>>>> -        *new = svm_range_new(svms, old_start, start - 1);
>>>> +        *new = svm_range_new(svms, old_start, start - 1, false);
>>>>       if (!*new)
>>>>           return -ENOMEM;
>>>>   @@ -1825,7 +1843,7 @@ static struct svm_range 
>>>> *svm_range_clone(struct svm_range *old)
>>>>   {
>>>>       struct svm_range *new;
>>>>   -    new = svm_range_new(old->svms, old->start, old->last);
>>>> +    new = svm_range_new(old->svms, old->start, old->last, false);
>>>
>>> This won't work as intended. When a range gets cloned, one of the 
>>> clones will be freed later. So when freeing that clone, you also 
>>> need to skip unreserving its space, because the other clone is still 
>>> holding it.
>>
>> Thanks Felix for catching this, svm_range_free should skip 
>> unreserving the cloned ranges from remove_list, otherwise we don't 
>> account overlapped head or tail range size now.
>>
> Also, in the error handling case of svm_range_add, we have to skip 
> unreserving the cloned ranges being freed from the insert_list.

Could we just treat the cloned prange as new reserve? Eventually it will 
be removed either by old->remove_list or insert_list in case error 
handling at svm_range_add

Regards,
Alex

>
> Regards,
>   Felix
>
>
>> Regards,
>>
>> Philip
>>
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>>       if (!new)
>>>>           return NULL;
>>>>   @@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, 
>>>> uint64_t start, uint64_t size,
>>>>             /* insert a new node if needed */
>>>>           if (node->start > start) {
>>>> -            prange = svm_range_new(svms, start, node->start - 1);
>>>> +            prange = svm_range_new(svms, start, node->start - 1, 
>>>> true);
>>>>               if (!prange) {
>>>>                   r = -ENOMEM;
>>>>                   goto out;
>>>> @@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t 
>>>> start, uint64_t size,
>>>>         /* add a final range at the end if needed */
>>>>       if (start <= last) {
>>>> -        prange = svm_range_new(svms, start, last);
>>>> +        prange = svm_range_new(svms, start, last, true);
>>>>           if (!prange) {
>>>>               r = -ENOMEM;
>>>>               goto out;
>>>> @@ -2585,7 +2603,7 @@ svm_range 
>>>> *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>>>>           last = addr;
>>>>       }
>>>>   -    prange = svm_range_new(&p->svms, start, last);
>>>> +    prange = svm_range_new(&p->svms, start, last, true);
>>>>       if (!prange) {
>>>>           pr_debug("Failed to create prange in address [0x%llx]\n", 
>>>> addr);
>>>>           return NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-06-07 17:17 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
@ 2022-06-07 22:36   ` Felix Kuehling
  0 siblings, 0 replies; 15+ messages in thread
From: Felix Kuehling @ 2022-06-07 22:36 UTC (permalink / raw)
  To: Alex Sierra, amd-gfx

Am 2022-06-07 um 13:17 schrieb Alex Sierra:
> [WHY]
> Unified memory with xnack off should be tracked, as userptr mappings
> and legacy allocations do. To avoid oversuscribe system memory when
> xnack off.
> [How]
> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
> API and call them on every prange creation and free.
>
> Signed-off-by: Alex Sierra <alex.sierra@amd.com>

I'm finding some more problems (see inline). It makes me wonder whether 
there is a good way to test this. We may need to expose the mem limits 
in debugfs so we can check for leaks.


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 ++
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 27 ++++++-----
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 47 +++++++++++++------
>   3 files changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index f8b9f27adcf5..f55f34af6480 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>   int amdgpu_amdkfd_criu_resume(void *p);
>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
> +		uint64_t size, u32 alloc_flag);
> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
> +		uint64_t size, u32 alloc_flag);
>   
>   #if IS_ENABLED(CONFIG_HSA_AMD)
>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 50730d2132a6..f13977ae4579 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>    *
>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>    */
> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   		uint64_t size, u32 alloc_flag)
>   {
>   	uint64_t reserved_for_pt =
> @@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
>   	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>   	     kfd_mem_limit.max_ttm_mem_limit) ||
> -	    (adev->kfd.vram_used + vram_needed >
> -	     adev->gmc.real_vram_size - reserved_for_pt)) {
> +	    (adev && (adev->kfd.vram_used + vram_needed >
> +	     adev->gmc.real_vram_size - reserved_for_pt))) {

The extra pair of parentheses you added around the a > b is unnecessary.


>   		ret = -ENOMEM;
>   		goto release;
>   	}
> @@ -166,7 +166,10 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	/* Update memory accounting by decreasing available system
>   	 * memory, TTM memory and GPU memory as computed above
>   	 */
> -	adev->kfd.vram_used += vram_needed;
> +	WARN_ONCE(vram_needed && !adev,
> +		  "adev reference can't be null when vram is used");
> +	if (adev)
> +		adev->kfd.vram_used += vram_needed;
>   	kfd_mem_limit.system_mem_used += system_mem_needed;
>   	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>   
> @@ -175,7 +178,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>   	return ret;
>   }
>   
> -static void unreserve_mem_limit(struct amdgpu_device *adev,
> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>   		uint64_t size, u32 alloc_flag)
>   {
>   	spin_lock(&kfd_mem_limit.mem_limit_lock);
> @@ -184,7 +187,10 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>   		kfd_mem_limit.system_mem_used -= size;
>   		kfd_mem_limit.ttm_mem_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
> -		adev->kfd.vram_used -= size;
> +		WARN_ONCE(!adev,
> +			  "adev reference can't be null when alloc mem flags vram is set");
> +		if (adev)
> +			adev->kfd.vram_used -= size;
>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>   		kfd_mem_limit.system_mem_used -= size;
>   	} else if (!(alloc_flag &
> @@ -193,11 +199,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>   		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>   		goto release;
>   	}
> -
> -	WARN_ONCE(adev->kfd.vram_used < 0,
> +	WARN_ONCE(adev && adev->kfd.vram_used < 0,
>   		  "KFD VRAM memory accounting unbalanced");
> -	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
> -		  "KFD TTM memory accounting unbalanced");
>   	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
>   		  "KFD system memory accounting unbalanced");
>   
> @@ -211,7 +214,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
>   	u32 alloc_flags = bo->kfd_bo->alloc_flags;
>   	u64 size = amdgpu_bo_size(bo);
>   
> -	unreserve_mem_limit(adev, size, alloc_flags);
> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>   
>   	kfree(bo->kfd_bo);
>   }
> @@ -1601,7 +1604,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>   	/* Don't unreserve system mem limit twice */
>   	goto err_reserve_limit;
>   err_bo_create:
> -	unreserve_mem_limit(adev, size, flags);
> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>   err_reserve_limit:
>   	mutex_destroy(&(*mem)->lock);
>   	if (gobj)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index d6fc00d51c8c..329b10d1709b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -259,13 +259,22 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
>   	}
>   }
>   
> -static void svm_range_free(struct svm_range *prange)
> +static void svm_range_free(struct svm_range *prange, bool skip_unreserve)
>   {
> +	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
> +	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
> +
>   	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
>   		 prange->start, prange->last);
>   
>   	svm_range_vram_node_free(prange);
>   	svm_range_free_dma_mappings(prange);
> +
> +	if (!skip_unreserve && !p->xnack_enabled) {
> +		pr_debug("unreserve mem limit: %lld\n", size);
> +		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
> +					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
> +	}
>   	mutex_destroy(&prange->lock);
>   	mutex_destroy(&prange->migrate_mutex);
>   	kfree(prange);
> @@ -284,7 +293,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
>   
>   static struct
>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
> -			 uint64_t last)
> +			 uint64_t last, bool is_new_alloc)
>   {
>   	uint64_t size = last - start + 1;
>   	struct svm_range *prange;
> @@ -293,6 +302,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>   	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>   	if (!prange)
>   		return NULL;
> +
> +	p = container_of(svms, struct kfd_process, svms);
> +	if (!p->xnack_enabled && is_new_alloc &&
> +	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
> +					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
> +		pr_info("SVM mapping failed, exceeds resident system memory limit\n");
> +		kfree(prange);
> +		return NULL;
> +	}
>   	prange->npages = size;
>   	prange->svms = svms;
>   	prange->start = start;
> @@ -307,7 +325,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>   	mutex_init(&prange->migrate_mutex);
>   	mutex_init(&prange->lock);
>   
> -	p = container_of(svms, struct kfd_process, svms);
>   	if (p->xnack_enabled)
>   		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>   			    MAX_GPU_INSTANCE);
> @@ -1000,9 +1017,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
>   
>   	svms = prange->svms;
>   	if (old_start == start)
> -		*new = svm_range_new(svms, last + 1, old_last);
> +		*new = svm_range_new(svms, last + 1, old_last, false);
>   	else
> -		*new = svm_range_new(svms, old_start, start - 1);
> +		*new = svm_range_new(svms, old_start, start - 1, false);
>   	if (!*new)
>   		return -ENOMEM;
>   
> @@ -1010,7 +1027,7 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
>   	if (r) {
>   		pr_debug("failed %d split [0x%llx 0x%llx] to [0x%llx 0x%llx]\n",
>   			 r, old_start, old_last, start, last);
> -		svm_range_free(*new);
> +		svm_range_free(*new, false);

This should be skip_unreserve=true because the range you're freeing was 
created without reserving space.


>   		*new = NULL;
>   	}
>   
> @@ -1825,7 +1842,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>   {
>   	struct svm_range *new;
>   
> -	new = svm_range_new(old->svms, old->start, old->last);
> +	new = svm_range_new(old->svms, old->start, old->last, false);
>   	if (!new)
>   		return NULL;
>   
> @@ -1951,7 +1968,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>   
>   		/* insert a new node if needed */
>   		if (node->start > start) {
> -			prange = svm_range_new(svms, start, node->start - 1);
> +			prange = svm_range_new(svms, start, node->start - 1, true);
>   			if (!prange) {
>   				r = -ENOMEM;
>   				goto out;
> @@ -1967,7 +1984,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>   
>   	/* add a final range at the end if needed */
>   	if (start <= last) {
> -		prange = svm_range_new(svms, start, last);
> +		prange = svm_range_new(svms, start, last, true);
>   		if (!prange) {
>   			r = -ENOMEM;
>   			goto out;
> @@ -1979,7 +1996,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>   out:
>   	if (r)
>   		list_for_each_entry_safe(prange, tmp, insert_list, list)
> -			svm_range_free(prange);
> +			svm_range_free(prange, false);

OK, so maybe we do need a flag here after all, because we need to 
unreserve space for ranges that were newly inserted (under the comment 
/* insert a new node if needed */), but skip unreserving space for 
ranges that were cloned from existing ones.

Or we need a separate new_list for real new ranges, that gets spliced 
into the insert_list in the success case, and freed with unreserve in 
the failure case.

Regards,
   Felix


>   
>   	return r;
>   }
> @@ -2026,7 +2043,7 @@ svm_range_handle_list_op(struct svm_range_list *svms, struct svm_range *prange,
>   			 svms, prange, prange->start, prange->last);
>   		svm_range_unlink(prange);
>   		svm_range_remove_notifier(prange);
> -		svm_range_free(prange);
> +		svm_range_free(prange, false);
>   		break;
>   	case SVM_OP_UPDATE_RANGE_NOTIFIER:
>   		pr_debug("update notifier 0x%p prange 0x%p [0x%lx 0x%lx]\n",
> @@ -2588,14 +2605,14 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>   		last = addr;
>   	}
>   
> -	prange = svm_range_new(&p->svms, start, last);
> +	prange = svm_range_new(&p->svms, start, last, true);
>   	if (!prange) {
>   		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
>   		return NULL;
>   	}
>   	if (kfd_process_gpuid_from_adev(p, adev, &gpuid, &gpuidx)) {
>   		pr_debug("failed to get gpuid from kgd\n");
> -		svm_range_free(prange);
> +		svm_range_free(prange, false);
>   		return NULL;
>   	}
>   
> @@ -2884,7 +2901,7 @@ void svm_range_list_fini(struct kfd_process *p)
>   	list_for_each_entry_safe(prange, next, &p->svms.list, list) {
>   		svm_range_unlink(prange);
>   		svm_range_remove_notifier(prange);
> -		svm_range_free(prange);
> +		svm_range_free(prange, false);
>   	}
>   
>   	mutex_destroy(&p->svms.lock);
> @@ -3299,7 +3316,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>   			 prange->last);
>   		svm_range_unlink(prange);
>   		svm_range_remove_notifier(prange);
> -		svm_range_free(prange);
> +		svm_range_free(prange, true);
>   	}
>   
>   	mmap_write_downgrade(mm);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-06-07 17:17 Alex Sierra
@ 2022-06-07 17:17 ` Alex Sierra
  2022-06-07 22:36   ` Felix Kuehling
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Sierra @ 2022-06-07 17:17 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Sierra

[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 ++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 27 ++++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 47 +++++++++++++------
 3 files changed, 51 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..f55f34af6480 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
 
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 50730d2132a6..f13977ae4579 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
  *
  * Return: returns -ENOMEM in case of error, ZERO otherwise
  */
-static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	uint64_t reserved_for_pt =
@@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
 	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
 	     kfd_mem_limit.max_ttm_mem_limit) ||
-	    (adev->kfd.vram_used + vram_needed >
-	     adev->gmc.real_vram_size - reserved_for_pt)) {
+	    (adev && (adev->kfd.vram_used + vram_needed >
+	     adev->gmc.real_vram_size - reserved_for_pt))) {
 		ret = -ENOMEM;
 		goto release;
 	}
@@ -166,7 +166,10 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	/* Update memory accounting by decreasing available system
 	 * memory, TTM memory and GPU memory as computed above
 	 */
-	adev->kfd.vram_used += vram_needed;
+	WARN_ONCE(vram_needed && !adev,
+		  "adev reference can't be null when vram is used");
+	if (adev)
+		adev->kfd.vram_used += vram_needed;
 	kfd_mem_limit.system_mem_used += system_mem_needed;
 	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
 
@@ -175,7 +178,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	return ret;
 }
 
-static void unreserve_mem_limit(struct amdgpu_device *adev,
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
@@ -184,7 +187,10 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		kfd_mem_limit.system_mem_used -= size;
 		kfd_mem_limit.ttm_mem_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
-		adev->kfd.vram_used -= size;
+		WARN_ONCE(!adev,
+			  "adev reference can't be null when alloc mem flags vram is set");
+		if (adev)
+			adev->kfd.vram_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
 		kfd_mem_limit.system_mem_used -= size;
 	} else if (!(alloc_flag &
@@ -193,11 +199,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
 		goto release;
 	}
-
-	WARN_ONCE(adev->kfd.vram_used < 0,
+	WARN_ONCE(adev && adev->kfd.vram_used < 0,
 		  "KFD VRAM memory accounting unbalanced");
-	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
-		  "KFD TTM memory accounting unbalanced");
 	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
 		  "KFD system memory accounting unbalanced");
 
@@ -211,7 +214,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 	u32 alloc_flags = bo->kfd_bo->alloc_flags;
 	u64 size = amdgpu_bo_size(bo);
 
-	unreserve_mem_limit(adev, size, alloc_flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
 
 	kfree(bo->kfd_bo);
 }
@@ -1601,7 +1604,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	/* Don't unreserve system mem limit twice */
 	goto err_reserve_limit;
 err_bo_create:
-	unreserve_mem_limit(adev, size, flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
 err_reserve_limit:
 	mutex_destroy(&(*mem)->lock);
 	if (gobj)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d6fc00d51c8c..329b10d1709b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -259,13 +259,22 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
 	}
 }
 
-static void svm_range_free(struct svm_range *prange)
+static void svm_range_free(struct svm_range *prange, bool skip_unreserve)
 {
+	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
+
 	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
 		 prange->start, prange->last);
 
 	svm_range_vram_node_free(prange);
 	svm_range_free_dma_mappings(prange);
+
+	if (!skip_unreserve && !p->xnack_enabled) {
+		pr_debug("unreserve mem limit: %lld\n", size);
+		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+	}
 	mutex_destroy(&prange->lock);
 	mutex_destroy(&prange->migrate_mutex);
 	kfree(prange);
@@ -284,7 +293,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
 
 static struct
 svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
-			 uint64_t last)
+			 uint64_t last, bool is_new_alloc)
 {
 	uint64_t size = last - start + 1;
 	struct svm_range *prange;
@@ -293,6 +302,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
 	if (!prange)
 		return NULL;
+
+	p = container_of(svms, struct kfd_process, svms);
+	if (!p->xnack_enabled && is_new_alloc &&
+	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
+					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
+		pr_info("SVM mapping failed, exceeds resident system memory limit\n");
+		kfree(prange);
+		return NULL;
+	}
 	prange->npages = size;
 	prange->svms = svms;
 	prange->start = start;
@@ -307,7 +325,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	mutex_init(&prange->migrate_mutex);
 	mutex_init(&prange->lock);
 
-	p = container_of(svms, struct kfd_process, svms);
 	if (p->xnack_enabled)
 		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
 			    MAX_GPU_INSTANCE);
@@ -1000,9 +1017,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
 
 	svms = prange->svms;
 	if (old_start == start)
-		*new = svm_range_new(svms, last + 1, old_last);
+		*new = svm_range_new(svms, last + 1, old_last, false);
 	else
-		*new = svm_range_new(svms, old_start, start - 1);
+		*new = svm_range_new(svms, old_start, start - 1, false);
 	if (!*new)
 		return -ENOMEM;
 
@@ -1010,7 +1027,7 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
 	if (r) {
 		pr_debug("failed %d split [0x%llx 0x%llx] to [0x%llx 0x%llx]\n",
 			 r, old_start, old_last, start, last);
-		svm_range_free(*new);
+		svm_range_free(*new, false);
 		*new = NULL;
 	}
 
@@ -1825,7 +1842,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
 {
 	struct svm_range *new;
 
-	new = svm_range_new(old->svms, old->start, old->last);
+	new = svm_range_new(old->svms, old->start, old->last, false);
 	if (!new)
 		return NULL;
 
@@ -1951,7 +1968,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 		/* insert a new node if needed */
 		if (node->start > start) {
-			prange = svm_range_new(svms, start, node->start - 1);
+			prange = svm_range_new(svms, start, node->start - 1, true);
 			if (!prange) {
 				r = -ENOMEM;
 				goto out;
@@ -1967,7 +1984,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 	/* add a final range at the end if needed */
 	if (start <= last) {
-		prange = svm_range_new(svms, start, last);
+		prange = svm_range_new(svms, start, last, true);
 		if (!prange) {
 			r = -ENOMEM;
 			goto out;
@@ -1979,7 +1996,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 out:
 	if (r)
 		list_for_each_entry_safe(prange, tmp, insert_list, list)
-			svm_range_free(prange);
+			svm_range_free(prange, false);
 
 	return r;
 }
@@ -2026,7 +2043,7 @@ svm_range_handle_list_op(struct svm_range_list *svms, struct svm_range *prange,
 			 svms, prange, prange->start, prange->last);
 		svm_range_unlink(prange);
 		svm_range_remove_notifier(prange);
-		svm_range_free(prange);
+		svm_range_free(prange, false);
 		break;
 	case SVM_OP_UPDATE_RANGE_NOTIFIER:
 		pr_debug("update notifier 0x%p prange 0x%p [0x%lx 0x%lx]\n",
@@ -2588,14 +2605,14 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
 		last = addr;
 	}
 
-	prange = svm_range_new(&p->svms, start, last);
+	prange = svm_range_new(&p->svms, start, last, true);
 	if (!prange) {
 		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
 		return NULL;
 	}
 	if (kfd_process_gpuid_from_adev(p, adev, &gpuid, &gpuidx)) {
 		pr_debug("failed to get gpuid from kgd\n");
-		svm_range_free(prange);
+		svm_range_free(prange, false);
 		return NULL;
 	}
 
@@ -2884,7 +2901,7 @@ void svm_range_list_fini(struct kfd_process *p)
 	list_for_each_entry_safe(prange, next, &p->svms.list, list) {
 		svm_range_unlink(prange);
 		svm_range_remove_notifier(prange);
-		svm_range_free(prange);
+		svm_range_free(prange, false);
 	}
 
 	mutex_destroy(&p->svms.lock);
@@ -3299,7 +3316,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 			 prange->last);
 		svm_range_unlink(prange);
 		svm_range_remove_notifier(prange);
-		svm_range_free(prange);
+		svm_range_free(prange, true);
 	}
 
 	mmap_write_downgrade(mm);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-19 14:01       ` philip yang
@ 2022-05-19 15:14         ` Felix Kuehling
  0 siblings, 0 replies; 15+ messages in thread
From: Felix Kuehling @ 2022-05-19 15:14 UTC (permalink / raw)
  To: philip yang, Alex Sierra, amd-gfx


Am 2022-05-19 um 10:01 schrieb philip yang:
>
>
> On 2022-05-18 17:40, Felix Kuehling wrote:
>>
>> On 2022-05-18 14:36, philip yang wrote:
>>>
>>>
>>> On 2022-05-17 19:11, Alex Sierra wrote:
>>>> [WHY]
>>>> Unified memory with xnack off should be tracked, as userptr mappings
>>>> and legacy allocations do. To avoid oversuscribe system memory when
>>>> xnack off.
>>>> [How]
>>>> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
>>>> API and call them on every prange creation and free.
>>>>
>>>> Signed-off-by: Alex Sierra<alex.sierra@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
>>>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 
>>>> ++++++++++++++-----
>>>>   3 files changed, 43 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> index f8b9f27adcf5..f55f34af6480 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>>>> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct 
>>>> amdgpu_device *adev, struct kgd_mem *
>>>>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>>>>   int amdgpu_amdkfd_criu_resume(void *p);
>>>>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct 
>>>> amdgpu_device *adev);
>>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>> +        uint64_t size, u32 alloc_flag);
>>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>>> +        uint64_t size, u32 alloc_flag);
>>>>     #if IS_ENABLED(CONFIG_HSA_AMD)
>>>>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index e985cf9c7ec0..b2236159ff39 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t 
>>>> size)
>>>>    *
>>>>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>>>>    */
>>>> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device 
>>>> *adev,
>>>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>           uint64_t size, u32 alloc_flag)
>>>>   {
>>>>       uint64_t reserved_for_pt =
>>>> @@ -157,8 +157,8 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>            kfd_mem_limit.max_system_mem_limit && 
>>>> !no_system_mem_limit) ||
>>>>           (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>>>>            kfd_mem_limit.max_ttm_mem_limit) ||
>>>> -        (adev->kfd.vram_used + vram_needed >
>>>> -         adev->gmc.real_vram_size - reserved_for_pt)) {
>>>> +        (adev && (adev->kfd.vram_used + vram_needed >
>>>> +         adev->gmc.real_vram_size - reserved_for_pt))) {
>>>>           ret = -ENOMEM;
>>>>           goto release;
>>>>       }
>>>> @@ -166,7 +166,8 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>       /* Update memory accounting by decreasing available system
>>>>        * memory, TTM memory and GPU memory as computed above
>>>>        */
>>>> -    adev->kfd.vram_used += vram_needed;
>>>> +    if (adev)
>>>> +        adev->kfd.vram_used += vram_needed;
>>>>       kfd_mem_limit.system_mem_used += system_mem_needed;
>>>>       kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>>>>   @@ -175,7 +176,7 @@ static int 
>>>> amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>>>       return ret;
>>>>   }
>>>>   -static void unreserve_mem_limit(struct amdgpu_device *adev,
>>>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>>>           uint64_t size, u32 alloc_flag)
>>>>   {
>>>>       spin_lock(&kfd_mem_limit.mem_limit_lock);
>>>> @@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct 
>>>> amdgpu_device *adev,
>>>>           kfd_mem_limit.system_mem_used -= size;
>>>>           kfd_mem_limit.ttm_mem_used -= size;
>>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>>>> -        adev->kfd.vram_used -= size;
>>>> +        if (adev)
>>>> +            adev->kfd.vram_used -= size;
>>>>       } else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>>>>           kfd_mem_limit.system_mem_used -= size;
>>>>       } else if (!(alloc_flag &
>>>> @@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct 
>>>> amdgpu_device *adev,
>>>>           pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>>>>           goto release;
>>>>       }
>>>> -
>>>> -    WARN_ONCE(adev->kfd.vram_used < 0,
>>>> -          "KFD VRAM memory accounting unbalanced");
>>>> +    if (adev)
>>>> +        WARN_ONCE(adev->kfd.vram_used < 0,
>>>> +            "KFD VRAM memory accounting unbalanced");
>>>>       WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
>>>>             "KFD TTM memory accounting unbalanced");
>>>>       WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
>>>> @@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct 
>>>> amdgpu_bo *bo)
>>>>       u32 alloc_flags = bo->kfd_bo->alloc_flags;
>>>>       u64 size = amdgpu_bo_size(bo);
>>>>   -    unreserve_mem_limit(adev, size, alloc_flags);
>>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>>>>         kfree(bo->kfd_bo);
>>>>   }
>>>> @@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>>>       /* Don't unreserve system mem limit twice */
>>>>       goto err_reserve_limit;
>>>>   err_bo_create:
>>>> -    unreserve_mem_limit(adev, size, flags);
>>>> +    amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>>>>   err_reserve_limit:
>>>>       mutex_destroy(&(*mem)->lock);
>>>>       if (gobj)
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> index 5ed8d9b549a4..e7e9b388fea4 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> @@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct 
>>>> svm_range *prange)
>>>>     static void svm_range_free(struct svm_range *prange)
>>>>   {
>>>> +    uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
>>>> +    struct kfd_process *p;
>>>> +
>>>>       pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", 
>>>> prange->svms, prange,
>>>>            prange->start, prange->last);
>>>>         svm_range_vram_node_free(prange);
>>>>       svm_range_free_dma_mappings(prange);
>>>> +
>>>> +    p = container_of(prange->svms, struct kfd_process, svms);
>>>> +    if (!p->xnack_enabled) {
>>>> +        pr_debug("unreserve mem limit: %lld\n", size);
>>>> +        amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
>>>> +                    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
>>>> +    }
>>>>       mutex_destroy(&prange->lock);
>>>>       mutex_destroy(&prange->migrate_mutex);
>>>>       kfree(prange);
>>>> @@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t 
>>>> *location, int32_t *prefetch_loc,
>>>>     static struct
>>>>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t 
>>>> start,
>>>> -             uint64_t last)
>>>> +             uint64_t last, bool is_new_alloc)
>>>>   {
>>>>       uint64_t size = last - start + 1;
>>>>       struct svm_range *prange;
>>>> @@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list 
>>>> *svms, uint64_t start,
>>>>       prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>>>>       if (!prange)
>>>>           return NULL;
>>>> +
>>>> +    p = container_of(svms, struct kfd_process, svms);
>>>> +    if (!p->xnack_enabled && is_new_alloc &&
>>>> +        amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
>>>> +                        KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
>>> The range will create svm_bo to migrate to VRAM, so count acc_size 
>>> is correct.
>>
>> I'm not sure how to understand this comment. The thing is, the VRAM 
>> BO can be evicted without losing functionality. So I don't think we 
>> need to track potential VRAM usage of SVM ranges. We only need to 
>> account for system memory usage. That includes the system memory 
>> being mapped, and maybe the prange structures and dma_addr arrays 
>> used for the mapping. However, like I said, the old acc_size isn't 
>> really suitable for that because it doesn't account for mappings on 
>> multiple GPUs.
>>
>> The dma address arrays take 1/512 of the memory size, per GPU. Even 
>> on an 8GPU system, that's only 1/64 of the memory size. So I think 
>> the 15/16 system memory limit still leaves enough room for those data 
>> structures. If that gets too tight, we may just decide to use a lower 
>> system memory limit, or change the system memory limit based on the 
>> number of GPUs in the system. That would be easier than accurately 
>> tracking the data structure sizes for each allocation and potentially 
>> each mapping on a multi-GPU system.
>
> Yes, based on this calculation and acc_size is not accurate now for 
> mGPUs with IOMMU support and svm range, we can remove acc_size for 
> svm_bo, only count svm range as userptr system memory.
>
> Regards,
>
> Philip
>
>>
>> Regards,
>>   Felix
>>
>>
>>>> +        pr_err("Failed to allocate usrptr memory\n");

I just noticed this error message. It's misleading. The failure here is 
not an allocation failure. Also the term "userptr" is misleading because 
it points to the legacy userptr BO mechanism. I'm not sure we should 
print an error message here at all. Every time we print kernel error 
messages because of user mode problems it confuses people.

Anyway, if we have to have a message, it should be something like "SVM 
mapping failed, exceeds resident system memory limit". I would at least 
downgrade it to a warning or info.

Regards,
   Felix


>>>> +        kfree(prange);
>>>> +        return NULL;
>>>> +    }
>>>>       prange->npages = size;
>>>>       prange->svms = svms;
>>>>       prange->start = start;
>>>> @@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list 
>>>> *svms, uint64_t start,
>>>>       mutex_init(&prange->migrate_mutex);
>>>>       mutex_init(&prange->lock);
>>>>   -    p = container_of(svms, struct kfd_process, svms);
>>>>       if (p->xnack_enabled)
>>>>           bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>>>>                   MAX_GPU_INSTANCE);
>>>> @@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, 
>>>> uint64_t start, uint64_t last,
>>>>         svms = prange->svms;
>>>>       if (old_start == start)
>>>> -        *new = svm_range_new(svms, last + 1, old_last);
>>>> +        *new = svm_range_new(svms, last + 1, old_last, false);
>>>>       else
>>>> -        *new = svm_range_new(svms, old_start, start - 1);
>>>> +        *new = svm_range_new(svms, old_start, start - 1, false);
>>>>       if (!*new)
>>>>           return -ENOMEM;
>>>>   @@ -1825,7 +1843,7 @@ static struct svm_range 
>>>> *svm_range_clone(struct svm_range *old)
>>>>   {
>>>>       struct svm_range *new;
>>>>   -    new = svm_range_new(old->svms, old->start, old->last);
>>>> +    new = svm_range_new(old->svms, old->start, old->last, true);
>>>
>>> clone prange is not new memory allocation, use false.
>>>
>>> Regards,
>>>
>>> Philip
>>>
>>>>       if (!new)
>>>>           return NULL;
>>>>   @@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, 
>>>> uint64_t start, uint64_t size,
>>>>             /* insert a new node if needed */
>>>>           if (node->start > start) {
>>>> -            prange = svm_range_new(svms, start, node->start - 1);
>>>> +            prange = svm_range_new(svms, start, node->start - 1, 
>>>> true);
>>>>               if (!prange) {
>>>>                   r = -ENOMEM;
>>>>                   goto out;
>>>> @@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t 
>>>> start, uint64_t size,
>>>>         /* add a final range at the end if needed */
>>>>       if (start <= last) {
>>>> -        prange = svm_range_new(svms, start, last);
>>>> +        prange = svm_range_new(svms, start, last, true);
>>>>           if (!prange) {
>>>>               r = -ENOMEM;
>>>>               goto out;
>>>> @@ -2585,7 +2603,7 @@ svm_range 
>>>> *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>>>>           last = addr;
>>>>       }
>>>>   -    prange = svm_range_new(&p->svms, start, last);
>>>> +    prange = svm_range_new(&p->svms, start, last, true);
>>>>       if (!prange) {
>>>>           pr_debug("Failed to create prange in address [0x%llx]\n", 
>>>> addr);
>>>>           return NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-18 21:40     ` Felix Kuehling
@ 2022-05-19 14:01       ` philip yang
  2022-05-19 15:14         ` Felix Kuehling
  0 siblings, 1 reply; 15+ messages in thread
From: philip yang @ 2022-05-19 14:01 UTC (permalink / raw)
  To: Felix Kuehling, Alex Sierra, amd-gfx

[-- Attachment #1: Type: text/html, Size: 22648 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-18 18:36   ` philip yang
@ 2022-05-18 21:40     ` Felix Kuehling
  2022-05-19 14:01       ` philip yang
  0 siblings, 1 reply; 15+ messages in thread
From: Felix Kuehling @ 2022-05-18 21:40 UTC (permalink / raw)
  To: philip yang, Alex Sierra, amd-gfx


On 2022-05-18 14:36, philip yang wrote:
>
>
> On 2022-05-17 19:11, Alex Sierra wrote:
>> [WHY]
>> Unified memory with xnack off should be tracked, as userptr mappings
>> and legacy allocations do. To avoid oversuscribe system memory when
>> xnack off.
>> [How]
>> Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
>> API and call them on every prange creation and free.
>>
>> Signed-off-by: Alex Sierra<alex.sierra@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 ++++++++++++++-----
>>   3 files changed, 43 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index f8b9f27adcf5..f55f34af6480 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
>>   void amdgpu_amdkfd_block_mmu_notifications(void *p);
>>   int amdgpu_amdkfd_criu_resume(void *p);
>>   bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>> +		uint64_t size, u32 alloc_flag);
>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>> +		uint64_t size, u32 alloc_flag);
>>   
>>   #if IS_ENABLED(CONFIG_HSA_AMD)
>>   void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> index e985cf9c7ec0..b2236159ff39 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>> @@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
>>    *
>>    * Return: returns -ENOMEM in case of error, ZERO otherwise
>>    */
>> -static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>> +int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>   		uint64_t size, u32 alloc_flag)
>>   {
>>   	uint64_t reserved_for_pt =
>> @@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>   	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
>>   	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
>>   	     kfd_mem_limit.max_ttm_mem_limit) ||
>> -	    (adev->kfd.vram_used + vram_needed >
>> -	     adev->gmc.real_vram_size - reserved_for_pt)) {
>> +	    (adev && (adev->kfd.vram_used + vram_needed >
>> +	     adev->gmc.real_vram_size - reserved_for_pt))) {
>>   		ret = -ENOMEM;
>>   		goto release;
>>   	}
>> @@ -166,7 +166,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>   	/* Update memory accounting by decreasing available system
>>   	 * memory, TTM memory and GPU memory as computed above
>>   	 */
>> -	adev->kfd.vram_used += vram_needed;
>> +	if (adev)
>> +		adev->kfd.vram_used += vram_needed;
>>   	kfd_mem_limit.system_mem_used += system_mem_needed;
>>   	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
>>   
>> @@ -175,7 +176,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
>>   	return ret;
>>   }
>>   
>> -static void unreserve_mem_limit(struct amdgpu_device *adev,
>> +void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
>>   		uint64_t size, u32 alloc_flag)
>>   {
>>   	spin_lock(&kfd_mem_limit.mem_limit_lock);
>> @@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>>   		kfd_mem_limit.system_mem_used -= size;
>>   		kfd_mem_limit.ttm_mem_used -= size;
>>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
>> -		adev->kfd.vram_used -= size;
>> +		if (adev)
>> +			adev->kfd.vram_used -= size;
>>   	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
>>   		kfd_mem_limit.system_mem_used -= size;
>>   	} else if (!(alloc_flag &
>> @@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
>>   		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
>>   		goto release;
>>   	}
>> -
>> -	WARN_ONCE(adev->kfd.vram_used < 0,
>> -		  "KFD VRAM memory accounting unbalanced");
>> +	if (adev)
>> +		WARN_ONCE(adev->kfd.vram_used < 0,
>> +			"KFD VRAM memory accounting unbalanced");
>>   	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
>>   		  "KFD TTM memory accounting unbalanced");
>>   	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
>> @@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
>>   	u32 alloc_flags = bo->kfd_bo->alloc_flags;
>>   	u64 size = amdgpu_bo_size(bo);
>>   
>> -	unreserve_mem_limit(adev, size, alloc_flags);
>> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
>>   
>>   	kfree(bo->kfd_bo);
>>   }
>> @@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
>>   	/* Don't unreserve system mem limit twice */
>>   	goto err_reserve_limit;
>>   err_bo_create:
>> -	unreserve_mem_limit(adev, size, flags);
>> +	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
>>   err_reserve_limit:
>>   	mutex_destroy(&(*mem)->lock);
>>   	if (gobj)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 5ed8d9b549a4..e7e9b388fea4 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
>>   
>>   static void svm_range_free(struct svm_range *prange)
>>   {
>> +	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
>> +	struct kfd_process *p;
>> +
>>   	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
>>   		 prange->start, prange->last);
>>   
>>   	svm_range_vram_node_free(prange);
>>   	svm_range_free_dma_mappings(prange);
>> +
>> +	p = container_of(prange->svms, struct kfd_process, svms);
>> +	if (!p->xnack_enabled) {
>> +		pr_debug("unreserve mem limit: %lld\n", size);
>> +		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
>> +					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
>> +	}
>>   	mutex_destroy(&prange->lock);
>>   	mutex_destroy(&prange->migrate_mutex);
>>   	kfree(prange);
>> @@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
>>   
>>   static struct
>>   svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>> -			 uint64_t last)
>> +			 uint64_t last, bool is_new_alloc)
>>   {
>>   	uint64_t size = last - start + 1;
>>   	struct svm_range *prange;
>> @@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>>   	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
>>   	if (!prange)
>>   		return NULL;
>> +
>> +	p = container_of(svms, struct kfd_process, svms);
>> +	if (!p->xnack_enabled && is_new_alloc &&
>> +	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
>> +					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
> The range will create svm_bo to migrate to VRAM, so count acc_size is 
> correct.

I'm not sure how to understand this comment. The thing is, the VRAM BO 
can be evicted without losing functionality. So I don't think we need to 
track potential VRAM usage of SVM ranges. We only need to account for 
system memory usage. That includes the system memory being mapped, and 
maybe the prange structures and dma_addr arrays used for the mapping. 
However, like I said, the old acc_size isn't really suitable for that 
because it doesn't account for mappings on multiple GPUs.

The dma address arrays take 1/512 of the memory size, per GPU. Even on 
an 8GPU system, that's only 1/64 of the memory size. So I think the 
15/16 system memory limit still leaves enough room for those data 
structures. If that gets too tight, we may just decide to use a lower 
system memory limit, or change the system memory limit based on the 
number of GPUs in the system. That would be easier than accurately 
tracking the data structure sizes for each allocation and potentially 
each mapping on a multi-GPU system.

Regards,
   Felix


>> +		pr_err("Failed to allocate usrptr memory\n");
>> +		kfree(prange);
>> +		return NULL;
>> +	}
>>   	prange->npages = size;
>>   	prange->svms = svms;
>>   	prange->start = start;
>> @@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
>>   	mutex_init(&prange->migrate_mutex);
>>   	mutex_init(&prange->lock);
>>   
>> -	p = container_of(svms, struct kfd_process, svms);
>>   	if (p->xnack_enabled)
>>   		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
>>   			    MAX_GPU_INSTANCE);
>> @@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
>>   
>>   	svms = prange->svms;
>>   	if (old_start == start)
>> -		*new = svm_range_new(svms, last + 1, old_last);
>> +		*new = svm_range_new(svms, last + 1, old_last, false);
>>   	else
>> -		*new = svm_range_new(svms, old_start, start - 1);
>> +		*new = svm_range_new(svms, old_start, start - 1, false);
>>   	if (!*new)
>>   		return -ENOMEM;
>>   
>> @@ -1825,7 +1843,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
>>   {
>>   	struct svm_range *new;
>>   
>> -	new = svm_range_new(old->svms, old->start, old->last);
>> +	new = svm_range_new(old->svms, old->start, old->last, true);
>
> clone prange is not new memory allocation, use false.
>
> Regards,
>
> Philip
>
>>   	if (!new)
>>   		return NULL;
>>   
>> @@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>   
>>   		/* insert a new node if needed */
>>   		if (node->start > start) {
>> -			prange = svm_range_new(svms, start, node->start - 1);
>> +			prange = svm_range_new(svms, start, node->start - 1, true);
>>   			if (!prange) {
>>   				r = -ENOMEM;
>>   				goto out;
>> @@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
>>   
>>   	/* add a final range at the end if needed */
>>   	if (start <= last) {
>> -		prange = svm_range_new(svms, start, last);
>> +		prange = svm_range_new(svms, start, last, true);
>>   		if (!prange) {
>>   			r = -ENOMEM;
>>   			goto out;
>> @@ -2585,7 +2603,7 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
>>   		last = addr;
>>   	}
>>   
>> -	prange = svm_range_new(&p->svms, start, last);
>> +	prange = svm_range_new(&p->svms, start, last, true);
>>   	if (!prange) {
>>   		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
>>   		return NULL;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-17 23:11 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
@ 2022-05-18 18:36   ` philip yang
  2022-05-18 21:40     ` Felix Kuehling
  0 siblings, 1 reply; 15+ messages in thread
From: philip yang @ 2022-05-18 18:36 UTC (permalink / raw)
  To: Alex Sierra, amd-gfx

[-- Attachment #1: Type: text/html, Size: 9939 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off
  2022-05-17 23:11 [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Alex Sierra
@ 2022-05-17 23:11 ` Alex Sierra
  2022-05-18 18:36   ` philip yang
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Sierra @ 2022-05-17 23:11 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alex Sierra

[WHY]
Unified memory with xnack off should be tracked, as userptr mappings
and legacy allocations do. To avoid oversuscribe system memory when
xnack off.
[How]
Exposing functions reserve_mem_limit and unreserve_mem_limit to SVM
API and call them on every prange creation and free.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 24 +++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          | 34 ++++++++++++++-----
 3 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..f55f34af6480 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -301,6 +301,10 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem *
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev);
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
+		uint64_t size, u32 alloc_flag);
 
 #if IS_ENABLED(CONFIG_HSA_AMD)
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e985cf9c7ec0..b2236159ff39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -122,7 +122,7 @@ void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
  *
  * Return: returns -ENOMEM in case of error, ZERO otherwise
  */
-static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
+int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	uint64_t reserved_for_pt =
@@ -157,8 +157,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	     kfd_mem_limit.max_system_mem_limit && !no_system_mem_limit) ||
 	    (kfd_mem_limit.ttm_mem_used + ttm_mem_needed >
 	     kfd_mem_limit.max_ttm_mem_limit) ||
-	    (adev->kfd.vram_used + vram_needed >
-	     adev->gmc.real_vram_size - reserved_for_pt)) {
+	    (adev && (adev->kfd.vram_used + vram_needed >
+	     adev->gmc.real_vram_size - reserved_for_pt))) {
 		ret = -ENOMEM;
 		goto release;
 	}
@@ -166,7 +166,8 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	/* Update memory accounting by decreasing available system
 	 * memory, TTM memory and GPU memory as computed above
 	 */
-	adev->kfd.vram_used += vram_needed;
+	if (adev)
+		adev->kfd.vram_used += vram_needed;
 	kfd_mem_limit.system_mem_used += system_mem_needed;
 	kfd_mem_limit.ttm_mem_used += ttm_mem_needed;
 
@@ -175,7 +176,7 @@ static int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
 	return ret;
 }
 
-static void unreserve_mem_limit(struct amdgpu_device *adev,
+void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
 		uint64_t size, u32 alloc_flag)
 {
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
@@ -184,7 +185,8 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		kfd_mem_limit.system_mem_used -= size;
 		kfd_mem_limit.ttm_mem_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
-		adev->kfd.vram_used -= size;
+		if (adev)
+			adev->kfd.vram_used -= size;
 	} else if (alloc_flag & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
 		kfd_mem_limit.system_mem_used -= size;
 	} else if (!(alloc_flag &
@@ -193,9 +195,9 @@ static void unreserve_mem_limit(struct amdgpu_device *adev,
 		pr_err("%s: Invalid BO type %#x\n", __func__, alloc_flag);
 		goto release;
 	}
-
-	WARN_ONCE(adev->kfd.vram_used < 0,
-		  "KFD VRAM memory accounting unbalanced");
+	if (adev)
+		WARN_ONCE(adev->kfd.vram_used < 0,
+			"KFD VRAM memory accounting unbalanced");
 	WARN_ONCE(kfd_mem_limit.ttm_mem_used < 0,
 		  "KFD TTM memory accounting unbalanced");
 	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
@@ -211,7 +213,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 	u32 alloc_flags = bo->kfd_bo->alloc_flags;
 	u64 size = amdgpu_bo_size(bo);
 
-	unreserve_mem_limit(adev, size, alloc_flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, alloc_flags);
 
 	kfree(bo->kfd_bo);
 }
@@ -1601,7 +1603,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	/* Don't unreserve system mem limit twice */
 	goto err_reserve_limit;
 err_bo_create:
-	unreserve_mem_limit(adev, size, flags);
+	amdgpu_amdkfd_unreserve_mem_limit(adev, size, flags);
 err_reserve_limit:
 	mutex_destroy(&(*mem)->lock);
 	if (gobj)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 5ed8d9b549a4..e7e9b388fea4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -261,11 +261,21 @@ void svm_range_free_dma_mappings(struct svm_range *prange)
 
 static void svm_range_free(struct svm_range *prange)
 {
+	uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT;
+	struct kfd_process *p;
+
 	pr_debug("svms 0x%p prange 0x%p [0x%lx 0x%lx]\n", prange->svms, prange,
 		 prange->start, prange->last);
 
 	svm_range_vram_node_free(prange);
 	svm_range_free_dma_mappings(prange);
+
+	p = container_of(prange->svms, struct kfd_process, svms);
+	if (!p->xnack_enabled) {
+		pr_debug("unreserve mem limit: %lld\n", size);
+		amdgpu_amdkfd_unreserve_mem_limit(NULL, size,
+					KFD_IOC_ALLOC_MEM_FLAGS_USERPTR);
+	}
 	mutex_destroy(&prange->lock);
 	mutex_destroy(&prange->migrate_mutex);
 	kfree(prange);
@@ -284,7 +294,7 @@ svm_range_set_default_attributes(int32_t *location, int32_t *prefetch_loc,
 
 static struct
 svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
-			 uint64_t last)
+			 uint64_t last, bool is_new_alloc)
 {
 	uint64_t size = last - start + 1;
 	struct svm_range *prange;
@@ -293,6 +303,15 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	prange = kzalloc(sizeof(*prange), GFP_KERNEL);
 	if (!prange)
 		return NULL;
+
+	p = container_of(svms, struct kfd_process, svms);
+	if (!p->xnack_enabled && is_new_alloc &&
+	    amdgpu_amdkfd_reserve_mem_limit(NULL, size << PAGE_SHIFT,
+					    KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
+		pr_err("Failed to allocate usrptr memory\n");
+		kfree(prange);
+		return NULL;
+	}
 	prange->npages = size;
 	prange->svms = svms;
 	prange->start = start;
@@ -307,7 +326,6 @@ svm_range *svm_range_new(struct svm_range_list *svms, uint64_t start,
 	mutex_init(&prange->migrate_mutex);
 	mutex_init(&prange->lock);
 
-	p = container_of(svms, struct kfd_process, svms);
 	if (p->xnack_enabled)
 		bitmap_copy(prange->bitmap_access, svms->bitmap_supported,
 			    MAX_GPU_INSTANCE);
@@ -1000,9 +1018,9 @@ svm_range_split(struct svm_range *prange, uint64_t start, uint64_t last,
 
 	svms = prange->svms;
 	if (old_start == start)
-		*new = svm_range_new(svms, last + 1, old_last);
+		*new = svm_range_new(svms, last + 1, old_last, false);
 	else
-		*new = svm_range_new(svms, old_start, start - 1);
+		*new = svm_range_new(svms, old_start, start - 1, false);
 	if (!*new)
 		return -ENOMEM;
 
@@ -1825,7 +1843,7 @@ static struct svm_range *svm_range_clone(struct svm_range *old)
 {
 	struct svm_range *new;
 
-	new = svm_range_new(old->svms, old->start, old->last);
+	new = svm_range_new(old->svms, old->start, old->last, true);
 	if (!new)
 		return NULL;
 
@@ -1951,7 +1969,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 		/* insert a new node if needed */
 		if (node->start > start) {
-			prange = svm_range_new(svms, start, node->start - 1);
+			prange = svm_range_new(svms, start, node->start - 1, true);
 			if (!prange) {
 				r = -ENOMEM;
 				goto out;
@@ -1967,7 +1985,7 @@ svm_range_add(struct kfd_process *p, uint64_t start, uint64_t size,
 
 	/* add a final range at the end if needed */
 	if (start <= last) {
-		prange = svm_range_new(svms, start, last);
+		prange = svm_range_new(svms, start, last, true);
 		if (!prange) {
 			r = -ENOMEM;
 			goto out;
@@ -2585,7 +2603,7 @@ svm_range *svm_range_create_unregistered_range(struct amdgpu_device *adev,
 		last = addr;
 	}
 
-	prange = svm_range_new(&p->svms, start, last);
+	prange = svm_range_new(&p->svms, start, last, true);
 	if (!prange) {
 		pr_debug("Failed to create prange in address [0x%llx]\n", addr);
 		return NULL;
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-06-07 22:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 16:21 [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Alex Sierra
2022-05-19 16:21 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
2022-05-19 19:07   ` philip yang
2022-05-19 23:08   ` Felix Kuehling
2022-05-20 12:45     ` philip yang
2022-05-20 15:12       ` Felix Kuehling
2022-05-20 17:30         ` Sierra Guiza, Alejandro (Alex)
2022-05-19 23:09 ` [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Felix Kuehling
  -- strict thread matches above, loose matches on Subject: below --
2022-06-07 17:17 Alex Sierra
2022-06-07 17:17 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
2022-06-07 22:36   ` Felix Kuehling
2022-05-17 23:11 [PATCH 1/2] drm/amdgpu: remove acc_size from reserve/unreserve mem Alex Sierra
2022-05-17 23:11 ` [PATCH 2/2] drm/amdkfd: track unified memory reservation with xnack off Alex Sierra
2022-05-18 18:36   ` philip yang
2022-05-18 21:40     ` Felix Kuehling
2022-05-19 14:01       ` philip yang
2022-05-19 15:14         ` Felix Kuehling

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.