* [PATCH 0/4] Unified memory for CWSR save restore area
@ 2022-06-28 21:43 Eric Huang
  2022-06-28 21:43 ` [PATCH 1/2] drm/amdkfd: add new flag for svm Eric Huang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

amdkfd changes:

Eric Huang (2):
  drm/amdkfd: add new flag for svm
  drm/amdkfd: change svm range evict

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
 include/uapi/linux/kfd_ioctl.h       |  2 ++
 2 files changed, 10 insertions(+), 2 deletions(-)

libhsakmt (thunk) changes (based on
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface):

Eric Huang (2):
  libhsakmt: add new flags for svm
  libhsakmt: allocate unified memory for ctx save restore area

 include/hsakmttypes.h     |   1 +
 include/linux/kfd_ioctl.h |   2 +
 src/queues.c              | 109 +++++++++++++++++++++++++++++++++-----
 3 files changed, 98 insertions(+), 14 deletions(-)

-- 
2.25.1


* [PATCH 1/2] drm/amdkfd: add new flag for svm
  2022-06-28 21:43 [PATCH 0/4] Unified memory for CWSR save restore area Eric Huang
@ 2022-06-28 21:43 ` Eric Huang
  2022-06-28 21:43 ` [PATCH 2/2] drm/amdkfd: change svm range evict Eric Huang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

Add a new flag to keep the GPU mapping always valid.
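
For illustration, a minimal sketch of how a user-mode client could
request the new flag through the SVM attribute interface used later in
this series (the keep_range_mapped() helper itself is hypothetical):

	#include "hsakmt.h"
	#include "hsakmttypes.h"

	/* Hypothetical helper: keep [mem, mem + size) mapped on gpuNode
	 * even while XNACK is enabled, using the new flag.
	 */
	static HSAKMT_STATUS keep_range_mapped(void *mem, HSAuint64 size,
					       HSAuint32 gpuNode)
	{
		HSA_SVM_ATTRIBUTE attrs[2];

		attrs[0].type  = HSA_SVM_ATTR_SET_FLAGS;
		attrs[0].value = HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;
		attrs[1].type  = HSA_SVM_ATTR_ACCESS;
		attrs[1].value = gpuNode;

		return hsaKmtSVMSetAttr(mem, size, 2, attrs);
	}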

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index fd49dde4d5f4..eba04ebfd9a8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1076,6 +1076,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC    0x00000010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY     0x00000020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x00000040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1


* [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-28 21:43 [PATCH 0/4] Unified memory for CWSR save restore area Eric Huang
  2022-06-28 21:43 ` [PATCH 1/2] drm/amdkfd: add new flag for svm Eric Huang
@ 2022-06-28 21:43 ` Eric Huang
  2022-06-29 22:20   ` Felix Kuehling
  2022-06-28 21:43 ` [PATCH 3/4] libhsakmt: add new flags for svm Eric Huang
  2022-06-28 21:43 ` [PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area Eric Huang
  3 siblings, 1 reply; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

Two changes:
1. Reduce unnecessary evict/unmap when a range is not mapped to a GPU.
2. Always evict when the always_mapped flag is set.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 4bf2f75f853b..76e817687ef9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
 	struct kfd_process *p;
 	int r = 0;
 
+	if (!prange->mapped_to_gpu)
+		return 0;
+
 	p = container_of(svms, struct kfd_process, svms);
 
 	pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
 		 svms, prange->start, prange->last, start, last);
 
-	if (!p->xnack_enabled) {
+	if (!p->xnack_enabled ||
+	    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
 		int evicted_ranges;
 
 		list_for_each_entry(pchild, &prange->child_list, child_list) {
@@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 		if (r)
 			goto out_unlock_range;
 
-		if (migrated && !p->xnack_enabled) {
+		if (migrated && (!p->xnack_enabled ||
+		    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
+		    prange->mapped_to_gpu) {
 			pr_debug("restore_work will update mappings of GPUs\n");
 			mutex_unlock(&prange->migrate_mutex);
 			continue;
-- 
2.25.1


* [PATCH 3/4] libhsakmt: add new flags for svm
  2022-06-28 21:43 [PATCH 0/4] Unified memory for CWSR save restore area Eric Huang
  2022-06-28 21:43 ` [PATCH 1/2] drm/amdkfd: add new flag for svm Eric Huang
  2022-06-28 21:43 ` [PATCH 2/2] drm/amdkfd: change svm range evict Eric Huang
@ 2022-06-28 21:43 ` Eric Huang
  2022-06-28 21:43 ` [PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area Eric Huang
  3 siblings, 0 replies; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

Add a new flag to keep the GPU mapping always valid.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492
---
 include/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h
index 8a0ed49..5c45f58 100644
--- a/include/linux/kfd_ioctl.h
+++ b/include/linux/kfd_ioctl.h
@@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC    0x00000010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY     0x00000020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x00000040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1


* [PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area
  2022-06-28 21:43 [PATCH 0/4] Unified memory for CWSR save restore area Eric Huang
                   ` (2 preceding siblings ...)
  2022-06-28 21:43 ` [PATCH 3/4] libhsakmt: add new flags for svm Eric Huang
@ 2022-06-28 21:43 ` Eric Huang
  3 siblings, 0 replies; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:43 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

To improve performance on queue preemption, allocate ctx s/r
area in VRAM instead of system memory, and migrate it back
to system memory when VRAM is full.
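
In outline, the new dGPU path does roughly the following (condensed
sketch only; init_ctx_save_header() stands in for the inline header
setup below, and error handling is omitted):

	size = ALIGN_UP(ctx_save_restore_size + debug_memory_size, PAGE_SIZE);
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); /* plain CPU pages */
	init_ctx_save_header(addr);         /* fill CWSR header on the CPU */
	register_exec_svm_range(addr, size, NodeId, NodeId, 0, true);
	                                /* prefetch to VRAM, ALWAYS_MAPPED */
	/* on failure, fall back to allocate_exec_aligned_memory() as before */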

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: If775782027188dbe84b6868260e429373675434c
---
 include/hsakmttypes.h |   1 +
 src/queues.c          | 109 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 96 insertions(+), 14 deletions(-)

diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index 9063f85..2c1c7cc 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -1329,6 +1329,7 @@ typedef enum _HSA_SVM_FLAGS {
 	HSA_SVM_FLAG_GPU_RO      = 0x00000008, // GPUs only read, allows replication
 	HSA_SVM_FLAG_GPU_EXEC    = 0x00000010, // Allow execution on GPU
 	HSA_SVM_FLAG_GPU_READ_MOSTLY = 0x00000020, // GPUs mostly read, may allow similar optimizations as RO, but writes fault
+	HSA_SVM_FLAG_GPU_ALWAYS_MAPPED = 0x00000040, // Keep GPU memory mapping always valid as if XNACK is disabled
 } HSA_SVM_FLAGS;
 
 typedef enum _HSA_SVM_ATTR_TYPE {
diff --git a/src/queues.c b/src/queues.c
index c83dd93..e65103d 100644
--- a/src/queues.c
+++ b/src/queues.c
@@ -68,6 +68,7 @@ struct queue {
 	uint32_t eop_buffer_size;
 	uint32_t gfxv;
 	bool use_ats;
+	bool unified_ctx_save_restore;
 	/* This queue structure is allocated from GPU with page aligned size
 	 * but only small bytes are used. We use the extra space in the end for
 	 * cu_mask bits array.
@@ -383,13 +384,50 @@ static void free_exec_aligned_memory(void *addr, uint32_t size, uint32_t align,
 		munmap(addr, size);
 }
 
+static HSAKMT_STATUS register_exec_svm_range(void *mem, uint32_t size,
+				uint32_t gpuNode, uint32_t prefetchNode,
+				uint32_t preferredNode, bool alwaysMapped)
+{
+	HSA_SVM_ATTRIBUTE *attrs;
+	HSAuint64 s_attr;
+	HSAuint32 nattr;
+	HSAuint32 flags;
+
+	flags = HSA_SVM_FLAG_HOST_ACCESS |
+	        HSA_SVM_FLAG_GPU_EXEC;
+
+	if (alwaysMapped)
+		flags |= HSA_SVM_FLAG_GPU_ALWAYS_MAPPED;
+
+	nattr = 5;
+	s_attr = sizeof(*attrs) * nattr;
+	attrs = (HSA_SVM_ATTRIBUTE *)alloca(s_attr);
+
+	attrs[0].type = HSA_SVM_ATTR_PREFETCH_LOC;
+	attrs[0].value = prefetchNode;
+	attrs[1].type = HSA_SVM_ATTR_PREFERRED_LOC;
+	attrs[1].value = preferredNode;
+	attrs[2].type = HSA_SVM_ATTR_CLR_FLAGS;
+	attrs[2].value = flags;
+	attrs[3].type = HSA_SVM_ATTR_SET_FLAGS;
+	attrs[3].value = flags;
+	attrs[4].type = HSA_SVM_ATTR_ACCESS;
+	attrs[4].value = gpuNode;
+
+	return hsaKmtSVMSetAttr(mem, size, nattr, attrs);
+}
+
 static void free_queue(struct queue *q)
 {
 	if (q->eop_buffer)
 		free_exec_aligned_memory(q->eop_buffer,
 					 q->eop_buffer_size,
 					 PAGE_SIZE, q->use_ats);
-	if (q->ctx_save_restore)
+	if (q->unified_ctx_save_restore)
+		munmap(q->ctx_save_restore,
+			ALIGN_UP(q->ctx_save_restore_size + q->debug_memory_size,
+					PAGE_SIZE));
+	else if (q->ctx_save_restore)
 		free_exec_aligned_memory(q->ctx_save_restore,
 					 q->ctx_save_restore_size,
 					 PAGE_SIZE, q->use_ats);
@@ -425,6 +463,8 @@ static int handle_concrete_asic(struct queue *q,
 	if (ret) {
 		uint32_t total_mem_alloc_size = 0;
 		HsaUserContextSaveAreaHeader *header;
+		HsaNodeProperties node;
+		bool svm_api;
 
 		args->ctx_save_restore_size = q->ctx_save_restore_size;
 		args->ctl_stack_size = q->ctl_stack_size;
@@ -434,22 +474,63 @@ static int handle_concrete_asic(struct queue *q,
 		 */
 		total_mem_alloc_size = q->ctx_save_restore_size +
 				       q->debug_memory_size;
-		q->ctx_save_restore =
-			allocate_exec_aligned_memory(total_mem_alloc_size,
-					 q->use_ats, NodeId, false, false);
 
-		if (!q->ctx_save_restore)
-			return HSAKMT_STATUS_NO_MEMORY;
+		if (hsaKmtGetNodeProperties(NodeId, &node))
+			svm_api = false;
+		else
+			svm_api = node.Capability.ui32.SVMAPISupported;
 
-		args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore;
+		/* Allocate unified memory for context save restore
+		 * area on dGPU.
+		 */
+		if (!q->use_ats && svm_api) {
+			uint32_t size = ALIGN_UP(total_mem_alloc_size, PAGE_SIZE);
+			void *addr;
+			HSAKMT_STATUS r = HSAKMT_STATUS_ERROR;
+
+			addr = mmap(0, size, PROT_READ | PROT_WRITE,
+					MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+
+			if (addr == MAP_FAILED)
+				pr_err("[%s] mmap failed: %s\n", __func__, strerror(errno));
+			else {
+				header = (HsaUserContextSaveAreaHeader *)addr;
+				header->ErrorEventId = 0;
+				if (Event)
+					header->ErrorEventId = Event->EventId;
+				header->ErrorReason = ErrPayload;
+				header->DebugOffset = q->ctx_save_restore_size;
+				header->DebugSize = q->debug_memory_size;
+
+				r = register_exec_svm_range(addr, size,
+						NodeId, NodeId, 0, true);
+
+				if (r == HSAKMT_STATUS_SUCCESS) {
+					q->ctx_save_restore = addr;
+					q->unified_ctx_save_restore = true;
+				} else
+					munmap(addr, size);
+			}
+		}
 
-		header = (HsaUserContextSaveAreaHeader *)q->ctx_save_restore;
-		header->ErrorEventId = 0;
-		if (Event)
-			header->ErrorEventId = Event->EventId;
-		header->ErrorReason = ErrPayload;
-		header->DebugOffset = q->ctx_save_restore_size;
-		header->DebugSize = q->debug_memory_size;
+		if (!q->unified_ctx_save_restore) {
+			q->ctx_save_restore = allocate_exec_aligned_memory(
+							total_mem_alloc_size,
+							q->use_ats, NodeId, false, false);
+
+			if (!q->ctx_save_restore)
+				return HSAKMT_STATUS_NO_MEMORY;
+
+			header = (HsaUserContextSaveAreaHeader *)q->ctx_save_restore;
+			header->ErrorEventId = 0;
+			if (Event)
+				header->ErrorEventId = Event->EventId;
+			header->ErrorReason = ErrPayload;
+			header->DebugOffset = q->ctx_save_restore_size;
+			header->DebugSize = q->debug_memory_size;
+		}
+
+		args->ctx_save_restore_address = (uintptr_t)q->ctx_save_restore;
 	}
 
 	return HSAKMT_STATUS_SUCCESS;
-- 
2.25.1


* Re: [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-28 21:43 ` [PATCH 2/2] drm/amdkfd: change svm range evict Eric Huang
@ 2022-06-29 22:20   ` Felix Kuehling
  2022-06-29 22:53     ` Eric Huang
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2022-06-29 22:20 UTC (permalink / raw)
  To: Eric Huang, amd-gfx; +Cc: Philip.Yang

On 2022-06-28 17:43, Eric Huang wrote:
> Two changes:
> 1. reducing unnecessary evict/unmap when range is not mapped to gpu.
> 2. adding always evict when flags is set to always_mapped.
>
> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 4bf2f75f853b..76e817687ef9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>   	struct kfd_process *p;
>   	int r = 0;
>   
> +	if (!prange->mapped_to_gpu)
> +		return 0;

This feels like an unrelated optimization that should be in a separate 
patch.

But I'm not sure this is correct, because it doesn't consider child 
ranges. svm_range_unmap_from_gpus already contains this check, so ranges 
should not be unmapped unnecessarily either way. Is there any other 
benefit to this change that I'm missing?

Regards,
   Felix


> +
>   	p = container_of(svms, struct kfd_process, svms);
>   
>   	pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 0x%lx]\n",
>   		 svms, prange->start, prange->last, start, last);
>   
> -	if (!p->xnack_enabled) {
> +	if (!p->xnack_enabled ||
> +	    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>   		int evicted_ranges;
>   
>   		list_for_each_entry(pchild, &prange->child_list, child_list) {
> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
>   		if (r)
>   			goto out_unlock_range;
>   
> -		if (migrated && !p->xnack_enabled) {
> +		if (migrated && (!p->xnack_enabled ||
> +		    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
> +		    prange->mapped_to_gpu) {
>   			pr_debug("restore_work will update mappings of GPUs\n");
>   			mutex_unlock(&prange->migrate_mutex);
>   			continue;


* Re: [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-29 22:20   ` Felix Kuehling
@ 2022-06-29 22:53     ` Eric Huang
  2022-06-29 23:29       ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Huang @ 2022-06-29 22:53 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: Philip.Yang



On 2022-06-29 18:20, Felix Kuehling wrote:
> On 2022-06-28 17:43, Eric Huang wrote:
>> Two changes:
>> 1. reducing unnecessary evict/unmap when range is not mapped to gpu.
>> 2. adding always evict when flags is set to always_mapped.
>>
>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 4bf2f75f853b..76e817687ef9 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
>> struct mm_struct *mm,
>>       struct kfd_process *p;
>>       int r = 0;
>>   +    if (!prange->mapped_to_gpu)
>> +        return 0;
>
> This feels like an unrelated optimization that should be in a separate 
> patch.
>
> But I'm not sure this is correct, because it doesn't consider child 
> ranges. svm_range_unmap_from_gpus already contains this check, so 
> ranges should not be unmapped unnecessarily either way. Is there any 
> other benefit to this change that I'm missing?
I will send a separate patch that considers child ranges. The benefit
is that it reduces unnecessary queue evictions when allocating context
save memory, which is not mapped to the GPU; it is purely for
efficiency. On the other hand, without this optimization
KFDCWSRTest.InterruptRestore fails with a queue preemption error. I
think the reason is that the extra queue evictions make HWS too busy to
preempt existing queues. There is one unmap_queue packet sent to HWS in
the current code, and there would be three unmap_queue packets with
unified memory allocation. So this optimization keeps only one
unmap_queue, as before.

Regards,
Eric
>
> Regards,
>   Felix
>
>
>> +
>>       p = container_of(svms, struct kfd_process, svms);
>>         pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
>> 0x%lx]\n",
>>            svms, prange->start, prange->last, start, last);
>>   -    if (!p->xnack_enabled) {
>> +    if (!p->xnack_enabled ||
>> +        (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>>           int evicted_ranges;
>>             list_for_each_entry(pchild, &prange->child_list, 
>> child_list) {
>> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, 
>> struct mm_struct *mm,
>>           if (r)
>>               goto out_unlock_range;
>>   -        if (migrated && !p->xnack_enabled) {
>> +        if (migrated && (!p->xnack_enabled ||
>> +            (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
>> +            prange->mapped_to_gpu) {
>>               pr_debug("restore_work will update mappings of GPUs\n");
>>               mutex_unlock(&prange->migrate_mutex);
>>               continue;


* Re: [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-29 22:53     ` Eric Huang
@ 2022-06-29 23:29       ` Felix Kuehling
  2022-06-30 15:19         ` Eric Huang
  0 siblings, 1 reply; 11+ messages in thread
From: Felix Kuehling @ 2022-06-29 23:29 UTC (permalink / raw)
  To: Eric Huang, amd-gfx; +Cc: Philip.Yang

On 2022-06-29 18:53, Eric Huang wrote:
>
>
> On 2022-06-29 18:20, Felix Kuehling wrote:
>> On 2022-06-28 17:43, Eric Huang wrote:
>>> Two changes:
>>> 1. reducing unnecessary evict/unmap when range is not mapped to gpu.
>>> 2. adding always evict when flags is set to always_mapped.
>>>
>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 4bf2f75f853b..76e817687ef9 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
>>> struct mm_struct *mm,
>>>       struct kfd_process *p;
>>>       int r = 0;
>>>   +    if (!prange->mapped_to_gpu)
>>> +        return 0;
>>
>> This feels like an unrelated optimization that should be in a 
>> separate patch.
>>
>> But I'm not sure this is correct, because it doesn't consider child 
>> ranges. svm_range_unmap_from_gpus already contains this check, so 
>> ranges should not be unmapped unnecessarily either way. Is there any 
>> other benefit to this change that I'm missing?
> I will send another patch separately that considers child ranges.

I think this should only be done in the XNACK-off case. For XNACK-on 
it's already handled in the svm_range_unmap_from_gpus.


> The benefit is it will reduce unnecessary queue evicts when allocating 
> context save memory, which is unmapped to gpu.

That sounds wrong. The context save area should never be unmapped from 
GPU. That's the whole point of setting the 
KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening 
while migrating the context save area to VRAM for the first time, even 
before it's mapped to GPU?

I think there may be another eviction, when the CWSR header is 
initialized by the CPU. That would also migrate it back to system 
memory. To avoid that, you should probably register the context save 
area only after the header has been initialized.

I think avoiding an eviction when memory is migrated at first
registration is worthwhile, not just for CWSR.


> It is for efficiency reason. On the other hand, without this 
> optimization KFDCWSRTest.InterruptRestore fails with queue preemption 
> error.

What do you mean by "queue preemption error"? Does HWS hang?


> I think the reason is the extra queue evicts make HWS too busy to 
> preempt existing queues. There is one unmap_queue packet sent to HWS 
> in current code, and will be three unmap_queue packets with unified 
> memory allocation.

When queues of a process are already evicted, they should not get 
evicted again. That's handled by the qpd->evicted counter. There should 
never be multiple unmap_queues packets in flight at the same time. If 
you're seeing three unmap_queues, you should also see queues restored 
three times.
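
Roughly, the counting works like this (simplified sketch, not the exact
dqm code; unmap_queues()/map_queues() stand in for the real calls):

	/* per-process queue eviction is reference-counted in qpd->evicted */
	int evict_process_queues(struct qcm_process_device *qpd)
	{
		if (qpd->evicted++ > 0)	/* already evicted, nothing to do */
			return 0;
		return unmap_queues(qpd); /* one unmap_queues packet to HWS */
	}

	int restore_process_queues(struct qcm_process_device *qpd)
	{
		if (--qpd->evicted > 0)	/* another eviction still pending */
			return 0;
		return map_queues(qpd);	/* queues actually restored here */
	}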

HWS should never be too busy to evict queues. If you're seeing 
preemptions fail, you may have found a bug.

Regards,
   Felix


> So this optimization will keep only one unmap_queue as before.
>
> Regards,
> Eric
>>
>> Regards,
>>   Felix
>>
>>
>>> +
>>>       p = container_of(svms, struct kfd_process, svms);
>>>         pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
>>> 0x%lx]\n",
>>>            svms, prange->start, prange->last, start, last);
>>>   -    if (!p->xnack_enabled) {
>>> +    if (!p->xnack_enabled ||
>>> +        (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>>>           int evicted_ranges;
>>>             list_for_each_entry(pchild, &prange->child_list, 
>>> child_list) {
>>> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, 
>>> struct mm_struct *mm,
>>>           if (r)
>>>               goto out_unlock_range;
>>>   -        if (migrated && !p->xnack_enabled) {
>>> +        if (migrated && (!p->xnack_enabled ||
>>> +            (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
>>> +            prange->mapped_to_gpu) {
>>>               pr_debug("restore_work will update mappings of GPUs\n");
>>>               mutex_unlock(&prange->migrate_mutex);
>>>               continue;
>

* Re: [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-29 23:29       ` Felix Kuehling
@ 2022-06-30 15:19         ` Eric Huang
  2022-06-30 16:09           ` Felix Kuehling
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Huang @ 2022-06-30 15:19 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx; +Cc: Philip.Yang


On 2022-06-29 19:29, Felix Kuehling wrote:
> On 2022-06-29 18:53, Eric Huang wrote:
>>
>>
>> On 2022-06-29 18:20, Felix Kuehling wrote:
>>> On 2022-06-28 17:43, Eric Huang wrote:
>>>> Two changes:
>>>> 1. reducing unnecessary evict/unmap when range is not mapped to gpu.
>>>> 2. adding always evict when flags is set to always_mapped.
>>>>
>>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> index 4bf2f75f853b..76e817687ef9 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
>>>> struct mm_struct *mm,
>>>>       struct kfd_process *p;
>>>>       int r = 0;
>>>>   +    if (!prange->mapped_to_gpu)
>>>> +        return 0;
>>>
>>> This feels like an unrelated optimization that should be in a 
>>> separate patch.
>>>
>>> But I'm not sure this is correct, because it doesn't consider child 
>>> ranges. svm_range_unmap_from_gpus already contains this check, so 
>>> ranges should not be unmapped unnecessarily either way. Is there any 
>>> other benefit to this change that I'm missing?
>> I will send another patch separately that considers child ranges.
>
> I think this should only be done in the XNACK-off case. For XNACK-on 
> it's already handled in the svm_range_unmap_from_gpus.
Yes, and it is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set.
>
>
>> The benefit is it will reduce unnecessary queue evicts when 
>> allocating context save memory, which is unmapped to gpu.
>
> That sounds wrong. The context save area should never be unmapped from 
> GPU. That's the whole point of setting the 
> KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening 
> while migrating the context save area to VRAM for the first time, even 
> before it's mapped to GPU?
Yes. It happens the first time, when registering the SVM range and
migrating it to VRAM are done together; at that point the range is not
yet mapped to the GPU.
>
> I think there may be another eviction, when the CWSR header is 
> initialized by the CPU. That would also migrate it back to system 
> memory. To avoid that, you should probably register the context save 
> area only after the header has been initialized.
Yes, that is the approach I am using. Please look at patch 4/4.
>
> I think avoiding an eviction when memory is migrated when it is first 
> registered is worthwhile, not just for CWSR.
>
>
>> It is for efficiency reason. On the other hand, without this 
>> optimization KFDCWSRTest.InterruptRestore fails with queue preemption 
>> error.
>
> What do you mean by "queue preemption error"? Does HWS hang?
HWS doesn't hang immediately, so there is no fence timeout error
("The cp might be in an unrecoverable state due to an unsuccessful
queues preemption"). The error is "HIQ MQD's queue_doorbell_id0 is not
0, Queue preemption time out" after checking the MQD manager, which
means HWS abandons the unmap queue request without returning a timeout
error to the driver. After this error, the following test fails at
queue creation because HWS hangs.
>
>
>> I think the reason is the extra queue evicts make HWS too busy to 
>> preempt existing queues. There is one unmap_queue packet sent to HWS 
>> in current code, and will be three unmap_queue packets with unified 
>> memory allocation.
>
> When queues of a process are already evicted, they should not get 
> evicted again. That's handled by the qpd->evicted counter. There 
> should never be multiple unmap_queues packets in flight at the same 
> time. If you're seeing three unmap_queues, you should also see queues 
> restored three times.
>
> HWS should never be too busy to evict queues. If you're seeing 
> preemptions fail, you may have found a bug.
The restore delay worker changes the timing: it could restore queues
before the next unmap_queues, so the situation is too complicated to
debug in a multi-queue evict/restore environment. The error definitely
means there is a bug. From the driver's point of view there is nothing
wrong even with the extra queue eviction, so I try to avoid the extra
eviction and keep the behavior as before. The bottom line is to make
sure a unified SVM range for the context save area doesn't cause any
failure in kfdtest, so I can theoretically assume the extra queue
eviction/restoring causes the HWS failure.

Regards,
Eric
>
> Regards,
>   Felix
>
>
>> So this optimization will keep only one unmap_queue as before.
>>
>> Regards,
>> Eric
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>> +
>>>>       p = container_of(svms, struct kfd_process, svms);
>>>>         pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
>>>> 0x%lx]\n",
>>>>            svms, prange->start, prange->last, start, last);
>>>>   -    if (!p->xnack_enabled) {
>>>> +    if (!p->xnack_enabled ||
>>>> +        (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>>>>           int evicted_ranges;
>>>>             list_for_each_entry(pchild, &prange->child_list, 
>>>> child_list) {
>>>> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, 
>>>> struct mm_struct *mm,
>>>>           if (r)
>>>>               goto out_unlock_range;
>>>>   -        if (migrated && !p->xnack_enabled) {
>>>> +        if (migrated && (!p->xnack_enabled ||
>>>> +            (prange->flags & 
>>>> KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
>>>> +            prange->mapped_to_gpu) {
>>>>               pr_debug("restore_work will update mappings of GPUs\n");
>>>>               mutex_unlock(&prange->migrate_mutex);
>>>>               continue;
>>


* Re: [PATCH 2/2] drm/amdkfd: change svm range evict
  2022-06-30 15:19         ` Eric Huang
@ 2022-06-30 16:09           ` Felix Kuehling
  0 siblings, 0 replies; 11+ messages in thread
From: Felix Kuehling @ 2022-06-30 16:09 UTC (permalink / raw)
  To: Eric Huang, amd-gfx; +Cc: Philip.Yang

On 2022-06-30 11:19, Eric Huang wrote:
>
> On 2022-06-29 19:29, Felix Kuehling wrote:
>> On 2022-06-29 18:53, Eric Huang wrote:
>>>
>>>
>>> On 2022-06-29 18:20, Felix Kuehling wrote:
>>>> On 2022-06-28 17:43, Eric Huang wrote:
>>>>> Two changes:
>>>>> 1. reducing unnecessary evict/unmap when range is not mapped to gpu.
>>>>> 2. adding always evict when flags is set to always_mapped.
>>>>>
>>>>> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 ++++++++--
>>>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> index 4bf2f75f853b..76e817687ef9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>>> @@ -1767,12 +1767,16 @@ svm_range_evict(struct svm_range *prange, 
>>>>> struct mm_struct *mm,
>>>>>       struct kfd_process *p;
>>>>>       int r = 0;
>>>>>   +    if (!prange->mapped_to_gpu)
>>>>> +        return 0;
>>>>
>>>> This feels like an unrelated optimization that should be in a 
>>>> separate patch.
>>>>
>>>> But I'm not sure this is correct, because it doesn't consider child 
>>>> ranges. svm_range_unmap_from_gpus already contains this check, so 
>>>> ranges should not be unmapped unnecessarily either way. Is there 
>>>> any other benefit to this change that I'm missing?
>>> I will send another patch separately that considers child ranges.
>>
>> I think this should only be done in the XNACK-off case. For XNACK-on 
>> it's already handled in the svm_range_unmap_from_gpus.
> Yes and It is also done when KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is set.
>>
>>
>>> The benefit is it will reduce unnecessary queue evicts when 
>>> allocating context save memory, which is unmapped to gpu.
>>
>> That sounds wrong. The context save area should never be unmapped 
>> from GPU. That's the whole point of setting the 
>> KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. I guess this is happening 
>> while migrating the context save area to VRAM for the first time, 
>> even before it's mapped to GPU?
> Yes. It is for the first time when registering svm range and migrating 
> to VRAM are doing together, at this moment, the range is not mapped to 
> GPU.
>>
>> I think there may be another eviction, when the CWSR header is 
>> initialized by the CPU. That would also migrate it back to system 
>> memory. To avoid that, you should probably register the context save 
>> area only after the header has been initialized.
> Yes. I am using this way. Please look at patch 4/4.
>>
>> I think avoiding an eviction when memory is migrated when it is first 
>> registered is worthwhile, not just for CWSR.
>>
>>
>>> It is for efficiency reason. On the other hand, without this 
>>> optimization KFDCWSRTest.InterruptRestore fails with queue 
>>> preemption error.
>>
>> What do you mean by "queue preemption error"? Does HWS hang?
> HWS doesn't hang immediately, so there is not error for fence timeout 
> "The cp might be in an unrecoverable state due to an unsuccessful 
> queues preemption". The error is "HIQ MQD's queue_doorbell_id0 is not 
> 0, Queue preemption time out" after checking mqd manager, which means 
> HWS abandons unmap queue request without returning timeout error to 
> driver. And after this error, the following test will fail at queue 
> creation as HWS hangs

OK, that sounds like the kind of bug the InterruptRestore test is meant 
to catch. I think you just created a better test by causing more 
preemptions. ;)

So we should do two things:

  * Avoid unnecessary preemptions in KFD
  * Improve the test to reproduce this hang even without unnecessary
    preemptions in KFD, so we can investigate the issue

Regards,
   Felix


>>
>>
>>> I think the reason is the extra queue evicts make HWS too busy to 
>>> preempt existing queues. There is one unmap_queue packet sent to HWS 
>>> in current code, and will be three unmap_queue packets with unified 
>>> memory allocation.
>>
>> When queues of a process are already evicted, they should not get 
>> evicted again. That's handled by the qpd->evicted counter. There 
>> should never be multiple unmap_queues packets in flight at the same 
>> time. If you're seeing three unmap_queues, you should also see queues 
>> restored three times.
>>
>> HWS should never be too busy to evict queues. If you're seeing 
>> preemptions fail, you may have found a bug.
> The restore delay worker will do something differently in term of 
> timing. It could restore queues before next unmap_queues, so the 
> situation is too complicate to debug in multiple queues evict/restore 
> environment. The error definitely means there is a bug, from driver 
> point of view there is nothing wrong even adding extra queue eviction, 
> so I try to avoid extra queue eviction and keep it as before. The 
> bottom line is to make sure unified svm range for context save area 
> doesn't cause any failure in kfdtest, so I can theoretically assume 
> extra queue eviction/restoring causes HWS failure.
>
> Regards,
> Eric
>>
>> Regards,
>>   Felix
>>
>>
>>> So this optimization will keep only one unmap_queue as before.
>>>
>>> Regards,
>>> Eric
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>> +
>>>>>       p = container_of(svms, struct kfd_process, svms);
>>>>>         pr_debug("invalidate svms 0x%p prange [0x%lx 0x%lx] [0x%lx 
>>>>> 0x%lx]\n",
>>>>>            svms, prange->start, prange->last, start, last);
>>>>>   -    if (!p->xnack_enabled) {
>>>>> +    if (!p->xnack_enabled ||
>>>>> +        (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) {
>>>>>           int evicted_ranges;
>>>>>             list_for_each_entry(pchild, &prange->child_list, 
>>>>> child_list) {
>>>>> @@ -3321,7 +3325,9 @@ svm_range_set_attr(struct kfd_process *p, 
>>>>> struct mm_struct *mm,
>>>>>           if (r)
>>>>>               goto out_unlock_range;
>>>>>   -        if (migrated && !p->xnack_enabled) {
>>>>> +        if (migrated && (!p->xnack_enabled ||
>>>>> +            (prange->flags & 
>>>>> KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
>>>>> +            prange->mapped_to_gpu) {
>>>>>               pr_debug("restore_work will update mappings of 
>>>>> GPUs\n");
>>>>>               mutex_unlock(&prange->migrate_mutex);
>>>>>               continue;
>>>
>

* [PATCH 3/4] libhsakmt: add new flags for svm
@ 2022-06-28 21:48 Eric Huang
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Huang @ 2022-06-28 21:48 UTC (permalink / raw)
  To: amd-gfx; +Cc: Eric Huang, Philip.Yang, felix.kuehling

Add a new flag to keep the GPU mapping always valid.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492
---
 include/linux/kfd_ioctl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h
index 8a0ed49..5c45f58 100644
--- a/include/linux/kfd_ioctl.h
+++ b/include/linux/kfd_ioctl.h
@@ -1069,6 +1069,8 @@ struct kfd_ioctl_cross_memory_copy_args {
 #define KFD_IOCTL_SVM_FLAG_GPU_EXEC    0x00000010
 /* GPUs mostly read, may allow similar optimizations as RO, but writes fault */
 #define KFD_IOCTL_SVM_FLAG_GPU_READ_MOSTLY     0x00000020
+/* Keep GPU memory mapping always valid as if XNACK is disabled */
+#define KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED   0x00000040
 
 /**
  * kfd_ioctl_svm_op - SVM ioctl operations
-- 
2.25.1

