All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdkfd: fix cp hang in eviction
@ 2019-07-10 15:11 Huang, JinHuiEric
  0 siblings, 0 replies; 4+ messages in thread
From: Huang, JinHuiEric @ 2019-07-10 15:11 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Huang, JinHuiEric, Kuehling, Felix

The cp hang occurs in OCL conformance test only on supermicro
platform which has 40 cores and the test generates 40 threads.
The root cause is race condition in non-protected flags.

The fix is to add flags of is_evicted and is_active(init_mqd())
into protected area.

Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 9ffdda5..535c981 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 
 	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
 			q->properties.type)];
-	/*
-	 * Eviction state logic: mark all queues as evicted, even ones
-	 * not currently active. Restoring inactive queues later only
-	 * updates the is_evicted flag but is a no-op otherwise.
-	 */
-	q->properties.is_evicted = !!qpd->evicted;
+
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1173,9 +1168,17 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out_deallocate_doorbell;
 	}
+
+	dqm_lock(dqm);
+	/*
+	 * Eviction state logic: mark all queues as evicted, even ones
+	 * not currently active. Restoring inactive queues later only
+	 * updates the is_evicted flag but is a no-op otherwise.
+	 */
+	q->properties.is_evicted = !!qpd->evicted;
+	q->properties.is_suspended = false;
 	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
-	dqm_lock(dqm);
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/amdkfd: fix cp hang in eviction
       [not found] ` <1562772046-7681-1-git-send-email-JinhuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
  2019-07-11 14:12   ` Huang, JinHuiEric
@ 2019-07-11 16:38   ` Kuehling, Felix
  1 sibling, 0 replies; 4+ messages in thread
From: Kuehling, Felix @ 2019-07-11 16:38 UTC (permalink / raw)
  To: Huang, JinHuiEric, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2019-07-10 11:20 a.m., Huang, JinHuiEric wrote:
> The cp hang occurs in OCL conformance test only on supermicro
> platform which has 40 cores and the test generates 40 threads.
> The root cause is race condition in non-protected flags.
>
> The fix is to add flags of is_evicted and is_active(init_mqd())
> into protected area.
>
> Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>

Sorry, I missed this one. I only saw the one earlier that you recalled.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 16 +++++++++-------
>   1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 9ffdda5..f23e17b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>   
>   	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
>   			q->properties.type)];
> -	/*
> -	 * Eviction state logic: mark all queues as evicted, even ones
> -	 * not currently active. Restoring inactive queues later only
> -	 * updates the is_evicted flag but is a no-op otherwise.
> -	 */
> -	q->properties.is_evicted = !!qpd->evicted;
> +
>   	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
>   		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
>   		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
> @@ -1173,9 +1168,16 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>   		retval = -ENOMEM;
>   		goto out_deallocate_doorbell;
>   	}
> +
> +	dqm_lock(dqm);
> +	/*
> +	 * Eviction state logic: mark all queues as evicted, even ones
> +	 * not currently active. Restoring inactive queues later only
> +	 * updates the is_evicted flag but is a no-op otherwise.
> +	 */
> +	q->properties.is_evicted = !!qpd->evicted;
>   	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
>   				&q->gart_mqd_addr, &q->properties);
> -	dqm_lock(dqm);
>   
>   	list_add(&q->list, &qpd->queues_list);
>   	qpd->queue_count++;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/amdkfd: fix cp hang in eviction
       [not found] ` <1562772046-7681-1-git-send-email-JinhuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
@ 2019-07-11 14:12   ` Huang, JinHuiEric
  2019-07-11 16:38   ` Kuehling, Felix
  1 sibling, 0 replies; 4+ messages in thread
From: Huang, JinHuiEric @ 2019-07-11 14:12 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Kuehling, Felix

ping.

On 2019-07-10 11:20 a.m., Huang, JinHuiEric wrote:
> The cp hang occurs in OCL conformance test only on supermicro
> platform which has 40 cores and the test generates 40 threads.
> The root cause is race condition in non-protected flags.
>
> The fix is to add flags of is_evicted and is_active(init_mqd())
> into protected area.
>
> Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 16 +++++++++-------
>   1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 9ffdda5..f23e17b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>   
>   	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
>   			q->properties.type)];
> -	/*
> -	 * Eviction state logic: mark all queues as evicted, even ones
> -	 * not currently active. Restoring inactive queues later only
> -	 * updates the is_evicted flag but is a no-op otherwise.
> -	 */
> -	q->properties.is_evicted = !!qpd->evicted;
> +
>   	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
>   		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
>   		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
> @@ -1173,9 +1168,16 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>   		retval = -ENOMEM;
>   		goto out_deallocate_doorbell;
>   	}
> +
> +	dqm_lock(dqm);
> +	/*
> +	 * Eviction state logic: mark all queues as evicted, even ones
> +	 * not currently active. Restoring inactive queues later only
> +	 * updates the is_evicted flag but is a no-op otherwise.
> +	 */
> +	q->properties.is_evicted = !!qpd->evicted;
>   	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
>   				&q->gart_mqd_addr, &q->properties);
> -	dqm_lock(dqm);
>   
>   	list_add(&q->list, &qpd->queues_list);
>   	qpd->queue_count++;
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] drm/amdkfd: fix cp hang in eviction
@ 2019-07-10 15:20 Huang, JinHuiEric
       [not found] ` <1562772046-7681-1-git-send-email-JinhuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Huang, JinHuiEric @ 2019-07-10 15:20 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Huang, JinHuiEric, Kuehling, Felix

The cp hang occurs in OCL conformance test only on supermicro
platform which has 40 cores and the test generates 40 threads.
The root cause is race condition in non-protected flags.

The fix is to add flags of is_evicted and is_active(init_mqd())
into protected area.

Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 9ffdda5..f23e17b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 
 	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
 			q->properties.type)];
-	/*
-	 * Eviction state logic: mark all queues as evicted, even ones
-	 * not currently active. Restoring inactive queues later only
-	 * updates the is_evicted flag but is a no-op otherwise.
-	 */
-	q->properties.is_evicted = !!qpd->evicted;
+
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1173,9 +1168,16 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out_deallocate_doorbell;
 	}
+
+	dqm_lock(dqm);
+	/*
+	 * Eviction state logic: mark all queues as evicted, even ones
+	 * not currently active. Restoring inactive queues later only
+	 * updates the is_evicted flag but is a no-op otherwise.
+	 */
+	q->properties.is_evicted = !!qpd->evicted;
 	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
-	dqm_lock(dqm);
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-07-11 16:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-10 15:11 [PATCH] drm/amdkfd: fix cp hang in eviction Huang, JinHuiEric
2019-07-10 15:20 Huang, JinHuiEric
     [not found] ` <1562772046-7681-1-git-send-email-JinhuiEric.Huang-5C7GfCeVMHo@public.gmane.org>
2019-07-11 14:12   ` Huang, JinHuiEric
2019-07-11 16:38   ` Kuehling, Felix

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.