* [PATCH] drm/amdkfd: start_cpsch don't map queues
@ 2023-07-24 17:52 Philip Yang
  2023-07-24 19:58 ` Felix Kuehling
  2023-07-25 17:06 ` Michel Dänzer
  0 siblings, 2 replies; 4+ messages in thread
From: Philip Yang @ 2023-07-24 17:52 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling, michel

Mapping queues in start_cpsch during kfd_init_node races with IOMMUv2
init and causes the gfx ring test to fail later. Remove the mapping
from start_cpsch, because queues are mapped anyway when they are
created and when they are resumed.

Reported-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 71b7f16c0173..a2d0d0bcf853 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1658,9 +1658,6 @@ static int start_cpsch(struct device_queue_manager *dqm)
 	dqm->is_resetting = false;
 	dqm->sched_running = true;
 
-	if (!dqm->dev->kfd->shared_resources.enable_mes)
-		execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
-
 	/* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU */
 	if (amdgpu_emu_mode == 0 && dqm->dev->adev->gmc.is_app_apu &&
 	    (KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 3))) {
-- 
2.35.1



* Re: [PATCH] drm/amdkfd: start_cpsch don't map queues
  2023-07-24 17:52 [PATCH] drm/amdkfd: start_cpsch don't map queues Philip Yang
@ 2023-07-24 19:58 ` Felix Kuehling
  2023-07-25 17:06 ` Michel Dänzer
  1 sibling, 0 replies; 4+ messages in thread
From: Felix Kuehling @ 2023-07-24 19:58 UTC (permalink / raw)
  To: Philip Yang, amd-gfx; +Cc: michel

On 2023-07-24 13:52, Philip Yang wrote:
> Mapping queues in start_cpsch during kfd_init_node races with IOMMUv2
> init and causes the gfx ring test to fail later. Remove the mapping
> from start_cpsch, because queues are mapped anyway when they are
> created and when they are resumed.
>
> Reported-by: Michel Dänzer <michel@daenzer.net>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Michel, can you test whether this fixes your regression on Raven? Would 
be good to get a Tested-by for this patch, since we haven't been able to 
reproduce the problem yet.

Thanks,
   Felix


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ---
>   1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 71b7f16c0173..a2d0d0bcf853 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1658,9 +1658,6 @@ static int start_cpsch(struct device_queue_manager *dqm)
>   	dqm->is_resetting = false;
>   	dqm->sched_running = true;
>   
> -	if (!dqm->dev->kfd->shared_resources.enable_mes)
> -		execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
> -
>   	/* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU */
>   	if (amdgpu_emu_mode == 0 && dqm->dev->adev->gmc.is_app_apu &&
>   	    (KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 3))) {


* Re: [PATCH] drm/amdkfd: start_cpsch don't map queues
  2023-07-24 17:52 [PATCH] drm/amdkfd: start_cpsch don't map queues Philip Yang
  2023-07-24 19:58 ` Felix Kuehling
@ 2023-07-25 17:06 ` Michel Dänzer
  2023-07-27 14:26   ` Michel Dänzer
  1 sibling, 1 reply; 4+ messages in thread
From: Michel Dänzer @ 2023-07-25 17:06 UTC (permalink / raw)
  To: Philip Yang; +Cc: Felix.Kuehling, amd-gfx

On 7/24/23 19:52, Philip Yang wrote:
> Mapping queues in start_cpsch during kfd_init_node races with IOMMUv2
> init and causes the gfx ring test to fail later. Remove the mapping
> from start_cpsch, because queues are mapped anyway when they are
> created and when they are resumed.
> 
> Reported-by: Michel Dänzer <michel@daenzer.net>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

This patch doesn't help for any of the symptoms I've described, I'm afraid.


iommu=pt on the kernel command line avoids the IB test failures for the compute rings, but doesn't help for any of the other symptoms either, which still leaves the system unusable overall.


Sorry I don't have better news,


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer



* Re: [PATCH] drm/amdkfd: start_cpsch don't map queues
  2023-07-25 17:06 ` Michel Dänzer
@ 2023-07-27 14:26   ` Michel Dänzer
  0 siblings, 0 replies; 4+ messages in thread
From: Michel Dänzer @ 2023-07-27 14:26 UTC (permalink / raw)
  To: Philip Yang; +Cc: Felix.Kuehling, amd-gfx

On 7/25/23 19:06, Michel Dänzer wrote:
> On 7/24/23 19:52, Philip Yang wrote:
>> Mapping queues in start_cpsch during kfd_init_node races with IOMMUv2
>> init and causes the gfx ring test to fail later. Remove the mapping
>> from start_cpsch, because queues are mapped anyway when they are
>> created and when they are resumed.
>>
>> Reported-by: Michel Dänzer <michel@daenzer.net>
>> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
>  
> This patch doesn't help for any of the symptoms I've described, I'm afraid.

Actually, I failed to check one thing before:

The patch fixed both IOMMU page faults. The IB tests on the compute rings still failed though.

Interestingly, with iommu=pt there was still one IOMMU page fault, even with this patch:

 amdgpu 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x14105a380 flags=0x0070]


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer

