* [PATCH] drm/amdgpu: Fix some unload driver issues
@ 2021-03-05  9:04 Emily Deng
  2021-03-05  9:53 ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Emily Deng @ 2021-03-05  9:04 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

If there is a memory leak, ttm_bo_force_list_clean ->
ttm_mem_evict_first may still run into an issue during unload.

Set adev->gart.ptr to NULL so that amdgpu_gart_unbind, which is called
from amdgpu_bo_fini after gart_fini, does not call
amdgpu_gmc_set_pte_pde with a pointer that is no longer valid.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 23823a57374f..f1ede4b43d07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct amdgpu_device *adev)
 		return;
 	}
 	amdgpu_bo_unref(&adev->gart.bo);
+	adev->gart.ptr = NULL;
 }
 
 /*
-- 
2.25.1
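
The guard that this one-line change relies on can be sketched in user-space C. This is only an illustration under assumptions: struct fake_gart and the fake_gart_* functions are hypothetical stand-ins, not amdgpu code, and the real amdgpu_gart_unbind does more bookkeeping than shown here. The idea is that clearing the pointer when the table is freed lets a late unbind (for example one triggered for a leaked BO) skip the page table write instead of touching freed memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GART state kept in the device struct. */
struct fake_gart {
	uint64_t *ptr;           /* CPU view of the page table, NULL once freed */
	size_t num_entries;      /* table size, left unchanged by the free path */
	unsigned int pte_writes; /* counts table writes, for demonstration only */
};

/* Allocate a zeroed table with n entries; returns 0 on success, -1 on failure. */
int fake_gart_table_alloc(struct fake_gart *gart, size_t n)
{
	assert(gart != NULL);
	gart->ptr = calloc(n, sizeof(*gart->ptr));
	if (!gart->ptr)
		return -1;
	gart->num_entries = n;
	gart->pte_writes = 0;
	return 0;
}

/* Free the table and clear the pointer so later callers can detect that. */
void fake_gart_table_free(struct fake_gart *gart)
{
	assert(gart != NULL);
	free(gart->ptr);
	gart->ptr = NULL; /* mirrors the spirit of "adev->gart.ptr = NULL" */
}

/* Clear a range of entries; skips the table write if the table is gone. */
int fake_gart_unbind(struct fake_gart *gart, size_t first, size_t count)
{
	size_t i;

	assert(gart != NULL);
	if (first > gart->num_entries || count > gart->num_entries - first)
		return -1;

	if (!gart->ptr)
		return 0; /* table already torn down, nothing to write */

	for (i = 0; i < count; i++) {
		gart->ptr[first + i] = 0;
		gart->pte_writes++;
	}
	return 0;
}
```

With this shape, an unbind that arrives after fake_gart_table_free() returns 0 without writing anything. Without the NULL assignment, the same call would write through a dangling pointer.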

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/amdgpu: Fix some unload driver issues
  2021-03-05  9:04 [PATCH] drm/amdgpu: Fix some unload driver issues Emily Deng
@ 2021-03-05  9:53 ` Christian König
  0 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2021-03-05  9:53 UTC (permalink / raw)
  To: Emily Deng, amd-gfx

On 05.03.21 at 10:04, Emily Deng wrote:
> If there is a memory leak, ttm_bo_force_list_clean ->
> ttm_mem_evict_first may still run into an issue during unload.
>
> Set adev->gart.ptr to NULL so that amdgpu_gart_unbind, which is called
> from amdgpu_bo_fini after gart_fini, does not call
> amdgpu_gmc_set_pte_pde with a pointer that is no longer valid.
>
> Signed-off-by: Emily Deng <Emily.Deng@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 23823a57374f..f1ede4b43d07 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct amdgpu_device *adev)
>   		return;
>   	}
>   	amdgpu_bo_unref(&adev->gart.bo);
> +	adev->gart.ptr = NULL;
>   }
>   
>   /*


* RE: [PATCH] drm/amdgpu: Fix some unload driver issues
  2021-03-05  8:51     ` Christian König
@ 2021-03-05  8:52       ` Deng, Emily
  0 siblings, 0 replies; 7+ messages in thread
From: Deng, Emily @ 2021-03-05  8:52 UTC (permalink / raw)
  To: Christian König, amd-gfx

[AMD Official Use Only - Internal Distribution Only]

>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Friday, March 5, 2021 4:52 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>
>
>
>On 05.03.21 at 09:43, Deng, Emily wrote:
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>>> Sent: Friday, March 5, 2021 3:55 PM
>>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>>>
>>> On 05.03.21 at 02:20, Emily Deng wrote:
>>>> When unloading the driver after killing some applications, an SDMA
>>>> TLB flush job triggered by ttm_bo_delay_delete times out. To avoid
>>>> submitting jobs after fence driver fini, call
>>>> ttm_bo_lock_delayed_workqueue before fence driver fini. Also move
>>>> drm_sched_fini before waiting for the fences.
>>>
>>> Good catch, Reviewed-by: Christian König <christian.koenig@amd.com>
>>> for this part.
>>>
>>>> Set adev->gart.ptr to NULL to fix a NULL pointer issue when calling
>>>> amdgpu_gart_unbind in amdgpu_bo_fini, which runs after gart_fini.
>>> For that one I'm wondering if we shouldn't rather rework or double
>>> check the tear down order.
>>>
>>> On the other hand the hardware should be idle by now (this comes
>>> after the fence_driver_fini, doesn't it?) and it looks cleaner on its own.
>> Yes, you are right. Leaving memory leaks aside, with the above patch all
>> BOs are cleaned up, so there won't be an issue. But if there is a memory
>> leak, couldn't it still hit an issue in
>> ttm_bo_force_list_clean -> ttm_mem_evict_first?
>
>Yeah, that is a good argument and part of what I mean with it looks cleaner on
>its own.
>
>Maybe write that into the commit message like this. With that done the full
>patch has my rb.
>
>Regards,
>Christian.
Ok, thanks.


* Re: [PATCH] drm/amdgpu: Fix some unload driver issues
  2021-03-05  8:43   ` Deng, Emily
@ 2021-03-05  8:51     ` Christian König
  2021-03-05  8:52       ` Deng, Emily
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-03-05  8:51 UTC (permalink / raw)
  To: Deng, Emily, amd-gfx



On 05.03.21 at 09:43, Deng, Emily wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Friday, March 5, 2021 3:55 PM
>> To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>>
>> On 05.03.21 at 02:20, Emily Deng wrote:
>>> When unloading the driver after killing some applications, an SDMA
>>> TLB flush job triggered by ttm_bo_delay_delete times out. To avoid
>>> submitting jobs after fence driver fini, call
>>> ttm_bo_lock_delayed_workqueue before fence driver fini. Also move
>>> drm_sched_fini before waiting for the fences.
>>
>> Good catch, Reviewed-by: Christian König <christian.koenig@amd.com> for
>> this part.
>>
>>> Set adev->gart.ptr to NULL to fix a NULL pointer issue when calling
>>> amdgpu_gart_unbind in amdgpu_bo_fini, which runs after gart_fini.
>> For that one I'm wondering if we shouldn't rather rework or double check the
>> tear down order.
>>
>> On the other hand the hardware should be idle by now (this comes after the
>> fence_driver_fini, doesn't it?) and it looks cleaner on its own.
> Yes, you are right. Leaving memory leaks aside, with the above patch all
> BOs are cleaned up, so there won't be an issue. But if there is a memory
> leak, couldn't it still hit an issue in
> ttm_bo_force_list_clean -> ttm_mem_evict_first?

Yeah, that is a good argument and part of what I meant by "it looks 
cleaner on its own".

Maybe write that into the commit message like this. With that done the 
full patch has my rb.

Regards,
Christian.



* RE: [PATCH] drm/amdgpu: Fix some unload driver issues
  2021-03-05  7:54 ` Christian König
@ 2021-03-05  8:43   ` Deng, Emily
  2021-03-05  8:51     ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Deng, Emily @ 2021-03-05  8:43 UTC (permalink / raw)
  To: Christian König, amd-gfx


>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@gmail.com>
>Sent: Friday, March 5, 2021 3:55 PM
>To: Deng, Emily <Emily.Deng@amd.com>; amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH] drm/amdgpu: Fix some unload driver issues
>
>On 05.03.21 at 02:20, Emily Deng wrote:
>> When unloading the driver after killing some applications, an SDMA
>> TLB flush job triggered by ttm_bo_delay_delete times out. To avoid
>> submitting jobs after fence driver fini, call
>> ttm_bo_lock_delayed_workqueue before fence driver fini. Also move
>> drm_sched_fini before waiting for the fences.
>
>Good catch, Reviewed-by: Christian König <christian.koenig@amd.com> for
>this part.
>
>> Set adev->gart.ptr to NULL to fix a NULL pointer issue when calling
>> amdgpu_gart_unbind in amdgpu_bo_fini, which runs after gart_fini.
>
>For that one I'm wondering if we shouldn't rather rework or double check the
>tear down order.
>
>On the other hand the hardware should be idle by now (this comes after the
>fence_driver_fini, doesn't it?) and it looks cleaner on its own.
Yes, you are right. Leaving memory leaks aside, with the above patch all BOs
are cleaned up, so there won't be an issue. But if there is a memory leak,
couldn't it still hit an issue in
ttm_bo_force_list_clean -> ttm_mem_evict_first?



* Re: [PATCH] drm/amdgpu: Fix some unload driver issues
  2021-03-05  1:20 Emily Deng
@ 2021-03-05  7:54 ` Christian König
  2021-03-05  8:43   ` Deng, Emily
  0 siblings, 1 reply; 7+ messages in thread
From: Christian König @ 2021-03-05  7:54 UTC (permalink / raw)
  To: Emily Deng, amd-gfx

On 05.03.21 at 02:20, Emily Deng wrote:
> When unloading the driver after killing some applications, an SDMA
> TLB flush job triggered by ttm_bo_delay_delete times out. To avoid
> submitting jobs after fence driver fini, call
> ttm_bo_lock_delayed_workqueue before fence driver fini. Also move
> drm_sched_fini before waiting for the fences.

Good catch, Reviewed-by: Christian König <christian.koenig@amd.com> for 
this part.

> Set adev->gart.ptr to NULL to fix a NULL pointer issue when calling
> amdgpu_gart_unbind in amdgpu_bo_fini, which runs after gart_fini.

For that one I'm wondering if we shouldn't rather rework or double check 
the tear down order.

On the other hand the hardware should be idle by now (this comes after 
the fence_driver_fini, doesn't it?) and it looks cleaner on its own.

Regards,
Christian.

>
> Signed-off-by: Emily Deng <Emily.Deng@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 5 +++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 1 +
>   3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a11760ec3924..de0597d34588 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>   {
>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>   	flush_delayed_work(&adev->delayed_init_work);
> +	ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
>   	adev->shutdown = true;
>   
>   	kfree(adev->pci_state);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 143a14f4866f..6d16f58ac91e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>   
>   		if (!ring || !ring->fence_drv.initialized)
>   			continue;
> +		if (!ring->no_scheduler)
> +			drm_sched_fini(&ring->sched);
>   		r = amdgpu_fence_wait_empty(ring);
>   		if (r) {
>   			/* no need to trigger GPU reset as we are unloading */
> @@ -539,8 +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>   		if (ring->fence_drv.irq_src)
>   			amdgpu_irq_put(adev, ring->fence_drv.irq_src,
>   				       ring->fence_drv.irq_type);
> -		if (!ring->no_scheduler)
> -			drm_sched_fini(&ring->sched);
> +
>   		del_timer_sync(&ring->fence_drv.fallback_timer);
>   		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>   			dma_fence_put(ring->fence_drv.fences[j]);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index 23823a57374f..f1ede4b43d07 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct amdgpu_device *adev)
>   		return;
>   	}
>   	amdgpu_bo_unref(&adev->gart.bo);
> +	adev->gart.ptr = NULL;
>   }
>   
>   /*


* [PATCH] drm/amdgpu: Fix some unload driver issues
@ 2021-03-05  1:20 Emily Deng
  2021-03-05  7:54 ` Christian König
  0 siblings, 1 reply; 7+ messages in thread
From: Emily Deng @ 2021-03-05  1:20 UTC (permalink / raw)
  To: amd-gfx; +Cc: Emily Deng

When unloading the driver after killing some applications, an SDMA
TLB flush job triggered by ttm_bo_delay_delete times out. To avoid
submitting jobs after fence driver fini, call
ttm_bo_lock_delayed_workqueue before fence driver fini. Also move
drm_sched_fini before waiting for the fences.

Set adev->gart.ptr to NULL to fix a NULL pointer issue when calling
amdgpu_gart_unbind in amdgpu_bo_fini, which runs after gart_fini.

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a11760ec3924..de0597d34588 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3594,6 +3594,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 {
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
+	ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
 	adev->shutdown = true;
 
 	kfree(adev->pci_state);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 143a14f4866f..6d16f58ac91e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -531,6 +531,8 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
 
 		if (!ring || !ring->fence_drv.initialized)
 			continue;
+		if (!ring->no_scheduler)
+			drm_sched_fini(&ring->sched);
 		r = amdgpu_fence_wait_empty(ring);
 		if (r) {
 			/* no need to trigger GPU reset as we are unloading */
@@ -539,8 +541,7 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
 		if (ring->fence_drv.irq_src)
 			amdgpu_irq_put(adev, ring->fence_drv.irq_src,
 				       ring->fence_drv.irq_type);
-		if (!ring->no_scheduler)
-			drm_sched_fini(&ring->sched);
+
 		del_timer_sync(&ring->fence_drv.fallback_timer);
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
 			dma_fence_put(ring->fence_drv.fences[j]);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index 23823a57374f..f1ede4b43d07 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -202,6 +202,7 @@ void amdgpu_gart_table_vram_free(struct amdgpu_device *adev)
 		return;
 	}
 	amdgpu_bo_unref(&adev->gart.bo);
+	adev->gart.ptr = NULL;
 }
 
 /*
-- 
2.25.1


Thread overview: 7+ messages
2021-03-05  9:04 [PATCH] drm/amdgpu: Fix some unload driver issues Emily Deng
2021-03-05  9:53 ` Christian König
  -- strict thread matches above, loose matches on Subject: below --
2021-03-05  1:20 Emily Deng
2021-03-05  7:54 ` Christian König
2021-03-05  8:43   ` Deng, Emily
2021-03-05  8:51     ` Christian König
2021-03-05  8:52       ` Deng, Emily
