* [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Guchun Chen @ 2022-03-15  7:09 UTC
  To: amd-gfx, hawking.zhang, christian.koenig, xinhui.pan, alexander.deucher
  Cc: Guchun Chen

On GPUs with RAS enabled, the error message below is observed when
suspending or shutting down the device. The cause is that the memory
wipe flag is enabled by default for BOs on such GPUs, so these BOs
are wiped by amdgpu_fill_buffer when they are released. Because the
ring is already off at that point, the wipe fails and this error
message is logged. So add a suspend/shutdown check before wiping
memory.

[drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.

Fixes: e7e7c87a205d ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 23c9a60693ee..ed1a19be4a54 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1284,6 +1284,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, uint64_t *vram_mem,
  */
 void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 {
+	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
 	struct dma_fence *fence = NULL;
 	struct amdgpu_bo *abo;
 	int r;
@@ -1303,7 +1304,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 		amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
 
 	if (bo->resource->mem_type != TTM_PL_VRAM ||
-	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+		!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE) ||
+		adev->in_suspend || adev->shutdown)
 		return;
 
 	if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))
-- 
2.17.1
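
A minimal sketch of the failure mode described in the commit message
above, and of how the added check avoids it. Everything here is a
hypothetical stand-in (the demo_* names, the ring_ready field, and the
-EINVAL return value are assumptions made for illustration); it is not
the real amdgpu code, only a model of "wipe on release fails once the
ring is off".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in; not the real struct amdgpu_device. */
struct demo_device {
	bool ring_ready;	/* false once the ring has been turned off */
	bool in_suspend;
	bool shutdown;
	int wipe_errors;	/* number of wipes that failed (ring was off) */
};

/* Models the failure: a clear cannot be submitted without a running ring. */
static int demo_fill_buffer(struct demo_device *adev)
{
	if (!adev->ring_ready) {
		adev->wipe_errors++;	/* stands in for the logged error */
		return -EINVAL;
	}
	return 0;
}

/* Models the release path; "guarded" selects the behaviour with the check. */
static void demo_release_notify(struct demo_device *adev, bool guarded)
{
	/* With the check, no wipe is attempted during suspend/shutdown. */
	if (guarded && (adev->in_suspend || adev->shutdown))
		return;

	(void)demo_fill_buffer(adev);
}
```

Calling demo_release_notify() with guarded == false on a device whose
ring is off and which is in suspend increments wipe_errors; with
guarded == true the function returns before any wipe is attempted.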



* Re: [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Christian König @ 2022-03-15  7:35 UTC
  To: Guchun Chen, amd-gfx, hawking.zhang, xinhui.pan, alexander.deucher



On 15.03.22 at 08:09, Guchun Chen wrote:
> On GPUs with RAS enabled, the error message below is observed when
> suspending or shutting down the device. The cause is that the memory
> wipe flag is enabled by default for BOs on such GPUs, so these BOs
> are wiped by amdgpu_fill_buffer when they are released. Because the
> ring is already off at that point, the wipe fails and this error
> message is logged. So add a suspend/shutdown check before wiping
> memory.
>
> [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>
> Fixes: e7e7c87a205d ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled")
> Signed-off-by: Guchun Chen <guchun.chen@amd.com>

Just one nit below, but either way the patch is
Reviewed-by: Christian König <christian.koenig@amd.com>.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 23c9a60693ee..ed1a19be4a54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1284,6 +1284,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, uint64_t *vram_mem,
>    */
>   void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>   {
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
>   	struct dma_fence *fence = NULL;
>   	struct amdgpu_bo *abo;
>   	int r;
> @@ -1303,7 +1304,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>   		amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
>   
>   	if (bo->resource->mem_type != TTM_PL_VRAM ||
> -	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
> +		!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE) ||
> +		adev->in_suspend || adev->shutdown)

What editor and settings are you using?

When an if has a multi-line condition, each continuation line should
start just after the ( of the first line, but this one uses two tabs
instead.

Regards,
Christian.
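
As an illustration of the alignment convention described above, here is
a small self-contained sketch. The sketch_* names and the constant
values are hypothetical and exist only for this example; the point is
the layout of the condition. With 8-column tabs, "if (" takes four
columns after the leading tab, so each continuation line is one tab
followed by four spaces.

```c
#include <stdbool.h>

/* Hypothetical memory-type and flag values, chosen only for this sketch. */
#define SKETCH_PL_VRAM			2u
#define SKETCH_WIPE_ON_RELEASE		(1u << 0)

struct sketch_state {
	unsigned int mem_type;
	unsigned int flags;
	bool in_suspend;
	bool shutdown;
};

/* Returns true when the wipe should be skipped. */
static bool sketch_skip_wipe(const struct sketch_state *s)
{
	/*
	 * Continuation lines start in the column just after the
	 * opening parenthesis of the if: one tab plus four spaces.
	 */
	if (s->mem_type != SKETCH_PL_VRAM ||
	    !(s->flags & SKETCH_WIPE_ON_RELEASE) ||
	    s->in_suspend || s->shutdown)
		return true;

	return false;
}
```

The predicate mirrors the shape of the condition under review: the wipe
is skipped when the buffer is not in VRAM, when the wipe flag is not
set, or when the device is suspending or shutting down.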

>   		return;
>   
>   	if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))



* RE: [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Chen, Guchun @ 2022-03-15  7:49 UTC
  To: Koenig, Christian, amd-gfx, Zhang, Hawking, Pan, Xinhui, Deucher,
	Alexander

I used two tabs in VIM. Let me update this later.

Regards,
Guchun


* Re: [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Christian König @ 2022-03-15  7:51 UTC
  To: Chen, Guchun, amd-gfx, Zhang, Hawking, Pan, Xinhui, Deucher, Alexander

Try installing the linuxsty vim plugin. It should give you the correct 
coding style settings for vim.

I've been using it for years and can only recommend it.

Regards,
Christian.


* Re: [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Felix Kuehling @ 2022-03-15 15:38 UTC
  To: Guchun Chen, amd-gfx, hawking.zhang, christian.koenig,
	xinhui.pan, alexander.deucher

On 2022-03-15 at 03:54, Guchun Chen wrote:
> On GPUs with RAS enabled, the error message below is observed when
> suspending or shutting down the device. The cause is that the memory
> wipe flag is enabled by default for BOs on such GPUs, so these BOs
> are wiped by amdgpu_fill_buffer when they are released. Because the
> ring is already off at that point, the wipe fails and this error
> message is logged. So add a suspend/shutdown check before wiping
> memory.
>
> [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.
>
> v2: fix coding style issue
>
> Fixes: e7e7c87a205d ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled")
> Signed-off-by: Guchun Chen <guchun.chen@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>

Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 23c9a60693ee..c712d7f5e8a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1284,6 +1284,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, uint64_t *vram_mem,
>    */
>   void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>   {
> +	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
>   	struct dma_fence *fence = NULL;
>   	struct amdgpu_bo *abo;
>   	int r;
> @@ -1303,7 +1304,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
>   		amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
>   
>   	if (bo->resource->mem_type != TTM_PL_VRAM ||
> -	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
> +	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE) ||
> +	    adev->in_suspend || adev->shutdown)
>   		return;
>   
>   	if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))


* [PATCH 1/2] drm/amdgpu: prevent memory wipe in suspend/shutdown stage
From: Guchun Chen @ 2022-03-15  7:54 UTC
  To: amd-gfx, hawking.zhang, christian.koenig, xinhui.pan, alexander.deucher
  Cc: Guchun Chen

On GPUs with RAS enabled, the error message below is observed when
suspending or shutting down the device. The cause is that the memory
wipe flag is enabled by default for BOs on such GPUs, so these BOs
are wiped by amdgpu_fill_buffer when they are released. Because the
ring is already off at that point, the wipe fails and this error
message is logged. So add a suspend/shutdown check before wiping
memory.

[drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off.

v2: fix coding style issue

Fixes: e7e7c87a205d ("drm/amdgpu: Wipe all VRAM on free when RAS is enabled")
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 23c9a60693ee..c712d7f5e8a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1284,6 +1284,7 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo, uint64_t *vram_mem,
  */
 void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 {
+	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
 	struct dma_fence *fence = NULL;
 	struct amdgpu_bo *abo;
 	int r;
@@ -1303,7 +1304,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
 		amdgpu_amdkfd_remove_fence_on_pt_pd_bos(abo);
 
 	if (bo->resource->mem_type != TTM_PL_VRAM ||
-	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
+	    !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE) ||
+	    adev->in_suspend || adev->shutdown)
 		return;
 
 	if (WARN_ON_ONCE(!dma_resv_trylock(bo->base.resv)))
-- 
2.17.1


