All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
@ 2023-02-02  9:54 Jack Xiao
  2023-02-02 11:43 ` AW: " Koenig, Christian
  0 siblings, 1 reply; 14+ messages in thread
From: Jack Xiao @ 2023-02-02  9:54 UTC (permalink / raw)
  To: amd-gfx, Alexander.Deucher, Christian.Koenig; +Cc: Jack Xiao

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
 	if (*bo == NULL)
 		return;
 
-	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+	WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+		!amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
 
 	if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
 		if (cpu_addr)
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-02  9:54 [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable Jack Xiao
@ 2023-02-02 11:43 ` Koenig, Christian
  2023-02-03  6:04   ` Xiao, Jack
  0 siblings, 1 reply; 14+ messages in thread
From: Koenig, Christian @ 2023-02-02 11:43 UTC (permalink / raw)
  To: Xiao, Jack, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 1482 bytes --]

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Xiao, Jack <Jack.Xiao@amd.com>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3


[-- Attachment #2: Type: text/html, Size: 2578 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-02 11:43 ` AW: " Koenig, Christian
@ 2023-02-03  6:04   ` Xiao, Jack
  2023-02-03 13:19     ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao, Jack @ 2023-02-03  6:04 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3

[-- Attachment #2: Type: text/html, Size: 6349 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-03  6:04   ` Xiao, Jack
@ 2023-02-03 13:19     ` Christian König
  2023-02-06  7:23       ` Xiao, Jack
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-03 13:19 UTC (permalink / raw)
  To: Xiao, Jack, Koenig, Christian, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 2489 bytes --]

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those 
can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning 
from that please fix the code which tried to free something during 
suspend instead.

Regards,
Christian.

Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> It's simply illegal to free up memory during suspend.
>
> Why? In my understanding, the limit was caused by DMA shutdown.
>
> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Thursday, February 2, 2023 7:43 PM
> *To:* Xiao, Jack <Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander <Alexander.Deucher@amd.com>
> *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Big NAK to this! This warning is not related in any way to the hw state.
>
> It's simply illegal to free up memory during suspend.
>
> Regards,
>
> Christian.
>
> ------------------------------------------------------------------------
>
> *Von:*Xiao, Jack <Jack.Xiao@amd.com>
> *Gesendet:* Donnerstag, 2. Februar 2023 10:54
> *An:* amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; 
> Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>
> *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
> *Betreff:* [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is 
> unavailable
>
> Reduce waringings, only warn when DMA is unavailable.
>
> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 2d237f3d3a2e..e3e3764ea697 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, 
> u64 *gpu_addr,
>          if (*bo == NULL)
>                  return;
>
> - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
> + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
> + 
> !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>          if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                  if (cpu_addr)
> -- 
> 2.37.3
>

[-- Attachment #2: Type: text/html, Size: 8176 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-03 13:19     ` Christian König
@ 2023-02-06  7:23       ` Xiao, Jack
  2023-02-06  8:05         ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao, Jack @ 2023-02-06  7:23 UTC (permalink / raw)
  To: Christian König, Koenig, Christian, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 3539 bytes --]

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.
Do you know the root cause of these cases hitting the issue? So that we can get an exact point to warn the freeing up behavior.

Thanks,
Jack

From: Christian König <ckoenig.leichtzumerken@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
Am 03.02.23 um 07:04 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3


[-- Attachment #2: Type: text/html, Size: 8941 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-06  7:23       ` Xiao, Jack
@ 2023-02-06  8:05         ` Christian König
  2023-02-06  8:28           ` Xiao, Jack
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-06  8:05 UTC (permalink / raw)
  To: Xiao, Jack, Christian König, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 4116 bytes --]

Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> Nope, that is not related to any hw state.
>
> can use other flag.
>
> >> It's simply not allowed to free up resources during suspend since 
> those can't be acquired again during resume.
>
> The in_suspend flag is set at the beginning of suspend and unset at 
> the end of resume. It can’t filter the case you mentioned.
>

Why not? This is exactly what it should do.

> Do you know the root cause of these cases hitting the issue? So that 
> we can get an exact point to warn the freeing up behavior.
>

Well the root cause are programming errors. See between suspending and 
resuming you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of 
that, we should probably add another warning during allocation of 
memory. But this here is certainly correct.

Regards,
Christian.

> Thanks,
>
> Jack
>
> *From:* Christian König <ckoenig.leichtzumerken@gmail.com>
> *Sent:* Friday, February 3, 2023 9:20 PM
> *To:* Xiao, Jack <Jack.Xiao@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, 
> Alexander <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Nope, that is not related to any hw state.
>
> It's simply not allowed to free up resources during suspend since 
> those can't be acquired again during resume.
>
> We had a couple of cases now where this was wrong. If you get a 
> warning from that please fix the code which tried to free something 
> during suspend instead.
>
> Regards,
> Christian.
>
> Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
>     [AMD Official Use Only - General]
>
>     >> It's simply illegal to free up memory during suspend.
>
>     Why? In my understanding, the limit was caused by DMA shutdown.
>
>     Regards,
>
>     Jack
>
>     *From:* Koenig, Christian <Christian.Koenig@amd.com>
>     <mailto:Christian.Koenig@amd.com>
>     *Sent:* Thursday, February 2, 2023 7:43 PM
>     *To:* Xiao, Jack <Jack.Xiao@amd.com> <mailto:Jack.Xiao@amd.com>;
>     amd-gfx@lists.freedesktop.org; Deucher, Alexander
>     <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>     *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     Big NAK to this! This warning is not related in any way to the hw
>     state.
>
>     It's simply illegal to free up memory during suspend.
>
>     Regards,
>
>     Christian.
>
>     ------------------------------------------------------------------------
>
>     *Von:*Xiao, Jack <Jack.Xiao@amd.com>
>     *Gesendet:* Donnerstag, 2. Februar 2023 10:54
>     *An:* amd-gfx@lists.freedesktop.org
>     <amd-gfx@lists.freedesktop.org>; Deucher, Alexander
>     <Alexander.Deucher@amd.com>; Koenig, Christian
>     <Christian.Koenig@amd.com>
>     *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
>     *Betreff:* [PATCH] drm/amdgpu: only WARN freeing buffers when DMA
>     is unavailable
>
>     Reduce waringings, only warn when DMA is unavailable.
>
>     Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>      1 file changed, 2 insertions(+), 1 deletion(-)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>     index 2d237f3d3a2e..e3e3764ea697 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>     @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo
>     **bo, u64 *gpu_addr,
>              if (*bo == NULL)
>                      return;
>
>     - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>     + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
>     +
>     !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>              if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                      if (cpu_addr)
>     -- 
>     2.37.3
>

[-- Attachment #2: Type: text/html, Size: 12211 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-06  8:05         ` Christian König
@ 2023-02-06  8:28           ` Xiao, Jack
  2023-02-06  8:59             ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao, Jack @ 2023-02-06  8:28 UTC (permalink / raw)
  To: Koenig, Christian, Christian König, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 5073 bytes --]

[AMD Official Use Only - General]

                              >> >> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
                              >> The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

               Why not? This is exactly what it should do.

[Jack] If freeing up resources during resume, it should not hit the issue you described. But only checking in_suspend flag would take these cases as warning.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <Jack.Xiao@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 08:23 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

Why not? This is exactly what it should do.

Do you know the root cause of these cases hitting the issue? So that we can get an exact point to warn the freeing up behavior.

Well the root cause are programming errors. See between suspending and resuming you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of that, we should probably add another warning during allocation of memory. But this here is certainly correct.

Regards,
Christian.


Thanks,
Jack

From: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
Am 03.02.23 um 07:04 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3



[-- Attachment #2: Type: text/html, Size: 12246 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-06  8:28           ` Xiao, Jack
@ 2023-02-06  8:59             ` Christian König
  2023-02-10  4:12               ` Quan, Evan
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-06  8:59 UTC (permalink / raw)
  To: Xiao, Jack, Koenig, Christian, amd-gfx, Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 5883 bytes --]

Am 06.02.23 um 09:28 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
>                >> >> It's simply not allowed to free up resources 
> during suspend since those can't be acquired again during resume.
>
>                               >> The in_suspend flag is set at the 
> beginning of suspend and unset at the end of resume. It can’t filter 
> the case you mentioned.
>
>
>                Why not? This is exactly what it should do.
>
> [Jack] If freeing up resources during resume, it should not hit the 
> issue you described. But only checking in_suspend flag would take 
> these cases as warning.
>

No, once more: Freeing up or allocating resources between suspend and 
resume is illegal!

If you free up a resource during resume you should absolutely hit that, 
this is intentional!

Regards,
Christian.

> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Monday, February 6, 2023 4:06 PM
> *To:* Xiao, Jack <Jack.Xiao@amd.com>; Christian König 
> <ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org; 
> Deucher, Alexander <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
>     [AMD Official Use Only - General]
>
>     >> Nope, that is not related to any hw state.
>
>     can use other flag.
>
>     >> It's simply not allowed to free up resources during suspend
>     since those can't be acquired again during resume.
>
>     The in_suspend flag is set at the beginning of suspend and unset
>     at the end of resume. It can’t filter the case you mentioned.
>
>
> Why not? This is exactly what it should do.
>
>     Do you know the root cause of these cases hitting the issue? So
>     that we can get an exact point to warn the freeing up behavior.
>
>
> Well the root cause are programming errors. See between suspending and 
> resuming you should not allocate nor free memory.
>
> Otherwise we can run into trouble. And this check here is one part of 
> that, we should probably add another warning during allocation of 
> memory. But this here is certainly correct.
>
> Regards,
> Christian.
>
>     Thanks,
>
>     Jack
>
>     *From:* Christian König <ckoenig.leichtzumerken@gmail.com>
>     <mailto:ckoenig.leichtzumerken@gmail.com>
>     *Sent:* Friday, February 3, 2023 9:20 PM
>     *To:* Xiao, Jack <Jack.Xiao@amd.com> <mailto:Jack.Xiao@amd.com>;
>     Koenig, Christian <Christian.Koenig@amd.com>
>     <mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org;
>     Deucher, Alexander <Alexander.Deucher@amd.com>
>     <mailto:Alexander.Deucher@amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     Nope, that is not related to any hw state.
>
>     It's simply not allowed to free up resources during suspend since
>     those can't be acquired again during resume.
>
>     We had a couple of cases now where this was wrong. If you get a
>     warning from that please fix the code which tried to free
>     something during suspend instead.
>
>     Regards,
>     Christian.
>
>     Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
>         [AMD Official Use Only - General]
>
>         >> It's simply illegal to free up memory during suspend.
>
>         Why? In my understanding, the limit was caused by DMA shutdown.
>
>         Regards,
>
>         Jack
>
>         *From:* Koenig, Christian <Christian.Koenig@amd.com>
>         <mailto:Christian.Koenig@amd.com>
>         *Sent:* Thursday, February 2, 2023 7:43 PM
>         *To:* Xiao, Jack <Jack.Xiao@amd.com>
>         <mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org;
>         Deucher, Alexander <Alexander.Deucher@amd.com>
>         <mailto:Alexander.Deucher@amd.com>
>         *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing buffers
>         when DMA is unavailable
>
>         Big NAK to this! This warning is not related in any way to the
>         hw state.
>
>         It's simply illegal to free up memory during suspend.
>
>         Regards,
>
>         Christian.
>
>         ------------------------------------------------------------------------
>
>         *Von:*Xiao, Jack <Jack.Xiao@amd.com>
>         *Gesendet:* Donnerstag, 2. Februar 2023 10:54
>         *An:* amd-gfx@lists.freedesktop.org
>         <amd-gfx@lists.freedesktop.org>; Deucher, Alexander
>         <Alexander.Deucher@amd.com>; Koenig, Christian
>         <Christian.Koenig@amd.com>
>         *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
>         *Betreff:* [PATCH] drm/amdgpu: only WARN freeing buffers when
>         DMA is unavailable
>
>         Reduce waringings, only warn when DMA is unavailable.
>
>         Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
>         ---
>          drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>          1 file changed, 2 insertions(+), 1 deletion(-)
>
>         diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         index 2d237f3d3a2e..e3e3764ea697 100644
>         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct
>         amdgpu_bo **bo, u64 *gpu_addr,
>                  if (*bo == NULL)
>                          return;
>
>         - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>         + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
>         +
>         !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>                  if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                          if (cpu_addr)
>         -- 
>         2.37.3
>

[-- Attachment #2: Type: text/html, Size: 16099 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-06  8:59             ` Christian König
@ 2023-02-10  4:12               ` Quan, Evan
  2023-02-10  8:07                 ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Quan, Evan @ 2023-02-10  4:12 UTC (permalink / raw)
  To: Christian König, Xiao, Jack, Koenig, Christian, amd-gfx,
	Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 9132 bytes --]

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G        W IOE      6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 0000000000000000
[74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: ffffbed6812ebc50
[74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 00000000000001ff
[74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: ffffbed6812ebc70
[74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: ffff9639c7da5630
[74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) knlGS:0000000000000000
[74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 00000000003706e0
[74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[74085.037453] Call Trace:
[74085.039911]  <TASK>
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? kthread_complete_and_exit+0x20/0x20
[74085.110993]  ret_from_fork+0x1f/0x30
[74085.114576]  </TASK>
[74085.116773] ---[ end trace 0000000000000000 ]---

BR
Evan
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Monday, February 6, 2023 5:00 PM
To: Xiao, Jack <Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 09:28 schrieb Xiao, Jack:

[AMD Official Use Only - General]

                              >> >> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
                              >> The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

               Why not? This is exactly what it should do.

[Jack] If freeing up resources during resume, it should not hit the issue you described. But only checking in_suspend flag would take these cases as warning.

No, once more: Freeing up or allocating resources between suspend and resume is illegal!

If you free up a resource during resume you should absolutely hit that, this is intentional!

Regards,
Christian.


Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 08:23 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

Why not? This is exactly what it should do.

Do you know the root cause of these cases hitting the issue? So that we can get an exact point to warn the freeing up behavior.

Well the root cause are programming errors. See between suspending and resuming you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of that, we should probably add another warning during allocation of memory. But this here is certainly correct.

Regards,
Christian.


Thanks,
Jack

From: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
Am 03.02.23 um 07:04 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3




[-- Attachment #2: Type: text/html, Size: 19278 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-10  4:12               ` Quan, Evan
@ 2023-02-10  8:07                 ` Christian König
  2023-02-10  9:51                   ` Xiao, Jack
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-10  8:07 UTC (permalink / raw)
  To: Quan, Evan, Christian König, Xiao, Jack, amd-gfx, Deucher,
	Alexander

[-- Attachment #1: Type: text/plain, Size: 10753 bytes --]

Hi Evan,

yeah, exactly that's what this warning should prevent. Allocating 
buffers temporary for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the kernel 
ring interface in a way we didn't want anyway and is currently replaced 
by Shahanks work.

Regards,
Christian.

Am 10.02.23 um 05:12 schrieb Quan, Evan:
>
> [AMD Official Use Only - General]
>
> Hi Jack,
>
> Are you trying to fix the call trace popped up on resuming below?
>
> It seems mes created some bo for its self test and freed it up later 
> at the final stage of the resuming process.
>
> All these happened before the in_suspend flag cleared. And that 
> triggered the call trace.
>
> Is my understanding correct?
>
> [74084.799260] WARNING: CPU: 2 PID: 2891 at 
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 
> amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
> [74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 
> gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper 
> drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect 
> sysimgblt snd_sm
>
> [74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp 
> hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 
> ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
>
> [74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G 
>        W IOE      6.0.0-custom #1
>
> [74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, 
> BIOS 2004 11/02/2021
>
> [74084.931074] Workqueue: events_unbound async_run_entry_fn
>
> [74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
> [74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 
> e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc 
> cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
>
> [74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
>
> [74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 
> 0000000000000000
>
> [74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: 
> ffffbed6812ebc50
>
> [74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 
> 00000000000001ff
>
> [74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: 
> ffffbed6812ebc70
>
> [74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: 
> ffff9639c7da5630
>
> [74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) 
> knlGS:0000000000000000
>
> [74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
> [74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 
> 00000000003706e0
>
> [74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
>
> [74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
> 0000000000000400
>
> [74085.037453] Call Trace:
>
> [74085.039911]  <TASK>
>
> [74085.042023] amdgpu_mes_self_test+0x385/0x460 [amdgpu]
>
> [74085.047293] mes_v11_0_late_init+0x44/0x50 [amdgpu]
>
> [74085.052291] amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
>
> [74085.058032] amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
>
> [74085.063187] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
>
> [74085.068162]  pci_pm_resume+0x68/0x100
>
> [74085.071836]  ? pci_legacy_resume+0x80/0x80
>
> [74085.075943]  dpm_run_callback+0x4c/0x160
>
> [74085.079873]  device_resume+0xad/0x210
>
> [74085.083546]  async_resume+0x1e/0x40
>
> [74085.087046] async_run_entry_fn+0x30/0x120
>
> [74085.091152] process_one_work+0x21a/0x3f0
>
> [74085.095173]  worker_thread+0x50/0x3e0
>
> [74085.098845]  ? process_one_work+0x3f0/0x3f0
>
> [74085.103039]  kthread+0xfa/0x130
>
> [74085.106189]  ? kthread_complete_and_exit+0x20/0x20
>
> [74085.110993]  ret_from_fork+0x1f/0x30
>
> [74085.114576]  </TASK>
>
> [74085.116773] ---[ end trace 0000000000000000 ]---
>
> BR
>
> Evan
>
> *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of 
> *Christian König
> *Sent:* Monday, February 6, 2023 5:00 PM
> *To:* Xiao, Jack <Jack.Xiao@amd.com>; Koenig, Christian 
> <Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, 
> Alexander <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Am 06.02.23 um 09:28 schrieb Xiao, Jack:
>
>     [AMD Official Use Only - General]
>
>                    >> >> It's simply not allowed to free up resources
>     during suspend since those can't be acquired again during resume.
>
>                                   >> The in_suspend flag is set at the
>     beginning of suspend and unset at the end of resume. It can’t
>     filter the case you mentioned.
>
>
>                    Why not? This is exactly what it should do.
>
>     [Jack] If freeing up resources during resume, it should not hit
>     the issue you described. But only checking in_suspend flag would
>     take these cases as warning.
>
>
> No, once more: Freeing up or allocating resources between suspend and 
> resume is illegal!
>
> If you free up a resource during resume you should absolutely hit 
> that, this is intentional!
>
> Regards,
> Christian.
>
>     Regards,
>
>     Jack
>
>     *From:* Koenig, Christian <Christian.Koenig@amd.com>
>     <mailto:Christian.Koenig@amd.com>
>     *Sent:* Monday, February 6, 2023 4:06 PM
>     *To:* Xiao, Jack <Jack.Xiao@amd.com> <mailto:Jack.Xiao@amd.com>;
>     Christian König <ckoenig.leichtzumerken@gmail.com>
>     <mailto:ckoenig.leichtzumerken@gmail.com>;
>     amd-gfx@lists.freedesktop.org; Deucher, Alexander
>     <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
>         [AMD Official Use Only - General]
>
>         >> Nope, that is not related to any hw state.
>
>         can use other flag.
>
>         >> It's simply not allowed to free up resources during suspend
>         since those can't be acquired again during resume.
>
>         The in_suspend flag is set at the beginning of suspend and
>         unset at the end of resume. It can’t filter the case you
>         mentioned.
>
>
>     Why not? This is exactly what it should do.
>
>         Do you know the root cause of these cases hitting the issue?
>         So that we can get an exact point to warn the freeing up behavior.
>
>
>     Well the root cause are programming errors. See between suspending
>     and resuming you should not allocate nor free memory.
>
>     Otherwise we can run into trouble. And this check here is one part
>     of that, we should probably add another warning during allocation
>     of memory. But this here is certainly correct.
>
>     Regards,
>     Christian.
>
>         Thanks,
>
>         Jack
>
>         *From:* Christian König <ckoenig.leichtzumerken@gmail.com>
>         <mailto:ckoenig.leichtzumerken@gmail.com>
>         *Sent:* Friday, February 3, 2023 9:20 PM
>         *To:* Xiao, Jack <Jack.Xiao@amd.com>
>         <mailto:Jack.Xiao@amd.com>; Koenig, Christian
>         <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>;
>         amd-gfx@lists.freedesktop.org; Deucher, Alexander
>         <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>         *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
>         when DMA is unavailable
>
>         Nope, that is not related to any hw state.
>
>         It's simply not allowed to free up resources during suspend
>         since those can't be acquired again during resume.
>
>         We had a couple of cases now where this was wrong. If you get
>         a warning from that please fix the code which tried to free
>         something during suspend instead.
>
>         Regards,
>         Christian.
>
>         Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
>             [AMD Official Use Only - General]
>
>             >> It's simply illegal to free up memory during suspend.
>
>             Why? In my understanding, the limit was caused by DMA
>             shutdown.
>
>             Regards,
>
>             Jack
>
>             *From:* Koenig, Christian <Christian.Koenig@amd.com>
>             <mailto:Christian.Koenig@amd.com>
>             *Sent:* Thursday, February 2, 2023 7:43 PM
>             *To:* Xiao, Jack <Jack.Xiao@amd.com>
>             <mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org;
>             Deucher, Alexander <Alexander.Deucher@amd.com>
>             <mailto:Alexander.Deucher@amd.com>
>             *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing
>             buffers when DMA is unavailable
>
>             Big NAK to this! This warning is not related in any way to
>             the hw state.
>
>             It's simply illegal to free up memory during suspend.
>
>             Regards,
>
>             Christian.
>
>             ------------------------------------------------------------------------
>
>             *Von:*Xiao, Jack <Jack.Xiao@amd.com>
>             *Gesendet:* Donnerstag, 2. Februar 2023 10:54
>             *An:* amd-gfx@lists.freedesktop.org
>             <amd-gfx@lists.freedesktop.org>; Deucher, Alexander
>             <Alexander.Deucher@amd.com>; Koenig, Christian
>             <Christian.Koenig@amd.com>
>             *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
>             *Betreff:* [PATCH] drm/amdgpu: only WARN freeing buffers
>             when DMA is unavailable
>
>             Reduce waringings, only warn when DMA is unavailable.
>
>             Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
>             ---
>              drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>              1 file changed, 2 insertions(+), 1 deletion(-)
>
>             diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             index 2d237f3d3a2e..e3e3764ea697 100644
>             --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>             @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct
>             amdgpu_bo **bo, u64 *gpu_addr,
>                      if (*bo == NULL)
>                              return;
>
>             - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>             + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
>             +
>             !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>                      if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                              if (cpu_addr)
>             -- 
>             2.37.3
>

[-- Attachment #2: Type: text/html, Size: 25606 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-10  8:07                 ` Christian König
@ 2023-02-10  9:51                   ` Xiao, Jack
  2023-02-10 10:25                     ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao, Jack @ 2023-02-10  9:51 UTC (permalink / raw)
  To: Koenig, Christian, Quan, Evan, Christian König, amd-gfx,
	Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 11375 bytes --]

[AMD Official Use Only - General]

Hi Christian,

>> Allocating buffers temporary for stuff like that is illegal during resume.

Can you *deeply* explain why it is illegal during ip late_init stage which is a part stage of resume?
In my understanding, after gmc ready, driver can allocate/free kernel bo, and after SDMA ready,
the eviction should be ready. What else prevent driver doing that during resume?

>> I strongly suggest to just remove the MES test. It's abusing the kernel ring interface in a way we didn't want anyway and is currently replaced by Shahanks work.

The kernel mes unit test is very meaningful and important to catch MES firmware issue at first time before
issue went spread to other components like kfd/umd to avoid the problem complicated, Otherwise, the issue
would become hard to catch and debug.

Secondly, for mes unit test is self-containing and no dependency, it is a part of milestone to qualify MES ready,
indicating that it can deliver to other component especially during brinup. It is likely ring test and ib test indicating
gfx is ready to go. After totally transitioning to gfx user queue, mes unit test may be the only one unit test which
can indicate gfx is ready at the very early stage of bringup when UMD is not ready.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Friday, February 10, 2023 4:08 PM
To: Quan, Evan <Evan.Quan@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; Xiao, Jack <Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Hi Evan,

yeah, exactly that's what this warning should prevent. Allocating buffers temporary for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the kernel ring interface in a way we didn't want anyway and is currently replaced by Shahanks work.

Regards,
Christian.
Am 10.02.23 um 05:12 schrieb Quan, Evan:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G        W IOE      6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 0000000000000000
[74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: ffffbed6812ebc50
[74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 00000000000001ff
[74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: ffffbed6812ebc70
[74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: ffff9639c7da5630
[74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) knlGS:0000000000000000
[74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 00000000003706e0
[74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[74085.037453] Call Trace:
[74085.039911]  <TASK>
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? kthread_complete_and_exit+0x20/0x20
[74085.110993]  ret_from_fork+0x1f/0x30
[74085.114576]  </TASK>
[74085.116773] ---[ end trace 0000000000000000 ]---

BR
Evan
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org><mailto:amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Monday, February 6, 2023 5:00 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 09:28 schrieb Xiao, Jack:

[AMD Official Use Only - General]

                              >> >> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
                              >> The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

               Why not? This is exactly what it should do.

[Jack] If freeing up resources during resume, it should not hit the issue you described. But only checking in_suspend flag would take these cases as warning.

No, once more: Freeing up or allocating resources between suspend and resume is illegal!

If you free up a resource during resume you should absolutely hit that, this is intentional!

Regards,
Christian.



Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 08:23 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

Why not? This is exactly what it should do.


Do you know the root cause of these cases hitting the issue? So that we can get an exact point to warn the freeing up behavior.

Well the root cause are programming errors. See between suspending and resuming you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of that, we should probably add another warning during allocation of memory. But this here is certainly correct.

Regards,
Christian.



Thanks,
Jack

From: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
Am 03.02.23 um 07:04 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3





[-- Attachment #2: Type: text/html, Size: 23456 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-10  9:51                   ` Xiao, Jack
@ 2023-02-10 10:25                     ` Christian König
  2023-02-10 11:30                       ` Xiao, Jack
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2023-02-10 10:25 UTC (permalink / raw)
  To: Xiao, Jack, Quan, Evan, Christian König, amd-gfx, Deucher,
	Alexander

[-- Attachment #1: Type: text/plain, Size: 14692 bytes --]

Hi Jack,

Am 10.02.23 um 10:51 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
>
> Hi Christian,
>
> >> Allocating buffers temporary for stuff like that is illegal during 
> resume.
>
> Can you **deeply** explain why it is illegal during ip late_init stage 
> which is a part stage of resume?
>

Well no, I don't have the time to explain this to everybody individually.

> In my understanding, after gmc ready, driver can allocate/free kernel 
> bo, and after SDMA ready,
>
> the eviction should be ready. What else prevent driver doing that 
> during resume?
>

The driver are resumed before the core Linux memory management is ready 
to serve allocations. E.g. swap for example isn't turned on yet.

This means that this stuff only worked because we were able to allocate 
memory from the pool which isn't guaranteed in any way.

> >> I strongly suggest to just remove the MES test. It's abusing the 
> kernel ring interface in a way we didn't want anyway and is currently 
> replaced by Shahanks work.
>
> The kernel mes unit test is very meaningful and important to catch MES 
> firmware issue at first time before
>
> issue went spread to other components like kfd/umd to avoid the 
> problem complicated, Otherwise, the issue
>
> would become hard to catch and debug.
>
> Secondly, for mes unit test is self-containing and no dependency, it 
> is a part of milestone to qualify MES ready,
>
> indicating that it can deliver to other component especially during 
> brinup. It is likely ring test and ib test indicating
>
> gfx is ready to go. After totally transitioning to gfx user queue, mes 
> unit test may be the only one unit test which
>
> can indicate gfx is ready at the very early stage of bringup when UMD 
> is not ready.
>

Alex and I are the maintainers of the driver who are deciding stuff like 
that and at least I don't really buy that argument. The ring, IB and 
benchmark tests are in the kernel module because they are simple.

If we have a complicated unit test like simulating creating an MES user 
queue to test the firmware functionality than that's really overkill. 
Especially when you need to allocate memory for it.

We previously had people requesting to add shader code and other 
complicated testing and rejected that as well because it just bloat up 
the kernel driver unnecessarily.

If we can modify the MES test to not abuse the amdgpu_ring structure 
only work with memory from the SA for example we could keep this, but 
not really in the current state.

Regards,
Christian.

> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Friday, February 10, 2023 4:08 PM
> *To:* Quan, Evan <Evan.Quan@amd.com>; Christian König 
> <ckoenig.leichtzumerken@gmail.com>; Xiao, Jack <Jack.Xiao@amd.com>; 
> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Hi Evan,
>
> yeah, exactly that's what this warning should prevent. Allocating 
> buffers temporary for stuff like that is illegal during resume.
>
> I strongly suggest to just remove the MES test. It's abusing the 
> kernel ring interface in a way we didn't want anyway and is currently 
> replaced by Shahanks work.
>
> Regards,
> Christian.
>
> Am 10.02.23 um 05:12 schrieb Quan, Evan:
>
>     [AMD Official Use Only - General]
>
>     Hi Jack,
>
>     Are you trying to fix the call trace popped up on resuming below?
>
>     It seems mes created some bo for its self test and freed it up
>     later at the final stage of the resuming process.
>
>     All these happened before the in_suspend flag cleared. And that
>     triggered the call trace.
>
>     Is my understanding correct?
>
>     [74084.799260] WARNING: CPU: 2 PID: 2891 at
>     drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425
>     amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
>     [74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE)
>     iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper
>     drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect
>     sysimgblt snd_sm
>
>     [74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp
>     hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video
>     i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
>
>     [74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G
>            W IOE      6.0.0-custom #1
>
>     [74084.923146] Hardware name: ASUS System Product Name/PRIME
>     Z390-A, BIOS 2004 11/02/2021
>
>     [74084.931074] Workqueue: events_unbound async_run_entry_fn
>
>     [74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
>     [74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d
>     85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d
>     c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96
>     47 00 ebf
>
>     [74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
>
>     [74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX:
>     0000000000000000
>
>     [74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI:
>     ffffbed6812ebc50
>
>     [74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09:
>     00000000000001ff
>
>     [74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12:
>     ffffbed6812ebc70
>
>     [74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15:
>     ffff9639c7da5630
>
>     [74085.002160] FS: 0000000000000000(0000)
>     GS:ffff963d1dc80000(0000) knlGS:0000000000000000
>
>     [74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
>     [74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4:
>     00000000003706e0
>
>     [74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>     0000000000000000
>
>     [74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>     0000000000000400
>
>     [74085.037453] Call Trace:
>
>     [74085.039911]  <TASK>
>
>     [74085.042023] amdgpu_mes_self_test+0x385/0x460 [amdgpu]
>
>     [74085.047293] mes_v11_0_late_init+0x44/0x50 [amdgpu]
>
>     [74085.052291] amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
>
>     [74085.058032] amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
>
>     [74085.063187] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
>
>     [74085.068162]  pci_pm_resume+0x68/0x100
>
>     [74085.071836]  ? pci_legacy_resume+0x80/0x80
>
>     [74085.075943] dpm_run_callback+0x4c/0x160
>
>     [74085.079873]  device_resume+0xad/0x210
>
>     [74085.083546]  async_resume+0x1e/0x40
>
>     [74085.087046] async_run_entry_fn+0x30/0x120
>
>     [74085.091152] process_one_work+0x21a/0x3f0
>
>     [74085.095173]  worker_thread+0x50/0x3e0
>
>     [74085.098845]  ? process_one_work+0x3f0/0x3f0
>
>     [74085.103039]  kthread+0xfa/0x130
>
>     [74085.106189]  ? kthread_complete_and_exit+0x20/0x20
>
>     [74085.110993]  ret_from_fork+0x1f/0x30
>
>     [74085.114576]  </TASK>
>
>     [74085.116773] ---[ end trace 0000000000000000 ]---
>
>     BR
>
>     Evan
>
>     *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>     <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of
>     *Christian König
>     *Sent:* Monday, February 6, 2023 5:00 PM
>     *To:* Xiao, Jack <Jack.Xiao@amd.com> <mailto:Jack.Xiao@amd.com>;
>     Koenig, Christian <Christian.Koenig@amd.com>
>     <mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org;
>     Deucher, Alexander <Alexander.Deucher@amd.com>
>     <mailto:Alexander.Deucher@amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     Am 06.02.23 um 09:28 schrieb Xiao, Jack:
>
>         [AMD Official Use Only - General]
>
>                        >> >> It's simply not allowed to free up
>         resources during suspend since those can't be acquired again
>         during resume.
>
>         >> The in_suspend flag is set at the beginning of suspend and
>         unset at the end of resume. It can’t filter the case you
>         mentioned.
>
>
>                        Why not? This is exactly what it should do.
>
>         [Jack] If freeing up resources during resume, it should not
>         hit the issue you described. But only checking in_suspend flag
>         would take these cases as warning.
>
>
>     No, once more: Freeing up or allocating resources between suspend
>     and resume is illegal!
>
>     If you free up a resource during resume you should absolutely hit
>     that, this is intentional!
>
>     Regards,
>     Christian.
>
>
>         Regards,
>
>         Jack
>
>         *From:* Koenig, Christian <Christian.Koenig@amd.com>
>         <mailto:Christian.Koenig@amd.com>
>         *Sent:* Monday, February 6, 2023 4:06 PM
>         *To:* Xiao, Jack <Jack.Xiao@amd.com>
>         <mailto:Jack.Xiao@amd.com>; Christian König
>         <ckoenig.leichtzumerken@gmail.com>
>         <mailto:ckoenig.leichtzumerken@gmail.com>;
>         amd-gfx@lists.freedesktop.org; Deucher, Alexander
>         <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>         *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
>         when DMA is unavailable
>
>         Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
>             [AMD Official Use Only - General]
>
>             >> Nope, that is not related to any hw state.
>
>             can use other flag.
>
>             >> It's simply not allowed to free up resources during
>             suspend since those can't be acquired again during resume.
>
>             The in_suspend flag is set at the beginning of suspend and
>             unset at the end of resume. It can’t filter the case you
>             mentioned.
>
>
>         Why not? This is exactly what it should do.
>
>
>             Do you know the root cause of these cases hitting the
>             issue? So that we can get an exact point to warn the
>             freeing up behavior.
>
>
>         Well the root cause are programming errors. See between
>         suspending and resuming you should not allocate nor free memory.
>
>         Otherwise we can run into trouble. And this check here is one
>         part of that, we should probably add another warning during
>         allocation of memory. But this here is certainly correct.
>
>         Regards,
>         Christian.
>
>
>             Thanks,
>
>             Jack
>
>             *From:* Christian König <ckoenig.leichtzumerken@gmail.com>
>             <mailto:ckoenig.leichtzumerken@gmail.com>
>             *Sent:* Friday, February 3, 2023 9:20 PM
>             *To:* Xiao, Jack <Jack.Xiao@amd.com>
>             <mailto:Jack.Xiao@amd.com>; Koenig, Christian
>             <Christian.Koenig@amd.com>
>             <mailto:Christian.Koenig@amd.com>;
>             amd-gfx@lists.freedesktop.org; Deucher, Alexander
>             <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>             *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing
>             buffers when DMA is unavailable
>
>             Nope, that is not related to any hw state.
>
>             It's simply not allowed to free up resources during
>             suspend since those can't be acquired again during resume.
>
>             We had a couple of cases now where this was wrong. If you
>             get a warning from that please fix the code which tried to
>             free something during suspend instead.
>
>             Regards,
>             Christian.
>
>             Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
>                 [AMD Official Use Only - General]
>
>                 >> It's simply illegal to free up memory during suspend.
>
>                 Why? In my understanding, the limit was caused by DMA
>                 shutdown.
>
>                 Regards,
>
>                 Jack
>
>                 *From:* Koenig, Christian <Christian.Koenig@amd.com>
>                 <mailto:Christian.Koenig@amd.com>
>                 *Sent:* Thursday, February 2, 2023 7:43 PM
>                 *To:* Xiao, Jack <Jack.Xiao@amd.com>
>                 <mailto:Jack.Xiao@amd.com>;
>                 amd-gfx@lists.freedesktop.org; Deucher, Alexander
>                 <Alexander.Deucher@amd.com>
>                 <mailto:Alexander.Deucher@amd.com>
>                 *Subject:* AW: [PATCH] drm/amdgpu: only WARN freeing
>                 buffers when DMA is unavailable
>
>                 Big NAK to this! This warning is not related in any
>                 way to the hw state.
>
>                 It's simply illegal to free up memory during suspend.
>
>                 Regards,
>
>                 Christian.
>
>                 ------------------------------------------------------------------------
>
>                 *Von:*Xiao, Jack <Jack.Xiao@amd.com>
>                 *Gesendet:* Donnerstag, 2. Februar 2023 10:54
>                 *An:*
>                 amd-gfx@lists.freedesktop.org<amd-gfx@lists.freedesktop.org>;
>                 Deucher, Alexander <Alexander.Deucher@amd.com>;
>                 Koenig, Christian <Christian.Koenig@amd.com>
>                 *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
>                 *Betreff:* [PATCH] drm/amdgpu: only WARN freeing
>                 buffers when DMA is unavailable
>
>                 Reduce waringings, only warn when DMA is unavailable.
>
>                 Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
>                 ---
>                  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>                  1 file changed, 2 insertions(+), 1 deletion(-)
>
>                 diff --git
>                 a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                 b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                 index 2d237f3d3a2e..e3e3764ea697 100644
>                 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                 @@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct
>                 amdgpu_bo **bo, u64 *gpu_addr,
>                          if (*bo == NULL)
>                                  return;
>
>                 - WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>                 + WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
>                 +
>                 !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>                          if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
>                                  if (cpu_addr)
>                 -- 
>                 2.37.3
>

[-- Attachment #2: Type: text/html, Size: 33081 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-10 10:25                     ` Christian König
@ 2023-02-10 11:30                       ` Xiao, Jack
  2023-02-10 11:56                         ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Xiao, Jack @ 2023-02-10 11:30 UTC (permalink / raw)
  To: Koenig, Christian, Quan, Evan, Christian König, amd-gfx,
	Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 13855 bytes --]

[AMD Official Use Only - General]

>> The driver are resumed before the core Linux memory management is ready to serve allocations. E.g. swap for example isn't turned on yet.

>> This means that this stuff only worked because we were able to allocate memory from the pool which isn't guaranteed in any way.

Memory allocation failure can happen at any time, every programmer should correctly handle it.
If memory allocation failure is not critical error and can gracefully continue to run, it should be acceptable.
The memory allocation failure during mes self test should be the acceptable one. It will not make system hang up and
driver can gracefully continue to run.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: Friday, February 10, 2023 6:25 PM
To: Xiao, Jack <Jack.Xiao@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Hi Jack,

Am 10.02.23 um 10:51 schrieb Xiao, Jack:

[AMD Official Use Only - General]

Hi Christian,

>> Allocating buffers temporary for stuff like that is illegal during resume.

Can you *deeply* explain why it is illegal during ip late_init stage which is a part stage of resume?

Well no, I don't have the time to explain this to everybody individually.

[Jack] ...

In my understanding, after gmc ready, driver can allocate/free kernel bo, and after SDMA ready,
the eviction should be ready. What else prevent driver doing that during resume?

The driver are resumed before the core Linux memory management is ready to serve allocations. E.g. swap for example isn't turned on yet.

This means that this stuff only worked because we were able to allocate memory from the pool which isn't guaranteed in any way.

>> I strongly suggest to just remove the MES test. It's abusing the kernel ring interface in a way we didn't want anyway and is currently replaced by Shahanks work.

The kernel mes unit test is very meaningful and important to catch MES firmware issue at first time before
issue went spread to other components like kfd/umd to avoid the problem complicated, Otherwise, the issue
would become hard to catch and debug.

Secondly, for mes unit test is self-containing and no dependency, it is a part of milestone to qualify MES ready,
indicating that it can deliver to other component especially during brinup. It is likely ring test and ib test indicating
gfx is ready to go. After totally transitioning to gfx user queue, mes unit test may be the only one unit test which
can indicate gfx is ready at the very early stage of bringup when UMD is not ready.

Alex and I are the maintainers of the driver who are deciding stuff like that and at least I don't really buy that argument. The ring, IB and benchmark tests are in the kernel module because they are simple.

If we have a complicated unit test like simulating creating an MES user queue to test the firmware functionality than that's really overkill. Especially when you need to allocate memory for it.

We previously had people requesting to add shader code and other complicated testing and rejected that as well because it just bloat up the kernel driver unnecessarily.

If we can modify the MES test to not abuse the amdgpu_ring structure only work with memory from the SA for example we could keep this, but not really in the current state.

Regards,
Christian.


Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Friday, February 10, 2023 4:08 PM
To: Quan, Evan <Evan.Quan@amd.com><mailto:Evan.Quan@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>; Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Hi Evan,

yeah, exactly that's what this warning should prevent. Allocating buffers temporary for stuff like that is illegal during resume.

I strongly suggest to just remove the MES test. It's abusing the kernel ring interface in a way we didn't want anyway and is currently replaced by Shahanks work.

Regards,
Christian.
Am 10.02.23 um 05:12 schrieb Quan, Evan:

[AMD Official Use Only - General]

Hi Jack,

Are you trying to fix the call trace popped up on resuming below?
It seems mes created some bo for its self test and freed it up later at the final stage of the resuming process.
All these happened before the in_suspend flag cleared. And that triggered the call trace.
Is my understanding correct?

[74084.799260] WARNING: CPU: 2 PID: 2891 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE) iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_sm
[74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage libahci wmi
[74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted: G        W IOE      6.0.0-custom #1
[74084.923146] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[74084.931074] Workqueue: events_unbound async_run_entry_fn
[74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00 4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f 85 75 96 47 00 ebf
[74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
[74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50 RCX: 0000000000000000
[74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60 RDI: ffffbed6812ebc50
[74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000 R09: 00000000000001ff
[74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000 R12: ffffbed6812ebc70
[74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00 R15: ffff9639c7da5630
[74085.002160] FS:  0000000000000000(0000) GS:ffff963d1dc80000(0000) knlGS:0000000000000000
[74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001 CR4: 00000000003706e0
[74085.023164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[74085.037453] Call Trace:
[74085.039911]  <TASK>
[74085.042023]  amdgpu_mes_self_test+0x385/0x460 [amdgpu]
[74085.047293]  mes_v11_0_late_init+0x44/0x50 [amdgpu]
[74085.052291]  amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
[74085.058032]  amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
[74085.063187]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[74085.068162]  pci_pm_resume+0x68/0x100
[74085.071836]  ? pci_legacy_resume+0x80/0x80
[74085.075943]  dpm_run_callback+0x4c/0x160
[74085.079873]  device_resume+0xad/0x210
[74085.083546]  async_resume+0x1e/0x40
[74085.087046]  async_run_entry_fn+0x30/0x120
[74085.091152]  process_one_work+0x21a/0x3f0
[74085.095173]  worker_thread+0x50/0x3e0
[74085.098845]  ? process_one_work+0x3f0/0x3f0
[74085.103039]  kthread+0xfa/0x130
[74085.106189]  ? kthread_complete_and_exit+0x20/0x20
[74085.110993]  ret_from_fork+0x1f/0x30
[74085.114576]  </TASK>
[74085.116773] ---[ end trace 0000000000000000 ]---

BR
Evan
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org><mailto:amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Christian König
Sent: Monday, February 6, 2023 5:00 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 09:28 schrieb Xiao, Jack:

[AMD Official Use Only - General]

                              >> >> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
                              >> The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

               Why not? This is exactly what it should do.

[Jack] If freeing up resources during resume, it should not hit the issue you described. But only checking in_suspend flag would take these cases as warning.

No, once more: Freeing up or allocating resources between suspend and resume is illegal!

If you free up a resource during resume you should absolutely hit that, this is intentional!

Regards,
Christian.



Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Monday, February 6, 2023 4:06 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Am 06.02.23 um 08:23 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> Nope, that is not related to any hw state.

can use other flag.

>> It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.
The in_suspend flag is set at the beginning of suspend and unset at the end of resume. It can't filter the case you mentioned.

Why not? This is exactly what it should do.


Do you know the root cause of these cases hitting the issue? So that we can get an exact point to warn the freeing up behavior.

Well the root cause are programming errors. See between suspending and resuming you should not allocate nor free memory.

Otherwise we can run into trouble. And this check here is one part of that, we should probably add another warning during allocation of memory. But this here is certainly correct.

Regards,
Christian.



Thanks,
Jack

From: Christian König <ckoenig.leichtzumerken@gmail.com><mailto:ckoenig.leichtzumerken@gmail.com>
Sent: Friday, February 3, 2023 9:20 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Nope, that is not related to any hw state.

It's simply not allowed to free up resources during suspend since those can't be acquired again during resume.

We had a couple of cases now where this was wrong. If you get a warning from that please fix the code which tried to free something during suspend instead.

Regards,
Christian.
Am 03.02.23 um 07:04 schrieb Xiao, Jack:

[AMD Official Use Only - General]

>> It's simply illegal to free up memory during suspend.
Why? In my understanding, the limit was caused by DMA shutdown.

Regards,
Jack

From: Koenig, Christian <Christian.Koenig@amd.com><mailto:Christian.Koenig@amd.com>
Sent: Thursday, February 2, 2023 7:43 PM
To: Xiao, Jack <Jack.Xiao@amd.com><mailto:Jack.Xiao@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: AW: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Big NAK to this! This warning is not related in any way to the hw state.

It's simply illegal to free up memory during suspend.

Regards,
Christian.

________________________________
Von: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Gesendet: Donnerstag, 2. Februar 2023 10:54
An: amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Koenig, Christian <Christian.Koenig@amd.com<mailto:Christian.Koenig@amd.com>>
Cc: Xiao, Jack <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
Betreff: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable

Reduce waringings, only warn when DMA is unavailable.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com<mailto:Jack.Xiao@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..e3e3764ea697 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -422,7 +422,8 @@ void amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64 *gpu_addr,
         if (*bo == NULL)
                 return;

-       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
+       WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend &&
+               !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);

         if (likely(amdgpu_bo_reserve(*bo, true) == 0)) {
                 if (cpu_addr)
--
2.37.3






[-- Attachment #2: Type: text/html, Size: 27462 bytes --]

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable
  2023-02-10 11:30                       ` Xiao, Jack
@ 2023-02-10 11:56                         ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2023-02-10 11:56 UTC (permalink / raw)
  To: Xiao, Jack, Quan, Evan, Christian König, amd-gfx, Deucher,
	Alexander

[-- Attachment #1: Type: text/plain, Size: 17498 bytes --]

Am 10.02.23 um 12:30 schrieb Xiao, Jack:
>
> [AMD Official Use Only - General]
>
> >> The driver are resumed before the core Linux memory management is 
> ready to serve allocations. E.g. swap for example isn't turned on yet.
>
> >> This means that this stuff only worked because we were able to 
> allocate memory from the pool which isn't guaranteed in any way.
>
> Memory allocation failure can happen at any time, every programmer 
> should correctly handle it.
>

We are not talking about memory allocation failure, we are talking about 
the kernel calling panic() because it can't properly resume.

Regards,
Christian.

> If memory allocation failure is not critical error and can gracefully 
> continue to run, it should be acceptable.
>
> The memory allocation failure during mes self test should be the 
> acceptable one. It will not make system hang up and
>
> driver can gracefully continue to run.
>
> Regards,
>
> Jack
>
> *From:* Koenig, Christian <Christian.Koenig@amd.com>
> *Sent:* Friday, February 10, 2023 6:25 PM
> *To:* Xiao, Jack <Jack.Xiao@amd.com>; Quan, Evan <Evan.Quan@amd.com>; 
> Christian König <ckoenig.leichtzumerken@gmail.com>; 
> amd-gfx@lists.freedesktop.org; Deucher, Alexander 
> <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when DMA 
> is unavailable
>
> Hi Jack,
>
> Am 10.02.23 um 10:51 schrieb Xiao, Jack:
>
>     [AMD Official Use Only - General]
>
>     Hi Christian,
>
>     >> Allocating buffers temporary for stuff like that is illegal
>     during resume.
>
>     Can you **deeply** explain why it is illegal during ip late_init
>     stage which is a part stage of resume?
>
>
> Well no, I don't have the time to explain this to everybody individually.
>
> [Jack] …
>
>     In my understanding, after gmc ready, driver can allocate/free
>     kernel bo, and after SDMA ready,
>
>     the eviction should be ready. What else prevent driver doing that
>     during resume?
>
>
> The driver are resumed before the core Linux memory management is 
> ready to serve allocations. E.g. swap for example isn't turned on yet.
>
> This means that this stuff only worked because we were able to 
> allocate memory from the pool which isn't guaranteed in any way.
>
>     >> I strongly suggest to just remove the MES test. It's abusing
>     the kernel ring interface in a way we didn't want anyway and is
>     currently replaced by Shahanks work.
>
>     The kernel mes unit test is very meaningful and important to catch
>     MES firmware issue at first time before
>
>     issue went spread to other components like kfd/umd to avoid the
>     problem complicated, Otherwise, the issue
>
>     would become hard to catch and debug.
>
>     Secondly, for mes unit test is self-containing and no dependency,
>     it is a part of milestone to qualify MES ready,
>
>     indicating that it can deliver to other component especially
>     during brinup. It is likely ring test and ib test indicating
>
>     gfx is ready to go. After totally transitioning to gfx user queue,
>     mes unit test may be the only one unit test which
>
>     can indicate gfx is ready at the very early stage of bringup when
>     UMD is not ready.
>
>
> Alex and I are the maintainers of the driver who are deciding stuff 
> like that and at least I don't really buy that argument. The ring, IB 
> and benchmark tests are in the kernel module because they are simple.
>
> If we have a complicated unit test like simulating creating an MES 
> user queue to test the firmware functionality than that's really 
> overkill. Especially when you need to allocate memory for it.
>
> We previously had people requesting to add shader code and other 
> complicated testing and rejected that as well because it just bloat up 
> the kernel driver unnecessarily.
>
> If we can modify the MES test to not abuse the amdgpu_ring structure 
> only work with memory from the SA for example we could keep this, but 
> not really in the current state.
>
> Regards,
> Christian.
>
>     Regards,
>
>     Jack
>
>     *From:* Koenig, Christian <Christian.Koenig@amd.com>
>     <mailto:Christian.Koenig@amd.com>
>     *Sent:* Friday, February 10, 2023 4:08 PM
>     *To:* Quan, Evan <Evan.Quan@amd.com> <mailto:Evan.Quan@amd.com>;
>     Christian König <ckoenig.leichtzumerken@gmail.com>
>     <mailto:ckoenig.leichtzumerken@gmail.com>; Xiao, Jack
>     <Jack.Xiao@amd.com> <mailto:Jack.Xiao@amd.com>;
>     amd-gfx@lists.freedesktop.org; Deucher, Alexander
>     <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers when
>     DMA is unavailable
>
>     Hi Evan,
>
>     yeah, exactly that's what this warning should prevent. Allocating
>     buffers temporary for stuff like that is illegal during resume.
>
>     I strongly suggest to just remove the MES test. It's abusing the
>     kernel ring interface in a way we didn't want anyway and is
>     currently replaced by Shahanks work.
>
>     Regards,
>     Christian.
>
>     Am 10.02.23 um 05:12 schrieb Quan, Evan:
>
>         [AMD Official Use Only - General]
>
>         Hi Jack,
>
>         Are you trying to fix the call trace popped up on resuming below?
>
>         It seems mes created some bo for its self test and freed it up
>         later at the final stage of the resuming process.
>
>         All these happened before the in_suspend flag cleared. And
>         that triggered the call trace.
>
>         Is my understanding correct?
>
>         [74084.799260] WARNING: CPU: 2 PID: 2891 at
>         drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425
>         amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
>         [74084.811019] Modules linked in: nls_iso8859_1 amdgpu(OE)
>         iommu_v2 gpu_sched drm_buddy drm_ttm_helper ttm
>         drm_display_helper drm_kms_helper i2c_algo_bit fb_sys_fops
>         syscopyarea sysfillrect sysimgblt snd_sm
>
>         [74084.811042]  ip_tables x_tables autofs4 hid_logitech_hidpp
>         hid_logitech_dj hid_generic e1000e usbhid ptp uas hid video
>         i2c_i801 ahci pps_core crc32_pclmul i2c_smbus usb_storage
>         libahci wmi
>
>         [74084.914519] CPU: 2 PID: 2891 Comm: kworker/u16:38 Tainted:
>         G        W IOE      6.0.0-custom #1
>
>         [74084.923146] Hardware name: ASUS System Product Name/PRIME
>         Z390-A, BIOS 2004 11/02/2021
>
>         [74084.931074] Workqueue: events_unbound async_run_entry_fn
>
>         [74084.936393] RIP: 0010:amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
>
>         [74084.942422] Code: 00 4d 85 ed 74 08 49 c7 45 00 00 00 00 00
>         4d 85 e4 74 08 49 c7 04 24 00 00 00 00 5b 41 5c 41 5d 41 5e 41
>         5f 5d c3 cc cc cc cc <0f> 0b e9 39 ff ff ff 3d 00 fe ff ff 0f
>         85 75 96 47 00 ebf
>
>         [74084.961199] RSP: 0000:ffffbed6812ebb90 EFLAGS: 00010202
>
>         [74084.966435] RAX: 0000000000000000 RBX: ffffbed6812ebc50
>         RCX: 0000000000000000
>
>         [74084.973578] RDX: ffffbed6812ebc70 RSI: ffffbed6812ebc60
>         RDI: ffffbed6812ebc50
>
>         [74084.980725] RBP: ffffbed6812ebbb8 R08: 0000000000000000
>         R09: 00000000000001ff
>
>         [74084.987869] R10: ffffbed6812ebb40 R11: 0000000000000000
>         R12: ffffbed6812ebc70
>
>         [74084.995015] R13: ffffbed6812ebc60 R14: ffff963a2945cc00
>         R15: ffff9639c7da5630
>
>         [74085.002160] FS: 0000000000000000(0000)
>         GS:ffff963d1dc80000(0000) knlGS:0000000000000000
>
>         [74085.010262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
>         [74085.016016] CR2: 0000000000000000 CR3: 0000000377c0a001
>         CR4: 00000000003706e0
>
>         [74085.023164] DR0: 0000000000000000 DR1: 0000000000000000
>         DR2: 0000000000000000
>
>         [74085.030307] DR3: 0000000000000000 DR6: 00000000fffe0ff0
>         DR7: 0000000000000400
>
>         [74085.037453] Call Trace:
>
>         [74085.039911]  <TASK>
>
>         [74085.042023] amdgpu_mes_self_test+0x385/0x460 [amdgpu]
>
>         [74085.047293] mes_v11_0_late_init+0x44/0x50 [amdgpu]
>
>         [74085.052291] amdgpu_device_ip_late_init+0x50/0x270 [amdgpu]
>
>         [74085.058032] amdgpu_device_resume+0xb0/0x2d0 [amdgpu]
>
>         [74085.063187] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
>
>         [74085.068162] pci_pm_resume+0x68/0x100
>
>         [74085.071836]  ? pci_legacy_resume+0x80/0x80
>
>         [74085.075943] dpm_run_callback+0x4c/0x160
>
>         [74085.079873] device_resume+0xad/0x210
>
>         [74085.083546]  async_resume+0x1e/0x40
>
>         [74085.087046] async_run_entry_fn+0x30/0x120
>
>         [74085.091152] process_one_work+0x21a/0x3f0
>
>         [74085.095173] worker_thread+0x50/0x3e0
>
>         [74085.098845]  ? process_one_work+0x3f0/0x3f0
>
>         [74085.103039]  kthread+0xfa/0x130
>
>         [74085.106189]  ? kthread_complete_and_exit+0x20/0x20
>
>         [74085.110993]  ret_from_fork+0x1f/0x30
>
>         [74085.114576]  </TASK>
>
>         [74085.116773] ---[ end trace 0000000000000000 ]---
>
>         BR
>
>         Evan
>
>         *From:* amd-gfx <amd-gfx-bounces@lists.freedesktop.org>
>         <mailto:amd-gfx-bounces@lists.freedesktop.org> *On Behalf Of
>         *Christian König
>         *Sent:* Monday, February 6, 2023 5:00 PM
>         *To:* Xiao, Jack <Jack.Xiao@amd.com>
>         <mailto:Jack.Xiao@amd.com>; Koenig, Christian
>         <Christian.Koenig@amd.com> <mailto:Christian.Koenig@amd.com>;
>         amd-gfx@lists.freedesktop.org; Deucher, Alexander
>         <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>         *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing buffers
>         when DMA is unavailable
>
>         Am 06.02.23 um 09:28 schrieb Xiao, Jack:
>
>             [AMD Official Use Only - General]
>
>                            >> >> It's simply not allowed to free up
>             resources during suspend since those can't be acquired
>             again during resume.
>
>             >> The in_suspend flag is set at the beginning of suspend
>             and unset at the end of resume. It can’t filter the case
>             you mentioned.
>
>
>                            Why not? This is exactly what it should do.
>
>             [Jack] If freeing up resources during resume, it should
>             not hit the issue you described. But only checking
>             in_suspend flag would take these cases as warning.
>
>
>         No, once more: Freeing up or allocating resources between
>         suspend and resume is illegal!
>
>         If you free up a resource during resume you should absolutely
>         hit that, this is intentional!
>
>         Regards,
>         Christian.
>
>
>             Regards,
>
>             Jack
>
>             *From:* Koenig, Christian <Christian.Koenig@amd.com>
>             <mailto:Christian.Koenig@amd.com>
>             *Sent:* Monday, February 6, 2023 4:06 PM
>             *To:* Xiao, Jack <Jack.Xiao@amd.com>
>             <mailto:Jack.Xiao@amd.com>; Christian König
>             <ckoenig.leichtzumerken@gmail.com>
>             <mailto:ckoenig.leichtzumerken@gmail.com>;
>             amd-gfx@lists.freedesktop.org; Deucher, Alexander
>             <Alexander.Deucher@amd.com> <mailto:Alexander.Deucher@amd.com>
>             *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing
>             buffers when DMA is unavailable
>
>             Am 06.02.23 um 08:23 schrieb Xiao, Jack:
>
>                 [AMD Official Use Only - General]
>
>                 >> Nope, that is not related to any hw state.
>
>                 can use other flag.
>
>                 >> It's simply not allowed to free up resources during
>                 suspend since those can't be acquired again during resume.
>
>                 The in_suspend flag is set at the beginning of suspend
>                 and unset at the end of resume. It can’t filter the
>                 case you mentioned.
>
>
>             Why not? This is exactly what it should do.
>
>
>                 Do you know the root cause of these cases hitting the
>                 issue? So that we can get an exact point to warn the
>                 freeing up behavior.
>
>
>             Well the root cause are programming errors. See between
>             suspending and resuming you should not allocate nor free
>             memory.
>
>             Otherwise we can run into trouble. And this check here is
>             one part of that, we should probably add another warning
>             during allocation of memory. But this here is certainly
>             correct.
>
>             Regards,
>             Christian.
>
>
>                 Thanks,
>
>                 Jack
>
>                 *From:* Christian König
>                 <ckoenig.leichtzumerken@gmail.com>
>                 <mailto:ckoenig.leichtzumerken@gmail.com>
>                 *Sent:* Friday, February 3, 2023 9:20 PM
>                 *To:* Xiao, Jack <Jack.Xiao@amd.com>
>                 <mailto:Jack.Xiao@amd.com>; Koenig, Christian
>                 <Christian.Koenig@amd.com>
>                 <mailto:Christian.Koenig@amd.com>;
>                 amd-gfx@lists.freedesktop.org; Deucher, Alexander
>                 <Alexander.Deucher@amd.com>
>                 <mailto:Alexander.Deucher@amd.com>
>                 *Subject:* Re: [PATCH] drm/amdgpu: only WARN freeing
>                 buffers when DMA is unavailable
>
>                 Nope, that is not related to any hw state.
>
>                 It's simply not allowed to free up resources during
>                 suspend since those can't be acquired again during resume.
>
>                 We had a couple of cases now where this was wrong. If
>                 you get a warning from that please fix the code which
>                 tried to free something during suspend instead.
>
>                 Regards,
>                 Christian.
>
>                 Am 03.02.23 um 07:04 schrieb Xiao, Jack:
>
>                     [AMD Official Use Only - General]
>
>                     >> It's simply illegal to free up memory during
>                     suspend.
>
>                     Why? In my understanding, the limit was caused by
>                     DMA shutdown.
>
>                     Regards,
>
>                     Jack
>
>                     *From:* Koenig, Christian
>                     <Christian.Koenig@amd.com>
>                     <mailto:Christian.Koenig@amd.com>
>                     *Sent:* Thursday, February 2, 2023 7:43 PM
>                     *To:* Xiao, Jack <Jack.Xiao@amd.com>
>                     <mailto:Jack.Xiao@amd.com>;
>                     amd-gfx@lists.freedesktop.org; Deucher, Alexander
>                     <Alexander.Deucher@amd.com>
>                     <mailto:Alexander.Deucher@amd.com>
>                     *Subject:* AW: [PATCH] drm/amdgpu: only WARN
>                     freeing buffers when DMA is unavailable
>
>                     Big NAK to this! This warning is not related in
>                     any way to the hw state.
>
>                     It's simply illegal to free up memory during suspend.
>
>                     Regards,
>
>                     Christian.
>
>                     ------------------------------------------------------------------------
>
>                     *Von:*Xiao, Jack <Jack.Xiao@amd.com>
>                     *Gesendet:* Donnerstag, 2. Februar 2023 10:54
>                     *An:*
>                     amd-gfx@lists.freedesktop.org<amd-gfx@lists.freedesktop.org>;
>                     Deucher, Alexander <Alexander.Deucher@amd.com>;
>                     Koenig, Christian <Christian.Koenig@amd.com>
>                     *Cc:* Xiao, Jack <Jack.Xiao@amd.com>
>                     *Betreff:* [PATCH] drm/amdgpu: only WARN freeing
>                     buffers when DMA is unavailable
>
>                     Reduce waringings, only warn when DMA is unavailable.
>
>                     Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
>                     ---
>                      drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++-
>                      1 file changed, 2 insertions(+), 1 deletion(-)
>
>                     diff --git
>                     a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                     b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                     index 2d237f3d3a2e..e3e3764ea697 100644
>                     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>                     @@ -422,7 +422,8 @@ void
>                     amdgpu_bo_free_kernel(struct amdgpu_bo **bo, u64
>                     *gpu_addr,
>                              if (*bo == NULL)
>                                      return;
>
>                     -
>                     WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend);
>                     +
>                     WARN_ON(amdgpu_ttm_adev((*bo)->tbo.bdev)->in_suspend
>                     &&
>                     +
>                     !amdgpu_ttm_adev((*bo)->tbo.bdev)->ip_blocks[AMD_IP_BLOCK_TYPE_SDMA].status.hw);
>
>                              if (likely(amdgpu_bo_reserve(*bo, true)
>                     == 0)) {
>                                      if (cpu_addr)
>                     -- 
>                     2.37.3
>

[-- Attachment #2: Type: text/html, Size: 37828 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-10 11:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02  9:54 [PATCH] drm/amdgpu: only WARN freeing buffers when DMA is unavailable Jack Xiao
2023-02-02 11:43 ` AW: " Koenig, Christian
2023-02-03  6:04   ` Xiao, Jack
2023-02-03 13:19     ` Christian König
2023-02-06  7:23       ` Xiao, Jack
2023-02-06  8:05         ` Christian König
2023-02-06  8:28           ` Xiao, Jack
2023-02-06  8:59             ` Christian König
2023-02-10  4:12               ` Quan, Evan
2023-02-10  8:07                 ` Christian König
2023-02-10  9:51                   ` Xiao, Jack
2023-02-10 10:25                     ` Christian König
2023-02-10 11:30                       ` Xiao, Jack
2023-02-10 11:56                         ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.