All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: Make sure ttm delayed work finished
@ 2022-04-13  3:08 xinhui pan
  2022-04-13  6:22 ` Chen, Guchun
  2022-04-13  7:30 ` AW: " Koenig, Christian
  0 siblings, 2 replies; 5+ messages in thread
From: xinhui pan @ 2022-04-13  3:08 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, xinhui pan, christian.koenig

ttm_device_delayed_workqueue would reschedule itself if there is pending
BO to be destroyed. So just one flush + cancel_sync is not enough. We
still see lru_list not empty warnging.

Fix it by waiting all BO to be destroyed.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+	int pending = 1;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
-	if (adev->mman.initialized) {
+	while (adev->mman.initialized && pending) {
 		flush_delayed_work(&adev->mman.bdev.wq);
-		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		if (pending) {
+			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+			msleep((HZ / 100) < 1) ? 1 : HZ / 100);
+		}
 	}
 	adev->shutdown = true;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
  2022-04-13  3:08 [PATCH] drm/amdgpu: Make sure ttm delayed work finished xinhui pan
@ 2022-04-13  6:22 ` Chen, Guchun
  2022-04-13  7:30 ` AW: " Koenig, Christian
  1 sibling, 0 replies; 5+ messages in thread
From: Chen, Guchun @ 2022-04-13  6:22 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx
  Cc: Deucher, Alexander, Shi, Leslie, Pan, Xinhui, Koenig, Christian

+			msleep((HZ / 100) < 1) ? 1 : HZ / 100);

a "(" is missed. With it fixed, this patch is:
Acked-by: Guchun Chen <guchun.chen@amd.com>

Regards,
Guchun

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of xinhui pan
Sent: Wednesday, April 13, 2022 11:09 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Subject: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

ttm_device_delayed_workqueue would reschedule itself if there is pending BO to be destroyed. So just one flush + cancel_sync is not enough. We still see lru_list not empty warnging.

Fix it by waiting all BO to be destroyed.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)  {
+	int pending = 1;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
-	if (adev->mman.initialized) {
+	while (adev->mman.initialized && pending) {
 		flush_delayed_work(&adev->mman.bdev.wq);
-		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+		if (pending) {
+			ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+			msleep((HZ / 100) < 1) ? 1 : HZ / 100);
+		}
 	}
 	adev->shutdown = true;
 
--
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* AW: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
  2022-04-13  3:08 [PATCH] drm/amdgpu: Make sure ttm delayed work finished xinhui pan
  2022-04-13  6:22 ` Chen, Guchun
@ 2022-04-13  7:30 ` Koenig, Christian
  2022-04-13  7:47   ` 回复: " Pan, Xinhui
  1 sibling, 1 reply; 5+ messages in thread
From: Koenig, Christian @ 2022-04-13  7:30 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx; +Cc: Deucher, Alexander

[-- Attachment #1: Type: text/plain, Size: 2091 bytes --]

We don't need that.

TTM only reschedules when the BOs are still busy.

And if the BOs are still busy when you unload the driver we have much bigger problems that this TTM worker :)

Regards,
Christian

________________________________
Von: Pan, Xinhui <Xinhui.Pan@amd.com>
Gesendet: Mittwoch, 13. April 2022 05:08
An: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
Betreff: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

ttm_device_delayed_workqueue would reschedule itself if there is pending
BO to be destroyed. So just one flush + cancel_sync is not enough. We
still see lru_list not empty warnging.

Fix it by waiting all BO to be destroyed.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+       int pending = 1;
+
         dev_info(adev->dev, "amdgpu: finishing device.\n");
         flush_delayed_work(&adev->delayed_init_work);
-       if (adev->mman.initialized) {
+       while (adev->mman.initialized && pending) {
                 flush_delayed_work(&adev->mman.bdev.wq);
-               ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+               pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+               if (pending) {
+                       ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+                       msleep((HZ / 100) < 1) ? 1 : HZ / 100);
+               }
         }
         adev->shutdown = true;

--
2.25.1


[-- Attachment #2: Type: text/html, Size: 3782 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* 回复: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
  2022-04-13  7:30 ` AW: " Koenig, Christian
@ 2022-04-13  7:47   ` Pan, Xinhui
  2022-04-13  8:14     ` Christian König
  0 siblings, 1 reply; 5+ messages in thread
From: Pan, Xinhui @ 2022-04-13  7:47 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx; +Cc: Deucher, Alexander

[AMD Official Use Only]

The log from tester says it is the drm framebuffer BO being busy.

I just feel there is lack of time for its fence to be signaled.
As a delay works too in my test.
But the warning is a little annoying.

________________________________________
发件人: Koenig, Christian <Christian.Koenig@amd.com>
发送时间: 2022年4月13日 15:30
收件人: Pan, Xinhui; amd-gfx@lists.freedesktop.org
抄送: Deucher, Alexander
主题: AW: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

We don't need that.

TTM only reschedules when the BOs are still busy.

And if the BOs are still busy when you unload the driver we have much bigger problems that this TTM worker :)

Regards,
Christian

________________________________
Von: Pan, Xinhui <Xinhui.Pan@amd.com>
Gesendet: Mittwoch, 13. April 2022 05:08
An: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
Betreff: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

ttm_device_delayed_workqueue would reschedule itself if there is pending
BO to be destroyed. So just one flush + cancel_sync is not enough. We
still see lru_list not empty warnging.

Fix it by waiting all BO to be destroyed.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6f47726f1765..e249923eb9a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
+       int pending = 1;
+
         dev_info(adev->dev, "amdgpu: finishing device.\n");
         flush_delayed_work(&adev->delayed_init_work);
-       if (adev->mman.initialized) {
+       while (adev->mman.initialized && pending) {
                 flush_delayed_work(&adev->mman.bdev.wq);
-               ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+               pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+               if (pending) {
+                       ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
+                       msleep((HZ / 100) < 1) ? 1 : HZ / 100);
+               }
         }
         adev->shutdown = true;

--
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: 回复: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
  2022-04-13  7:47   ` 回复: " Pan, Xinhui
@ 2022-04-13  8:14     ` Christian König
  0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2022-04-13  8:14 UTC (permalink / raw)
  To: Pan, Xinhui, amd-gfx; +Cc: Deucher, Alexander

That warning is a bit more than a little annoying.

Before we stop the delayed delete worker we *must* absolutely make sure 
that there is nothing going on the hardware any more. Otherwise we could 
easily run into use after free issues.

There should somewhere be a amdgpu_fence_wait_empty() before the 
flush_delayed_work() call. If that isn't there we do have a problem 
elsewhere.

Thanks for investigating this,
Christian.

Am 13.04.22 um 09:47 schrieb Pan, Xinhui:
> [AMD Official Use Only]
>
> The log from tester says it is the drm framebuffer BO being busy.
>
> I just feel there is lack of time for its fence to be signaled.
> As a delay works too in my test.
> But the warning is a little annoying.
>
> ________________________________________
> 发件人: Koenig, Christian <Christian.Koenig@amd.com>
> 发送时间: 2022年4月13日 15:30
> 收件人: Pan, Xinhui; amd-gfx@lists.freedesktop.org
> 抄送: Deucher, Alexander
> 主题: AW: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
>
> We don't need that.
>
> TTM only reschedules when the BOs are still busy.
>
> And if the BOs are still busy when you unload the driver we have much bigger problems that this TTM worker :)
>
> Regards,
> Christian
>
> ________________________________
> Von: Pan, Xinhui <Xinhui.Pan@amd.com>
> Gesendet: Mittwoch, 13. April 2022 05:08
> An: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Pan, Xinhui <Xinhui.Pan@amd.com>
> Betreff: [PATCH] drm/amdgpu: Make sure ttm delayed work finished
>
> ttm_device_delayed_workqueue would reschedule itself if there is pending
> BO to be destroyed. So just one flush + cancel_sync is not enough. We
> still see lru_list not empty warnging.
>
> Fix it by waiting all BO to be destroyed.
>
> Signed-off-by: xinhui pan <xinhui.pan@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6f47726f1765..e249923eb9a7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3957,11 +3957,17 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
>    */
>   void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   {
> +       int pending = 1;
> +
>           dev_info(adev->dev, "amdgpu: finishing device.\n");
>           flush_delayed_work(&adev->delayed_init_work);
> -       if (adev->mman.initialized) {
> +       while (adev->mman.initialized && pending) {
>                   flush_delayed_work(&adev->mman.bdev.wq);
> -               ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +               pending = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +               if (pending) {
> +                       ttm_bo_unlock_delayed_workqueue(&adev->mman.bdev, true);
> +                       msleep((HZ / 100) < 1) ? 1 : HZ / 100);
> +               }
>           }
>           adev->shutdown = true;
>
> --
> 2.25.1
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-04-13  8:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13  3:08 [PATCH] drm/amdgpu: Make sure ttm delayed work finished xinhui pan
2022-04-13  6:22 ` Chen, Guchun
2022-04-13  7:30 ` AW: " Koenig, Christian
2022-04-13  7:47   ` 回复: " Pan, Xinhui
2022-04-13  8:14     ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.