* [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
@ 2021-11-03  1:33 James Zhu
  2021-11-03  2:01 ` Alex Deucher
  2021-11-03 15:40 ` Alex Deucher
  0 siblings, 2 replies; 11+ messages in thread
From: James Zhu @ 2021-11-03  1:33 UTC (permalink / raw)
  To: amd-gfx; +Cc: alexander.deucher, yifan1.zhang, jzhums, zarniwhoop

Remove duplicated kfd_resume_iommu which already runs
in amdgpu_amdkfd_device_init.

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..f77823ce7ae8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);
 
-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);
 
 init_failed:
-- 
2.25.1
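
A minimal sketch of the duplication this patch removes, assuming (as the
commit message states) that device init already triggers the IOMMU resume.
The model_* names and the call counter are hypothetical stand-ins for
illustration only; they are not the real amdgpu/amdkfd code.

```c
/*
 * Hypothetical model of the call flow discussed in this patch.
 * None of these functions are the real amdgpu/amdkfd implementations;
 * they only count how often the IOMMU resume step would run.
 */
#include <assert.h>
#include <stdbool.h>

static int iommu_resume_calls; /* how many times the resume step ran */

/* Stand-in for amdgpu_amdkfd_resume_iommu(): returns 0 on success. */
static int model_resume_iommu(void)
{
	iommu_resume_calls++;
	return 0;
}

/*
 * Stand-in for amdgpu_amdkfd_device_init(). Assumption taken from the
 * commit message: device init already performs the IOMMU resume.
 */
static void model_kfd_device_init(void)
{
	(void)model_resume_iommu();
}

/*
 * Stand-in for the tail of amdgpu_device_ip_init().
 * with_duplicate == true models the code before the patch,
 * false models the code after the four lines are deleted.
 * Returns the number of IOMMU resume calls observed, or -1 on error.
 */
static int model_ip_init_tail(bool with_duplicate)
{
	iommu_resume_calls = 0;

	model_kfd_device_init();

	if (with_duplicate) {
		int r = model_resume_iommu(); /* the call the patch removes */
		if (r)
			return -1;
	}

	return iommu_resume_calls;
}
```

In this model, model_ip_init_tail(true) counts two resume calls and
model_ip_init_tail(false) counts one, which is the whole effect of the
four deleted lines. Whether a second call is harmful on real hardware is
not something the sketch can show.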



* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03  1:33 [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu James Zhu
@ 2021-11-03  2:01 ` Alex Deucher
  2021-11-03  2:50   ` Ken Moffat
  2021-11-03 14:34   ` Zhu, James
  2021-11-03 15:40 ` Alex Deucher
  1 sibling, 2 replies; 11+ messages in thread
From: Alex Deucher @ 2021-11-03  2:01 UTC (permalink / raw)
  To: James Zhu
  Cc: Deucher, Alexander, Yifan Zhang, James Zhu, amd-gfx list, Ken Moffat

On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in amdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>

Once you get confirmation, please add:
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214859
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1770

Acked-by: Alex Deucher <alexander.deucher@amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>


* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03  2:01 ` Alex Deucher
@ 2021-11-03  2:50   ` Ken Moffat
  2021-11-03 14:34   ` Zhu, James
  1 sibling, 0 replies; 11+ messages in thread
From: Ken Moffat @ 2021-11-03  2:50 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Yifan Zhang, James Zhu, James Zhu, amd-gfx list

On Tue, Nov 02, 2021 at 10:01:46PM -0400, Alex Deucher wrote:
> On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com>
> wrote:
> >
> > Remove duplicated kfd_resume_iommu which already runs in
> > amdgpu_amdkfd_device_init.
> >
> > Signed-off-by: James Zhu <James.Zhu@amd.com>
> 
> Once you get confirmation, please add:
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214859
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1770
> 
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> 
> 
I see those were both for 5.14.15; on my 5.14 kernels I have not
moved beyond 5.14.12, so I've dodged the bullet on those.  And on my
2500U, which I think is a Raven, 5.15.0-rc7 was OK.

On the Picasso, this applies to 5.15.0 with an offset of 34 lines
(no fuzz) and solves the problem.  Thanks.

If it is any use,

Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>

> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e56bc925afcf..f77823ce7ae8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >         if (!adev->gmc.xgmi.pending_reset)
> >                 amdgpu_amdkfd_device_init(adev);
> >
> > -       r = amdgpu_amdkfd_resume_iommu(adev);
> > -       if (r)
> > -               goto init_failed;
> > -
> >         amdgpu_fru_get_product_info(adev);
> >
> >  init_failed:
> > --
> > 2.25.1
> >

-- 
Vetinari smiled. "Can you keep a secret, Mister Lipwig?"
"Oh, yes, sir. I've kept lots."
"Capital. And the point is, so can I. You do not need to know."


* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03  2:01 ` Alex Deucher
  2021-11-03  2:50   ` Ken Moffat
@ 2021-11-03 14:34   ` Zhu, James
  2021-11-03 14:50     ` Alex Deucher
  1 sibling, 1 reply; 11+ messages in thread
From: Zhu, James @ 2021-11-03 14:34 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat



[AMD Official Use Only]

Hi Alex,

Finally figured out the root cause of this breakage.

Linux 5.14.15 + afd1818 fixes the issue.

Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.

714d9e4 drm/amdgpu: init iommu after amdkfd device init

f02abeb drm/amdgpu: move iommu_resume before ip init/resume

afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

286826d drm/amdgpu: init iommu after amdkfd device init

9cec53c drm/amdgpu: move iommu_resume before ip init/resume



So, do we just discard this patch and revert 714d9e4 and f02abeb?


Thanks & Best Regards!


James Zhu



[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 381936 bytes --]


* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 14:34   ` Zhu, James
@ 2021-11-03 14:50     ` Alex Deucher
  2021-11-03 15:03       ` Alex Deucher
  2021-11-03 15:35       ` Alex Deucher
  0 siblings, 2 replies; 11+ messages in thread
From: Alex Deucher @ 2021-11-03 14:50 UTC (permalink / raw)
  To: Zhu, James
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat



On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@amd.com> wrote:

> [AMD Official Use Only]
>
> Hi Alex,
>
> Finally figured out the root cause of this breakage.
>
> Linux 5.14.15 + afd1818 fixes the issue.
>
>
I'll do that for stable.


> Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.
>
> 714d9e4 drm/amdgpu: init iommu after amdkfd device init
>
> f02abeb drm/amdgpu: move iommu_resume before ip init/resume
>
> afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
>
> 286826d drm/amdgpu: init iommu after amdkfd device init
>
> 9cec53c drm/amdgpu: move iommu_resume before ip init/resume
>
>
>
> So, do we just discard this patch and revert 714d9e4 and f02abeb?
>

I'll do that for 5.15+

Thanks for sorting this out.

Alex





* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 14:50     ` Alex Deucher
@ 2021-11-03 15:03       ` Alex Deucher
  2021-11-03 15:54         ` Zhu, James
  2021-11-03 15:35       ` Alex Deucher
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2021-11-03 15:03 UTC (permalink / raw)
  To: Zhu, James
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat



Reverting 714d9e4 and f02abeb results in this diff, which does more than this
patch does.  Is that correct, or should I just use your patch?

Alex

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..70540712ff2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (r)
                goto init_failed;

+       r = amdgpu_amdkfd_resume_iommu(adev);
+       if (r)
+               goto init_failed;
+
        r = amdgpu_device_ip_hw_init_phase1(adev);
        if (r)
                goto init_failed;
@@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (!adev->gmc.xgmi.pending_reset)
                amdgpu_amdkfd_device_init(adev);

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               goto init_failed;
-
        amdgpu_fru_get_product_info(adev);

 init_failed:
@@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
        int r;

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               return r;
-
        r = amdgpu_device_ip_resume_phase1(adev);
        if (r)
                return r;
@@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
                                dev_warn(tmp_adev->dev, "asic atom init failed!");
                        } else {
                                dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
-                               r = amdgpu_amdkfd_resume_iommu(tmp_adev);
-                               if (r)
-                                       goto out;
-
                                r = amdgpu_device_ip_resume_phase1(tmp_adev);
                                if (r)
                                        goto out;





* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 14:50     ` Alex Deucher
  2021-11-03 15:03       ` Alex Deucher
@ 2021-11-03 15:35       ` Alex Deucher
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2021-11-03 15:35 UTC (permalink / raw)
  To: Zhu, James
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat

On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
>
>
> On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@amd.com> wrote:
>>
>> [AMD Official Use Only]
>>
>>
>> Hi Alex,
>>
>> Finally figured out the root cause of this breakage.
>>
>> Linux 5.14.15 + afd1818 fixes the issue.

I think this applies to 5.15 as well.  Only drm-next (5.16) needs this patch.

Alex



* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03  1:33 [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu James Zhu
  2021-11-03  2:01 ` Alex Deucher
@ 2021-11-03 15:40 ` Alex Deucher
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2021-11-03 15:40 UTC (permalink / raw)
  To: James Zhu
  Cc: Deucher, Alexander, Yifan Zhang, James Zhu, amd-gfx list, Ken Moffat

On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in amdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>


* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 15:03       ` Alex Deucher
@ 2021-11-03 15:54         ` Zhu, James
  2021-11-03 15:57           ` Alex Deucher
  0 siblings, 1 reply; 11+ messages in thread
From: Zhu, James @ 2021-11-03 15:54 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat



[AMD Official Use Only]

Hi Alex,

The following two patches were introduced for stable@vger.kernel.org:

714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume

After commit 970eae15600a883e4ad27dd0757b18871cc983ab
(Merge: 27f4432 3906fe9, BackMerge tag 'v5.15-rc7' into drm-next),
they became redundant and overwrote afd1818.

I saw that you just submitted (afd1818) "[PATCH] drm/amdkfd: fix boot failure when iommu is disabled in Picasso" to stable@vger.kernel.org.

I checked that if we re-apply afd1818 on current drm-next, it does the same thing as my patch after auto-merging.

I am wondering whether a future BackMerge of stable into drm-next will correct the current breakage.

Given the above, I am not sure what the proper way to fix this breakage is.

Please let me know your final decision given all this information.


Thanks & Best Regards!


James Zhu

________________________________
From: Alex Deucher <alexdeucher@gmail.com>
Sent: Wednesday, November 3, 2021 11:03 AM
To: Zhu, James <James.Zhu@amd.com>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; James Zhu <jzhums@gmail.com>; Ken Moffat <zarniwhoop@ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu

Reverting 714d9e4 and  f02abeb results in this diff which is more than this patch does.  Is that correct or should I just use your patch?

Alex

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..70540712ff2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (r)
                goto init_failed;

+       r = amdgpu_amdkfd_resume_iommu(adev);
+       if (r)
+               goto init_failed;
+
        r = amdgpu_device_ip_hw_init_phase1(adev);
        if (r)
                goto init_failed;
@@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (!adev->gmc.xgmi.pending_reset)
                amdgpu_amdkfd_device_init(adev);

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               goto init_failed;
-
        amdgpu_fru_get_product_info(adev);

 init_failed:
@@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
        int r;

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               return r;
-
        r = amdgpu_device_ip_resume_phase1(adev);
        if (r)
                return r;
@@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
                                dev_warn(tmp_adev->dev, "asic atom init failed!");
                        } else {
                                dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
-                               r = amdgpu_amdkfd_resume_iommu(tmp_adev);
-                               if (r)
-                                       goto out;
-
                                r = amdgpu_device_ip_resume_phase1(tmp_adev);
                                if (r)
                                        goto out;


On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher@gmail.com<mailto:alexdeucher@gmail.com>> wrote:


On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@amd.com<mailto:James.Zhu@amd.com>> wrote:

[AMD Official Use Only]

Hi Alex,

Finally figured out the root cause for this broken,


Linux 5.14.15  + afd1818 can fix the issue.

I'll do that for stable.


Linux 5.15rc7 re-apply "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume" which overwrote afd1818 caused the issue again.

714d9e4 drm/amdgpu: init iommu after amdkfd device init

f02abeb drm/amdgpu: move iommu_resume before ip init/resume

afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

286826d drm/amdgpu: init iommu after amdkfd device init

9cec53c drm/amdgpu: move iommu_resume before ip init/resume

[cid:17ce6464fcfcb971f161]


So, do we just discard this patch, and revert 714d9e4 and  f02abeb?

I'll do that for 5.15+

Thanks for sorting this out.

Alex



Thanks & Best Regards!


James Zhu

________________________________
From: Alex Deucher <alexdeucher@gmail.com<mailto:alexdeucher@gmail.com>>
Sent: Tuesday, November 2, 2021 10:01 PM
To: Zhu, James <James.Zhu@amd.com<mailto:James.Zhu@amd.com>>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>; Zhang, Yifan <Yifan1.Zhang@amd.com<mailto:Yifan1.Zhang@amd.com>>; James Zhu <jzhums@gmail.com<mailto:jzhums@gmail.com>>; Ken Moffat <zarniwhoop@ntlworld.com<mailto:zarniwhoop@ntlworld.com>>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu

On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com<mailto:James.Zhu@amd.com>> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in mdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com<mailto:James.Zhu@amd.com>>

Once you get confirmation, please add:
Bug: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214859&amp;data=04%7C01%7CJames.Zhu%40amd.com%7C8662c25150e94d9d664708d99e6deb2b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637715017208277821%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=a6WyuNGhOU5OT3J8GQtXSQ3O5r942D2p%2BbruFUncT0E%3D&amp;reserved=0<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214859&data=04%7C01%7CJames.Zhu%40amd.com%7C67f2c85612f7475d0dd008d99edb1fef%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637715486249968500%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WhxYtNqFSoeWcuJSbJCCl99VSdd3XyHBVzjbpR3nx7g%3D&reserved=0>
Bug: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1770&amp;data=04%7C01%7CJames.Zhu%40amd.com%7C8662c25150e94d9d664708d99e6deb2b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637715017208287813%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=E1MFXdprEaldLux2AoXNEeDWL5E85WFv8CrfZODTa%2F4%3D&amp;reserved=0<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1770&data=04%7C01%7CJames.Zhu%40amd.com%7C67f2c85612f7475d0dd008d99edb1fef%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637715486249978500%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=hX2U%2BcWp%2BEinTjxptnx0zExc%2Fy3lbFUYgHT2JDdUY0g%3D&reserved=0>

Acked-by: Alex Deucher <alexander.deucher@amd.com<mailto:alexander.deucher@amd.com>>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>
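
The effect of the quoted patch can be pictured with a small toy model. This is only a sketch under stated assumptions: the `toy_*` names and the call counter are hypothetical and are not the amdgpu functions. The one behaviour taken from the thread is the commit message's claim that the KFD device init step already resumes the IOMMU, so the extra call in the ip_init path is a duplicate.

```c
/* Toy model only: none of these are the real amdgpu/amdkfd functions. */
struct toy_dev {
	int iommu_resume_calls;	/* how many times the IOMMU resume step ran */
};

/* Stand-in for the IOMMU resume step; always succeeds in this model. */
static int toy_resume_iommu(struct toy_dev *dev)
{
	dev->iommu_resume_calls++;
	return 0;
}

/*
 * Stand-in for KFD device init. Per the commit message, device init
 * already performs the IOMMU resume itself (assumed here).
 */
static void toy_kfd_device_init(struct toy_dev *dev)
{
	toy_resume_iommu(dev);
}

/*
 * Stand-in for the tail of the ip_init path.
 * keep_duplicate_call != 0 models the code before the patch,
 * keep_duplicate_call == 0 models the code after the patch.
 */
static int toy_ip_init(struct toy_dev *dev, int keep_duplicate_call)
{
	int r = 0;

	toy_kfd_device_init(dev);

	if (keep_duplicate_call) {
		r = toy_resume_iommu(dev);	/* the four removed lines */
		if (r)
			return r;
	}

	return r;
}
```

In this model, calling `toy_ip_init()` with the duplicate kept runs the resume step twice per init, and with it removed the step runs once. Whether a second resume is harmful on real hardware depends on the driver and is not something this sketch can show.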

[-- Attachment #1.2: Type: text/html, Size: 16760 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 381936 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 15:54         ` Zhu, James
@ 2021-11-03 15:57           ` Alex Deucher
  2021-11-05  2:31             ` Ken Moffat
  0 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2021-11-03 15:57 UTC (permalink / raw)
  To: Zhu, James
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, amd-gfx list, Ken Moffat


[-- Attachment #1.1: Type: text/plain, Size: 8197 bytes --]

I think just applying your patch is fine for drm-next (I'll take care of
that).  For 5.14.x and 5.15.x, we can just cherry-pick afd1818.

Alex

On Wed, Nov 3, 2021 at 11:54 AM Zhu, James <James.Zhu@amd.com> wrote:

> [AMD Official Use Only]
>
> Hi Alex,
>
> The following two patches were introduced for stable@vger.kernel.org
>
> 714d9e4 drm/amdgpu: init iommu after amdkfd device init
> f02abeb drm/amdgpu: move iommu_resume before ip init/resume
>
> After commit 970eae15600a883e4ad27dd0757b18871cc983ab
> (Merge: 27f4432 3906fe9, BackMerge tag 'v5.15-rc7' into drm-next),
> they became redundant and overwrote afd1818.
>
> I saw that you just submitted (afd1818) "[PATCH] drm/amdkfd: fix boot
> failure when iommu is disabled in Picasso" to stable@vger.kernel.org.
>
> I checked that re-applying afd1818 on current drm-next does the same
> thing as my patch after the auto-merge.
>
> I am wondering whether a future BackMerge of stable into drm-next will
> correct the current breakage.
>
> Given the above, I am not sure what the proper way to fix this
> breakage is.
>
> Please let me know your final decision given all this information.
>
>
> Thanks & Best Regards!
>
>
> James Zhu
> ------------------------------
> *From:* Alex Deucher <alexdeucher@gmail.com>
> *Sent:* Wednesday, November 3, 2021 11:03 AM
> *To:* Zhu, James <James.Zhu@amd.com>
> *Cc:* amd-gfx list <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <
> Alexander.Deucher@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; James
> Zhu <jzhums@gmail.com>; Ken Moffat <zarniwhoop@ntlworld.com>
> *Subject:* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
>
> Reverting 714d9e4 and f02abeb results in the diff below, which changes
> more than this patch does.  Is that correct, or should I just use your patch?
>
> Alex
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..70540712ff2d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (r)
>                 goto init_failed;
>
> +       r = amdgpu_amdkfd_resume_iommu(adev);
> +       if (r)
> +               goto init_failed;
> +
>         r = amdgpu_device_ip_hw_init_phase1(adev);
>         if (r)
>                 goto init_failed;
> @@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> @@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
>  {
>         int r;
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               return r;
> -
>         r = amdgpu_device_ip_resume_phase1(adev);
>         if (r)
>                 return r;
> @@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
>                                 dev_warn(tmp_adev->dev, "asic atom init failed!");
>                         } else {
>                                 dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
> -                               r = amdgpu_amdkfd_resume_iommu(tmp_adev);
> -                               if (r)
> -                                       goto out;
> -
>                                 r = amdgpu_device_ip_resume_phase1(tmp_adev);
>                                 if (r)
>                                         goto out;
>
>
> On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher@gmail.com>
> wrote:
>
>
>
> On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@amd.com> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> I finally figured out the root cause of this breakage.
>
> Linux 5.14.15 + afd1818 fixes the issue.
>
>
> I'll do that for stable.
>
>
> Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.
>
> 714d9e4 drm/amdgpu: init iommu after amdkfd device init
>
> f02abeb drm/amdgpu: move iommu_resume before ip init/resume
>
> afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
>
> 286826d drm/amdgpu: init iommu after amdkfd device init
>
> 9cec53c drm/amdgpu: move iommu_resume before ip init/resume
>
>
>
> So, do we just discard this patch, and revert 714d9e4 and  f02abeb?
>
>
> I'll do that for 5.15+
>
> Thanks for sorting this out.
>
> Alex
>
>
>
> Thanks & Best Regards!
>
>
> James Zhu
> ------------------------------
> *From:* Alex Deucher <alexdeucher@gmail.com>
> *Sent:* Tuesday, November 2, 2021 10:01 PM
> *To:* Zhu, James <James.Zhu@amd.com>
> *Cc:* amd-gfx list <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <
> Alexander.Deucher@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; James
> Zhu <jzhums@gmail.com>; Ken Moffat <zarniwhoop@ntlworld.com>
> *Subject:* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
>
> On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com> wrote:
> >
> > Remove duplicated kfd_resume_iommu which already runs
> > in amdgpu_amdkfd_device_init.
> >
> > Signed-off-by: James Zhu <James.Zhu@amd.com>
>
> Once you get confirmation, please add:
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214859
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1770
>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
>
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
> >  1 file changed, 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e56bc925afcf..f77823ce7ae8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >         if (!adev->gmc.xgmi.pending_reset)
> >                 amdgpu_amdkfd_device_init(adev);
> >
> > -       r = amdgpu_amdkfd_resume_iommu(adev);
> > -       if (r)
> > -               goto init_failed;
> > -
> >         amdgpu_fru_get_product_info(adev);
> >
> >  init_failed:
> > --
> > 2.25.1
> >
>
>

[-- Attachment #1.2: Type: text/html, Size: 15738 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 381936 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
  2021-11-03 15:57           ` Alex Deucher
@ 2021-11-05  2:31             ` Ken Moffat
  0 siblings, 0 replies; 11+ messages in thread
From: Ken Moffat @ 2021-11-05  2:31 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Zhang, Yifan, James Zhu, Zhu, James, amd-gfx list

On Wed, Nov 03, 2021 at 11:57:17AM -0400, Alex Deucher wrote:
> I think just applying your patch is fine for drm-next (I'll take care of
> that).  For 5.14.x and 5.15.x, we can just cherry-pick afd1818.
> 
> Alex
> 

I can confirm that both 5.14.17-rc1 and 5.15.1-rc1 work on my
Picasso 3400G.  Thanks for everyone's efforts.

ĸen
-- 
Vetinari smiled. "Can you keep a secret, Mister Lipwig?"
"Oh, yes, sir. I've kept lots."
"Capital. And the point is, so can I. You do not need to know.”

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-11-05  2:31 UTC | newest]

Thread overview: 11+ messages
2021-11-03  1:33 [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu James Zhu
2021-11-03  2:01 ` Alex Deucher
2021-11-03  2:50   ` Ken Moffat
2021-11-03 14:34   ` Zhu, James
2021-11-03 14:50     ` Alex Deucher
2021-11-03 15:03       ` Alex Deucher
2021-11-03 15:54         ` Zhu, James
2021-11-03 15:57           ` Alex Deucher
2021-11-05  2:31             ` Ken Moffat
2021-11-03 15:35       ` Alex Deucher
2021-11-03 15:40 ` Alex Deucher
