From: "Zhu, James" <James.Zhu@amd.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Zhang, Yifan" <Yifan1.Zhang@amd.com>,
	James Zhu <jzhums@gmail.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Ken Moffat <zarniwhoop@ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu
Date: Wed, 3 Nov 2021 15:54:45 +0000
Message-ID: <BN6PR12MB1874A9156EF80C63D96EBD06E48C9@BN6PR12MB1874.namprd12.prod.outlook.com>
In-Reply-To: <CADnq5_OrBYv80XHMBTTEwyJzEx1eEzBL2=VuzgmK=9Og5v5=1A@mail.gmail.com>



[AMD Official Use Only]

Hi Alex,

The following two patches were introduced for stable@vger.kernel.org

714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume

After commit 970eae15600a883e4ad27dd0757b18871cc983ab
(Merge: 27f4432 3906fe9, BackMerge tag 'v5.15-rc7' into drm-next),
they became redundant and overwrote afd1818.

I saw that you just submitted (afd1818) "[PATCH] drm/amdkfd: fix boot failure when iommu is disabled in Picasso" to stable@vger.kernel.org.

I checked that re-applying afd1818 on current drm-next, after the auto-merge, does the same thing as my patch.

I am wondering whether a future backmerge of stable into drm-next will correct the current breakage.

Given the above, I am not sure what the proper way to fix this breakage is.

Please let me know your final decision given all this information.


Thanks & Best Regards!


James Zhu

________________________________
From: Alex Deucher <alexdeucher@gmail.com>
Sent: Wednesday, November 3, 2021 11:03 AM
To: Zhu, James <James.Zhu@amd.com>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; James Zhu <jzhums@gmail.com>; Ken Moffat <zarniwhoop@ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu

Reverting 714d9e4 and f02abeb results in the diff below, which changes more than this patch does.  Is that correct, or should I just use your patch?

Alex

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e56bc925afcf..70540712ff2d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2360,6 +2360,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (r)
                goto init_failed;

+       r = amdgpu_amdkfd_resume_iommu(adev);
+       if (r)
+               goto init_failed;
+
        r = amdgpu_device_ip_hw_init_phase1(adev);
        if (r)
                goto init_failed;
@@ -2398,10 +2402,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
        if (!adev->gmc.xgmi.pending_reset)
                amdgpu_amdkfd_device_init(adev);

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               goto init_failed;
-
        amdgpu_fru_get_product_info(adev);

 init_failed:
@@ -3119,10 +3119,6 @@ static int amdgpu_device_ip_resume(struct amdgpu_device *adev)
 {
        int r;

-       r = amdgpu_amdkfd_resume_iommu(adev);
-       if (r)
-               return r;
-
        r = amdgpu_device_ip_resume_phase1(adev);
        if (r)
                return r;
@@ -4595,10 +4591,6 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
                                dev_warn(tmp_adev->dev, "asic atom init failed!");
                        } else {
                                dev_info(tmp_adev->dev, "GPU reset succeeded, trying to resume\n");
-                               r = amdgpu_amdkfd_resume_iommu(tmp_adev);
-                               if (r)
-                                       goto out;
-
                                r = amdgpu_device_ip_resume_phase1(tmp_adev);
                                if (r)
                                        goto out;


On Wed, Nov 3, 2021 at 10:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:


On Wed, Nov 3, 2021 at 10:34 AM Zhu, James <James.Zhu@amd.com> wrote:

[AMD Official Use Only]

Hi Alex,

I finally figured out the root cause of this breakage.


Linux 5.14.15  + afd1818 can fix the issue.

I'll do that for stable.


Linux 5.15-rc7 re-applied "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrote afd1818 and caused the issue again.

714d9e4 drm/amdgpu: init iommu after amdkfd device init

f02abeb drm/amdgpu: move iommu_resume before ip init/resume

afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.

286826d drm/amdgpu: init iommu after amdkfd device init

9cec53c drm/amdgpu: move iommu_resume before ip init/resume



So, do we just discard this patch and revert 714d9e4 and f02abeb?

I'll do that for 5.15+

Thanks for sorting this out.

Alex



Thanks & Best Regards!


James Zhu

________________________________
From: Alex Deucher <alexdeucher@gmail.com>
Sent: Tuesday, November 2, 2021 10:01 PM
To: Zhu, James <James.Zhu@amd.com>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; Zhang, Yifan <Yifan1.Zhang@amd.com>; James Zhu <jzhums@gmail.com>; Ken Moffat <zarniwhoop@ntlworld.com>
Subject: Re: [PATCH] drm/amdgpu: remove duplicated kfd_resume_iommu

On Tue, Nov 2, 2021 at 9:34 PM James Zhu <James.Zhu@amd.com> wrote:
>
> Remove duplicated kfd_resume_iommu which already runs
> in amdgpu_amdkfd_device_init.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>

Once you get confirmation, please add:
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214859
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1770

Acked-by: Alex Deucher <alexander.deucher@amd.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e56bc925afcf..f77823ce7ae8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2398,10 +2398,6 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>         if (!adev->gmc.xgmi.pending_reset)
>                 amdgpu_amdkfd_device_init(adev);
>
> -       r = amdgpu_amdkfd_resume_iommu(adev);
> -       if (r)
> -               goto init_failed;
> -
>         amdgpu_fru_get_product_info(adev);
>
>  init_failed:
> --
> 2.25.1
>
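
For readers following the thread, here is a minimal sketch of the duplication the patch description refers to. It is a toy model, not kernel code: the model_* functions are hypothetical stand-ins for the driver functions named above, and the only thing taken from the thread is the claim that the IOMMU resume already runs inside amdgpu_amdkfd_device_init. The model just counts how many times the resume step runs with and without the explicit call that the patch deletes.

```c
/* Toy model only. The model_* names are hypothetical stand-ins and
 * do not exist in the amdgpu driver. */
static int iommu_resume_calls;

/* Stand-in for amdgpu_amdkfd_resume_iommu(): counts the call, returns 0. */
static int model_resume_iommu(void)
{
	iommu_resume_calls++;
	return 0;
}

/* Stand-in for amdgpu_amdkfd_device_init(). Assumption taken from the
 * patch description: the IOMMU resume already runs in here. */
static void model_kfd_device_init(void)
{
	model_resume_iommu();
}

/* Models the tail of amdgpu_device_ip_init(). A nonzero
 * keep_explicit_call keeps the explicit resume call that the patch
 * removes. Returns the number of resume calls, or -1 on failure. */
static int model_ip_init(int keep_explicit_call)
{
	int r;

	iommu_resume_calls = 0;
	model_kfd_device_init();

	if (keep_explicit_call) {
		r = model_resume_iommu();
		if (r)
			return -1;
	}

	return iommu_resume_calls;
}
```

In this model, model_ip_init(1) counts two resume calls and model_ip_init(0) counts one, which is the redundancy the four-line deletion is meant to remove. Whether the second call is harmful on real hardware is not something the model can show.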

