amd-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: "Liu, Monk" <Monk.Liu@amd.com>
To: "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>
Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ
Date: Tue, 4 Aug 2020 06:31:25 +0000	[thread overview]
Message-ID: <DM5PR12MB1708B1EBA4B3F695A808F85B844A0@DM5PR12MB1708.namprd12.prod.outlook.com> (raw)
In-Reply-To: <DM5PR12MB170890EFFDD63DA297CDB254844D0@DM5PR12MB1708.namprd12.prod.outlook.com>

[AMD Official Use Only - Internal Distribution Only]

Ping ... this is a severe bug fix

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of Liu, Monk
Sent: Monday, August 3, 2020 9:55 AM
To: Kuehling, Felix <Felix.Kuehling@amd.com>; amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

[AMD Official Use Only - Internal Distribution Only]

[AMD Official Use Only - Internal Distribution Only]

>>In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the
>>right place to stop the KIQ

KIQ (CPC) will never being stopped (the DISABLE on CPC is skipped for SRIOV ) for SRIOV in SW_FINI because SRIOV relies on KIQ to do world switch

But this is really a weird bug because even with the same approach it doesn't make KIQ (CP) hang on GFX9, only GFX10 need this patch ....

By now I do not have other good explanation or better fix than this one

_____________________________________
Monk Liu|GPU Virtualization Team |AMD


-----Original Message-----
From: Kuehling, Felix <Felix.Kuehling@amd.com>
Sent: Friday, July 31, 2020 9:57 PM
To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ

In gfx_v10_0_sw_fini the KIQ ring gets freed. Wouldn't that be the right place to stop the KIQ? Otherwise KIQ will hang as soon as someone allocates the memory that was previously used for the KIQ ring buffer and overwrites it with something that's not a valid PM4 packet.

Regards,
  Felix

Am 2020-07-31 um 3:51 a.m. schrieb Monk Liu:
> KIQ will hang if we try below steps:
> modprobe amdgpu
> rmmod amdgpu
> modprobe amdgpu sched_hw_submission=4
>
> the cause is that due to KIQ is always living there even after we
> unload KMD thus when doing the realod of KMD KIQ will crash upon its
> register programed with different values with the previous
> configuration (the config like HQD addr, ring size, is easily changed
> if we alter the sched_hw_submission)
>
> the fix is we must inactive KIQ first before touching any of its
> registgers
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index db9f1e8..f571e25 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -6433,6 +6433,9 @@ static int gfx_v10_0_kiq_init_register(struct
> amdgpu_ring *ring)  struct v10_compute_mqd *mqd = ring->mqd_ptr;  int
> j;
>
> +/* activate the queue */
> +WREG32_SOC15(GC, 0, mmCP_HQD_ACTIVE, 0);
> +
>  /* disable wptr polling */
>  WREG32_FIELD15(GC, 0, CP_PQ_WPTR_POLL_CNTL, EN, 0);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=02%7C01%7Cmonk.liu%40amd.com%7C4837e2d566b44af845f608d837503a3b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637320165018899834&amp;sdata=TED%2BkhlYyAIyTmLJAZBBBHHnE6PRg4fpUsZhD9ke%2BPU%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

  reply	other threads:[~2020-08-04  6:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-31  7:51 [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Monk Liu
2020-07-31  7:51 ` [PATCH 2/2] drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) Monk Liu
2020-07-31 13:53   ` Felix Kuehling
2020-10-15 15:05     ` Marek Olšák
2020-07-31 13:57 ` [PATCH 1/2] drm/amdgpu: fix reload KMD hang on KIQ Felix Kuehling
2020-08-03  1:54   ` Liu, Monk
2020-08-04  6:31     ` Liu, Monk [this message]
2020-08-04  8:01       ` Deng, Emily

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM5PR12MB1708B1EBA4B3F695A808F85B844A0@DM5PR12MB1708.namprd12.prod.outlook.com \
    --to=monk.liu@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).