From: "Zhou1, Tao" <Tao.Zhou1@amd.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: "Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Chai, Thomas" <YiPeng.Chai@amd.com>,
	"Yang, Stanley" <Stanley.Yang@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Zhang, Hawking" <Hawking.Zhang@amd.com>
Subject: RE: [PATCH] drm/amdkfd: print unmap queue status for RAS poison consumption (v2)
Date: Tue, 22 Mar 2022 02:57:17 +0000
Message-ID: <DM5PR12MB1770E56F3E2A5171EE11D05AB0179@DM5PR12MB1770.namprd12.prod.outlook.com>
In-Reply-To: <14f7d9cc-e0e2-260f-0073-dde2d40a44f1@molgen.mpg.de>

> -----Original Message-----
> From: Paul Menzel <pmenzel@molgen.mpg.de>
> Sent: Monday, March 21, 2022 6:47 PM
> To: Zhou1, Tao <Tao.Zhou1@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang@amd.com>; Kuehling, Felix <Felix.Kuehling@amd.com>; Yang,
> Stanley <Stanley.Yang@amd.com>; Chai, Thomas <YiPeng.Chai@amd.com>
> Subject: Re: [PATCH] drm/amdkfd: print unmap queue status for RAS poison
> consumption (v2)
> 
> Dear Tao,
> 
> 
> Thank you for the patch.
> 
> 
> On 21.03.22 at 10:38, Tao Zhou wrote:
> > Print the status when it passes, and also tell the user that a GPU reset is
> > triggered when we fall back to the legacy way.
> >
> > v2: make the message more explicit.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > index 56902b5bb7b6..32c451f21db7 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > @@ -105,8 +105,6 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >   	if (old_poison)
> >   		return;
> >
> > -	pr_warn("RAS poison consumption handling: client id %d\n", client_id);
> > -
> >   	switch (client_id) {
> >   	case SOC15_IH_CLIENTID_SE0SH:
> >   	case SOC15_IH_CLIENTID_SE1SH:
> > @@ -130,10 +128,15 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >   	/* resetting queue passes, do page retirement without gpu reset
> >   	 * resetting queue fails, fallback to gpu reset solution
> >   	 */
> > -	if (!ret)
> > +	if (!ret) {
> > +		pr_warn("RAS poison consumption, unmap queue flow succeeds: client id %d\n",
> > +				client_id);
> 
> succeeded? As it’s a success message, should it be an informational message?

[Tao] Thanks, I will change it to "succeeded" before pushing. Although the message reports success, poison consumption is not a usual event, so I would keep it as a warning.
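For reference, here is a userspace sketch of how the two messages would read with the review wording applied. It uses "succeeded" as agreed above and "fall back" as Paul suggested; the second change is assumed adopted here, and the pushed patch may word it differently. format_poison_msg() is a hypothetical helper that only renders the text the two pr_warn() calls would print. The log level is not modelled, since both branches would stay at warning level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the two pr_warn() calls in the patch,
 * with the review wording applied ("succeeded", "fall back").
 * ret is the result of the unmap/reset-queue attempt: 0 means success.
 * Returns the snprintf() result, i.e. the length of the full message. */
static int format_poison_msg(char *buf, size_t n, int ret, int client_id)
{
	assert(buf != NULL && n > 0);

	if (!ret)
		return snprintf(buf, n,
			"RAS poison consumption, unmap queue flow succeeded: client id %d\n",
			client_id);

	return snprintf(buf, n,
		"RAS poison consumption, fall back to gpu reset flow: client id %d\n",
		client_id);
}
```

For example, format_poison_msg(buf, sizeof(buf), 0, 3) renders the "succeeded" line for client id 3.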

> 
> >   		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, false);
> > -	else
> > +	} else {
> > +		pr_warn("RAS poison consumption, fallback to gpu reset flow: client id %d\n",
> 
> Fall back.
> 
> > +				client_id);
> >   		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, true);
> 
> Could the log be moved somehow to the handler?

[Tao] No. The unmap queue operation isn't called in the handler, and client_id isn't passed to the handler.
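To illustrate why the messages stay in the caller, here is a userspace mock of the call shape in the patch. struct mock_adev and both mock_* functions are hypothetical stand-ins. Like amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, reset) in the diff, the mock handler receives only the device and a reset flag.

The patch's code comment describes what the flag selects. reset == false is taken to mean page retirement without a GPU reset, and reset == true means falling back to a GPU reset. The mock only counts which path was requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_device: only counts outcomes. */
struct mock_adev {
	int retire_only;	/* handler called with reset == false */
	int gpu_reset;		/* handler called with reset == true */
};

/* Mirrors the shape of the handler call in the patch: it sees the device
 * and a reset flag only, so it cannot know the client id or the result
 * of the unmap attempt. */
static void mock_poison_consumption_handler(struct mock_adev *adev, bool reset)
{
	assert(adev != NULL);
	if (reset)
		adev->gpu_reset++;
	else
		adev->retire_only++;
}

/* Caller side: the only place where both client_id and the unmap result
 * are known, hence the only place that can log them. */
static void mock_event_poison_consumption(struct mock_adev *adev,
					  int client_id, int unmap_ret)
{
	if (!unmap_ret) {
		fprintf(stderr, "unmap queue flow succeeded: client id %d\n",
			client_id);
		mock_poison_consumption_handler(adev, false);
	} else {
		fprintf(stderr, "fall back to gpu reset flow: client id %d\n",
			client_id);
		mock_poison_consumption_handler(adev, true);
	}
}
```

Moving the log into the handler would mean widening its signature to carry client_id and the unmap status. That is a larger change than this patch makes.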

> 
> > +	}
> >   }
> >
> >   static bool event_interrupt_isr_v9(struct kfd_dev *dev,
> 
> Unrelated to the patch: at least as a user, I would wish these warnings to be
> more elaborate, telling me what the problem is, what effects it has, and what
> to do to fix it.

[Tao] That's difficult. Explaining all of those details would need documentation rather than a dmesg log message.

> 
> 
> Kind regards,
> 
> Paul

Thread overview: 7+ messages
2022-03-21  9:38 [PATCH] drm/amdkfd: print unmap queue status for RAS poison consumption (v2) Tao Zhou
2022-03-21 10:47 ` Paul Menzel
2022-03-22  2:57   ` Zhou1, Tao [this message]
2022-03-21 10:50 ` Zhang, Hawking
2022-03-21 11:21 ` Lazar, Lijo
2022-03-22  3:17   ` Zhou1, Tao
2022-03-22 14:05     ` Felix Kuehling
