From: Alex Deucher <alexdeucher@gmail.com>
To: Evan Quan <evan.quan@amd.com>
Cc: "Deucher, Alexander" <alexander.deucher@amd.com>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/amd/powerplay: unify the prompts on thermal interrupts
Date: Wed, 20 May 2020 09:11:26 -0400
Message-ID: <CADnq5_OD0yfr-UGgffm7_CdaBC6KnkzsxYceG9Ho9Gg0HySxNQ@mail.gmail.com>
In-Reply-To: <20200520103948.30993-1-evan.quan@amd.com>

On Wed, May 20, 2020 at 6:40 AM Evan Quan <evan.quan@amd.com> wrote:
>
> The prompts will contain the PCI address (segment/bus/device/function),
> severity (warn or error) and some keywords (GPU, amdgpu). This also
> addresses the issue that the PCI bus retrieved by
> PCI_BUS_NUM(adev->pdev->devfn) is wrong.
>
> Change-Id: I714d1dffb30a6cf76dcede087cf5d9302f683ed8
> Signed-off-by: Evan Quan <evan.quan@amd.com>

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
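
For readers wondering why PCI_BUS_NUM(adev->pdev->devfn) gives the wrong
bus: devfn only encodes the slot and function in its low 8 bits, and
PCI_BUS_NUM() reads bits 8-15, so the result is always 0. Below is a
minimal userspace sketch of that, not kernel code. The macros are copied
from the kernel headers; struct fake_pci_dev, bus_from_devfn() and
format_pci_name() are hypothetical names made up for illustration. The
name format mirrors the domain:bus:slot.function string that PCI devices
carry as their device name, which is what the dev_*() helpers print.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace copies of the kernel macros from the PCI headers. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct fake_pci_dev {
	unsigned int domain;	/* PCI segment */
	unsigned int bus;	/* real bus number, kept separately from devfn */
	unsigned int devfn;	/* slot and function only, always <= 0xff */
};

/* What the removed pr_warn() calls computed as the "bus". */
static unsigned int bus_from_devfn(const struct fake_pci_dev *pdev)
{
	assert(pdev->devfn <= 0xff);
	/* Bits 8-15 of an 8-bit value are empty, so this is always 0. */
	return PCI_BUS_NUM(pdev->devfn);
}

/* Build a segment:bus:slot.function string like the PCI device name. */
static int format_pci_name(const struct fake_pci_dev *pdev,
			   char *buf, size_t len)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%u",
			pdev->domain, pdev->bus,
			PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
}
```

With a device on bus 0x03, slot 0, function 0, bus_from_devfn() returns 0
while format_pci_name() writes the expected address including the bus.
Letting dev_emerg() print the device name avoids open-coding any of this
in each message.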

> ---
>  .../gpu/drm/amd/powerplay/hwmgr/smu_helper.c  | 38 +++++--------------
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c     | 26 ++++---------
>  2 files changed, 17 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> index 4279f95ba779..60b5ca974356 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/smu_helper.c
> @@ -597,58 +597,40 @@ int phm_irq_process(struct amdgpu_device *adev,
>
>         if (client_id == AMDGPU_IRQ_CLIENTID_LEGACY) {
>                 if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_LOW_TO_HIGH) {
> -                       pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n",
> -                                               PCI_BUS_NUM(adev->pdev->devfn),
> -                                               PCI_SLOT(adev->pdev->devfn),
> -                                               PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n");
>                         /*
>                          * SW CTF just occurred.
>                          * Try to do a graceful shutdown to prevent further damage.
>                          */
> -                       dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +                       dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n");
>                         orderly_poweroff(true);
>                 } else if (src_id == VISLANDS30_IV_SRCID_CG_TSS_THERMAL_HIGH_TO_LOW)
> -                       pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
> -                                       PCI_BUS_NUM(adev->pdev->devfn),
> -                                       PCI_SLOT(adev->pdev->devfn),
> -                                       PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU under temperature range detected!\n");
>                 else if (src_id == VISLANDS30_IV_SRCID_GPIO_19) {
> -                       pr_warn("GPU Critical Temperature Fault detected on PCIe %d:%d.%d!\n",
> -                                       PCI_BUS_NUM(adev->pdev->devfn),
> -                                       PCI_SLOT(adev->pdev->devfn),
> -                                       PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU HW Critical Temperature Fault(aka CTF) detected!\n");
>                         /*
>                          * HW CTF just occurred. Shutdown to prevent further damage.
>                          */
> -                       dev_emerg(adev->dev, "System is going to shutdown due to HW CTF!\n");
> +                       dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU HW CTF!\n");
>                         orderly_poweroff(true);
>                 }
>         } else if (client_id == SOC15_IH_CLIENTID_THM) {
>                 if (src_id == 0) {
> -                       pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n",
> -                                               PCI_BUS_NUM(adev->pdev->devfn),
> -                                               PCI_SLOT(adev->pdev->devfn),
> -                                               PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n");
>                         /*
>                          * SW CTF just occurred.
>                          * Try to do a graceful shutdown to prevent further damage.
>                          */
> -                       dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +                       dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n");
>                         orderly_poweroff(true);
>                 } else
> -                       pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
> -                                       PCI_BUS_NUM(adev->pdev->devfn),
> -                                       PCI_SLOT(adev->pdev->devfn),
> -                                       PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU under temperature range detected!\n");
>         } else if (client_id == SOC15_IH_CLIENTID_ROM_SMUIO) {
> -               pr_warn("GPU Critical Temperature Fault detected on PCIe %d:%d.%d!\n",
> -                               PCI_BUS_NUM(adev->pdev->devfn),
> -                               PCI_SLOT(adev->pdev->devfn),
> -                               PCI_FUNC(adev->pdev->devfn));
> +               dev_emerg(adev->dev, "ERROR: GPU HW Critical Temperature Fault(aka CTF) detected!\n");
>                 /*
>                  * HW CTF just occurred. Shutdown to prevent further damage.
>                  */
> -               dev_emerg(adev->dev, "System is going to shutdown due to HW CTF!\n");
> +               dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU HW CTF!\n");
>                 orderly_poweroff(true);
>         }
>
> diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> index c1ba77344107..f56789f8ec11 100644
> --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c
> @@ -1540,40 +1540,28 @@ static int smu_v11_0_irq_process(struct amdgpu_device *adev,
>         if (client_id == SOC15_IH_CLIENTID_THM) {
>                 switch (src_id) {
>                 case THM_11_0__SRCID__THM_DIG_THERM_L2H:
> -                       pr_warn("GPU over temperature range detected on PCIe %d:%d.%d!\n",
> -                               PCI_BUS_NUM(adev->pdev->devfn),
> -                               PCI_SLOT(adev->pdev->devfn),
> -                               PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU over temperature range(SW CTF) detected!\n");
>                         /*
>                          * SW CTF just occurred.
>                          * Try to do a graceful shutdown to prevent further damage.
>                          */
> -                       dev_emerg(adev->dev, "System is going to shutdown due to SW CTF!\n");
> +                       dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU SW CTF!\n");
>                         orderly_poweroff(true);
>                 break;
>                 case THM_11_0__SRCID__THM_DIG_THERM_H2L:
> -                       pr_warn("GPU under temperature range detected on PCIe %d:%d.%d!\n",
> -                               PCI_BUS_NUM(adev->pdev->devfn),
> -                               PCI_SLOT(adev->pdev->devfn),
> -                               PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU under temperature range detected\n");
>                 break;
>                 default:
> -                       pr_warn("GPU under temperature range unknown src id (%d), detected on PCIe %d:%d.%d!\n",
> -                               src_id,
> -                               PCI_BUS_NUM(adev->pdev->devfn),
> -                               PCI_SLOT(adev->pdev->devfn),
> -                               PCI_FUNC(adev->pdev->devfn));
> +                       dev_emerg(adev->dev, "ERROR: GPU under temperature range unknown src id (%d)\n",
> +                               src_id);
>                 break;
>                 }
>         } else if (client_id == SOC15_IH_CLIENTID_ROM_SMUIO) {
> -               pr_warn("GPU Critical Temperature Fault detected on PCIe %d:%d.%d!\n",
> -                               PCI_BUS_NUM(adev->pdev->devfn),
> -                               PCI_SLOT(adev->pdev->devfn),
> -                               PCI_FUNC(adev->pdev->devfn));
> +               dev_emerg(adev->dev, "ERROR: GPU HW Critical Temperature Fault(aka CTF) detected!\n");
>                 /*
>                  * HW CTF just occurred. Shutdown to prevent further damage.
>                  */
> -               dev_emerg(adev->dev, "System is going to shutdown due to HW CTF!\n");
> +               dev_emerg(adev->dev, "ERROR: System is going to shutdown due to GPU HW CTF!\n");
>                 orderly_poweroff(true);
>         } else if (client_id == SOC15_IH_CLIENTID_MP1) {
>                 if (src_id == 0xfe) {
> --
> 2.26.2
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Thread overview: 2+ messages
2020-05-20 10:39 [PATCH] drm/amd/powerplay: unify the prompts on thermal interrupts Evan Quan
2020-05-20 13:11 ` Alex Deucher [this message]
