From: Joel Fernandes <joel@joelfernandes.org>
To: Akhil P Oommen <quic_akhilpo@quicinc.com>
Cc: Rob Clark <robdclark@chromium.org>,
	freedreno@lists.freedesktop.org, Emma Anholt <emma@anholt.net>,
	Sean Paul <sean@poorly.run>,
	linux-arm-msm@vger.kernel.org, Ross Zwisler <zwisler@kernel.org>,
	Vladimir Lypak <vladimir.lypak@gmail.com>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Ricardo Ribalda <ribalda@chromium.org>,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Subject: Re: [PATCH 2/2] adreno: Detect shutdown during get_param()
Date: Fri, 11 Nov 2022 17:19:57 -0500
Message-ID: <CAEXW_YTnFsMVX6-dGap0UdbhmZeMd+fhkq9Y3tV2QT8wB9Y2DA@mail.gmail.com>
In-Reply-To: <F4D72FA8-C1D1-46ED-B56E-8BEFBB297E4A@joelfernandes.org>


On Fri, Nov 11, 2022 at 4:37 PM Joel Fernandes <joel@joelfernandes.org> wrote:

>
>
> > On Nov 11, 2022, at 4:28 PM, Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
> >
> > On 11/12/2022 1:19 AM, Joel Fernandes (Google) wrote:
> >> Even though the GPU is shut down, during kexec reboot we can have userspace
> >> still running. This is especially true if KEXEC_JUMP is not enabled, because we
> >> do not freeze userspace in this case.
> >>
> >> To prevent crashes, track that the GPU is shutdown and prevent get_param() from
> >> accessing GPU resources if we find it shutdown.
> >>
> >> This fixes the following crash during kexec reboot on an ARM64 device with adreno GPU:
> >>
> >> [  292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt
> >> [  292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> >> [  292.534326] Call trace:
> >> [  292.534328]  dump_backtrace+0x0/0x1d4
> >> [  292.534337]  show_stack+0x20/0x2c
> >> [  292.534342]  dump_stack_lvl+0x60/0x78
> >> [  292.534347]  dump_stack+0x18/0x38
> >> [  292.534352]  panic+0x148/0x3b0
> >> [  292.534357]  nmi_panic+0x80/0x94
> >> [  292.534364]  arm64_serror_panic+0x70/0x7c
> >> [  292.534369]  do_serror+0x0/0x7c
> >> [  292.534372]  do_serror+0x54/0x7c
> >> [  292.534377]  el1h_64_error_handler+0x34/0x4c
> >> [  292.534381]  el1h_64_error+0x7c/0x80
> >> [  292.534386]  el1_interrupt+0x20/0x58
> >> [  292.534389]  el1h_64_irq_handler+0x18/0x24
> >> [  292.534395]  el1h_64_irq+0x7c/0x80
> >> [  292.534399]  local_daif_inherit+0x10/0x18
> >> [  292.534405]  el1h_64_sync_handler+0x48/0xb4
> >> [  292.534410]  el1h_64_sync+0x7c/0x80
> >> [  292.534414]  a6xx_gmu_set_oob+0xbc/0x1fc
> >> [  292.534422]  a6xx_get_timestamp+0x40/0xb4
> >> [  292.534426]  adreno_get_param+0x12c/0x1e0
> >> [  292.534433]  msm_ioctl_get_param+0x64/0x70
> >> [  292.534440]  drm_ioctl_kernel+0xe8/0x158
> >> [  292.534448]  drm_ioctl+0x208/0x320
> >> [  292.534453]  __arm64_sys_ioctl+0x98/0xd0
> >> [  292.534461]  invoke_syscall+0x4c/0x118
> >> [  292.534467]  el0_svc_common+0x98/0x104
> >> [  292.534473]  do_el0_svc+0x30/0x80
> >> [  292.534478]  el0_svc+0x20/0x50
> >> [  292.534481]  el0t_64_sync_handler+0x78/0x108
> >> [  292.534485]  el0t_64_sync+0x1a4/0x1a8
> >> [  292.534632] Kernel Offset: 0x1a5f800000 from 0xffffffc008000000
> >> [  292.534635] PHYS_OFFSET: 0x80000000
> >> [  292.534638] CPU features: 0x40018541,a3300e42
> >> [  292.534644] Memory Limit: none
> >>
> >> Cc: Rob Clark <robdclark@chromium.org>
> >> Cc: Steven Rostedt <rostedt@goodmis.org>
> >> Cc: Ricardo Ribalda <ribalda@chromium.org>
> >> Cc: Ross Zwisler <zwisler@kernel.org>
> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> ---
> >>  drivers/gpu/drm/msm/adreno/adreno_device.c | 1 +
> >>  drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 2 +-
> >>  drivers/gpu/drm/msm/msm_gpu.h              | 3 +++
> >>  3 files changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> index f0cff62812c3..03d912dc0130 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> >> @@ -612,6 +612,7 @@ static void adreno_shutdown(struct platform_device *pdev)
> >>  {
> >>      struct msm_gpu *gpu = dev_to_gpu(&pdev->dev);
> >>
> >> +    gpu->is_shutdown = true;
> >>      WARN_ON_ONCE(adreno_system_suspend(&pdev->dev));
> >>  }
> >>
> >> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> index 382fb7f9e497..6903c6892469 100644
> >> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> >> @@ -251,7 +251,7 @@ int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
> >>      struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >>
> >>      /* No pointer params yet */
> >> -    if (*len != 0)
> >> +    if (*len != 0 || gpu->is_shutdown)
> >>          return -EINVAL;
> > This will race with shutdown.
>
> Could you clarify what you mean? At this point in the code, the shutdown
> is completed and it crashes here.
>


Ok, so I think you meant that if the shutdown happens after we sample
is_shutdown, then we run into the same issue.

I can’t reproduce that, but I’ll look into it. Another way might be to
synchronize using a mutex. Though maybe the shutdown path can wait for
active pm_runtime references?
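
To make the mutex idea concrete, here is a rough userspace sketch of the pattern, not driver code. Everything in it is hypothetical: struct fake_gpu, the fake_gpu_*() helpers and the "powered" field are stand-ins I made up for illustration, and a pthread mutex stands in for whatever lock the driver would actually use. The point is only that the is_shutdown check and the hardware access sit in one critical section, and the shutdown path takes the same lock, so shutdown cannot slip in between the check and the access.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct msm_gpu; not the real driver struct. */
struct fake_gpu {
	pthread_mutex_t lock;	/* serializes shutdown against get_param */
	bool is_shutdown;	/* set once the shutdown path has run */
	bool powered;		/* models whether registers are accessible */
	uint64_t timestamp;	/* models a hardware counter */
};

static void fake_gpu_init(struct fake_gpu *gpu)
{
	pthread_mutex_init(&gpu->lock, NULL);
	gpu->is_shutdown = false;
	gpu->powered = true;
	gpu->timestamp = 42;
}

/* Shutdown path: the flag and the power-down change under the same lock. */
static void fake_gpu_shutdown(struct fake_gpu *gpu)
{
	pthread_mutex_lock(&gpu->lock);
	gpu->is_shutdown = true;
	gpu->powered = false;
	pthread_mutex_unlock(&gpu->lock);
}

/*
 * ioctl-like path: check and access happen in one critical section,
 * so the flag cannot change between the check and the read.
 */
static int fake_gpu_get_timestamp(struct fake_gpu *gpu, uint64_t *value)
{
	int ret = 0;

	pthread_mutex_lock(&gpu->lock);
	if (gpu->is_shutdown) {
		ret = -EINVAL;
	} else {
		/* Touching powered-down hardware is modeled as an assert. */
		assert(gpu->powered);
		*value = gpu->timestamp;
	}
	pthread_mutex_unlock(&gpu->lock);

	return ret;
}
```

With this shape, a caller that arrives before shutdown gets the value and a caller that arrives after it gets -EINVAL. The cost is that shutdown may block behind an in-flight get_param, which is part of why waiting on pm_runtime references could be the nicer option.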

Thanks.




> > Probably, propagating back the return value of pm_runtime_get() in every possible ioctl call path is the right thing to do.
>
> Ok I’ll look into that. But the patch I posted works reliably and fixes
> all crashes we could reproduce.
>
> > I have never thought about this scenario. Do you know why userspace is not freezed before kexec?
>
> I am not sure. It depends on how kexec is used. The userspace freeze
> happens only when kexec is called to switch back and forth between
> different kernels (persistence mode). In such scenario I believe the
> userspace has to be frozen and unfrozen. However for normal kexec, that
> does not happen.
>
> Thanks.
>
>
> >
> > -Akhil.
> >>
> >>      switch (param) {
> >> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> >> index ff911e7305ce..f18b0a91442b 100644
> >> --- a/drivers/gpu/drm/msm/msm_gpu.h
> >> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> >> @@ -214,6 +214,9 @@ struct msm_gpu {
> >>      /* does gpu need hw_init? */
> >>      bool needs_hw_init;
> >>
> >> +    /* is the GPU shutdown? */
> >> +    bool is_shutdown;
> >> +
> >>      /**
> >>       * global_faults: number of GPU hangs not attributed to a particular
> >>       * address space
> >
>
