From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>,
	Rob Clark <robdclark@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ricardo Ribalda <ribalda@chromium.org>,
	Ross Zwisler <zwisler@kernel.org>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Akhil P Oommen <quic_akhilpo@quicinc.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	David Airlie <airlied@gmail.com>,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>,
	dri-devel@lists.freedesktop.org,
	Emma Anholt <emma@anholt.net>,
	freedreno@lists.freedesktop.org,
	linux-arm-msm@vger.kernel.org,
	Rob Clark <robdclark@gmail.com>,
	Sean Paul <sean@poorly.run>,
	Vladimir Lypak <vladimir.lypak@gmail.com>
Subject: [PATCH 2/2] adreno: Detect shutdown during get_param()
Date: Fri, 11 Nov 2022 19:49:57 +0000
Message-ID: <20221111194957.4046771-2-joel@joelfernandes.org>
In-Reply-To: <20221111194957.4046771-1-joel@joelfernandes.org>

Even though the GPU is shut down, during kexec reboot we can have
userspace still running. This is especially true if KEXEC_JUMP is not
enabled, because we do not freeze userspace in this case.

To prevent crashes, track that the GPU is shutdown and prevent
get_param() from accessing GPU resources if we find it shutdown.

This fixes the following crash during kexec reboot on an ARM64 device
with adreno GPU:

[ 292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
[ 292.534326] Call trace:
[ 292.534328] dump_backtrace+0x0/0x1d4
[ 292.534337] show_stack+0x20/0x2c
[ 292.534342] dump_stack_lvl+0x60/0x78
[ 292.534347] dump_stack+0x18/0x38
[ 292.534352] panic+0x148/0x3b0
[ 292.534357] nmi_panic+0x80/0x94
[ 292.534364] arm64_serror_panic+0x70/0x7c
[ 292.534369] do_serror+0x0/0x7c
[ 292.534372] do_serror+0x54/0x7c
[ 292.534377] el1h_64_error_handler+0x34/0x4c
[ 292.534381] el1h_64_error+0x7c/0x80
[ 292.534386] el1_interrupt+0x20/0x58
[ 292.534389] el1h_64_irq_handler+0x18/0x24
[ 292.534395] el1h_64_irq+0x7c/0x80
[ 292.534399] local_daif_inherit+0x10/0x18
[ 292.534405] el1h_64_sync_handler+0x48/0xb4
[ 292.534410] el1h_64_sync+0x7c/0x80
[ 292.534414] a6xx_gmu_set_oob+0xbc/0x1fc
[ 292.534422] a6xx_get_timestamp+0x40/0xb4
[ 292.534426] adreno_get_param+0x12c/0x1e0
[ 292.534433] msm_ioctl_get_param+0x64/0x70
[ 292.534440] drm_ioctl_kernel+0xe8/0x158
[ 292.534448] drm_ioctl+0x208/0x320
[ 292.534453] __arm64_sys_ioctl+0x98/0xd0
[ 292.534461] invoke_syscall+0x4c/0x118
[ 292.534467] el0_svc_common+0x98/0x104
[ 292.534473] do_el0_svc+0x30/0x80
[ 292.534478] el0_svc+0x20/0x50
[ 292.534481] el0t_64_sync_handler+0x78/0x108
[ 292.534485] el0t_64_sync+0x1a4/0x1a8
[ 292.534632] Kernel Offset: 0x1a5f800000 from 0xffffffc008000000
[ 292.534635] PHYS_OFFSET: 0x80000000
[ 292.534638] CPU features: 0x40018541,a3300e42
[ 292.534644] Memory Limit: none

Cc: Rob Clark <robdclark@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ricardo Ribalda <ribalda@chromium.org>
Cc: Ross Zwisler <zwisler@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h              | 3 +++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index f0cff62812c3..03d912dc0130 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -612,6 +612,7 @@ static void adreno_shutdown(struct platform_device *pdev)
 {
 	struct msm_gpu *gpu = dev_to_gpu(&pdev->dev);
 
+	gpu->is_shutdown = true;
 	WARN_ON_ONCE(adreno_system_suspend(&pdev->dev));
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 382fb7f9e497..6903c6892469 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -251,7 +251,7 @@ int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 
 	/* No pointer params yet */
-	if (*len != 0)
+	if (*len != 0 || gpu->is_shutdown)
 		return -EINVAL;
 
 	switch (param) {
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ff911e7305ce..f18b0a91442b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -214,6 +214,9 @@ struct msm_gpu {
 	/* does gpu need hw_init? */
 	bool needs_hw_init;
 
+	/* is the GPU shutdown? */
+	bool is_shutdown;
+
 	/**
 	 * global_faults: number of GPU hangs not attributed to a particular
 	 * address space
-- 
2.38.1.493.g58b659f92b-goog
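
For readers who want to see the guard in isolation, here is a minimal
userspace sketch of the pattern the patch describes: the shutdown path
sets a flag first, and the parameter query refuses to touch the device
once the flag is set. It is not driver code. struct fake_gpu,
fake_shutdown(), fake_get_param() and the timestamp field are
hypothetical stand-ins for struct msm_gpu, adreno_shutdown() and
adreno_get_param(); only the is_shutdown flag and the -EINVAL return
are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct msm_gpu; only the new flag is modeled. */
struct fake_gpu {
	bool is_shutdown;	/* same name as the field added by the patch */
	uint64_t timestamp;	/* assumption: pretend hardware counter */
};

/* Same ordering as the patched adreno_shutdown(): mark, then power down. */
static void fake_shutdown(struct fake_gpu *gpu)
{
	assert(gpu);
	gpu->is_shutdown = true;
	/* the real driver suspends the hardware at this point */
}

/* Same guard as the patched adreno_get_param(). */
static int fake_get_param(struct fake_gpu *gpu, uint64_t *value, uint32_t *len)
{
	assert(gpu && value && len);

	/* No pointer params, and no hardware access after shutdown */
	if (*len != 0 || gpu->is_shutdown)
		return -EINVAL;

	*value = gpu->timestamp;	/* stands in for the register read */
	return 0;
}
```

With this model, a query made before fake_shutdown() returns 0 and
fills in the value, while the same query made afterwards returns
-EINVAL without reading the device state. The sketch is single-threaded
and says nothing about a query that is already in flight when the flag
is set.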