From: Joel Fernandes <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org
Cc: Rob Clark <robdclark@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ricardo Ribalda <ribalda@chromium.org>,
	Ross Zwisler <zwisler@kernel.org>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Akhil P Oommen <quic_akhilpo@quicinc.com>,
	Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@gmail.com>,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>,
	dri-devel@lists.freedesktop.org, Emma Anholt <emma@anholt.net>,
	freedreno@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
	Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>,
	Vladimir Lypak <vladimir.lypak@gmail.com>
Subject: Re: [PATCH 1/2] adreno: Shutdown the GPU properly
Date: Fri, 11 Nov 2022 16:08:56 -0500
Message-ID: <B336E259-FB18-4E16-8BC7-2117614ABE4D@joelfernandes.org>
In-Reply-To: <20221111194957.4046771-1-joel@joelfernandes.org>



> On Nov 11, 2022, at 2:50 PM, Joel Fernandes (Google) <joel@joelfernandes.org> wrote:
> 
> During kexec on an ARM device, we notice that device_shutdown() only calls
> pm_runtime_force_suspend() while shutting down the GPU. This means the GPU
> kthread is still running and, further, there may be active submits.
> 
> This causes several issues during a kexec reboot, including:
> 
> A warning from the shutdown path:
> 
> [  292.509662] WARNING: CPU: 0 PID: 6304 at [...] adreno_runtime_suspend+0x3c/0x44
> [  292.509863] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> [  292.509872] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  292.509881] pc : adreno_runtime_suspend+0x3c/0x44
> [  292.509891] lr : pm_generic_runtime_suspend+0x30/0x44
> [  292.509905] sp : ffffffc014473bf0
> [...]
> [  292.510043] Call trace:
> [  292.510051]  adreno_runtime_suspend+0x3c/0x44
> [  292.510061]  pm_generic_runtime_suspend+0x30/0x44
> [  292.510071]  pm_runtime_force_suspend+0x54/0xc8
> [  292.510081]  adreno_shutdown+0x1c/0x28
> [  292.510090]  platform_shutdown+0x2c/0x38
> [  292.510104]  device_shutdown+0x158/0x210
> [  292.510119]  kernel_restart_prepare+0x40/0x4c
> 
> And an SError oops from the GPU kthread:
> 
> [  192.648789]  el1h_64_error+0x7c/0x80
> [  192.648812]  el1_interrupt+0x20/0x58
> [  192.648833]  el1h_64_irq_handler+0x18/0x24
> [  192.648854]  el1h_64_irq+0x7c/0x80
> [  192.648873]  local_daif_inherit+0x10/0x18
> [  192.648900]  el1h_64_sync_handler+0x48/0xb4
> [  192.648921]  el1h_64_sync+0x7c/0x80
> [  192.648941]  a6xx_gmu_set_oob+0xbc/0x1fc
> [  192.648968]  a6xx_hw_init+0x44/0xe38
> [  192.648991]  msm_gpu_hw_init+0x48/0x80
> [  192.649013]  msm_gpu_submit+0x5c/0x1a8
> [  192.649034]  msm_job_run+0xb0/0x11c
> [  192.649058]  drm_sched_main+0x170/0x434
> [  192.649086]  kthread+0x134/0x300
> [  192.649114]  ret_from_fork+0x10/0x20
> 
> Fix by calling adreno_system_suspend() in the device_shutdown() path.
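The ordering problem described above can be sketched as a small userspace model. It assumes, as the commit message implies, that a proper system suspend first stops the submitting kthread and lets in-flight submits retire before powering the GPU down, whereas pm_runtime_force_suspend() alone may power down while the kthread is still submitting. Everything in the sketch (model_gpu, model_scheduler_thread, model_system_suspend, run_shutdown_model) is hypothetical and built on POSIX threads; it is not msm driver code, and it omits the timeout a real driver would likely want while waiting for submits to drain.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical userspace model, not msm driver code: "powered" stands in for
 * the GPU being powered up, "active_submits" for jobs handed to the hardware
 * by the scheduler thread but not yet retired. */
struct model_gpu {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool powered;
	bool park_requested;
	bool parked;
	int active_submits;
	int faults; /* submits that found the hardware powered off */
};

/* Stand-in for the GPU scheduler kthread: submits jobs until asked to park. */
static void *model_scheduler_thread(void *arg)
{
	struct model_gpu *gpu = arg;

	pthread_mutex_lock(&gpu->lock);
	while (!gpu->park_requested) {
		if (!gpu->powered)
			gpu->faults++; /* analogue of the SError in the kthread trace */
		gpu->active_submits++;
		pthread_mutex_unlock(&gpu->lock);
		sched_yield(); /* the job "runs" without holding the lock */
		pthread_mutex_lock(&gpu->lock);
		gpu->active_submits--; /* job retired */
		pthread_cond_broadcast(&gpu->cond);
	}
	gpu->parked = true;
	pthread_cond_broadcast(&gpu->cond);
	pthread_mutex_unlock(&gpu->lock);
	return NULL;
}

/* Assumed ordering: park the submitter, wait for in-flight jobs to retire,
 * and only then remove power. No timeout is modeled here. */
static void model_system_suspend(struct model_gpu *gpu)
{
	pthread_mutex_lock(&gpu->lock);
	gpu->park_requested = true;
	while (!gpu->parked || gpu->active_submits != 0)
		pthread_cond_wait(&gpu->cond, &gpu->lock);
	gpu->powered = false;
	pthread_mutex_unlock(&gpu->lock);
}

/* Runs one modeled shutdown; returns how many submits hit powered-off hardware. */
int run_shutdown_model(void)
{
	struct model_gpu gpu = { .powered = true };
	pthread_t thread;
	int rc;

	pthread_mutex_init(&gpu.lock, NULL);
	pthread_cond_init(&gpu.cond, NULL);
	rc = pthread_create(&thread, NULL, model_scheduler_thread, &gpu);
	assert(rc == 0);

	model_system_suspend(&gpu);
	rc = pthread_join(thread, NULL);
	assert(rc == 0);
	assert(!gpu.powered && gpu.active_submits == 0);

	pthread_cond_destroy(&gpu.cond);
	pthread_mutex_destroy(&gpu.lock);
	return gpu.faults;
}
```

In this model run_shutdown_model() returns 0 on every run: power is cleared only after the thread has acknowledged the park request and active_submits has dropped to zero, so no modeled submit sees powered-off hardware. Clearing powered before setting park_requested would reintroduce the race, although whether it fires on a given run depends on scheduling.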
> 
> Cc: Rob Clark <robdclark@chromium.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ricardo Ribalda <ribalda@chromium.org>
> Cc: Ross Zwisler <zwisler@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> drivers/gpu/drm/msm/adreno/adreno_device.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 24b489b6129a..f0cff62812c3 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -607,9 +607,12 @@ static int adreno_remove(struct platform_device *pdev)
>    return 0;
> }
> 
> +static int adreno_system_suspend(struct device *dev);
> static void adreno_shutdown(struct platform_device *pdev)
> {
> -    pm_runtime_force_suspend(&pdev->dev);
> +    struct msm_gpu *gpu = dev_to_gpu(&pdev->dev);
> +

This local variable definition (gpu is unused in this patch) should go to patch 2/2. Will fix in v2.

Thanks,

 - Joel


> +    WARN_ON_ONCE(adreno_system_suspend(&pdev->dev));
> }
> 
> static const struct of_device_id dt_match[] = {
> -- 
> 2.38.1.493.g58b659f92b-goog
> 
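Since adreno_shutdown() is a platform_driver .shutdown callback, which returns void, a failure from adreno_system_suspend() can be reported but not propagated, which is what the WARN_ON_ONCE() in the hunk does. The kernel's WARN_ON_ONCE() evaluates its condition, warns (with a backtrace) only the first time the condition is true at that call site, and yields the condition's value. The sketch below is a hypothetical userspace stand-in for that behavior; MODEL_WARN_ON_ONCE, model_warn_count, model_system_suspend_stub and model_shutdown are illustrative names, not kernel APIs, and the macro relies on the GNU statement-expression extension, as the kernel's own macro does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings printed, so the once-only behavior can be checked. */
int model_warn_count;

/* Hypothetical userspace stand-in for the kernel's WARN_ON_ONCE(): evaluates
 * cond, reports only the first time cond is true at this call site, and yields
 * cond's truth value. Uses a GNU C statement expression (gcc or clang). */
#define MODEL_WARN_ON_ONCE(cond) ({					\
	static bool model_warned_;					\
	bool model_ret_ = !!(cond);					\
	if (model_ret_ && !model_warned_) {				\
		model_warned_ = true;					\
		model_warn_count++;					\
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}								\
	model_ret_;							\
})

/* Hypothetical suspend stub: returns a negative errno on failure, 0 on success. */
static int model_system_suspend_stub(bool fail)
{
	return fail ? -EBUSY : 0;
}

/* Returns void like a platform_driver .shutdown callback, so the error cannot
 * be passed back to the caller; it can only be reported. */
void model_shutdown(bool fail)
{
	MODEL_WARN_ON_ONCE(model_system_suspend_stub(fail));
}
```

Calling model_shutdown(true) twice and then model_shutdown(false) prints a single warning, because the static flag is shared by every invocation of that one call site.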
