From: Akhil P Oommen <quic_akhilpo@quicinc.com>
To: Rob Clark <robdclark@gmail.com>
Cc: freedreno <freedreno@lists.freedesktop.org>,
	Jonathan Marek <jonathan@marek.ca>,
	Jordan Crouse <jordan@cosmicpenguin.net>,
	David Airlie <airlied@linux.ie>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Konrad Dybcio <konrad.dybcio@somainline.org>,
	Doug Anderson <dianders@chromium.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	Matthias Kaehlcke <mka@chromium.org>,
	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	Sean Paul <sean@poorly.run>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Freedreno] [PATCH v2 3/7] drm/msm: Fix cx collapse issue during recovery
Date: Wed, 13 Jul 2022 00:45:33 +0530
Message-ID: <3c150bc9-68a0-7a35-6511-f80a42e8945b@quicinc.com>
In-Reply-To: <CAF6AEGvjD3LRm40mPr4n+jzx71WmwYpVWizUDLct9cgafjFRyw@mail.gmail.com>

On 7/12/2022 10:14 PM, Rob Clark wrote:
> On Mon, Jul 11, 2022 at 10:05 PM Akhil P Oommen
> <quic_akhilpo@quicinc.com> wrote:
>> On 7/12/2022 4:52 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Fri, Jul 8, 2022 at 11:00 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>>> There is some hardware logic under the CX domain. For a successful
>>>> recovery, we should ensure the cx headswitch collapses so that all
>>>> the stale states are cleared out. This is especially true for the
>>>> a6xx family, where we have a GMU co-processor.
>>>>
>>>> Currently, cx doesn't collapse due to a devlink between the gpu and
>>>> its smmu. So the gpu's *struct device* needs to be runtime suspended
>>>> to ensure that the iommu driver removes its vote on the cx gdsc.
>>>>
>>>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>>>> ---
>>>>
>>>> (no changes since v1)
>>>>
>>>>    drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 16 ++++++++++++++--
>>>>    drivers/gpu/drm/msm/msm_gpu.c         |  2 --
>>>>    2 files changed, 14 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> index 4d50110..7ed347c 100644
>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>> @@ -1278,8 +1278,20 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>            */
>>>>           gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>
>>>> -       gpu->funcs->pm_suspend(gpu);
>>>> -       gpu->funcs->pm_resume(gpu);
>>>> +       /*
>>>> +        * Now drop all the pm_runtime usage count to allow cx gdsc to collapse.
>>>> +        * First drop the usage count from all active submits
>>>> +        */
>>>> +       for (i = gpu->active_submits; i > 0; i--)
>>>> +               pm_runtime_put(&gpu->pdev->dev);
>>>> +
>>>> +       /* And the final one from recover worker */
>>>> +       pm_runtime_put_sync(&gpu->pdev->dev);
>>>> +
>>>> +       for (i = gpu->active_submits; i > 0; i--)
>>>> +               pm_runtime_get(&gpu->pdev->dev);
>>>> +
>>>> +       pm_runtime_get_sync(&gpu->pdev->dev);
>>> In response to v1, Rob suggested pm_runtime_force_suspend/resume().
>>> Those seem like they would work to me, too. Why not use them?
>> Quoting my previous response, which I seem to have sent only to the
>> Freedreno list:
>>
>> "I believe it is supposed to be used only during system sleep state
>> transitions. Btw, we don't want pm_runtime_get() calls from elsewhere to
>> fail by disabling RPM here."
> The comment about not wanting other runpm calls to fail is valid.. but
> that is also solvable, ie. by holding a lock around runpm calls.
> Which I think we need to do anyways, otherwise looping over
> gpu->active_submits is racy..
>
> I think pm_runtime_force_suspend/resume() is the least-bad option.. or
> at least I'm not seeing any obvious alternative that is better
>
> BR,
> -R
We are holding gpu->lock here, which will block further submissions 
from the scheduler. Will active_submits still race?

It is possible that another thread successfully completed 
pm_runtime_get() and, while it accesses the hardware, we pull the plug 
on the regulator/clock here. That will result in an obvious device 
crash. So I can think of two solutions:

1. Wrap *every* pm_runtime_get/put with a mutex, held across the whole 
hardware access. Something like:
             mutex_lock();
             pm_runtime_get();
             /* ... access hardware here ... */
             pm_runtime_put();
             mutex_unlock();
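
The recover worker would then take the same mutex before dropping the 
usage counts, so it cannot pull the plug while another thread is 
mid-access. A sketch of that side (rpm_lock is hypothetical here, a 
new mutex that would have to be added to struct msm_gpu):

             mutex_lock(&gpu->rpm_lock);

             /* Drop the votes from active submits and recover worker */
             for (i = gpu->active_submits; i > 0; i--)
                     pm_runtime_put(&gpu->pdev->dev);
             pm_runtime_put_sync(&gpu->pdev->dev);

             /* cx gdsc collapses here; then restore the votes */
             pm_runtime_get_sync(&gpu->pdev->dev);
             for (i = gpu->active_submits; i > 0; i--)
                     pm_runtime_get(&gpu->pdev->dev);

             mutex_unlock(&gpu->rpm_lock);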

2. Drop the runtime votes from every submit in the recover worker and 
wait/poll for the regulator to collapse, in case there are transient 
votes on the regulator from other threads/subsystems. (Rough sketch 
below.)
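
In the recover worker, that would look something like this (a rough 
sketch only; cx_gdsc_collapsed() is a placeholder for whatever 
mechanism, register poll or genpd notifier, ends up exposing the gdsc 
state):

             /* Drop all runtime votes: the active submits' and ours */
             for (i = gpu->active_submits; i > 0; i--)
                     pm_runtime_put(&gpu->pdev->dev);
             pm_runtime_put_sync(&gpu->pdev->dev);

             /* Transient votes from other threads/subsystems may keep
              * cx up a little longer, so poll with a timeout */
             timeout = jiffies + msecs_to_jiffies(1000);
             while (!cx_gdsc_collapsed(gpu) &&
                    time_before(jiffies, timeout))
                     usleep_range(100, 200);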

Option (2) seems simpler to me.  What do you think?

-Akhil.

