From: Akhil P Oommen <quic_akhilpo@quicinc.com>
To: Rob Clark <robdclark@gmail.com>
Cc: Doug Anderson <dianders@chromium.org>,
Sean Paul <sean@poorly.run>, Jonathan Marek <jonathan@marek.ca>,
David Airlie <airlied@linux.ie>,
linux-arm-msm <linux-arm-msm@vger.kernel.org>,
Konrad Dybcio <konrad.dybcio@somainline.org>,
Abhinav Kumar <quic_abhinavk@quicinc.com>,
dri-devel <dri-devel@lists.freedesktop.org>,
Bjorn Andersson <bjorn.andersson@linaro.org>,
Matthias Kaehlcke <mka@chromium.org>,
"Daniel Vetter" <daniel@ffwll.ch>,
Dmitry Baryshkov <dmitry.baryshkov@linaro.org>,
Jordan Crouse <jordan@cosmicpenguin.net>,
freedreno <freedreno@lists.freedesktop.org>,
Chia-I Wu <olvaffe@gmail.com>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Freedreno] [PATCH v2 3/7] drm/msm: Fix cx collapse issue during recovery
Date: Thu, 21 Jul 2022 02:08:32 +0530
Message-ID: <b19c361f-025b-db02-debe-8b4bbe1369dd@quicinc.com>
In-Reply-To: <CAF6AEGsQqE+5iE-=ja96wS6EMN1K1PzCa2fRA6DvQWwyqBq3NA@mail.gmail.com>
On 7/20/2022 11:36 PM, Rob Clark wrote:
> On Tue, Jul 12, 2022 at 12:15 PM Akhil P Oommen
> <quic_akhilpo@quicinc.com> wrote:
>> On 7/12/2022 10:14 PM, Rob Clark wrote:
>>> On Mon, Jul 11, 2022 at 10:05 PM Akhil P Oommen
>>> <quic_akhilpo@quicinc.com> wrote:
>>>> On 7/12/2022 4:52 AM, Doug Anderson wrote:
>>>>> Hi,
>>>>>
>>>>> On Fri, Jul 8, 2022 at 11:00 PM Akhil P Oommen <quic_akhilpo@quicinc.com> wrote:
>>>>>> There is some hardware logic under the CX domain. For a successful
>>>>>> recovery, we should ensure the cx headswitch collapses so that all the
>>>>>> stale states are cleared out. This is especially true for the a6xx family
>>>>>> where we have a GMU co-processor.
>>>>>>
>>>>>> Currently, cx doesn't collapse due to a devlink between gpu and its
>>>>>> smmu. So the *struct gpu device* needs to be runtime suspended to ensure
>>>>>> that the iommu driver removes its vote on cx gdsc.
>>>>>>
>>>>>> Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>>>>>> ---
>>>>>>
>>>>>> (no changes since v1)
>>>>>>
>>>>>> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 16 ++++++++++++++--
>>>>>> drivers/gpu/drm/msm/msm_gpu.c | 2 --
>>>>>> 2 files changed, 14 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> index 4d50110..7ed347c 100644
>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>>>>>> @@ -1278,8 +1278,20 @@ static void a6xx_recover(struct msm_gpu *gpu)
>>>>>> */
>>>>>> gmu_write(&a6xx_gpu->gmu, REG_A6XX_GMU_GMU_PWR_COL_KEEPALIVE, 0);
>>>>>>
>>>>>> - gpu->funcs->pm_suspend(gpu);
>>>>>> - gpu->funcs->pm_resume(gpu);
>>>>>> + /*
>>>>>> + * Now drop all the pm_runtime usage count to allow cx gdsc to collapse.
>>>>>> + * First drop the usage count from all active submits
>>>>>> + */
>>>>>> + for (i = gpu->active_submits; i > 0; i--)
>>>>>> + pm_runtime_put(&gpu->pdev->dev);
>>>>>> +
>>>>>> + /* And the final one from recover worker */
>>>>>> + pm_runtime_put_sync(&gpu->pdev->dev);
>>>>>> +
>>>>>> + for (i = gpu->active_submits; i > 0; i--)
>>>>>> + pm_runtime_get(&gpu->pdev->dev);
>>>>>> +
>>>>>> + pm_runtime_get_sync(&gpu->pdev->dev);
>>>>> In response to v1, Rob suggested pm_runtime_force_suspend/resume().
>>>>> Those seem like they would work to me, too. Why not use them?
>>>> Quoting my previous response which I seem to have sent only to Freedreno
>>>> list:
>>>>
>>>> "I believe it is supposed to be used only during system sleep state
>>>> transitions. Btw, we don't want pm_runtime_get() calls from elsewhere to
>>>> fail by disabling RPM here."
>>> The comment about not wanting other runpm calls to fail is valid.. but
>>> that is also solveable, ie. by holding a lock around runpm calls.
>>> Which I think we need to do anyways, otherwise looping over
>>> gpu->active_submits is racey..
>>>
>>> I think pm_runtime_force_suspend/resume() is the least-bad option.. or
>>> at least I'm not seeing any obvious alternative that is better
>>>
>>> BR,
>>> -R
>> We are holding gpu->lock here which will block further submissions from
>> scheduler. Will active_submits still race?
>>
>> It is possible that there is another thread which successfully completed
>> pm_runtime_get() and, while it accesses the hardware, we pull the plug on
>> the regulator/clock here. That will result in an obvious device crash. So I
>> can think of 2 solutions:
>>
>> 1. wrap *every* pm_runtime_get/put with a mutex. Something like:
>> mutex_lock();
>> pm_runtime_get();
>> < ... access hardware here >>
>> pm_runtime_put();
>> mutex_unlock();
>>
>> 2. Drop runtime votes from every submit in recover worker and wait/poll
>> for regulator to collapse in case there are transient votes on
>> regulator from other threads/subsystems.
>>
>> Option (2) seems simpler to me. What do you think?
>>
> But I think without #1 you could still be racing w/ some other path
> that touches the hw, like devfreq, right. They could be holding a
> runpm ref, so even if you loop over active_submits decrementing the
> runpm ref, it still doesn't drop to zero
>
> BR,
> -R
Yes, you are right. There could be some transient votes from other
threads/drivers/subsystems. This is the reason we need to poll for cx
gdsc collapse in the next patch. Even with #1, it is difficult to
coordinate with the smmu driver and close to impossible with tz/hyp.

-Akhil.
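
For illustration only, the "poll for collapse" idea can be modelled in a
few lines of userspace C. This is a rough sketch, not the code from the
next patch: struct cx_gdsc_model, model_tick(), model_is_collapsed() and
poll_for_collapse() are hypothetical names, and the assumption that one
transient vote goes away per poll interval is made up purely to keep the
example deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: outstanding votes that keep the cx gdsc on. */
struct cx_gdsc_model {
	int votes;	/* transient votes from other threads/drivers */
};

/* Assumption for the sketch: one voter drops its vote per poll interval. */
static void model_tick(struct cx_gdsc_model *m)
{
	if (m->votes > 0)
		m->votes--;
}

/* The gdsc can only collapse once nobody is voting for it. */
static bool model_is_collapsed(const struct cx_gdsc_model *m)
{
	return m->votes == 0;
}

/*
 * Poll until the gdsc reports collapsed or max_polls intervals elapse.
 * Returns the number of intervals waited, or -1 on timeout.
 */
static int poll_for_collapse(struct cx_gdsc_model *m, int max_polls)
{
	assert(max_polls >= 0);

	for (int i = 0; i < max_polls; i++) {
		if (model_is_collapsed(m))
			return i;
		model_tick(m);	/* stands in for a sleep between polls */
	}

	/* One last check after the final interval. */
	return model_is_collapsed(m) ? max_polls : -1;
}
```

The point of the sketch is that the recover path does not try to
coordinate with every possible voter; it drops its own votes, then waits
a bounded time for the remaining ones to go away and reports a timeout
otherwise.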