From: Akhil P Oommen <akhilpo@codeaurora.org>
To: Amit Pundir <amit.pundir@linaro.org>,
Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Caleb Connolly <caleb.connolly@linaro.org>,
Rob Clark <robdclark@gmail.com>,
dri-devel <dri-devel@lists.freedesktop.org>,
freedreno <freedreno@lists.freedesktop.org>,
linux-arm-msm <linux-arm-msm@vger.kernel.org>,
Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
Jordan Crouse <jordan@cosmicpenguin.net>,
Jonathan Marek <jonathan@marek.ca>,
Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>,
open list <linux-kernel@vger.kernel.org>,
Stephen Boyd <sboyd@kernel.org>
Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630
Date: Fri, 10 Sep 2021 01:19:50 +0530
Message-ID: <ea5c23cb-0de4-3f1d-3052-c41fa9317984@codeaurora.org>
In-Reply-To: <CAMi1Hd2gmo-qzDSDpi1hwpX=N1eGM+Q5HqPSvdbq9LdqwNuK+w@mail.gmail.com>
On 9/9/2021 9:42 PM, Amit Pundir wrote:
> On Thu, 9 Sept 2021 at 17:47, Amit Pundir <amit.pundir@linaro.org> wrote:
>>
>> On Wed, 8 Sept 2021 at 07:50, Bjorn Andersson
>> <bjorn.andersson@linaro.org> wrote:
>>>
>>> On Mon 09 Aug 10:26 PDT 2021, Akhil P Oommen wrote:
>>>
>>>> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
>>>>>
>>>>>
>>>>> On 09/08/2021 17:12, Rob Clark wrote:
>>>>>> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen
>>>>>> <akhilpo@codeaurora.org> wrote:
>>> [..]
>>>>>>> I am a bit confused. We don't define a power domain for gpu in dt,
>>>>>>> correct? Then what exactly does set_opp do here? Do you think this usleep is
>>>>>>> what is helping here somehow to mask the issue?
>>>>> The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
>>>>> the GPU DT. For the sake of simplicity I'll refer to the lowest
>>>>> frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
>>>>> the "min" state, and the highest frequency (710000000) and OPP level
>>>>> (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
>>>>> sdm845.dtsi under the gpu node.
>>>>>
>>>>> The new devfreq behaviour unmasks what I think is a driver bug: it
>>>>> inadvertently puts much more strain on the GPU regulators than they
>>>>> usually get. With the new behaviour the GPU jumps from its min state to
>>>>> the max state and back again extremely rapidly under workloads as small
>>>>> as refreshing UI. Where previously the GPU would rarely if ever go above
>>>>> 342MHz when interacting with the device, it now jumps between min and
>>>>> max many times per second.
>>>>>
>>>>> If my understanding is correct, the current implementation of the GMU
>>>>> set freq is the following:
>>>>> - Get OPP for frequency to set
>>>>> - Push the frequency to the GMU - immediately updating the core clock
>>>>> - Call dev_pm_opp_set_opp(), which triggers a notify chain; this winds
>>>>> up somewhere in power management code and causes the gx regulator level
>>>>> to be updated
>>>>
>>>> Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We
>>>> were using a different api earlier which got deprecated -
>>>> dev_pm_opp_set_bw().
>>>>
>>>
>>> On the Lenovo Yoga C630 this is reproduced by starting alacritty and if
>>> I'm lucky I manage to hit a few keys before it crashes, so I spent a
>>> few hours looking into this as well...
>>>
>>> As you say, the dev_pm_opp_set_opp() will only cast an interconnect vote.
>>> The opp-level is just there for show and isn't used by anything, at
>>> least not on 845.
>>>
>>> Furthermore, I'm missing something in my tree, so the interconnect
>>> doesn't hit sync_state, and as such we're not actually scaling the
>>> buses. So the problem is not that Linux doesn't turn on the buses in
>>> time.
>>>
>>> So I suspect that the "AHB bus error" isn't saying that we turned off
>>> the bus, but rather that the GPU becomes unstable or something of that
>>> sort.
>>>
>>>
>>> Lastly, I reverted 9bc95570175a ("drm/msm: Devfreq tuning") and ran
>>> Aquarium for 20 minutes without a problem. I then switched the gpu
>>> devfreq governor to "userspace" and ran the following:
>>>
>>>   while true; do
>>>     echo 257000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>>     echo 710000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>>   done
>>>
>>> It took 19 iterations of this loop to crash the GPU.
>>
>> Ack. With your above script, I can reproduce a crash too on db845c
>> (A630) running v5.14. I didn't get any crash log though and device
>> just rebooted to USB crash mode.
>>
>> And same crash on RB5 (A650) too https://hastebin.com/raw/ejutetuwun
Are we sure this is the same issue? It could be, but I thought we were
seeing a bunch of random gpu errors (which may eventually lead to a device
crash).
-Akhil
>
> fwiw I can't reproduce this crash on RB5 so far with v5.15-rc1 merge
> window (HEAD: 477f70cd2a67)
>
>>
>>>
>>> So the problem doesn't seem to be Rob's change, it's just that prior to
>>> it the chance of hitting it is way lower. Question is still what it is
>>> that we're triggering.
>>>
>>> Regards,
>>> Bjorn
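
For anyone re-running the experiment quoted above, a bounded variant of the
toggle loop may be easier to compare across boards, since it reports how many
iterations completed. This is only a sketch: the function name toggle_freq and
its arguments are illustrative and not part of any existing tool. The sysfs
path and the two frequencies are the ones from the quoted loop, and the devfreq
governor is assumed to already be set to "userspace".

```shell
# Hypothetical helper: toggle a devfreq "set_freq" file between two
# frequencies a fixed number of times, printing a counter after each pass.
#   $1 = path to the userspace governor's set_freq file
#   $2 = number of iterations
#   $3 = low frequency in Hz  (defaults to the 257 MHz "min" state)
#   $4 = high frequency in Hz (defaults to the 710 MHz "max" state)
toggle_freq() {
    set_freq_path="$1"
    iterations="$2"
    min_hz="${3:-257000000}"
    max_hz="${4:-710000000}"

    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Stop early if a write fails (e.g. wrong path or governor).
        echo "$min_hz" > "$set_freq_path" || return 1
        echo "$max_hz" > "$set_freq_path" || return 1
        i=$((i + 1))
        echo "iteration $i"
    done
}
```

It could be invoked as
`toggle_freq /sys/class/devfreq/5000000.gpu/userspace/set_freq 100`; the last
"iteration N" line printed before a crash gives the count.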