From: Akhil P Oommen <akhilpo@codeaurora.org>
To: Amit Pundir <amit.pundir@linaro.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Caleb Connolly <caleb.connolly@linaro.org>,
	Rob Clark <robdclark@gmail.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	freedreno <freedreno@lists.freedesktop.org>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Jordan Crouse <jordan@cosmicpenguin.net>,
	Jonathan Marek <jonathan@marek.ca>,
	Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>,
	open list <linux-kernel@vger.kernel.org>,
	Stephen Boyd <sboyd@kernel.org>
Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630
Date: Fri, 10 Sep 2021 01:19:50 +0530
Message-ID: <ea5c23cb-0de4-3f1d-3052-c41fa9317984@codeaurora.org>
In-Reply-To: <CAMi1Hd2gmo-qzDSDpi1hwpX=N1eGM+Q5HqPSvdbq9LdqwNuK+w@mail.gmail.com>

On 9/9/2021 9:42 PM, Amit Pundir wrote:
> On Thu, 9 Sept 2021 at 17:47, Amit Pundir <amit.pundir@linaro.org> wrote:
>>
>> On Wed, 8 Sept 2021 at 07:50, Bjorn Andersson
>> <bjorn.andersson@linaro.org> wrote:
>>>
>>> On Mon 09 Aug 10:26 PDT 2021, Akhil P Oommen wrote:
>>>
>>>> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
>>>>>
>>>>>
>>>>> On 09/08/2021 17:12, Rob Clark wrote:
>>>>>> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen
>>>>>> <akhilpo@codeaurora.org> wrote:
>>> [..]
>>>>>>> I am a bit confused. We don't define a power domain for gpu in dt,
>>>>>>> correct? Then what exactly does set_opp do here? Do you think this usleep is
>>>>>>> what is helping here somehow to mask the issue?
>>>>> The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
>>>>> the GPU DT. For the sake of simplicity I'll refer to the lowest
>>>>> frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
>>>>> the "min" state, and the highest frequency (710000000) and OPP level
>>>>> (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
>>>>> sdm845.dtsi under the gpu node.
>>>>>
>>>>> The new devfreq behaviour unmasks what I think is a driver bug: it
>>>>> inadvertently puts much more strain on the GPU regulators than they
>>>>> usually get. With the new behaviour the GPU jumps from its min state to
>>>>> the max state and back again extremely rapidly under workloads as small
>>>>> as refreshing UI. Where previously the GPU would rarely if ever go above
>>>>> 342MHz when interacting with the device, it now jumps between min and
>>>>> max many times per second.
>>>>>
>>>>> If my understanding is correct, the current implementation of the GMU
>>>>> set freq is the following:
>>>>>    - Get OPP for frequency to set
>>>>>    - Push the frequency to the GMU - immediately updating the core clock
>>>>>    - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds
>>>>> up somewhere in power management code and causes the gx regulator level
>>>>> to be updated
>>>>
>>>> Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We
>>>> were using a different api earlier which got deprecated -
>>>> dev_pm_opp_set_bw().
>>>>
>>>
>>> On the Lenovo Yoga C630 this is reproduced by starting alacritty, and if
>>> I'm lucky I manage to hit a few keys before it crashes, so I spent a
>>> few hours looking into this as well...
>>>
>>> As you say, dev_pm_opp_set_opp() will only cast an interconnect vote.
>>> The opp-level is just there for show and isn't used by anything, at
>>> least not on 845.
>>>
>>> Furthermore, I'm missing something in my tree, so the interconnect
>>> doesn't hit sync_state, and as such we're not actually scaling the
>>> buses. So the problem is not that Linux doesn't turn on the buses in
>>> time.
>>>
>>> So I suspect that the "AHB bus error" isn't saying that we turned off
>>> the bus, but rather that the GPU becomes unstable or something of that
>>> sort.
>>>
>>>
>>> Lastly, I reverted 9bc95570175a ("drm/msm: Devfreq tuning") and ran
>>> Aquarium for 20 minutes without a problem. I then switched the gpu
>>> devfreq governor to "userspace" and ran the following:
>>>
>>> while true; do
>>>    echo 257000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>>    echo 710000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>> done
>>>
>>> It took 19 iterations of this loop to crash the GPU.
>>
>> Ack. With your above script, I can reproduce a crash too on db845c
>> (A630) running v5.14. I didn't get any crash log though and device
>> just rebooted to USB crash mode.
>>
>> And same crash on RB5 (A650) too https://hastebin.com/raw/ejutetuwun

Are we sure this is the same issue? It could be, but I thought we were
seeing a bunch of random gpu errors (which may eventually lead to a device crash).

-Akhil

> 
> fwiw I can't reproduce this crash on RB5 so far with v5.15-rc1 merge
> window (HEAD: 477f70cd2a67)
> 
>>
>>>
>>> So the problem doesn't seem to be Rob's change, it's just that prior to
>>> it the chance of hitting it is way lower. The question is still what it is
>>> that we're triggering.
>>>
>>> Regards,
>>> Bjorn

