From: Akhil P Oommen <akhilpo@codeaurora.org>
To: Amit Pundir <amit.pundir@linaro.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Caleb Connolly <caleb.connolly@linaro.org>,
	Rob Clark <robdclark@gmail.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	freedreno <freedreno@lists.freedesktop.org>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Jordan Crouse <jordan@cosmicpenguin.net>,
	Jonathan Marek <jonathan@marek.ca>,
	Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>,
	open list <linux-kernel@vger.kernel.org>,
	Stephen Boyd <sboyd@kernel.org>
Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630
Date: Fri, 10 Sep 2021 01:19:50 +0530
Message-ID: <ea5c23cb-0de4-3f1d-3052-c41fa9317984@codeaurora.org>
In-Reply-To: <CAMi1Hd2gmo-qzDSDpi1hwpX=N1eGM+Q5HqPSvdbq9LdqwNuK+w@mail.gmail.com>

On 9/9/2021 9:42 PM, Amit Pundir wrote:
> On Thu, 9 Sept 2021 at 17:47, Amit Pundir <amit.pundir@linaro.org> wrote:
>>
>> On Wed, 8 Sept 2021 at 07:50, Bjorn Andersson
>> <bjorn.andersson@linaro.org> wrote:
>>>
>>> On Mon 09 Aug 10:26 PDT 2021, Akhil P Oommen wrote:
>>>
>>>> On 8/9/2021 9:48 PM, Caleb Connolly wrote:
>>>>>
>>>>>
>>>>> On 09/08/2021 17:12, Rob Clark wrote:
>>>>>> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen
>>>>>> <akhilpo@codeaurora.org> wrote:
>>> [..]
>>>>>>> I am a bit confused. We don't define a power domain for gpu in dt,
>>>>>>> correct? Then what exactly does set_opp do here? Do you think this usleep is
>>>>>>> what is helping here somehow to mask the issue?
>>>>> The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
>>>>> the GPU DT. For the sake of simplicity I'll refer to the lowest
>>>>> frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
>>>>> the "min" state, and the highest frequency (710000000) and OPP level
>>>>> (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
>>>>> sdm845.dtsi under the gpu node.
>>>>>
>>>>> The new devfreq behaviour unmasks what I think is a driver bug: it
>>>>> inadvertently puts much more strain on the GPU regulators than they
>>>>> usually get. With the new behaviour the GPU jumps from its min state to
>>>>> the max state and back again extremely rapidly under workloads as small
>>>>> as refreshing UI. Where previously the GPU would rarely if ever go above
>>>>> 342MHz when interacting with the device, it now jumps between min and
>>>>> max many times per second.
>>>>>
>>>>> If my understanding is correct, the current implementation of the GMU
>>>>> set freq is the following:
>>>>>    - Get OPP for frequency to set
>>>>>    - Push the frequency to the GMU - immediately updating the core clock
>>>>>    - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds
>>>>> up somewhere in power management code and causes the gx regulator level
>>>>> to be updated
>>>>
>>>> Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We
>>>> were using a different api earlier which got deprecated -
>>>> dev_pm_opp_set_bw().
>>>>
>>>
>>> On the Lenovo Yoga C630 this is reproduced by starting alacritty and if
>>> I'm lucky I managed to hit a few keys before it crashes, so I spent a
>>> few hours looking into this as well...
>>>
>>> As you say, the dev_pm_opp_set_opp() will only cast an interconnect vote.
>>> The opp-level is just there for show and isn't used by anything, at
>>> least not on 845.
>>>
>>> Furthermore, I'm missing something in my tree, so the interconnect
>>> doesn't hit sync_state, and as such we're not actually scaling the
>>> buses. So the problem is not that Linux doesn't turn on the buses in
>>> time.
>>>
>>> So I suspect that the "AHB bus error" isn't saying that we turned off
>>> the bus, but rather that the GPU becomes unstable or something of that
>>> sort.
>>>
>>>
>>> Lastly, I reverted 9bc95570175a ("drm/msm: Devfreq tuning") and ran
>>> Aquarium for 20 minutes without a problem. I then switched the gpu
>>> devfreq governor to "userspace" and ran the following:
>>>
>>> while true; do
>>>    echo 257000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>>    echo 710000000 > /sys/class/devfreq/5000000.gpu/userspace/set_freq
>>> done
>>>
>>> It took 19 iterations of this loop to crash the GPU.
>>
>> Ack. With your above script, I can reproduce a crash too on db845c
>> (A630) running v5.14. I didn't get any crash log though, and the device
>> just rebooted into USB crash mode.
>>
>> And same crash on RB5 (A650) too https://hastebin.com/raw/ejutetuwun

Are we sure this is the same issue? It could be, but I thought we were
seeing a bunch of random GPU errors (which may eventually lead to a device crash).

-Akhil

> 
> fwiw I can't reproduce this crash on RB5 so far with v5.15-rc1 merge
> window (HEAD: 477f70cd2a67)
> 
>>
>>>
>>> So the problem doesn't seem to be Rob's change; it's just that prior to
>>> it the chance of hitting it was way lower. Question is still what it is
>>> that we're triggering.
>>>
>>> Regards,
>>> Bjorn
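
The reproducer Bjorn describes above (switch the GPU devfreq governor to "userspace", then bounce between the lowest and highest OPPs) can be wrapped in a bounded loop that counts iterations. This is a minimal sketch and not part of the original thread: the DEVFREQ_DIR override, the toggle_gpu_freq name and the iteration limit are hypothetical additions. The sysfs node and the 257000000/710000000 frequencies are the sdm845 values quoted in the thread, and writing to the devfreq "governor" attribute assumes the userspace governor is built into the kernel.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread. DEVFREQ_DIR defaults to the
# GPU devfreq node used above; override it to target another device.
DEVFREQ_DIR="${DEVFREQ_DIR:-/sys/class/devfreq/5000000.gpu}"
MIN_FREQ=257000000   # lowest sdm845 GPU OPP quoted in the thread
MAX_FREQ=710000000   # highest sdm845 GPU OPP quoted in the thread

# toggle_gpu_freq ITERATIONS
# Selects the "userspace" devfreq governor, then performs ITERATIONS
# min -> max transitions, printing a counter after each round trip so the
# console shows how far the loop got before a failure.
toggle_gpu_freq() {
    iterations="$1"
    echo userspace > "$DEVFREQ_DIR/governor" || return 1
    i=0
    while [ "$i" -lt "$iterations" ]; do
        i=$((i + 1))
        echo "$MIN_FREQ" > "$DEVFREQ_DIR/userspace/set_freq" || return 1
        echo "$MAX_FREQ" > "$DEVFREQ_DIR/userspace/set_freq" || return 1
        echo "iteration $i"
    done
}
```

For example, running `toggle_gpu_freq 100` as root performs 100 min/max round trips; in Bjorn's test on the Yoga C630 the unbounded version of this loop crashed the GPU after 19 iterations. Bounding the loop keeps the script usable on kernels where the crash no longer reproduces.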

