From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F3D0C4338F for ; Mon, 9 Aug 2021 17:27:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 83D7260E76 for ; Mon, 9 Aug 2021 17:27:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234268AbhHIR1s (ORCPT ); Mon, 9 Aug 2021 13:27:48 -0400 Received: from so254-9.mailgun.net ([198.61.254.9]:52851 "EHLO so254-9.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233919AbhHIR1p (ORCPT ); Mon, 9 Aug 2021 13:27:45 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1628530045; h=Content-Transfer-Encoding: Content-Type: In-Reply-To: MIME-Version: Date: Message-ID: From: References: Cc: To: Subject: Sender; bh=v2REGmxzCsapzvkmqe+EkF949O1qYsvxNvNx1D9esUY=; b=DVMeSRBnKOIUl3+DvYUejcOYpYxsh6k2LQka+0Cq4E9WOk1l6g7UwZPnjcshQLG3a921gIrz YYUhtAtE11myMqL6vHOLl9OXtqyHgIKiHauBd2UniS0Op8xpEfXGba7N2cMxoRGTYQwqqCDG bkxzYDuiRgOl/f+7LzGXL/XGQEY= X-Mailgun-Sending-Ip: 198.61.254.9 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n01.prod.us-east-1.postgun.com with SMTP id 6111655e66ff1079042ed386 (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Mon, 09 Aug 2021 17:26:54 GMT Sender: akhilpo=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id 23F14C4360C; Mon, 9 Aug 2021 17:26:54 +0000 (UTC) Received: from [192.168.1.105] (unknown [59.89.230.160]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: akhilpo) by smtp.codeaurora.org (Postfix) with ESMTPSA id 4C23AC433F1; Mon, 9 Aug 2021 17:26:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 4C23AC433F1 Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; spf=fail smtp.mailfrom=akhilpo@codeaurora.org Subject: Re: [PATCH] drm/msm: Disable frequency clamping on a630 To: Caleb Connolly , Rob Clark Cc: dri-devel , freedreno , linux-arm-msm , Rob Clark , Sean Paul , David Airlie , Daniel Vetter , Jordan Crouse , Jonathan Marek , Sai Prakash Ranjan , Bjorn Andersson , Sharat Masetty , open list , Stephen Boyd References: <20210729183942.2839925-1-robdclark@gmail.com> <1a38a590-a64e-58ef-1bbf-0ae49c004d05@linaro.org> <8aa590be-6a9f-9343-e897-18e86ea48202@linaro.org> <6eefedb2-9e59-56d2-7703-2faf6cb0ca3a@codeaurora.org> <83ecbe74-caf0-6c42-e6f5-4887b3b534c6@linaro.org> From: Akhil P Oommen Message-ID: <53d3e5b7-9dc0-a806-70e9-b9b5ff877462@codeaurora.org> Date: Mon, 9 Aug 2021 22:56:40 +0530 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <83ecbe74-caf0-6c42-e6f5-4887b3b534c6@linaro.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/9/2021 9:48 PM, Caleb Connolly wrote: > > > On 09/08/2021 17:12, Rob Clark wrote: >> On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen >> wrote: >>> >>> On 8/8/2021 10:22 PM, Rob Clark wrote: >>>> On Sun, Aug 8, 2021 at 7:33 AM Caleb Connolly >>>> wrote: >>>>> >>>>> >>>>> >>>>> On 07/08/2021 21:04, Rob Clark wrote: >>>>>> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly >>>>>> wrote: >>>>>>> >>>>>>> Hi Rob, Akhil, >>>>>>> >>>>>>> On 29/07/2021 21:53, Rob Clark wrote: >>>>>>>> On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 29/07/2021 21:24, Rob Clark wrote: >>>>>>>>>> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Rob, >>>>>>>>>>> >>>>>>>>>>> I've done some more testing! It looks like before that patch >>>>>>>>>>> ("drm/msm: Devfreq tuning") the GPU would never get above >>>>>>>>>>> the second frequency in the OPP table (342MHz) (at least, not >>>>>>>>>>> in glxgears). With the patch applied it would more >>>>>>>>>>> aggressively jump up to the max frequency which seems to be >>>>>>>>>>> unstable at the default regulator voltages. >>>>>>>>>> >>>>>>>>>> *ohh*, yeah, ok, that would explain it >>>>>>>>>> >>>>>>>>>>> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up >>>>>>>>>>> to 0.988v (instead of the stock 0.516v) makes the GPU stable >>>>>>>>>>> at the higher frequencies. >>>>>>>>>>> >>>>>>>>>>> Applying this patch reverts the behaviour, and the GPU never >>>>>>>>>>> goes above 342MHz in glxgears, losing ~30% performance in >>>>>>>>>>> glxgear. >>>>>>>>>>> >>>>>>>>>>> I think (?) that enabling CPR support would be the proper >>>>>>>>>>> solution to this - that would ensure that the regulators run >>>>>>>>>>> at the voltage the hardware needs to be stable. >>>>>>>>>>> >>>>>>>>>>> Is hacking the voltage higher (although ideally not quite >>>>>>>>>>> that high) an acceptable short term solution until we have >>>>>>>>>>> CPR? Or would it be safer to just not make use of the higher >>>>>>>>>>> frequencies on a630 for now? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is >>>>>>>>>> already >>>>>>>>>> on CC and I added sboyd, maybe one of them knows better. >>>>>>>>>> >>>>>>>>>> In the short term, removing the higher problematic OPPs from >>>>>>>>>> dts might >>>>>>>>>> be a better option than this patch (which I'm dropping), since >>>>>>>>>> there >>>>>>>>>> is nothing stopping other workloads from hitting higher OPPs. >>>>>>>>> Oh yeah that sounds like a more sensible workaround than mine . >>>>>>>>>> >>>>>>>>>> I'm slightly curious why I didn't have problems at higher OPPs >>>>>>>>>> on my >>>>>>>>>> c630 laptop (sdm850) >>>>>>>>> Perhaps you won the sillicon lottery - iirc sdm850 is binned >>>>>>>>> for higher clocks as is out of the factory. >>>>>>>>> >>>>>>>>> Would it be best to drop the OPPs for all devices? Or just >>>>>>>>> those affected? I guess it's possible another c630 might >>>>>>>>> crash where yours doesn't? >>>>>>>> >>>>>>>> I've not heard any reports of similar issues from the handful of >>>>>>>> other >>>>>>>> folks with c630's on #aarch64-laptops.. but I can't really say >>>>>>>> if that >>>>>>>> is luck or not. >>>>>>> It looks like this affects at least the OnePlus 6 and PocoPhone >>>>>>> F1, I've done some more poking and the following diff >>>>>>> seems to fix the stability issues completely, it seems the delay >>>>>>> is required to let the update propagate. >>>>>>> >>>>>>> This doesn't feel like the right fix, but hopefully it's enough >>>>>>> to come up with a better solution than disabling the new >>>>>>> devfreq behaviour on a630. >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c >>>>>>> b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c >>>>>>> index d7cec7f0dde0..69e2a5e84dae 100644 >>>>>>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c >>>>>>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c >>>>>>> @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, >>>>>>> struct dev_pm_opp *opp) >>>>>>>                     return; >>>>>>>             } >>>>>>> >>>>>>> +       dev_pm_opp_set_opp(&gpu->pdev->dev, opp); >>>>>>> + >>>>>>> +       usleep_range(300, 500); >>>>>>> + >>>>>> >>> >>> I am a bit confused. We don't define a power domain for gpu in dt, >>> correct? Then what exactly set_opp do here? Do you think this usleep is >>> what is helping here somehow to mask the issue? > The power domains (for cx and gx) are defined in the GMU DT, the OPPs in > the GPU DT. For the sake of simplicity I'll refer to the lowest > frequency (257000000) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as > the "min" state, and the highest frequency (710000000) and OPP level > (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in > sdm845.dtsi under the gpu node. > > The new devfreq behaviour unmasks what I think is a driver bug, it > inadvertently puts much more strain on the GPU regulators than they > usually get. With the new behaviour the GPU jumps from it's min state to > the max state and back again extremely rapidly under workloads as small > as refreshing UI. Where previously the GPU would rarely if ever go above > 342MHz when interacting with the device, it now jumps between min and > max many times per second. > > If my understanding is correct, the current implementation of the GMU > set freq is the following: >  - Get OPP for frequency to set >  - Push the frequency to the GMU - immediately updating the core clock >  - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds > up somewhere in power management code and causes the gx regulator level > to be updated Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We were using a different api earlier which got deprecated - dev_pm_opp_set_bw(). > > The regulator will then take some time to reach it's new voltage level > and stabilise. I believe that rapid transitions between min and max > state - in combination with the increased current load from the GPU core > - lead to the regulator becoming unstable (e.g. when it's requested to > transition from it's lowest to highest levels immediately after > transitioning down), the unstable voltage causes the GPU to crash. > > Sillicon lottery will of course play a role here - this is very much an > edge case and would definitely be different on a per-device and even > per-unit basis. >> >> Hmm, I thought "opp-level = RPMH_REGULATOR_LEVEL_*" did *something*, >> but tbh I'm not sure exactly what.. >> >>> I feel we should just leave the new dcvs feature (shall we call it NAP?) >>> disabled for a630 (and 10ms devfreq interval), until this is root >>> caused. > I believe this hacky workaround expresses the root cause of the issue > quite clearly, by setting the OPP first and allowing the gx regulator to > become stable before telling the GPU to change clock speeds, we avoid > the edge case and prevent the crashes. > > I took some rough measurements by adding logging to msm_devfreq_idle and > causing UI updates for ~20 seconds and that function is being called > about 30 times per second, this means the GPU is transitioning between > min (idle) state and max (active / boost) state at that frequency and > causing the issue I described above. It's likely that the usleep is > helping to mask this behaviour. > > I hope this serves as a slightly better explanation of what I perceive > to be the issue, I realise my previous explanations were not very > adequate, I apologise for all the noise. >> >> I suppose "NAP" is a reasonable name. >> >> But I think that reverting to previous behavior would not be enough, >> there is nothing stopping devfreq from jumping from min to max freq, >> which AFAIU should be enough to trigger this.  I guess that there just >> hasn't been enough testing with different game workloads on those >> phones to trigger this. > Ack >> >> That said, I haven't seen similar issues on my sdm850 laptop, where I >> defn have triggered mix->max freq transitions.. I guess it would be >> interesting to know if this issue could be reproduced on db845c, or if >> it really is board specific? > My db845c arrives this week, I'll definitely try and reproduce this. >> >> To workaround, I think we'd need to implement some way to limit that >> maximum frequency jump (and then use delayed work to continue ramping >> up the freq over time until we hit the target).. which seems like a >> lot of work if this is just a board(s) specific workaround and isn't >> needed once CPR is supported > Based on my reasoning above, I came up with the following: reducing > thrashing by preventing rapid idle/active transitions. The minimum > active time of 30ms was just used for testing, I think some number > between 2 and 4 frames would be a sensible choice - the higher the safer. > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > index d7cec7f0dde0..87f2d1085c3e 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > @@ -139,6 +139,8 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct > dev_pm_opp *opp) >                 return; >         } > > +       dev_pm_opp_set_opp(&gpu->pdev->dev, opp); > + >         gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0); > >         gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING, > @@ -158,7 +160,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct > dev_pm_opp *opp) >         if (ret) >                 dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", > ret); > > -       dev_pm_opp_set_opp(&gpu->pdev->dev, opp); >         pm_runtime_put(gmu->dev); >  } > > diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h > index 0e4b45bff2e6..0e2293bcb46d 100644 > --- a/drivers/gpu/drm/msm/msm_gpu.h > +++ b/drivers/gpu/drm/msm/msm_gpu.h > @@ -99,8 +99,8 @@ struct msm_gpu_devfreq { >         /** time: Time of last sampling period. */ >         ktime_t time; > > -       /** idle_time: Time of last transition to idle: */ > -       ktime_t idle_time; > +       /** transition_time: Time of last transition between > idle/active: */ > +       ktime_t transition_time; > >         /** >          * idle_freq: > diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c > b/drivers/gpu/drm/msm/msm_gpu_devfreq.c > index 0a1ee20296a2..774a7be33e7a 100644 > --- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c > +++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c > @@ -157,7 +157,7 @@ void msm_devfreq_active(struct msm_gpu *gpu) >          */ >         mutex_lock(&df->devfreq->lock); > > -       idle_time = ktime_to_ms(ktime_sub(ktime_get(), df->idle_time)); > +       idle_time = ktime_to_ms(ktime_sub(ktime_get(), > df->transition_time)); > >         /* >          * If we've been idle for a significant fraction of a polling > @@ -168,7 +168,7 @@ void msm_devfreq_active(struct msm_gpu *gpu) >                 target_freq *= 2; >         } > > -       df->idle_freq = 0; > +       df->transition_time = ktime_get();; > >         msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0); > > @@ -185,6 +185,16 @@ void msm_devfreq_idle(struct msm_gpu *gpu) >  { >         struct msm_gpu_devfreq *df = &gpu->devfreq; >         unsigned long idle_freq, target_freq = 0; > +       unsigned int active_time; > + > +       active_time = ktime_to_ms(ktime_sub(ktime_get(), > df->transition_time)); > +       /* > +        * Don't go back to idle unless we've been active for at least 30ms > +        * to avoid thrashing. This basically defeats the purpose of this feature! At least, we should keep this '30' gpu specific. Does a Kconfig makes sense here?? BTW, if 300us was helping you earlier why do you want it to be 30ms now? > +        */ > +       if (active_time < 30) { > +               return; > +       } > >         /* >          * Hold devfreq lock to synchronize with get_dev_status()/ > @@ -196,7 +206,7 @@ void msm_devfreq_idle(struct msm_gpu *gpu) > >         msm_devfreq_target(&gpu->pdev->dev, &target_freq, 0); > > -       df->idle_time = ktime_get(); > +       df->transition_time = ktime_get(); >         df->idle_freq = idle_freq; > >         mutex_unlock(&df->devfreq->lock); >> >> BR, >> -R >> >>>>>> Hmm, this is going to be in the critical path on idle -> active >>>>>> transition (ie. think response time to user-input).. so we defn don't >>>>>> want to do this unconditionally.. >>>>>> >>>>>> If I understand the problem, we just want to limit how far we jump >>>>>> the >>>>>> gpu freq in one go.. maybe deleting the lowest (and perhaps highest) >>>>>> OPP would accomplish that?  Could that be done in the board(s)'s >>>>>> toplevel dts files? >>>>> That would be a workaround, however I'd really like to avoid >>>>> limiting performance as a solution if I can help it, >>>>> especially as the fix might just be "set the opp first, wait for it >>>>> to apply, then set the core clock". >>>>> >>>>> Is there a sensible way to get a callback from the opp notify >>>>> chain? Or from rpmh directly? Or is this solution really >>>>> not the right way to go? >>>> >>>> It does seem a bit strange to me that we are telling GMU to change >>>> freq before calling dev_pm_opp_set_opp()..  if dev_pm_opp_set_opp() is >>>> increasing voltage, it seems like you'd want to do that *before* >>>> increasing freq (but reverse the order when decreasing freq).. But I'm >>>> not an expert on the ways of the GMU..  maybe Akhil or Jordan knows >>>> better how this is supposed to work. >>> >>> For legacy gmu, we trigger DCVS using DCVS OOB which comes later in this >>> function. But the order between regulator and clock which you mentioned >>> is correct. >>> >>>> >>>> But the delay seems like papering something over, and I'm trying to go >>>> in the other direction and reduce latency between user input and >>>> pageflip.. >>>> >>>> BR, >>>> -R >>>> >>>>>> >>>>>> BR, >>>>>> -R >>>>>> >>>>>>>             gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0); >>>>>>> >>>>>>>             gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING, >>>>>>> @@ -158,7 +162,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, >>>>>>> struct dev_pm_opp *opp) >>>>>>>             if (ret) >>>>>>>                     dev_err(gmu->dev, "GMU set GPU frequency >>>>>>> error: %d\n", ret); >>>>>>> >>>>>>> -       dev_pm_opp_set_opp(&gpu->pdev->dev, opp); >>>>>>>             pm_runtime_put(gmu->dev); >>>>>>>      } >>>>>>>> >>>>>>>> Maybe just remove it for affected devices?  But I'll defer to >>>>>>>> Bjorn. >>>>>>>> >>>>>>>> BR, >>>>>>>> -R >>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Kind Regards, >>>>>>> Caleb (they/them) >>>>> >>>>> -- >>>>> Kind Regards, >>>>> Caleb (they/them) >>> >