Re: [PATCH] drm/lima: Use delayed timer as default in devfreq profile

From: Robin Murphy <robin.murphy@arm.com>
To: Qiang Yu <yuq825@gmail.com>, Lukasz Luba <lukasz.luba@arm.com>
Cc: lima@lists.freedesktop.org, David Airlie <airlied@linux.ie>,
	Christian Hewitt <christianshewitt@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] drm/lima: Use delayed timer as default in devfreq profile
Date: Thu, 4 Feb 2021 13:39:00 +0000	[thread overview]
Message-ID: <0afa6299-1c35-ab98-702e-8dcd168bcaac@arm.com> (raw)
In-Reply-To: <CAKGbVbvTzmj=3tAyNyDRU8autb+de8R9dc6ohBTuM5miJV4cWg@mail.gmail.com>

On 2021-02-03 02:01, Qiang Yu wrote:
> On Tue, Feb 2, 2021 at 10:02 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>
>>
>>
>> On 2/2/21 1:01 AM, Qiang Yu wrote:
>>> Hi Lukasz,
>>>
>>> Thanks for the explanation. So the deferred timer option makes a mistake that
>>> when GPU goes from idle to busy for only one poll periodic, in this
>>> case 50ms, right?
>>
>> Not exactly. Driver sets the polling interval to 50ms (in this case)
>> because it needs ~3-frame average load (in 60fps). I have discovered the
>> issue quite recently that on systems with 2 CPUs or more, the devfreq
>> core is not monitoring the devices even for seconds. Therefore, we might
>> end up with quite big amount of work that GPU is doing, but we don't
>> know about it. Devfreq core didn't check <- timer didn't fired. Then
>> suddenly that CPU, which had the deferred timer registered last time,
>> is waking up and timer triggers to check our device. We get the stats,
>> but they might be showing load from 1sec not 50ms. We feed them into
>> governor. Governor sees the new load, but was tested and configured for
>> 50ms, so it might try to rise the frequency to max. The GPU work might
>> be already lower and there is no need for such freq. Then the CPU goes
>> idle again, so no devfreq core check for next e.g. 1sec, but the
>> frequency stays at max OPP and we burn power.
>>
>> So, it's completely unreliable. We might stuck at min frequency and
>> suffer the frame drops, or sometimes stuck to max freq and burn more
>> power when there is no such need.
>>
>> Similar for thermal governor, which is confused by this old stats and
>> long period stats, longer than 50ms.
>>
>> Stats from last e.g. ~1sec tells you nothing about real recent GPU
>> workload.
> Oh, right, I missed this case.
> 
>>
>>> But delayed timer will wakeup CPU every 50ms even when system is idle, will this
>>> cause more power consumption for the case like phone suspend?
>>
>> No, in case of phone suspend it won't increase the power consumption.
>> The device won't be woken up, it will stay in suspend.
> I mean the CPU is waked up frequently by timer when phone suspend,
> not the whole device (like the display).
> 
> Seems it's better to have deferred timer when device is suspended for
> power saving,
> and delayed timer when device in working state. User knows this and
> can use sysfs
> to change it.

Doesn't devfreq_suspend_device() already cancel any timer work either 
way in that case?

Robin.

> Set the delayed timer as default is reasonable, so patch is:
> Reviewed-by: Qiang Yu <yuq825@gmail.com>
> 
> Regards,
> Qiang
> 
>>
>> Regards,
>> Lukasz
>>
>>
>>>
>>> Regards,
>>> Qiang
>>>
>>>
>>> On Mon, Feb 1, 2021 at 5:53 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>>>>
>>>> Hi Qiang,
>>>>
>>>> On 1/30/21 1:51 PM, Qiang Yu wrote:
>>>>> Thanks for the patch. But I can't observe any difference on glmark2
>>>>> with or without this patch.
>>>>> Maybe you can provide other test which can benefit from it.
>>>>
>>>> This is a design problem and has impact on the whole system.
>>>> There is a few issues. When the device is not checked and there are
>>>> long delays between last check and current, the history is broken.
>>>> It confuses the devfreq governor and thermal governor (Intelligent Power
>>>> Allocation (IPA)). Thermal governor works on stale stats data and makes
>>>> stupid decisions, because there is no new stats (device not checked).
>>>> Similar applies to devfreq simple_ondemand governor, where it 'tires' to
>>>> work on a loooong period even 3sec and make prediction for the next
>>>> frequency based on it (which is broken).
>>>>
>>>> How it should be done: constant reliable check is needed, then:
>>>> - period is guaranteed and has fixed size, e.g 50ms or 100ms.
>>>> - device status is quite recent so thermal devfreq cooling provides
>>>>      'fresh' data into thermal governor
>>>>
>>>> This would prevent odd behavior and solve the broken cases.
>>>>
>>>>>
>>>>> Considering it will wake up CPU more frequently, and user may choose
>>>>> to change this by sysfs,
>>>>> I'd like to not apply it.
>>>>
>>>> The deferred timer for GPU is wrong option, for UFS or eMMC makes more
>>>> sense. It's also not recommended for NoC busses. I've discovered that
>>>> some time ago and proposed to have option to switch into delayed timer.
>>>> Trust me, it wasn't obvious to find out that this missing check has
>>>> those impacts. So the other engineers or users might not know that some
>>>> problems they faces (especially when the device load is changing) is due
>>>> to this delayed vs deffered timer and they will change it in the sysfs.
>>>>
>>>> Regards,
>>>> Lukasz
>>>>
>>>>>
>>>>> Regards,
>>>>> Qiang
>>>>>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>