Linux-ARM-MSM Archive on lore.kernel.org
* Suspect broken frequency transitions on SDM845
From: Valentin Schneider @ 2020-02-04 12:53 UTC
  To: linux-kernel, Linux PM, linux-arm-msm
  Cc: agross, bjorn.andersson, rjw, viresh.kumar, Ionela Voinescu,
	Quentin Perret

Hi folks,

We have a simple sanity test that asserts that a higher frequency leads to
more work done. It's fairly straightforward: we use the userspace governor,
step through increasing frequencies, run sysbench at each one and assert that
the values we get increase monotonically. We do that for one CPU of each
"type" (i.e. once for a LITTLE and once for a big).

We've been getting some sporadic failures on the big CPUs of a Pixel3
running mainline [1]. Here is an example of a correct run (CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             236 |
|         1286400 |             369 |
|         1689600 |             483 |
|         2092800 |             600 |
|         2476800 |             711 |

and here is a failed one (still CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             234 |
|         1286400 |             369 |
|         1689600 |             449 |
|         2092800 |             600 |
|         2476800 |             355 |
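One way to see how far off the failed run is: normalise each score by its frequency. This is just arithmetic on the two tables above; `events_per_ghz` is a hypothetical helper name.

```python
# Scores from the two CPU4 runs above: frequency (kHz) -> sysbench events.
good = {825600: 236, 1286400: 369, 1689600: 483, 2092800: 600, 2476800: 711}
bad = {825600: 234, 1286400: 369, 1689600: 449, 2092800: 600, 2476800: 355}


def events_per_ghz(run: dict) -> dict:
    """Normalise each score by the frequency (converted to GHz) it was obtained at."""
    return {f: events / (f / 1e6) for f, events in run.items()}


good_n, bad_n = events_per_ghz(good), events_per_ghz(bad)
for f in sorted(good):
    print(f"{f:>8} kHz: good {good_n[f]:6.1f} ev/GHz, "
          f"failed {bad_n[f]:6.1f} ev/GHz, ratio {bad[f] / good[f]:.2f}")
```

On the correct run every point sits at roughly 286 events/GHz, i.e. the work done scales linearly with frequency. The failed run's last point drops to about 143 events/GHz (ratio ~0.50), and the 1689600 kHz point is also about 7% low. That is roughly what you'd expect if the CPU were effectively running at half the requested frequency, although the numbers alone can't distinguish that from other causes.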


We've encountered something like this in the past with the exact same
test on h960 [2] but it is much harder to reproduce reliably this time
around.

I haven't found much time to dig into this; I did get a run of ~100
iterations with ~15 failures, but nothing cpufreq related showed up in
dmesg. I briefly suspected fast-switch, but it's only used by schedutil, so
in this test I would expect the frequency transition to be complete before we
even try to start executing sysbench.

If anyone has the time and will to look into this, that would be much
appreciated.

[1]: https://git.linaro.org/people/amit.pundir/linux.git/log/?h=blueline-mainline-tracking
[2]: https://lore.kernel.org/lkml/d3ede0ab-b635-344c-faba-a9b1531b7f05@arm.com/

Cheers,
Valentin


* Re: Suspect broken frequency transitions on SDM845
From: Valentin Schneider @ 2020-02-12 15:53 UTC
  To: linux-kernel, Linux PM, linux-arm-msm
  Cc: agross, bjorn.andersson, rjw, viresh.kumar, Ionela Voinescu,
	Quentin Perret, Lukasz Luba

On 04/02/2020 12:53, Valentin Schneider wrote:
> We've been getting some sporadic failures on the big CPUs of a Pixel3
> running mainline [1], here is an example of a correct run (CPU4):
> 
> | frequency (kHz) | sysbench events |
> |-----------------+-----------------|
> |          825600 |             236 |
> |         1286400 |             369 |
> |         1689600 |             483 |
> |         2092800 |             600 |
> |         2476800 |             711 |
> 
> and here is a failed one (still CPU4):
> 
> | frequency (kHz) | sysbench events |
> |-----------------+-----------------|
> |          825600 |             234 |
> |         1286400 |             369 |
> |         1689600 |             449 |
> |         2092800 |             600 |
> |         2476800 |             355 |
> 
> 
> We've encountered something like this in the past with the exact same
> test on h960 [2] but it is much harder to reproduce reliably this time
> around.
> 
> I haven't found much time to dig into this; I did get a run of ~100
> iterations with ~15 failures, but nothing cpufreq related showed up in
> dmesg. I briefly suspected fast-switch, but it's only used by schedutil, so
> in this test I would expect the frequency transition to be complete before we
> even try to start executing sysbench.
> 

I've been adding some more debugging to that test case following some of
Lukasz's recommendations, and I still can't find anything that would
explain what I'm seeing.

The raw output of the test is:

        CPU0:
            300000: 61
            576000: 114
            825600: 172
            1056000: 221
            1324800: 278
            1612800: 339
        CPU4:
            825600: 236
            1286400: 368
            1689600: 479
            2092800: 420 <---}
            2476800: 339 <---} Both of these are lower than the previous score...
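To flag such drops automatically, the raw listing can be parsed and each score compared with the previous one. A minimal sketch, assuming the output format stays exactly as shown ("CPUn:" headers followed by "freq: events" lines in increasing frequency order); `parse_raw` and `drops` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

RAW = """\
        CPU0:
            300000: 61
            576000: 114
            825600: 172
            1056000: 221
            1324800: 278
            1612800: 339
        CPU4:
            825600: 236
            1286400: 368
            1689600: 479
            2092800: 420 <---}
            2476800: 339 <---}
"""


def parse_raw(text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Turn the 'CPUn:' / 'freq: events' listing into per-CPU point lists."""
    runs: Dict[str, List[Tuple[int, int]]] = {}
    cpu = None
    for line in text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(CPU\d+):", line)
        if header:
            cpu = header.group(1)
            runs[cpu] = []
            continue
        point = re.match(r"(\d+):\s*(\d+)", line)  # trailing annotations are ignored
        if point and cpu is not None:
            runs[cpu].append((int(point.group(1)), int(point.group(2))))
    return runs


def drops(points: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """(freq, previous score, score) for every point that fails to increase."""
    return [(f1, s0, s1) for (_, s0), (f1, s1) in zip(points, points[1:]) if s1 <= s0]


for cpu, points in parse_raw(RAW).items():
    print(cpu, drops(points) or "monotonic")
```

On this output CPU0 is monotonic, while CPU4 reports the two flagged points (2092800 kHz and 2476800 kHz).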


/sys/kernel/debug/clk/clk_summary doesn't seem to include the CPU clocks, or
doesn't get updated: I see no difference from one frequency to another
(even between the lowest and highest tested frequencies).


/sys/devices/system/cpu/cpu*/cpufreq/stats/time_in_state does get updated,
and seems to hint that I am getting the frequency I'm asking for:

  [2020-02-12 14:48:21,706] 2476800 39544
  [2020-02-12 14:48:23,929] 2476800 39745

There's ~10% (~200ms) missing here, but that shouldn't lead to roughly
half the expected performance (I get a ~710 "score" out of that 2.477GHz
frequency on non-failing runs).
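The ~10% figure can be checked from the two log lines above by comparing the wall-clock time between samples with the growth of the 2476800 kHz counter. This sketch assumes time_in_state counts in USER_HZ ticks of 10 ms, as the cpufreq-stats documentation describes; `residency_gap` is a hypothetical helper.

```python
from datetime import datetime

# Assumption: time_in_state is reported in USER_HZ ticks, i.e. 10 ms units.
USER_HZ = 100
FMT = "%Y-%m-%d %H:%M:%S,%f"  # matches the log timestamps above


def residency_gap(t0: str, ticks0: int, t1: str, ticks1: int):
    """Compare wall-clock time between two samples with time_in_state growth."""
    wall = (datetime.strptime(t1, FMT) - datetime.strptime(t0, FMT)).total_seconds()
    resident = (ticks1 - ticks0) / USER_HZ
    missing = wall - resident
    return wall, resident, missing, missing / wall


wall, resident, missing, frac = residency_gap(
    "2020-02-12 14:48:21,706", 39544, "2020-02-12 14:48:23,929", 39745)
print(f"wall {wall:.3f}s, at 2476800 kHz {resident:.3f}s, "
      f"missing {missing * 1000:.0f}ms ({frac:.1%})")
```

That gives ~2.22s of wall-clock time against 2.01s of residency, so ~213ms (~9.6%) unaccounted for; nowhere near enough to explain a halved score.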


I also made sure to read back
  /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
and I do see the value I've asked for.


Finally, I also probed the thermal state via
  /sys/class/thermal/cooling_device*/cur_state
and they are *always* 0 (i.e., no throttling) right after finishing the
execution of the benchmark, which should be close to the "hottest" point.
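A small helper for that check, run right after the benchmark finishes, could look like the following. It is a sketch only: `throttled_cooling_devices` is a hypothetical name, and the root directory is a parameter so it can be pointed at a fake tree.

```python
import glob
import os


def throttled_cooling_devices(root: str = "/sys/class/thermal") -> dict:
    """Return {device: cur_state} for every cooling device whose cur_state is non-zero."""
    busy = {}
    for path in sorted(glob.glob(os.path.join(root, "cooling_device*", "cur_state"))):
        with open(path) as f:
            state = int(f.read().strip())
        if state != 0:
            busy[os.path.basename(os.path.dirname(path))] = state
    return busy
```

An empty dict means no cooling device reported any throttling at the time of the read, which matches the observation above.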


So AFAICT there is nothing on the cpufreq side that hints at a slow or
unsuccessful frequency transition. Can FW mess with frequencies without
notifying the kernel?

> If anyone has the time and will to look into this, that would be much
> appreciated.
> 
> [1]: https://git.linaro.org/people/amit.pundir/linux.git/log/?h=blueline-mainline-tracking
> [2]: https://lore.kernel.org/lkml/d3ede0ab-b635-344c-faba-a9b1531b7f05@arm.com/
> 
> Cheers,
> Valentin
> 
