linux-next.vger.kernel.org archive mirror
* Re: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)
       [not found] <5d3057c8.1c69fb81.c6489.8ad2@mx.google.com>
@ 2019-07-18 16:20 ` Mark Brown
  2019-08-12 17:24   ` Mark Brown
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Mark Brown @ 2019-07-18 16:20 UTC (permalink / raw)
  To: khilman, Heiko Stuebner
  Cc: kernel-build-reports, linux-arm-kernel, linux-next, linux-rockchip

On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:

Today's -next started failing to boot defconfig on rk3399-firefly:

> arm64:

>     defconfig:
>         gcc-8:
>             rk3399-firefly: 1 failed lab

It hits a BUG() trying to set up cpufreq:

[   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
[   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
[   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
[   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
[   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
[   87.495335] ------------[ cut here ]------------
[   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
[   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

I'm struggling to see anything relevant in the diff from yesterday: the
unlisted frequency warnings were already in yesterday's logs, but without
the oops, and I'm not seeing any changes in cpufreq, clk or anything else
that looks relevant.
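
When comparing bootlogs from different days, the unlisted-frequency warnings quoted above can be pulled out mechanically. The following is only a minimal sketch: the regular expression is derived from the log lines in this message, and the function name `unlisted_freqs` and the `SAMPLE_LOG` constant are made up for illustration.

```python
import re

# Matches the "Running at unlisted freq" warning as it appears in the log
# excerpt above. The pattern is inferred from those lines only.
UNLISTED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+cpufreq: cpufreq_online: "
    r"CPU(?P<cpu>\d+): Running at unlisted freq: (?P<khz>\d+) KHz"
)


def unlisted_freqs(log_text):
    """Return (timestamp, cpu, kHz) for each unlisted-frequency warning."""
    return [
        (float(m["ts"]), int(m["cpu"]), int(m["khz"]))
        for m in UNLISTED.finditer(log_text)
    ]


# Lines copied from the bootlog excerpt in this message.
SAMPLE_LOG = """\
[   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
[   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
[   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
[   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
"""

if __name__ == "__main__":
    for ts, cpu, khz in unlisted_freqs(SAMPLE_LOG):
        print(f"{ts:.6f}s CPU{cpu}: {khz} kHz")
```

Running the same function over yesterday's and today's logs would show whether the reported initial frequencies changed between the two runs.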

Full bootlog and other info can be found here:

	https://kernelci.org/boot/id/5d302d8359b51498d049e983/

* Re: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)
  2019-07-18 16:20 ` next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718) Mark Brown
@ 2019-08-12 17:24   ` Mark Brown
  2019-08-13 17:26   ` Kevin Hilman
  2019-08-13 17:35   ` CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)) Kevin Hilman
  2 siblings, 0 replies; 16+ messages in thread
From: Mark Brown @ 2019-08-12 17:24 UTC (permalink / raw)
  To: khilman, Heiko Stuebner
  Cc: kernel-build-reports, linux-arm-kernel, linux-next, linux-rockchip

On Thu, Jul 18, 2019 at 05:20:05PM +0100, Mark Brown wrote:
> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:

> Today's -next started failing to boot defconfig on rk3399-firefly:

> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
> [   87.495335] ------------[ cut here ]------------
> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

> I'm struggling to see anything relevant in the diff from yesterday, the
> unlisted frequency warnings were there in the logs yesterday but no oops
> and I'm not seeing any changes in cpufreq, clk or anything relevant
> looking.

> Full bootlog and other info can be found here:
> 
> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/

This is still present in -next today. We no longer see the failure to
change frequency, but it still fails right after cpufreq:

	https://kernelci.org/boot/id/5d51784259b514a021f12245/
	https://kernelci.org/boot/id/5d51781559b514a007f12241/

* Re: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)
  2019-07-18 16:20 ` next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718) Mark Brown
  2019-08-12 17:24   ` Mark Brown
@ 2019-08-13 17:26   ` Kevin Hilman
  2019-08-13 17:35   ` CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)) Kevin Hilman
  2 siblings, 0 replies; 16+ messages in thread
From: Kevin Hilman @ 2019-08-13 17:26 UTC (permalink / raw)
  To: Mark Brown, Heiko Stuebner
  Cc: kernel-build-reports, linux-arm-kernel, linux-next, linux-rockchip

Mark Brown <broonie@kernel.org> writes:

> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>
> Today's -next started failing to boot defconfig on rk3399-firefly:
>
>> arm64:
>
>>     defconfig:
>>         gcc-8:
>>             rk3399-firefly: 1 failed lab
>
> It hits a BUG() trying to set up cpufreq:
>
> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
> [   87.495335] ------------[ cut here ]------------
> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>
> I'm struggling to see anything relevant in the diff from yesterday, the
> unlisted frequency warnings were there in the logs yesterday but no oops
> and I'm not seeing any changes in cpufreq, clk or anything relevant
> looking.
>
> Full bootlog and other info can be found here:
>
> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/

I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
makes the firefly board start working again.

Note that the default defconfig enables the "performance" CPUfreq
governor as the default governor, so during kernel boot, it will always
switch to the max frequency.
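
On a boot that does come up, the active governor and frequencies can be checked through the cpufreq sysfs files. This is a rough sketch, assuming the usual `/sys/devices/system/cpu/cpufreq/policy*` layout; the helper name `read_policies` is hypothetical, and the root path is a parameter so the function can be pointed at a copy of the tree.

```python
from pathlib import Path


def read_policies(cpufreq_root="/sys/devices/system/cpu/cpufreq"):
    """Map each cpufreq policy directory to its governor and frequencies.

    Reads scaling_governor, scaling_cur_freq and scaling_max_freq from every
    policy* directory below cpufreq_root. Frequencies are in kHz.
    """
    result = {}
    for policy in sorted(Path(cpufreq_root).glob("policy*")):
        result[policy.name] = {
            "governor": (policy / "scaling_governor").read_text().strip(),
            "cur_khz": int((policy / "scaling_cur_freq").read_text()),
            "max_khz": int((policy / "scaling_max_freq").read_text()),
        }
    return result
```

With the "performance" governor, `cur_khz` would be expected to sit at `max_khz` for each policy once boot has finished.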

For fun, I set the default governor to "userspace" so the kernel
wouldn't make any OPP changes, and that leads to a slightly more
informative splat[1]

There is still an OPP change happening because the detected OPP is not
one that's listed in the table, so it tries to change to a listed OPP
and fails in the bowels of clk_set_rate()
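
The forced OPP change described above can be modelled roughly as follows. This is a simplified sketch, not the kernel code: it assumes the core requests (current frequency - 1) with "lowest listed frequency at or above the target" rounding, and the table passed in is a hypothetical example (only the 408000 kHz entry is confirmed by the log).

```python
import bisect


def fallback_target(cur_khz, table_khz):
    """Pick the listed frequency requested when cur_khz is not in the table.

    Models a request for (cur_khz - 1) rounded to the lowest table entry at
    or above that value; if no entry is high enough, the highest entry is
    used instead.
    """
    freqs = sorted(table_khz)
    if cur_khz in freqs:
        return cur_khz  # already a listed OPP, no change needed
    i = bisect.bisect_left(freqs, cur_khz - 1)
    return freqs[i] if i < len(freqs) else freqs[-1]


# Hypothetical table for illustration; the real one comes from the DT.
EXAMPLE_TABLE_KHZ = [408000, 600000, 816000]

if __name__ == "__main__":
    print(fallback_target(200000, EXAMPLE_TABLE_KHZ))  # CPU0 case from the log
    print(fallback_target(12000, EXAMPLE_TABLE_KHZ))   # CPU4 case from the log
```

In this model both unlisted values from the log (200000 and 12000 kHz) map to the lowest entry, which matches the "changed to: 408000 KHz" line for CPU0. For CPU4 the equivalent request is the one that fails with -22.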

Kevin

[1] https://termbin.com/3oum

* CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718))
  2019-07-18 16:20 ` next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718) Mark Brown
  2019-08-12 17:24   ` Mark Brown
  2019-08-13 17:26   ` Kevin Hilman
@ 2019-08-13 17:35   ` Kevin Hilman
  2019-08-14  9:01     ` Heiko Stuebner
  2 siblings, 1 reply; 16+ messages in thread
From: Kevin Hilman @ 2019-08-13 17:35 UTC (permalink / raw)
  To: Mark Brown, Heiko Stuebner
  Cc: kernel-build-reports, linux-arm-kernel, linux-next, linux-rockchip

[ resent with correct addr for linux-rockchip list ]

Mark Brown <broonie@kernel.org> writes:

> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>
> Today's -next started failing to boot defconfig on rk3399-firefly:
>
>> arm64:
>
>>     defconfig:
>>         gcc-8:
>>             rk3399-firefly: 1 failed lab
>
> It hits a BUG() trying to set up cpufreq:
>
> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
> [   87.495335] ------------[ cut here ]------------
> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>
> I'm struggling to see anything relevant in the diff from yesterday, the
> unlisted frequency warnings were there in the logs yesterday but no oops
> and I'm not seeing any changes in cpufreq, clk or anything relevant
> looking.
>
> Full bootlog and other info can be found here:
>
> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/

I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
makes the firefly board start working again.

Note that the default defconfig enables the "performance" CPUfreq
governor as the default governor, so during kernel boot, it will always
switch to the max frequency.

For fun, I set the default governor to "userspace" so the kernel
wouldn't make any OPP changes, and that leads to a slightly more
informative splat[1]

There is still an OPP change happening because the detected OPP is not
one that's listed in the table, so it tries to change to a listed OPP
and fails in the bowels of clk_set_rate()

Kevin

[1] https://termbin.com/3oum

* Re: CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718))
  2019-08-13 17:35   ` CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)) Kevin Hilman
@ 2019-08-14  9:01     ` Heiko Stuebner
  2019-08-21 18:59       ` Kevin Hilman
  0 siblings, 1 reply; 16+ messages in thread
From: Heiko Stuebner @ 2019-08-14  9:01 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Mark Brown, kernel-build-reports, linux-arm-kernel, linux-next,
	linux-rockchip

Hi,

On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
> [ resent with correct addr for linux-rockchip list ]
> 
> Mark Brown <broonie@kernel.org> writes:
> 
> > On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
> >
> > Today's -next started failing to boot defconfig on rk3399-firefly:
> >
> >> arm64:
> >
> >>     defconfig:
> >>         gcc-8:
> >>             rk3399-firefly: 1 failed lab
> >
> > It hits a BUG() trying to set up cpufreq:
> >
> > [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
> > [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
> > [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
> > [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
> > [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
> > [   87.495335] ------------[ cut here ]------------
> > [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
> > [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> >
> > I'm struggling to see anything relevant in the diff from yesterday, the
> > unlisted frequency warnings were there in the logs yesterday but no oops
> > and I'm not seeing any changes in cpufreq, clk or anything relevant
> > looking.
> >
> > Full bootlog and other info can be found here:
> >
> > 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
> 
> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
> makes the firefly board start working again.
> 
> Note that the default defconfig enables the "performance" CPUfreq
> governor as the default governor, so during kernel boot, it will always
> switch to the max frequency.
> 
> For fun, I set the default governor to "userspace" so the kernel
> wouldn't make any OPP changes, and that leads to a slightly more
> informative splat[1]
> 
> There is still an OPP change happening because the detected OPP is not
> one that's listed in the table, so it tries to change to a listed OPP
> and fails in the bowels of clk_set_rate()

Though I think that might only be a symptom as well.
Both the PLL setting code and the actual cpu-clock implementation
are unchanged since 2017 (and run just fine on all boards in my farm).

One source for these issues is often the regulator supplying the cpu
going haywire - aka the voltage not matching the opp.

As in this error-case it's CPU4 being set, this would mean it might
be the big cluster supplied by the external syr827 (fan53555 clone)
that might act up. In the Firefly-rk3399 case this is even stranger.

There is a discrepancy in the "fcs,suspend-voltage-selector" setting
between different bootloader versions (how the selection-pin is set up),
so the kernel might actually write its requested voltage to the wrong
register (not the one for actual voltage, but the second set used for
the suspend voltage).
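
The register mix-up described above can be sketched as a small model. It is an illustration under assumptions, not the driver: it assumes two voltage-select registers (called VSEL0 and VSEL1 here, at assumed addresses 0x00 and 0x01), that "fcs,suspend-voltage-selector" names the register reserved for suspend so the kernel programs the other one at runtime, and that the chip regulates from VSEL0 when the selection pin is low and from VSEL1 when it is high. The function names are hypothetical.

```python
VSEL0, VSEL1 = 0x00, 0x01  # assumed register addresses, for illustration


def kernel_registers(suspend_voltage_selector):
    """Return (runtime_reg, suspend_reg) for a given DT selector value."""
    if suspend_voltage_selector == 0:
        return VSEL1, VSEL0
    if suspend_voltage_selector == 1:
        return VSEL0, VSEL1
    raise ValueError("suspend_voltage_selector must be 0 or 1")


def write_takes_effect(suspend_voltage_selector, vsel_pin_level):
    """True if the register the kernel writes is the one the chip uses.

    vsel_pin_level is the level of the selection pin (0 or 1) as left by
    the bootloader during normal operation.
    """
    runtime_reg, _ = kernel_registers(suspend_voltage_selector)
    active_reg = VSEL1 if vsel_pin_level else VSEL0
    return runtime_reg == active_reg


if __name__ == "__main__":
    for pin in (0, 1):
        print(f"selector=0 pin={pin}: effective={write_takes_effect(0, pin)}")
```

Under these assumptions, a selector of 0 combined with a bootloader that leaves the pin low would mean voltage requests land in the unused register, so the CPU supply would never follow the OPP.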

Did you by chance swap bootloaders at some point in recent past?

I'd assume [2] might actually be the same issue last year, though
the CI-logs are not available anymore it seems.


Could you try to set the vdd_cpu_b regulator to disabled, so that
cpufreq for this cluster defers and see what happens?

I don't really have a Firefly in my boardfarm, so I let 5.3-rc run on
a Theobroma Puma, which has the same regulator setup as the Firefly.
Everything, including the performance governor, ran nicely, so it really
looks like some sort of Firefly-specific issue.

Heiko

> [1] https://termbin.com/3oum

[2] https://lkml.org/lkml/2018/6/19/1167





* Re: CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718))
  2019-08-14  9:01     ` Heiko Stuebner
@ 2019-08-21 18:59       ` Kevin Hilman
  2019-08-23  0:32         ` CPUfreq fail on rk3399-firefly Kever Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Hilman @ 2019-08-21 18:59 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: Mark Brown, kernel-build-reports, linux-arm-kernel, linux-next,
	linux-rockchip

Hi Heiko,

Heiko Stuebner <heiko@sntech.de> writes:

> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>> [ resent with correct addr for linux-rockchip list ]
>> 
>> Mark Brown <broonie@kernel.org> writes:
>> 
>> > On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>> >
>> > Today's -next started failing to boot defconfig on rk3399-firefly:
>> >
>> >> arm64:
>> >
>> >>     defconfig:
>> >>         gcc-8:
>> >>             rk3399-firefly: 1 failed lab
>> >
>> > It hits a BUG() trying to set up cpufreq:
>> >
>> > [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>> > [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>> > [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>> > [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>> > [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>> > [   87.495335] ------------[ cut here ]------------
>> > [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>> > [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>> >
>> > I'm struggling to see anything relevant in the diff from yesterday, the
>> > unlisted frequency warnings were there in the logs yesterday but no oops
>> > and I'm not seeing any changes in cpufreq, clk or anything relevant
>> > looking.
>> >
>> > Full bootlog and other info can be found here:
>> >
>> > 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>> 
>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>> makes the firefly board start working again.
>> 
>> Note that the default defconfig enables the "performance" CPUfreq
>> governor as the default governor, so during kernel boot, it will always
>> switch to the max frequency.
>> 
>> For fun, I set the default governor to "userspace" so the kernel
>> wouldn't make any OPP changes, and that leads to a slightly more
>> informative splat[1]
>> 
>> There is still an OPP change happening because the detected OPP is not
>> one that's listed in the table, so it tries to change to a listed OPP
>> and fails in the bowels of clk_set_rate()
>
> Though I think that might only be a symptom as well.
> Both the PLL setting code and the actual cpu-clock implementation
> are unchanged since 2017 (and run just fine on all boards in my farm).
>
> One source for these issues is often the regulator supplying the cpu
> going haywire - aka the voltage not matching the opp.
>
> As in this error-case it's CPU4 being set, this would mean it might
> be the big cluster supplied by the external syr827 (fan53555 clone)
> that might act up. In the Firefly-rk3399 case this is even stranger.
>
> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
> between different bootloader versions (how the selection-pin is set up),
> so the kernel might actually write its requested voltage to the wrong
> register (not the one for actual voltage, but the second set used for
> the suspend voltage).
>
> Did you by chance swap bootloaders at some point in recent past?

No, I haven't touched the bootloader since I initially set up the board.

> I'd assume [2] might actually be the same issue last year, though
> the CI-logs are not available anymore it seems.
>
> Could you try to set the vdd_cpu_b regulator to disabled, so that
> cpufreq for this cluster defers and see what happens?

Yes, this change[1] definitely makes things boot reliably again, so
there's definitely something a bit unstable with this regulator, at
least on this firefly.

Kevin

[1]
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
index c706db0ee9ec..6b70bdcc3328 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
@@ -454,6 +454,7 @@
 
 	vdd_cpu_b: regulator@40 {
 		compatible = "silergy,syr827";
+		status = "disabled";
 		reg = <0x40>;
 		fcs,suspend-voltage-selector = <0>;
 		regulator-name = "vdd_cpu_b";

* Re: CPUfreq fail on rk3399-firefly
  2019-08-21 18:59       ` Kevin Hilman
@ 2019-08-23  0:32         ` Kever Yang
  2019-08-23 16:52           ` Kevin Hilman
  0 siblings, 1 reply; 16+ messages in thread
From: Kever Yang @ 2019-08-23  0:32 UTC (permalink / raw)
  To: Kevin Hilman, Heiko Stuebner
  Cc: linux-rockchip, Mark Brown, linux-next, linux-arm-kernel,
	kernel-build-reports, 闫孝军, 张晴

Hi Kevin, Heiko,

On 2019/8/22 2:59 AM, Kevin Hilman wrote:
> Hi Heiko,
>
> Heiko Stuebner <heiko@sntech.de> writes:
>
>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>> [ resent with correct addr for linux-rockchip list ]
>>>
>>> Mark Brown <broonie@kernel.org> writes:
>>>
>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>
>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>
>>>>> arm64:
>>>>>      defconfig:
>>>>>          gcc-8:
>>>>>              rk3399-firefly: 1 failed lab
>>>> It hits a BUG() trying to set up cpufreq:
>>>>
>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>> [   87.495335] ------------[ cut here ]------------
>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>
>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>> looking.
>>>>
>>>> Full bootlog and other info can be found here:
>>>>
>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>> makes the firefly board start working again.
>>>
>>> Note that the default defconfig enables the "performance" CPUfreq
>>> governor as the default governor, so during kernel boot, it will always
>>> switch to the max frequency.
>>>
>>> For fun, I set the default governor to "userspace" so the kernel
>>> wouldn't make any OPP changes, and that leads to a slightly more
>>> informative splat[1]
>>>
>>> There is still an OPP change happening because the detected OPP is not
>>> one that's listed in the table, so it tries to change to a listed OPP
>>> and fails in the bowels of clk_set_rate()
>> Though I think that might only be a symptom as well.
>> Both the PLL setting code and the actual cpu-clock implementation
>> are unchanged since 2017 (and run just fine on all boards in my farm).
>>
>> One source for these issues is often the regulator supplying the cpu
>> going haywire - aka the voltage not matching the opp.
>>
>> As in this error-case it's CPU4 being set, this would mean it might
>> be the big cluster supplied by the external syr827 (fan53555 clone)
>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>
>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>> between different bootloader versions (how the selection-pin is set up),
>> so the kernel might actually write its requested voltage to the wrong
>> register (not the one for actual voltage, but the second set used for
>> the suspend voltage).
>>
>> Did you by chance swap bootloaders at some point in recent past?
> No, I haven't touched the bootloader since I initially set up the board.

The CPU voltage is not affected by the bootloader, since the kernel should
have its own opp-table; the bootloader may only affect the center/logic
power supply.

>
>> I'd assume [2] might actually be the same issue last year, though
>> the CI-logs are not available anymore it seems.
>>
>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>> cpufreq for this cluster defers and see what happens?
> Yes, this change[1] definitely makes things boot reliably again, so
> there's definitely something a bit unstable with this regulator, at
> least on this firefly.


Is it possible to identify which patch introduced this bug? This board
should have worked correctly for a long time with upstream source code.


Thanks,

- Kever

>
> Kevin
>
> [1]
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
> index c706db0ee9ec..6b70bdcc3328 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
> @@ -454,6 +454,7 @@
>   
>   	vdd_cpu_b: regulator@40 {
>   		compatible = "silergy,syr827";
> +		status = "disabled";
>   		reg = <0x40>;
>   		fcs,suspend-voltage-selector = <0>;
>   		regulator-name = "vdd_cpu_b";
>

* Re: CPUfreq fail on rk3399-firefly
  2019-08-23  0:32         ` CPUfreq fail on rk3399-firefly Kever Yang
@ 2019-08-23 16:52           ` Kevin Hilman
  2019-08-23 17:03             ` Kevin Hilman
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Hilman @ 2019-08-23 16:52 UTC (permalink / raw)
  To: Kever Yang, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Kever Yang <kever.yang@rock-chips.com> writes:

> Hi Kevin, Heiko,
>
> On 2019/8/22 2:59 AM, Kevin Hilman wrote:
>> Hi Heiko,
>>
>> Heiko Stuebner <heiko@sntech.de> writes:
>>
>>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>>> [ resent with correct addr for linux-rockchip list ]
>>>>
>>>> Mark Brown <broonie@kernel.org> writes:
>>>>
>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>
>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>
>>>>>> arm64:
>>>>>>      defconfig:
>>>>>>          gcc-8:
>>>>>>              rk3399-firefly: 1 failed lab
>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>
>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>> [   87.495335] ------------[ cut here ]------------
>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>
>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>> looking.
>>>>>
>>>>> Full bootlog and other info can be found here:
>>>>>
>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>> makes the firefly board start working again.
>>>>
>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>> governor as the default governor, so during kernel boot, it will always
>>>> switch to the max frequency.
>>>>
>>>> For fun, I set the default governor to "userspace" so the kernel
>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>> informative splat[1]
>>>>
>>>> There is still an OPP change happening because the detected OPP is not
>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>> and fails in the bowels of clk_set_rate()
>>> Though I think that might only be a symptom as well.
>>> Both the PLL setting code and the actual cpu-clock implementation
>>> are unchanged since 2017 (and run just fine on all boards in my farm).
>>>
>>> One source for these issues is often the regulator supplying the cpu
>>> going haywire - aka the voltage not matching the opp.
>>>
>>> As in this error-case it's CPU4 being set, this would mean it might
>>> be the big cluster supplied by the external syr827 (fan53555 clone)
>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>
>>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>>> between different bootloader versions (how the selection-pin is set up),
>>> so the kernel might actually write its requested voltage to the wrong
>>> register (not the one for actual voltage, but the second set used for
>>> the suspend voltage).
>>>
>>> Did you by chance swap bootloaders at some point in recent past?
>> No, I haven't touched the bootloader since I initially set up the board.
>
> The CPU voltage is not affected by the bootloader, since the kernel should
> have its own opp-table; the bootloader may only affect the center/logic
> power supply.
>
>>
>>> I'd assume [2] might actually be the same issue last year, though
>>> the CI-logs are not available anymore it seems.
>>>
>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>> cpufreq for this cluster defers and see what happens?
>> Yes, this change[1] definitely makes things boot reliably again, so
>> there's definitely something a bit unstable with this regulator, at
>> least on this firefly.
>
> Is it possible to identify which patch introduced this bug? This board
> should have worked correctly for a long time with upstream source code.

Unfortunately, it seems to be a regular, but intermittent failure, so
bisection is not producing anything reliable.

You can see that both in mainline[1] and in linux-next[2] there are
periodic failures, but it's hard to see any patterns.
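
To give a feel for why bisecting an intermittent failure is unreliable: if a bad kernel only fails some fraction of boots, one clean boot says little. The sketch below estimates how many consecutive clean boots would be needed before calling a revision "good". It assumes boots fail independently with a fixed probability; the failure rates used are hypothetical, not measured on this board, and `boots_needed` is a made-up helper name.

```python
import math


def boots_needed(fail_rate, confidence=0.95):
    """Clean boots required before trusting a 'good' verdict.

    If a bad kernel fails each boot independently with probability
    fail_rate, it still survives n boots in a row with probability
    (1 - fail_rate) ** n. Return the smallest n for which that
    probability drops to (1 - confidence) or below.
    """
    if not 0.0 < fail_rate < 1.0:
        raise ValueError("fail_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    for rate in (0.5, 0.25):  # hypothetical failure rates
        print(f"fail rate {rate}: {boots_needed(rate)} clean boots")
```

Every bisection step would need that many boots, which is part of why a failure like this is hard to pin to a single commit.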

I'm starting to think that maybe the regulator on my particular board is
just starting to fail, since disabling the regulator for that cluster
prevents any voltage changes and makes things reliable again.

If we don't find a solution to this, I'll probably just have to retire
this board from my kernelCI lab (of course, I'd be happy to replace it
if someone wants to donate another one.)  :)

Kevin

[1] https://kernelci.org/boot/rk3399-firefly/job/mainline/
[2] https://kernelci.org/boot/rk3399-firefly/job/next/

* Re: CPUfreq fail on rk3399-firefly
  2019-08-23 16:52           ` Kevin Hilman
@ 2019-08-23 17:03             ` Kevin Hilman
  2019-08-26  9:56               ` Kever Yang
  2019-09-26 22:51               ` Kevin Hilman
  0 siblings, 2 replies; 16+ messages in thread
From: Kevin Hilman @ 2019-08-23 17:03 UTC (permalink / raw)
  To: Kever Yang, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Kevin Hilman <khilman@baylibre.com> writes:

> Kever Yang <kever.yang@rock-chips.com> writes:
>
>> Hi Kevin, Heiko,
>>
>> On 2019/8/22 2:59 AM, Kevin Hilman wrote:
>>> Hi Heiko,
>>>
>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>
>>>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>
>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>
>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>
>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>
>>>>>>> arm64:
>>>>>>>      defconfig:
>>>>>>>          gcc-8:
>>>>>>>              rk3399-firefly: 1 failed lab
>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>
>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>
>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>> looking.
>>>>>>
>>>>>> Full bootlog and other info can be found here:
>>>>>>
>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>> makes the firefly board start working again.
>>>>>
>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>> governor as the default governor, so during kernel boot, it will always
>>>>> switch to the max frequency.
>>>>>
>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>> informative splat[1]
>>>>>
>>>>> There is still an OPP change happening because the detected OPP is not
>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>> and fails in the bowels of clk_set_rate()
>>>> Though I think that might only be a symptom as well.
>>>> Both the PLL setting code and the actual cpu-clock implementation
>>>> are unchanged since 2017 (and run just fine on all boards in my farm).
>>>>
>>>> One source for these issues is often the regulator supplying the cpu
>>>> going haywire - aka the voltage not matching the opp.
>>>>
>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>> be the big cluster supplied by the external syr827 (fan53555 clone)
>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>
>>>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>>>> between different bootloader versions (how the selection-pin is set up),
>>>> so the kernel might actually write its requested voltage to the wrong
>>>> register (not the one for actual voltage, but the second set used for
>>>> the suspend voltage).
>>>>
>>>> Did you by chance swap bootloaders at some point in recent past?
>>> No, haven't touched bootloader since I initially setup the board.
>>
>> The CPU voltage does not affect by bootloader for kernel should have its 
>> own opp-table,
>>
>> the bootloader may only affect the center/logic power supply.
>>
>>>
>>>> I'd assume [2] might actually be the same issue last year, though
>>>> the CI-logs are not available anymore it seems.
>>>>
>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>> cpufreq for this cluster defers and see what happens?
>>> Yes, this change[1] definitely makes things boot reliably again, so
>>> there's definitely something a bit unstable with this regulator, at
>>> least on this firefly.
>>
>> Is it possible to target which patch introduce this bug? This board  
>> should have work correctly for a long time with upstream source code.
>
> Unfortunately, it seems to be a regular, but intermittent failure, so
> bisection is not producing anything reliable.
>
> You can see that both in mainline[1] and in linux-next[2] there are
> periodic failures, but it's hard to see any patterns.

Even worse, I (re)tested mainline for versions that were previously
passing (v5.2, v5.3-rc5) and they are also failing now.

They work again if I disable that regulator as suggested by Heiko.

So this is increasingly pointing to failing hardware.
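
For anyone who wants to repeat the regulator test, here is a minimal
sketch. The actual change[1] is not shown in this thread, so this only
assumes that the big-cluster supply is reachable through a vdd_cpu_b
label in the board dts; the override form is illustrative:

```dts
/* Hypothetical override; assumes a vdd_cpu_b label exists in the board dts. */
&vdd_cpu_b {
	/* With the supply disabled, cpufreq for the big cluster defers. */
	status = "disabled";
};
```

The little cluster is untouched by this, so the board should still boot
with cpufreq active on CPU0-3.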

Kevin


* Re: CPUfreq fail on rk3399-firefly
  2019-08-23 17:03             ` Kevin Hilman
@ 2019-08-26  9:56               ` Kever Yang
  2019-08-26 17:09                 ` Kevin Hilman
  2019-09-26 22:51               ` Kevin Hilman
  1 sibling, 1 reply; 16+ messages in thread
From: Kever Yang @ 2019-08-26  9:56 UTC (permalink / raw)
  To: Kevin Hilman, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Hi Kevin,

I want to run a test on my board. I can get the Image and dtb from the
link for the job, but how can I get the ramdisk, which is named
initrd-SDbyy2.cpio.gz?


Thanks,

- Kever

On 2019/8/24 上午1:03, Kevin Hilman wrote:
> Kevin Hilman <khilman@baylibre.com> writes:
>
>> Kever Yang <kever.yang@rock-chips.com> writes:
>>
>>> Hi Kevin, Heiko,
>>>
>>> On 2019/8/22 上午2:59, Kevin Hilman wrote:
>>>> Hi Heiko,
>>>>
>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>
>>>>> Am Dienstag, 13. August 2019, 19:35:31 CEST schrieb Kevin Hilman:
>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>
>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>
>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>
>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>
>>>>>>>> arm64:
>>>>>>>>       defconfig:
>>>>>>>>           gcc-8:
>>>>>>>>               rk3399-firefly: 1 failed lab
>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>
>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>
>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>>> looking.
>>>>>>>
>>>>>>> Full bootlog and other info can be found here:
>>>>>>>
>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>> makes the firefly board start working again.
>>>>>>
>>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>>> governor as the default governor, so during kernel boot, it will always
>>>>>> switch to the max frequency.
>>>>>>
>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>> informative splat[1]
>>>>>>
>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>> and fails in the bowels of clk_set_rate()
>>>>> Though I think that might only be a symptom as well.
>>>>> Both the PLL setting code as well as the actual cpu-clock implementation
>>>>> is unchanged since 2017 (and runs just fine on all boards in my farm).
>>>>>
>>>>> One source for these issues is often the regulator supplying the cpu
>>>>> going haywire - aka the voltage not matching the opp.
>>>>>
>>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>>> be the big cluster supplied by the external syr825 (fan5355 clone)
>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>>
>>>>> There is a discrepancy between the "fcs,suspend-voltage-selector"
>>>>> between different bootloader versions (how the selection-pin is set up),
>>>>> so the kernel might actually write his requested voltage to the wrong
>>>>> register (not the one for actual voltage, but the second set used for
>>>>> the suspend voltage).
>>>>>
>>>>> Did you by chance swap bootloaders at some point in recent past?
>>>> No, haven't touched bootloader since I initially setup the board.
>>> The CPU voltage does not affect by bootloader for kernel should have its
>>> own opp-table,
>>>
>>> the bootloader may only affect the center/logic power supply.
>>>
>>>>> I'd assume [2] might actually be the same issue last year, though
>>>>> the CI-logs are not available anymore it seems.
>>>>>
>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>>> cpufreq for this cluster defers and see what happens?
>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>> there's definitely something a bit unstable with this regulator, at
>>>> least on this firefly.
>>> Is it possible to target which patch introduce this bug? This board
>>> should have work correctly for a long time with upstream source code.
>> Unfortunately, it seems to be a regular, but intermittent failure, so
>> bisection is not producing anything reliable.
>>
>> You can see that both in mainline[1] and in linux-next[2] there are
>> periodic failures, but it's hard to see any patterns.
> Even worse, I (re)tested mainline for versions that were previously
> passing (v5.2, v5.3-rc5) and they are also failing now.
>
> They work again if I disable that regulator as suggested by Heiko.
>
> So this is increasingly pointing to failing hardware.
>
> Kevin
>
>
>




* Re: CPUfreq fail on rk3399-firefly
  2019-08-26  9:56               ` Kever Yang
@ 2019-08-26 17:09                 ` Kevin Hilman
  2019-08-27  1:54                   ` Kever Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Hilman @ 2019-08-26 17:09 UTC (permalink / raw)
  To: Kever Yang, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Hi Kever,

Kever Yang <kever.yang@rock-chips.com> writes:

> Hi Kevin,
>
>      I want to have a test with my board, I can get the Image and dtb 
> from the link for the job,
>
> but how can I get the ramdisk which is named initrd-SDbyy2.cpio.gz?

The ramdisk images are here:

  https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/

In the kernelCI logs the ramdisk is slightly modified because the kernel
modules have been inserted into the cpio archive.

However, for the purposes of this test, you can just test with the
unmodified rootfs.cpio.gz above.

Kevin


> Thanks,
>
> - Kever
>
> On 2019/8/24 上午1:03, Kevin Hilman wrote:
>> Kevin Hilman <khilman@baylibre.com> writes:
>>
>>> Kever Yang <kever.yang@rock-chips.com> writes:
>>>
>>>> Hi Kevin, Heiko,
>>>>
>>>> On 2019/8/22 上午2:59, Kevin Hilman wrote:
>>>>> Hi Heiko,
>>>>>
>>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>>
>>>>>> Am Dienstag, 13. August 2019, 19:35:31 CEST schrieb Kevin Hilman:
>>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>>
>>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>>
>>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>>
>>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>>
>>>>>>>>> arm64:
>>>>>>>>>       defconfig:
>>>>>>>>>           gcc-8:
>>>>>>>>>               rk3399-firefly: 1 failed lab
>>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>>
>>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>>
>>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>>>> looking.
>>>>>>>>
>>>>>>>> Full bootlog and other info can be found here:
>>>>>>>>
>>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>>> makes the firefly board start working again.
>>>>>>>
>>>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>>>> governor as the default governor, so during kernel boot, it will always
>>>>>>> switch to the max frequency.
>>>>>>>
>>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>>> informative splat[1]
>>>>>>>
>>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>>> and fails in the bowels of clk_set_rate()
>>>>>> Though I think that might only be a symptom as well.
>>>>>> Both the PLL setting code as well as the actual cpu-clock implementation
>>>>>> is unchanged since 2017 (and runs just fine on all boards in my farm).
>>>>>>
>>>>>> One source for these issues is often the regulator supplying the cpu
>>>>>> going haywire - aka the voltage not matching the opp.
>>>>>>
>>>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>>>> be the big cluster supplied by the external syr825 (fan5355 clone)
>>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>>>
>>>>>> There is a discrepancy between the "fcs,suspend-voltage-selector"
>>>>>> between different bootloader versions (how the selection-pin is set up),
>>>>>> so the kernel might actually write his requested voltage to the wrong
>>>>>> register (not the one for actual voltage, but the second set used for
>>>>>> the suspend voltage).
>>>>>>
>>>>>> Did you by chance swap bootloaders at some point in recent past?
>>>>> No, haven't touched bootloader since I initially setup the board.
>>>> The CPU voltage does not affect by bootloader for kernel should have its
>>>> own opp-table,
>>>>
>>>> the bootloader may only affect the center/logic power supply.
>>>>
>>>>>> I'd assume [2] might actually be the same issue last year, though
>>>>>> the CI-logs are not available anymore it seems.
>>>>>>
>>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>>>> cpufreq for this cluster defers and see what happens?
>>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>>> there's definitely something a bit unstable with this regulator, at
>>>>> least on this firefly.
>>>> Is it possible to target which patch introduce this bug? This board
>>>> should have work correctly for a long time with upstream source code.
>>> Unfortunately, it seems to be a regular, but intermittent failure, so
>>> bisection is not producing anything reliable.
>>>
>>> You can see that both in mainline[1] and in linux-next[2] there are
>>> periodic failures, but it's hard to see any patterns.
>> Even worse, I (re)tested mainline for versions that were previously
>> passing (v5.2, v5.3-rc5) and they are also failing now.
>>
>> They work again if I disable that regulator as suggested by Heiko.
>>
>> So this is increasingly pointing to failing hardware.
>>
>> Kevin
>>
>>
>>


* Re: CPUfreq fail on rk3399-firefly
  2019-08-26 17:09                 ` Kevin Hilman
@ 2019-08-27  1:54                   ` Kever Yang
  2019-08-27  2:14                     ` Heiko Stuebner
  0 siblings, 1 reply; 16+ messages in thread
From: Kever Yang @ 2019-08-27  1:54 UTC (permalink / raw)
  To: Kevin Hilman, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel


On 2019/8/27 上午1:09, Kevin Hilman wrote:
> Hi Kever,
>
> Kever Yang <kever.yang@rock-chips.com> writes:
>
>> Hi Kevin,
>>
>>       I want to have a test with my board, I can get the Image and dtb
>> from the link for the job,
>>
>> but how can I get the ramdisk which is named initrd-SDbyy2.cpio.gz?
> The ramdisk images are here:
>
>    https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
>
> in the kernelCI logs the ramdisk is slightly modified because the kernel
> modules have been inserted into the cpio archive.
>
> However, for the purposes of this test, you can just test with the
> unmodified rootfs.cpio.gz above.


I tried this ramdisk, and the boot hangs at fan53555 init without ever
getting into cpufreq.

Any suggestions?

  My boot log:

https://paste.ubuntu.com/p/WYZKPWp7sk/

Thanks,

- Kever

>
> Kevin
>
>
>> Thanks,
>>
>> - Kever
>>
>> On 2019/8/24 上午1:03, Kevin Hilman wrote:
>>> Kevin Hilman <khilman@baylibre.com> writes:
>>>
>>>> Kever Yang <kever.yang@rock-chips.com> writes:
>>>>
>>>>> Hi Kevin, Heiko,
>>>>>
>>>>> On 2019/8/22 上午2:59, Kevin Hilman wrote:
>>>>>> Hi Heiko,
>>>>>>
>>>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>>>
>>>>>>> Am Dienstag, 13. August 2019, 19:35:31 CEST schrieb Kevin Hilman:
>>>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>>>
>>>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>>>
>>>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>>>
>>>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>>>
>>>>>>>>>> arm64:
>>>>>>>>>>        defconfig:
>>>>>>>>>>            gcc-8:
>>>>>>>>>>                rk3399-firefly: 1 failed lab
>>>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>>>
>>>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>>>
>>>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>>>>> looking.
>>>>>>>>>
>>>>>>>>> Full bootlog and other info can be found here:
>>>>>>>>>
>>>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>>>> makes the firefly board start working again.
>>>>>>>>
>>>>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>>>>> governor as the default governor, so during kernel boot, it will always
>>>>>>>> switch to the max frequency.
>>>>>>>>
>>>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>>>> informative splat[1]
>>>>>>>>
>>>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>>>> and fails in the bowels of clk_set_rate()
>>>>>>> Though I think that might only be a symptom as well.
>>>>>>> Both the PLL setting code as well as the actual cpu-clock implementation
>>>>>>> is unchanged since 2017 (and runs just fine on all boards in my farm).
>>>>>>>
>>>>>>> One source for these issues is often the regulator supplying the cpu
>>>>>>> going haywire - aka the voltage not matching the opp.
>>>>>>>
>>>>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>>>>> be the big cluster supplied by the external syr825 (fan5355 clone)
>>>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>>>>
>>>>>>> There is a discrepancy between the "fcs,suspend-voltage-selector"
>>>>>>> between different bootloader versions (how the selection-pin is set up),
>>>>>>> so the kernel might actually write his requested voltage to the wrong
>>>>>>> register (not the one for actual voltage, but the second set used for
>>>>>>> the suspend voltage).
>>>>>>>
>>>>>>> Did you by chance swap bootloaders at some point in recent past?
>>>>>> No, haven't touched bootloader since I initially setup the board.
>>>>> The CPU voltage does not affect by bootloader for kernel should have its
>>>>> own opp-table,
>>>>>
>>>>> the bootloader may only affect the center/logic power supply.
>>>>>
>>>>>>> I'd assume [2] might actually be the same issue last year, though
>>>>>>> the CI-logs are not available anymore it seems.
>>>>>>>
>>>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>>>>> cpufreq for this cluster defers and see what happens?
>>>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>>>> there's definitely something a bit unstable with this regulator, at
>>>>>> least on this firefly.
>>>>> Is it possible to target which patch introduce this bug? This board
>>>>> should have work correctly for a long time with upstream source code.
>>>> Unfortunately, it seems to be a regular, but intermittent failure, so
>>>> bisection is not producing anything reliable.
>>>>
>>>> You can see that both in mainline[1] and in linux-next[2] there are
>>>> periodic failures, but it's hard to see any patterns.
>>> Even worse, I (re)tested mainline for versions that were previously
>>> passing (v5.2, v5.3-rc5) and they are also failing now.
>>>
>>> They work again if I disable that regulator as suggested by Heiko.
>>>
>>> So this is increasingly pointing to failing hardware.
>>>
>>> Kevin
>>>
>>>
>>>
>
>




* Re: CPUfreq fail on rk3399-firefly
  2019-08-27  1:54                   ` Kever Yang
@ 2019-08-27  2:14                     ` Heiko Stuebner
  2019-08-27  9:59                       ` Kever Yang
  0 siblings, 1 reply; 16+ messages in thread
From: Heiko Stuebner @ 2019-08-27  2:14 UTC (permalink / raw)
  To: Kever Yang
  Cc: Kevin Hilman, kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Hi Kever,

Am Dienstag, 27. August 2019, 03:54:26 CEST schrieb Kever Yang:
> On 2019/8/27 上午1:09, Kevin Hilman wrote:
> > Kever Yang <kever.yang@rock-chips.com> writes:
> >>       I want to have a test with my board, I can get the Image and dtb
> >> from the link for the job,
> >>
> >> but how can I get the ramdisk which is named initrd-SDbyy2.cpio.gz?
> > The ramdisk images are here:
> >
> >    https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
> >
> > in the kernelCI logs the ramdisk is slightly modified because the kernel
> > modules have been inserted into the cpio archive.
> >
> > However, for the purposes of this test, you can just test with the
> > unmodified rootfs.cpio.gz above.
> 
> 
> I try with this ramdisk, and it hangs at fan53555 init, but not get into 
> cpufreq.
> 
> Any suggestion?

My guess would be the fcs,suspend-voltage-selector maybe?

I.e. old uboots somehow set the voltage gpio strangely, so you'd need
	fcs,suspend-voltage-selector = <0>
while newer uboots I think do configure the gpio, needing a value of <1>;

So try to swap that number in the dts perhaps for a start?
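
As a rough sketch of that swap (only the property name and the two
values come from the discussion above; the vdd_cpu_b label and the
override form are assumptions about how the board dts is laid out):

```dts
/* Hypothetical override in the board dts. */
&vdd_cpu_b {
	/* Try <1> if the dts currently has <0>, or the other way round. */
	fcs,suspend-voltage-selector = <1>;
};
```

If the board then gets past the regulator init, that would suggest the
selection pin is not in the state the dts expects.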


Heiko


>   My boot log:
> 
> https://paste.ubuntu.com/p/WYZKPWp7sk/
> 
> Thanks,
> 
> - Kever
> 
> >
> > Kevin
> >
> >
> >> Thanks,
> >>
> >> - Kever
> >>
> >> On 2019/8/24 上午1:03, Kevin Hilman wrote:
> >>> Kevin Hilman <khilman@baylibre.com> writes:
> >>>
> >>>> Kever Yang <kever.yang@rock-chips.com> writes:
> >>>>
> >>>>> Hi Kevin, Heiko,
> >>>>>
> >>>>> On 2019/8/22 上午2:59, Kevin Hilman wrote:
> >>>>>> Hi Heiko,
> >>>>>>
> >>>>>> Heiko Stuebner <heiko@sntech.de> writes:
> >>>>>>
> >>>>>>> Am Dienstag, 13. August 2019, 19:35:31 CEST schrieb Kevin Hilman:
> >>>>>>>> [ resent with correct addr for linux-rockchip list ]
> >>>>>>>>
> >>>>>>>> Mark Brown <broonie@kernel.org> writes:
> >>>>>>>>
> >>>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
> >>>>>>>>>
> >>>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
> >>>>>>>>>
> >>>>>>>>>> arm64:
> >>>>>>>>>>        defconfig:
> >>>>>>>>>>            gcc-8:
> >>>>>>>>>>                rk3399-firefly: 1 failed lab
> >>>>>>>>> It hits a BUG() trying to set up cpufreq:
> >>>>>>>>>
> >>>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
> >>>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
> >>>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
> >>>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
> >>>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
> >>>>>>>>> [   87.495335] ------------[ cut here ]------------
> >>>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
> >>>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> >>>>>>>>>
> >>>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
> >>>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
> >>>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
> >>>>>>>>> looking.
> >>>>>>>>>
> >>>>>>>>> Full bootlog and other info can be found here:
> >>>>>>>>>
> >>>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
> >>>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
> >>>>>>>> makes the firefly board start working again.
> >>>>>>>>
> >>>>>>>> Note that the default defconfig enables the "performance" CPUfreq
> >>>>>>>> governor as the default governor, so during kernel boot, it will always
> >>>>>>>> switch to the max frequency.
> >>>>>>>>
> >>>>>>>> For fun, I set the default governor to "userspace" so the kernel
> >>>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
> >>>>>>>> informative splat[1]
> >>>>>>>>
> >>>>>>>> There is still an OPP change happening because the detected OPP is not
> >>>>>>>> one that's listed in the table, so it tries to change to a listed OPP
> >>>>>>>> and fails in the bowels of clk_set_rate()
> >>>>>>> Though I think that might only be a symptom as well.
> >>>>>>> Both the PLL setting code as well as the actual cpu-clock implementation
> >>>>>>> is unchanged since 2017 (and runs just fine on all boards in my farm).
> >>>>>>>
> >>>>>>> One source for these issues is often the regulator supplying the cpu
> >>>>>>> going haywire - aka the voltage not matching the opp.
> >>>>>>>
> >>>>>>> As in this error-case it's CPU4 being set, this would mean it might
> >>>>>>> be the big cluster supplied by the external syr825 (fan5355 clone)
> >>>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
> >>>>>>>
> >>>>>>> There is a discrepancy between the "fcs,suspend-voltage-selector"
> >>>>>>> between different bootloader versions (how the selection-pin is set up),
> >>>>>>> so the kernel might actually write his requested voltage to the wrong
> >>>>>>> register (not the one for actual voltage, but the second set used for
> >>>>>>> the suspend voltage).
> >>>>>>>
> >>>>>>> Did you by chance swap bootloaders at some point in recent past?
> >>>>>> No, haven't touched bootloader since I initially setup the board.
> >>>>> The CPU voltage does not affect by bootloader for kernel should have its
> >>>>> own opp-table,
> >>>>>
> >>>>> the bootloader may only affect the center/logic power supply.
> >>>>>
> >>>>>>> I'd assume [2] might actually be the same issue last year, though
> >>>>>>> the CI-logs are not available anymore it seems.
> >>>>>>>
> >>>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
> >>>>>>> cpufreq for this cluster defers and see what happens?
> >>>>>> Yes, this change[1] definitely makes things boot reliably again, so
> >>>>>> there's definitely something a bit unstable with this regulator, at
> >>>>>> least on this firefly.
> >>>>> Is it possible to target which patch introduce this bug? This board
> >>>>> should have work correctly for a long time with upstream source code.
> >>>> Unfortunately, it seems to be a regular, but intermittent failure, so
> >>>> bisection is not producing anything reliable.
> >>>>
> >>>> You can see that both in mainline[1] and in linux-next[2] there are
> >>>> periodic failures, but it's hard to see any patterns.
> >>> Even worse, I (re)tested mainline for versions that were previously
> >>> passing (v5.2, v5.3-rc5) and they are also failing now.
> >>>
> >>> They work again if I disable that regulator as suggested by Heiko.
> >>>
> >>> So this is increasingly pointing to failing hardware.
> >>>
> >>> Kevin
> >>>
> >>>
> >>>
> >
> >
> 
> 
> 






* Re: CPUfreq fail on rk3399-firefly
  2019-08-27  2:14                     ` Heiko Stuebner
@ 2019-08-27  9:59                       ` Kever Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Kever Yang @ 2019-08-27  9:59 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: Kevin Hilman, kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Hi Heiko,

On 2019/8/27 上午10:14, Heiko Stuebner wrote:
> Hi Kever,
>
> Am Dienstag, 27. August 2019, 03:54:26 CEST schrieb Kever Yang:
>> On 2019/8/27 上午1:09, Kevin Hilman wrote:
>>> Kever Yang <kever.yang@rock-chips.com> writes:
>>>>        I want to have a test with my board, I can get the Image and dtb
>>>> from the link for the job,
>>>>
>>>> but how can I get the ramdisk which is named initrd-SDbyy2.cpio.gz?
>>> The ramdisk images are here:
>>>
>>>     https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
>>>
>>> in the kernelCI logs the ramdisk is slightly modified because the kernel
>>> modules have been inserted into the cpio archive.
>>>
>>> However, for the purposes of this test, you can just test with the
>>> unmodified rootfs.cpio.gz above.
>>
>> I try with this ramdisk, and it hangs at fan53555 init, but not get into
>> cpufreq.
>>
>> Any suggestion?
> My guess would be the fcs,suspend-voltage-selector maybe?
>
> I.e. old uboots somehow set the voltage gpio strangely, so you'd need
> 	fcs,suspend-voltage-selector = <0>


Both the U-Boot and kernel dts still use '<0>' for this property, and
this is the correct setting for cpu_b.

> while newer uboots I think do configure the gpio, needing a value of <1>;


There is no 'vsel-gpio' in either the upstream U-Boot or kernel dts,
while the rockchip kernel 4.4 dts has
"vsel-gpios = <&gpio1 18 GPIO_ACTIVE_HIGH>;", so I think there is no
gpio setting in the upstream code?

Also, kernelci's test case does not update the bootloader; it only
updates the kernel.


Thanks,

- Kever

>
> So try to swap that number in the dts perhaps for a start?
>
>
> Heiko
>
>
>>    My boot log:
>>
>> https://paste.ubuntu.com/p/WYZKPWp7sk/
>>
>> Thanks,
>>
>> - Kever
>>
>>> Kevin
>>>
>>>
>>>> Thanks,
>>>>
>>>> - Kever
>>>>
>>>> On 2019/8/24 上午1:03, Kevin Hilman wrote:
>>>>> Kevin Hilman <khilman@baylibre.com> writes:
>>>>>
>>>>>> Kever Yang <kever.yang@rock-chips.com> writes:
>>>>>>
>>>>>>> Hi Kevin, Heiko,
>>>>>>>
>>>>>>> On 2019/8/22 上午2:59, Kevin Hilman wrote:
>>>>>>>> Hi Heiko,
>>>>>>>>
>>>>>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>>>>>
>>>>>>>>> Am Dienstag, 13. August 2019, 19:35:31 CEST schrieb Kevin Hilman:
>>>>>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>>>>>
>>>>>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>>>>>
>>>>>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>>>>>
>>>>>>>>>>>> arm64:
>>>>>>>>>>>>         defconfig:
>>>>>>>>>>>>             gcc-8:
>>>>>>>>>>>>                 rk3399-firefly: 1 failed lab
>>>>>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>>>>>
>>>>>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>>>>>
>>>>>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>>>>>>> looking.
>>>>>>>>>>>
>>>>>>>>>>> Full bootlog and other info can be found here:
>>>>>>>>>>>
>>>>>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>>>>>> makes the firefly board start working again.
>>>>>>>>>>
>>>>>>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>>>>>>> governor as the default governor, so during kernel boot, it will always
>>>>>>>>>> switch to the max frequency.
>>>>>>>>>>
>>>>>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>>>>>> informative splat[1]
>>>>>>>>>>
>>>>>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>>>>>> and fails in the bowels of clk_set_rate()
>>>>>>>>> Though I think that might only be a symptom as well.
>>>>>>>>> Both the PLL setting code and the actual cpu-clock implementation
>>>>>>>>> are unchanged since 2017 (and run just fine on all boards in my farm).
>>>>>>>>>
>>>>>>>>> One source for these issues is often the regulator supplying the cpu
>>>>>>>>> going haywire - aka the voltage not matching the opp.
>>>>>>>>>
>>>>>>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>>>>>>> be the big cluster supplied by the external syr825 (fan53555 clone)
>>>>>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>>>>>>
>>>>>>>>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>>>>>>>>> between different bootloader versions (how the selection-pin is set up),
>>>>>>>>> so the kernel might actually write its requested voltage to the wrong
>>>>>>>>> register (not the one for actual voltage, but the second set used for
>>>>>>>>> the suspend voltage).
>>>>>>>>>
>>>>>>>>> Did you by chance swap bootloaders at some point in the recent past?
>>>>>>>> No, haven't touched the bootloader since I initially set up the board.
>>>>>>> The CPU voltage is not affected by the bootloader, since the kernel
>>>>>>> should have its own opp-table;
>>>>>>>
>>>>>>> the bootloader may only affect the center/logic power supply.
>>>>>>>
>>>>>>>>> I'd assume [2] might actually be the same issue last year, though
>>>>>>>>> the CI-logs are not available anymore it seems.
>>>>>>>>>
>>>>>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>>>>>>> cpufreq for this cluster defers and see what happens?
>>>>>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>>>>>> there's definitely something a bit unstable with this regulator, at
>>>>>>>> least on this firefly.
>>>>>>> Is it possible to identify which patch introduced this bug? This board
>>>>>>> should have worked correctly for a long time with the upstream source code.
>>>>>> Unfortunately, it seems to be a regular, but intermittent failure, so
>>>>>> bisection is not producing anything reliable.
>>>>>>
>>>>>> You can see that both in mainline[1] and in linux-next[2] there are
>>>>>> periodic failures, but it's hard to see any patterns.
>>>>> Even worse, I (re)tested mainline for versions that were previously
>>>>> passing (v5.2, v5.3-rc5) and they are also failing now.
>>>>>
>>>>> They work again if I disable that regulator as suggested by Heiko.
>>>>>
>>>>> So this is increasingly pointing to failing hardware.
>>>>>
>>>>> Kevin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: CPUfreq fail on rk3399-firefly
  2019-08-23 17:03             ` Kevin Hilman
  2019-08-26  9:56               ` Kever Yang
@ 2019-09-26 22:51               ` Kevin Hilman
  2019-10-10  9:32                 ` Kever Yang
  1 sibling, 1 reply; 16+ messages in thread
From: Kevin Hilman @ 2019-09-26 22:51 UTC (permalink / raw)
  To: Kever Yang, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Kevin Hilman <khilman@baylibre.com> writes:

> Kevin Hilman <khilman@baylibre.com> writes:
>
>> Kever Yang <kever.yang@rock-chips.com> writes:
>>
>>> Hi Kevin, Heiko,
>>>
>>> On 2019/8/22 2:59 AM, Kevin Hilman wrote:
>>>> Hi Heiko,
>>>>
>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>
>>>>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>
>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>
>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>
>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>
>>>>>>>> arm64:
>>>>>>>>      defconfig:
>>>>>>>>          gcc-8:
>>>>>>>>              rk3399-firefly: 1 failed lab
>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>
>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>
>>>>>>> I'm struggling to see anything relevant in the diff from yesterday, the
>>>>>>> unlisted frequency warnings were there in the logs yesterday but no oops
>>>>>>> and I'm not seeing any changes in cpufreq, clk or anything relevant
>>>>>>> looking.
>>>>>>>
>>>>>>> Full bootlog and other info can be found here:
>>>>>>>
>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>> makes the firefly board start working again.
>>>>>>
>>>>>> Note that the default defconfig enables the "performance" CPUfreq
>>>>>> governor as the default governor, so during kernel boot, it will always
>>>>>> switch to the max frequency.
>>>>>>
>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>> informative splat[1]
>>>>>>
>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>> and fails in the bowels of clk_set_rate()
>>>>> Though I think that might only be a symptom as well.
>>>>> Both the PLL setting code and the actual cpu-clock implementation
>>>>> are unchanged since 2017 (and run just fine on all boards in my farm).
>>>>>
>>>>> One source for these issues is often the regulator supplying the cpu
>>>>> going haywire - aka the voltage not matching the opp.
>>>>>
>>>>> As in this error-case it's CPU4 being set, this would mean it might
>>>>> be the big cluster supplied by the external syr825 (fan53555 clone)
>>>>> that might act up. In the Firefly-rk3399 case this is even stranger.
>>>>>
>>>>> There is a discrepancy in the "fcs,suspend-voltage-selector" setting
>>>>> between different bootloader versions (how the selection-pin is set up),
>>>>> so the kernel might actually write its requested voltage to the wrong
>>>>> register (not the one for actual voltage, but the second set used for
>>>>> the suspend voltage).
>>>>>
>>>>> Did you by chance swap bootloaders at some point in the recent past?
>>>> No, haven't touched the bootloader since I initially set up the board.
>>>
>>> The CPU voltage is not affected by the bootloader, since the kernel
>>> should have its own opp-table;
>>>
>>> the bootloader may only affect the center/logic power supply.
>>>
>>>>
>>>>> I'd assume [2] might actually be the same issue last year, though
>>>>> the CI-logs are not available anymore it seems.
>>>>>
>>>>> Could you try to set the vdd_cpu_b regulator to disabled, so that
>>>>> cpufreq for this cluster defers and see what happens?
>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>> there's definitely something a bit unstable with this regulator, at
>>>> least on this firefly.
>>>
>>> Is it possible to identify which patch introduced this bug? This board
>>> should have worked correctly for a long time with the upstream source code.
>>
>> Unfortunately, it seems to be a regular, but intermittent failure, so
>> bisection is not producing anything reliable.
>>
>> You can see that both in mainline[1] and in linux-next[2] there are
>> periodic failures, but it's hard to see any patterns.
>
> Even worse, I (re)tested mainline for versions that were previously
> passing (v5.2, v5.3-rc5) and they are also failing now.
>
> They work again if I disable that regulator as suggested by Heiko.
>
> So this is increasingly pointing to failing hardware.
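
For anyone reading along, the failure path in the quoted log can be
modelled roughly as below. This is only a simplified Python sketch of the
behaviour discussed above, not the kernel code: the function names, the
three-entry frequency table and the set_rate callback are assumptions made
up for illustration. Only the 200000, 12000 and 408000 kHz values and the
-22 error come from the log.

```python
import bisect

EINVAL = 22  # the log shows "failed to set clock rate: -22"


def pick_listed_freq(table_khz, current_khz):
    """Return current_khz if it is listed, else the lowest listed
    frequency above it (or the highest entry if none is above)."""
    table = sorted(table_khz)
    if current_khz in table:
        return current_khz
    idx = bisect.bisect_left(table, current_khz)
    if idx == len(table):
        idx -= 1  # running above the table: fall back to the top entry
    return table[idx]


def bring_online(cpu, table_khz, current_khz, set_rate):
    """Move a CPU running at an unlisted frequency onto a listed one.

    set_rate is a hypothetical callback standing in for the clock
    framework: it returns 0 on success or a negative errno.
    """
    target = pick_listed_freq(table_khz, current_khz)
    if target == current_khz:
        return current_khz  # already on a listed frequency
    ret = set_rate(target)
    if ret != 0:
        # Per the log the kernel hits a BUG() here; the sketch raises.
        raise RuntimeError(
            f"CPU{cpu}: failed to change cpu frequency: {ret}")
    return target
```

With an illustrative table of [408000, 600000, 816000], a CPU0 reported at
200000 kHz ends up at 408000 when set_rate succeeds, while a set_rate that
returns -EINVAL for CPU4 at 12000 kHz raises at the point where the boot
log shows the BUG.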

This is now failing in the v5.2 stable tree.

Any suggestions on what to do?  Otherwise, I'll just need to disable
this board.

Or, if someone wants to donate a new rk3399-firefly for my lab, I'd be
glad to try replacing it.

Kevin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: CPUfreq fail on rk3399-firefly
  2019-09-26 22:51               ` Kevin Hilman
@ 2019-10-10  9:32                 ` Kever Yang
  0 siblings, 0 replies; 16+ messages in thread
From: Kever Yang @ 2019-10-10  9:32 UTC (permalink / raw)
  To: Kevin Hilman, Heiko Stuebner
  Cc: kernel-build-reports, linux-rockchip, linux-next,
	张晴, 闫孝军,
	linux-arm-kernel

Hi Kevin,

     I will send a Firefly-rk3399 board to you.


Thanks,

- Kever

On 2019/9/27 6:51 AM, Kevin Hilman wrote:
> This is now failing in the v5.2 stable tree.
>
> Any suggestions on what to do?  Otherwise, I'll just need to disable
> this board.
>
> Or, if someone wants to donate a new rk3399-firefly for my lab, I'd be
> glad to try replacing it.
>
> Kevin



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-10-10  9:40 UTC | newest]

Thread overview: 16+ messages
     [not found] <5d3057c8.1c69fb81.c6489.8ad2@mx.google.com>
2019-07-18 16:20 ` next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718) Mark Brown
2019-08-12 17:24   ` Mark Brown
2019-08-13 17:26   ` Kevin Hilman
2019-08-13 17:35   ` CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)) Kevin Hilman
2019-08-14  9:01     ` Heiko Stuebner
2019-08-21 18:59       ` Kevin Hilman
2019-08-23  0:32         ` CPUfreq fail on rk3399-firefly Kever Yang
2019-08-23 16:52           ` Kevin Hilman
2019-08-23 17:03             ` Kevin Hilman
2019-08-26  9:56               ` Kever Yang
2019-08-26 17:09                 ` Kevin Hilman
2019-08-27  1:54                   ` Kever Yang
2019-08-27  2:14                     ` Heiko Stuebner
2019-08-27  9:59                       ` Kever Yang
2019-09-26 22:51               ` Kevin Hilman
2019-10-10  9:32                 ` Kever Yang
