Linux-Next Archive on lore.kernel.org
From: Kevin Hilman <khilman@baylibre.com>
To: Kever Yang <kever.yang@rock-chips.com>, Heiko Stuebner <heiko@sntech.de>
Cc: kernel-build-reports@lists.linaro.org,
	linux-rockchip@lists.infradead.org, linux-next@vger.kernel.org,
	张晴 <elaine.zhang@rock-chips.com>, 闫孝军 <andy.yan@rock-chips.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: CPUfreq fail on rk3399-firefly
Date: Thu, 26 Sep 2019 15:51:20 -0700
Message-ID: <7hh84yisd3.fsf@baylibre.com> (raw)
In-Reply-To: <7h8srjzuen.fsf@baylibre.com>

Kevin Hilman <khilman@baylibre.com> writes:

> Kevin Hilman <khilman@baylibre.com> writes:
>
>> Kever Yang <kever.yang@rock-chips.com> writes:
>>
>>> Hi Kevin, Heiko,
>>>
>>> On 2019/8/22 2:59 AM, Kevin Hilman wrote:
>>>> Hi Heiko,
>>>>
>>>> Heiko Stuebner <heiko@sntech.de> writes:
>>>>
>>>>> On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
>>>>>> [ resent with correct addr for linux-rockchip list ]
>>>>>>
>>>>>> Mark Brown <broonie@kernel.org> writes:
>>>>>>
>>>>>>> On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
>>>>>>>
>>>>>>> Today's -next started failing to boot defconfig on rk3399-firefly:
>>>>>>>
>>>>>>>> arm64:
>>>>>>>>      defconfig:
>>>>>>>>          gcc-8:
>>>>>>>>              rk3399-firefly: 1 failed lab
>>>>>>> It hits a BUG() trying to set up cpufreq:
>>>>>>>
>>>>>>> [   87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
>>>>>>> [   87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
>>>>>>> [   87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
>>>>>>> [   87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
>>>>>>> [   87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
>>>>>>> [   87.495335] ------------[ cut here ]------------
>>>>>>> [   87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
>>>>>>> [   87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>>>>>
>>>>>>> I'm struggling to see anything relevant in the diff from yesterday. The
>>>>>>> unlisted-frequency warnings were already in yesterday's logs, but
>>>>>>> without the oops, and I don't see any changes in cpufreq, clk, or
>>>>>>> anything else that looks relevant.
>>>>>>>
>>>>>>> Full bootlog and other info can be found here:
>>>>>>>
>>>>>>> 	https://kernelci.org/boot/id/5d302d8359b51498d049e983/
>>>>>> I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n)
>>>>>> makes the firefly board start working again.
>>>>>>
>>>>>> Note that the arm64 defconfig selects "performance" as the default
>>>>>> CPUfreq governor, so during kernel boot it will always switch to the
>>>>>> max frequency.
>>>>>>
>>>>>> For fun, I set the default governor to "userspace" so the kernel
>>>>>> wouldn't make any OPP changes, and that leads to a slightly more
>>>>>> informative splat[1].
>>>>>>
>>>>>> There is still an OPP change happening because the detected OPP is not
>>>>>> one that's listed in the table, so it tries to change to a listed OPP
>>>>>> and fails in the bowels of clk_set_rate().
>>>>> Though I think that might only be a symptom as well.
>>>>> Both the PLL setting code and the actual cpu-clock implementation
>>>>> have been unchanged since 2017 (and run just fine on all boards in my farm).
>>>>>
>>>>> A common source of these issues is the regulator supplying the CPU
>>>>> going haywire, i.e. the voltage not matching the OPP.
>>>>>
>>>>> As the failing case here is CPU4, the culprit might be the big
>>>>> cluster's supply, the external SYR825 (a FAN53555 clone), acting up.
>>>>> In the Firefly-rk3399 case this is even stranger.
>>>>>
>>>>> Different bootloader versions set up the selection pin differently, so
>>>>> the "fcs,suspend-voltage-selector" value may not match the hardware,
>>>>> and the kernel might actually write its requested voltage to the wrong
>>>>> register (not the one for the active voltage, but the second one used
>>>>> for the suspend voltage).
>>>>>
>>>>> Did you by chance swap bootloaders at some point in the recent past?
>>>> No, haven't touched bootloader since I initially setup the board.
>>>
>>> The CPU voltage should not be affected by the bootloader, since the
>>> kernel has its own OPP table; the bootloader may only affect the
>>> center/logic power supply.
>>>
>>>>
>>>>> I'd assume [2] might actually be the same issue as last year, though
>>>>> the CI logs no longer seem to be available.
>>>>>
>>>>> Could you try disabling the vdd_cpu_b regulator, so that cpufreq for
>>>>> this cluster defers, and see what happens?
>>>> Yes, this change[1] definitely makes things boot reliably again, so
>>>> there's clearly something a bit unstable with this regulator, at
>>>> least on this firefly.
>>>
>>> Is it possible to pinpoint which patch introduced this bug? This board
>>> has been working correctly with upstream source code for a long time.
>>
>> Unfortunately, it seems to be a regular, but intermittent failure, so
>> bisection is not producing anything reliable.
>>
>> You can see that both in mainline[1] and in linux-next[2] there are
>> periodic failures, but it's hard to see any patterns.
>
> Even worse, I (re)tested mainline for versions that were previously
> passing (v5.2, v5.3-rc5) and they are also failing now.
>
> They work again if I disable that regulator as suggested by Heiko.
>
> So this is increasingly pointing to failing hardware.

This is now failing in the v5.2 stable tree.

Any suggestions on what to do? Otherwise, I'll just need to disable
this board.

Or, if someone wants to donate a new rk3399-firefly for my lab, I'd be
glad to try replacing it.

Kevin
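For readers following the oops quoted above: when the boot-time rate is not in the OPP table, the cpufreq core switches to a listed OPP, and in the log a failure of that switch (-22 from the clock layer) ends in the BUG() at cpufreq.c:1438. The C sketch below is a simplified model of that path, not the kernel's code. It assumes the core picks the lowest listed frequency at or above the current one (the intent of CPUFREQ_RELATION_L). Only the 408000 kHz entry is backed by the log; the other table values and the set_rate_ok/set_rate_einval stand-ins for clk_set_rate() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL_RET (-22) /* the -22 (-EINVAL) seen in the log */

/* Hypothetical OPP table in kHz, ascending. Only 408000 is backed by
 * the log (200000 -> 408000); the rest are illustrative. */
static const unsigned int opp_khz[] = {
	408000, 600000, 816000, 1008000, 1200000, 1416000,
};
#define N_OPP (sizeof(opp_khz) / sizeof(opp_khz[0]))

/* Lowest table entry >= target (assumed RELATION_L behaviour); if the
 * target is above every entry, fall back to the highest one. */
static unsigned int pick_lowest_at_or_above(unsigned int target_khz)
{
	for (size_t i = 0; i < N_OPP; i++)
		if (opp_khz[i] >= target_khz)
			return opp_khz[i];
	return opp_khz[N_OPP - 1];
}

static int is_listed(unsigned int khz)
{
	for (size_t i = 0; i < N_OPP; i++)
		if (opp_khz[i] == khz)
			return 1;
	return 0;
}

/* Hypothetical stand-ins for clk_set_rate(): one succeeds, one fails
 * the way CPU4 does in the log. */
static int set_rate_ok(unsigned int khz) { (void)khz; return 0; }
static int set_rate_einval(unsigned int khz) { (void)khz; return EINVAL_RET; }

/* Model of the boot-time fixup: if the current rate is unlisted, move
 * to a listed OPP. Returns the set_rate() result; in the log above a
 * nonzero result at this point is what triggers the BUG(). */
static int fixup_unlisted(unsigned int cur_khz,
			  int (*set_rate)(unsigned int khz),
			  unsigned int *new_khz)
{
	assert(cur_khz > 0);
	*new_khz = cur_khz;
	if (is_listed(cur_khz))
		return 0; /* already on a listed OPP, nothing to do */

	unsigned int target = pick_lowest_at_or_above(cur_khz);
	int ret = set_rate(target);
	if (ret == 0)
		*new_khz = target;
	return ret;
}
```

With set_rate_ok, an unlisted 200000 kHz maps to 408000 kHz, matching the CPU0 lines in the log; with set_rate_einval, the 12000 kHz case returns -22 and leaves the rate unchanged, which is the point where the real core stops with BUG().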

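Heiko's suspend-voltage-selector theory, quoted earlier in the thread, can be illustrated with a toy model. It assumes a FAN53555-style regulator with two voltage registers (called VSEL0 and VSEL1 here), where the state of the selection pin left by the bootloader decides which register drives the output, and where the kernel writes run-time voltages to whichever register "fcs,suspend-voltage-selector" does not name. The struct and function names are hypothetical; this is a sketch of the idea, not the fan53555 driver.

```c
#include <assert.h>

/* Toy model, not the real driver: two voltage registers, and the
 * selection pin (set up by the bootloader) picks the live one. */
enum vsel { VSEL0 = 0, VSEL1 = 1 };

struct dcdc_model {
	unsigned int reg_uv[2]; /* programmed voltage per register, in uV */
	enum vsel pin_active;   /* register currently selected by the pin */
};

/* The DT property names the suspend register; run-time writes are
 * assumed to go to the other one. */
static enum vsel runtime_reg(enum vsel suspend_sel)
{
	return suspend_sel == VSEL0 ? VSEL1 : VSEL0;
}

/* What the kernel does when cpufreq asks for a new voltage. */
static void kernel_set_voltage(struct dcdc_model *d, enum vsel suspend_sel,
			       unsigned int uv)
{
	d->reg_uv[runtime_reg(suspend_sel)] = uv;
}

/* What the CPU cluster actually receives. */
static unsigned int output_uv(const struct dcdc_model *d)
{
	assert(d->pin_active == VSEL0 || d->pin_active == VSEL1);
	return d->reg_uv[d->pin_active];
}
```

If the pin selects the register that the DT describes as the suspend register, every voltage write lands in a register that is not driving the output, so the big cluster would stay at its old voltage while cpufreq raises the clock, which is the kind of voltage/OPP mismatch described above.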

Thread overview: 16+ messages
     [not found] <5d3057c8.1c69fb81.c6489.8ad2@mx.google.com>
2019-07-18 16:20 ` next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718) Mark Brown
2019-08-12 17:24   ` Mark Brown
2019-08-13 17:26   ` Kevin Hilman
2019-08-13 17:35   ` CPUfreq fail on rk3399-firefly (was: next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)) Kevin Hilman
2019-08-14  9:01     ` Heiko Stuebner
2019-08-21 18:59       ` Kevin Hilman
2019-08-23  0:32         ` CPUfreq fail on rk3399-firefly Kever Yang
2019-08-23 16:52           ` Kevin Hilman
2019-08-23 17:03             ` Kevin Hilman
2019-08-26  9:56               ` Kever Yang
2019-08-26 17:09                 ` Kevin Hilman
2019-08-27  1:54                   ` Kever Yang
2019-08-27  2:14                     ` Heiko Stuebner
2019-08-27  9:59                       ` Kever Yang
2019-09-26 22:51               ` Kevin Hilman [this message]
2019-10-10  9:32                 ` Kever Yang
