linux-next.vger.kernel.org archive mirror
* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
       [not found] <5db7032c.1c69fb81.888b0.b521@mx.google.com>
@ 2019-10-28 17:48 ` Mark Brown
  2019-10-28 18:40   ` Bjorn Andersson
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Brown @ 2019-10-28 17:48 UTC
  To: Bjorn Andersson, Andy Gross
  Cc: kernel-build-reports, linux-next, linux-arm-msm, linux-arm-kernel

On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:

Today's -next (and Friday's) fails to boot on db820c:

>     defconfig:
>         gcc-8:
>             apq8096-db820c: 1 failed lab

It looks like it deadlocks somewhere, the last things in the log are a
failure to start ufshcd-qcom and then an RCU stall some time later:

03:03:27.191914  [   21.156672] ufshcd-qcom 624000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
03:03:27.198061  <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=qup-i2c-driver-present RESULT=pass>
03:03:27.208499  [   21.175985] ufshcd-qcom 624000.ufshc: ufs_qcom_init: required phy device. hasn't probed yet. err = -517
03:03:27.216720  [   21.176014] ufshcd-qcom 624000.ufshc: ufshcd_variant_hba_init: variant qcom init failed err -517
03:03:27.226220  <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=qup-i2c-blsp-i2c2-probed RESULT=pass>
03:03:27.239850  [   21.211424] ufshcd-qcom 624000.ufshc: Initialization failed
03:03:48.157338  [   42.128777] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
03:03:48.167648  [   42.128802] rcu: 	3-...!: (0 ticks this GP) idle=dde/1/0x4000000000000000 softirq=1715/1715 fqs=60
03:03:48.171895  [   42.133839] 	(detected by 0, t=5252 jiffies, g=2301, q=787)

Full details, including the whole log, at:

	https://kernelci.org/boot/id/5db6bf0d59b514a35660ee72/
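
A note for readers triaging a similar log: err -517 is the kernel's EPROBE_DEFER, so the ufshcd-qcom messages on their own only say that the probe was deferred until its PHY shows up, and the RCU stall about 21 seconds later is probably the more telling symptom. The following is a minimal Python sketch of that triage. The sample lines are copied from the excerpt above with the LAVA timestamps stripped; the function name and the regular expressions are illustrative assumptions and are not part of KernelCI or LAVA.

```python
import re

# Sample lines copied from the boot log quoted above (LAVA timestamps stripped).
LOG = """\
[   21.175985] ufshcd-qcom 624000.ufshc: ufs_qcom_init: required phy device. hasn't probed yet. err = -517
[   21.176014] ufshcd-qcom 624000.ufshc: ufshcd_variant_hba_init: variant qcom init failed err -517
[   21.211424] ufshcd-qcom 624000.ufshc: Initialization failed
[   42.128777] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   42.128802] rcu: \t3-...!: (0 ticks this GP) idle=dde/1/0x4000000000000000 softirq=1715/1715 fqs=60
"""

EPROBE_DEFER = 517  # kernel-internal errno: "probe me again later"

# Hypothetical patterns, written only for the message shapes shown above.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
ERR_RE = re.compile(r"err(?:\s*=)?\s*-(\d+)")
STALL_CPU_RE = re.compile(r"^rcu:\s+(\d+)-")


def summarize(log: str) -> dict:
    """Count deferred probes and locate the RCU stall report in a dmesg excerpt."""
    deferred_probes = 0
    stall_at = None
    stalled_cpus = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a kernel log line (e.g. a LAVA signal)
        msg = match["msg"]
        err = ERR_RE.search(msg)
        if err and int(err.group(1)) == EPROBE_DEFER:
            deferred_probes += 1
        if "detected stalls on CPUs/tasks" in msg:
            stall_at = float(match["ts"])
        cpu = STALL_CPU_RE.match(msg)
        if cpu:
            stalled_cpus.append(int(cpu.group(1)))
    return {
        "deferred_probes": deferred_probes,
        "stall_at": stall_at,
        "stalled_cpus": stalled_cpus,
    }


result = summarize(LOG)
print(result)
```

On this excerpt the sketch counts two deferred-probe messages and one stalled CPU (CPU 3), which matches the reading that the UFS messages are incidental and the stall is what stops the boot.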

* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
  2019-10-28 17:48 ` next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028) Mark Brown
@ 2019-10-28 18:40   ` Bjorn Andersson
  2019-10-28 19:11     ` Mark Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Andersson @ 2019-10-28 18:40 UTC
  To: Mark Brown
  Cc: Andy Gross, kernel-build-reports, linux-next, linux-arm-msm,
	linux-arm-kernel

On Mon 28 Oct 10:48 PDT 2019, Mark Brown wrote:

Hi Mark

> On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:
> 
> Today's -next (and Friday's) fails to boot on db820c:
> 
> >     defconfig:
> >         gcc-8:
> >             apq8096-db820c: 1 failed lab
> 
> It looks like it deadlocks somewhere, the last things in the log are a
> failure to start ufshcd-qcom and then an RCU stall some time later:
> 

db820c has been failing intermittently for a while now; it seems that
booting with kpti enabled causes something to go wrong. There is
nothing strange in the kernel logs and ftrace seems to indicate that all
the CPUs are idling nicely.

Regards,
Bjorn

> 03:03:27.191914  [   21.156672] ufshcd-qcom 624000.ufshc: ufshcd_populate_vreg: Unable to find vdd-hba-supply regulator, assuming enabled
> 03:03:27.198061  <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=qup-i2c-driver-present RESULT=pass>
> 03:03:27.208499  [   21.175985] ufshcd-qcom 624000.ufshc: ufs_qcom_init: required phy device. hasn't probed yet. err = -517
> 03:03:27.216720  [   21.176014] ufshcd-qcom 624000.ufshc: ufshcd_variant_hba_init: variant qcom init failed err -517
> 03:03:27.226220  <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=qup-i2c-blsp-i2c2-probed RESULT=pass>
> 03:03:27.239850  [   21.211424] ufshcd-qcom 624000.ufshc: Initialization failed
> 03:03:48.157338  [   42.128777] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 03:03:48.167648  [   42.128802] rcu: 	3-...!: (0 ticks this GP) idle=dde/1/0x4000000000000000 softirq=1715/1715 fqs=60
> 03:03:48.171895  [   42.133839] 	(detected by 0, t=5252 jiffies, g=2301, q=787)
> 
> Full details, including the whole log, at:
> 
> 	https://kernelci.org/boot/id/5db6bf0d59b514a35660ee72/



* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
  2019-10-28 18:40   ` Bjorn Andersson
@ 2019-10-28 19:11     ` Mark Brown
  2019-10-28 20:02       ` Bjorn Andersson
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Brown @ 2019-10-28 19:11 UTC
  To: Bjorn Andersson
  Cc: Andy Gross, kernel-build-reports, linux-next, linux-arm-msm,
	linux-arm-kernel, Catalin Marinas, Will Deacon

On Mon, Oct 28, 2019 at 11:40:19AM -0700, Bjorn Andersson wrote:
> On Mon 28 Oct 10:48 PDT 2019, Mark Brown wrote:
> > On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:

> > Today's -next (and Friday's) fails to boot on db820c:

> > >     defconfig:
> > >         gcc-8:
> > >             apq8096-db820c: 1 failed lab

> > It looks like it deadlocks somewhere, the last things in the log are a
> > failure to start ufshcd-qcom and then an RCU stall some time later:

> db820c has been failing intermittently for a while now, it seems that
> booting with kpti enabled causes something to go wrong. There is
> nothing strange in the kernel logs and ftrace seems to indicate that all
> the CPUs are idling nicely.

Oh dear.  Adding Catalin and Will.  Is it definitely KPTI that's
triggering stuff?  It did turn up some bugs on other systems, though
it's a bit strange it's only manifesting in KernelCI...

* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
  2019-10-28 19:11     ` Mark Brown
@ 2019-10-28 20:02       ` Bjorn Andersson
  2019-10-28 20:14         ` Will Deacon
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Andersson @ 2019-10-28 20:02 UTC
  To: Mark Brown
  Cc: Andy Gross, kernel-build-reports, linux-next, linux-arm-msm,
	linux-arm-kernel, Catalin Marinas, Will Deacon

On Mon 28 Oct 12:11 PDT 2019, Mark Brown wrote:

> On Mon, Oct 28, 2019 at 11:40:19AM -0700, Bjorn Andersson wrote:
> > On Mon 28 Oct 10:48 PDT 2019, Mark Brown wrote:
> > > On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:
> 
> > > Today's -next (and Friday's) fails to boot on db820c:
> 
> > > >     defconfig:
> > > >         gcc-8:
> > > >             apq8096-db820c: 1 failed lab
> 
> > > It looks like it deadlocks somewhere, the last things in the log are a
> > > failure to start ufshcd-qcom and then an RCU stall some time later:
> 
> > db820c has been failing intermittently for a while now, it seems that
> > booting with kpti enabled causes something to go wrong. There is
> > nothing strange in the kernel logs and ftrace seems to indicate that all
> > the CPUs are idling nicely.
> 
> Oh dear.  Adding Catalin and Will.  Is it definitely KPTI that's
> triggering stuff?  It did turn up some bugs on other systems, though
> it's a bit strange it's only manifesting in KernelCI...

I did a test recently where I booted my db820c 100 times with kpti=yes
and 100 times with kpti=no on the kernel command line, and the result
was 90% failure to reach console vs 0%. Going back and looking at the
logs for the 10% indicated that the boot CPU was fine, but I had stalls
reported on other CPUs.

In an effort to rule out driver bugs I reduced the DT to CPUs, the core
clocks, gic, timers and serial driver, and I still saw the problem.

I have not looked at this with jtag and hence do not know what secure
world is doing.

Regards,
Bjorn
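
The A/B comparison described in this message (100 boots with kpti=yes, 100 with kpti=no, 90% versus 0% failing to reach the console) can be tallied as in the Python sketch below. The BootRun class and the chance calculation are illustrative additions, not something used in the thread; the calculation is a one-sided Fisher exact probability for the observed split and assumes the boots are independent.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class BootRun:
    """One batch of boots with a given kernel command-line setting (hypothetical helper)."""
    cmdline: str
    boots: int
    failures: int

    @property
    def failure_rate(self) -> float:
        return self.failures / self.boots


# Figures taken from the message above: 90% of 100 boots versus 0% of 100 boots.
kpti_on = BootRun("kpti=yes", boots=100, failures=90)
kpti_off = BootRun("kpti=no", boots=100, failures=0)


def chance_of_split(a: BootRun, b: BootRun) -> float:
    """Probability that every failure lands in batch `a` if the setting made no difference.

    Hypergeometric: choose which of the (a.boots + b.boots) boots fail, and ask
    how often all of them fall inside batch `a`. Only valid when b.failures == 0.
    """
    if b.failures != 0:
        raise ValueError("this shortcut only covers the all-in-one-batch case")
    total_failures = a.failures
    return comb(a.boots, total_failures) / comb(a.boots + b.boots, total_failures)


p_value = chance_of_split(kpti_on, kpti_off)
for run in (kpti_on, kpti_off):
    print(f"{run.cmdline}: {run.failure_rate:.0%} failed")
print(f"chance of this split with no real effect: {p_value:.3g}")
```

The probability comes out vanishingly small, which supports treating the kpti setting as a real trigger and not a coincidence of flaky boots. It says nothing about where the underlying bug lives.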

* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
  2019-10-28 20:02       ` Bjorn Andersson
@ 2019-10-28 20:14         ` Will Deacon
  2019-10-28 20:23           ` Mark Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2019-10-28 20:14 UTC
  To: Bjorn Andersson
  Cc: Mark Brown, Andy Gross, kernel-build-reports, linux-next,
	linux-arm-msm, linux-arm-kernel, Catalin Marinas

On Mon, Oct 28, 2019 at 01:02:19PM -0700, Bjorn Andersson wrote:
> On Mon 28 Oct 12:11 PDT 2019, Mark Brown wrote:
> 
> > On Mon, Oct 28, 2019 at 11:40:19AM -0700, Bjorn Andersson wrote:
> > > On Mon 28 Oct 10:48 PDT 2019, Mark Brown wrote:
> > > > On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:
> > 
> > > > Today's -next (and Friday's) fails to boot on db820c:
> > 
> > > > >     defconfig:
> > > > >         gcc-8:
> > > > >             apq8096-db820c: 1 failed lab
> > 
> > > > It looks like it deadlocks somewhere, the last things in the log are a
> > > > failure to start ufshcd-qcom and then an RCU stall some time later:
> > 
> > > db820c has been failing intermittently for a while now, it seems that
> > > booting with kpti enabled causes something to go wrong. There is
> > > nothing strange in the kernel logs and ftrace seems to indicate that all
> > > the CPUs are idling nicely.
> > 
> > Oh dear.  Adding Catalin and Will.  Is it definitely KPTI that's
> > triggering stuff?  It did turn up some bugs on other systems, though
> > it's a bit strange it's only manifesting in KernelCI...
> 
> I did a test recently where I booted my db820c 100 times with kpti=yes
> and 100 times with kpti=no on the kernel command line, and the result
> was 90% failure to reach console vs 0%. Going back and looking at the
> logs for the 10% indicated that the boot CPU was fine, but I had stalls
> reported on other CPUs.
> 
> In an effort to rule out driver bugs I reduced the DT to CPUs, the core
> clocks, gic, timers and serial driver, and I still saw the problem.
> 
> I have not looked at this with jtag and hence do not know what secure
> world is doing.

Hmm. Is this a recent thing? Neither kpti nor the snapdragon 820 are
particularly new. Might be worth checking that CONFIG_QCOM_FALKOR_ERRATUM_1003
is enabled and getting patched in at runtime -- we had hardware issues
during kpti development with this CPU.

Will

* Re: next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
  2019-10-28 20:14         ` Will Deacon
@ 2019-10-28 20:23           ` Mark Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2019-10-28 20:23 UTC
  To: Will Deacon
  Cc: Bjorn Andersson, Andy Gross, kernel-build-reports, linux-next,
	linux-arm-msm, linux-arm-kernel, Catalin Marinas

On Mon, Oct 28, 2019 at 08:14:18PM +0000, Will Deacon wrote:

> Hmm. Is this a recent thing? Neither kpti nor the snapdragon 820 are

It's in mainline; we don't seem to have any results from Bjorn's lab for
stable.

> particularly new. Might be worth checking that CONFIG_QCOM_FALKOR_ERRATUM_1003
> is enabled and getting patched in at runtime -- we had hardware issues
> during kpti development with this CPU.

It's enabled in defconfig:

	https://storage.kernelci.org/next/master/next-20191028/arm64/defconfig/gcc-8/kernel.config

but I can't see any sign that it's been announced in the boot log:

	https://storage.kernelci.org/next/master/next-20191028/arm64/defconfig/gcc-8/lab-bjorn/boot-apq8096-db820c.html

so that might be it.
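
The two-step check above (is the option set in the build config, and does the boot log show the workaround being applied) can be sketched in Python as follows. The helper names are hypothetical, and the search string "erratum 1003" is an assumption about what the kernel would print when the workaround is detected; the exact wording should be confirmed against the kernel source before relying on it. The sample log lines are taken from the excerpt quoted earlier in the thread.

```python
def config_enabled(config_text: str, option: str) -> bool:
    """True if the .config text contains `option=y` as a whole line."""
    wanted = f"{option}=y"
    return any(line.strip() == wanted for line in config_text.splitlines())


def log_mentions(log_text: str, needle: str) -> bool:
    """Case-insensitive substring search over the boot log."""
    needle = needle.lower()
    return any(needle in line.lower() for line in log_text.splitlines())


def diagnose(config_text: str, log_text: str, option: str, needle: str) -> str:
    if not config_enabled(config_text, option):
        return "not enabled in config"
    if log_mentions(log_text, needle):
        return "enabled and announced at boot"
    return "enabled in config but not announced at boot"


# The config line reflects what the message above reports for defconfig.
SAMPLE_CONFIG = "CONFIG_QCOM_FALKOR_ERRATUM_1003=y\n"

# Two lines from the boot log excerpt quoted earlier in this thread.
SAMPLE_LOG = """\
[   21.211424] ufshcd-qcom 624000.ufshc: Initialization failed
[   42.128777] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
"""

verdict = diagnose(
    SAMPLE_CONFIG,
    SAMPLE_LOG,
    option="CONFIG_QCOM_FALKOR_ERRATUM_1003",
    needle="erratum 1003",  # assumed log wording, see the note above
)
print(verdict)
```

In practice the two inputs would be the kernel.config and boot log files linked above, read from disk or fetched separately. A missing announcement is only a hint: the log excerpt may be truncated, or the message wording may differ from the assumed string.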

* next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)
@ 2019-10-28 15:03 kernelci.org bot
  0 siblings, 0 replies; 7+ messages in thread
From: kernelci.org bot @ 2019-10-28 15:03 UTC
  To: linux-next

next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, 2 untried/unknown, 2 conflicts (next-20191028)

Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20191028/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20191028/

Tree: next
Branch: master
Git Describe: next-20191028
Git Commit: 60c1769a45f4b6beddcc48843739d7d41b88dc1c
Git URL: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 93 unique boards, 26 SoC families, 28 builds out of 216
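
The summary line at the top of this report has a regular shape, so it can be picked apart mechanically. The Python sketch below parses the counts and checks that the categories add up to the total. The regular expression and function name are illustrative assumptions, and the report does not say whether KernelCI guarantees that the categories always sum to the total; they do in this report.

```python
import re

SUBJECT = (
    "next/master boot: 257 boots: 8 failed, 237 passed with 8 offline, "
    "2 untried/unknown, 2 conflicts (next-20191028)"
)

# Hypothetical pattern written for the subject format shown in this report.
SUMMARY_RE = re.compile(
    r"(?P<total>\d+) boots: (?P<failed>\d+) failed, (?P<passed>\d+) passed "
    r"with (?P<offline>\d+) offline, (?P<untried>\d+) untried/unknown, "
    r"(?P<conflicts>\d+) conflicts \((?P<tag>[^)]+)\)"
)


def parse_summary(subject: str) -> dict:
    """Extract the boot counts and the tree tag from a KernelCI boot subject."""
    match = SUMMARY_RE.search(subject)
    if match is None:
        raise ValueError("subject does not look like a boot summary")
    fields = match.groupdict()
    tag = fields.pop("tag")
    counts = {name: int(value) for name, value in fields.items()}
    counts["tag"] = tag
    return counts


summary = parse_summary(SUBJECT)
accounted = sum(
    summary[key] for key in ("failed", "passed", "offline", "untried", "conflicts")
)
print(summary)
print("categories sum to total:", accounted == summary["total"])
```

A check like this is mainly useful as a sanity test when scripting over many reports, since a mismatch would suggest either a format change or a parsing mistake.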

Boot Failures Detected:

arm64:
    defconfig+CONFIG_RANDOMIZE_BASE=y:
        gcc-8:
            meson-gxl-s805x-libretech-ac: 1 failed lab
            meson-sm1-khadas-vim3l: 1 failed lab
            meson-sm1-sei610: 1 failed lab

    defconfig:
        gcc-8:
            apq8096-db820c: 1 failed lab
            meson-sm1-khadas-vim3l: 1 failed lab
            meson-sm1-sei610: 1 failed lab

    defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8:
            meson-sm1-khadas-vim3l: 1 failed lab
            meson-sm1-sei610: 1 failed lab

Offline Platforms:

riscv:

    defconfig:
        gcc-8
            sifive_fu540: 1 offline lab

arm:

    multi_v7_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

    sunxi_defconfig:
        gcc-8
            sun5i-r8-chip: 1 offline lab
            sun7i-a20-bananapi: 1 offline lab

    davinci_all_defconfig:
        gcc-8
            dm365evm,legacy: 1 offline lab

    qcom_defconfig:
        gcc-8
            qcom-apq8064-cm-qs600: 1 offline lab

Conflicting Boot Failures Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)

arm:
    exynos_defconfig:
        exynos5422-odroidxu3:
            lab-collabora: FAIL (gcc-8)
            lab-baylibre: PASS (gcc-8)

arm64:
    defconfig:
        meson-gxl-s905x-libretech-cc:
            lab-baylibre-seattle: PASS (gcc-8)
            lab-clabbe: FAIL (gcc-8)
            lab-baylibre: PASS (gcc-8)

---
For more info write to <info@kernelci.org>
