* Re: [swat] ltp failures on autobuilder
       [not found] <1687473EDD63E45B.21776@lists.yoctoproject.org>
@ 2021-06-11 11:36 ` Richard Purdie
       [not found] ` <168784123C10B53A.9125@lists.yoctoproject.org>
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Purdie @ 2021-06-11 11:36 UTC (permalink / raw)
  To: swat, openembedded-core, Bruce Ashfield, Randy MacLeod, Paul Gortmaker

On Thu, 2021-06-10 at 18:02 +0100, Richard Purdie via lists.yoctoproject.org wrote:
> Noting down what we know about the ltp issue:
> 
> We've seen intermittent issues on the autobuilder where some ltp tests fail or 
> hang. I've been trying to figure out how to reproduce the issue and narrow down
> the cause.
> 
> I was able to isolate a patch which reproduces the issue for me:
> 
> http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t222&id=d7d65aae104caa03afc28837b0abe0b486d5a8b8
> 
> with master-next, setting:
> 
> IMAGE_INSTALL_append = ' ltp' 
> TEST_SUITES = 'ping ssh ltp' 

also:

IMAGE_CLASSES += "testimage"
QEMU_USE_KVM_qemux86-64 = "True"


> then 
> 
> bitbake core-image-sato; bitbake core-image-sato -c testimage
> 
> where the issue shows up as a kernel "BUG:" in the logs in WORKDIR/testimage/qemu_*
> 
> The above patch runs the minimum of ltp tests I could find which replicate the issue.
> 
> I've reproduced this on 5.10.1 -> 5.10.42, 5.4.123 and 5.13-rc5.
> (these were plain kernels, so we've ruled out linux-yocto as the cause)
> Also reproduced on both qemu 6.0.0 and 5.2.0.
> 
> My build machine is an Ubuntu 20.04.2 LTS with:
> Linux version 5.4.0-74-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021
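
For anyone checking their own runs, here is a minimal sketch (Python, not
part of the original report) of scanning the testimage logs for the kernel
"BUG:" marker mentioned above. The function name and the directory argument
are assumptions for illustration; point it at whatever WORKDIR/testimage
expands to in your build.

```python
from pathlib import Path


def find_kernel_bugs(testimage_dir):
    """Return (log name, line number, text) for each line containing 'BUG:'.

    Only files matching qemu_* directly under testimage_dir are scanned.
    """
    hits = []
    for log in sorted(Path(testimage_dir).glob("qemu_*")):
        if not log.is_file():
            continue
        # Serial console logs can contain stray bytes, so decode leniently.
        with log.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if "BUG:" in line:
                    hits.append((log.name, lineno, line.rstrip()))
    return hits
```

An empty list means none of the qemu_* logs in that directory contained
the marker.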

Good news (for me) is that Randy and Paul can now reproduce this with the above 
additional key pieces of config.

We have confirmed that the issue is present:

* with gcc 11.1.1 and 10.3
* in hardknott
* if QB_SMP is disabled (i.e. in a single processor qemu)
* on 18.04, 20.04 and 21.04 Ubuntu host distros which have varying 5.4 and 5.11 
  host kernels

I have not yet been able to make the bug appear in gatesgarth
(gcc 10.2, 5.8 kernel, qemu 5.1.0); I had to hack -b /dev/null into the ltp command line.

I also backported the qemu platform, smp and qemu command line changes to
gatesgarth and it still doesn't crash.

I also found that setting CONFIG_DEBUG_KERNEL makes the issue 'go away'. 
Since that is a large hammer, I tried:

CONFIG_DEBUG_KERNEL=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set

as a .cfg to the kernel and that still reproduced the crash. However:

CONFIG_DEBUG_KERNEL=y
CONFIG_CGROUP_DEBUG=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_PREEMPT=y
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set

doesn't seem to want to reproduce the crash, so something about
those three options (CGROUP_DEBUG, SCHED_DEBUG, DEBUG_PREEMPT) seems
to make things 'work'.
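
One way to narrow this further is to build with each of the three
candidate options enabled on its own. The following is only a sketch under
that assumption (Python; the helper names are hypothetical, and it just
renders the fragment text in the same "=y" / "is not set" form used above,
it does not build or test anything):

```python
CANDIDATES = ["CONFIG_CGROUP_DEBUG", "CONFIG_SCHED_DEBUG", "CONFIG_DEBUG_PREEMPT"]


def render_fragment(enabled, disabled):
    """Render a kernel .cfg fragment from lists of option names."""
    lines = [f"{opt}=y" for opt in enabled]
    lines += [f"# {opt} is not set" for opt in disabled]
    return "\n".join(lines) + "\n"


def one_at_a_time(base_enabled, candidates, always_off):
    """Yield (candidate, fragment) pairs with exactly one candidate enabled."""
    for cand in candidates:
        others = [c for c in candidates if c != cand]
        yield cand, render_fragment(base_enabled + [cand], others + always_off)


if __name__ == "__main__":
    # always_off is truncated here; the full list is in the fragments above.
    for name, text in one_at_a_time(["CONFIG_DEBUG_KERNEL"], CANDIDATES,
                                    ["CONFIG_RCU_TRACE", "CONFIG_KGDB"]):
        print(f"--- variant enabling {name} ---")
        print(text)
```

Each variant would then need its own kernel build and testimage run to see
whether the crash still reproduces.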

What does that all mean? No idea.

Cheers,

Richard






* Re: [swat] ltp failures on autobuilder
       [not found] ` <168784123C10B53A.9125@lists.yoctoproject.org>
@ 2021-06-11 13:19   ` Richard Purdie
  2021-06-16 12:56     ` Paul Gortmaker
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Purdie @ 2021-06-11 13:19 UTC (permalink / raw)
  To: swat, openembedded-core, Bruce Ashfield, Randy MacLeod, Paul Gortmaker

On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org wrote:
> as a .cfg to the kernel and that still reproduced the crash. However:
> 
> CONFIG_DEBUG_KERNEL=y
> CONFIG_CGROUP_DEBUG=y
> CONFIG_SCHED_DEBUG=y
> CONFIG_DEBUG_PREEMPT=y
> # CONFIG_RCU_TRACE is not set
> # CONFIG_X86_DEBUG_FPU is not set
> # CONFIG_CONSOLE_POLL is not set
> # CONFIG_DEBUG_INFO is not set
> # CONFIG_KGDB is not set
> # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> # CONFIG_KGDB_SERIAL_CONSOLE is not set
> # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> # CONFIG_KGDB_KDB is not set
> # CONFIG_KDB_KEYBOARD is not set
> # CONFIG_DEBUG_MISC is not set
> 

I've isolated this down to CONFIG_SCHED_DEBUG=y being the line which somehow
"fixes" the crash. With everything above enabled apart from that option, we
can still reproduce it.

Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
is one of the CVE fixes. Continuing to try and isolate.
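
If a single patch is responsible, a bisection over the ordered patch list
finds it in at most 5 test runs for 27 patches. A minimal sketch, in
Python, of that search follows. It is not a description of how the
isolation was actually done: `reproduces` is a hypothetical callback
standing in for "rebuild qemu with only these patches and rerun testimage",
and the code assumes the crash appears once the culprit is applied and
stays present as later patches are added.

```python
def first_bad_patch(patches, reproduces):
    """Return the first patch whose inclusion makes the crash reproduce.

    Assumes reproduces([]) is False and reproduces(patches) is True.
    """
    lo, hi = 0, len(patches)  # first lo patches: no crash; first hi: crash
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(patches[:mid]):
            hi = mid
        else:
            lo = mid
    return patches[hi - 1]


if __name__ == "__main__":
    # Placeholder patch names and a fake oracle, purely for demonstration.
    demo = [f"patch-{i:02d}" for i in range(27)]
    print(first_bad_patch(demo, lambda applied: "patch-13" in applied))
```

If the crash depends on a combination of patches, the assumption breaks and
the result would need checking by hand.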

Cheers,

Richard



* Re: [swat] ltp failures on autobuilder
  2021-06-11 13:19   ` Richard Purdie
@ 2021-06-16 12:56     ` Paul Gortmaker
  2021-06-16 14:17       ` Richard Purdie
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Gortmaker @ 2021-06-16 12:56 UTC (permalink / raw)
  To: Richard Purdie; +Cc: swat, openembedded-core, Bruce Ashfield, Randy MacLeod

[Re: [swat] ltp failures on autobuilder] On 11/06/2021 (Fri 14:19) Richard Purdie wrote:

> On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org wrote:
> > as a .cfg to the kernel and that still reproduced the crash. However:
> > 
> > CONFIG_DEBUG_KERNEL=y
> > CONFIG_CGROUP_DEBUG=y
> > CONFIG_SCHED_DEBUG=y
> > CONFIG_DEBUG_PREEMPT=y
> > # CONFIG_RCU_TRACE is not set
> > # CONFIG_X86_DEBUG_FPU is not set
> > # CONFIG_CONSOLE_POLL is not set
> > # CONFIG_DEBUG_INFO is not set
> > # CONFIG_KGDB is not set
> > # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> > # CONFIG_KGDB_SERIAL_CONSOLE is not set
> > # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> > # CONFIG_KGDB_KDB is not set
> > # CONFIG_KDB_KEYBOARD is not set
> > # CONFIG_DEBUG_MISC is not set
> > 
> 
> Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes" 
> the crash. I can enable all the above apart from that and we can reproduce
> it.
> 
> Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
> breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
> is one of the CVE fixes. Continuing to try and isolate.

For the mail archive trail, and for those not following the ongoing
research on IRC, we are hopeful that this fixes it:

https://lore.kernel.org/lkml/20210616125157.438837-1-paul.gortmaker@windriver.com/

Paul.
--

> 
> Cheers,
> 
> Richard
> 


* Re: [swat] ltp failures on autobuilder
  2021-06-16 12:56     ` Paul Gortmaker
@ 2021-06-16 14:17       ` Richard Purdie
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Purdie @ 2021-06-16 14:17 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: swat, openembedded-core, Bruce Ashfield, Randy MacLeod

On Wed, 2021-06-16 at 08:56 -0400, Paul Gortmaker wrote:
> [Re: [swat] ltp failures on autobuilder] On 11/06/2021 (Fri 14:19) Richard Purdie wrote:
> 
> > On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org wrote:
> > > as a .cfg to the kernel and that still reproduced the crash. However:
> > > 
> > > CONFIG_DEBUG_KERNEL=y
> > > CONFIG_CGROUP_DEBUG=y
> > > CONFIG_SCHED_DEBUG=y
> > > CONFIG_DEBUG_PREEMPT=y
> > > # CONFIG_RCU_TRACE is not set
> > > # CONFIG_X86_DEBUG_FPU is not set
> > > # CONFIG_CONSOLE_POLL is not set
> > > # CONFIG_DEBUG_INFO is not set
> > > # CONFIG_KGDB is not set
> > > # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> > > # CONFIG_KGDB_SERIAL_CONSOLE is not set
> > > # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> > > # CONFIG_KGDB_KDB is not set
> > > # CONFIG_KDB_KEYBOARD is not set
> > > # CONFIG_DEBUG_MISC is not set
> > > 
> > 
> > Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes" 
> > the crash. I can enable all the above apart from that and we can reproduce
> > it.
> > 
> > Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
> > breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
> > is one of the CVE fixes. Continuing to try and isolate.
> 
> For the mail archive trail, and for those not following the ongoing
> research on IRC, we are hopeful that this fixes it.
> 
> https://lore.kernel.org/lkml/20210616125157.438837-1-paul.gortmaker@windriver.com/

Awesome work in tracking that down, much appreciated, thanks!

Curious what upstream will make of it now...

Cheers,

Richard



