linuxppc-dev.lists.ozlabs.org archive mirror
* [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
@ 2020-02-25 15:26 bugzilla-daemon
  2020-02-26  4:02 ` Nicholas Piggin
                   ` (20 more replies)
  0 siblings, 21 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-25 15:26 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

            Bug ID: 206669
           Summary: Little-endian kernel crashing on POWER8 on heavy
                    big-endian PowerKVM load
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 5.4.x
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: glaubitz@physik.fu-berlin.de
                CC: matorola@gmail.com
        Regression: No

Created attachment 287605
  --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
Backtrace of host system crashing with little-endian kernel

We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
unstable hosting a big-endian ppc64 virtual machine running the same kernel in
big-endian mode.

When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host*
system, which is little-endian, with the attached kernel backtrace. The problem
reproduces with kernel 4.19.98 as well as 5.4.13; both guest and host are
running 5.4.x.

Backtrace attached.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* Re: [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
@ 2020-02-26  4:02 ` Nicholas Piggin
  2020-02-26  4:06 ` [Bug 206669] " bugzilla-daemon
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: Nicholas Piggin @ 2020-02-26  4:02 UTC (permalink / raw)
  To: bugzilla-daemon, linuxppc-dev

bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 1:26 am:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
>             Bug ID: 206669
>            Summary: Little-endian kernel crashing on POWER8 on heavy
>                     big-endian PowerKVM load
>            Product: Platform Specific/Hardware
>            Version: 2.5
>     Kernel Version: 5.4.x
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PPC-64
>           Assignee: platform_ppc-64@kernel-bugs.osdl.org
>           Reporter: glaubitz@physik.fu-berlin.de
>                 CC: matorola@gmail.com
>         Regression: No
> 
> Created attachment 287605
>   --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
> Backtrace of host system crashing with little-endian kernel
> 
> We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
> unstable hosting a big-endian ppc64 virtual machine running the same kernel in
> big-endian mode.
> 
> When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host*
> system which is little-endian with the following kernel backtrace. The problem
> reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host
> running 5.4.x.
> 
> Backtrace attached.

Thanks for the report. We need to get more data about the first BUG if
we can. What function in your vmlinux contains address
0xc00000000017a778? (Use nm, objdump, etc.) Is that the first message you get,
with no warnings or anything else earlier in the dmesg?

Also 0xc0000000002659a0 would be interesting.

When reproducing, do you ever get a clean trace of the first bug? Could
you try setting /proc/sys/kernel/panic_on_oops and reproducing?

Thanks,
Nick



* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
  2020-02-26  4:02 ` Nicholas Piggin
@ 2020-02-26  4:06 ` bugzilla-daemon
  2020-02-26  7:26 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26  4:06 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #1 from npiggin@gmail.com ---
bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 1:26 am:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
>             Bug ID: 206669
>            Summary: Little-endian kernel crashing on POWER8 on heavy
>                     big-endian PowerKVM load
>            Product: Platform Specific/Hardware
>            Version: 2.5
>     Kernel Version: 5.4.x
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PPC-64
>           Assignee: platform_ppc-64@kernel-bugs.osdl.org
>           Reporter: glaubitz@physik.fu-berlin.de
>                 CC: matorola@gmail.com
>         Regression: No
> 
> Created attachment 287605
>   --> https://bugzilla.kernel.org/attachment.cgi?id=287605&action=edit
> Backtrace of host system crashing with little-endian kernel
> 
> We have an IBM POWER server (8247-42L) running Linux kernel 5.4.13 on Debian
> unstable hosting a big-endian ppc64 virtual machine running the same kernel in
> big-endian mode.
> 
> When building OpenJDK-11 on the big-endian VM, the testsuite crashes the *host*
> system which is little-endian with the following kernel backtrace. The problem
> reproduces both with kernel 4.19.98 as well as 5.4.13, both guest and host
> running 5.4.x.
> 
> Backtrace attached.

Thanks for the report. We need to get more data about the first BUG if
we can. What function in your vmlinux contains address
0xc00000000017a778? (Use nm, objdump, etc.) Is that the first message you get,
with no warnings or anything else earlier in the dmesg?

Also 0xc0000000002659a0 would be interesting.

When reproducing, do you ever get a clean trace of the first bug? Could
you try setting /proc/sys/kernel/panic_on_oops and reproducing?

Thanks,
Nick


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
  2020-02-26  4:02 ` Nicholas Piggin
  2020-02-26  4:06 ` [Bug 206669] " bugzilla-daemon
@ 2020-02-26  7:26 ` bugzilla-daemon
  2020-02-26  9:25   ` Nicholas Piggin
  2020-02-26  9:29 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26  7:26 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #2 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
(In reply to npiggin from comment #1)
> Thanks for the report, we need to get more data about the first BUG if 
> we can. What function in your vmlinux contains address 
> 0xc00000000017a778? (use nm or objdump etc)

Seems to be select_task_rq_fair:

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c00000000017a
c000000000448550 T select_estimate_accuracy
c000000000170d20 t select_fallback_rq
c000000000e4c940 D select_idle_mask
c000000000179f10 t select_idle_sibling
c00000000018fd80 t select_task_rq_dl
c00000000017a640 t select_task_rq_fair
c000000000177f50 t select_task_rq_idle
c00000000018c9e0 t select_task_rq_rt
c00000000019c800 t select_task_rq_stop
c000000000927710 t selem_alloc.isra.6
c000000000926e50 t selem_link_map
root@watson:/boot#
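
The same lookup can be done without reading through grep context. This is a minimal sketch, assuming nm prints 16-digit lowercase hex addresses as in the output above; `sym_for_addr` is a hypothetical helper name, not part of binutils or the kernel tree:

```shell
# sym_for_addr ADDR: read `nm` output ("<addr> <type> <name>") on stdin and
# print the symbol with the highest start address that is <= ADDR.
# Hypothetical helper; assumes 16-digit lowercase hex addresses, so a plain
# string comparison orders them the same way as their numeric values.
sym_for_addr() {
    target=$(printf '%s' "$1" | sed 's/^0x//' | tr 'A-F' 'a-f')
    LC_ALL=C sort | LC_ALL=C awk -v t="$target" '
        NF == 3 && length($1) == 16 {
            # Appending "" forces a string comparison, not a numeric one.
            if (($1 "") <= (t "")) { name = $3; start = $1 }
        }
        END { if (name != "") print name, start }'
}

# Usage (file name taken from the command above):
#   nm vmlinux-5.4.0-0.bpo.3-powerpc64le | sym_for_addr 0xc00000000017a778
```

Fed the eleven symbol lines shown above, it prints `select_task_rq_fair c00000000017a640`. Lines without an address (undefined symbols) are skipped.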

> Is that the first message you get,
> with no warnings or anything else earlier in the dmesg?

Correct. You can see the login prompt of the host VM watson directly after
booting up.

> Also 0xc0000000002659a0 would be interesting.

Looks like that's ring_buffer_record_off:

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c0000000002659
c0000000002667e0 T ring_buffer_read_finish
c00000000026b4b0 T ring_buffer_read_page
c000000000265e10 T ring_buffer_read_prepare
c000000000265ef0 T ring_buffer_read_prepare_sync
c000000000269ae0 T ring_buffer_read_start
c000000000265950 T ring_buffer_record_disable
c000000000266070 T ring_buffer_record_disable_cpu
c000000000265970 T ring_buffer_record_enable
c0000000002660c0 T ring_buffer_record_enable_cpu
c00000000026d470 T ring_buffer_record_is_on
c00000000026d480 T ring_buffer_record_is_set_on
c000000000265990 T ring_buffer_record_off
c000000000265a10 T ring_buffer_record_on
c000000000266da0 T ring_buffer_reset
c000000000266a90 T ring_buffer_reset_cpu
c000000000267cd0 T ring_buffer_resize
c00000000026d400 T ring_buffer_set_clock
root@watson:/boot#

FWIW, the kernel image comes from this Debian package:

>
> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

> When reproducing, do you ever get a clean trace of the first bug?

I have logged everything that showed in the console during and after the crash.
After that, the machine no longer responds and has to be hard-reset.

> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?

I will try that.

Anything to be considered for the kernel running inside the big-endian VM?


* Re: [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-26  7:26 ` bugzilla-daemon
@ 2020-02-26  9:25   ` Nicholas Piggin
  0 siblings, 0 replies; 24+ messages in thread
From: Nicholas Piggin @ 2020-02-26  9:25 UTC (permalink / raw)
  To: bugzilla-daemon, linuxppc-dev

bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 5:26 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #2 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> (In reply to npiggin from comment #1)
>> Thanks for the report, we need to get more data about the first BUG if 
>> we can. What function in your vmlinux contains address 
>> 0xc00000000017a778? (use nm or objdump etc)
> 
> Seems to be select_task_rq_fair:
> 
> root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c00000000017a
> c000000000448550 T select_estimate_accuracy
> c000000000170d20 t select_fallback_rq
> c000000000e4c940 D select_idle_mask
> c000000000179f10 t select_idle_sibling
> c00000000018fd80 t select_task_rq_dl
> c00000000017a640 t select_task_rq_fair
> c000000000177f50 t select_task_rq_idle
> c00000000018c9e0 t select_task_rq_rt
> c00000000019c800 t select_task_rq_stop
> c000000000927710 t selem_alloc.isra.6
> c000000000926e50 t selem_link_map
> root@watson:/boot#
> 
>> Is that the first message you get,
>> with no warnings or anything else earlier in the dmesg?
> 
> Correct. You can see the login prompt of the host VM watson directly after
> booting up.
> 
>> Also 0xc0000000002659a0 would be interesting.
> 
> Looks like that's ring_buffer_record_off:
> 
> root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c0000000002659
> c0000000002667e0 T ring_buffer_read_finish
> c00000000026b4b0 T ring_buffer_read_page
> c000000000265e10 T ring_buffer_read_prepare
> c000000000265ef0 T ring_buffer_read_prepare_sync
> c000000000269ae0 T ring_buffer_read_start
> c000000000265950 T ring_buffer_record_disable
> c000000000266070 T ring_buffer_record_disable_cpu
> c000000000265970 T ring_buffer_record_enable
> c0000000002660c0 T ring_buffer_record_enable_cpu
> c00000000026d470 T ring_buffer_record_is_on
> c00000000026d480 T ring_buffer_record_is_set_on
> c000000000265990 T ring_buffer_record_off
> c000000000265a10 T ring_buffer_record_on
> c000000000266da0 T ring_buffer_reset
> c000000000266a90 T ring_buffer_reset_cpu
> c000000000267cd0 T ring_buffer_resize
> c00000000026d400 T ring_buffer_set_clock
> root@watson:/boot#

Thanks.

Okay, it looks like what's happening here is that something crashes in
select_task_rq_fair (kernel data access fault). It's then able to
print out those first two lines, but then it calls die(), which
ends up calling oops_enter(), which calls tracing_off(), which calls
tracer_tracing_off() and crashes there. That goes around the same
cycle, each time printing out only the first two lines.

Nothing obvious as to why those accesses in particular are crashing.
The first data address is 0xc000000002bfd038, the second is
0xc0000007f9070c08. Not vmalloc space, not above the 1TB segment.

Do you have tracing / ftrace enabled in the host kernel for any
reason? Turning that off might let the oops message get printed.

> 
> FWIW, the kernel image comes from this Debian package:
> 
>>
>> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

Okay. Any chance you could test an upstream kernel? 
> 
>> When reproducing, do you ever get a clean trace of the first bug?
> 
> I have logged everything that showed in the console during and after the crash.
> After that, the machine no longer responds and has to be hard-reset.
> 
>> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?
> 
> I will try that.

Don't bother testing that after the above -- panic_on_oops happens
after oops_begin(), so it won't help unfortunately.

Attempting to get into xmon might though, if you boot with xmon=on.
Try that if tracing wasn't enabled, or disabling it doesn't help.

> 
> Anything to be considered for the kernel running inside the big-endian VM?
> 

Not that I'm aware of really. Certainly it shouldn't be able to crash
the host even if the guest was doing something stupid.

Thanks,
Nick


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-02-26  7:26 ` bugzilla-daemon
@ 2020-02-26  9:29 ` bugzilla-daemon
  2020-02-26 10:28 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26  9:29 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #3 from npiggin@gmail.com ---
bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 5:26 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #2 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> (In reply to npiggin from comment #1)
>> Thanks for the report, we need to get more data about the first BUG if 
>> we can. What function in your vmlinux contains address 
>> 0xc00000000017a778? (use nm or objdump etc)
> 
> Seems to be select_task_rq_fair:
> 
> root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c00000000017a
> c000000000448550 T select_estimate_accuracy
> c000000000170d20 t select_fallback_rq
> c000000000e4c940 D select_idle_mask
> c000000000179f10 t select_idle_sibling
> c00000000018fd80 t select_task_rq_dl
> c00000000017a640 t select_task_rq_fair
> c000000000177f50 t select_task_rq_idle
> c00000000018c9e0 t select_task_rq_rt
> c00000000019c800 t select_task_rq_stop
> c000000000927710 t selem_alloc.isra.6
> c000000000926e50 t selem_link_map
> root@watson:/boot#
> 
>> Is that the first message you get,
>> with no warnings or anything else earlier in the dmesg?
> 
> Correct. You can see the login prompt of the host VM watson directly after
> booting up.
> 
>> Also 0xc0000000002659a0 would be interesting.
> 
> Looks like that's ring_buffer_record_off:
> 
> root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c0000000002659
> c0000000002667e0 T ring_buffer_read_finish
> c00000000026b4b0 T ring_buffer_read_page
> c000000000265e10 T ring_buffer_read_prepare
> c000000000265ef0 T ring_buffer_read_prepare_sync
> c000000000269ae0 T ring_buffer_read_start
> c000000000265950 T ring_buffer_record_disable
> c000000000266070 T ring_buffer_record_disable_cpu
> c000000000265970 T ring_buffer_record_enable
> c0000000002660c0 T ring_buffer_record_enable_cpu
> c00000000026d470 T ring_buffer_record_is_on
> c00000000026d480 T ring_buffer_record_is_set_on
> c000000000265990 T ring_buffer_record_off
> c000000000265a10 T ring_buffer_record_on
> c000000000266da0 T ring_buffer_reset
> c000000000266a90 T ring_buffer_reset_cpu
> c000000000267cd0 T ring_buffer_resize
> c00000000026d400 T ring_buffer_set_clock
> root@watson:/boot#

Thanks.

Okay, it looks like what's happening here is that something crashes in
select_task_rq_fair (kernel data access fault). It's then able to
print out those first two lines, but then it calls die(), which
ends up calling oops_enter(), which calls tracing_off(), which calls
tracer_tracing_off() and crashes there. That goes around the same
cycle, each time printing out only the first two lines.

Nothing obvious as to why those accesses in particular are crashing.
The first data address is 0xc000000002bfd038, the second is
0xc0000007f9070c08. Not vmalloc space, not above the 1TB segment.

Do you have tracing / ftrace enabled in the host kernel for any
reason? Turning that off might let the oops message get printed.

> 
> FWIW, the kernel image comes from this Debian package:
> 
>>
>>
>> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

Okay. Any chance you could test an upstream kernel? 
> 
>> When reproducing, do you ever get a clean trace of the first bug?
> 
> I have logged everything that showed in the console during and after the crash.
> After that, the machine no longer responds and has to be hard-reset.
> 
>> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?
> 
> I will try that.

Don't bother testing that after the above -- panic_on_oops happens
after oops_begin(), so it won't help unfortunately.

Attempting to get into xmon might though, if you boot with xmon=on.
Try that if tracing wasn't enabled, or disabling it doesn't help.

> 
> Anything to be considered for the kernel running inside the big-endian VM?
> 

Not that I'm aware of really. Certainly it shouldn't be able to crash
the host even if the guest was doing something stupid.

Thanks,
Nick


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-02-26  9:29 ` bugzilla-daemon
@ 2020-02-26 10:28 ` bugzilla-daemon
  2020-02-26 11:03   ` Nicholas Piggin
  2020-02-26 11:08 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26 10:28 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #4 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
(In reply to npiggin from comment #3)
> Do you have tracing / ftrace enabled in the host kernel for any
> reason? Turning that off might let the oops message get printed.

Seems that this is the case in the Debian kernel, yes:

root@watson:~# grep -i ftrace /boot/config-5.4.0-0.bpo.3-powerpc64le 
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
root@watson:~#
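
Note that these options only show that ftrace support is compiled in; whether the ring buffer is actually recording is a runtime setting. For checking a single option exactly, so that CONFIG_FTRACE does not also match CONFIG_FTRACE_SYSCALLS, a small sketch follows. `kconfig_enabled` is a hypothetical helper name, and it only recognises `=y`, not `=m`:

```shell
# kconfig_enabled FILE OPTION: succeed only if FILE has the exact line OPTION=y.
# Hypothetical helper; -x matches whole lines, -F treats the pattern literally.
kconfig_enabled() {
    grep -q -x -F -- "$2=y" "$1"
}

# Usage (config path taken from the command above):
#   kconfig_enabled /boot/config-5.4.0-0.bpo.3-powerpc64le CONFIG_FTRACE \
#       && echo "ftrace is built in"
```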

Do you have the kernel command-line option at hand that disables ftrace?
Is it just ftrace=off?

> > FWIW, the kernel image comes from this Debian package:
> > 
> >>
> >>
> >>
> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb
> 
> Okay. Any chance you could test an upstream kernel? 

Sure, absolutely. Any preference on the version number?

> Don't bother testing that after the above -- panic_on_oops happens
> after oops_begin(), so it won't help unfortunately.

Okay.

> Attempting to get into xmon might though, if you boot with xmon=on.
> Try that if tracing wasn't enabled, or disabling it doesn't help.

Okay. I will try to disable ftrace first, then retrigger the crash.

> > 
> > Anything to be considered for the kernel running inside the big-endian VM?
> > 
> 
> Not that I'm aware of really. Certainly it shouldn't be able to crash
> the host even if the guest was doing something stupid.

I agree.


* Re: [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-26 10:28 ` bugzilla-daemon
@ 2020-02-26 11:03   ` Nicholas Piggin
  0 siblings, 0 replies; 24+ messages in thread
From: Nicholas Piggin @ 2020-02-26 11:03 UTC (permalink / raw)
  To: bugzilla-daemon, linuxppc-dev

bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 8:28 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #4 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> (In reply to npiggin from comment #3)
>> Do you have tracing / ftrace enabled in the host kernel for any
>> reason? Turning that off might let the oops message get printed.
> 
> Seems that this is the case in the Debian kernel, yes:
> 
> root@watson:~# grep -i ftrace /boot/config-5.4.0-0.bpo.3-powerpc64le 
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_HAVE_KPROBES_ON_FTRACE=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_FTRACE=y
> CONFIG_FTRACE_SYSCALLS=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> # CONFIG_FTRACE_STARTUP_TEST is not set
> root@watson:~#
> 
> Do you have the kernel command option at hand which disables ftrace on the
> command line? Is it just ftrace=off?

Hmm, not sure, Documentation/admin-guide/kernel-parameters.txt seems
to say that wouldn't work.

I thought it might only be going down that path if you have already done
some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
you start the test.
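
As a rough sketch, the two steps could be wrapped like this. The function name `quiesce_tracing` and its optional directory argument are illustrative assumptions; the paths are the ones named above, and it would need to run as root with debugfs mounted:

```shell
# quiesce_tracing [DIR]: stop the ftrace ring buffer from recording, write to
# free_buffer, and print the resulting tracing_on value.
# Hypothetical helper; DIR defaults to the debugfs tracing directory.
quiesce_tracing() {
    dir=${1:-/sys/kernel/debug/tracing}
    echo 0 > "$dir/tracing_on" || return 1
    echo 1 > "$dir/free_buffer" || return 1
    cat "$dir/tracing_on"
}
```

If the final line printed is not 0, tracing is still recording and the test is not yet in the intended state.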

>> > FWIW, the kernel image comes from this Debian package:
>> > 
>> >>
>> >>
>> >>
>> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb
>> 
>> Okay. Any chance you could test an upstream kernel? 
> 
> Sure, absolutely. Any preference on the version number?

Current head if you're feeling lucky, but v5.5 if not. But you can
give the ftrace test a try with the debian kernel first if you've got
it ready to go.

>> Don't bother testing that after the above -- panic_on_oops happens
>> after oops_begin(), so it won't help unfortunately.
> 
> Okay.
> 
>> Attempting to get into xmon might though, if you boot with xmon=on.
>> Try that if tracing wasn't enabled, or disabling it doesn't help.
> 
> Okay. I will try to disable ftrace first, then retrigger the crash.

Cool

Thanks,
Nick


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-02-26 10:28 ` bugzilla-daemon
@ 2020-02-26 11:08 ` bugzilla-daemon
  2020-02-26 12:02 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26 11:08 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #5 from npiggin@gmail.com ---
bugzilla-daemon@bugzilla.kernel.org's on February 26, 2020 8:28 pm:
> https://bugzilla.kernel.org/show_bug.cgi?id=206669
> 
> --- Comment #4 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
> (In reply to npiggin from comment #3)
>> Do you have tracing / ftrace enabled in the host kernel for any
>> reason? Turning that off might let the oops message get printed.
> 
> Seems that this is the case in the Debian kernel, yes:
> 
> root@watson:~# grep -i ftrace /boot/config-5.4.0-0.bpo.3-powerpc64le 
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_HAVE_KPROBES_ON_FTRACE=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_FTRACE=y
> CONFIG_FTRACE_SYSCALLS=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> # CONFIG_FTRACE_STARTUP_TEST is not set
> root@watson:~#
> 
> Do you have the kernel command option at hand which disables ftrace on the
> command line? Is it just ftrace=off?

Hmm, not sure, Documentation/admin-guide/kernel-parameters.txt seems
to say that wouldn't work.

I thought it might only be going down that path if you have already done
some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
you start the test.

>> > FWIW, the kernel image comes from this Debian package:
>> > 
>> >>
>> >>
>> >>
>>
>> http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb
>> 
>> Okay. Any chance you could test an upstream kernel? 
> 
> Sure, absolutely. Any preference on the version number?

Current head if you're feeling lucky, but v5.5 if not. But you can
give the ftrace test a try with the debian kernel first if you've got
it ready to go.

>> Don't bother testing that after the above -- panic_on_oops happens
>> after oops_begin(), so it won't help unfortunately.
> 
> Okay.
> 
>> Attempting to get into xmon might though, if you boot with xmon=on.
>> Try that if tracing wasn't enabled, or disabling it doesn't help.
> 
> Okay. I will try to disable ftrace first, then retrigger the crash.

Cool

Thanks,
Nick


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-02-26 11:08 ` bugzilla-daemon
@ 2020-02-26 12:02 ` bugzilla-daemon
  2020-02-27 16:07 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-26 12:02 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #6 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
(In reply to npiggin from comment #5)
> I thought it might only be going down that path if you have already done
> some tracing. Perhaps ensure /sys/kernel/debug/tracing/tracing_on is set
> to 0, and then `echo 1 > /sys/kernel/debug/tracing/free_buffer` before
> you start the test.

I have done this now and I'm performing the test. Let's see if we can get some
more output.

> >> Okay. Any chance you could test an upstream kernel? 
> > 
> > Sure, absolutely. Any preference on the version number?
> 
> Current head if you're feeling lucky, but v5.5 if not. But you can
> give the ftrace test a try with the debian kernel first if you've got
> it ready to go.

I think I will try the latest 5.5.x first, after the test with the Debian
kernel and tracing turned off.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-02-26 12:02 ` bugzilla-daemon
@ 2020-02-27 16:07 ` bugzilla-daemon
  2020-03-07 21:56 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-02-27 16:07 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #7 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
I have set /sys/kernel/debug/tracing/tracing_on to "0" and
/sys/kernel/debug/tracing/free_buffer to "1" and it seems I can no longer
reproduce the issue.

I will have to do more testing to see if that's just an artifact or really
related.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-02-27 16:07 ` bugzilla-daemon
@ 2020-03-07 21:56 ` bugzilla-daemon
  2020-03-10 12:25 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-07 21:56 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #8 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
Created attachment 287823
  --> https://bugzilla.kernel.org/attachment.cgi?id=287823&action=edit
kern.log containing some crash dumps

I have another trace of the crash. Not sure whether this was with tracing
disabled.

Does that help?


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-03-07 21:56 ` bugzilla-daemon
@ 2020-03-10 12:25 ` bugzilla-daemon
  2020-03-10 12:28 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-10 12:25 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

Aneesh Kumar KV (aneesh.kumar@linux.ibm.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aneesh.kumar@linux.ibm.com

--- Comment #9 from Aneesh Kumar KV (aneesh.kumar@linux.ibm.com) ---
Also, can you try disabling THP?

echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
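
A minimal sketch of checking and changing the THP mode, assuming the usual sysfs layout in which the active mode is the bracketed word (for example "always madvise [never]"). The function names and the THP_FILE override are illustrative and not part of the original request; the override only exists so the helpers can be tried on a scratch file first.

```shell
#!/bin/sh
# Sketch only: helper names and THP_FILE are assumptions for illustration.
THP_FILE="${THP_FILE:-/sys/kernel/mm/transparent_hugepage/enabled}"

# Print the active THP mode, i.e. the word shown in [brackets].
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "${1:-$THP_FILE}"
}

# Write "never" to the control file (needs root on a real system).
thp_disable() {
    echo never > "${1:-$THP_FILE}"
}
```

Calling thp_mode before and after thp_disable shows whether the change took effect. A write to sysfs does not survive a reboot; booting with transparent_hugepage=never on the kernel command line is the usual way to make it persistent.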

-aneesh


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (9 preceding siblings ...)
  2020-03-10 12:25 ` bugzilla-daemon
@ 2020-03-10 12:28 ` bugzilla-daemon
  2020-03-14 10:08 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-10 12:28 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #10 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
(In reply to Aneesh Kumar KV from comment #9)
> Also, can you try disabling THP?
> echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

Yes. Just disabled.

FWIW, the machine just crashed some minutes ago but I didn't have a serial
console open to capture the trace.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (10 preceding siblings ...)
  2020-03-10 12:28 ` bugzilla-daemon
@ 2020-03-14 10:08 ` bugzilla-daemon
  2020-03-16 18:13 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-14 10:08 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #11 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
It seems I can provoke the crash by running the glibc testsuite in a big-endian
guest VM.

The machine just crashed with the IPMI console open but the only message the
kernel printed was:

"watson login: [ 1809.138398] KVM: couldn't grab cpu 115"

I was not watching the kernel log buffer at the time, though. I will try to
provoke the crash again with the kernel log open.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (11 preceding siblings ...)
  2020-03-14 10:08 ` bugzilla-daemon
@ 2020-03-16 18:13 ` bugzilla-daemon
  2020-03-19 20:22 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-16 18:13 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #12 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
Another crash:

watson login: [17667512263.751484] BUG: Unable to handle kernel data access at
0xc000000ff06e4838
[17667512263.751507] Faulting instruction address: 0xc00000000017a778
[17667512263.751513] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751517] Faulting instruction address: 0xc0000000002659a0
[17667512263.751521] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751525] Faulting instruction address: 0xc0000000002659a0
[17667512263.751529] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751533] Faulting instruction address: 0xc0000000002659a0
[17667512263.751537] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751541] Faulting instruction address: 0xc0000000002659a0
[17667512263.751545] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751548] Faulting instruction address: 0xc0000000002659a0
[17667512263.751552] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751556] Faulting instruction address: 0xc0000000002659a0
[17667512263.751560] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751564] Faulting instruction address: 0xc0000000002659a0
[17667512263.751569] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751574] Faulting instruction address: 0xc0000000002659a0
[17667512263.751578] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751583] Faulting instruction address: 0xc0000000002659a0
[17667512263.751587] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751591] Faulting instruction address: 0xc0000000002659a0
[17667512263.751596] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751600] Faulting instruction address: 0xc0000000002659a0
[17667512263.751604] Thread overran stack, or stack corrupted
[17667512263.751608] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[17667512263.751612] Faulting instruction address: 0xc0000000002659a0
[17667512263.751615] Thread overran stack, or stack corrupted
[17667512263.751618] BUG: Unable to handle kernel data access at
0xc0000007f9070c08
[ 1835.743178] BUG: Unable to handle unknown paging fault at 0xc000000000c4b363
[ 1835.743180] Faulting instruction address: 0x00000000
[17667512263.751633] Faulting instruction address: 0xc0000000002659a0
[ 1835.743195] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1835.743198] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 1835.743203] Modules linked in:
[17667512263.751652] Thread overran stack, or stack corrupted
[ 1835.743205]
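
Most of the repeated faults in the log above report the same faulting instruction address. A small sketch for tallying those addresses from a saved copy of the console output; the function name and the console.log file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: count how often each faulting instruction address occurs
# in a saved console log, most frequent first.
fault_summary() {
    sed -n 's/.*Faulting instruction address: \(0x[0-9a-f]*\).*/\1/p' "$1" \
        | sort | uniq -c | sort -rn
}
```

Running fault_summary console.log prints one line per distinct address with its count, which makes it easier to see which address to resolve against the kernel image first.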


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (12 preceding siblings ...)
  2020-03-16 18:13 ` bugzilla-daemon
@ 2020-03-19 20:22 ` bugzilla-daemon
  2021-09-10  9:40 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2020-03-19 20:22 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #13 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
watson login: [30887.552539] KVM: CPU 4 seems to be stuck
[30900.094713] watchdog: CPU 8 detected hard LOCKUP on other CPUs 0
[30900.094730] watchdog: CPU 8 TB:15863742878763, last SMP heartbeat
TB:15855546563837 (16008ms ago)
[30908.222926] watchdog: BUG: soft lockup - CPU#80 stuck for 22s! [CPU
4/KVM:2698]
[30908.374929] watchdog: BUG: soft lockup - CPU#112 stuck for 22s! [CPU
23/KVM:2717]
[30908.426934] watchdog: BUG: soft lockup - CPU#120 stuck for 22s! [CPU
16/KVM:2710]
[30909.570962] rcu: INFO: rcu_sched self-detected stall on CPU
[30909.570970] rcu:     120-....: (5059 ticks this GP)
idle=7d2/1/0x4000000000000002 softirq=421758/421758 fqs=2378 
[30912.095025] watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [CPU
18/KVM:2712]
[30912.127027] watchdog: BUG: soft lockup - CPU#40 stuck for 23s! [CPU
22/KVM:2716]
[30912.155026] watchdog: BUG: soft lockup - CPU#56 stuck for 23s! [CPU
27/KVM:2721]
[30912.175028] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [CPU
26/KVM:2720]
[30912.195028] watchdog: BUG: soft lockup - CPU#72 stuck for 23s! [CPU
19/KVM:2713]
[30912.547038] watchdog: BUG: soft lockup - CPU#136 stuck for 22s! [CPU
8/KVM:2702]
[30912.619040] watchdog: BUG: soft lockup - CPU#144 stuck for 22s! [CPU
5/KVM:2699]


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (13 preceding siblings ...)
  2020-03-19 20:22 ` bugzilla-daemon
@ 2021-09-10  9:40 ` bugzilla-daemon
  2021-09-13  8:52 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-09-10  9:40 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #14 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
Still reproduces with Linux 5.10.46 from Debian Bullseye.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (14 preceding siblings ...)
  2021-09-10  9:40 ` bugzilla-daemon
@ 2021-09-13  8:52 ` bugzilla-daemon
  2021-09-13  9:24 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-09-13  8:52 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
                 CC|                            |michael@ellerman.id.au

--- Comment #15 from Michael Ellerman (michael@ellerman.id.au) ---
After a day and a half I have managed to get BE debian installed in a VM :}

You said "running the glibc testsuite" was enough to trigger it. Do you mean
from the glibc git tree? I can't get upstream, or the debian packaged glibc
sources to build.

Both fail building with:

../include/setjmp.h:42:3: error: static assertion failed: "size of jmp_buf !=
656"
   42 |   _Static_assert (sizeof (type) == size, \
      |   ^~~~~~~~~~~~~~

I guess I'm doing something wrong.

Any pointers on what your setup is?


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (15 preceding siblings ...)
  2021-09-13  8:52 ` bugzilla-daemon
@ 2021-09-13  9:24 ` bugzilla-daemon
  2021-09-30 15:44 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-09-13  9:24 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #16 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
Hi Michael!

Thanks a lot for looking into this!

If you have installed a Debian unstable big-endian system, the easiest way to
get such a setup is to create an sbuild chroot. You should set up an sbuild
chroot for both powerpc and ppc64:

$ sbuild-createchroot --arch=powerpc
$ sbuild-createchroot --arch=ppc64

and then build the glibc package using sbuild for both powerpc and ppc64 in
parallel, which is what makes the VM and the host crash during the testsuite:

$ dget -u https://deb.debian.org/debian/pool/main/g/glibc/glibc_2.32-2.dsc

In one shell:

$ sbuild -d sid --arch=ppc64 --no-arch-all glibc_2.32-2.dsc

and in a second one:

$ sbuild -d sid --arch=powerpc --no-arch-all glibc_2.32-2.dsc

If glibc doesn't trigger the crash, try gcc-11 or llvm-toolchain-13:

$ dget -u https://deb.debian.org/debian/pool/main/l/llvm-toolchain-13/llvm-toolchain-13_13.0.0~+rc2-3.dsc
$ dget -u https://deb.debian.org/debian/pool/main/g/gcc-11/gcc-11_11.2.0-5.dsc
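
The two-shell procedure above can also be driven from one script. This is a sketch under assumptions: the build_both function, the SBUILD override, and the build-<arch>.log file names are illustrative; only the sbuild options are taken from the commands above.

```shell
#!/bin/sh
# Sketch only: run the ppc64 and powerpc builds of one source package in
# parallel. Set SBUILD=echo to preview the commands without building.
SBUILD="${SBUILD:-sbuild}"

build_both() {
    dsc="$1"
    for arch in ppc64 powerpc; do
        # Start each build in the background with its own log file.
        "$SBUILD" -d sid --arch="$arch" --no-arch-all "$dsc" \
            > "build-$arch.log" 2>&1 &
    done
    # Wait until both background builds have exited.
    wait
}
```

For example, build_both glibc_2.32-2.dsc starts both builds at once, so the two testsuites are likely to overlap in time, which is the condition described above.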


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (16 preceding siblings ...)
  2021-09-13  9:24 ` bugzilla-daemon
@ 2021-09-30 15:44 ` bugzilla-daemon
  2021-10-25  8:45 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-09-30 15:44 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #17 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
The POWER server crashes with 100% reproducibility when building GCC in a
powerpc chroot and GCC in a ppc64 chroot on the ppc64 KVM instance at the same
time.

And I assume it's the testsuite that kills both the KVM instance and the host
system.


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (17 preceding siblings ...)
  2021-09-30 15:44 ` bugzilla-daemon
@ 2021-10-25  8:45 ` bugzilla-daemon
  2022-01-31  9:53 ` bugzilla-daemon
  2022-07-29  7:02 ` bugzilla-daemon
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2021-10-25  8:45 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #18 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
There seems to be a related discussion:

> https://yhbt.net/lore/all/20200831091523.GC29521@kitsune.suse.cz/T/

That discussion suspects commit 10d91611f426d4bafd2a83d966c36da811b2f7ad to be
the cause:

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10d91611f426d4bafd2a83d966c36da811b2f7ad


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (18 preceding siblings ...)
  2021-10-25  8:45 ` bugzilla-daemon
@ 2022-01-31  9:53 ` bugzilla-daemon
  2022-07-29  7:02 ` bugzilla-daemon
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-01-31  9:53 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #19 from John Paul Adrian Glaubitz (glaubitz@physik.fu-berlin.de) ---
In case anyone runs into this issue, a workaround is to disable
"dynamic_mt_modes":

# echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes

This fixes the crashes for me with a 5.15.x kernel.
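
A sketch of applying and persisting the workaround, assuming kvm_hv is built as a loadable module. The helper names and the PARAM override are hypothetical; the sysfs path is the one given above.

```shell
#!/bin/sh
# Sketch only: helper names and PARAM are assumptions for illustration.
PARAM="${PARAM:-/sys/module/kvm_hv/parameters/dynamic_mt_modes}"

# Print the current value of the parameter.
mt_modes_show() {
    cat "${1:-$PARAM}"
}

# Write 0 to the parameter (needs root on a real system).
mt_modes_disable() {
    echo 0 > "${1:-$PARAM}"
}

# Print a modprobe.d line that sets the same value at module load time.
mt_modes_modprobe_line() {
    echo "options kvm_hv dynamic_mt_modes=0"
}
```

The sysfs write is lost on reboot. Redirecting the output of mt_modes_modprobe_line into a file under /etc/modprobe.d/ (the file name is up to the administrator) should reapply it whenever the module is loaded; if kvm_hv is built into the kernel, the equivalent would be kvm_hv.dynamic_mt_modes=0 on the kernel command line.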


* [Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load
  2020-02-25 15:26 [Bug 206669] New: Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load bugzilla-daemon
                   ` (19 preceding siblings ...)
  2022-01-31  9:53 ` bugzilla-daemon
@ 2022-07-29  7:02 ` bugzilla-daemon
  20 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2022-07-29  7:02 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=206669

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED

