linuxppc-dev.lists.ozlabs.org archive mirror
* Linux kernel: powerpc: KVM guest can trigger host crash on Power8
@ 2021-10-25 11:18 Michael Ellerman
  2021-10-26  8:48 ` John Paul Adrian Glaubitz
  2021-10-28  3:58 ` [oss-security] " Salvatore Bonaccorso
  0 siblings, 2 replies; 29+ messages in thread
From: Michael Ellerman @ 2021-10-25 11:18 UTC (permalink / raw)
  To: oss-security; +Cc: linuxppc-dev

The Linux kernel for powerpc since v5.2 has a bug which allows a
malicious KVM guest to crash the host, when the host is running on
Power8.

Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
metal, are affected by the bug. Machines running PowerVM are not
affected.

The bug was introduced in:

    10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")

Which was first released in v5.2.

The upstream fix is:

  cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337

Which will be included in the v5.16 release.

Note to backporters, the following commits are required:

  73287caa9210ded6066833195f4335f7f688a46b
  ("powerpc64/idle: Fix SP offsets when saving GPRs")

  9b4416c5095c20e110c82ae602c254099b83b72f
  ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")

  cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
  ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")

  496c5fe25c377ddb7815c4ce8ecfb676f051e9b6
  ("powerpc/idle: Don't corrupt back chain when going idle")
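
For anyone auditing a stable or distribution tree, the list above can be checked mechanically. The following is a minimal sketch, not part of the original advisory. It assumes the tree follows the usual stable-kernel convention of quoting the full upstream SHA in each backport's commit message; a tree that does not follow that convention would need a different check. The function name `missing_backports` is hypothetical.

```python
import re

# Upstream commits listed in the advisory as required for a backport.
REQUIRED_COMMITS = {
    "73287caa9210ded6066833195f4335f7f688a46b":
        "powerpc64/idle: Fix SP offsets when saving GPRs",
    "9b4416c5095c20e110c82ae602c254099b83b72f":
        "KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()",
    "cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337":
        "KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest",
    "496c5fe25c377ddb7815c4ce8ecfb676f051e9b6":
        "powerpc/idle: Don't corrupt back chain when going idle",
}


def missing_backports(log_text, required=REQUIRED_COMMITS):
    """Return the required upstream SHAs that log_text never mentions.

    log_text is expected to hold commit hashes and commit messages,
    for example the output of: git log --format='%H%n%B' -- arch/powerpc
    (assumption: backports quote the full 40-character upstream SHA).
    """
    mentioned = set(re.findall(r"\b[0-9a-f]{40}\b", log_text))
    return [sha for sha in required if sha not in mentioned]
```

An empty result only suggests that all four commits are referenced in the log; it does not prove the backports were applied correctly.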


I have a test case to trigger the bug, which I can share privately with
anyone who would like to test the fix.

cheers


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-25 11:18 Linux kernel: powerpc: KVM guest can trigger host crash on Power8 Michael Ellerman
@ 2021-10-26  8:48 ` John Paul Adrian Glaubitz
  2021-10-27  5:29   ` Nicholas Piggin
                     ` (2 more replies)
  2021-10-28  3:58 ` [oss-security] " Salvatore Bonaccorso
  1 sibling, 3 replies; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-26  8:48 UTC (permalink / raw)
  To: mpe; +Cc: oss-security, debian-powerpc, linuxppc-dev

Hi Michael!

> The Linux kernel for powerpc since v5.2 has a bug which allows a
> malicious KVM guest to crash the host, when the host is running on
> Power8.
> 
> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
> metal, are affected by the bug. Machines running PowerVM are not
> affected.
> 
> The bug was introduced in:
> 
>     10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
> 
> Which was first released in v5.2.
> 
> The upstream fix is:
> 
>   cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
> 
> Which will be included in the v5.16 release.

I have tested these patches against 5.14 but it seems the problem [1] still remains for me
for big-endian guests. I built a patched kernel yesterday, rebooted the KVM server and let
the build daemons do their work overnight.

When I got up this morning, I noticed the machine was down, so I checked the serial console
via IPMI and saw the same messages again as reported in [1]:

[41483.963562] watchdog: BUG: soft lockup - CPU#104 stuck for 25521s! [migration/104:175]
[41507.963307] watchdog: BUG: soft lockup - CPU#104 stuck for 25544s! [migration/104:175]
[41518.311200] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41518.311216] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2729959 
[41547.962882] watchdog: BUG: soft lockup - CPU#104 stuck for 25581s! [migration/104:175]
[41571.962627] watchdog: BUG: soft lockup - CPU#104 stuck for 25603s! [migration/104:175]
[41581.330530] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41581.330546] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2736378 
[41611.962202] watchdog: BUG: soft lockup - CPU#104 stuck for 25641s! [migration/104:175]
[41635.961947] watchdog: BUG: soft lockup - CPU#104 stuck for 25663s! [migration/104:175]
[41644.349859] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41644.349876] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2742753 
[41671.961564] watchdog: BUG: soft lockup - CPU#104 stuck for 25697s! [migration/104:175]
[41695.961309] watchdog: BUG: soft lockup - CPU#104 stuck for 25719s! [migration/104:175]
[41707.369190] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41707.369206] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2749151 
[41735.960884] watchdog: BUG: soft lockup - CPU#104 stuck for 25756s! [migration/104:175]
[41759.960629] watchdog: BUG: soft lockup - CPU#104 stuck for 25778s! [migration/104:175]
[41770.388520] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41770.388548] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2755540 
[41776.076307] rcu: rcu_sched kthread timer wakeup didn't happen for 1423 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[41776.076327] rcu:     Possible timer handling issue on cpu=32 timer-softirq=1056014
[41776.076336] rcu: rcu_sched kthread starved for 1424 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=32
[41776.076350] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[41776.076360] rcu: RCU grace-period kthread stack dump:
[41776.076434] rcu: Stack dump where RCU GP kthread last ran:
[41783.960374] watchdog: BUG: soft lockup - CPU#104 stuck for 25801s! [migration/104:175]
[41807.960119] watchdog: BUG: soft lockup - CPU#104 stuck for 25823s! [migration/104:175]
[41831.959864] watchdog: BUG: soft lockup - CPU#104 stuck for 25846s! [migration/104:175]
[41833.407851] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[41833.407868] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2760381 
[41863.959524] watchdog: BUG: soft lockup - CPU#104 stuck for 25875s! [migration/104:175]
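
The watchdog lines above carry enough information to estimate when each CPU first got stuck: the bracketed uptime minus the reported "stuck for" duration. The following is a rough sketch for reading such a console log, not something from the original message. The regular expression is written against the line format shown here, the name `first_lockups` is hypothetical, and the watchdog's reported duration should be treated as approximate.

```python
import re

# Matches lines such as:
# [41483.963562] watchdog: BUG: soft lockup - CPU#104 stuck for 25521s! [migration/104:175]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)


def first_lockups(log_text):
    """Map each stuck CPU to (task name, estimated uptime in seconds at onset).

    Only the first report per CPU is used; later reports for the same CPU
    describe the same lockup with a longer duration.
    """
    onset = {}
    for m in LOCKUP_RE.finditer(log_text):
        cpu = int(m["cpu"])
        if cpu not in onset:
            started = float(m["ts"]) - int(m["secs"])
            onset[cpu] = (m["task"], round(started))
    return onset
```

Applied to the first line of the log, this puts the onset for CPU#104 at roughly 15963 seconds of uptime.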

It seems that in this case, it was the test suite of the git package [2] that triggered the bug. As you
can see from the overview, the git package has been in the building state for 8 hours, meaning the
build server crashed and is no longer giving feedback to the database.

Adrian

[1] https://bugzilla.kernel.org/show_bug.cgi?id=206669
[2] https://buildd.debian.org/status/package.php?p=git&suite=experimental

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-26  8:48 ` John Paul Adrian Glaubitz
@ 2021-10-27  5:29   ` Nicholas Piggin
  2021-10-27  5:30   ` Michael Ellerman
  2021-10-28 13:52   ` John Paul Adrian Glaubitz
  2 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2021-10-27  5:29 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz, mpe; +Cc: oss-security, debian-powerpc, linuxppc-dev

Excerpts from John Paul Adrian Glaubitz's message of October 26, 2021 6:48 pm:
> Hi Michael!
> 
>> The Linux kernel for powerpc since v5.2 has a bug which allows a
>> malicious KVM guest to crash the host, when the host is running on
>> Power8.
>> 
>> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
>> metal, are affected by the bug. Machines running PowerVM are not
>> affected.
>> 
>> The bug was introduced in:
>> 
>>     10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
>> 
>> Which was first released in v5.2.
>> 
>> The upstream fix is:
>> 
>>   cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
>> 
>> Which will be included in the v5.16 release.
> 
> I have tested these patches against 5.14 but it seems the problem [1] still remains for me
> for big-endian guests. I built a patched kernel yesterday, rebooted the KVM server and let
> the build daemons do their work over night.
> 
> When I got up this morning, I noticed the machine was down, so I checked the serial console
> via IPMI and saw the same messages again as reported in [1]:
> 
> [41483.963562] watchdog: BUG: soft lockup - CPU#104 stuck for 25521s! [migration/104:175]
> [41507.963307] watchdog: BUG: soft lockup - CPU#104 stuck for 25544s! [migration/104:175]
> [41518.311200] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41518.311216] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2729959 
> [41547.962882] watchdog: BUG: soft lockup - CPU#104 stuck for 25581s! [migration/104:175]
> [41571.962627] watchdog: BUG: soft lockup - CPU#104 stuck for 25603s! [migration/104:175]
> [41581.330530] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41581.330546] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2736378 
> [41611.962202] watchdog: BUG: soft lockup - CPU#104 stuck for 25641s! [migration/104:175]
> [41635.961947] watchdog: BUG: soft lockup - CPU#104 stuck for 25663s! [migration/104:175]
> [41644.349859] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41644.349876] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2742753 
> [41671.961564] watchdog: BUG: soft lockup - CPU#104 stuck for 25697s! [migration/104:175]
> [41695.961309] watchdog: BUG: soft lockup - CPU#104 stuck for 25719s! [migration/104:175]
> [41707.369190] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41707.369206] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2749151 
> [41735.960884] watchdog: BUG: soft lockup - CPU#104 stuck for 25756s! [migration/104:175]
> [41759.960629] watchdog: BUG: soft lockup - CPU#104 stuck for 25778s! [migration/104:175]
> [41770.388520] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41770.388548] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2755540 
> [41776.076307] rcu: rcu_sched kthread timer wakeup didn't happen for 1423 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [41776.076327] rcu:     Possible timer handling issue on cpu=32 timer-softirq=1056014
> [41776.076336] rcu: rcu_sched kthread starved for 1424 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=32
> [41776.076350] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [41776.076360] rcu: RCU grace-period kthread stack dump:
> [41776.076434] rcu: Stack dump where RCU GP kthread last ran:
> [41783.960374] watchdog: BUG: soft lockup - CPU#104 stuck for 25801s! [migration/104:175]
> [41807.960119] watchdog: BUG: soft lockup - CPU#104 stuck for 25823s! [migration/104:175]
> [41831.959864] watchdog: BUG: soft lockup - CPU#104 stuck for 25846s! [migration/104:175]
> [41833.407851] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41833.407868] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2760381 
> [41863.959524] watchdog: BUG: soft lockup - CPU#104 stuck for 25875s! [migration/104:175]

I don't suppose you were able to get any more of the log saved? (The 
first error messages that happened might be interesting)

Thanks,
Nick


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-26  8:48 ` John Paul Adrian Glaubitz
  2021-10-27  5:29   ` Nicholas Piggin
@ 2021-10-27  5:30   ` Michael Ellerman
  2021-10-27 10:03     ` John Paul Adrian Glaubitz
  2021-10-28 13:52   ` John Paul Adrian Glaubitz
  2 siblings, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2021-10-27  5:30 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: oss-security, debian-powerpc, linuxppc-dev

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> Hi Michael!

Hi Adrian,

Thanks for testing ...

>> The Linux kernel for powerpc since v5.2 has a bug which allows a
>> malicious KVM guest to crash the host, when the host is running on
>> Power8.
>> 
>> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
>> metal, are affected by the bug. Machines running PowerVM are not
>> affected.
>> 
>> The bug was introduced in:
>> 
>>     10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
>> 
>> Which was first released in v5.2.
>> 
>> The upstream fix is:
>> 
>>   cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
>> 
>> Which will be included in the v5.16 release.
>
> I have tested these patches against 5.14 but it seems the problem [1] still remains for me
> for big-endian guests. I built a patched kernel yesterday, rebooted the KVM server and let
> the build daemons do their work over night.
>
> When I got up this morning, I noticed the machine was down, so I checked the serial console
> via IPMI and saw the same messages again as reported in [1]:
>
> [41483.963562] watchdog: BUG: soft lockup - CPU#104 stuck for 25521s! [migration/104:175]
> [41507.963307] watchdog: BUG: soft lockup - CPU#104 stuck for 25544s! [migration/104:175]
> [41518.311200] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41518.311216] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2729959 
> [41547.962882] watchdog: BUG: soft lockup - CPU#104 stuck for 25581s! [migration/104:175]
> [41571.962627] watchdog: BUG: soft lockup - CPU#104 stuck for 25603s! [migration/104:175]
> [41581.330530] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41581.330546] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2736378 
> [41611.962202] watchdog: BUG: soft lockup - CPU#104 stuck for 25641s! [migration/104:175]
> [41635.961947] watchdog: BUG: soft lockup - CPU#104 stuck for 25663s! [migration/104:175]
> [41644.349859] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41644.349876] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2742753 
> [41671.961564] watchdog: BUG: soft lockup - CPU#104 stuck for 25697s! [migration/104:175]
> [41695.961309] watchdog: BUG: soft lockup - CPU#104 stuck for 25719s! [migration/104:175]
> [41707.369190] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41707.369206] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2749151 
> [41735.960884] watchdog: BUG: soft lockup - CPU#104 stuck for 25756s! [migration/104:175]
> [41759.960629] watchdog: BUG: soft lockup - CPU#104 stuck for 25778s! [migration/104:175]
> [41770.388520] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41770.388548] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2755540 
> [41776.076307] rcu: rcu_sched kthread timer wakeup didn't happen for 1423 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
> [41776.076327] rcu:     Possible timer handling issue on cpu=32 timer-softirq=1056014
> [41776.076336] rcu: rcu_sched kthread starved for 1424 jiffies! g49897 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=32
> [41776.076350] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [41776.076360] rcu: RCU grace-period kthread stack dump:
> [41776.076434] rcu: Stack dump where RCU GP kthread last ran:
> [41783.960374] watchdog: BUG: soft lockup - CPU#104 stuck for 25801s! [migration/104:175]
> [41807.960119] watchdog: BUG: soft lockup - CPU#104 stuck for 25823s! [migration/104:175]
> [41831.959864] watchdog: BUG: soft lockup - CPU#104 stuck for 25846s! [migration/104:175]
> [41833.407851] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [41833.407868] rcu:     136-...0: (135 ticks this GP) idle=242/1/0x4000000000000000 softirq=32031/32033 fqs=2760381 
> [41863.959524] watchdog: BUG: soft lockup - CPU#104 stuck for 25875s! [migration/104:175]
>
> It seems that in this case, it was the testsuite of the git package [2] that triggered the bug. As you
> can see from the overview, the git package has been in the building state for 8 hours meaning the
> build server crashed and is no longer giving feedback to the database.

OK, that sucks.

I did test the repro case you gave me before (in the bugzilla), which
was building glibc, and that passes for me with a patched host.

I guess we have yet another bug.

I tried the following in a debian BE VM and it completed fine:

 $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
 $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc

Same for ppc64.

And I also tried both at once, repeatedly in a loop.
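
Running both builds at once in a loop could look roughly like the sketch below. It is not the script that was actually used. The sbuild arguments are the ones quoted above, while the helper names and the `rounds=10` default are assumptions, since the message does not say how many iterations were run.

```python
import subprocess


def sbuild_cmd(arch, dsc="git_2.33.1-1.dsc"):
    """Return the sbuild command line quoted in the message for one architecture."""
    return ["sbuild", "-d", "sid", f"--arch={arch}", "--no-arch-all", dsc]


def build_both_repeatedly(rounds=10, arches=("powerpc", "ppc64")):
    """Start one build per architecture at the same time, `rounds` times over.

    rounds=10 is a placeholder value, not a figure from the thread.
    """
    for n in range(rounds):
        # Launch all builds first so they really overlap, then wait for each.
        procs = [subprocess.Popen(sbuild_cmd(arch)) for arch in arches]
        codes = [p.wait() for p in procs]
        print(f"round {n}: exit codes {codes}")
```

Calling `build_both_repeatedly()` needs sbuild and the .dsc file to be available in the guest; `sbuild_cmd()` on its own only builds the argument list.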

I guess it's something more complicated.

What exact host/guest kernel versions and configs are you running?

cheers


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-27  5:30   ` Michael Ellerman
@ 2021-10-27 10:03     ` John Paul Adrian Glaubitz
  2021-10-27 11:06       ` Michael Ellerman
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-27 10:03 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: oss-security, debian-powerpc, linuxppc-dev

Hi Michael!

On 10/27/21 07:30, Michael Ellerman wrote:
> I did test the repro case you gave me before (in the bugzilla), which
> was building glibc, that passes for me with a patched host.

Did you manage to crash the unpatched host? If the unpatched host crashes
for you but the patched doesn't, I will make sure I didn't accidentally
miss anything.

Also, I'll try a kernel from git with Debian's config.

> I guess we have yet another bug.
> 
> I tried the following in a debian BE VM and it completed fine:
> 
>  $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
>  $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc
> 
> Same for ppc64.
> 
> And I also tried both at once, repeatedly in a loop.

Did you try building gcc-11 for powerpc and ppc64 both at once?

> I guess it's something more complicated.
> 
> What exact host/guest kernel versions and configs are you running?

Both the host and guest are running Debian's stock 5.14.12 kernel. The host has
a kernel with your patches applied, the guest doesn't.

Let me do some more testing.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-27 10:03     ` John Paul Adrian Glaubitz
@ 2021-10-27 11:06       ` Michael Ellerman
  2021-10-27 11:09         ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2021-10-27 11:06 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: oss-security, debian-powerpc, linuxppc-dev

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> Hi Michael!
>
> On 10/27/21 07:30, Michael Ellerman wrote:
>> I did test the repro case you gave me before (in the bugzilla), which
>> was building glibc, that passes for me with a patched host.
>
> Did you manage to crash the unpatched host?

Yes, the parallel builds of glibc you described crashed the unpatched
host 100% reliably for me.

I also have a standalone reproducer I'll send you.

> If the unpatched host crashes for you but the patched doesn't, I will
> make sure I didn't accidentally miss anything.

OK thanks.

> Also, I'll try a kernel from git with Debian's config.
>
>> I guess we have yet another bug.
>> 
>> I tried the following in a debian BE VM and it completed fine:
>> 
>>  $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
>>  $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc
>> 
>> Same for ppc64.
>> 
>> And I also tried both at once, repeatedly in a loop.
>
> Did you try building gcc-11 for powerpc and ppc64 both at once?

No, I will try that now.

>> I guess it's something more complicated.
>> 
>> What exact host/guest kernel versions and configs are you running?
>
> Both the host and guest are running Debian's stock 5.14.12 kernel. The host has
> a kernel with your patches applied, the guest doesn't.

OK that sounds fine.

I tested upstream stable v5.14.13 + my patches, but there's nothing
between 5.14.12 and 5.14.13 that should matter AFAICS.

> Let me do some more testing.

Thanks.

cheers


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-27 11:06       ` Michael Ellerman
@ 2021-10-27 11:09         ` John Paul Adrian Glaubitz
  2021-10-28  6:39           ` Michael Ellerman
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-27 11:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: oss-security, debian-powerpc, linuxppc-dev

Hi Michael!

On 10/27/21 13:06, Michael Ellerman wrote:
> John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
>> Hi Michael!
>>
>> On 10/27/21 07:30, Michael Ellerman wrote:
>>> I did test the repro case you gave me before (in the bugzilla), which
>>> was building glibc, that passes for me with a patched host.
>>
>> Did you manage to crash the unpatched host?
> 
> Yes, the parallel builds of glibc you described crashed the unpatched
> host 100% reliably for me.

OK, that is very good news!

> I also have a standalone reproducer I'll send you.

Thanks, that would be helpful!

>> Also, I'll try a kernel from git with Debian's config.
>>
>>> I guess we have yet another bug.
>>>
>>> I tried the following in a debian BE VM and it completed fine:
>>>
>>>  $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
>>>  $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc
>>>
>>> Same for ppc64.
>>>
>>> And I also tried both at once, repeatedly in a loop.
>>
>> Did you try building gcc-11 for powerpc and ppc64 both at once?
> 
> No, I will try that now.

OK, great!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: [oss-security] Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-25 11:18 Linux kernel: powerpc: KVM guest can trigger host crash on Power8 Michael Ellerman
  2021-10-26  8:48 ` John Paul Adrian Glaubitz
@ 2021-10-28  3:58 ` Salvatore Bonaccorso
  1 sibling, 0 replies; 29+ messages in thread
From: Salvatore Bonaccorso @ 2021-10-28  3:58 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: oss-security, linuxppc-dev, John Paul Adrian Glaubitz

Hi,

On Mon, Oct 25, 2021 at 10:18:54PM +1100, Michael Ellerman wrote:
> The Linux kernel for powerpc since v5.2 has a bug which allows a
> malicious KVM guest to crash the host, when the host is running on
> Power8.
> 
> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
> metal, are affected by the bug. Machines running PowerVM are not
> affected.
> 
> The bug was introduced in:
> 
>     10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
> 
> Which was first released in v5.2.
> 
> The upstream fix is:
> 
>   cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
> 
> Which will be included in the v5.16 release.
> 
> Note to backporters, the following commits are required:
> 
>   73287caa9210ded6066833195f4335f7f688a46b
>   ("powerpc64/idle: Fix SP offsets when saving GPRs")
> 
>   9b4416c5095c20e110c82ae602c254099b83b72f
>   ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
> 
>   cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
>   ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
> 
>   496c5fe25c377ddb7815c4ce8ecfb676f051e9b6
>   ("powerpc/idle: Don't corrupt back chain when going idle")
> 
> 
> I have a test case to trigger the bug, which I can share privately with
> anyone who would like to test the fix.

The issue has been assigned CVE-2021-43056.

Regards,
Salvatore


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-27 11:09         ` John Paul Adrian Glaubitz
@ 2021-10-28  6:39           ` Michael Ellerman
  2021-10-28 11:20             ` John Paul Adrian Glaubitz
  2021-10-30  7:19             ` John Paul Adrian Glaubitz
  0 siblings, 2 replies; 29+ messages in thread
From: Michael Ellerman @ 2021-10-28  6:39 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, linuxppc-dev

[ Dropping oss-security from Cc]

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> On 10/27/21 13:06, Michael Ellerman wrote:
>> John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
>>> On 10/27/21 07:30, Michael Ellerman wrote:
>>>> I did test the repro case you gave me before (in the bugzilla), which
>>>> was building glibc, that passes for me with a patched host.
>>>
>>> Did you manage to crash the unpatched host?
>> 
>> Yes, the parallel builds of glibc you described crashed the unpatched
>> host 100% reliably for me.
>
> OK, that is very good news!
>
>> I also have a standalone reproducer I'll send you.
>
> Thanks, that would be helpful!
>
>>> Also, I'll try a kernel from git with Debian's config.
>>>
>>>> I guess we have yet another bug.
>>>>
>>>> I tried the following in a debian BE VM and it completed fine:
>>>>
>>>>  $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
>>>>  $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc
>>>>
>>>> Same for ppc64.
>>>>
>>>> And I also tried both at once, repeatedly in a loop.
>>>
>>> Did you try building gcc-11 for powerpc and ppc64 both at once?
>> 
>> No, I will try that now.

That completed fine on my BE VM here.

I ran these in two tmux windows:
  $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
  $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc


The VM has 32 CPUs, with 4 threads per core:

  $ ppc64_cpu --info
  Core   0:    0*    1*    2*    3*
  Core   1:    4*    5*    6*    7*
  Core   2:    8*    9*   10*   11*
  Core   3:   12*   13*   14*   15*
  Core   4:   16*   17*   18*   19*
  Core   5:   20*   21*   22*   23*
  Core   6:   24*   25*   26*   27*
  Core   7:   28*   29*   30*   31*
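
The topology listing above can be turned into numbers with a few lines of parsing. The following is a small sketch, not part of the original message. It assumes a trailing `*` marks a thread as online in `ppc64_cpu --info` output, and the name `parse_cpu_info` is hypothetical.

```python
import re


def parse_cpu_info(text):
    """Return {core number: [online thread ids]} from `ppc64_cpu --info` output."""
    cores = {}
    for line in text.splitlines():
        m = re.match(r"\s*Core\s+(\d+):(.*)", line)
        if m:
            # Assumption: a trailing '*' marks a thread as online.
            threads = [int(t) for t in re.findall(r"(\d+)\*", m.group(2))]
            cores[int(m.group(1))] = threads
    return cores
```

For the output shown here this yields 8 cores with 4 online threads each, which matches the 32 CPUs mentioned.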


cheers


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28  6:39           ` Michael Ellerman
@ 2021-10-28 11:20             ` John Paul Adrian Glaubitz
  2021-10-28 14:05               ` John Paul Adrian Glaubitz
  2021-10-30  7:19             ` John Paul Adrian Glaubitz
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-28 11:20 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 10/28/21 08:39, Michael Ellerman wrote:
>>> No, I will try that now.
> 
> That completed fine on my BE VM here.
> 
> I ran these in two tmux windows:
>   $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
>   $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc
> 
> 
> The VM has 32 CPUs, with 4 threads per core:
> 
>   $ ppc64_cpu --info
>   Core   0:    0*    1*    2*    3*
>   Core   1:    4*    5*    6*    7*
>   Core   2:    8*    9*   10*   11*
>   Core   3:   12*   13*   14*   15*
>   Core   4:   16*   17*   18*   19*
>   Core   5:   20*   21*   22*   23*
>   Core   6:   24*   25*   26*   27*
>   Core   7:   28*   29*   30*   31*

It seems I also can no longer reproduce the issue, even when building the most problematic
packages and I think we should consider it fixed for now. I will keep monitoring the server,
of course, and will let you know in case the problem shows again.

Thanks a lot again for fixing this issue!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-26  8:48 ` John Paul Adrian Glaubitz
  2021-10-27  5:29   ` Nicholas Piggin
  2021-10-27  5:30   ` Michael Ellerman
@ 2021-10-28 13:52   ` John Paul Adrian Glaubitz
  2021-10-28 14:00     ` John Paul Adrian Glaubitz
  2 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-28 13:52 UTC (permalink / raw)
  To: mpe; +Cc: oss-security, debian-powerpc, linuxppc-dev

Hello!

An update to this post with oss-security CC'ed.

On 10/26/21 10:48, John Paul Adrian Glaubitz wrote:
> I have tested these patches against 5.14 but it seems the problem [1] still remains for me
> for big-endian guests. I built a patched kernel yesterday, rebooted the KVM server and let
> the build daemons do their work over night.

I have done thorough testing and I'm no longer seeing the problem with the patched kernel.

I am not sure what triggered my previous crash but I don't think it's related to this
particular bug. I will keep monitoring the server in any case and open a new bug report
in case I'm running into similar issues.

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28 13:52   ` John Paul Adrian Glaubitz
@ 2021-10-28 14:00     ` John Paul Adrian Glaubitz
  0 siblings, 0 replies; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-28 14:00 UTC (permalink / raw)
  To: mpe; +Cc: oss-security, debian-powerpc, linuxppc-dev

Hello!

On 10/28/21 15:52, John Paul Adrian Glaubitz wrote:
> I am not sure what triggered my previous crash but I don't think it's related to this
> particular bug. I will keep monitoring the server in any case and open a new bug report
> in case I'm running into similar issues.

This is very unfortunate, but just after I sent this mail, the machine crashed again.

Sorry for the premature success report. I will have to check now what happened
and get in touch with Michael.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28 11:20             ` John Paul Adrian Glaubitz
@ 2021-10-28 14:05               ` John Paul Adrian Glaubitz
  2021-10-28 14:15                 ` John Paul Adrian Glaubitz
  2021-10-29  0:41                 ` Nicholas Piggin
  0 siblings, 2 replies; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-28 14:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 10/28/21 13:20, John Paul Adrian Glaubitz wrote:
> It seems I also can no longer reproduce the issue, even when building the most problematic
> packages and I think we should consider it fixed for now. I will keep monitoring the server,
> of course, and will let you know in case the problem shows again.

The host machine is stuck again but I'm not 100% sure what triggered the problem:

[194817.984249] watchdog: BUG: soft lockup - CPU#80 stuck for 246s! [CPU 2/KVM:1836]
[194818.012248] watchdog: BUG: soft lockup - CPU#152 stuck for 246s! [CPU 3/KVM:1837]
[194825.960164] watchdog: BUG: soft lockup - CPU#24 stuck for 246s! [khugepaged:318]
[194841.983991] watchdog: BUG: soft lockup - CPU#80 stuck for 268s! [CPU 2/KVM:1836]
[194842.011991] watchdog: BUG: soft lockup - CPU#152 stuck for 268s! [CPU 3/KVM:1837]
[194849.959906] watchdog: BUG: soft lockup - CPU#24 stuck for 269s! [khugepaged:318]
[194865.983733] watchdog: BUG: soft lockup - CPU#80 stuck for 291s! [CPU 2/KVM:1836]
[194866.011733] watchdog: BUG: soft lockup - CPU#152 stuck for 291s! [CPU 3/KVM:1837]
[194873.959648] watchdog: BUG: soft lockup - CPU#24 stuck for 291s! [khugepaged:318]
[194889.983475] watchdog: BUG: soft lockup - CPU#80 stuck for 313s! [CPU 2/KVM:1836]
[194890.011475] watchdog: BUG: soft lockup - CPU#152 stuck for 313s! [CPU 3/KVM:1837]
[194897.959390] watchdog: BUG: soft lockup - CPU#24 stuck for 313s! [khugepaged:318]
[194913.983218] watchdog: BUG: soft lockup - CPU#80 stuck for 335s! [CPU 2/KVM:1836]
[194914.011217] watchdog: BUG: soft lockup - CPU#152 stuck for 335s! [CPU 3/KVM:1837]
[194921.959133] watchdog: BUG: soft lockup - CPU#24 stuck for 336s! [khugepaged:318]
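
For anyone triaging a similar log, the watchdog lines above can be condensed
into one entry per CPU. This is only a minimal sketch: the function name
summarize_soft_lockups and the regular expression are illustrative assumptions
derived from the line format shown above, not part of any kernel tooling.

```python
import re
from collections import OrderedDict

# Assumed line format, taken from the excerpt above:
# [194817.984249] watchdog: BUG: soft lockup - CPU#80 stuck for 246s! [CPU 2/KVM:1836]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Return {cpu: (task, pid, longest_stuck_seconds)} in first-seen order."""
    summary = OrderedDict()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # ignore anything that is not a soft-lockup report
        cpu = int(match.group("cpu"))
        secs = int(match.group("secs"))
        previous = summary.get(cpu)
        # Keep only the longest stuck time reported for each CPU.
        if previous is None or secs > previous[2]:
            summary[cpu] = (match.group("task"), int(match.group("pid")), secs)
    return summary
```

Feeding it the fifteen lines above should leave three entries (CPUs 80, 152
and 24), which makes it easier to see that the same three tasks stay stuck.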

The following packages were being built at the same time:

- guest 1: virtuoso-opensource and openturns
- guest 2: llvm-toolchain-13

I really did a lot of testing today with no issues and just after I sent my report
to oss-security that the machine seems to be stable again, the issue showed up :(.

Sorry,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28 14:05               ` John Paul Adrian Glaubitz
@ 2021-10-28 14:15                 ` John Paul Adrian Glaubitz
  2021-11-01 17:36                   ` Michal Suchánek
  2021-10-29  0:41                 ` Nicholas Piggin
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-28 14:15 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi!

On 10/28/21 16:05, John Paul Adrian Glaubitz wrote:
> The following packages were being built at the same time:
> 
> - guest 1: virtuoso-opensource and openturns
> - guest 2: llvm-toolchain-13
> 
> I really did a lot of testing today with no issues and just after I sent my report
> to oss-security that the machine seems to be stable again, the issue showed up :(.

Do you know whether IPMI features any sort of monitoring for capturing the output
of the serial console non-interactively? This way I would be able to capture the
crash besides what I have seen above.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28 14:05               ` John Paul Adrian Glaubitz
  2021-10-28 14:15                 ` John Paul Adrian Glaubitz
@ 2021-10-29  0:41                 ` Nicholas Piggin
  2021-10-29 12:33                   ` John Paul Adrian Glaubitz
  1 sibling, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2021-10-29  0:41 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz, Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Excerpts from John Paul Adrian Glaubitz's message of October 29, 2021 12:05 am:
> Hi Michael!
> 
> On 10/28/21 13:20, John Paul Adrian Glaubitz wrote:
>> It seems I also can no longer reproduce the issue, even when building the most problematic
>> packages and I think we should consider it fixed for now. I will keep monitoring the server,
>> of course, and will let you know in case the problem shows again.
> 
> The host machine is stuck again but I'm not 100% sure what triggered the problem:
> 
> [194817.984249] watchdog: BUG: soft lockup - CPU#80 stuck for 246s! [CPU 2/KVM:1836]
> [194818.012248] watchdog: BUG: soft lockup - CPU#152 stuck for 246s! [CPU 3/KVM:1837]
> [194825.960164] watchdog: BUG: soft lockup - CPU#24 stuck for 246s! [khugepaged:318]
> [194841.983991] watchdog: BUG: soft lockup - CPU#80 stuck for 268s! [CPU 2/KVM:1836]
> [194842.011991] watchdog: BUG: soft lockup - CPU#152 stuck for 268s! [CPU 3/KVM:1837]
> [194849.959906] watchdog: BUG: soft lockup - CPU#24 stuck for 269s! [khugepaged:318]
> [194865.983733] watchdog: BUG: soft lockup - CPU#80 stuck for 291s! [CPU 2/KVM:1836]
> [194866.011733] watchdog: BUG: soft lockup - CPU#152 stuck for 291s! [CPU 3/KVM:1837]
> [194873.959648] watchdog: BUG: soft lockup - CPU#24 stuck for 291s! [khugepaged:318]
> [194889.983475] watchdog: BUG: soft lockup - CPU#80 stuck for 313s! [CPU 2/KVM:1836]
> [194890.011475] watchdog: BUG: soft lockup - CPU#152 stuck for 313s! [CPU 3/KVM:1837]
> [194897.959390] watchdog: BUG: soft lockup - CPU#24 stuck for 313s! [khugepaged:318]
> [194913.983218] watchdog: BUG: soft lockup - CPU#80 stuck for 335s! [CPU 2/KVM:1836]
> [194914.011217] watchdog: BUG: soft lockup - CPU#152 stuck for 335s! [CPU 3/KVM:1837]
> [194921.959133] watchdog: BUG: soft lockup - CPU#24 stuck for 336s! [khugepaged:318]

Soft lockup should mean it's taking timer interrupts still, just not 
scheduling. Do you have the hard lockup detector enabled as well? Is
there anything stuck spinning on another CPU?

Do you have the full dmesg / kernel log for this boot?

Could you try a sysrq+w to get a trace of blocked tasks?
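
If a root shell on the host still responds, the same dump can be requested
without a keyboard by writing the key to /proc/sysrq-trigger. A minimal
sketch, assuming root access; the helper name trigger_sysrq is hypothetical:

```python
from pathlib import Path

SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def trigger_sysrq(key: str, trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write a single SysRq command character to the trigger file."""
    if len(key) != 1 or not key.isalnum():
        raise ValueError(f"expected one alphanumeric SysRq key, got {key!r}")
    # On a real host this needs root; the kernel acts on the write immediately.
    trigger.write_text(key)
```

Calling trigger_sysrq("w") asks the kernel to dump tasks in uninterruptible
(blocked) state; the traces end up in the kernel log.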

Are you able to shut down the guests and exit qemu normally?

Thanks,
Nick



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-29  0:41                 ` Nicholas Piggin
@ 2021-10-29 12:33                   ` John Paul Adrian Glaubitz
  2021-11-01 17:43                     ` Michal Suchánek
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-29 12:33 UTC (permalink / raw)
  To: Nicholas Piggin, Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Nicholas!

On 10/29/21 02:41, Nicholas Piggin wrote:
> Soft lockup should mean it's taking timer interrupts still, just not 
> scheduling. Do you have the hard lockup detector enabled as well? Is
> there anything stuck spinning on another CPU?

I haven't enabled it. But looking at the documentation [1] it seems we could
use it to print a backtrace once the lockup occurs.

> Do you have the full dmesg / kernel log for this boot?

I do, uploaded the messages file here: https://people.debian.org/~glaubitz/messages-kvm-lockup.gz

Also, I noticed there is actually a backtrace:

Oct 25 17:02:31 watson kernel: [14104.902061]   (detected by 80, t=5252 jiffies, g=49897, q=37)
Oct 25 17:02:31 watson kernel: [14104.902072] Sending NMI from CPU 80 to CPUs 136:
Oct 25 17:02:31 watson kernel: [14108.253972] Modules linked in: dm_mod(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) kvm_hv(E) kvm_pr(E) kvm(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_counter(E) nf_tables(E) nfnetlink(E) bridge(E) stp(E) llc(E) xfs(E) ecb(E) xts(E) sg(E) ctr(E) vmx_crypto(E) gf128mul(E) ipmi_powernv(E) powernv_rng(E) ipmi_devintf(E) rng_core(E) ipmi_msghandler(E) powernv_op_panel(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) iscsi_tcp(E) libiscsi_tcp(E) sunrpc(E) libiscsi(E) drm(E) scsi_transport_iscsi(E) fuse(E) drm_panel_orientation_quirks(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) ses(E) cdrom(E) enclosure(E) t10_pi(E) crc_t10dif(E) scsi_transport_sas(E) crct10dif_generic(E) crct10dif_common(E) btrfs(E) blake2b_generic(E) zstd_compress(E) raid10(E) raid456(E)
Oct 25 17:02:31 watson kernel: [14108.254101]  async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) xhci_pci(E) xhci_hcd(E) e1000e(E) usbcore(E) ptp(E) pps_core(E) ipr(E) usb_common(E)
Oct 25 17:02:31 watson kernel: [14108.254139] CPU: 104 PID: 175 Comm: migration/104 Tainted: G            E     5.14.0-0.bpo.2-powerpc64le #1  Debian 5.14.9-2~bpo11+2
Oct 25 17:02:31 watson kernel: [14108.254146] Stopper: multi_cpu_stop+0x0/0x240 <- migrate_swap+0xf8/0x240
Oct 25 17:02:31 watson kernel: [14108.254160] NIP:  c0000000001f6a58 LR: c00000000026b734 CTR: c00000000026b5c0
Oct 25 17:02:31 watson kernel: [14108.254163] REGS: c000001001237970 TRAP: 0900   Tainted: G            E      (5.14.0-0.bpo.2-powerpc64le Debian 5.14.9-2~bpo11+2)
Oct 25 17:02:31 watson kernel: [14108.254168] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28002442  XER: 20000000
Oct 25 17:02:31 watson kernel: [14108.254183] CFAR: c00000000026b730 IRQMASK: 0 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR00: c00000000026b32c c000001001237c10 c00000000166ce00 c000000000d02c30 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR04: c000001806433198 c000001806433198 0000000000000000 000000005687ca06 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR08: c0000017fc8948a0 c0000017fc894780 0000000000000004 c00800000a80e378 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR12: 0000000000000000 c0000017ffff5a00 c000000000173ec8 c00000000194c080 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR20: 0000000000000000 c000001806433170 0000000000000000 0000000000000001 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR24: 0000000000000002 0000000000000003 0000000000000000 c000000000d02c30 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR28: 0000000000000001 c000001806433170 c000001806433194 0000000000000001 
Oct 25 17:02:31 watson kernel: [14108.254240] NIP [c0000000001f6a58] rcu_momentary_dyntick_idle+0x48/0x60
Oct 25 17:02:31 watson kernel: [14108.254245] LR [c00000000026b734] multi_cpu_stop+0x174/0x240
Oct 25 17:02:31 watson kernel: [14108.254251] Call Trace:
Oct 25 17:02:31 watson kernel: [14108.254253] [c000001001237c10] [c000001001237c80] 0xc000001001237c80 (unreliable)
Oct 25 17:02:31 watson kernel: [14108.254260] [c000001001237c80] [c00000000026b32c] cpu_stopper_thread+0x16c/0x280
Oct 25 17:02:31 watson kernel: [14108.254267] [c000001001237d40] [c00000000017ad4c] smpboot_thread_fn+0x1ec/0x260
Oct 25 17:02:31 watson kernel: [14108.254273] [c000001001237da0] [c00000000017403c] kthread+0x17c/0x190
Oct 25 17:02:31 watson kernel: [14108.254280] [c000001001237e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
Oct 25 17:02:31 watson kernel: [14108.254287] Instruction dump:
Oct 25 17:02:31 watson kernel: [14108.254289] 394a7aa4 39297980 7cc751ae e94d0030 7d295214 39090120 7c0004ac 39400004 
Oct 25 17:02:31 watson kernel: [14108.254301] 7ce04028 7cea3a14 7ce0412d 40c2fff4 <7c0004ac> 70e90002 4c820020 0fe00000 
Oct 25 17:02:31 watson kernel: [14110.585275] CPU 136 didn't respond to backtrace IPI, inspecting paca.
Oct 25 17:02:31 watson kernel: [14110.585279] irq_soft_mask: 0x03 in_mce: 0 in_nmi: 0 current: 1813 (CPU 12/KVM)
Oct 25 17:02:31 watson kernel: [14110.585284] Back trace of paca->saved_r1 (0xc00000180640f4c0) (possibly stale):
Oct 25 17:02:31 watson kernel: [14110.585286] Call Trace:
Oct 25 17:02:31 watson kernel: [14110.585378] task:rcu_sched       state:R  running task     stack:    0 pid:   13 ppid:     2 flags:0x00000800
Oct 25 17:02:31 watson kernel: [14110.585386] Call Trace:
Oct 25 17:02:31 watson kernel: [14110.585388] [c00000000e0978d0] [c0000000001f71c0] rcu_implicit_dynticks_qs+0x0/0x370 (unreliable)
Oct 25 17:02:31 watson kernel: [14110.585399] [c00000000e097ac0] [c00000000001b264] __switch_to+0x1d4/0x2e0
Oct 25 17:02:31 watson kernel: [14110.585407] [c00000000e097b30] [c000000000cb9838] __schedule+0x2f8/0xbb0
Oct 25 17:02:31 watson kernel: [14110.585416] [c00000000e097c00] [c000000000cba334] __cond_resched+0x64/0x90
Oct 25 17:02:31 watson kernel: [14110.585424] [c00000000e097c30] [c0000000001f8670] force_qs_rnp+0xe0/0x2e0
Oct 25 17:02:31 watson kernel: [14110.585433] [c00000000e097cd0] [c0000000001fc8a8] rcu_gp_kthread+0x9c8/0xc90
Oct 25 17:02:31 watson kernel: [14110.585442] [c00000000e097da0] [c00000000017403c] kthread+0x17c/0x190
Oct 25 17:02:31 watson kernel: [14110.585450] [c00000000e097e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
Oct 25 17:02:31 watson kernel: [14110.585462] Sending NMI from CPU 80 to CPUs 32:
Oct 25 17:02:31 watson kernel: [14110.585469] NMI backtrace for cpu 32
Oct 25 17:02:31 watson kernel: [14110.585473] CPU: 32 PID: 1289 Comm: in:imklog Tainted: G            EL    5.14.0-0.bpo.2-powerpc64le #1  Debian 5.14.9-2~bpo11+2
Oct 25 17:02:31 watson kernel: [14110.585477] NIP:  00007fff92bc3bbc LR: 00007fff92bc5e90 CTR: 00007fff92bc5bf0
Oct 25 17:02:31 watson kernel: [14110.585480] REGS: c00000001c9bfe80 TRAP: 0500   Tainted: G            EL     (5.14.0-0.bpo.2-powerpc64le Debian 5.14.9-2~bpo11+2)
Oct 25 17:02:31 watson kernel: [14110.585483] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48004802  XER: 00000000
Oct 25 17:02:31 watson kernel: [14110.585496] CFAR: 00007fff92bc3c34 IRQMASK: 0 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR00: 0000000000000000 00007fff9220d940 00007fff92d37100 000000000000000c 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR04: 00007fff9222f928 00007fff84000060 00007fff84097800 00007fff84000900 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR08: 00007fff840008d0 00007fff84000050 00007fff8408f3a0 0000000000000007 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR12: 0000000028004802 00007fff92236810 00007fff84097af0 0000000000000000 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR16: 00007fff93040000 00007fff92f54478 0000000000000000 00007fff9222f160 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR20: 00007fff9222f810 00007fff9220e4f0 0000000000000008 00007fff927156b0 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR24: 00007fff92715638 00007fff927304f8 0000000000001fa0 0000000000000000 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR28: 00007fff9220e529 000000000000006f 00007fff84000020 0000000000000030 
Oct 25 17:02:31 watson kernel: [14110.585530] NIP [00007fff92bc3bbc] 0x7fff92bc3bbc
Oct 25 17:02:31 watson kernel: [14110.585534] LR [00007fff92bc5e90] 0x7fff92bc5e90

> Could you try a sysrq+w to get a trace of blocked tasks?

Not sure how to send a magic sysrequest over the IPMI serial console. Any idea?

> Are you able to shut down the guests and exit qemu normally?

Not after the crash. I have to hard-reboot the whole machine.

Adrian

[1] https://www.kernel.org/doc/html/latest/admin-guide/lockup-watchdogs.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28  6:39           ` Michael Ellerman
  2021-10-28 11:20             ` John Paul Adrian Glaubitz
@ 2021-10-30  7:19             ` John Paul Adrian Glaubitz
  2021-11-01  6:53               ` Michael Ellerman
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-10-30  7:19 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 10/28/21 08:39, Michael Ellerman wrote:
> That completed fine on my BE VM here.
> 
> I ran these in two tmux windows:
>   $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
>   $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc

Could you try gcc-10 instead? Its testsuite has crashed the host for me
with a patched kernel twice now.

$ dget -u https://deb.debian.org/debian/pool/main/g/gcc-10/gcc-10_10.3.0-12.dsc
$ sbuild -d sid --arch=powerpc --no-arch-all gcc-10_10.3.0-12.dsc
$ sbuild -d sid --arch=ppc64 --no-arch-all gcc-10_10.3.0-12.dsc

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-30  7:19             ` John Paul Adrian Glaubitz
@ 2021-11-01  6:53               ` Michael Ellerman
  2021-11-01  7:37                 ` John Paul Adrian Glaubitz
  2022-01-04 13:00                 ` John Paul Adrian Glaubitz
  0 siblings, 2 replies; 29+ messages in thread
From: Michael Ellerman @ 2021-11-01  6:53 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, linuxppc-dev

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> Hi Michael!
>
> On 10/28/21 08:39, Michael Ellerman wrote:
>> That completed fine on my BE VM here.
>> 
>> I ran these in two tmux windows:
>>   $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
>>   $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc
>
> Could you try gcc-10 instead? Its testsuite has crashed the host for me
> with a patched kernel twice now.
>
> $ dget -u https://deb.debian.org/debian/pool/main/g/gcc-10/gcc-10_10.3.0-12.dsc
> $ sbuild -d sid --arch=powerpc --no-arch-all gcc-10_10.3.0-12.dsc
> $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-10_10.3.0-12.dsc

Sure, will give that a try.

I was able to crash my machine over the weekend, building openjdk, but I
haven't been able to reproduce it for ~24 hours now (I didn't change
anything).


Can you try running your guests with no SMT threads?

I think one of your guests was using:

  -smp 32,sockets=1,dies=1,cores=8,threads=4

Can you change that to:

  -smp 8,sockets=1,dies=1,cores=8,threads=1


And something similar for the other guest(s).

If the system is stable with those settings that would be useful
information, and would also mean you could use the system without it
crashing semi regularly.
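
To make the arithmetic explicit: the leading -smp value is the total vCPU
count, which in the two examples above equals sockets * dies * cores * threads,
so dropping to threads=1 leaves 8 vCPUs. A small sketch of that conversion
(the helper names are hypothetical, and it only handles the key=value form
shown above):

```python
def parse_smp(option: str) -> dict:
    """Parse an -smp value such as '32,sockets=1,dies=1,cores=8,threads=4'."""
    total, *pairs = option.split(",")
    topology = {"cpus": int(total)}
    for pair in pairs:
        key, value = pair.split("=", 1)
        topology[key] = int(value)
    return topology


def without_smt(option: str) -> str:
    """Return the same topology with one thread per core and a matching total."""
    levels = ("sockets", "dies", "cores", "threads")
    topo = parse_smp(option)
    topo["threads"] = 1
    total = 1
    for key in levels:
        total *= topo.get(key, 1)  # a missing level counts as 1
    fields = [f"{key}={topo[key]}" for key in levels if key in topo]
    return ",".join([str(total)] + fields)
```

Applied to the first option string above, without_smt() produces the second
one, so the same helper could be reused for the other guest(s).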

cheers


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-11-01  6:53               ` Michael Ellerman
@ 2021-11-01  7:37                 ` John Paul Adrian Glaubitz
  2021-11-01 17:20                   ` John Paul Adrian Glaubitz
  2022-01-04 13:00                 ` John Paul Adrian Glaubitz
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-11-01  7:37 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 11/1/21 07:53, Michael Ellerman wrote:
> Sure, will give that a try.
> 
> I was able to crash my machine over the weekend, building openjdk, but I
> haven't been able to reproduce it for ~24 hours now (I didn't change
> anything).

I made another experiment and upgraded the host to 5.15-rc7 which contains your
fixes and made the guests build gcc-10. Interestingly, this time, the gcc-10
build crashed the guest but didn't manage to crash the host. I will update the
guest to 5.15-rc7 now as well and see how that goes.

> Can you try running your guests with no SMT threads?
> 
> I think one of your guests was using:
> 
>   -smp 32,sockets=1,dies=1,cores=8,threads=4
> 
> Can you change that to:
> 
>   -smp 8,sockets=1,dies=1,cores=8,threads=1
> 
> 
> And something similar for the other guest(s).

Sure. I will try that later. But first I want to switch the guests to 5.15-rc7 as well.

> If the system is stable with those settings that would be useful
> information, and would also mean you could use the system without it
> crashing semi regularly.

Gotcha.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-11-01  7:37                 ` John Paul Adrian Glaubitz
@ 2021-11-01 17:20                   ` John Paul Adrian Glaubitz
  0 siblings, 0 replies; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-11-01 17:20 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 11/1/21 08:37, John Paul Adrian Glaubitz wrote:
> I made another experiment and upgraded the host to 5.15-rc7 which contains your
> fixes and made the guests build gcc-10. Interestingly, this time, the gcc-10
> build crashed the guest but didn't manage to crash the host. I will update the
> guest to 5.15-rc7 now as well and see how that goes.

OK, so I'm definitely able to crash the 5.15 kernel as well:

[57031.404944] watchdog: BUG: soft lockup - CPU#24 stuck for 14957s! [migration/24:14]
[57035.420898] watchdog: BUG: soft lockup - CPU#48 stuck for 14961s! [CPU 17/KVM:1815]
[57047.456761] watchdog: BUG: soft lockup - CPU#152 stuck for 14841s! [CPU 13/KVM:1811]
[57055.404670] watchdog: BUG: soft lockup - CPU#24 stuck for 14979s! [migration/24:14]
[57059.420624] watchdog: BUG: soft lockup - CPU#48 stuck for 14983s! [CPU 17/KVM:1815]
[57064.064573] rcu: INFO: rcu_sched self-detected stall on CPU
[57064.064584] rcu:     48-....: (3338577 ticks this GP) idle=9f3/1/0x4000000000000002 softirq=77540/77540 fqs=15421 
[57064.064598] rcu: rcu_sched kthread timer wakeup didn't happen for 3988041 jiffies! g125265 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[57064.064606] rcu:     Possible timer handling issue on cpu=136 timer-softirq=313650
[57064.064611] rcu: rcu_sched kthread starved for 3988042 jiffies! g125265 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=136
[57064.064618] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[57064.064624] rcu: RCU grace-period kthread stack dump:
[57064.064665] rcu: Stack dump where RCU GP kthread last ran:
[57071.456487] watchdog: BUG: soft lockup - CPU#152 stuck for 14863s! [CPU 13/KVM:1811]
[57079.404396] watchdog: BUG: soft lockup - CPU#24 stuck for 15002s! [migration/24:14]

And the gcc-10 testsuite is able to trigger the crash very reliably.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-28 14:15                 ` John Paul Adrian Glaubitz
@ 2021-11-01 17:36                   ` Michal Suchánek
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Suchánek @ 2021-11-01 17:36 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, linuxppc-dev

Hello,

On Thu, Oct 28, 2021 at 04:15:19PM +0200, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> On 10/28/21 16:05, John Paul Adrian Glaubitz wrote:
> > The following packages were being built at the same time:
> > 
> > - guest 1: virtuoso-opensource and openturns
> > - guest 2: llvm-toolchain-13
> > 
> > I really did a lot of testing today with no issues and just after I sent my report
> > to oss-security that the machine seems to be stable again, the issue showed up :(.
> 
> Do you know whether IPMI features any sort of monitoring for capturing the output
> of the serial console non-interactively? This way I would be able to capture the
> crash besides what I have seen above.

I am pretty sure you can run something like

script ipmitool

to capture output indefinitely, and the same inside screen on a remote
machine.
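
If script is not at hand, a small wrapper can do a similar job by appending
every line of a command's output to a log file with a timestamp. This is a
rough sketch, not a tested replacement: the function name capture_console is
hypothetical, and I have not verified how "ipmitool sol activate" behaves
without a terminal on stdin (script provides a pty, this wrapper does not).

```python
import subprocess
import time


def capture_console(command, log_path):
    """Run `command`, append each output line (timestamped) to `log_path`.

    Returns the exit status of the command.
    """
    with open(log_path, "a", encoding="utf-8", errors="replace") as log:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep errors in the same log
            text=True,
            errors="replace",  # console output may contain stray bytes
        )
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} {line.rstrip()}\n")
            log.flush()  # keep the file current if the host dies mid-run
        return proc.wait()


# Example invocation; host name and user are placeholders, not real values:
# capture_console(
#     ["ipmitool", "-I", "lanplus", "-H", "bmc.example", "-U", "admin",
#      "sol", "activate"],
#     "watson-console.log",
# )
```

Running it on a different machine from the KVM host means the log survives
the crash.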

Thanks

Michal


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-10-29 12:33                   ` John Paul Adrian Glaubitz
@ 2021-11-01 17:43                     ` Michal Suchánek
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Suchánek @ 2021-11-01 17:43 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, linuxppc-dev, Nicholas Piggin

On Fri, Oct 29, 2021 at 02:33:12PM +0200, John Paul Adrian Glaubitz wrote:
> Hi Nicholas!
> 
> On 10/29/21 02:41, Nicholas Piggin wrote:
> > Soft lockup should mean it's taking timer interrupts still, just not 
> > scheduling. Do you have the hard lockup detector enabled as well? Is
> > there anything stuck spinning on another CPU?
> 

> 
> > Could you try a sysrq+w to get a trace of blocked tasks?
> 
> Not sure how to send a magic sysrequest over the IPMI serial console. Any idea?

As on any serial console sending break should be equivalent to the magic
sysrq key combo.

https://tldp.org/HOWTO/Remote-Serial-Console-HOWTO/security-sysrq.html

With ipmitool break is sent by typing ~B

https://linux.die.net/man/1/ipmitool

Thanks

Michal


* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2021-11-01  6:53               ` Michael Ellerman
  2021-11-01  7:37                 ` John Paul Adrian Glaubitz
@ 2022-01-04 13:00                 ` John Paul Adrian Glaubitz
  2022-01-06 10:58                   ` Michael Ellerman
  1 sibling, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2022-01-04 13:00 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

Sorry for the long time without any responses. Shall we continue debugging this?

We're currently running 5.15.x on the host system and the guests and the testsuite
for gcc-9 still reproducibly kills the KVM host.

Adrian

On 11/1/21 07:53, Michael Ellerman wrote:
> Sure, will give that a try.
> 
> I was able to crash my machine over the weekend, building openjdk, but I
> haven't been able to reproduce it for ~24 hours now (I didn't change
> anything).
> 
> 
> Can you try running your guests with no SMT threads?
> 
> I think one of your guests was using:
> 
>   -smp 32,sockets=1,dies=1,cores=8,threads=4
> 
> Can you change that to:
> 
>   -smp 8,sockets=1,dies=1,cores=8,threads=1
> 
> 
> And something similar for the other guest(s).
> 
> If the system is stable with those settings that would be useful
> information, and would also mean you could use the system without it
> crashing semi regularly.
> 
> cheers
-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-04 13:00                 ` John Paul Adrian Glaubitz
@ 2022-01-06 10:58                   ` Michael Ellerman
  2022-01-07 11:20                     ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 29+ messages in thread
From: Michael Ellerman @ 2022-01-06 10:58 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, linuxppc-dev

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> Hi Michael!
>
> Sorry for the long time without any responses. Shall we continue debugging this?

Yes!

Sorry also that I haven't been able to fix it yet, I had to stop chasing
this bug and work on other things before the end of the year.

> We're currently running 5.15.x on the host system and the guests and the testsuite
> for gcc-9 still reproducibly kills the KVM host.

Have you been able to try the different -smp options I suggested?

Can you separately test with (on the host):

 # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
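
If it helps to record what the parameter was before changing it, the same
step can be sketched as below. The helper name set_module_param is
hypothetical, and the sketch assumes the parameter file is readable and
writable by root at runtime, as the echo above implies.

```python
from pathlib import Path

PARAM = Path("/sys/module/kvm_hv/parameters/dynamic_mt_modes")


def set_module_param(value: int, param: Path = PARAM) -> int:
    """Write `value` to a module parameter file and return the previous value."""
    previous = int(param.read_text().strip())
    param.write_text(f"{value}\n")
    return previous
```

Calling set_module_param(0) as root returns the old setting so it can be
restored later. A value written this way lasts only until the module is
reloaded or the host reboots.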


cheers



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-06 10:58                   ` Michael Ellerman
@ 2022-01-07 11:20                     ` John Paul Adrian Glaubitz
  2022-01-09 22:17                       ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2022-01-07 11:20 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 1/6/22 11:58, Michael Ellerman wrote:
>> We're currently running 5.15.x on the host system and the guests and the testsuite
>> for gcc-9 still reproducibly kills the KVM host.
> 
> Have you been able to try the different -smp options I suggested?
> 
> Can you separately test with (on the host):
> 
>  # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes

I'm trying to turn off "dynamic_mt_modes" first and see if that makes any difference.

I will report back.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-07 11:20                     ` John Paul Adrian Glaubitz
@ 2022-01-09 22:17                       ` John Paul Adrian Glaubitz
  2022-01-13  0:17                         ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2022-01-09 22:17 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 1/7/22 12:20, John Paul Adrian Glaubitz wrote:
>> Can you separately test with (on the host):
>>
>>  # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
> 
> I'm trying to turn off "dynamic_mt_modes" first and see if that makes any difference.
> 
> I will report back.

So far the machine is running stable now and the VM built gcc-9 without
crashing the host. I will continue to monitor the machine and report back
if it crashes, but it looks like this could be it.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-09 22:17                       ` John Paul Adrian Glaubitz
@ 2022-01-13  0:17                         ` John Paul Adrian Glaubitz
  2022-01-26 20:21                           ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2022-01-13  0:17 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 1/9/22 23:17, John Paul Adrian Glaubitz wrote:
> On 1/7/22 12:20, John Paul Adrian Glaubitz wrote:
>>> Can you separately test with (on the host):
>>>
>>>  # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
>>
>> I'm trying to turn off "dynamic_mt_modes" first and see if that makes any difference.
>>
>> I will report back.
> 
> So far the machine is running stable now and the VM built gcc-9 without
> crashing the host. I will continue to monitor the machine and report back
> if it crashes, but it looks like this could be it.

So, it seems that setting "dynamic_mt_modes" to 0 actually did the trick; the host is no longer
crashing. However, I have observed on two occasions now that the build VM is just suddenly
off as if someone has shut it down using the "force-off" option in the virt-manager user
interface.

Not sure why that happens.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-13  0:17                         ` John Paul Adrian Glaubitz
@ 2022-01-26 20:21                           ` John Paul Adrian Glaubitz
  2022-01-27 15:50                             ` Mike
  0 siblings, 1 reply; 29+ messages in thread
From: John Paul Adrian Glaubitz @ 2022-01-26 20:21 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc, linuxppc-dev

Hi Michael!

On 1/13/22 01:17, John Paul Adrian Glaubitz wrote:
> On 1/9/22 23:17, John Paul Adrian Glaubitz wrote:
>> On 1/7/22 12:20, John Paul Adrian Glaubitz wrote:
>>>> Can you separately test with (on the host):
>>>>
>>>>  # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
>>>
>>> I'm trying to turn off "dynamic_mt_modes" first and see if that makes any difference.
>>>
>>> I will report back.
>>
>> So far the machine is running stable now and the VM built gcc-9 without
>> crashing the host. I will continue to monitor the machine and report back
>> if it crashes, but it looks like this could be it.
> 
> So, it seems that setting "dynamic_mt_modes" actually did the trick, the host is no longer
> crashing. However, I have observed on two occasions now that the build VM is just suddenly
> off as if someone has shut it down using the "force-off" option in the virt-manager user
> interface.

Just as a heads-up. Ever since I set

	echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes

on the host machine, I never saw the crash again. So the issue seems to be related to the
dynamic_mt_modes feature.
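
For anyone wanting the workaround to persist across reboots, one possible approach is a modprobe options file. This is only a sketch under assumptions not confirmed in this thread: that kvm_hv is built as a loadable module, and that the distribution reads /etc/modprobe.d/. The file name kvm_hv.conf, the `CONF` override, and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical file name; any *.conf file under /etc/modprobe.d/ would do.
# CONF can be overridden to try this against a scratch file first.
CONF="${CONF:-/etc/modprobe.d/kvm_hv.conf}"

# Write an options line so that kvm_hv is loaded with dynamic_mt_modes=0.
# Note: this overwrites any existing content of $CONF.
persist_dynamic_mt_off() {
    printf 'options kvm_hv dynamic_mt_modes=0\n' > "$CONF"
}
```

The options line only applies the next time the module is loaded, so the runtime `echo 0` is still needed on a host that is already up. If kvm_hv were built into the kernel instead, the usual equivalent would be `kvm_hv.dynamic_mt_modes=0` on the kernel command line.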

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
  2022-01-26 20:21                           ` John Paul Adrian Glaubitz
@ 2022-01-27 15:50                             ` Mike
  0 siblings, 0 replies; 29+ messages in thread
From: Mike @ 2022-01-27 15:50 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc, open list:LINUX FOR POWERPC...


I just made the huge mistake of hibernating and resuming. I'm going through
the rescue process now; thankfully I had a 2016 CD in the drive.
I'll read up once the sheer panic settles.

-Michael

On Wed, Jan 26, 2022, 21:22 John Paul Adrian Glaubitz <
glaubitz@physik.fu-berlin.de> wrote:

> Hi Michael!
>
> On 1/13/22 01:17, John Paul Adrian Glaubitz wrote:
> > On 1/9/22 23:17, John Paul Adrian Glaubitz wrote:
> >> On 1/7/22 12:20, John Paul Adrian Glaubitz wrote:
> >>>> Can you separately test with (on the host):
> >>>>
> >>>>  # echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
> >>>
> >>> I'm trying to turn off "dynamic_mt_modes" first and see if that makes any difference.
> >>>
> >>> I will report back.
> >>
> >> So far the machine is running stable now and the VM built gcc-9 without
> >> crashing the host. I will continue to monitor the machine and report back
> >> if it crashes, but it looks like this could be it.
> >
> > So, it seems that setting "dynamic_mt_modes" actually did the trick, the host is no longer
> > crashing. However, I have observed on two occasions now that the build VM is just suddenly
> > off as if someone has shut it down using the "force-off" option in the virt-manager user
> > interface.
>
> Just as a heads-up. Ever since I set
>
>         echo 0 > /sys/module/kvm_hv/parameters/dynamic_mt_modes
>
> on the host machine, I never saw the crash again. So the issue seems to be related to the
> dynamic_mt_modes feature.
>
> Thanks,
> Adrian
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer - glaubitz@debian.org
> `. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
>   `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
>
>



end of thread, other threads:[~2022-01-27 21:47 UTC | newest]

Thread overview: 29+ messages
2021-10-25 11:18 Linux kernel: powerpc: KVM guest can trigger host crash on Power8 Michael Ellerman
2021-10-26  8:48 ` John Paul Adrian Glaubitz
2021-10-27  5:29   ` Nicholas Piggin
2021-10-27  5:30   ` Michael Ellerman
2021-10-27 10:03     ` John Paul Adrian Glaubitz
2021-10-27 11:06       ` Michael Ellerman
2021-10-27 11:09         ` John Paul Adrian Glaubitz
2021-10-28  6:39           ` Michael Ellerman
2021-10-28 11:20             ` John Paul Adrian Glaubitz
2021-10-28 14:05               ` John Paul Adrian Glaubitz
2021-10-28 14:15                 ` John Paul Adrian Glaubitz
2021-11-01 17:36                   ` Michal Suchánek
2021-10-29  0:41                 ` Nicholas Piggin
2021-10-29 12:33                   ` John Paul Adrian Glaubitz
2021-11-01 17:43                     ` Michal Suchánek
2021-10-30  7:19             ` John Paul Adrian Glaubitz
2021-11-01  6:53               ` Michael Ellerman
2021-11-01  7:37                 ` John Paul Adrian Glaubitz
2021-11-01 17:20                   ` John Paul Adrian Glaubitz
2022-01-04 13:00                 ` John Paul Adrian Glaubitz
2022-01-06 10:58                   ` Michael Ellerman
2022-01-07 11:20                     ` John Paul Adrian Glaubitz
2022-01-09 22:17                       ` John Paul Adrian Glaubitz
2022-01-13  0:17                         ` John Paul Adrian Glaubitz
2022-01-26 20:21                           ` John Paul Adrian Glaubitz
2022-01-27 15:50                             ` Mike
2021-10-28 13:52   ` John Paul Adrian Glaubitz
2021-10-28 14:00     ` John Paul Adrian Glaubitz
2021-10-28  3:58 ` [oss-security] " Salvatore Bonaccorso
