From: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
To: Nicholas Piggin <npiggin@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>
Cc: "debian-powerpc@lists.debian.org"
	<debian-powerpc@lists.debian.org>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
Date: Fri, 29 Oct 2021 14:33:12 +0200
Message-ID: <1d02b53d-cb39-38bb-8ce2-9a92cc97e729@physik.fu-berlin.de>
In-Reply-To: <1635467831.en5s268a3l.astroid@bobo.none>

Hi Nicholas!

On 10/29/21 02:41, Nicholas Piggin wrote:
> Soft lockup should mean it's taking timer interrupts still, just not 
> scheduling. Do you have the hard lockup detector enabled as well? Is
> there anything stuck spinning on another CPU?

I haven't enabled it, but looking at the documentation [1], it seems we could
use it to print a backtrace once the lockup occurs.
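
For reference, a minimal sketch of how the current watchdog settings could be
checked on the host by reading the sysctl files under /proc/sys/kernel. The
helper name read_watchdog_settings is hypothetical; which of these files exist
depends on the kernel configuration, so a missing file is reported as None
rather than treated as an error.

```python
from pathlib import Path

# Lockup-watchdog sysctls documented for Linux; some may be absent
# depending on the kernel configuration.
WATCHDOG_SYSCTLS = (
    "watchdog",
    "nmi_watchdog",
    "soft_watchdog",
    "watchdog_thresh",
    "hardlockup_all_cpu_backtrace",
)


def read_watchdog_settings(base="/proc/sys/kernel"):
    """Return {sysctl name: int value, or None if unreadable}."""
    settings = {}
    for name in WATCHDOG_SYSCTLS:
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (feature not built in) or content not an integer.
            settings[name] = None
    return settings
```

Calling read_watchdog_settings() with no argument reads the live values; a
value of 0 for nmi_watchdog would mean the hard lockup detector is currently
switched off.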

> Do you have the full dmesg / kernel log for this boot?

I do; I uploaded the messages file here: https://people.debian.org/~glaubitz/messages-kvm-lockup.gz

Also, I noticed there is actually a backtrace:

Oct 25 17:02:31 watson kernel: [14104.902061]   (detected by 80, t=5252 jiffies, g=49897, q=37)
Oct 25 17:02:31 watson kernel: [14104.902072] Sending NMI from CPU 80 to CPUs 136:
Oct 25 17:02:31 watson kernel: [14108.253972] Modules linked in: dm_mod(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E) tun(E) kvm_hv(E) kvm_pr(E) kvm(E) xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_counter(E) nf_tables(E) nfnetlink(E) bridge(E) stp(E) llc(E) xfs(E) ecb(E) xts(E) sg(E) ctr(E) vmx_crypto(E) gf128mul(E) ipmi_powernv(E) powernv_rng(E) ipmi_devintf(E) rng_core(E) ipmi_msghandler(E) powernv_op_panel(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) iscsi_tcp(E) libiscsi_tcp(E) sunrpc(E) libiscsi(E) drm(E) scsi_transport_iscsi(E) fuse(E) drm_panel_orientation_quirks(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) ses(E) cdrom(E) enclosure(E) t10_pi(E) crc_t10dif(E) scsi_transport_sas(E) crct10dif_generic(E) crct10dif_common(E) btrfs(E) blake2b_generic(E) zstd_compress(E) raid10(E) raid456(E)
Oct 25 17:02:31 watson kernel: [14108.254101]  async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E) xhci_pci(E) xhci_hcd(E) e1000e(E) usbcore(E) ptp(E) pps_core(E) ipr(E) usb_common(E)
Oct 25 17:02:31 watson kernel: [14108.254139] CPU: 104 PID: 175 Comm: migration/104 Tainted: G            E     5.14.0-0.bpo.2-powerpc64le #1  Debian 5.14.9-2~bpo11+2
Oct 25 17:02:31 watson kernel: [14108.254146] Stopper: multi_cpu_stop+0x0/0x240 <- migrate_swap+0xf8/0x240
Oct 25 17:02:31 watson kernel: [14108.254160] NIP:  c0000000001f6a58 LR: c00000000026b734 CTR: c00000000026b5c0
Oct 25 17:02:31 watson kernel: [14108.254163] REGS: c000001001237970 TRAP: 0900   Tainted: G            E      (5.14.0-0.bpo.2-powerpc64le Debian 5.14.9-2~bpo11+2)
Oct 25 17:02:31 watson kernel: [14108.254168] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28002442  XER: 20000000
Oct 25 17:02:31 watson kernel: [14108.254183] CFAR: c00000000026b730 IRQMASK: 0 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR00: c00000000026b32c c000001001237c10 c00000000166ce00 c000000000d02c30 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR04: c000001806433198 c000001806433198 0000000000000000 000000005687ca06 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR08: c0000017fc8948a0 c0000017fc894780 0000000000000004 c00800000a80e378 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR12: 0000000000000000 c0000017ffff5a00 c000000000173ec8 c00000000194c080 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR20: 0000000000000000 c000001806433170 0000000000000000 0000000000000001 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR24: 0000000000000002 0000000000000003 0000000000000000 c000000000d02c30 
Oct 25 17:02:31 watson kernel: [14108.254183] GPR28: 0000000000000001 c000001806433170 c000001806433194 0000000000000001 
Oct 25 17:02:31 watson kernel: [14108.254240] NIP [c0000000001f6a58] rcu_momentary_dyntick_idle+0x48/0x60
Oct 25 17:02:31 watson kernel: [14108.254245] LR [c00000000026b734] multi_cpu_stop+0x174/0x240
Oct 25 17:02:31 watson kernel: [14108.254251] Call Trace:
Oct 25 17:02:31 watson kernel: [14108.254253] [c000001001237c10] [c000001001237c80] 0xc000001001237c80 (unreliable)
Oct 25 17:02:31 watson kernel: [14108.254260] [c000001001237c80] [c00000000026b32c] cpu_stopper_thread+0x16c/0x280
Oct 25 17:02:31 watson kernel: [14108.254267] [c000001001237d40] [c00000000017ad4c] smpboot_thread_fn+0x1ec/0x260
Oct 25 17:02:31 watson kernel: [14108.254273] [c000001001237da0] [c00000000017403c] kthread+0x17c/0x190
Oct 25 17:02:31 watson kernel: [14108.254280] [c000001001237e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
Oct 25 17:02:31 watson kernel: [14108.254287] Instruction dump:
Oct 25 17:02:31 watson kernel: [14108.254289] 394a7aa4 39297980 7cc751ae e94d0030 7d295214 39090120 7c0004ac 39400004 
Oct 25 17:02:31 watson kernel: [14108.254301] 7ce04028 7cea3a14 7ce0412d 40c2fff4 <7c0004ac> 70e90002 4c820020 0fe00000 
Oct 25 17:02:31 watson kernel: [14110.585275] CPU 136 didn't respond to backtrace IPI, inspecting paca.
Oct 25 17:02:31 watson kernel: [14110.585279] irq_soft_mask: 0x03 in_mce: 0 in_nmi: 0 current: 1813 (CPU 12/KVM)
Oct 25 17:02:31 watson kernel: [14110.585284] Back trace of paca->saved_r1 (0xc00000180640f4c0) (possibly stale):
Oct 25 17:02:31 watson kernel: [14110.585286] Call Trace:
Oct 25 17:02:31 watson kernel: [14110.585378] task:rcu_sched       state:R  running task     stack:    0 pid:   13 ppid:     2 flags:0x00000800
Oct 25 17:02:31 watson kernel: [14110.585386] Call Trace:
Oct 25 17:02:31 watson kernel: [14110.585388] [c00000000e0978d0] [c0000000001f71c0] rcu_implicit_dynticks_qs+0x0/0x370 (unreliable)
Oct 25 17:02:31 watson kernel: [14110.585399] [c00000000e097ac0] [c00000000001b264] __switch_to+0x1d4/0x2e0
Oct 25 17:02:31 watson kernel: [14110.585407] [c00000000e097b30] [c000000000cb9838] __schedule+0x2f8/0xbb0
Oct 25 17:02:31 watson kernel: [14110.585416] [c00000000e097c00] [c000000000cba334] __cond_resched+0x64/0x90
Oct 25 17:02:31 watson kernel: [14110.585424] [c00000000e097c30] [c0000000001f8670] force_qs_rnp+0xe0/0x2e0
Oct 25 17:02:31 watson kernel: [14110.585433] [c00000000e097cd0] [c0000000001fc8a8] rcu_gp_kthread+0x9c8/0xc90
Oct 25 17:02:31 watson kernel: [14110.585442] [c00000000e097da0] [c00000000017403c] kthread+0x17c/0x190
Oct 25 17:02:31 watson kernel: [14110.585450] [c00000000e097e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
Oct 25 17:02:31 watson kernel: [14110.585462] Sending NMI from CPU 80 to CPUs 32:
Oct 25 17:02:31 watson kernel: [14110.585469] NMI backtrace for cpu 32
Oct 25 17:02:31 watson kernel: [14110.585473] CPU: 32 PID: 1289 Comm: in:imklog Tainted: G            EL    5.14.0-0.bpo.2-powerpc64le #1  Debian 5.14.9-2~bpo11+2
Oct 25 17:02:31 watson kernel: [14110.585477] NIP:  00007fff92bc3bbc LR: 00007fff92bc5e90 CTR: 00007fff92bc5bf0
Oct 25 17:02:31 watson kernel: [14110.585480] REGS: c00000001c9bfe80 TRAP: 0500   Tainted: G            EL     (5.14.0-0.bpo.2-powerpc64le Debian 5.14.9-2~bpo11+2)
Oct 25 17:02:31 watson kernel: [14110.585483] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48004802  XER: 00000000
Oct 25 17:02:31 watson kernel: [14110.585496] CFAR: 00007fff92bc3c34 IRQMASK: 0 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR00: 0000000000000000 00007fff9220d940 00007fff92d37100 000000000000000c 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR04: 00007fff9222f928 00007fff84000060 00007fff84097800 00007fff84000900 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR08: 00007fff840008d0 00007fff84000050 00007fff8408f3a0 0000000000000007 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR12: 0000000028004802 00007fff92236810 00007fff84097af0 0000000000000000 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR16: 00007fff93040000 00007fff92f54478 0000000000000000 00007fff9222f160 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR20: 00007fff9222f810 00007fff9220e4f0 0000000000000008 00007fff927156b0 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR24: 00007fff92715638 00007fff927304f8 0000000000001fa0 0000000000000000 
Oct 25 17:02:31 watson kernel: [14110.585496] GPR28: 00007fff9220e529 000000000000006f 00007fff84000020 0000000000000030 
Oct 25 17:02:31 watson kernel: [14110.585530] NIP [00007fff92bc3bbc] 0x7fff92bc3bbc
Oct 25 17:02:31 watson kernel: [14110.585534] LR [00007fff92bc5e90] 0x7fff92bc5e90
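
To make a log like the one above easier to skim, here is a rough sketch that
pairs each "CPU: ... Comm: ..." line with the "NIP [...]" line that follows it,
so it is quick to see which task on which CPU was executing where. The function
name summarize_nips and the regular expressions are assumptions based only on
the line format visible in this excerpt; other kernels may print these lines
differently.

```python
import re

# "CPU: 104 PID: 175 Comm: migration/104 Tainted: G ..."
CPU_RE = re.compile(r"CPU: (\d+) PID: \d+ Comm: (.+?) (?:Tainted|Not tainted)")
# "NIP [c0000000001f6a58] rcu_momentary_dyntick_idle+0x48/0x60"
NIP_RE = re.compile(r"NIP \[[0-9a-f]+\] (\S+)")


def summarize_nips(lines):
    """Return a list of (cpu, comm, nip_symbol) tuples found in the log."""
    results = []
    current = None  # (cpu, comm) from the most recent "CPU:" line
    for line in lines:
        cpu_match = CPU_RE.search(line)
        if cpu_match:
            current = (int(cpu_match.group(1)), cpu_match.group(2))
            continue
        nip_match = NIP_RE.search(line)
        if nip_match and current is not None:
            results.append((current[0], current[1], nip_match.group(1)))
            current = None  # one NIP per register dump
    return results
```

Fed the lines of the uncompressed messages file, this would list CPU 104
(migration/104) in rcu_momentary_dyntick_idle and CPU 32 (in:imklog) at a
userspace address for the excerpt above.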

> Could you try a sysrq+w to get a trace of blocked tasks?

I'm not sure how to send a magic SysRq over the IPMI serial console. Any ideas?
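
One route that avoids the console entirely is writing the command character to
/proc/sysrq-trigger. The sketch below assumes a root shell on the host still
responds, which may well not hold once the lockup has occurred; the helper name
trigger_sysrq is hypothetical. The resulting trace of blocked tasks goes to the
kernel log, not to the caller.

```python
def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single SysRq command character (e.g. 'w') to the trigger file.

    Needs root when used on the real /proc file. The path parameter exists
    only so the function can be tried against an ordinary file.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be exactly one character")
    with open(path, "w") as trigger:
        trigger.write(key)
```

trigger_sysrq("w") would then request the dump of tasks in uninterruptible
(blocked) state.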

> Are you able to shut down the guests and exit qemu normally?

Not after the crash. I have to hard-reboot the whole machine.

Adrian

[1] https://www.kernel.org/doc/html/latest/admin-guide/lockup-watchdogs.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

