All of lore.kernel.org
 help / color / mirror / Atom feed
* "BUG: soft lockup" and frozen guest
@ 2019-04-29  6:35 Christopher Pereira
  2019-04-29 11:56 ` Vitaly Kuznetsov
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Pereira @ 2019-04-29  6:35 UTC (permalink / raw)
  To: kvm

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]

Hi,

I have been experiencing some random guest crashes in the last years and 
would like to invest some time in trying to debug them with your help.

Symptom is:

1) "BUG: soft lockup" & "CPU#* stuck for *s!" messages during high load 
on the guest
2) At some point later (eg. 12 hours later), the guest just hangs 
without any message and must be destroyed / rebooted.

I attached the relevant kernel messages.

Host (spec: Intel(R) Xeon(R) CPU E5645) is running:

    kernel-3.10.0-327.el7.x86_64
    libvirt-daemon-kvm-1.2.17-13.el7_2.5.x86_64
    qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64
    qemu-kvm-common-ev-2.3.0-31.el7_2.10.1.x86_64

Guest is running:

    kernel-3.10.0-327.22.2.el7.x86_64

I don't see any suspicious messages in 
/var/log/libvirt/qemu/my-guest.log. BTW, can I enable timestamps for 
this log file?


[-- Attachment #2: messages.txt --]
[-- Type: text/plain, Size: 18381 bytes --]

[irrelevant logs]

Apr 28 12:45:43 my-guest kernel: BUG: soft lockup - CPU#2 stuck for 24s! [systemd:1]
Apr 28 12:45:43 my-guest kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ppdev snd pcspkr parport_pc sg virtio_balloon soundcore i2c_piix4 parport ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_console virtio_net virtio_blk qxl syscopyarea sysfillrect
Apr 28 12:45:43 my-guest kernel: sysimgblt drm_kms_helper ttm crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw drm virtio_pci virtio_ring libata virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
Apr 28 12:45:43 my-guest kernel: CPU: 2 PID: 1 Comm: systemd Not tainted 3.10.0-327.22.2.el7.x86_64 #1
Apr 28 12:45:43 my-guest kernel: Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
Apr 28 12:45:43 my-guest kernel: task: ffff880429520000 ti: ffff880429528000 task.ti: ffff880429528000
Apr 28 12:45:43 my-guest kernel: RIP: 0033:[<00007f10c76f059c>]  [<00007f10c76f059c>] 0x7f10c76f059b
Apr 28 12:45:43 my-guest kernel: RSP: 002b:00007fff0794b1e0  EFLAGS: 00010216
Apr 28 12:45:43 my-guest kernel: RAX: 00007f10c9caf870 RBX: ffffffff81646889 RCX: 0000000000004011
Apr 28 12:45:43 my-guest kernel: RDX: 00007f10c7a2d7b8 RSI: 00000000044c0008 RDI: 0000000000000003
Apr 28 12:45:43 my-guest kernel: RBP: 00007f10c7a2d760 R08: 0000000000000010 R09: 0000000000001000
Apr 28 12:45:43 my-guest kernel: R10: 0000000000000063 R11: 0000000000000246 R12: ffffffff811e448e
Apr 28 12:45:43 my-guest kernel: R13: ffff88042952bf78 R14: 00007f10c9c33280 R15: 00000000b993c9a9
Apr 28 12:45:43 my-guest kernel: FS:  00007f10c8f34940(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
Apr 28 12:45:43 my-guest kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 28 12:45:43 my-guest kernel: CR2: 00007f10c8f45000 CR3: 0000000425f27000 CR4: 00000000000006e0
Apr 28 12:45:43 my-guest kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 28 12:45:43 my-guest kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 28 12:45:43 my-guest kernel: 
Apr 28 12:45:43 my-guest systemd-logind: Failed to start session scope session-59439.scope: Connection timed out (null)
Apr 28 12:45:49 my-guest systemd: Failed to start Session 59439 of user apache.
Apr 28 12:45:49 my-guest systemd: Removed slice user-48.slice.
Apr 28 12:45:49 my-guest systemd: Stopping user-48.slice.
Apr 28 12:48:23 my-guest kernel: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
Apr 28 12:48:23 my-guest kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ppdev snd pcspkr parport_pc sg virtio_balloon soundcore i2c_piix4 parport ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_console virtio_net virtio_blk qxl syscopyarea sysfillrect
Apr 28 12:48:23 my-guest kernel: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
Apr 28 12:48:23 my-guest kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer ppdev snd pcspkr parport_pc sg virtio_balloon soundcore i2c_piix4 parport ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_console virtio_net virtio_blk qxl syscopyarea sysfillrect
Apr 28 12:48:23 my-guest kernel: sysimgblt drm_kms_helper ttm crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw drm virtio_pci virtio_ring libata virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod
Apr 28 12:48:23 my-guest kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G             L ------------   3.10.0-327.22.2.el7.x86_64 #1
Apr 28 12:48:23 my-guest kernel: Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
Apr 28 12:48:23 my-guest kernel: task: ffff880429575c00 ti: ffff8804295bc000 task.ti: ffff8804295bc000
Apr 28 12:48:23 my-guest kernel: RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
Apr 28 12:48:23 my-guest kernel: RSP: 0018:ffff88043fd03c90  EFLAGS: 00000206
Apr 28 12:48:23 my-guest kernel: RAX: 000000000000c638 RBX: ffff8804270f8070 RCX: 000000000000c638
Apr 28 12:48:23 my-guest kernel: RDX: 000000000000c63c RSI: 000000000000c63c RDI: ffffffffa03fd478
Apr 28 12:48:23 my-guest kernel: RBP: ffff88043fd03c90 R08: 000000009e173c29 R09: 00000000c1f6ce69
Apr 28 12:48:23 my-guest kernel: R10: ffff8801195d11f0 R11: 0000000000000005 R12: ffff88043fd03c08
Apr 28 12:48:23 my-guest kernel: R13: ffffffff816474dd R14: ffff88043fd03c90 R15: ffff88043fd0d200
Apr 28 12:48:23 my-guest kernel: FS:  0000000000000000(0000) GS:ffff88043fd00000(0000) knlGS:0000000000000000
Apr 28 12:48:23 my-guest kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 28 12:48:23 my-guest kernel: CR2: 00007f396aa9a838 CR3: 000000000194a000 CR4: 00000000000006e0
Apr 28 12:48:23 my-guest kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 28 12:48:23 my-guest kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 28 12:48:23 my-guest kernel: Stack:
Apr 28 12:48:23 my-guest kernel: sysimgblt
Apr 28 12:48:23 my-guest kernel: ffff88043fd03cc0 ffffffff81058605
Apr 28 12:48:23 my-guest kernel: drm_kms_helper ttm crct10dif_pclmul crct10dif_common crc32c_intel ata_piix serio_raw
Apr 28 12:48:23 my-guest kernel: ffffffffa03fd478
Apr 28 12:48:23 my-guest kernel: drm
Apr 28 12:48:23 my-guest kernel: virtio_pci virtio_ring
Apr 28 12:48:23 my-guest kernel: ffff8802f030a0c0
Apr 28 12:48:23 my-guest kernel: ffff8801195d1180 ffffffffa03d24b0
Apr 28 12:48:23 my-guest kernel: libata
Apr 28 12:48:23 my-guest kernel: virtio
Apr 28 12:48:23 my-guest kernel: i2c_core
Apr 28 12:48:23 my-guest kernel: floppy
Apr 28 12:48:23 my-guest kernel: ffff88043fd03d20 ffffffff81057d41
Apr 28 12:48:23 my-guest kernel: dm_mirror
Apr 28 12:48:23 my-guest kernel: dm_region_hash
Apr 28 12:48:23 my-guest kernel: dm_log
Apr 28 12:48:23 my-guest kernel: dm_mod
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: 0000000000000005 ffff8801195d11f0 00000000c1f6ce69 000000009e173c29
Apr 28 12:48:23 my-guest kernel: Call Trace:
Apr 28 12:48:23 my-guest kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G             L ------------   3.10.0-327.22.2.el7.x86_64 #1
Apr 28 12:48:23 my-guest kernel: Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.el7 04/01/2014
Apr 28 12:48:23 my-guest kernel: task: ffffffff81951440 ti: ffffffff8193c000 task.ti: ffffffff8193c000
Apr 28 12:48:23 my-guest kernel: RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
Apr 28 12:48:23 my-guest kernel: RSP: 0018:ffff88043fc03c90  EFLAGS: 00000206
Apr 28 12:48:23 my-guest kernel: RAX: 000000000000c638 RBX: 00000000000003ff RCX: 000000000000c638
Apr 28 12:48:23 my-guest kernel: RDX: 000000000000c63a RSI: 000000000000c63a RDI: ffffffffa03fd478
Apr 28 12:48:23 my-guest kernel: RBP: ffff88043fc03c90 R08: 00000000905c37f6 R09: 0000000017e26e26
Apr 28 12:48:23 my-guest kernel: R10: ffff880101e6e570 R11: 0000000000000005 R12: ffff88043fc03c08
Apr 28 12:48:23 my-guest kernel: R13: ffffffff816474dd R14: ffff88043fc03c90 R15: ffff88043fc0d200
Apr 28 12:48:23 my-guest kernel: FS:  0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
Apr 28 12:48:23 my-guest kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 28 12:48:23 my-guest kernel: CR2: 00007f396c4bb000 CR3: 00000004261e1000 CR4: 00000000000006f0
Apr 28 12:48:23 my-guest kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 28 12:48:23 my-guest kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 28 12:48:23 my-guest kernel: Stack:
Apr 28 12:48:23 my-guest kernel: <IRQ> 
Apr 28 12:48:23 my-guest kernel: ffff88043fc03cc0
Apr 28 12:48:23 my-guest kernel: [<ffffffff81058605>] kvm_lock_spinning+0xb5/0xc0
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff81057d41>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Apr 28 12:48:23 my-guest kernel: ffffffff81058605 ffffffffa03fd478 ffff880424d52540
Apr 28 12:48:23 my-guest kernel: [<ffffffff8163db1c>] ? _raw_spin_lock_bh+0x4c/0x50
Apr 28 12:48:23 my-guest kernel: ffff880101e6e500
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03f9257>] nf_nat_cleanup_conntrack+0x37/0x70 [nf_nat]
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: ffffffffa03d24b0
Apr 28 12:48:23 my-guest kernel: ffff88043fc03d20 ffffffff81057d41
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03da705>] __nf_ct_ext_destroy+0x45/0x60 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: 0000000000000005 ffff880101e6e570 0000000017e26e26 00000000905c37f6
Apr 28 12:48:23 my-guest kernel: Call Trace:
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d1365>] nf_conntrack_free+0x25/0x60 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: <IRQ> 
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d2ca0>] destroy_conntrack+0x90/0xe0 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8155d197>] nf_conntrack_destroy+0x17/0x20
Apr 28 12:48:23 my-guest kernel: [<ffffffff81058605>] kvm_lock_spinning+0xb5/0xc0
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d23cd>] nf_ct_delete+0x18d/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff810a86a1>] ? run_posix_cpu_timers+0x51/0x840
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24c2>] death_by_timeout+0x12/0x20 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8108b0a6>] call_timer_fn+0x36/0x110
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff81057d41>] __raw_callee_save_kvm_lock_spinning+0x11/0x20
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8163db1c>] ? _raw_spin_lock_bh+0x4c/0x50
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03f9257>] nf_nat_cleanup_conntrack+0x37/0x70 [nf_nat]
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03da705>] __nf_ct_ext_destroy+0x45/0x60 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8108dd97>] run_timer_softirq+0x237/0x340
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d1365>] nf_conntrack_free+0x25/0x60 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff81084b0f>] __do_softirq+0xef/0x280
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d2ca0>] destroy_conntrack+0x90/0xe0 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8164819c>] call_softirq+0x1c/0x30
Apr 28 12:48:23 my-guest kernel: [<ffffffff8155d197>] nf_conntrack_destroy+0x17/0x20
Apr 28 12:48:23 my-guest kernel: [<ffffffff81016fc5>] do_softirq+0x65/0xa0
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d23cd>] nf_ct_delete+0x18d/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff81084ea5>] irq_exit+0x115/0x120
Apr 28 12:48:23 my-guest kernel: [<ffffffff810a86a1>] ? run_posix_cpu_timers+0x51/0x840
Apr 28 12:48:23 my-guest kernel: [<ffffffff81648e15>] smp_apic_timer_interrupt+0x45/0x60
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff816474dd>] apic_timer_interrupt+0x6d/0x80
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24c2>] death_by_timeout+0x12/0x20 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: <EOI> 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
Apr 28 12:48:23 my-guest kernel: [<ffffffff8108b0a6>] call_timer_fn+0x36/0x110
Apr 28 12:48:23 my-guest kernel: [<ffffffffa03d24b0>] ? nf_ct_delete+0x270/0x270 [nf_conntrack]
Apr 28 12:48:23 my-guest kernel: [<ffffffff8101dbff>] default_idle+0x1f/0xc0
Apr 28 12:48:23 my-guest kernel: [<ffffffff8101e506>] arch_cpu_idle+0x26/0x30
Apr 28 12:48:23 my-guest kernel: [<ffffffff8108dd97>] run_timer_softirq+0x237/0x340
Apr 28 12:48:23 my-guest kernel: [<ffffffff81084b0f>] __do_softirq+0xef/0x280
Apr 28 12:48:23 my-guest kernel: [<ffffffff810d6445>] cpu_startup_entry+0x245/0x290
Apr 28 12:48:23 my-guest kernel: [<ffffffff8164819c>] call_softirq+0x1c/0x30
Apr 28 12:48:23 my-guest kernel: [<ffffffff8104762a>] start_secondary+0x1ba/0x230
Apr 28 12:48:23 my-guest kernel: [<ffffffff81016fc5>] do_softirq+0x65/0xa0
Apr 28 12:48:23 my-guest kernel: Code: 00 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81084ea5>] irq_exit+0x115/0x120
Apr 28 12:48:23 my-guest kernel: [<ffffffff81648e15>] smp_apic_timer_interrupt+0x45/0x60
Apr 28 12:48:23 my-guest kernel: 00 00 
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff816474dd>] apic_timer_interrupt+0x6d/0x80
Apr 28 12:48:23 my-guest kernel: <EOI> 
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
Apr 28 12:48:23 my-guest kernel: 00 00 
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff8101dbff>] default_idle+0x1f/0xc0
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff8101e506>] arch_cpu_idle+0x26/0x30
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff810d6445>] cpu_startup_entry+0x245/0x290
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81625d37>] rest_init+0x77/0x80
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81a8d057>] start_kernel+0x429/0x44a
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81a8ca37>] ? repair_env_string+0x5c/0x5c
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81a8c120>] ? early_idt_handlers+0x120/0x120
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81a8c5ee>] x86_64_start_reservations+0x2a/0x2c
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: [<ffffffff81a8c742>] x86_64_start_kernel+0x152/0x175
Apr 28 12:48:23 my-guest kernel: Code: 
Apr 28 12:48:23 my-guest kernel: 55 48 89 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: e5 fa 5d 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 55 
Apr 28 12:48:23 my-guest kernel: 48 
Apr 28 12:48:23 my-guest kernel: 89 
Apr 28 12:48:23 my-guest kernel: e5 
Apr 28 12:48:23 my-guest kernel: fa 
Apr 28 12:48:23 my-guest kernel: 5d 
Apr 28 12:48:23 my-guest kernel: c3 
Apr 28 12:48:23 my-guest kernel: 66 
Apr 28 12:48:23 my-guest kernel: 0f 
Apr 28 12:48:23 my-guest kernel: 1f 
Apr 28 12:48:23 my-guest kernel: 84 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 55 
Apr 28 12:48:23 my-guest kernel: 48 
Apr 28 12:48:23 my-guest kernel: 89 
Apr 28 12:48:23 my-guest kernel: e5 
Apr 28 12:48:23 my-guest kernel: fb 
Apr 28 12:48:23 my-guest kernel: 5d 
Apr 28 12:48:23 my-guest kernel: c3 
Apr 28 12:48:23 my-guest kernel: 66 
Apr 28 12:48:23 my-guest kernel: 0f 
Apr 28 12:48:23 my-guest kernel: 1f 
Apr 28 12:48:23 my-guest kernel: 84 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 55 
Apr 28 12:48:23 my-guest kernel: 48 
Apr 28 12:48:23 my-guest kernel: 89 
Apr 28 12:48:23 my-guest kernel: e5 
Apr 28 12:48:23 my-guest kernel: fb 
Apr 28 12:48:23 my-guest kernel: f4 
Apr 28 12:48:23 my-guest kernel: <5d> 
Apr 28 12:48:23 my-guest kernel: c3 
Apr 28 12:48:23 my-guest kernel: 0f 
Apr 28 12:48:23 my-guest kernel: 1f 
Apr 28 12:48:23 my-guest kernel: 84 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 00 
Apr 28 12:48:23 my-guest kernel: 55 
Apr 28 12:48:23 my-guest kernel: 48 
Apr 28 12:48:23 my-guest kernel: 89 
Apr 28 12:48:23 my-guest kernel: e5 
Apr 28 12:48:23 my-guest kernel: f4 
Apr 28 12:48:23 my-guest kernel: 5d 
Apr 28 12:48:23 my-guest kernel: c3 
Apr 28 12:48:23 my-guest kernel: 66 
Apr 28 12:48:23 my-guest kernel: 0f 
Apr 28 12:48:23 my-guest kernel: 1f 
Apr 28 12:48:23 my-guest kernel: 84 
Apr 28 12:48:23 my-guest kernel: 
Apr 28 12:48:23 my-guest kernel: c3 
Apr 28 12:48:23 my-guest kernel: 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84 

[irrelevant logs]

[guest freezed without any additional message]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: "BUG: soft lockup" and frozen guest
  2019-04-29  6:35 "BUG: soft lockup" and frozen guest Christopher Pereira
@ 2019-04-29 11:56 ` Vitaly Kuznetsov
  2019-04-29 23:14   ` Christopher Pereira
  0 siblings, 1 reply; 4+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-29 11:56 UTC (permalink / raw)
  To: Christopher Pereira; +Cc: kvm

Christopher Pereira <kripper@imatronix.cl> writes:

> Hi,
>
> I have been experiencing some random guest crashes in the last years and 
> would like to invest some time in trying to debug them with your help.
>
> Symptom is:
>
> 1) "BUG: soft lockup" & "CPU#* stuck for *s!" messages during high load 
> on the guest
> 2) At some point later (eg. 12 hours later), the guest just hangs 
> without any message and must be destroyed / rebooted.
>
> I attached the relevant kernel messages.
>
> Host (spec: Intel(R) Xeon(R) CPU E5645) is running:
>
>     kernel-3.10.0-327.el7.x86_64
>     libvirt-daemon-kvm-1.2.17-13.el7_2.5.x86_64
>     qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64
>     qemu-kvm-common-ev-2.3.0-31.el7_2.10.1.x86_64

This is pretty old stuff, e.g. kernel-3.10.0-327.el7 was release with
RHEL-7.2 (Nov 2015). As this is upstream mailing list, it would be great
if you could build an upstream kernel (should work with EL7 userspace)
and try to reproduce.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: "BUG: soft lockup" and frozen guest
  2019-04-29 11:56 ` Vitaly Kuznetsov
@ 2019-04-29 23:14   ` Christopher Pereira
  2019-04-30 10:49     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Pereira @ 2019-04-29 23:14 UTC (permalink / raw)
  To: Vitaly Kuznetsov; +Cc: kvm


On April 29, 2019 7:56:44 AM AST, Vitaly Kuznetsov <vkuznets@redhat.com> 
wrote:

    Christopher Pereira <kripper@imatronix.cl> writes:

        Hi, I have been experiencing some random guest crashes in the
        last years and would like to invest some time in trying to debug
        them with your help. Symptom is: 1) "BUG: soft lockup" & "CPU#*
        stuck for *s!" messages during high load on the guest 2) At some
        point later (eg. 12 hours later), the guest just hangs without
        any message and must be destroyed / rebooted. I attached the
        relevant kernel messages. Host (spec: Intel(R) Xeon(R) CPU
        E5645) is running: kernel-3.10.0-327.el7.x86_64
        libvirt-daemon-kvm-1.2.17-13.el7_2.5.x86_64
        qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64
        qemu-kvm-common-ev-2.3.0-31.el7_2.10.1.x86_64 


    This is pretty old stuff, e.g. kernel-3.10.0-327.el7 was release with
    RHEL-7.2 (Nov 2015). As this is upstream mailing list, it would be great
    if you could build an upstream kernel (should work with EL7 userspace)
    and try to reproduce.

Hi Vitaly,

Yes, but it's a critical production environment and I haven't seen any 
related patch in the kernel changelog since 3.10. We will try to upgrade 
whenever possible.

I believe this bug could be related to overcommitting resources. Does 
qemu-kvm throw any log message when resources are overcommited? Is there 
some way to enable this?

We have seen this happening one in a while in the last 4 years on 
different production hardware and wanted to ask if this is a common 
issue and how to address/debug this issue.

Best regards.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: "BUG: soft lockup" and frozen guest
  2019-04-29 23:14   ` Christopher Pereira
@ 2019-04-30 10:49     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 4+ messages in thread
From: Vitaly Kuznetsov @ 2019-04-30 10:49 UTC (permalink / raw)
  To: Christopher Pereira; +Cc: kvm

Christopher Pereira <kripper@imatronix.cl> writes:

> On April 29, 2019 7:56:44 AM AST, Vitaly Kuznetsov <vkuznets@redhat.com> 
> wrote:
>
>     Christopher Pereira <kripper@imatronix.cl> writes:
>
>         Hi, I have been experiencing some random guest crashes in the
>         last years and would like to invest some time in trying to debug
>         them with your help. Symptom is: 1) "BUG: soft lockup" & "CPU#*
>         stuck for *s!" messages during high load on the guest 2) At some
>         point later (eg. 12 hours later), the guest just hangs without
>         any message and must be destroyed / rebooted. I attached the
>         relevant kernel messages. Host (spec: Intel(R) Xeon(R) CPU
>         E5645) is running: kernel-3.10.0-327.el7.x86_64
>         libvirt-daemon-kvm-1.2.17-13.el7_2.5.x86_64
>         qemu-kvm-ev-2.3.0-31.el7_2.10.1.x86_64
>         qemu-kvm-common-ev-2.3.0-31.el7_2.10.1.x86_64 
>
>
>     This is pretty old stuff, e.g. kernel-3.10.0-327.el7 was release with
>     RHEL-7.2 (Nov 2015). As this is upstream mailing list, it would be great
>     if you could build an upstream kernel (should work with EL7 userspace)
>     and try to reproduce.
>
> Hi Vitaly,
>
> Yes, but it's a critical production environment and I haven't seen any 
> related patch in the kernel changelog since 3.10. We will try to upgrade 
> whenever possible.

It's hard to tell which changes may be related here (as, for example, I
also see nf_conntrack_* in your trace and the bug may as well be there)
but even in RHEL-7.2 updates (kernel-3.10.0-327.*) I can see several
dozed KVM commits (and there's several hundred between 7.2 and 7.6).

>
> I believe this bug could be related to overcommitting resources. Does 
> qemu-kvm throw any log message when resources are overcommited? Is there 
> some way to enable this?
>
> We have seen this happening one in a while in the last 4 years on 
> different production hardware and wanted to ask if this is a common 
> issue and how to address/debug this issue.


Define "resources" and "overcommit" ;-) In case you overcommit
CPUs/memory severily (like dozens/hundereds of vCPUs per pCPU, host
constantly swapping) guests may, of course, start to misbehave. In case
it is just a couple of vCPU per pCPU and the host is not swapping
guest softlockups are not normal.

In case there's no way to trigger the issue you may want to enable kdump
and set

sysctl -w kernel.softlockup_panic=1
sysctl -w kernel.softlockup_all_cpu_backtrace=1

and then inspect the crash dump.

-- 
Vitaly

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-04-30 10:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-29  6:35 "BUG: soft lockup" and frozen guest Christopher Pereira
2019-04-29 11:56 ` Vitaly Kuznetsov
2019-04-29 23:14   ` Christopher Pereira
2019-04-30 10:49     ` Vitaly Kuznetsov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.