* [Qemu-devel] sda abort with virtio-scsi
@ 2016-02-03 21:46 Jim Minter
  2016-02-03 23:19 ` Paolo Bonzini
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Minter @ 2016-02-03 21:46 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 10421 bytes --]

Hello,

I am hitting the following VM lockup issue running a VM with latest 
RHEL7 kernel on a host also running latest RHEL7 kernel.  FWIW I'm using 
virtio-scsi because I want to use discard=unmap.  I ran the VM as follows:

/usr/libexec/qemu-kvm -nodefaults \
   -cpu host \
   -smp 4 \
   -m 8192 \
   -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
   -device virtio-scsi-pci \
   -device scsi-disk,drive=disk1 \
   -netdev bridge,id=net0,br=br0 \
   -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
   -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
   -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
   -monitor unix:tmp/vm.sock,server,nowait \
   -device virtio-serial-pci \
   -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
   -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
   -device cirrus-vga \
   -vnc none \
   -usbdevice tablet
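
For reference, a quick way to check from inside the guest that discard/unmap
requests actually reach the virtio-scsi disk (a sketch; it assumes the root
filesystem is on sda and supports fstrim):

   lsblk -D /dev/sda   # non-zero DISC-GRAN / DISC-MAX means the disk advertises discard
   fstrim -v /         # issues unmap for free blocks and reports how much was trimmed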

The host was busyish at the time, but not excessively (IMO).  Nothing 
untoward in the host's kernel log; host storage subsystem is fine.  I 
didn't get any qemu logs this time around, but I will when the issue 
next recurs.  The VM's full kernel log is attached; here are the highlights:

sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
RSP: 0018:ffff8802341a7e98  EFLAGS: 00000286
RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
  ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
  ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
  ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
Call Trace:
  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 
48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 
0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffffffff81951440 ti: ffffffff8193c000 task.ti: ffffffff8193c000
RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
RSP: 0018:ffffffff8193fea0  EFLAGS: 00000286
RAX: 00000000ffffffed RBX: ffffffff8193c000 RCX: 0100000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
RBP: ffffffff8193fea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000b975 R12: 0000000000000000
R13: ffffffff8193c000 R14: ffffffff8193c000 R15: ffffffff8193ffa8
FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f497d2dca60 CR3: 00000001cd69b000 CR4: 00000000003407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
  ffffffff8193fec0 ffffffff8101dbcf ffffffff8193c000 ffffffff81a68260
  ffffffff8193fed0 ffffffff8101e4d6 ffffffff8193ff28 ffffffff810d62e5
  ffffffff8193ffd8 ffffffff8193c000 d30a24e273173538 b42178990e5a2834
Call Trace:
  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
  [<ffffffff81624db7>] rest_init+0x77/0x80
  [<ffffffff81a8d057>] start_kernel+0x429/0x44a
  [<ffffffff81a8ca37>] ? repair_env_string+0x5c/0x5c
  [<ffffffff81a8c120>] ? early_idt_handlers+0x120/0x120
  [<ffffffff81a8c5ee>] x86_64_start_reservations+0x2a/0x2c
  [<ffffffff81a8c742>] x86_64_start_kernel+0x152/0x175
Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 
48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 
0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
NMI backtrace for cpu 2
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffff88023417dc00 ti: ffff8802341a8000 task.ti: ffff8802341a8000
RIP: 0010:[<ffffffff81058d5a>]  [<ffffffff81058d5a>] native_write_msr_safe+0xa/0x10
RSP: 0018:ffff88023fd03d78  EFLAGS: 00000046
RAX: 0000000000000400 RBX: 0000000000000002 RCX: 0000000000000830
RDX: 0000000000000002 RSI: 0000000000000400 RDI: 0000000000000830
RBP: ffff88023fd03d78 R08: ffffffff81a658e0 R09: 00000000000003d8
R10: 0000000000000000 R11: ffff88023fd03afe R12: ffffffff81a658e0
R13: 0000000000000002 R14: 000000000000a022 R15: 0000000000000002
FS:  0000000000000000(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c2087f1200 CR3: 00000002330ba000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
  ffff88023fd03dc8 ffffffff8104fa22 0000000000000096 0000000000000002
  000800003fd03da8 0000000000000001 0000000000000002 0000000000000004
  ffff88023fd0df80 ffffffff819a6080 ffff88023fd03dd8 ffffffff8104fac3
Call Trace:
  <IRQ>
  [<ffffffff8104fa22>] __x2apic_send_IPI_mask+0xb2/0xe0
  [<ffffffff8104fac3>] x2apic_send_IPI_mask+0x13/0x20
  [<ffffffff8104b2ad>] arch_trigger_all_cpu_backtrace+0x11d/0x130
  [<ffffffff8112695d>] rcu_check_callbacks+0x5bd/0x610
  [<ffffffff810e0600>] ? tick_sched_handle.isra.14+0x60/0x60
  [<ffffffff8108e8c7>] update_process_times+0x47/0x80
  [<ffffffff810e05c5>] tick_sched_handle.isra.14+0x25/0x60
  [<ffffffff810e0641>] tick_sched_timer+0x41/0x70
  [<ffffffff810a9d32>] __hrtimer_run_queues+0xd2/0x260
  [<ffffffff810aa2d0>] hrtimer_interrupt+0xb0/0x1e0
  [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
  [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
  [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
  <EOI>
  [<ffffffff81058e96>] ? native_safe_halt+0x6/0x10
  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
Code: 00 55 89 f9 48 89 e5 0f 32 31 c9 89 c0 48 c1 e2 20 89 0e 48 09 c2 
48 89 d0 5d c3 66 0f 1f 44 00 00 55 89 f0 89 f9 48 89 e5 0f 30 <31> c0 
5d c3 66 90 55 89 f9 48 89 e5 0f 33 89 c0 48 c1 e2 20 48
NMI backtrace for cpu 3
CPU: 3 PID: 30 Comm: migration/3 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
task: ffff8802341bf300 ti: ffff880234208000 task.ti: ffff880234208000
RIP: 0010:[<ffffffff8163ca37>]  [<ffffffff8163ca37>] _raw_spin_unlock+0x17/0x30
RSP: 0018:ffff88023fd83e58  EFLAGS: 00000082
RAX: 00000000f90cf90c RBX: ffff88023fc14780 RCX: 00000000000052f8
RDX: 000000000000f90c RSI: 00000000f90ef90c RDI: ffff88023fc14780
RBP: ffff88023fd83e98 R08: ffffffff81c6ec80 R09: ffff88023fd8dbc1
R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000989680
R13: ffff880209f73a40 R14: 0000000a4dbdb2da R15: ffff8801faed1c00
FS:  0000000000000000(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91d9cab000 CR3: 0000000230c56000 CR4: 00000000003407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
  ffffffff810c4487 ffff8801faed1d00 000000004f324af7 ffff880209f73980
  ffff880209f73948 0000000000989680 ffff880209f73a40 0000000a4dbdb2da
  ffff88023fd83ed0 ffffffff810c46d7 ffff880209f73980 ffff88023fd8d960
Call Trace:
  <IRQ>
  [<ffffffff810c4487>] ? distribute_cfs_runtime+0x87/0x100
  [<ffffffff810c46d7>] sched_cfs_period_timer+0x107/0x140
  [<ffffffff810c45d0>] ? sched_cfs_slack_timer+0xd0/0xd0
  [<ffffffff810a9d32>] __hrtimer_run_queues+0xd2/0x260
  [<ffffffff810aa2d0>] hrtimer_interrupt+0xb0/0x1e0
  [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
  [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
  [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
  <EOI>
  [<ffffffff81064cba>] ? native_set_fixmap+0x3a/0x40
  [<ffffffff811031f5>] ? multi_cpu_stop+0xa5/0xf0
  [<ffffffff81103150>] ? cpu_stop_should_run+0x50/0x50
  [<ffffffff811033d8>] cpu_stopper_thread+0x88/0x160
  [<ffffffff810b4268>] ? finish_task_switch+0x108/0x170
  [<ffffffff8163a228>] ? __schedule+0x2d8/0x900
  [<ffffffff810ae51f>] smpboot_thread_fn+0xff/0x1a0
  [<ffffffff8163a879>] ? schedule+0x29/0x70
  [<ffffffff810ae420>] ? lg_double_unlock+0x90/0x90
  [<ffffffff810a5aef>] kthread+0xcf/0xe0
  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
  [<ffffffff81645818>] ret_from_fork+0x58/0x90
  [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 39 d0 75 ea b8 01 00 00 00 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 
44 00 00 e9 06 00 00 00 66 83 07 02 c3 90 8b 37 f0 66 83 07 02 <f6> 47 
02 01 74 f1 55 48 89 e5 e8 fd 2c ff ff 5d c3 0f 1f 84 00

Does anyone have any quick thoughts on what the issue could be, or any 
advice on how to get further appropriate debugging information to help 
get to the bottom of it?

Thanks,

Jim

[-- Attachment #2: vm-dmesg.gz --]
[-- Type: application/gzip, Size: 37692 bytes --]


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-03 21:46 [Qemu-devel] sda abort with virtio-scsi Jim Minter
@ 2016-02-03 23:19 ` Paolo Bonzini
  2016-02-03 23:34   ` Jim Minter
  2016-02-04  6:59   ` Hannes Reinecke
  0 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-02-03 23:19 UTC (permalink / raw)
  To: Jim Minter, qemu-devel, Hannes Reinecke



On 03/02/2016 22:46, Jim Minter wrote:
> I am hitting the following VM lockup issue running a VM with latest
> RHEL7 kernel on a host also running latest RHEL7 kernel.  FWIW I'm using
> virtio-scsi because I want to use discard=unmap.  I ran the VM as follows:
> 
> /usr/libexec/qemu-kvm -nodefaults \
>   -cpu host \
>   -smp 4 \
>   -m 8192 \
>   -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>   -device virtio-scsi-pci \
>   -device scsi-disk,drive=disk1 \
>   -netdev bridge,id=net0,br=br0 \
>   -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>   -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>   -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>   -monitor unix:tmp/vm.sock,server,nowait \
>   -device virtio-serial-pci \
>   -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>   -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>   -device cirrus-vga \
>   -vnc none \
>   -usbdevice tablet
> 
> The host was busyish at the time, but not excessively (IMO).  Nothing
> untoward in the host's kernel log; host storage subsystem is fine.  I
> didn't get any qemu logs this time around, but I will when the issue
> next recurs.  The VM's full kernel log is attached; here are the
> highlights:

Hannes, were you going to send a patch to disable time outs?

> 
> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 1
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
> RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
> RSP: 0018:ffff8802341a7e98  EFLAGS: 00000286
> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Stack:
>  ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>  ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>  ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
> Call Trace:
>  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
> NMI backtrace for cpu 0 

This is the NMI watchdog firing; the CPU got stuck for 20 seconds.  The
issue was not a busy host, but a busy storage (could it be a network
partition if the disk was hosted on NFS???)

Firing the NMI watchdog is fixed in more recent QEMU, which has
asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
(try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).

Thanks,

Paolo


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-03 23:19 ` Paolo Bonzini
@ 2016-02-03 23:34   ` Jim Minter
  2016-02-04 10:23     ` Paolo Bonzini
  2016-02-04  6:59   ` Hannes Reinecke
  1 sibling, 1 reply; 12+ messages in thread
From: Jim Minter @ 2016-02-03 23:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Hannes Reinecke

Hi again, thanks for replying,

On 03/02/16 23:19, Paolo Bonzini wrote:
> On 03/02/2016 22:46, Jim Minter wrote:
>> I am hitting the following VM lockup issue running a VM with latest
>> RHEL7 kernel on a host also running latest RHEL7 kernel.  FWIW I'm using
>> virtio-scsi because I want to use discard=unmap.  I ran the VM as follows:
>>
>> /usr/libexec/qemu-kvm -nodefaults \
>>    -cpu host \
>>    -smp 4 \
>>    -m 8192 \
>>    -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>>    -device virtio-scsi-pci \
>>    -device scsi-disk,drive=disk1 \
>>    -netdev bridge,id=net0,br=br0 \
>>    -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>>    -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>>    -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>>    -monitor unix:tmp/vm.sock,server,nowait \
>>    -device virtio-serial-pci \
>>    -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>>    -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>>    -device cirrus-vga \
>>    -vnc none \
>>    -usbdevice tablet
>>
>> The host was busyish at the time, but not excessively (IMO).  Nothing
>> untoward in the host's kernel log; host storage subsystem is fine.  I
>> didn't get any qemu logs this time around, but I will when the issue
>> next recurs.  The VM's full kernel log is attached; here are the
>> highlights:
>
> Hannes, were you going to send a patch to disable time outs?
>
>>
>> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
>> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
>> RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
>> RSP: 0018:ffff8802341a7e98  EFLAGS: 00000286
>> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
>> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Stack:
>>   ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>>   ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>>   ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
>> Call Trace:
>>   [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>>   [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>>   [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>>   [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
>> NMI backtrace for cpu 0
>
> This is the NMI watchdog firing; the CPU got stuck for 20 seconds.  The
> issue was not a busy host, but a busy storage (could it be a network
> partition if the disk was hosted on NFS???)

The VM qcow2 storage is on host-local SSD, and although there's some 
competition for the host CPU and storage, it seems surprising to me that 
the VM should be starved of CPU to this extent.  I was worried there was 
some way in which the contention could cause an abort and perhaps thence 
the lockup (which does not seem to recover when the host load goes down).

> Firing the NMI watchdog is fixed in more recent QEMU, which has
> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).

/usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3 
(qemu-kvm-1.5.3-105.el7_2.3)

Cheers,

Jim


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-03 23:19 ` Paolo Bonzini
  2016-02-03 23:34   ` Jim Minter
@ 2016-02-04  6:59   ` Hannes Reinecke
  2016-02-04 11:27     ` Paolo Bonzini
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2016-02-04  6:59 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Minter, qemu-devel

On 02/04/2016 12:19 AM, Paolo Bonzini wrote:
> 
> 
> On 03/02/2016 22:46, Jim Minter wrote:
>> I am hitting the following VM lockup issue running a VM with latest
>> RHEL7 kernel on a host also running latest RHEL7 kernel.  FWIW I'm using
>> virtio-scsi because I want to use discard=unmap.  I ran the VM as follows:
>>
>> /usr/libexec/qemu-kvm -nodefaults \
>>   -cpu host \
>>   -smp 4 \
>>   -m 8192 \
>>   -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>>   -device virtio-scsi-pci \
>>   -device scsi-disk,drive=disk1 \
>>   -netdev bridge,id=net0,br=br0 \
>>   -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>>   -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>>   -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>>   -monitor unix:tmp/vm.sock,server,nowait \
>>   -device virtio-serial-pci \
>>   -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>>   -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>>   -device cirrus-vga \
>>   -vnc none \
>>   -usbdevice tablet
>>
>> The host was busyish at the time, but not excessively (IMO).  Nothing
>> untoward in the host's kernel log; host storage subsystem is fine.  I
>> didn't get any qemu logs this time around, but I will when the issue
>> next recurs.  The VM's full kernel log is attached; here are the
>> highlights:
> 
> Hannes, were you going to send a patch to disable time outs?
> 
Rah. Didn't I do it already?
Seems like I didn't; will be doing so shortly.

>>
>> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
>> sending NMI to all CPUs:
>> NMI backtrace for cpu 1
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
>> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
>> RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
>> RSP: 0018:ffff8802341a7e98  EFLAGS: 00000286
>> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
>> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
>> FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Stack:
>>  ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>>  ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>>  ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
>> Call Trace:
>>  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>>  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>>  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>>  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
>> NMI backtrace for cpu 0 
> 
> This is the NMI watchdog firing; the CPU got stuck for 20 seconds.  The
> issue was not a busy host, but a busy storage (could it be a network
> partition if the disk was hosted on NFS???)
> 
> Firing the NMI watchdog is fixed in more recent QEMU, which has
> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
> 
Actually, you still cannot do _real_ async cancellation of I/O; the
linux aio subsystem implements io_cancel(), but the cancellation
just aborts the (internal) waitqueue element, not the I/O itself.
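
For reference, a minimal user-space illustration of that limitation (a sketch
using libaio; the file name, sizes and build command are made up, and error
handling is trimmed): io_cancel() can be called, but for ordinary disk I/O the
kernel usually cannot stop a request that is already in flight, so the call
tends to fail and the original completion still arrives.

/* Sketch: submit one read with libaio and attempt to cancel it.
 * Build with something like: gcc -o cancel cancel.c -laio */
#define _GNU_SOURCE
#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	io_context_t ctx;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd;

	memset(&ctx, 0, sizeof(ctx));
	io_setup(1, &ctx);

	fd = open("testfile", O_RDONLY | O_DIRECT);	/* hypothetical file */
	posix_memalign(&buf, 512, 4096);

	io_prep_pread(&cb, fd, buf, 4096, 0);
	io_submit(ctx, 1, cbs);

	/* Usually fails (e.g. -EINVAL/-EAGAIN): the request is already
	 * queued to the device and cannot really be aborted. */
	printf("io_cancel returned %d\n", io_cancel(ctx, &cb, &ev));

	/* The original completion is still delivered. */
	io_getevents(ctx, 1, 1, &ev, NULL);
	io_destroy(ctx);
	return 0;
}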

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-03 23:34   ` Jim Minter
@ 2016-02-04 10:23     ` Paolo Bonzini
  2016-02-04 11:00       ` Denis V. Lunev
  2016-02-04 13:41       ` Jim Minter
  0 siblings, 2 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-02-04 10:23 UTC (permalink / raw)
  To: Jim Minter, qemu-devel, Hannes Reinecke



On 04/02/2016 00:34, Jim Minter wrote:
> I was worried there was
> some way in which the contention could cause an abort and perhaps thence
> the lockup (which does not seem to recover when the host load goes down).

I don't know... It's not the most tested code, but it is not very
complicated either.

The certain points that can be extracted from the kernel messages are:
1) there was a cancellation request that took a long time, >20 seconds;
2) despite taking a long time, it _did_ recover sooner or later because
otherwise you'd not have the lockup splat either.

Paolo

>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
> 
> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
> (qemu-kvm-1.5.3-105.el7_2.3)


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 10:23     ` Paolo Bonzini
@ 2016-02-04 11:00       ` Denis V. Lunev
  2016-02-04 13:41       ` Jim Minter
  1 sibling, 0 replies; 12+ messages in thread
From: Denis V. Lunev @ 2016-02-04 11:00 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Minter, qemu-devel, Hannes Reinecke; +Cc: Roman Kagan

On 02/04/2016 01:23 PM, Paolo Bonzini wrote:
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)

My $0.02: this could be related, or it could be unrelated.

Speaking of NMIs, we do observe similar problems on older AMD hosts
with a large enough number of VCPUs in the guest.  On a simple boot we
see something like the trace below (the probability of hitting the
problem is around 1 in 10).  RHEV 2.3 QEMU is used, and the same
kernel is running on the host node:

[   72.189005] Kernel panic - not syncing: softlockup: hung tasks
[   72.189005] CPU: 5 PID: 593 Comm: systemd-udevd Tainted: G             L ------------   3.10.0-327.4.4.el7.x86_64 #1
[   72.189005] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-11.vz7.2 04/01/2014
[   72.189005]  ffffffff81871a03 0000000050291887 ffff88007fd43e18 ffffffff8163515c
[   72.189005]  ffff88007fd43e98 ffffffff8162e9d7 0000000000000008 ffff88007fd43ea8
[   72.189005]  ffff88007fd43e48 0000000050291887 0000000000002710 0000000000000000
[   72.189005] Call Trace:
[   72.189005]  <IRQ>  [<ffffffff8163515c>] dump_stack+0x19/0x1b
[   72.189005]  [<ffffffff8162e9d7>] panic+0xd8/0x1e7
[   72.189005]  [<ffffffff8111b376>] watchdog_timer_fn+0x1b6/0x1c0
[   72.189005]  [<ffffffff8111b1c0>] ? watchdog_enable+0xc0/0xc0
[   72.189005]  [<ffffffff810a9d42>] __hrtimer_run_queues+0xd2/0x260
[   72.189005]  [<ffffffff810aa2e0>] hrtimer_interrupt+0xb0/0x1e0
[   72.189005]  [<ffffffff816471dc>] ? call_softirq+0x1c/0x30
[   72.189005]  [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[   72.189005]  [<ffffffff81049537>] local_apic_timer_interrupt+0x37/0x60
[   72.189005]  [<ffffffff81647e4f>] smp_apic_timer_interrupt+0x3f/0x60
[   72.189005]  [<ffffffff8164651d>] apic_timer_interrupt+0x6d/0x80
[   72.189005]  <EOI>  [<ffffffff812f27e9>] ? free_cpumask_var+0x9/0x10
[   72.189005]  [<ffffffff810e6de2>] ? smp_call_function_many+0x202/0x260
[   72.189005]  [<ffffffff81065d40>] ? flush_tlb_func+0xb0/0xb0
[   72.189005]  [<ffffffff810e6e9d>] on_each_cpu+0x2d/0x60
[   72.189005]  [<ffffffff81066119>] flush_tlb_kernel_range+0x59/0xa0
[   72.189005]  [<ffffffff811a4730>] __purge_vmap_area_lazy+0x1a0/0x210
[   72.189005]  [<ffffffff811a4927>] vm_unmap_aliases+0x187/0x1b0
[   72.189005]  [<ffffffff810620e8>] change_page_attr_set_clr+0xe8/0x4d0
[   72.189005]  [<ffffffff81131127>] ? ring_buffer_time_stamp+0x7/0x10
[   72.189005]  [<ffffffff8106299f>] set_memory_ro+0x2f/0x40
[   72.189005]  [<ffffffff8162f7c4>] set_section_ro_nx+0x3a/0x71
[   72.189005]  [<ffffffff810ed19c>] load_module+0x103c/0x1b50
[   72.189005]  [<ffffffff810e9743>] ? copy_module_from_fd.isra.42+0x53/0x150
[   72.189005]  [<ffffffff810ede66>] SyS_finit_module+0xa6/0xd0
[   72.189005]  [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
[   72.189005] Shutting down cpus with NMI


Changing time sources does not help.  There is also some strange behaviour, like this:

Having configured the serial port to point at a unix socket, and timestamping
the messages on the host side, we observe

guest timestamp | host timestamp
        0.000000 | 20:09:26.805461
        2.587056 | 20:09:30.000993
        7.607329 | 20:09:35.062367
       12.645539 | 20:09:40.057634
       22.608054 | 20:09:50.028727
       32.395499 | 20:10:00.041215
       42.571265 | 20:10:10.041960
       47.606661 | 20:10:15.028973
       48.627059 | 20:10:20.022359
       49.029059 | 20:10:25.047857
       49.399065 | 20:10:30.066884
       49.809077 | 20:10:35.036467
       58.159132 | 20:10:40.013387
       68.043371 | 20:10:40.266714

Note the anomaly around 47 seconds from boot.
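
For anyone wanting to reproduce the comparison above, one possible way to
capture and timestamp the guest serial output on the host side (a sketch;
the socket path and the shell loop are assumptions, not necessarily what we
used):

   socat -u UNIX-CONNECT:/tmp/guest-serial.sock STDOUT | \
   while IFS= read -r line; do
       printf '%s | %s\n' "$(date +%H:%M:%S.%N)" "$line"
   done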

Anyway, this story is far from over, and we are unable to provide more 
details yet.  Roma is digging into it.  Den


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04  6:59   ` Hannes Reinecke
@ 2016-02-04 11:27     ` Paolo Bonzini
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Bonzini @ 2016-02-04 11:27 UTC (permalink / raw)
  To: Hannes Reinecke, Jim Minter, qemu-devel



On 04/02/2016 07:59, Hannes Reinecke wrote:
> On 02/04/2016 12:19 AM, Paolo Bonzini wrote:
>>
>>
>> On 03/02/2016 22:46, Jim Minter wrote:
>>> I am hitting the following VM lockup issue running a VM with latest
>>> RHEL7 kernel on a host also running latest RHEL7 kernel.  FWIW I'm using
>>> virtio-scsi because I want to use discard=unmap.  I ran the VM as follows:
>>>
>>> /usr/libexec/qemu-kvm -nodefaults \
>>>   -cpu host \
>>>   -smp 4 \
>>>   -m 8192 \
>>>   -drive discard=unmap,file=vm.qcow2,id=disk1,if=none,cache=unsafe \
>>>   -device virtio-scsi-pci \
>>>   -device scsi-disk,drive=disk1 \
>>>   -netdev bridge,id=net0,br=br0 \
>>>   -device virtio-net-pci,netdev=net0,mac=$(utils/random-mac.py) \
>>>   -chardev socket,id=chan0,path=/tmp/rhev.sock,server,nowait \
>>>   -chardev socket,id=chan1,path=/tmp/qemu.sock,server,nowait \
>>>   -monitor unix:tmp/vm.sock,server,nowait \
>>>   -device virtio-serial-pci \
>>>   -device virtserialport,chardev=chan0,name=com.redhat.rhevm.vdsm \
>>>   -device virtserialport,chardev=chan1,name=org.qemu.guest_agent.0 \
>>>   -device cirrus-vga \
>>>   -vnc none \
>>>   -usbdevice tablet
>>>
>>> The host was busyish at the time, but not excessively (IMO).  Nothing
>>> untoward in the host's kernel log; host storage subsystem is fine.  I
>>> didn't get any qemu logs this time around, but I will when the issue
>>> next recurs.  The VM's full kernel log is attached; here are the
>>> highlights:
>>
>> Hannes, were you going to send a patch to disable time outs?
>>
> Rah. Didn't I do it already?
> Seems like I didn't; will be doing so shortly.
> 
>>>
>>> INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 2, t=60002 jiffies, g=5253, c=5252, q=0)
>>> sending NMI to all CPUs:
>>> NMI backtrace for cpu 1
>>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-327.4.5.el7.x86_64 #1
>>> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>> task: ffff88023417d080 ti: ffff8802341a4000 task.ti: ffff8802341a4000
>>> RIP: 0010:[<ffffffff81058e96>]  [<ffffffff81058e96>] native_safe_halt+0x6/0x10
>>> RSP: 0018:ffff8802341a7e98  EFLAGS: 00000286
>>> RAX: 00000000ffffffed RBX: ffff8802341a4000 RCX: 0100000000000000
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000046
>>> RBP: ffff8802341a7e98 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>>> R13: ffff8802341a4000 R14: ffff8802341a4000 R15: 0000000000000000
>>> FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007f4978587008 CR3: 000000003645e000 CR4: 00000000003407e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Stack:
>>>  ffff8802341a7eb8 ffffffff8101dbcf ffff8802341a4000 ffffffff81a68260
>>>  ffff8802341a7ec8 ffffffff8101e4d6 ffff8802341a7f20 ffffffff810d62e5
>>>  ffff8802341a7fd8 ffff8802341a4000 2581685d70de192c 7ba58fdb3a3bc8d4
>>> Call Trace:
>>>  [<ffffffff8101dbcf>] default_idle+0x1f/0xc0
>>>  [<ffffffff8101e4d6>] arch_cpu_idle+0x26/0x30
>>>  [<ffffffff810d62e5>] cpu_startup_entry+0x245/0x290
>>>  [<ffffffff810475fa>] start_secondary+0x1ba/0x230
>>> Code: 00 00 00 00 00 55 48 89 e5 fa 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 66 0f 1f 84
>>> NMI backtrace for cpu 0 
>>
>> This is the NMI watchdog firing; the CPU got stuck for 20 seconds.  The
>> issue was not a busy host, but a busy storage (could it be a network
>> partition if the disk was hosted on NFS???)
>>
>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>>
> Actually, you still cannot do _real_ async cancellation of I/O; the
> linux aio subsystem implements io_cancel(), but the cancellation
> just aborts the (internal) waitqueue element, not the I/O itself.

Right, but at least the TMF is asynchronous.  Synchronous TMFs keep the
VCPUs in QEMU for many seconds and cause the watchdog to fire.

Paolo


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 10:23     ` Paolo Bonzini
  2016-02-04 11:00       ` Denis V. Lunev
@ 2016-02-04 13:41       ` Jim Minter
  2016-02-04 13:54         ` Hannes Reinecke
  2016-02-08 20:02         ` Jim Minter
  1 sibling, 2 replies; 12+ messages in thread
From: Jim Minter @ 2016-02-04 13:41 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Hannes Reinecke

FWIW, I've now done:

echo 300 >/sys/block/sda/device/timeout

Not entirely sure whether it would help or not, but so far I haven't had 
a recurrence.
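
If more than one disk needs the same treatment, a small sketch (the
300-second value is just the one above, and the setting does not persist
across reboots):

   # raise the SCSI command timeout for every sd* disk in the guest
   for t in /sys/block/sd*/device/timeout; do
       echo 300 > "$t"
   done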

Cheers,

Jim

-- 
Jim Minter
Principal Solution Architect, Red Hat UK
e: jminter@redhat.com
m: +44 (0)7906 098697
cal: 
https://www.google.com/calendar/embed?src=jminter@redhat.com&ctz=Europe/London&mode=week

On 04/02/16 10:23, Paolo Bonzini wrote:
>
>
> On 04/02/2016 00:34, Jim Minter wrote:
>> I was worried there was
>> some way in which the contention could cause an abort and perhaps thence
>> the lockup (which does not seem to recover when the host load goes down).
>
> I don't know... It's not the most tested code, but it is not very
> complicated either.
>
> The certain points that can be extracted from the kernel messages are:
> 1) there was a cancellation request that took a long time, >20 seconds;
> 2) despite taking a long time, it _did_ recover sooner or later because
> otherwise you'd not have the lockup splat either.
>
> Paolo
>
>>> Firing the NMI watchdog is fixed in more recent QEMU, which has
>>> asynchronous cancellation, assuming you're running RHEL's QEMU 1.5.3
>>> (try /usr/libexec/qemu-kvm --version, or rpm -qf /usr/libexec/qemu-kvm).
>>
>> /usr/libexec/qemu-kvm --version reports QEMU emulator version 1.5.3
>> (qemu-kvm-1.5.3-105.el7_2.3)


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 13:41       ` Jim Minter
@ 2016-02-04 13:54         ` Hannes Reinecke
  2016-02-04 15:03           ` Paolo Bonzini
  2016-02-08 20:02         ` Jim Minter
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2016-02-04 13:54 UTC (permalink / raw)
  To: Jim Minter, Paolo Bonzini, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On 02/04/2016 02:41 PM, Jim Minter wrote:
> FWIW, I've now done:
> 
> echo 300 >/sys/block/sda/device/timeout
> 
> Not entirely sure whether it would help or not, but so far I haven't
> had a recurrence.
> 
Can you try the attached patch?
It should work as well, but you shouldn't need to tweak anything.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

[-- Attachment #2: 0001-virtio_scsi-Implement-eh_timed_out-callback.patch --]
[-- Type: text/x-patch, Size: 2424 bytes --]

From 330b4f056957eed60cb3cd1bfa1486ba7db96d8d Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Wed, 18 Nov 2015 10:33:13 +0100
Subject: [PATCH] virtio_scsi: Implement eh_timed_out callback

If a command timeout occurs on the host, SCSI error handling is
entered, which typically takes quite some time until the error is
resolved and the command can be returned to the guest.
This error handling typically overruns any timeout the guest might
have set on that command, causing the guest to abort the command.
However, aborting commands from the guest is tricky and, in most
cases, downright impossible.
So instead this patch implements an eh_timed_out callback for
virtio_scsi, which issues a QUERY TASK TMF. If that succeeds, the
command is still active on the host and the guest command timeout
is reset.

References: bsc#936530

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/virtio_scsi.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 595af1a..f712db5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -543,6 +543,33 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
 	return virtscsi_tmf(vscsi, cmd);
 }
 
+static enum blk_eh_timer_return virtscsi_timed_out(struct scsi_cmnd *sc)
+{
+	struct virtio_scsi *vscsi = shost_priv(sc->device->host);
+	struct virtio_scsi_cmd *cmd;
+
+	scmd_printk(KERN_INFO, sc, "timeout\n");
+	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
+	if (!cmd)
+		return FAILED;
+
+	memset(cmd, 0, sizeof(*cmd));
+	cmd->sc = sc;
+	cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
+		.type = VIRTIO_SCSI_T_TMF,
+		.subtype = VIRTIO_SCSI_T_TMF_QUERY_TASK,
+		.lun[0] = 1,
+		.lun[1] = sc->device->id,
+		.lun[2] = (sc->device->lun >> 8) | 0x40,
+		.lun[3] = sc->device->lun & 0xff,
+		.tag = (unsigned long)sc,
+	};
+	if (virtscsi_tmf(vscsi, cmd) == FAILED)
+		return BLK_EH_NOT_HANDLED;
+
+	return BLK_EH_RESET_TIMER;
+}
+
 static struct scsi_host_template virtscsi_host_template = {
 	.module = THIS_MODULE,
 	.name = "Virtio SCSI HBA",
@@ -551,6 +578,7 @@ static struct scsi_host_template virtscsi_host_template = {
 	.this_id = -1,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,
+	.eh_timed_out = virtscsi_timed_out,
 
 	.can_queue = 1024,
 	.dma_boundary = UINT_MAX,
-- 
1.8.5.6



* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 13:54         ` Hannes Reinecke
@ 2016-02-04 15:03           ` Paolo Bonzini
  2016-02-04 15:11             ` Hannes Reinecke
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2016-02-04 15:03 UTC (permalink / raw)
  To: Hannes Reinecke, Jim Minter, qemu-devel

> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 595af1a..f712db5 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -543,6 +543,33 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
>  	return virtscsi_tmf(vscsi, cmd);
>  }
>  
> +static enum blk_eh_timer_return virtscsi_timed_out(struct scsi_cmnd *sc)
> +{
> +	struct virtio_scsi *vscsi = shost_priv(sc->device->host);
> +	struct virtio_scsi_cmd *cmd;
> +
> +	scmd_printk(KERN_INFO, sc, "timeout\n");
> +	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
> +	if (!cmd)
> +		return FAILED;

Should this be BLK_EH_NOT_HANDLED?  Apart from this, the patch looks
good.  And of course this doesn't affect Jim, who can test this patch
anyway.
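
For what it's worth, the adjustment being suggested would presumably just
change the allocation-failure branch so that it returns a blk_eh_timer_return
value rather than the SCSI eh constant, along these lines (a sketch, not the
final patch):

	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
	if (!cmd)
		return BLK_EH_NOT_HANDLED;	/* was: return FAILED; */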

Thanks!

Paolo

> +
> +	memset(cmd, 0, sizeof(*cmd));
> +	cmd->sc = sc;
> +	cmd->req.tmf = (struct virtio_scsi_ctrl_tmf_req){
> +		.type = VIRTIO_SCSI_T_TMF,
> +		.subtype = VIRTIO_SCSI_T_TMF_QUERY_TASK,
> +		.lun[0] = 1,
> +		.lun[1] = sc->device->id,
> +		.lun[2] = (sc->device->lun >> 8) | 0x40,
> +		.lun[3] = sc->device->lun & 0xff,
> +		.tag = (unsigned long)sc,
> +	};
> +	if (virtscsi_tmf(vscsi, cmd) == FAILED)
> +		return BLK_EH_NOT_HANDLED;
> +
> +	return BLK_EH_RESET_TIMER;
> +}
> +
>  static struct scsi_host_template virtscsi_host_template = {
>  	.module = THIS_MODULE,
>  	.name = "Virtio SCSI HBA",
> @@ -551,6 +578,7 @@ static struct scsi_host_template virtscsi_host_template = {
>  	.this_id = -1,
>  	.eh_abort_handler = virtscsi_abort,
>  	.eh_device_reset_handler = virtscsi_device_reset,
> +	.eh_timed_out = virtscsi_timed_out,
>  
>  	.can_queue = 1024,
>  	.dma_boundary = UINT_MAX,
> -- 1.8.5.6
> 


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 15:03           ` Paolo Bonzini
@ 2016-02-04 15:11             ` Hannes Reinecke
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2016-02-04 15:11 UTC (permalink / raw)
  To: Paolo Bonzini, Jim Minter, qemu-devel

On 02/04/2016 04:03 PM, Paolo Bonzini wrote:
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 595af1a..f712db5 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -543,6 +543,33 @@ static int virtscsi_abort(struct scsi_cmnd *sc)
>>  	return virtscsi_tmf(vscsi, cmd);
>>  }
>>  
>> +static enum blk_eh_timer_return virtscsi_timed_out(struct scsi_cmnd *sc)
>> +{
>> +	struct virtio_scsi *vscsi = shost_priv(sc->device->host);
>> +	struct virtio_scsi_cmd *cmd;
>> +
>> +	scmd_printk(KERN_INFO, sc, "timeout\n");
>> +	cmd = mempool_alloc(virtscsi_cmd_pool, GFP_NOIO);
>> +	if (!cmd)
>> +		return FAILED;
> 
> Should this be BLK_EH_NOT_HANDLED?  Apart from this, the patch looks
> good.  And of course this doesn't affect Jim, who can test this patch
> anyway.
> 
Why, but of course.

I'll be sending the official patch to linux-scsi.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] sda abort with virtio-scsi
  2016-02-04 13:41       ` Jim Minter
  2016-02-04 13:54         ` Hannes Reinecke
@ 2016-02-08 20:02         ` Jim Minter
  1 sibling, 0 replies; 12+ messages in thread
From: Jim Minter @ 2016-02-08 20:02 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, Hannes Reinecke

Again FWIW:

No recurrence of the SCSI abort notices since increasing the timeout, 
but I am still getting guest userspace lockups.  The guest kernel logs 
show RCU "detected stalls" messages and NMIs being sent across the 
CPUs.  These consistently show CPU 2 sitting in the CFS scheduler via 
the timer interrupt, appearing to make some progress (i.e. RIP changes 
over time), and the other CPUs all sitting idle.  Although the guest 
kernel keeps going and keeps logging these issues, none of the guest 
userspace processes make any progress at all over several hours.

I'm upgrading to the QEMU version shipped with RHEV 
(qemu-kvm-rhev-2.3.0-31.el7_2.7) to see if that helps - so far so good. 
My best guess is that a bugfix is missing from the RHEL 7 QEMU 1.5.3 
codebase but is present upstream and in the RHEV QEMU release.

Cheers,

Jim

On 04/02/16 13:41, Jim Minter wrote:
> FWIW, I've now done:
>
> echo 300 >/sys/block/sda/device/timeout
>
> Not entirely sure whether it would help or not, but so far I haven't had
> a recurrence.
>
> Cheers,
>
> Jim
>
