* both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
@ 2009-09-02 17:23 Nikola Ciprich
  2009-09-02 19:21 ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Nikola Ciprich @ 2009-09-02 17:23 UTC (permalink / raw)
  To: KVM list; +Cc: nikola.ciprich

Hello,
we're having a problem with one of our kvm guests. Following the storage summary mail Christopher
sent a few days ago, I switched the disk model to SCSI, which should be the safest choice.
But now we've stumbled upon a problem: the whole guest crashes when we run one
specific postgres query which loads it heavily.
I can reproduce this problem 100% on both our production nodes (8 cores, 16 GB RAM each);
on my testing machine (4 cores, 3 GB RAM) it causes the HOST to reboot (which is even worse).
We haven't experienced this problem with virtio.
I tried both qemu-kvm-0.10.6 and kvm-88; the host is running 2.6.30.5, and I tried 2.6.29.x and
2.6.30.5 for the guest.

Guest backtrace follows (obtained via netconsole):
[ 1564.795629] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
[ 1564.797727] IP: [<ffffffffa018beff>] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
[ 1564.798969] PGD 2d5a2067 PUD 2d4ad067 PMD 0
[ 1564.799505] Oops: 0000 [#1] PREEMPT SMP
[ 1564.800773] last sysfs file: /sys/block/md0/dev
[ 1564.800773] CPU 3
[ 1564.800773] Modules linked in: netconsole ipv6 reiserfs dm_mirror dm_region_hash dm_log dm_mod video backlight output sbs sbshc fan container battery ac parport_pc lp parport nvram sg sr_mod cdrom thermal processor thermal_sys button virtio_balloon i2c_piix4 e1000 i2c_core piix sym53c8xx scsi_transport_spi pcspkr virtio_blk virtio_net virtio_pci virtio_ring virtio 8139cp mii bitrev pata_acpi ide_pci_generic ide_core ata_piix ata_generic libata sd_mod scsi_mod ext4 jbd2 crc32 crc16 uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[ 1564.800773] Pid: 0, comm: swapper Not tainted 2.6.30lb.06 #1
[ 1564.800773] RIP: 0010:[<ffffffffa018beff>]  [<ffffffffa018beff>] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
[ 1564.800773] RSP: 0018:ffff88000244edf8  EFLAGS: 00010087
[ 1564.800773] RAX: 000000000000000b RBX: 000000000000000b RCX: 0000000000000084
[ 1564.800773] RDX: ffff88002e8dc800 RSI: 000000002d46b168 RDI: ffffc2000021a006
[ 1564.800773] RBP: ffff88000244ee68 R08: ffffffffa00b243c R09: 00017fd30f1b243c
[ 1564.800773] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88002d46b168
[ 1564.800773] R13: 0000000000000000 R14: ffff88002d46b000 R15: 0000000000000000
[ 1564.800773] FS:  0000000000000000(0000) GS:ffff88000244b000(0000) knlGS:0000000000000000
[ 1564.800773] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 1564.800773] CR2: 0000000000000358 CR3: 000000002dd34000 CR4: 00000000000006e0
[ 1564.800773] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1564.800773] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1564.800773] Process swapper (pid: 0, threadinfo ffff88002f8c2000, task ffff88002f8c8000)
[ 1564.800773] Stack:
[ 1564.800773]  0000000000000040 ffff88000244eec0 0000000000000001 0000000000000000
[ 1564.800773]  ffff88000245bfc0 0000000000000046 0000000000000000 0000000000000001
[ 1564.800773]  0000000000000000 0000000000000001 0000000000000000 0000000000000000
[ 1564.800773] Call Trace:
[ 1564.800773]  <IRQ>  [<ffffffffa018e11c>] sym_interrupt+0x43c/0x720 [sym53c8xx]
[ 1564.800773]  [<ffffffffa0186b61>] sym53c8xx_intr+0x41/0x90 [sym53c8xx]
[ 1564.800773]  [<ffffffff8028ecac>] handle_IRQ_event+0x4c/0x1c0
[ 1564.800773]  [<ffffffff80290eb9>] handle_fasteoi_irq+0x89/0x110
[ 1564.800773]  [<ffffffff8020e05f>] handle_irq+0x1f/0x30
[ 1564.800773]  [<ffffffff8020d81e>] do_IRQ+0x6e/0x100
[ 1564.800773]  [<ffffffff8020bd53>] ret_from_intr+0x0/0xa
[ 1564.800773]  <EOI>  [<ffffffff802130c5>] ? default_idle+0x45/0x150
[ 1564.800773]  [<ffffffff8025da81>] ? atomic_notifier_call_chain+0x11/0x20
[ 1564.800773]  [<ffffffff8020a20e>] ? cpu_idle+0x4e/0xa0
[ 1564.800773]  [<ffffffff804e5aa3>] ? start_secondary+0x173/0x1d0
[ 1564.800773] Code: ad 07 00 00 2c 01 0f 84 5d 09 00 00 0f ae f8 49 8b b6 50 05 00 00 41 8b be 60 06 00 00 48 83 c6 2c e8 e6 64 1f e0 e9 4d fe ff ff 8b 97 58 03 00 00 48 8b 82 80 00 00 00 4c 8b a0 c8 00 00
[ 1564.878205] RIP  [<ffffffffa018beff>] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
[ 1564.878205]  RSP <ffff88000244edf8>
[ 1564.878205] CR2: 0000000000000358
[ 1564.878205] ---[ end trace 1af2f1cc92726b74 ]---
[ 1564.878205] Kernel panic - not syncing: Fatal exception in interrupt
[ 1564.878205] Pid: 0, comm: swapper Tainted: G      D    2.6.30lb.06 #1
[ 1564.878205] Call Trace:
[ 1564.878205]  <IRQ>  [<ffffffff80240cda>] panic+0xaa/0x180
[ 1564.878205]  [<ffffffff804ecee0>] ? _spin_unlock_irqrestore+0x50/0x60
[ 1564.878205]  [<ffffffff8024146f>] ? release_console_sem+0x1df/0x200
[ 1564.878205]  [<ffffffff80241707>] ? console_unblank+0x67/0x80
[ 1564.878205]  [<ffffffff8024095e>] ? print_oops_end_marker+0x1e/0x20
[ 1564.878205]  [<ffffffff8020f74f>] oops_end+0x9f/0xb0
[ 1564.878205]  [<ffffffff8022871a>] no_context+0x15a/0x250
[ 1564.878205]  [<ffffffff80232730>] ? __enqueue_entity+0x80/0x90
[ 1564.878205]  [<ffffffff802289ab>] __bad_area_nosemaphore+0xdb/0x1c0
[ 1564.878205]  [<ffffffff80234499>] ? find_busiest_group+0x1a9/0x940
[ 1564.878205]  [<ffffffff80228b2e>] bad_area_nosemaphore+0xe/0x10
[ 1564.878205]  [<ffffffff80228e4b>] do_page_fault+0xab/0x270
[ 1564.878205]  [<ffffffff804ed36f>] page_fault+0x1f/0x30
[ 1564.878205]  [<ffffffffa018beff>] ? sym_int_sir+0x2bf/0x1590 [sym53c8xx]
[ 1564.878205]  [<ffffffffa018bc9e>] ? sym_int_sir+0x5e/0x1590 [sym53c8xx]
[ 1564.878205]  [<ffffffffa018e11c>] sym_interrupt+0x43c/0x720 [sym53c8xx]
[ 1564.878205]  [<ffffffffa0186b61>] sym53c8xx_intr+0x41/0x90 [sym53c8xx]
[ 1564.878205]  [<ffffffff8028ecac>] handle_IRQ_event+0x4c/0x1c0
[ 1564.878205]  [<ffffffff80290eb9>] handle_fasteoi_irq+0x89/0x110
[ 1564.878205]  [<ffffffff8020e05f>] handle_irq+0x1f/0x30
[ 1564.878205]  [<ffffffff8020d81e>] do_IRQ+0x6e/0x100
[ 1564.878205]  [<ffffffff8020bd53>] ret_from_intr+0x0/0xa
[ 1564.878205]  <EOI>  [<ffffffff802130c5>] ? default_idle+0x45/0x150
[ 1564.878205]  [<ffffffff8025da81>] ? atomic_notifier_call_chain+0x11/0x20
[ 1564.878205]  [<ffffffff8020a20e>] ? cpu_idle+0x4e/0xa0

Any idea what could be wrong?
I'll gladly provide any further information that might help track this bug down...
thanks a lot
regards
nik


-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-02 17:23 both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage Nikola Ciprich
@ 2009-09-02 19:21 ` Avi Kivity
  2009-09-02 21:14   ` Nikola Ciprich
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-09-02 19:21 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: KVM list, nikola.ciprich

On 09/02/2009 08:23 PM, Nikola Ciprich wrote:
> Hello,
> we're having a problem with one of our kvm guests. Following the storage summary mail Christopher
> sent a few days ago, I switched the disk model to SCSI, which should be the safest choice.
> But now we've stumbled upon a problem: the whole guest crashes when we run one
> specific postgres query which loads it heavily.
>    

scsi is hardly tested, so this isn't surprising.

> I can reproduce this problem 100% on both our production nodes (8 cores, 16 GB RAM each);
> on my testing machine (4 cores, 3 GB RAM) it causes the HOST to reboot (which is even worse).
> We haven't experienced this problem with virtio.
>    

AMD or Intel?  Uni or smp guests?

The host crash is of course more worrying.  Can you capture dmesg?

> I tried both qemu-kvm-0.10.6 and kvm-88; the host is running 2.6.30.5, and I tried 2.6.29.x and
> 2.6.30.5 for the guest.
>
> Guest backtrace follows (obtained via netconsole):
> [ 1564.795629] BUG: unable to handle kernel NULL pointer dereference at 0000000000000358
> [ 1564.797727] IP: [<ffffffffa018beff>] sym_int_sir+0x2bf/0x1590 [sym53c8xx]
>    

It's in the scsi driver, so it's probable our scsi emulation is broken.  
Or maybe (unlikely) a bug in the driver.
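
The fault address is telling here: CR2 is 0x358, a small offset rather than a wild value, which is the classic signature of reading a structure member through a NULL base pointer somewhere in the interrupt path. A minimal, purely illustrative C sketch of that failure shape (the struct and field names are invented for the example, not taken from sym53c8xx):

   #include <stdio.h>
   #include <stddef.h>

   /* Illustrative only: a struct whose member happens to live 0x358
    * bytes from the start.  Dereferencing that member through a NULL
    * pointer faults at virtual address 0x358 -- the CR2 value above. */
   struct example {
           char pad[0x358];
           int  field;
   };

   int main(void)
   {
           printf("offsetof(field) = %#zx\n", offsetof(struct example, field));
           return 0;
   }

So whatever object sym_int_sir() is poking at was NULL when this interrupt came in, which fits either bogus state handed over by the emulated controller or a driver path that skipped a check.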

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-02 19:21 ` Avi Kivity
@ 2009-09-02 21:14   ` Nikola Ciprich
  2009-09-02 21:42     ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Nikola Ciprich @ 2009-09-02 21:14 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, nikola.ciprich

Hello Avi, thanks for the reply.

> scsi is hardly tested, so this isn't surprising.
well, but if it's still the safest way to go, then there aren't many other
choices, are there?

> AMD or Intel?  Uni or smp guests?
Intel, SMP guests. Sorry for not reporting this right away.


> The host crash is of course more worrying.  Can you capture dmesg?
there's absolutely no output, just immediate reboot :(

>
> It's in the scsi driver, so it's probable our scsi emulation is broken.   
> Or maybe (unlikely) a bug in the driver.
Well, I'd also guess it's in the emulation, as the driver is quite old
and should be well tested by now.
Any hints on how I could further debug it?
thanks a lot once again
nik

-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-02 21:14   ` Nikola Ciprich
@ 2009-09-02 21:42     ` Avi Kivity
  2009-09-03  9:59       ` Nikola Ciprich
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-09-02 21:42 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: KVM list, nikola.ciprich

On 09/03/2009 12:14 AM, Nikola Ciprich wrote:
> Hello Avi, thanks for the reply.
>
>    
>> scsi is hardly tested, so this isn't surprising.
>>      
> well, but if it's still the safest way to go, then there aren't many other
> choices, are there?
>
>    

If it crashes, it isn't the safest way.

>> AMD or Intel?  Uni or smp guests?
>>      
> Intel, SMP guests. Sorry for not reporting this right away.
>
>
>    
>> The host crash is of course more worrying.  Can you capture dmesg?
>>      
> there's absolutely no output, just immediate reboot :(
>
>    

Not even through netconsole?  It's on just one host, right?

>> It's in the scsi driver, so it's probable our scsi emulation is broken.
>> Or maybe (unlikely) a bug in the driver.
>>      
> Well, I'd also guess it's in the emulation, as the driver is quite old
> and should be well tested by now.
> Any hints on how I could further debug it?
>
>    

Add printk()s in the driver and see why it's confused.  Maybe it will 
tell us something about the bug in the device.
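
A minimal sketch of the sort of instrumentation meant here, assuming it is compiled into the sym53c8xx driver; the helper name, its arguments, and the suggested call site are invented for illustration rather than taken from the real driver source:

   #include <linux/kernel.h>        /* printk(), KERN_DEBUG */

   /* Illustrative debugging helper, not actual sym53c8xx code: log which
    * SIR interrupt code fired and what the per-command pointer looks
    * like before the driver dereferences it. */
   static void sym_debug_sir(const char *where, int sir_code, const void *cp)
   {
           printk(KERN_DEBUG "sym53c8xx: %s: sir=0x%x cp=%p\n",
                  where, sir_code, cp);
   }

Called with whatever interrupt code and command pointer sym_int_sir() has in scope, just before the dereference that oopses, it would show which interrupt the emulated controller raised and where the NULL pointer enters the picture.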


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-02 21:42     ` Avi Kivity
@ 2009-09-03  9:59       ` Nikola Ciprich
  2009-09-03 10:05         ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Nikola Ciprich @ 2009-09-03  9:59 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, nikola.ciprich

I tried running the test on my test machine with a 2.6.31-rc8 guest kernel
and I wasn't able to reproduce the problem...
So it's possible that the problem was fixed somewhere between
2.6.30..2.6.31-rc8.
Do you have any idea what the related fix could be? It would be a good
candidate for the 2.6.30.6 stable release...
I'll proceed with further testing...



-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-03  9:59       ` Nikola Ciprich
@ 2009-09-03 10:05         ` Avi Kivity
  2009-09-03 10:18           ` Nikola Ciprich
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2009-09-03 10:05 UTC (permalink / raw)
  To: Nikola Ciprich; +Cc: KVM list, nikola.ciprich

On 09/03/2009 12:59 PM, Nikola Ciprich wrote:
> I tried running the test on my test machine with a 2.6.31-rc8 guest kernel
> and I wasn't able to reproduce the problem...
> So it's possible that the problem was fixed somewhere between
> 2.6.30..2.6.31-rc8.
> Do you have any idea what the related fix could be? It would be a good
> candidate for the 2.6.30.6 stable release...
> I'll proceed with further testing...
>
>    

A bisect could help, though it would be a long process.  Running

   git bisect start -- arch/x86/kvm virt/kvm

will speed it up; expect around 8 tests and 2-3 full recompiles.

-- 
error compiling committee.c: too many arguments to function


* Re: both qemu-kvm-0.10.6 and kvm-88 crashing under heavy load while using scsi-backed storage
  2009-09-03 10:05         ` Avi Kivity
@ 2009-09-03 10:18           ` Nikola Ciprich
  0 siblings, 0 replies; 7+ messages in thread
From: Nikola Ciprich @ 2009-09-03 10:18 UTC (permalink / raw)
  To: Avi Kivity; +Cc: KVM list, nikola.ciprich

Yup,
I was lazily trying to avoid that :)
OK, I'll give it a try and report..

On Thu, Sep 03, 2009 at 01:05:46PM +0300, Avi Kivity wrote:
> On 09/03/2009 12:59 PM, Nikola Ciprich wrote:
>> I tried running the test on my test machine with a 2.6.31-rc8 guest kernel
>> and I wasn't able to reproduce the problem...
>> So it's possible that the problem was fixed somewhere between
>> 2.6.30..2.6.31-rc8.
>> Do you have any idea what the related fix could be? It would be a good
>> candidate for the 2.6.30.6 stable release...
>> I'll proceed with further testing...
>>
>>    
>
> A bisect could help, though it would be a long process.  Running
>
>   git bisect start -- arch/x86/kvm virt/kvm
>
> will speed it up; expect around 8 tests and 2-3 full recompiles.
>
> -- 
> error compiling committee.c: too many arguments to function
>

-- 
-------------------------------------
Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------
