kvm.vger.kernel.org archive mirror
* BUG() with SCSI-interfaced disk images
@ 2008-12-26 21:00 John Morrissey
  2009-01-07 20:19 ` John Morrissey
  2009-01-08 19:24 ` qcow2 corruption? John Morrissey
  0 siblings, 2 replies; 12+ messages in thread
From: John Morrissey @ 2008-12-26 21:00 UTC (permalink / raw)
  To: kvm

I'm encountering a kernel BUG() in guests using SCSI-interfaced disk images.
I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
behavior (disclaimer: Debian has about a dozen patches in their kvm
packaging, but they all seem to be changes to the build/install process or
security-related).

IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
lenny (32-bit/i386) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
2.6.26-12).

After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
filesystem is a reliable trigger), the kernel BUGs (oops output below).

I was previously using KVM 72, and tried upgrading to 79 because both Debian
lenny and Ubuntu hardy guests were panicking due to sym disconnects/timeouts.
79 makes the lenny guest start BUGging as described above. 82 is not
perceivably different from 79 for the lenny guest.

FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up, although
it emits:

Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current] 
Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information

at seemingly random intervals. The upgrade to 82 made the hardy guest start
BUGging on soft lockups at random intervals (I can provide the full output
if anyone's interested, but I'm much more interested in the lenny guest
oops at this point).

john


run via libvirt:
/usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
	-boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
	-net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
	-net tap,fd=17,script=,vlan=0,ifname=vnet2 \
	-net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
	-net tap,fd=18,script=,vlan=1,ifname=vnet3 \
	-serial pty -parallel none -usb -vnc 0.0.0.0:1

[The KVMWiki asks whether the problem is reproducible with -no-kvm-irqchip,
 -no-kvm-pit, or -no-kvm, but when I tried invoking the above command line
 by hand (outside of libvirt), the VNC console was always blank and there
 was no console output on the serial pty. If this would be useful
 information to have in this case, I'd love to know what I'm doing wrong, or
 if there's a way to specify additional command line arguments with
 libvirt.]

oops generated in the guest:
[  140.101828] sym0: unexpected disconnect
[  140.102748] BUG: unable to handle kernel NULL pointer dereference at 00000358
[  140.103818] IP: [<e08e2670>] :sym53c8xx:sym_int_sir+0x547/0x118f
[  140.106449] *pdpt = 000000001f5f9001 *pde = 0000000000000000 
[  140.107356] Oops: 0000 [#1] SMP 
[  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
[  140.108062] 
[  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
[  140.108062] EIP: 0060:[<e08e2670>] EFLAGS: 00010287 CPU: 0
[  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
[  140.108062] EAX: 0000000a EBX: 00000000 ECX: 1f98c084 EDX: 00000030
[  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
[  140.108062]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
[  140.108062] Stack: 00000000 000144d6 7f5a222c c011a853 0021d496 00000000 00000000 00000000 
[  140.108062]        00000000 df98c000 e08e08cd 00000000 00000000 00000001 00000000 df98c000 
[  140.108062]        00000084 e08e3f2f df988c00 00000046 00000000 df544400 00000196 00000000 
[  140.108062] Call Trace:
[  140.108062]  [<c011a853>] pvclock_clocksource_read+0x4b/0xd0
[  140.108062]  [<e08e08cd>] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
[  140.108062]  [<e08e3f2f>] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
[  140.108062]  [<e08df3dc>] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
[  140.108062]  [<c0158e4e>] handle_IRQ_event+0x23/0x51
[  140.108062]  [<c0159f4d>] handle_fasteoi_irq+0x71/0xa4
[  140.108062]  [<c010afd2>] do_IRQ+0x4d/0x63
[  140.108062]  [<c01092a7>] common_interrupt+0x23/0x28
[  140.108062]  [<c01300d8>] ptrace_request+0x1ec/0x278
[  140.108062]  [<c012d0c6>] __do_softirq+0x57/0xd3
[  140.108062]  [<c012d187>] do_softirq+0x45/0x53
[  140.108062]  [<c012d43e>] irq_exit+0x35/0x67
[  140.108062]  [<c01152b6>] smp_apic_timer_interrupt+0x6b/0x75
[  140.108062]  [<c0109364>] apic_timer_interrupt+0x28/0x30
[  140.108062]  [<c02c9953>] _spin_unlock_irqrestore+0x7/0x10
[  140.108062]  [<e0865a94>] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
[  140.108062]  [<e086ab2e>] scsi_request_fn+0x264/0x32a [scsi_mod]
[  140.108063]  [<c01dcbd6>] __generic_unplug_device+0x1a/0x1c
[  140.108063]  [<c01dd3e9>] __make_request+0x2fe/0x348
[  140.108063]  [<c01dc008>] generic_make_request+0x34d/0x37b
[  140.108063]  [<c015f9f1>] mempool_alloc+0x1c/0xba
[  140.108063]  [<c01dd0e4>] submit_bio+0xc6/0xcd
[  140.108063]  [<c019cdff>] bio_alloc_bioset+0x9b/0xf3
[  140.108063]  [<c0199983>] submit_bh+0xcf/0xed
[  140.108063]  [<c019b32e>] __block_write_full_page+0x1fa/0x2da
[  140.108063]  [<c019eb73>] blkdev_get_block+0x0/0x43
[  140.108063]  [<c019b4ef>] block_write_full_page+0xe1/0xea
[  140.108063]  [<c019eb73>] blkdev_get_block+0x0/0x43
[  140.108063]  [<c01626d5>] __writepage+0x8/0x21
[  140.108063]  [<c0162b50>] write_cache_pages+0x16a/0x27b
[  140.108063]  [<c01626cd>] __writepage+0x0/0x21
[  140.108063]  [<c0162c61>] generic_writepages+0x0/0x21
[  140.108063]  [<c0162c7b>] generic_writepages+0x1a/0x21
[  140.108063]  [<c0162ca2>] do_writepages+0x20/0x30
[  140.108063]  [<c0196525>] __writeback_single_inode+0x127/0x251
[  140.108063]  [<c019691c>] sync_sb_inodes+0x17c/0x233
[  140.108063]  [<c0196c93>] writeback_inodes+0x53/0x99
[  140.108063]  [<c01638c1>] pdflush+0x0/0x1cc
[  140.108063]  [<c016357c>] wb_kupdate+0x7b/0xdb
[  140.108063]  [<c01639f0>] pdflush+0x12f/0x1cc
[  140.108063]  [<c0163501>] wb_kupdate+0x0/0xdb
[  140.108063]  [<c0138643>] kthread+0x38/0x5d
[  140.108063]  [<c013860b>] kthread+0x0/0x5d
[  140.108063]  [<c01094f3>] kernel_thread_helper+0x7/0x10
[  140.108063]  =======================
[  140.108063] Code: 93 4c 01 00 00 52 50 68 42 76 8e e0 eb 4e 8d 83 b0 00 00 00 e8 32 71 96 df 8d 93 4c 01 00 00 52 50 68 7c 76 8e e0 eb 59 8b 1c 24 <8b> 93 58 03 00 00 8b 82 84 00 00 00 8b 1a 8b 70 60 85 f6 74 29 
[  140.108063] EIP: [<e08e2670>] sym_int_sir+0x547/0x118f [sym53c8xx] SS:ESP 0068:de0f3ba0
[  140.162446] Kernel panic - not syncing: Fatal exception in interrupt

vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
stepping        : 6
cpu MHz         : 2500.087
cache size      : 6144 KB
[...]
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2
ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips        : 5000.23
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG() with SCSI-interfaced disk images
  2008-12-26 21:00 BUG() with SCSI-interfaced disk images John Morrissey
@ 2009-01-07 20:19 ` John Morrissey
  2009-01-07 22:34   ` Ryan Harper
  2009-01-08 19:24 ` qcow2 corruption? John Morrissey
  1 sibling, 1 reply; 12+ messages in thread
From: John Morrissey @ 2009-01-07 20:19 UTC (permalink / raw)
  To: kvm

On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> images. I've tried with the Debian packaging of KVM 79 and 82; both
> exhibit the same behavior (disclaimer: Debian has about a dozen patches in
> their kvm packaging, but they all seem to be changes to the build/install
> process or security-related).

Not to be pushy, but does anyone have any ideas on this, or can I provide
any additional information? I'm afraid I'm a bit over my head when debugging
kernel internals.

john

> IDE-interfaced disk images seem fine. Host and guest are up-to-date Debian
> lenny (32-bit/i386) running kernel 2.6.26 (Debian
> linux-image-2.6.26-1-amd64 2.6.26-12).
> 
> After a few minutes of disk activity (fsck(8)ing a fairly empty ~20GB
> filesystem is a reliable trigger), the kernel BUGs (oops output below).
> 
> I was previously using KVM 72, and tried upgrading to 79 because both
> Debian lenny and Ubuntu hardy guests were panicking due to sym
> disconnects/timeouts. 79 makes the lenny guest start BUGging as described
> above. 82 is not perceivably different from 79 for the lenny guest.
> 
> FWIW, the upgrade to 79 allowed the Ubuntu hardy guest to stay up,
> although it emits:
> 
> Dec 25 00:28:51 vicar kernel: [106621.553272] sd 2:0:0:0: [sda] Sense Key : No Sense [current] 
> Dec 25 00:28:51 vicar kernel: [106621.553279] Info fld=0x0
> Dec 25 00:28:51 vicar kernel: [106621.553280] sd 2:0:0:0: [sda] Add. Sense: No additional sense information
> 
> at seemingly random intervals. The upgrade to 82 made the hardy guest
> start BUGging on soft lockups at random intervals (I can provide the full
> output if anyone's interested, but I'm much more interested in the lenny
> guest oops at this point).
> 
> john
> 
> 
> run via libvirt:
> /usr/bin/kvm -S -M pc -m 512 -smp 1 -name test -monitor pty \
> 	-boot c -drive file=image.qcow,if=scsi,index=0,boot=on \
> 	-net nic,macaddr=00:0c:29:1e:ea:b9,vlan=0,model=e1000 \
> 	-net tap,fd=17,script=,vlan=0,ifname=vnet2 \
> 	-net nic,macaddr=00:0c:29:1e:ea:c3,vlan=1,model=e1000 \
> 	-net tap,fd=18,script=,vlan=1,ifname=vnet3 \
> 	-serial pty -parallel none -usb -vnc 0.0.0.0:1
> 
> [The KVMWiki asks whether the problem is reproducible with
>  -no-kvm-irqchip, -no-kvm-pit, or -no-kvm, but when I tried invoking the
>  above command line by hand (outside of libvirt), the VNC console was
>  always blank and there was no console output on the serial pty. If this
>  would be useful information to have in this case, I'd love to know what
>  I'm doing wrong, or if there's a way to specify additional command line
>  arguments with libvirt.]
> 
> oops generated in the guest:
> [  140.101828] sym0: unexpected disconnect
> [  140.102748] BUG: unable to handle kernel NULL pointer dereference at 00000358
> [  140.103818] IP: [<e08e2670>] :sym53c8xx:sym_int_sir+0x547/0x118f
> [  140.106449] *pdpt = 000000001f5f9001 *pde = 0000000000000000 
> [  140.107356] Oops: 0000 [#1] SMP 
> [  140.107864] Modules linked in: loop virtio_balloon psmouse pcspkr serio_raw i2c_piix4 i2c_core button evdev ext3 jbd mbcache sd_mod ide_cd_mod cdrom ata_generic libata dock ide_pci_generic floppy virtio_pci virtio_ring virtio sym53c8xx scsi_transport_spi scsi_mod e1000 uhci_hcd usbcore piix ide_core thermal processor fan thermal_sys
> [  140.108062] 
> [  140.108062] Pid: 131, comm: pdflush Not tainted (2.6.26-1-686-bigmem #1)
> [  140.108062] EIP: 0060:[<e08e2670>] EFLAGS: 00010287 CPU: 0
> [  140.108062] EIP is at sym_int_sir+0x547/0x118f [sym53c8xx]
> [  140.108062] EAX: 0000000a EBX: 00000000 ECX: 1f98c084 EDX: 00000030
> [  140.108062] ESI: df98c084 EDI: df98c000 EBP: df98c000 ESP: de0f3ba0
> [  140.108062]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [  140.108062] Process pdflush (pid: 131, ti=de0f2000 task=df48e520 task.ti=de0f2000)
> [  140.108062] Stack: 00000000 000144d6 7f5a222c c011a853 0021d496 00000000 00000000 00000000 
> [  140.108062]        00000000 df98c000 e08e08cd 00000000 00000000 00000001 00000000 df98c000 
> [  140.108062]        00000084 e08e3f2f df988c00 00000046 00000000 df544400 00000196 00000000 
> [  140.108062] Call Trace:
> [  140.108062]  [<c011a853>] pvclock_clocksource_read+0x4b/0xd0
> [  140.108062]  [<e08e08cd>] sym_recover_scsi_int+0xb3/0x10d [sym53c8xx]
> [  140.108062]  [<e08e3f2f>] sym_interrupt+0x3ee/0x5fd [sym53c8xx]
> [  140.108062]  [<e08df3dc>] sym53c8xx_intr+0x35/0x56 [sym53c8xx]
> [  140.108062]  [<c0158e4e>] handle_IRQ_event+0x23/0x51
> [  140.108062]  [<c0159f4d>] handle_fasteoi_irq+0x71/0xa4
> [  140.108062]  [<c010afd2>] do_IRQ+0x4d/0x63
> [  140.108062]  [<c01092a7>] common_interrupt+0x23/0x28
> [  140.108062]  [<c01300d8>] ptrace_request+0x1ec/0x278
> [  140.108062]  [<c012d0c6>] __do_softirq+0x57/0xd3
> [  140.108062]  [<c012d187>] do_softirq+0x45/0x53
> [  140.108062]  [<c012d43e>] irq_exit+0x35/0x67
> [  140.108062]  [<c01152b6>] smp_apic_timer_interrupt+0x6b/0x75
> [  140.108062]  [<c0109364>] apic_timer_interrupt+0x28/0x30
> [  140.108062]  [<c02c9953>] _spin_unlock_irqrestore+0x7/0x10
> [  140.108062]  [<e0865a94>] scsi_dispatch_cmd+0x197/0x205 [scsi_mod]
> [  140.108062]  [<e086ab2e>] scsi_request_fn+0x264/0x32a [scsi_mod]
> [  140.108063]  [<c01dcbd6>] __generic_unplug_device+0x1a/0x1c
> [  140.108063]  [<c01dd3e9>] __make_request+0x2fe/0x348
> [  140.108063]  [<c01dc008>] generic_make_request+0x34d/0x37b
> [  140.108063]  [<c015f9f1>] mempool_alloc+0x1c/0xba
> [  140.108063]  [<c01dd0e4>] submit_bio+0xc6/0xcd
> [  140.108063]  [<c019cdff>] bio_alloc_bioset+0x9b/0xf3
> [  140.108063]  [<c0199983>] submit_bh+0xcf/0xed
> [  140.108063]  [<c019b32e>] __block_write_full_page+0x1fa/0x2da
> [  140.108063]  [<c019eb73>] blkdev_get_block+0x0/0x43
> [  140.108063]  [<c019b4ef>] block_write_full_page+0xe1/0xea
> [  140.108063]  [<c019eb73>] blkdev_get_block+0x0/0x43
> [  140.108063]  [<c01626d5>] __writepage+0x8/0x21
> [  140.108063]  [<c0162b50>] write_cache_pages+0x16a/0x27b
> [  140.108063]  [<c01626cd>] __writepage+0x0/0x21
> [  140.108063]  [<c0162c61>] generic_writepages+0x0/0x21
> [  140.108063]  [<c0162c7b>] generic_writepages+0x1a/0x21
> [  140.108063]  [<c0162ca2>] do_writepages+0x20/0x30
> [  140.108063]  [<c0196525>] __writeback_single_inode+0x127/0x251
> [  140.108063]  [<c019691c>] sync_sb_inodes+0x17c/0x233
> [  140.108063]  [<c0196c93>] writeback_inodes+0x53/0x99
> [  140.108063]  [<c01638c1>] pdflush+0x0/0x1cc
> [  140.108063]  [<c016357c>] wb_kupdate+0x7b/0xdb
> [  140.108063]  [<c01639f0>] pdflush+0x12f/0x1cc
> [  140.108063]  [<c0163501>] wb_kupdate+0x0/0xdb
> [  140.108063]  [<c0138643>] kthread+0x38/0x5d
> [  140.108063]  [<c013860b>] kthread+0x0/0x5d
> [  140.108063]  [<c01094f3>] kernel_thread_helper+0x7/0x10
> [  140.108063]  =======================
> [  140.108063] Code: 93 4c 01 00 00 52 50 68 42 76 8e e0 eb 4e 8d 83 b0 00 00 00 e8 32 71 96 df 8d 93 4c 01 00 00 52 50 68 7c 76 8e e0 eb 59 8b 1c 24 <8b> 93 58 03 00 00 8b 82 84 00 00 00 8b 1a 8b 70 60 85 f6 74 29 
> [  140.108063] EIP: [<e08e2670>] sym_int_sir+0x547/0x118f [sym53c8xx] SS:ESP 0068:de0f3ba0
> [  140.162446] Kernel panic - not syncing: Fatal exception in interrupt
> 
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz
> stepping        : 6
> cpu MHz         : 2500.087
> cache size      : 6144 KB
> [...]
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall lm
> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2
> ssse3 cx16 xtpr dca sse4_1 lahf_lm
> bogomips        : 5000.23
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 38 bits physical, 48 bits virtual
> power management:

-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

* Re: BUG() with SCSI-interfaced disk images
  2009-01-07 20:19 ` John Morrissey
@ 2009-01-07 22:34   ` Ryan Harper
  2009-01-08  2:13     ` John Morrissey
  0 siblings, 1 reply; 12+ messages in thread
From: Ryan Harper @ 2009-01-07 22:34 UTC (permalink / raw)
  To: John Morrissey; +Cc: kvm

* John Morrissey <jwm@horde.net> [2009-01-07 15:59]:
> On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> > I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> > images. I've tried with the Debian packaging of KVM 79 and 82; both
> > exhibit the same behavior (disclaimer: Debian has about a dozen patches in
> > their kvm packaging, but they all seem to be changes to the build/install
> > process or security-related).
> 
> Not to be pushy, but does anyone have any ideas on this, or can I provide
> any additional information? I'm afraid I'm a bit over my head when debugging
> kernel internals.

Sorry, I meant to respond.  This is more than likely a SCSI emulation
error rather than a kernel error.  I've seen the error a couple of
times, but I don't have a fix for the issue yet as I don't have a
reliable way to reproduce the error.  If you have an easy way to
reproduce the bug, I'll see if I can figure out a fix.  

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

* Re: BUG() with SCSI-interfaced disk images
  2009-01-07 22:34   ` Ryan Harper
@ 2009-01-08  2:13     ` John Morrissey
  2009-01-08 14:01       ` Ryan Harper
  0 siblings, 1 reply; 12+ messages in thread
From: John Morrissey @ 2009-01-08  2:13 UTC (permalink / raw)
  To: Ryan Harper; +Cc: kvm

On Wed, Jan 07, 2009 at 04:34:50PM -0600, Ryan Harper wrote:
> * John Morrissey <jwm@horde.net> [2009-01-07 15:59]:
> > On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> > > I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> > > images. I've tried with the Debian packaging of KVM 79 and 82; both
> > > exhibit the same behavior (disclaimer: Debian has about a dozen
> > > patches in their kvm packaging, but they all seem to be changes to the
> > > build/install process or security-related).
> > 
> > Not to be pushy, but does anyone have any ideas on this, or can I
> > provide any additional information? I'm afraid I'm a bit over my head
> > when debugging kernel internals.
> 
> Sorry, I meant to respond.  This is more than likely a SCSI emulation
> error rather than a kernel error.  I've seen the error a couple of
> times, but I don't have a fix for the issue yet as I don't have a
> reliable way to reproduce the error.  If you have an easy way to
> reproduce the bug, I'll see if I can figure out a fix.  

I can reproduce this reliably when fscking a filesystem in a .vmdk I have.
I can't give you the vmdk or a dump of the filesystem, but I can devote some
time to troubleshoot this if you can guide me a little. If having the vmdk
is really important, I might be able to sanitize it enough to send it to you
(hopefully not making this bug unreproducible in the process).

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

* Re: BUG() with SCSI-interfaced disk images
  2009-01-08  2:13     ` John Morrissey
@ 2009-01-08 14:01       ` Ryan Harper
  2009-01-08 18:15         ` John Morrissey
  0 siblings, 1 reply; 12+ messages in thread
From: Ryan Harper @ 2009-01-08 14:01 UTC (permalink / raw)
  To: John Morrissey; +Cc: Ryan Harper, kvm

* John Morrissey <jwm@horde.net> [2009-01-07 20:16]:
> On Wed, Jan 07, 2009 at 04:34:50PM -0600, Ryan Harper wrote:
> > * John Morrissey <jwm@horde.net> [2009-01-07 15:59]:
> > > On Fri, Dec 26, 2008 at 04:00:28PM -0500, John Morrissey wrote:
> > > > I'm encountering a kernel BUG() in guests using SCSI-interfaced disk
> > > > images. I've tried with the Debian packaging of KVM 79 and 82; both
> > > > exhibit the same behavior (disclaimer: Debian has about a dozen
> > > > patches in their kvm packaging, but they all seem to be changes to the
> > > > build/install process or security-related).
> > > 
> > > Not to be pushy, but does anyone have any ideas on this, or can I
> > > provide any additional information? I'm afraid I'm a bit over my head
> > > when debugging kernel internals.
> > 
> > Sorry, I meant to respond.  This is more than likely a SCSI emulation
> > error rather than a kernel error.  I've seen the error a couple of
> > times, but I don't have a fix for the issue yet as I don't have a
> > reliable way to reproduce the error.  If you have an easy way to
> > reproduce the bug, I'll see if I can figure out a fix.  
> 
> I can reproduce this reliably when fscking a filesystem in a .vmdk I have.
> I can't give you the vmdk or a dump of the filesystem, but I can devote some
> time to troubleshoot this if you can guide me a little. If having the vmdk
> is really important, I might be able to sanitize it enough to send it to you
> (hopefully not making this bug unreproducible in the process).

I don't need the vmdk, but if there is some other repeatable process
that can trigger this for you, getting that will allow me to recreate
the issue myself.  For example, if installing Debian into a vmdk,
rebooting, and then fsck'ing from inside the VM would trigger it.
Finding some sort of repeatable process that can trip the bug without
using any of your specific data would be the best way to move forward
with the bug.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

* Re: BUG() with SCSI-interfaced disk images
  2009-01-08 14:01       ` Ryan Harper
@ 2009-01-08 18:15         ` John Morrissey
  0 siblings, 0 replies; 12+ messages in thread
From: John Morrissey @ 2009-01-08 18:15 UTC (permalink / raw)
  To: Ryan Harper; +Cc: kvm

On Thu, Jan 08, 2009 at 08:01:03AM -0600, Ryan Harper wrote:
> I don't need the vmdk, but if there is some other repeatable process
> that can trigger this for you, getting that will allow me to recreate
> the issue myself.  For example, installing Debian into a vmdk, reboot,
> and then fsck'ing from inside the vm would trigger it.  Finding some
> sort of repeatable process that can trip the bug but without using any
> of your specific data would be the best way to move forward with the
> bug.

I can reproduce this when installing Debian lenny i386 (using the lenny rc1
install images from http://www.debian.org/devel/debian-installer/). The
installer will complain of I/O problems when trying to mkfs(8) the
filesystem and will prompt you to retry/ignore. Shortly thereafter, the
domain kernel panics.

Attached is the libvirt configuration; it's pretty straightforward and
translates into this kvm(1) invocation:

/usr/bin/kvm -S -M pc -m 512 -smp 1 -name scsi -monitor pty -boot n \
	-drive file=/var/lib/libvirt/images/scsi.qcow,if=scsi,index=0 \
	-net nic,macaddr=02:00:00:4d:58:13,vlan=0,model=e1000 \
	-net tap,fd=11,script=,vlan=0,ifname=vnet0 \
	-net nic,macaddr=02:00:00:23:7f:3d,vlan=1,model=e1000 \
	-net tap,fd=13,script=,vlan=1,ifname=vnet1 \
	-serial pty -parallel none -usb -vnc 0.0.0.0:0

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

[-- Attachment #2: scsi.xml --]
[-- Type: application/xml, Size: 1150 bytes --]

* qcow2 corruption?
  2008-12-26 21:00 BUG() with SCSI-interfaced disk images John Morrissey
  2009-01-07 20:19 ` John Morrissey
@ 2009-01-08 19:24 ` John Morrissey
  2009-01-08 20:10   ` Ryan Harper
  2009-01-08 20:33   ` Anthony Liguori
  1 sibling, 2 replies; 12+ messages in thread
From: John Morrissey @ 2009-01-08 19:24 UTC (permalink / raw)
  To: kvm

I'm encountering what seems like disk corruption when using qcow2 images,
created with 'kvm-img create -f qcow2 image.qcow2 15G'.

A simple test case is to use the Debian installer (I'm using the lenny rc1
images from http://www.debian.org/devel/debian-installer/) to install a new
domain. The qcow2 file on disk grows due to the mkfs(8) activity, then the
installer faults while trying to mount the root filesystem (Invalid
argument). 'fdisk -l' shows that the partition table just created by the
installer is gone.

In a few cases, I've managed to finish an installation, but the resulting
filesystem is strangely corrupt:

# file /usr/bin/w.procps
/usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last modified: Wed Mar 14 14:11:18 2007, max compression

I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
behavior (disclaimer: Debian has about a dozen patches in their kvm
packaging, but they all seem to be changes to the build/install process or
security-related). Host running KVM is up-to-date Debian lenny
(64-bit/amd64) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
2.6.26-12).

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

* Re: qcow2 corruption?
  2009-01-08 19:24 ` qcow2 corruption? John Morrissey
@ 2009-01-08 20:10   ` Ryan Harper
  2009-01-08 20:16     ` John Morrissey
  2009-01-08 20:33   ` Anthony Liguori
  1 sibling, 1 reply; 12+ messages in thread
From: Ryan Harper @ 2009-01-08 20:10 UTC (permalink / raw)
  To: John Morrissey; +Cc: kvm

* John Morrissey <jwm@horde.net> [2009-01-08 13:28]:
> I'm encountering what seems like disk corruption when using qcow2 images,
> created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> 

Are you using IDE or SCSI as your block device?

> A simple test case is to use the Debian installer (I'm using the lenny rc1
> images from http://www.debian.org/devel/debian-installer/) to install a new
> domain. The qcow2 file on disk grows due to the mkfs(8) activity, then the
> installer faults while trying to mount the root filesystem (Invalid
> argument). 'fdisk -l' shows that the partition table just created by the
> installer is gone.
> 
> In a few cases, I've managed to finish an installation, but the resulting
> filesystem is strangely corrupt:
> 
> # file /usr/bin/w.procps
> /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last modified: Wed Mar 14 14:11:18 2007, max compression

If you are using IDE and getting corruption, try again with a disk
image created in the raw format:

qemu-img create -f raw <imagename> <size>

That should help track down where the corruption is coming from.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

* Re: qcow2 corruption?
  2009-01-08 20:10   ` Ryan Harper
@ 2009-01-08 20:16     ` John Morrissey
  0 siblings, 0 replies; 12+ messages in thread
From: John Morrissey @ 2009-01-08 20:16 UTC (permalink / raw)
  To: Ryan Harper; +Cc: kvm

On Thu, Jan 08, 2009 at 02:10:31PM -0600, Ryan Harper wrote:
> * John Morrissey <jwm@horde.net> [2009-01-08 13:28]:
> > I'm encountering what seems like disk corruption when using qcow2 images,
> > created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> 
> using ide or scsi as your block device?

IDE.

> > A simple test case is to use the Debian installer (I'm using the lenny
> > rc1 images from http://www.debian.org/devel/debian-installer/) to
> > install a new domain. The qcow2 file on disk grows due to the mkfs(8)
> > activity, then the installer faults while trying to mount the root
> > filesystem (Invalid argument). 'fdisk -l' shows that the partition table
> > just created by the installer is gone.
> > 
> > In a few cases, I've managed to finish an installation, but the resulting
> > filesystem is strangely corrupt:
> > 
> > # file /usr/bin/w.procps
> > /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last modified: Wed Mar 14 14:11:18 2007, max compression
> 
> If you are using ide and getting corruption, try again but with creating
> a disk image with the raw format: 
> 
> qemu-img create -f raw <imagename> <size>
> 
> That should help track down where the corruption is coming from.

Raw images are fine. (Sorry, should've mentioned that in the first place.)

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

* Re: qcow2 corruption?
  2009-01-08 19:24 ` qcow2 corruption? John Morrissey
  2009-01-08 20:10   ` Ryan Harper
@ 2009-01-08 20:33   ` Anthony Liguori
  2009-01-09  3:42     ` John Morrissey
  1 sibling, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2009-01-08 20:33 UTC (permalink / raw)
  To: John Morrissey; +Cc: kvm

John Morrissey wrote:
> I'm encountering what seems like disk corruption when using qcow2 images,
> created with 'kvm-img create -f qcow2 image.qcow2 15G'.
>
> A simple test case is to use the Debian installer (I'm using the lenny rc1
> images from http://www.debian.org/devel/debian-installer/) to install a new
> domain. The qcow2 file on disk grows due to the mkfs(8) activity, then the
> installer faults while trying to mount the root filesystem (Invalid
> argument). 'fdisk -l' shows that the partition table just created by the
> installer is gone.
>
> In a few cases, I've managed to finish an installation, but the resulting
> filesystem is strangely corrupt:
>
> # file /usr/bin/w.procps
> /usr/bin/w.procps: gzip compressed data, was "aptitude.8", from Unix, last modified: Wed Mar 14 14:11:18 2007, max compression
>
> I've tried with the Debian packaging of KVM 79 and 82; both exhibit the same
> behavior (disclaimer: Debian has about a dozen patches in their kvm
> packaging, but they all seem to be changes to the build/install process or
> security-related). Host running KVM is up-to-date Debian lenny
> (64-bit/amd64) running kernel 2.6.26 (Debian linux-image-2.6.26-1-amd64
> 2.6.26-12).
>   

There are patches that touch the block layer.  Please try to reproduce
on vanilla kvm.  I don't trust the Debian patches.

Regards,

Anthony Liguori

> john
>   


* Re: qcow2 corruption?
  2009-01-08 20:33   ` Anthony Liguori
@ 2009-01-09  3:42     ` John Morrissey
  2009-01-09 13:34       ` Ryan Harper
  0 siblings, 1 reply; 12+ messages in thread
From: John Morrissey @ 2009-01-09  3:42 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kvm

On Thu, Jan 08, 2009 at 02:33:28PM -0600, Anthony Liguori wrote:
> John Morrissey wrote:
> >I'm encountering what seems like disk corruption when using qcow2 images,
> >created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> >
> >A simple test case is to use the Debian installer (I'm using the lenny
> >rc1 images from http://www.debian.org/devel/debian-installer/) to install
> >a new domain. The qcow2 file on disk grows due to the mkfs(8) activity,
> >then the installer faults while trying to mount the root filesystem
> >(Invalid argument). 'fdisk -l' shows that the partition table just
> >created by the installer is gone.
> 
> There are patches that touch the block layer.  Please try to reproduce 
> on vanilla kvm.  I don't trust the debian patches.

Couldn't reproduce this with Debian packaging minus its patch for
CVE-2008-0928 (taken from Fedora FWIW), which is the only one touching the
block layer.

Upon further scrutiny, I realized I pooched updating the patch for KVM 82.
The value for the BDRV_O_AUTOGROW constant introduced in that patch collides
with a new BDRV_ constant introduced between KVM 79 and 82. Changing the
constant's value (Fedora project has an updated patch, too) fixes this.
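
As a rough illustration of how a collision like this can go unnoticed, here
is a minimal C sketch. The EX_O_* names and every bit value in it are
hypothetical stand-ins picked for the example; they are not the real BDRV_*
definitions from KVM 79/82 or from the Fedora patch, and flags_collide() and
autogrow_requested() are invented helpers, not functions from the block layer.

```c
#include <stdbool.h>

/* Hypothetical open flags. Names and values are illustrative only. */
#define EX_O_SNAPSHOT        0x0008u
#define EX_O_NOCACHE         0x0020u
#define EX_O_NEWFLAG         0x0040u  /* stands in for a flag added upstream */
#define EX_O_AUTOGROW_STALE  0x0040u  /* out-of-tree patch carried forward unchanged */
#define EX_O_AUTOGROW_FIXED  0x0100u  /* same patch, moved to a bit assumed unused */

/* All flags assumed to be defined by the upstream tree in this example. */
#define EX_O_UPSTREAM_MASK   (EX_O_SNAPSHOT | EX_O_NOCACHE | EX_O_NEWFLAG)

/* True when two flag masks share at least one bit. */
static bool flags_collide(unsigned a, unsigned b)
{
    return (a & b) != 0;
}

/* What patched code would conclude about "autogrow" for a given set of
 * open flags, depending on which bit the patch uses for it. */
static bool autogrow_requested(unsigned open_flags, unsigned autogrow_bit)
{
    return (open_flags & autogrow_bit) != 0;
}

/* Build-time guard: fails to compile if the out-of-tree flag ever overlaps
 * a flag from the upstream set again. */
_Static_assert((EX_O_AUTOGROW_FIXED & EX_O_UPSTREAM_MASK) == 0,
               "out-of-tree flag overlaps an upstream flag");
```

With the stale value, a caller that sets only the new upstream flag also
appears to request autogrow, because both names resolve to the same bit.
Moving the out-of-tree flag to an unused bit removes the overlap, and a
build-time check of this kind would flag the problem the next time the patch
is rebased.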

Ryan, this seems to fix the SCSI BUGging, too. I figure you won't want to
pursue that further?

Sorry for the bother, guys.

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__

* Re: qcow2 corruption?
  2009-01-09  3:42     ` John Morrissey
@ 2009-01-09 13:34       ` Ryan Harper
  0 siblings, 0 replies; 12+ messages in thread
From: Ryan Harper @ 2009-01-09 13:34 UTC (permalink / raw)
  To: John Morrissey; +Cc: Anthony Liguori, kvm

* John Morrissey <jwm@horde.net> [2009-01-08 21:44]:
> On Thu, Jan 08, 2009 at 02:33:28PM -0600, Anthony Liguori wrote:
> > John Morrissey wrote:
> > >I'm encountering what seems like disk corruption when using qcow2 images,
> > >created with 'kvm-img create -f qcow2 image.qcow2 15G'.
> > >
> > >A simple test case is to use the Debian installer (I'm using the lenny
> > >rc1 images from http://www.debian.org/devel/debian-installer/) to install
> > >a new domain. The qcow2 file on disk grows due to the mkfs(8) activity,
> > >then the installer faults while trying to mount the root filesystem
> > >(Invalid argument). 'fdisk -l' shows that the partition table just
> > >created by the installer is gone.
> > 
> > There are patches that touch the block layer.  Please try to reproduce 
> > on vanilla kvm.  I don't trust the debian patches.
> 
> Couldn't reproduce this with Debian packaging minus its patch for
> CVE-2008-0928 (taken from Fedora FWIW), which is the only one touching the
> block layer.
> 
> Upon further scrutiny, I realized I pooched updating the patch for KVM 82.
> The value for the BDRV_O_AUTOGROW constant introduced in that patch collides
> with a new BDRV_ constant introduced between KVM 79 and 82. Changing the
> constant's value (Fedora project has an updated patch, too) fixes this.
> 
> Ryan, this seems to fix the SCSI BUGging, too. I figure you won't want to
> pursue that further?

Excellent!  I had seen the error before but only while developing some
new code for the SCSI device, so it was a little surprising to see.  If
you can't recreate it now, I think we're done. =)

> 
> Sorry for the bother, guys.

np, thanks for testing.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com
