xen-devel.lists.xenproject.org archive mirror
* Severe guest disk corruption with device_model_stubdomain_override=1...
@ 2016-03-23  6:03 Sarah Newman
  2016-03-23 21:46 ` Sarah Newman
  0 siblings, 1 reply; 5+ messages in thread
From: Sarah Newman @ 2016-03-23  6:03 UTC (permalink / raw)
  To: xen-devel

And nested xen.

CPU: AMD Opteron 2352
Outer configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-18.el6.x86_64
Inner configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-19.el6.x86_64
Inner xen command line:  cpuinfo loglvl=all guest_loglvl=error dom0_mem=512M,max:512M com1=115200,8n1 console=com1 dom0_max_vcpus=1 dom0_vcpus_pin=true
Inner linux command line: ro root=LABEL=DISK rootflags=barrier=0 swiotlb=32768 console=hvc0

Relevant parts of xl.cfg:
builder = 'hvm'
memory = 2688
vcpus = 1
pae = 1
nx = 1
acpi = 1
viridian = 0
xen_platform_pci = 0
apic = 1
device_model_stubdomain_override = 1
disk = ['file:/var/lib/instance-disks/c6:0,hda,w',
        'file:/var/lib/instance-disks/c6:1,hdb,w',
        'file:/var/lib/instance-disks/c6:2,hdc,w']

This is the error that typically comes up with device_model_stubdomain_override=1:

scsi host0: ata_piix
scsi host1: ata_piix
ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc200 irq 14
ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc208 irq 15
ata1.00: ATA-7: QEMU HARDDISK, 0.10.2, max UDMA/100
ata1.00: 20971520 sectors, multi 16: LBA48
ata1.01: ATA-7: QEMU HARDDISK, 0.10.2, max UDMA/100
ata1.01: 209715200 sectors, multi 16: LBA48
ata2.00: ATA-7: QEMU HARDDISK, 0.10.2, max UDMA/100
ata2.00: 41943040 sectors, multi 16: LBA48
ata2.00: configured for MWDMA2
ata1.00: configured for MWDMA2
ata1.01: configured for MWDMA2
scsi 0:0:0:0: Direct-Access     ATA      QEMU HARDDISK    .2   PQ: 0 ANSI: 5
scsi 0:0:1:0: Direct-Access     ATA      QEMU HARDDISK    .2   PQ: 0 ANSI: 5
scsi 1:0:0:0: Direct-Access     ATA      QEMU HARDDISK    .2   PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
sd 0:0:1:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
sd 1:0:0:0: [sdc] 41943040 512-byte logical blocks: (21.4 GB/20.0 GiB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdc] Write Protect is off
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
...
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:1:0: [sdb] Attached SCSI disk
sd 1:0:0:0: [sdc] Attached SCSI disk
....
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:1:0: Attached scsi generic sg1 type 0
sd 1:0:0:0: Attached scsi generic sg2 type 0
...
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x4
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:08:d0:09:a5/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 41/04:08:d0:09:a5/00:00:00:00:00/e0 Emask 0x1 (device error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { ABRT }

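For readers unfamiliar with the libata error dump quoted in this message, the following is a minimal sketch of how the "cmd" and "res" taskfile lines can be decoded. It assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah, then the five high-order bytes, then the device register); in the "res" line the first two bytes are the status and error registers. The helper names and the small lookup tables are hypothetical and cover only the values that appear in this log.

```python
# Sketch: decode the taskfile dump that Linux libata prints on an error.
# Assumed field order: command/feature:nsect:lbal:lbam:lbah/hob.../device.
# Helper names are hypothetical; tables list only values seen in this log.

ATA_COMMANDS = {0xC8: "READ DMA"}          # only the opcode seen in the log
STATUS_BITS = {0x40: "DRDY", 0x01: "ERR"}  # subset of ATA status bits
ERROR_BITS = {0x04: "ABRT"}                # subset of ATA error bits


def parse_taskfile(dump):
    """Split 'c8/00:08:d0:09:a5/00:00:00:00:00/e0' into named byte fields."""
    first, low, hob, device = dump.split("/")
    names = ("feature", "nsect", "lbal", "lbam", "lbah")
    tf = {"command": int(first, 16), "device": int(device, 16)}
    tf.update({n: int(v, 16) for n, v in zip(names, low.split(":"))})
    tf.update({"hob_" + n: int(v, 16) for n, v in zip(names, hob.split(":"))})
    return tf


def lba28(tf):
    """LBA for a 28-bit command: low nibble of device plus three LBA bytes."""
    return ((tf["device"] & 0x0F) << 24) | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"]


def flag_names(value, table):
    """Names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


cmd = parse_taskfile("c8/00:08:d0:09:a5/00:00:00:00:00/e0")
# In the result line, the "command" slot holds status and "feature" holds error.
res = parse_taskfile("41/04:08:d0:09:a5/00:00:00:00:00/e0")

print(ATA_COMMANDS[cmd["command"]], "lba", lba28(cmd), "bytes", cmd["nsect"] * 512)
print("status", flag_names(res["command"], STATUS_BITS))
print("error", flag_names(res["feature"], ERROR_BITS))
```

Under these assumptions the failed request is an 8-sector (4096-byte) READ DMA at LBA 10815952, which agrees with the "dma 4096 in" annotation, and the decoded status and error bits match the DRDY ERR and ABRT lines in the log.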

Out of curiosity, I tried adding libata.dma=0 to the linux command line and after that, the corruption was bad enough I had to start over.

With the normal command line, device_model_stubdomain_override = 0, and nested xen, there are no errors.
After adding libata.dma=0 to the kernel command line, with device_model_stubdomain_override = 0 and nested xen, there are no errors.

xen_platform_pci seems to be ignored with device_model_stubdomain_override=1. So I don't think I can test what happens with the 3.18.25-19.el6.x86_64
kernel, no nested xen, and non-paravirtual block devices.

--Sarah

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: Severe guest disk corruption with device_model_stubdomain_override=1...
  2016-03-23  6:03 Severe guest disk corruption with device_model_stubdomain_override=1 Sarah Newman
@ 2016-03-23 21:46 ` Sarah Newman
  2016-03-24  4:26   ` Sarah Newman
  2016-03-24  9:55   ` George Dunlap
  0 siblings, 2 replies; 5+ messages in thread
From: Sarah Newman @ 2016-03-23 21:46 UTC (permalink / raw)
  To: xen-devel

On 03/22/2016 11:03 PM, Sarah Newman wrote:
> And nested xen.
> 
> CPU: AMD Opteron 2352
> Outer configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-18.el6.x86_64
> Inner configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-19.el6.x86_64
> Inner xen command line:  cpuinfo loglvl=all guest_loglvl=error dom0_mem=512M,max:512M com1=115200,8n1 console=com1 dom0_max_vcpus=1 dom0_vcpus_pin=true
> Inner linux command line: ro root=LABEL=DISK rootflags=barrier=0 swiotlb=32768 console=hvc0

> xen_platform_pci seems to be ignored with device_model_stubdomain_override=1. So I don't think I can test what happens with the 3.18.25-19.el6.x86_64
> kernel, no nested xen, and non-paravirtual block devices.

The patch submitted in http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html appears to fix the issue.

--Sarah


* Re: Severe guest disk corruption with device_model_stubdomain_override=1...
  2016-03-23 21:46 ` Sarah Newman
@ 2016-03-24  4:26   ` Sarah Newman
  2016-03-24  9:55   ` George Dunlap
  1 sibling, 0 replies; 5+ messages in thread
From: Sarah Newman @ 2016-03-24  4:26 UTC (permalink / raw)
  To: xen-devel

On 03/23/2016 02:46 PM, Sarah Newman wrote:
> On 03/22/2016 11:03 PM, Sarah Newman wrote:
>> And nested xen.
>>
>> CPU: AMD Opteron 2352
>> Outer configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-18.el6.x86_64
>> Inner configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-19.el6.x86_64
>> Inner xen command line:  cpuinfo loglvl=all guest_loglvl=error dom0_mem=512M,max:512M com1=115200,8n1 console=com1 dom0_max_vcpus=1 dom0_vcpus_pin=true
>> Inner linux command line: ro root=LABEL=DISK rootflags=barrier=0 swiotlb=32768 console=hvc0
> 
>> xen_platform_pci seems to be ignored with device_model_stubdomain_override=1. So I don't think I can test what happens with the 3.18.25-19.el6.x86_64
>> kernel, no nested xen, and non-paravirtual block devices.
> 
> The patch submitted in http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html appears to fix the issue.

FYI, I also had to run "ethtool -K <vifname>-emu tx off" or TCP did not work for intra-host communications (off-host worked OK). I'm not sure whether
that's a known issue.
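
As an illustration of the workaround above, here is a minimal sketch that builds (and optionally runs) the ethtool command for one emulated interface. The "-emu" suffix and the "tx off" setting come from the message; the function names, the dry-run default, and the example interface name "vif1.0" are assumptions made for this sketch.

```python
import subprocess


def tx_offload_off_command(vifname):
    """Build the ethtool invocation for the emulated interface '<vifname>-emu'."""
    return ["ethtool", "-K", f"{vifname}-emu", "tx", "off"]


def disable_tx_offload(vifname, dry_run=True):
    """Return the command; run it only when dry_run is False (needs root)."""
    cmd = tx_offload_off_command(vifname)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# "vif1.0" is a hypothetical interface name used only for illustration.
print(" ".join(disable_tx_offload("vif1.0")))
```

With the default dry run, this only prints the command so it can be checked before being applied to a real host.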

* Re: Severe guest disk corruption with device_model_stubdomain_override=1...
  2016-03-23 21:46 ` Sarah Newman
  2016-03-24  4:26   ` Sarah Newman
@ 2016-03-24  9:55   ` George Dunlap
  2016-03-24 17:20     ` Sarah Newman
  1 sibling, 1 reply; 5+ messages in thread
From: George Dunlap @ 2016-03-24  9:55 UTC (permalink / raw)
  To: Sarah Newman; +Cc: xen-devel

On Wed, Mar 23, 2016 at 9:46 PM, Sarah Newman <srn@prgmr.com> wrote:
> On 03/22/2016 11:03 PM, Sarah Newman wrote:
>> And nested xen.
>>
>> CPU: AMD Opteron 2352
>> Outer configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-18.el6.x86_64
>> Inner configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-19.el6.x86_64
>> Inner xen command line:  cpuinfo loglvl=all guest_loglvl=error dom0_mem=512M,max:512M com1=115200,8n1 console=com1 dom0_max_vcpus=1 dom0_vcpus_pin=true
>> Inner linux command line: ro root=LABEL=DISK rootflags=barrier=0 swiotlb=32768 console=hvc0
>
>> xen_platform_pci seems to be ignored with device_model_stubdomain_override=1. So I don't think I can test what happens with the 3.18.25-19.el6.x86_64
>> kernel, no nested xen, and non-paravirtual block devices.
>
> The patch submitted in http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html appears to fix the issue.

This is the best kind of bug report to come in to find in the morning
-- reported, then fix posted. :-)

But your patch to minios is for the netfront driver -- why would that
cause a fault in the emulated disk?  Did the guest hang because the
stubdomain was spinning rather than handling requests?

 -George

* Re: Severe guest disk corruption with device_model_stubdomain_override=1...
  2016-03-24  9:55   ` George Dunlap
@ 2016-03-24 17:20     ` Sarah Newman
  0 siblings, 0 replies; 5+ messages in thread
From: Sarah Newman @ 2016-03-24 17:20 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

On 03/24/2016 02:55 AM, George Dunlap wrote:
> On Wed, Mar 23, 2016 at 9:46 PM, Sarah Newman <srn@prgmr.com> wrote:
>> On 03/22/2016 11:03 PM, Sarah Newman wrote:
>>> And nested xen.
>>>
>>> CPU: AMD Opteron 2352
>>> Outer configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-18.el6.x86_64
>>> Inner configuration: Xen4CentOS 6 xen 4.6.1-2.el6, linux 3.18.25-19.el6.x86_64
>>> Inner xen command line:  cpuinfo loglvl=all guest_loglvl=error dom0_mem=512M,max:512M com1=115200,8n1 console=com1 dom0_max_vcpus=1 dom0_vcpus_pin=true
>>> Inner linux command line: ro root=LABEL=DISK rootflags=barrier=0 swiotlb=32768 console=hvc0
>>
>>> xen_platform_pci seems to be ignored with device_model_stubdomain_override=1. So I don't think I can test what happens with the 3.18.25-19.el6.x86_64
>>> kernel, no nested xen, and non-paravirtual block devices.
>>
>> The patch submitted in http://lists.xenproject.org/archives/html/xen-devel/2016-03/msg03080.html appears to fix the issue.
> 
> This is the best kind of bug report to come in to find in the morning
> -- reported, then fix posted. :-)
> 
> But your patch to minios is for the netfront driver -- why would that
> cause a fault in the emulated disk?  Did the guest hang because the
> stubdomain was spinning rather than handling requests?


Please see https://github.com/QubesOS/qubes-issues/issues/1486 and http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg00917.html .

This was reported as a bug almost four months ago, but it got dropped on the floor.

