* Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
@ 2012-12-14 14:54 Alex Bligh
  2012-12-17 10:10 ` Jan Beulich
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2012-12-14 14:54 UTC (permalink / raw)
  To: Xen Devel; +Cc: Ian Campbell, Alex Bligh

We are seeing a nasty crash on xen4.2 HVM + qemu-xen device model.

When running an Ubuntu Cloud Image VM as the guest operating system,
all (or nearly all) of the time, some way through the boot process,
the physical machine either crashes completely and reboots, or loses
networking. A typical crash dump is below.

The strange thing is that this does *NOT* appear to happen with the
non-cloud-image version of the same Ubuntu guest operating system
(even when loading it with bonnie++ and lots of network traffic).
We believe the most significant difference is that the cloud image
resizes its partition, and thus its filing system, on boot. Perhaps
some magic happens when the partition table is written to. Obviously
no guest OS should be able to crash dom0.

The setup we have at the moment is a qcow2 disk file on NFS with a
backing file, using the qemu-xen device model. NFS appears to be
required to trigger the crash.

Steps to replicate:

# cd /my/nfs/directory
# wget http://cloud-images.ubuntu.com/precise/current/precise-server-cloudimg-amd64-disk1.img
# qemu-img create -f qcow2 -b precise-server-cloudimg-amd64-disk1.img testdisk.qcow2 20G
# xl create xlcreate-qcow.conf

Start the machine and (this is important) change the boot line
to include the text 'ds=nocloud ubuntu-pass=password' (which
stops the image hanging whilst it's trying to fetch metadata).
You may want to remove console redirection to serial.
It should crash dom0 in less than a minute.

The config file is pasted below.

This is replicable independent of hardware (we've tried on 4 different
machines of various types). It is replicable independent of dom0
kernel (we've tried 3.2.0-32 and the current quantal kernel, and a
few others). It also does not happen on kvm (exactly the same setup).

This looks a bit like this old bug:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=640941
a 2011 bug which Ian Campbell (copied) related to some still older
bugs, specifically this one:
  http://marc.info/?l=linux-nfs&m=122424132729720&w=2
However, as far as I can tell it breaks even with modern kernels in
which the relevant NFS changes were made. Also, we do not appear to
need to force retransmits to happen (it is possible that some lockup
in Xen is causing the retransmits, and those in turn trigger the
issue).

Any ideas?

-- 
Alex Bligh



# The domain build function. HVM domain uses 'hvm'.
builder='hvm'

# Initial memory allocation (in megabytes) for the new domain.
#
# WARNING: Creating a domain with insufficient memory may cause out of
#          memory errors. The domain needs enough memory to boot kernel
#          and modules. Allocating less than 32MBs is not recommended.
memory = 512


# A name for your domain. All domains must have different names.
name = "UbuntuXen"

# 128-bit UUID for the domain.  The default behavior is to generate a new UUID
# on each call to 'xm create'.
#uuid = "06ed00fe-1162-4fc4-b5d8-11993ee4a8b9"

#-----------------------------------------------------------------------------
# The number of cpus guest platform has, default=1
vcpus=2


disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]

vif = ['mac=00:16:3e:25:96:c8,bridge=defaultbr']

device_model_version = 'qemu-xen'
device_model_override = '/usr/lib/xen/bin/qemu-system-i386'
#device_model_override = '/usr/bin/qemu-system-x86_64'
#device_model_args = [ '-monitor', 'tcp:127.0.0.1:2345' ]

sdl=0

#----------------------------------------------------------------------------
# enable OpenGL for texture rendering inside the SDL window, default = 1
# valid only if sdl is enabled.
opengl=1

#----------------------------------------------------------------------------
# enable VNC library for graphics, default = 1
vnc=1

#----------------------------------------------------------------------------
# address that should be listened on for the VNC server if vnc is set.
# default is to use 'vnc-listen' setting from
# auxbin.xen_configdir() + /xend-config.sxp
vnclisten="0.0.0.0"

#----------------------------------------------------------------------------
# set VNC display number, default = domid
vncdisplay=0

#----------------------------------------------------------------------------
# try to find an unused port for the VNC server, default = 1
vncunused=0

#----------------------------------------------------------------------------
# set password for domain's VNC console
# default depends on vncpasswd in xend-config.sxp
vncpasswd='password'


#----------------------------------------------------------------------------
# enable stdvga, default = 0 (use cirrus logic device model)
stdvga=0

#-----------------------------------------------------------------------------
#   serial port re-direct to pty device, /dev/pts/n
#   then xm console or minicom can connect
serial='pty'





Kernel 3.2.0-32-generic on an x86_64

[ 1416.992402] BUG: unable to handle kernel paging request at
ffff88073fee6e00
[ 1416.992902] IP: [<ffffffff81318e2b>] memcpy+0xb/0x120
[ 1416.993244] PGD 1c06067 PUD 7ec73067 PMD 7ee73067 PTE 0
[ 1416.993985] Oops: 0000 [#1] SMP
[ 1416.994433] CPU 4
[ 1416.994587] Modules linked in: xt_physdev xen_pciback xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs veth ip6t_LOG
nf_conntrack_ipv6 nf_
defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state
xt_tcpudp nf_conntrack_netlink nfnetlink ebt_ip ebtable_filter
iptable_mangle ipt_MASQUERADE
iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
iptable_filter ip_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad
ib_core ib_addr iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi ebtable_broute ebtables
x_tables dcdbas psmouse serio_raw amd64_edac_mod usbhid hid edac_core
sp5100_tco i2c_piix
4 edac_mce_amd fam15h_power k10temp igb bnx2 acpi_power_meter mac_hid
dm_multipath bridge 8021q garp stp ixgbe dca mdio nfsd nfs lockd fscache
auth_rpcgss nf
s_acl sunrpc [last unloaded: scsi_transport_iscsi]
[ 1417.005011]
[ 1417.005011] Pid: 0, comm: swapper/4 Tainted: G        W
3.2.0-32-generic #51-Ubuntu Dell Inc. PowerEdge R715/0C5MMK
[ 1417.005011] RIP: e030:[<ffffffff81318e2b>]  [<ffffffff81318e2b>]
memcpy+0xb/0x120
[ 1417.005011] RSP: e02b:ffff880060083b08  EFLAGS: 00010246
[ 1417.005011] RAX: ffff88001e12c9e4 RBX: 0000000000000210 RCX:
0000000000000040
[ 1417.005011] RDX: 0000000000000000 RSI: ffff88073fee6e00 RDI:
ffff88001e12c9e4
[ 1417.005011] RBP: ffff880060083b70 R08: 00000000000002e8 R09:
0000000000000200
[ 1417.005011] R10: ffff88001e12c9e4 R11: 0000000000000280 R12:
00000000000000e8
[ 1417.005011] R13: ffff88004b014c00 R14: ffff88004b532000 R15:
0000000000000001
[ 1417.005011] FS:  00007f1a99089700(0000) GS:ffff880060080000(0000)
knlGS:0000000000000000
[ 1417.005011] CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
[ 1417.005011] CR2: ffff88073fee6e00 CR3: 0000000015d22000 CR4:
0000000000040660
[ 1417.005011] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1417.005011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 1417.005011] Process swapper/4 (pid: 0, threadinfo ffff88004b532000,
task ffff88004b538000)
[ 1417.005011] Stack:
[ 1417.005011]  ffffffff81532c0e 0000000000000000 ffff8800000002e8
ffff880000000200
[ 1417.005011]  ffff88001e12c9e4 0000000000000200 ffff88004b533fd8
ffff880060083ba0
[ 1417.005011]  ffff88004b015800 ffff88004b014c00 ffff88001b142000
00000000000000fc
[ 1417.005011] Call Trace:
[ 1417.005011]  <IRQ>
[ 1417.005011]  [<ffffffff81532c0e>] ? skb_copy_bits+0x16e/0x2c0
[ 1417.005011]  [<ffffffff8153463a>] skb_copy+0x8a/0xb0
[ 1417.005011]  [<ffffffff8154b517>] neigh_probe+0x37/0x80
[ 1417.005011]  [<ffffffff8154b9db>] __neigh_event_send+0xbb/0x210
[ 1417.005011]  [<ffffffff8154bc73>] neigh_resolve_output+0x143/0x1f0
[ 1417.005011]  [<ffffffff8156dde5>] ? nf_hook_slow+0x75/0x150
[ 1417.005011]  [<ffffffff8157a510>] ? ip_fragment+0x810/0x810
[ 1417.005011]  [<ffffffff8157a68e>] ip_finish_output+0x17e/0x2f0
[ 1417.005011]  [<ffffffff81533ddb>] ? __alloc_skb+0x4b/0x240
[ 1417.005011]  [<ffffffff8157b1e8>] ip_output+0x98/0xa0
[ 1417.005011]  [<ffffffff8157a8a4>] ? __ip_local_out+0xa4/0xb0
[ 1417.005011]  [<ffffffff8157a8d9>] ip_local_out+0x29/0x30
[ 1417.005011]  [<ffffffff8157aa3c>] ip_queue_xmit+0x15c/0x410
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81592c69>] tcp_transmit_skb+0x359/0x580
[ 1417.005011]  [<ffffffff81593be1>] tcp_retransmit_skb+0x171/0x310
[ 1417.005011]  [<ffffffff8159561b>] tcp_retransmit_timer+0x21b/0x440
[ 1417.005011]  [<ffffffff81595928>] tcp_write_timer+0xe8/0x110
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81075d36>] call_timer_fn+0x46/0x160
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81077682>] run_timer_softirq+0x132/0x2a0
[ 1417.005011]  [<ffffffff8106e5d8>] __do_softirq+0xa8/0x210
[ 1417.005011]  [<ffffffff813a94b7>] ? __xen_evtchn_do_upcall+0x207/0x250
[ 1417.005011]  [<ffffffff816656ac>] call_softirq+0x1c/0x30
[ 1417.005011]  [<ffffffff81015305>] do_softirq+0x65/0xa0
[ 1417.005011]  [<ffffffff8106e9be>] irq_exit+0x8e/0xb0
[ 1417.005011]  [<ffffffff813ab595>] xen_evtchn_do_upcall+0x35/0x50
[ 1417.005011]  [<ffffffff816656fe>] xen_do_hypervisor_callback+0x1e/0x30
[ 1417.005011]  <EOI>
[ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 1417.005011]  [<ffffffff8100a2d0>] ? xen_safe_halt+0x10/0x20
[ 1417.005011]  [<ffffffff8101b983>] ? default_idle+0x53/0x1d0
[ 1417.005011]  [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
[ 1417.005011]  [<ffffffff8100ab29>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 1417.005011]  [<ffffffff8163369c>] ? cpu_bringup_and_idle+0xe/0x10
[ 1417.005011] Code: 58 48 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8
1b fe ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83
e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
[ 1417.005011] RIP  [<ffffffff81318e2b>] memcpy+0xb/0x120
[ 1417.005011]  RSP <ffff880060083b08>
[ 1417.005011] CR2: ffff88073fee6e00
[ 1417.005011] ---[ end trace ae4e7f56ea0665fe ]---
[ 1417.005011] Kernel panic - not syncing: Fatal exception in interrupt
[ 1417.005011] Pid: 0, comm: swapper/4 Tainted: G      D W
3.2.0-32-generic #51-Ubuntu
[ 1417.005011] Call Trace:
[ 1417.005011]  <IRQ>  [<ffffffff81642197>] panic+0x91/0x1a4
[ 1417.005011]  [<ffffffff8165c01a>] oops_end+0xea/0xf0
[ 1417.005011]  [<ffffffff81641027>] no_context+0x150/0x15d
[ 1417.005011]  [<ffffffff816411fd>] __bad_area_nosemaphore+0x1c9/0x1e8
[ 1417.005011]  [<ffffffff81640835>] ? pte_offset_kernel+0x13/0x3c
[ 1417.005011]  [<ffffffff8164122f>] bad_area_nosemaphore+0x13/0x15
[ 1417.005011]  [<ffffffff8165ec36>] do_page_fault+0x426/0x520
[ 1417.005011]  [<ffffffff8165b0ce>] ? _raw_spin_lock_irqsave+0x2e/0x40
[ 1417.005011]  [<ffffffff81059d8a>] ? get_nohz_timer_target+0x5a/0xc0
[ 1417.005011]  [<ffffffff8165b04e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 1417.005011]  [<ffffffff81077f93>] ? mod_timer_pending+0x113/0x240
[ 1417.005011]  [<ffffffffa0317f34>] ? __nf_ct_refresh_acct+0xd4/0x100
[nf_conntrack]
[ 1417.005011]  [<ffffffff8165b5b5>] page_fault+0x25/0x30
[ 1417.005011]  [<ffffffff81318e2b>] ? memcpy+0xb/0x120
[ 1417.005011]  [<ffffffff81532c0e>] ? skb_copy_bits+0x16e/0x2c0
[ 1417.005011]  [<ffffffff8153463a>] skb_copy+0x8a/0xb0
[ 1417.005011]  [<ffffffff8154b517>] neigh_probe+0x37/0x80
[ 1417.005011]  [<ffffffff8154b9db>] __neigh_event_send+0xbb/0x210
[ 1417.005011]  [<ffffffff8154bc73>] neigh_resolve_output+0x143/0x1f0
[ 1417.005011]  [<ffffffff8156dde5>] ? nf_hook_slow+0x75/0x150
[ 1417.005011]  [<ffffffff8157a510>] ? ip_fragment+0x810/0x810
[ 1417.005011]  [<ffffffff8157a68e>] ip_finish_output+0x17e/0x2f0
[ 1417.005011]  [<ffffffff81533ddb>] ? __alloc_skb+0x4b/0x240
[ 1417.005011]  [<ffffffff8157b1e8>] ip_output+0x98/0xa0
[ 1417.005011]  [<ffffffff8157a8a4>] ? __ip_local_out+0xa4/0xb0
[ 1417.005011]  [<ffffffff8157a8d9>] ip_local_out+0x29/0x30
[ 1417.005011]  [<ffffffff8157aa3c>] ip_queue_xmit+0x15c/0x410
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81592c69>] tcp_transmit_skb+0x359/0x580
[ 1417.005011]  [<ffffffff81593be1>] tcp_retransmit_skb+0x171/0x310
[ 1417.005011]  [<ffffffff8159561b>] tcp_retransmit_timer+0x21b/0x440
[ 1417.005011]  [<ffffffff81595928>] tcp_write_timer+0xe8/0x110
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81075d36>] call_timer_fn+0x46/0x160
[ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
[ 1417.005011]  [<ffffffff81077682>] run_timer_softirq+0x132/0x2a0
[ 1417.005011]  [<ffffffff8106e5d8>] __do_softirq+0xa8/0x210
[ 1417.005011]  [<ffffffff813a94b7>] ? __xen_evtchn_do_upcall+0x207/0x250
[ 1417.005011]  [<ffffffff816656ac>] call_softirq+0x1c/0x30
[ 1417.005011]  [<ffffffff81015305>] do_softirq+0x65/0xa0
[ 1417.005011]  [<ffffffff8106e9be>] irq_exit+0x8e/0xb0
[ 1417.005011]  [<ffffffff813ab595>] xen_evtchn_do_upcall+0x35/0x50
[ 1417.005011]  [<ffffffff816656fe>] xen_do_hypervisor_callback+0x1e/0x30
[ 1417.005011]  <EOI>  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 1417.005011]  [<ffffffff8100a2d0>] ? xen_safe_halt+0x10/0x20
[ 1417.005011]  [<ffffffff8101b983>] ? default_idle+0x53/0x1d0
[ 1417.005011]  [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
[ 1417.005011]  [<ffffffff8100ab29>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 1417.005011]  [<ffffffff8163369c>] ? cpu_bringup_and_idle+0xe/0x10
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.	

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2012-12-14 14:54 Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
@ 2012-12-17 10:10 ` Jan Beulich
  2012-12-17 17:09   ` Alex Bligh
  2013-01-16 10:56   ` Alex Bligh
  0 siblings, 2 replies; 91+ messages in thread
From: Jan Beulich @ 2012-12-17 10:10 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Ian Campbell, Xen Devel

>>> On 14.12.12 at 15:54, Alex Bligh <alex@alex.org.uk> wrote:
> [ 1416.992402] BUG: unable to handle kernel paging request at ffff88073fee6e00

Assuming the address above is valid in the first place (i.e. you
have at least 32G installed), this very much suggests access to
a ballooned out page. Could you therefore suppress the use of
ballooning for debugging purposes, and see whether the issue
goes away then?
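Jan's reasoning can be sanity-checked with a little arithmetic. This is a sketch, assuming the conventional x86-64 direct-map base (0xffff880000000000) used by kernels of this vintage; the base address is an assumption, not something stated in this thread:

```python
# Translate the faulting direct-map virtual address from the oops
# into the physical address it would correspond to.
PAGE_OFFSET = 0xffff880000000000  # assumed x86-64 direct-map base

def direct_map_to_phys(vaddr: int) -> int:
    """Physical address implied by a kernel direct-map virtual address."""
    assert vaddr >= PAGE_OFFSET, "not a direct-map address"
    return vaddr - PAGE_OFFSET

fault = 0xffff88073fee6e00  # address from the oops above
phys = direct_map_to_phys(fault)
print(hex(phys), round(phys / 2**30, 2))  # just under 29 GiB
```

The faulting address sits just under 29 GiB into physical memory, which is why it is only plausible on a machine with that much (or more) RAM, and why a ballooned-out page is a natural suspect.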

Jan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2012-12-17 10:10 ` Jan Beulich
@ 2012-12-17 17:09   ` Alex Bligh
  2013-01-16 10:56   ` Alex Bligh
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2012-12-17 17:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Campbell, Alex Bligh, Xen Devel

Jan,

--On 17 December 2012 10:10:30 +0000 Jan Beulich <JBeulich@suse.com> wrote:

>>>> On 14.12.12 at 15:54, Alex Bligh <alex@alex.org.uk> wrote:
>> [ 1416.992402] BUG: unable to handle kernel paging request at ffff88073fee6e00
>
> Assuming the address above is valid in the first place (i.e. you
> have at least 32G installed), this very much suggests access to
> a ballooned out page. Could you therefore suppress the use of
> ballooning for debugging purposes, and see whether the issue
> goes away then?

Thanks. I believe that dump was on a 64G box (though it dies happily
on lots of hardware).

I believe we have dom0_mem configured already. Is setting autoballoon=0 in
xl.conf sufficient? (we don't use xend).

I will report back.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2012-12-17 10:10 ` Jan Beulich
  2012-12-17 17:09   ` Alex Bligh
@ 2013-01-16 10:56   ` Alex Bligh
  2013-01-16 14:34     ` Stefano Stabellini
  1 sibling, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 10:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Ian Campbell, Alex Bligh, Xen Devel

Jan,

--On 17 December 2012 10:10:30 +0000 Jan Beulich <JBeulich@suse.com> wrote:

>>>> On 14.12.12 at 15:54, Alex Bligh <alex@alex.org.uk> wrote:
>> [ 1416.992402] BUG: unable to handle kernel paging request at
>> ffff88073fee6e00
>
> Assuming the address above is valid in the first place (i.e. you
> have at least 32G installed), this very much suggests access to
> a ballooned out page. Could you therefore suppress the use of
> ballooning for debugging purposes, and see whether the issue
> goes away then?

We configured dom0_mem=512M,max:512M on the grub command line and
put autoballoon=0 in xl.conf (we are using the xl toolstack).
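For anyone reproducing this, the two settings amount to something like the following (file paths and grub syntax are illustrative and vary by distribution):

```
# Xen command line in the grub entry: pin dom0's memory so it
# cannot be ballooned down
multiboot /boot/xen.gz dom0_mem=512M,max:512M

# /etc/xen/xl.conf: stop xl from ballooning dom0 when creating guests
autoballoon=0
```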

We can still repeatably wipe out dom0 just by starting a VM.

Interestingly, if this step:

# qemu-img create -f qcow2 -b precise-server-cloudimg-amd64-disk1.img testdisk.qcow2 20G

is replaced by

# cp precise-server-cloudimg-amd64-disk1.img testdisk.qcow2

we don't get the crash.

This would imply it's something to do with the qcow2 backing file.
However, the same test works fine under KVM.

I suspect the backing file is a bit of a distraction, just as the
fact that this specific image triggers it is a bit of a distraction.
What the image does is extend the partition table and then extend an
ext4 filing system, which involves lots of reads and writes. I'm
guessing it's something triggered timing-wise.
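Since the crash only shows up when a backing file is present, it can be worth confirming what the image header actually records. A minimal sketch, reading the backing-file fields from the start of a qcow2 header (magic, version, then a big-endian backing-file offset and size, per the published qcow2 header layout); the function name is ours:

```python
import struct

def qcow2_backing_file(path):
    """Return the backing file name recorded in a qcow2 header, or None."""
    with open(path, "rb") as f:
        # First 20 bytes: 4-byte magic, u32 version,
        # u64 backing_file_offset, u32 backing_file_size (big-endian).
        magic, version, bf_offset, bf_size = struct.unpack(
            ">4sIQI", f.read(20))
        if magic != b"QFI\xfb":
            raise ValueError("not a qcow2 image")
        if bf_offset == 0 or bf_size == 0:
            return None  # standalone image, no backing file
        f.seek(bf_offset)
        return f.read(bf_size).decode()
```

`qemu-img info testdisk.qcow2` reports the same backing-file field; the sketch is only useful where qemu-img is not to hand.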

The interesting thing is that this is totally replicable on every
piece of hardware we've tried, 100% of the time.

-- 
Alex Bligh



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 10:56   ` Alex Bligh
@ 2013-01-16 14:34     ` Stefano Stabellini
  2013-01-16 15:06       ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 14:34 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Rzeszutek Wilk, Ian Campbell, Jan Beulich, Xen Devel

On Wed, 16 Jan 2013, Alex Bligh wrote:
> Kernel 3.2.0-32-generic on an x86_64
> 
> [ 1416.992402] BUG: unable to handle kernel paging request at ffff88073fee6e00
> [ 1416.992902] IP: [<ffffffff81318e2b>] memcpy+0xb/0x120
> [ 1416.993244] PGD 1c06067 PUD 7ec73067 PMD 7ee73067 PTE 0
> [ 1416.993985] Oops: 0000 [#1] SMP
> [ 1416.994433] CPU 4
> [ 1416.994587] Modules linked in: xt_physdev xen_pciback xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs veth ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_limit xt_state xt_tcpudp nf_conntrack_netlink nfnetlink ebt_ip ebtable_filter iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ebtable_broute ebtables x_tables dcdbas psmouse serio_raw amd64_edac_mod usbhid hid edac_core sp5100_tco i2c_piix4 edac_mce_amd fam15h_power k10temp igb bnx2 acpi_power_meter mac_hid dm_multipath bridge 8021q garp stp ixgbe dca mdio nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc [last unloaded: scsi_transport_iscsi]
> [ 1417.005011]
> [ 1417.005011] Pid: 0, comm: swapper/4 Tainted: G        W    3.2.0-32-generic #51-Ubuntu Dell Inc. PowerEdge R715/0C5MMK
> [ 1417.005011] RIP: e030:[<ffffffff81318e2b>]  [<ffffffff81318e2b>] memcpy+0xb/0x120
> [ 1417.005011] RSP: e02b:ffff880060083b08  EFLAGS: 00010246
> [ 1417.005011] RAX: ffff88001e12c9e4 RBX: 0000000000000210 RCX: 0000000000000040
> [ 1417.005011] RDX: 0000000000000000 RSI: ffff88073fee6e00 RDI: ffff88001e12c9e4
> [ 1417.005011] RBP: ffff880060083b70 R08: 00000000000002e8 R09: 0000000000000200
> [ 1417.005011] R10: ffff88001e12c9e4 R11: 0000000000000280 R12: 00000000000000e8
> [ 1417.005011] R13: ffff88004b014c00 R14: ffff88004b532000 R15: 0000000000000001
> [ 1417.005011] FS:  00007f1a99089700(0000) GS:ffff880060080000(0000) knlGS:0000000000000000
> [ 1417.005011] CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
> [ 1417.005011] CR2: ffff88073fee6e00 CR3: 0000000015d22000 CR4: 0000000000040660
> [ 1417.005011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1417.005011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 1417.005011] Process swapper/4 (pid: 0, threadinfo ffff88004b532000, task ffff88004b538000)
> [ 1417.005011] Stack:
> [ 1417.005011]  ffffffff81532c0e 0000000000000000 ffff8800000002e8 ffff880000000200
> [ 1417.005011]  ffff88001e12c9e4 0000000000000200 ffff88004b533fd8 ffff880060083ba0
> [ 1417.005011]  ffff88004b015800 ffff88004b014c00 ffff88001b142000 00000000000000fc
> [ 1417.005011] Call Trace:
> [ 1417.005011]  <IRQ>
> [ 1417.005011]  [<ffffffff81532c0e>] ? skb_copy_bits+0x16e/0x2c0
> [ 1417.005011]  [<ffffffff8153463a>] skb_copy+0x8a/0xb0
> [ 1417.005011]  [<ffffffff8154b517>] neigh_probe+0x37/0x80
> [ 1417.005011]  [<ffffffff8154b9db>] __neigh_event_send+0xbb/0x210
> [ 1417.005011]  [<ffffffff8154bc73>] neigh_resolve_output+0x143/0x1f0
> [ 1417.005011]  [<ffffffff8156dde5>] ? nf_hook_slow+0x75/0x150
> [ 1417.005011]  [<ffffffff8157a510>] ? ip_fragment+0x810/0x810
> [ 1417.005011]  [<ffffffff8157a68e>] ip_finish_output+0x17e/0x2f0
> [ 1417.005011]  [<ffffffff81533ddb>] ? __alloc_skb+0x4b/0x240
> [ 1417.005011]  [<ffffffff8157b1e8>] ip_output+0x98/0xa0
> [ 1417.005011]  [<ffffffff8157a8a4>] ? __ip_local_out+0xa4/0xb0
> [ 1417.005011]  [<ffffffff8157a8d9>] ip_local_out+0x29/0x30
> [ 1417.005011]  [<ffffffff8157aa3c>] ip_queue_xmit+0x15c/0x410
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81592c69>] tcp_transmit_skb+0x359/0x580
> [ 1417.005011]  [<ffffffff81593be1>] tcp_retransmit_skb+0x171/0x310
> [ 1417.005011]  [<ffffffff8159561b>] tcp_retransmit_timer+0x21b/0x440
> [ 1417.005011]  [<ffffffff81595928>] tcp_write_timer+0xe8/0x110
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81075d36>] call_timer_fn+0x46/0x160
> [ 1417.005011]  [<ffffffff81595840>] ? tcp_retransmit_timer+0x440/0x440
> [ 1417.005011]  [<ffffffff81077682>] run_timer_softirq+0x132/0x2a0
> [ 1417.005011]  [<ffffffff8106e5d8>] __do_softirq+0xa8/0x210
> [ 1417.005011]  [<ffffffff813a94b7>] ? __xen_evtchn_do_upcall+0x207/0x250
> [ 1417.005011]  [<ffffffff816656ac>] call_softirq+0x1c/0x30
> [ 1417.005011]  [<ffffffff81015305>] do_softirq+0x65/0xa0
> [ 1417.005011]  [<ffffffff8106e9be>] irq_exit+0x8e/0xb0
> [ 1417.005011]  [<ffffffff813ab595>] xen_evtchn_do_upcall+0x35/0x50
> [ 1417.005011]  [<ffffffff816656fe>] xen_do_hypervisor_callback+0x1e/0x30
> [ 1417.005011]  <EOI>
> [ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> [ 1417.005011]  [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> [ 1417.005011]  [<ffffffff8100a2d0>] ? xen_safe_halt+0x10/0x20
> [ 1417.005011]  [<ffffffff8101b983>] ? default_idle+0x53/0x1d0
> [ 1417.005011]  [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
> [ 1417.005011]  [<ffffffff8100ab29>] ? xen_irq_enable_direct_reloc+0x4/0x4
> [ 1417.005011]  [<ffffffff8163369c>] ? cpu_bringup_and_idle+0xe/0x10
> [ 1417.005011] Code: 58 48 2b 43 50 88 43 4e 48 83 c4 08 5b 5d c3 90 e8 1b fe ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c

It seems that the grant mapping is already gone by the time
tcp_retransmit is called.
That might happen because QEMU already completed the read/write
operation and called xc_gnttab_munmap, that causes the grant_table and
the m2p_override to remove the p2m and m2p mappings of the foreign
pages.
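The sequence being described can be sketched with a tiny model (Python, purely illustrative; the classes and names here are hypothetical stand-ins for blkfront grants, QEMU's xc_gnttab_munmap, and a zero-copy skb in the dom0 TCP stack): an skb queued for retransmission still references a granted page, and the grant is torn down as soon as QEMU completes the AIO.

```python
# Illustrative model of the suspected race (all names hypothetical).
# A zero-copy "skb" keeps a reference to a granted page; the grant is
# unmapped when QEMU completes the AIO, so a later timer-driven
# retransmit touches a mapping that no longer exists.

class GrantMapping:
    def __init__(self, ref):
        self.ref = ref
        self.mapped = True

    def unmap(self):
        # Stand-in for xc_gnttab_munmap removing the p2m/m2p entries.
        self.mapped = False

class Skb:
    """A queued TCP segment referencing the granted page directly."""
    def __init__(self, grant):
        self.grant = grant

    def retransmit(self):
        if not self.grant.mapped:
            # In the real kernel this is a fault on a hole in the
            # direct map -- the dom0 oops quoted above.
            raise RuntimeError("paging request on unmapped grant")
        return "segment sent"

grant = GrantMapping(ref=42)
skb = Skb(grant)            # NFS write queued, zero-copy
skb.retransmit()            # first transmit: fine, page still mapped
grant.unmap()               # QEMU completes the AIO, unmaps the grant
try:
    skb.retransmit()        # retransmit timer fires later -> fault
    crashed = False
except RuntimeError:
    crashed = True
print(crashed)
```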

Isn't there a way to prevent tcp_retransmit from running when the
request is already completed? Or stop it if you find out that the pages
are already gone?

You could try persistent grants, that wouldn't solve the bug but they
should be able to "hide" it pretty well. Not ideal, I know.
The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
Konrad issued a pull request recently with the corresponding Linux
blkfront changes:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.8


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 14:34     ` Stefano Stabellini
@ 2013-01-16 15:06       ` Alex Bligh
  2013-01-16 16:00         ` Alex Bligh
  2013-01-16 16:27         ` Stefano Stabellini
  0 siblings, 2 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 15:06 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, Ian Campbell, Jan Beulich, Xen Devel

Stefano,

--On 16 January 2013 14:34:34 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> It seems that the grant mapping is already gone by the time
> tcp_retransmit is called.
> That might happen because QEMU already completed the read/write
> operation and called xc_gnttab_munmap, that causes the grant_table and
> the m2p_override to remove the p2m and m2p mappings of the foreign
> pages.

What I want to know is why QEMU is completing the read/write operation
before the write (as it surely must be a write) has completed in any
case. This /seems/ to happen only if a backing file is being used
but I'm not sure if that's just triggering the retransmits due to
(e.g.) a slow filer.

If QEMU is completing writes before they've actually been done, haven't
we got a wider set of problems to worry about?

Could the problem be "cache=writeback" on the QEMU command
line (evident from a 'ps'). If caching is writeback perhaps QEMU
needs to copy the data. Is there some setting to turn this off in
xl for test purposes?

> Isn't there a way to prevent tcp_retransmit from running when the
> request is already completed? Or stop it if you find out that the pages
> are already gone?

But what would you do? If you don't run the tcp_retransmit the write
would be lost (to say nothing of the NFS connection to the server).

> You could try persistent grants, that wouldn't solve the bug but they
> should be able to "hide" it pretty well. Not ideal, I know.
> The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> Konrad issued a pull request recently with the corresponding Linux
> blkfront changes:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> stable/for-jens-3.8

That's presumably the first 8 commits at:
http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8

So I'd need a new dom0 kernel and to backport the QEMU patch.

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 15:06       ` Alex Bligh
@ 2013-01-16 16:00         ` Alex Bligh
  2013-01-16 16:27         ` Stefano Stabellini
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 16:00 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, Ian Campbell, Jan Beulich, Xen Devel



--On 16 January 2013 15:06:49 +0000 Alex Bligh <alex@alex.org.uk> wrote:

> If QEMU is completing writes before they've actually been done, haven't
> we got a wider set of problems to worry about?

One such problem would be 'what happens if the guest changes the data
after it has been written'. If I understand Stefano right, dom0
is still referencing the guest's pages after the write has been marked as complete.

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 15:06       ` Alex Bligh
  2013-01-16 16:00         ` Alex Bligh
@ 2013-01-16 16:27         ` Stefano Stabellini
  2013-01-16 17:13           ` Alex Bligh
  1 sibling, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 16:27 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Ian Campbell, Jan Beulich,
	Stefano Stabellini

On Wed, 16 Jan 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 16 January 2013 14:34:34 +0000 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> > It seems that the grant mapping is already gone by the time
> > tcp_retransmit is called.
> > That might happen because QEMU already completed the read/write
> > operation and called xc_gnttab_munmap, that causes the grant_table and
> > the m2p_override to remove the p2m and m2p mappings of the foreign
> > pages.
> 
> What I want to know is why QEMU is completing the read/write operation
> before the write (as it surely must be a write) has completed in any
> case. This /seems/ to happen only if a backing file is being used
> but I'm not sure if that's just triggering the retransmits due to
> (e.g.) a slow filer.
> 
> If QEMU is completing writes before they've actually been done, haven't
> we got a wider set of problems to worry about?

Reading the thread you linked in a previous email, it seems that
it can actually happen that a userspace application is told that
the write is completed before all the outstanding network requests are
dealt with.


> Could the problem be "cache=writeback" on the QEMU command
> line (evident from a 'ps'). If caching is writeback perhaps QEMU
> needs to copy the data. Is there some setting to turn this off in
> xl for test purposes?

The command line cache options are ignored by xen_disk, so, assuming
that the guest is using the PV disk interface, that can't be the issue.


> > Isn't there a way to prevent tcp_retransmit from running when the
> > request is already completed? Or stop it if you find out that the pages
> > are already gone?
> 
> But what would you do? If you don't run the tcp_retransmit the write
> would be lost (to say nothing of the NFS connection to the server).

Well, that is not true: if the write was really lost, the kernel wouldn't
have completed the AIO write and notified QEMU.


> > You could try persistent grants, that wouldn't solve the bug but they
> > should be able to "hide" it pretty well. Not ideal, I know.
> > The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> > Konrad issued a pull request recently with the corresponding Linux
> > blkfront changes:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> > stable/for-jens-3.8
> 
> That's presumably the first 8 commits at:
> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8
> 
> So I'd need a new dom0 kernel and to backport the QEMU patch.

Yep.


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 16:27         ` Stefano Stabellini
@ 2013-01-16 17:13           ` Alex Bligh
  2013-01-16 17:33             ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 17:13 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Xen Devel, Jan Beulich, Alex Bligh


>> If QEMU is completing writes before they've actually been done, haven't
>> we got a wider set of problems to worry about?
>
> Reading the thread you linked in a previous email, it seems that
> it can actually happen that a userspace application is told that
> the write is completed before all the outstanding network requests are
> dealt with.

What is 'userspace application' in this context? QEMU running in dom0?
That would seem to me to be a kernel bug unless the page is marked
CoW, wouldn't it? Else write() then alter the page might write the
altered data. But perhaps I've misunderstood (see below)

>> Could the problem be "cache=writeback" on the QEMU command
>> line (evident from a 'ps'). If caching is writeback perhaps QEMU
>> needs to copy the data. Is there some setting to turn this off in
>> xl for test purposes?
>
> The command line cache options are ignored by xen_disk, so, assuming
> that the guest is using the PV disk interface, that can't be the issue.

This appears not to be the case (at least in our environment).

We use PV on HVM and:
 disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
(remainder of config file in the original message)

We tried modifying the cache= setting using the patch below (yes,
the mail client will probably have eaten it, but in essence change
the word 'writeback' to 'none'), and that stops it booting VMs
at all with
 hd0 write error
 error: couldn't read file
so it would appear not to be entirely correct that the cache=
settings are being ignored. I've not had time to find out why
(possibly it's trying and failing to use O_DIRECT on NFS) but
I'll try writethrough.

One thing the guest is doing is writing to the partition table
(UEC cloud images do this on boot). This isn't special cased in
any way is it?

>> > Isn't there a way to prevent tcp_retransmit from running when the
>> > request is already completed? Or stop it if you find out that the pages
>> > are already gone?
>>
>> But what would you do? If you don't run the tcp_retransmit the write
>> would be lost (to say nothing of the NFS connection to the server).
>
> Well, that is not true: if the write was really lost, the kernel wouldn't
> have completed the AIO write and notified QEMU.

Isn't that exactly what you said did happen? The kernel completed the AIO
write and notified QEMU prior to the write actually completing as the
data to write is still sitting in some as-yet-unacked TCP buffer. The
kernel then doesn't get the ACK in respect of that sequence number and
decides to resend the entire TCP segment. That then blows up because
the TCP segment it points to contains data pointing to a hole in memory.
Perhaps I'm misunderstanding the problem.

If TCP does not retransmit, that segment will never get ACKed, and the
TCP stream will lock up (this assumes that the cause of the original
need to retransmit was packet loss - if it's simply buffering at
a busy filer, then I agree).
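The constraint being argued here can be sketched with a toy send-queue model (Python, purely illustrative; nothing below is the kernel's actual implementation): a segment must stay resendable until the peer ACKs it, which means the memory backing an unacked segment has to remain valid for exactly as long as retransmission is possible.

```python
# Toy model of a TCP send queue (hypothetical, not the kernel's):
# payloads stay queued for retransmission until the peer ACKs them;
# only an ACK lets the sender drop the underlying buffer.

class SendQueue:
    def __init__(self):
        self.unacked = {}          # starting seq -> payload bytes
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += len(payload)
        return seq

    def ack(self, ack_seq):
        # An ACK covers all bytes below ack_seq: those buffers are
        # now safe to free.
        self.unacked = {s: p for s, p in self.unacked.items()
                        if s >= ack_seq}

    def retransmit_pending(self):
        # On timeout everything still unacked must be resent, which
        # requires the payload memory to still be valid.
        return sorted(self.unacked)

q = SendQueue()
s0 = q.send(b"write#1")       # seq 0, 7 bytes
s1 = q.send(b"write#2")       # seq 7
q.ack(s1)                     # peer ACKed bytes 0..6 (the first write)
print(q.retransmit_pending()) # the second write must stay resendable
```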

>> > You could try persistent grants, that wouldn't solve the bug but they
>> > should be able to "hide" it pretty well. Not ideal, I know.
>> > The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
>> > Konrad issued a pull request recently with the corresponding Linux
>> > blkfront changes:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
>> > stable/for-jens-3.8
>>
>> That's presumably the first 8 commits at:
>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8
>>
>> So I'd need a new dom0 kernel and to backport the QEMU patch.
>
> Yep.

What puzzles me about this is (a) why we never see the same problems
on KVM, and (b) why this doesn't affect NFS clients even when no
virtualisation is involved.

-- 
Alex Bligh


dcrisan@dcrisan-lnx:/home/dcrisan/code/git/xen-4.2-live-migrate$ git diff
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 7662b3d..7b74e24 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -549,10 +549,10 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
             if (disks[i].is_cdrom) {
                 if (disks[i].format == LIBXL_DISK_FORMAT_EMPTY)
                     drive = libxl__sprintf
-                        (gc, "if=ide,index=%d,media=cdrom,cache=writeback", disk);
+                        (gc, "if=ide,index=%d,media=cdrom,cache=none", disk);
                 else
                     drive = libxl__sprintf
-                        (gc, "file=%s,if=ide,index=%d,media=cdrom,format=%s,cache=writeback",
+                        (gc, "file=%s,if=ide,index=%d,media=cdrom,format=%s,cache=none",
                          disks[i].pdev_path, disk, format);
             } else {
                 if (disks[i].format == LIBXL_DISK_FORMAT_EMPTY) {
@@ -575,11 +575,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
                  */
                 if (strncmp(disks[i].vdev, "sd", 2) == 0)
                     drive = libxl__sprintf
-                        (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
+                        (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=none",
                          disks[i].pdev_path, disk, format);
                 else if (disk < 4)
                     drive = libxl__sprintf
-                        (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=writeback",
+                        (gc, "file=%s,if=ide,index=%d,media=disk,format=%s,cache=none",
                          disks[i].pdev_path, disk, format);
                 else
                     continue; /* Do not emulate this disk */


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 17:13           ` Alex Bligh
@ 2013-01-16 17:33             ` Stefano Stabellini
  2013-01-16 17:39               ` Stefano Stabellini
                                 ` (2 more replies)
  0 siblings, 3 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 17:33 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Ian Campbell, Jan Beulich,
	Stefano Stabellini

On Wed, 16 Jan 2013, Alex Bligh wrote:
> >> If QEMU is completing writes before they've actually been done, haven't
> >> we got a wider set of problems to worry about?
> >
> > Reading the thread you linked in a previous email, it seems that
> > it can actually happen that a userspace application is told that
> > the write is completed before all the outstanding network requests are
> > dealt with.
> 
> What is 'userspace application' in this context? QEMU running in dom0?
> That would seem to me to be a kernel bug unless the page is marked
> CoW, wouldn't it? Else write() then alter the page might write the
> altered data. But perhaps I've misunderstood (see below)

Yes, the application is QEMU. I also think that it is a kernel bug.


> >> Could the problem be "cache=writeback" on the QEMU command
> >> line (evident from a 'ps'). If caching is writeback perhaps QEMU
> >> needs to copy the data. Is there some setting to turn this off in
> >> xl for test purposes?
> >
> > The command line cache options are ignored by xen_disk, so, assuming
> > that the guest is using the PV disk interface, that can't be the issue.
> 
> This appears not to be the case (at least in our environment).
> 
> We use PV on HVM and:
>  disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
> (remainder of config file in the original message)
> 
> We tried modifying the cache= setting using the patch below (yes,
> the mail client will probably have eaten it, but in essence change
> the word 'writeback' to 'none'), and that stops it booting VMs
> at all with
>  hd0 write error
>  error: couldn't read file
> so it would appear not to be entirely correct that the cache=
> settings are being ignored. I've not had time to find out why
> (possibly it's trying and failing to use O_DIRECT on NFS) but
> I'll try writethrough.

The cache command line option is ignored by xen_disk, the PV disk
backend.  I was assuming that the guest is using blkfront to access the
disk, but it looks like I am wrong.  If the guest is using the IDE
interface, then yes, the cache command line option makes a big
difference.

It is interesting that cache=none has that terrible effect on the disk
reads; that means that O_DIRECT doesn't work properly either.


> One thing the guest is doing is writing to the partition table
> (UEC cloud images do this on boot). This isn't special cased in
> any way is it?

I don't think so.


> >> > Isn't there a way to prevent tcp_retransmit from running when the
> >> > request is already completed? Or stop it if you find out that the pages
> >> > are already gone?
> >>
> >> But what would you do? If you don't run the tcp_retransmit the write
> >> would be lost (to say nothing of the NFS connection to the server).
> >
> > Well, that is not true: if the write was really lost, the kernel wouldn't
> > have completed the AIO write and notified QEMU.
> 
> Isn't that exactly what you said did happen? The kernel completed the AIO
> write and notified QEMU prior to the write actually completing as the
> data to write is still sitting in some as-yet-unacked TCP buffer. The
> kernel then doesn't get the ACK in respect of that sequence number and
> decides to resend the entire TCP segment. That then blows up because
> the TCP segment it points to contains data pointing to a hole in memory.
> Perhaps I'm misunderstanding the problem.
> 
> If TCP does not retransmit, that segment will never get ACKed, and the
> TCP stream will lock up (this assumes that the cause of the original
> need to retransmit was packet loss - if it's simply buffering at
> a busy filer, then I agree).

Almost. I am saying that the kernel completed the AIO write and notified
QEMU after it received an ACK from the other end, but before the
tcp_retransmit was supposed to run.  I admit I am not that familiar with
the network stack so this is just a supposition.


> >> > You could try persistent grants, that wouldn't solve the bug but they
> >> > should be able to "hide" it pretty well. Not ideal, I know.
> >> > The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> >> > Konrad issued a pull request recently with the corresponding Linux
> >> > blkfront changes:
> >> >
> >> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> >> > stable/for-jens-3.8
> >>
> >> That's presumably the first 8 commits at:
> >> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8
> >>
> >> So I'd need a new dom0 kernel and to backport the QEMU patch.
> >
> > Yep.
> 
> What puzzles me about this is (a) why we never see the same problems
> on KVM, and (b) why this doesn't affect NFS clients even when no
> virtualisation is involved.

If it is the bug that I think it is, then it would also affect KVM and
other native clients, but it wouldn't cause such horrible host crashes.
For example tcp_retransmit could send stale data or even data that has
just been written by QEMU but that it is not supposed to go over the
network yet. After all, who knows what's written on those pages now that
the AIO is completed?


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 17:33             ` Stefano Stabellini
@ 2013-01-16 17:39               ` Stefano Stabellini
  2013-01-16 18:14                 ` Alex Bligh
  2013-01-16 18:12               ` Alex Bligh
  2013-01-21 15:15               ` Alex Bligh
  2 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 17:39 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Jan Beulich, Alex Bligh, Ian Campbell

On Wed, 16 Jan 2013, Stefano Stabellini wrote:
> > >> Could the problem be "cache=writeback" on the QEMU command
> > >> line (evident from a 'ps'). If caching is writeback perhaps QEMU
> > >> needs to copy the data. Is there some setting to turn this off in
> > >> xl for test purposes?
> > >
> > > The command line cache options are ignored by xen_disk, so, assuming
> > > that the guest is using the PV disk interface, that can't be the issue.
> > 
> > This appears not to be the case (at least in our environment).
> > 
> > We use PV on HVM and:
> >  disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
> > (remainder of config file in the original message)
> > 
> > We tried modifying the cache= setting using the patch below (yes,
> > the mail client will probably have eaten it, but in essence change
> > the word 'writeback' to 'none'), and that stops it booting VMs
> > at all with
> >  hd0 write error
> >  error: couldn't read file
> > so it would appear not to be entirely correct that the cache=
> > settings are being ignored. I've not had time to find out why
> > (possibly it's trying and failing to use O_DIRECT on NFS) but
> > I'll try writethrough.
> 
> The cache command line option is ignored by xen_disk, the PV disk
> backend.  I was assuming that the guest is using blkfront to access the
> disk, but it looks like I am wrong.  If the guest is using the IDE
> interface, then yes, the cache command line option makes a big
> difference.
> 
> It is interesting that cache=none has that terrible effect on the disk
> reads, that means that O_DIRECT doesn't work properly either.

Let me elaborate on this: the guest is a PV on HVM guest, so at the very
least the bootloader is going to use the emulated IDE interface to grab
Xen and the kernel.
After that the kernel should use the PV disk interface straight away (I
actually downloaded the image and tried it myself: it is using the PV
disk interface indeed).

The bug should occur after the guest has switched over to the PV disk
interface (state = 4 on xenstore for the vbd device). It can't be the
IDE emulator triggering the issue.
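For reference, the state numbers that show up in xenstore are the XenbusState values from xen/include/public/io/xenbus.h; a quick sketch of the mapping (the helper function here is just illustrative, not part of any toolstack):

```python
# XenbusState values as defined in xen/include/public/io/xenbus.h.
# State 4 ("Connected") on the vbd device is what indicates the guest
# has switched to the PV disk path.
XENBUS_STATES = {
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
    7: "Reconfiguring",
    8: "Reconfigured",
}

def is_pv_connected(state):
    """True once the frontend/backend pair is on the PV path."""
    return XENBUS_STATES.get(state) == "Connected"

print(is_pv_connected(4))
```

(In practice one would read the state node for the vbd device with xenstore-read or `xenstore-ls` and compare it against 4.)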


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 17:33             ` Stefano Stabellini
  2013-01-16 17:39               ` Stefano Stabellini
@ 2013-01-16 18:12               ` Alex Bligh
  2013-01-21 15:15               ` Alex Bligh
  2 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 18:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Xen Devel, Jan Beulich, Alex Bligh

Stefano,

--On 16 January 2013 17:33:59 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>> What is 'userspace application' in this context? QEMU running in dom0?
>> That would seem to me to be a kernel bug unless the page is marked
>> CoW, wouldn't it? Else write() then alter the page might write the
>> altered data. But perhaps I've misunderstood (see below)
>
> Yes, the application is QEMU. I also think that it is a kernel bug.

OK

>> We use PV on HVM and:
>>  disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
>> (remainder of config file in the original message)
>>
>> We tried modifying the cache= setting using the patch below (yes,
>> the mail client will probably have eaten it, but in essence change
>> the word 'writeback' to 'none'), and that stops it booting VMs
>> at all with
>>  hd0 write error
>>  error: couldn't read file
>> so it would appear not to be entirely correct that the cache=
>> settings are being ignored. I've not had time to find out why
>> (possibly it's trying and failing to use O_DIRECT on NFS) but
>> I'll try writethrough.
>
> The cache command line option is ignored by xen_disk, the PV disk
> backend.  I was assuming that the guest is using blkfront to access the
> disk, but it looks like I am wrong.  If the guest is using the IDE
> interface, then yes, the cache command line option makes a big
> difference.

As above, we are using:
  tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w

My understanding of docs/misc/vbd-interface.txt is that this uses
an xvd device. What that will do is:

   For HVM guests, each whole-disk hd* and and sd* device is made
   available _both_ via emulated IDE resp. SCSI controller, _and_ as a
   Xen VBD.  The HVM guest is entitled to assume that the IDE or SCSI
   disks available via the emulated IDE controller target the same
   underlying devices as the corresponding Xen VBD (ie, multipath).

However, the kernel should be doing a PCI unplug on the hd/sd devices
(to be honest I've never seen an hd device). And indeed the root
partition is /dev/xvda1.

So I'd have thought it was neither IDE nor SCSI. dmesg from the guest
(when booted without a qemu backing file) is attached below. This supports
this.

> It is interesting that cache=none has that terrible effect on the disk
> reads, that means that O_DIRECT doesn't work properly either.

Indeed.

With no backing file, cache=writethrough works. With a backing file
(i.e. the same circumstance as the previous crash), it appears to
cause the NFS kernel client to die, so dom0 can no longer
access the NFS mount at all. This would rather suggest the tcp
connection to the server has died in some horrible manner.

> Almost. I am saying that the kernel completed the AIO write and notified
> QEMU after it received an ACK from the other end, but before the
> tcp_retransmit was supposed to run.  I admit I am not that familiar with
> the network stack so this is just a supposition.

If the segment has been tcp-ACKed, it shouldn't need to be retransmitted,
surely? ACK on the data means 'you can throw the data away now' (as
far as I know - it's nearly 20 years since I looked at the Linux tcp
stack properly so I'm in an equal state of ignorance).

> If it is the bug that I think it is, then it would also affect KVM and
> other native clients, but it wouldn't cause such horrible host crashes.
> For example tcp_retransmit could send stale data or even data that has
> just been written by QEMU but that it is not supposed to go over the
> network yet. After all, who knows what's written on those pages now that
> the AIO is completed?

Indeed.

Note we're using:
  tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w

Is that actually using aio? I'm not quite clear what in the new world
order "tap:" means. Previously tap and aio were separate options. Can
one use qdisk and have it use a different codepath? Should we in
fact be using:
  backendtype=tap,/my/nfs/directory/testdisk,qcow2,xvda,rw
or
  backendtype=qdisk,/my/nfs/directory/testdisk,qcow2,xvda,rw
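For comparison, the two styles of disk specification would look roughly like this (a sketch based on the xl disk configuration syntax; whether "tap:" still selects blktap or silently falls back to qdisk in this xl build is exactly the open question):

```
# Old-style positional syntax, as used in the config above:
disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]

# New-style key=value syntax with the backend chosen explicitly:
disk = [ 'format=qcow2,backendtype=qdisk,vdev=xvda,access=rw,target=/my/nfs/directory/testdisk.qcow2' ]
```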

-- 
Alex Bligh


[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.0-34-virtual (buildd@allspice) (gcc 
version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #53-Ubuntu SMP Thu Nov 15 
11:08:40 UTC 2012 (Ubuntu 3.2.0-34.53-virtual 3.2.33)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-34-virtual 
root=LABEL=cloudimg-rootfs ro init=/usr/lib/cloud-init/uncloud-init 
ds=nocloud ubuntu-pass=ubuntu
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
[    0.000000]  BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bf7ff000 (usable)
[    0.000000]  BIOS-e820: 00000000bf7ff000 - 00000000bf800000 (reserved)
[    0.000000]  BIOS-e820: 00000000fc000000 - 0000000100000000 (reserved)
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.4 present.
[    0.000000] DMI: Xen HVM domU, BIOS 4.2.0 01/16/2013
[    0.000000] Hypervisor detected: Xen HVM
[    0.000000] Xen version 4.2.
[    0.000000] Xen Platform PCI: I/O protocol version 1
[    0.000000] Netfront and the Xen platform PCI driver have been compiled 
for this kernel: unplug emulated NICs.
[    0.000000] Blkfront and the Xen platform PCI driver have been compiled 
for this kernel: unplug emulated disks.
[    0.000000] You might have to change the root device
[    0.000000] from /dev/hd[a-d] to /dev/xvd[a-d]
[    0.000000] in your root= kernel command line option
[    0.000000] HVMOP_pagetable_dying not supported
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 
(usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 
(usable)
[    0.000000] No AGP bridge found
[    0.000000] last_pfn = 0xbf7ff max_arch_pfn = 0x400000000
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF write-combining
[    0.000000]   C0000-FFFFF write-back
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 base 0000F0000000 mask FFFFF8000000 uncachable
[    0.000000]   1 base 0000F8000000 mask FFFFFC000000 uncachable
[    0.000000]   2 disabled
[    0.000000]   3 disabled
[    0.000000]   4 disabled
[    0.000000]   5 disabled
[    0.000000]   6 disabled
[    0.000000]   7 disabled
[    0.000000] TOM2: 0000000420000000 aka 16896M
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 
0x7010600070106
[    0.000000] found SMP MP-table at [ffff8800000fda50] fda50
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] Base memory trampoline at [ffff88000009a000] 9a000 size 20480
[    0.000000] init_memory_mapping: 0000000000000000-00000000bf7ff000
[    0.000000]  0000000000 - 00bf600000 page 2M
[    0.000000]  00bf600000 - 00bf7ff000 page 4k
[    0.000000] kernel direct mapping tables up to bf7ff000 @ 
1fffb000-20000000
[    0.000000] RAMDISK: 3776c000 - 37bae000
[    0.000000] ACPI: RSDP 00000000000fd9a0 00024 (v02    Xen)
[    0.000000] ACPI: XSDT 00000000fc009f40 00054 (v01    Xen      HVM 
00000000 HVML 00000000)
[    0.000000] ACPI: FACP 00000000fc009870 000F4 (v04    Xen      HVM 
00000000 HVML 00000000)
[    0.000000] ACPI: DSDT 00000000fc001290 08555 (v02    Xen      HVM 
00000000 INTL 20090521)
[    0.000000] ACPI: FACS 00000000fc001250 00040
[    0.000000] ACPI: APIC 00000000fc009970 00460 (v02    Xen      HVM 
00000000 HVML 00000000)
[    0.000000] ACPI: HPET 00000000fc009e50 00038 (v01    Xen      HVM 
00000000 HVML 00000000)
[    0.000000] ACPI: WAET 00000000fc009e90 00028 (v01    Xen      HVM 
00000000 HVML 00000000)
[    0.000000] ACPI: SSDT 00000000fc009ec0 00031 (v02    Xen      HVM 
00000000 INTL 20090521)
[    0.000000] ACPI: SSDT 00000000fc009f00 00031 (v02    Xen      HVM 
00000000 INTL 20090521)
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at 0000000000000000-00000000bf7ff000
[    0.000000] Initmem setup node 0 0000000000000000-00000000bf7ff000
[    0.000000]   NODE_DATA [00000000bf7fa000 - 00000000bf7fefff]
[    0.000000]  [ffffea0000000000-ffffea0002ffffff] PMD -> 
[ffff8800bbe00000-ffff8800bedfffff] on node 0
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000010 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   empty
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[2] active PFN ranges
[    0.000000]     0: 0x00000010 -> 0x0000009f
[    0.000000]     0: 0x00000100 -> 0x000bf7ff
[    0.000000] On node 0 totalpages: 784270
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 5 pages reserved
[    0.000000]   DMA zone: 3914 pages, LIFO batch:0
[    0.000000]   DMA32 zone: 12192 pages used for memmap
[    0.000000]   DMA32 zone: 768095 pages, LIFO batch:31
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: Local APIC address 0xfee00000
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x08] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x0a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x0e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x10] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x12] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x14] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x16] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x18] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x1a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x1c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x1e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x20] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x22] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x24] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x26] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x28] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x2a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x2c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x2e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x30] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x32] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x34] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x36] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x38] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x3a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x3c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x3e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x20] lapic_id[0x40] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x21] lapic_id[0x42] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x22] lapic_id[0x44] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x23] lapic_id[0x46] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x24] lapic_id[0x48] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x25] lapic_id[0x4a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x26] lapic_id[0x4c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x27] lapic_id[0x4e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x28] lapic_id[0x50] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x29] lapic_id[0x52] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2a] lapic_id[0x54] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2b] lapic_id[0x56] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2c] lapic_id[0x58] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2d] lapic_id[0x5a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2e] lapic_id[0x5c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x2f] lapic_id[0x5e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x30] lapic_id[0x60] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x31] lapic_id[0x62] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x32] lapic_id[0x64] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x33] lapic_id[0x66] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x34] lapic_id[0x68] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x35] lapic_id[0x6a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x36] lapic_id[0x6c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x37] lapic_id[0x6e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x38] lapic_id[0x70] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x39] lapic_id[0x72] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3a] lapic_id[0x74] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3b] lapic_id[0x76] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3c] lapic_id[0x78] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3d] lapic_id[0x7a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3e] lapic_id[0x7c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x3f] lapic_id[0x7e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x40] lapic_id[0x80] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x41] lapic_id[0x82] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x42] lapic_id[0x84] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x43] lapic_id[0x86] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x44] lapic_id[0x88] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x45] lapic_id[0x8a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x46] lapic_id[0x8c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x47] lapic_id[0x8e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x48] lapic_id[0x90] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x49] lapic_id[0x92] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4a] lapic_id[0x94] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4b] lapic_id[0x96] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4c] lapic_id[0x98] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4d] lapic_id[0x9a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4e] lapic_id[0x9c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x4f] lapic_id[0x9e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x50] lapic_id[0xa0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x51] lapic_id[0xa2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x52] lapic_id[0xa4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x53] lapic_id[0xa6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x54] lapic_id[0xa8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x55] lapic_id[0xaa] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x56] lapic_id[0xac] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x57] lapic_id[0xae] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x58] lapic_id[0xb0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x59] lapic_id[0xb2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5a] lapic_id[0xb4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5b] lapic_id[0xb6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5c] lapic_id[0xb8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5d] lapic_id[0xba] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5e] lapic_id[0xbc] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x5f] lapic_id[0xbe] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x60] lapic_id[0xc0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x61] lapic_id[0xc2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x62] lapic_id[0xc4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x63] lapic_id[0xc6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x64] lapic_id[0xc8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x65] lapic_id[0xca] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x66] lapic_id[0xcc] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x67] lapic_id[0xce] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x68] lapic_id[0xd0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x69] lapic_id[0xd2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6a] lapic_id[0xd4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6b] lapic_id[0xd6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6c] lapic_id[0xd8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6d] lapic_id[0xda] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6e] lapic_id[0xdc] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x6f] lapic_id[0xde] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x70] lapic_id[0xe0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x71] lapic_id[0xe2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x72] lapic_id[0xe4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x73] lapic_id[0xe6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x74] lapic_id[0xe8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x75] lapic_id[0xea] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x76] lapic_id[0xec] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x77] lapic_id[0xee] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x78] lapic_id[0xf0] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x79] lapic_id[0xf2] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7a] lapic_id[0xf4] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7b] lapic_id[0xf6] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7c] lapic_id[0xf8] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7d] lapic_id[0xfa] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7e] lapic_id[0xfc] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x7f] lapic_id[0xfe] disabled)
[    0.000000] ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 
0-47
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
[    0.000000] ACPI: IRQ0 used by override.
[    0.000000] ACPI: IRQ2 used by override.
[    0.000000] ACPI: IRQ5 used by override.
[    0.000000] ACPI: IRQ9 used by override.
[    0.000000] ACPI: IRQ10 used by override.
[    0.000000] ACPI: IRQ11 used by override.
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.000000] 128 Processors exceeds NR_CPUS limit of 64
[    0.000000] SMP: Allowing 64 CPUs, 62 hotplug CPUs
[    0.000000] nr_irqs_gsi: 64
[    0.000000] PM: Registered nosave memory: 000000000009f000 - 
00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 
00000000000f0000
[    0.000000] PM: Registered nosave memory: 00000000000f0000 - 
0000000000100000
[    0.000000] Allocating PCI resources starting at bf800000 (gap: 
bf800000:3c800000)
[    0.000000] Booting paravirtualized kernel on Xen HVM
[    0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:64 
nr_node_ids:1
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff8800bb600000 s82880 r8192 
d23616 u131072
[    0.000000] pcpu-alloc: s82880 r8192 d23616 u131072 alloc=1*2097152
[    0.000000] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 
15
[    0.000000] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
31
[    0.000000] pcpu-alloc: [0] 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 
47
[    0.000000] pcpu-alloc: [0] 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 
63
[    0.000000] Built 1 zonelists in Node order, mobility grouping on. 
Total pages: 772009
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: 
BOOT_IMAGE=/boot/vmlinuz-3.2.0-34-virtual root=LABEL=cloudimg-rootfs ro 
init=/usr/lib/cloud-init/uncloud-init ds=nocloud ubuntu-pass=ubuntu
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Calgary: detecting Calgary via BIOS EBDA area
[    0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[    0.000000] Memory: 3060592k/3137532k available (6536k kernel code, 452k 
absent, 76488k reserved, 6656k data, 924k init)
[    0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, 
CPUs=64, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000] 	RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000] NR_IRQS:4352 nr_irqs:1600 16
[    0.000000] Xen HVM callback vector for event delivery is enabled
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty0] enabled
[    0.000000] allocated 25165824 bytes of page_cgroup
[    0.000000] please try 'cgroup_disable=memory' option if you don't want 
memory cgroups
[    0.000000] hpet clockevent registered
[    0.000000] Detected 1497.551 MHz processor.
[    0.000000] Marking TSC unstable due to TSCs unsynchronized
[    0.008000] Calibrating delay loop (skipped), value calculated using 
timer frequency.. 2995.10 BogoMIPS (lpj=5990204)
[    0.009487] pid_max: default: 65536 minimum: 512
[    0.012183] Security Framework initialized
[    0.013893] AppArmor: AppArmor initialized
[    0.015597] Yama: becoming mindful.
[    0.016554] Dentry cache hash table entries: 524288 (order: 10, 4194304 
bytes)
[    0.021896] Inode-cache hash table entries: 262144 (order: 9, 2097152 
bytes)
[    0.024830] Mount-cache hash table entries: 256
[    0.028114] Initializing cgroup subsys cpuacct
[    0.029921] Initializing cgroup subsys memory
[    0.031416] Initializing cgroup subsys devices
[    0.032017] Initializing cgroup subsys freezer
[    0.033481] Initializing cgroup subsys blkio
[    0.036034] Initializing cgroup subsys perf_event
[    0.037859] tseg: 0000000000
[    0.037927] CPU: Physical Processor ID: 0
[    0.039554] CPU: Processor Core ID: 0
[    0.040022] mce: CPU supports 2 MCE banks
[    0.050408] ACPI: Core revision 20110623
[    0.058688] ftrace: allocating 27024 entries in 106 pages
[    0.104013] x2apic not enabled, IRQ remapping init failed
[    0.105802] Switched APIC routing to physical flat.
[    0.114752] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=0 pin2=0
[    0.156365] CPU0: AMD Turion(tm) II Neo N40L Dual-Core Processor 
stepping 03
[    0.159690] Xen: using vcpuop timer interface
[    0.159707] installing Xen timer for CPU 0
[    0.160239] cpu 0 spinlock event irq 69
[    0.161861] Performance Events: Broken PMU hardware detected, using 
software events only.
[    0.164353] NMI watchdog disabled (cpu0): hardware events not enabled
[    0.166297] register_vcpu_info failed: err=-22
[    0.166338] cpu 1 spinlock event irq 70
[    0.168059] Booting Node   0, Processors  #1
[    0.169697] smpboot cpu 1: start_ip = 9a000
[    0.268811] NMI watchdog disabled (cpu1): hardware events not enabled
[    0.268802] installing Xen timer for CPU 1
[    0.272100] Brought up 2 CPUs
[    0.273758] Total of 2 processors activated (6019.99 BogoMIPS).
[    0.276853] devtmpfs: initialized
[    0.281464] EVM: security.selinux
[    0.283084] EVM: security.SMACK64
[    0.284037] EVM: security.capability
[    0.285620] print_constraints: dummy:
[    0.288105] RTC time: 18:01:28, date: 01/16/13
[    0.289882] NET: Registered protocol family 16
[    0.292145] Trying to unpack rootfs image as initramfs...
[    0.303111] Extended Config Space enabled on 0 nodes
[    0.308102] ACPI: bus type pci registered
[    0.308394] PCI: Using configuration type 1 for base access
[    0.311298] PCI: Using configuration type 1 for extended access
[    0.312018] bio: create slab <bio-0> at 0
[    0.312018] ACPI: Added _OSI(Module Device)
[    0.312018] ACPI: Added _OSI(Processor Device)
[    0.312018] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.312018] ACPI: Added _OSI(Processor Aggregator Device)
[    0.322015] ACPI: EC: Look up EC in DSDT
[    0.329596] ACPI: Interpreter enabled
[    0.332033] ACPI: (supports S0 S3 S4 S5)
[    0.339941] ACPI: Using IOAPIC for interrupt routing
[    0.371254] ACPI: No dock devices found.
[    0.372033] HEST: Table not found.
[    0.374674] PCI: Using host bridge windows from ACPI; if necessary, use 
"pci=nocrs" and report a bug
[    0.384175] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.387625] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
[    0.387625] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
[    0.387625] pci_root PNP0A03:00: host bridge window [mem 
0x000a0000-0x000bffff]
[    0.388023] pci_root PNP0A03:00: host bridge window [mem 
0xf0000000-0xfbffffff]
[    0.388888] pci 0000:00:00.0: [8086:1237] type 0 class 0x000600
[    0.392023] pci 0000:00:01.0: [8086:7000] type 0 class 0x000601
[    0.411354] pci 0000:00:01.1: [8086:7010] type 0 class 0x000101
[    0.412025] pci 0000:00:01.1: reg 20: [io  0xc200-0xc20f]
[    0.414808] pci 0000:00:01.3: [8086:7113] type 0 class 0x000680
[    0.423955] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by 
PIIX4 ACPI
[    0.428213] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by 
PIIX4 SMB
[    0.434411] pci 0000:00:02.0: [1013:00b8] type 0 class 0x000300
[    0.456045] pci 0000:00:02.0: reg 10: [mem 0xf0000000-0xf1ffffff pref]
[    0.464048] pci 0000:00:02.0: reg 14: [mem 0xf3020000-0xf3020fff]
[    0.465446] Freeing initrd memory: 4360k freed
[    0.524058] pci 0000:00:02.0: reg 30: [mem 0xf3000000-0xf300ffff pref]
[    0.525298] pci 0000:00:03.0: [5853:0001] type 0 class 0x00ff80
[    0.526327] pci 0000:00:03.0: reg 10: [io  0xc000-0xc0ff]
[    0.527202] pci 0000:00:03.0: reg 14: [mem 0xf2000000-0xf2ffffff pref]
[    0.532915] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    0.533532]  pci0000:00: Requesting ACPI _OSC control (0x1d)
[    0.535322]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), 
returned control mask: 0x1d
[    0.540055] ACPI _OSC control for PCIe not granted, disabling ASPM
[    0.549774] ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11)
[    0.554373] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.558976] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.563526] ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11)
[    0.567879] xen/balloon: Initialising balloon driver.
[    0.572100] xen-balloon: Initialising balloon driver.
[    0.574029] vgaarb: device added: 
PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    0.576058] vgaarb: loaded
[    0.577457] vgaarb: bridge control possible 0000:00:02.0
[    0.579181] i2c-core: driver [aat2870] using legacy suspend method
[    0.580055] i2c-core: driver [aat2870] using legacy resume method
[    0.581809] SCSI subsystem initialized
[    0.585115] libata version 3.00 loaded.
[    0.585115] usbcore: registered new interface driver usbfs
[    0.585961] usbcore: registered new interface driver hub
[    0.588186] usbcore: registered new device driver usb
[    0.589988] PCI: Using ACPI for IRQ routing
[    0.592059] PCI: pci_cache_line_size set to 64 bytes
[    0.592697] reserve RAM buffer: 000000000009f400 - 000000000009ffff
[    0.592701] reserve RAM buffer: 00000000bf7ff000 - 00000000bfffffff
[    0.592934] NetLabel: Initializing
[    0.594574] NetLabel:  domain hash size = 128
[    0.596054] NetLabel:  protocols = UNLABELED CIPSOv4
[    0.597829] NetLabel:  unlabeled traffic allowed by default
[    0.599906] HPET: 3 timers in total, 0 timers will be used for per-cpu 
timer
[    0.600102] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    0.606964] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
[    0.620129] Switching to clocksource xen
[    0.629374] AppArmor: AppArmor Filesystem Enabled
[    0.631648] pnp: PnP ACPI init
[    0.633308] ACPI: bus type pnp registered
[    0.635253] pnp 00:00: [mem 0x00000000-0x0009ffff]
[    0.635331] system 00:00: [mem 0x00000000-0x0009ffff] could not be 
reserved
[    0.643138] system 00:00: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.643266] pnp 00:01: [bus 00-ff]
[    0.643271] pnp 00:01: [io  0x0cf8-0x0cff]
[    0.643275] pnp 00:01: [io  0x0000-0x0cf7 window]
[    0.643292] pnp 00:01: [io  0x0d00-0xffff window]
[    0.643295] pnp 00:01: [mem 0x000a0000-0x000bffff window]
[    0.643299] pnp 00:01: [mem 0xf0000000-0xfbffffff window]
[    0.643394] pnp 00:01: Plug and Play ACPI device, IDs PNP0a03 (active)
[    0.643426] pnp 00:02: [mem 0xfed00000-0xfed003ff]
[    0.643457] pnp 00:02: Plug and Play ACPI device, IDs PNP0103 (active)
[    0.643483] pnp 00:03: [io  0x0010-0x001f]
[    0.643486] pnp 00:03: [io  0x0022-0x002d]
[    0.643489] pnp 00:03: [io  0x0030-0x003f]
[    0.643492] pnp 00:03: [io  0x0044-0x005f]
[    0.643494] pnp 00:03: [io  0x0062-0x0063]
[    0.643497] pnp 00:03: [io  0x0065-0x006f]
[    0.643499] pnp 00:03: [io  0x0072-0x007f]
[    0.643502] pnp 00:03: [io  0x0080]
[    0.643504] pnp 00:03: [io  0x0084-0x0086]
[    0.643507] pnp 00:03: [io  0x0088]
[    0.643509] pnp 00:03: [io  0x008c-0x008e]
[    0.643512] pnp 00:03: [io  0x0090-0x009f]
[    0.643514] pnp 00:03: [io  0x00a2-0x00bd]
[    0.643517] pnp 00:03: [io  0x00e0-0x00ef]
[    0.643519] pnp 00:03: [io  0x08a0-0x08a3]
[    0.643522] pnp 00:03: [io  0x0cc0-0x0ccf]
[    0.643524] pnp 00:03: [io  0x04d0-0x04d1]
[    0.643580] system 00:03: [io  0x08a0-0x08a3] has been reserved
[    0.646617] system 00:03: [io  0x0cc0-0x0ccf] has been reserved
[    0.649506] system 00:03: [io  0x04d0-0x04d1] has been reserved
[    0.652378] system 00:03: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.652396] pnp 00:04: [dma 4]
[    0.652399] pnp 00:04: [io  0x0000-0x000f]
[    0.652402] pnp 00:04: [io  0x0081-0x0083]
[    0.652405] pnp 00:04: [io  0x0087]
[    0.652407] pnp 00:04: [io  0x0089-0x008b]
[    0.652410] pnp 00:04: [io  0x008f]
[    0.652412] pnp 00:04: [io  0x00c0-0x00df]
[    0.652415] pnp 00:04: [io  0x0480-0x048f]
[    0.652452] pnp 00:04: Plug and Play ACPI device, IDs PNP0200 (active)
[    0.652469] pnp 00:05: [io  0x0070-0x0071]
[    0.652496] xen: --> pirq=16 -> irq=8 (gsi=8)
[    0.652500] pnp 00:05: [irq 8]
[    0.652534] pnp 00:05: Plug and Play ACPI device, IDs PNP0b00 (active)
[    0.652545] pnp 00:06: [io  0x0061]
[    0.652573] pnp 00:06: Plug and Play ACPI device, IDs PNP0800 (active)
[    0.652604] xen: --> pirq=17 -> irq=12 (gsi=12)
[    0.652607] pnp 00:07: [irq 12]
[    0.652639] pnp 00:07: Plug and Play ACPI device, IDs PNP0f13 (active)
[    0.652658] pnp 00:08: [io  0x0060]
[    0.652661] pnp 00:08: [io  0x0064]
[    0.652674] xen: --> pirq=18 -> irq=1 (gsi=1)
[    0.652677] pnp 00:08: [irq 1]
[    0.652707] pnp 00:08: Plug and Play ACPI device, IDs PNP0303 PNP030b 
(active)
[    0.652729] pnp 00:09: [io  0x03f0-0x03f5]
[    0.652732] pnp 00:09: [io  0x03f7]
[    0.652745] xen: --> pirq=19 -> irq=6 (gsi=6)
[    0.652748] pnp 00:09: [irq 6]
[    0.652751] pnp 00:09: [dma 2]
[    0.652781] pnp 00:09: Plug and Play ACPI device, IDs PNP0700 (active)
[    0.652808] pnp 00:0a: [io  0x03f8-0x03ff]
[    0.652820] xen: --> pirq=20 -> irq=4 (gsi=4)
[    0.652823] pnp 00:0a: [irq 4]
[    0.652857] pnp 00:0a: Plug and Play ACPI device, IDs PNP0501 (active)
[    0.652896] pnp 00:0b: [io  0x0378-0x037f]
[    0.652909] xen: --> pirq=21 -> irq=7 (gsi=7)
[    0.652912] pnp 00:0b: [irq 7]
[    0.652943] pnp 00:0b: Plug and Play ACPI device, IDs PNP0400 (active)
[    0.652955] pnp 00:0c: [io  0xae00-0xae0f]
[    0.652958] pnp 00:0c: [io  0xb044-0xb047]
[    0.652998] system 00:0c: [io  0xae00-0xae0f] has been reserved
[    0.655892] system 00:0c: [io  0xb044-0xb047] has been reserved
[    0.658765] system 00:0c: Plug and Play ACPI device, IDs PNP0c02 (active)
[    0.659673] pnp: PnP ACPI: found 13 devices
[    0.662368] ACPI: ACPI bus type pnp unregistered
[    0.677442] PCI: max bus depth: 0 pci_try_num: 1
[    0.677455] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7]
[    0.677459] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff]
[    0.677462] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff]
[    0.677466] pci_bus 0000:00: resource 7 [mem 0xf0000000-0xfbffffff]
[    0.677586] NET: Registered protocol family 2
[    0.681082] IP route cache hash table entries: 131072 (order: 8, 1048576 
bytes)
[    0.686297] TCP established hash table entries: 524288 (order: 11, 
8388608 bytes)
[    0.693976] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    0.697252] TCP: Hash tables configured (established 524288 bind 65536)
[    0.700134] TCP reno registered
[    0.701776] UDP hash table entries: 2048 (order: 4, 65536 bytes)
[    0.703433] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes)
[    0.705532] NET: Registered protocol family 1
[    0.713624] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.716822] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.719408] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.722607] pci 0000:00:02.0: Boot video device
[    0.722733] PCI: CLS 0 bytes, default 64
[    0.724140] audit: initializing netlink socket (disabled)
[    0.726301] type=2000 audit(1358359289.311:1): initialized
[    0.761261] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.767573] VFS: Disk quotas dquot_6.5.2
[    0.770009] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.773907] fuse init (API version 7.17)
[    0.775969] msgmni has been set to 5986
[    0.779199] Block layer SCSI generic (bsg) driver version 0.4 loaded 
(major 253)
[    0.788481] io scheduler noop registered
[    0.791236] io scheduler deadline registered (default)
[    0.793644] io scheduler cfq registered
[    0.795576] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[    0.797583] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[    0.799720] input: Power Button as 
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    0.802404] ACPI: Power Button [PWRF]
[    0.805016] input: Sleep Button as 
/devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
[    0.808407] ACPI: Sleep Button [SLPF]
[    0.813144] ERST: Table is not found!
[    0.822158] GHES: HEST is not enabled!
[    0.825276] xen: --> pirq=22 -> irq=28 (gsi=28)
[    0.825283] xen-platform-pci 0000:00:03.0: PCI INT A -> GSI 28 (level, 
low) -> IRQ 28
[    0.828179] Grant table initialized
[    0.833456] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    0.835538] init_memory_mapping: 00000000c0000000-00000000c8000000
[    0.837891]  00c0000000 - 00c8000000 page 2M
[    0.909796] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.039308] 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.042107] Linux agpgart interface v0.103
[    1.050200] brd: module loaded
[    1.055316] loop: module loaded
[    1.060727] blkfront device/vbd/51712 num-ring-pages 1 nr_ents 32.
[    1.071667] ata_piix 0000:00:01.1: version 2.13
[    1.073260] blkfront: xvda: barrier: enabled
[    1.074276] ata_piix 0000:00:01.1: setting latency timer to 64
[    1.077274] scsi0 : ata_piix
[    1.080346] scsi1 : ata_piix
[    1.084974] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc200 irq 14
[    1.086073]  xvda: xvda1
[    1.091702] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc208 irq 15
[    1.094545] Fixed MDIO Bus: probed
[    1.096585] tun: Universal TUN/TAP device driver, 1.6
[    1.099142] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    1.101451] PPP generic driver version 2.4.2
[    1.103531] Initialising Xen virtual ethernet driver.
[    1.118234] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.126459] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    1.131351] uhci_hcd: USB Universal Host Controller Interface driver
[    1.135316] usbcore: registered new interface driver libusual
[    1.138271] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 
0x60,0x64 irq 1,12
[    1.161445] serio: i8042 KBD port at 0x60,0x64 irq 1
[    1.164757] serio: i8042 AUX port at 0x60,0x64 irq 12
[    1.167145] mousedev: PS/2 mouse device common for all mice
[    1.173078] input: AT Translated Set 2 keyboard as 
/devices/platform/i8042/serio0/input/input2
[    1.178298] rtc_cmos 00:05: rtc core: registered rtc_cmos as rtc0
[    1.181549] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[    1.184768] device-mapper: uevent: version 1.0.3
[    1.203004] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: 
dm-devel@redhat.com
[    1.205907] cpuidle: using governor ladder
[    1.207658] cpuidle: using governor menu
[    1.209301] EFI Variables Facility v0.08 2004-May-17
[    1.211211] TCP cubic registered
[    1.212991] NET: Registered protocol family 10
[    1.215865] NET: Registered protocol family 17
[    1.217818] Registering the dns_resolver key type
[    1.220097] PM: Hibernation image not present or could not be loaded.
[    1.220138] registered taskstats version 1
[    1.240638] XENBUS: Device with no driver: device/vkbd/-1
[    1.242308]   Magic number: 13:42:39
[    1.243906] bdi 7:4: hash matches
[    1.245528] pnp 00:0b: hash matches
[    1.249161] rtc_cmos 00:05: setting system clock to 2013-01-16 18:01:29 
UTC (1358359289)
[    1.252045] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    1.254152] EDD information not available.
[    1.280696] ata2.01: NODEV after polling detection
[    1.291783] ata2.00: ATAPI: QEMU DVD-ROM, 1.0.1, max UDMA/100
[    1.296391] ata2.00: configured for MWDMA2
[    1.325039] scsi 1:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM 
1.0. PQ: 0 ANSI: 5
[    1.363462] sr0: scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[    1.365184] cdrom: Uniform CD-ROM driver Revision: 3.20
[    1.367022] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    1.367615] sr 1:0:0:0: Attached scsi generic sg0 type 5
[    1.375267] Freeing unused kernel memory: 924k freed
[    1.377569] Write protecting the kernel read-only data: 12288k
[    1.397057] Freeing unused kernel memory: 1636k freed
[    1.406034] Freeing unused kernel memory: 1200k freed
[    1.465322] udevd[91]: starting version 175
[    2.570567] FDC 0 is a S82078B
[    2.774600] EXT4-fs (xvda1): mounted filesystem with ordered data mode. 
Opts: (null)
[    3.236118] EXT4-fs (xvda1): re-mounted. Opts: (null)
[    4.022669] EXT4-fs (xvda1): re-mounted. Opts: (null)
[    4.708987] EXT4-fs (xvda1): re-mounted. Opts: (null)
[    4.738013] ADDRCONF(NETDEV_UP): eth0: link is not ready
[    4.784132] udevd[335]: starting version 175
[    5.067714] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    5.067874] acpiphp: Slot [1] registered
[    5.067901] acpiphp: Slot [2] registered
[    5.067926] acpiphp: Slot [3] registered
[    5.067956] acpiphp: Slot [4] registered
[    5.067982] acpiphp: Slot [5] registered
[    5.068011] acpiphp: Slot [6] registered
[    5.068036] acpiphp: Slot [7] registered
[    5.068136] acpiphp: Slot [8] registered
[    5.068162] acpiphp: Slot [9] registered
[    5.068193] acpiphp: Slot [10] registered
[    5.068218] acpiphp: Slot [11] registered
[    5.068250] acpiphp: Slot [12] registered
[    5.068276] acpiphp: Slot [13] registered
[    5.068312] acpiphp: Slot [14] registered
[    5.068337] acpiphp: Slot [15] registered
[    5.068366] acpiphp: Slot [16] registered
[    5.068391] acpiphp: Slot [17] registered
[    5.068418] acpiphp: Slot [18] registered
[    5.068441] acpiphp: Slot [19] registered
[    5.068467] acpiphp: Slot [20] registered
[    5.068490] acpiphp: Slot [21] registered
[    5.068515] acpiphp: Slot [22] registered
[    5.068539] acpiphp: Slot [23] registered
[    5.068572] acpiphp: Slot [24] registered
[    5.068597] acpiphp: Slot [25] registered
[    5.068622] acpiphp: Slot [26] registered
[    5.068645] acpiphp: Slot [27] registered
[    5.068667] acpiphp: Slot [28] registered
[    5.068690] acpiphp: Slot [29] registered
[    5.068713] acpiphp: Slot [30] registered
[    5.068736] acpiphp: Slot [31] registered
[    5.637744] parport_pc 00:0b: reported by Plug and Play ACPI
[    5.645608] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[    5.774426] ppdev: user-space parallel port driver
[    5.813435] input: Xen Virtual Keyboard as /devices/virtual/input/input3
[    5.844228] input: Xen Virtual Pointer as /devices/virtual/input/input4
[    6.194152] input: ImExPS/2 Generic Explorer Mouse as 
/devices/platform/i8042/serio1/input/input5
[    7.679641] type=1400 audit(1358359295.925:2): apparmor="STATUS" 
operation="profile_load" name="/sbin/dhclient" pid=466 
comm="apparmor_parser"
[    7.679813] type=1400 audit(1358359295.925:3): apparmor="STATUS" 
operation="profile_load" 
name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=466 
comm="apparmor_parser"
[    7.679910] type=1400 audit(1358359295.925:4): apparmor="STATUS" 
operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" 
pid=466 comm="apparmor_parser"
[    7.871730] type=1400 audit(1358359296.117:5): apparmor="STATUS" 
operation="profile_replace" name="/sbin/dhclient" pid=465 
comm="apparmor_parser"
[    7.871906] type=1400 audit(1358359296.117:6): apparmor="STATUS" 
operation="profile_replace" 
name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=465 
comm="apparmor_parser"
[    7.872007] type=1400 audit(1358359296.117:7): apparmor="STATUS" 
operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" 
pid=465 comm="apparmor_parser"
[    9.205808] eth0: IPv6 duplicate address fe80::216:3eff:fee9:96a8 
detected!
[   10.788193] init: udev-fallback-graphics main process (689) terminated 
with status 1
[   11.239468] init: plymouth main process (261) killed by ABRT signal
[   11.240422] init: plymouth-splash main process (693) terminated with 
status 2
[   13.668274] init: plymouth-log main process (735) terminated with status 
1
[   13.717078] init: failsafe main process (736) killed by TERM signal
[   14.162243] type=1400 audit(1358359302.409:8): apparmor="STATUS" 
operation="profile_replace" name="/sbin/dhclient" pid=798 
comm="apparmor_parser"
[   14.163044] type=1400 audit(1358359302.409:9): apparmor="STATUS" 
operation="profile_replace" 
name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=798 
comm="apparmor_parser"
[   14.163698] type=1400 audit(1358359302.409:10): apparmor="STATUS" 
operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" 
pid=798 comm="apparmor_parser"
[   14.190440] init: plymouth-upstart-bridge main process (805) terminated 
with status 1
[   14.848278] type=1400 audit(1358359303.097:11): apparmor="STATUS" 
operation="profile_load" name="/usr/sbin/tcpdump" pid=806 
comm="apparmor_parser"
[   15.728854] init: plymouth-stop pre-start process (888) terminated with 
status 1

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 17:39               ` Stefano Stabellini
@ 2013-01-16 18:14                 ` Alex Bligh
  2013-01-16 18:49                   ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-16 18:14 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, Ian Campbell, Jan Beulich, Xen Devel

Stefano,

--On 16 January 2013 17:39:59 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> Let me elaborate on this: the guest is a PV on HVM guest, so at the very
> least the bootloader is going to use the emulated IDE interface to grab
> Xen and the kernel.
> After that the kernel should use the PV disk interface straight away (I
> actually downloaded the image and tried it myself: it is using the PV
> disk interface indeed).

Indeed. Thanks for taking the time to download the image and try it.
Did you see the bug? (using the snapshot and the backing file)

> The bug should occur after the guest has switched over to the PV disk
> interface (state = 4 on xenstore for the vbd device). It can't be the
> IDE emulator triggering the issue.

I would agree. It appears to be triggered at exactly the point the
guest image rewrites the partition table then tries to resize the
filing system. If you fail to change the grub command line, then it
hangs about waiting to retrieve cloud metadata from 169.254.169.254
and nothing crashes until this times out.

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 18:14                 ` Alex Bligh
@ 2013-01-16 18:49                   ` Stefano Stabellini
  2013-01-16 19:00                     ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 18:49 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Ian Campbell, Jan Beulich,
	Stefano Stabellini

On Wed, 16 Jan 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 16 January 2013 17:39:59 +0000 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> > Let me elaborate on this: the guest is a PV on HVM guest, so at the very
> > least the bootloader is going to use the emulated IDE interface to grab
> > Xen and the kernel.
> > After that the kernel should use the PV disk interface straight away (I
> > actually downloaded the image and tried it myself: it is using the PV
> > disk interface indeed).
> 
> Indeed. Thanks for taking the time to download the image and try it.
> Did you see the bug? (using the snapshot and the backing file)

I don't, but I don't have an NFS setup at the moment :)


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 18:49                   ` Stefano Stabellini
@ 2013-01-16 19:00                     ` Stefano Stabellini
  2013-01-17  7:58                       ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-16 19:00 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Jan Beulich, Alex Bligh, Ian Campbell

On Wed, 16 Jan 2013, Stefano Stabellini wrote:
> On Wed, 16 Jan 2013, Alex Bligh wrote:
> > Stefano,
> > 
> > --On 16 January 2013 17:39:59 +0000 Stefano Stabellini 
> > <stefano.stabellini@eu.citrix.com> wrote:
> > 
> > > Let me elaborate on this: the guest is a PV on HVM guest, so at the very
> > > least the bootloader is going to use the emulated IDE interface to grab
> > > Xen and the kernel.
> > > After that the kernel should use the PV disk interface straight away (I
> > > actually downloaded the image and tried it myself: it is using the PV
> > > disk interface indeed).
> > 
> > Indeed. Thanks for taking the time to download the image and try it.
> > Did you see the bug? (using the snapshot and the backing file)
> 
> I don't, but I don't have an NFS setup at the moment :)
> 

BTW what version of NFS are you using?
Do you have a chance to try NFSv4? According to the original thread, it
doesn't have that problem.


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 19:00                     ` Stefano Stabellini
@ 2013-01-17  7:58                       ` Alex Bligh
  0 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-17  7:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, Ian Campbell, Jan Beulich, Xen Devel

Stefano,

--On 16 January 2013 19:00:08 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> BTW what version of NFS are you using?
> Do you have a chance to try NFSv4? According to the original thread, it
> doesn't have that problem.

We are using NFSv4.

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-16 17:33             ` Stefano Stabellini
  2013-01-16 17:39               ` Stefano Stabellini
  2013-01-16 18:12               ` Alex Bligh
@ 2013-01-21 15:15               ` Alex Bligh
  2013-01-21 15:23                 ` Ian Campbell
  2 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 15:15 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Xen Devel, Jan Beulich, Alex Bligh

Stefano,

--On 16 January 2013 17:33:59 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> Almost. I am saying that the kernel completed the AIO write and notified
> QEMU after it received an ACK from the other end, but before the
> tcp_retransmit was supposed to run.  I admit I am not that familiar with
> the network stack so this is just a supposition.

OK, let's presume you are right here.

The page is still referenced by the networking stack at this point
because it's in some tcp transmit buffer (the original thread
established that), and that will show up in a reference count.

Surely before Xen removes the grant on the page, unmapping it from dom0's
memory, it should check to see if there are any existing references
to the page and if there are, give the kernel its own COW copy, rather
than unmap it totally which is going to lead to problems.
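In rough pseudocode (hypothetical names only, nothing like the real grant-table API), the check I have in mind is something like:

```python
# Hypothetical sketch of the "copy before final unmap" idea. None of
# these names exist in the real Xen grant-table code; this only
# illustrates leaving dom0 a private copy of a page that still has
# outstanding kernel references (e.g. an skb awaiting retransmit).

class Page:
    def __init__(self, data):
        self.data = data
        self.refcount = 0   # references held by dom0 kernel users

def end_foreign_access(page, dom0_pages):
    """Tear down a grant mapping, keeping a private copy in dom0 if
    the page is still referenced somewhere in the kernel."""
    still_used = page.refcount > 0
    if still_used:
        # Leave dom0 its own copy; whoever holds the reference frees
        # it later, and the domU frame can be unmapped safely.
        dom0_pages.append(Page(page.data[:]))
    # ... the actual unmap of the domU frame would happen here ...
    return still_used

p = Page(bytearray(b"rpc payload"))
p.refcount = 1          # e.g. queued in a TCP retransmit buffer
kept = []
print(end_foreign_access(p, kept))  # True: dom0 keeps a private copy
```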

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 15:15               ` Alex Bligh
@ 2013-01-21 15:23                 ` Ian Campbell
  2013-01-21 15:35                   ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-21 15:23 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 15:15 +0000, Alex Bligh wrote:
> Surely before Xen removes the grant on the page, unmapping it from dom0's
> memory, it should check to see if there are any existing references
> to the page and if there are, give the kernel its own COW copy, rather
> than unmap it totally which is going to lead to problems.

Unfortunately each page only has one reference count, so you cannot
distinguish between references from this particular NFS write from other
references (other writes, the ref held by the process itself, etc).

My old series added a reference count to the SKB itself exactly so that
it would be possible to know when the network stack was truly finished
with the page in the context of a specific operation.

Unfortunately due to lack of time I've not been able to finish those
off.

Ian.


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 15:23                 ` Ian Campbell
@ 2013-01-21 15:35                   ` Alex Bligh
  2013-01-21 15:50                     ` Ian Campbell
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 15:35 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

--On 21 January 2013 15:23:10 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> On Mon, 2013-01-21 at 15:15 +0000, Alex Bligh wrote:
>> Surely before Xen removes the grant on the page, unmapping it from dom0's
>> memory, it should check to see if there are any existing references
>> to the page and if there are, give the kernel its own COW copy, rather
>> than unmap it totally which is going to lead to problems.
>
> Unfortunately each page only has one reference count, so you cannot
> distinguish between references from this particular NFS write from other
> references (other writes, the ref held by the process itself, etc).
>
> My old series added a reference count to the SKB itself exactly so that
> it would be possible to know when the network stack was truly finished
> with the page in the context of a specific operation.
>
> Unfortunately due to lack of time I've not been able to finish those
> off.

Does that apply even when O_DIRECT is not being used (which I don't
think it is by default for upstream qemu & xen, as it's
cache=writeback, and cache=none produces a different failure)?

If so, I think it's the case that *ALL* NFS dom0 access by Xen domU
VMs is unsafe in the event of tcp retransmit (both in the sense that
the grant can be freed up causing a crash, or the domU's data can be
rewritten post write causing corruption). I think that would also
apply to iSCSI over tcp, which would presumably suffer similarly.

Is that analysis correct?

-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 15:35                   ` Alex Bligh
@ 2013-01-21 15:50                     ` Ian Campbell
  2013-01-21 16:33                       ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-21 15:50 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 15:35 +0000, Alex Bligh wrote:
> Ian,
> 
> --On 21 January 2013 15:23:10 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > On Mon, 2013-01-21 at 15:15 +0000, Alex Bligh wrote:
> >> Surely before Xen removes the grant on the page, unmapping it from dom0's
> >> memory, it should check to see if there are any existing references
> >> to the page and if there are, give the kernel its own COW copy, rather
> >> than unmap it totally which is going to lead to problems.
> >
> > Unfortunately each page only has one reference count, so you cannot
> > distinguish between references from this particular NFS write from other
> > references (other writes, the ref held by the process itself, etc).
> >
> > My old series added a reference count to the SKB itself exactly so that
> > it would be possible to know when the network stack was truly finished
> > with the page in the context of a specific operation.
> >
> > Unfortunately due to lack of time I've not been able to finish those
> > off.
> 
> Does that apply even when O_DIRECT is not being used (which I don't
> think it is by default for upstream qemu & xen, as it's
> cache=writeback, and cache=none produces a different failure)?
> 
> If so, I think it's the case that *ALL* NFS dom0 access by Xen domU
> VMs is unsafe in the event of tcp retransmit (both in the sense that
> the grant can be freed up causing a crash, or the domU's data can be
> rewritten post write causing corruption).

Yes. Prior to your report this (assuming it is the same issue) had been
a very difficult to trigger issue -- I was only able to do so with
userspace firewall rules which deliberately delayed TCP acks.

The fact that you can reproduce so easily makes me wonder if this is
really the same issue. To trigger the issue you need this sequence of
events:
      * Send an RPC
      * RPC is encapsulated into a TCP/IP frame (or several) and sent.
      * Wait for an ACK response to the TCP/IP frame
      * Timeout.
      * Queue a retransmit of the TCP/IP frame(s)
      * Receive the ACK to the original.
      * Receive the reply to the RPC as well
      * Report success up the stack
      * Userspace gets success and unmaps the page
      * Retransmit hits the front of the queue
      * BOOM
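
As a toy illustration (no real networking, just the ordering), the race boils down to the unmap landing before the queued retransmit is serviced:

```python
# Toy simulation of the ordering above: the write is reported as
# successful (and the granted page unmapped) while a retransmit of
# the same page is still sitting in the queue.

def run(events):
    mapped = True
    for ev in events:
        if ev == "rpc_success":
            mapped = False      # userspace gets success, unmaps the page
        elif ev == "retransmit":
            if not mapped:
                return "BOOM"   # TCP stack touches an unmapped page
    return "ok"

print(run(["send", "timeout", "queue_retransmit", "ack",
           "rpc_success", "retransmit"]))   # BOOM
print(run(["send", "ack", "rpc_success"]))  # ok
```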

To do this you need to be pretty unlucky or retransmitting a lot (which
would usually imply something up with either the network or the filer).

BTW, there is also a similar situation with RPC level retransmits, which
I think might be where the NFSv3 vs v4 difference comes from (i.e. only
v3 is susceptible to that specific case); this one is very hard to
reproduce as well (although slightly easier than the TCP retransmit one, IIRC).

>  I think that would also
> apply to iSCSI over tcp, which would presumably suffer similarly.

Correct, iSCSI over TCP can also have this issue.

> Is that analysis correct?

The important thing is zero copy vs. non-zero copy. IOW it is
only a problem if the actual userspace page, which is a mapped domU
page, is what gets queued up. Whether zero copy is done or not depends
on things like O_DIRECT and write(2) vs. sendpage(2) etc and what the
underlying fs implements etc. I thought NFS only did it for O_DIRECT. I
may be mistaken. aio is probably a factor too.

FWIW blktap2 always copies for pretty much this reason, I seem to recall
the maintainer saying the perf hit wasn't noticeable.
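The copy approach amounts to snapshotting the granted page into a private buffer before queueing the I/O, so nothing downstream ever holds a reference to the guest page itself; roughly:

```python
# Sketch of copy vs. zero-copy queueing. With a private copy, later
# reuse (or unmapping) of the guest page cannot affect the in-flight
# request; with zero copy, the queue references the guest memory.

def queue_write_zero_copy(queue, guest_page):
    queue.append(guest_page)         # queue references guest memory

def queue_write_with_copy(queue, guest_page):
    queue.append(bytes(guest_page))  # private snapshot of the data

guest_page = bytearray(b"data")
q = []
queue_write_with_copy(q, guest_page)
guest_page[:] = b"XXXX"   # guest page is reused after "completion"
print(q[0])               # b'data': the queued copy is unaffected
```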

Ian.


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 15:50                     ` Ian Campbell
@ 2013-01-21 16:33                       ` Alex Bligh
  2013-01-21 16:51                         ` Ian Campbell
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 16:33 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

>> If so, I think it's the case that *ALL* NFS dom0 access by Xen domU
>> VMs is unsafe in the event of tcp retransmit (both in the sense that
>> the grant can be freed up causing a crash, or the domU's data can be
>> rewritten post write causing corruption).
>
> Yes. Prior to your report this (assuming it is the same issue) had been
> a very difficult to trigger issue -- I was only able to do so with
> userspace firewall rules which deliberately delayed TCP acks.

Eek!

> The fact that you can reproduce so easily makes me wonder if this is
> really the same issue. To trigger the issue you need this sequence of
> events:
>       * Send an RPC
>       * RPC is encapsulated into a TCP/IP frame (or several) and sent.
>       * Wait for an ACK response to the TCP/IP frame
>       * Timeout.
>       * Queue a retransmit of the TCP/IP frame(s)
>       * Receive the ACK to the original.
>       * Receive the reply to the RPC as well
>       * Report success up the stack
>       * Userspace gets success and unmaps the page
>       * Retransmit hits the front of the queue
>       * BOOM
>
> To do this you need to be pretty unlucky or retransmitting a lot (which
> would usually imply something up with either the network or the filer).

Well, the two things we are doing differently that potentially make this
easier to replicate are:

* We are using a QCOW2 backing file, and running a VM image which
  expands the partition, and then the filing system. This is a particularly
  write heavy load. We're also using upstream qemu DM which I think
  wasn't there when you last tested.

* The filer we run this on is a dev filer which performs poorly,
  and has lots of LUNs (though I think we replicated it on another
  filer too). Though the filer and network certainly aren't great,
  they can run VMs just fine.

> BTW, there is also a similar situation with RPC level retransmits, which
> I think might be where the NFSv3 vs v4 difference comes from (i.e. only
> v3 is susceptible to that specific case); this one is very hard to
> reproduce as well (although slightly easier than the TCP retransmit one, IIRC).

I /think/ that won't be the issue we have, as RPC retransmits on v4 over
TCP happen only very rarely - i.e. when the TCP connection has died -
and I don't believe we are seeing that (though it's difficult to tell
what happened first, given the box dies totally).

However, I agree it will suffer from the same problem.

>>  I think that would also
>> apply to iSCSI over tcp, which would presumably suffer similarly.
>
> Correct, iSCSI over TCP can also have this issue.
>
>> Is that analysis correct?
>
> The important thing is zero copy vs. non-zero copy. IOW it is
> only a problem if the actual userspace page, which is a mapped domU
> page, is what gets queued up. Whether zero copy is done or not depends
> on things like O_DIRECT and write(2) vs. sendpage(2) etc and what the
> underlying fs implements etc. I thought NFS only did it for O_DIRECT. I
> may be mistaken. aio is probably a factor too.

Right, and I'm pretty sure we're not using O_DIRECT as we're using
cache=writeback (which is the default). Is there some way to make it
copy pages?

I'm wondering whether what's happening is that when the disk grows
(or there's a backing file in place) some sort of different I/O is
done by qemu. Perhaps irrespective of write cache setting, it does some
form of zero copy I/O when there's a backing file in place.

> FWIW blktap2 always copies for pretty much this reason, I seem to recall
> the maintainer saying the perf hit wasn't noticeable.

I'm afraid I find the various blk* combinations a bit of an impenetrable
maze. Is it possible (if only for testing purposes) to use blktap2
with HVM domU and qcow2 disks with backing files? I had thought the
alternatives were qdisk and tap?

And a late comment on your previous email:

>> Surely before Xen removes the grant on the page, unmapping it from dom0's
>> memory, it should check to see if there are any existing references
>> to the page and if there are, give the kernel its own COW copy, rather
>> than unmap it totally which is going to lead to problems.
>
> Unfortunately each page only has one reference count, so you cannot
> distinguish between references from this particular NFS write from other
> references (other writes, the ref held by the process itself, etc).

Sure, I understand that. But I wasn't suggesting the tcp layer triggered
this (in which case it would need to get back to the NFS write). I
think Trond said you were arranging for sendpage() to provide a callback.
I'm not suggesting that.

What I was (possibly naively) suggesting is that the single reference
count to the page should be zero by the time the xen grant stuff is
about to remove the mapping, else it's in use somewhere in the domain
into which it's mapped. The xen grant stuff can't know whether that's
for NFS, or iSCSI or whatever. But it does know some other bit of the
kernel is going to use that page, and when it's finished with it will
decrement the reference count and presumably free the page up. So if
it finds a page like this, surely the right thing to do is to leave
a copy of it in dom0, which is no longer associated with the domU
page; it will then get freed when the tcp stack (or whatever is using
it) decrements the reference count later. I don't know if that makes
any sense.


-- 
Alex Bligh


* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 16:33                       ` Alex Bligh
@ 2013-01-21 16:51                         ` Ian Campbell
  2013-01-21 17:06                           ` Alex Bligh
                                             ` (2 more replies)
  0 siblings, 3 replies; 91+ messages in thread
From: Ian Campbell @ 2013-01-21 16:51 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 16:33 +0000, Alex Bligh wrote:

> > The fact that you can reproduce so easily makes me wonder if this is
> > really the same issue. To trigger the issue you need this sequence of
> > events:
> >       * Send an RPC
> >       * RPC is encapsulated into a TCP/IP frame (or several) and sent.
> >       * Wait for an ACK response to the TCP/IP frame
> >       * Timeout.
> >       * Queue a retransmit of the TCP/IP frame(s)
> >       * Receive the ACK to the original.
> >       * Receive the reply to the RPC as well
> >       * Report success up the stack
> >       * Userspace gets success and unmaps the page
> >       * Retransmit hits the front of the queue
> >       * BOOM
> >
> > To do this you need to be pretty unlucky or retransmitting a lot (which
> > would usually imply something up with either the network or the filer).
> 
> Well, the two things we are doing differently that potentially make this
> easier to replicate are:
> 
> * We are using a QCOW2 backing file, and running a VM image which
>   expands the partition, and then the filing system. This is a particularly
>   write heavy load. We're also using upstream qemu DM which I think
>   wasn't there when you last tested.

I've never tried to repro this with any version of qemu; we used to see
it with vhd+blktap2, and I had a PoC which showed the issue under native
too.

> * The filer we run this on is a dev filer which performs poorly,
>   and has lots of LUNs (though I think we replicated it on another
>   filer too). Though the filer and network certainly aren't great,
>   they can run VMs just fine.

This could well be a factor I guess.

> >>  I think that would also
> >> apply to iSCSI over tcp, which would presumably suffer similarly.
> >
> > Correct, iSCSI over TCP can also have this issue.
> >
> >> Is that analysis correct?
> >
> > The important thing is zero copy vs. non-zero copy. IOW it is
> > only a problem if the actual userspace page, which is a mapped domU
> > page, is what gets queued up. Whether zero copy is done or not depends
> > on things like O_DIRECT and write(2) vs. sendpage(2) etc and what the
> > underlying fs implements etc. I thought NFS only did it for O_DIRECT. I
> > may be mistaken. aio is probably a factor too.
> 
> Right, and I'm pretty sure we're not using O_DIRECT as we're using
> cache=writeback (which is the default). Is there some way to make it
> copy pages?

Not as far as I know, but per Trond zero-copy == O_DIRECT, so if you aren't
using O_DIRECT then you aren't using zero copy -- and that agrees with
my recollection. In that case your issue is something totally unrelated.

You could try stracing the qemu-dm and see what it does.

> I'm wondering whether what's happening is that when the disk grows
> (or there's a backing file in place) some sort of different I/O is
> done by qemu. Perhaps irrespective of write cache setting, it does some
> form of zero copy I/O when there's a backing file in place.

I doubt that, but I don't really know anything about qdisk.

I'd be much more inclined to suspect a bug in the xen_qdisk backend's
handling of disk resizes, if that's what you are doing.

> > FWIW blktap2 always copies for pretty much this reason, I seem to recall
> > the maintainer saying the perf hit wasn't noticeable.
> 
> I'm afraid I find the various blk* combinations a bit of an impenetrable
> maze. Is it possible (if only for testing purposes) to use blktap2
> with HVM domU and qcow2 disks with backing files? I had thought the
> alternatives were qdisk and tap?

tap == blktap2. I don't know if it supports qcow or not but I don't
think xl exposes it if it does.

You could try with a test .vhd or .raw file though.

> And a late comment on your previous email:
> 
> >> Surely before Xen removes the grant on the page, unmapping it from dom0's
> >> memory, it should check to see if there are any existing references
> >> to the page and if there are, give the kernel its own COW copy, rather
> >> than unmap it totally which is going to lead to problems.
> >
> > Unfortunately each page only has one reference count, so you cannot
> > distinguish between references from this particular NFS write from other
> > references (other writes, the ref held by the process itself, etc).
> 
> Sure, I understand that. But I wasn't suggesting the tcp layer triggered
> this (in which case it would need to get back to the NFS write). I
> think Trond said you were arranging for sendpage() to provide a callback.
> I'm not suggesting that.
> 
> What I was (possibly naively) suggesting is that the single reference
> count to the page should be zero by the time the xen grant stuff is
> about to remove the mapping,

Unfortunately it won't be zero. There will be at least one reference
from the page being part of the process, which won't be dropped until
the process dies.
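
Concretely, a single per-page count conflates the mapping's own reference with any in-flight I/O references, so it never reaches zero while the process is alive:

```python
# Why "wait for the refcount to hit zero before unmapping" never
# fires: the process mapping itself pins one reference for as long
# as the process lives, masking the I/O references we care about.

page_refs = 0
page_refs += 1   # taken when the page is mapped into the process
page_refs += 1   # taken by an skb queued for (re)transmit
page_refs -= 1   # dropped when the skb is finally freed

print(page_refs)  # 1: all I/O done, count still nonzero
```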

BTW I'm talking about the dom0 kernel's page reference count. Xen's page
reference count is irrelevant here.

>  else it's in use somewhere in the domain
> into which it's mapped. The xen grant stuff can't know whether that's
> for NFS, or iSCSI or whatever. But it does know some other bit of the
> kernel is going to use that page, and when it's finished with it will
> decrement the reference count and presumably free the page up. So if
> it finds a page like this, surely the right thing to do is to leave
> a copy of it in dom0, which is no longer associated with the domU
> page; it will then get freed when the tcp stack (or whatever is using
> it) decrements the reference count later. I don't know if that makes
> any sense.

The whole point is that there is no such reference count which drops to
zero under these circumstances, that's why my series "skb paged fragment
destructors" adds one.

I suggest you google up previous discussions on the netdev list about
this issue -- all these sorts of ideas were discussed back then.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 16:51                         ` Ian Campbell
@ 2013-01-21 17:06                           ` Alex Bligh
  2013-01-21 17:29                             ` Ian Campbell
  2013-01-21 17:31                           ` Alex Bligh
  2013-01-21 20:37                           ` Alex Bligh
  2 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 17:06 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

> Not as far as I know, but Trond zero-copy == O_DIRECT so if you aren't
> using O_DIRECT then you aren't using zero copy -- and that agrees with
> my recollection. In that case your issue is something totally unrelated.
>
> You could try stracing the qemu-dm and see what it does.

Will do. I'm wondering if AIO + NFS is ever zero copy without O_DIRECT.

>> I'm wondering whether what's happening is that when the disk grows
>> (or there's a backing file in place) some sort of different I/O is
>> done by qemu. Perhaps irrespective of write cache setting, it does some
>> form of zero copy I/O when there's a backing file in place.
>
> I doubt that, but I don't really know anything about qdisk.
>
> I'd be much more inclined to suspect a bug in the xen_qdisk backend's
> handling of disks resizes, if that's what you are doing.

We aren't resizing the qcow2 disk itself. What we're doing is
creating a 20G (virtual size) qcow2 disk, containing a 3G (or
so) Ubuntu image - i.e. the partition table says it's 3G. We
then take a snapshot of it and use that as a backing file. The
guest then writes to the partition table enlarging it to the
virtual size of the disk, then resizes the file system. This
triggers it. Unless QEMU has some special reason to care about
what is in the partition table (e.g. to support the old xen
'mount a file as a partition' stuff), it's just a pile of sectors
being written.

> tap == blktap2. I don't know if it supports qcow or not but I don't
> think xl exposes it if it does.

Well, in xl's conf file we are using
 disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]

I think that's how you are meant to do qcow2 isn't it?

> You could try with a test .vhd or .raw file though.

We can do this but I'm betting it won't fail (at least with .raw)
as it only breaks on qcow2 if there's a backing file associated
with the qcow2 file (i.e. if we're writing to a snapshot).

> Unfortunately it won't be zero. There will be at least one reference
> from the page being part of the process, which won't be dropped until
> the process dies.

OK, well this is my ignorance of how the grant mechanism works.
I had assumed the page from the relevant domU got mapped into the
process in dom0, and that when it was unmapped it would be mapped
back out of the process's memory. Otherwise would the process's
memory map not fill up?

> BTW I'm talking about the dom0 kernel's page reference count. Xen's page
> reference count is irrelevant here.

Indeed.

> I suggest you google up previous discussions on the netdev list about
> this issue -- all these sorts of ideas were discussed back then.

OK. I will google.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 17:06                           ` Alex Bligh
@ 2013-01-21 17:29                             ` Ian Campbell
  0 siblings, 0 replies; 91+ messages in thread
From: Ian Campbell @ 2013-01-21 17:29 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 17:06 +0000, Alex Bligh wrote:
> >> I'm wondering whether what's happening is that when the disk grows
> >> (or there's a backing file in place) some sort of different I/O is
> >> done by qemu. Perhaps irrespective of write cache setting, it does some
> >> form of zero copy I/O when there's a backing file in place.
> >
> > I doubt that, but I don't really know anything about qdisk.
> >
> > I'd be much more inclined to suspect a bug in the xen_qdisk backend's
> > handling of disks resizes, if that's what you are doing.
> 
> We aren't resizing the qcow2 disk itself. What we're doing is
> creating a 20G (virtual size) qcow2 disk, containing a 3G (or
> so) Ubuntu image - i.e. the partition table says it's 3G. We
> then take a snapshot of it and use that as a backing file. The
> guest then writes to the partition table enlarging it to the
> virtual size of the disk, then resizes the file system. This
> triggers it. Unless QEMU has some special reason to care about
> what is in the partition table (e.g. to support the old xen
> 'mount a file as a partition' stuff), it's just a pile of sectors
> being written.
> 
> > tap == blktap2. I don't know if it supports qcow or not but I don't
> > think xl exposes it if it does.
> 
> Well, in xl's conf file we are using
>  disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
> 
> I think that's how you are meant to do qcow2 isn't it?

See docs/misc/xl-disk-configuration.txt: the "tap" prefix is deprecated
and ignored by xl. Sorry, I didn't think of this usage of "tap" above.
With xend the tap: prefix did force blktap (1 or 2) to be used. xl tries
to pick the most suitable backend, and picks xen_qdisk for qcow, I think
always.
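For illustration (syntax recalled from the xl disk documentation of that era, so treat it as a sketch rather than gospel), both of these disk lines should end up on the same qdisk backend under xl:

```
# Old xend-style spec: xl parses it but ignores the "tap:" prefix.
disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]

# New-style key=value spec; xl again picks the qdisk backend for qcow2.
disk = [ 'format=qcow2, vdev=xvda, access=rw, target=/my/nfs/directory/testdisk.qcow2' ]
```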

> > You could try with a test .vhd or .raw file though.
> 
> We can do this but I'm betting it won't fail (at least with .raw)
> as it only breaks on qcow2 if there's a backing file associated
> with the qcow2 file (i.e. if we're writing to a snapshot).
> 
> > Unfortunately it won't be zero. There will be at least one reference
> > from the page being part of the process, which won't be dropped until
> > the process dies.
> 
> OK, well this is my ignorance of how the grant mechanism works.
> I had assumed the page from the relevant domU got mapped into the
> process in dom0, and that when it was unmapped it would be mapped
> back out of the process's memory. Otherwise would the process's
> memory map not fill up?

The page is mapped out of the user process as you expect. The problem
is that you cannot tell whether the network stack still holds a reference
after the write() syscall has finished. If you were to assume it did, then
you would indeed fill the process's memory map.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 16:51                         ` Ian Campbell
  2013-01-21 17:06                           ` Alex Bligh
@ 2013-01-21 17:31                           ` Alex Bligh
  2013-01-21 17:32                             ` Ian Campbell
  2013-01-21 20:37                           ` Alex Bligh
  2 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 17:31 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini



--On 21 January 2013 16:51:13 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

>> Right, and I'm pretty sure we're not using O_DIRECT as we're using
>> cache=writeback (which is the default). Is there some way to make it
>> copy pages?
>
> Not as far as I know,

A thought: would it be possible for testing purposes to replace
the grant-map by a grant-copy? How easy would that be? Would that
not provide a page with the same lifecycle as a normal dom0 allocated
one?

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 17:31                           ` Alex Bligh
@ 2013-01-21 17:32                             ` Ian Campbell
  2013-01-21 18:14                               ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-21 17:32 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 17:31 +0000, Alex Bligh wrote:
> 
> --On 21 January 2013 16:51:13 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> >> Right, and I'm pretty sure we're not using O_DIRECT as we're using
> >> cache=writeback (which is the default). Is there some way to make it
> >> copy pages?
> >
> > Not as far as I know,
> 
> A thought: would it be possible for testing purposes to replace
> the grant-map by a grant-copy? How easy would that be? Would that
> not provide a page with the same lifecycle as a normal dom0 allocated
> one?

You would also need to malloc/free the buffer you are copying to/from. I
don't know how hard that is within qemu.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 17:32                             ` Ian Campbell
@ 2013-01-21 18:14                               ` Alex Bligh
  2013-01-22 10:05                                 ` Ian Campbell
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 18:14 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

--On 21 January 2013 17:32:54 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> You would also need to malloc/free the buffer you are copying to/from. I
> don't know how hard that is within qemu.

I think it's ioreq_map and ioreq_unmap within hw/xen_disk.c. I had
foolishly assumed xc would do the grant copy, but it looks like
as you say I need malloc/free (or mmap equivalents) + memcpy.

Is this a useful approach to try?

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 16:51                         ` Ian Campbell
  2013-01-21 17:06                           ` Alex Bligh
  2013-01-21 17:31                           ` Alex Bligh
@ 2013-01-21 20:37                           ` Alex Bligh
  2013-01-22 10:07                             ` Ian Campbell
                                               ` (2 more replies)
  2 siblings, 3 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-21 20:37 UTC (permalink / raw)
  To: Ian Campbell, Stefano Stabellini
  Cc: Konrad Wilk, Alex Bligh, Jan Beulich, Xen Devel

Ian, Stefano,

--On 21 January 2013 16:51:13 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> Not as far as I know, but Trond zero-copy == O_DIRECT so if you aren't
> using O_DIRECT then you aren't using zero copy -- and that agrees with
> my recollection. In that case your issue is something totally unrelated.

Further investigation suggests that Stefano's commit
  47982cb00584371928e44ab6dfc6865d605a52fd
(attached below) may have somewhat surprising results.

Firstly, changing the cache=writeback settings as passed to the QEMU
command line probably only affects emulated disks, as the parameters
for the PV disk appear to be hard coded per this commit, assuming I've
understood correctly. I am guessing my fiddling with the cache=
setting merely caused the emulated disk (used in HVM until the kernel
has loaded) to break.

Secondly, the chosen mode of cache operation is:
  BDRV_O_NOCACHE | BDRV_O_CACHE_WB
This appears to be the same as "cache=none" produces (see code
fragment from bdrv_parse_cache_flags below), which is somewhat
counterintuitive given the name of the second flag. "cache=writeback"
(as appears on the command line) uses BDRV_O_CACHE_WB only.

BDRV_O_NOCACHE appears to map on Linux to O_DIRECT, and BDRV_O_CACHE_WB
to writeback caching. This implies O_DIRECT will always be used. This
is somewhat surprising as qemu by default only uses O_DIRECT with
cache=none, and yet the emulated devices are set up with the
equivalent of cache=writeback.

But this would explain why I'm still seeing the crash with O_DIRECT
apparently off (cache=writeback), as the cache setting is being ignored.

This would also explain why Ian might not have seen it (it went in
late and without O_DIRECT we think this crash can't happen).

Is the BDRV_O_NOCACHE | BDRV_O_CACHE_WB combination intentional or
should BDRV_O_NOCACHE be removed? Why would the default be different
for emulated and PV disks?

-- 
Alex Bligh


commit 47982cb00584371928e44ab6dfc6865d605a52fd
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Fri Mar 23 14:36:18 2012 +0000

    xen_disk: open disks with BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO

    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 285a951..16c3e66 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -663,10 +663,10 @@ static int blk_init(struct XenDevice *xendev)
     }

     /* read-only ? */
+    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
-        qflags = BDRV_O_RDWR;
+        qflags |= BDRV_O_RDWR;
     } else {
-        qflags = 0;
         info  |= VDISK_READONLY;
     }




Except from qemu's block.c

int bdrv_parse_cache_flags(const char *mode, int *flags)
{
    *flags &= ~BDRV_O_CACHE_MASK;

    if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
        *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "directsync")) {
        *flags |= BDRV_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *flags |= BDRV_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        *flags |= BDRV_O_CACHE_WB;
        *flags |= BDRV_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        /* this is the default */
    } else {
        return -1;
    }

    return 0;
}

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 18:14                               ` Alex Bligh
@ 2013-01-22 10:05                                 ` Ian Campbell
  2013-01-22 13:02                                   ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-22 10:05 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 18:14 +0000, Alex Bligh wrote:
> Ian,
> 
> --On 21 January 2013 17:32:54 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > You would also need to malloc/free the buffer you are copying to/from. I
> > don't know how hard that is within qemu.
> 
> I think it's ioreq_map and ioreq_unmap within hw/xen_disk.c. I had
> foolishly assumed xc would do the grant copy, but it looks like
> as you say I need malloc/free (or mmap equivalents) + memcpy.
> 
> Is this a useful approach to try?

I've never looked inside xen_qdisk so I can't really advise, but that
sounds broadly correct, except you'd want to use gnttab_copy, not mmap
+memcpy. e.g. (totally making up the API because I'm too lazy to go
look):
	buffer = malloc(SIZE)
	if (writing)
		gnttab_copy_from(buffer, gntref, size) 

	io->buffer = buffer
	submit(io)
... wait for completion:
	if (reading)
		gnttab_copy_to(...)
	free(io->buffer)

Hopefully you get the idea ;-) Sounds like before the ... goes in
ioreq_map and after the ... goes in ioreq_unmap.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 20:37                           ` Alex Bligh
@ 2013-01-22 10:07                             ` Ian Campbell
  2013-01-22 13:01                               ` Alex Bligh
  2013-01-22 10:13                             ` Ian Campbell
  2013-01-22 15:42                             ` Stefano Stabellini
  2 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-22 10:07 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 20:37 +0000, Alex Bligh wrote:
> This would also explain why Ian might not have seen it (it went in
> late and without O_DIRECT we think this crash can't happen). 

When I saw this sort of thing it was with PV guests and blktap. I have
never tried anything like this with an HVM guest so it's not hard to
explain why I've never seen it there ;-).

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 20:37                           ` Alex Bligh
  2013-01-22 10:07                             ` Ian Campbell
@ 2013-01-22 10:13                             ` Ian Campbell
  2013-01-22 12:59                               ` Alex Bligh
  2013-01-22 15:42                             ` Stefano Stabellini
  2 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-22 10:13 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Mon, 2013-01-21 at 20:37 +0000, Alex Bligh wrote:
> Is the BDRV_O_NOCACHE | BDRV_O_CACHE_WB combination intentional or
> should BDRV_O_NOCACHE be removed? Why would the default be different
> for emulated and PV disks? 

AIUI it is intentional. The safe option (i.e. the one which doesn't risk
trashing the filesystem) is not to cache (so that writes have really hit
the disk when the device says so). We want to always use this mode with
PV devices.

However the performance impact of not caching the emulated (esp IDE)
devices is enormous and so the trade off was made to cache them on the
theory that they would only be used during install and early boot and
that OSes would switch to PV drivers ASAP. Also AIUI the emulated IDE
interface doesn't have any way to request that something really hits the
disk anyway, but I might be misremembering that.

Check the archives from around the time this change was made, I'm pretty
sure there was some discussion of why things are this way.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 10:13                             ` Ian Campbell
@ 2013-01-22 12:59                               ` Alex Bligh
  2013-01-22 15:46                                 ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-22 12:59 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

--On 22 January 2013 10:13:58 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> > AIUI it is intentional. The safe option (i.e. the one which doesn't risk
> trashing the filesystem) is not to cache (so that writes have really hit
> the disk when the device says so). We want to always use this mode with
> PV devices.

Except it isn't the safe option because O_DIRECT (apparently) is not safe
with NFS or (we think) iSCSI, and in a far nastier way. And fixing that
is more intrusive (see your patch series).

I can't test it at the moment (travelling), but if not using O_DIRECT
fixed things, would you accept something that somehow allowed one to
specify an option of not using O_DIRECT? I need to look at how to get
the disk config line from xen_disk.c if so.

> However the performance impact of not caching the emulated (esp IDE)
> devices is enormous and so the trade off was made to cache them on the
> theory that they would only be used during install and early boot and
> that OSes would switch to PV drivers ASAP. Also AIUI the emulated IDE
> interface doesn't have any way to request that something really hits the
> disk anyway, but I might be misremembering that.

You mean the equivalent of flush/fua? I can't comment, but I believe
the write barrier stuff isn't in 4.2 in xen_disk either.

> Check the archives from around the time this change was made, I'm pretty
> sure there was some discussion of why things are this way.

I tried but couldn't find anything. I could only find the patches
going to qemu-devel and not on the main list, but I may well have missed
them as I'm relying on Google finding stuff in archives at that time.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 10:07                             ` Ian Campbell
@ 2013-01-22 13:01                               ` Alex Bligh
  2013-01-22 13:14                                 ` Ian Campbell
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-22 13:01 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini



--On 22 January 2013 10:07:07 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> On Mon, 2013-01-21 at 20:37 +0000, Alex Bligh wrote:
>> This would also explain why Ian might not have seen it (it went in
>> late and without O_DIRECT we think this crash can't happen).
>
> When I saw this sort of thing it was with PV guests and blktap. I have
> never tried anything like this with an HVM guest so it's not hard to
> explain why I've never seen it there ;-).

My knowledge of PV guests is about zero, but I thought they used
xen_disk.c as well. However, as you were testing pre-April 2012 (I
think), they wouldn't have used O_DIRECT.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 10:05                                 ` Ian Campbell
@ 2013-01-22 13:02                                   ` Alex Bligh
  2013-01-22 13:13                                     ` Ian Campbell
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-22 13:02 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini

Ian,

--On 22 January 2013 10:05:16 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> I've never looked inside xen_qdisk so I can't really advise, but that
> sounds broadly correct, except you'd want to use gnttab_copy, not mmap
> +memcpy. e.g. (totally making up the API because I'm too lazy to go
> look):
> 	buffer = malloc(SIZE)
> 	if (writing)
> 		gnttab_copy_from(buffer, gntref, size)
>
> 	io->buffer = buffer
> 	submit(io)
> ... wait for completion:
> 	if (reading)
> 		gnttab_copy_to(...)
> 	free(io->buffer)
>
> Hopefully you get the idea ;-) Sounds like before the ... goes in
> ioreq_map and after the ... goes in ioreq_unmap.

Thanks. If the O_DIRECT thing doesn't fix it, I'll take a look again.
My initial analysis was that a copy interface wasn't available through
xc_ type functions.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 13:02                                   ` Alex Bligh
@ 2013-01-22 13:13                                     ` Ian Campbell
  0 siblings, 0 replies; 91+ messages in thread
From: Ian Campbell @ 2013-01-22 13:13 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Tue, 2013-01-22 at 13:02 +0000, Alex Bligh wrote:
> Ian,
> 
> --On 22 January 2013 10:05:16 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > I've never looked inside xen_qdisk so I can't really advise, but that
> > sounds broadly correct, except you'd want to use gnttab_copy, not mmap
> > +memcpy. e.g. (totally making up the API because I'm too lazy to go
> > look):
> > 	buffer = malloc(SIZE)
> > 	if (writing)
> > 		gnttab_copy_from(buffer, gntref, size)
> >
> > 	io->buffer = buffer
> > 	submit(io)
> > ... wait for completion:
> > 	if (reading)
> > 		gnttab_copy_to(...)
> > 	free(io->buffer)
> >
> > Hopefully you get the idea ;-) Sounds like before the ... goes in
> > ioreq_map and after the ... goes in ioreq_unmap.
> 
> Thanks. If the O_DIRECT thing doesn't fix it, I'll take a look again.
> My initial analysis was that a copy interface wasn't available through
> xc_ type functions.

I think the underlying /dev/xen/gntdev thing does, but you may need to
plumb it through libxc.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 13:01                               ` Alex Bligh
@ 2013-01-22 13:14                                 ` Ian Campbell
  2013-01-22 13:18                                   ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-01-22 13:14 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Konrad Wilk, Xen Devel, Jan Beulich, Stefano Stabellini

On Tue, 2013-01-22 at 13:01 +0000, Alex Bligh wrote:
> 
> --On 22 January 2013 10:07:07 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > On Mon, 2013-01-21 at 20:37 +0000, Alex Bligh wrote:
> >> This would also explain why Ian might not have seen it (it went in
> >> late and without O_DIRECT we think this crash can't happen).
> >
> > When I saw this sort of thing it was with PV guests and blktap. I have
> > never tried anything like this with an HVM guest so it's not hard to
> > explain why I've never seen it there ;-).
> 
> My knowledge of PV guests is about zero, but I thought they used
> xen_disk.c as well.

In my case they were using blkback, which is in the kernel, and blktap2
(partially kernel and partially userspace). There is no qemu component
there.

Also I was using a VHD file.

Ian.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 13:14                                 ` Ian Campbell
@ 2013-01-22 13:18                                   ` Alex Bligh
  0 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-01-22 13:18 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Konrad Wilk, Xen Devel, Alex Bligh, Jan Beulich, Stefano Stabellini



--On 22 January 2013 13:14:22 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
wrote:

> In my case they were using blkback, which is the kernel and blktap2
> (partially kernel and partially usersapce). There is no qemu component
> there.
>
> Also I was using a VHD file

Ah of course. I have no idea what open() options blkback uses.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-21 20:37                           ` Alex Bligh
  2013-01-22 10:07                             ` Ian Campbell
  2013-01-22 10:13                             ` Ian Campbell
@ 2013-01-22 15:42                             ` Stefano Stabellini
  2013-01-22 16:09                               ` Stefano Stabellini
  2 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-22 15:42 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Mon, 21 Jan 2013, Alex Bligh wrote:
> Ian, Stefano,
> 
> --On 21 January 2013 16:51:13 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > Not as far as I know, but Trond zero-copy == O_DIRECT so if you aren't
> > using O_DIRECT then you aren't using zero copy -- and that agrees with
> > my recollection. In that case your issue is something totally unrelated.
> 
> Further investigation suggests that Stefano's commit
>   47982cb00584371928e44ab6dfc6865d605a52fd
> (attached below) may have somewhat surprising results.
> 
> Firstly, changing the cache=writeback settings as passed to the QEMU
> command line probably only affects emulated disks, as the parameters
> for the PV disk appear to be hard coded per this commit, assuming I've
> understood correctly. I am guessing my fiddling with the cache=
> setting merely caused the emulated disk (used in HVM until the kernel
> has loaded) to break.

That is correct.


> Secondly, the chosen mode of cache operation is:
>   BDRV_O_NOCACHE | BDRV_O_CACHE_WB
> This appears to be the same as "cache=none" produces (see code
> fragment from bdrv_parse_cache_flags below), which is somewhat
> counterintuitive given the name of the second flag. "cache=writeback"
> (as appears on the command line) uses BDRV_O_CACHE_WB only.
>
> BDRV_O_NOCACHE appears to map on Linux to O_DIRECT, and BDRV_O_CACHE_WB
> to writeback caching. This implies O_DIRECT will always be used. This
> is somewhat surprising as qemu by default only uses O_DIRECT with
> cache=none, and yet the emulated devices are set up with the
> equivalent of cache=writeback.

Yes, it is counterintuitive, but you got it right: BDRV_O_NOCACHE |
BDRV_O_CACHE_WB means O_DIRECT.


> But this would explain why I'm still seeing the crash with O_DIRECT
> apparently off (cache=writeback), as the cache setting is being ignored.
> 
> This would also explain why Ian might not have seen it (it went in
> late and without O_DIRECT we think this crash can't happen).
> 
> Is the BDRV_O_NOCACHE | BDRV_O_CACHE_WB combination intentional or
> should BDRV_O_NOCACHE be removed? Why would the default be different
> for emulated and PV disks?

The setting is different from the one used for emulated devices because,
after analyzing the IDE code, we thought that using BDRV_O_CACHE_WB would
be safe enough: when the guest wants to make sure that the data hits
the disk, it issues an IDE FLUSH_CACHE operation.

In the xen_disk case instead, we weren't quite sure about the
assumptions of all the possible different PV frontend drivers, so we
went for the safe choice, that is O_DIRECT.

In fact if we wanted to change the cache setting for xen_disk, we would
probably have to go back to write-through (this setting is selected by
passing neither BDRV_O_NOCACHE nor BDRV_O_CACHE_WB) that is quite slow.

Recently, thanks to Konrad's work on blkfront cache flushes, a new flush
operation has been implemented in the block protocol:
BLKIF_OP_FLUSH_DISKCACHE. BLKIF_OP_FLUSH_DISKCACHE was introduced in
xen_disk by 7e7b7cba16faa7b721b822fa9ed8bebafa35700f "xen_disk:
implement BLKIF_OP_FLUSH_DISKCACHE, remove BLKIF_OP_WRITE_BARRIER".
Thanks to the new operation, maybe it is now safe to use write-back
caching.
Konrad, what do you think? Is blkback using the Linux disk cache by
default? Or is it using O_DIRECT?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 12:59                               ` Alex Bligh
@ 2013-01-22 15:46                                 ` Stefano Stabellini
  0 siblings, 0 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-22 15:46 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Tue, 22 Jan 2013, Alex Bligh wrote:
> Ian,
> 
> --On 22 January 2013 10:13:58 +0000 Ian Campbell <Ian.Campbell@citrix.com> 
> wrote:
> 
> > AIUI it is intentional. The safe option (i.e. the one which doesn't risk
> > trashing the filesystem) is not to cache (so that writes have really hit
> > the disk when the device says so). We want to always use this mode with
> > PV devices.
> 
> Except it isn't the safe option because O_DIRECT (apparently) is not safe
> with NFS or (we think) iSCSI, and in a far nastier way. And fixing that
> is more intrusive (see your patch series).
> 
> I can't test it at the moment (travelling), but if not using O_DIRECT
> fixed things, would you accept something that somehow allowed one to
> specify an option of not using O_DIRECT? I need to look at how to get
> the disk config line from xen_disk.c if so.

I think that it would be acceptable.


> > However the performance impact of not caching the emulated (esp IDE)
> > devices is enormous and so the trade off was made to cache them on the
> > theory that they would only be used during install and early boot and
> > that OSes would switch to PV drivers ASAP. Also AIUI the emulated IDE
> > interface doesn't have any way to request that something really hits the
> > disk anyway, but I might be misremembering that.

Actually it is the other way around: IDE has a way to specify that data
needs to hit the disk. The block protocol has it too, but the old
operation to do that (BLKIF_OP_WRITE_BARRIER) wasn't considered safe.
Konrad might be able to elaborate on the reasons why.


> You mean the equivalent of flush/fua? I can't comment, but I believe
> the write barrier stuff isn't in 4.2 in xen_disk either.

It is not, but given that it can be considered an important bug fix (the
old BLKIF_OP_WRITE_BARRIER op is known to have issues), I would be OK with
backporting it.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 15:42                             ` Stefano Stabellini
@ 2013-01-22 16:09                               ` Stefano Stabellini
  2013-01-22 20:31                                 ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-22 16:09 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Jan Beulich, Konrad Wilk, Ian Campbell, Alex Bligh, Xen Devel

On Tue, 22 Jan 2013, Stefano Stabellini wrote:
> > But this would explain why I'm still seeing the crash with O_DIRECT
> > apparently off (cache=writeback), as the cache setting is being ignored.
> > 
> > This would also explain why Ian might not have seen it (it went in
> > late and without O_DIRECT we think this crash can't happen).
> > 
> > Is the BDRV_O_NOCACHE | BDRV_O_CACHE_WB combination intentional or
> > should BDRV_O_NOCACHE be removed? Why would the default be different
> > for emulated and PV disks?
> 
> The setting is different from the one of emulated devices because after
> analyzing the IDE code, we thought that using BDRV_O_CACHE_WB would be
> safe enough because when the guest wants to make sure that the data hits
> the disk, it issues an IDE FLUSH_CACHE operation.
> 
> In the xen_disk case instead, we weren't quite sure about the
> assumptions of all the possible different PV frontend drivers, so we
> went for the safe choice, that is O_DIRECT.
> 
> In fact if we wanted to change the cache setting for xen_disk, we would
> probably have to go back to write-through (this setting is selected by
> passing neither BDRV_O_NOCACHE nor BDRV_O_CACHE_WB) that is quite slow.
> 
> Recently, thanks to Konrad's work on blkfront cache flushes, a new flush
> operation has been implemented in the block protocol:
> BLKIF_OP_FLUSH_DISKCACHE. BLKIF_OP_FLUSH_DISKCACHE was introduced in
> xen_disk by 7e7b7cba16faa7b721b822fa9ed8bebafa35700f "xen_disk:
> implement BLKIF_OP_FLUSH_DISKCACHE, remove BLKIF_OP_WRITE_BARRIER".
> Thanks to the new operation, maybe it is now safe to use write-back
> caching.
> Konrad, what do you think? Is blkback using the Linux disk cache by
> default? Or is it using O_DIRECT?

Looking more closely at xen_disk's flush operations,
even if the semantics of the old BLKIF_OP_WRITE_BARRIER are confused, the
implementation of it in xen_disk is strictly a superset of the flushes
performed by the new BLKIF_OP_FLUSH_DISKCACHE.

So unless blkfront issues fewer cache flushes when using
BLKIF_OP_WRITE_BARRIER, I am tempted to say that even with the old flush
operation, it might be safe to open the file write-back.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 16:09                               ` Stefano Stabellini
@ 2013-01-22 20:31                                 ` Alex Bligh
  2013-01-23 11:52                                   ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-22 20:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Alex Bligh, Konrad Wilk, Ian Campbell, Jan Beulich, Xen Devel

Stefano,

--On 22 January 2013 16:09:21 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> Looking more closely at xen_disk's flush operations,
> even if the semantics of the old BLKIF_OP_WRITE_BARRIER are confused, the
> implementation of it in xen_disk is strictly a superset of the flushes
> performed by the new BLKIF_OP_FLUSH_DISKCACHE.
>
> So unless blkfront issues fewer cache flushes when using
> BLKIF_OP_WRITE_BARRIER, I am tempted to say that even with the old flush
> operation, it might be safe to open the file write-back.

Either writeback or 'user specifiable' would be my preference. Writethrough
has known performance problems with various disk formats, "notably qcow2"
(see qemu manpage). And "none" (aka O_DIRECT) appears to be dangerous.

I suspect the best possible answer is a
  cache=[qemu-cache-identifier]
config key, which gets put in xenstore, and which xen_disk.c then
interprets using the same routine QEMU does itself for cache= on the
command line, then uses exactly those BDEV flags.

For completeness one could also add an emulcache= option and just
pass that straight through to the qemu command line for the emulated
drives.

I had a quick look at this on the train and it appears that to do it
properly requires fiddling with lex file and possibly xenstore, i.e.
is not completely trivial.

An alternative more disgusting arrangement would be to overload one
of the existing options.

What's the right approach here, and have I any hope of getting a patch
into 4.2.2? Not a disaster if not as I already need to maintain a local
tree due to the absence of HVM live migrate on qemu-upstream.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-22 20:31                                 ` Alex Bligh
@ 2013-01-23 11:52                                   ` Stefano Stabellini
  2013-01-23 15:19                                     ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-23 11:52 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Tue, 22 Jan 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 22 January 2013 16:09:21 +0000 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> > Looking more closely at xen_disk's flush operations,
> > even if the semantics of the old BLKIF_OP_WRITE_BARRIER are confused, the
> > implementation of it in xen_disk is strictly a superset of the flushes
> > performed by the new BLKIF_OP_FLUSH_DISKCACHE.
> >
> > So unless blkfront issues fewer cache flushes when using
> > BLKIF_OP_WRITE_BARRIER, I am tempted to say that even with the old flush
> > operation, it might be safe to open the file write-back.
> 
> Either writeback or 'user specifiable' would be my preference. Writethrough
> has known performance problems with various disk formats, "notably qcow2"
> (see qemu manpage). And "none" (aka O_DIRECT) appears to be dangerous.
> 
> I suspect the best possible answer is a
>   cache=[qemu-cache-identifier]
> config key, which gets put in xenstore, and which xen_disk.c then
> interprets using the same routine QEMU does itself for cache= on the
> command line, then uses exactly those BDEV flags.

that would be nice to have


> For completeness one could also add an emulcache= option and just
> pass that straight through to the qemu command line for the emulated
> drives.

In qemu-xen it should be already possible to change the cache setting
for the IDE disk by passing the right command line option to QEMU. Not
ideal, but it would work.


> I had a quick look at this on the train and it appears that to do it
> properly requires fiddling with lex file and possibly xenstore, i.e.
> is not completely trivial.
> 
> An alternative more disgusting arrangement would be to overload one
> of the existing options.
> 
> What's the right approach here, and have I any hope of getting a patch
> into 4.2.2? Not a disaster if not as I already need to maintain a local
> tree due to the absence of HVM live migrate on qemu-upstream.

Backports are only for bug fixes. However it might be possible to
backport a simple patch that in case of opening failure with O_DIRECT,
tries again without it (see appended). I don't think that would work
for you though, unless you manage to mount NFS in a way that would
return EINVAL on open(O_DIRECT), like it used to.

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 33a5531..d6d71fe 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -659,9 +659,16 @@ static int blk_init(struct XenDevice *xendev)
        if (blkdev->bs) {
            if (bdrv_open2(blkdev->bs, blkdev->filename, qflags,
                            bdrv_find_format(blkdev->fileproto)) != 0) {
-               bdrv_delete(blkdev->bs);
-               blkdev->bs = NULL;
-           }
+                /* try without O_DIRECT */
+                xen_be_printf(&blkdev->xendev, 0, "opening %s with O_DIRECT failed, trying write-through.\n",
+                        blkdev->filename);
+                qflags &= ~BDRV_O_NOCACHE;
+                if (bdrv_open2(blkdev->bs, blkdev->filename, qflags,
+                            bdrv_find_format(blkdev->fileproto)) != 0) {
+                    bdrv_delete(blkdev->bs);
+                    blkdev->bs = NULL;
+                }
+            }
        }
        if (!blkdev->bs)
            return -1;

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-23 11:52                                   ` Stefano Stabellini
@ 2013-01-23 15:19                                     ` Alex Bligh
  2013-01-23 16:29                                       ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-23 15:19 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Campbell
  Cc: Konrad Wilk, Alex Bligh, Jan Beulich, Xen Devel

Ian, Stefano,

First bit of good news: Avoiding using O_DIRECT in xen_disk.c
fixes the problem (see patch below for what we tested).

Therefore I think this problem will potentially occur:

* Using qemu-xen or qemu-traditional (both of which use O_DIRECT)

* With anything that uses a grant of another domX's memory (I
  suspect that includes PV guests as well as HVM).

* Whenever the dom0 uses TCP to the backing device. That includes
  NFS and iSCSI.

Disabling O_DIRECT is therefore a workaround, but I suspect not
an attractive workaround performance-wise. Fixing it properly
would appear to require Ian C's skbuff page tracking stuff.

>> I suspect the best possible answer is a
>>   cache=[qemu-cache-identifier]
>> config key, which gets put in xenstore, and which xen_disk.c then
>> interprets using the same routine QEMU does itself for cache= on the
>> command line, then uses exactly those BDEV flags.
>
> that would be nice to have

Good

>> For completeness one could also add an emulcache= option and just
>> pass that straight through to the qemu command line for the emulated
>> drives.
>
> In qemu-xen it should be already possible to change the cache setting
> for the IDE disk by passing the right command line option to QEMU. Not
> ideal, but it would work.

It would indeed, but 'cache=writeback' is currently hard coded. I was
planning to change that to 'cache=%s' and pass the value of
emulcache=.

>> I had a quick look at this on the train and it appears that to do it
>> properly requires fiddling with lex file and possibly xenstore, i.e.
>> is not completely trivial.
>>
>> An alternative more disgusting arrangement would be to overload one
>> of the existing options.
>>
>> What's the right approach here, and have I any hope of getting a patch
>> into 4.2.2? Not a disaster if not as I already need to maintain a local
>> tree due to the absence of HVM live migrate on qemu-upstream.
>
> Backports are only for bug fixes.

Sure - and it would seem to me dom0 crashing horribly is a bug! I
have a reliable way to replicate it, but it can happen to others.

In terms of a local fix, a one line change to xen_disk is all I need.

What I'm asking is what is the right way to fix this bug in 4.2.
At one extreme, I can probably cook up a 'let's do it properly' patch
which takes the proposed cache= option (most useful, most intrusive).
At the other extreme, we could just apply the patch below or a variant
of it conditional on a configure flag (least useful, least intrusive).

> However it might be possible to
> backport a simple patch that in case of opening failure with O_DIRECT,
> tries again without it (see appended). I don't think that would work
> for you though, unless you manage to mount NFS in a way that would
> return EINVAL on open(O_DIRECT), like it used to.

I don't think that will work for any of the problem cases I've identified
above as NFS always permits O_DIRECT opens (I'm not even sure whether
CONFIG_NFS_DIRECTIO still exists), and so does iSCSI.

-- 
Alex Bligh

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index a402ac8..1c3a6f5 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
     }

     /* read-only ? */
-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
     } else {

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-23 15:19                                     ` Alex Bligh
@ 2013-01-23 16:29                                       ` Stefano Stabellini
  2013-01-25 11:28                                         ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-01-23 16:29 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Wed, 23 Jan 2013, Alex Bligh wrote:
> Ian, Stefano,
> 
> First bit of good news: Avoiding using O_DIRECT in xen_disk.c
> fixes the problem (see patch below for what we tested).
> 
> Therefore I think this problem will potentially occur:
> 
> * Using qemu-xen or qemu-traditional (both of which use O_DIRECT)
> 
> * With anything that uses a grant of another domX's memory (I
>   suspect that includes PV guests as well as HVM).
> 
> * Whenever the dom0 uses TCP to the backing device. That includes
>   NFS and iSCSI.
> 
> Disabling O_DIRECT is therefore a workaround, but I suspect not
> an attractive workaround performance-wise. Fixing it properly
> would appear to require Ian C's skbuff page tracking stuff.
> 
> >> I suspect the best possible answer is a
> >>   cache=[qemu-cache-identifier]
> >> config key, which gets put in xenstore, and which xen_disk.c then
> >> interprets using the same routine QEMU does itself for cache= on the
> >> command line, then uses exactly those BDEV flags.
> >
> > that would be nice to have
> 
> Good
> 
> >> For completeness one could also add an emulcache= option and just
> >> pass that straight through to the qemu command line for the emulated
> >> drives.
> >
> > In qemu-xen it should be already possible to change the cache setting
> > for the IDE disk by passing the right command line option to QEMU. Not
> > ideal, but it would work.
> 
> It would indeed, but 'cache=writeback' is currently hard coded. I was
> planning to change that to 'cache=%s' and pass the value of
> emulcache=.
> 
> >> I had a quick look at this on the train and it appears that to do it
> >> properly requires fiddling with lex file and possibly xenstore, i.e.
> >> is not completely trivial.
> >>
> >> An alternative more disgusting arrangement would be to overload one
> >> of the existing options.
> >>
> >> What's the right approach here, and have I any hope of getting a patch
> >> into 4.2.2? Not a disaster if not as I already need to maintain a local
> >> tree due to the absence of HVM live migrate on qemu-upstream.
> >
> > Backports are only for bug fixes.
> 
> Sure - and it would seem to me dom0 crashing horribly is a bug! I
> have a reliable way to replicate it, but it can happen to others.
> 
> In terms of a local fix, a one line change to xen_disk is all I need.
> 
> What I'm asking is what is the right way to fix this bug in 4.2.
> At one extreme, I can probably cook up a 'let's do it properly' patch
> which takes the proposed cache= option (most useful, most intrusive).
> At the other extreme, we could just apply the patch below or a variant
> of it conditional on a configure flag (least useful, least intrusive).
> 
> > However it might be possible to
> > backport a simple patch that in case of opening failure with O_DIRECT,
> > tries again without it (see appended). I don't think that would work
> > for you though, unless you manage to mount NFS in a way that would
> > return EINVAL on open(O_DIRECT), like it used to.
> 
> I don't think that will work for any of the problem cases I've identified
> above as NFS always permits O_DIRECT opens (I'm not even sure whether
> CONFIG_NFS_DIRECTIO still exists), and so does iSCSI.
> 
> -- 
> Alex Bligh
> 
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index a402ac8..1c3a6f5 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>      }
> 
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> +    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>      if (strcmp(blkdev->mode, "w") == 0) {
>          qflags |= BDRV_O_RDWR;
>      } else {

Before going for something like that I would like a confirmation from
Konrad about blkfront behavior regarding barriers and
BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
corruptions.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-23 16:29                                       ` Stefano Stabellini
@ 2013-01-25 11:28                                         ` Alex Bligh
  2013-02-05 15:40                                           ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-01-25 11:28 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Wilk
  Cc: Alex Bligh, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

Konrad,

--On 23 January 2013 16:29:20 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
>> index a402ac8..1c3a6f5 100644
>> --- a/hw/xen_disk.c
>> +++ b/hw/xen_disk.c
>> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>>      }
>>
>>      /* read-only ? */
>> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>> +    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>      if (strcmp(blkdev->mode, "w") == 0) {
>>          qflags |= BDRV_O_RDWR;
>>      } else {
>
> Before going for something like that I would like a confirmation from
> Konrad about blkfront behavior regarding barriers and
> BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
> corruptions.

Any ideas?


A slightly prettier patch would look like the one pasted
below (not sent with git-sendemail so beware whitespace issues).

-- 
Alex Bligh

commit a7d7296aebc21af15074f3bf64c5c6795ca05f16
Author: Alex Bligh <alex@alex.org.uk>
Date:   Thu Jan 24 09:41:34 2013 +0000

    Disable use of O_DIRECT by default as it results in crashes.

    See:
      http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
    for more details.

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index a402ac8..a618d8d 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -45,6 +45,8 @@ static int batch_maps   = 0;

 static int max_requests = 32;

+static int use_o_direct = 0;
+
 /* ------------------------------------------------------------- */

 #define BLOCK_SIZE  512
@@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
     }

     /* read-only ? */
-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
     } else {

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-01-25 11:28                                         ` Alex Bligh
@ 2013-02-05 15:40                                           ` Alex Bligh
  2013-02-22 17:28                                             ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-02-05 15:40 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Wilk
  Cc: Alex Bligh, Ian Campbell, Jan Beulich, Xen Devel

Konrad / Stefano,

Any movement / ideas on this one?

Alex

--On 25 January 2013 11:28:31 +0000 Alex Bligh <alex@alex.org.uk> wrote:

> Konrad,
>
> --On 23 January 2013 16:29:20 +0000 Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>
>>> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
>>> index a402ac8..1c3a6f5 100644
>>> --- a/hw/xen_disk.c
>>> +++ b/hw/xen_disk.c
>>> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>>>      }
>>>
>>>      /* read-only ? */
>>> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>> +    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>>      if (strcmp(blkdev->mode, "w") == 0) {
>>>          qflags |= BDRV_O_RDWR;
>>>      } else {
>>
>> Before going for something like that I would like a confirmation from
>> Konrad about blkfront behavior regarding barriers and
>> BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
>> corruptions.
>
> Any ideas?
>
>
> A slightly prettier patch would look like the one pasted
> below (not sent with git-sendemail so beware whitespace issues).
>
> --
> Alex Bligh
>
> commit a7d7296aebc21af15074f3bf64c5c6795ca05f16
> Author: Alex Bligh <alex@alex.org.uk>
> Date:   Thu Jan 24 09:41:34 2013 +0000
>
>     Disable use of O_DIRECT by default as it results in crashes.
>
>     See:
>       http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
>     for more details.
>
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index a402ac8..a618d8d 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -45,6 +45,8 @@ static int batch_maps   = 0;
>
>  static int max_requests = 32;
>
> +static int use_o_direct = 0;
> +
>  /* ------------------------------------------------------------- */
>
>  #define BLOCK_SIZE  512
> @@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
>      }
>
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> +    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>      if (strcmp(blkdev->mode, "w") == 0) {
>          qflags |= BDRV_O_RDWR;
>      } else {
>
>
>



-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-02-05 15:40                                           ` Alex Bligh
@ 2013-02-22 17:28                                             ` Alex Bligh
  2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-02-22 17:28 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Wilk
  Cc: Alex Bligh, Ian Campbell, Jan Beulich, Xen Devel

Konrad / Stefano,

Any update here?

I can't help but think a crashed dom0 and guaranteed domU corruption is
less bad than a theoretical data loss on a crash (not that we know
that theoretical possibility exists yet).

I'm currently using this trivial patch
 https://github.com/abligh/qemu-upstream-4.2-testing-livemigrate/commit/a7d7296aebc21af15074f3bf64c5c6795ca05f16

Alex


--On 5 February 2013 15:40:33 +0000 Alex Bligh <alex@alex.org.uk> wrote:

> Konrad / Stefano,
>
> Any movement / ideas on this one?
>
> Alex
>
> --On 25 January 2013 11:28:31 +0000 Alex Bligh <alex@alex.org.uk> wrote:
>
>> Konrad,
>>
>> --On 23 January 2013 16:29:20 +0000 Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>>
>>>> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
>>>> index a402ac8..1c3a6f5 100644
>>>> --- a/hw/xen_disk.c
>>>> +++ b/hw/xen_disk.c
>>>> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>>>>      }
>>>>
>>>>      /* read-only ? */
>>>> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>>> +    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>>>      if (strcmp(blkdev->mode, "w") == 0) {
>>>>          qflags |= BDRV_O_RDWR;
>>>>      } else {
>>>
>>> Before going for something like that I would like a confirmation from
>>> Konrad about blkfront behavior regarding barriers and
>>> BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
>>> corruptions.
>>
>> Any ideas?
>>
>>
>> A slightly prettier patch would look like the one pasted
>> below (not sent with git-sendemail so beware whitespace issues).
>>
>> --
>> Alex Bligh
>>
>> commit a7d7296aebc21af15074f3bf64c5c6795ca05f16
>> Author: Alex Bligh <alex@alex.org.uk>
>> Date:   Thu Jan 24 09:41:34 2013 +0000
>>
>>     Disable use of O_DIRECT by default as it results in crashes.
>>
>>     See:
>>       http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
>>     for more details.
>>
>> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
>> index a402ac8..a618d8d 100644
>> --- a/hw/xen_disk.c
>> +++ b/hw/xen_disk.c
>> @@ -45,6 +45,8 @@ static int batch_maps   = 0;
>>
>>  static int max_requests = 32;
>>
>> +static int use_o_direct = 0;
>> +
>>  /* ------------------------------------------------------------- */
>>
>> #define BLOCK_SIZE  512
>> @@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
>>      }
>>
>>      /* read-only ? */
>> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>> +    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>      if (strcmp(blkdev->mode, "w") == 0) {
>>          qflags |= BDRV_O_RDWR;
>>      } else {
>>
>>
>>
>
>
>
> --
> Alex Bligh
>
>



-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-02-22 17:28                                             ` Alex Bligh
@ 2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
  2013-02-22 18:00                                                 ` Stefano Stabellini
  2013-02-22 19:53                                                 ` Alex Bligh
  0 siblings, 2 replies; 91+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-22 17:41 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Fri, Feb 22, 2013 at 05:28:42PM +0000, Alex Bligh wrote:
> Konrad / Stefano,
> 
> Any update here?

Sorry, been so busy with bugs that this keeps on getting deferred.
> 
> I can't help but think a crashed dom0 and guaranteed domU corruption is
> less bad than a theoretical data loss on a crash (not that we know
> that theoretical possibility exists yet).

So from the perspective of the blkif protocol, as long as the sync requests
are properly being sent, then we are fine. I recall Stefano (or maybe Roger)
finding some oddity in xen_disk where the fsync wasn't sent.

You should be able to test this rather easily by (in your guest)
mounting an ext3 or ext4 filesystem with barrier support and then looking at
the blktrace/blkparse output to make sure that the sync commands are indeed
hitting the platter.

Though there is probably a nice framework to do all of this automatically.
Perhaps fio does that already.

> 
> I'm currently using this trivial patch
> https://github.com/abligh/qemu-upstream-4.2-testing-livemigrate/commit/a7d7296aebc21af15074f3bf64c5c6795ca05f16
> 
> Alex
> 
> 
> --On 5 February 2013 15:40:33 +0000 Alex Bligh <alex@alex.org.uk> wrote:
> 
> >Konrad / Stefano,
> >
> >Any movement / ideas on this one?
> >
> >Alex
> >
> >--On 25 January 2013 11:28:31 +0000 Alex Bligh <alex@alex.org.uk> wrote:
> >
> >>Konrad,
> >>
> >>--On 23 January 2013 16:29:20 +0000 Stefano Stabellini
> >><stefano.stabellini@eu.citrix.com> wrote:
> >>
> >>>>diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> >>>>index a402ac8..1c3a6f5 100644
> >>>>--- a/hw/xen_disk.c
> >>>>+++ b/hw/xen_disk.c
> >>>>@@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
> >>>>     }
> >>>>
> >>>>     /* read-only ? */
> >>>>-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> >>>>+    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> >>>>     if (strcmp(blkdev->mode, "w") == 0) {
> >>>>         qflags |= BDRV_O_RDWR;
> >>>>     } else {
> >>>
> >>>Before going for something like that I would like a confirmation from
> >>>Konrad about blkfront behavior regarding barriers and
> >>>BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
> >>>corruptions.
> >>
> >>Any ideas?
> >>
> >>
> >>A slightly prettier patch would look like the one pasted
> >>below (not sent with git-sendemail so beware whitespace issues).
> >>
> >>--
> >>Alex Bligh
> >>
> >>commit a7d7296aebc21af15074f3bf64c5c6795ca05f16
> >>Author: Alex Bligh <alex@alex.org.uk>
> >>Date:   Thu Jan 24 09:41:34 2013 +0000
> >>
> >>    Disable use of O_DIRECT by default as it results in crashes.
> >>
> >>    See:
> >>      http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> >>    for more details.
> >>
> >>diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> >>index a402ac8..a618d8d 100644
> >>--- a/hw/xen_disk.c
> >>+++ b/hw/xen_disk.c
> >>@@ -45,6 +45,8 @@ static int batch_maps   = 0;
> >>
> >> static int max_requests = 32;
> >>
> >>+static int use_o_direct = 0;
> >>+
> >> /* ------------------------------------------------------------- */
> >>
> >> #define BLOCK_SIZE  512
> >>@@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
> >>     }
> >>
> >>     /* read-only ? */
> >>-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> >>+    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> >>     if (strcmp(blkdev->mode, "w") == 0) {
> >>         qflags |= BDRV_O_RDWR;
> >>     } else {
> >>
> >>
> >>
> >
> >
> >
> >--
> >Alex Bligh
> >
> >
> 
> 
> 
> -- 
> Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
@ 2013-02-22 18:00                                                 ` Stefano Stabellini
  2013-02-22 19:53                                                 ` Alex Bligh
  1 sibling, 0 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-02-22 18:00 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Xen Devel, Ian Campbell, Alex Bligh, Stefano Stabellini

On Fri, 22 Feb 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 22, 2013 at 05:28:42PM +0000, Alex Bligh wrote:
> > Konrad / Stefano,
> > 
> > Any update here?
> 
> Sorry, been so busy with bugs that this keeps on getting deferred.

I also have been too busy to look at this yet


> > I can't help but think a crashed dom0 and guaranteed domU corruption is
> > less bad than a theoretical data loss on a crash (not that we know
> > that theoretical possibility exists yet).
> 
> So from the perspective of the blkif protocol, as long as the sync requests
> are properly being sent, then we are fine. I recall Stefano (or maybe Roger)
> finding some oddity in xen_disk where the fsync wasn't sent.
> 
> You should be able to test this rather easily by (in your guest)
> mounting an ext3 or ext4 filesystem with barrier support and then looking at
> the blktrace/blkparse output to make sure that the sync commands are indeed
> hitting the platter.
> 
> Though there is probably a nice framework to do all of this automatically.
> Perhaps fio does that already.

Yes, if you can prove that the sync requests are all sent at the right
time and handled properly, then I am OK with the change.


> > I'm currently using this trivial patch
> > https://github.com/abligh/qemu-upstream-4.2-testing-livemigrate/commit/a7d7296aebc21af15074f3bf64c5c6795ca05f16
> > 
> > Alex
> > 
> > 
> > --On 5 February 2013 15:40:33 +0000 Alex Bligh <alex@alex.org.uk> wrote:
> > 
> > >Konrad / Stefano,
> > >
> > >Any movement / ideas on this one?
> > >
> > >Alex
> > >
> > >--On 25 January 2013 11:28:31 +0000 Alex Bligh <alex@alex.org.uk> wrote:
> > >
> > >>Konrad,
> > >>
> > >>--On 23 January 2013 16:29:20 +0000 Stefano Stabellini
> > >><stefano.stabellini@eu.citrix.com> wrote:
> > >>
> > >>>>diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> > >>>>index a402ac8..1c3a6f5 100644
> > >>>>--- a/hw/xen_disk.c
> > >>>>+++ b/hw/xen_disk.c
> > >>>>@@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
> > >>>>     }
> > >>>>
> > >>>>     /* read-only ? */
> > >>>>-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> > >>>>+    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> > >>>>     if (strcmp(blkdev->mode, "w") == 0) {
> > >>>>         qflags |= BDRV_O_RDWR;
> > >>>>     } else {
> > >>>
> > >>>Before going for something like that I would like a confirmation from
> > >>>Konrad about blkfront behavior regarding barriers and
> > >>>BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
> > >>>corruptions.
> > >>
> > >>Any ideas?
> > >>
> > >>
> > >>A slightly prettier patch would look like the one pasted
> > >>below (not sent with git-sendemail so beware whitespace issues).
> > >>
> > >>--
> > >>Alex Bligh
> > >>
> > >>commit a7d7296aebc21af15074f3bf64c5c6795ca05f16
> > >>Author: Alex Bligh <alex@alex.org.uk>
> > >>Date:   Thu Jan 24 09:41:34 2013 +0000
> > >>
> > >>    Disable use of O_DIRECT by default as it results in crashes.
> > >>
> > >>    See:
> > >>      http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> > >>    for more details.
> > >>
> > >>diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> > >>index a402ac8..a618d8d 100644
> > >>--- a/hw/xen_disk.c
> > >>+++ b/hw/xen_disk.c
> > >>@@ -45,6 +45,8 @@ static int batch_maps   = 0;
> > >>
> > >> static int max_requests = 32;
> > >>
> > >>+static int use_o_direct = 0;
> > >>+
> > >> /* ------------------------------------------------------------- */
> > >>
> > >> # define BLOCK_SIZE  512
> > >>@@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
> > >>     }
> > >>
> > >>     /* read-only ? */
> > >>-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> > >>+    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> > >>     if (strcmp(blkdev->mode, "w") == 0) {
> > >>         qflags |= BDRV_O_RDWR;
> > >>     } else {
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >--
> > >Alex Bligh
> > >
> > >
> > 
> > 
> > 
> > -- 
> > Alex Bligh
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
  2013-02-22 18:00                                                 ` Stefano Stabellini
@ 2013-02-22 19:53                                                 ` Alex Bligh
  2013-03-06 11:50                                                   ` Alex Bligh
  1 sibling, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-02-22 19:53 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alex Bligh, Stefano Stabellini, Ian Campbell, Jan Beulich, Xen Devel

Konrad,

--On 22 February 2013 12:41:29 -0500 Konrad Rzeszutek Wilk 
<konrad.wilk@oracle.com> wrote:

> You should be able to test this rather easily by (in your guest)
> mounting an ext3 or ext4 with barrier support and then looking at
> the blktrace/blkparse to make sure that the sync commands are indeed
> hitting the platter.

OK, I will do that.

I take it that it will be sufficient to show:
a) blktrace on the guest performing FUA/FLUSH operations; and
b) blktrace on the host performing FUA/FLUSH operations
in each case where there is an ext4 FS with barrier support turned
on.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-02-22 19:53                                                 ` Alex Bligh
@ 2013-03-06 11:50                                                   ` Alex Bligh
  2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-06 11:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alex Bligh, Stefano Stabellini, Ian Campbell, Jan Beulich, Xen Devel

Konrad,

--On 22 February 2013 19:53:22 +0000 Alex Bligh <alex@alex.org.uk> wrote:

>> You should be able to test this rather easily by (in your guest)
>> mounting an ext3 or ext4 with barrier support and then looking at
>> the blktrace/blkparse to make sure that the sync commands are indeed
>> hitting the platter.
>
> OK, I will do that.
>
> I take it that it will be sufficient to show:
> a) blktrace on the guest performing FUA/FLUSH operations; and
> b) blktrace on the host performing FUA/FLUSH operations
> in each case where there is an ext4 FS with barrier support turned
> on.

Results are positive.

We used the -y flag on sparsecopy
 https://github.com/abligh/sparsecopy

to generate frequent barrier operations on a write to a file. This does
an fdatasync() after a configurable number of bytes. This was run in a VM
with /dev/xvda mapped to a qcow file on a Constellation ES.2 SATA drive
in the host on /dev/sdb (with no other traffic on /dev/sdb).

This is with O_DIRECT switched off.

The sound of the disk is a bit of a give-away that this is working, but for
those who like blktrace / blkparse output. The WBS indicates a barrier write.

I think this indicates barriers are getting through. Correct?

-- 
Alex Bligh

Extract from output of guest VM

202,0    0        1     0.000000000     0  D  WS 1066249 + 8 [swapper/0]
202,0    0        2     0.009730334     0  C  WS 1066249 + 8 [0]
202,0    0        3     0.009737210     0  C  WS 1066249 [0]
202,0    0        4     0.015065483     0  C  WS 2013633 + 32 [0]
202,0    0        5     0.016021243     0  C  WS 1066257 + 40 [0]
202,0    0        6     0.016149561   217  A WBS 1066297 + 8 <- (202,1) 1050232
202,0    0        7     0.016154194   217  Q WBS 1066297 + 8 [(null)]
202,0    0        8     0.016158208   217  G WBS 1066297 + 8 [(null)]
202,0    0        9     0.016162792   217  I WBS 1066297 + 8 [(null)]
202,0    0       10     0.034824799     0  D  WS 1066297 + 8 [swapper/0]
202,0    0       11     0.041799906     0  C  WS 1066297 + 8 [0]
202,0    0       12     0.041807562     0  C  WS 1066297 [0]
202,0    1        1     0.014174798  3601  A  WS 2013633 + 32 <- (202,1) 1997568

Extract from output of host VM

  8,17   1        0     0.205626177     0  m   N cfq1542S / complete rqnoidle 1
  8,17   1        0     0.205630473     0  m   N cfq1542S / set_slice=30
  8,17   1        0     0.205637109     0  m   N cfq1542S / arm_idle: 2 group_idle: 0
  8,17   1        0     0.205638061     0  m   N cfq schedule dispatch
  8,16   1       72     0.205742935  1542  A WBS 1950869136 + 8 <- (8,17) 1950867088
  8,17   1       73     0.205746817  1542  Q  WS 1950869136 + 8 [jbd2/sdb1-8]
  8,17   1       74     0.205754223  1542  G  WS 1950869136 + 8 [jbd2/sdb1-8]
  8,17   1       75     0.205758076  1542  I  WS 1950869136 + 8 [jbd2/sdb1-8]
  8,17   1        0     0.205760996     0  m   N cfq1542S / insert_request
  8,17   1        0     0.205766871     0  m   N cfq1542S / dispatch_insert
  8,17   1        0     0.205770429     0  m   N cfq1542S / dispatched a request
  8,17   1        0     0.205772193     0  m   N cfq1542S / activate rq, drv=1
  8,17   1       76     0.205772854  1542  D  WS 1950869136 + 8 [jbd2/sdb1-8]
  8,17   1        0     0.210488008     0  m   N cfq idle timer fired
^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-03-06 11:50                                                   ` Alex Bligh
@ 2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
  2013-03-07  4:15                                                       ` Stefano Stabellini
  2013-03-07  8:16                                                       ` Alex Bligh
  0 siblings, 2 replies; 91+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-07  1:01 UTC (permalink / raw)
  To: Alex Bligh; +Cc: Stefano Stabellini, Ian Campbell, Jan Beulich, Xen Devel

On Wed, Mar 06, 2013 at 11:50:48AM +0000, Alex Bligh wrote:
> Konrad,
> 
> --On 22 February 2013 19:53:22 +0000 Alex Bligh <alex@alex.org.uk> wrote:
> 
> >>You should be able to test this rather easily by (in your guest)
> >>mounting an ext3 or ext4 with barrier support and then looking at
> >>the blktrace/blkparse to make sure that the sync commands are indeed
> >>hitting the platter.
> >
> >OK, I will do that.
> >
> >I take it that it will be sufficient to show:
> >a) blktrace on the guest performing FUA/FLUSH operations; and
> >b) blktrace on the host performing FUA/FLUSH operations
> >in each case where there is an ext4 FS with barrier support turned
> >on.
> 
> Results are positive.

Great!
> 
> We used the -y flag on sparsecopy
> https://github.com/abligh/sparsecopy
> 
> to generate frequent barrier operations on a write to a file. This does
> an fdatasync() after a configurable number of bytes. This was run in a VM
> with /dev/xvda mapped to a qcow file on a Constellation ES.2 SATA drive
> in the host on /dev/sdb (with no other traffic on /dev/sdb).
> 
> This is with O_DIRECT switched off.
> 
> The sound of the disk is a bit of a give-away that this is working, but for
> those who like blktrace / blkparse output. The WBS indicates a barrier write.
> 
> I think this indicates barriers are getting through. Correct?

Yes, and in the correct order (which is the most important part).

Though it is interesting that the application is doing write and sync
(WS) for other blocks (like 2013633), which I thought would have
been the case if the file was opened with O_SYNC. The sparsecopy
code does not seem to do it?

> 
> -- 
> Alex Bligh
> 
> Extract from output of guest VM
> 
> 202,0    0        1     0.000000000     0  D  WS 1066249 + 8 [swapper/0]
> 202,0    0        2     0.009730334     0  C  WS 1066249 + 8 [0]
> 202,0    0        3     0.009737210     0  C  WS 1066249 [0]
> 202,0    0        4     0.015065483     0  C  WS 2013633 + 32 [0]
> 202,0    0        5     0.016021243     0  C  WS 1066257 + 40 [0]
> 202,0    0        6     0.016149561   217  A WBS 1066297 + 8 <- (202,1) 1050232
> 202,0    0        7     0.016154194   217  Q WBS 1066297 + 8 [(null)]
> 202,0    0        8     0.016158208   217  G WBS 1066297 + 8 [(null)]
> 202,0    0        9     0.016162792   217  I WBS 1066297 + 8 [(null)]
> 202,0    0       10     0.034824799     0  D  WS 1066297 + 8 [swapper/0]
> 202,0    0       11     0.041799906     0  C  WS 1066297 + 8 [0]
> 202,0    0       12     0.041807562     0  C  WS 1066297 [0]
> 202,0    1        1     0.014174798  3601  A  WS 2013633 + 32 <- (202,1) 1997568
> 
> Extract from output of host VM
> 
>  8,17   1        0     0.205626177     0  m   N cfq1542S / complete rqnoidle 1
>  8,17   1        0     0.205630473     0  m   N cfq1542S / set_slice=30
>  8,17   1        0     0.205637109     0  m   N cfq1542S / arm_idle: 2 group_idle: 0
>  8,17   1        0     0.205638061     0  m   N cfq schedule dispatch
>  8,16   1       72     0.205742935  1542  A WBS 1950869136 + 8 <- (8,17) 1950867088
>  8,17   1       73     0.205746817  1542  Q  WS 1950869136 + 8 [jbd2/sdb1-8]
>  8,17   1       74     0.205754223  1542  G  WS 1950869136 + 8 [jbd2/sdb1-8]
>  8,17   1       75     0.205758076  1542  I  WS 1950869136 + 8 [jbd2/sdb1-8]
>  8,17   1        0     0.205760996     0  m   N cfq1542S / insert_request
>  8,17   1        0     0.205766871     0  m   N cfq1542S / dispatch_insert
>  8,17   1        0     0.205770429     0  m   N cfq1542S / dispatched a request
>  8,17   1        0     0.205772193     0  m   N cfq1542S / activate rq, drv=1
>  8,17   1       76     0.205772854  1542  D  WS 1950869136 + 8 [jbd2/sdb1-8]
>  8,17   1        0     0.210488008     0  m   N cfq idle timer fired
> 
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
@ 2013-03-07  4:15                                                       ` Stefano Stabellini
  2013-03-07 10:47                                                         ` [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes Alex Bligh
  2013-03-07 10:51                                                         ` Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
  2013-03-07  8:16                                                       ` Alex Bligh
  1 sibling, 2 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-07  4:15 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Jan Beulich, Stefano Stabellini, Ian Campbell, Alex Bligh, Xen Devel

On Thu, 7 Mar 2013, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 06, 2013 at 11:50:48AM +0000, Alex Bligh wrote:
> > I think this indicates barriers are getting through. Correct?
> 
> Yes, and in the correct order (which is the most important part).

Alex, can you submit a proper patch for QEMU upstream to remove
BDRV_O_NOCACHE from qflags?
Please add an explanation on why you are removing it and a link to this
thread.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
  2013-03-07  4:15                                                       ` Stefano Stabellini
@ 2013-03-07  8:16                                                       ` Alex Bligh
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-07  8:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Alex Bligh, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

Konrad,

--On 6 March 2013 20:01:53 -0500 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:

>> Results are positive.
>
> Great!

Does that mean some sort of switch to turn off O_DIRECT (or turning it off
by default) can go into qemu?

> Thought it is interesting that the application is doing write and sync
> (WS) for other blocks (like 2013633), which I thought would have
> been the case if the file was opened with O_SYNC. The sparsecopy
> code does not seem to do it?

On both host and guest every single write has 'S' set. You are correct that
sparsecopy doesn't open with O_SYNC. Perhaps it's the fdatasync() that is
flushing the writes from the page cache (and that, by definition, blocks),
which causes blktrace to see the 'S' bits? I think it was writing 4k blocks
and syncing every 16k.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-07  4:15                                                       ` Stefano Stabellini
@ 2013-03-07 10:47                                                         ` Alex Bligh
  2013-03-08  3:18                                                           ` Stefano Stabellini
  2013-03-07 10:51                                                         ` Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
  1 sibling, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-07 10:47 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini
  Cc: Jan Beulich, Ian Jackson, Ian Campbell, Alex Bligh,
	Konrad Rzeszutek Wilk

Due to what is almost certainly a kernel bug, writes with
O_DIRECT may continue to reference the page after the write
has been marked as completed, particularly in the case of
TCP retransmit. In other scenarios, this "merely" risks
data corruption on the write, but with Xen, pages from domU
are only transiently mapped into dom0's memory, resulting
in kernel panics when they are subsequently accessed.

See:
  http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 hw/xen_disk.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index a402ac8..a618d8d 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -45,6 +45,8 @@ static int batch_maps   = 0;
 
 static int max_requests = 32;
 
+static int use_o_direct = 0;
+
 /* ------------------------------------------------------------- */
 
 #define BLOCK_SIZE  512
@@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
     }
 
     /* read-only ? */
-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
     } else {
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
  2013-03-07  4:15                                                       ` Stefano Stabellini
  2013-03-07 10:47                                                         ` [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes Alex Bligh
@ 2013-03-07 10:51                                                         ` Alex Bligh
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-07 10:51 UTC (permalink / raw)
  To: Stefano Stabellini, Konrad Rzeszutek Wilk
  Cc: Alex Bligh, Stefano Stabellini, Ian Campbell, Jan Beulich, Xen Devel

Stefano,

--On 7 March 2013 04:15:39 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> Alex, can you submit a proper patch for QEMU upstream to remove
> BDRV_O_NOCACHE from qflags?
> Please add an explanation on why you are removing it and a link to this
> thread.

Done. Note that this applies cleanly to both 4.2 and unstable.

It's exactly the same patch as I've been running with, with the description
expanded a little. I haven't actually tested on unstable (just 4.2) but I can't
see why it wouldn't work.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-07 10:47                                                         ` [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes Alex Bligh
@ 2013-03-08  3:18                                                           ` Stefano Stabellini
  2013-03-08  9:25                                                             ` [PATCHv2] " Alex Bligh
                                                                               ` (2 more replies)
  0 siblings, 3 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-08  3:18 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Ian Jackson, xen-devel, Jan Beulich

On Thu, 7 Mar 2013, Alex Bligh wrote:
> Due to what is almost certainly a kernel bug, writes with
> O_DIRECT may continue to reference the page after the write
> has been marked as completed, particularly in the case of
> TCP retransmit. In other scenarios, this "merely" risks
> data corruption on the write, but with Xen pages from domU
> are only transiently mapped into dom0's memory, resulting
> in kernel panics when they are subsequently accessed.
> 
> See:
>   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> for more details.
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>
> ---
>  hw/xen_disk.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index a402ac8..a618d8d 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -45,6 +45,8 @@ static int batch_maps   = 0;
>  
>  static int max_requests = 32;
>  
> +static int use_o_direct = 0;
> +
>  /* ------------------------------------------------------------- */
>  
>  #define BLOCK_SIZE  512
> @@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
>      }
>  
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> +    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>      if (strcmp(blkdev->mode, "w") == 0) {
>          qflags |= BDRV_O_RDWR;
>      } else {

I would just remove use_o_direct and BDRV_O_NOCACHE altogether

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCHv2] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08  3:18                                                           ` Stefano Stabellini
@ 2013-03-08  9:25                                                             ` Alex Bligh
  2013-03-08  9:26                                                             ` [PATCH] " Alex Bligh
  2013-03-08 10:17                                                             ` George Dunlap
  2 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-08  9:25 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini
  Cc: Jan Beulich, Ian Jackson, Ian Campbell, Alex Bligh,
	Konrad Rzeszutek Wilk

Due to what is almost certainly a kernel bug, writes with
O_DIRECT may continue to reference the page after the write
has been marked as completed, particularly in the case of
TCP retransmit. In other scenarios, this "merely" risks
data corruption on the write, but with Xen, pages from domU
are only transiently mapped into dom0's memory, resulting
in kernel panics when they are subsequently accessed.

See:
  http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 hw/xen_disk.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index a402ac8..14f8723 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
     }
 
     /* read-only ? */
-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
     } else {
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08  3:18                                                           ` Stefano Stabellini
  2013-03-08  9:25                                                             ` [PATCHv2] " Alex Bligh
@ 2013-03-08  9:26                                                             ` Alex Bligh
  2013-03-08 10:17                                                             ` George Dunlap
  2 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-08  9:26 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Stefano Stabellini, Alex Bligh,
	Konrad Rzeszutek Wilk, Ian Jackson, xen-devel, Jan Beulich



--On 8 March 2013 03:18:22 +0000 Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> I would just remove use_o_direct and BDRV_O_NOCACHE altogether

Done.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08  3:18                                                           ` Stefano Stabellini
  2013-03-08  9:25                                                             ` [PATCHv2] " Alex Bligh
  2013-03-08  9:26                                                             ` [PATCH] " Alex Bligh
@ 2013-03-08 10:17                                                             ` George Dunlap
  2013-03-08 10:27                                                               ` Alex Bligh
  2013-03-08 10:28                                                               ` George Dunlap
  2 siblings, 2 replies; 91+ messages in thread
From: George Dunlap @ 2013-03-08 10:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Alex Bligh, Ian Jackson,
	xen-devel, Jan Beulich

On Fri, Mar 8, 2013 at 3:18 AM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 7 Mar 2013, Alex Bligh wrote:
>> Due to what is almost certainly a kernel bug, writes with
>> O_DIRECT may continue to reference the page after the write
>> has been marked as completed, particularly in the case of
>> TCP retransmit. In other scenarios, this "merely" risks
>> data corruption on the write, but with Xen pages from domU
>> are only transiently mapped into dom0's memory, resulting
>> in kernel panics when they are subsequently accessed.
>>
>> See:
>>   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
>> for more details.
>>
>> Signed-off-by: Alex Bligh <alex@alex.org.uk>
>> ---
>>  hw/xen_disk.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
>> index a402ac8..a618d8d 100644
>> --- a/hw/xen_disk.c
>> +++ b/hw/xen_disk.c
>> @@ -45,6 +45,8 @@ static int batch_maps   = 0;
>>
>>  static int max_requests = 32;
>>
>> +static int use_o_direct = 0;
>> +
>>  /* ------------------------------------------------------------- */
>>
>>  #define BLOCK_SIZE  512
>> @@ -603,7 +605,7 @@ static int blk_init(struct XenDevice *xendev)
>>      }
>>
>>      /* read-only ? */
>> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>> +    qflags = (use_o_direct?BDRV_O_NOCACHE:0) | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>>      if (strcmp(blkdev->mode, "w") == 0) {
>>          qflags |= BDRV_O_RDWR;
>>      } else {
>
> I would just remove use_o_direct and BDRV_O_NOCACHE altogether

Wait, aren't O_DIRECT and BDRV_O_NOCACHE required for safety?  That
is, without these flags isn't it possible that the guest OS thinks
that the data has made it onto stable storage, while in fact it's
still in dom0's memory?  Or am I missing something?

 -George

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:17                                                             ` George Dunlap
@ 2013-03-08 10:27                                                               ` Alex Bligh
  2013-03-08 10:35                                                                 ` George Dunlap
  2013-03-08 10:28                                                               ` George Dunlap
  1 sibling, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-08 10:27 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Ian Jackson, xen-devel,
	Jan Beulich, Alex Bligh



--On 8 March 2013 10:17:12 +0000 George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> Wait, aren't O_DIRECT and BDRV_O_NOCACHE required for safety?  That
> is, without these flags isn't it possible that the guest OS thinks
> that the data has made it onto stable storage, while in fact it's
> still in dom0's memory?  Or am I missing something?

That's why Konrad asked me to do the blktrace and check barrier writes
were getting through.

In any case, the argument about safety seems slightly off base when we've
already shown that even with O_DIRECT it's already returning before
the write is actually done (on any form of network based storage), and
moreover can crash dom0 and hence all the guests. If you have network
based storage, not using O_DIRECT is infinitely safer.

I suppose arguably it would be better if this was a configuration knob.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:17                                                             ` George Dunlap
  2013-03-08 10:27                                                               ` Alex Bligh
@ 2013-03-08 10:28                                                               ` George Dunlap
  2013-03-08 10:45                                                                 ` Alex Bligh
  1 sibling, 1 reply; 91+ messages in thread
From: George Dunlap @ 2013-03-08 10:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Alex Bligh, Ian Jackson,
	xen-devel, Jan Beulich

On Fri, Mar 8, 2013 at 10:17 AM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
>> I would just remove use_o_direct and BDRV_O_NOCACHE altogether
>
> Wait, aren't O_DIRECT and BDRV_O_NOCACHE required for safety?  That
> is, without these flags isn't it possible that the guest OS thinks
> that the data has made it onto stable storage, while in fact it's
> still in dom0's memory?  Or am I missing something?

And in any case, if it's a kernel bug it should be fixed in the kernel.

Alex, which dom0 kernel are you using? I actually seem to recall this
being a long-known bug with some versions of the pvops kernels; but I
thought it had long since been fixed.  Konrad / IanC, can you
comment?

Alex, if that's the case and if you're using a distro kernel maybe you
should try to push for a backport?

 -George

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:27                                                               ` Alex Bligh
@ 2013-03-08 10:35                                                                 ` George Dunlap
  2013-03-08 10:50                                                                   ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: George Dunlap @ 2013-03-08 10:35 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Ian Jackson, xen-devel, Jan Beulich

On Fri, Mar 8, 2013 at 10:27 AM, Alex Bligh <alex@alex.org.uk> wrote:
>
>
> --On 8 March 2013 10:17:12 +0000 George Dunlap <George.Dunlap@eu.citrix.com>
> wrote:
>
>> Wait, aren't O_DIRECT and BDRV_O_NOCACHE required for safety?  That
>> is, without these flags isn't it possible that the guest OS thinks
>> that the data has made it onto stable storage, while in fact it's
>> still in dom0's memory?  Or am I missing something?
>
>
> That's why Konrad asked me to do the blktrace and check barrier writes
> were getting through.
>
> In any case, the argument about safety seems slightly off base when we've
> already shown that even with O_DIRECT it's already returning before
> the write is actually done (on any form of network based storage), and
> moreover can crash dom0 and hence all the guests. If you have network
> based storage, not using O_DIRECT is infinitely safer.
>
> I suppose arguably it would be better if this was a configuration knob.

Sorry, obviously I wasn't following that thread.

If you can verify that it's always safe to use qemu without O_DIRECT
(or that it's not any safer to use it), then you should just remove
it; no point in having options which have no practical effect.  But in
that case you should make sure to mention in the commit message why
O_DIRECT isn't actually needed.

 -George

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:28                                                               ` George Dunlap
@ 2013-03-08 10:45                                                                 ` Alex Bligh
  0 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-08 10:45 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Ian Jackson, xen-devel,
	Jan Beulich, Alex Bligh

George,

--On 8 March 2013 10:28:32 +0000 George Dunlap <George.Dunlap@eu.citrix.com> wrote:

>> Wait, aren't O_DIRECT and BDRV_O_NOCACHE required for safety?  That
>> is, without these flags isn't it possible that the guest OS thinks
>> that the data has made it onto stable storage, while in fact it's
>> still in dom0's memory?  Or am I missing something?
>
> And in any case, if it's a kernel bug it should be fixed in the kernel.

Well, in theory yes. In practice it seems to be a very difficult bug to
fix: you need to track skbs.

Here's a set of patches which sort of fix it.

> Alex, which dom0 kernel are you using? I actually seem to recall this
> being a long-known bug with some versions of the pvops kernels; but I
> thought it had long since been fixed.  Kondrad / IanC, can you
> comment?

I'm using Ubuntu Precise's Kernel 3.2.0-32-generic on x86_64. However,
this bug is in every kernel back to (at least) 2007.

This thread is shorter than the one on xen-devel if you want to follow
the history:
  http://comments.gmane.org/gmane.linux.nfs/54325

> Alex, if that's the case and if you're using a distro kernel maybe you
> should try to push for a backport?

That would require it being fixed first!

In our lab, without this patch, we literally cannot boot Ubuntu cloud
images (a standard OS) as a guest on an NFS backend without Xen
crashing horribly. So we have a choice: either we work around the
kernel bug in Xen, or we wait until it's fixed in the kernel and
a very invasive backport is produced. Given the almost total lack
of interest in fixing this in the kernel (save for Ian Campbell's
patches, which by his own admission he hasn't had the time to finish),
and given that it's a one-line fix in qemu, I know which I prefer!

-- 
Alex Bligh


* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:35                                                                 ` George Dunlap
@ 2013-03-08 10:50                                                                   ` Alex Bligh
  2013-03-08 11:18                                                                     ` George Dunlap
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-08 10:50 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Stefano Stabellini, Alex Bligh,
	Konrad Rzeszutek Wilk, Ian Jackson, xen-devel, Jan Beulich



--On 8 March 2013 10:35:43 +0000 George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> If you can verify that it's always safe to use qemu without O_DIRECT
> (or that it's not any safer to use it), then you should just remove
> it; no point in having options which have no practical effect.  But in
> that case you should make sure to mention in the commit message why
> O_DIRECT isn't actually needed.

How about me adding the following:

This brings PV devices in line with emulated devices. Removing O_DIRECT
is safe as barrier operations are now correctly passed through.

-- 
Alex Bligh


* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 10:50                                                                   ` Alex Bligh
@ 2013-03-08 11:18                                                                     ` George Dunlap
  2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
  2013-03-08 11:41                                                                       ` [PATCH] " Alex Bligh
  0 siblings, 2 replies; 91+ messages in thread
From: George Dunlap @ 2013-03-08 11:18 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Ian Jackson, xen-devel, Jan Beulich

On 08/03/13 10:50, Alex Bligh wrote:
>
> --On 8 March 2013 10:35:43 +0000 George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>
>> If you can verify that it's always safe to use qemu without O_DIRECT
>> (or that it's not any safer to use it), then you should just remove
>> it; no point in having options which have no practical effect.  But in
>> that case you should make sure to mention in the commit message why
>> O_DIRECT isn't actually needed.
> How about me adding the following:
>
> This brings PV devices in line with emulated devices. Removing O_DIRECT
> is safe as barrier operations are now correctly passed through.

That sounds good to me -- thanks, Alex.

  -George


* [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 11:18                                                                     ` George Dunlap
@ 2013-03-08 11:40                                                                       ` Alex Bligh
  2013-03-08 12:54                                                                         ` George Dunlap
  2013-03-14 18:37                                                                         ` Stefano Stabellini
  2013-03-08 11:41                                                                       ` [PATCH] " Alex Bligh
  1 sibling, 2 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-08 11:40 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Jan Beulich, Alex Bligh

Due to what is almost certainly a kernel bug, writes with
O_DIRECT may continue to reference the page after the write
has been marked as completed, particularly in the case of
TCP retransmit. In other scenarios, this "merely" risks
data corruption on the write, but with Xen, pages from domU
are only transiently mapped into dom0's memory, resulting
in kernel panics when they are subsequently accessed.

This brings PV devices in line with emulated devices. Removing
O_DIRECT is safe as barrier operations are now correctly passed
through.

See:
  http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
for more details.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
 hw/xen_disk.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index a402ac8..14f8723 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
     }
 
     /* read-only ? */
-    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
     } else {
-- 
1.7.4.1


* Re: [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 11:18                                                                     ` George Dunlap
  2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
@ 2013-03-08 11:41                                                                       ` Alex Bligh
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-08 11:41 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Stefano Stabellini, Alex Bligh,
	Konrad Rzeszutek Wilk, Ian Jackson, xen-devel, Jan Beulich



--On 8 March 2013 11:18:45 +0000 George Dunlap <george.dunlap@eu.citrix.com> wrote:

>> How about me adding the following:
>>
>> This brings PV devices in line with emulated devices. Removing O_DIRECT
>> is safe as barrier operations are now correctly passed through.
>
> That sounds good to me -- thanks, Alex.

Done.

-- 
Alex Bligh


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
@ 2013-03-08 12:54                                                                         ` George Dunlap
  2013-03-11 14:02                                                                           ` Alex Bligh
  2013-03-14 18:37                                                                         ` Stefano Stabellini
  1 sibling, 1 reply; 91+ messages in thread
From: George Dunlap @ 2013-03-08 12:54 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Ian Jackson, xen-devel, Jan Beulich

On Fri, Mar 8, 2013 at 11:40 AM, Alex Bligh <alex@alex.org.uk> wrote:
> Due to what is almost certainly a kernel bug, writes with
> O_DIRECT may continue to reference the page after the write
> has been marked as completed, particularly in the case of
> TCP retransmit. In other scenarios, this "merely" risks
> data corruption on the write, but with Xen pages from domU
> are only transiently mapped into dom0's memory, resulting
> in kernel panics when they are subsequently accessed.
>
> This brings PV devices in line with emulated devices. Removing
> O_DIRECT is safe as barrier operations are now correctly passed
> through.

Not qualified to comment on the technical merits of the patch, but re
the commit message:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

 -George


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 12:54                                                                         ` George Dunlap
@ 2013-03-11 14:02                                                                           ` Alex Bligh
  2013-03-11 14:42                                                                             ` George Dunlap
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-11 14:02 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Stefano Stabellini, Alex Bligh,
	Konrad Rzeszutek Wilk, Ian Jackson, xen-devel, Jan Beulich



--On 8 March 2013 12:54:16 +0000 George Dunlap 
<George.Dunlap@eu.citrix.com> wrote:

> On Fri, Mar 8, 2013 at 11:40 AM, Alex Bligh <alex@alex.org.uk> wrote:
>> Due to what is almost certainly a kernel bug, writes with
>> O_DIRECT may continue to reference the page after the write
>> has been marked as completed, particularly in the case of
>> TCP retransmit. In other scenarios, this "merely" risks
>> data corruption on the write, but with Xen pages from domU
>> are only transiently mapped into dom0's memory, resulting
>> in kernel panics when they are subsequently accessed.
>>
>> This brings PV devices in line with emulated devices. Removing
>> O_DIRECT is safe as barrier operations are now correctly passed
>> through.
>
> Not qualified to comment on the technical merits of the patch, but re
> the commit message:
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Any further thoughts on this one?

-- 
Alex Bligh






* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-11 14:02                                                                           ` Alex Bligh
@ 2013-03-11 14:42                                                                             ` George Dunlap
  2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
  2013-03-12 12:08                                                                               ` Ian Campbell
  0 siblings, 2 replies; 91+ messages in thread
From: George Dunlap @ 2013-03-11 14:42 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, Konrad Rzeszutek Wilk,
	Ian Jackson, xen-devel, Jan Beulich, Anthony Perard

On 11/03/13 14:02, Alex Bligh wrote:
>
> --On 8 March 2013 12:54:16 +0000 George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>
>> On Fri, Mar 8, 2013 at 11:40 AM, Alex Bligh <alex@alex.org.uk> wrote:
>>> Due to what is almost certainly a kernel bug, writes with
>>> O_DIRECT may continue to reference the page after the write
>>> has been marked as completed, particularly in the case of
>>> TCP retransmit. In other scenarios, this "merely" risks
>>> data corruption on the write, but with Xen pages from domU
>>> are only transiently mapped into dom0's memory, resulting
>>> in kernel panics when they are subsequently accessed.
>>>
>>> This brings PV devices in line with emulated devices. Removing
>>> O_DIRECT is safe as barrier operations are now correctly passed
>>> through.
>> Not qualified to comment on the technical merits of the patch, but re
>> the commit message:
>>
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> Any further thoughts on this one?

I believe that Stefano is the person to commit it, but he's on holiday 
for two weeks.

  -George


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-11 14:42                                                                             ` George Dunlap
@ 2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
  2013-03-11 17:55                                                                                 ` Ian Jackson
  2013-03-12 12:08                                                                               ` Ian Campbell
  1 sibling, 1 reply; 91+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-11 17:48 UTC (permalink / raw)
  To: George Dunlap
  Cc: Ian Campbell, Stefano Stabellini, Alex Bligh, Ian Jackson,
	xen-devel, Jan Beulich, Anthony Perard

On Mon, Mar 11, 2013 at 02:42:21PM +0000, George Dunlap wrote:
> On 11/03/13 14:02, Alex Bligh wrote:
> >
> >--On 8 March 2013 12:54:16 +0000 George Dunlap
> ><George.Dunlap@eu.citrix.com> wrote:
> >
> >>On Fri, Mar 8, 2013 at 11:40 AM, Alex Bligh <alex@alex.org.uk> wrote:
> >>>Due to what is almost certainly a kernel bug, writes with
> >>>O_DIRECT may continue to reference the page after the write
> >>>has been marked as completed, particularly in the case of
> >>>TCP retransmit. In other scenarios, this "merely" risks
> >>>data corruption on the write, but with Xen pages from domU
> >>>are only transiently mapped into dom0's memory, resulting
> >>>in kernel panics when they are subsequently accessed.
> >>>
> >>>This brings PV devices in line with emulated devices. Removing
> >>>O_DIRECT is safe as barrier operations are now correctly passed
> >>>through.
> >>Not qualified to comment on the technical merits of the patch, but re
> >>the commit message:
> >>
> >>Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> >Any further thoughts on this one?
> 
> I believe that Stefano is the person to commit it, but he's on
> holiday for two weeks.

Who is his backup?
> 
>  -George


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
@ 2013-03-11 17:55                                                                                 ` Ian Jackson
  2013-03-14 17:06                                                                                   ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Jackson @ 2013-03-11 17:55 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Ian Campbell, Stefano Stabellini, George Dunlap, xen-devel,
	Jan Beulich, Anthony Perard, Alex Bligh

Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> On Mon, Mar 11, 2013 at 02:42:21PM +0000, George Dunlap wrote:
> > I believe that Stefano is the person to commit it, but he's on
> > holiday for two weeks.
> 
> Who is his backup?

I think it will have to be me.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-11 14:42                                                                             ` George Dunlap
  2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
@ 2013-03-12 12:08                                                                               ` Ian Campbell
  1 sibling, 0 replies; 91+ messages in thread
From: Ian Campbell @ 2013-03-12 12:08 UTC (permalink / raw)
  To: George Dunlap
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, Stefano Stabellini,
	Ian Jackson, xen-devel, Jan Beulich, Anthony Perard

On Mon, 2013-03-11 at 14:42 +0000, George Dunlap wrote:
> On 11/03/13 14:02, Alex Bligh wrote:
> >
> > --On 8 March 2013 12:54:16 +0000 George Dunlap
> > <George.Dunlap@eu.citrix.com> wrote:
> >
> >> On Fri, Mar 8, 2013 at 11:40 AM, Alex Bligh <alex@alex.org.uk> wrote:
> >>> Due to what is almost certainly a kernel bug, writes with
> >>> O_DIRECT may continue to reference the page after the write
> >>> has been marked as completed, particularly in the case of
> >>> TCP retransmit. In other scenarios, this "merely" risks
> >>> data corruption on the write, but with Xen pages from domU
> >>> are only transiently mapped into dom0's memory, resulting
> >>> in kernel panics when they are subsequently accessed.
> >>>
> >>> This brings PV devices in line with emulated devices. Removing
> >>> O_DIRECT is safe as barrier operations are now correctly passed
> >>> through.
> >> Not qualified to comment on the technical merits of the patch, but re
> >> the commit message:
> >>
> >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> > Any further thoughts on this one?
> 
> I believe that Stefano is the person to commit it, but he's on holiday 
> for two weeks.

I don't think it's that long -- IIRC he's due back tomorrow (although
perhaps not in the office until Thursday).

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-11 17:55                                                                                 ` Ian Jackson
@ 2013-03-14 17:06                                                                                   ` Alex Bligh
  2013-03-14 18:26                                                                                     ` Ian Jackson
  0 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-14 17:06 UTC (permalink / raw)
  To: Ian Jackson, Konrad Rzeszutek Wilk
  Cc: Ian Campbell, Stefano Stabellini, George Dunlap, xen-devel,
	Jan Beulich, Anthony Perard, Alex Bligh



--On 11 March 2013 17:55:06 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> 
wrote:

> Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [PATCHv3] QEMU(upstream):
> Disable xen's use of O_DIRECT by default as it results in crashes."):
>> On Mon, Mar 11, 2013 at 02:42:21PM +0000, George Dunlap wrote:
>> > I believe that Stefano is the person to commit it, but he's on
>> > holiday for two weeks.
>>
>> Who is his backup?
>
> I think it will have to be me.

Any update on this?

-- 
Alex Bligh


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-14 17:06                                                                                   ` Alex Bligh
@ 2013-03-14 18:26                                                                                     ` Ian Jackson
  0 siblings, 0 replies; 91+ messages in thread
From: Ian Jackson @ 2013-03-14 18:26 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, George Dunlap,
	Konrad Rzeszutek Wilk, xen-devel, Jan Beulich, Anthony Perard

Alex Bligh writes ("Re: [Xen-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> --On 11 March 2013 17:55:06 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> 
> wrote:
> > I think it will have to be me.
> 
> Any update on this?

As Ian C wrote, Stefano is in fact back today.  I'll draw his
attention to this thread.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
  2013-03-08 12:54                                                                         ` George Dunlap
@ 2013-03-14 18:37                                                                         ` Stefano Stabellini
  2013-03-14 19:30                                                                           ` Ian Jackson
  1 sibling, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-14 18:37 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Stefano Stabellini, George Dunlap,
	Konrad Rzeszutek Wilk, Ian Jackson, xen-devel, Jan Beulich

On Fri, 8 Mar 2013, Alex Bligh wrote:
> Due to what is almost certainly a kernel bug, writes with
> O_DIRECT may continue to reference the page after the write
> has been marked as completed, particularly in the case of
> TCP retransmit. In other scenarios, this "merely" risks
> data corruption on the write, but with Xen pages from domU
> are only transiently mapped into dom0's memory, resulting
> in kernel panics when they are subsequently accessed.
> 
> This brings PV devices in line with emulated devices. Removing
> O_DIRECT is safe as barrier operations are now correctly passed
> through.
> 
> See:
>   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> for more details.
> 
> Signed-off-by: Alex Bligh <alex@alex.org.uk>


Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

>  hw/xen_disk.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index a402ac8..14f8723 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>      }
>  
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> +    qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>      if (strcmp(blkdev->mode, "w") == 0) {
>          qflags |= BDRV_O_RDWR;
>      } else {
> -- 
> 1.7.4.1
> 


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-14 18:37                                                                         ` Stefano Stabellini
@ 2013-03-14 19:30                                                                           ` Ian Jackson
  2013-03-14 19:56                                                                             ` Alex Bligh
  2013-03-15  9:28                                                                             ` Ian Campbell
  0 siblings, 2 replies; 91+ messages in thread
From: Ian Jackson @ 2013-03-14 19:30 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, xen-devel,
	Jan Beulich, Alex Bligh

Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> On Fri, 8 Mar 2013, Alex Bligh wrote:
> > Due to what is almost certainly a kernel bug, writes with
> > O_DIRECT may continue to reference the page after the write
> > has been marked as completed, particularly in the case of
> > TCP retransmit. In other scenarios, this "merely" risks
> > data corruption on the write, but with Xen pages from domU
> > are only transiently mapped into dom0's memory, resulting
> > in kernel panics when they are subsequently accessed.
> > 
> > This brings PV devices in line with emulated devices. Removing
> > O_DIRECT is safe as barrier operations are now correctly passed
> > through.
> > 
> > See:
> >   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> > for more details.
> > 
> > Signed-off-by: Alex Bligh <alex@alex.org.uk>
> 
> 
> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Marvellous, pushed to staging/qemu-upstream-unstable.git

Thanks,
Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-14 19:30                                                                           ` Ian Jackson
@ 2013-03-14 19:56                                                                             ` Alex Bligh
  2013-03-15  9:28                                                                             ` Ian Campbell
  1 sibling, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-14 19:56 UTC (permalink / raw)
  To: Ian Jackson, Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, xen-devel,
	Jan Beulich, Alex Bligh

Ian, Stefano,

--On 14 March 2013 19:30:22 +0000 Ian Jackson <Ian.Jackson@eu.citrix.com> 
wrote:

>> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>
> Marvellous, pushed to staging/qemu-upstream-unstable.git

Thanks!

-- 
Alex Bligh


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-14 19:30                                                                           ` Ian Jackson
  2013-03-14 19:56                                                                             ` Alex Bligh
@ 2013-03-15  9:28                                                                             ` Ian Campbell
  2013-03-15 10:43                                                                               ` Stefano Stabellini
  1 sibling, 1 reply; 91+ messages in thread
From: Ian Campbell @ 2013-03-15  9:28 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, xen-devel, Jan Beulich

On Thu, 2013-03-14 at 19:30 +0000, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> > On Fri, 8 Mar 2013, Alex Bligh wrote:
> > > Due to what is almost certainly a kernel bug, writes with
> > > O_DIRECT may continue to reference the page after the write
> > > has been marked as completed, particularly in the case of
> > > TCP retransmit. In other scenarios, this "merely" risks
> > > data corruption on the write, but with Xen pages from domU
> > > are only transiently mapped into dom0's memory, resulting
> > > in kernel panics when they are subsequently accessed.
> > > 
> > > This brings PV devices in line with emulated devices. Removing
> > > O_DIRECT is safe as barrier operations are now correctly passed
> > > through.
> > > 
> > > See:
> > >   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> > > for more details.
> > > 
> > > Signed-off-by: Alex Bligh <alex@alex.org.uk>
> > 
> > 
> > Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> Marvellous, pushed to staging/qemu-upstream-unstable.git

Isn't one of the guiding principles of this tree that it should contain
only backports from upstream qemu.git#master? In order to avoid
regressions in the future, accidental forking, etc etc.

I don't see this commit in upstream qemu.git and AFAICT this patch
wasn't even sent to qemu-devel.

There may well be exceptions to this e.g. where qemu-devel is slow and
the matter is urgent or where the issue only pertains to the stable
branch but I don't think there are any exceptional circumstances here.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15  9:28                                                                             ` Ian Campbell
@ 2013-03-15 10:43                                                                               ` Stefano Stabellini
  2013-03-15 11:21                                                                                 ` Ian Jackson
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-15 10:43 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Alex Bligh, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, Ian Jackson, xen-devel, Jan Beulich

On Fri, 15 Mar 2013, Ian Campbell wrote:
> On Thu, 2013-03-14 at 19:30 +0000, Ian Jackson wrote:
> > Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> > > On Fri, 8 Mar 2013, Alex Bligh wrote:
> > > > Due to what is almost certainly a kernel bug, writes with
> > > > O_DIRECT may continue to reference the page after the write
> > > > has been marked as completed, particularly in the case of
> > > > TCP retransmit. In other scenarios, this "merely" risks
> > > > data corruption on the write, but with Xen pages from domU
> > > > are only transiently mapped into dom0's memory, resulting
> > > > in kernel panics when they are subsequently accessed.
> > > > 
> > > > This brings PV devices in line with emulated devices. Removing
> > > > O_DIRECT is safe as barrier operations are now correctly passed
> > > > through.
> > > > 
> > > > See:
> > > >   http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> > > > for more details.
> > > > 
> > > > Signed-off-by: Alex Bligh <alex@alex.org.uk>
> > > 
> > > 
> > > Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > 
> > Marvellous, pushed to staging/qemu-upstream-unstable.git
> 
> Isn't one of the guiding principles of this tree that it should contain
> only backports from upstream qemu.git#master? In order to avoid
> regressions in the future, accidental forking, etc etc.
> 
> I don't see this commit in upstream qemu.git and AFAICT this patch
> wasn't even sent to qemu-devel.
> 
> There may well be exceptions to this e.g. where qemu-devel is slow and
> the matter is urgent or where the issue only pertains to the stable
> branch but I don't think there are any exceptional circumstances here.

I agree. My intention was to send the patch upstream and then backport
it to qemu-upstream-unstable.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 10:43                                                                               ` Stefano Stabellini
@ 2013-03-15 11:21                                                                                 ` Ian Jackson
  2013-03-15 11:28                                                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Jackson @ 2013-03-15 11:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, xen-devel,
	Jan Beulich, Alex Bligh

Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> On Fri, 15 Mar 2013, Ian Campbell wrote:
> > There may well be exceptions to this e.g. where qemu-devel is slow and
> > the matter is urgent or where the issue only pertains to the stable
> > branch but I don't think there are any exceptional circumstances here.
> 
> I agree. My intention was to send the patch upstream and then backport
> it to qemu-upstream-unstable.

Oh.  I must have misunderstood our conversation on IRC:

18:27 <Diziet> stefano_s: See "Disable xen's use of O_DIRECT by
               default" from Alex Bligh.
18:27 <Diziet> Or tell me to go ahead :-).
18:38 <stefano_s> Diziet:  go ahead. I'll submit a pull request
                  for that patch to QEMU upstream in the next few days
18:39 <Diziet> OK.  Shall I put your ack on it ?
18:40 <stefano_s> Diziet: yes, I just sent an email with it
18:41 <Diziet> Ta

I thought you were telling me to commit it to your tree.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:21                                                                                 ` Ian Jackson
@ 2013-03-15 11:28                                                                                   ` Stefano Stabellini
  2013-03-15 11:37                                                                                     ` Ian Jackson
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-15 11:28 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Alex Bligh,
	Stefano Stabellini, xen-devel, George Dunlap, Jan Beulich

On Fri, 15 Mar 2013, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> > On Fri, 15 Mar 2013, Ian Campbell wrote:
> > > There may well be exceptions to this e.g. where qemu-devel is slow and
> > > the matter is urgent or where the issue only pertains to the stable
> > > branch but I don't think there are any exceptional circumstances here.
> > 
> > I agree. My intention was to send the patch upstream and then backport
> > it to qemu-upstream-unstable.
> 
> Oh.  I must have misunderstood our conversation on IRC:
> 
> 18:27 <Diziet> stefano_s: See "Disable xen's use of O_DIRECT by
>                default" from Alex Bligh.
> 18:27 <Diziet> Or tell me to go ahead :-).
> 18:38 <stefano_s> Diziet:  go ahead. I'll submit a pull request
>                   for that patch to QEMU upstream in the next few days
> 18:39 <Diziet> OK.  Shall I put your ack on it ?
> 18:40 <stefano_s> Diziet: yes, I just sent an email with it
> 18:41 <Diziet> Ta
> 
> I thought you were telling me to commit it to your tree.

Right, we misunderstood each other.
I thought you were going to commit it to qemu-xen-unstable.git


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:28                                                                                   ` Stefano Stabellini
@ 2013-03-15 11:37                                                                                     ` Ian Jackson
  2013-03-15 11:43                                                                                       ` Stefano Stabellini
  0 siblings, 1 reply; 91+ messages in thread
From: Ian Jackson @ 2013-03-15 11:37 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, xen-devel,
	Jan Beulich, Alex Bligh

Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> On Fri, 15 Mar 2013, Ian Jackson wrote:
> > I thought you were telling me to commit it to your tree.
> 
> Right, we misunderstood each other.
> I thought you were going to commit it to qemu-xen-unstable.git

Oh I see.  I thought we already had the equivalent there.  Let me
check...

...no, we don't.  The code there is somewhat different.  How about
the patch below ?

Stefano, shall I revert the commit in qemu-upstream-unstable ?

Ian.

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index 33a5531..ee8d36f 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -635,7 +635,7 @@ static int blk_init(struct XenDevice *xendev)
 	return -1;
 
     /* read-only ? */
-    qflags = BDRV_O_NOCACHE;
+    qflags = BDRV_O_CACHE_WB;
     if (strcmp(blkdev->mode, "w") == 0) {
 	mode   = O_RDWR;
 	qflags |= BDRV_O_RDWR;


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:37                                                                                     ` Ian Jackson
@ 2013-03-15 11:43                                                                                       ` Stefano Stabellini
  2013-03-15 12:43                                                                                         ` Alex Bligh
                                                                                                           ` (2 more replies)
  0 siblings, 3 replies; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-15 11:43 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, Alex Bligh,
	Stefano Stabellini, xen-devel, George Dunlap, Jan Beulich

On Fri, 15 Mar 2013, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> > On Fri, 15 Mar 2013, Ian Jackson wrote:
> > > I thought you were telling me to commit it to your tree.
> > 
> > Right, we misunderstood each other.
> > I thought you were going to commit it to qemu-xen-unstable.git
> 
> Oh I see.  I thought we already had the equivalent there.  Let me
> check...
> 
> ...no, we don't.  The code there is somewhat different.  How about
> the patch below ?

Yes, the patch below should do it.

> Stefano, shall I revert the commit in qemu-upstream-unstable ?

No, no need. After all I think it's going to be committed to upstream
QEMU without any issues.



> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index 33a5531..ee8d36f 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -635,7 +635,7 @@ static int blk_init(struct XenDevice *xendev)
>  	return -1;
>  
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE;
> +    qflags = BDRV_O_CACHE_WB;
>      if (strcmp(blkdev->mode, "w") == 0) {
>  	mode   = O_RDWR;
>  	qflags |= BDRV_O_RDWR;
> 


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:43                                                                                       ` Stefano Stabellini
@ 2013-03-15 12:43                                                                                         ` Alex Bligh
  2013-03-15 12:50                                                                                           ` Ian Campbell
  2013-03-15 18:31                                                                                         ` Ian Jackson
  2013-03-18 10:29                                                                                         ` Alex Bligh
  2 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-15 12:43 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Jackson
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, xen-devel, Jan Beulich, Alex Bligh



--On 15 March 2013 11:43:33 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>> Stefano, shall I revert the commit in qemu-upstream-unstable ?
>
> No, no need. After all I think it's going to be committed to upstream
> QEMU without any issues.

This seems to have made it out of staging/qemu-upstream-unstable.git
into the normal qemu-upstream-unstable.git.

Can I propose it for inclusion into staging/qemu-upstream-4.2-testing.git ?

-- 
Alex Bligh


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 12:43                                                                                         ` Alex Bligh
@ 2013-03-15 12:50                                                                                           ` Ian Campbell
  0 siblings, 0 replies; 91+ messages in thread
From: Ian Campbell @ 2013-03-15 12:50 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Konrad Rzeszutek Wilk, George Dunlap, Stefano Stabellini,
	Ian Jackson, xen-devel, Jan Beulich

On Fri, 2013-03-15 at 12:43 +0000, Alex Bligh wrote:
> Can I propose it for inclusion into staging/qemu-upstream-4.2-testing.git ?

I think we definitely want to wait for it to be accepted by upstream
qemu.git before compounding the existing mistake any further.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:43                                                                                       ` Stefano Stabellini
  2013-03-15 12:43                                                                                         ` Alex Bligh
@ 2013-03-15 18:31                                                                                         ` Ian Jackson
  2013-03-18 10:29                                                                                         ` Alex Bligh
  2 siblings, 0 replies; 91+ messages in thread
From: Ian Jackson @ 2013-03-15 18:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap, xen-devel,
	Jan Beulich, Alex Bligh

Stefano Stabellini writes ("Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes."):
> Yes, the patch below should do it.

I've committed that to qemu-xen-unstable.git.  We should backport it
to 4.2 when it's out of staging.

Ian.


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-15 11:43                                                                                       ` Stefano Stabellini
  2013-03-15 12:43                                                                                         ` Alex Bligh
  2013-03-15 18:31                                                                                         ` Ian Jackson
@ 2013-03-18 10:29                                                                                         ` Alex Bligh
  2013-03-18 11:47                                                                                           ` Stefano Stabellini
  2 siblings, 1 reply; 91+ messages in thread
From: Alex Bligh @ 2013-03-18 10:29 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Jackson
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, xen-devel, Jan Beulich, Alex Bligh

Stefano,

--On 15 March 2013 11:43:33 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>> Stefano, shall I revert the commit in qemu-upstream-unstable ?
>
> No, no need. After all I think it's going to be committed to upstream
> QEMU without any issues.

Would it be helpful for me to send this to qemu-devel (in which case
I presume you are OK with me transcribing your signed-off tag
from this list), or are you planning to issue a pull request?

-- 
Alex Bligh


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-18 10:29                                                                                         ` Alex Bligh
@ 2013-03-18 11:47                                                                                           ` Stefano Stabellini
  2013-03-18 12:21                                                                                             ` Alex Bligh
  0 siblings, 1 reply; 91+ messages in thread
From: Stefano Stabellini @ 2013-03-18 11:47 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, Ian Jackson, xen-devel, Jan Beulich

On Mon, 18 Mar 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 15 March 2013 11:43:33 +0000 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> >> Stefano, shall I revert the commit in qemu-upstream-unstable ?
> >
> > No, no need. After all I think it's going to be committed to upstream
> > QEMU without any issues.
> 
> Would it be helpful for me to send this to qemu-devel (in which case
> I presume you are OK with me transcribing your signed-off tag
> from this list), or are you planning to issue a pull request?

Yes, it would be helpful, but you need to add my acked-by, not my
signed-off-by.
Thanks,

Stefano


* Re: [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
  2013-03-18 11:47                                                                                           ` Stefano Stabellini
@ 2013-03-18 12:21                                                                                             ` Alex Bligh
  0 siblings, 0 replies; 91+ messages in thread
From: Alex Bligh @ 2013-03-18 12:21 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Ian Campbell, Konrad Rzeszutek Wilk, George Dunlap,
	Stefano Stabellini, Ian Jackson, xen-devel, Jan Beulich,
	Alex Bligh



--On 18 March 2013 11:47:47 +0000 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>> Would it be helpful for me to send this to qemu-devel (in which case
>> I presume you are OK with me transcribing your signed-off tag
>> from this list), or are you planning to issue a pull request?
>
> Yes, it would be helpful,

Done

> but you need to add my acked-by, not my signed-off-by.

Sorry - that's what I meant.

-- 
Alex Bligh


end of thread, other threads:[~2013-03-18 12:21 UTC | newest]

Thread overview: 91+ messages
2012-12-14 14:54 Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
2012-12-17 10:10 ` Jan Beulich
2012-12-17 17:09   ` Alex Bligh
2013-01-16 10:56   ` Alex Bligh
2013-01-16 14:34     ` Stefano Stabellini
2013-01-16 15:06       ` Alex Bligh
2013-01-16 16:00         ` Alex Bligh
2013-01-16 16:27         ` Stefano Stabellini
2013-01-16 17:13           ` Alex Bligh
2013-01-16 17:33             ` Stefano Stabellini
2013-01-16 17:39               ` Stefano Stabellini
2013-01-16 18:14                 ` Alex Bligh
2013-01-16 18:49                   ` Stefano Stabellini
2013-01-16 19:00                     ` Stefano Stabellini
2013-01-17  7:58                       ` Alex Bligh
2013-01-16 18:12               ` Alex Bligh
2013-01-21 15:15               ` Alex Bligh
2013-01-21 15:23                 ` Ian Campbell
2013-01-21 15:35                   ` Alex Bligh
2013-01-21 15:50                     ` Ian Campbell
2013-01-21 16:33                       ` Alex Bligh
2013-01-21 16:51                         ` Ian Campbell
2013-01-21 17:06                           ` Alex Bligh
2013-01-21 17:29                             ` Ian Campbell
2013-01-21 17:31                           ` Alex Bligh
2013-01-21 17:32                             ` Ian Campbell
2013-01-21 18:14                               ` Alex Bligh
2013-01-22 10:05                                 ` Ian Campbell
2013-01-22 13:02                                   ` Alex Bligh
2013-01-22 13:13                                     ` Ian Campbell
2013-01-21 20:37                           ` Alex Bligh
2013-01-22 10:07                             ` Ian Campbell
2013-01-22 13:01                               ` Alex Bligh
2013-01-22 13:14                                 ` Ian Campbell
2013-01-22 13:18                                   ` Alex Bligh
2013-01-22 10:13                             ` Ian Campbell
2013-01-22 12:59                               ` Alex Bligh
2013-01-22 15:46                                 ` Stefano Stabellini
2013-01-22 15:42                             ` Stefano Stabellini
2013-01-22 16:09                               ` Stefano Stabellini
2013-01-22 20:31                                 ` Alex Bligh
2013-01-23 11:52                                   ` Stefano Stabellini
2013-01-23 15:19                                     ` Alex Bligh
2013-01-23 16:29                                       ` Stefano Stabellini
2013-01-25 11:28                                         ` Alex Bligh
2013-02-05 15:40                                           ` Alex Bligh
2013-02-22 17:28                                             ` Alex Bligh
2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
2013-02-22 18:00                                                 ` Stefano Stabellini
2013-02-22 19:53                                                 ` Alex Bligh
2013-03-06 11:50                                                   ` Alex Bligh
2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
2013-03-07  4:15                                                       ` Stefano Stabellini
2013-03-07 10:47                                                         ` [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes Alex Bligh
2013-03-08  3:18                                                           ` Stefano Stabellini
2013-03-08  9:25                                                             ` [PATCHv2] " Alex Bligh
2013-03-08  9:26                                                             ` [PATCH] " Alex Bligh
2013-03-08 10:17                                                             ` George Dunlap
2013-03-08 10:27                                                               ` Alex Bligh
2013-03-08 10:35                                                                 ` George Dunlap
2013-03-08 10:50                                                                   ` Alex Bligh
2013-03-08 11:18                                                                     ` George Dunlap
2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
2013-03-08 12:54                                                                         ` George Dunlap
2013-03-11 14:02                                                                           ` Alex Bligh
2013-03-11 14:42                                                                             ` George Dunlap
2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
2013-03-11 17:55                                                                                 ` Ian Jackson
2013-03-14 17:06                                                                                   ` Alex Bligh
2013-03-14 18:26                                                                                     ` Ian Jackson
2013-03-12 12:08                                                                               ` Ian Campbell
2013-03-14 18:37                                                                         ` Stefano Stabellini
2013-03-14 19:30                                                                           ` Ian Jackson
2013-03-14 19:56                                                                             ` Alex Bligh
2013-03-15  9:28                                                                             ` Ian Campbell
2013-03-15 10:43                                                                               ` Stefano Stabellini
2013-03-15 11:21                                                                                 ` Ian Jackson
2013-03-15 11:28                                                                                   ` Stefano Stabellini
2013-03-15 11:37                                                                                     ` Ian Jackson
2013-03-15 11:43                                                                                       ` Stefano Stabellini
2013-03-15 12:43                                                                                         ` Alex Bligh
2013-03-15 12:50                                                                                           ` Ian Campbell
2013-03-15 18:31                                                                                         ` Ian Jackson
2013-03-18 10:29                                                                                         ` Alex Bligh
2013-03-18 11:47                                                                                           ` Stefano Stabellini
2013-03-18 12:21                                                                                             ` Alex Bligh
2013-03-08 11:41                                                                       ` [PATCH] " Alex Bligh
2013-03-08 10:28                                                               ` George Dunlap
2013-03-08 10:45                                                                 ` Alex Bligh
2013-03-07 10:51                                                         ` Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
2013-03-07  8:16                                                       ` Alex Bligh
