From: Stefano Stabellini
Subject: Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
Date: Wed, 23 Jan 2013 16:29:20 +0000
To: Alex Bligh
Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Wed, 23 Jan 2013, Alex Bligh wrote:
> Ian, Stefano,
>
> First bit of good news: avoiding the use of O_DIRECT in xen_disk.c
> fixes the problem (see patch below for what we tested).
>
> Therefore I think this problem will potentially occur:
>
> * when using qemu-xen or qemu-traditional (both of which use O_DIRECT);
>
> * with anything that uses a grant of another domX's memory (I
>   suspect that includes PV guests as well as HVM);
>
> * whenever dom0 uses TCP to the backing device. That includes
>   NFS and iSCSI.
>
> Disabling O_DIRECT is therefore a workaround, but I suspect not an
> attractive one performance-wise. Fixing it properly would appear to
> require Ian C's skbuff page tracking work.
>
> >> I suspect the best possible answer is a cache=[qemu-cache-identifier]
> >> config key, which gets put in xenstore, and which xen_disk.c then
> >> interprets using the same routine QEMU itself uses for cache= on the
> >> command line, then uses exactly those BDRV flags.
> >
> > that would be nice to have
>
> Good.
>
> >> For completeness one could also add an emulcache= option and just
> >> pass that straight through to the qemu command line for the emulated
> >> drives.
> >
> > In qemu-xen it should already be possible to change the cache setting
> > for the IDE disk by passing the right command line option to QEMU. Not
> > ideal, but it would work.
>
> It would indeed, but 'cache=writeback' is currently hard-coded. I was
> planning to change that to 'cache=%s' and pass through the value of
> emulcache=.
>
> >> I had a quick look at this on the train and it appears that doing it
> >> properly requires fiddling with the lex file and possibly xenstore,
> >> i.e. it is not completely trivial.
> >>
> >> An alternative, more disgusting arrangement would be to overload one
> >> of the existing options.
> >>
> >> What's the right approach here, and have I any hope of getting a patch
> >> into 4.2.2? Not a disaster if not, as I already need to maintain a
> >> local tree due to the absence of HVM live migration on qemu-upstream.
> >
> > Backports are only for bug fixes.
>
> Sure - and it would seem to me that dom0 crashing horribly is a bug! I
> have a reliable way to replicate it, but it can happen to others too.
>
> In terms of a local fix, a one-line change to xen_disk is all I need.
>
> What I'm asking is: what is the right way to fix this bug in 4.2?
> At one extreme, I can probably cook up a 'let's do it properly' patch
> which adds the proposed cache= option (most useful, most intrusive).
> At the other extreme, we could just apply the patch below, or a variant
> of it conditional on a configure flag (least useful, least intrusive).
>
> > However, it might be possible to backport a simple patch that, in
> > case of opening failure with O_DIRECT, tries again without it (see
> > appended). I don't think that would work for you though, unless you
> > manage to mount NFS in a way that would return EINVAL on
> > open(O_DIRECT), like it used to.
>
> I don't think that will work for any of the problem cases I've identified
> above, as NFS always permits O_DIRECT opens (I'm not even sure whether
> CONFIG_NFS_DIRECTIO still exists), and so does iSCSI.
>
> --
> Alex Bligh
>
> diff --git a/hw/xen_disk.c b/hw/xen_disk.c
> index a402ac8..1c3a6f5 100644
> --- a/hw/xen_disk.c
> +++ b/hw/xen_disk.c
> @@ -603,7 +603,7 @@ static int blk_init(struct XenDevice *xendev)
>      }
>
>      /* read-only ? */
> -    qflags = BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
> +    qflags = /* BDRV_O_NOCACHE | */ BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
>      if (strcmp(blkdev->mode, "w") == 0) {
>          qflags |= BDRV_O_RDWR;
>      } else {

Before going for something like that I would like confirmation from
Konrad about blkfront behavior regarding barriers and
BLKIF_OP_FLUSH_DISKCACHE. I certainly don't want to risk data
corruption.
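For reference, the retry-without-O_DIRECT fallback discussed above amounts to something like the sketch below. This is an illustration only, not the patch Stefano appended: `open_maybe_direct` is a hypothetical name, and the real change would retry the bdrv_open() in xen_disk.c without BDRV_O_NOCACHE rather than use a raw open(2).

```c
#define _GNU_SOURCE         /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: attempt the open with O_DIRECT first, and if
 * the filesystem rejects the flag with EINVAL (as NFS once did, and
 * as tmpfs still does), retry a plain buffered open instead. */
static int open_maybe_direct(const char *path, int flags, mode_t mode)
{
    int fd = open(path, flags | O_DIRECT, mode);
    if (fd < 0 && errno == EINVAL) {
        /* O_DIRECT unsupported here; fall back to the page cache */
        fd = open(path, flags, mode);
    }
    return fd;
}
```

As Alex notes, on current NFS (and iSCSI) the O_DIRECT open simply succeeds, so the EINVAL fallback never triggers and the problematic direct-I/O path is still taken.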