From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
Date: Tue, 22 Jan 2013 15:42:18 +0000
References: <5B4525F296F6ABEB38B0E614@nimrod.local>
 <50CEFDA602000078000B0B11@nat28.tlf.novell.com>
 <3B1D0701EAEA6532CEA91EA0@Ximines.local>
 <77822E2DDAEA8F94631B6A52@Ximines.local>
 <1358781790.3279.224.camel@zakaz.uk.xensource.com>
 <1358783420.3279.235.camel@zakaz.uk.xensource.com>
 <1358787073.3279.257.camel@zakaz.uk.xensource.com>
 <19EA31DDC3BEF4D66B42CBAC@Ximines.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <19EA31DDC3BEF4D66B42CBAC@Ximines.local>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Alex Bligh
Cc: Konrad Wilk, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Mon, 21 Jan 2013, Alex Bligh wrote:
> Ian, Stefano,
>
> --On 21 January 2013 16:51:13 +0000 Ian Campbell wrote:
>
> > Not as far as I know, but Trond zero-copy == O_DIRECT so if you aren't
> > using O_DIRECT then you aren't using zero copy -- and that agrees with
> > my recollection. In that case your issue is something totally unrelated.
>
> Further investigation suggests that Stefano's commit
> 47982cb00584371928e44ab6dfc6865d605a52fd
> (attached below) may have somewhat surprising results.
>
> Firstly, changing the cache=writeback settings as passed to the QEMU
> command line probably only affects emulated disks, as the parameters
> for the PV disk appear to be hard coded per this commit, assuming I've
> understood correctly. I am guessing my fiddling with the cache=
> setting merely caused the emulated disk (used in HVM until the kernel
> has loaded) to break.

That is correct.

> Secondly, the chosen mode of cache operation is:
>   BDRV_O_NOCACHE | BDRV_O_CACHE_WB
> This appears to be the same as "cache=none" produces (see code
> fragment from bdrv_parse_cache_flags below), which is somewhat
> counterintuitive given the name of the second flag. "cache=writeback"
> (as appears on the command line) uses BDRV_O_CACHE_WB only.
>
> BDRV_O_NOCACHE appears to map on Linux to O_DIRECT, and BDRV_O_CACHE_WB
> to writeback caching. This implies O_DIRECT will always be used. This
> is somewhat surprising as qemu by default only uses O_DIRECT with
> cache=none, and yet the emulated devices are set up with the
> equivalent of cache=writeback.

Yes, it is counterintuitive, but you got it right:

BDRV_O_NOCACHE | BDRV_O_CACHE_WB

means O_DIRECT.

> But this would explain why I'm still seeing the crash with O_DIRECT
> apparently off (cache=writeback), as the cache setting is being ignored.
>
> This would also explain why Ian might not have seen it (it went in
> late and without O_DIRECT we think this crash can't happen).
>
> Is the BDRV_O_NOCACHE | BDRV_O_CACHE_WB combination intentional or
> should BDRV_O_NOCACHE be removed? Why would the default be different
> for emulated and PV disks?

The setting is different from the one of emulated devices because after
analyzing the IDE code, we thought that using BDRV_O_CACHE_WB would be
safe enough because when the guest wants to make sure that the data hits
the disk, it issues an IDE FLUSH_CACHE operation.
In the xen_disk case instead, we weren't quite sure about the
assumptions of all the possible different PV frontend drivers, so we
went for the safe choice, that is O_DIRECT.
In fact if we wanted to change the cache setting for xen_disk, we would
probably have to go back to write-through (this setting is selected by
passing neither BDRV_O_NOCACHE nor BDRV_O_CACHE_WB) that is quite slow.
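
For reference, the three flag combinations discussed above can be
summarised in a small sketch. This is not the QEMU source: the helper
names cache_mode_to_flags() and uses_o_direct() are hypothetical, the
numeric flag values are assumptions chosen for illustration (check
block.h in your tree), and only the modes mentioned in this thread are
handled.

```c
#include <string.h>

/* Assumed values for illustration only; the real ones live in QEMU's block.h. */
#define BDRV_O_NOCACHE  0x0020  /* bypass the host page cache (O_DIRECT on Linux) */
#define BDRV_O_CACHE_WB 0x0040  /* write-back caching */

/*
 * Hypothetical helper: translate a cache= mode name into flag bits.
 * Returns 0 on success, -1 for a mode this sketch does not know.
 */
static int cache_mode_to_flags(const char *mode, int *flags)
{
    /* Start from a clean slate for the two cache-related bits. */
    *flags &= ~(BDRV_O_NOCACHE | BDRV_O_CACHE_WB);

    if (strcmp(mode, "none") == 0) {
        /* The combination hard coded for xen_disk: O_DIRECT. */
        *flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    } else if (strcmp(mode, "writeback") == 0) {
        /* What the emulated IDE disks use: host page cache, no O_DIRECT. */
        *flags |= BDRV_O_CACHE_WB;
    } else if (strcmp(mode, "writethrough") == 0) {
        /* Neither flag set. */
    } else {
        return -1;
    }
    return 0;
}

/* O_DIRECT is used exactly when BDRV_O_NOCACHE is set. */
static int uses_o_direct(int flags)
{
    return (flags & BDRV_O_NOCACHE) != 0;
}
```

Under these assumptions, only "none" yields a combination for which
uses_o_direct() is true, which matches the observation that changing
cache= on the command line does not affect the PV disk.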

Recently, thanks to Konrad's work on blkfront cache flushes, a new flush
operation has been implemented in the block protocol:
BLKIF_OP_FLUSH_DISKCACHE.
BLKIF_OP_FLUSH_DISKCACHE was introduced in xen_disk by
7e7b7cba16faa7b721b822fa9ed8bebafa35700f "xen_disk: implement
BLKIF_OP_FLUSH_DISKCACHE, remove BLKIF_OP_WRITE_BARRIER".

Thanks to the new operation, maybe it is now safe to use write-back
caching.

Konrad, what do you think?
Is blkback using the Linux disk cache by default? Or is it using
O_DIRECT?