From: Alex Bligh
Subject: Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
Date: Wed, 06 Mar 2013 11:50:48 +0000
To: Konrad Rzeszutek Wilk
Cc: Alex Bligh, Stefano Stabellini, Ian Campbell, Jan Beulich, Xen Devel
List-Id: xen-devel@lists.xenproject.org

Konrad,

--On 22 February 2013 19:53:22 +0000 Alex Bligh wrote:

>> You should be able to test this rather easily by (in your guest)
>> mounting an ext3 or ext4 filesystem with barrier support and then
>> looking at the blktrace/blkparse output to make sure that the sync
>> commands are indeed hitting the platter.
>
> OK, I will do that.
>
> I take it that it will be sufficient to show:
> a) blktrace on the guest performing FUA/FLUSH operations; and
> b) blktrace on the host performing FUA/FLUSH operations
> in each case where there is an ext4 FS with barrier support turned on.

The results are positive.

We used the -y flag of sparsecopy (https://github.com/abligh/sparsecopy)
to generate frequent barrier operations while writing to a file; this
issues an fdatasync() after a configurable number of bytes. It was run
in a VM with /dev/xvda mapped to a qcow file on a Constellation ES.2
SATA drive (/dev/sdb) in the host, with no other traffic on /dev/sdb,
and with O_DIRECT switched off.

The sound of the disk is a bit of a give-away that this is working, but
blktrace / blkparse output is below for those who prefer it; a WBS entry
indicates a barrier write.

I think this indicates barriers are getting through. Correct?
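For anyone who wants to reproduce this without sparsecopy, the write
pattern its -y flag produces amounts to something like the following
minimal sketch (illustrative only, not sparsecopy's actual code; the
file name and sizes are invented):

/*
 * Illustrative sketch of the technique described above: write a file
 * in chunks and call fdatasync() every SYNC_BYTES bytes. On an ext4
 * filesystem with barriers enabled, each fdatasync() should produce
 * the flush/barrier requests visible as WBS entries in the traces
 * below. File name and sizes are made up.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHUNK      4096               /* bytes per write()        */
#define SYNC_BYTES (1024 * 1024)      /* fdatasync() every 1 MiB  */
#define TOTAL      (64L * 1024 * 1024)

int main(void)
{
    char buf[CHUNK];
    long written = 0, since_sync = 0;
    int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 0xaa, sizeof(buf));

    while (written < TOTAL) {
        if (write(fd, buf, CHUNK) != CHUNK) {
            perror("write");
            return 1;
        }
        written += CHUNK;
        since_sync += CHUNK;
        if (since_sync >= SYNC_BYTES) {
            if (fdatasync(fd) < 0) {      /* generates the barrier */
                perror("fdatasync");
                return 1;
            }
            since_sync = 0;
        }
    }
    fdatasync(fd);                        /* final barrier */
    close(fd);
    return 0;
}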
--
Alex Bligh

Extract from output of guest VM:

202,0    0        1     0.000000000     0  D  WS 1066249 + 8 [swapper/0]
202,0    0        2     0.009730334     0  C  WS 1066249 + 8 [0]
202,0    0        3     0.009737210     0  C  WS 1066249 [0]
202,0    0        4     0.015065483     0  C  WS 2013633 + 32 [0]
202,0    0        5     0.016021243     0  C  WS 1066257 + 40 [0]
202,0    0        6     0.016149561   217  A WBS 1066297 + 8 <- (202,1) 1050232
202,0    0        7     0.016154194   217  Q WBS 1066297 + 8 [(null)]
202,0    0        8     0.016158208   217  G WBS 1066297 + 8 [(null)]
202,0    0        9     0.016162792   217  I WBS 1066297 + 8 [(null)]
202,0    0       10     0.034824799     0  D  WS 1066297 + 8 [swapper/0]
202,0    0       11     0.041799906     0  C  WS 1066297 + 8 [0]
202,0    0       12     0.041807562     0  C  WS 1066297 [0]
202,0    1        1     0.014174798  3601  A  WS 2013633 + 32 <- (202,1) 1997568

Extract from output of host VM:

8,17   1        0     0.205626177     0  m   N cfq1542S / complete rqnoidle 1
8,17   1        0     0.205630473     0  m   N cfq1542S / set_slice=30
8,17   1        0     0.205637109     0  m   N cfq1542S / arm_idle: 2 group_idle: 0
8,17   1        0     0.205638061     0  m   N cfq schedule dispatch
8,16   1       72     0.205742935  1542  A WBS 1950869136 + 8 <- (8,17) 1950867088
8,17   1       73     0.205746817  1542  Q  WS 1950869136 + 8 [jbd2/sdb1-8]
8,17   1       74     0.205754223  1542  G  WS 1950869136 + 8 [jbd2/sdb1-8]
8,17   1       75     0.205758076  1542  I  WS 1950869136 + 8 [jbd2/sdb1-8]
8,17   1        0     0.205760996     0  m   N cfq1542S / insert_request
8,17   1        0     0.205766871     0  m   N cfq1542S / dispatch_insert
8,17   1        0     0.205770429     0  m   N cfq1542S / dispatched a request
8,17   1        0     0.205772193     0  m   N cfq1542S / activate rq, drv=1
8,17   1       76     0.205772854  1542  D  WS 1950869136 + 8 [jbd2/sdb1-8]
8,17   1        0     0.210488008     0  m   N cfq idle timer fired