From: Jamie Lokier
Subject: Re: [Qemu-devel] Notes on block I/O data integrity
Date: Thu, 27 Aug 2009 15:09:51 +0100
Message-ID: <20090827140951.GA31453@shareable.org>
In-Reply-To: <20090825181120.GA4863@lst.de>
References: <20090825181120.GA4863@lst.de>
To: Christoph Hellwig
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, rusty@rustcorp.com.au

Christoph Hellwig wrote:
> As various people wanted to know how the various data integrity patches
> I've sent out recently play together, here's a small writeup on what
> issues we have in QEMU and how to fix them:

Thanks for taking this on.  Both this email and the one on linux-fsdevel
about Linux behaviour are wonderfully clear summaries of the issues.

> Action plan for QEMU:
>
>  - IDE needs to set the write cache enabled bit
>  - virtio needs to implement a cache flush command and advertise it
>    (also needs a small change to the host driver)

With IDE and SCSI, and perhaps virtio-blk, guests should also be able to
disable the "write cache enabled" bit, and that should be equivalent to
the guest issuing a cache flush command after every write.  At the host
it could be implemented as if every write were followed by a flush, or
by switching to O_DSYNC (cache=writethrough) in response.  (A rough
sketch of both approaches is appended at the end of this mail.)

The other way around: for guests where integrity isn't required
(e.g. disposable guests for testing - or speed during guest OS
installs), you might want an option to ignore cache flush commands -
just let the guest *think* it's committing to disk, but don't waste
time doing that on the host.  (Also sketched below.)

> For disks using volatile write caches, the cache flush is implemented
> by a protocol-specific request, and the barrier requests are
> implemented by performing cache flushes before and after the barrier
> request, in addition to the draining mentioned above.  The second
> cache flush can be replaced by setting the "Force Unit Access" bit on
> the barrier request on modern disks.

For fdatasync (etc), you've probably noticed that it only needs one
cache flush by itself, no second request or FUA write.

Less obviously, there are opportunities to merge and reorder around
non-barrier flush requests in the elevator, and to eliminate redundant
flush requests.  Also, you don't need flushes to reach every backing
drive on RAID, but knowing which ones to leave out is tricky and needs
more hints from the filesystem.

I agree with the whole of your general plan, both in QEMU and in Linux
as a host.  Spot on!
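
To make the write-cache point concrete, here is a rough host-side
sketch of both approaches in plain POSIX C - the function names are
made up for illustration, this is not actual QEMU code:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int wce_enabled = 1;   /* guest's view of the write cache bit */

    /* With WCE cleared, behave as if the guest flushed after every
     * write: one fdatasync() per completed write. */
    static int backend_pwrite(int fd, const void *buf, size_t len, off_t off)
    {
        if (pwrite(fd, buf, len, off) != (ssize_t)len)
            return -1;                  /* short write treated as error */
        if (!wce_enabled && fdatasync(fd) < 0)
            return -1;
        return 0;
    }

    /* Alternative: when the guest clears WCE, reopen the image O_DSYNC
     * once, i.e. flip the backend to cache=writethrough behaviour. */
    static int reopen_writethrough(const char *image_path)
    {
        return open(image_path, O_RDWR | O_DSYNC);
    }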
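
The "don't bother" mode for disposable guests is even simpler - a
hypothetical ignore_flush option under which the guest's flush command
succeeds immediately without touching the host disk:

    static int ignore_flush;      /* set from a config option */

    static int handle_guest_flush(int fd)
    {
        if (ignore_flush)
            return 0;             /* pretend it hit the platter */
        return fdatasync(fd);     /* one flush is all fdatasync needs */
    }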
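
And for the virtio item in your action plan, the guest-visible change
is just a feature bit plus a new request type that carries no data
buffer.  Roughly, in the style of linux/virtio_blk.h - I'm assuming
the names and values here, your patches may differ:

    #include <linux/types.h>

    #define VIRTIO_BLK_F_FLUSH  9   /* host advertises: flush supported */
    #define VIRTIO_BLK_T_FLUSH  4   /* request type: flush volatile cache */

    struct virtio_blk_outhdr {
            __u32 type;             /* VIRTIO_BLK_T_FLUSH for a flush */
            __u32 ioprio;
            __u64 sector;           /* unused (0) for a flush */
    };

A flush request is then just that header, no data descriptor, and the
usual one-byte status for the device side to fill in once it has
pushed its volatile cache to stable storage.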
-- 
Jamie