From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, rusty@rustcorp.com.au
Subject: Re: [Qemu-devel] Notes on block I/O data integrity
Date: Thu, 27 Aug 2009 15:09:51 +0100
Message-ID: <20090827140951.GA31453@shareable.org>
In-Reply-To: <20090825181120.GA4863@lst.de>

Christoph Hellwig wrote:
> As various people wanted to know how the various data integrity patches
> I've sent out recently play together, here's a small writeup on what
> issues we have in QEMU and how to fix them:

Thanks for taking this on.  Both this email and the one on linux-fsdevel
about Linux behaviour are wonderfully clear summaries of the issues.

> Action plan for QEMU:
>
>  - IDE needs to set the write cache enabled bit
>  - virtio needs to implement a cache flush command and advertise it
>    (also needs a small change to the host driver)

With IDE and SCSI, and perhaps virtio-blk, guests should also be able
to disable the "write cache enabled" bit, and that should be equivalent
to the guest issuing a cache flush command after every write.  At the
host it could be implemented as if every write were followed by a
flush, or by switching to O_DSYNC (cache=writethrough) in response.

The other way around: for guests where integrity isn't required
(e.g. disposable guests for testing, or for speed during guest OS
installs), you might want an option to ignore cache flush commands -
just let the guest *think* it's committing to disk, but don't waste
time actually doing that on the host.

> For disks using volatile write caches, the cache flush is implemented
> by a protocol specific request, and the barrier requests are
> implemented by performing cache flushes before and after the barrier
> request, in addition to the draining mentioned above.  The second
> cache flush can be replaced by setting the "Force Unit Access" bit on
> the barrier request on modern disks.
For fdatasync (etc), you've probably noticed that it only needs one
cache flush by itself, no second request or FUA write.

Less obviously, there are opportunities to merge and reorder around
non-barrier flush requests in the elevator, and to eliminate redundant
flush requests.  Also you don't need flushes to reach every backing
drive on RAID, but knowing which ones to leave out is tricky and needs
more hints from the filesystem.

I agree with the whole of your general plan, both in QEMU and in Linux
as a host.  Spot on!

-- Jamie