From: Christoph Hellwig <hch@lst.de>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Hellwig <hch@lst.de>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: Notes on block I/O data integrity
Date: Thu, 27 Aug 2009 15:42:39 +0200
Message-ID: <20090827134239.GA13479@lst.de>
In-Reply-To: <200908272021.56219.rusty@rustcorp.com.au>

On Thu, Aug 27, 2009 at 08:21:55PM +0930, Rusty Russell wrote:
> >  - virtio-blk needs to advertise ordered queue by default.
> >    This makes cache=writethrough safe on virtio.
> 
> From a guest POV, that's "we don't know, let's say we're ordered because that
> may make us safer".  Of course, it may not help: how much does it cost to
> drain the queue?
> 
> The bug, IMHO is that we *should* know.  And in future I'd like to fix that,
> either by adding a VIRTIO_BLK_F_ORDERED feature, or a VIRTIO_BLK_F_UNORDERED
> feature.
> 
> > Action plan for QEMU:
> > 
> >  - IDE needs to set the write cache enabled bit
> >  - virtio needs to implement a cache flush command and advertise it
> >    (also needs a small change to the host driver)
> 
> So, virtio-blk needs to be enhanced for this as well.

Really, enabling volatile write caches without advertising a cache flush
command is a bug in the storage, where in our case qemu is the storage.
So I don't really see the need for two feature bits.  Here's my plan for
virtio-blk:


 - add a new VIRTIO_BLK_F_WCACHE feature.  If this feature is set we
   do
     (a) implement the prepare_flush queue operation to send a
         standalone cache flush
     (b) set a proper barrier ordering flag on the queue

	Now I'm not entirely sure which queue ordering feature we will
	use.  It is not going to be QUEUE_ORDERED_TAG as for
	VIRTIO_BLK_F_BARRIER as that leaves all the queue draining to
	the host.  For any host that uses something resembling POSIX
	I/O as a backend and has more than one outstanding command at a
	time, that just means duplicating all the queue management we
	already do in the guest for no gain.
	The easiest one would be QUEUE_ORDERED_DRAIN_FLUSH, in which
	case the cache flush command really is everything we need.
	As a slight optimization of it we could make it
	QUEUE_ORDERED_DRAIN_FUA which still does all the queue draining
	in the guest, but only sends one explicit cache flush before the
	barrier and then sets the FUA bit on the actual barrier
	request.  In qemu we still would implement this as fdatasync
	before and after the request, but we would save one protocol
	roundtrip.
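
To make the difference between the two drain modes concrete, here is a
rough sketch of what the host side would see for one barrier write.  It
is only an illustration: the Backend class, its counters and the two
barrier_* helpers are made-up names, not qemu or kernel code, and the
guest is assumed to have drained its queue before calling them.

```python
import os
import tempfile

# os.fdatasync is not available everywhere; fall back to fsync if missing.
_fdatasync = getattr(os, "fdatasync", os.fsync)


class Backend:
    """Toy host-side backend: a file plus counters (hypothetical names)."""

    def __init__(self, fd):
        self.fd = fd
        self.roundtrips = 0  # requests the guest sent to the host
        self.syncs = 0       # fdatasync calls the host issued

    def _sync(self):
        _fdatasync(self.fd)
        self.syncs += 1

    def flush(self):
        """Standalone cache flush command."""
        self.roundtrips += 1
        self._sync()

    def write(self, offset, data, fua=False):
        """Write request; with fua=True the data is synced before completion."""
        self.roundtrips += 1
        os.pwrite(self.fd, data, offset)
        if fua:
            self._sync()


def barrier_drain_flush(be, offset, data):
    # DRAIN_FLUSH style: flush, barrier write, flush (three requests).
    be.flush()
    be.write(offset, data)
    be.flush()


def barrier_drain_fua(be, offset, data):
    # DRAIN_FUA style: flush, then barrier write with FUA set (two requests).
    be.flush()
    be.write(offset, data, fua=True)


def demo():
    """Return {mode: (roundtrips, syncs)} for a single barrier write."""
    results = {}
    modes = (("drain_flush", barrier_drain_flush),
             ("drain_fua", barrier_drain_fua))
    for name, barrier in modes:
        with tempfile.TemporaryFile() as f:
            be = Backend(f.fileno())
            barrier(be, 0, b"x" * 512)
            results[name] = (be.roundtrips, be.syncs)
    return results
```

Both variants end up with an fdatasync before and after the barrier
write; the FUA variant just gets there with one request fewer.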

Now the big question is when do we set the VIRTIO_BLK_F_WCACHE feature.
The proper thing to do would be to set it for cache=writeback and
cache=none, because they do need the fdatasync, and not for
cache=writethrough because it does not require it.
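
As a sketch of that rule, the host could derive the feature bit from
the cache mode roughly like this.  The bit value and the helper name
are placeholders for illustration only; no bit number has been
assigned to VIRTIO_BLK_F_WCACHE here.

```python
# Placeholder bit position, chosen only for this example.
VIRTIO_BLK_F_WCACHE = 1 << 0

# Whether the host has to fdatasync before guest writes are stable.
NEEDS_FDATASYNC = {
    "writeback": True,
    "none": True,
    "writethrough": False,
}


def host_features(cache_mode):
    """Return the feature bits to advertise for a given cache= mode."""
    if cache_mode not in NEEDS_FDATASYNC:
        raise ValueError("unknown cache mode: %r" % (cache_mode,))
    return VIRTIO_BLK_F_WCACHE if NEEDS_FDATASYNC[cache_mode] else 0
```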

Now Avi is a big advocate of the idea that cache=writethrough should
mean "go fast and loose" and not care about data integrity.  There's a
certain point to that, as I don't really see a good use case for that
mode, but I really hate to make something unsafe that doesn't
explicitly say so in the option name.

The complex (not to say over-engineered) version would be to split
the caching and data integrity setting into two options:


 (1) hostcache=on|off
 	use buffered vs O_DIRECT I/O
 (2) integrity=osync|fsync|none
 	use O_SYNC, use f(data)sync or do not care about data integrity
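
A minimal sketch of how those two options could translate into open(2)
flags and flush behaviour follows.  The function names are invented
for the example, and the LEGACY_CACHE_MODES table is only one reading
of how the existing cache= modes would map onto the split options, not
something that has been decided.

```python
import os

# O_DIRECT is Linux-specific; use 0 where the platform lacks a flag.
O_DIRECT = getattr(os, "O_DIRECT", 0)
O_SYNC = getattr(os, "O_SYNC", 0)


def open_flags(hostcache, integrity):
    """Return open(2) flags for the given hostcache/integrity settings."""
    if hostcache not in ("on", "off"):
        raise ValueError("hostcache must be on or off")
    if integrity not in ("osync", "fsync", "none"):
        raise ValueError("integrity must be osync, fsync or none")
    flags = os.O_RDWR
    if hostcache == "off":
        flags |= O_DIRECT  # bypass the host page cache
    if integrity == "osync":
        flags |= O_SYNC    # every write is stable on completion
    return flags


def flush_needs_fdatasync(integrity):
    """Only integrity=fsync turns a guest cache flush into fdatasync."""
    return integrity == "fsync"


# Assumed mapping of the current cache= modes onto (hostcache, integrity).
LEGACY_CACHE_MODES = {
    "writethrough": ("on", "osync"),
    "writeback": ("on", "fsync"),
    "none": ("off", "fsync"),
}
```

With this split, a "fast and loose" setup would be spelled out as
hostcache=on with integrity=none instead of hiding behind an existing
mode name.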

