From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, rusty@rustcorp.com.au
Subject: Re: [Qemu-devel] Notes on block I/O data integrity
Date: Thu, 27 Aug 2009 15:09:51 +0100
Message-ID: <20090827140951.GA31453@shareable.org>
In-Reply-To: <20090825181120.GA4863@lst.de>

Christoph Hellwig wrote:
> As various people wanted to know how the various data integrity patches
> I've sent out recently play together, here's a small writeup on what
> issues we have in QEMU and how to fix them:

Thanks for taking this on.  Both this email and the one on
linux-fsdevel about Linux behaviour are wonderfully clear summaries of
the issues.

> Action plan for QEMU:
>
>  - IDE needs to set the write cache enabled bit
>  - virtio needs to implement a cache flush command and advertise it
>    (also needs a small change to the host driver)

With IDE and SCSI, and perhaps virtio-blk, guests should also be able
to disable the "write cache enabled" bit, and that should be
equivalent to the guest issuing a cache flush command after every
write.

At the host it could be implemented as if every write were followed by
a flush, or by switching to O_DSYNC (cache=writethrough) in response.

The other way around: for guests where integrity isn't required
(e.g. disposable guests for testing, or for speed during guest OS
installs), you might want an option to ignore cache flush commands:
just let the guest *think* it's committing to disk, but don't waste
time doing that on the host.
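
Something along these lines ("ignore_flush" is a made-up name here,
not an existing option; just a sketch of the idea):

#include <stdbool.h>
#include <unistd.h>

struct blk_backend {
    int  fd;
    bool ignore_flush;   /* set for disposable / install-time guests */
};

/* Complete the guest's flush immediately; data may be lost if the
 * host crashes, which is the trade-off being chosen here. */
static int backend_flush(struct blk_backend *b)
{
    if (b->ignore_flush)
        return 0;
    return fdatasync(b->fd);
}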

> For disks using volatile write caches, the cache flush is implemented by
> a protocol-specific request, and barrier requests are implemented
> by performing cache flushes before and after the barrier request, in
> addition to the draining mentioned above.  The second cache flush can be
> replaced by setting the "Force Unit Access" bit on the barrier request 
> on modern disks.

For fdatasync (etc), you've probably noticed that it only needs one
cache flush by itself, no second request or FUA write.
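
To spell out the two sequences, with fdatasync() standing in for the
protocol-level cache flush (an analogy, not the real block-layer code):

#include <sys/types.h>
#include <unistd.h>

/* Barrier write: flush older writes, write the barrier data, then flush
 * again so the barrier data itself is stable.  The second flush is the
 * one an FUA write would make unnecessary. */
static int barrier_write(int fd, const void *buf, size_t len, off_t off)
{
    if (fdatasync(fd) < 0)
        return -1;
    if (pwrite(fd, buf, len, off) < 0)
        return -1;
    return fdatasync(fd);
}

/* A bare fdatasync carries no data of its own, so one flush is enough. */
static int plain_sync(int fd)
{
    return fdatasync(fd);
}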

Less obviously, there are opportunities to merge and reorder around
non-barrier flush requests in the elevator, and to eliminate redundant
flush requests.
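
For example (a sketch of the idea, not the elevator's real data
structures): if nothing has been written since the last flush
completed, the next flush can be answered without touching the device
at all.

#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

struct flush_state {
    int  fd;
    bool dirty;          /* a write completed since the last flush */
};

static ssize_t tracked_pwrite(struct flush_state *s, const void *buf,
                              size_t len, off_t off)
{
    ssize_t ret = pwrite(s->fd, buf, len, off);
    if (ret >= 0)
        s->dirty = true;
    return ret;
}

static int tracked_flush(struct flush_state *s)
{
    if (!s->dirty)
        return 0;                /* redundant flush, skip it */
    if (fdatasync(s->fd) < 0)
        return -1;
    s->dirty = false;
    return 0;
}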

Also you don't need flushes to reach every backing drive on RAID, but
knowing which ones to leave out is tricky and needs more hints from
the filesystem.

I agree with the whole of your general plan, both in QEMU and in Linux
as a host.  Spot on!

-- Jamie
