From: Christoph Hellwig <hch@lst.de>
To: Nikola Ciprich <extmaillist@linuxbox.cz>
Cc: Christoph Hellwig <hch@lst.de>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	rusty@rustcorp.com.au, nikola.ciprich@linuxbox.cz,
	kopi@linuxbox.cz
Subject: Re: Notes on block I/O data integrity
Date: Thu, 27 Aug 2009 02:15:42 +0200	[thread overview]
Message-ID: <20090827001542.GA8317@lst.de> (raw)
In-Reply-To: <20090825202508.GB4807@nik-comp.linuxbox.cz>

[-- Attachment #1: Type: text/plain, Size: 2238 bytes --]

On Tue, Aug 25, 2009 at 10:25:08PM +0200, Nikola Ciprich wrote:
> Hello Christoph,
> 
> thanks a lot for this overview, it answers a lot of my questions!
> May I suggest you put it somewhere on the wiki so it doesn't get
> forgotten on the mailing list only?

I'd rather try to get the worst issues fixed ASAP.

> It also raises a few new questions though. We have experienced postgresql
> database corruptions lately, two times to be exact. The first time, I blamed
> a server crash, but lately a (freshly created) database got corrupted for the
> second time and there were no crashes since the initialisation. The server
> hardware is surely OK. I didn't have much time to look into this
> yet, but your mail just poked me to return to the subject. The situation
> is a bit more complex, as there are two additional layers of storage there:
> we're using SATA/SAS drives, network-mirrored by DRBD, clustered LVM on top
> of those, and finally qemu-kvm using virtio on top of the created logical
> volumes. So there are plenty of possible culprits, but your mention of virtio
> unsafeness while using cache=writethrough (which is the default for drive
> types other than qcow) leads me to suspect that this might be the reason for
> the problem. Databases are sensitive to request reordering, so I guess
> using virtio for postgres storage was quite stupid of me :(
> So my question is, could you please advise me a bit on the storage
> configuration? virtio performed much better than SCSI, but of course
> data integrity is crucial, so would you suggest using SCSI instead?
> DRBD doesn't have a problem with barriers, and clustered LVM SHOULD not
> have problems with them, as we're using just striped volumes, but I'll
> check to be sure. So is it safe for me to keep cache=writethrough
> for the database volume?

I'm pretty sure one of the many layers in your setup will not pass
through write barriers, so definitely make sure your write caches are
disabled.  Also, right now virtio is not a good idea for data integrity.
The guest-side fix for a setup with cache=writethrough or cache=none
on a block device without a volatile disk write cache is however a trivial
one-line patch I've already submitted.  I've attached it below for
reference:


[-- Attachment #2: virtio-blk-drain --]
[-- Type: text/plain, Size: 1896 bytes --]

Subject: [PATCH] virtio-blk: set QUEUE_ORDERED_DRAIN by default
From: Christoph Hellwig <hch@lst.de>

Currently virtio-blk doesn't set any QUEUE_ORDERED_ flag by default, which
means it does not allow filesystems to use barriers.  But the typical use
case for virtio-blk is to use a backend that uses synchronous I/O, and in
that case we can simply set QUEUE_ORDERED_DRAIN to make the block layer
drain the request queue around barrier I/O and provide the semantics that
the filesystems need.  This is what the SCSI disk driver does for disks
that have the write cache disabled.

With this patch we incorrectly advertise barrier support if someone
configures qemu with write-back caching.  While this displays wrong
information in the guest, there is nothing the guest could have done
even if we rightfully told it that we do not support any barriers.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2009-08-20 17:41:37.019718433 -0300
+++ linux-2.6/drivers/block/virtio_blk.c	2009-08-20 17:45:40.511747922 -0300
@@ -336,9 +336,16 @@ static int __devinit virtblk_probe(struc
 	vblk->disk->driverfs_dev = &vdev->dev;
 	index++;
 
-	/* If barriers are supported, tell block layer that queue is ordered */
+	/*
+	 * If barriers are supported, tell block layer that queue is ordered.
+	 *
 +	 * If no barriers are supported, assume the host uses synchronous
 +	 * writes and just drain the queue before and after the barrier.
+	 */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_BARRIER))
 		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_TAG, NULL);
+	else
+		blk_queue_ordered(vblk->disk->queue, QUEUE_ORDERED_DRAIN, NULL);
 
 	/* If disk is read-only in the host, the guest should obey */
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))

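For reference, the host-side precautions recommended above (disabling the volatile disk write cache and choosing a qemu cache mode that avoids the host page cache) might look roughly like the following sketch.  The device name and logical-volume path are placeholders, and whether hdparm or sdparm applies depends on whether the disk is SATA or SAS:

```shell
# Query and disable the volatile write cache on a SATA disk
# (for SAS/SCSI disks use sdparm instead; /dev/sda is a placeholder).
hdparm -W /dev/sda        # show the current write-cache setting
hdparm -W 0 /dev/sda      # turn the write cache off

# SAS/SCSI equivalent: clear the Write Cache Enable bit.
sdparm --set WCE=0 /dev/sdb

# Start the guest so writes bypass the host page cache; the clustered-LVM
# volume path is a placeholder for your actual postgres data volume.
qemu-kvm -drive file=/dev/vg0/pg-data,if=virtio,cache=none
```

With the write cache off on every disk in the stack, the DRAIN ordering in the patch above is sufficient: once a request completes it is on stable storage, so draining the queue around a barrier gives the filesystem the ordering it asked for.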