All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH, RFC 0/4] allow guest control of the volatile write cache
@ 2011-03-15 14:10 Christoph Hellwig
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, prerna@linux.vnet.ibm.com Anthony Liguori

This series adds support for the guest to control the volatile write
cache setting on disks exported by qemu for ide and virtio.
For ide it just wires up the existing SETFEATURES calls, and for virtio
it adds a new writeable config space field.  SCSI is not supported at
this point, as the convoluted callback mess in the SCSI stack doesn't
allow commands except for plain WRITEs to read data from guest memory.

The backend is based on the code that Prerna posted a while ago,
and not Stefan's /proc based variant.  I'm open to either one, but
the problem with the /proc based one is that it's Linux-specific.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-15 14:10 [Qemu-devel] [PATCH, RFC 0/4] allow guest control of the volatile write cache Christoph Hellwig
@ 2011-03-15 14:11 ` Christoph Hellwig
  2011-03-16  9:42   ` [Qemu-devel] " Stefan Hajnoczi
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
  2011-03-15 14:16   ` [Qemu-devel] " Christoph Hellwig
  2 siblings, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, prerna@linux.vnet.ibm.com Anthony Liguori

Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
but no writeback semantics.  All existing callers are changed to also
specify BDRV_O_CACHE_WB to give them writeback semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-03-08 20:11:08.188219978 +0100
+++ qemu/block.c	2011-03-08 20:12:44.971718742 +0100
@@ -439,13 +439,7 @@ static int bdrv_open_common(BlockDriverS
     bs->drv = drv;
     bs->opaque = qemu_mallocz(drv->instance_size);
 
-    /*
-     * Yes, BDRV_O_NOCACHE aka O_DIRECT means we have to present a
-     * write cache to the guest.  We do need the fdatasync to flush
-     * out transactions for block allocations, and we maybe have a
-     * volatile write cache in our backing device to deal with.
-     */
-    if (flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))
+    if (flags & BDRV_O_CACHE_WB)
         bs->enable_write_cache = 1;
 
     /*
Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c	2011-03-08 20:11:08.200220692 +0100
+++ qemu/block/raw-posix.c	2011-03-08 20:11:22.229218596 +0100
@@ -154,7 +154,7 @@ static int raw_open_common(BlockDriverSt
      * and O_DIRECT for no caching. */
     if ((bdrv_flags & BDRV_O_NOCACHE))
         s->open_flags |= O_DIRECT;
-    else if (!(bdrv_flags & BDRV_O_CACHE_WB))
+    if (!(bdrv_flags & BDRV_O_CACHE_WB))
         s->open_flags |= O_DSYNC;
 
     s->fd = -1;
Index: qemu/block/raw-win32.c
===================================================================
--- qemu.orig/block/raw-win32.c	2011-03-08 20:11:08.212218227 +0100
+++ qemu/block/raw-win32.c	2011-03-08 20:11:22.237218180 +0100
@@ -88,9 +88,9 @@ static int raw_open(BlockDriverState *bs
     }
 
     overlapped = FILE_ATTRIBUTE_NORMAL;
-    if ((flags & BDRV_O_NOCACHE))
-        overlapped |= FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH;
-    else if (!(flags & BDRV_O_CACHE_WB))
+    if (flags & BDRV_O_NOCACHE)
+        overlapped |= FILE_FLAG_NO_BUFFERING;
+    if (!(flags & BDRV_O_CACHE_WB))
         overlapped |= FILE_FLAG_WRITE_THROUGH;
     s->hfile = CreateFile(filename, access_flags,
                           FILE_SHARE_READ, NULL,
@@ -349,9 +349,9 @@ static int hdev_open(BlockDriverState *b
     create_flags = OPEN_EXISTING;
 
     overlapped = FILE_ATTRIBUTE_NORMAL;
-    if ((flags & BDRV_O_NOCACHE))
-        overlapped |= FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH;
-    else if (!(flags & BDRV_O_CACHE_WB))
+    if (flags & BDRV_O_NOCACHE)
+        overlapped |= FILE_FLAG_NO_BUFFERING;
+    if (!(flags & BDRV_O_CACHE_WB))
         overlapped |= FILE_FLAG_WRITE_THROUGH;
     s->hfile = CreateFile(filename, access_flags,
                           FILE_SHARE_READ, NULL,
Index: qemu/blockdev.c
===================================================================
--- qemu.orig/blockdev.c	2011-03-08 20:11:08.220217606 +0100
+++ qemu/blockdev.c	2011-03-08 20:11:22.237218180 +0100
@@ -326,7 +326,7 @@ DriveInfo *drive_init(QemuOpts *opts, in
 
     if ((buf = qemu_opt_get(opts, "cache")) != NULL) {
         if (!strcmp(buf, "off") || !strcmp(buf, "none")) {
-            bdrv_flags |= BDRV_O_NOCACHE;
+            bdrv_flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
         } else if (!strcmp(buf, "writeback")) {
             bdrv_flags |= BDRV_O_CACHE_WB;
         } else if (!strcmp(buf, "unsafe")) {
Index: qemu/qemu-io.c
===================================================================
--- qemu.orig/qemu-io.c	2011-03-08 20:11:08.232220650 +0100
+++ qemu/qemu-io.c	2011-03-08 20:11:22.245232398 +0100
@@ -1655,7 +1655,7 @@ open_f(int argc, char **argv)
 			flags |= BDRV_O_SNAPSHOT;
 			break;
 		case 'n':
-			flags |= BDRV_O_NOCACHE;
+			flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
 			break;
 		case 'r':
 			readonly = 1;
@@ -1751,7 +1751,7 @@ int main(int argc, char **argv)
 			flags |= BDRV_O_SNAPSHOT;
 			break;
 		case 'n':
-			flags |= BDRV_O_NOCACHE;
+			flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
 			break;
 		case 'c':
 			add_user_command(optarg);
Index: qemu/qemu-nbd.c
===================================================================
--- qemu.orig/qemu-nbd.c	2011-03-08 20:11:08.244217894 +0100
+++ qemu/qemu-nbd.c	2011-03-08 20:11:22.253267426 +0100
@@ -238,7 +238,7 @@ int main(int argc, char **argv)
             flags |= BDRV_O_SNAPSHOT;
             break;
         case 'n':
-            flags |= BDRV_O_NOCACHE;
+            flags |= BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
             break;
         case 'b':
             bindto = optarg;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-15 14:10 [Qemu-devel] [PATCH, RFC 0/4] allow guest control of the volatile write cache Christoph Hellwig
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE Christoph Hellwig
@ 2011-03-15 14:11 ` Christoph Hellwig
  2011-03-15 14:11   ` [Qemu-devel] [PATCH 3/4] ide: wire up setfeatures cache control Christoph Hellwig
                     ` (3 more replies)
  2011-03-15 14:16   ` [Qemu-devel] " Christoph Hellwig
  2 siblings, 4 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, aliguori, prerna

Add a new bdrv_change_cache that can set/clear the writeback flag
at runtime by stopping all I/O and closing/reopening the image file.

All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
with minimal refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2011-03-15 11:47:31.285634626 +0100
+++ qemu/block.c	2011-03-15 14:57:03.680633093 +0100
@@ -441,6 +441,8 @@ static int bdrv_open_common(BlockDriverS
 
     if (flags & BDRV_O_CACHE_WB)
         bs->enable_write_cache = 1;
+    else
+        bs->enable_write_cache = 0;
 
     /*
      * Clear flags that are internal to the block layer before opening the
@@ -651,6 +653,44 @@ unlink_and_fail:
     return ret;
 }
 
+static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
+{
+    BlockDriver *drv = bs->drv;
+    int ret;
+
+    if (bdrv_flags == bs->open_flags) {
+        return 0;
+    }
+
+    /* Quiesce IO for the given block device */
+    qemu_aio_flush();
+    bdrv_flush(bs);
+
+    bdrv_close(bs);
+    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
+
+    /*
+     * A failed attempt to reopen the image file must lead to 'abort()'
+     */
+    if (ret != 0) {
+        abort();
+    }
+
+    return ret;
+}
+
+int bdrv_change_cache(BlockDriverState *bs, bool enable)
+{
+    int bdrv_flags = 0;
+
+    bdrv_flags = bs->open_flags & ~BDRV_O_CACHE_WB;
+    if (enable) {
+        bdrv_flags |= BDRV_O_CACHE_WB;
+    }
+
+    return bdrv_reopen(bs, bdrv_flags);
+}
+
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2011-03-15 11:47:18.664136441 +0100
+++ qemu/block.h	2011-03-15 11:47:31.813634525 +0100
@@ -87,6 +87,7 @@ int bdrv_pwrite_sync(BlockDriverState *b
 int bdrv_write_sync(BlockDriverState *bs, int64_t sector_num,
     const uint8_t *buf, int nb_sectors);
 int bdrv_truncate(BlockDriverState *bs, int64_t offset);
+int bdrv_change_cache(BlockDriverState *bs, bool enable);
 int64_t bdrv_getlength(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_guess_geometry(BlockDriverState *bs, int *pcyls, int *pheads, int *psecs);

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 3/4] ide: wire up setfeatures cache control
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
@ 2011-03-15 14:11   ` Christoph Hellwig
  2011-03-15 14:11   ` [Qemu-devel] [PATCH 4/4] virtio-blk: add runtime " Christoph Hellwig
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, aliguori, prerna

Wire up the ATA SETFEATURES subcalls that control the volatile write cache
to the new bdrv_change_cache helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/ide/core.c
===================================================================
--- qemu.orig/hw/ide/core.c	2011-03-15 11:47:18.569636140 +0100
+++ qemu/hw/ide/core.c	2011-03-15 13:07:21.464634347 +0100
@@ -1700,6 +1700,19 @@ void ide_ioport_write(void *opaque, uint
     }
 }
 
+static void ide_setcache(IDEState *s, bool enable)
+{
+    if (bdrv_change_cache(s->bs, enable)) {
+        ide_abort_command(s);
+        ide_set_irq(s->bus);
+        return;
+    }
+
+    s->identify_set = 0;
+
+    s->status = READY_STAT | SEEK_STAT;
+    ide_set_irq(s->bus);
+}
 
 void ide_exec_cmd(IDEBus *bus, uint32_t val)
 {
@@ -1855,7 +1868,11 @@ void ide_exec_cmd(IDEBus *bus, uint32_t
         case 0xcc: /* reverting to power-on defaults enable */
         case 0x66: /* reverting to power-on defaults disable */
         case 0x02: /* write cache enable */
+            ide_setcache(s, true);
+            break;
         case 0x82: /* write cache disable */
+            ide_setcache(s, false);
+            break;
         case 0xaa: /* read look-ahead enable */
         case 0x55: /* read look-ahead disable */
         case 0x05: /* set advanced power management mode */

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH 4/4] virtio-blk: add runtime cache control
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
  2011-03-15 14:11   ` [Qemu-devel] [PATCH 3/4] ide: wire up setfeatures cache control Christoph Hellwig
@ 2011-03-15 14:11   ` Christoph Hellwig
  2011-03-16  9:49   ` [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly Stefan Hajnoczi
  2011-03-17 14:44   ` [Qemu-devel] " Daniel P. Berrange
  3 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha, aliguori, prerna

Add a new writeable features config space field, which allows the guest
to communicate features it wants enabled/disabled at runtime.  The only
feature defined so far is the status of the volatile write cache.

Also rename the virtio_blk_update_config to virtio_blk_get_config to
fit the method naming scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/virtio-blk.c
===================================================================
--- qemu.orig/hw/virtio-blk.c	2011-03-15 13:07:10.093633862 +0100
+++ qemu/hw/virtio-blk.c	2011-03-15 13:07:54.108135875 +0100
@@ -434,7 +434,7 @@ static void virtio_blk_reset(VirtIODevic
 
 /* coalesce internal state, copy to pci i/o region 0
  */
-static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
+static void virtio_blk_get_config(VirtIODevice *vdev, uint8_t *config)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
     struct virtio_blk_config blkcfg;
@@ -455,9 +455,26 @@ static void virtio_blk_update_config(Vir
     blkcfg.alignment_offset = 0;
     blkcfg.min_io_size = s->conf->min_io_size / blkcfg.blk_size;
     blkcfg.opt_io_size = s->conf->opt_io_size / blkcfg.blk_size;
+    if (bdrv_enable_write_cache(s->bs))
+        blkcfg.features |= VIRTIO_BLK_RT_WCE;
     memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
 }
 
+static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
+{
+    VirtIOBlock *s = to_virtio_blk(vdev);
+    struct virtio_blk_config blkcfg;
+    bool enable = false;
+
+    memcpy(&blkcfg, config, sizeof(blkcfg));
+
+    if (blkcfg.features & VIRTIO_BLK_RT_WCE)
+        enable = true;
+
+    /* no error reporting, needs to be checked by a config re-read */
+    bdrv_change_cache(s->bs, enable);
+}
+
 static uint32_t virtio_blk_get_features(VirtIODevice *vdev, uint32_t features)
 {
     VirtIOBlock *s = to_virtio_blk(vdev);
@@ -466,6 +483,7 @@ static uint32_t virtio_blk_get_features(
     features |= (1 << VIRTIO_BLK_F_GEOMETRY);
     features |= (1 << VIRTIO_BLK_F_TOPOLOGY);
     features |= (1 << VIRTIO_BLK_F_BLK_SIZE);
+    features |= (1 << VIRTIO_BLK_F_DYNAMIC);
 
     if (bdrv_enable_write_cache(s->bs))
         features |= (1 << VIRTIO_BLK_F_WCACHE);
@@ -543,7 +561,8 @@ VirtIODevice *virtio_blk_init(DeviceStat
                                           sizeof(struct virtio_blk_config),
                                           sizeof(VirtIOBlock));
 
-    s->vdev.get_config = virtio_blk_update_config;
+    s->vdev.get_config = virtio_blk_get_config;
+    s->vdev.set_config = virtio_blk_set_config;
     s->vdev.get_features = virtio_blk_get_features;
     s->vdev.reset = virtio_blk_reset;
     s->bs = conf->bs;
Index: qemu/hw/virtio-blk.h
===================================================================
--- qemu.orig/hw/virtio-blk.h	2011-03-15 13:07:10.109636192 +0100
+++ qemu/hw/virtio-blk.h	2011-03-15 13:07:54.112135546 +0100
@@ -33,6 +33,7 @@
 /* #define VIRTIO_BLK_F_IDENTIFY   8       ATA IDENTIFY supported, DEPRECATED */
 #define VIRTIO_BLK_F_WCACHE     9       /* write cache enabled */
 #define VIRTIO_BLK_F_TOPOLOGY   10      /* Topology information is available */
+#define VIRTIO_BLK_F_DYNAMIC    11      /* Dynamic features field */
 
 struct virtio_blk_config
 {
@@ -47,6 +48,8 @@ struct virtio_blk_config
     uint8_t alignment_offset;
     uint16_t min_io_size;
     uint32_t opt_io_size;
+    uint32_t features;
+#define VIRTIO_BLK_RT_WCE              (1 << 0)
 } __attribute__((packed));
 
 /* These two define direction. */

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH, RFC] virtio_blk: add cache control support
  2011-03-15 14:10 [Qemu-devel] [PATCH, RFC 0/4] allow guest control of the volatile write cache Christoph Hellwig
@ 2011-03-15 14:16   ` Christoph Hellwig
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
  2011-03-15 14:16   ` [Qemu-devel] " Christoph Hellwig
  2 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:16 UTC (permalink / raw)
  To: qemu-devel, linux-kernel; +Cc: stefanha, kwolf, prerna, aliguori, rusty

Add support for the new dynamic features config space field to allow
en/disabling the write cache at runtime.  The userspace interface is
a SCSI-compatible sysfs attribute.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2011-03-15 12:16:29.156133695 +0100
+++ linux-2.6/drivers/block/virtio_blk.c	2011-03-15 13:17:30.160634723 +0100
@@ -291,6 +291,73 @@ static ssize_t virtblk_serial_show(struc
 }
 DEVICE_ATTR(serial, S_IRUGO, virtblk_serial_show, NULL);
 
+static bool virtblk_has_wb_cache(struct virtio_blk *vblk)
+{
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+		return false;
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return true;
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features, sizeof(features));
+
+	if (features & VIRTIO_BLK_RT_WCE)
+		return true;
+	return false;
+}
+
+static ssize_t virtblk_cache_type_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	if (virtblk_has_wb_cache(disk->private_data))
+		return sprintf(buf, "write back\n");
+	else
+		return sprintf(buf, "write through\n");
+}
+
+static ssize_t virtblk_cache_type_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct virtio_blk *vblk = disk->private_data;
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features, features2 = 0;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH) ||
+	    !virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return -ENXIO;
+
+	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
+		;
+	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
+		blk_queue_flush(disk->queue, REQ_FLUSH);
+		features |= VIRTIO_BLK_RT_WCE;
+	} else {
+		return -EINVAL;
+	}
+
+	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
+	                  &features, sizeof(features));
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features2, sizeof(features2));
+
+	if ((features & VIRTIO_BLK_RT_WCE) !=
+	    (features2 & VIRTIO_BLK_RT_WCE))
+		return -EIO;
+
+	if (!(features2 & VIRTIO_BLK_RT_WCE))
+		blk_queue_flush(disk->queue, 0);
+	return count;
+}
+static DEVICE_ATTR(cache_type, S_IRUGO|S_IWUSR,
+	    virtblk_cache_type_show, virtblk_cache_type_store);
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -377,7 +444,7 @@ static int __devinit virtblk_probe(struc
 	index++;
 
 	/* configure queue flush support */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+	if (virtblk_has_wb_cache(vblk))
 		blk_queue_flush(q, REQ_FLUSH);
 
 	/* If disk is read-only in the host, the guest should obey */
@@ -456,6 +523,10 @@ static int __devinit virtblk_probe(struc
 	if (err)
 		goto out_del_disk;
 
+	err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_cache_type);
+	if (err)
+		goto out_del_disk;
+
 	return 0;
 
 out_del_disk:
@@ -499,7 +570,7 @@ static const struct virtio_device_id id_
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
-	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY
+	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_DYNAMIC
 };
 
 /*
Index: linux-2.6/include/linux/virtio_blk.h
===================================================================
--- linux-2.6.orig/include/linux/virtio_blk.h	2011-03-15 12:14:28.261632780 +0100
+++ linux-2.6/include/linux/virtio_blk.h	2011-03-15 12:25:56.308634399 +0100
@@ -16,6 +16,7 @@
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_FLUSH	9	/* Cache flush command support */
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
+#define VIRTIO_BLK_F_DYNAMIC	11	/* Dynamic features field */
 
 #define VIRTIO_BLK_ID_BYTES	20	/* ID string length */
 
@@ -45,6 +46,9 @@ struct virtio_blk_config {
 	__u16 min_io_size;
 	/* optimal sustained I/O size in logical blocks. */
 	__u32 opt_io_size;
+	/* runtime controllable features */
+	__u32 features;
+#define VIRTIO_BLK_RT_WCE		(1 << 0)
 
 } __attribute__((packed));
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-15 14:16   ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-15 14:16 UTC (permalink / raw)
  To: qemu-devel, linux-kernel; +Cc: kwolf, stefanha, aliguori, rusty, prerna

Add support for the new dynamic features config space field to allow
en/disabling the write cache at runtime.  The userspace interface is
a SCSI-compatible sysfs attribute.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2011-03-15 12:16:29.156133695 +0100
+++ linux-2.6/drivers/block/virtio_blk.c	2011-03-15 13:17:30.160634723 +0100
@@ -291,6 +291,73 @@ static ssize_t virtblk_serial_show(struc
 }
 DEVICE_ATTR(serial, S_IRUGO, virtblk_serial_show, NULL);
 
+static bool virtblk_has_wb_cache(struct virtio_blk *vblk)
+{
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+		return false;
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return true;
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features, sizeof(features));
+
+	if (features & VIRTIO_BLK_RT_WCE)
+		return true;
+	return false;
+}
+
+static ssize_t virtblk_cache_type_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	if (virtblk_has_wb_cache(disk->private_data))
+		return sprintf(buf, "write back\n");
+	else
+		return sprintf(buf, "write through\n");
+}
+
+static ssize_t virtblk_cache_type_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct virtio_blk *vblk = disk->private_data;
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features, features2 = 0;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH) ||
+	    !virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return -ENXIO;
+
+	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
+		;
+	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
+		blk_queue_flush(disk->queue, REQ_FLUSH);
+		features |= VIRTIO_BLK_RT_WCE;
+	} else {
+		return -EINVAL;
+	}
+
+	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
+	                  &features, sizeof(features));
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features2, sizeof(features2));
+
+	if ((features & VIRTIO_BLK_RT_WCE) !=
+	    (features2 & VIRTIO_BLK_RT_WCE))
+		return -EIO;
+
+	if (!(features2 & VIRTIO_BLK_RT_WCE))
+		blk_queue_flush(disk->queue, 0);
+	return count;
+}
+static DEVICE_ATTR(cache_type, S_IRUGO|S_IWUSR,
+	    virtblk_cache_type_show, virtblk_cache_type_store);
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -377,7 +444,7 @@ static int __devinit virtblk_probe(struc
 	index++;
 
 	/* configure queue flush support */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+	if (virtblk_has_wb_cache(vblk))
 		blk_queue_flush(q, REQ_FLUSH);
 
 	/* If disk is read-only in the host, the guest should obey */
@@ -456,6 +523,10 @@ static int __devinit virtblk_probe(struc
 	if (err)
 		goto out_del_disk;
 
+	err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_cache_type);
+	if (err)
+		goto out_del_disk;
+
 	return 0;
 
 out_del_disk:
@@ -499,7 +570,7 @@ static const struct virtio_device_id id_
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
-	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY
+	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_DYNAMIC
 };
 
 /*
Index: linux-2.6/include/linux/virtio_blk.h
===================================================================
--- linux-2.6.orig/include/linux/virtio_blk.h	2011-03-15 12:14:28.261632780 +0100
+++ linux-2.6/include/linux/virtio_blk.h	2011-03-15 12:25:56.308634399 +0100
@@ -16,6 +16,7 @@
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_FLUSH	9	/* Cache flush command support */
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
+#define VIRTIO_BLK_F_DYNAMIC	11	/* Dynamic features field */
 
 #define VIRTIO_BLK_ID_BYTES	20	/* ID string length */
 
@@ -45,6 +46,9 @@ struct virtio_blk_config {
 	__u16 min_io_size;
 	/* optimal sustained I/O size in logical blocks. */
 	__u32 opt_io_size;
+	/* runtime controllable features */
+	__u32 features;
+#define VIRTIO_BLK_RT_WCE		(1 << 0)
 
 } __attribute__((packed));
 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-15 14:16   ` [Qemu-devel] " Christoph Hellwig
@ 2011-03-16  4:09     ` Rusty Russell
  -1 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-16  4:09 UTC (permalink / raw)
  To: Christoph Hellwig, qemu-devel, linux-kernel
  Cc: stefanha, kwolf, prerna, aliguori

On Tue, 15 Mar 2011 15:16:44 +0100, Christoph Hellwig <hch@lst.de> wrote:
> Add support for the new dynamic features config space field to allow
> en/disabling the write cache at runtime.  The userspace interface is
> a SCSI-compatible sysfs attribute.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hi Christoph,

   Neat work.  Minor comments:

> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> +		;
> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {

   Is there a reason we're not letting gcc and/or strcmp do the
optimization work here?

> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> +	                  &features, sizeof(features));
> +
> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> +			  &features2, sizeof(features2));
> +
> +	if ((features & VIRTIO_BLK_RT_WCE) !=
> +	    (features2 & VIRTIO_BLK_RT_WCE))
> +		return -EIO;

   This seems like a debugging check you left in.  Or do you suspect
some issues?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-16  4:09     ` Rusty Russell
  0 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-16  4:09 UTC (permalink / raw)
  To: Christoph Hellwig, qemu-devel, linux-kernel
  Cc: kwolf, stefanha, aliguori, prerna

On Tue, 15 Mar 2011 15:16:44 +0100, Christoph Hellwig <hch@lst.de> wrote:
> Add support for the new dynamic features config space field to allow
> en/disabling the write cache at runtime.  The userspace interface is
> a SCSI-compatible sysfs attribute.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hi Christoph,

   Neat work.  Minor comments:

> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> +		;
> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {

   Is there a reason we're not letting gcc and/or strcmp do the
optimization work here?

> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> +	                  &features, sizeof(features));
> +
> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> +			  &features2, sizeof(features2));
> +
> +	if ((features & VIRTIO_BLK_RT_WCE) !=
> +	    (features2 & VIRTIO_BLK_RT_WCE))
> +		return -EIO;

   This seems like a debugging check you left in.  Or do you suspect
some issues?

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE Christoph Hellwig
@ 2011-03-16  9:42   ` Stefan Hajnoczi
  2011-03-16  9:55     ` Kevin Wolf
  2011-03-16 14:08     ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2011-03-16  9:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwolf, prerna@linux.vnet.ibm.com Anthony Liguori, qemu-devel

On Tue, Mar 15, 2011 at 2:11 PM, Christoph Hellwig <hch@lst.de> wrote:
> Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
> but no writeback semantics.  All existing callers are changed to also
> specify BDRV_O_CACHE_WB to give them writeback semantics.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I think there is one hunk missing:
diff --git a/block/qcow2.c b/block/qcow2.c
index 75b8bec..db1931b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -229,7 +229,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }

     /* alloc L2 table/refcount block cache */
-    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
+    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);
     s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, writethrough);
     s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
         writethrough);

Stefan

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
  2011-03-15 14:11   ` [Qemu-devel] [PATCH 3/4] ide: wire up setfeatures cache control Christoph Hellwig
  2011-03-15 14:11   ` [Qemu-devel] [PATCH 4/4] virtio-blk: add runtime " Christoph Hellwig
@ 2011-03-16  9:49   ` Stefan Hajnoczi
  2011-03-16  9:59     ` Kevin Wolf
  2011-03-17 14:44   ` [Qemu-devel] " Daniel P. Berrange
  3 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2011-03-16  9:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kwolf, Jes Sorensen, aliguori, qemu-devel, prerna

On Tue, Mar 15, 2011 at 2:11 PM, Christoph Hellwig <hch@lst.de> wrote:
> Add a new bdrv_change_cache that can set/clear the writeback flag
> at runtime by stopping all I/O and closing/reopening the image file.
>
> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
> with minimal refactoring.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: qemu/block.c
> ===================================================================
> --- qemu.orig/block.c   2011-03-15 11:47:31.285634626 +0100
> +++ qemu/block.c        2011-03-15 14:57:03.680633093 +0100
> @@ -441,6 +441,8 @@ static int bdrv_open_common(BlockDriverS
>
>     if (flags & BDRV_O_CACHE_WB)
>         bs->enable_write_cache = 1;
> +    else
> +        bs->enable_write_cache = 0;
>
>     /*
>      * Clear flags that are internal to the block layer before opening the
> @@ -651,6 +653,44 @@ unlink_and_fail:
>     return ret;
>  }
>
> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
> +{
> +    BlockDriver *drv = bs->drv;
> +    int ret;
> +
> +    if (bdrv_flags == bs->open_flags) {
> +        return 0;
> +    }
> +
> +    /* Quiesce IO for the given block device */
> +    qemu_aio_flush();
> +    bdrv_flush(bs);
> +
> +    bdrv_close(bs);
> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);

This will fail for -snapshot disks since the on-disk file is deleted.

In general it would be more robust to keep the original file
descriptor around in case the file cannot be opened with the new
flags.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-16  9:42   ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-03-16  9:55     ` Kevin Wolf
  2011-03-16 14:08     ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Kevin Wolf @ 2011-03-16  9:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: prerna@linux.vnet.ibm.com Anthony Liguori, Christoph Hellwig, qemu-devel

Am 16.03.2011 10:42, schrieb Stefan Hajnoczi:
> On Tue, Mar 15, 2011 at 2:11 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache,
>> but no writeback semantics.  All existing callers are changed to also
>> specify BDRV_O_CACHE_WB to give them writeback semantics.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I think there is one hunk missing:
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 75b8bec..db1931b 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -229,7 +229,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>      }
> 
>      /* alloc L2 table/refcount block cache */
> -    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
> +    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);
>      s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE, writethrough);
>      s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE,
>          writethrough);

This one doesn't make a difference. But the intention could be made
clearer now that all writeback modes have BDRV_O_CACHE_WB set:

    writethrough = ((flags & BDRV_O_CACHE_WB) == 0);

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-16  9:49   ` [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly Stefan Hajnoczi
@ 2011-03-16  9:59     ` Kevin Wolf
  0 siblings, 0 replies; 37+ messages in thread
From: Kevin Wolf @ 2011-03-16  9:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jes Sorensen, aliguori, prerna, Christoph Hellwig, qemu-devel

Am 16.03.2011 10:49, schrieb Stefan Hajnoczi:
> On Tue, Mar 15, 2011 at 2:11 PM, Christoph Hellwig <hch@lst.de> wrote:
>> Add a new bdrv_change_cache that can set/clear the writeback flag
>> at runtime by stopping all I/O and closing/reopening the image file.
>>
>> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
>> with minimal refactoring.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>
>> Index: qemu/block.c
>> ===================================================================
>> --- qemu.orig/block.c   2011-03-15 11:47:31.285634626 +0100
>> +++ qemu/block.c        2011-03-15 14:57:03.680633093 +0100
>> @@ -441,6 +441,8 @@ static int bdrv_open_common(BlockDriverS
>>
>>     if (flags & BDRV_O_CACHE_WB)
>>         bs->enable_write_cache = 1;
>> +    else
>> +        bs->enable_write_cache = 0;
>>
>>     /*
>>      * Clear flags that are internal to the block layer before opening the
>> @@ -651,6 +653,44 @@ unlink_and_fail:
>>     return ret;
>>  }
>>
>> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +    int ret;
>> +
>> +    if (bdrv_flags == bs->open_flags) {
>> +        return 0;
>> +    }
>> +
>> +    /* Quiesce IO for the given block device */
>> +    qemu_aio_flush();
>> +    bdrv_flush(bs);
>> +
>> +    bdrv_close(bs);
>> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> 
> This will fail for -snapshot disks since the on-disk file is deleted.
> 
> In general it would be more robust to keep the original file
> descriptor around in case the file cannot be opened with the new
> flags.

Looks like we'll need a bdrv_reopen for protocols?

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-16  9:42   ` [Qemu-devel] " Stefan Hajnoczi
  2011-03-16  9:55     ` Kevin Wolf
@ 2011-03-16 14:08     ` Christoph Hellwig
  2011-03-16 17:00       ` Stefan Hajnoczi
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-16 14:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, prerna@linux.vnet.ibm.com Anthony Liguori,
	Christoph Hellwig, qemu-devel

On Wed, Mar 16, 2011 at 09:42:37AM +0000, Stefan Hajnoczi wrote:
> -    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
> +    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);

or rather

	writethrough = ((flags & (BDRV_O_CACHE_WB) != );

but yes, this code had sneaked in since my initial version.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-16  4:09     ` [Qemu-devel] " Rusty Russell
@ 2011-03-16 14:09       ` Christoph Hellwig
  -1 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-16 14:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, qemu-devel, linux-kernel, stefanha, kwolf,
	prerna, aliguori

On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > +		;
> > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> 
>    Is there a reason we're not letting gcc and/or strcmp do the
> optimization work here?

I'm happ to switch strcmp.

> > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > +	                  &features, sizeof(features));
> > +
> > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > +			  &features2, sizeof(features2));
> > +
> > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > +		return -EIO;
> 
>    This seems like a debugging check you left in.  Or do you suspect
> some issues?

No, it's intentional.  config space writes can't return errors, so we need
to check that the value has really changed.  I'll add a comment explaining it.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-16 14:09       ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-16 14:09 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kwolf, aliguori, stefanha, linux-kernel, qemu-devel, prerna,
	Christoph Hellwig

On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > +		;
> > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> 
>    Is there a reason we're not letting gcc and/or strcmp do the
> optimization work here?

I'm happ to switch strcmp.

> > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > +	                  &features, sizeof(features));
> > +
> > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > +			  &features2, sizeof(features2));
> > +
> > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > +		return -EIO;
> 
>    This seems like a debugging check you left in.  Or do you suspect
> some issues?

No, it's intentional.  config space writes can't return errors, so we need
to check that the value has really changed.  I'll add a comment explaining it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-16 14:08     ` Christoph Hellwig
@ 2011-03-16 17:00       ` Stefan Hajnoczi
  2011-03-17  9:07         ` Kevin Wolf
  0 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2011-03-16 17:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwolf, prerna@linux.vnet.ibm.com Anthony Liguori, qemu-devel

On Wed, Mar 16, 2011 at 2:08 PM, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 16, 2011 at 09:42:37AM +0000, Stefan Hajnoczi wrote:
>> -    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
>> +    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);
>
> or rather
>
>        writethrough = ((flags & (BDRV_O_CACHE_WB) != );
>
> but yes, this code had sneaked in since my initial version.

My intention was that if we don't care about honoring flushes then we
might as well use Qcow2Cache.  But yes, just checking for cache mode
is the clearest.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-16 14:09       ` [Qemu-devel] " Christoph Hellwig
@ 2011-03-17  5:06         ` Rusty Russell
  -1 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-17  5:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, qemu-devel, linux-kernel, stefanha, kwolf,
	prerna, aliguori, Christian Borntraeger

On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > > +		;
> > > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> > 
> >    Is there a reason we're not letting gcc and/or strcmp do the
> > optimization work here?
> 
> I'm happ to switch strcmp.

Of course, that's assuming buf is nul terminated.

> > > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > > +	                  &features, sizeof(features));
> > > +
> > > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > > +			  &features2, sizeof(features2));
> > > +
> > > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > > +		return -EIO;
> > 
> >    This seems like a debugging check you left in.  Or do you suspect
> > some issues?
> 
> No, it's intentional.  config space writes can't return errors, so we need
> to check that the value has really changed.  I'll add a comment explaining it.

OK, under what circumstances could it fail?

If you're using this mechanism to indicate that the host doesn't support
the feature, that's making an assumption about the nature of config
space writes which isn't true for non-PCI virtio.

ie. lguest and S/390 don't trap writes to config space.

Or perhaps they should?  But we should be explicit about needing it...

Thanks,
Rusty.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-17  5:06         ` Rusty Russell
  0 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-17  5:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwolf, aliguori, stefanha, linux-kernel, qemu-devel,
	Christian Borntraeger, prerna

On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > > +		;
> > > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> > 
> >    Is there a reason we're not letting gcc and/or strcmp do the
> > optimization work here?
> 
> I'm happ to switch strcmp.

Of course, that's assuming buf is nul terminated.

> > > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > > +	                  &features, sizeof(features));
> > > +
> > > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > > +			  &features2, sizeof(features2));
> > > +
> > > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > > +		return -EIO;
> > 
> >    This seems like a debugging check you left in.  Or do you suspect
> > some issues?
> 
> No, it's intentional.  config space writes can't return errors, so we need
> to check that the value has really changed.  I'll add a comment explaining it.

OK, under what circumstances could it fail?

If you're using this mechanism to indicate that the host doesn't support
the feature, that's making an assumption about the nature of config
space writes which isn't true for non-PCI virtio.

ie. lguest and S/390 don't trap writes to config space.

Or perhaps they should?  But we should be explicit about needing it...

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-16 17:00       ` Stefan Hajnoczi
@ 2011-03-17  9:07         ` Kevin Wolf
  2011-03-17  9:18           ` Stefan Hajnoczi
  0 siblings, 1 reply; 37+ messages in thread
From: Kevin Wolf @ 2011-03-17  9:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: prerna@linux.vnet.ibm.com Anthony Liguori, Christoph Hellwig, qemu-devel

Am 16.03.2011 18:00, schrieb Stefan Hajnoczi:
> On Wed, Mar 16, 2011 at 2:08 PM, Christoph Hellwig <hch@lst.de> wrote:
>> On Wed, Mar 16, 2011 at 09:42:37AM +0000, Stefan Hajnoczi wrote:
>>> -    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
>>> +    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);
>>
>> or rather
>>
>>        writethrough = ((flags & (BDRV_O_CACHE_WB) != );
>>
>> but yes, this code had sneaked in since my initial version.
> 
> My intention was that if we don't care about honoring flushes then we
> might as well use Qcow2Cache.  But yes, just checking for cache mode
> is the clearest.

You mean for a possible writethrough mode with BDRV_O_NO_FLUSH set? Such
a mode doesn't make a lot of sense to me.

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE
  2011-03-17  9:07         ` Kevin Wolf
@ 2011-03-17  9:18           ` Stefan Hajnoczi
  0 siblings, 0 replies; 37+ messages in thread
From: Stefan Hajnoczi @ 2011-03-17  9:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: prerna@linux.vnet.ibm.com Anthony Liguori, Christoph Hellwig, qemu-devel

On Thu, Mar 17, 2011 at 9:07 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 16.03.2011 18:00, schrieb Stefan Hajnoczi:
>> On Wed, Mar 16, 2011 at 2:08 PM, Christoph Hellwig <hch@lst.de> wrote:
>>> On Wed, Mar 16, 2011 at 09:42:37AM +0000, Stefan Hajnoczi wrote:
>>>> -    writethrough = ((flags & BDRV_O_CACHE_MASK) == 0);
>>>> +    writethrough = ((flags & (BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)) == 0);
>>>
>>> or rather
>>>
>>>        writethrough = ((flags & (BDRV_O_CACHE_WB) != );
>>>
>>> but yes, this code had sneaked in since my initial version.
>>
>> My intention was that if we don't care about honoring flushes then we
>> might as well use Qcow2Cache.  But yes, just checking for cache mode
>> is the clearest.
>
> You mean for a possible writethrough mode with BDRV_O_NO_FLUSH set? Such
> a mode doesn't make a lot of sense to me.

Yeah, I guess you're right, we can never have BDRV_O_NO_FLUSH without
BDRV_O_CACHE_WB today.  I wasn't thinking end-to-end, just looking at
the BDRV_O_* flag bits.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-17  5:06         ` [Qemu-devel] " Rusty Russell
@ 2011-03-17 14:21           ` Christoph Hellwig
  -1 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-17 14:21 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, kwolf, aliguori, stefanha, linux-kernel,
	qemu-devel, Christian Borntraeger, prerna

On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > I'm happ to switch strcmp.
> 
> Of course, that's assuming buf is nul terminated.

It's the string the user writes into it, which normally should be
nul-terminated.

> > No, it's intentional.  config space writes can't return errors, so we need
> > to check that the value has really changed.  I'll add a comment explaining it.
> 
> OK, under what circumstances could it fail?
> 
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
> 
> ie. lguest and S/390 don't trap writes to config space.
> 
> Or perhaps they should?  But we should be explicit about needing it...

We have the features flag to indicate if updating the caching mode is
supported, but we we could still fail it for other reasons - e.g. we could fail
to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
trapping config space write this feature won't work for them at all.  As do
other features that make use of config space write, e.g. updating the MAC
address for virtio-net.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-17 14:21           ` Christoph Hellwig
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2011-03-17 14:21 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kwolf, aliguori, stefanha, qemu-devel, linux-kernel,
	Christian Borntraeger, prerna, Christoph Hellwig

On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > I'm happ to switch strcmp.
> 
> Of course, that's assuming buf is nul terminated.

It's the string the user writes into it, which normally should be
nul-terminated.

> > No, it's intentional.  config space writes can't return errors, so we need
> > to check that the value has really changed.  I'll add a comment explaining it.
> 
> OK, under what circumstances could it fail?
> 
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
> 
> ie. lguest and S/390 don't trap writes to config space.
> 
> Or perhaps they should?  But we should be explicit about needing it...

We have the features flag to indicate if updating the caching mode is
supported, but we we could still fail it for other reasons - e.g. we could fail
to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
trapping config space write this feature won't work for them at all.  As do
other features that make use of config space write, e.g. updating the MAC
address for virtio-net.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
                     ` (2 preceding siblings ...)
  2011-03-16  9:49   ` [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly Stefan Hajnoczi
@ 2011-03-17 14:44   ` Daniel P. Berrange
  2011-03-17 15:11     ` Kevin Wolf
  3 siblings, 1 reply; 37+ messages in thread
From: Daniel P. Berrange @ 2011-03-17 14:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kwolf, stefanha, aliguori, qemu-devel, prerna

On Tue, Mar 15, 2011 at 03:11:32PM +0100, Christoph Hellwig wrote:
> Add a new bdrv_change_cache that can set/clear the writeback flag
> at runtime by stopping all I/O and closing/reopening the image file.
> 
> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
> with minimal refactoring.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
>  
> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
> +{
> +    BlockDriver *drv = bs->drv;
> +    int ret;
> +
> +    if (bdrv_flags == bs->open_flags) {
> +        return 0;
> +    }
> +
> +    /* Quiesce IO for the given block device */
> +    qemu_aio_flush();
> +    bdrv_flush(bs);
> +
> +    bdrv_close(bs);
> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> +
> +    /*
> +     * A failed attempt to reopen the image file must lead to 'abort()'
> +     */
> +    if (ret != 0) {
> +        abort();
> +    }
> +
> +    return ret;
> +}



> +
> +int bdrv_change_cache(BlockDriverState *bs, bool enable)
> +{
> +    int bdrv_flags = 0;
> +
> +    bdrv_flags = bs->open_flags & ~BDRV_O_CACHE_WB;
> +    if (enable) {
> +        bdrv_flags |= BDRV_O_CACHE_WB;
> +    }
> +
> +    return bdrv_reopen(bs, bdrv_flags);
> +}

Is there any way we can manage todo this *without* closing and
re-opening the file descriptor ?

One of the things we're considering for the future is to enable
QEMU to be passed open FD(s) for its drives, from the management
system, instead of having QEMU open the files itself. This will
make it possible to have strong, per-VM selinux isolation for
disks stored on NFS. It will also make it easier to let us run
QEMU inside a private filesystem namespace (CLONE_NEWNS) and not
have to expose any files in that namespace except the QEMU binary
& a few device nodes (particularly useful for hotplug since we
won't know what files need to be visible inside the private namespace
ahead of time).

If QEMU expects to close+reopen any of its disks at any time the
guest requests, this will complicate life somewhat

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-17 14:44   ` [Qemu-devel] " Daniel P. Berrange
@ 2011-03-17 15:11     ` Kevin Wolf
  2011-03-17 16:44       ` Stefan Hajnoczi
  0 siblings, 1 reply; 37+ messages in thread
From: Kevin Wolf @ 2011-03-17 15:11 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: stefanha, aliguori, prerna, Christoph Hellwig, qemu-devel

Am 17.03.2011 15:44, schrieb Daniel P. Berrange:
> On Tue, Mar 15, 2011 at 03:11:32PM +0100, Christoph Hellwig wrote:
>> Add a new bdrv_change_cache that can set/clear the writeback flag
>> at runtime by stopping all I/O and closing/reopening the image file.
>>
>> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
>> with minimal refactoring.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>
>>  
>> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +    int ret;
>> +
>> +    if (bdrv_flags == bs->open_flags) {
>> +        return 0;
>> +    }
>> +
>> +    /* Quiesce IO for the given block device */
>> +    qemu_aio_flush();
>> +    bdrv_flush(bs);
>> +
>> +    bdrv_close(bs);
>> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
>> +
>> +    /*
>> +     * A failed attempt to reopen the image file must lead to 'abort()'
>> +     */
>> +    if (ret != 0) {
>> +        abort();
>> +    }
>> +
>> +    return ret;
>> +}
> 
> 
> 
>> +
>> +int bdrv_change_cache(BlockDriverState *bs, bool enable)
>> +{
>> +    int bdrv_flags = 0;
>> +
>> +    bdrv_flags = bs->open_flags & ~BDRV_O_CACHE_WB;
>> +    if (enable) {
>> +        bdrv_flags |= BDRV_O_CACHE_WB;
>> +    }
>> +
>> +    return bdrv_reopen(bs, bdrv_flags);
>> +}
> 
> Is there any way we can manage todo this *without* closing and
> re-opening the file descriptor ?
> 
> One of the things we're considering for the future is to enable
> QEMU to be passed open FD(s) for its drives, from the management
> system, instead of having QEMU open the files itself. 

What about cache mode, read-only flags etc. that are set with the right
flags during the open? Must qemu just assume that the management has
done the right thing?

And what about things like backing files or other information that
depends on the content of the image file?

> If QEMU expects to close+reopen any of its disks at any time the
> guest requests, this will complicate life somewhat

It does expect this. For example for making backing files temporarily
read-write during a 'commit' monitor command, we already reopen the
image. (Let's hope nobody uses -snapshot, live snapshots and commit, he
would lose his disk...)

Kevin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-17 15:11     ` Kevin Wolf
@ 2011-03-17 16:44       ` Stefan Hajnoczi
  2011-03-19  8:28         ` Blue Swirl
  0 siblings, 1 reply; 37+ messages in thread
From: Stefan Hajnoczi @ 2011-03-17 16:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: aliguori, prerna, Christoph Hellwig, qemu-devel

On Thu, Mar 17, 2011 at 3:11 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.03.2011 15:44, schrieb Daniel P. Berrange:
>> On Tue, Mar 15, 2011 at 03:11:32PM +0100, Christoph Hellwig wrote:
>>> Add a new bdrv_change_cache that can set/clear the writeback flag
>>> at runtime by stopping all I/O and closing/reopening the image file.
>>>
>>> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
>>> with minimal refactoring.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>
>>>
>>> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>>> +{
>>> +    BlockDriver *drv = bs->drv;
>>> +    int ret;
>>> +
>>> +    if (bdrv_flags == bs->open_flags) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    /* Quiesce IO for the given block device */
>>> +    qemu_aio_flush();
>>> +    bdrv_flush(bs);
>>> +
>>> +    bdrv_close(bs);
>>> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
>>> +
>>> +    /*
>>> +     * A failed attempt to reopen the image file must lead to 'abort()'
>>> +     */
>>> +    if (ret != 0) {
>>> +        abort();
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>
>>
>>
>>> +
>>> +int bdrv_change_cache(BlockDriverState *bs, bool enable)
>>> +{
>>> +    int bdrv_flags = 0;
>>> +
>>> +    bdrv_flags = bs->open_flags & ~BDRV_O_CACHE_WB;
>>> +    if (enable) {
>>> +        bdrv_flags |= BDRV_O_CACHE_WB;
>>> +    }
>>> +
>>> +    return bdrv_reopen(bs, bdrv_flags);
>>> +}
>>
>> Is there any way we can manage todo this *without* closing and
>> re-opening the file descriptor ?
>>
>> One of the things we're considering for the future is to enable
>> QEMU to be passed open FD(s) for its drives, from the management
>> system, instead of having QEMU open the files itself.
>
> What about cache mode, read-only flags etc. that are set with the right
> flags during the open? Must qemu just assume that the management has
> done the right thing?
>
> And what about things like backing files or other information that
> depends on the content of the image file?
>
>> If QEMU expects to close+reopen any of its disks at any time the
>> guest requests, this will complicate life somewhat
>
> It does expect this. For example for making backing files temporarily
> read-write during a 'commit' monitor command, we already reopen the
> image. (Let's hope nobody uses -snapshot, live snapshots and commit, he
> would lose his disk...)

I think we need to use the most robust solution possible.  We don't
want to get into a situation that can be avoided.  Especially in cases
where human errors can (==will) be made.

I suggested using /proc/$pid/fd to reopen an existing file descriptor.
 That interface is only available on Linux and possibly some of
{Solaris, *BSD, OSX}.  So there needs to be a fallback, but in this
situation it feels right to take advantage of whatever means are
provided by the host OS to make safe transitions between image files.

Stefan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly
  2011-03-17 16:44       ` Stefan Hajnoczi
@ 2011-03-19  8:28         ` Blue Swirl
  0 siblings, 0 replies; 37+ messages in thread
From: Blue Swirl @ 2011-03-19  8:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, aliguori, qemu-devel, Christoph Hellwig, prerna

On Thu, Mar 17, 2011 at 6:44 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Mar 17, 2011 at 3:11 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 17.03.2011 15:44, schrieb Daniel P. Berrange:
>>> On Tue, Mar 15, 2011 at 03:11:32PM +0100, Christoph Hellwig wrote:
>>>> Add a new bdrv_change_cache that can set/clear the writeback flag
>>>> at runtime by stopping all I/O and closing/reopening the image file.
>>>>
>>>> All code is based on a patch from Prerna Saxena <prerna@linux.vnet.ibm.com>
>>>> with minimal refactoring.
>>>>
>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>
>>>>
>>>> +static int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>>>> +{
>>>> +    BlockDriver *drv = bs->drv;
>>>> +    int ret;
>>>> +
>>>> +    if (bdrv_flags == bs->open_flags) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    /* Quiesce IO for the given block device */
>>>> +    qemu_aio_flush();
>>>> +    bdrv_flush(bs);
>>>> +
>>>> +    bdrv_close(bs);
>>>> +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
>>>> +
>>>> +    /*
>>>> +     * A failed attempt to reopen the image file must lead to 'abort()'
>>>> +     */
>>>> +    if (ret != 0) {
>>>> +        abort();
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>
>>>
>>>
>>>> +
>>>> +int bdrv_change_cache(BlockDriverState *bs, bool enable)
>>>> +{
>>>> +    int bdrv_flags = 0;
>>>> +
>>>> +    bdrv_flags = bs->open_flags & ~BDRV_O_CACHE_WB;
>>>> +    if (enable) {
>>>> +        bdrv_flags |= BDRV_O_CACHE_WB;
>>>> +    }
>>>> +
>>>> +    return bdrv_reopen(bs, bdrv_flags);
>>>> +}
>>>
>>> Is there any way we can manage todo this *without* closing and
>>> re-opening the file descriptor ?
>>>
>>> One of the things we're considering for the future is to enable
>>> QEMU to be passed open FD(s) for its drives, from the management
>>> system, instead of having QEMU open the files itself.
>>
>> What about cache mode, read-only flags etc. that are set with the right
>> flags during the open? Must qemu just assume that the management has
>> done the right thing?
>>
>> And what about things like backing files or other information that
>> depends on the content of the image file?
>>
>>> If QEMU expects to close+reopen any of its disks at any time the
>>> guest requests, this will complicate life somewhat
>>
>> It does expect this. For example for making backing files temporarily
>> read-write during a 'commit' monitor command, we already reopen the
>> image. (Let's hope nobody uses -snapshot, live snapshots and commit, he
>> would lose his disk...)
>
> I think we need to use the most robust solution possible.  We don't
> want to get into a situation that can be avoided.  Especially in cases
> where human errors can (==will) be made.
>
> I suggested using /proc/$pid/fd to reopen an existing file descriptor.
>  That interface is only available on Linux and possibly some of
> {Solaris, *BSD, OSX}.  So there needs to be a fallback, but in this
> situation it feels right to take advantage of whatever means are
> provided by the host OS to make safe transitions between image files.

No need for such hacks. There could be an explicit reopen method,
which could be implemented differently for each backing file type
(file descriptor, file, socket, CDROM devices etc).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-17 14:21           ` Christoph Hellwig
@ 2011-03-24  0:11             ` Rusty Russell
  -1 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-24  0:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, kwolf, aliguori, stefanha, linux-kernel,
	qemu-devel, Christian Borntraeger, prerna, Alex Williamson

On Thu, 17 Mar 2011 15:21:22 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > OK, under what circumstances could it fail?
> > 
> > If you're using this mechanism to indicate that the host doesn't support
> > the feature, that's making an assumption about the nature of config
> > space writes which isn't true for non-PCI virtio.
> > 
> > ie. lguest and S/390 don't trap writes to config space.
> > 
> > Or perhaps they should?  But we should be explicit about needing it...
> 
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.

OK, then I think you need to make it a real command and feed it into the
request queue.

The theory behind config space is that it's for advertising, not for
interaction.  And it's not ever very good at that...

>  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

Yes, and that was a mistake.  What does qemu do with a partially-written
MAC address?  Lguest ignores it, what does S/390 do?

Alex?

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-24  0:11             ` Rusty Russell
  0 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-24  0:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwolf, aliguori, stefanha, qemu-devel, linux-kernel,
	Christian Borntraeger, prerna, Alex Williamson

On Thu, 17 Mar 2011 15:21:22 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > OK, under what circumstances could it fail?
> > 
> > If you're using this mechanism to indicate that the host doesn't support
> > the feature, that's making an assumption about the nature of config
> > space writes which isn't true for non-PCI virtio.
> > 
> > ie. lguest and S/390 don't trap writes to config space.
> > 
> > Or perhaps they should?  But we should be explicit about needing it...
> 
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.

OK, then I think you need to make it a real command and feed it into the
request queue.

The theory behind config space is that it's for advertising, not for
interaction.  And it's not ever very good at that...

>  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

Yes, and that was a mistake.  What does qemu do with a partially-written
MAC address?  Lguest ignores it, what does S/390 do?

Alex?

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-17  5:06         ` [Qemu-devel] " Rusty Russell
@ 2011-03-24  3:05           ` Anthony Liguori
  -1 siblings, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2011-03-24  3:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, qemu-devel, linux-kernel, stefanha, kwolf,
	prerna, Christian Borntraeger

On 03/17/2011 12:06 AM, Rusty Russell wrote:
> On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig<hch@lst.de>  wrote:
>> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
>>>> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
>>>> +		;
>>>> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
>>>     Is there a reason we're not letting gcc and/or strcmp do the
>>> optimization work here?
>> I'm happ to switch strcmp.
> Of course, that's assuming buf is nul terminated.
>
>>>> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
>>>> +	&features, sizeof(features));
>>>> +
>>>> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
>>>> +			&features2, sizeof(features2));
>>>> +
>>>> +	if ((features&  VIRTIO_BLK_RT_WCE) !=
>>>> +	    (features2&  VIRTIO_BLK_RT_WCE))
>>>> +		return -EIO;
>>>     This seems like a debugging check you left in.  Or do you suspect
>>> some issues?
>> No, it's intentional.  config space writes can't return errors, so we need
>> to check that the value has really changed.  I'll add a comment explaining it.
> OK, under what circumstances could it fail?
>
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
>
> ie. lguest and S/390 don't trap writes to config space.
>
> Or perhaps they should?  But we should be explicit about needing it...

I don't think we ever operated on the assumption that config space 
writes would trap.

I don't think adding it is the right thing either because you can do 
byte access to the config space which makes atomicity difficult.

Any reason not to use a control queue to negotiate dynamic features?  
The authorative source of what the currently enabled features are can 
still be config space but the guest's enabling or disabling of a feature 
ought to be a control queue message.

Regards,

Anthony Liguori

> Thanks,
> Rusty.
>
>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-24  3:05           ` Anthony Liguori
  0 siblings, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2011-03-24  3:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kwolf, stefanha, linux-kernel, qemu-devel, Christian Borntraeger,
	prerna, Christoph Hellwig

On 03/17/2011 12:06 AM, Rusty Russell wrote:
> On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig<hch@lst.de>  wrote:
>> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
>>>> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
>>>> +		;
>>>> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
>>>     Is there a reason we're not letting gcc and/or strcmp do the
>>> optimization work here?
>> I'm happ to switch strcmp.
> Of course, that's assuming buf is nul terminated.
>
>>>> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
>>>> +	&features, sizeof(features));
>>>> +
>>>> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
>>>> +			&features2, sizeof(features2));
>>>> +
>>>> +	if ((features&  VIRTIO_BLK_RT_WCE) !=
>>>> +	    (features2&  VIRTIO_BLK_RT_WCE))
>>>> +		return -EIO;
>>>     This seems like a debugging check you left in.  Or do you suspect
>>> some issues?
>> No, it's intentional.  config space writes can't return errors, so we need
>> to check that the value has really changed.  I'll add a comment explaining it.
> OK, under what circumstances could it fail?
>
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
>
> ie. lguest and S/390 don't trap writes to config space.
>
> Or perhaps they should?  But we should be explicit about needing it...

I don't think we ever operated on the assumption that config space 
writes would trap.

I don't think adding it is the right thing either because you can do 
byte access to the config space which makes atomicity difficult.

Any reason not to use a control queue to negotiate dynamic features?  
The authorative source of what the currently enabled features are can 
still be config space but the guest's enabling or disabling of a feature 
ought to be a control queue message.

Regards,

Anthony Liguori

> Thanks,
> Rusty.
>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-17 14:21           ` Christoph Hellwig
@ 2011-03-24  3:11             ` Anthony Liguori
  -1 siblings, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2011-03-24  3:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, kwolf, stefanha, linux-kernel, qemu-devel,
	Christian Borntraeger, prerna

On 03/17/2011 09:21 AM, Christoph Hellwig wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
>>> I'm happ to switch strcmp.
>> Of course, that's assuming buf is nul terminated.
> It's the string the user writes into it, which normally should be
> nul-terminated.
>
>>> No, it's intentional.  config space writes can't return errors, so we need
>>> to check that the value has really changed.  I'll add a comment explaining it.
>> OK, under what circumstances could it fail?
>>
>> If you're using this mechanism to indicate that the host doesn't support
>> the feature, that's making an assumption about the nature of config
>> space writes which isn't true for non-PCI virtio.
>>
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

QEMU does not rely on config space writes for supporting mac address 
updates.

Whenever the config space is updated, we update the mac address in the 
virtio-net structure but since PCI only supports 4 byte accesses, it 
will only be partially updated at certain points in time.

This is okay though because as long as a guest updates it before we send 
a network packet out, when we refer to the mac address, it will be 
correct.  We could just as well have the config space be in shared 
memory and only refer to the mac address in that shared memory area when 
transmitting a packet.

So the fact that we respond to config space writes doesn't mean that 
writes have a side effect other than updating the config space, which is 
really what I think the important point is here.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-24  3:11             ` Anthony Liguori
  0 siblings, 0 replies; 37+ messages in thread
From: Anthony Liguori @ 2011-03-24  3:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kwolf, stefanha, Rusty Russell, linux-kernel, qemu-devel,
	Christian Borntraeger, prerna

On 03/17/2011 09:21 AM, Christoph Hellwig wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
>>> I'm happ to switch strcmp.
>> Of course, that's assuming buf is nul terminated.
> It's the string the user writes into it, which normally should be
> nul-terminated.
>
>>> No, it's intentional.  config space writes can't return errors, so we need
>>> to check that the value has really changed.  I'll add a comment explaining it.
>> OK, under what circumstances could it fail?
>>
>> If you're using this mechanism to indicate that the host doesn't support
>> the feature, that's making an assumption about the nature of config
>> space writes which isn't true for non-PCI virtio.
>>
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

QEMU does not rely on config space writes for supporting mac address 
updates.

Whenever the config space is updated, we update the mac address in the 
virtio-net structure but since PCI only supports 4 byte accesses, it 
will only be partially updated at certain points in time.

This is okay though because as long as a guest updates it before we send 
a network packet out, when we refer to the mac address, it will be 
correct.  We could just as well have the config space be in shared 
memory and only refer to the mac address in that shared memory area when 
transmitting a packet.

So the fact that we respond to config space writes doesn't mean that 
writes have a side effect other than updating the config space, which is 
really what I think the important point is here.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-24  3:05           ` [Qemu-devel] " Anthony Liguori
@ 2011-03-24  9:54             ` Christian Borntraeger
  -1 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2011-03-24  9:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Rusty Russell, Christoph Hellwig, qemu-devel, linux-kernel,
	stefanha, kwolf, prerna

Am 24.03.2011 04:05, schrieb Anthony Liguori:
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> I don't think we ever operated on the assumption that config space writes would trap.
> 
> I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.

There is the additional problem, that s390 has no MMIO and,therefore,
there is no real HW support for trapping writes to an area. You can
use page faults, or read-only faults on newer systems, but this is 
expensive. In addition, page faults only deliver the page frame, but
not the offset within a page.
> 
> Any reason not to use a control queue to negotiate dynamic features? 

Sounds reasonable.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-24  9:54             ` Christian Borntraeger
  0 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2011-03-24  9:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kwolf, stefanha, Rusty Russell, linux-kernel, qemu-devel, prerna,
	Christoph Hellwig

Am 24.03.2011 04:05, schrieb Anthony Liguori:
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> I don't think we ever operated on the assumption that config space writes would trap.
> 
> I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.

There is the additional problem, that s390 has no MMIO and,therefore,
there is no real HW support for trapping writes to an area. You can
use page faults, or read-only faults on newer systems, but this is 
expensive. In addition, page faults only deliver the page frame, but
not the offset within a page.
> 
> Any reason not to use a control queue to negotiate dynamic features? 

Sounds reasonable.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH, RFC] virtio_blk: add cache control support
  2011-03-24  9:54             ` [Qemu-devel] " Christian Borntraeger
@ 2011-03-25  5:08               ` Rusty Russell
  -1 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-25  5:08 UTC (permalink / raw)
  To: Christian Borntraeger, Anthony Liguori
  Cc: Christoph Hellwig, qemu-devel, linux-kernel, stefanha, kwolf, prerna

On Thu, 24 Mar 2011 10:54:05 +0100, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> Am 24.03.2011 04:05, schrieb Anthony Liguori:
> >> ie. lguest and S/390 don't trap writes to config space.
> >>
> >> Or perhaps they should?  But we should be explicit about needing it...
> > I don't think we ever operated on the assumption that config space writes would trap.
> > 
> > I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.
> 
> There is the additional problem, that s390 has no MMIO and,therefore,
> there is no real HW support for trapping writes to an area. You can
> use page faults, or read-only faults on newer systems, but this is 
> expensive. In addition, page faults only deliver the page frame, but
> not the offset within a page.

That's not *really* a problem, since you have control over the
config_set operation and could do whatever you wanted.

But I wanted to make sure we're all on the same page: you *can't* rely
on the host knowing immediately what you write to the config space.  If
you want that, an actual queued request is necessary...

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] Re: [PATCH, RFC] virtio_blk: add cache control support
@ 2011-03-25  5:08               ` Rusty Russell
  0 siblings, 0 replies; 37+ messages in thread
From: Rusty Russell @ 2011-03-25  5:08 UTC (permalink / raw)
  To: Christian Borntraeger, Anthony Liguori
  Cc: kwolf, stefanha, linux-kernel, qemu-devel, prerna, Christoph Hellwig

On Thu, 24 Mar 2011 10:54:05 +0100, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> Am 24.03.2011 04:05, schrieb Anthony Liguori:
> >> ie. lguest and S/390 don't trap writes to config space.
> >>
> >> Or perhaps they should?  But we should be explicit about needing it...
> > I don't think we ever operated on the assumption that config space writes would trap.
> > 
> > I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.
> 
> There is the additional problem, that s390 has no MMIO and,therefore,
> there is no real HW support for trapping writes to an area. You can
> use page faults, or read-only faults on newer systems, but this is 
> expensive. In addition, page faults only deliver the page frame, but
> not the offset within a page.

That's not *really* a problem, since you have control over the
config_set operation and could do whatever you wanted.

But I wanted to make sure we're all on the same page: you *can't* rely
on the host knowing immediately what you write to the config space.  If
you want that, an actual queued request is necessary...

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2011-03-25 11:22 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-15 14:10 [Qemu-devel] [PATCH, RFC 0/4] allow guest control of the volatile write cache Christoph Hellwig
2011-03-15 14:11 ` [Qemu-devel] [PATCH 1/4] block: clarify the meaning of BDRV_O_NOCACHE Christoph Hellwig
2011-03-16  9:42   ` [Qemu-devel] " Stefan Hajnoczi
2011-03-16  9:55     ` Kevin Wolf
2011-03-16 14:08     ` Christoph Hellwig
2011-03-16 17:00       ` Stefan Hajnoczi
2011-03-17  9:07         ` Kevin Wolf
2011-03-17  9:18           ` Stefan Hajnoczi
2011-03-15 14:11 ` [Qemu-devel] [PATCH 2/4] block: add a helper to change writeback mode on the fly Christoph Hellwig
2011-03-15 14:11   ` [Qemu-devel] [PATCH 3/4] ide: wire up setfeatures cache control Christoph Hellwig
2011-03-15 14:11   ` [Qemu-devel] [PATCH 4/4] virtio-blk: add runtime " Christoph Hellwig
2011-03-16  9:49   ` [Qemu-devel] Re: [PATCH 2/4] block: add a helper to change writeback mode on the fly Stefan Hajnoczi
2011-03-16  9:59     ` Kevin Wolf
2011-03-17 14:44   ` [Qemu-devel] " Daniel P. Berrange
2011-03-17 15:11     ` Kevin Wolf
2011-03-17 16:44       ` Stefan Hajnoczi
2011-03-19  8:28         ` Blue Swirl
2011-03-15 14:16 ` [PATCH, RFC] virtio_blk: add cache control support Christoph Hellwig
2011-03-15 14:16   ` [Qemu-devel] " Christoph Hellwig
2011-03-16  4:09   ` Rusty Russell
2011-03-16  4:09     ` [Qemu-devel] " Rusty Russell
2011-03-16 14:09     ` Christoph Hellwig
2011-03-16 14:09       ` [Qemu-devel] " Christoph Hellwig
2011-03-17  5:06       ` Rusty Russell
2011-03-17  5:06         ` [Qemu-devel] " Rusty Russell
2011-03-17 14:21         ` Christoph Hellwig
2011-03-17 14:21           ` Christoph Hellwig
2011-03-24  0:11           ` Rusty Russell
2011-03-24  0:11             ` Rusty Russell
2011-03-24  3:11           ` Anthony Liguori
2011-03-24  3:11             ` Anthony Liguori
2011-03-24  3:05         ` Anthony Liguori
2011-03-24  3:05           ` [Qemu-devel] " Anthony Liguori
2011-03-24  9:54           ` Christian Borntraeger
2011-03-24  9:54             ` [Qemu-devel] " Christian Borntraeger
2011-03-25  5:08             ` Rusty Russell
2011-03-25  5:08               ` [Qemu-devel] " Rusty Russell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.