* [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
@ 2010-05-10 21:51 Alexander Graf
  2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
  2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
  0 siblings, 2 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch

Thanks to recent improvements, qemu flushes guest data to disk when the guest
tells us to do so.

This is great if we care about data consistency on host disk failures. In cases
where we don't, it just creates additional overhead for no net win. One such use
case is the building of appliances in SUSE Studio. We write the resulting images
out of the build VM, but compress them directly afterwards. So if possible we'd
love to keep them in RAM.

This patchset introduces a new block parameter to -drive called "flush" which
allows a user to disable flushing in odd scenarios like the above. To show the
difference in performance this makes, I have put together a small test case.
Inside the initrd, I call the following piece of code on a 500MB preallocated
vmdk image:

  mkfs.ext3 /dev/vda
  mkdir -p /mnt
  mount /dev/vda /mnt
  dd if=/dev/zero of=/mnt/test bs=1M
  umount /mnt
  sync
  halt -fp

With flush=on (default)

real	0m33.597s
user	0m16.453s
sys	0m6.192s

With flush=off

real	0m27.150s
user	0m16.533s
sys	0m5.348s


Alexander Graf (2):
  Add no-op aio emulation stub
  Add flush=off parameter to -drive

 block.c           |   18 ++++++++++++++++++
 block.h           |    5 +++++
 block/raw-posix.c |   13 +++++++++++++
 qemu-config.c     |    3 +++
 qemu-options.hx   |    3 +++
 vl.c              |    3 +++
 6 files changed, 45 insertions(+), 0 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub
  2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
@ 2010-05-10 21:51 ` Alexander Graf
  2010-05-10 21:51   ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
                     ` (2 more replies)
  2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
  1 sibling, 3 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch

We need to be able to do nothing in AIO fashion. Since I suspect this
could be useful for more cases than the non flushing, I figured I'd
create a new function that does everything AIO-like, but doesn't do
anything.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 block.c |   18 ++++++++++++++++++
 block.h |    5 +++++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 48305b7..1cd39d7 100644
--- a/block.c
+++ b/block.c
@@ -2196,6 +2196,24 @@ static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
     return &acb->common;
 }
 
+BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    BlockDriverAIOCBSync *acb;
+
+    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
+    acb->is_write = 1; /* don't bounce in the completion hadler */
+    acb->qiov = NULL;
+    acb->bounce = NULL;
+    acb->ret = 0;
+
+    if (!acb->bh)
+        acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+
+    qemu_bh_schedule(acb->bh);
+    return &acb->common;
+}
+
 /**************************************************************/
 /* sync block device emulation */
 
diff --git a/block.h b/block.h
index f87d24e..bef6358 100644
--- a/block.h
+++ b/block.h
@@ -33,6 +33,7 @@ typedef struct QEMUSnapshotInfo {
 #define BDRV_O_CACHE_WB    0x0040 /* use write-back caching */
 #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
 #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
+#define BDRV_O_NOFLUSH     0x0200 /* don't flush the image ever */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
 
@@ -97,6 +98,10 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
 				 BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
 
+/* Emulate a no-op */
+BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+        BlockDriverCompletionFunc *cb, void *opaque);
+
 typedef struct BlockRequest {
     /* Fields to be filled by multiwrite caller */
     int64_t sector;
-- 
1.6.0.2


* [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
@ 2010-05-10 21:51   ` Alexander Graf
  2010-05-11  8:36     ` [Qemu-devel] " Kevin Wolf
  2010-05-11  6:18   ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
  2010-05-11  8:29   ` [Qemu-devel] " Kevin Wolf
  2 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch

Usually the guest can tell the host to flush data to disk. In some cases we
don't want to flush though, but try to keep everything in cache.

So let's add a new parameter to -drive that allows us to set the flushing
behavior to "on" or "off", defaulting to enabling the guest to flush.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 block/raw-posix.c |   13 +++++++++++++
 qemu-config.c     |    3 +++
 qemu-options.hx   |    3 +++
 vl.c              |    3 +++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7541ed2..2510b1b 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -106,6 +106,7 @@ typedef struct BDRVRawState {
     int fd;
     int type;
     int open_flags;
+    int bdrv_flags;
 #if defined(__linux__)
     /* linux floppy specific */
     int64_t fd_open_time;
@@ -133,6 +134,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     BDRVRawState *s = bs->opaque;
     int fd, ret;
 
+    s->bdrv_flags = bdrv_flags;
     s->open_flags = open_flags | O_BINARY;
     s->open_flags &= ~O_ACCMODE;
     if (bdrv_flags & BDRV_O_RDWR) {
@@ -555,6 +557,11 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
     if (fd_open(bs) < 0)
         return NULL;
 
+    /* Don't flush? */
+    if (s->bdrv_flags & BDRV_O_NOFLUSH) {
+        return bdrv_aio_noop_em(bs, cb, opaque);
+    }
+
     return paio_submit(bs, s->fd, 0, NULL, 0, cb, opaque, QEMU_AIO_FLUSH);
 }
 
@@ -726,6 +733,12 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
 static void raw_flush(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
+
+    /* No flush means no flush */
+    if (s->bdrv_flags & BDRV_O_NOFLUSH) {
+        return;
+    }
+
     qemu_fdatasync(s->fd);
 }
 
diff --git a/qemu-config.c b/qemu-config.c
index d500885..c358add 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -79,6 +79,9 @@ QemuOptsList qemu_drive_opts = {
         },{
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "flush",
+            .type = QEMU_OPT_BOOL,
         },
         { /* end if list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index 12f6b51..69ae8de 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -120,6 +120,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
     "       [,addr=A][,id=name][,aio=threads|native][,readonly=on|off]\n"
+    "       [,flush=on|off]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -151,6 +152,8 @@ These options have the same definition as they have in @option{-hdachs}.
 @var{cache} is "none", "writeback", or "writethrough" and controls how the host cache is used to access block data.
 @item aio=@var{aio}
 @var{aio} is "threads", or "native" and selects between pthread based disk I/O and native Linux AIO.
+@item flush=@var{flush}
+@var{flush} is "on" (default), or "off" and select whether the guest can trigger a host flush
 @item format=@var{format}
 Specify which disk @var{format} will be used rather than detecting
 the format.  Can be used to specifiy format=raw to avoid interpreting
diff --git a/vl.c b/vl.c
index 85bcc84..a7ca2c3 100644
--- a/vl.c
+++ b/vl.c
@@ -787,6 +787,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
     int max_devs;
     int index;
     int ro = 0;
+    int flush = 1;
     int bdrv_flags = 0;
     int on_read_error, on_write_error;
     const char *devaddr;
@@ -819,6 +820,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
 
     snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
     ro = qemu_opt_get_bool(opts, "readonly", 0);
+    flush = qemu_opt_get_bool(opts, "flush", 1);
 
     file = qemu_opt_get(opts, "file");
     serial = qemu_opt_get(opts, "serial");
@@ -1118,6 +1120,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
     }
 
     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
+    bdrv_flags |= flush ? 0 : BDRV_O_NOFLUSH;
 
     if (bdrv_open(dinfo->bdrv, file, bdrv_flags, drv) < 0) {
         fprintf(stderr, "qemu: could not open disk image %s: %s\n",
-- 
1.6.0.2


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
  2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
@ 2010-05-10 21:59 ` Anthony Liguori
  2010-05-10 22:03   ` Alexander Graf
  1 sibling, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-10 21:59 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kwolf, qemu-devel, hch

On 05/10/2010 04:51 PM, Alexander Graf wrote:
> Thanks to recent improvements, qemu flushes guest data to disk when the guest
> tells us to do so.
>
> This is great if we care about data consistency on host disk failures. In cases
> where we don't it just creates additional overhead for no net win. One such use
> case is the building of appliances in SUSE Studio. We write the resulting images
> out of the build VM, but compress it directly afterwards. So if possible we'd
> love to keep it in RAM.
>
> This patchset introduces a new block parameter to -drive called "flush" which
> allows a user to disable flushing in odd scenarios like the above. To show the
> difference in performance this makes, I have put together a small test case.
> Inside the initrd, I call the following piece of code on a 500MB preallocated
> vmdk image:
>    

This seems like it's asking for trouble to me.  I'm not sure it's worth 
the minor performance gain.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
@ 2010-05-10 22:03   ` Alexander Graf
  2010-05-10 22:12     ` Anthony Liguori
  0 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 22:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: kwolf, qemu-devel, hch


On 10.05.2010, at 23:59, Anthony Liguori wrote:

> On 05/10/2010 04:51 PM, Alexander Graf wrote:
>> Thanks to recent improvements, qemu flushes guest data to disk when the guest
>> tells us to do so.
>> 
>> This is great if we care about data consistency on host disk failures. In cases
>> where we don't it just creates additional overhead for no net win. One such use
>> case is the building of appliances in SUSE Studio. We write the resulting images
>> out of the build VM, but compress it directly afterwards. So if possible we'd
>> love to keep it in RAM.
>> 
>> This patchset introduces a new block parameter to -drive called "flush" which
>> allows a user to disable flushing in odd scenarios like the above. To show the
>> difference in performance this makes, I have put together a small test case.
>> Inside the initrd, I call the following piece of code on a 500MB preallocated
>> vmdk image:
>>   
> 
> This seems like it's asking for trouble to me.  I'm not sure it's worth the minor performance gain.

The gain is small on my netbook, where I ran the test. This is part of the performance regressions from 0.10 to 0.12, where we're talking about build times going from 2 minutes to 30. While writeback accounted for most of it, flushing still at least doubled the build times, which is unacceptable for us.

I also fail to see where it's asking for trouble. If we don't flush volatile data, things are good, no?


Alex


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-10 22:03   ` Alexander Graf
@ 2010-05-10 22:12     ` Anthony Liguori
  2010-05-11 21:48       ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-10 22:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kwolf, qemu-devel, hch

On 05/10/2010 05:03 PM, Alexander Graf wrote:
> On 10.05.2010, at 23:59, Anthony Liguori wrote:
>
>    
>> On 05/10/2010 04:51 PM, Alexander Graf wrote:
>>      
>>> Thanks to recent improvements, qemu flushes guest data to disk when the guest
>>> tells us to do so.
>>>
>>> This is great if we care about data consistency on host disk failures. In cases
>>> where we don't it just creates additional overhead for no net win. One such use
>>> case is the building of appliances in SUSE Studio. We write the resulting images
>>> out of the build VM, but compress it directly afterwards. So if possible we'd
>>> love to keep it in RAM.
>>>
>>> This patchset introduces a new block parameter to -drive called "flush" which
>>> allows a user to disable flushing in odd scenarios like the above. To show the
>>> difference in performance this makes, I have put together a small test case.
>>> Inside the initrd, I call the following piece of code on a 500MB preallocated
>>> vmdk image:
>>>
>>>        
>> This seems like it's asking for trouble to me.  I'm not sure it's worth the minor performance gain.
>>      
> The gain is little on my netbook where I did the test on. This is part of performance regressions from 0.10 to 0.12 where we're talking build times of 2 minutes going to 30. While writeback was most of the chunk, flushing still at least doubled the build times which is unacceptable for us.
>    

There's got to be a better place to fix this.  Disable barriers in your 
guests?

Regards,

Anthony Liguori

> I also fail to see where it's asking for trouble. If we don't flush volatile data, things are good, no?
>
>
> Alex
>
>    


* Re: [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub
  2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
  2010-05-10 21:51   ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
@ 2010-05-11  6:18   ` Stefan Hajnoczi
  2010-05-11  8:29   ` [Qemu-devel] " Kevin Wolf
  2 siblings, 0 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-11  6:18 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kwolf, qemu-devel, hch

bdrv_aio_noop_em() could be useful for benchmarking and optimizing the
aio code.  It serves as a cheap operation that lets us see the cost of
the aio roundtrip.

Stefan


* [Qemu-devel] Re: [PATCH 1/2] Add no-op aio emulation stub
  2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
  2010-05-10 21:51   ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
  2010-05-11  6:18   ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
@ 2010-05-11  8:29   ` Kevin Wolf
  2 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2010-05-11  8:29 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, hch

Am 10.05.2010 23:51, schrieb Alexander Graf:
> We need to be able to do nothing in AIO fashion. Since I suspect this
> could be useful for more cases than the non flushing, I figured I'd
> create a new function that does everything AIO-like, but doesn't do
> anything.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  block.c |   18 ++++++++++++++++++
>  block.h |    5 +++++
>  2 files changed, 23 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 48305b7..1cd39d7 100644
> --- a/block.c
> +++ b/block.c
> @@ -2196,6 +2196,24 @@ static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
>      return &acb->common;
>  }
>  
> +BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    BlockDriverAIOCBSync *acb;
> +
> +    acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
> +    acb->is_write = 1; /* don't bounce in the completion hadler */

Typo in the comment.

> +    acb->qiov = NULL;
> +    acb->bounce = NULL;
> +    acb->ret = 0;
> +
> +    if (!acb->bh)
> +        acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
> +
> +    qemu_bh_schedule(acb->bh);
> +    return &acb->common;
> +}
> +
>  /**************************************************************/
>  /* sync block device emulation */
>  
> diff --git a/block.h b/block.h
> index f87d24e..bef6358 100644
> --- a/block.h
> +++ b/block.h
> @@ -33,6 +33,7 @@ typedef struct QEMUSnapshotInfo {
>  #define BDRV_O_CACHE_WB    0x0040 /* use write-back caching */
>  #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
>  #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
> +#define BDRV_O_NOFLUSH     0x0200 /* don't flush the image ever */
>  
>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)

This hunk should be in patch 2/2.

>  
> @@ -97,6 +98,10 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
>  				 BlockDriverCompletionFunc *cb, void *opaque);
>  void bdrv_aio_cancel(BlockDriverAIOCB *acb);
>  
> +/* Emulate a no-op */
> +BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
> +        BlockDriverCompletionFunc *cb, void *opaque);
> +
>  typedef struct BlockRequest {
>      /* Fields to be filled by multiwrite caller */
>      int64_t sector;

I think exporting this function shouldn't be necessary. Everything that
deals with AIO emulation should be contained in block.c.

Kevin


* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-10 21:51   ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
@ 2010-05-11  8:36     ` Kevin Wolf
  2010-05-11 10:55       ` Christoph Hellwig
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Kevin Wolf @ 2010-05-11  8:36 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, hch

Am 10.05.2010 23:51, schrieb Alexander Graf:
> Usually the guest can tell the host to flush data to disk. In some cases we
> don't want to flush though, but try to keep everything in cache.
> 
> So let's add a new parameter to -drive that allows us to set the flushing
> behavior to "on" or "off", defaulting to enabling the guest to flush.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

What about another cache=... value instead of adding more options? I'm
quite sure you'll only ever need this with writeback caching. So we
could have cache=none|writethrough|writeback|wb-noflush or something
like that.

> ---
>  block/raw-posix.c |   13 +++++++++++++

This is obviously wrong. If you want to introduce new behaviour to the
block layer, you must do it consistently and not just for one block
driver. So these changes should be made to the generic functions in
block.c instead.

Kevin


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11  8:36     ` [Qemu-devel] " Kevin Wolf
@ 2010-05-11 10:55       ` Christoph Hellwig
  2010-05-11 12:15         ` Paul Brook
  2010-05-14  9:16         ` Markus Armbruster
  2010-05-11 19:04       ` Avi Kivity
  2010-05-12 15:05       ` Alexander Graf
  2 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-11 10:55 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Alexander Graf, armbru

On Tue, May 11, 2010 at 10:36:24AM +0200, Kevin Wolf wrote:
> Am 10.05.2010 23:51, schrieb Alexander Graf:
> > Usually the guest can tell the host to flush data to disk. In some cases we
> > don't want to flush though, but try to keep everything in cache.
> > 
> > So let's add a new parameter to -drive that allows us to set the flushing
> > behavior to "on" or "off", defaulting to enabling the guest to flush.
> > 
> > Signed-off-by: Alexander Graf <agraf@suse.de>
> 
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.

The cache option really isn't too useful.  There's a matrix of 3x2
possible I/O modes for the posix backend, and right now we only expose
two of them.  I think we really should expose all of them, split into
two options:

                                      caching

                    | normal             | O_DIRECT           |
            --------+--------------------+--------------------+
            none    | -                  | -                  |
integrity   O_DSYNC | cache=writethrough | -                  |
            fsync   | cache=writeback    | cache=none         |
            --------+--------------------+--------------------+

While we have to keep the cache= option for legacy purposes we're
much better to have two new options for the I/O mode and integrity
to express everything we need.  Especially given that I'm looking
into allowing the guest to tweak the volatile write cache setting
of the emulated disk, which will allow us to turn O_SYNC on/off
on the fly.  In the current scheme that would introduce an
O_DSYNC|O_DIRECT mode that's not currently accessible through the
command line/monitor.

Markus was looking into separating the block device state in host/
guest portions lately, and splitting cache= would help with this.
Driver cache enabled or not (fsync means enabled, O_DSYNC disabled,
and none probably disabled given that we don't care), while O_DIRECT
vs normal cached I/O is host state.


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 10:55       ` Christoph Hellwig
@ 2010-05-11 12:15         ` Paul Brook
  2010-05-11 12:43           ` Anthony Liguori
                             ` (2 more replies)
  2010-05-14  9:16         ` Markus Armbruster
  1 sibling, 3 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 12:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, armbru, Christoph Hellwig, Alexander Graf

> > What about another cache=... value instead of adding more options? I'm
> > quite sure you'll only ever need this with writeback caching. So we
> > could have cache=none|writethrough|writeback|wb-noflush or something
> > like that.

I agree.
 
> The cache option really isn't too useful.  There's a matrix of 3x2
> possible I/O modes for the posix backend, and right now we only expose
> two of them.  I think we really should expose all of them, split into
> two options:
> 
>                                       caching
> 
>                     | normal             | O_DIRECT           |
> 
>             --------+--------------------+--------------------+
>             none    | -                  | -                  |
> integrity   O_DSYNC | cache=writethrough | -                  |
>             fsync   | cache=writeback    | cache=none         |
>             --------+--------------------+--------------------+

I think this table is misleading, and from a user perspective there are only 4 
interesting combinations:

cache=none:
  No host caching. Reads and writes both go directly to underlying storage. 
Useful to avoid double-caching.

cache=writethrough
  Reads are cached. Writes go directly to underlying storage.  Useful for 
broken guests that aren't aware of drive caches.

cache=writeback
  Reads and writes are cached. Guest flushes are honoured. Note that this may 
include a guest visible knob that allows the write cache to be disabled. If 
the guest requests the cache be disabled then this should act the same as 
cache=writethrough. Closest to how real hardware works.

cache=always (or a more scary name like cache=lie to defend against idiots)
  Reads and writes are cached. Guest flushes are ignored.  Useful for dumb 
guests in non-critical environments.

IMO the other boxes in your table (disable cache but require guest flushes) 
don't make any sense.

Paul


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 12:15         ` Paul Brook
@ 2010-05-11 12:43           ` Anthony Liguori
  2010-05-11 13:12             ` Paul Brook
  2010-05-11 15:18           ` Alexander Graf
  2010-05-11 18:20           ` Jamie Lokier
  2 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 12:43 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru

On 05/11/2010 07:15 AM, Paul Brook wrote:
>>> What about another cache=... value instead of adding more options? I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>        
> I think this table is misleading, and from a user perspective there are only 4
> interesting combinations:
>    

I agree that splitting cache would be unfortunate for users.

> cache=none:
>    No host caching. Reads and writes both go directly to underlying storage.
> Useful to avoid double-caching.
>
> cache=writethrough
>    Reads are cached. Writes go directly to underlying storage.  Useful for
> broken guests that aren't aware of drive caches.
>
> cache=writeback
>    Reads and writes are cached. Guest flushes are honoured. Note that this may
> include a guest visible knob that allows the write cache to be disabled. If
> the guest requests the cache be disabled then this should act the same as
> cache=writethrough. Closest to how real hardware works.
>
> cache=always (or a more scary name like cache=lie to defend against idiots)
>    Reads and writes are cached. Guest flushes are ignored.  Useful for dumb
> guests in non-critical environments.
>    

I really don't believe that we should support a cache=lie.  There are 
many other ways to obtain the same results.  For instance, mount your 
guest filesystem with barrier=0.
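The guest-side alternative Anthony suggests is a mount option rather than a qemu switch; for an ext3/ext4 filesystem inside the guest it would look roughly like this (device and mount point are hypothetical):

```shell
# Inside the guest, not on the host: disable write barriers for one fs.
mount -o remount,barrier=0 /dev/vda /mnt

# Or persistently, via the guest's /etc/fstab:
#   /dev/vda  /mnt  ext3  defaults,barrier=0  0 0
```

This changes one guest filesystem at a time, whereas flush=off (or mounting the host filesystem that holds the image with barrier=0) acts below every filesystem in the image.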

Introducing a mode where data integrity isn't preserved means that it's 
possible for someone to accidentally use it or use it when they really 
shouldn't.  People already struggle with what caching mode they should 
use and this would just make the problem worse.

Regards,

Anthony Liguori

> IMO the other boxes in your table (disable cache but require guest flushes)
> don't make any sense.
>
> Paul
>
>    


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 12:43           ` Anthony Liguori
@ 2010-05-11 13:12             ` Paul Brook
  2010-05-11 13:20               ` Anthony Liguori
  0 siblings, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 13:12 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru

> > cache=always (or a more scary name like cache=lie to defend against
> > idiots)
> > 
> >    Reads and writes are cached. Guest flushes are ignored.  Useful for
> >    dumb guests in non-critical environments.
> 
> I really don't believe that we should support a cache=lie.  There are
> many other ways to obtain the same results.  For instance, mount your guest
> filesystem with barrier=0.

Ideally yes. However in practice I suspect this is still a useful option. Is 
it even possible to disable barriers in all cases (e.g. NTFS under windows)?

In a production environment it's probably not so useful - you're generally 
dealing with long lived, custom configured guests.

In a development environment the rules can be a bit different. For example if 
you're testing an OS installer then you really don't want to be passing magic 
mount options. If the host machine dies then you don't care about the state of 
the guest because you're going to start from scratch anyway.

Paul


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 13:12             ` Paul Brook
@ 2010-05-11 13:20               ` Anthony Liguori
  2010-05-11 13:50                 ` Paul Brook
  2010-05-11 16:32                 ` Jamie Lokier
  0 siblings, 2 replies; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 13:20 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Christoph Hellwig

On 05/11/2010 08:12 AM, Paul Brook wrote:
>>> cache=always (or a more scary name like cache=lie to defend against
>>> idiots)
>>>
>>>     Reads and writes are cached. Guest flushes are ignored.  Useful for
>>>     dumb guests in non-critical environments.
>>>        
>> I really don't believe that we should support a cache=lie.  There are
>> many other ways to obtain the same results.  For instance, mount your guest
>> filesystem with barrier=0.
>>      
> Ideally yes. However in practice I suspect this is still a useful option. Is
> it even possible to disable barriers in all cases (e.g. NTFS under windows)?
>
> In a production environment it's probably not so useful - you're generally
> dealing with long lived, custom configured guests.
>
> In a development environment the rules can be a bit different. For example if
> you're testing an OS installer then you really don't want to be passing magic
> mount options. If the host machine dies then you don't care about the state of
> the guest because you're going to start from scratch anyway.
>    

Then create a mount point on your host and mount the host file system 
under that mount with barrier=0.

The problem with options added for developers is that those options are 
very often accidentally used for production.

Regards,

Anthony Liguori

> Paul
>
>    


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 13:20               ` Anthony Liguori
@ 2010-05-11 13:50                 ` Paul Brook
  2010-05-11 15:40                   ` Anthony Liguori
  2010-05-11 16:32                 ` Jamie Lokier
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 13:50 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig

> > In a development environment the rules can be a bit different. For
> > example if you're testing an OS installer then you really don't want to
> > be passing magic mount options. If the host machine dies then you don't
> > care about the state of the guest because you're going to start from
> > scratch anyway.
> 
> Then create a mount point on your host and mount the host file system
> under that mount with barrier=0.
> 
> The problem with options added for developers is that those options are
> very often accidentally used for production.

I disagree.  We should not be removing or rejecting features just because they 
allow you to shoot yourself in the foot.  We probably shouldn't be enabling 
them by default, but that's a whole different question.

Mounting a filesystem with barrier=0 is not a good answer because it's a 
global setting.  While my qemu VM may be disposable, it's unlikely that the 
same is true of the rest of my machine.  By your argument linux shouldn't be 
allowing me to do that in the first place because a dumb sysadmin could use 
that option on the filesystem containing the mail store. In fact with the 
average user that's *likely* to happen because they'll only have a single 
partition on their machine.

Paul

N.B. in this context "developers" refers to users who happen to be using qemu 
as part of a toolset for development of $whatever, not development of qemu 
itself. c.f. "production" where qemu is part of the IT infrastructure used to 
directly provide an external service.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 12:15         ` Paul Brook
  2010-05-11 12:43           ` Anthony Liguori
@ 2010-05-11 15:18           ` Alexander Graf
  2010-05-11 18:20           ` Jamie Lokier
  2 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-11 15:18 UTC (permalink / raw)
  To: Paul Brook; +Cc: Kevin Wolf, armbru, qemu-devel, Christoph Hellwig

Paul Brook wrote:
>>> What about another cache=... value instead of adding more options? I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>       
>
> I agree.
>  
>   
>> The cache option really isn't too useful.  There's a matrix of 3x2
>> possible I/O modes for the posix backend, and right now we only expose
>> two of them.  I think we really should expose all of them, split into
>> two options:
>>
>>                                       caching
>>
>>                     | normal             | O_DIRECT           |
>>             --------+--------------------+--------------------+
>>             none    | -                  | -                  |
>> integrity   O_DSYNC | cache=writethrough | -                  |
>>             fsync   | cache=writeback    | cache=none         |
>>             --------+--------------------+--------------------+
>>     
>
> I think this table is misleading, and from a user perspective there are only 4 
> interesting combinations:
>
> cache=none:
>   No host caching. Reads and writes both go directly to underlying storage. 
> Useful to avoid double-caching.
>
> cache=writethrough
>   Reads are cached. Writes go directly to underlying storage.  Useful for 
> broken guests that aren't aware of drive caches.
>
> cache=writeback
>   Reads and writes are cached. Guest flushes are honoured. Note that this may 
> include a guest visible knob that allows the write cache to be disabled. If 
> the guest requests the cache be disabled then this should act the same as 
> cache=writethrough. Closest to how real hardware works.
>
> cache=always (or a more scary name like cache=lie to defend against idiots)
>   Reads and writes are cached. Guest flushes are ignored.  Useful for dumb 
> guests in non-critical environments.
>   

How about cache=insecure?

But I agree. Having that as a cache= option seems to be the easiest way
to expose it to a user.


Alex

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 13:50                 ` Paul Brook
@ 2010-05-11 15:40                   ` Anthony Liguori
  2010-05-11 15:53                     ` Paul Brook
  2010-05-11 19:11                     ` Avi Kivity
  0 siblings, 2 replies; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 15:40 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig

On 05/11/2010 08:50 AM, Paul Brook wrote:
>>> In a development environment the rules can be a bit different. For
>>> example if you're testing an OS installer then you really don't want to
>>> be passing magic mount options. If the host machine dies then you don't
>>> care about the state of the guest because you're going to start from
>>> scratch anyway.
>>>        
>> Then create a mount point on your host and mount the host file system
>> under that mount with barrier=0.
>>
>> The problem with options added for developers is that those options are
>> very often accidentally used for production.
>>      
> I disagree.  We should not be removing or rejecting features just because they
> allow you to shoot yourself in the foot.  We probably shouldn't be enabling
> them by default, but that's a whole different question.
>    

I disagree and think the mentality severely hurts usability.  QEMU's 
role should be to enable features, not to simplify every obscure 
feature.  In general, if someone wants to accomplish something, we 
should try to provide a mechanism to accomplish it.  
cache=none|writeback|writethrough is an example of this.  No one other 
than QEMU can control how we open a file descriptor so we need to 
provide a knob for it.

But if the goal is to make sure that fsync's don't result in data 
actually being on disk, there are many other ways to accomplish this.  
First, for the vast majority of users, this is already the case because 
ext3 defaults to disabling barriers.  Alex is running into this issue 
only because SuSE enables barriers by default for ext3 and fsync()'s are 
horribly slow on ext3.  The fact that this is measurable is purely a 
statement of ext3 suckage and a conscious decision by the SuSE team to 
live with that suckage.  It shouldn't be nearly as bad on ext4.

But even in this context, it's not a huge burden to disable barriers 
within the guest or to create a dedicated filesystem mount for your 
images on the host that disables barriers on the mount.

On the other hand, the cost of adding another caching option is pretty 
significant.  Very few users really understand what caching options they 
should use under what circumstance.  Adding another option will just 
make that worse.  The fact that this new option can result in data 
corruption that won't occur under normal circumstances is troubling.

> Mounting a filesystem with barrier=0 is not a good answer because it's a
> global setting.  While my qemu VM may be disposable, it's unlikely that the
> same is true of the rest of my machine.

You can have multiple mounts.  In fact, you can just make a loopback 
mount within your existing file system which means that you don't even 
have to muck with your disk.

If your VM is disposable, then shouldn't you be using -snapshot?


>    By your argument linux shouldn't be
> allowing me to do that in the first place because a dumb sysadmin could use
> that option on the filesystem containing the mail store. In fact with the
> average user that's *likely* to happen because they'll only have a single
> partition on their machine.
>    

We aren't preventing sophisticated users from doing sophisticated 
things.  But why should we simplify something that most people don't 
need, and that will cause bad things to happen if accidentally used?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 15:40                   ` Anthony Liguori
@ 2010-05-11 15:53                     ` Paul Brook
  2010-05-11 17:09                       ` Anthony Liguori
  2010-05-11 19:11                     ` Avi Kivity
  1 sibling, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 15:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig

> > I disagree.  We should not be removing or rejecting features just because
> > they allow you to shoot yourself in the foot.  We probably shouldn't be
> > enabling them by default, but that's a whole different question.
> 
> I disagree and think the mentality severely hurts usability.  QEMU's
> role should be to enable features, not to simplify every obscure
> feature.  In general, if someone wants to accomplish something, we
> should try to provide a mechanism to accomplish it.
> cache=none|writeback|writethrough is an example of this.  No one other
> than QEMU can control how we open a file descriptor so we need to
> provide a knob for it.

Doesn't the same argument apply to the existing cache=writethrough?
i.e. if you want to avoid data loss you should make sure your guest issues 
flushes properly, and it's not something qemu should be trying to hack 
around by adding an implicit flush after every write.

Paul

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 13:20               ` Anthony Liguori
  2010-05-11 13:50                 ` Paul Brook
@ 2010-05-11 16:32                 ` Jamie Lokier
  2010-05-11 17:15                   ` Anthony Liguori
  1 sibling, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 16:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
	Christoph Hellwig

Anthony Liguori wrote:
> On 05/11/2010 08:12 AM, Paul Brook wrote:
> >>>cache=always (or a more scary name like cache=lie to defend against
> >>>idiots)
> >>>
> >>>    Reads and writes are cached. Guest flushes are ignored.  Useful for
> >>>    dumb guests in non-critical environments.
> >>>       
> >>I really don't believe that we should support a cache=lie.  There are
> >>many other ways to obtain the same results.  For instance, mount your
> >>guest filesystem with barrier=0.
> >>
> >Ideally yes. However in practice I suspect this is still a useful option.
> >Is it even possible to disable barriers in all cases (e.g. NTFS under
> >windows)?
> >
> >In a production environment it's probably not so useful - you're generally
> >dealing with long lived, custom configured guests.
> >
> >In a development environment the rules can be a bit different. For example
> >if you're testing an OS installer then you really don't want to be passing
> >magic mount options. If the host machine dies then you don't care about
> >the state of the guest because you're going to start from scratch anyway.
> >
> 
> Then create a mount point on your host and mount the host file system 
> under that mount with barrier=0.

Two reasons that advice doesn't work:

1. It doesn't work in many environments.  You can't mount a filesystem
with barrier=0 in one place and barrier=1 at a different mount point, and
there's often only one host partition.

2. barrier=0 does _not_ provide the cache=off behaviour.  It only
disables barriers; it does not prevent writing to the disk hardware.

If you are doing a transient OS install, ideally you want up to a free
RAM's worth of data to stay unwritten to disk until the end.  barrier=0
does not achieve that.

> The problem with options added for developers is that those options are 
> very often accidentally used for production.

We already have risky cache= options.  Also, do we call fdatasync
(with barrier) on _every_ write for guests which disable the
emulated disk cache?

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 15:53                     ` Paul Brook
@ 2010-05-11 17:09                       ` Anthony Liguori
  2010-05-11 22:33                         ` Paul Brook
  0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 17:09 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig

On 05/11/2010 10:53 AM, Paul Brook wrote:
>>> I disagree.  We should not be removing or rejecting features just because
>>> they allow you to shoot yourself in the foot.  We probably shouldn't be
>>> enabling them by default, but that's a whole different question.
>>>        
>> I disagree and think the mentality severely hurts usability.  QEMU's
>> role should be to enable features, not to simplify every obscure
>> feature.  In general, if someone wants to accomplish something, we
>> should try to provide a mechanism to accomplish it.
>> cache=none|writeback|writethrough is an example of this.  No one other
>> than QEMU can control how we open a file descriptor so we need to
>> provide a knob for it.
>>      
> Doesn't the same argument apply to the existing cache=writethrough?
> i.e. if you want to avoid data loss you should make sure your guest issues
> flushes properly, and it's not something qemu should be trying to hack
> around by adding an implicit flush after every write.
>    

cache is the host page cache acting as an extended disk cache.  In 
writethrough mode, the behavior is identical to writethrough on a normal 
disk cache in that all operations are completed only when sent down to 
the next storage layer.  In writeback mode, you rely on the integrity of 
the host page cache to prevent data loss.

Data loss is different than data corruption though.  In neither mode 
will a correct application experience data corruption because if a guest 
issues a flush, we respect those requests.  However, the new proposed 
mode would introduce the possibility of data corruption because it 
becomes possible for writes to be written to disk outside the order 
requested by the guest.  In the event of power loss, the result is 
corruption.

Regards,

Anthony Liguori

> Paul
>    

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 16:32                 ` Jamie Lokier
@ 2010-05-11 17:15                   ` Anthony Liguori
  2010-05-11 18:13                     ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 17:15 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
	Christoph Hellwig

On 05/11/2010 11:32 AM, Jamie Lokier wrote:
> Anthony Liguori wrote:
>    
>> On 05/11/2010 08:12 AM, Paul Brook wrote:
>>      
>>>>> cache=always (or a more scary name like cache=lie to defend against
>>>>> idiots)
>>>>>
>>>>>     Reads and writes are cached. Guest flushes are ignored.  Useful for
>>>>>     dumb guests in non-critical environments.
>>>>>
>>>>>            
>>>> I really don't believe that we should support a cache=lie.  There are
>>>> many other ways to obtain the same results.  For instance, mount your
>>>> guest filesystem with barrier=0.
>>>>
>>> Ideally yes. However in practice I suspect this is still a useful option.
>>> Is it even possible to disable barriers in all cases (e.g. NTFS under
>>> windows)?
>>>
>>> In a production environment it's probably not so useful - you're generally
>>> dealing with long lived, custom configured guests.
>>>
>>> In a development environment the rules can be a bit different. For example
>>> if you're testing an OS installer then you really don't want to be passing
>>> magic mount options. If the host machine dies then you don't care about
>>> the state of the guest because you're going to start from scratch anyway.
>>>
>>>        
>> Then create a mount point on your host and mount the host file system
>> under that mount with barrier=0.
>>      
> Two reasons that advice doesn't work:
>
> 1. It doesn't work in many environments.  You can't mount a filesystem
> with barrier=0 in one place and barrier=1 at a different mount point, and
> there's often only one host partition.
>    

qemu-img create -f raw foo.img 10G
mkfs.ext3 foo.img
mount -oloop,rw,barrier=1 -t ext3 foo.img mnt

Works perfectly fine.

> 2. barrier=0 does _not_ provide the cache=off behaviour.  It only
> disables barriers; it does not prevent writing to the disk hardware.
>    

The proposal has nothing to do with cache=off.

>> The problem with options added for developers is that those options are
>> very often accidentally used for production.
>>      
> We already have risky cache= options.  Also, do we call fdatasync
> (with barrier) on _every_ write for guests which disable the
> emulated disk cache?
>    

None of our cache= options should result in data corruption on power 
loss.  If they do, it's a bug.

Regards,

Anthony Liguori

> -- Jamie
>    

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 17:15                   ` Anthony Liguori
@ 2010-05-11 18:13                     ` Jamie Lokier
  0 siblings, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 18:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
	Christoph Hellwig

Anthony Liguori wrote:
> qemu-img create -f raw foo.img 10G
> mkfs.ext3 foo.img
> mount -oloop,rw,barrier=1 -t ext3 foo.img mnt
> 
> Works perfectly fine.

Hmm, interesting.  Didn't know loop propagated barriers.

So you're suggesting using qemu with a loop device, ext2 (a bit
faster than ext3) with barrier=0 (well, that's implied if you use
ext2), and a raw image file on the ext2/3 filesystem, to provide the
effect of flush=off, because the loop device caches block writes on
the host, except for explicit barrier requests from the fs, which are
turned off?

That wasn't obvious the first time :-)

Does the loop device cache fs writes instead of propagating them
immediately to the underlying fs?  I guess it probably does.

Does the loop device allow the backing file to grow sparsely, to get
behaviour like qcow2?

That's ugly but it might just work.

> >2. barrier=0 does _not_ provide the cache=off behaviour.  It only
> >disables barriers; it does not prevent writing to the disk hardware.
> 
> The proposal has nothing to do with cache=off.

Sorry, I meant flush=off (the proposal).  Mounting the host filesystem
(i.e. not using a loop device anywhere) with barrier=0 doesn't have
even close to the same effect.

> >>The problem with options added for developers is that those options are
> >>very often accidentally used for production.
> >>     
> >We already have risky cache= options.  Also, do we call fdatasync
> >(with barrier) on _every_ write for guests which disable the
> >emulated disk cache?
> 
> None of our cache= options should result in data corruption on power 
> loss.  If they do, it's a bug.

(I might have the details below a bit off.)

If cache=none uses O_DIRECT without calling fdatasync for guest
barriers, then it will get data corruption on power loss.

If cache=none does call fdatasync for guest barriers, then it might
still get corruption on power loss; I am not sure if recent Linux host
behaviour of O_DIRECT+fdatasync (with no buffered writes to commit)
issues the necessary barriers.  I am quite sure that older kernels did not.

cache=writethrough will get data corruption on power loss with older
Linux host kernels.  O_DSYNC did not issue barriers.  I'm not sure if
the behaviour of O_DSYNC that was recently changed is now issuing
barriers after every write.

Provided all the cache= options call fdatasync/fsync when the guest
issues a cache flush, and call fdatasync/fsync following _every_ write
when the guest has disabled the emulated write cache, that should be
as good as Qemu can reasonably do.  It's up to the host from there.

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 12:15         ` Paul Brook
  2010-05-11 12:43           ` Anthony Liguori
  2010-05-11 15:18           ` Alexander Graf
@ 2010-05-11 18:20           ` Jamie Lokier
  2010-05-11 21:58             ` Paul Brook
  2 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 18:20 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru

Paul Brook wrote:
> cache=none:
>   No host caching. Reads and writes both go directly to underlying storage. 
> Useful to avoid double-caching.
> 
> cache=writethrough
>   Reads are cached. Writes go directly to underlying storage.  Useful for 
> broken guests that aren't aware of drive caches.

These are misleading descriptions - because cache=none does not push
writes down to powerfail-safe storage, while cache=writethrough might.

> cache=always (or a more scary name like cache=lie to defend against idiots)
>   Reads and writes are cached. Guest flushes are ignored.  Useful for dumb 
> guests in non-critical environments.

cache=unsafe would tell it like it is.

Even non-idiots could be excused for getting the wrong impression from
cache=always.

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11  8:36     ` [Qemu-devel] " Kevin Wolf
  2010-05-11 10:55       ` Christoph Hellwig
@ 2010-05-11 19:04       ` Avi Kivity
  2010-05-12 15:05       ` Alexander Graf
  2 siblings, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2010-05-11 19:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: hch, Alexander Graf, qemu-devel

On 05/11/2010 11:36 AM, Kevin Wolf wrote:
> Am 10.05.2010 23:51, schrieb Alexander Graf:
>    
>> Usually the guest can tell the host to flush data to disk. In some cases we
>> don't want to flush though, but try to keep everything in cache.
>>
>> So let's add a new parameter to -drive that allows us to set the flushing
>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>
>> Signed-off-by: Alexander Graf<agraf@suse.de>
>>      
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.
>    

I implemented this once and promptly forgot about it, as cache=volatile.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 15:40                   ` Anthony Liguori
  2010-05-11 15:53                     ` Paul Brook
@ 2010-05-11 19:11                     ` Avi Kivity
  1 sibling, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2010-05-11 19:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, armbru, Alexander Graf, qemu-devel, Paul Brook,
	Christoph Hellwig

On 05/11/2010 06:40 PM, Anthony Liguori wrote:
>
> But if the goal is to make sure that fsync's don't result in data 
> actually being on disk, there are many other ways to accomplish this.  
> First, for the vast majority of users, this is already the case 
> because ext3 defaults to disabling barriers.  Alex is running into 
> this issue only because SuSE enables barriers by default for ext3 and 
> fsync()'s are horribly slow on ext3.  The fact that this is measurable 
> is purely a statement of ext3 suckage and a conscious decision by the 
> SuSE team to live with that suckage.  It shouldn't be nearly as bad on 
> ext4.
>

There is a huge difference between disabling barriers and cache=volatile.

With barrier=0, data is still forced out of host pagecache and into disk 
cache; it's simply not forced from disk cache to the platter.  But since 
the disk cache is very limited, you'll still be running at disk speed.

With cache=volatile and enough memory you'll never wait for the disk to 
write data.

>
> On the other hand, the cost of adding another caching option is pretty 
> significant.  Very few users really understand what caching options 
> they should use under what circumstance.  Adding another option will 
> just make that worse.  The fact that this new option can result in 
> data corruption that won't occur under normal circumstances is troubling.

I don't think it's a real problem with proper naming, but we can always 
hide the option behind a ./configure --enable-developer-options.

>
>> Mounting a filesystem with barrier=0 is not a good answer because it's a
>> global setting.  While my qemu VM may be disposable, it's unlikely that
>> the same is true of the rest of my machine.
>
> You can have multiple mounts.  In fact, you can just make a loopback 
> mount within your existing file system which means that you don't even 
> have to muck with your disk.
>
> If your VM is disposable, then shouldn't you be using -snapshot?

For my use case (autotest) the VM is not disposable (it's reused between 
tests) but I don't care about the data in case of a host crash.

>
>
>>    By your argument linux shouldn't be
>> allowing me to do that in the first place because a dumb sysadmin could
>> use that option on the filesystem containing the mail store. In fact with
>> the average user that's *likely* to happen because they'll only have a
>> single partition on their machine.
>
> We aren't preventing sophisticated users from doing sophisticated 
> things.  But why should we simplify something that most people don't 
> need, and that will cause bad things to happen if accidentally used?

We don't have a real alternative.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-10 22:12     ` Anthony Liguori
@ 2010-05-11 21:48       ` Jamie Lokier
  2010-05-12  8:51         ` Stefan Hajnoczi
  0 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 21:48 UTC (permalink / raw)
  To: Anthony Liguori, g; +Cc: kwolf, hch, Alexander Graf, qemu-devel

Anthony Liguori wrote:
> There's got to be a better place to fix this.  Disable barriers in your 
> guests?

If only it were that easy.

OS installs are the thing that this feature would most help.  They
take ages, do a huge amount of writing with lots of seeking, and if
the host fails you're going to discard the image.

I'm not sure how I would change that setting for most OS install GUIs,
especially Windows, or if it's even possible.

It's usually much easier to change barrier settings after installing
and you've got a command line or registry editing tool.  But by then,
it's not useful any more.

Any other ideas?

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 18:20           ` Jamie Lokier
@ 2010-05-11 21:58             ` Paul Brook
  2010-05-11 22:11               ` Paul Brook
  0 siblings, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 21:58 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru

> Paul Brook wrote:
> > cache=none:
> >   No host caching. Reads and writes both go directly to underlying
> >   storage. Useful to avoid double-caching.
> >
> > cache=writethrough
> >   Reads are cached. Writes go directly to underlying storage.  Useful for
> >   broken guests that aren't aware of drive caches.
> 
> These are misleading descriptions - because cache=none does not push
> writes down to powerfail-safe storage, while cache=writethrough might.

If so, then this is a serious bug.

Paul

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 21:58             ` Paul Brook
@ 2010-05-11 22:11               ` Paul Brook
  2010-05-12 10:09                 ` Jamie Lokier
  2010-05-17 12:40                 ` Christoph Hellwig
  0 siblings, 2 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 22:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, armbru, Alexander Graf, Christoph Hellwig

> > Paul Brook wrote:
> > > cache=none:
> > >   No host caching. Reads and writes both go directly to underlying
> > >   storage. Useful to avoid double-caching.
> > >
> > > cache=writethrough
> > >   Reads are cached. Writes go directly to underlying storage.  Useful
> > >   for broken guests that aren't aware of drive caches.
> > 
> > These are misleading descriptions - because cache=none does not push
> > writes down to powerfail-safe storage, while cache=writethrough might.
> 
> If so, then this is a serious bug.

.. though it may be a kernel bug rather than a qemu bug, depending on the 
exact details.  Either way, I consider any mode that inhibits host filesystem 
write cache but not volatile drive cache to be pretty worthless.  Either we 
guarantee data integrity on completion or we don't.

Paul

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 17:09                       ` Anthony Liguori
@ 2010-05-11 22:33                         ` Paul Brook
  0 siblings, 0 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 22:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Christoph Hellwig, armbru, Alexander Graf

> On 05/11/2010 10:53 AM, Paul Brook wrote:
> >>> I disagree.  We should not be removing or rejecting features just
> >>> because they allow you to shoot yourself in the foot.  We probably
> >>> shouldn't be enabling them by default, but that's a whole different
> >>> question.
> >> 
> >> I disagree and think the mentality severely hurts usability.  QEMU's
> >> role should be to enable features, not to simplify every obscure
> >> feature.  In general, if someone wants to accomplish something, we
> >> should try to provide a mechanism to accomplish it.
> >> cache=none|writeback|writethrough is an example of this.  No one other
> >> than QEMU can control how we open a file descriptor so we need to
> >> provide a knob for it.
> > 
> > Doesn't the same argument apply to the existing cache=writethrough?
> > i.e. if you want to avoid data loss you should make sure your guest
> > issues flushes properly, and it's not something qemu should be trying to
> > hack around by adding an implicit flush after every write.
> 
> cache is the host page cache acting as an extended disk cache.  In
> writethrough mode, the behavior is identical to writethrough on a normal
> disk cache in that all operations are completed only when sent down to
> the next storage layer.

IMO this is a bug. Making host pagecache writethrough but still having a 
volatile writeback disk cache seems like a complete waste of time. I can see 
the advantage of disabling host pagecache (avoid double caching in host RAM), 
but having different levels of cache be writethrough/writeback seems extremely 
suspect.

It's also occurred to me that you're basing your arguments on the 
assumption that host pagecache is volatile.  On a machine with a good UPS this 
is not true. In the event of external power failure the UPS will flush the host 
page cache and cleanly shut the machine down. As with battery-backed RAID 
cards, it's entirely reasonable to consider the cache to be non-volatile 
storage and ignore the flush requests.

If you don't trust your host OS in this situation then you're into a whole 
different level of pain, and it raises obvious questions about the firmware 
running on your storage subsystem.

Paul

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-11 21:48       ` Jamie Lokier
@ 2010-05-12  8:51         ` Stefan Hajnoczi
  2010-05-12  9:42           ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-12  8:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kwolf, g, hch, qemu-devel

Why add a nop AIO operation instead of setting
BlockDriverState->enable_write_cache to zero?  In that case no write
cache would be reported to the guest (just like cache=writethrough).

Stefan


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-12  8:51         ` Stefan Hajnoczi
@ 2010-05-12  9:42           ` Jamie Lokier
  2010-05-12 10:43             ` Stefan Hajnoczi
  0 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12  9:42 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch

Stefan Hajnoczi wrote:
> Why add a nop AIO operation instead of setting
> BlockDriverState->enable_write_cache to zero?  In that case no write
> cache would be reported to the guest (just like cache=writethrough).

Hmm.  If the guest sees write cache absent, that prevents changing the
cache policy on the host later (from not flushing to flushing), which
you might want to do after an OS install has finished and booted up.

-- Jamie


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 22:11               ` Paul Brook
@ 2010-05-12 10:09                 ` Jamie Lokier
  2010-05-17 12:40                 ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12 10:09 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, armbru, Christoph Hellwig, qemu-devel, Alexander Graf

Paul Brook wrote:
> > > Paul Brook wrote:
> > > > cache=none:
> > > >   No host caching. Reads and writes both go directly to underlying
> > > >   storage.
> > > > 
> > > > Useful to avoid double-caching.
> > > > 
> > > > cache=writethrough
> > > > 
> > > >   Reads are cached. Writes go directly to underlying storage.  Useful
> > > >   for
> > > > 
> > > > broken guests that aren't aware of drive caches.
> > > 
> > > These are misleading descriptions - because cache=none does not push
> > > writes down to powerfail-safe storage, while cache=writethrough might.
> > 
> > If so, then this is a serious bug.
> 
> .. though it may be a kernel bug rather than a qemu bug, depending on the 
> exact details.

It's not a kernel bug.  cache=none uses O_DIRECT, and O_DIRECT must
not force writes to powerfail-safe storage.  If it did, it would be
unusably slow for applications using O_DIRECT as a performance
enhancer / memory saver.  They can call fsync/fdatasync when they need
to for integrity.  (There might be kernel bugs in the latter department.)

> Either way, I consider any mode that inhibits host filesystem write
> cache but not volatile drive cache to be pretty worthless.

On the contrary, it greatly reduces host memory consumption so that
guest data isn't cached twice (it's already cached in the guest), and
it may improve performance by relaxing the POSIX write-serialisation
constraint (not sure if Linux cares; Solaris does).

> Either we guarantee data integrity on completion or we don't.

The problem with the description of cache=none is that it uses O_DIRECT,
which does not always push writes to powerfail-safe storage.

O_DIRECT is effectively a hint.  It requests less caching in kernel
memory, may reduce memory usage and copying, and may invoke direct DMA.

O_DIRECT does not tell the disk hardware to commit to powerfail-safe
storage.  I.e. it doesn't issue barriers or disable disk write caching.
(However, depending on the host setup, it might have that effect if the
disk write cache has been disabled by the admin.)

Also, it doesn't even always write to disk: it falls back to buffered
I/O in some circumstances, even on filesystems which support it - see
recent patches for btrfs which use buffered I/O for O_DIRECT for some
parts of some files.  (Many non-Linux OSes fall back to buffered
when any other process holds a non-O_DIRECT file descriptor, or when
requests don't meet some criteria).

The POSIX thing to use for cache=none would be O_DSYNC|O_RSYNC, and
that should work on some hosts, but Linux doesn't implement real O_RSYNC.

A combination which ought to work is O_DSYNC|O_DIRECT.  O_DIRECT is
the performance hint; O_DSYNC provides the commit request.  Christoph
Hellwig has mentioned that combination elsewhere on this thread.
It makes sense to me for cache=none.

O_DIRECT by itself is a useful performance & memory hint, so there
does need to be some option which maps onto O_DIRECT alone.  But it
shouldn't be documented as stronger than cache=writethrough, because
it isn't.

--  Jamie


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-12  9:42           ` Jamie Lokier
@ 2010-05-12 10:43             ` Stefan Hajnoczi
  2010-05-12 12:50               ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-12 10:43 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch

On Wed, May 12, 2010 at 10:42 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Stefan Hajnoczi wrote:
>> Why add a nop AIO operation instead of setting
>> BlockDriverState->enable_write_cache to zero?  In that case no write
>> cache would be reported to the guest (just like cache=writethrough).
>
> Hmm.  If the guest sees write cache absent, that prevents changing the
> cache policy on the host later (from not flushing to flushing), which
> you might want to do after an OS install has finished and booted up.

Right.  There are 3 cases from the guest perspective:

1. Disable write cache or no write cache.  Flushing not needed.
2. Disable flushing but leave write cache enabled.
3. Enable write cache and use flushing.

When we don't report a write cache at all, the guest is always stuck at 1.

If you're going to do this for installs and other temporary workloads,
then enabling the write cache again isn't an issue.  After installing
successfully, restart the guest with a sane cache= mode.

Stefan


* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
  2010-05-12 10:43             ` Stefan Hajnoczi
@ 2010-05-12 12:50               ` Jamie Lokier
  0 siblings, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12 12:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch

Stefan Hajnoczi wrote:
> On Wed, May 12, 2010 at 10:42 AM, Jamie Lokier <jamie@shareable.org> wrote:
> > Stefan Hajnoczi wrote:
> >> Why add a nop AIO operation instead of setting
> >> BlockDriverState->enable_write_cache to zero?  In that case no write
> >> cache would be reported to the guest (just like cache=writethrough).
> >
> > Hmm.  If the guest sees write cache absent, that prevents changing the
> > cache policy on the host later (from not flushing to flushing), which
> > you might want to do after an OS install has finished and booted up.
> 
> Right.  There are 3 cases from the guest perspective:
> 
> 1. Disable write cache or no write cache.  Flushing not needed.
> 2. Disable flushing but leave write cache enabled.
> 3. Enable write cache and use flushing.
> 
> When we don't report a write cache at all, the guest is always stuck at 1.
> 
> If you're going to do this for installs and other temporary workloads,
> then enabling the write cache again isn't an issue.  After installing
> successfully, restart the guest with a sane cache= mode.

That only works if you're happy to reboot the guest after the process
finishes.  I guess that is usually fine, but it is a restriction.

Is it possible via QMP to request that the guest be paused when it
next reboots, so that QMP operations to change the cache= mode can be
done?  (It's not safe to change the guest-visible disk write cache
availability while the guest is running, and a request to do so should
probably be denied.)

-- Jamie


* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11  8:36     ` [Qemu-devel] " Kevin Wolf
  2010-05-11 10:55       ` Christoph Hellwig
  2010-05-11 19:04       ` Avi Kivity
@ 2010-05-12 15:05       ` Alexander Graf
  2010-05-12 15:36         ` Kevin Wolf
  2 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-12 15:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, hch

Kevin Wolf wrote:
> Am 10.05.2010 23:51, schrieb Alexander Graf:
>   
>> Usually the guest can tell the host to flush data to disk. In some cases we
>> don't want to flush though, but try to keep everything in cache.
>>
>> So let's add a new parameter to -drive that allows us to set the flushing
>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>     
>
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.
>
>   

Yes, cache=volatile seems reasonable. Or cache=unsafe.

>> ---
>>  block/raw-posix.c |   13 +++++++++++++
>>     
>
> This is obviously wrong. If you want to introduce new behaviour to the
> block layer, you must do it consistently and not just for one block
> driver. So these changes should be made to the generic functions in
> block.c instead.
>   

How so? The callback functions are called using bdrv->drv->xxx. If I
modify that pointer, I end up affecting all other virtual disks as well.

Alex


* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-12 15:05       ` Alexander Graf
@ 2010-05-12 15:36         ` Kevin Wolf
  2010-05-12 15:51           ` Alexander Graf
  0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2010-05-12 15:36 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel, hch

Am 12.05.2010 17:05, schrieb Alexander Graf:
> Kevin Wolf wrote:
>> Am 10.05.2010 23:51, schrieb Alexander Graf:
>>   
>>> Usually the guest can tell the host to flush data to disk. In some cases we
>>> don't want to flush though, but try to keep everything in cache.
>>>
>>> So let's add a new parameter to -drive that allows us to set the flushing
>>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>     
>>
>> What about another cache=... value instead of adding more options? I'm
>> quite sure you'll only ever need this with writeback caching. So we
>> could have cache=none|writethrough|writeback|wb-noflush or something
>> like that.
>>
>>   
> 
> Yes, cache=volatile seems reasonable. Or cache=unsafe.
> 
>>> ---
>>>  block/raw-posix.c |   13 +++++++++++++
>>>     
>>
>> This is obviously wrong. If you want to introduce new behaviour to the
>> block layer, you must do it consistently and not just for one block
>> driver. So these changes should be made to the generic functions in
>> block.c instead.
>>   
> 
> How so? The callback functions are called using bdrv->drv->xxx. If I
> modify that pointer, I end up affecting all other virtual disks as well.

By doing the check in bdrv_flush/bdrv_aio_flush before this function
pointer is even called.

Kevin


* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-12 15:36         ` Kevin Wolf
@ 2010-05-12 15:51           ` Alexander Graf
  0 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-12 15:51 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, hch


Am 12.05.2010 um 17:36 schrieb Kevin Wolf <kwolf@redhat.com>:

> Am 12.05.2010 17:05, schrieb Alexander Graf:
>> Kevin Wolf wrote:
>>> Am 10.05.2010 23:51, schrieb Alexander Graf:
>>>
>>>> Usually the guest can tell the host to flush data to disk. In  
>>>> some cases we
>>>> don't want to flush though, but try to keep everything in cache.
>>>>
>>>> So let's add a new parameter to -drive that allows us to set the  
>>>> flushing
>>>> behavior to "on" or "off", defaulting to enabling the guest to  
>>>> flush.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>
>>>
>>> What about another cache=... value instead of adding more options?  
>>> I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>
>>>
>>
>> Yes, cache=volatile seems reasonable. Or cache=unsafe.
>>
>>>> ---
>>>> block/raw-posix.c |   13 +++++++++++++
>>>>
>>>
>>> This is obviously wrong. If you want to introduce new behaviour to  
>>> the
>>> block layer, you must do it consistently and not just for one block
>>> driver. So these changes should be made to the generic functions in
>>> block.c instead.
>>>
>>
>> How so? The callback functions are called using bdrv->drv->xxx. If I
>> modify that pointer, I end up affecting all other virtual disks as  
>> well.
>
> By doing the check in bdrv_flush/bdrv_aio_flush before this function
> pointer is even called.

Sorry yeah, realized that 2 minutes after writing the mail myself :).

I put together an updated patch, but need to test it properly first.

Alex



* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 10:55       ` Christoph Hellwig
  2010-05-11 12:15         ` Paul Brook
@ 2010-05-14  9:16         ` Markus Armbruster
  2010-05-17 12:41           ` Christoph Hellwig
  1 sibling, 1 reply; 41+ messages in thread
From: Markus Armbruster @ 2010-05-14  9:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Kevin Wolf, Alexander Graf, qemu-devel

Christoph Hellwig <hch@lst.de> writes:

[...]
> Markus was looking into separating the block device state in host/
> guest portions lately, and splitting cache= would help with this.
> Driver cache enabled or not (fsync means enabled, O_DSYNC disabled,
> and none probably disabled given that we don't care), while O_DIRECT
> vs normal cached I/O is host state.

Let me provide a bit more context.  -drive is a happy mix of guest and
host parameters.  Some parameters are clearly host, some clearly guest,
and some are... complicated.  For instance, I'd argue that "readonly"
applies to both host and guest.

I'm working on turning all guest parameters proper qdev properties, and
collect the host parameters in a new -blockdev.  -drive stays for
compatibility and convenience (I'd like to deprecate if=none, though).

If you change -drive, please keep the coming separation in mind, and
avoid parameters that mix up guest and host.


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-11 22:11               ` Paul Brook
  2010-05-12 10:09                 ` Jamie Lokier
@ 2010-05-17 12:40                 ` Christoph Hellwig
  1 sibling, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-17 12:40 UTC (permalink / raw)
  To: Paul Brook
  Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru

On Tue, May 11, 2010 at 11:11:12PM +0100, Paul Brook wrote:
> .. though it may be a kernel bug rather than a qemu bug, depending on the 
> exact details.  Either way, I consider any mode that inhibits host filesystem 
> write cache but not volatile drive cache to be pretty worthless.  Either we 
> guarantee data integrity on completion or we don't.

O_DIRECT just means bypassing the pagecache; it does not mean flushing
the disk cache on every access, which for certain workloads can be very
painful.  It also doesn't require a synchronous writeout of the metadata
required to reach the data, e.g. when we have to allocate blocks
for a sparse image file.

To get the behaviour you want you need O_DIRECT|O_DSYNC, which is
something that is not exposed by qemu's current cache= suboption.

If we want to implement this properly we need to split the cache option,
as I already mentioned.  This would also have benefits in other areas,
but again refer to my previous mail for that.


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-14  9:16         ` Markus Armbruster
@ 2010-05-17 12:41           ` Christoph Hellwig
  2010-05-17 12:42             ` Alexander Graf
  0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-17 12:41 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, qemu-devel, Christoph Hellwig, Alexander Graf

On Fri, May 14, 2010 at 11:16:47AM +0200, Markus Armbruster wrote:
> If you change -drive, please keep the coming separation in mind, and
> avoid parameters that mix up guest and host.

This new option seems to be exactly that, unfortunately..


* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
  2010-05-17 12:41           ` Christoph Hellwig
@ 2010-05-17 12:42             ` Alexander Graf
  0 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-17 12:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Kevin Wolf, Markus Armbruster, qemu-devel


On 17.05.2010, at 14:41, Christoph Hellwig wrote:

> On Fri, May 14, 2010 at 11:16:47AM +0200, Markus Armbruster wrote:
>> If you change -drive, please keep the coming separation in mind, and
>> avoid parameters that mix up guest and host.
> 
> This new option seems to be exactly that, unfortunately..
> 

How so? It only affects the host. The guest never gets to see that flushing is disabled.


Alex


Thread overview: 41+ messages
2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
2010-05-10 21:51   ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
2010-05-11  8:36     ` [Qemu-devel] " Kevin Wolf
2010-05-11 10:55       ` Christoph Hellwig
2010-05-11 12:15         ` Paul Brook
2010-05-11 12:43           ` Anthony Liguori
2010-05-11 13:12             ` Paul Brook
2010-05-11 13:20               ` Anthony Liguori
2010-05-11 13:50                 ` Paul Brook
2010-05-11 15:40                   ` Anthony Liguori
2010-05-11 15:53                     ` Paul Brook
2010-05-11 17:09                       ` Anthony Liguori
2010-05-11 22:33                         ` Paul Brook
2010-05-11 19:11                     ` Avi Kivity
2010-05-11 16:32                 ` Jamie Lokier
2010-05-11 17:15                   ` Anthony Liguori
2010-05-11 18:13                     ` Jamie Lokier
2010-05-11 15:18           ` Alexander Graf
2010-05-11 18:20           ` Jamie Lokier
2010-05-11 21:58             ` Paul Brook
2010-05-11 22:11               ` Paul Brook
2010-05-12 10:09                 ` Jamie Lokier
2010-05-17 12:40                 ` Christoph Hellwig
2010-05-14  9:16         ` Markus Armbruster
2010-05-17 12:41           ` Christoph Hellwig
2010-05-17 12:42             ` Alexander Graf
2010-05-11 19:04       ` Avi Kivity
2010-05-12 15:05       ` Alexander Graf
2010-05-12 15:36         ` Kevin Wolf
2010-05-12 15:51           ` Alexander Graf
2010-05-11  6:18   ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
2010-05-11  8:29   ` [Qemu-devel] " Kevin Wolf
2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
2010-05-10 22:03   ` Alexander Graf
2010-05-10 22:12     ` Anthony Liguori
2010-05-11 21:48       ` Jamie Lokier
2010-05-12  8:51         ` Stefan Hajnoczi
2010-05-12  9:42           ` Jamie Lokier
2010-05-12 10:43             ` Stefan Hajnoczi
2010-05-12 12:50               ` Jamie Lokier
