* [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
@ 2010-05-10 21:51 Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
0 siblings, 2 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, hch
Thanks to recent improvements, qemu flushes guest data to disk when the guest
tells us to do so.
This is great if we care about data consistency on host disk failures. In cases
where we don't, it just creates additional overhead for no net win. One such use
case is the building of appliances in SUSE Studio. We write the resulting images
out of the build VM but compress them directly afterwards, so if possible we'd
love to keep them in RAM.
This patchset introduces a new block parameter to -drive called "flush" which
allows a user to disable flushing in odd scenarios like the above. To show the
difference in performance this makes, I have put together a small test case.
Inside the initrd, I call the following piece of code on a 500MB preallocated
vmdk image:
mkfs.ext3 /dev/vda
mkdir -p /mnt
mount /dev/vda /mnt
dd if=/dev/zero of=/mnt/test bs=1M
umount /mnt
sync
halt -fp
With flush=on (default)
real 0m33.597s
user 0m16.453s
sys 0m6.192s
With flush=off
real 0m27.150s
user 0m16.533s
sys 0m5.348s
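For reference, a hypothetical command line exercising the new option (the file name, interface, and memory size here are made up for illustration; only flush=off comes from the patch):

```shell
# Build-VM run where guest flushes are ignored; the image is
# disposable, so losing it on a host failure is acceptable.
qemu -m 1024 \
     -kernel vmlinuz -initrd build-initrd.gz \
     -drive file=build.vmdk,if=virtio,cache=writeback,flush=off
```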
Alexander Graf (2):
Add no-op aio emulation stub
Add flush=off parameter to -drive
block.c | 18 ++++++++++++++++++
block.h | 5 +++++
block/raw-posix.c | 13 +++++++++++++
qemu-config.c | 3 +++
qemu-options.hx | 3 +++
vl.c | 3 +++
6 files changed, 45 insertions(+), 0 deletions(-)
^ permalink raw reply [flat|nested] 41+ messages in thread
* [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub
2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
@ 2010-05-10 21:51 ` Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
` (2 more replies)
2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
1 sibling, 3 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, hch
We need to be able to do nothing in AIO fashion. Since I suspect this
could be useful for more cases than just non-flushing, I figured I'd
create a new function that does everything AIO-like, but doesn't
actually do anything.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
block.c | 18 ++++++++++++++++++
block.h | 5 +++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index 48305b7..1cd39d7 100644
--- a/block.c
+++ b/block.c
@@ -2196,6 +2196,24 @@ static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
return &acb->common;
}
+BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ BlockDriverAIOCBSync *acb;
+
+ acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
+ acb->is_write = 1; /* don't bounce in the completion hadler */
+ acb->qiov = NULL;
+ acb->bounce = NULL;
+ acb->ret = 0;
+
+ if (!acb->bh)
+ acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+
+ qemu_bh_schedule(acb->bh);
+ return &acb->common;
+}
+
/**************************************************************/
/* sync block device emulation */
diff --git a/block.h b/block.h
index f87d24e..bef6358 100644
--- a/block.h
+++ b/block.h
@@ -33,6 +33,7 @@ typedef struct QEMUSnapshotInfo {
#define BDRV_O_CACHE_WB 0x0040 /* use write-back caching */
#define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
#define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
+#define BDRV_O_NOFLUSH 0x0200 /* don't flush the image ever */
#define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
@@ -97,6 +98,10 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
void bdrv_aio_cancel(BlockDriverAIOCB *acb);
+/* Emulate a no-op */
+BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
typedef struct BlockRequest {
/* Fields to be filled by multiwrite caller */
int64_t sector;
--
1.6.0.2
* [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
@ 2010-05-10 21:51 ` Alexander Graf
2010-05-11 8:36 ` [Qemu-devel] " Kevin Wolf
2010-05-11 6:18 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
2010-05-11 8:29 ` [Qemu-devel] " Kevin Wolf
2 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 21:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, hch
Usually the guest can tell the host to flush data to disk. In some cases,
though, we don't want to flush and instead try to keep everything in cache.
So let's add a new parameter to -drive that allows us to set the flushing
behavior to "on" or "off", defaulting to allowing the guest to flush.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
block/raw-posix.c | 13 +++++++++++++
qemu-config.c | 3 +++
qemu-options.hx | 3 +++
vl.c | 3 +++
4 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 7541ed2..2510b1b 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -106,6 +106,7 @@ typedef struct BDRVRawState {
int fd;
int type;
int open_flags;
+ int bdrv_flags;
#if defined(__linux__)
/* linux floppy specific */
int64_t fd_open_time;
@@ -133,6 +134,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
BDRVRawState *s = bs->opaque;
int fd, ret;
+ s->bdrv_flags = bdrv_flags;
s->open_flags = open_flags | O_BINARY;
s->open_flags &= ~O_ACCMODE;
if (bdrv_flags & BDRV_O_RDWR) {
@@ -555,6 +557,11 @@ static BlockDriverAIOCB *raw_aio_flush(BlockDriverState *bs,
if (fd_open(bs) < 0)
return NULL;
+ /* Don't flush? */
+ if (s->bdrv_flags & BDRV_O_NOFLUSH) {
+ return bdrv_aio_noop_em(bs, cb, opaque);
+ }
+
return paio_submit(bs, s->fd, 0, NULL, 0, cb, opaque, QEMU_AIO_FLUSH);
}
@@ -726,6 +733,12 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
static void raw_flush(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
+
+ /* No flush means no flush */
+ if (s->bdrv_flags & BDRV_O_NOFLUSH) {
+ return;
+ }
+
qemu_fdatasync(s->fd);
}
diff --git a/qemu-config.c b/qemu-config.c
index d500885..c358add 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -79,6 +79,9 @@ QemuOptsList qemu_drive_opts = {
},{
.name = "readonly",
.type = QEMU_OPT_BOOL,
+ },{
+ .name = "flush",
+ .type = QEMU_OPT_BOOL,
},
{ /* end if list */ }
},
diff --git a/qemu-options.hx b/qemu-options.hx
index 12f6b51..69ae8de 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -120,6 +120,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
" [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
" [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
" [,addr=A][,id=name][,aio=threads|native][,readonly=on|off]\n"
+ " [,flush=on|off]\n"
" use 'file' as a drive image\n", QEMU_ARCH_ALL)
STEXI
@item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -151,6 +152,8 @@ These options have the same definition as they have in @option{-hdachs}.
@var{cache} is "none", "writeback", or "writethrough" and controls how the host cache is used to access block data.
@item aio=@var{aio}
@var{aio} is "threads", or "native" and selects between pthread based disk I/O and native Linux AIO.
+@item flush=@var{flush}
+@var{flush} is "on" (default), or "off" and select whether the guest can trigger a host flush
@item format=@var{format}
Specify which disk @var{format} will be used rather than detecting
the format. Can be used to specifiy format=raw to avoid interpreting
diff --git a/vl.c b/vl.c
index 85bcc84..a7ca2c3 100644
--- a/vl.c
+++ b/vl.c
@@ -787,6 +787,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
int max_devs;
int index;
int ro = 0;
+ int flush = 1;
int bdrv_flags = 0;
int on_read_error, on_write_error;
const char *devaddr;
@@ -819,6 +820,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
ro = qemu_opt_get_bool(opts, "readonly", 0);
+ flush = qemu_opt_get_bool(opts, "flush", 1);
file = qemu_opt_get(opts, "file");
serial = qemu_opt_get(opts, "serial");
@@ -1118,6 +1120,7 @@ DriveInfo *drive_init(QemuOpts *opts, void *opaque,
}
bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
+ bdrv_flags |= flush ? 0 : BDRV_O_NOFLUSH;
if (bdrv_open(dinfo->bdrv, file, bdrv_flags, drv) < 0) {
fprintf(stderr, "qemu: could not open disk image %s: %s\n",
--
1.6.0.2
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
@ 2010-05-10 21:59 ` Anthony Liguori
2010-05-10 22:03 ` Alexander Graf
1 sibling, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-10 21:59 UTC (permalink / raw)
To: Alexander Graf; +Cc: kwolf, qemu-devel, hch
On 05/10/2010 04:51 PM, Alexander Graf wrote:
> Thanks to recent improvements, qemu flushes guest data to disk when the guest
> tells us to do so.
>
> This is great if we care about data consistency on host disk failures. In cases
> where we don't it just creates additional overhead for no net win. One such use
> case is the building of appliances in SUSE Studio. We write the resulting images
> out of the build VM, but compress it directly afterwards. So if possible we'd
> love to keep it in RAM.
>
> This patchset introduces a new block parameter to -drive called "flush" which
> allows a user to disable flushing in odd scenarios like the above. To show the
> difference in performance this makes, I have put together a small test case.
> Inside the initrd, I call the following piece of code on a 500MB preallocated
> vmdk image:
>
This seems like it's asking for trouble to me. I'm not sure it's worth
the minor performance gain.
Regards,
Anthony Liguori
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
@ 2010-05-10 22:03 ` Alexander Graf
2010-05-10 22:12 ` Anthony Liguori
0 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-10 22:03 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kwolf, qemu-devel, hch
On 10.05.2010, at 23:59, Anthony Liguori wrote:
> On 05/10/2010 04:51 PM, Alexander Graf wrote:
>> Thanks to recent improvements, qemu flushes guest data to disk when the guest
>> tells us to do so.
>>
>> This is great if we care about data consistency on host disk failures. In cases
>> where we don't it just creates additional overhead for no net win. One such use
>> case is the building of appliances in SUSE Studio. We write the resulting images
>> out of the build VM, but compress it directly afterwards. So if possible we'd
>> love to keep it in RAM.
>>
>> This patchset introduces a new block parameter to -drive called "flush" which
>> allows a user to disable flushing in odd scenarios like the above. To show the
>> difference in performance this makes, I have put together a small test case.
>> Inside the initrd, I call the following piece of code on a 500MB preallocated
>> vmdk image:
>>
>
> This seems like it's asking for trouble to me. I'm not sure it's worth the minor performance gain.
The gain is small on my netbook, where I ran the test. This is part of the performance regressions from 0.10 to 0.12, where we're talking build times of 2 minutes going to 30. While writeback was most of the chunk, flushing still at least doubled the build times, which is unacceptable for us.
I also fail to see where it's asking for trouble. If we don't flush volatile data, things are good, no?
Alex
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-10 22:03 ` Alexander Graf
@ 2010-05-10 22:12 ` Anthony Liguori
2010-05-11 21:48 ` Jamie Lokier
0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-10 22:12 UTC (permalink / raw)
To: Alexander Graf; +Cc: kwolf, qemu-devel, hch
On 05/10/2010 05:03 PM, Alexander Graf wrote:
> On 10.05.2010, at 23:59, Anthony Liguori wrote:
>
>
>> On 05/10/2010 04:51 PM, Alexander Graf wrote:
>>
>>> Thanks to recent improvements, qemu flushes guest data to disk when the guest
>>> tells us to do so.
>>>
>>> This is great if we care about data consistency on host disk failures. In cases
>>> where we don't it just creates additional overhead for no net win. One such use
>>> case is the building of appliances in SUSE Studio. We write the resulting images
>>> out of the build VM, but compress it directly afterwards. So if possible we'd
>>> love to keep it in RAM.
>>>
>>> This patchset introduces a new block parameter to -drive called "flush" which
>>> allows a user to disable flushing in odd scenarios like the above. To show the
>>> difference in performance this makes, I have put together a small test case.
>>> Inside the initrd, I call the following piece of code on a 500MB preallocated
>>> vmdk image:
>>>
>>>
>> This seems like it's asking for trouble to me. I'm not sure it's worth the minor performance gain.
>>
> The gain is little on my netbook where I did the test on. This is part of performance regressions from 0.10 to 0.12 where we're talking build times of 2 minutes going to 30. While writeback was most of the chunk, flushing still at least doubled the build times which is unacceptable for us.
>
There's got to be a better place to fix this. Disable barriers in your
guests?
Regards,
Anthony Liguori
> I also fail to see where it's asking for trouble. If we don't flush volatile data, things are good, no?
>
>
> Alex
>
>
* Re: [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
@ 2010-05-11 6:18 ` Stefan Hajnoczi
2010-05-11 8:29 ` [Qemu-devel] " Kevin Wolf
2 siblings, 0 replies; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-11 6:18 UTC (permalink / raw)
To: Alexander Graf; +Cc: kwolf, qemu-devel, hch
bdrv_aio_noop_em() could be useful for benchmarking and optimizing the
aio code. It serves as a cheap operation that lets us see the cost of
the aio roundtrip.
Stefan
* [Qemu-devel] Re: [PATCH 1/2] Add no-op aio emulation stub
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
2010-05-11 6:18 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
@ 2010-05-11 8:29 ` Kevin Wolf
2 siblings, 0 replies; 41+ messages in thread
From: Kevin Wolf @ 2010-05-11 8:29 UTC (permalink / raw)
To: Alexander Graf; +Cc: qemu-devel, hch
Am 10.05.2010 23:51, schrieb Alexander Graf:
> We need to be able to do nothing in AIO fashion. Since I suspect this
> could be useful for more cases than the non flushing, I figured I'd
> create a new function that does everything AIO-like, but doesn't do
> anything.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
> block.c | 18 ++++++++++++++++++
> block.h | 5 +++++
> 2 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index 48305b7..1cd39d7 100644
> --- a/block.c
> +++ b/block.c
> @@ -2196,6 +2196,24 @@ static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
> return &acb->common;
> }
>
> +BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
> + BlockDriverCompletionFunc *cb, void *opaque)
> +{
> + BlockDriverAIOCBSync *acb;
> +
> + acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
> + acb->is_write = 1; /* don't bounce in the completion hadler */
Typo in the comment.
> + acb->qiov = NULL;
> + acb->bounce = NULL;
> + acb->ret = 0;
> +
> + if (!acb->bh)
> + acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
> +
> + qemu_bh_schedule(acb->bh);
> + return &acb->common;
> +}
> +
> /**************************************************************/
> /* sync block device emulation */
>
> diff --git a/block.h b/block.h
> index f87d24e..bef6358 100644
> --- a/block.h
> +++ b/block.h
> @@ -33,6 +33,7 @@ typedef struct QEMUSnapshotInfo {
> #define BDRV_O_CACHE_WB 0x0040 /* use write-back caching */
> #define BDRV_O_NATIVE_AIO 0x0080 /* use native AIO instead of the thread pool */
> #define BDRV_O_NO_BACKING 0x0100 /* don't open the backing file */
> +#define BDRV_O_NOFLUSH 0x0200 /* don't flush the image ever */
>
> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB)
This hunk should be in patch 2/2.
>
> @@ -97,6 +98,10 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
> BlockDriverCompletionFunc *cb, void *opaque);
> void bdrv_aio_cancel(BlockDriverAIOCB *acb);
>
> +/* Emulate a no-op */
> +BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
> + BlockDriverCompletionFunc *cb, void *opaque);
> +
> typedef struct BlockRequest {
> /* Fields to be filled by multiwrite caller */
> int64_t sector;
I think exporting this function shouldn't be necessary. Everything that
deals with AIO emulation should be contained in block.c.
Kevin
* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-10 21:51 ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
@ 2010-05-11 8:36 ` Kevin Wolf
2010-05-11 10:55 ` Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Kevin Wolf @ 2010-05-11 8:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: qemu-devel, hch
Am 10.05.2010 23:51, schrieb Alexander Graf:
> Usually the guest can tell the host to flush data to disk. In some cases we
> don't want to flush though, but try to keep everything in cache.
>
> So let's add a new parameter to -drive that allows us to set the flushing
> behavior to "on" or "off", defaulting to enabling the guest to flush.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
What about another cache=... value instead of adding more options? I'm
quite sure you'll only ever need this with writeback caching. So we
could have cache=none|writethrough|writeback|wb-noflush or something
like that.
> ---
> block/raw-posix.c | 13 +++++++++++++
This is obviously wrong. If you want to introduce new behaviour to the
block layer, you must do it consistently and not just for one block
driver. So these changes should be made to the generic functions in
block.c instead.
Kevin
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 8:36 ` [Qemu-devel] " Kevin Wolf
@ 2010-05-11 10:55 ` Christoph Hellwig
2010-05-11 12:15 ` Paul Brook
2010-05-14 9:16 ` Markus Armbruster
2010-05-11 19:04 ` Avi Kivity
2010-05-12 15:05 ` Alexander Graf
2 siblings, 2 replies; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-11 10:55 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, Alexander Graf, armbru
On Tue, May 11, 2010 at 10:36:24AM +0200, Kevin Wolf wrote:
> Am 10.05.2010 23:51, schrieb Alexander Graf:
> > Usually the guest can tell the host to flush data to disk. In some cases we
> > don't want to flush though, but try to keep everything in cache.
> >
> > So let's add a new parameter to -drive that allows us to set the flushing
> > behavior to "on" or "off", defaulting to enabling the guest to flush.
> >
> > Signed-off-by: Alexander Graf <agraf@suse.de>
>
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.
The cache option really isn't too useful. There's a matrix of 3x2
possible I/O modes for the posix backend, and right now we only expose
two of them. I think we really should expose all of them, split into
two options:
                              caching
                  |       normal       |      O_DIRECT      |
          --------+--------------------+--------------------+
          none    |         -          |         -          |
integrity O_DSYNC | cache=writethrough |         -          |
          fsync   |  cache=writeback   |     cache=none     |
          --------+--------------------+--------------------+
While we have to keep the cache= option for legacy purposes, we're
much better off having two new options for the I/O mode and integrity
to express everything we need. Especially given that I'm looking
into allowing the guest to tweak the volatile write cache setting
of the emulated disk, which will allow us to turn O_DSYNC on and off
on the fly. In the current scheme that would introduce an
O_DSYNC|O_DIRECT mode that's not currently accessible through the
command line/monitor.
Markus was looking into separating the block device state into host and
guest portions lately, and splitting cache= would help with this.
Whether the drive's write cache is enabled is guest state (fsync means
enabled, O_DSYNC disabled, and none probably disabled, given that we
don't care), while O_DIRECT vs. normal cached I/O is host state.
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 10:55 ` Christoph Hellwig
@ 2010-05-11 12:15 ` Paul Brook
2010-05-11 12:43 ` Anthony Liguori
` (2 more replies)
2010-05-14 9:16 ` Markus Armbruster
1 sibling, 3 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 12:15 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, armbru, Christoph Hellwig, Alexander Graf
> > What about another cache=... value instead of adding more options? I'm
> > quite sure you'll only ever need this with writeback caching. So we
> > could have cache=none|writethrough|writeback|wb-noflush or something
> > like that.
I agree.
> The cache option really isn't too useful. There's a matrix of 3x2
> possible I/O modes for the posix backend, and right now we only expose
> two of them. I think we really should expose all of them, split into
> two options:
>
>                               caching
>                   |       normal       |      O_DIRECT      |
>           --------+--------------------+--------------------+
>           none    |         -          |         -          |
> integrity O_DSYNC | cache=writethrough |         -          |
>           fsync   |  cache=writeback   |     cache=none     |
>           --------+--------------------+--------------------+
I think this table is misleading, and from a user perspective there are only 4
interesting combinations:
cache=none:
No host caching. Reads and writes both go directly to underlying storage.
Useful to avoid double-caching.
cache=writethrough
Reads are cached. Writes go directly to underlying storage. Useful for
broken guests that aren't aware of drive caches.
cache=writeback
Reads and writes are cached. Guest flushes are honoured. Note that this may
include a guest visible knob that allows the write cache to be disabled. If
the guest requests the cache be disabled then this should act the same as
cache=writethrough. Closest to how real hardware works.
cache=always (or a more scary name like cache=lie to defend against idiots)
Reads and writes are cached. Guest flushes are ignored. Useful for dumb
guests in non-critical environments.
IMO the other boxes in your table (disable cache but require guest flushes)
don't make any sense.
Paul
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 12:15 ` Paul Brook
@ 2010-05-11 12:43 ` Anthony Liguori
2010-05-11 13:12 ` Paul Brook
2010-05-11 15:18 ` Alexander Graf
2010-05-11 18:20 ` Jamie Lokier
2 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 12:43 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru
On 05/11/2010 07:15 AM, Paul Brook wrote:
>>> What about another cache=... value instead of adding more options? I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>
> I think this table is misleading, and from a user perspective there are only 4
> interesting combinations:
>
I agree that splitting cache would be unfortunate for users.
> cache=none:
> No host caching. Reads and writes both go directly to underlying storage.
> Useful to avoid double-caching.
>
> cache=writethrough
> Reads are cached. Writes go directly to underlying storage. Useful for
> broken guests that aren't aware of drive caches.
>
> cache=writeback
> Reads and writes are cached. Guest flushes are honoured. Note that this may
> include a guest visible knob that allows the write cache to be disabled. If
> the guest requests the cache be disabled then this should act the same as
> cache=writethrough. Closest to how real hardware works.
>
> cache=always (or a more scary name like cache=lie to defend against idiots)
> Reads and writes are cached. Guest flushes are ignored. Useful for dumb
> guests in non-critical environments.
>
I really don't believe that we should support a cache=lie. There are
many other ways to obtain the same results. For instance, mount your
guest filesystem with barrier=0.
Introducing a mode where data integrity isn't preserved means that it's
possible for someone to accidentally use it or use it when they really
shouldn't. People already struggle with what caching mode they should
use and this would just make the problem worse.
Regards,
Anthony Liguori
> IMO the other boxes in your table (disable cache but require guest flushes)
> don't make any sense.
>
> Paul
>
>
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 12:43 ` Anthony Liguori
@ 2010-05-11 13:12 ` Paul Brook
2010-05-11 13:20 ` Anthony Liguori
0 siblings, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 13:12 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru
> > cache=always (or a more scary name like cache=lie to defend against
> > idiots)
> >
> > Reads and writes are cached. Guest flushes are ignored. Useful for
> > dumb guests in non-critical environments.
>
> I really don't believe that we should support a cache=lie. There are
> many other obtain the same results. For instance, mount your guest
> filesystem with barrier=0.
Ideally yes. However, in practice I suspect this is still a useful option. Is
it even possible to disable barriers in all cases (e.g. NTFS under Windows)?
In a production environment it's probably not so useful - you're generally
dealing with long lived, custom configured guests.
In a development environment the rules can be a bit different. For example if
you're testing an OS installer then you really don't want to be passing magic
mount options. If the host machine dies then you don't care about the state of
the guest because you're going to start from scratch anyway.
Paul
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 13:12 ` Paul Brook
@ 2010-05-11 13:20 ` Anthony Liguori
2010-05-11 13:50 ` Paul Brook
2010-05-11 16:32 ` Jamie Lokier
0 siblings, 2 replies; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 13:20 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Christoph Hellwig
On 05/11/2010 08:12 AM, Paul Brook wrote:
>>> cache=always (or a more scary name like cache=lie to defend against
>>> idiots)
>>>
>>> Reads and writes are cached. Guest flushes are ignored. Useful for
>>> dumb guests in non-critical environments.
>>>
>> I really don't believe that we should support a cache=lie. There are
>> many other obtain the same results. For instance, mount your guest
>> filesystem with barrier=0.
>>
> Ideally yes. However in practice I suspect this is still a useful option. Is
> it even possible to disable barriers in all cases (e.g. NTFS under windows)?
>
> In a production environment it's probably not so useful - you're generally
> dealing with long lived, custom configured guests.
>
> In a development environment the rules can be a bit different. For example if
> you're testing an OS installer then you really don't want to be passing magic
> mount options. If the host machine dies then you don't care about the state of
> the guest because you're going to start from scratch anyway.
>
Then create a mount point on your host and mount the host file system
under that mount with barrier=0.
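A sketch of that host-side alternative (paths and sizes are invented; it needs root, and barrier=0 applies to an ext3 mount on the host):

```shell
# Keep disposable guest images on a loop-mounted ext3 filesystem
# that has write barriers disabled, instead of changing qemu.
dd if=/dev/zero of=/var/tmp/scratch.img bs=1M count=4096
mkfs.ext3 -F /var/tmp/scratch.img
mkdir -p /mnt/scratch
mount -o loop,barrier=0 /var/tmp/scratch.img /mnt/scratch
# images under /mnt/scratch now get cheap flushes transparently
```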
The problem with options added for developers is that those options are
very often accidentally used for production.
Regards,
Anthony Liguori
> Paul
>
>
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 13:20 ` Anthony Liguori
@ 2010-05-11 13:50 ` Paul Brook
2010-05-11 15:40 ` Anthony Liguori
2010-05-11 16:32 ` Jamie Lokier
1 sibling, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 13:50 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig
> > In a development environment the rules can be a bit different. For
> > example if you're testing an OS installer then you really don't want to
> > be passing magic mount options. If the host machine dies then you don't
> > care about the state of the guest because you're going to start from
> > scratch anyway.
>
> Then create a mount point on your host and mount the host file system
> under that mount with barrier=0.
>
> The problem with options added for developers is that those options are
> very often accidentally used for production.
I disagree. We should not be removing or rejecting features just because they
allow you to shoot yourself in the foot. We probably shouldn't be enabling
them by default, but that's a whole different question.
Mounting a filesystem with barrier=0 is not a good answer because it's a
global setting. While my qemu VM may be disposable, it's unlikely that the
same is true of the rest of my machine. By your argument linux shouldn't be
allowing me to do that in the first place because a dumb sysadmin could use
that option on the filesystem containing the mail store. In fact with the
average user that's *likely* to happen because they'll only have a single
partition on their machine.
Paul
N.B. in this context "developers" refers to users who happen to be using qemu
as part of a toolset for development of $whatever, not development of qemu
itself. c.f. "production" where qemu is part of the IT infrastructure used to
directly provide an external service.
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 12:15 ` Paul Brook
2010-05-11 12:43 ` Anthony Liguori
@ 2010-05-11 15:18 ` Alexander Graf
2010-05-11 18:20 ` Jamie Lokier
2 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-11 15:18 UTC (permalink / raw)
To: Paul Brook; +Cc: Kevin Wolf, armbru, qemu-devel, Christoph Hellwig
Paul Brook wrote:
>>> What about another cache=... value instead of adding more options? I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>
>
> I agree.
>
>
>> The cache option really isn't too useful. There's a matrix of 3x2
>> possible I/O modes for the posix backend, and right now we only expose
>> two of them. I think we really should expose all of them, split into
>> two options:
>>
>>                               caching
>>                   |       normal       |      O_DIRECT      |
>>           --------+--------------------+--------------------+
>>           none    |         -          |         -          |
>> integrity O_DSYNC | cache=writethrough |         -          |
>>           fsync   |  cache=writeback   |     cache=none     |
>>           --------+--------------------+--------------------+
>>
>
> I think this table is misleading, and from a user perspective there are only 4
> interesting combinations:
>
> cache=none:
> No host caching. Reads and writes both go directly to underlying storage.
> Useful to avoid double-caching.
>
> cache=writethrough
> Reads are cached. Writes go directly to underlying storage. Useful for
> broken guests that aren't aware of drive caches.
>
> cache=writeback
> Reads and writes are cached. Guest flushes are honoured. Note that this may
> include a guest visible knob that allows the write cache to be disabled. If
> the guest requests the cache be disabled then this should act the same as
> cache=writethrough. Closest to how real hardware works.
>
> cache=always (or a more scary name like cache=lie to defend against idiots)
> Reads and writes are cached. Guest flushes are ignored. Useful for dumb
> guests in non-critical environments.
>
How about cache=insecure?
But I agree. Having that as a cache= option seems to be the easiest way
to expose it to a user.
Alex
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 13:50 ` Paul Brook
@ 2010-05-11 15:40 ` Anthony Liguori
2010-05-11 15:53 ` Paul Brook
2010-05-11 19:11 ` Avi Kivity
0 siblings, 2 replies; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 15:40 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig
On 05/11/2010 08:50 AM, Paul Brook wrote:
>>> In a development environment the rules can be a bit different. For
>>> example if you're testing an OS installer then you really don't want to
>>> be passing magic mount options. If the host machine dies then you don't
>>> care about the state of the guest because you're going to start from
>>> scratch anyway.
>>>
>> Then create a mount point on your host and mount the host file system
>> under that mount with barrier=0.
>>
>> The problem with options added for developers is that those options are
>> very often accidentally used for production.
>>
> I disagree. We should not be removing or rejecting features just because they
> allow you to shoot yourself in the foot. We probably shouldn't be enabling
> them by default, but that's a whole different question.
>
I disagree and think the mentality severely hurts usability. QEMU's
role should be to enable features, not to simplify every obscure
feature. In general, if someone wants to accomplish something, we
should try to provide a mechanism to accomplish it.
cache=none|writeback|writethrough is an example of this. No one other
than QEMU can control how we open a file descriptor so we need to
provide a knob for it.
But if the goal is to make sure that fsync's don't result in data
actually being on disk, there are many other ways to accomplish this.
First, for the vast majority of users, this is already the case because
ext3 defaults to disabling barriers. Alex is running into this issue
only because SuSE enables barriers by default for ext3 and fsync()'s are
horribly slow on ext3. The fact that this is measurable is purely a
statement of ext3 suckage and a conscious decision by the SuSE team to
live with that suckage. It shouldn't be nearly as bad on ext4.
But even in this context, it's not a huge burden to disable barriers
within the guest or to create a dedicated filesystem mount for your
images on the host that disables barriers on the mount.
On the other hand, the cost of adding another caching option is pretty
significant. Very few users really understand what caching options they
should use under what circumstance. Adding another option will just
make that worse. The fact that this new option can result in data
corruption that won't occur under normal circumstances is troubling.
> Mounting a filesystem with barrier=0 is not a good answer because it's a
> global setting. While my qemu VM may be disposable, it's unlikely that the
> same is true of the rest of my machine.
You can have multiple mounts. In fact, you can just make a loopback
mount within your existing file system which means that you don't even
have to muck with your disk.
If your VM is disposable, then shouldn't you be using -snapshot?
> By your argument linux shouldn't be
> allowing me to do that in the first place because a dumb sysadmin could use
> that option on the filesystem containing the mail store. In fact with the
> average user that's *likely* to happen because they'll only have a single
> partition on their machine.
>
We aren't preventing sophisticated users from doing sophisticated
things. But why should we simplify something that most people don't
need and if they accidentally use it, bad things will happen?
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 15:40 ` Anthony Liguori
@ 2010-05-11 15:53 ` Paul Brook
2010-05-11 17:09 ` Anthony Liguori
2010-05-11 19:11 ` Avi Kivity
1 sibling, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 15:53 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig
> > I disagree. We should not be removing or rejecting features just because
> > they allow you to shoot yourself in the foot. We probably shouldn't be
> > enabling them by default, but that's a whole different question.
>
> I disagree and think the mentality severely hurts usability. QEMU's
> role should be to enable features, not to simplify every obscure
> feature. In general, if someone wants to accomplish something, we
> should try to provide a mechanism to accomplish it.
> cache=none|writeback|writethrough is an example of this. No one other
> than QEMU can control how we open a file descriptor so we need to
> provide a knob for it.
Doesn't the same argument apply to the existing cache=writethrough?
i.e. if you want to avoid data loss you should make sure your guest issues
flushes properly, and it's not something qemu should be trying to hack
around by adding an implicit flush after every write.
Paul
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 13:20 ` Anthony Liguori
2010-05-11 13:50 ` Paul Brook
@ 2010-05-11 16:32 ` Jamie Lokier
2010-05-11 17:15 ` Anthony Liguori
1 sibling, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 16:32 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
Christoph Hellwig
Anthony Liguori wrote:
> On 05/11/2010 08:12 AM, Paul Brook wrote:
> >>>cache=always (or a more scary name like cache=lie to defend against
> >>>idiots)
> >>>
> >>> Reads and writes are cached. Guest flushes are ignored. Useful for
> >>> dumb guests in non-critical environments.
> >>>
> >>I really don't believe that we should support a cache=lie. There are
> >>many other ways to obtain the same results. For instance, mount your guest
> >>filesystem with barrier=0.
> >>
> >Ideally yes. However in practice I suspect this is still a useful option.
> >Is
> >it even possible to disable barriers in all cases (e.g. NTFS under
> >windows)?
> >
> >In a production environment it's probably not so useful - you're generally
> >dealing with long lived, custom configured guests.
> >
> >In a development environment the rules can be a bit different. For example
> >if
> >you're testing an OS installer then you really don't want to be passing
> >magic
> >mount options. If the host machine dies then you don't care about the
> >state of
> >the guest because you're going to start from scratch anyway.
> >
>
> Then create a mount point on your host and mount the host file system
> under that mount with barrier=0.
Two reasons that advice doesn't work:
1. It doesn't work in many environments. You can't mount a filesystem
with barrier=0 in one place and barrier=1 on a different point, and
there's often only one host partition.
2. barrier=0 does _not_ provide the cache=off behaviour. It only
disables barriers; it does not prevent writing to the disk hardware.
If you are doing a transient OS install, ideally you want an amount
equal to your free RAM not written to disk until the end. barrier=0
does not achieve that.
> The problem with options added for developers is that those options are
> very often accidentally used for production.
We already have risky cache= options. Also, do we call fdatasync
(with barrier) on _every_ write for guests which disable the
emulated disk cache?
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 15:53 ` Paul Brook
@ 2010-05-11 17:09 ` Anthony Liguori
2010-05-11 22:33 ` Paul Brook
0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 17:09 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, armbru, qemu-devel, Alexander Graf, Christoph Hellwig
On 05/11/2010 10:53 AM, Paul Brook wrote:
>>> I disagree. We should not be removing or rejecting features just because
>>> they allow you to shoot yourself in the foot. We probably shouldn't be
>>> enabling them by default, but that's a whole different question.
>>>
>> I disagree and think the mentality severely hurts usability. QEMU's
>> role should be to enable features, not to simplify every obscure
>> feature. In general, if someone wants to accomplish something, we
>> should try to provide a mechanism to accomplish it.
>> cache=none|writeback|writethrough is an example of this. No one other
>> than QEMU can control how we open a file descriptor so we need to
>> provide a knob for it.
>>
> Doesn't the same argument apply to the existing cache=writethrough?
> i.e. if you want to avoid data loss you should make sure your guest issues
> flushes properly, and it's not something qemu should be trying to hack
> around by adding an implicit flush after every write.
>
cache is the host page cache acting as an extended disk cache. In
writethrough mode, the behavior is identical to writethrough on a normal
disk cache in that all operations are completed only when sent down to
the next storage layer. In writeback mode, you rely on the integrity of
the host page cache to prevent data loss.
Data loss is different than data corruption though. In neither mode
will a correct application experience data corruption because if a guest
issues a flush, we respect those requests. However, the new proposed
mode would introduce the possibility of data corruption because it
becomes possible for writes to be written to disk outside the order
requested by the guest. In the event of power loss, the result is
corruption.
Regards,
Anthony Liguori
> Paul
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 16:32 ` Jamie Lokier
@ 2010-05-11 17:15 ` Anthony Liguori
2010-05-11 18:13 ` Jamie Lokier
0 siblings, 1 reply; 41+ messages in thread
From: Anthony Liguori @ 2010-05-11 17:15 UTC (permalink / raw)
To: Jamie Lokier
Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
Christoph Hellwig
On 05/11/2010 11:32 AM, Jamie Lokier wrote:
> Anthony Liguori wrote:
>
>> On 05/11/2010 08:12 AM, Paul Brook wrote:
>>
>>>>> cache=always (or a more scary name like cache=lie to defend against
>>>>> idiots)
>>>>>
>>>>> Reads and writes are cached. Guest flushes are ignored. Useful for
>>>>> dumb guests in non-critical environments.
>>>>>
>>>>>
>>>> I really don't believe that we should support a cache=lie. There are
>>>> many other ways to obtain the same results. For instance, mount your guest
>>>> filesystem with barrier=0.
>>>>
>>>>
>>> Ideally yes. However in practice I suspect this is still a useful option.
>>> Is
>>> it even possible to disable barriers in all cases (e.g. NTFS under
>>> windows)?
>>>
>>> In a production environment it's probably not so useful - you're generally
>>> dealing with long lived, custom configured guests.
>>>
>>> In a development environment the rules can be a bit different. For example
>>> if
>>> you're testing an OS installer then you really don't want to be passing
>>> magic
>>> mount options. If the host machine dies then you don't care about the
>>> state of
>>> the guest because you're going to start from scratch anyway.
>>>
>>>
>> Then create a mount point on your host and mount the host file system
>> under that mount with barrier=0.
>>
> Two reasons that advice doesn't work:
>
> 1. It doesn't work in many environments. You can't mount a filesystem
> with barrier=0 in one place and barrier=1 on a different point, and
> there's often only one host partition.
>
qemu-img create -f raw foo.img 10G
mkfs.ext3 foo.img
mount -oloop,rw,barrier=1 -t ext3 foo.img mnt
Works perfectly fine.
> 2. barrier=0 does _not_ provide the cache=off behaviour. It only
> disables barriers; it does not prevent writing to the disk hardware.
>
The proposal has nothing to do with cache=off.
>> The problem with options added for developers is that those options are
>> very often accidentally used for production.
>>
> We already have risky cache= options. Also, do we call fdatasync
> (with barrier) on _every_ write for guests which disable the
> emulated disk cache?
>
None of our cache= options should result in data corruption on power
loss. If they do, it's a bug.
Regards,
Anthony Liguori
> -- Jamie
>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 17:15 ` Anthony Liguori
@ 2010-05-11 18:13 ` Jamie Lokier
0 siblings, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 18:13 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, Alexander Graf, armbru, qemu-devel, Paul Brook,
Christoph Hellwig
Anthony Liguori wrote:
> qemu-img create -f raw foo.img 10G
> mkfs.ext3 foo.img
> mount -oloop,rw,barrier=1 -t ext3 foo.img mnt
>
> Works perfectly fine.
Hmm, interesting. Didn't know loop propagated barriers.
So you're suggesting to use qemu with a loop device, and ext2 (bit
faster than ext3) and barrier=0 (well, that's implied if you use
ext2), and a raw image file on the ext2/3 filesystem, to provide the
effect of flush=off, becuase the loop device caches block writes on
the host, except for explicit barrier requests from the fs, which are
turned off?
That wasn't obvious the first time :-)
Does the loop device cache fs writes instead of propagating them
immediately to the underlying fs? I guess it probably does.
Does the loop device allow the backing file to grow sparsely, to get
behaviour like qcow2?
That's ugly but it might just work.
> >2. barrier=0 does _not_ provide the cache=off behaviour. It only
> >disables barriers; it does not prevent writing to the disk hardware.
>
> The proposal has nothing to do with cache=off.
Sorry, I meant flush=off (the proposal). Mounting the host filesystem
(i.e. not using a loop device anywhere) with barrier=0 doesn't have
even close to the same effect.
> >>The problem with options added for developers is that those options are
> >>very often accidentally used for production.
> >>
> >We already have risky cache= options. Also, do we call fdatasync
> >(with barrier) on _every_ write for guests which disable the
> >emulated disk cache?
>
> None of our cache= options should result in data corruption on power
> loss. If they do, it's a bug.
(I might have the details below a bit off.)
If cache=none uses O_DIRECT without calling fdatasync for guest
barriers, then it will get data corruption on power loss.
If cache=none does call fdatasync for guest barriers, then it might
still get corruption on power loss; I am not sure if recent Linux host
behaviour of O_DIRECT+fdatasync (with no buffered writes to commit)
issues the necessary barriers. I am quite sure that older kernels did not.
cache=writethrough will get data corruption on power loss with older
Linux host kernels. O_DSYNC did not issue barriers. I'm not sure if
the behaviour of O_DSYNC that was recently changed is now issuing
barriers after every write.
Provided all the cache= options call fdatasync/fsync when the guest
issues a cache flush, and call fdatasync/fsync following _every_ write
when the guest has disabled the emulated write cache, that should be
as good as Qemu can reasonably do. It's up to the host from there.
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 12:15 ` Paul Brook
2010-05-11 12:43 ` Anthony Liguori
2010-05-11 15:18 ` Alexander Graf
@ 2010-05-11 18:20 ` Jamie Lokier
2010-05-11 21:58 ` Paul Brook
2 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 18:20 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru
Paul Brook wrote:
> cache=none:
> No host caching. Reads and writes both go directly to underlying storage.
> Useful to avoid double-caching.
>
> cache=writethrough
> Reads are cached. Writes go directly to underlying storage. Useful for
> broken guests that aren't aware of drive caches.
These are misleading descriptions - because cache=none does not push
writes down to powerfail-safe storage, while cache=writethrough might.
> cache=always (or a more scary name like cache=lie to defend against idiots)
> Reads and writes are cached. Guest flushes are ignored. Useful for dumb
> guests in non-critical environments.
cache=unsafe would tell it like it is.
Even non-idiots could be excused for getting the wrong impression from
cache=always.
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 8:36 ` [Qemu-devel] " Kevin Wolf
2010-05-11 10:55 ` Christoph Hellwig
@ 2010-05-11 19:04 ` Avi Kivity
2010-05-12 15:05 ` Alexander Graf
2 siblings, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2010-05-11 19:04 UTC (permalink / raw)
To: Kevin Wolf; +Cc: hch, Alexander Graf, qemu-devel
On 05/11/2010 11:36 AM, Kevin Wolf wrote:
> On 10.05.2010 23:51, Alexander Graf wrote:
>
>> Usually the guest can tell the host to flush data to disk. In some cases we
>> don't want to flush though, but try to keep everything in cache.
>>
>> So let's add a new parameter to -drive that allows us to set the flushing
>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>
>> Signed-off-by: Alexander Graf<agraf@suse.de>
>>
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.
>
I implemented this once and promptly forgot about it, as cache=volatile.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 15:40 ` Anthony Liguori
2010-05-11 15:53 ` Paul Brook
@ 2010-05-11 19:11 ` Avi Kivity
1 sibling, 0 replies; 41+ messages in thread
From: Avi Kivity @ 2010-05-11 19:11 UTC (permalink / raw)
To: Anthony Liguori
Cc: Kevin Wolf, armbru, Alexander Graf, qemu-devel, Paul Brook,
Christoph Hellwig
On 05/11/2010 06:40 PM, Anthony Liguori wrote:
>
> But if the goal is to make sure that fsync's don't result in data
> actually being on disk, there are many other ways to accomplish this.
> First, for the vast majority of users, this is already the case
> because ext3 defaults to disabling barriers. Alex is running into
> this issue only because SuSE enables barriers by default for ext3 and
> fsync()'s are horribly slow on ext3. The fact that this is measurable
> is purely a statement of ext3 suckage and a conscious decision by the
> SuSE team to live with that suckage. It shouldn't be nearly as bad on
> ext4.
>
There is a huge difference between disabling barriers and cache=volatile.
With barrier=0, data is still forced out of host pagecache and into disk
cache; it's simply not forced from disk cache to the platter. But since
the disk cache is very limited, you'll still be running at disk speed.
With cache=volatile and enough memory you'll never wait for the disk to
write data.
>
> On the other hand, the cost of adding another caching option is pretty
> significant. Very few users really understand what caching options
> they should use under what circumstance. Adding another option will
> just make that worse. The fact that this new option can result in
> data corruption that won't occur under normal circumstances is troubling.
I don't think it's a real problem with proper naming, but we can always
hide the option behind a ./configure --enable-developer-options.
>
>> Mounting a filesystem with barrier=0 is not a good answer because it's a
>> global setting. While my qemu VM may be disposable, it's unlikely
>> that the
>> same is true of the rest of my machine.
>
> You can have multiple mounts. In fact, you can just make a loopback
> mount within your existing file system which means that you don't even
> have to muck with your disk.
>
> If your VM is disposable, then shouldn't you be using -snapshot?
For my use case (autotest) the VM is not disposable (it's reused between
tests) but I don't care about the data in case of a host crash.
>
>
>> By your argument linux shouldn't be
>> allowing me to do that in the first place because a dumb sysadmin
>> could use
>> that option on the filesystem containing the mail store. In fact with
>> the
>> average user that's *likely* to happen because they'll only have a
>> single
>> partition on their machine.
>
> We aren't preventing sophisticated users from doing sophisticated
> things. But why should we simplify something that most people don't
> need and if they accidentally use it, bad things will happen?
We don't have a real alternative.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-10 22:12 ` Anthony Liguori
@ 2010-05-11 21:48 ` Jamie Lokier
2010-05-12 8:51 ` Stefan Hajnoczi
0 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-11 21:48 UTC (permalink / raw)
To: Anthony Liguori, g; +Cc: kwolf, hch, Alexander Graf, qemu-devel
Anthony Liguori wrote:
> There's got to be a better place to fix this. Disable barriers in your
> guests?
If only it were that easy.
OS installs are the thing that this feature would most help. They
take ages, do a huge amount of writing with lots of seeking, and if
the host fails you're going to discard the image.
I'm not sure how I would change that setting for most OS install GUIs,
especially Windows, or if it's even possible.
It's usually much easier to change barrier settings after installing
and you've got a command line or registry editing tool. But by then,
it's not useful any more.
Any other ideas?
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 18:20 ` Jamie Lokier
@ 2010-05-11 21:58 ` Paul Brook
2010-05-11 22:11 ` Paul Brook
0 siblings, 1 reply; 41+ messages in thread
From: Paul Brook @ 2010-05-11 21:58 UTC (permalink / raw)
To: Jamie Lokier
Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru
> Paul Brook wrote:
> > cache=none:
> > No host caching. Reads and writes both go directly to underlying
> > storage.
> >
> > Useful to avoid double-caching.
> >
> > cache=writethrough
> >
> > Reads are cached. Writes go directly to underlying storage. Useful for
> > broken guests that aren't aware of drive caches.
>
> These are misleading descriptions - because cache=none does not push
> writes down to powerfail-safe storage, while cache=writethrough might.
If so, then this is a serious bug.
Paul
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 21:58 ` Paul Brook
@ 2010-05-11 22:11 ` Paul Brook
2010-05-12 10:09 ` Jamie Lokier
2010-05-17 12:40 ` Christoph Hellwig
0 siblings, 2 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 22:11 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, armbru, Alexander Graf, Christoph Hellwig
> > Paul Brook wrote:
> > > cache=none:
> > > No host caching. Reads and writes both go directly to underlying
> > > storage.
> > >
> > > Useful to avoid double-caching.
> > >
> > > cache=writethrough
> > >
> > > Reads are cached. Writes go directly to underlying storage. Useful
> > > for
> > >
> > > broken guests that aren't aware of drive caches.
> >
> > These are misleading descriptions - because cache=none does not push
> > writes down to powerfail-safe storage, while cache=writethrough might.
>
> If so, then this is a serious bug.
.. though it may be a kernel bug rather than a qemu bug, depending on the
exact details. Either way, I consider any mode that inhibits host filesystem
write cache but not volatile drive cache to be pretty worthless. Either we
guarantee data integrity on completion or we don't.
Paul
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 17:09 ` Anthony Liguori
@ 2010-05-11 22:33 ` Paul Brook
0 siblings, 0 replies; 41+ messages in thread
From: Paul Brook @ 2010-05-11 22:33 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Christoph Hellwig, armbru, Alexander Graf
> On 05/11/2010 10:53 AM, Paul Brook wrote:
> >>> I disagree. We should not be removing or rejecting features just
> >>> because they allow you to shoot yourself in the foot. We probably
> >>> shouldn't be enabling them by default, but that's a whole different
> >>> question.
> >>
> >> I disagree and think the mentality severely hurts usability. QEMU's
> >> role should be to enable features, not to simplify every obscure
> >> feature. In general, if someone wants to accomplish something, we
> >> should try to provide a mechanism to accomplish it.
> >> cache=none|writeback|writethrough is an example of this. No one other
> >> than QEMU can control how we open a file descriptor so we need to
> >> provide a knob for it.
> >
> > Doesn't the same argument apply to the existing cache=writethrough?
> > i.e. if you want to avoid data loss you should make sure your guest
> > issues flushes properly, and it's not something qemu should be trying to
> > hack around by adding an implicit flush after every write.
>
> cache is the host page cache acting as an extended disk cache. In
> writethrough mode, the behavior is identical to writethrough on a normal
> disk cache in that all operations are completed only when sent down to
> the next storage layer.
IMO this is a bug. Making host pagecache writethrough but still having a
volatile writeback disk cache seems like a complete waste of time. I can see
the advantage of disabling host pagecache (avoid double caching in host RAM),
but having different levels of cache be writethrough/writeback seems extremely
suspect.
It's also occurred to me that you're also basing your arguments on the
assumption that host pagecache is volatile. On a machine with a good UPS this
is not true. In the event of external power failure the UPS will flush the host
page cache and cleanly shut the machine down. As with battery-backed RAID
cards, it's entirely reasonable to consider the cache to be non-volatile
storage and ignore the flush requests.
If you don't trust your host OS in this situation then you're into a whole
different level of pain, and it raises obvious questions about the firmware
running on your storage subsystem.
Paul
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-11 21:48 ` Jamie Lokier
@ 2010-05-12 8:51 ` Stefan Hajnoczi
2010-05-12 9:42 ` Jamie Lokier
0 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-12 8:51 UTC (permalink / raw)
To: Alexander Graf; +Cc: kwolf, g, hch, qemu-devel
Why add a nop AIO operation instead of setting
BlockDriverState->enable_write_cache to zero? In that case no write
cache would be reported to the guest (just like cache=writethrough).
Stefan
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-12 8:51 ` Stefan Hajnoczi
@ 2010-05-12 9:42 ` Jamie Lokier
2010-05-12 10:43 ` Stefan Hajnoczi
0 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12 9:42 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch
Stefan Hajnoczi wrote:
> Why add a nop AIO operation instead of setting
> BlockDriverState->enable_write_cache to zero? In that case no write
> cache would be reported to the guest (just like cache=writethrough).
Hmm. If the guest sees write cache absent, that prevents changing the
cache policy on the host later (from not flushing to flushing), which
you might want to do after an OS install has finished and booted up.
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 22:11 ` Paul Brook
@ 2010-05-12 10:09 ` Jamie Lokier
2010-05-17 12:40 ` Christoph Hellwig
1 sibling, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12 10:09 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, armbru, Christoph Hellwig, qemu-devel, Alexander Graf
Paul Brook wrote:
> > > Paul Brook wrote:
> > > > cache=none:
> > > > No host caching. Reads and writes both go directly to underlying
> > > > storage.
> > > >
> > > > Useful to avoid double-caching.
> > > >
> > > > cache=writethrough
> > > >
> > > > Reads are cached. Writes go directly to underlying storage. Useful
> > > > for
> > > >
> > > > broken guests that aren't aware of drive caches.
> > >
> > > These are misleading descriptions - because cache=none does not push
> > > writes down to powerfail-safe storage, while cache=writethrough might.
> >
> > If so, then this is a serious bug.
>
> .. though it may be a kernel bug rather than a qemu bug, depending on the
> exact details.
It's not a kernel bug. cache=none uses O_DIRECT, and O_DIRECT must
not force writes to powerfail-safe storage. If it did, it would be
unusably slow for applications using O_DIRECT as a performance
enhancer / memory saver. They can call fsync/fdatasync when they need
to for integrity. (There might be kernel bugs in the latter department.)
> Either way, I consider any mode that inhibits host filesystem write
> cache but not volatile drive cache to be pretty worthless.
On the contrary, it greatly reduces host memory consumption so that
guest data isn't cached twice (it's already cached in the guest), and
it may improve performance by relaxing the POSIX write-serialisation
constraint (not sure if Linux cares; Solaris does).
> Either we guaranteed data integrity on completion or we don't.
The problem with the description of cache=none is that it uses O_DIRECT,
which does not always push writes to powerfail-safe storage.
O_DIRECT is effectively a hint. It requests less caching in kernel
memory, may reduce memory usage and copying, may invoke direct DMA.
O_DIRECT does not tell the disk hardware to commit to powerfail-safe
storage. I.e. it doesn't issue barriers or disable disk write caching.
(However, depending on a host setup, it might have that effect if disk
write cache is disabled by the admin).
Also, it doesn't even always write to disk: it falls back to buffered
I/O in some circumstances, even on filesystems which support it - see
recent patches for btrfs which use buffered I/O for O_DIRECT for some
parts of some files. (Many non-Linux OSes fall back to buffered I/O
when any other process holds a non-O_DIRECT file descriptor, or when
requests don't meet some criteria.)
The POSIX thing to use for cache=none would be O_DSYNC|O_RSYNC, and
that should work on some hosts, but Linux doesn't implement real O_RSYNC.
A combination which ought to work is O_DSYNC|O_DIRECT. O_DIRECT is
the performance hint; O_DSYNC provides the commit request. Christoph
Hellwig has mentioned that combination elsewhere on this thread.
It makes sense to me for cache=none.
O_DIRECT by itself is a useful performance & memory hint, so there
does need to be some option which maps onto O_DIRECT alone. But it
shouldn't be documented as stronger than cache=writethrough, because
it isn't.
-- Jamie
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-12 9:42 ` Jamie Lokier
@ 2010-05-12 10:43 ` Stefan Hajnoczi
2010-05-12 12:50 ` Jamie Lokier
0 siblings, 1 reply; 41+ messages in thread
From: Stefan Hajnoczi @ 2010-05-12 10:43 UTC (permalink / raw)
To: Jamie Lokier; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch
On Wed, May 12, 2010 at 10:42 AM, Jamie Lokier <jamie@shareable.org> wrote:
> Stefan Hajnoczi wrote:
>> Why add a nop AIO operation instead of setting
>> BlockDriverState->enable_write_cache to zero? In that case no write
>> cache would be reported to the guest (just like cache=writethrough).
>
> Hmm. If the guest sees write cache absent, that prevents changing the
> cache policy on the host later (from not flushing to flushing), which
> you might want to do after an OS install has finished and booted up.
Right. There are 3 cases from the guest perspective:
1. Disable write cache or no write cache. Flushing not needed.
2. Disable flushing but leave write cache enabled.
3. Enable write cache and use flushing.
When we don't report a write cache at all, the guest is always stuck at 1.
If you're going to do this for installs and other temporary workloads,
then enabling the write cache again isn't an issue. After installing
successfully, restart the guest with a sane cache= mode.
Stefan
* Re: [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush
2010-05-12 10:43 ` Stefan Hajnoczi
@ 2010-05-12 12:50 ` Jamie Lokier
0 siblings, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2010-05-12 12:50 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, g, qemu-devel, Alexander Graf, hch
Stefan Hajnoczi wrote:
> On Wed, May 12, 2010 at 10:42 AM, Jamie Lokier <jamie@shareable.org> wrote:
> > Stefan Hajnoczi wrote:
> >> Why add a nop AIO operation instead of setting
> >> BlockDriverState->enable_write_cache to zero? In that case no write
> >> cache would be reported to the guest (just like cache=writethrough).
> >
> > Hmm. If the guest sees write cache absent, that prevents changing the
> > cache policy on the host later (from not flushing to flushing), which
> > you might want to do after an OS install has finished and booted up.
>
> Right. There are 3 cases from the guest perspective:
>
> 1. Disable write cache or no write cache. Flushing not needed.
> 2. Disable flushing but leave write cache enabled.
> 3. Enable write cache and use flushing.
>
> When we don't report a write cache at all, the guest is always stuck at 1.
>
> If you're going to do this for installs and other temporary workloads,
> then enabling the write cache again isn't an issue. After installing
> successfully, restart the guest with a sane cache= mode.
That only works if you're happy to reboot the guest after the process
finishes. I guess that is usually fine, but it is a restriction.
Is it possible via QMP to request that the guest be paused when it
next reboots, so that QMP operations to change the cache= mode can be
done? (It's not safe to change the guest-visible disk write cache
availability while the guest is running, and a request to do so should
probably be denied.)
-- Jamie
* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 8:36 ` [Qemu-devel] " Kevin Wolf
2010-05-11 10:55 ` Christoph Hellwig
2010-05-11 19:04 ` Avi Kivity
@ 2010-05-12 15:05 ` Alexander Graf
2010-05-12 15:36 ` Kevin Wolf
2 siblings, 1 reply; 41+ messages in thread
From: Alexander Graf @ 2010-05-12 15:05 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, hch
Kevin Wolf wrote:
> Am 10.05.2010 23:51, schrieb Alexander Graf:
>
>> Usually the guest can tell the host to flush data to disk. In some cases we
>> don't want to flush though, but try to keep everything in cache.
>>
>> So let's add a new parameter to -drive that allows us to set the flushing
>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>
>
> What about another cache=... value instead of adding more options? I'm
> quite sure you'll only ever need this with writeback caching. So we
> could have cache=none|writethrough|writeback|wb-noflush or something
> like that.
>
>
Yes, cache=volatile seems reasonable. Or cache=unsafe.
>> ---
>> block/raw-posix.c | 13 +++++++++++++
>>
>
> This is obviously wrong. If you want to introduce new behaviour to the
> block layer, you must do it consistently and not just for one block
> driver. So these changes should be made to the generic functions in
> block.c instead.
>
How so? The callback functions are called using bdrv->drv->xxx. If I
modify that pointer, I end up affecting all other virtual disks as well.
Alex
* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-12 15:05 ` Alexander Graf
@ 2010-05-12 15:36 ` Kevin Wolf
2010-05-12 15:51 ` Alexander Graf
0 siblings, 1 reply; 41+ messages in thread
From: Kevin Wolf @ 2010-05-12 15:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: qemu-devel, hch
Am 12.05.2010 17:05, schrieb Alexander Graf:
> Kevin Wolf wrote:
>> Am 10.05.2010 23:51, schrieb Alexander Graf:
>>
>>> Usually the guest can tell the host to flush data to disk. In some cases we
>>> don't want to flush though, but try to keep everything in cache.
>>>
>>> So let's add a new parameter to -drive that allows us to set the flushing
>>> behavior to "on" or "off", defaulting to enabling the guest to flush.
>>>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>
>>
>> What about another cache=... value instead of adding more options? I'm
>> quite sure you'll only ever need this with writeback caching. So we
>> could have cache=none|writethrough|writeback|wb-noflush or something
>> like that.
>>
>>
>
> Yes, cache=volatile seems reasonable. Or cache=unsafe.
>
>>> ---
>>> block/raw-posix.c | 13 +++++++++++++
>>>
>>
>> This is obviously wrong. If you want to introduce new behaviour to the
>> block layer, you must do it consistently and not just for one block
>> driver. So these changes should be made to the generic functions in
>> block.c instead.
>>
>
> How so? The callback functions are called using bdrv->drv->xxx. If I
> modify that pointer, I end up affecting all other virtual disks as well.
By doing the check in bdrv_flush/bdrv_aio_flush before this function
pointer is even called.
Kevin
* [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-12 15:36 ` Kevin Wolf
@ 2010-05-12 15:51 ` Alexander Graf
0 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-12 15:51 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, hch
Am 12.05.2010 um 17:36 schrieb Kevin Wolf <kwolf@redhat.com>:
> Am 12.05.2010 17:05, schrieb Alexander Graf:
>> Kevin Wolf wrote:
>>> Am 10.05.2010 23:51, schrieb Alexander Graf:
>>>
>>>> Usually the guest can tell the host to flush data to disk. In
>>>> some cases we
>>>> don't want to flush though, but try to keep everything in cache.
>>>>
>>>> So let's add a new parameter to -drive that allows us to set the
>>>> flushing
>>>> behavior to "on" or "off", defaulting to enabling the guest to
>>>> flush.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>
>>>
>>> What about another cache=... value instead of adding more options?
>>> I'm
>>> quite sure you'll only ever need this with writeback caching. So we
>>> could have cache=none|writethrough|writeback|wb-noflush or something
>>> like that.
>>>
>>>
>>
>> Yes, cache=volatile seems reasonable. Or cache=unsafe.
>>
>>>> ---
>>>> block/raw-posix.c | 13 +++++++++++++
>>>>
>>>
>>> This is obviously wrong. If you want to introduce new behaviour to
>>> the
>>> block layer, you must do it consistently and not just for one block
>>> driver. So these changes should be made to the generic functions in
>>> block.c instead.
>>>
>>
>> How so? The callback functions are called using bdrv->drv->xxx. If I
>> modify that pointer, I end up affecting all other virtual disks as
>> well.
>
> By doing the check in bdrv_flush/bdrv_aio_flush before this function
> pointer is even called.
Sorry yeah, realized that 2 minutes after writing the mail myself :).
I put together an updated patch, but need to test it properly first.
Alex
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 10:55 ` Christoph Hellwig
2010-05-11 12:15 ` Paul Brook
@ 2010-05-14 9:16 ` Markus Armbruster
2010-05-17 12:41 ` Christoph Hellwig
1 sibling, 1 reply; 41+ messages in thread
From: Markus Armbruster @ 2010-05-14 9:16 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, Alexander Graf, qemu-devel
Christoph Hellwig <hch@lst.de> writes:
[...]
> Markus was looking into separating the block device state in host/
> guest portions lately, and splitting cache= would help with this.
> Driver cache enabled or not (fsync means enabled, O_DSYNC disabled,
> and none probably disabled given that we don't care), while O_DIRECT
> vs normal cached I/O is host state.
Let me provide a bit more context. -drive is a happy mix of guest and
host parameters. Some parameters are clearly host, some clearly guest,
and some are... complicated. For instance, I'd argue that "readonly"
applies to both host and guest.
I'm working on turning all guest parameters into proper qdev properties, and
collecting the host parameters in a new -blockdev. -drive stays for
compatibility and convenience (I'd like to deprecate if=none, though).
If you change -drive, please keep the coming separation in mind, and
avoid parameters that mix up guest and host.
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-11 22:11 ` Paul Brook
2010-05-12 10:09 ` Jamie Lokier
@ 2010-05-17 12:40 ` Christoph Hellwig
1 sibling, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-17 12:40 UTC (permalink / raw)
To: Paul Brook
Cc: Kevin Wolf, Alexander Graf, Christoph Hellwig, qemu-devel, armbru
On Tue, May 11, 2010 at 11:11:12PM +0100, Paul Brook wrote:
> .. though it may be a kernel bug rather than a qemu bug, depending on the
> exact details. Either way, I consider any mode that inhibits host filesystem
> write cache but not volatile drive cache to be pretty worthless. Either we
> guaranteed data integrity on completion or we don't.
O_DIRECT just means bypassing the pagecache, it does not mean flushing
the disk cache on every access, which for certain workloads can be very
painful. It also doesn't require a synchronous writeout of the metadata
needed to reach the data, e.g. when we have to allocate blocks
for a sparse image file.
To get the behaviour you want you need O_DIRECT|O_DSYNC, which is
something that is not exposed by qemu's current cache= suboption.
If we want to implement this properly we need to split the cache option,
as I already mentioned. This would also have benefits in other areas,
but again refer to my previous mail for that.
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-14 9:16 ` Markus Armbruster
@ 2010-05-17 12:41 ` Christoph Hellwig
2010-05-17 12:42 ` Alexander Graf
0 siblings, 1 reply; 41+ messages in thread
From: Christoph Hellwig @ 2010-05-17 12:41 UTC (permalink / raw)
To: Markus Armbruster
Cc: Kevin Wolf, qemu-devel, Christoph Hellwig, Alexander Graf
On Fri, May 14, 2010 at 11:16:47AM +0200, Markus Armbruster wrote:
> If you change -drive, please keep the coming separation in mind, and
> avoid parameters that mix up guest and host.
This new option seems to be exactly that, unfortunately..
* Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
2010-05-17 12:41 ` Christoph Hellwig
@ 2010-05-17 12:42 ` Alexander Graf
0 siblings, 0 replies; 41+ messages in thread
From: Alexander Graf @ 2010-05-17 12:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Kevin Wolf, Markus Armbruster, qemu-devel
On 17.05.2010, at 14:41, Christoph Hellwig wrote:
> On Fri, May 14, 2010 at 11:16:47AM +0200, Markus Armbruster wrote:
>> If you change -drive, please keep the coming separation in mind, and
>> avoid parameters that mix up guest and host.
>
> This new option seems to be exactly that, unfortunately..
>
How so? It only affects the host. The guest never gets to see that flushing is disabled.
Alex
Thread overview: 41+ messages
2010-05-10 21:51 [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Alexander Graf
2010-05-10 21:51 ` [Qemu-devel] [PATCH 2/2] Add flush=off parameter to -drive Alexander Graf
2010-05-11 8:36 ` [Qemu-devel] " Kevin Wolf
2010-05-11 10:55 ` Christoph Hellwig
2010-05-11 12:15 ` Paul Brook
2010-05-11 12:43 ` Anthony Liguori
2010-05-11 13:12 ` Paul Brook
2010-05-11 13:20 ` Anthony Liguori
2010-05-11 13:50 ` Paul Brook
2010-05-11 15:40 ` Anthony Liguori
2010-05-11 15:53 ` Paul Brook
2010-05-11 17:09 ` Anthony Liguori
2010-05-11 22:33 ` Paul Brook
2010-05-11 19:11 ` Avi Kivity
2010-05-11 16:32 ` Jamie Lokier
2010-05-11 17:15 ` Anthony Liguori
2010-05-11 18:13 ` Jamie Lokier
2010-05-11 15:18 ` Alexander Graf
2010-05-11 18:20 ` Jamie Lokier
2010-05-11 21:58 ` Paul Brook
2010-05-11 22:11 ` Paul Brook
2010-05-12 10:09 ` Jamie Lokier
2010-05-17 12:40 ` Christoph Hellwig
2010-05-14 9:16 ` Markus Armbruster
2010-05-17 12:41 ` Christoph Hellwig
2010-05-17 12:42 ` Alexander Graf
2010-05-11 19:04 ` Avi Kivity
2010-05-12 15:05 ` Alexander Graf
2010-05-12 15:36 ` Kevin Wolf
2010-05-12 15:51 ` Alexander Graf
2010-05-11 6:18 ` [Qemu-devel] [PATCH 1/2] Add no-op aio emulation stub Stefan Hajnoczi
2010-05-11 8:29 ` [Qemu-devel] " Kevin Wolf
2010-05-10 21:59 ` [Qemu-devel] [PATCH 0/2] Enable qemu block layer to not flush Anthony Liguori
2010-05-10 22:03 ` Alexander Graf
2010-05-10 22:12 ` Anthony Liguori
2010-05-11 21:48 ` Jamie Lokier
2010-05-12 8:51 ` Stefan Hajnoczi
2010-05-12 9:42 ` Jamie Lokier
2010-05-12 10:43 ` Stefan Hajnoczi
2010-05-12 12:50 ` Jamie Lokier