All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
@ 2018-04-27 16:23 Stefan Hajnoczi
  2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-04-27 16:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Dr. David Alan Gilbert, Eric Blake,
	qemu-block, Markus Armbruster, Max Reitz, Sergio Lopez,
	Stefan Hajnoczi

v2:
 * Add comment on !__linux__ situation [Fam]
 * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]

file-posix.c only supports shared storage live migration with -drive
cache.direct=off due to cache consistency issues.  There are two main shared
storage configurations: files on NFS and host block devices on SAN LUNs.

The problem is that QEMU starts on the destination host before the source host
has written everything out to the disk.  The page cache on the destination host
may contain stale data read when QEMU opened the image file (before migration
handover).  Using O_DIRECT avoids this problem but prevents users from taking
advantage of the host page cache.

Although cache=none is the recommended setting for virtualization use cases,
there are scenarios where cache=writeback makes sense.  If the guest has much
less RAM than the host or many guests share the same backing file, then the
host page cache can significantly improve disk I/O performance.

This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
on Linux so that shared storage live migration works.  I have sent it as an RFC
because cache consistency is not binary, there are corner cases which I've
described in the actual patch, and this may require more discussion.

Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
After this patch cache consistency is no longer provided by O_DIRECT.

This patch series relies on fdatasync(2) (source) +
posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
fdatasync(2) so that NFS has up-to-date inode metadata?

Stefan Hajnoczi (2):
  block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  block/file-posix: add x-check-page-cache=on|off option

 qapi/block-core.json |   7 ++-
 block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 3 deletions(-)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
@ 2018-04-27 16:23 ` Stefan Hajnoczi
  2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 2/2] block/file-posix: add x-check-page-cache=on|off option Stefan Hajnoczi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-04-27 16:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Dr. David Alan Gilbert, Eric Blake,
	qemu-block, Markus Armbruster, Max Reitz, Sergio Lopez,
	Stefan Hajnoczi

On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
this to drop page cache on the destination host during shared storage
migration.  This way the destination host will read the latest copy of
the data and will not use stale data from the page cache.

The flow is as follows:

1. Source host writes out all dirty pages and inactivates drives.
2. QEMU_VM_EOF is sent on migration stream.
3. Destination host invalidates caches before accessing drives.

This patch enables live migration even with -drive cache.direct=off.

* Terms and conditions may apply, please see patch for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3794c0007a..3707ea2d1c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2236,6 +2236,49 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
+                                                 Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
+    int ret;
+
+    ret = fd_open(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "The file descriptor is not open");
+        return;
+    }
+
+    if (s->open_flags & O_DIRECT) {
+        return; /* No host kernel page cache */
+    }
+
+#if defined(__linux__)
+    /* This sets the scene for the next syscall... */
+    ret = bdrv_co_flush(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "flush failed");
+        return;
+    }
+
+    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
+     * process.  These limitations are okay because we just fsynced the file,
+     * we don't use mmap, and the file should not be in use by other processes.
+     */
+    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
+    if (ret != 0) { /* the return value is a positive errno */
+        error_setg_errno(errp, ret, "fadvise failed");
+        return;
+    }
+#else /* __linux__ */
+    /* Do nothing.  Live migration to a remote host with cache.direct=off is
+     * unsupported on other host operating systems.  Cache consistency issues
+     * may occur but no error is reported here, partly because that's the
+     * historical behavior and partly because it's hard to differentiate valid
+     * configurations that should not cause errors.
+     */
+#endif /* !__linux__ */
+}
+
 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
     int64_t offset, int bytes,
     BlockCompletionFunc *cb, void *opaque)
@@ -2328,6 +2371,7 @@ BlockDriver bdrv_file = {
     .bdrv_co_create_opts = raw_co_create_opts,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_block_status = raw_co_block_status,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
 
     .bdrv_co_preadv         = raw_co_preadv,
@@ -2805,6 +2849,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts         = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
 
     .bdrv_co_preadv         = raw_co_preadv,
@@ -2927,6 +2972,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts         = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
 
 
     .bdrv_co_preadv         = raw_co_preadv,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Qemu-devel] [PATCH v2 2/2] block/file-posix: add x-check-page-cache=on|off option
  2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
  2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
@ 2018-04-27 16:23 ` Stefan Hajnoczi
  2018-05-03  7:33 ` [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Fam Zheng
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-04-27 16:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Dr. David Alan Gilbert, Eric Blake,
	qemu-block, Markus Armbruster, Max Reitz, Sergio Lopez,
	Stefan Hajnoczi

mincore(2) checks whether pages are resident.  Use it to verify that
page cache has been dropped.

You can trigger a verification failure by mmapping the image file from
another process that loads a byte from a page, forcing it to become
resident.  bdrv_co_invalidate_cache() will fail while that process is
alive.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json |   7 +++-
 block/file-posix.c   | 100 +++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c50517bff3..21c3470234 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2530,6 +2530,10 @@
 # @locking:     whether to enable file locking. If set to 'auto', only enable
 #               when Open File Descriptor (OFD) locking API is available
 #               (default: auto, since 2.10)
+# @x-check-cache-dropped: whether to check that page cache was dropped on live
+#                         migration.  May cause noticeable delays if the image
+#                         file is large, do not use in production.
+#                         (default: off) (since: 2.13)
 #
 # Since: 2.9
 ##
@@ -2537,7 +2541,8 @@
   'data': { 'filename': 'str',
             '*pr-manager': 'str',
             '*locking': 'OnOffAuto',
-            '*aio': 'BlockdevAioOptions' } }
+            '*aio': 'BlockdevAioOptions',
+            '*x-check-cache-dropped': 'bool' } }
 
 ##
 # @BlockdevOptionsNull:
diff --git a/block/file-posix.c b/block/file-posix.c
index 3707ea2d1c..5a602cfe37 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -161,6 +161,7 @@ typedef struct BDRVRawState {
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
+    bool check_cache_dropped;
 
     PRManager *pr_mgr;
 } BDRVRawState;
@@ -168,6 +169,7 @@ typedef struct BDRVRawState {
 typedef struct BDRVRawReopenState {
     int fd;
     int open_flags;
+    bool check_cache_dropped;
 } BDRVRawReopenState;
 
 static int fd_open(BlockDriverState *bs);
@@ -415,6 +417,11 @@ static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "id of persistent reservation manager object (default: none)",
         },
+        {
+            .name = "x-check-cache-dropped",
+            .type = QEMU_OPT_BOOL,
+            .help = "check that page cache was dropped on live migration (default: off)"
+        },
         { /* end of list */ }
     },
 };
@@ -500,6 +507,9 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
+                                               false);
+
     s->open_flags = open_flags;
     raw_parse_flags(bdrv_flags, &s->open_flags);
 
@@ -777,6 +787,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 {
     BDRVRawState *s;
     BDRVRawReopenState *rs;
+    QemuOpts *opts;
     int ret = 0;
     Error *local_err = NULL;
 
@@ -787,6 +798,19 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 
     state->opaque = g_new0(BDRVRawReopenState, 1);
     rs = state->opaque;
+    rs->fd = -1;
+
+    /* Handle options changes */
+    opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, state->options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
+                                                s->check_cache_dropped);
 
     if (s->type == FTYPE_CD) {
         rs->open_flags |= O_NONBLOCK;
@@ -794,8 +818,6 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 
     raw_parse_flags(state->flags, &rs->open_flags);
 
-    rs->fd = -1;
-
     int fcntl_flags = O_APPEND | O_NONBLOCK;
 #ifdef O_NOATIME
     fcntl_flags |= O_NOATIME;
@@ -850,6 +872,8 @@ static int raw_reopen_prepare(BDRVReopenState *state,
         }
     }
 
+out:
+    qemu_opts_del(opts);
     return ret;
 }
 
@@ -858,6 +882,7 @@ static void raw_reopen_commit(BDRVReopenState *state)
     BDRVRawReopenState *rs = state->opaque;
     BDRVRawState *s = state->bs->opaque;
 
+    s->check_cache_dropped = rs->check_cache_dropped;
     s->open_flags = rs->open_flags;
 
     qemu_close(s->fd);
@@ -2236,6 +2261,73 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+#if defined(__linux__)
+/* Verify that the file is not in the page cache */
+static void check_cache_dropped(BlockDriverState *bs, Error **errp)
+{
+    const size_t window_size = 128 * 1024 * 1024;
+    BDRVRawState *s = bs->opaque;
+    void *window = NULL;
+    size_t length = 0;
+    unsigned char *vec;
+    size_t page_size;
+    off_t offset;
+    off_t end;
+
+    /* mincore(2) page status information requires 1 byte per page */
+    page_size = sysconf(_SC_PAGESIZE);
+    vec = g_malloc(DIV_ROUND_UP(window_size, page_size));
+
+    end = raw_getlength(bs);
+
+    for (offset = 0; offset < end; offset += window_size) {
+        void *new_window;
+        size_t new_length;
+        size_t vec_end;
+        size_t i;
+        int ret;
+
+        /* Unmap previous window if size has changed */
+        new_length = MIN(end - offset, window_size);
+        if (new_length != length) {
+            munmap(window, length);
+            window = NULL;
+            length = 0;
+        }
+
+        new_window = mmap(window, new_length, PROT_NONE, MAP_PRIVATE,
+                          s->fd, offset);
+        if (new_window == MAP_FAILED) {
+            error_setg_errno(errp, errno, "mmap failed");
+            break;
+        }
+
+        window = new_window;
+        length = new_length;
+
+        ret = mincore(window, length, vec);
+        if (ret < 0) {
+            error_setg_errno(errp, errno, "mincore failed");
+            break;
+        }
+
+        vec_end = DIV_ROUND_UP(length, page_size);
+        for (i = 0; i < vec_end; i++) {
+            if (vec[i] & 0x1) {
+                error_setg(errp, "page cache still in use!");
+                break;
+            }
+        }
+    }
+
+    if (window) {
+        munmap(window, length);
+    }
+
+    g_free(vec);
+}
+#endif /* __linux__ */
+
 static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
                                                  Error **errp)
 {
@@ -2269,6 +2361,10 @@ static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
         error_setg_errno(errp, ret, "fadvise failed");
         return;
     }
+
+    if (s->check_cache_dropped) {
+        check_cache_dropped(bs, errp);
+    }
 #else /* __linux__ */
     /* Do nothing.  Live migration to a remote host with cache.direct=off is
      * unsupported on other host operating systems.  Cache consistency issues
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
  2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
  2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 2/2] block/file-posix: add x-check-page-cache=on|off option Stefan Hajnoczi
@ 2018-05-03  7:33 ` Fam Zheng
  2018-05-08 10:32 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2018-05-11 15:50 ` [Qemu-devel] " Stefan Hajnoczi
  4 siblings, 0 replies; 7+ messages in thread
From: Fam Zheng @ 2018-05-03  7:33 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Dr. David Alan Gilbert, Eric Blake,
	qemu-block, Markus Armbruster, Max Reitz, Sergio Lopez

On Fri, 04/27 17:23, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2018-05-03  7:33 ` [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Fam Zheng
@ 2018-05-08 10:32 ` Stefan Hajnoczi
  2018-05-08 10:46   ` Kevin Wolf
  2018-05-11 15:50 ` [Qemu-devel] " Stefan Hajnoczi
  4 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-05-08 10:32 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, stefanha, Fam Zheng, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Markus Armbruster, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> 
> file-posix.c only supports shared storage live migration with -drive
> cache.direct=off due to cache consistency issues.  There are two main shared
> storage configurations: files on NFS and host block devices on SAN LUNs.
> 
> The problem is that QEMU starts on the destination host before the source host
> has written everything out to the disk.  The page cache on the destination host
> may contain stale data read when QEMU opened the image file (before migration
> handover).  Using O_DIRECT avoids this problem but prevents users from taking
> advantage of the host page cache.
> 
> Although cache=none is the recommended setting for virtualization use cases,
> there are scenarios where cache=writeback makes sense.  If the guest has much
> less RAM than the host or many guests share the same backing file, then the
> host page cache can significantly improve disk I/O performance.
> 
> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> on Linux so that shared storage live migration works.  I have sent it as an RFC
> because cache consistency is not binary, there are corner cases which I've
> described in the actual patch, and this may require more discussion.
> 
> Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> After this patch cache consistency is no longer provided by O_DIRECT.
> 
> This patch series relies on fdatasync(2) (source) +
> posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> fdatasync(2) so that NFS has up-to-date inode metadata?
> 
> Stefan Hajnoczi (2):
>   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
>   block/file-posix: add x-check-page-cache=on|off option
> 
>  qapi/block-core.json |   7 ++-
>  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 150 insertions(+), 3 deletions(-)

Kevin: Are you happy with this series?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-05-08 10:32 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-05-08 10:46   ` Kevin Wolf
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Wolf @ 2018-05-08 10:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, stefanha, Fam Zheng, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Markus Armbruster, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2730 bytes --]

Am 08.05.2018 um 12:32 hat Stefan Hajnoczi geschrieben:
> On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> > v2:
> >  * Add comment on !__linux__ situation [Fam]
> >  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> > 
> > file-posix.c only supports shared storage live migration with -drive
> > cache.direct=off due to cache consistency issues.  There are two main shared
> > storage configurations: files on NFS and host block devices on SAN LUNs.
> > 
> > The problem is that QEMU starts on the destination host before the source host
> > has written everything out to the disk.  The page cache on the destination host
> > may contain stale data read when QEMU opened the image file (before migration
> > handover).  Using O_DIRECT avoids this problem but prevents users from taking
> > advantage of the host page cache.
> > 
> > Although cache=none is the recommended setting for virtualization use cases,
> > there are scenarios where cache=writeback makes sense.  If the guest has much
> > less RAM than the host or many guests share the same backing file, then the
> > host page cache can significantly improve disk I/O performance.
> > 
> > This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> > on Linux so that shared storage live migration works.  I have sent it as an RFC
> > because cache consistency is not binary, there are corner cases which I've
> > described in the actual patch, and this may require more discussion.
> > 
> > Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> > consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> > After this patch cache consistency is no longer provided by O_DIRECT.
> > 
> > This patch series relies on fdatasync(2) (source) +
> > posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> > for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> > fdatasync(2) so that NFS has up-to-date inode metadata?
> > 
> > Stefan Hajnoczi (2):
> >   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
> >   block/file-posix: add x-check-page-cache=on|off option
> > 
> >  qapi/block-core.json |   7 ++-
> >  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 150 insertions(+), 3 deletions(-)
> 
> Kevin: Are you happy with this series?

Yes, I think I am.

I'm still kind of concerned about misleading people into believing that
cache=writeback + live migration is generally safe when it's only in
special cases, but that's not really a concern about the code, but about
how we communicate the feature.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2018-05-08 10:32 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2018-05-11 15:50 ` Stefan Hajnoczi
  4 siblings, 0 replies; 7+ messages in thread
From: Stefan Hajnoczi @ 2018-05-11 15:50 UTC (permalink / raw)
  To: qemu-devel
  Cc: Fam Zheng, Kevin Wolf, Dr. David Alan Gilbert, Eric Blake,
	qemu-block, Markus Armbruster, Max Reitz, Sergio Lopez

[-- Attachment #1: Type: text/plain, Size: 2382 bytes --]

On Fri, Apr 27, 2018 at 05:23:10PM +0100, Stefan Hajnoczi wrote:
> v2:
>  * Add comment on !__linux__ situation [Fam]
>  * Add file-posix.c x-check-cache-dropped=on|off option [DaveG, Kevin]
> 
> file-posix.c only supports shared storage live migration with -drive
> cache.direct=off due to cache consistency issues.  There are two main shared
> storage configurations: files on NFS and host block devices on SAN LUNs.
> 
> The problem is that QEMU starts on the destination host before the source host
> has written everything out to the disk.  The page cache on the destination host
> may contain stale data read when QEMU opened the image file (before migration
> handover).  Using O_DIRECT avoids this problem but prevents users from taking
> advantage of the host page cache.
> 
> Although cache=none is the recommended setting for virtualization use cases,
> there are scenarios where cache=writeback makes sense.  If the guest has much
> less RAM than the host or many guests share the same backing file, then the
> host page cache can significantly improve disk I/O performance.
> 
> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> on Linux so that shared storage live migration works.  I have sent it as an RFC
> because cache consistency is not binary, there are corner cases which I've
> described in the actual patch, and this may require more discussion.
> 
> Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
> consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
> After this patch cache consistency is no longer provided by O_DIRECT.
> 
> This patch series relies on fdatasync(2) (source) +
> posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
> for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
> fdatasync(2) so that NFS has up-to-date inode metadata?
> 
> Stefan Hajnoczi (2):
>   block/file-posix: implement bdrv_co_invalidate_cache() on Linux
>   block/file-posix: add x-check-page-cache=on|off option
> 
>  qapi/block-core.json |   7 ++-
>  block/file-posix.c   | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 150 insertions(+), 3 deletions(-)
> 
> -- 
> 2.14.3
> 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-05-11 15:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-27 16:23 [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
2018-04-27 16:23 ` [Qemu-devel] [PATCH v2 2/2] block/file-posix: add x-check-page-cache=on|off option Stefan Hajnoczi
2018-05-03  7:33 ` [Qemu-devel] [PATCH v2 0/2] block/file-posix: allow -drive cache.direct=off live migration Fam Zheng
2018-05-08 10:32 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-05-08 10:46   ` Kevin Wolf
2018-05-11 15:50 ` [Qemu-devel] " Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.