All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration
@ 2018-04-19  7:52 Stefan Hajnoczi
  2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-19  7:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Stefan Hajnoczi

file-posix.c only supports shared storage live migration with -drive
cache.direct=off due to cache consistency issues.  There are two main shared
storage configurations: files on NFS and host block devices on SAN LUNs.

The problem is that QEMU starts on the destination host before the source host
has written everything out to the disk.  The page cache on the destination host
may contain stale data read when QEMU opened the image file (before migration
handover).  Using O_DIRECT avoids this problem but prevents users from taking
advantage of the host page cache.

Although cache=none is the recommended setting for virtualization use cases,
there are scenarios where cache=writeback makes sense.  If the guest has much
less RAM than the host or many guests share the same backing file, then the
host page cache can significantly improve disk I/O performance.

This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
on Linux so that shared storage live migration works.  I have sent it as an RFC
because cache consistency is not binary, there are corner cases which I've
described in the actual patch, and this may require more discussion.

Regarding NFS, QEMU relies on O_DIRECT rather than the close-to-open
consistency model (see nfs(5)), which is the basic guarantee provided by NFS.
After this patch cache consistency is no longer provided by O_DIRECT.

This patch series relies on fdatasync(2) (source) +
posix_fadvise(POSIX_FADV_DONTNEED) (destination) instead.  I believe it is safe
for both NFS and SAN LUNs.  Maybe we should use fsync(2) instead of
fdatasync(2) so that NFS has up-to-date inode metadata?

Stefan Hajnoczi (2):
  block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  block/file-posix: verify page cache is not used

 block/file-posix.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

-- 
2.14.3

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-19  7:52 [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
@ 2018-04-19  7:52 ` Stefan Hajnoczi
  2018-04-19  8:13   ` Fam Zheng
  2018-04-19  9:18   ` Dr. David Alan Gilbert
  2018-04-19  7:52 ` [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used Stefan Hajnoczi
  2018-04-19 16:09 ` [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Eric Blake
  2 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-19  7:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Stefan Hajnoczi

On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
this to drop page cache on the destination host during shared storage
migration.  This way the destination host will read the latest copy of
the data and will not use stale data from the page cache.

The flow is as follows:

1. Source host writes out all dirty pages and inactivates drives.
2. QEMU_VM_EOF is sent on migration stream.
3. Destination host invalidates caches before accessing drives.

This patch enables live migration even with -drive cache.direct=off.

* Terms and conditions may apply, please see patch for details.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3794c0007a..df4f52919f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
+                                                 Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
+    int ret;
+
+    ret = fd_open(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "The file descriptor is not open");
+        return;
+    }
+
+    if (s->open_flags & O_DIRECT) {
+        return; /* No host kernel page cache */
+    }
+
+#if defined(__linux__)
+    /* This sets the scene for the next syscall... */
+    ret = bdrv_co_flush(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "flush failed");
+        return;
+    }
+
+    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
+     * process.  These limitations are okay because we just fsynced the file,
+     * we don't use mmap, and the file should not be in use by other processes.
+     */
+    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
+    if (ret != 0) { /* the return value is a positive errno */
+        error_setg_errno(errp, ret, "fadvise failed");
+        return;
+    }
+#endif /* __linux__ */
+}
+
 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
     int64_t offset, int bytes,
     BlockCompletionFunc *cb, void *opaque)
@@ -2328,6 +2364,7 @@ BlockDriver bdrv_file = {
     .bdrv_co_create_opts = raw_co_create_opts,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_block_status = raw_co_block_status,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
 
     .bdrv_co_preadv         = raw_co_preadv,
@@ -2805,6 +2842,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts         = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
 
     .bdrv_co_preadv         = raw_co_preadv,
@@ -2927,6 +2965,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_reopen_abort   = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts         = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
 
 
     .bdrv_co_preadv         = raw_co_preadv,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-19  7:52 [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
  2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
@ 2018-04-19  7:52 ` Stefan Hajnoczi
  2018-04-19  9:05   ` Dr. David Alan Gilbert
  2018-04-19 16:09 ` [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Eric Blake
  2 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-19  7:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Stefan Hajnoczi

This commit is for debugging only.  Do not merge it.

mincore(2) checks whether pages are resident.  Use it to verify that
page cache has been dropped.

You can trigger a verification failure by mmapping the image file from
another process and loading a byte from a page so that it becomes
resident.  bdrv_co_invalidate_cache() will fail while the process is
alive.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index df4f52919f..d3105269c6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2236,6 +2236,75 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static bool is_mincore(void *addr, size_t length)
+{
+    size_t vec_len = DIV_ROUND_UP(length, sysconf(_SC_PAGESIZE));
+    unsigned char *vec;
+    size_t i;
+    int ret;
+    bool incore = false;
+
+    vec = g_malloc(vec_len);
+    ret = mincore(addr, length, vec);
+    if (ret < 0) {
+        incore = true;
+        goto out;
+    }
+
+    for (i = 0; i < vec_len; i++) {
+        if (vec[i] & 0x1) {
+            incore = true;
+            break;
+        }
+    }
+
+out:
+    g_free(vec);
+    return incore;
+}
+
+static void check_not_in_page_cache(BlockDriverState *bs, Error **errp)
+{
+    const size_t WINDOW_SIZE = 128 * 1024 * 1024;
+    BDRVRawState *s = bs->opaque;
+    void *window = NULL;
+    size_t length = 0;
+    off_t end;
+    off_t offset;
+
+    end = raw_getlength(bs);
+
+    for (offset = 0; offset < end; offset += WINDOW_SIZE) {
+        void *new_window;
+        size_t new_length = MIN(end - offset, WINDOW_SIZE);
+
+        if (new_length != length) {
+            munmap(window, length);
+            window = NULL;
+            length = 0;
+        }
+
+        new_window = mmap(window, new_length, PROT_NONE, MAP_PRIVATE,
+                          s->fd, offset);
+        if (new_window == MAP_FAILED) {
+            error_setg_errno(errp, errno, "mmap failed");
+            break;
+        }
+
+        window = new_window;
+        length = new_length;
+
+        if (is_mincore(window, length)) {
+            error_setg(errp, "page cache still in use!");
+            break;
+        }
+    }
+
+    if (window) {
+        munmap(window, length);
+    }
+}
+
 static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
                                                  Error **errp)
 {
@@ -2270,6 +2339,8 @@ static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
         return;
     }
 #endif /* __linux__ */
+
+    check_not_in_page_cache(bs, errp);
 }
 
 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
@ 2018-04-19  8:13   ` Fam Zheng
  2018-04-20  3:15     ` Stefan Hajnoczi
  2018-04-19  9:18   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2018-04-19  8:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz

On Thu, 04/19 15:52, Stefan Hajnoczi wrote:
> On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> this to drop page cache on the destination host during shared storage
> migration.  This way the destination host will read the latest copy of
> the data and will not use stale data from the page cache.
> 
> The flow is as follows:
> 
> 1. Source host writes out all dirty pages and inactivates drives.
> 2. QEMU_VM_EOF is sent on migration stream.
> 3. Destination host invalidates caches before accessing drives.
> 
> This patch enables live migration even with -drive cache.direct=off.
> 
> * Terms and conditions may apply, please see patch for details.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3794c0007a..df4f52919f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      return ret | BDRV_BLOCK_OFFSET_VALID;
>  }
>  
> +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> +                                                 Error **errp)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    int ret;
> +
> +    ret = fd_open(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> +        return;
> +    }
> +
> +    if (s->open_flags & O_DIRECT) {
> +        return; /* No host kernel page cache */
> +    }
> +
> +#if defined(__linux__)
> +    /* This sets the scene for the next syscall... */
> +    ret = bdrv_co_flush(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "flush failed");
> +        return;
> +    }
> +
> +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> +     * process.  These limitations are okay because we just fsynced the file,
> +     * we don't use mmap, and the file should not be in use by other processes.
> +     */
> +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> +    if (ret != 0) { /* the return value is a positive errno */
> +        error_setg_errno(errp, ret, "fadvise failed");
> +        return;
> +    }
> +#endif /* __linux__ */

What about the #else branch? It doesn't automatically work, I guess?

Fam

> +}
> +
>  static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
>      int64_t offset, int bytes,
>      BlockCompletionFunc *cb, void *opaque)
> @@ -2328,6 +2364,7 @@ BlockDriver bdrv_file = {
>      .bdrv_co_create_opts = raw_co_create_opts,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
>      .bdrv_co_block_status = raw_co_block_status,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2805,6 +2842,7 @@ static BlockDriver bdrv_host_device = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2927,6 +2965,7 @@ static BlockDriver bdrv_host_cdrom = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>  
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> -- 
> 2.14.3
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-19  7:52 ` [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used Stefan Hajnoczi
@ 2018-04-19  9:05   ` Dr. David Alan Gilbert
  2018-04-20  3:02     ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-19  9:05 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> This commit is for debugging only.  Do not merge it.
> 
> mincore(2) checks whether pages are resident.  Use it to verify that
> page cache has been dropped.
> 
> You can trigger a verification failure by mmapping the image file from
> another process and loading a byte from a page so that it becomes
> resident.  bdrv_co_invalidate_cache() will fail while the process is
> alive.

It doesn't seem a bad diagnostic to keep in (with a switch to activate)
for when we're faced with some weird corruption on some weird storage
system.

Dave

> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index df4f52919f..d3105269c6 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2236,6 +2236,75 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      return ret | BDRV_BLOCK_OFFSET_VALID;
>  }
>  
> +static bool is_mincore(void *addr, size_t length)
> +{
> +    size_t vec_len = DIV_ROUND_UP(length, sysconf(_SC_PAGESIZE));
> +    unsigned char *vec;
> +    size_t i;
> +    int ret;
> +    bool incore = false;
> +
> +    vec = g_malloc(vec_len);
> +    ret = mincore(addr, length, vec);
> +    if (ret < 0) {
> +        incore = true;
> +        goto out;
> +    }
> +
> +    for (i = 0; i < vec_len; i++) {
> +        if (vec[i] & 0x1) {
> +            incore = true;
> +            break;
> +        }
> +    }
> +
> +out:
> +    g_free(vec);
> +    return incore;
> +}
> +
> +static void check_not_in_page_cache(BlockDriverState *bs, Error **errp)
> +{
> +    const size_t WINDOW_SIZE = 128 * 1024 * 1024;
> +    BDRVRawState *s = bs->opaque;
> +    void *window = NULL;
> +    size_t length = 0;
> +    off_t end;
> +    off_t offset;
> +
> +    end = raw_getlength(bs);
> +
> +    for (offset = 0; offset < end; offset += WINDOW_SIZE) {
> +        void *new_window;
> +        size_t new_length = MIN(end - offset, WINDOW_SIZE);
> +
> +        if (new_length != length) {
> +            munmap(window, length);
> +            window = NULL;
> +            length = 0;
> +        }
> +
> +        new_window = mmap(window, new_length, PROT_NONE, MAP_PRIVATE,
> +                          s->fd, offset);
> +        if (new_window == MAP_FAILED) {
> +            error_setg_errno(errp, errno, "mmap failed");
> +            break;
> +        }
> +
> +        window = new_window;
> +        length = new_length;
> +
> +        if (is_mincore(window, length)) {
> +            error_setg(errp, "page cache still in use!");
> +            break;
> +        }
> +    }
> +
> +    if (window) {
> +        munmap(window, length);
> +    }
> +}
> +
>  static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
>                                                   Error **errp)
>  {
> @@ -2270,6 +2339,8 @@ static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
>          return;
>      }
>  #endif /* __linux__ */
> +
> +    check_not_in_page_cache(bs, errp);
>  }
>  
>  static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
  2018-04-19  8:13   ` Fam Zheng
@ 2018-04-19  9:18   ` Dr. David Alan Gilbert
  2018-04-20  3:21     ` Stefan Hajnoczi
  1 sibling, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2018-04-19  9:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> this to drop page cache on the destination host during shared storage
> migration.  This way the destination host will read the latest copy of
> the data and will not use stale data from the page cache.
> 
> The flow is as follows:
> 
> 1. Source host writes out all dirty pages and inactivates drives.
> 2. QEMU_VM_EOF is sent on migration stream.
> 3. Destination host invalidates caches before accessing drives.
> 
> This patch enables live migration even with -drive cache.direct=off.
> 
> * Terms and conditions may apply, please see patch for details.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3794c0007a..df4f52919f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      return ret | BDRV_BLOCK_OFFSET_VALID;
>  }
>  
> +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> +                                                 Error **errp)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    int ret;
> +
> +    ret = fd_open(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> +        return;
> +    }
> +
> +    if (s->open_flags & O_DIRECT) {
> +        return; /* No host kernel page cache */
> +    }
> +
> +#if defined(__linux__)
> +    /* This sets the scene for the next syscall... */
> +    ret = bdrv_co_flush(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "flush failed");
> +        return;
> +    }
> +
> +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> +     * process.  These limitations are okay because we just fsynced the file,
> +     * we don't use mmap, and the file should not be in use by other processes.
> +     */
> +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);

What happens if I try a migrate between two qemu's on the same host?
(Which I, and avocado, both use for testing; I think think users
occasionally do for QEMU updates).

Dave

> +    if (ret != 0) { /* the return value is a positive errno */
> +        error_setg_errno(errp, ret, "fadvise failed");
> +        return;
> +    }
> +#endif /* __linux__ */
> +}
> +
>  static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
>      int64_t offset, int bytes,
>      BlockCompletionFunc *cb, void *opaque)
> @@ -2328,6 +2364,7 @@ BlockDriver bdrv_file = {
>      .bdrv_co_create_opts = raw_co_create_opts,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
>      .bdrv_co_block_status = raw_co_block_status,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2805,6 +2842,7 @@ static BlockDriver bdrv_host_device = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2927,6 +2965,7 @@ static BlockDriver bdrv_host_cdrom = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>  
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-19  7:52 [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
  2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
  2018-04-19  7:52 ` [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used Stefan Hajnoczi
@ 2018-04-19 16:09 ` Eric Blake
  2018-04-20  3:05   ` Stefan Hajnoczi
  2 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2018-04-19 16:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, Sergio Lopez, qemu-block, Dr. David Alan Gilbert,
	Max Reitz, Vladimir Sementsov-Ogievskiy

[-- Attachment #1: Type: text/plain, Size: 2038 bytes --]

On 04/19/2018 02:52 AM, Stefan Hajnoczi wrote:
> file-posix.c only supports shared storage live migration with -drive
> cache.direct=off due to cache consistency issues.  There are two main shared
> storage configurations: files on NFS and host block devices on SAN LUNs.
> 
> The problem is that QEMU starts on the destination host before the source host
> has written everything out to the disk.  The page cache on the destination host
> may contain stale data read when QEMU opened the image file (before migration
> handover).  Using O_DIRECT avoids this problem but prevents users from taking
> advantage of the host page cache.
> 
> Although cache=none is the recommended setting for virtualization use cases,
> there are scenarios where cache=writeback makes sense.  If the guest has much
> less RAM than the host or many guests share the same backing file, then the
> host page cache can significantly improve disk I/O performance.
> 
> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> on Linux so that shared storage live migration works.  I have sent it as an RFC
> because cache consistency is not binary, there are corner cases which I've
> described in the actual patch, and this may require more discussion.

Interesting, in that the NBD list is also discussing the possible
standardization of a NBD_CMD_CACHE command (based on existing practice
in the xNBD implementation), and covering whether that MIGHT be worth
doing as a thin wrapper that corresponds to posix_fadvise() semantics.
Thus, if NBD_CMD_CACHE learns flags, we could support
.bdrv_co_invalidate_cache() through the NBD protocol driver, in addition
to the POSIX file driver.  Obviously, your usage invalidates the cache
of the entire file; but does it also make sense to expose a start/length
subset invalidation, for better exposure to posix_fadvise() semantics?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-19  9:05   ` Dr. David Alan Gilbert
@ 2018-04-20  3:02     ` Stefan Hajnoczi
  2018-04-20  6:25       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  3:02 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 817 bytes --]

On Thu, Apr 19, 2018 at 10:05:47AM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > This commit is for debugging only.  Do not merge it.
> > 
> > mincore(2) checks whether pages are resident.  Use it to verify that
> > page cache has been dropped.
> > 
> > You can trigger a verification failure by mmapping the image file from
> > another process and loading a byte from a page so that it becomes
> > resident.  bdrv_co_invalidate_cache() will fail while the process is
> > alive.
> 
> It doesn't seem a bad diagnostic to keep in (with a switch to activate)
> for when we're faced with some weird corruption on some weird storage
> system.

Okay.  It's very slow to mmap an entire image file and query mincore(2)
so it needs to be off by default.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-19 16:09 ` [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Eric Blake
@ 2018-04-20  3:05   ` Stefan Hajnoczi
  2018-04-20 13:53     ` Eric Blake
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  3:05 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz, Vladimir Sementsov-Ogievskiy

[-- Attachment #1: Type: text/plain, Size: 2344 bytes --]

On Thu, Apr 19, 2018 at 11:09:53AM -0500, Eric Blake wrote:
> On 04/19/2018 02:52 AM, Stefan Hajnoczi wrote:
> > file-posix.c only supports shared storage live migration with -drive
> > cache.direct=off due to cache consistency issues.  There are two main shared
> > storage configurations: files on NFS and host block devices on SAN LUNs.
> > 
> > The problem is that QEMU starts on the destination host before the source host
> > has written everything out to the disk.  The page cache on the destination host
> > may contain stale data read when QEMU opened the image file (before migration
> > handover).  Using O_DIRECT avoids this problem but prevents users from taking
> > advantage of the host page cache.
> > 
> > Although cache=none is the recommended setting for virtualization use cases,
> > there are scenarios where cache=writeback makes sense.  If the guest has much
> > less RAM than the host or many guests share the same backing file, then the
> > host page cache can significantly improve disk I/O performance.
> > 
> > This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> > on Linux so that shared storage live migration works.  I have sent it as an RFC
> > because cache consistency is not binary, there are corner cases which I've
> > described in the actual patch, and this may require more discussion.
> 
> Interesting, in that the NBD list is also discussing the possible
> standardization of a NBD_CMD_CACHE command (based on existing practice
> in the xNBD implementation), and covering whether that MIGHT be worth
> doing as a thin wrapper that corresponds to posix_fadvise() semantics.
> Thus, if NBD_CMD_CACHE learns flags, we could support
> .bdrv_co_invalidate_cache() through the NBD protocol driver, in addition
> to the POSIX file driver.  Obviously, your usage invalidates the cache
> of the entire file; but does it also make sense to expose a start/length
> subset invalidation, for better exposure to posix_fadvise() semantics?

bdrv_co_invalidate_cache() is currently only used by migration before
using a file that may have been touched by the other host.  I don't
think start/length will be needed for that use case.

Can you describe how will NBD use cache invalidation?  Maybe this will
help me understand other use cases.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-19  8:13   ` Fam Zheng
@ 2018-04-20  3:15     ` Stefan Hajnoczi
  2018-04-20  3:36       ` Fam Zheng
  2018-04-20  6:13       ` Kevin Wolf
  0 siblings, 2 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  3:15 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 3065 bytes --]

On Thu, Apr 19, 2018 at 04:13:44PM +0800, Fam Zheng wrote:
> On Thu, 04/19 15:52, Stefan Hajnoczi wrote:
> > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > this to drop page cache on the destination host during shared storage
> > migration.  This way the destination host will read the latest copy of
> > the data and will not use stale data from the page cache.
> > 
> > The flow is as follows:
> > 
> > 1. Source host writes out all dirty pages and inactivates drives.
> > 2. QEMU_VM_EOF is sent on migration stream.
> > 3. Destination host invalidates caches before accessing drives.
> > 
> > This patch enables live migration even with -drive cache.direct=off.
> > 
> > * Terms and conditions may apply, please see patch for details.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 3794c0007a..df4f52919f 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> >      return ret | BDRV_BLOCK_OFFSET_VALID;
> >  }
> >  
> > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > +                                                 Error **errp)
> > +{
> > +    BDRVRawState *s = bs->opaque;
> > +    int ret;
> > +
> > +    ret = fd_open(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > +        return;
> > +    }
> > +
> > +    if (s->open_flags & O_DIRECT) {
> > +        return; /* No host kernel page cache */
> > +    }
> > +
> > +#if defined(__linux__)
> > +    /* This sets the scene for the next syscall... */
> > +    ret = bdrv_co_flush(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "flush failed");
> > +        return;
> > +    }
> > +
> > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > +     * process.  These limitations are okay because we just fsynced the file,
> > +     * we don't use mmap, and the file should not be in use by other processes.
> > +     */
> > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> > +    if (ret != 0) { /* the return value is a positive errno */
> > +        error_setg_errno(errp, ret, "fadvise failed");
> > +        return;
> > +    }
> > +#endif /* __linux__ */
> 
> What about the #else branch? It doesn't automatically work, I guess?

Right, no error is reported.  This is existing QEMU behavior.

If we want to change behavior then it must be done consistently (i.e. by
auditing the other block drivers) and we need to be prepared for bug
reports (just like file locking, it may expose interesting use cases
that we cannot easily dismiss as wrong).  I didn't want to go there.

If there is consensus then I will change the behavior.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-19  9:18   ` Dr. David Alan Gilbert
@ 2018-04-20  3:21     ` Stefan Hajnoczi
  2018-04-20  6:27       ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-20  3:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, Max Reitz, Kevin Wolf, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3231 bytes --]

On Thu, Apr 19, 2018 at 10:18:33AM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > this to drop page cache on the destination host during shared storage
> > migration.  This way the destination host will read the latest copy of
> > the data and will not use stale data from the page cache.
> > 
> > The flow is as follows:
> > 
> > 1. Source host writes out all dirty pages and inactivates drives.
> > 2. QEMU_VM_EOF is sent on migration stream.
> > 3. Destination host invalidates caches before accessing drives.
> > 
> > This patch enables live migration even with -drive cache.direct=off.
> > 
> > * Terms and conditions may apply, please see patch for details.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 39 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 3794c0007a..df4f52919f 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> >      return ret | BDRV_BLOCK_OFFSET_VALID;
> >  }
> >  
> > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > +                                                 Error **errp)
> > +{
> > +    BDRVRawState *s = bs->opaque;
> > +    int ret;
> > +
> > +    ret = fd_open(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > +        return;
> > +    }
> > +
> > +    if (s->open_flags & O_DIRECT) {
> > +        return; /* No host kernel page cache */
> > +    }
> > +
> > +#if defined(__linux__)
> > +    /* This sets the scene for the next syscall... */
> > +    ret = bdrv_co_flush(bs);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "flush failed");
> > +        return;
> > +    }
> > +
> > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > +     * process.  These limitations are okay because we just fsynced the file,
> > +     * we don't use mmap, and the file should not be in use by other processes.
> > +     */
> > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> 
> What happens if I try a migrate between two qemu's on the same host?
> (Which I, and avocado, both use for testing; I think think users
> occasionally do for QEMU updates).

The steps quoted from the commit description:

  1. Source host writes out all dirty pages and inactivates drives.
  2. QEMU_VM_EOF is sent on migration stream.
  3. Destination host invalidates caches before accessing drives.

When we reach Step 3 the source QEMU is not doing I/O (no pages are
locked).  The destination QEMU does bdrv_co_flush() so even if pages are
still dirty (that shouldn't happen since the source already drained and
flushed) they will be written out and pages will be clean.  Therefore
fadvise really invalidates all resident pages.

FWIW when writing this patch I tested with both QEMUs on the same host.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-20  3:15     ` Stefan Hajnoczi
@ 2018-04-20  3:36       ` Fam Zheng
  2018-04-20  6:13       ` Kevin Wolf
  1 sibling, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2018-04-20  3:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz

On Fri, 04/20 11:15, Stefan Hajnoczi wrote:
> On Thu, Apr 19, 2018 at 04:13:44PM +0800, Fam Zheng wrote:
> > On Thu, 04/19 15:52, Stefan Hajnoczi wrote:
> > > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > > this to drop page cache on the destination host during shared storage
> > > migration.  This way the destination host will read the latest copy of
> > > the data and will not use stale data from the page cache.
> > > 
> > > The flow is as follows:
> > > 
> > > 1. Source host writes out all dirty pages and inactivates drives.
> > > 2. QEMU_VM_EOF is sent on migration stream.
> > > 3. Destination host invalidates caches before accessing drives.
> > > 
> > > This patch enables live migration even with -drive cache.direct=off.
> > > 
> > > * Terms and conditions may apply, please see patch for details.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 39 insertions(+)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 3794c0007a..df4f52919f 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> > >      return ret | BDRV_BLOCK_OFFSET_VALID;
> > >  }
> > >  
> > > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > > +                                                 Error **errp)
> > > +{
> > > +    BDRVRawState *s = bs->opaque;
> > > +    int ret;
> > > +
> > > +    ret = fd_open(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > > +        return;
> > > +    }
> > > +
> > > +    if (s->open_flags & O_DIRECT) {
> > > +        return; /* No host kernel page cache */
> > > +    }
> > > +
> > > +#if defined(__linux__)
> > > +    /* This sets the scene for the next syscall... */
> > > +    ret = bdrv_co_flush(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "flush failed");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > > +     * process.  These limitations are okay because we just fsynced the file,
> > > +     * we don't use mmap, and the file should not be in use by other processes.
> > > +     */
> > > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> > > +    if (ret != 0) { /* the return value is a positive errno */
> > > +        error_setg_errno(errp, ret, "fadvise failed");
> > > +        return;
> > > +    }
> > > +#endif /* __linux__ */
> > 
> > What about the #else branch? It doesn't automatically work, I guess?
> 
> Right, no error is reported.  This is existing QEMU behavior.
> 
> If we want to change behavior then it must be done consistently (i.e. by
> auditing the other block drivers) and we need to be prepared for bug
> reports (just like file locking, it may expose interesting use cases
> that we cannot easily dismiss as wrong).  I didn't want to go there.
> 
> If there is consensus then I will change the behavior.

No need to change behavior (reporting error), at least not in this patch. But a

#else
    /* TODO: ... */
#endif

to remember adding similar code to invalidate system cache on other *nix systems
cannot hurt.  E.g BSDes have posix_fadvise() too, though I have no idea if
POSIX_FADV_DONTNEED works the same.

(I'm sure there are some tricks on Windows to but do we care? :)

Fam

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-20  3:15     ` Stefan Hajnoczi
  2018-04-20  3:36       ` Fam Zheng
@ 2018-04-20  6:13       ` Kevin Wolf
  1 sibling, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2018-04-20  6:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Fam Zheng, qemu-devel, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 3894 bytes --]

Am 20.04.2018 um 05:15 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 19, 2018 at 04:13:44PM +0800, Fam Zheng wrote:
> > On Thu, 04/19 15:52, Stefan Hajnoczi wrote:
> > > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > > this to drop page cache on the destination host during shared storage
> > > migration.  This way the destination host will read the latest copy of
> > > the data and will not use stale data from the page cache.
> > > 
> > > The flow is as follows:
> > > 
> > > 1. Source host writes out all dirty pages and inactivates drives.
> > > 2. QEMU_VM_EOF is sent on migration stream.
> > > 3. Destination host invalidates caches before accessing drives.
> > > 
> > > This patch enables live migration even with -drive cache.direct=off.
> > > 
> > > * Terms and conditions may apply, please see patch for details.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 39 insertions(+)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 3794c0007a..df4f52919f 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> > >      return ret | BDRV_BLOCK_OFFSET_VALID;
> > >  }
> > >  
> > > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > > +                                                 Error **errp)
> > > +{
> > > +    BDRVRawState *s = bs->opaque;
> > > +    int ret;
> > > +
> > > +    ret = fd_open(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > > +        return;
> > > +    }
> > > +
> > > +    if (s->open_flags & O_DIRECT) {
> > > +        return; /* No host kernel page cache */
> > > +    }
> > > +
> > > +#if defined(__linux__)
> > > +    /* This sets the scene for the next syscall... */
> > > +    ret = bdrv_co_flush(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "flush failed");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > > +     * process.  These limitations are okay because we just fsynced the file,
> > > +     * we don't use mmap, and the file should not be in use by other processes.
> > > +     */
> > > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> > > +    if (ret != 0) { /* the return value is a positive errno */
> > > +        error_setg_errno(errp, ret, "fadvise failed");
> > > +        return;
> > > +    }
> > > +#endif /* __linux__ */
> > 
> > What about the #else branch? It doesn't automatically work, I guess?
> 
> Right, no error is reported.  This is existing QEMU behavior.
> 
> If we want to change behavior then it must be done consistently (i.e. by
> auditing the other block drivers) and we need to be prepared for bug
> reports (just like file locking, it may expose interesting use cases
> that we cannot easily dismiss as wrong).  I didn't want to go there.
> 
> If there is consensus then I will change the behavior.

I think either way that would be for a separate patch.

I'm also not sure how useful that change would actually be because it
might give you a false sense of safety: Even with this patch, you still
need to be exactly aware of the conditions that make live migration with
shared storage work correctly. If we error out on some unsafe cases,
but not on others, this might be confusing.

On the other hand, the problematic image format drivers have been
setting migration blockers for a long time, so you could also argue that
file-posix is inconsistent with them because it completely ignores
unsafe scenarios.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-20  3:02     ` Stefan Hajnoczi
@ 2018-04-20  6:25       ` Kevin Wolf
  2018-04-24 14:04         ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2018-04-20  6:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Dr. David Alan Gilbert, qemu-devel, Max Reitz, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

Am 20.04.2018 um 05:02 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 19, 2018 at 10:05:47AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > This commit is for debugging only.  Do not merge it.
> > > 
> > > mincore(2) checks whether pages are resident.  Use it to verify that
> > > page cache has been dropped.
> > > 
> > > You can trigger a verification failure by mmapping the image file from
> > > another process and loading a byte from a page so that it becomes
> > > resident.  bdrv_co_invalidate_cache() will fail while the process is
> > > alive.
> > 
> > It doesn't seem a bad diagnostic to keep in (with a switch to activate)
> > for when we're faced with some weird corruption on some weird storage
> > system.
> 
> Okay.  It's very slow to mmap an entire image file and query mincore(2)
> so it needs to be off by default.

Also, having it enabled breaks localhost migration at least on tmpfs
(which was what I tried out first).

I wonder if the kernel would add some way to query whether the "advice"
was actually acted upon if we asked. Either with a new function that
returns an error if not everything is dropped (basically
.bdrv_invalidate_cache on the kernel level), or a function that just
queries if any page is allocated (or maybe the address of the first
allocated page in a given range) without having to use mincore() and
iterating over all the pages in userspace.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
  2018-04-20  3:21     ` Stefan Hajnoczi
@ 2018-04-20  6:27       ` Kevin Wolf
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2018-04-20  6:27 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Dr. David Alan Gilbert, qemu-devel, Max Reitz, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]

Am 20.04.2018 um 05:21 hat Stefan Hajnoczi geschrieben:
> On Thu, Apr 19, 2018 at 10:18:33AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> > > this to drop page cache on the destination host during shared storage
> > > migration.  This way the destination host will read the latest copy of
> > > the data and will not use stale data from the page cache.
> > > 
> > > The flow is as follows:
> > > 
> > > 1. Source host writes out all dirty pages and inactivates drives.
> > > 2. QEMU_VM_EOF is sent on migration stream.
> > > 3. Destination host invalidates caches before accessing drives.
> > > 
> > > This patch enables live migration even with -drive cache.direct=off.
> > > 
> > > * Terms and conditions may apply, please see patch for details.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > >  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 39 insertions(+)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 3794c0007a..df4f52919f 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> > >      return ret | BDRV_BLOCK_OFFSET_VALID;
> > >  }
> > >  
> > > +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> > > +                                                 Error **errp)
> > > +{
> > > +    BDRVRawState *s = bs->opaque;
> > > +    int ret;
> > > +
> > > +    ret = fd_open(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> > > +        return;
> > > +    }
> > > +
> > > +    if (s->open_flags & O_DIRECT) {
> > > +        return; /* No host kernel page cache */
> > > +    }
> > > +
> > > +#if defined(__linux__)
> > > +    /* This sets the scene for the next syscall... */
> > > +    ret = bdrv_co_flush(bs);
> > > +    if (ret < 0) {
> > > +        error_setg_errno(errp, -ret, "flush failed");
> > > +        return;
> > > +    }
> > > +
> > > +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> > > +     * process.  These limitations are okay because we just fsynced the file,
> > > +     * we don't use mmap, and the file should not be in use by other processes.
> > > +     */
> > > +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
> > 
> > What happens if I try a migrate between two qemu's on the same host?
> > (Which I, and avocado, both use for testing; I think think users
> > occasionally do for QEMU updates).
> 
> The steps quoted from the commit description:
> 
>   1. Source host writes out all dirty pages and inactivates drives.
>   2. QEMU_VM_EOF is sent on migration stream.
>   3. Destination host invalidates caches before accessing drives.
> 
> When we reach Step 3 the source QEMU is not doing I/O (no pages are
> locked).  The destination QEMU does bdrv_co_flush() so even if pages are
> still dirty (that shouldn't happen since the source already drained and
> flushed) they will be written out and pages will be clean.  Therefore
> fadvise really invalidates all resident pages.
> 
> FWIW when writing this patch I tested with both QEMUs on the same host.

Which is actually unnecessary overhead on localhost because the local
kernel page cache can't be incoherent with itself. But I don't think
it's a real problem either.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-20  3:05   ` Stefan Hajnoczi
@ 2018-04-20 13:53     ` Eric Blake
  2018-04-24 13:43       ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Blake @ 2018-04-20 13:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz, Vladimir Sementsov-Ogievskiy

[-- Attachment #1: Type: text/plain, Size: 2247 bytes --]

On 04/19/2018 10:05 PM, Stefan Hajnoczi wrote:

>>> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
>>> on Linux so that shared storage live migration works.  I have sent it as an RFC
>>> because cache consistency is not binary, there are corner cases which I've
>>> described in the actual patch, and this may require more discussion.
>>
>> Interesting, in that the NBD list is also discussing the possible
>> standardization of a NBD_CMD_CACHE command (based on existing practice
>> in the xNBD implementation), and covering whether that MIGHT be worth
>> doing as a thin wrapper that corresponds to posix_fadvise() semantics.
>> Thus, if NBD_CMD_CACHE learns flags, we could support
>> .bdrv_co_invalidate_cache() through the NBD protocol driver, in addition
>> to the POSIX file driver.  Obviously, your usage invalidates the cache
>> of the entire file; but does it also make sense to expose a start/length
>> subset invalidation, for better exposure to posix_fadvise() semantics?
> 
> bdrv_co_invalidate_cache() is currently only used by migration before
> using a file that may have been touched by the other host.  I don't
> think start/length will be needed for that use case.
> 
> Can you describe how will NBD use cache invalidation?  Maybe this will
> help me understand other use cases.

That's where things are still under discussion - no one has yet provided
a case that would benefit from a POSIX_FADV_DONTNEED over just a range
of the file [1]; on the other hand, it might make sense that if you know
an implementation has a limited cache, then having control over the
various posix_fadvise() flags over various ranges of the files may lead
to more optimum behavior.  And posix_fadvise() does have the ability to
work over the entire file (offset 0 length 0) or a subrange (any offset
and nonzero length).  So I'm also fine if .bdrv_co_invalidate_cache()
doesn't expose offset/length parameters, particularly if NBD can't come
up with an actual use case that would benefit.

[1] https://lists.debian.org/nbd/2018/04/msg00020.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration
  2018-04-20 13:53     ` Eric Blake
@ 2018-04-24 13:43       ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-24 13:43 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Sergio Lopez, qemu-block,
	Dr. David Alan Gilbert, Max Reitz, Vladimir Sementsov-Ogievskiy

[-- Attachment #1: Type: text/plain, Size: 2437 bytes --]

On Fri, Apr 20, 2018 at 08:53:33AM -0500, Eric Blake wrote:
> On 04/19/2018 10:05 PM, Stefan Hajnoczi wrote:
> 
> >>> This patch series implements .bdrv_co_invalidate_cache() for block/file-posix.c
> >>> on Linux so that shared storage live migration works.  I have sent it as an RFC
> >>> because cache consistency is not binary, there are corner cases which I've
> >>> described in the actual patch, and this may require more discussion.
> >>
> >> Interesting, in that the NBD list is also discussing the possible
> >> standardization of a NBD_CMD_CACHE command (based on existing practice
> >> in the xNBD implementation), and covering whether that MIGHT be worth
> >> doing as a thin wrapper that corresponds to posix_fadvise() semantics.
> >> Thus, if NBD_CMD_CACHE learns flags, we could support
> >> .bdrv_co_invalidate_cache() through the NBD protocol driver, in addition
> >> to the POSIX file driver.  Obviously, your usage invalidates the cache
> >> of the entire file; but does it also make sense to expose a start/length
> >> subset invalidation, for better exposure to posix_fadvise() semantics?
> > 
> > bdrv_co_invalidate_cache() is currently only used by migration before
> > using a file that may have been touched by the other host.  I don't
> > think start/length will be needed for that use case.
> > 
> > Can you describe how will NBD use cache invalidation?  Maybe this will
> > help me understand other use cases.
> 
> That's where things are still under discussion - no one has yet provided
> a case that would benefit from a POSIX_FADV_DONTNEED over just a range
> of the file [1]; on the other hand, it might make sense that if you know
> an implementation has a limited cache, then having control over the
> various posix_fadvise() flags over various ranges of the files may lead
> to more optimum behavior.  And posix_fadvise() does have the ability to
> work over the entire file (offset 0 length 0) or a subrange (any offset
> and nonzero length).  So I'm also fine if .bdrv_co_invalidate_cache()
> doesn't expose offset/length parameters, particularly if NBD can't come
> up with an actual use case that would benefit.
> 
> [1] https://lists.debian.org/nbd/2018/04/msg00020.html

Okay.  .bdrv_co_invalidate_cache() is an internal interface.  We can
change it later to support NBD functionality, if necessary.  Let's keep
it without start/length for now.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-20  6:25       ` Kevin Wolf
@ 2018-04-24 14:04         ` Stefan Hajnoczi
  2018-04-24 14:29           ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-24 14:04 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Dr. David Alan Gilbert, qemu-devel, Max Reitz, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2555 bytes --]

On Fri, Apr 20, 2018 at 08:25:13AM +0200, Kevin Wolf wrote:
> Am 20.04.2018 um 05:02 hat Stefan Hajnoczi geschrieben:
> > On Thu, Apr 19, 2018 at 10:05:47AM +0100, Dr. David Alan Gilbert wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > This commit is for debugging only.  Do not merge it.
> > > > 
> > > > mincore(2) checks whether pages are resident.  Use it to verify that
> > > > page cache has been dropped.
> > > > 
> > > > You can trigger a verification failure by mmapping the image file from
> > > > another process and loading a byte from a page so that it becomes
> > > > resident.  bdrv_co_invalidate_cache() will fail while the process is
> > > > alive.
> > > 
> > > It doesn't seem a bad diagnostic to keep in (with a switch to activate)
> > > for when we're faced with some weird corruption on some weird storage
> > > system.
> > 
> > Okay.  It's very slow to mmap an entire image file and query mincore(2)
> > so it needs to be off by default.
> 
> Also, having it enabled breaks localhost migration at least on tmpfs
> (which was what I tried out first).
> 
> I wonder if the kernel would add some way to query whether the "advice"
> was actually acted upon if we asked. Either with a new function that
> returns an error if not everything is dropped (basically
> .bdrv_invalidate_cache on the kernel level), or a function that just
> queries if any page is allocated (or maybe the address of the first
> allocated page in a given range) without having to use mincore() and
> iterating over all the pages in userspace.

I'm trying to figure out how to expose the optional mincore check on the
command-line/QMP:

1. Add a check_consistency bool argument to bdrv_invalidate_cache*().
   Add command-line/QMP option to -incoming and migrate_incoming.  This
   is messy and won't be easy to access for libvirt users.

2. Add a BlockdevOptionsFile *check-cache-consistency bool field.  This
   is specified at .bdrv_open() time.  It can be changed at runtime with
   .bdrv_reopen*().

3. Add a 'blockdev-check-cache-consistency' QMP command that calls a new
   .bdrv_check_cache_consistency() callback that is implemented by
   file-posix.c.  The problem is users might issue this command after
   I/O has resumed and pages have become resident again.  It only makes
   sense if the guest is still paused.  Probably a bad interface...

Have I missed a good way to expose this optional check functionality?

Which approach do you prefer?  I'm leaning towards #2.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-24 14:04         ` Stefan Hajnoczi
@ 2018-04-24 14:29           ` Kevin Wolf
  2018-04-27 10:06             ` Stefan Hajnoczi
  0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2018-04-24 14:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Dr. David Alan Gilbert, qemu-devel, Max Reitz, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2925 bytes --]

Am 24.04.2018 um 16:04 hat Stefan Hajnoczi geschrieben:
> On Fri, Apr 20, 2018 at 08:25:13AM +0200, Kevin Wolf wrote:
> > Am 20.04.2018 um 05:02 hat Stefan Hajnoczi geschrieben:
> > > On Thu, Apr 19, 2018 at 10:05:47AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > This commit is for debugging only.  Do not merge it.
> > > > > 
> > > > > mincore(2) checks whether pages are resident.  Use it to verify that
> > > > > page cache has been dropped.
> > > > > 
> > > > > You can trigger a verification failure by mmapping the image file from
> > > > > another process and loading a byte from a page so that it becomes
> > > > > resident.  bdrv_co_invalidate_cache() will fail while the process is
> > > > > alive.
> > > > 
> > > > It doesn't seem a bad diagnostic to keep in (with a switch to activate)
> > > > for when we're faced with some weird corruption on some weird storage
> > > > system.
> > > 
> > > Okay.  It's very slow to mmap an entire image file and query mincore(2)
> > > so it needs to be off by default.
> > 
> > Also, having it enabled breaks localhost migration at least on tmpfs
> > (which was what I tried out first).
> > 
> > I wonder if the kernel would add some way to query whether the "advice"
> > was actually acted upon if we asked. Either with a new function that
> > returns an error if not everything is dropped (basically
> > .bdrv_invalidate_cache on the kernel level), or a function that just
> > queries if any page is allocated (or maybe the address of the first
> > allocated page in a given range) without having to use mincore() and
> > iterating over all the pages in userspace.
> 
> I'm trying to figure out how to expose the optional mincore check on the
> command-line/QMP:
> 
> 1. Add a check_consistency bool argument to bdrv_invalidate_cache*().
>    Add command-line/QMP option to -incoming and migrate_incoming.  This
>    is messy and won't be easy to access for libvirt users.
> 
> 2. Add a BlockdevOptionsFile *check-cache-consistency bool field.  This
>    is specified at .bdrv_open() time.  It can be changed at runtime with
>    .bdrv_reopen*().
> 
> 3. Add a 'blockdev-check-cache-consistency' QMP command that calls a new
>    .bdrv_check_cache_consistency() callback that is implemented by
>    file-posix.c.  The problem is users might issue this command after
>    I/O has resumed and pages have become resident again.  It only makes
>    sense if the guest is still paused.  Probably a bad interface...
> 
> Have I missed a good way to expose this optional check functionality?
> 
> Which approach do you prefer?  I'm leaning towards #2.

Yes, I think that makes the most sense.

In its current form, this can probably only be a debugging feature,
though, so maybe x-check-cache-consistency? I also don't think libvirt
should mess with this.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used
  2018-04-24 14:29           ` Kevin Wolf
@ 2018-04-27 10:06             ` Stefan Hajnoczi
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2018-04-27 10:06 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Dr. David Alan Gilbert, qemu-devel, Max Reitz, Sergio Lopez, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3137 bytes --]

On Tue, Apr 24, 2018 at 04:29:15PM +0200, Kevin Wolf wrote:
> Am 24.04.2018 um 16:04 hat Stefan Hajnoczi geschrieben:
> > On Fri, Apr 20, 2018 at 08:25:13AM +0200, Kevin Wolf wrote:
> > > Am 20.04.2018 um 05:02 hat Stefan Hajnoczi geschrieben:
> > > > On Thu, Apr 19, 2018 at 10:05:47AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > > > > > This commit is for debugging only.  Do not merge it.
> > > > > > 
> > > > > > mincore(2) checks whether pages are resident.  Use it to verify that
> > > > > > page cache has been dropped.
> > > > > > 
> > > > > > You can trigger a verification failure by mmapping the image file from
> > > > > > another process and loading a byte from a page so that it becomes
> > > > > > resident.  bdrv_co_invalidate_cache() will fail while the process is
> > > > > > alive.
> > > > > 
> > > > > It doesn't seem a bad diagnostic to keep in (with a switch to activate)
> > > > > for when we're faced with some weird corruption on some weird storage
> > > > > system.
> > > > 
> > > > Okay.  It's very slow to mmap an entire image file and query mincore(2)
> > > > so it needs to be off by default.
> > > 
> > > Also, having it enabled breaks localhost migration at least on tmpfs
> > > (which was what I tried out first).
> > > 
> > > I wonder if the kernel would add some way to query whether the "advice"
> > > was actually acted upon if we asked. Either with a new function that
> > > returns an error if not everything is dropped (basically
> > > .bdrv_invalidate_cache on the kernel level), or a function that just
> > > queries if any page is allocated (or maybe the address of the first
> > > allocated page in a given range) without having to use mincore() and
> > > iterating over all the pages in userspace.
> > 
> > I'm trying to figure out how to expose the optional mincore check on the
> > command-line/QMP:
> > 
> > 1. Add a check_consistency bool argument to bdrv_invalidate_cache*().
> >    Add command-line/QMP option to -incoming and migrate_incoming.  This
> >    is messy and won't be easy to access for libvirt users.
> > 
> > 2. Add a BlockdevOptionsFile *check-cache-consistency bool field.  This
> >    is specified at .bdrv_open() time.  It can be changed at runtime with
> >    .bdrv_reopen*().
> > 
> > 3. Add a 'blockdev-check-cache-consistency' QMP command that calls a new
> >    .bdrv_check_cache_consistency() callback that is implemented by
> >    file-posix.c.  The problem is users might issue this command after
> >    I/O has resumed and pages have become resident again.  It only makes
> >    sense if the guest is still paused.  Probably a bad interface...
> > 
> > Have I missed a good way to expose this optional check functionality?
> > 
> > Which approach do you prefer?  I'm leaning towards #2.
> 
> Yes, I think that makes the most sense.
> 
> In its current form, this can probably only be a debugging feature,
> though, so maybe x-check-cache-consistency? I also don't think libvirt
> should mess with this.

Cool, I will implement this.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-04-27 12:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-19  7:52 [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Stefan Hajnoczi
2018-04-19  7:52 ` [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux Stefan Hajnoczi
2018-04-19  8:13   ` Fam Zheng
2018-04-20  3:15     ` Stefan Hajnoczi
2018-04-20  3:36       ` Fam Zheng
2018-04-20  6:13       ` Kevin Wolf
2018-04-19  9:18   ` Dr. David Alan Gilbert
2018-04-20  3:21     ` Stefan Hajnoczi
2018-04-20  6:27       ` Kevin Wolf
2018-04-19  7:52 ` [Qemu-devel] [RFC 2/2] block/file-posix: verify page cache is not used Stefan Hajnoczi
2018-04-19  9:05   ` Dr. David Alan Gilbert
2018-04-20  3:02     ` Stefan Hajnoczi
2018-04-20  6:25       ` Kevin Wolf
2018-04-24 14:04         ` Stefan Hajnoczi
2018-04-24 14:29           ` Kevin Wolf
2018-04-27 10:06             ` Stefan Hajnoczi
2018-04-19 16:09 ` [Qemu-devel] [RFC 0/2] block/file-posix: allow -drive cache.direct=off live migration Eric Blake
2018-04-20  3:05   ` Stefan Hajnoczi
2018-04-20 13:53     ` Eric Blake
2018-04-24 13:43       ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.