From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>, Sergio Lopez <slp@redhat.com>,
	qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [RFC 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
Date: Thu, 19 Apr 2018 10:18:33 +0100
Message-ID: <20180419091832.GB2730@work-vm>
In-Reply-To: <20180419075232.31407-2-stefanha@redhat.com>

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*.  Use
> this to drop page cache on the destination host during shared storage
> migration.  This way the destination host will read the latest copy of
> the data and will not use stale data from the page cache.
> 
> The flow is as follows:
> 
> 1. Source host writes out all dirty pages and inactivates drives.
> 2. QEMU_VM_EOF is sent on migration stream.
> 3. Destination host invalidates caches before accessing drives.
> 
> This patch enables live migration even with -drive cache.direct=off.
> 
> * Terms and conditions may apply, please see patch for details.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/file-posix.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 3794c0007a..df4f52919f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2236,6 +2236,42 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      return ret | BDRV_BLOCK_OFFSET_VALID;
>  }
>  
> +static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
> +                                                 Error **errp)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    int ret;
> +
> +    ret = fd_open(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "The file descriptor is not open");
> +        return;
> +    }
> +
> +    if (s->open_flags & O_DIRECT) {
> +        return; /* No host kernel page cache */
> +    }
> +
> +#if defined(__linux__)
> +    /* This sets the scene for the next syscall... */
> +    ret = bdrv_co_flush(bs);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "flush failed");
> +        return;
> +    }
> +
> +    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
> +     * process.  These limitations are okay because we just fsynced the file,
> +     * we don't use mmap, and the file should not be in use by other processes.
> +     */
> +    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);

What happens if I try to migrate between two QEMUs on the same host?
(Which I, and avocado, both use for testing; I think users
occasionally do it for QEMU updates too.)
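
For anyone who wants to poke at the same-host case outside QEMU, here is a
minimal standalone sketch. It is not QEMU code: drop_page_cache() and
same_host_demo() are hypothetical names, and two descriptors on one
temporary file merely stand in for the source and destination processes.
It assumes Linux and mirrors the flush-then-fadvise order from the patch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not in QEMU): flush, then ask the kernel to drop
 * cached pages for the whole file.  Returns 0 or a negative errno. */
static int drop_page_cache(int fd)
{
    int ret;

    assert(fd >= 0);

    /* Dirty pages are not invalidated, so write them out first. */
    if (fsync(fd) < 0) {
        return -errno;
    }

    /* offset 0 with len 0 covers the file up to its end.  The call
     * returns a positive errno on failure and does not set errno. */
    ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return ret == 0 ? 0 : -ret;
}

/* Hypothetical demo: "src" and "dst" play source and destination on the
 * same host.  Returns 0 if dst reads back what src wrote. */
static int same_host_demo(void)
{
    char path[] = "/tmp/fadvise-demo-XXXXXX";
    const char msg[] = "guest data";
    char buf[sizeof(msg)] = { 0 };
    int src, dst, ret;

    src = mkstemp(path);
    if (src < 0) {
        return -errno;
    }
    dst = open(path, O_RDWR);
    if (dst < 0) {
        ret = -errno;
        close(src);
        unlink(path);
        return ret;
    }

    ret = -EIO;
    /* Step 1 of the flow: the source writes out its data. */
    if (write(src, msg, sizeof(msg)) == (ssize_t)sizeof(msg) &&
        fsync(src) == 0) {
        /* Step 3 of the flow: the destination invalidates before reading. */
        ret = drop_page_cache(dst);
        if (ret == 0 &&
            (pread(dst, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf) ||
             memcmp(buf, msg, sizeof(msg)) != 0)) {
            ret = -EIO;
        }
    }

    close(dst);
    close(src);
    unlink(path);
    return ret;
}
```

Calling same_host_demo() from a main() and checking for 0 is enough to try
it. It only checks that the read after the fadvise call returns the
written data; it says nothing about whether pages were really evicted.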

Dave

> +    if (ret != 0) { /* the return value is a positive errno */
> +        error_setg_errno(errp, ret, "fadvise failed");
> +        return;
> +    }
> +#endif /* __linux__ */
> +}
> +
>  static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
>      int64_t offset, int bytes,
>      BlockCompletionFunc *cb, void *opaque)
> @@ -2328,6 +2364,7 @@ BlockDriver bdrv_file = {
>      .bdrv_co_create_opts = raw_co_create_opts,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
>      .bdrv_co_block_status = raw_co_block_status,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2805,6 +2842,7 @@ static BlockDriver bdrv_host_device = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>      .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> @@ -2927,6 +2965,7 @@ static BlockDriver bdrv_host_cdrom = {
>      .bdrv_reopen_abort   = raw_reopen_abort,
>      .bdrv_co_create_opts = hdev_co_create_opts,
>      .create_opts         = &raw_create_opts,
> +    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>  
>  
>      .bdrv_co_preadv         = raw_co_preadv,
> -- 
> 2.14.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
