* [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
@ 2014-10-02  8:52 Alexey Kardashevskiy
  2014-10-02  9:45 ` Paolo Bonzini
  2014-10-02 14:52 ` Stefan Hajnoczi
  0 siblings, 2 replies; 7+ messages in thread
From: Alexey Kardashevskiy @ 2014-10-02  8:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, Kevin Wolf, Stefan Hajnoczi, Paolo Bonzini

When a guest is migrated using libvirt with "--copy-storage-all", at the end of
migration there is a race between the NBD mirroring task trying to do a flush
and migration completion; both end up invalidating the cache. Since the qcow2
driver does not handle this situation very well, random crashes happen.

This disables the BDRV_O_INCOMING flag for the block device being migrated
and restores it when the NBD task is done.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---


The commit log is incomplete and most likely incorrect, as is
the patch :) Please help. Thanks!

The patch seems to fix the initial problem though.


By the way, is there an easy way to migrate one QEMU to another
using NBD (i.e. not using "migrate -b") and not using libvirt?
What would the command line be? Debugging with libvirt is a real
pain :(


---
 block.c     | 17 ++++-------------
 migration.c |  1 -
 nbd.c       | 11 +++++++++++
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index c5a251c..ed72e0a 100644
--- a/block.c
+++ b/block.c
@@ -5073,6 +5073,10 @@ void bdrv_invalidate_cache_all(Error **errp)
     QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
+        if (!(bs->open_flags & BDRV_O_INCOMING)) {
+            continue;
+        }
+
         aio_context_acquire(aio_context);
         bdrv_invalidate_cache(bs, &local_err);
         aio_context_release(aio_context);
@@ -5083,19 +5087,6 @@ void bdrv_invalidate_cache_all(Error **errp)
     }
 }
 
-void bdrv_clear_incoming_migration_all(void)
-{
-    BlockDriverState *bs;
-
-    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
-        aio_context_release(aio_context);
-    }
-}
-
 int bdrv_flush(BlockDriverState *bs)
 {
     Coroutine *co;
diff --git a/migration.c b/migration.c
index 8d675b3..c49a05a 100644
--- a/migration.c
+++ b/migration.c
@@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque)
     }
     qemu_announce_self();
 
-    bdrv_clear_incoming_migration_all();
     /* Make sure all file formats flush their mutable metadata */
     bdrv_invalidate_cache_all(&local_err);
     if (local_err) {
diff --git a/nbd.c b/nbd.c
index e9b539b..7b479c0 100644
--- a/nbd.c
+++ b/nbd.c
@@ -106,6 +106,7 @@ struct NBDExport {
     off_t dev_offset;
     off_t size;
     uint32_t nbdflags;
+    bool restore_incoming;
     QTAILQ_HEAD(, NBDClient) clients;
     QTAILQ_ENTRY(NBDExport) next;
 
@@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
     exp->ctx = bdrv_get_aio_context(bs);
     bdrv_ref(bs);
     bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
+
+    if (bs->open_flags & BDRV_O_INCOMING) {
+        bdrv_invalidate_cache(bs, NULL);
+        exp->restore_incoming = !!(bs->open_flags & BDRV_O_INCOMING);
+        bs->open_flags &= ~(BDRV_O_INCOMING);
+    }
+
     return exp;
 }
 
@@ -1021,6 +1029,9 @@ void nbd_export_close(NBDExport *exp)
     if (exp->bs) {
         bdrv_remove_aio_context_notifier(exp->bs, bs_aio_attached,
                                          bs_aio_detach, exp);
+        if (exp->restore_incoming) {
+            exp->bs->open_flags |= BDRV_O_INCOMING;
+        }
         bdrv_unref(exp->bs);
         exp->bs = NULL;
     }
-- 
2.0.0


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-02  8:52 [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration Alexey Kardashevskiy
@ 2014-10-02  9:45 ` Paolo Bonzini
  2014-10-02 10:19   ` Alexey Kardashevskiy
  2014-10-02 14:52 ` Stefan Hajnoczi
  1 sibling, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2014-10-02  9:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

On 02/10/2014 10:52, Alexey Kardashevskiy wrote:
> When migrated using libvirt with "--copy-storage-all", at the end of
> migration there is race between NBD mirroring task trying to do flush
> and migration completion, both end up invalidating cache. Since qcow2
> driver does not handle this situation very well, random crashes happen.
> 
> This disables the BDRV_O_INCOMING flag for the block device being migrated
> and restores it when NBD task is done.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> 
> The commit log is not full and most likely incorrect as well
> as the patch :) Please, help. Thanks!
> 
> The patch seems to fix the initial problem though.
> 
> 
> btw is there any easy way to migrate one QEMU to another
> using NBD (i.e. not using "migrate -b") and not using libvirt?
> What would the command line be? Debugging with libvirt is real
> pain :(
> 
> 
> ---
>  block.c     | 17 ++++-------------
>  migration.c |  1 -
>  nbd.c       | 11 +++++++++++
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c5a251c..ed72e0a 100644
> --- a/block.c
> +++ b/block.c
> @@ -5073,6 +5073,10 @@ void bdrv_invalidate_cache_all(Error **errp)
>      QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
>  
> +        if (!(bs->open_flags & BDRV_O_INCOMING)) {
> +            continue;
> +        }
> +
>          aio_context_acquire(aio_context);
>          bdrv_invalidate_cache(bs, &local_err);
>          aio_context_release(aio_context);

This part is okay, though perhaps we should add it to
bdrv_invalidate_cache instead?

> @@ -5083,19 +5087,6 @@ void bdrv_invalidate_cache_all(Error **errp)
>      }
>  }
>  
> -void bdrv_clear_incoming_migration_all(void)
> -{
> -    BlockDriverState *bs;
> -
> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> -
> -        aio_context_acquire(aio_context);
> -        bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
> -        aio_context_release(aio_context);
> -    }
> -}
> -
>  int bdrv_flush(BlockDriverState *bs)
>  {
>      Coroutine *co;
> diff --git a/migration.c b/migration.c
> index 8d675b3..c49a05a 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque)
>      }
>      qemu_announce_self();
>  
> -    bdrv_clear_incoming_migration_all();
>      /* Make sure all file formats flush their mutable metadata */
>      bdrv_invalidate_cache_all(&local_err);
>      if (local_err) {

This part I don't understand.

Shouldn't you at least be adding

	bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);

to bdrv_invalidate_cache?

> diff --git a/nbd.c b/nbd.c
> index e9b539b..7b479c0 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -106,6 +106,7 @@ struct NBDExport {
>      off_t dev_offset;
>      off_t size;
>      uint32_t nbdflags;
> +    bool restore_incoming;
>      QTAILQ_HEAD(, NBDClient) clients;
>      QTAILQ_ENTRY(NBDExport) next;
>  
> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
>      exp->ctx = bdrv_get_aio_context(bs);
>      bdrv_ref(bs);
>      bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
> +
> +    if (bs->open_flags & BDRV_O_INCOMING) {
> +        bdrv_invalidate_cache(bs, NULL);
> +        exp->restore_incoming = !!(bs->open_flags & BDRV_O_INCOMING);
> +        bs->open_flags &= ~(BDRV_O_INCOMING);
> +    }
> +
>      return exp;
>  }
>  
> @@ -1021,6 +1029,9 @@ void nbd_export_close(NBDExport *exp)
>      if (exp->bs) {
>          bdrv_remove_aio_context_notifier(exp->bs, bs_aio_attached,
>                                           bs_aio_detach, exp);
> +        if (exp->restore_incoming) {
> +            exp->bs->open_flags |= BDRV_O_INCOMING;
> +        }
>          bdrv_unref(exp->bs);
>          exp->bs = NULL;
>      }
> 

For this, I don't think you even need exp->restore_incoming, and then it
can simply be a one-liner

+	bdrv_invalidate_cache(bs, NULL);

if you modify bdrv_invalidate_cache instead of bdrv_invalidate_cache_all.

Paolo


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-02  9:45 ` Paolo Bonzini
@ 2014-10-02 10:19   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 7+ messages in thread
From: Alexey Kardashevskiy @ 2014-10-02 10:19 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

On 10/02/2014 07:45 PM, Paolo Bonzini wrote:
> On 02/10/2014 10:52, Alexey Kardashevskiy wrote:
>> When migrated using libvirt with "--copy-storage-all", at the end
>> of migration there is race between NBD mirroring task trying to do
>> flush and migration completion, both end up invalidating cache.
>> Since qcow2 driver does not handle this situation very well, random
>> crashes happen.
>> 
>> This disables the BDRV_O_INCOMING flag for the block device being
>> migrated and restores it when NBD task is done.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> ---
>> 
>> 
>> The commit log is not full and most likely incorrect as well as the
>> patch :) Please, help. Thanks!
>> 
>> The patch seems to fix the initial problem though.
>> 
>> 
>> btw is there any easy way to migrate one QEMU to another using NBD
>> (i.e. not using "migrate -b") and not using libvirt? What would the
>> command line be? Debugging with libvirt is real pain :(
>> 
>> 
>> --- block.c     | 17 ++++------------- migration.c |  1 - nbd.c
>> | 11 +++++++++++ 3 files changed, 15 insertions(+), 14 deletions(-)
>> 
>> diff --git a/block.c b/block.c index c5a251c..ed72e0a 100644 ---
>> a/block.c +++ b/block.c @@ -5073,6 +5073,10 @@ void
>> bdrv_invalidate_cache_all(Error **errp) QTAILQ_FOREACH(bs,
>> &bdrv_states, device_list) { AioContext *aio_context =
>> bdrv_get_aio_context(bs);
>> 
>> +        if (!(bs->open_flags & BDRV_O_INCOMING)) { +
>> continue; +        } + aio_context_acquire(aio_context); 
>> bdrv_invalidate_cache(bs, &local_err); 
>> aio_context_release(aio_context);
> 
> This part is okay, though perhaps we should add it to 
> bdrv_invalidate_cache instead?

Yes, makes perfect sense.


> 
>> @@ -5083,19 +5087,6 @@ void bdrv_invalidate_cache_all(Error **errp) 
>> } }
>> 
>> -void bdrv_clear_incoming_migration_all(void) -{ -
>> BlockDriverState *bs; - -    QTAILQ_FOREACH(bs, &bdrv_states,
>> device_list) { -        AioContext *aio_context =
>> bdrv_get_aio_context(bs); - -
>> aio_context_acquire(aio_context); -        bs->open_flags =
>> bs->open_flags & ~(BDRV_O_INCOMING); -
>> aio_context_release(aio_context); -    } -} - int
>> bdrv_flush(BlockDriverState *bs) { Coroutine *co; diff --git
>> a/migration.c b/migration.c index 8d675b3..c49a05a 100644 ---
>> a/migration.c +++ b/migration.c @@ -103,7 +103,6 @@ static void
>> process_incoming_migration_co(void *opaque) } qemu_announce_self();
>> 
>> -    bdrv_clear_incoming_migration_all(); /* Make sure all file
>> formats flush their mutable metadata */ 
>> bdrv_invalidate_cache_all(&local_err); if (local_err) {
> 
> This part I don't understand.
> 
> Shouldn't you at least be adding
> 
> bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
> 
> to bdrv_invalidate_cache?


Reset the flag after the cache has been invalidated?
What are the exact semantics of this BDRV_O_INCOMING?

blockdev_init() sets it, we reset it on the first bdrv_invalidate_cache()
and then we never set it again? I am still missing the bigger picture...


>> diff --git a/nbd.c b/nbd.c index e9b539b..7b479c0 100644 ---
>> a/nbd.c +++ b/nbd.c @@ -106,6 +106,7 @@ struct NBDExport { off_t
>> dev_offset; off_t size; uint32_t nbdflags; +    bool
>> restore_incoming; QTAILQ_HEAD(, NBDClient) clients; 
>> QTAILQ_ENTRY(NBDExport) next;
>> 
>> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
>> off_t dev_offset, exp->ctx = bdrv_get_aio_context(bs); 
>> bdrv_ref(bs); bdrv_add_aio_context_notifier(bs, bs_aio_attached,
>> bs_aio_detach, exp); + +    if (bs->open_flags & BDRV_O_INCOMING) { 
>> +        bdrv_invalidate_cache(bs, NULL); +
>> exp->restore_incoming = !!(bs->open_flags & BDRV_O_INCOMING); +
>> bs->open_flags &= ~(BDRV_O_INCOMING); +    } + return exp; }
>> 
>> @@ -1021,6 +1029,9 @@ void nbd_export_close(NBDExport *exp) if
>> (exp->bs) { bdrv_remove_aio_context_notifier(exp->bs,
>> bs_aio_attached, bs_aio_detach, exp); +        if
>> (exp->restore_incoming) { +            exp->bs->open_flags |=
>> BDRV_O_INCOMING; +        } bdrv_unref(exp->bs); exp->bs = NULL; }
>> 
> 
> For this, I don't think you even need exp->restore_incoming, and then
> it can simply be a one-liner
> 
> +	bdrv_invalidate_cache(bs, NULL);
> 
> if you modify bdrv_invalidate_cache instead of
> bdrv_invalidate_cache_all.


I did not understand that modification, but if I do not restore
BDRV_O_INCOMING, then changes made to the disk on the source side
before migration are lost after rebooting the destination guest.



-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-02  8:52 [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration Alexey Kardashevskiy
  2014-10-02  9:45 ` Paolo Bonzini
@ 2014-10-02 14:52 ` Stefan Hajnoczi
  2014-10-03  4:12   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2014-10-02 14:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi


On Thu, Oct 02, 2014 at 06:52:52PM +1000, Alexey Kardashevskiy wrote:
> When migrated using libvirt with "--copy-storage-all", at the end of
> migration there is race between NBD mirroring task trying to do flush
> and migration completion, both end up invalidating cache. Since qcow2
> driver does not handle this situation very well, random crashes happen.
> 
> This disables the BDRV_O_INCOMING flag for the block device being migrated
> and restores it when NBD task is done.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> 
> The commit log is not full and most likely incorrect as well
> as the patch :) Please, help. Thanks!
> 
> The patch seems to fix the initial problem though.
> 
> 
> btw is there any easy way to migrate one QEMU to another
> using NBD (i.e. not using "migrate -b") and not using libvirt?
> What would the command line be? Debugging with libvirt is real
> pain :(
> 
> 
> ---
>  block.c     | 17 ++++-------------
>  migration.c |  1 -
>  nbd.c       | 11 +++++++++++
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c5a251c..ed72e0a 100644
> --- a/block.c
> +++ b/block.c
> @@ -5073,6 +5073,10 @@ void bdrv_invalidate_cache_all(Error **errp)
>      QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>          AioContext *aio_context = bdrv_get_aio_context(bs);
>  
> +        if (!(bs->open_flags & BDRV_O_INCOMING)) {
> +            continue;
> +        }
> +

We shouldn't touch bs before acquiring the AioContext.  Acquiring the
AioContext is basically the "lock" for the BDS.

It needs to be moved...

>          aio_context_acquire(aio_context);

...in here.

>          bdrv_invalidate_cache(bs, &local_err);
>          aio_context_release(aio_context);
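In other words, the flag check belongs inside the acquire/release pair. Outside QEMU the pattern can be sketched with a plain mutex standing in for the AioContext (a toy model only: `TOY_O_INCOMING` and `maybe_invalidate()` are invented names, not QEMU APIs):

```c
#include <pthread.h>

#define TOY_O_INCOMING 0x0800   /* stand-in for BDRV_O_INCOMING */

/* Stand-in for the per-BDS AioContext: holding it is the "lock". */
static pthread_mutex_t aio_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if this BDS would have its cache invalidated, 0 if the
 * loop skips it.  The flags word is only inspected under the lock. */
int maybe_invalidate(const unsigned *open_flags)
{
    int do_it;

    pthread_mutex_lock(&aio_lock);               /* acquire first... */
    do_it = (*open_flags & TOY_O_INCOMING) != 0; /* ...then read BDS state */
    pthread_mutex_unlock(&aio_lock);
    return do_it;
}
```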
> @@ -5083,19 +5087,6 @@ void bdrv_invalidate_cache_all(Error **errp)
>      }
>  }
>  
> -void bdrv_clear_incoming_migration_all(void)
> -{
> -    BlockDriverState *bs;
> -
> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -        AioContext *aio_context = bdrv_get_aio_context(bs);
> -
> -        aio_context_acquire(aio_context);
> -        bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
> -        aio_context_release(aio_context);
> -    }
> -}
> -
>  int bdrv_flush(BlockDriverState *bs)
>  {
>      Coroutine *co;
> diff --git a/migration.c b/migration.c
> index 8d675b3..c49a05a 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque)
>      }
>      qemu_announce_self();
>  
> -    bdrv_clear_incoming_migration_all();
>      /* Make sure all file formats flush their mutable metadata */
>      bdrv_invalidate_cache_all(&local_err);

BDRV_O_INCOMING needs to be cleared, otherwise the block drivers will
think that incoming migration is still taking place and treat the file
as effectively read-only during open.

On IRC I suggested doing away with the bdrv_invalidate_cache_all() name
since it no longer calls bdrv_invalidate_cache() on all BDSes.

Combine clearing BDRV_O_INCOMING and calling bdrv_invalidate_cache()
(when BDRV_O_INCOMING was previously set) into one function - you could
reuse bdrv_clear_incoming_migration_all() for that.
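The combined helper boils down to a check-and-clear on the flags word; a minimal standalone sketch (invented names - QEMU's real function would of course also call the driver's invalidate hook):

```c
#include <stdbool.h>

#define TOY_O_INCOMING 0x0800   /* stand-in for BDRV_O_INCOMING */

/* Clear the incoming flag and tell the caller whether the cache
 * actually needs invalidating, in one step. */
bool clear_incoming_and_check(unsigned *open_flags)
{
    if (!(*open_flags & TOY_O_INCOMING)) {
        return false;               /* not in incoming state: nothing to do */
    }
    *open_flags &= ~TOY_O_INCOMING;
    return true;                    /* caller should now invalidate */
}
```

A second call is a no-op, which is exactly the idempotence the migration/NBD callers need.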

>      if (local_err) {
> diff --git a/nbd.c b/nbd.c
> index e9b539b..7b479c0 100644
> --- a/nbd.c
> +++ b/nbd.c
> @@ -106,6 +106,7 @@ struct NBDExport {
>      off_t dev_offset;
>      off_t size;
>      uint32_t nbdflags;
> +    bool restore_incoming;

Not needed; it does not make sense to restore BDRV_O_INCOMING because
once we've written to a file it cannot be in use by another host at the
same time.

>      QTAILQ_HEAD(, NBDClient) clients;
>      QTAILQ_ENTRY(NBDExport) next;
>  
> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
>      exp->ctx = bdrv_get_aio_context(bs);
>      bdrv_ref(bs);
>      bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
> +
> +    if (bs->open_flags & BDRV_O_INCOMING) {

I think the flag has to be cleared before calling
bdrv_invalidate_cache() because the .bdrv_open() function looks at the
flag.


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-02 14:52 ` Stefan Hajnoczi
@ 2014-10-03  4:12   ` Alexey Kardashevskiy
  2014-10-06 10:03     ` Stefan Hajnoczi
  0 siblings, 1 reply; 7+ messages in thread
From: Alexey Kardashevskiy @ 2014-10-03  4:12 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 10/03/2014 12:52 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 02, 2014 at 06:52:52PM +1000, Alexey Kardashevskiy wrote:
>> When migrated using libvirt with "--copy-storage-all", at the end
>> of migration there is race between NBD mirroring task trying to do
>> flush and migration completion, both end up invalidating cache.
>> Since qcow2 driver does not handle this situation very well, random
>> crashes happen.
>> 
>> This disables the BDRV_O_INCOMING flag for the block device being
>> migrated and restores it when NBD task is done.
>> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> ---
>> 
>> 
>> The commit log is not full and most likely incorrect as well as the
>> patch :) Please, help. Thanks!
>> 
>> The patch seems to fix the initial problem though.
>> 
>> 
>> btw is there any easy way to migrate one QEMU to another using NBD
>> (i.e. not using "migrate -b") and not using libvirt? What would the
>> command line be? Debugging with libvirt is real pain :(
>> 
>> 
>> --- block.c     | 17 ++++------------- migration.c |  1 - nbd.c
>> | 11 +++++++++++ 3 files changed, 15 insertions(+), 14 deletions(-)
>> 
>> diff --git a/block.c b/block.c index c5a251c..ed72e0a 100644 ---
>> a/block.c +++ b/block.c @@ -5073,6 +5073,10 @@ void
>> bdrv_invalidate_cache_all(Error **errp) QTAILQ_FOREACH(bs,
>> &bdrv_states, device_list) { AioContext *aio_context =
>> bdrv_get_aio_context(bs);
>> 
>> +        if (!(bs->open_flags & BDRV_O_INCOMING)) { +
>> continue; +        } +
> 
> We shouldn't touch bs before acquiring the AioContext.  Acquiring the 
> AioContext is basically the "lock" for the BDS.
> 
> It needs to be moved...
> 
>> aio_context_acquire(aio_context);
> 
> ...in here.
> 
>> bdrv_invalidate_cache(bs, &local_err); 
>> aio_context_release(aio_context); @@ -5083,19 +5087,6 @@ void
>> bdrv_invalidate_cache_all(Error **errp) } }
>> 
>> -void bdrv_clear_incoming_migration_all(void) -{ -
>> BlockDriverState *bs; - -    QTAILQ_FOREACH(bs, &bdrv_states,
>> device_list) { -        AioContext *aio_context =
>> bdrv_get_aio_context(bs); - -
>> aio_context_acquire(aio_context); -        bs->open_flags =
>> bs->open_flags & ~(BDRV_O_INCOMING); -
>> aio_context_release(aio_context); -    } -} - int
>> bdrv_flush(BlockDriverState *bs) { Coroutine *co; diff --git
>> a/migration.c b/migration.c index 8d675b3..c49a05a 100644 ---
>> a/migration.c +++ b/migration.c @@ -103,7 +103,6 @@ static void
>> process_incoming_migration_co(void *opaque) } qemu_announce_self();
>> 
>> -    bdrv_clear_incoming_migration_all(); /* Make sure all file
>> formats flush their mutable metadata */ 
>> bdrv_invalidate_cache_all(&local_err);
> 
> BDRV_O_INCOMING needs to be cleared, otherwise the block drivers will 
> think that incoming migration is still taking place and treat the
> file as effectively read-only during open.
> 
> On IRC I suggested doing away with the bdrv_invalidate_cache_all()
> name since it no longer calls bdrv_invalidate_cache() on all BDSes.
> 
> Combine both clearing BDRV_O_INCOMING and calling 
> bdrv_invalidate_cache() if BDRV_O_INCOMING was previously set into
> one function - you could reuse bdrv_clear_incoming_migration_all() for
> that.


I reread why bdrv_invalidate_cache() was added in the first place and
what BDRV_O_INCOMING is; every time I look at the code it appears in new
colors (not sure if that is a correct figure of speech :) ).

BDRV_O_INCOMING is only set when QEMU is about to receive a migration and
we do not want QEMU to check the file at opening time, as it likely contains
garbage. Is there any other use of BDRV_O_INCOMING? There must be some, as
bdrv_clear_incoming_migration_all() is called at the end of live
migration and I believe there must be a reason for it (the cache does not
migrate, does it?).

bdrv_invalidate_cache() flushes the cache as it could already be initialized
even while QEMU is receiving a migration; QEMU could have cached some of the
real disk data. Is that correct? I do not really understand why that would
happen when BDRV_O_INCOMING is set, but ok. I also thought that
bdrv_invalidate_cache() was called more often than its name suggests, not
only during migration; I was wrong here.

The patch below then should solve the initial problem.

bdrv_clear_incoming_migration_all() is unclear though...


diff --git a/block.c b/block.c
index c5a251c..6314af7 100644
--- a/block.c
+++ b/block.c
@@ -5048,6 +5048,11 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
         return;
     }
 
+    if (!(bs->open_flags & BDRV_O_INCOMING)) {
+        return;
+    }
+    bs->open_flags &= ~(BDRV_O_INCOMING);
+
     if (bs->drv->bdrv_invalidate_cache) {
         bs->drv->bdrv_invalidate_cache(bs, &local_err);
     } else if (bs->file) {
@@ -5083,19 +5088,6 @@ void bdrv_invalidate_cache_all(Error **errp)
     }
 }
 
-void bdrv_clear_incoming_migration_all(void)
-{
-    BlockDriverState *bs;
-
-    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-        AioContext *aio_context = bdrv_get_aio_context(bs);
-
-        aio_context_acquire(aio_context);
-        bs->open_flags = bs->open_flags & ~(BDRV_O_INCOMING);
-        aio_context_release(aio_context);
-    }
-}
-
 int bdrv_flush(BlockDriverState *bs)
 {
     Coroutine *co;
diff --git a/migration.c b/migration.c
index 8d675b3..c49a05a 100644
--- a/migration.c
+++ b/migration.c
@@ -103,7 +103,6 @@ static void process_incoming_migration_co(void *opaque)
     }
     qemu_announce_self();
 
-    bdrv_clear_incoming_migration_all();
     /* Make sure all file formats flush their mutable metadata */
     bdrv_invalidate_cache_all(&local_err);
     if (local_err) {
diff --git a/nbd.c b/nbd.c
index e9b539b..953c378 100644
--- a/nbd.c
+++ b/nbd.c
@@ -972,6 +972,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
     exp->ctx = bdrv_get_aio_context(bs);
     bdrv_ref(bs);
    bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
+    bdrv_invalidate_cache(bs, NULL);
     return exp;
 }




>> if (local_err) { diff --git a/nbd.c b/nbd.c index e9b539b..7b479c0
>> 100644 --- a/nbd.c +++ b/nbd.c @@ -106,6 +106,7 @@ struct NBDExport
>> { off_t dev_offset; off_t size; uint32_t nbdflags; +    bool
>> restore_incoming;
> 
> Not needed, it does not make sense to restore BDRV_O_INCOMING because 
> once we've written to a file it cannot be in use by another host at
> the same time.
> 
>> QTAILQ_HEAD(, NBDClient) clients; QTAILQ_ENTRY(NBDExport) next;
>> 
>> @@ -972,6 +973,13 @@ NBDExport *nbd_export_new(BlockDriverState *bs,
>> off_t dev_offset, exp->ctx = bdrv_get_aio_context(bs); 
>> bdrv_ref(bs); bdrv_add_aio_context_notifier(bs, bs_aio_attached,
>> bs_aio_detach, exp); + +    if (bs->open_flags & BDRV_O_INCOMING) {
> 
> I think the flag has to be cleared before calling 
> bdrv_invalidate_cache() because the .bdrv_open() function looks at
> the flag.
> 


-- 
Alexey


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-03  4:12   ` Alexey Kardashevskiy
@ 2014-10-06 10:03     ` Stefan Hajnoczi
  2014-10-06 22:47       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Hajnoczi @ 2014-10-06 10:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Paolo Bonzini


On Fri, Oct 03, 2014 at 02:12:06PM +1000, Alexey Kardashevskiy wrote:
> BDRV_O_INCOMING is only set when QEMU is about to receive migration and
> we do not want QEMU to check the file at opening time as there is likely
> garbage. Is there any other use of BDRV_O_INCOMING? There must be some as
> bdrv_clear_incoming_migration_all() is called at the end of live
> migration and I believe there must be a reason for it (cache does not
> migrate, does it?).

BDRV_O_INCOMING is just for live migration.  The cached data is not
migrated, this is why it must be refreshed upon migration handover.

> bdrv_invalidate_cache() flushes cache as it could be already initialized
> even if QEMU is receiving migration - QEMU could have cached some of real
> disk data. Is that correct? I do not really understand why it would
> happen if there is BDRV_O_INCOMING set but ok.

.bdrv_open() can load metadata from image files (such as the qcow2 L1
tables) and it does this even when BDRV_O_INCOMING is set.  That data
needs to be re-read at migration handover to prevent the destination
QEMU from running with stale image metadata.
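A toy model of why that re-read matters (the struct and function names below are invented for illustration; qcow2's real L1 table handling is far more involved): metadata cached at open time goes stale if the source keeps writing, so handover must drop and reload it:

```c
#include <string.h>

/* Invented toy type: "table" is the metadata cached at open time. */
struct toy_meta_cache {
    char table[16];
};

/* Open caches whatever is currently "on disk". */
void toy_open(struct toy_meta_cache *c, const char *disk)
{
    strncpy(c->table, disk, sizeof(c->table) - 1);
    c->table[sizeof(c->table) - 1] = '\0';
}

/* Invalidate == discard the cached copy and re-read from disk, which
 * is what the destination must do at migration handover. */
void toy_invalidate(struct toy_meta_cache *c, const char *disk)
{
    toy_open(c, disk);
}
```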

> diff --git a/nbd.c b/nbd.c
> index e9b539b..953c378 100644
> - --- a/nbd.c
> +++ b/nbd.c
> @@ -972,6 +972,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t
> dev_offset,
>      exp->ctx = bdrv_get_aio_context(bs);
>      bdrv_ref(bs);
>      bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
> +    bdrv_invalidate_cache(bs, NULL);
>      return exp;
>  }

Please add a comment to explain why this call is necessary:

/* NBD exports are used for non-shared storage migration.  Make sure
 * that BDRV_O_INCOMING is cleared and the image is ready for write
 * access since the export could be available before migration handover.
 */

If someone can come up with a 2- or 3-line comment that explains it
better, great.

The rest of the patch looks like it will work.  I'm not thrilled about
putting BDRV_O_INCOMING logic inside bdrv_invalidate_cache() because
there was no coupling there before, but the code seems correct now.


* Re: [Qemu-devel] [RFC PATCH] block/migration: Disable cache invalidate for incoming migration
  2014-10-06 10:03     ` Stefan Hajnoczi
@ 2014-10-06 22:47       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 7+ messages in thread
From: Alexey Kardashevskiy @ 2014-10-06 22:47 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Paolo Bonzini

On 10/06/2014 09:03 PM, Stefan Hajnoczi wrote:
> On Fri, Oct 03, 2014 at 02:12:06PM +1000, Alexey Kardashevskiy wrote:
>> BDRV_O_INCOMING is only set when QEMU is about to receive migration
>> and we do not want QEMU to check the file at opening time as there
>> is likely garbage. Is there any other use of BDRV_O_INCOMING? There
>> must be some as bdrv_clear_incoming_migration_all() is called at the
>> end of live migration and I believe there must be a reason for it
>> (cache does not migrate, does it?).
> 
> BDRV_O_INCOMING is just for live migration.  The cached data is not 
> migrated, this is why it must be refreshed upon migration handover.
> 
>> bdrv_invalidate_cache() flushes cache as it could be already
>> initialized even if QEMU is receiving migration - QEMU could have
>> cached some of real disk data. Is that correct? I do not really
>> understand why it would happen if there is BDRV_O_INCOMING set but
>> ok.
> 
> .bdrv_open() can load metadata from image files (such as the qcow2 L1 
> tables) and it does this even when BDRV_O_INCOMING is set.  That data 
> needs to be re-read at migration handover to prevent the destination 
> QEMU from running with stale image metadata.


Would it not be easier/more correct if bdrv_open() did not load this
metadata when BDRV_O_INCOMING is set?


> 
>> diff --git a/nbd.c b/nbd.c index e9b539b..953c378 100644 - ---
>> a/nbd.c +++ b/nbd.c @@ -972,6 +972,7 @@ NBDExport
>> *nbd_export_new(BlockDriverState *bs, off_t dev_offset, exp->ctx =
>> bdrv_get_aio_context(bs); bdrv_ref(bs); 
>> bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach,
>> exp); +    bdrv_invalidate_cache(bs, NULL); return exp; }
> 
> Please add a comment to explain why this call is necessary:
> 
> /* NBD exports are used for non-shared storage migration.  Make sure *
> that BDRV_O_INCOMING is cleared and the image is ready for write *
> access since the export could be available before migration handover. 
> */
> 
> If someone can come up with a 2- or 3-line comment that explains it 
> better, great.
> 
> The rest of the patch looks like it will work.  I'm not thrilled
> about putting BDRV_O_INCOMING logical inside bdrv_invalidate_cache()
> because there was no coupling there before, but the code seems correct
> now.


Ok, will repost today.



-- 
Alexey

