All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] bug in reopen arch
@ 2018-06-12 18:57 Vladimir Sementsov-Ogievskiy
  2018-06-14 10:46 ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-12 18:57 UTC (permalink / raw)
  To: qemu block, qemu-devel, Max Reitz, Kevin Wolf; +Cc: Denis V. Lunev

Hi all!

I've faced the following problem:

     1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
     command block-dirty-bitmap-add)

     2. run the following commands:

         qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
         qemu-io -c 'write 0 512' b.qcow2
         qemu-img commit b.qcow2

     3. last command fails with the following output:

Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2 
cluster_size=65536 lazy_refcounts=off refcount_bits=16
wrote 512/512 bytes at offset 0
512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update 
bitmap directory: Operation not permitted
qemu-img: Block job failed: Operation not permitted


And problem is that children are reopened _after_ parent. But qcow2 
reopen needs write access to its file, to write IN_USE flag to 
dirty-bitmaps extension.

I've tried to fix it as simple as:


diff --git a/block.c b/block.c
index cfe5e6080d..392b2941ac 100644
--- a/block.c
+++ b/block.c
@@ -2793,7 +2793,7 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,

      if (!bs_entry) {
          bs_entry = g_new0(BlockReopenQueueEntry, 1);
-        QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
+        QSIMPLEQ_INSERT_HEAD(bs_queue, bs_entry, entry);
      } else {
          QDECREF(bs_entry->state.options);
          QDECREF(bs_entry->state.explicit_options);


But this breaks a lot of iotests. And looks like it is wrong idea, 
because permissions should be updated for parent first. We've faced and 
fixed similar problems with invalidation sequence. Reopen architecture 
differs with it's bs_queue.. Any ideas how to fix this?

Similar story with invalidation includes

commit 16e977d506bcc2d9f7daa4a9f7cc2b48536d9da6
Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Date:   Tue Jan 31 14:23:08 2017 +0300

     block: bdrv_invalidate_cache: invalidate children first

and

commit dafe096057373d95847e1c99c2fece34be9fc5bb
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Nov 16 13:00:01 2017 +0100

     block: Fix permissions in image activation

-- 
Best regards,
Vladimir

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-12 18:57 [Qemu-devel] bug in reopen arch Vladimir Sementsov-Ogievskiy
@ 2018-06-14 10:46 ` Kevin Wolf
  2018-06-15 18:42   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2018-06-14 10:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
> 
> I've faced the following problem:
> 
>     1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
>     command block-dirty-bitmap-add)
> 
>     2. run the following commands:
> 
>         qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
>         qemu-io -c 'write 0 512' b.qcow2
>         qemu-img commit b.qcow2
> 
>     3. last command fails with the following output:
> 
> Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> wrote 512/512 bytes at offset 0
> 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
> qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
> bitmap directory: Operation not permitted
> qemu-img: Block job failed: Operation not permitted
> 
> And problem is that children are reopened _after_ parent. But qcow2 reopen
> needs write access to its file, to write IN_USE flag to dirty-bitmaps
> extension.

I was aware of a different instance of this problem: Assume a qcow2
image with an unknown autoclear flag (so it will be cleared on r/w
open), which is first opened r/o and then reopened r/w. This will fail
because .bdrv_reopen_prepare doesn't have the permissions yet.

Simply changing the order won't fix this because in the r/w -> r/o, the
driver will legitimately flush its caches in .bdrv_reopen_prepare, and
for this it still needs to be able to write.

We may need to have a way for nodes to access both the old and the new
state of their children. I'm not completely sure how to achieve this
best, though.

When I thought only of permissions, the obvious and simple thing to do
was to just get combined permissions for the old and new state, i.e.
'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
this is actually enough when the child node switches between a r/w and
a r/o file descriptor because even though QEMU's permission system would
allow the write, you still can't successfully write to a r/o file
descriptor.

Kevin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-14 10:46 ` Kevin Wolf
@ 2018-06-15 18:42   ` Vladimir Sementsov-Ogievskiy
  2018-06-20 14:28     ` Vladimir Sementsov-Ogievskiy
  2018-06-21 14:25     ` Kevin Wolf
  0 siblings, 2 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-15 18:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

14.06.2018 13:46, Kevin Wolf wrote:
> Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> I've faced the following problem:
>>
>>      1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
>>      command block-dirty-bitmap-add)
>>
>>      2. run the following commands:
>>
>>          qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
>>          qemu-io -c 'write 0 512' b.qcow2
>>          qemu-img commit b.qcow2
>>
>>      3. last command fails with the following output:
>>
>> Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>> wrote 512/512 bytes at offset 0
>> 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
>> qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
>> bitmap directory: Operation not permitted
>> qemu-img: Block job failed: Operation not permitted
>>
>> And problem is that children are reopened _after_ parent. But qcow2 reopen
>> needs write access to its file, to write IN_USE flag to dirty-bitmaps
>> extension.
> I was aware of a different instance of this problem: Assume a qcow2
> image with an unknown autoclear flag (so it will be cleared on r/w
> open), which is first opened r/o and then reopened r/w. This will fail
> because .bdrv_reopen_prepare doesn't have the permissions yet.

Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with  
autoclear flags, as it doesn't call qcow2_do_open.

>
> Simply changing the order won't fix this because in the r/w -> r/o, the
> driver will legitimately flush its caches in .bdrv_reopen_prepare, and
> for this it still needs to be able to write.
>
> We may need to have a way for nodes to access both the old and the new
> state of their children. I'm not completely sure how to achieve this
> best, though.
>
> When I thought only of permissions, the obvious and simple thing to do
> was to just get combined permissions for the old and new state, i.e.
> 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
> this is actually enough when the child node switches between a r/w and
> a r/o file descriptor because even though QEMU's permission system would
> allow the write, you still can't successfully write to a r/o file
> descriptor.
>
> Kevin

Maybe we want two .bdrv_reopen_prepare: 
.bdrv_reopen_prepare_before_children and 
.bdrv_reopen_prepare_after_children. But to write something in 
reopen_prepare, we need to move bdrv_set_perm from reopen_commit to 
reopen_prepare.. Is it possible?

Now, I've found the following workaround, what do you think about 
something like this as a temporary fix:

diff --git a/include/block/block.h b/include/block/block.h
index e677080c4e..c21392491d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -266,7 +266,8 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver 
*drv, const char *node_name,
  BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
                                      BlockDriverState *bs,
                                      QDict *options, int flags);
-int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
Error **errp);
+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
+                         bool cheat_reopen_rw, Error **errp);
  int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp);
  int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
                          BlockReopenQueue *queue, Error **errp);
diff --git a/block.c b/block.c
index 50887087f3..9b50828cd2 100644
--- a/block.c
+++ b/block.c
@@ -2988,7 +2988,8 @@ BlockReopenQueue 
*bdrv_reopen_queue(BlockReopenQueue *bs_queue,
   * All affected nodes must be drained between bdrv_reopen_queue() and
   * bdrv_reopen_multiple().
   */
-int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
Error **errp)
+int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
+                         bool cheat_reopen_rw, Error **errp)
  {
      int ret = -1;
      BlockReopenQueueEntry *bs_entry, *next;
@@ -3005,6 +3006,14 @@ int bdrv_reopen_multiple(AioContext *ctx, 
BlockReopenQueue *bs_queue, Error **er
          bs_entry->prepared = true;
      }

+    if (cheat_reopen_rw) {
+        /* reverse queue, to reopen children first */
+        QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
+            QSIMPLEQ_REMOVE(bs_queue, bs_entry, BlockReopenQueueEntry, 
entry);
+            QSIMPLEQ_INSERT_HEAD(bs_queue, bs_entry, entry);
+        }
+    }
+
      /* If we reach this point, we have success and just need to apply the
       * changes
       */
@@ -3036,11 +3045,13 @@ int bdrv_reopen(BlockDriverState *bs, int 
bdrv_flags, Error **errp)
      int ret = -1;
      Error *local_err = NULL;
      BlockReopenQueue *queue;
+    bool cheat_reopen_rw = bdrv_is_read_only(bs) && (bdrv_flags & 
BDRV_O_RDWR);

      bdrv_subtree_drained_begin(bs);

      queue = bdrv_reopen_queue(NULL, bs, NULL, bdrv_flags);
-    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, 
&local_err);
+    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, 
cheat_reopen_rw,
+                               &local_err);
      if (local_err != NULL) {
          error_propagate(errp, local_err);
      }
diff --git a/block/replication.c b/block/replication.c
index 826db7b304..e528969e2b 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -411,7 +411,7 @@ static void reopen_backing_file(BlockDriverState 
*bs, bool writable,

      if (reopen_queue) {
          bdrv_reopen_multiple(bdrv_get_aio_context(bs),
-                             reopen_queue, &local_err);
+                             reopen_queue, false, &local_err);
          error_propagate(errp, local_err);
      }

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 5bf5f28178..e0c4de323c 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -2051,7 +2051,7 @@ static int reopen_f(BlockBackend *blk, int argc, 
char **argv)

      bdrv_subtree_drained_begin(bs);
      brq = bdrv_reopen_queue(NULL, bs, opts, flags);
-    bdrv_reopen_multiple(bdrv_get_aio_context(bs), brq, &local_err);
+    bdrv_reopen_multiple(bdrv_get_aio_context(bs), brq, false, &local_err);
      bdrv_subtree_drained_end(bs);

      if (local_err) {


-- 
Best regards,
Vladimir

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-15 18:42   ` Vladimir Sementsov-Ogievskiy
@ 2018-06-20 14:28     ` Vladimir Sementsov-Ogievskiy
  2018-06-21 14:25     ` Kevin Wolf
  1 sibling, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-20 14:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

Kevin?

15.06.2018 21:42, Vladimir Sementsov-Ogievskiy wrote:
> 14.06.2018 13:46, Kevin Wolf wrote:
>> Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> Hi all!
>>>
>>> I've faced the following problem:
>>>
>>>      1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
>>>      command block-dirty-bitmap-add)
>>>
>>>      2. run the following commands:
>>>
>>>          qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
>>>          qemu-io -c 'write 0 512' b.qcow2
>>>          qemu-img commit b.qcow2
>>>
>>>      3. last command fails with the following output:
>>>
>>> Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>> wrote 512/512 bytes at offset 0
>>> 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
>>> qemu-img: #block397: Failed to make dirty bitmaps writable: Can't 
>>> update
>>> bitmap directory: Operation not permitted
>>> qemu-img: Block job failed: Operation not permitted
>>>
>>> And problem is that children are reopened _after_ parent. But qcow2 
>>> reopen
>>> needs write access to its file, to write IN_USE flag to dirty-bitmaps
>>> extension.
>> I was aware of a different instance of this problem: Assume a qcow2
>> image with an unknown autoclear flag (so it will be cleared on r/w
>> open), which is first opened r/o and then reopened r/w. This will fail
>> because .bdrv_reopen_prepare doesn't have the permissions yet.
>
> Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with  
> autoclear flags, as it doesn't call qcow2_do_open.
>
>>
>> Simply changing the order won't fix this because in the r/w -> r/o, the
>> driver will legitimately flush its caches in .bdrv_reopen_prepare, and
>> for this it still needs to be able to write.
>>
>> We may need to have a way for nodes to access both the old and the new
>> state of their children. I'm not completely sure how to achieve this
>> best, though.
>>
>> When I thought only of permissions, the obvious and simple thing to do
>> was to just get combined permissions for the old and new state, i.e.
>> 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
>> this is actually enough when the child node switches between a r/w and
>> a r/o file descriptor because even though QEMU's permission system would
>> allow the write, you still can't successfully write to a r/o file
>> descriptor.
>>
>> Kevin
>
> Maybe we want two .bdrv_reopen_prepare: 
> .bdrv_reopen_prepare_before_children and 
> .bdrv_reopen_prepare_after_children. But to write something in 
> reopen_prepare, we need to move bdrv_set_perm from reopen_commit to 
> reopen_prepare.. Is it possible?
>
> Now, I've found the following workaround, what do you think about 
> something like this as a temporary fix:
>
> diff --git a/include/block/block.h b/include/block/block.h
> index e677080c4e..c21392491d 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -266,7 +266,8 @@ BlockDriverState *bdrv_new_open_driver(BlockDriver 
> *drv, const char *node_name,
>  BlockReopenQueue *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
>                                      BlockDriverState *bs,
>                                      QDict *options, int flags);
> -int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
> Error **errp);
> +int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
> +                         bool cheat_reopen_rw, Error **errp);
>  int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp);
>  int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
>                          BlockReopenQueue *queue, Error **errp);
> diff --git a/block.c b/block.c
> index 50887087f3..9b50828cd2 100644
> --- a/block.c
> +++ b/block.c
> @@ -2988,7 +2988,8 @@ BlockReopenQueue 
> *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
>   * All affected nodes must be drained between bdrv_reopen_queue() and
>   * bdrv_reopen_multiple().
>   */
> -int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue, 
> Error **errp)
> +int bdrv_reopen_multiple(AioContext *ctx, BlockReopenQueue *bs_queue,
> +                         bool cheat_reopen_rw, Error **errp)
>  {
>      int ret = -1;
>      BlockReopenQueueEntry *bs_entry, *next;
> @@ -3005,6 +3006,14 @@ int bdrv_reopen_multiple(AioContext *ctx, 
> BlockReopenQueue *bs_queue, Error **er
>          bs_entry->prepared = true;
>      }
>
> +    if (cheat_reopen_rw) {
> +        /* reverse queue, to reopen children first */
> +        QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
> +            QSIMPLEQ_REMOVE(bs_queue, bs_entry, 
> BlockReopenQueueEntry, entry);
> +            QSIMPLEQ_INSERT_HEAD(bs_queue, bs_entry, entry);
> +        }
> +    }
> +
>      /* If we reach this point, we have success and just need to apply 
> the
>       * changes
>       */
> @@ -3036,11 +3045,13 @@ int bdrv_reopen(BlockDriverState *bs, int 
> bdrv_flags, Error **errp)
>      int ret = -1;
>      Error *local_err = NULL;
>      BlockReopenQueue *queue;
> +    bool cheat_reopen_rw = bdrv_is_read_only(bs) && (bdrv_flags & 
> BDRV_O_RDWR);
>
>      bdrv_subtree_drained_begin(bs);
>
>      queue = bdrv_reopen_queue(NULL, bs, NULL, bdrv_flags);
> -    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, 
> &local_err);
> +    ret = bdrv_reopen_multiple(bdrv_get_aio_context(bs), queue, 
> cheat_reopen_rw,
> +                               &local_err);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
>      }
> diff --git a/block/replication.c b/block/replication.c
> index 826db7b304..e528969e2b 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -411,7 +411,7 @@ static void reopen_backing_file(BlockDriverState 
> *bs, bool writable,
>
>      if (reopen_queue) {
>          bdrv_reopen_multiple(bdrv_get_aio_context(bs),
> -                             reopen_queue, &local_err);
> +                             reopen_queue, false, &local_err);
>          error_propagate(errp, local_err);
>      }
>
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 5bf5f28178..e0c4de323c 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -2051,7 +2051,7 @@ static int reopen_f(BlockBackend *blk, int argc, 
> char **argv)
>
>      bdrv_subtree_drained_begin(bs);
>      brq = bdrv_reopen_queue(NULL, bs, opts, flags);
> -    bdrv_reopen_multiple(bdrv_get_aio_context(bs), brq, &local_err);
> +    bdrv_reopen_multiple(bdrv_get_aio_context(bs), brq, false, 
> &local_err);
>      bdrv_subtree_drained_end(bs);
>
>      if (local_err) {
>
>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-15 18:42   ` Vladimir Sementsov-Ogievskiy
  2018-06-20 14:28     ` Vladimir Sementsov-Ogievskiy
@ 2018-06-21 14:25     ` Kevin Wolf
  2018-06-21 15:55       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2018-06-21 14:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 14.06.2018 13:46, Kevin Wolf wrote:
> > Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Hi all!
> > > 
> > > I've faced the following problem:
> > > 
> > >      1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
> > >      command block-dirty-bitmap-add)
> > > 
> > >      2. run the following commands:
> > > 
> > >          qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
> > >          qemu-io -c 'write 0 512' b.qcow2
> > >          qemu-img commit b.qcow2
> > > 
> > >      3. last command fails with the following output:
> > > 
> > > Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
> > > cluster_size=65536 lazy_refcounts=off refcount_bits=16
> > > wrote 512/512 bytes at offset 0
> > > 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
> > > qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
> > > bitmap directory: Operation not permitted
> > > qemu-img: Block job failed: Operation not permitted
> > > 
> > > And problem is that children are reopened _after_ parent. But qcow2 reopen
> > > needs write access to its file, to write IN_USE flag to dirty-bitmaps
> > > extension.
> > I was aware of a different instance of this problem: Assume a qcow2
> > image with an unknown autoclear flag (so it will be cleared on r/w
> > open), which is first opened r/o and then reopened r/w. This will fail
> > because .bdrv_reopen_prepare doesn't have the permissions yet.
> 
> Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with 
> autoclear flags, as it doesn't call qcow2_do_open.

Hm, right, not sure what I really meant back then when I added it to my
to-do list... Maybe I confused reopen and invalidate_cache.

> > Simply changing the order won't fix this because in the r/w -> r/o, the
> > driver will legitimately flush its caches in .bdrv_reopen_prepare, and
> > for this it still needs to be able to write.
> > 
> > We may need to have a way for nodes to access both the old and the new
> > state of their children. I'm not completely sure how to achieve this
> > best, though.
> > 
> > When I thought only of permissions, the obvious and simple thing to do
> > was to just get combined permissions for the old and new state, i.e.
> > 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
> > this is actually enough when the child node switches between a r/w and
> > a r/o file descriptor because even though QEMU's permission system would
> > allow the write, you still can't successfully write to a r/o file
> > descriptor.
> > 
> > Kevin
> 
> Maybe we want two .bdrv_reopen_prepare: .bdrv_reopen_prepare_before_children
> and .bdrv_reopen_prepare_after_children. But to write something in
> reopen_prepare, we need to move bdrv_set_perm from reopen_commit to
> .. Is it possible?

Getting the permission problems out of the way can be solved by changing
permissions twice, like I said above: First to the combined permissions
of old and new, and finally to only the new permissions.

The problem I see with .bdrv_reopen_prepare_after_children is that I
don't see how it actually buys you anything: Even if the children
already prepared the reopen, any access of the child node still refers
to the old file descriptor because the new one only becomes valid with
.bdrv_reopen_commit.

> Now, I've found the following workaround, what do you think about something
> like this as a temporary fix:

I honestly don't understand why this workaround makes any difference.
Shouldn't all .bdrv_reopen_prepare() callbacks still work on the old
version of the child node?

Even if I understood the reason, it looks a bit too hacky probably.
Maybe I'll change may opinion once I understand it.

Kevin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-21 14:25     ` Kevin Wolf
@ 2018-06-21 15:55       ` Vladimir Sementsov-Ogievskiy
  2018-06-21 17:17         ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-21 15:55 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

21.06.2018 17:25, Kevin Wolf wrote:
> Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 14.06.2018 13:46, Kevin Wolf wrote:
>>> Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> Hi all!
>>>>
>>>> I've faced the following problem:
>>>>
>>>>       1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
>>>>       command block-dirty-bitmap-add)
>>>>
>>>>       2. run the following commands:
>>>>
>>>>           qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
>>>>           qemu-io -c 'write 0 512' b.qcow2
>>>>           qemu-img commit b.qcow2
>>>>
>>>>       3. last command fails with the following output:
>>>>
>>>> Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
>>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>> wrote 512/512 bytes at offset 0
>>>> 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
>>>> qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
>>>> bitmap directory: Operation not permitted
>>>> qemu-img: Block job failed: Operation not permitted
>>>>
>>>> And problem is that children are reopened _after_ parent. But qcow2 reopen
>>>> needs write access to its file, to write IN_USE flag to dirty-bitmaps
>>>> extension.
>>> I was aware of a different instance of this problem: Assume a qcow2
>>> image with an unknown autoclear flag (so it will be cleared on r/w
>>> open), which is first opened r/o and then reopened r/w. This will fail
>>> because .bdrv_reopen_prepare doesn't have the permissions yet.
>> Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with
>> autoclear flags, as it doesn't call qcow2_do_open.
> Hm, right, not sure what I really meant back then when I added it to my
> to-do list... Maybe I confused reopen and invalidate_cache.
>
>>> Simply changing the order won't fix this because in the r/w -> r/o, the
>>> driver will legitimately flush its caches in .bdrv_reopen_prepare, and
>>> for this it still needs to be able to write.
>>>
>>> We may need to have a way for nodes to access both the old and the new
>>> state of their children. I'm not completely sure how to achieve this
>>> best, though.
>>>
>>> When I thought only of permissions, the obvious and simple thing to do
>>> was to just get combined permissions for the old and new state, i.e.
>>> 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
>>> this is actually enough when the child node switches between a r/w and
>>> a r/o file descriptor because even though QEMU's permission system would
>>> allow the write, you still can't successfully write to a r/o file
>>> descriptor.
>>>
>>> Kevin
>> Maybe we want two .bdrv_reopen_prepare: .bdrv_reopen_prepare_before_children
>> and .bdrv_reopen_prepare_after_children. But to write something in
>> reopen_prepare, we need to move bdrv_set_perm from reopen_commit to
>> .. Is it possible?
> Getting the permission problems out of the way can be solved by changing
> permissions twice, like I said above: First to the combined permissions
> of old and new, and finally to only the new permissions.
>
> The problem I see with .bdrv_reopen_prepare_after_children is that I
> don't see how it actually buys you anything: Even if the children
> already prepared the reopen, any access of the child node still refers
> to the old file descriptor because the new one only becomes valid with
> .bdrv_reopen_commit.
>
>> Now, I've found the following workaround, what do you think about something
>> like this as a temporary fix:
> I honestly don't understand why this workaround makes any difference.

with this patch, commit for children will be called earlier than for 
parent, so, when reopening bitmaps rw (which is done in commit) bs->file 
will be already completely reopened rw, and all works.

> Shouldn't all .bdrv_reopen_prepare() callbacks still work on the old
> version of the child node?

yes, but problem is with commit.

>
> Even if I understood the reason, it looks a bit too hacky probably.
> Maybe I'll change may opinion once I understand it.
>
> Kevin


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-21 15:55       ` Vladimir Sementsov-Ogievskiy
@ 2018-06-21 17:17         ` Kevin Wolf
  2018-06-21 17:44           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2018-06-21 17:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

Am 21.06.2018 um 17:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 21.06.2018 17:25, Kevin Wolf wrote:
> > Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 14.06.2018 13:46, Kevin Wolf wrote:
> > > > Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > Hi all!
> > > > > 
> > > > > I've faced the following problem:
> > > > > 
> > > > >       1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
> > > > >       command block-dirty-bitmap-add)
> > > > > 
> > > > >       2. run the following commands:
> > > > > 
> > > > >           qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
> > > > >           qemu-io -c 'write 0 512' b.qcow2
> > > > >           qemu-img commit b.qcow2
> > > > > 
> > > > >       3. last command fails with the following output:
> > > > > 
> > > > > Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
> > > > > cluster_size=65536 lazy_refcounts=off refcount_bits=16
> > > > > wrote 512/512 bytes at offset 0
> > > > > 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
> > > > > qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
> > > > > bitmap directory: Operation not permitted
> > > > > qemu-img: Block job failed: Operation not permitted
> > > > > 
> > > > > And problem is that children are reopened _after_ parent. But qcow2 reopen
> > > > > needs write access to its file, to write IN_USE flag to dirty-bitmaps
> > > > > extension.
> > > > I was aware of a different instance of this problem: Assume a qcow2
> > > > image with an unknown autoclear flag (so it will be cleared on r/w
> > > > open), which is first opened r/o and then reopened r/w. This will fail
> > > > because .bdrv_reopen_prepare doesn't have the permissions yet.
> > > Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with
> > > autoclear flags, as it doesn't call qcow2_do_open.
> > Hm, right, not sure what I really meant back then when I added it to my
> > to-do list... Maybe I confused reopen and invalidate_cache.
> > 
> > > > Simply changing the order won't fix this because in the r/w -> r/o, the
> > > > driver will legitimately flush its caches in .bdrv_reopen_prepare, and
> > > > for this it still needs to be able to write.
> > > > 
> > > > We may need to have a way for nodes to access both the old and the new
> > > > state of their children. I'm not completely sure how to achieve this
> > > > best, though.
> > > > 
> > > > When I thought only of permissions, the obvious and simple thing to do
> > > > was to just get combined permissions for the old and new state, i.e.
> > > > 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
> > > > this is actually enough when the child node switches between a r/w and
> > > > a r/o file descriptor because even though QEMU's permission system would
> > > > allow the write, you still can't successfully write to a r/o file
> > > > descriptor.
> > > > 
> > > > Kevin
> > > Maybe we want two .bdrv_reopen_prepare: .bdrv_reopen_prepare_before_children
> > > and .bdrv_reopen_prepare_after_children. But to write something in
> > > reopen_prepare, we need to move bdrv_set_perm from reopen_commit to
> > > .. Is it possible?
> > Getting the permission problems out of the way can be solved by changing
> > permissions twice, like I said above: First to the combined permissions
> > of old and new, and finally to only the new permissions.
> > 
> > The problem I see with .bdrv_reopen_prepare_after_children is that I
> > don't see how it actually buys you anything: Even if the children
> > already prepared the reopen, any access of the child node still refers
> > to the old file descriptor because the new one only becomes valid with
> > .bdrv_reopen_commit.
> > 
> > > Now, I've found the following workaround, what do you think about something
> > > like this as a temporary fix:
> > I honestly don't understand why this workaround makes any difference.
> 
> with this patch, commit for children will be called earlier than for parent,
> so, when reopening bitmaps rw (which is done in commit) bs->file will be
> already completely reopened rw, and all works.

.bdrv_reopen_commit() can't do any I/O because it must not fail.
Therefore the order in which nodes are committed should not matter.

Any I/O that needs to be done has to be in .bdrv_reopen_prepare() (and
possibly be kept in a temporary buffer) and .bdrv_reopen_commit() can
only apply what is already in memory.

I don't see the code for reopening bitmaps in master. Is this a pending
patch?

Kevin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-21 17:17         ` Kevin Wolf
@ 2018-06-21 17:44           ` Vladimir Sementsov-Ogievskiy
  2018-06-22  8:56             ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-21 17:44 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev

21.06.2018 20:17, Kevin Wolf wrote:
> Am 21.06.2018 um 17:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 21.06.2018 17:25, Kevin Wolf wrote:
>>> Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 14.06.2018 13:46, Kevin Wolf wrote:
>>>>> Am 12.06.2018 um 20:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> Hi all!
>>>>>>
>>>>>> I've faced the following problem:
>>>>>>
>>>>>>        1. create image with dirty bitmap, a.qcow2 (start qemu and run qmp
>>>>>>        command block-dirty-bitmap-add)
>>>>>>
>>>>>>        2. run the following commands:
>>>>>>
>>>>>>            qemu-img create -f qcow2 -b a.qcow2 b.qcow2 10M
>>>>>>            qemu-io -c 'write 0 512' b.qcow2
>>>>>>            qemu-img commit b.qcow2
>>>>>>
>>>>>>        3. last command fails with the following output:
>>>>>>
>>>>>> Formatting 'b.qcow2', fmt=qcow2 size=68719476736 backing_file=a.qcow2
>>>>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>>>>> wrote 512/512 bytes at offset 0
>>>>>> 512 bytes, 1 ops; 0.0953 sec (5.243 KiB/sec and 10.4867 ops/sec)
>>>>>> qemu-img: #block397: Failed to make dirty bitmaps writable: Can't update
>>>>>> bitmap directory: Operation not permitted
>>>>>> qemu-img: Block job failed: Operation not permitted
>>>>>>
>>>>>> And problem is that children are reopened _after_ parent. But qcow2 reopen
>>>>>> needs write access to its file, to write IN_USE flag to dirty-bitmaps
>>>>>> extension.
>>>>> I was aware of a different instance of this problem: Assume a qcow2
>>>>> image with an unknown autoclear flag (so it will be cleared on r/w
>>>>> open), which is first opened r/o and then reopened r/w. This will fail
>>>>> because .bdrv_reopen_prepare doesn't have the permissions yet.
>>>> Hm.. If I understand correctly qcow2_reopen_prepare doesn't deal with
>>>> autoclear flags, as it doesn't call qcow2_do_open.
>>> Hm, right, not sure what I really meant back then when I added it to my
>>> to-do list... Maybe I confused reopen and invalidate_cache.
>>>
>>>>> Simply changing the order won't fix this because in the r/w -> r/o, the
>>>>> driver will legitimately flush its caches in .bdrv_reopen_prepare, and
>>>>> for this it still needs to be able to write.
>>>>>
>>>>> We may need to have a way for nodes to access both the old and the new
>>>>> state of their children. I'm not completely sure how to achieve this
>>>>> best, though.
>>>>>
>>>>> When I thought only of permissions, the obvious and simple thing to do
>>>>> was to just get combined permissions for the old and new state, i.e.
>>>>> 'old_perm | new_perm' and 'old_shared & new_shared'. But I don't think
>>>>> this is actually enough when the child node switches between a r/w and
>>>>> a r/o file descriptor because even though QEMU's permission system would
>>>>> allow the write, you still can't successfully write to a r/o file
>>>>> descriptor.
>>>>>
>>>>> Kevin
>>>> Maybe we want two .bdrv_reopen_prepare: .bdrv_reopen_prepare_before_children
>>>> and .bdrv_reopen_prepare_after_children. But to write something in
>>>> reopen_prepare, we need to move bdrv_set_perm from reopen_commit to
>>>> .. Is it possible?
>>> Getting the permission problems out of the way can be solved by changing
>>> permissions twice, like I said above: First to the combined permissions
>>> of old and new, and finally to only the new permissions.
>>>
>>> The problem I see with .bdrv_reopen_prepare_after_children is that I
>>> don't see how it actually buys you anything: Even if the children
>>> already prepared the reopen, any access of the child node still refers
>>> to the old file descriptor because the new one only becomes valid with
>>> .bdrv_reopen_commit.
>>>
>>>> Now, I've found the following workaround, what do you think about something
>>>> like this as a temporary fix:
>>> I honestly don't understand why this workaround makes any difference.
>> with this patch, commit for children will be called earlier than for parent,
>> so, when reopening bitmaps rw (which is done in commit) bs->file will be
>> already completely reopened rw, and all works.
> .bdrv_reopen_commit() can't do any I/O because it must not fail.
> Therefore the order in which nodes are committed should not matter.
>
> Any I/O that needs to be done has to be in .bdrv_reopen_prepare() (and
> possibly be kept in a temporary buffer) and .bdrv_reopen_commit() can
> only apply what is already in memory.
>
> I don't see the code for reopening bitmaps in master. Is this a pending
> patch?

it is in block.c, in bdrv_reopen_commit()

...
if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
         Error *local_err = NULL;
         if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
...

>
> Kevin


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-21 17:44           ` Vladimir Sementsov-Ogievskiy
@ 2018-06-22  8:56             ` Kevin Wolf
  2018-06-22 11:17               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2018-06-22  8:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev, berto

(Berto, I'm CCing you just because this is about reopen, so you might
have thoughts about it. But it's not really related to what you're
currently working on.)

Am 21.06.2018 um 19:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 21.06.2018 20:17, Kevin Wolf wrote:
> > Am 21.06.2018 um 17:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 21.06.2018 17:25, Kevin Wolf wrote:
> > > > Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > Now, I've found the following workaround, what do you think about something
> > > > > like this as a temporary fix:
> > > > I honestly don't understand why this workaround makes any difference.
> > > with this patch, commit for children will be called earlier than for parent,
> > > so, when reopening bitmaps rw (which is done in commit) bs->file will be
> > > already completely reopened rw, and all works.
> > .bdrv_reopen_commit() can't do any I/O because it must not fail.
> > Therefore the order in which nodes are committed should not matter.
> > 
> > Any I/O that needs to be done has to be in .bdrv_reopen_prepare() (and
> > possibly be kept in a temporary buffer) and .bdrv_reopen_commit() can
> > only apply what is already in memory.
> > 
> > I don't see the code for reopening bitmaps in master. Is this a pending
> > patch?
> 
> it is in block.c, in bdrv_reopen_commit()
> 
> ...
> if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
>         Error *local_err = NULL;
>         if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {

This is already an ugly hack, we shouldn't have a separate callback for
reopening bitmaps. :-(

If done properly, this code would only exist internally in qcow2 as part
of the .bdrv_reopen/commit implementation.

I'm also not convinced of the error handling. According to commit
cb9ff6c25, this is mostly about the IN_USE flag. The qcow2 spec says
"The bitmap was not saved correctly and may be inconsistent." for this
flag. So the fail-safe state is IN_USE being set.


What I think qcow2 should be doing is setting IN_USE in .prepare (for ro
-> rw) and clearing it in .commit (for rw -> ro).

For this to be possible, .prepare needs write access to an image that
was read-only before and is becoming writable; and .commit needs write
access to an image that was writable and is becoming read-only.

This is the problem to solve, and implementing a proper design that can
provide this will solve the bug for you, too.


I'm not completely sure how this can be achieved best. Allowing parents
to choose whether they want to access the old or the new state is
probably not going to work in the general case because that's
essentially an image opened twice.

Even if it feels like a hack, too, maybe we need to make file-posix
already switch to the "better" file descriptor in .prepare and store the
old one in the BDRVRawReopenState so it can be restored in .abort.

The "better" file descriptor is the one that allows more operation, i.e.
writable is better than read-only in this sense. Of course, if we have
two options like read-only that can prevent certain operations, it may
be unclear, which of the two file descriptors is the one we want.

And obviously, this still needs child-to-parent .prepare order. I
believe when I tried reversing the current order a while ago, I ran into
problems, so whether this is possible or how it needs to be done in
detail needs to be checked carefully.

Kevin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-22  8:56             ` Kevin Wolf
@ 2018-06-22 11:17               ` Vladimir Sementsov-Ogievskiy
  2018-06-22 13:03                 ` Kevin Wolf
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-06-22 11:17 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev, berto

22.06.2018 11:56, Kevin Wolf wrote:
> (Berto, I'm CCing you just because this is about reopen, so you might
> have thoughts about it. But it's not really related to what you're
> currently working on.)
>
> Am 21.06.2018 um 19:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 21.06.2018 20:17, Kevin Wolf wrote:
>>> Am 21.06.2018 um 17:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 21.06.2018 17:25, Kevin Wolf wrote:
>>>>> Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> Now, I've found the following workaround, what do you think about something
>>>>>> like this as a temporary fix:
>>>>> I honestly don't understand why this workaround makes any difference.
>>>> with this patch, commit for children will be called earlier than for parent,
>>>> so, when reopening bitmaps rw (which is done in commit) bs->file will be
>>>> already completely reopened rw, and all works.
>>> .bdrv_reopen_commit() can't do any I/O because it must not fail.
>>> Therefore the order in which nodes are committed should not matter.
>>>
>>> Any I/O that needs to be done has to be in .bdrv_reopen_prepare() (and
>>> possibly be kept in a temporary buffer) and .bdrv_reopen_commit() can
>>> only apply what is already in memory.
>>>
>>> I don't see the code for reopening bitmaps in master. Is this a pending
>>> patch?
>> it is in block.c, in bdrv_reopen_commit()
>>
>> ...
>> if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
>>          Error *local_err = NULL;
>>          if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
> This is already an ugly hack, we shouldn't have a separate callback for
> reopening bitmaps. :-(
>
> If done properly, this code would only exist internally in qcow2 as part
> of the .bdrv_reopen/commit implementation.
>
> I'm also not convinced of the error handling. According to commit
> cb9ff6c25, this is mostly about the IN_USE flag. The qcow2 spec says
> "The bitmap was not saved correctly and may be inconsistent." for this
> flag. So the fail-safe state is IN_USE being set.
>
>
> What I think qcow2 should be doing is setting IN_USE in .prepare (for ro
> -> rw) and clearing it in .commit (for rw -> ro).

why this in prepare and this in commit? Not both in commit or both in 
prepare?

>
> For this to be possible, .prepare needs write access to an image that
> was read-only before and is becoming writable; and .commit needs write
> access to an image that was writable and is becoming read-only.
>
> This is the problem to solve, and implementing a proper design that can
> provide this will solve the bug for you, too.
>
>
> I'm not completely sure how this can be achieved best. Allowing parents
> to choose whether they want to access the old or the new state is
> probably not going to work in the general case because that's
> essentially an image opened twice.
>
> Even if it feels like a hack, too, maybe we need to make file-posix
> already switch to the "better" file descriptor in .prepare and store the
> old one in the BDRVRawReopenState so it can be restored in .abort.
>
> The "better" file descriptor is the one that allows more operation, i.e.
> writable is better than read-only in this sense. Of course, if we have
> two options like read-only that can prevent certain operations, it may
> be unclear, which of the two file descriptors is the one we want.
>
> And obviously, this still needs child-to-parent .prepare order. I
> believe when I tried reversing the current order a while ago, I ran into
> problems, so whether this is possible or how it needs to be done in
> detail needs to be checked carefully.
>
> Kevin


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] bug in reopen arch
  2018-06-22 11:17               ` Vladimir Sementsov-Ogievskiy
@ 2018-06-22 13:03                 ` Kevin Wolf
  0 siblings, 0 replies; 11+ messages in thread
From: Kevin Wolf @ 2018-06-22 13:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu block, qemu-devel, Max Reitz, Denis V. Lunev, berto

Am 22.06.2018 um 13:17 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 22.06.2018 11:56, Kevin Wolf wrote:
> > (Berto, I'm CCing you just because this is about reopen, so you might
> > have thoughts about it. But it's not really related to what you're
> > currently working on.)
> > 
> > Am 21.06.2018 um 19:44 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 21.06.2018 20:17, Kevin Wolf wrote:
> > > > Am 21.06.2018 um 17:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > 21.06.2018 17:25, Kevin Wolf wrote:
> > > > > > Am 15.06.2018 um 20:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > > > Now, I've found the following workaround, what do you think about something
> > > > > > > like this as a temporary fix:
> > > > > > I honestly don't understand why this workaround makes any difference.
> > > > > with this patch, commit for children will be called earlier than for parent,
> > > > > so, when reopening bitmaps rw (which is done in commit) bs->file will be
> > > > > already completely reopened rw, and all works.
> > > > .bdrv_reopen_commit() can't do any I/O because it must not fail.
> > > > Therefore the order in which nodes are committed should not matter.
> > > > 
> > > > Any I/O that needs to be done has to be in .bdrv_reopen_prepare() (and
> > > > possibly be kept in a temporary buffer) and .bdrv_reopen_commit() can
> > > > only apply what is already in memory.
> > > > 
> > > > I don't see the code for reopening bitmaps in master. Is this a pending
> > > > patch?
> > > it is in block.c, in bdrv_reopen_commit()
> > > 
> > > ...
> > > if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
> > >          Error *local_err = NULL;
> > >          if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
> > This is already an ugly hack, we shouldn't have a separate callback for
> > reopening bitmaps. :-(
> > 
> > If done properly, this code would only exist internally in qcow2 as part
> > of the .bdrv_reopen/commit implementation.
> > 
> > I'm also not convinced of the error handling. According to commit
> > cb9ff6c25, this is mostly about the IN_USE flag. The qcow2 spec says
> > "The bitmap was not saved correctly and may be inconsistent." for this
> > flag. So the fail-safe state is IN_USE being set.
> > 
> > 
> > What I think qcow2 should be doing is setting IN_USE in .prepare (for ro
> > -> rw) and clearing it in .commit (for rw -> ro).
> 
> why this in prepare and this in commit? Not both in commit or both in
> prepare?

We cannot guarantee full transactional semantics because we need to
write to the image to commit, and this may fail. So no atomicity for us.

For the case that the write fails, we have the choice between:

1. Clear IN_USE in .prepare, even though it should be set for writable
   images. Setting it in .commit fails. This is unsafe if we make the
   bitmaps writable as promised. Currently code leaves them read-only,
   but doesn't return an error from the reopen operation. The management
   tool can't know about the situation, so it will fail later.

2. Set IN_USE in .prepare, even though it can be cleared for read-only
   files. Clearing it in .commit fails. This is always safe and leaves
   the read-only image in a state as if QEMU had crashed.

I think 2. is preferable.

Kevin

> > For this to be possible, .prepare needs write access to an image that
> > was read-only before and is becoming writable; and .commit needs write
> > access to an image that was writable and is becoming read-only.
> > 
> > This is the problem to solve, and implementing a proper design that can
> > provide this will solve the bug for you, too.
> > 
> > 
> > I'm not completely sure how this can be achieved best. Allowing parents
> > to choose whether they want to access the old or the new state is
> > probably not going to work in the general case because that's
> > essentially an image opened twice.
> > 
> > Even if it feels like a hack, too, maybe we need to make file-posix
> > already switch to the "better" file descriptor in .prepare and store the
> > old one in the BDRVRawReopenState so it can be restored in .abort.
> > 
> > The "better" file descriptor is the one that allows more operation, i.e.
> > writable is better than read-only in this sense. Of course, if we have
> > two options like read-only that can prevent certain operations, it may
> > be unclear, which of the two file descriptors is the one we want.
> > 
> > And obviously, this still needs child-to-parent .prepare order. I
> > believe when I tried reversing the current order a while ago, I ran into
> > problems, so whether this is possible or how it needs to be done in
> > detail needs to be checked carefully.
> > 
> > Kevin
> 
> 
> -- 
> Best regards,
> Vladimir
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-06-22 13:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-12 18:57 [Qemu-devel] bug in reopen arch Vladimir Sementsov-Ogievskiy
2018-06-14 10:46 ` Kevin Wolf
2018-06-15 18:42   ` Vladimir Sementsov-Ogievskiy
2018-06-20 14:28     ` Vladimir Sementsov-Ogievskiy
2018-06-21 14:25     ` Kevin Wolf
2018-06-21 15:55       ` Vladimir Sementsov-Ogievskiy
2018-06-21 17:17         ` Kevin Wolf
2018-06-21 17:44           ` Vladimir Sementsov-Ogievskiy
2018-06-22  8:56             ` Kevin Wolf
2018-06-22 11:17               ` Vladimir Sementsov-Ogievskiy
2018-06-22 13:03                 ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.