All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] qcow2-bitmaps: rewrite reopening logic
@ 2019-05-23 15:47 Vladimir Sementsov-Ogievskiy
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit Vladimir Sementsov-Ogievskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-23 15:47 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: fam, kwolf, vsementsov, mreitz, den, jsnow

Hi all!

Bitmaps reopening is buggy, we may easily produce broken incremental
backup if we do temporary snaphost. Let's fix it! The main patch is 03
and it has full description of the problem and solution.

Vladimir Sementsov-Ogievskiy (3):
  iotests: add test 255 to check bitmap life after snapshot + commit
  block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps
  block/qcow2-bitmap: rewrite bitmap reopening logic

 block/qcow2.h                |   5 +-
 include/block/block_int.h    |   2 +-
 include/block/dirty-bitmap.h |   1 -
 block.c                      |  29 ++---
 block/dirty-bitmap.c         |  12 --
 block/qcow2-bitmap.c         | 220 ++++++++++++++++++++++++-----------
 block/qcow2.c                |   2 +-
 python/qemu/__init__.py      |   4 +-
 tests/qemu-iotests/255       |  83 +++++++++++++
 tests/qemu-iotests/255.out   |  52 +++++++++
 tests/qemu-iotests/group     |   1 +
 11 files changed, 308 insertions(+), 103 deletions(-)
 create mode 100755 tests/qemu-iotests/255
 create mode 100644 tests/qemu-iotests/255.out

-- 
2.18.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit
  2019-05-23 15:47 [Qemu-devel] [PATCH 0/3] qcow2-bitmaps: rewrite reopening logic Vladimir Sementsov-Ogievskiy
@ 2019-05-23 15:47 ` Vladimir Sementsov-Ogievskiy
  2019-05-24 23:15   ` John Snow
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Vladimir Sementsov-Ogievskiy
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-23 15:47 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: fam, kwolf, vsementsov, mreitz, den, jsnow

Two testcases with persistent bitmaps are not added here, as there are
bugs to be fixed soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 python/qemu/__init__.py    |  4 +-
 tests/qemu-iotests/255     | 81 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/255.out | 17 ++++++++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/255
 create mode 100644 tests/qemu-iotests/255.out

diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
index 81d9657ec0..4c4317a55e 100644
--- a/python/qemu/__init__.py
+++ b/python/qemu/__init__.py
@@ -402,7 +402,7 @@ class QEMUMachine(object):
         self._qmp.clear_events()
         return events
 
-    def event_wait(self, name, timeout=60.0, match=None):
+    def event_wait(self, name, timeout=60.0, match=None, fail_on=None):
         """
         Wait for specified timeout on named event in QMP; optionally filter
         results by match.
@@ -430,6 +430,7 @@ class QEMUMachine(object):
 
         # Search cached events
         for event in self._events:
+            assert event['event'] != fail_on
             if (event['event'] == name) and event_match(event, match):
                 self._events.remove(event)
                 return event
@@ -437,6 +438,7 @@ class QEMUMachine(object):
         # Poll for new events
         while True:
             event = self._qmp.pull_event(wait=timeout)
+            assert event['event'] != fail_on
             if (event['event'] == name) and event_match(event, match):
                 return event
             self._events.append(event)
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
new file mode 100755
index 0000000000..36712689d3
--- /dev/null
+++ b/tests/qemu-iotests/255
@@ -0,0 +1,81 @@
+#!/usr/bin/env python
+#
+# Tests for temporary external snapshot when we have bitmaps.
+#
+# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+from iotests import qemu_img_create, file_path, log
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+
+base, top = file_path('base', 'top')
+size = 64 * 1024 * 3
+
+
+def print_bitmap(msg, vm):
+    result = vm.qmp('query-block')['return'][0]
+    if 'dirty-bitmaps' in result:
+        bitmap = result['dirty-bitmaps'][0]
+        log('{}: name={} dirty-clusters={}'.format(msg, bitmap['name'],
+            bitmap['count'] // 64 // 1024))
+    else:
+        log(msg + ': not found')
+
+
+def test(persistent, restart):
+    assert persistent or not restart
+    log("\nTestcase {}persistent {} restart\n".format(
+            '' if persistent else 'non-', 'with' if restart else 'without'))
+
+    qemu_img_create('-f', iotests.imgfmt, base, str(size))
+
+    vm = iotests.VM().add_drive(base)
+    vm.launch()
+
+    vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap0',
+               persistent=persistent)
+    vm.hmp_qemu_io('drive0', 'write 0 64K')
+    print_bitmap('initial bitmap', vm)
+
+    vm.qmp_log('blockdev-snapshot-sync', device='drive0', snapshot_file=top,
+               format=iotests.imgfmt, filters=[iotests.filter_qmp_testfiles])
+    vm.hmp_qemu_io('drive0', 'write 64K 512')
+    print_bitmap('check, that no bitmaps in snapshot', vm)
+
+    if restart:
+        log("... Restart ...")
+        vm.shutdown()
+        vm = iotests.VM().add_drive(top)
+        vm.launch()
+
+    vm.qmp_log('block-commit', device='drive0', top=top,
+               filters=[iotests.filter_qmp_testfiles])
+    log(vm.event_wait('BLOCK_JOB_READY', fail_on='BLOCK_JOB_COMPLETED'),
+        filters=[iotests.filter_qmp_event])
+    vm.qmp_log('block-job-complete', device='drive0')
+    log(vm.event_wait('BLOCK_JOB_COMPLETED'),
+        filters=[iotests.filter_qmp_event])
+    print_bitmap('check merged bitmap', vm)
+
+    vm.hmp_qemu_io('drive0', 'write 128K 64K')
+    print_bitmap('check updated bitmap', vm)
+
+    vm.shutdown()
+
+
+test(persistent=False, restart=False)
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
new file mode 100644
index 0000000000..2bffb486d2
--- /dev/null
+++ b/tests/qemu-iotests/255.out
@@ -0,0 +1,17 @@
+
+Testcase non-persistent without restart
+
+{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": false}}
+{"return": {}}
+initial bitmap: name=bitmap0 dirty-clusters=1
+{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
+{"return": {}}
+check, that no bitmaps in snapshot: not found
+{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+check merged bitmap: name=bitmap0 dirty-clusters=2
+check updated bitmap: name=bitmap0 dirty-clusters=3
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 52b7c16e15..2758f48143 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -251,3 +251,4 @@
 249 rw auto quick
 252 rw auto backing quick
 253 rw auto quick
+255 rw auto quick
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps
  2019-05-23 15:47 [Qemu-devel] [PATCH 0/3] qcow2-bitmaps: rewrite reopening logic Vladimir Sementsov-Ogievskiy
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit Vladimir Sementsov-Ogievskiy
@ 2019-05-23 15:47 ` Vladimir Sementsov-Ogievskiy
  2019-05-24 23:37   ` John Snow
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic Vladimir Sementsov-Ogievskiy
  2 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-23 15:47 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: fam, kwolf, vsementsov, mreitz, den, jsnow

Firstly, no reason to optimize failure path. Then, function name is
ambiguous: it checks for readonly and similar things, but someone may
think that it will ignore normal bitmaps which was just unchanged, and
this is in bad relation with the fact that we should drop IN_USE flag
for unchanged bitmaps in the image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/dirty-bitmap.h |  1 -
 block/dirty-bitmap.c         | 12 ------------
 block/qcow2-bitmap.c         | 23 +++++++++++++----------
 3 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 8044ace63e..816022972b 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -105,7 +105,6 @@ bool bdrv_has_readonly_bitmaps(BlockDriverState *bs);
 bool bdrv_dirty_bitmap_get_autoload(const BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_get_persistence(BdrvDirtyBitmap *bitmap);
 bool bdrv_dirty_bitmap_inconsistent(const BdrvDirtyBitmap *bitmap);
-bool bdrv_has_changed_persistent_bitmaps(BlockDriverState *bs);
 BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
                                         BdrvDirtyBitmap *bitmap);
 char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 59e6ebb861..eca2eed0bf 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -775,18 +775,6 @@ bool bdrv_dirty_bitmap_inconsistent(const BdrvDirtyBitmap *bitmap)
     return bitmap->inconsistent;
 }
 
-bool bdrv_has_changed_persistent_bitmaps(BlockDriverState *bs)
-{
-    BdrvDirtyBitmap *bm;
-    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
-        if (bm->persistent && !bm->readonly && !bm->migration) {
-            return true;
-        }
-    }
-
-    return false;
-}
-
 BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
                                         BdrvDirtyBitmap *bitmap)
 {
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 8a75366c92..2b84bfa007 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1457,16 +1457,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
     Qcow2Bitmap *bm;
     QSIMPLEQ_HEAD(, Qcow2BitmapTable) drop_tables;
     Qcow2BitmapTable *tb, *tb_next;
-
-    if (!bdrv_has_changed_persistent_bitmaps(bs)) {
-        /* nothing to do */
-        return;
-    }
-
-    if (!can_write(bs)) {
-        error_setg(errp, "No write access");
-        return;
-    }
+    bool need_write = false;
 
     QSIMPLEQ_INIT(&drop_tables);
 
@@ -1494,6 +1485,8 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
             continue;
         }
 
+        need_write = true;
+
         if (check_constraints_on_bitmap(bs, name, granularity, errp) < 0) {
             error_prepend(errp, "Bitmap '%s' doesn't satisfy the constraints: ",
                           name);
@@ -1532,6 +1525,15 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
         bm->dirty_bitmap = bitmap;
     }
 
+    if (!need_write) {
+        goto success;
+    }
+
+    if (!can_write(bs)) {
+        error_setg(errp, "No write access");
+        goto fail;
+    }
+
     /* allocate clusters and store bitmaps */
     QSIMPLEQ_FOREACH(bm, bm_list, entry) {
         if (bm->dirty_bitmap == NULL) {
@@ -1573,6 +1575,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
         bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
     }
 
+success:
     bitmap_list_free(bm_list);
     return;
 
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-23 15:47 [Qemu-devel] [PATCH 0/3] qcow2-bitmaps: rewrite reopening logic Vladimir Sementsov-Ogievskiy
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit Vladimir Sementsov-Ogievskiy
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Vladimir Sementsov-Ogievskiy
@ 2019-05-23 15:47 ` Vladimir Sementsov-Ogievskiy
  2019-05-28 23:24   ` John Snow
  2019-05-29 15:33   ` Max Reitz
  2 siblings, 2 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-23 15:47 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: fam, kwolf, vsementsov, mreitz, den, jsnow

Current logic
=============

Reopen rw -> ro:

Store bitmaps and release BdrvDirtyBitmap's.

Reopen ro -> rw:

Load bitmap list
Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
   the worst thing]
A kind of fail with error message if we see not readonly bitmap

This leads to at least the following bug:

Assume we have bitmap0.
Create external snapshot
  bitmap0 is stored and therefore removed
Commit snapshot
  now we have no bitmaps
Do some writes from guest (*)
  they are not marked in bitmap
Shutdown
Start
  bitmap0 is loaded as valid, but it is actually broken! It misses
  writes (*)
Incremental backup
  it will be inconsistent

New logic
=========

Reopen rw -> ro:

Store bitmaps and don't remove them from RAM, only mark them readonly
(the latter is already done, but didn't work properly because of
removing in storing function)

Reopen to rw (don't filter case with reopen rw -> rw, it is supported
now in qcow2_reopen_bitmaps_rw)

Load bitmap list

Carefully consider all possible combinations of bitmaps in RAM and in
the image, try to fix corruptions, print corresponding error messages.

Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
commited, as otherwise we can't write to the qcow2 file child on reopen
ro -> rw.

Also, add corresponding test-cases to 255.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h              |   5 +-
 include/block/block_int.h  |   2 +-
 block.c                    |  29 ++----
 block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
 block/qcow2.c              |   2 +-
 tests/qemu-iotests/255     |   2 +
 tests/qemu-iotests/255.out |  35 +++++++
 7 files changed, 193 insertions(+), 79 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8d92ef1fee..5928306e62 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -725,9 +725,10 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
                                                 Error **errp);
 int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
                                  Error **errp);
-int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
+void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
 int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
-void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
+void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
+                                          bool release_stored, Error **errp);
 int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
 bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
                                       const char *name,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 1eebc7c8f3..9a694f3da0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -536,7 +536,7 @@ struct BlockDriver {
      * as rw. This handler should realize it. It also should unset readonly
      * field of BlockDirtyBitmap's in case of success.
      */
-    int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Error **errp);
+    void (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs);
     bool (*bdrv_can_store_new_dirty_bitmap)(BlockDriverState *bs,
                                             const char *name,
                                             uint32_t granularity,
diff --git a/block.c b/block.c
index cb11537029..db1ec0c673 100644
--- a/block.c
+++ b/block.c
@@ -3334,6 +3334,16 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
         bdrv_reopen_commit(&bs_entry->state);
     }
 
+    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
+        BlockDriverState *bs = bs_entry->state.bs;
+
+        if (!bdrv_is_writable(bs) || !bs->drv->bdrv_reopen_bitmaps_rw) {
+            continue;
+        }
+
+        bs->drv->bdrv_reopen_bitmaps_rw(bs);
+    }
+
     ret = 0;
 cleanup_perm:
     QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
@@ -3770,16 +3780,12 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
     BlockDriver *drv;
     BlockDriverState *bs;
     BdrvChild *child;
-    bool old_can_write, new_can_write;
 
     assert(reopen_state != NULL);
     bs = reopen_state->bs;
     drv = bs->drv;
     assert(drv != NULL);
 
-    old_can_write =
-        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
-
     /* If there are any driver level actions to take */
     if (drv->bdrv_reopen_commit) {
         drv->bdrv_reopen_commit(reopen_state);
@@ -3823,21 +3829,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
     }
 
     bdrv_refresh_limits(bs, NULL);
-
-    new_can_write =
-        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
-    if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
-        Error *local_err = NULL;
-        if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
-            /* This is not fatal, bitmaps just left read-only, so all following
-             * writes will fail. User can remove read-only bitmaps to unblock
-             * writes.
-             */
-            error_reportf_err(local_err,
-                              "%s: Failed to make dirty bitmaps writable: ",
-                              bdrv_get_node_name(bs));
-        }
-    }
 }
 
 /*
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 2b84bfa007..4e565db11f 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -28,6 +28,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #include "block/block_int.h"
 #include "qcow2.h"
@@ -951,6 +952,12 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
     bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
 }
 
+/* for g_slist_foreach for GSList of const char* elements */
+static void error_report_helper(gpointer message, gpointer _unused)
+{
+    error_report("%s", (const char *)message);
+}
+
 /* qcow2_load_dirty_bitmaps()
  * Return value is a hint for caller: true means that the Qcow2 header was
  * updated. (false doesn't mean that the header should be updated by the
@@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
     return list;
 }
 
-int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
-                                 Error **errp)
+/*
+ * qcow2_reopen_bitmaps_rw
+ *
+ * The function is called in bdrv_reopen_multiple after all calls to
+ * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
+ * write access to child bs, and with current reopening architecture, when
+ * reopen ro -> rw it is possible only after all reopen commits.
+ *
+ * So, we can't fail here. On the other hand, we may face different kinds of
+ * corruptions and/or just can't write IN_USE flags to the image due to EIO.
+ *
+ * Try to handle as many cases as we can.
+ */
+void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
     Qcow2BitmapList *bm_list;
     Qcow2Bitmap *bm;
-    GSList *ro_dirty_bitmaps = NULL;
+    GSList *ro_dirty_bitmaps = NULL, *corrupted_bitmaps = NULL;
+    Error *local_err = NULL;
     int ret = 0;
-
-    if (header_updated != NULL) {
-        *header_updated = false;
-    }
+    bool need_update = false, updated_ok = false;
 
     if (s->nb_bitmaps == 0) {
         /* No bitmaps - nothing to do */
-        return 0;
-    }
-
-    if (!can_write(bs)) {
-        error_setg(errp, "Can't write to the image on reopening bitmaps rw");
-        return -EINVAL;
+        return;
     }
 
     bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
-                               s->bitmap_directory_size, errp);
+                               s->bitmap_directory_size, &local_err);
     if (bm_list == NULL) {
-        return -EINVAL;
+        error_reportf_err(local_err, "Failed to reopen bitmaps rw: "
+                          "cannot load bitmap list: ");
+        return;
     }
 
     QSIMPLEQ_FOREACH(bm, bm_list, entry) {
         BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(bs, bm->name);
-        if (bitmap == NULL) {
-            continue;
-        }
+        const char *corruption = NULL;
 
-        if (!bdrv_dirty_bitmap_readonly(bitmap)) {
-            error_setg(errp, "Bitmap %s was loaded prior to rw-reopen, but was "
-                       "not marked as readonly. This is a bug, something went "
-                       "wrong. All of the bitmaps may be corrupted", bm->name);
-            ret = -EINVAL;
-            goto out;
-        }
+        if (bm->flags & BME_FLAG_IN_USE) {
+            if (bitmap == NULL) {
+                /*
+                 * It's an unexpected inconsistent bitmap,
+                 * it is safe to ignore it.
+                 */
+                continue;
+            }
 
-        bm->flags |= BME_FLAG_IN_USE;
-        ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
+            /*
+             * It's either an inconsistent bitmap, or we are reopening rw -> rw,
+             * or we just didn't save bitmap for some buggy reason. Still, no
+             * reason to consider in-RAM bitmap inconsistent, than, mark it rw.
+             */
+            bdrv_dirty_bitmap_set_readonly(bitmap, false);
+        } else {
+            /*
+             * In-image bitmap not marked IN_USE
+             *
+             * The only valid case is
+             *     bitmap && bdrv_dirty_bitmap_readonly(bitmap) &&
+             *        !bdrv_dirty_bitmap_inconsistent(bitmap))
+             *
+             * which means reopen ro -> rw of consistent bitmap.
+             *
+             * Other cases a different kinds of corruptions:
+             */
+            if (!bitmap) {
+                corruption =
+                    "Corruption: unexpected valid bitmap '%s' is found in the "
+                    "image '%s' on reopen rw. Will try to set IN_USE flag.";
+
+                bitmap = load_bitmap(bs, bm, NULL);
+                if (!bitmap) {
+                    bitmap = bdrv_create_dirty_bitmap(
+                            bs, bdrv_get_default_bitmap_granularity(bs),
+                            bm->name, NULL);
+                }
+
+                if (bitmap) {
+                    bdrv_dirty_bitmap_set_persistence(bitmap, true);
+                    bdrv_dirty_bitmap_set_readonly(bitmap, true);
+                    bdrv_dirty_bitmap_set_inconsistent(bitmap);
+                }
+            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
+                corruption =
+                    "Corruption: bitmap '%s' is not marked IN_USE in the "
+                    "image '%s' and not marked readonly in RAM. Will try to "
+                    "set IN_USE flag.";
+
+                bdrv_dirty_bitmap_set_readonly(bitmap, true);
+                bdrv_dirty_bitmap_set_inconsistent(bitmap);
+            } else if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
+                corruption =
+                    "Corruption: bitmap '%s' is inconsistent but is not marked "
+                    "IN_USE in the image. Will try to set IN_USE flag.";
+
+                bdrv_dirty_bitmap_set_readonly(bitmap, true);
+            }
+
+            if (bitmap) {
+                ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
+            }
+            bm->flags |= BME_FLAG_IN_USE;
+            need_update = true;
+            if (corruption) {
+                error_report(corruption, bm->name, bs->filename);
+                corrupted_bitmaps = g_slist_append(corrupted_bitmaps, bm->name);
+            }
+        }
     }
 
-    if (ro_dirty_bitmaps != NULL) {
+    if (need_update) {
+        if (!can_write(bs->file->bs)) {
+            error_report("Failed to reopen bitmaps rw: cannot write to "
+                         "the protocol file");
+            goto handle_corrupted;
+        }
+
         /* in_use flags must be updated */
         ret = update_ext_header_and_dir_in_place(bs, bm_list);
         if (ret < 0) {
-            error_setg_errno(errp, -ret, "Can't update bitmap directory");
-            goto out;
-        }
-        if (header_updated != NULL) {
-            *header_updated = true;
+            error_report("Cannot update bitmap directory: %s", strerror(-ret));
+            goto handle_corrupted;
         }
+        updated_ok = true;
         g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
     }
 
-out:
+handle_corrupted:
+    if (corrupted_bitmaps) {
+        if (updated_ok) {
+            error_report("Corrupted bitmaps in '%s' successfully marked "
+                         "IN_USE", bs->filename);
+        } else {
+            error_report("Failed to mark IN_USE the following corrupted "
+                         "bitmaps in '%s', DO NOT USE THEM:", bs->filename);
+            g_slist_foreach(corrupted_bitmaps, error_report_helper, NULL);
+        }
+    }
+
     g_slist_free(ro_dirty_bitmaps);
+    g_slist_free(corrupted_bitmaps);
     bitmap_list_free(bm_list);
-
-    return ret;
-}
-
-int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
-{
-    return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
 }
 
 /* Checks to see if it's safe to resize bitmaps */
@@ -1446,7 +1527,8 @@ fail:
     bitmap_list_free(bm_list);
 }
 
-void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
+void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
+                                          bool release_stored, Error **errp)
 {
     BdrvDirtyBitmap *bitmap;
     BDRVQcow2State *s = bs->opaque;
@@ -1559,20 +1641,23 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
         g_free(tb);
     }
 
-    QSIMPLEQ_FOREACH(bm, bm_list, entry) {
-        /* For safety, we remove bitmap after storing.
-         * We may be here in two cases:
-         * 1. bdrv_close. It's ok to drop bitmap.
-         * 2. inactivation. It means migration without 'dirty-bitmaps'
-         *    capability, so bitmaps are not marked with
-         *    BdrvDirtyBitmap.migration flags. It's not bad to drop them too,
-         *    and reload on invalidation.
-         */
-        if (bm->dirty_bitmap == NULL) {
-            continue;
-        }
+    if (release_stored) {
+        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
+            /*
+             * For safety, we remove bitmap after storing.
+             * We may be here in two cases:
+             * 1. bdrv_close. It's ok to drop bitmap.
+             * 2. inactivation. It means migration without 'dirty-bitmaps'
+             *    capability, so bitmaps are not marked with
+             *    BdrvDirtyBitmap.migration flags. It's not bad to drop them
+             *    too, and reload on invalidation.
+             */
+            if (bm->dirty_bitmap == NULL) {
+                continue;
+            }
 
-        bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
+            bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
+        }
     }
 
 success:
@@ -1600,7 +1685,7 @@ int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
     BdrvDirtyBitmap *bitmap;
     Error *local_err = NULL;
 
-    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
+    qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
         return -EINVAL;
diff --git a/block/qcow2.c b/block/qcow2.c
index d39882785d..f0a8479874 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2273,7 +2273,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
     int ret, result = 0;
     Error *local_err = NULL;
 
-    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
+    qcow2_store_persistent_dirty_bitmaps(bs, true, &local_err);
     if (local_err != NULL) {
         result = -EINVAL;
         error_reportf_err(local_err, "Lost persistent bitmaps during "
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
index 36712689d3..e8b0c9d4bb 100755
--- a/tests/qemu-iotests/255
+++ b/tests/qemu-iotests/255
@@ -79,3 +79,5 @@ def test(persistent, restart):
 
 
 test(persistent=False, restart=False)
+test(persistent=True, restart=False)
+test(persistent=True, restart=True)
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
index 2bffb486d2..46e2e3ad4e 100644
--- a/tests/qemu-iotests/255.out
+++ b/tests/qemu-iotests/255.out
@@ -15,3 +15,38 @@ check, that no bitmaps in snapshot: not found
 {"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 check merged bitmap: name=bitmap0 dirty-clusters=2
 check updated bitmap: name=bitmap0 dirty-clusters=3
+
+Testcase persistent without restart
+
+{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
+{"return": {}}
+initial bitmap: name=bitmap0 dirty-clusters=1
+{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
+{"return": {}}
+check, that no bitmaps in snapshot: not found
+{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+check merged bitmap: name=bitmap0 dirty-clusters=2
+check updated bitmap: name=bitmap0 dirty-clusters=3
+
+Testcase persistent with restart
+
+{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
+{"return": {}}
+initial bitmap: name=bitmap0 dirty-clusters=1
+{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
+{"return": {}}
+check, that no bitmaps in snapshot: not found
+... Restart ...
+{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+check merged bitmap: name=bitmap0 dirty-clusters=2
+check updated bitmap: name=bitmap0 dirty-clusters=3
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit Vladimir Sementsov-Ogievskiy
@ 2019-05-24 23:15   ` John Snow
  2019-05-25  9:16     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 17+ messages in thread
From: John Snow @ 2019-05-24 23:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, den, mreitz



On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Two testcases with persistent bitmaps are not added here, as there are
> bugs to be fixed soon.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  python/qemu/__init__.py    |  4 +-
>  tests/qemu-iotests/255     | 81 ++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/255.out | 17 ++++++++
>  tests/qemu-iotests/group   |  1 +
>  4 files changed, 102 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/255
>  create mode 100644 tests/qemu-iotests/255.out
> 
> diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
> index 81d9657ec0..4c4317a55e 100644
> --- a/python/qemu/__init__.py
> +++ b/python/qemu/__init__.py
> @@ -402,7 +402,7 @@ class QEMUMachine(object):
>          self._qmp.clear_events()
>          return events
>  
> -    def event_wait(self, name, timeout=60.0, match=None):
> +    def event_wait(self, name, timeout=60.0, match=None, fail_on=None):
>          """
>          Wait for specified timeout on named event in QMP; optionally filter
>          results by match.
> @@ -430,6 +430,7 @@ class QEMUMachine(object):
>  
>          # Search cached events
>          for event in self._events:
> +            assert event['event'] != fail_on
>              if (event['event'] == name) and event_match(event, match):
>                  self._events.remove(event)
>                  return event
> @@ -437,6 +438,7 @@ class QEMUMachine(object):
>          # Poll for new events
>          while True:
>              event = self._qmp.pull_event(wait=timeout)
> +            assert event['event'] != fail_on
>              if (event['event'] == name) and event_match(event, match):
>                  return event
>              self._events.append(event)

I'd rather not put assertions directly in the QEMUMachine code, unless
it's actually an assertion and not a test failure, because this code is
not necessarily meant to be used exclusively for tests.

> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
> new file mode 100755
> index 0000000000..36712689d3
> --- /dev/null
> +++ b/tests/qemu-iotests/255
> @@ -0,0 +1,81 @@
> +#!/usr/bin/env python
> +#
> +# Tests for temporary external snapshot when we have bitmaps.
> +#
> +# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +import iotests
> +from iotests import qemu_img_create, file_path, log
> +
> +iotests.verify_image_format(supported_fmts=['qcow2'])
> +
> +base, top = file_path('base', 'top')
> +size = 64 * 1024 * 3
> +
> +
> +def print_bitmap(msg, vm):
> +    result = vm.qmp('query-block')['return'][0]
> +    if 'dirty-bitmaps' in result:
> +        bitmap = result['dirty-bitmaps'][0]
> +        log('{}: name={} dirty-clusters={}'.format(msg, bitmap['name'],
> +            bitmap['count'] // 64 // 1024))
> +    else:
> +        log(msg + ': not found')
> +
> +
> +def test(persistent, restart):
> +    assert persistent or not restart
> +    log("\nTestcase {}persistent {} restart\n".format(
> +            '' if persistent else 'non-', 'with' if restart else 'without'))
> +
> +    qemu_img_create('-f', iotests.imgfmt, base, str(size))
> +
> +    vm = iotests.VM().add_drive(base)
> +    vm.launch()
> +
> +    vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap0',
> +               persistent=persistent)
> +    vm.hmp_qemu_io('drive0', 'write 0 64K')
> +    print_bitmap('initial bitmap', vm)
> +
> +    vm.qmp_log('blockdev-snapshot-sync', device='drive0', snapshot_file=top,
> +               format=iotests.imgfmt, filters=[iotests.filter_qmp_testfiles])
> +    vm.hmp_qemu_io('drive0', 'write 64K 512')
> +    print_bitmap('check, that no bitmaps in snapshot', vm)

'check that no bitmaps are in snapshot', probably.

> +
> +    if restart:
> +        log("... Restart ...")
> +        vm.shutdown()
> +        vm = iotests.VM().add_drive(top)
> +        vm.launch()
> +
> +    vm.qmp_log('block-commit', device='drive0', top=top,
> +               filters=[iotests.filter_qmp_testfiles])
> +    log(vm.event_wait('BLOCK_JOB_READY', fail_on='BLOCK_JOB_COMPLETED'),
> +        filters=[iotests.filter_qmp_event])

Looks like you probably saw some interesting things during development.
You shouldn't see COMPLETED before READY, should you? Is this necessary?

> +    vm.qmp_log('block-job-complete', device='drive0')
> +    log(vm.event_wait('BLOCK_JOB_COMPLETED'),
> +        filters=[iotests.filter_qmp_event])

(Just musing: we probably want a filtered version of event_wait for
iotests.py so we don't have to type this so much.)

> +    print_bitmap('check merged bitmap', vm)
> +

Merged? We don't /merge/ the bitmaps, do we? I guess you mean to say
something like: "Check bitmaps on merged/converged node", right?

> +    vm.hmp_qemu_io('drive0', 'write 128K 64K')
> +    print_bitmap('check updated bitmap', vm)
> +
> +    vm.shutdown()
> +
> +
> +test(persistent=False, restart=False)
> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
> new file mode 100644
> index 0000000000..2bffb486d2
> --- /dev/null
> +++ b/tests/qemu-iotests/255.out
> @@ -0,0 +1,17 @@
> +
> +Testcase non-persistent without restart
> +
> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": false}}
> +{"return": {}}
> +initial bitmap: name=bitmap0 dirty-clusters=1
> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +check, that no bitmaps in snapshot: not found
> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +check merged bitmap: name=bitmap0 dirty-clusters=2
> +check updated bitmap: name=bitmap0 dirty-clusters=3
> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index 52b7c16e15..2758f48143 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -251,3 +251,4 @@
>  249 rw auto quick
>  252 rw auto backing quick
>  253 rw auto quick
> +255 rw auto quick
> 

Good test, I just have to dig through patch 3.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Vladimir Sementsov-Ogievskiy
@ 2019-05-24 23:37   ` John Snow
  0 siblings, 0 replies; 17+ messages in thread
From: John Snow @ 2019-05-24 23:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, den, mreitz



On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Firstly, no reason to optimize failure path. Then, function name is
> ambiguous: it checks for readonly and similar things, but someone may
> think that it will ignore normal bitmaps which was just unchanged, and
> this is in bad relation with the fact that we should drop IN_USE flag
> for unchanged bitmaps in the image.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/dirty-bitmap.h |  1 -
>  block/dirty-bitmap.c         | 12 ------------
>  block/qcow2-bitmap.c         | 23 +++++++++++++----------
>  3 files changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 8044ace63e..816022972b 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -105,7 +105,6 @@ bool bdrv_has_readonly_bitmaps(BlockDriverState *bs);
>  bool bdrv_dirty_bitmap_get_autoload(const BdrvDirtyBitmap *bitmap);
>  bool bdrv_dirty_bitmap_get_persistence(BdrvDirtyBitmap *bitmap);
>  bool bdrv_dirty_bitmap_inconsistent(const BdrvDirtyBitmap *bitmap);
> -bool bdrv_has_changed_persistent_bitmaps(BlockDriverState *bs);
>  BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>                                          BdrvDirtyBitmap *bitmap);
>  char *bdrv_dirty_bitmap_sha256(const BdrvDirtyBitmap *bitmap, Error **errp);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 59e6ebb861..eca2eed0bf 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -775,18 +775,6 @@ bool bdrv_dirty_bitmap_inconsistent(const BdrvDirtyBitmap *bitmap)
>      return bitmap->inconsistent;
>  }
>  
> -bool bdrv_has_changed_persistent_bitmaps(BlockDriverState *bs)
> -{
> -    BdrvDirtyBitmap *bm;
> -    QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
> -        if (bm->persistent && !bm->readonly && !bm->migration) {

The loop down below has these conditionals for skipping bitmaps:

        if (!bdrv_dirty_bitmap_get_persistence(bitmap) ||
            bdrv_dirty_bitmap_readonly(bitmap) ||
            bdrv_dirty_bitmap_inconsistent(bitmap)) {
            continue;
        }

It looks like a semantic change, but hiding inside of get_persistence is
this:

bitmap->persistent && !bitmap->migration;

So this is equivalent, actually. It's not readily apparent at a glance.

> -            return true;
> -        }
> -    }
> -
> -    return false;
> -}
> -
>  BdrvDirtyBitmap *bdrv_dirty_bitmap_next(BlockDriverState *bs,
>                                          BdrvDirtyBitmap *bitmap)
>  {
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 8a75366c92..2b84bfa007 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -1457,16 +1457,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>      Qcow2Bitmap *bm;
>      QSIMPLEQ_HEAD(, Qcow2BitmapTable) drop_tables;
>      Qcow2BitmapTable *tb, *tb_next;
> -
> -    if (!bdrv_has_changed_persistent_bitmaps(bs)) {
> -        /* nothing to do */
> -        return;
> -    }
> -
> -    if (!can_write(bs)) {
> -        error_setg(errp, "No write access");
> -        return;
> -    }
> +    bool need_write = false;
>  
>      QSIMPLEQ_INIT(&drop_tables);
>  
> @@ -1494,6 +1485,8 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>              continue;
>          }
>  
> +        need_write = true;
> +
>          if (check_constraints_on_bitmap(bs, name, granularity, errp) < 0) {
>              error_prepend(errp, "Bitmap '%s' doesn't satisfy the constraints: ",
>                            name);
> @@ -1532,6 +1525,15 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>          bm->dirty_bitmap = bitmap;
>      }
>  
> +    if (!need_write) {
> +        goto success;
> +    }
> +
> +    if (!can_write(bs)) {
> +        error_setg(errp, "No write access");
> +        goto fail;
> +    }
> +
>      /* allocate clusters and store bitmaps */
>      QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>          if (bm->dirty_bitmap == NULL) {
> @@ -1573,6 +1575,7 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>          bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>      }
>  
> +success:
>      bitmap_list_free(bm_list);
>      return;
>  
> 

Alright, interesting. You're right that the function you're removing is
pretty badly named for what it actually does. I got confused by that not
too long ago.

The rationale against optimizing the error path works for me, too.

Seems like it's equivalent before and after, so:

Reviewed-by: John Snow <jsnow@redhat.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit
  2019-05-24 23:15   ` John Snow
@ 2019-05-25  9:16     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-25  9:16 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: fam, kwolf, Denis Lunev, mreitz

25.05.2019 2:15, John Snow wrote:
> 
> 
> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Two testcases with persistent bitmaps are not added here, as there are
>> bugs to be fixed soon.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   python/qemu/__init__.py    |  4 +-
>>   tests/qemu-iotests/255     | 81 ++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/255.out | 17 ++++++++
>>   tests/qemu-iotests/group   |  1 +
>>   4 files changed, 102 insertions(+), 1 deletion(-)
>>   create mode 100755 tests/qemu-iotests/255
>>   create mode 100644 tests/qemu-iotests/255.out
>>
>> diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
>> index 81d9657ec0..4c4317a55e 100644
>> --- a/python/qemu/__init__.py
>> +++ b/python/qemu/__init__.py
>> @@ -402,7 +402,7 @@ class QEMUMachine(object):
>>           self._qmp.clear_events()
>>           return events
>>   
>> -    def event_wait(self, name, timeout=60.0, match=None):
>> +    def event_wait(self, name, timeout=60.0, match=None, fail_on=None):
>>           """
>>           Wait for specified timeout on named event in QMP; optionally filter
>>           results by match.
>> @@ -430,6 +430,7 @@ class QEMUMachine(object):
>>   
>>           # Search cached events
>>           for event in self._events:
>> +            assert event['event'] != fail_on
>>               if (event['event'] == name) and event_match(event, match):
>>                   self._events.remove(event)
>>                   return event
>> @@ -437,6 +438,7 @@ class QEMUMachine(object):
>>           # Poll for new events
>>           while True:
>>               event = self._qmp.pull_event(wait=timeout)
>> +            assert event['event'] != fail_on
>>               if (event['event'] == name) and event_match(event, match):
>>                   return event
>>               self._events.append(event)
> 
> I'd rather not put assertions directly in the QEMUMachine code, unless
> it's actually an assertion and not a test failure, because this code is
> not necessarily meant to be used exclusively for tests.

It was most simple thing to do to stop test execution not handling returned event..
But I think you are right, I'll rethink it.

> 
>> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
>> new file mode 100755
>> index 0000000000..36712689d3
>> --- /dev/null
>> +++ b/tests/qemu-iotests/255
>> @@ -0,0 +1,81 @@
>> +#!/usr/bin/env python
>> +#
>> +# Tests for temporary external snapshot when we have bitmaps.
>> +#
>> +# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +#
>> +
>> +import iotests
>> +from iotests import qemu_img_create, file_path, log
>> +
>> +iotests.verify_image_format(supported_fmts=['qcow2'])
>> +
>> +base, top = file_path('base', 'top')
>> +size = 64 * 1024 * 3
>> +
>> +
>> +def print_bitmap(msg, vm):
>> +    result = vm.qmp('query-block')['return'][0]
>> +    if 'dirty-bitmaps' in result:
>> +        bitmap = result['dirty-bitmaps'][0]
>> +        log('{}: name={} dirty-clusters={}'.format(msg, bitmap['name'],
>> +            bitmap['count'] // 64 // 1024))
>> +    else:
>> +        log(msg + ': not found')
>> +
>> +
>> +def test(persistent, restart):
>> +    assert persistent or not restart
>> +    log("\nTestcase {}persistent {} restart\n".format(
>> +            '' if persistent else 'non-', 'with' if restart else 'without'))
>> +
>> +    qemu_img_create('-f', iotests.imgfmt, base, str(size))
>> +
>> +    vm = iotests.VM().add_drive(base)
>> +    vm.launch()
>> +
>> +    vm.qmp_log('block-dirty-bitmap-add', node='drive0', name='bitmap0',
>> +               persistent=persistent)
>> +    vm.hmp_qemu_io('drive0', 'write 0 64K')
>> +    print_bitmap('initial bitmap', vm)
>> +
>> +    vm.qmp_log('blockdev-snapshot-sync', device='drive0', snapshot_file=top,
>> +               format=iotests.imgfmt, filters=[iotests.filter_qmp_testfiles])
>> +    vm.hmp_qemu_io('drive0', 'write 64K 512')
>> +    print_bitmap('check, that no bitmaps in snapshot', vm)
> 
> 'check that no bitmaps are in snapshot', probably.
> 
>> +
>> +    if restart:
>> +        log("... Restart ...")
>> +        vm.shutdown()
>> +        vm = iotests.VM().add_drive(top)
>> +        vm.launch()
>> +
>> +    vm.qmp_log('block-commit', device='drive0', top=top,
>> +               filters=[iotests.filter_qmp_testfiles])
>> +    log(vm.event_wait('BLOCK_JOB_READY', fail_on='BLOCK_JOB_COMPLETED'),
>> +        filters=[iotests.filter_qmp_event])
> 
> Looks like you probably saw some interesting things during development.
> You shouldn't see COMPLETED before READY, should you? Is this necessary?

When job failed, we may not have READY event but only COMPLETED with failure.

I think, I can instead improve event_wait to support list of events to wait,
and than check in test, what is it, READY or COMPLETED.

> 
>> +    vm.qmp_log('block-job-complete', device='drive0')
>> +    log(vm.event_wait('BLOCK_JOB_COMPLETED'),
>> +        filters=[iotests.filter_qmp_event])
> 
> (Just musing: we probably want a filtered version of event_wait for
> iotests.py so we don't have to type this so much.)

agree

> 
>> +    print_bitmap('check merged bitmap', vm)
>> +
> 
> Merged? We don't /merge/ the bitmaps, do we? I guess you mean to say
> something like: "Check bitmaps on merged/converged node", right?

Yes, something like this.

> 
>> +    vm.hmp_qemu_io('drive0', 'write 128K 64K')
>> +    print_bitmap('check updated bitmap', vm)
>> +
>> +    vm.shutdown()
>> +
>> +
>> +test(persistent=False, restart=False)
>> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
>> new file mode 100644
>> index 0000000000..2bffb486d2
>> --- /dev/null
>> +++ b/tests/qemu-iotests/255.out
>> @@ -0,0 +1,17 @@
>> +
>> +Testcase non-persistent without restart
>> +
>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": false}}
>> +{"return": {}}
>> +initial bitmap: name=bitmap0 dirty-clusters=1
>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +check, that no bitmaps in snapshot: not found
>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>> index 52b7c16e15..2758f48143 100644
>> --- a/tests/qemu-iotests/group
>> +++ b/tests/qemu-iotests/group
>> @@ -251,3 +251,4 @@
>>   249 rw auto quick
>>   252 rw auto backing quick
>>   253 rw auto quick
>> +255 rw auto quick
>>
> 
> Good test, I just have to dig through patch 3.
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic Vladimir Sementsov-Ogievskiy
@ 2019-05-28 23:24   ` John Snow
  2019-05-29  9:10     ` Vladimir Sementsov-Ogievskiy
  2019-05-29 15:33   ` Max Reitz
  1 sibling, 1 reply; 17+ messages in thread
From: John Snow @ 2019-05-28 23:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, den, mreitz



On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> Current logic
> =============
> 
> Reopen rw -> ro:
> 
> Store bitmaps and release BdrvDirtyBitmap's.
> 
> Reopen ro -> rw:
> 
> Load bitmap list
> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>    the worst thing]

...ah.

> A kind of fail with error message if we see not readonly bitmap
> 
> This leads to at least the following bug:
> 
> Assume we have bitmap0.

Presumably a normal, persistent bitmap.

> Create external snapshot
>   bitmap0 is stored and therefore removed

Written out to the backing file and removed from memory, because we've
reopened rw->ro; this is because of the migration "safety" clause where
we simply drop these bitmaps.

...Ah, and then we don't actually open them readonly again; that entire
stanza in reopen_ro never fires off at all because the bitmaps are
already gone.

> Commit snapshot
>   now we have no bitmaps

When we reopen the base as rw as you note, we skipped bitmaps for which
we had no in-memory bitmap for -- because the readonly logic was really
expecting to have these in-memory.

I should probably say that for the sake of migration correctness, the
way we flush things to disk and remove it from memory on write is really
bothersome to keep correct. The migration logic is so particular that it
keeps causing issues elsewhere, of which this is another symptom.

> Do some writes from guest (*)
>   they are not marked in bitmap

Yikes, right.

> Shutdown
> Start
>   bitmap0 is loaded as valid, but it is actually broken! It misses
>   writes (*)

Yikes; because it was consistent on flush and we skipped it on load,
it's not marked as IN_USE and we are free to load it up again.

> Incremental backup
>   it will be inconsistent
> 

Good writeup, thank you.

> New logic
> =========
> 
> Reopen rw -> ro:
> 
> Store bitmaps and don't remove them from RAM, only mark them readonly
> (the latter is already done, but didn't work properly because of
> removing in storing function)
> 

Yes! I like this change.

> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
> now in qcow2_reopen_bitmaps_rw)
> 

OK, so we allow rw --> rw without trying to be fussy about it. That
seems fine to me.

> Load bitmap list
> 
> Carefully consider all possible combinations of bitmaps in RAM and in
> the image, try to fix corruptions, print corresponding error messages.
> 
> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
> commited, as otherwise we can't write to the qcow2 file child on reopen
> ro -> rw.
> 

I take it this is the change that moved the invocation logic into
bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
longer check the transition edge which is much simpler.

oh, I see why you're doing it there now...

> Also, add corresponding test-cases to 255.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.h              |   5 +-
>  include/block/block_int.h  |   2 +-
>  block.c                    |  29 ++----
>  block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>  block/qcow2.c              |   2 +-
>  tests/qemu-iotests/255     |   2 +
>  tests/qemu-iotests/255.out |  35 +++++++
>  7 files changed, 193 insertions(+), 79 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 8d92ef1fee..5928306e62 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>                                                  Error **errp);
>  int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>                                   Error **errp);
> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>  int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> +                                          bool release_stored, Error **errp);
>  int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>  bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>                                        const char *name,
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 1eebc7c8f3..9a694f3da0 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -536,7 +536,7 @@ struct BlockDriver {
>       * as rw. This handler should realize it. It also should unset readonly
>       * field of BlockDirtyBitmap's in case of success.
>       */
> -    int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Error **errp);
> +    void (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs);
>      bool (*bdrv_can_store_new_dirty_bitmap)(BlockDriverState *bs,
>                                              const char *name,
>                                              uint32_t granularity,
> diff --git a/block.c b/block.c
> index cb11537029..db1ec0c673 100644
> --- a/block.c
> +++ b/block.c
> @@ -3334,6 +3334,16 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>          bdrv_reopen_commit(&bs_entry->state);
>      }
>  
> +    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
> +        BlockDriverState *bs = bs_entry->state.bs;
> +
> +        if (!bdrv_is_writable(bs) || !bs->drv->bdrv_reopen_bitmaps_rw) {
> +            continue;
> +        }
> +
> +        bs->drv->bdrv_reopen_bitmaps_rw(bs);
> +    }
> +

OK, so this has been moved up into the main body of the loop because we
cannot trust it to run in bdrv_reopen_commit because of the order in
which the nodes are reopened might leave us unable to write to our child
nodes to issue the IN_USE flag.

I have kept out of these discussions in the past; is there a general
solution that allows us to sort the block DAG leaf-up to avoid this
scenario?

Anyway, since the block graph organization isn't my problem I will say
that I think this is fine by me; but I'm not the one to impress here.

>      ret = 0;
>  cleanup_perm:
>      QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
> @@ -3770,16 +3780,12 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>      BlockDriver *drv;
>      BlockDriverState *bs;
>      BdrvChild *child;
> -    bool old_can_write, new_can_write;
>  
>      assert(reopen_state != NULL);
>      bs = reopen_state->bs;
>      drv = bs->drv;
>      assert(drv != NULL);
>  
> -    old_can_write =
> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
> -
>      /* If there are any driver level actions to take */
>      if (drv->bdrv_reopen_commit) {
>          drv->bdrv_reopen_commit(reopen_state);
> @@ -3823,21 +3829,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>      }
>  
>      bdrv_refresh_limits(bs, NULL);
> -
> -    new_can_write =
> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
> -    if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
> -        Error *local_err = NULL;
> -        if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
> -            /* This is not fatal, bitmaps just left read-only, so all following
> -             * writes will fail. User can remove read-only bitmaps to unblock
> -             * writes.
> -             */
> -            error_reportf_err(local_err,
> -                              "%s: Failed to make dirty bitmaps writable: ",
> -                              bdrv_get_node_name(bs));
> -        }
> -    }
>  }
>  

Certainly the new code is simpler here, which is good.

>  /*
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 2b84bfa007..4e565db11f 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -28,6 +28,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
> +#include "qemu/error-report.h"
>  
>  #include "block/block_int.h"
>  #include "qcow2.h"
> @@ -951,6 +952,12 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
>      bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
>  }
>  
> +/* for g_slist_foreach for GSList of const char* elements */
> +static void error_report_helper(gpointer message, gpointer _unused)
> +{
> +    error_report("%s", (const char *)message);
> +}
> +
>  /* qcow2_load_dirty_bitmaps()
>   * Return value is a hint for caller: true means that the Qcow2 header was
>   * updated. (false doesn't mean that the header should be updated by the
> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>      return list;
>  }
>  
> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
> -                                 Error **errp)
> +/*
> + * qcow2_reopen_bitmaps_rw
> + *
> + * The function is called in bdrv_reopen_multiple after all calls to
> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
> + * write access to child bs, and with current reopening architecture, when
> + * reopen ro -> rw it is possible only after all reopen commits.
> + *
> + * So, we can't fail here. On the other hand, we may face different kinds of
> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
> + *
> + * Try to handle as many cases as we can.

Hm, I think you're right, but it does make me really uncomfortable that
we lose the ability to report errors back up the stack. I guess we
already always did ignore them, so this is no worse, but I don't like
the idea of adding new error_report_err calls if we can help it.

I guess we can't help it, though.

> + */
> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      Qcow2BitmapList *bm_list;
>      Qcow2Bitmap *bm;
> -    GSList *ro_dirty_bitmaps = NULL;
> +    GSList *ro_dirty_bitmaps = NULL, *corrupted_bitmaps = NULL;
> +    Error *local_err = NULL;
>      int ret = 0;
> -
> -    if (header_updated != NULL) {
> -        *header_updated = false;
> -    }
> +    bool need_update = false, updated_ok = false;
>  
>      if (s->nb_bitmaps == 0) {
>          /* No bitmaps - nothing to do */
> -        return 0;
> -    }
> -
> -    if (!can_write(bs)) {
> -        error_setg(errp, "Can't write to the image on reopening bitmaps rw");
> -        return -EINVAL;
> +        return;
>      }
>  
>      bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
> -                               s->bitmap_directory_size, errp);
> +                               s->bitmap_directory_size, &local_err);
>      if (bm_list == NULL) {
> -        return -EINVAL;
> +        error_reportf_err(local_err, "Failed to reopen bitmaps rw: "
> +                          "cannot load bitmap list: ");
> +        return;
>      }
>  
>      QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>          BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(bs, bm->name);
> -        if (bitmap == NULL) {
> -            continue;
> -        }
> +        const char *corruption = NULL;
>  
> -        if (!bdrv_dirty_bitmap_readonly(bitmap)) {
> -            error_setg(errp, "Bitmap %s was loaded prior to rw-reopen, but was "
> -                       "not marked as readonly. This is a bug, something went "
> -                       "wrong. All of the bitmaps may be corrupted", bm->name);
> -            ret = -EINVAL;
> -            goto out;
> -        }
> +        if (bm->flags & BME_FLAG_IN_USE) {
> +            if (bitmap == NULL) {
> +                /*
> +                 * It's an unexpected inconsistent bitmap,
> +                 * it is safe to ignore it.
> +                 */
> +                continue;
> +            }

This is supposed to be a reopen but we've found a bitmap we didn't load,
and it's IN_USE. Should we make any attempt to load it as an
inconsistent bitmap so the user can know about it?

If we're here, it means we DID reopen the image rw, so we ought to have
exclusive ownership of this file; the IN_USE flag here signals an
inconsistency, no?

>  
> -        bm->flags |= BME_FLAG_IN_USE;
> -        ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
> +            /*
> +             * It's either an inconsistent bitmap, or we are reopening rw -> rw,
> +             * or we just didn't save bitmap for some buggy reason. Still, no
> +             * reason to consider in-RAM bitmap inconsistent, than, mark it rw.
> +             */

I don't understand lines 2-3 of this comment. As far as I can tell:

- We might be seeing a legitimately corrupt bitmap. It's fine to mark it
as rw, because we can't write to it anyway. (It was marked inconsistent
on open.)
- We might be seeing a bitmap that's already properly rw. this call is
effectively a no-op.

Is that right? If that's true, I'd just simply say:

"This is either an inconsistent bitmap or we are reopening rw -> rw. It
is safe to mark it as not read only in either case."

What's the "or we just didn't save bitmap for some buggy reason" you are
alluding to?

> +            bdrv_dirty_bitmap_set_readonly(bitmap, false);
> +        } else {
> +            /*
> +             * In-image bitmap not marked IN_USE
> +             *
> +             * The only valid case is
> +             *     bitmap && bdrv_dirty_bitmap_readonly(bitmap) &&
> +             *        !bdrv_dirty_bitmap_inconsistent(bitmap))
> +             *
> +             * which means reopen ro -> rw of consistent bitmap.
> +             *
> +             * Other cases a different kinds of corruptions:
> +             */
> +            if (!bitmap) {
> +                corruption =
> +                    "Corruption: unexpected valid bitmap '%s' is found in the "
> +                    "image '%s' on reopen rw. Will try to set IN_USE flag.";
> +

In this case, we find a valid bitmap we expected to have a readonly copy
of in memory, but didn't. We attempt to load it.

> +                bitmap = load_bitmap(bs, bm, NULL);
> +                if (!bitmap) {
> +                    bitmap = bdrv_create_dirty_bitmap(
> +                            bs, bdrv_get_default_bitmap_granularity(bs),
> +                            bm->name, NULL);
> +                }
> +
> +                if (bitmap) {
> +                    bdrv_dirty_bitmap_set_persistence(bitmap, true);
> +                    bdrv_dirty_bitmap_set_readonly(bitmap, true);
> +                    bdrv_dirty_bitmap_set_inconsistent(bitmap);

And we mark it as inconsistent because we're not sure how we missed it
earlier. OK.

> +                }
> +            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
> +                corruption =
> +                    "Corruption: bitmap '%s' is not marked IN_USE in the "
> +                    "image '%s' and not marked readonly in RAM. Will try to "
> +                    "set IN_USE flag.";
> +

And in this case, we find the bitmap but it's not marked readonly for
some reason.

> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);

Why set it readonly again?

> +                bdrv_dirty_bitmap_set_inconsistent(bitmap);

And just in case, we mark it inconsistent. (It's my impression that any
such write would have failed earlier, but maybe not.)

> +            } else if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
> +                corruption =
> +                    "Corruption: bitmap '%s' is inconsistent but is not marked "
> +                    "IN_USE in the image. Will try to set IN_USE flag.";
> +
> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);

This one is weirder. We have an inconsistent bitmap but it's not IN_USE;
so we set it readonly again? Why?

> +            }
> +
> +            if (bitmap) {
> +                ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);

Oh, all those bitmaps we just set readonly we're going to mark as not
readonly again?

> +            }
> +            bm->flags |= BME_FLAG_IN_USE;
> +            need_update = true;
> +            if (corruption) {
> +                error_report(corruption, bm->name, bs->filename);
> +                corrupted_bitmaps = g_slist_append(corrupted_bitmaps, bm->name);
> +            }
> +        }
>      }
>  
> -    if (ro_dirty_bitmaps != NULL) {
> +    if (need_update) {
> +        if (!can_write(bs->file->bs)) {

I genuinely don't know: is it legitimate to check your child's write
permission in this way? will we always have bs->file->bs?

> +            error_report("Failed to reopen bitmaps rw: cannot write to "
> +                         "the protocol file");
> +            goto handle_corrupted;
> +        }
> +
>          /* in_use flags must be updated */
>          ret = update_ext_header_and_dir_in_place(bs, bm_list);
>          if (ret < 0) {
> -            error_setg_errno(errp, -ret, "Can't update bitmap directory");
> -            goto out;
> -        }
> -        if (header_updated != NULL) {
> -            *header_updated = true;
> +            error_report("Cannot update bitmap directory: %s", strerror(-ret));
> +            goto handle_corrupted;
>          }
> +        updated_ok = true;
>          g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
>      }

And then here at the end the bulk of the work: we re-write the header if
necessary back out to disk to mark everything as IN_USE.

It seems strange that you are trying to maintain the invariant that we
won't set readonly to false if we fail, because this function "cannot fail."

>  
> -out:
> +handle_corrupted:
> +    if (corrupted_bitmaps) {
> +        if (updated_ok) {
> +            error_report("Corrupted bitmaps in '%s' successfully marked "
> +                         "IN_USE", bs->filename);
> +        } else {
> +            error_report("Failed to mark IN_USE the following corrupted "
> +                         "bitmaps in '%s', DO NOT USE THEM:", bs->filename);
> +            g_slist_foreach(corrupted_bitmaps, error_report_helper, NULL);
> +        }
> +    }
> +
>      g_slist_free(ro_dirty_bitmaps);
> +    g_slist_free(corrupted_bitmaps);
>      bitmap_list_free(bm_list);
> -
> -    return ret;
> -}
> -

I just don't know; for how much error checking this function is doing
now, it seems wrong to be behind an interface that cannot fail and will
actually do different things to the bitmaps depending on what it sees
with no return code to help the caller understand the cases.

It even has a bit at the end where it tries to print in uppercase to
manually scare a user into taking action, which says to me that there is
a deeper problem we need to be able to address without intervention from
the user.

That said, the patch does seem correct otherwise; and it does fix a
nasty bug which lets us use bitmaps with snapshots. I want this in for
4.1 if I can help it. I will talk to Kevin and Max and see if there's
some opinion here.

Thanks!

> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
> -{
> -    return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
>  }
>  
>  /* Checks to see if it's safe to resize bitmaps */
> @@ -1446,7 +1527,8 @@ fail:
>      bitmap_list_free(bm_list);
>  }
>  
> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
> +                                          bool release_stored, Error **errp)
>  {
>      BdrvDirtyBitmap *bitmap;
>      BDRVQcow2State *s = bs->opaque;
> @@ -1559,20 +1641,23 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>          g_free(tb);
>      }
>  
> -    QSIMPLEQ_FOREACH(bm, bm_list, entry) {
> -        /* For safety, we remove bitmap after storing.
> -         * We may be here in two cases:
> -         * 1. bdrv_close. It's ok to drop bitmap.
> -         * 2. inactivation. It means migration without 'dirty-bitmaps'
> -         *    capability, so bitmaps are not marked with
> -         *    BdrvDirtyBitmap.migration flags. It's not bad to drop them too,
> -         *    and reload on invalidation.
> -         */
> -        if (bm->dirty_bitmap == NULL) {
> -            continue;
> -        }
> +    if (release_stored) {
> +        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
> +            /*
> +             * For safety, we remove bitmap after storing.
> +             * We may be here in two cases:
> +             * 1. bdrv_close. It's ok to drop bitmap.
> +             * 2. inactivation. It means migration without 'dirty-bitmaps'
> +             *    capability, so bitmaps are not marked with
> +             *    BdrvDirtyBitmap.migration flags. It's not bad to drop them
> +             *    too, and reload on invalidation.
> +             */

While we're here, I might touch up the comment.

For safety, the BdrvDirtyBitmap can be dropped after storing.
We may be here in two cases, both via qcow2_inactivate:
1. bdrv_close: It's correct to remove bitmaps on close.
2. migration: This implies we are migrating without the
   'dirty-bitmaps' capability, because bitmap->migration was unset.
   If needed, these bitmaps will be reloaded on invalidation.

I just wanted to clarify these points:

- The new boolean obviously means we don't /always/ drop them
- qcow2_inactivate is the only caller that instructs us to drop them
- both cases in the comment are through qcow2_inactivate.

you can take any or none of my wording suggestions as you feel is
appropriate.

> +            if (bm->dirty_bitmap == NULL) {
> +                continue;
> +            }
>  
> -        bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
> +            bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
> +        }
>      }
>  
>  success:
> @@ -1600,7 +1685,7 @@ int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
>      BdrvDirtyBitmap *bitmap;
>      Error *local_err = NULL;
>  
> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
> +    qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
>      if (local_err != NULL) {
>          error_propagate(errp, local_err);
>          return -EINVAL;
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d39882785d..f0a8479874 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2273,7 +2273,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>      int ret, result = 0;
>      Error *local_err = NULL;
>  
> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
> +    qcow2_store_persistent_dirty_bitmaps(bs, true, &local_err);
>      if (local_err != NULL) {
>          result = -EINVAL;
>          error_reportf_err(local_err, "Lost persistent bitmaps during "
> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
> index 36712689d3..e8b0c9d4bb 100755
> --- a/tests/qemu-iotests/255
> +++ b/tests/qemu-iotests/255
> @@ -79,3 +79,5 @@ def test(persistent, restart):
>  
>  
>  test(persistent=False, restart=False)
> +test(persistent=True, restart=False)
> +test(persistent=True, restart=True)
> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
> index 2bffb486d2..46e2e3ad4e 100644
> --- a/tests/qemu-iotests/255.out
> +++ b/tests/qemu-iotests/255.out
> @@ -15,3 +15,38 @@ check, that no bitmaps in snapshot: not found
>  {"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>  check merged bitmap: name=bitmap0 dirty-clusters=2
>  check updated bitmap: name=bitmap0 dirty-clusters=3
> +
> +Testcase persistent without restart
> +
> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
> +{"return": {}}
> +initial bitmap: name=bitmap0 dirty-clusters=1
> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +check, that no bitmaps in snapshot: not found
> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +check merged bitmap: name=bitmap0 dirty-clusters=2
> +check updated bitmap: name=bitmap0 dirty-clusters=3
> +
> +Testcase persistent with restart
> +
> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
> +{"return": {}}
> +initial bitmap: name=bitmap0 dirty-clusters=1
> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +check, that no bitmaps in snapshot: not found
> +... Restart ...
> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
> +{"return": {}}
> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +check merged bitmap: name=bitmap0 dirty-clusters=2
> +check updated bitmap: name=bitmap0 dirty-clusters=3
> 

-- 
—js


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-28 23:24   ` John Snow
@ 2019-05-29  9:10     ` Vladimir Sementsov-Ogievskiy
  2019-05-29 18:08       ` John Snow
  0 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-29  9:10 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: fam, kwolf, Denis Lunev, mreitz

29.05.2019 2:24, John Snow wrote:
> 
> 
> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Current logic
>> =============
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and release BdrvDirtyBitmap's.
>>
>> Reopen ro -> rw:
>>
>> Load bitmap list
>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>     the worst thing]
> 
> ...ah.
> 
>> A kind of fail with error message if we see not readonly bitmap
>>
>> This leads to at least the following bug:
>>
>> Assume we have bitmap0.
> 
> Presumably a normal, persistent bitmap.
> 
>> Create external snapshot
>>    bitmap0 is stored and therefore removed
> 
> Written out to the backing file and removed from memory, because we've
> reopened rw->ro; this is because of the migration "safety" clause where
> we simply drop these bitmaps.
> 
> ...Ah, and then we don't actually open them readonly again; that entire
> stanza in reopen_ro never fires off at all because the bitmaps are
> already gone.
> 
>> Commit snapshot
>>    now we have no bitmaps
> 
> When we reopen the base as rw as you note, we skipped bitmaps for which
> we had no in-memory bitmap for -- because the readonly logic was really
> expecting to have these in-memory.
> 
> I should probably say that for the sake of migration correctness, the
> way we flush things to disk and remove it from memory on write is really
> bothersome to keep correct. The migration logic is so particular that it
> keeps causing issues elsewhere, of which this is another symptom.
> 
>> Do some writes from guest (*)
>>    they are not marked in bitmap
> 
> Yikes, right.
> 
>> Shutdown
>> Start
>>    bitmap0 is loaded as valid, but it is actually broken! It misses
>>    writes (*)
> 
> Yikes; because it was consistent on flush and we skipped it on load,
> it's not marked as IN_USE and we are free to load it up again.
> 
>> Incremental backup
>>    it will be inconsistent
>>
> 
> Good writeup, thank you.
> 
>> New logic
>> =========
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and don't remove them from RAM, only mark them readonly
>> (the latter is already done, but didn't work properly because of
>> removing in storing function)
>>
> 
> Yes! I like this change.
> 
>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>> now in qcow2_reopen_bitmaps_rw)
>>
> 
> OK, so we allow rw --> rw without trying to be fussy about it. That
> seems fine to me.
> 
>> Load bitmap list
>>
>> Carefully consider all possible combinations of bitmaps in RAM and in
>> the image, try to fix corruptions, print corresponding error messages.
>>
>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>> commited, as otherwise we can't write to the qcow2 file child on reopen
>> ro -> rw.
>>
> 
> I take it this is the change that moved the invocation logic into
> bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
> longer check the transition edge which is much simpler.
> 
> oh, I see why you're doing it there now...
> 
>> Also, add corresponding test-cases to 255.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2.h              |   5 +-
>>   include/block/block_int.h  |   2 +-
>>   block.c                    |  29 ++----
>>   block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>>   block/qcow2.c              |   2 +-
>>   tests/qemu-iotests/255     |   2 +
>>   tests/qemu-iotests/255.out |  35 +++++++
>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 8d92ef1fee..5928306e62 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>                                                   Error **errp);
>>   int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>                                    Error **errp);
>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>>   int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>> +                                          bool release_stored, Error **errp);
>>   int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>>   bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>>                                         const char *name,
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 1eebc7c8f3..9a694f3da0 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -536,7 +536,7 @@ struct BlockDriver {
>>        * as rw. This handler should realize it. It also should unset readonly
>>        * field of BlockDirtyBitmap's in case of success.
>>        */
>> -    int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Error **errp);
>> +    void (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs);
>>       bool (*bdrv_can_store_new_dirty_bitmap)(BlockDriverState *bs,
>>                                               const char *name,
>>                                               uint32_t granularity,
>> diff --git a/block.c b/block.c
>> index cb11537029..db1ec0c673 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3334,6 +3334,16 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>>           bdrv_reopen_commit(&bs_entry->state);
>>       }
>>   
>> +    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
>> +        BlockDriverState *bs = bs_entry->state.bs;
>> +
>> +        if (!bdrv_is_writable(bs) || !bs->drv->bdrv_reopen_bitmaps_rw) {
>> +            continue;
>> +        }
>> +
>> +        bs->drv->bdrv_reopen_bitmaps_rw(bs);
>> +    }
>> +
> 
> OK, so this has been moved up into the main body of the loop because we
> cannot trust it to run in bdrv_reopen_commit because of the order in
> which the nodes are reopened might leave us unable to write to our child
> nodes to issue the IN_USE flag.
> 
> I have kept out of these discussions in the past; is there a general
> solution that allows us to sort the block DAG leaf-up to avoid this
> scenario?

In Virtuozzo I have a hacking patch, which reverse order of reopen queue
before commit for reopen-rw. But here is simpler solution.

I can't find the discussion now, but if I remember correctly Kevin didn't like
an idea of reverting the queue. He supposed some generic way, in which during
we have access to both states of reopening node..

> 
> Anyway, since the block graph organization isn't my problem I will say
> that I think this is fine by me; but I'm not the one to impress here.
> 
>>       ret = 0;
>>   cleanup_perm:
>>       QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
>> @@ -3770,16 +3780,12 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>       BlockDriver *drv;
>>       BlockDriverState *bs;
>>       BdrvChild *child;
>> -    bool old_can_write, new_can_write;
>>   
>>       assert(reopen_state != NULL);
>>       bs = reopen_state->bs;
>>       drv = bs->drv;
>>       assert(drv != NULL);
>>   
>> -    old_can_write =
>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>> -
>>       /* If there are any driver level actions to take */
>>       if (drv->bdrv_reopen_commit) {
>>           drv->bdrv_reopen_commit(reopen_state);
>> @@ -3823,21 +3829,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>       }
>>   
>>       bdrv_refresh_limits(bs, NULL);
>> -
>> -    new_can_write =
>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>> -    if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
>> -        Error *local_err = NULL;
>> -        if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
>> -            /* This is not fatal, bitmaps just left read-only, so all following
>> -             * writes will fail. User can remove read-only bitmaps to unblock
>> -             * writes.
>> -             */
>> -            error_reportf_err(local_err,
>> -                              "%s: Failed to make dirty bitmaps writable: ",
>> -                              bdrv_get_node_name(bs));
>> -        }
>> -    }
>>   }
>>   
> 
> Certainly the new code is simpler here, which is good.
> 
>>   /*
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 2b84bfa007..4e565db11f 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
>> @@ -28,6 +28,7 @@
>>   #include "qemu/osdep.h"
>>   #include "qapi/error.h"
>>   #include "qemu/cutils.h"
>> +#include "qemu/error-report.h"
>>   
>>   #include "block/block_int.h"
>>   #include "qcow2.h"
>> @@ -951,6 +952,12 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
>>       bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
>>   }
>>   
>> +/* for g_slist_foreach for GSList of const char* elements */
>> +static void error_report_helper(gpointer message, gpointer _unused)
>> +{
>> +    error_report("%s", (const char *)message);
>> +}
>> +
>>   /* qcow2_load_dirty_bitmaps()
>>    * Return value is a hint for caller: true means that the Qcow2 header was
>>    * updated. (false doesn't mean that the header should be updated by the
>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>       return list;
>>   }
>>   
>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>> -                                 Error **errp)
>> +/*
>> + * qcow2_reopen_bitmaps_rw
>> + *
>> + * The function is called in bdrv_reopen_multiple after all calls to
>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>> + * write access to child bs, and with current reopening architecture, when
>> + * reopen ro -> rw it is possible only after all reopen commits.
>> + *
>> + * So, we can't fail here. On the other hand, we may face different kinds of
>> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
>> + *
>> + * Try to handle as many cases as we can.
> 
> Hm, I think you're right, but it does make me really uncomfortable that
> we lose the ability to report errors back up the stack. I guess we
> already always did ignore them, so this is no worse, but I don't like
> the idea of adding new error_report_err calls if we can help it.
> 
> I guess we can't help it, though.

It's possible to return error and print it in bdrv_reopen_multiple instead, like
it was pre-patch, but I have two reasons against:

1. I wanted to stress that interface is for calling from commit code, and cannot fail.

2. I can't implement here clean logic like SUCCESS or (ERROR, nothing changed), as some
errors, are critical, some are not, we are trying to fix corruptions, so here is difficult
failure-handling logic, so, it may be simpler to keep it all here, not reporting an error,
which can't be handled in commit code anyway.

> 
>> + */
>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>>       Qcow2BitmapList *bm_list;
>>       Qcow2Bitmap *bm;
>> -    GSList *ro_dirty_bitmaps = NULL;
>> +    GSList *ro_dirty_bitmaps = NULL, *corrupted_bitmaps = NULL;
>> +    Error *local_err = NULL;
>>       int ret = 0;
>> -
>> -    if (header_updated != NULL) {
>> -        *header_updated = false;
>> -    }
>> +    bool need_update = false, updated_ok = false;
>>   
>>       if (s->nb_bitmaps == 0) {
>>           /* No bitmaps - nothing to do */
>> -        return 0;
>> -    }
>> -
>> -    if (!can_write(bs)) {
>> -        error_setg(errp, "Can't write to the image on reopening bitmaps rw");
>> -        return -EINVAL;
>> +        return;
>>       }
>>   
>>       bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
>> -                               s->bitmap_directory_size, errp);
>> +                               s->bitmap_directory_size, &local_err);
>>       if (bm_list == NULL) {
>> -        return -EINVAL;
>> +        error_reportf_err(local_err, "Failed to reopen bitmaps rw: "
>> +                          "cannot load bitmap list: ");
>> +        return;
>>       }
>>   
>>       QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>           BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(bs, bm->name);
>> -        if (bitmap == NULL) {
>> -            continue;
>> -        }
>> +        const char *corruption = NULL;
>>   
>> -        if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>> -            error_setg(errp, "Bitmap %s was loaded prior to rw-reopen, but was "
>> -                       "not marked as readonly. This is a bug, something went "
>> -                       "wrong. All of the bitmaps may be corrupted", bm->name);
>> -            ret = -EINVAL;
>> -            goto out;
>> -        }
>> +        if (bm->flags & BME_FLAG_IN_USE) {
>> +            if (bitmap == NULL) {
>> +                /*
>> +                 * It's an unexpected inconsistent bitmap,
>> +                 * it is safe to ignore it.
>> +                 */
>> +                continue;
>> +            }
> 
> This is supposed to be a reopen but we've found a bitmap we didn't load,
> and it's IN_USE. Should we make any attempt to load it as an
> inconsistent bitmap so the user can know about it?

I thought about this and decided to keep it simpler.. But on the other hand,
why not? So, I don't have strict opinion on this, I can add this logic.

> 
> If we're here, it means we DID reopen the image rw, so we ought to have
> exclusive ownership of this file; the IN_USE flag here signals an
> inconsistency, no?
> 
>>   
>> -        bm->flags |= BME_FLAG_IN_USE;
>> -        ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
>> +            /*
>> +             * It's either an inconsistent bitmap, or we are reopening rw -> rw,
>> +             * or we just didn't save bitmap for some buggy reason. Still, no
>> +             * reason to consider in-RAM bitmap inconsistent, than, mark it rw.
>> +             */
> 
> I don't understand lines 2-3 of this comment. As far as I can tell:
> 
> - We might be seeing a legitimately corrupt bitmap. It's fine to mark it
> as rw, because we can't write to it anyway. (It was marked inconsistent
> on open.)
> - We might be seeing a bitmap that's already properly rw. this call is
> effectively a no-op.
> 
> Is that right? If that's true, I'd just simply say:
> 
> "This is either an inconsistent bitmap or we are reopening rw -> rw. It
> is safe to mark it as not read only in either case."
> 
> What's the "or we just didn't save bitmap for some buggy reason" you are
> alluding to?

I mean, for example, on previous reopen to ro, we failed to store the bitmap and
therefore failed to drop IN_USE flag. So, we see it now.

> 
>> +            bdrv_dirty_bitmap_set_readonly(bitmap, false);
>> +        } else {
>> +            /*
>> +             * In-image bitmap not marked IN_USE
>> +             *
>> +             * The only valid case is
>> +             *     bitmap && bdrv_dirty_bitmap_readonly(bitmap) &&
>> +             *        !bdrv_dirty_bitmap_inconsistent(bitmap))
>> +             *
>> +             * which means reopen ro -> rw of consistent bitmap.
>> +             *
>> +             * Other cases a different kinds of corruptions:
>> +             */
>> +            if (!bitmap) {
>> +                corruption =
>> +                    "Corruption: unexpected valid bitmap '%s' is found in the "
>> +                    "image '%s' on reopen rw. Will try to set IN_USE flag.";
>> +
> 
> In this case, we find a valid bitmap we expected to have a readonly copy
> of in memory, but didn't. We attempt to load it.
> 
>> +                bitmap = load_bitmap(bs, bm, NULL);
>> +                if (!bitmap) {
>> +                    bitmap = bdrv_create_dirty_bitmap(
>> +                            bs, bdrv_get_default_bitmap_granularity(bs),
>> +                            bm->name, NULL);
>> +                }
>> +
>> +                if (bitmap) {
>> +                    bdrv_dirty_bitmap_set_persistence(bitmap, true);
>> +                    bdrv_dirty_bitmap_set_readonly(bitmap, true);
>> +                    bdrv_dirty_bitmap_set_inconsistent(bitmap);
> 
> And we mark it as inconsistent because we're not sure how we missed it
> earlier. OK.
> 
>> +                }
>> +            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>> +                corruption =
>> +                    "Corruption: bitmap '%s' is not marked IN_USE in the "
>> +                    "image '%s' and not marked readonly in RAM. Will try to "
>> +                    "set IN_USE flag.";
>> +
> 
> And in this case, we find the bitmap but it's not marked readonly for
> some reason.
> 
>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
> 
> Why set it readonly again?

It is because inconsistance is not synced to the image. "readonly" exactly
means, that for some reasons we did not marked bitmap IN_USE in the image and
therefore must not write to it.

So, yes, here occurs new thing: readonly-inconsistent bitmap. It blocks guest
writes until we sync it somehow to the image or remove. And we are going to sync
it at the end of this function.

> 
>> +                bdrv_dirty_bitmap_set_inconsistent(bitmap);
> 
> And just in case, we mark it inconsistent. (It's my impression that any
> such write would have failed earlier, but maybe not.)

hm, which write?

> 
>> +            } else if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
>> +                corruption =
>> +                    "Corruption: bitmap '%s' is inconsistent but is not marked "
>> +                    "IN_USE in the image. Will try to set IN_USE flag.";
>> +
>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
> 
> This one is weirder. We have an inconsistent bitmap but it's not IN_USE;
> so we set it readonly again? Why?

Same here, as "inconsitance" is not synced to the image. We'll drop readonly flag if
we sync it successfully.

> 
>> +            }
>> +
>> +            if (bitmap) {
>> +                ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
> 
> Oh, all those bitmaps we just set readonly we're going to mark as not
> readonly again?

yes, if successfully sync IN_USE flags

> 
>> +            }
>> +            bm->flags |= BME_FLAG_IN_USE;
>> +            need_update = true;
>> +            if (corruption) {
>> +                error_report(corruption, bm->name, bs->filename);
>> +                corrupted_bitmaps = g_slist_append(corrupted_bitmaps, bm->name);
>> +            }
>> +        }
>>       }
>>   
>> -    if (ro_dirty_bitmaps != NULL) {
>> +    if (need_update) {
>> +        if (!can_write(bs->file->bs)) {
> 
> I genuinely don't know: is it legitimate to check your child's write
> permission in this way? will we always have bs->file->bs?

Hmm.. but we are going to write to it very soon, I think it should exist.

> 
>> +            error_report("Failed to reopen bitmaps rw: cannot write to "
>> +                         "the protocol file");
>> +            goto handle_corrupted;
>> +        }
>> +
>>           /* in_use flags must be updated */
>>           ret = update_ext_header_and_dir_in_place(bs, bm_list);
>>           if (ret < 0) {
>> -            error_setg_errno(errp, -ret, "Can't update bitmap directory");
>> -            goto out;
>> -        }
>> -        if (header_updated != NULL) {
>> -            *header_updated = true;
>> +            error_report("Cannot update bitmap directory: %s", strerror(-ret));
>> +            goto handle_corrupted;
>>           }
>> +        updated_ok = true;
>>           g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
>>       }
> 
> And then here at the end the bulk of the work: we re-write the header if
> necessary back out to disk to mark everything as IN_USE.
> 
> It seems strange that you are trying to maintain the invariant that we
> won't set readonly to false if we fail, because this function "cannot fail."

Yes, if we fail, bitmaps remains readonly, and it is preexisting behavior. User
can try reopen again or try to remove bitmaps.. It shouldn't happen often in
practice, but we must do everything we can to prevent appearing in the image
invalid bitmap not marked IN_USE, as it will lead to inconsistent backup.

Hmm, of-course, best thing is to do it all in commit-prepare, where we can fail
legally. But then we'll need write-access to child already in prepare, when now
we don't have it even in commit, I don't know how to achieve it.

> 
>>   
>> -out:
>> +handle_corrupted:
>> +    if (corrupted_bitmaps) {
>> +        if (updated_ok) {
>> +            error_report("Corrupted bitmaps in '%s' successfully marked "
>> +                         "IN_USE", bs->filename);
>> +        } else {
>> +            error_report("Failed to mark IN_USE the following corrupted "
>> +                         "bitmaps in '%s', DO NOT USE THEM:", bs->filename);
>> +            g_slist_foreach(corrupted_bitmaps, error_report_helper, NULL);
>> +        }
>> +    }
>> +
>>       g_slist_free(ro_dirty_bitmaps);
>> +    g_slist_free(corrupted_bitmaps);
>>       bitmap_list_free(bm_list);
>> -
>> -    return ret;
>> -}
>> -
> 
> I just don't know; for how much error checking this function is doing
> now, it seems wrong to be behind an interface that cannot fail and will
> actually do different things to the bitmaps depending on what it sees
> with no return code to help the caller understand the cases.
> 
> It even has a bit at the end where it tries to print in uppercase to
> manually scare a user into taking action, which says to me that there is
> a deeper problem we need to be able to address without intervention from
> the user.
> 
> That said, the patch does seem correct otherwise; and it does fix a
> nasty bug which lets us use bitmaps with snapshots. I want this in for
> 4.1 if I can help it. I will talk to Kevin and Max and see if there's
> some opinion here.

Yes. In short, it was bad, it still bad, but at least one bug is fixed :)

> 
> Thanks!
> 
>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
>> -{
>> -    return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
>>   }
>>   
>>   /* Checks to see if it's safe to resize bitmaps */
>> @@ -1446,7 +1527,8 @@ fail:
>>       bitmap_list_free(bm_list);
>>   }
>>   
>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>> +                                          bool release_stored, Error **errp)
>>   {
>>       BdrvDirtyBitmap *bitmap;
>>       BDRVQcow2State *s = bs->opaque;
>> @@ -1559,20 +1641,23 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>>           g_free(tb);
>>       }
>>   
>> -    QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>> -        /* For safety, we remove bitmap after storing.
>> -         * We may be here in two cases:
>> -         * 1. bdrv_close. It's ok to drop bitmap.
>> -         * 2. inactivation. It means migration without 'dirty-bitmaps'
>> -         *    capability, so bitmaps are not marked with
>> -         *    BdrvDirtyBitmap.migration flags. It's not bad to drop them too,
>> -         *    and reload on invalidation.
>> -         */
>> -        if (bm->dirty_bitmap == NULL) {
>> -            continue;
>> -        }
>> +    if (release_stored) {
>> +        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>> +            /*
>> +             * For safety, we remove bitmap after storing.
>> +             * We may be here in two cases:
>> +             * 1. bdrv_close. It's ok to drop bitmap.
>> +             * 2. inactivation. It means migration without 'dirty-bitmaps'
>> +             *    capability, so bitmaps are not marked with
>> +             *    BdrvDirtyBitmap.migration flags. It's not bad to drop them
>> +             *    too, and reload on invalidation.
>> +             */
> 
> While we're here, I might touch up the comment.
> 
> For safety, the BdrvDirtyBitmap can be dropped after storing.
> We may be here in two cases, both via qcow2_inactivate:
> 1. bdrv_close: It's correct to remove bitmaps on close.
> 2. migration: This implies we are migrating without the
>     'dirty-bitmaps' capability, because bitmap->migration was unset.

I'm not sure in my understanding of English "because", but bitmap->migration
is a consequence, not a reason of not enabling this capability.

>     If needed, these bitmaps will be reloaded on invalidation.
> 
> I just wanted to clarify these points:
> 
> - The new boolean obviously means we don't /always/ drop them
> - qcow2_inactivate is the only caller that instructs us to drop them
> - both cases in the comment are through qcow2_inactivate.

Agree.

> 
> you can take any or none of my wording suggestions as you feel is
> appropriate.

Thank you! And for reviewing this complicated patch! I know, two words "Also"
in commit message are never followed by good clean design, and this is not an
exclusion.

> 
>> +            if (bm->dirty_bitmap == NULL) {
>> +                continue;
>> +            }
>>   
>> -        bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>> +            bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>> +        }
>>       }
>>   
>>   success:
>> @@ -1600,7 +1685,7 @@ int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
>>       BdrvDirtyBitmap *bitmap;
>>       Error *local_err = NULL;
>>   
>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>> +    qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
>>       if (local_err != NULL) {
>>           error_propagate(errp, local_err);
>>           return -EINVAL;
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index d39882785d..f0a8479874 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2273,7 +2273,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>>       int ret, result = 0;
>>       Error *local_err = NULL;
>>   
>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>> +    qcow2_store_persistent_dirty_bitmaps(bs, true, &local_err);
>>       if (local_err != NULL) {
>>           result = -EINVAL;
>>           error_reportf_err(local_err, "Lost persistent bitmaps during "
>> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
>> index 36712689d3..e8b0c9d4bb 100755
>> --- a/tests/qemu-iotests/255
>> +++ b/tests/qemu-iotests/255
>> @@ -79,3 +79,5 @@ def test(persistent, restart):
>>   
>>   
>>   test(persistent=False, restart=False)
>> +test(persistent=True, restart=False)
>> +test(persistent=True, restart=True)
>> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
>> index 2bffb486d2..46e2e3ad4e 100644
>> --- a/tests/qemu-iotests/255.out
>> +++ b/tests/qemu-iotests/255.out
>> @@ -15,3 +15,38 @@ check, that no bitmaps in snapshot: not found
>>   {"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>   check merged bitmap: name=bitmap0 dirty-clusters=2
>>   check updated bitmap: name=bitmap0 dirty-clusters=3
>> +
>> +Testcase persistent without restart
>> +
>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>> +{"return": {}}
>> +initial bitmap: name=bitmap0 dirty-clusters=1
>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +check, that no bitmaps in snapshot: not found
>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>> +
>> +Testcase persistent with restart
>> +
>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>> +{"return": {}}
>> +initial bitmap: name=bitmap0 dirty-clusters=1
>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +check, that no bitmaps in snapshot: not found
>> +... Restart ...
>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>> +{"return": {}}
>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>>
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-23 15:47 ` [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic Vladimir Sementsov-Ogievskiy
  2019-05-28 23:24   ` John Snow
@ 2019-05-29 15:33   ` Max Reitz
  2019-05-29 15:58     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Max Reitz @ 2019-05-29 15:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, jsnow, Alberto Garcia, den

[-- Attachment #1: Type: text/plain, Size: 5052 bytes --]

On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
> Current logic
> =============
> 
> Reopen rw -> ro:
> 
> Store bitmaps and release BdrvDirtyBitmap's.
> 
> Reopen ro -> rw:
> 
> Load bitmap list
> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>    the worst thing]
> A kind of fail with error message if we see not readonly bitmap
> 
> This leads to at least the following bug:
> 
> Assume we have bitmap0.
> Create external snapshot
>   bitmap0 is stored and therefore removed
> Commit snapshot
>   now we have no bitmaps
> Do some writes from guest (*)
>   they are not marked in bitmap
> Shutdown
> Start
>   bitmap0 is loaded as valid, but it is actually broken! It misses
>   writes (*)
> Incremental backup
>   it will be inconsistent
> 
> New logic
> =========
> 
> Reopen rw -> ro:
> 
> Store bitmaps and don't remove them from RAM, only mark them readonly
> (the latter is already done, but didn't work properly because of
> removing in storing function)
> 
> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
> now in qcow2_reopen_bitmaps_rw)
> 
> Load bitmap list
> 
> Carefully consider all possible combinations of bitmaps in RAM and in
> the image, try to fix corruptions, print corresponding error messages.
> 
> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
> commited, as otherwise we can't write to the qcow2 file child on reopen
> ro -> rw.
> 
> Also, add corresponding test-cases to 255.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.h              |   5 +-
>  include/block/block_int.h  |   2 +-
>  block.c                    |  29 ++----
>  block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>  block/qcow2.c              |   2 +-
>  tests/qemu-iotests/255     |   2 +
>  tests/qemu-iotests/255.out |  35 +++++++
>  7 files changed, 193 insertions(+), 79 deletions(-)

[...]

> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index 2b84bfa007..4e565db11f 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c

[...]

> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>      return list;
>  }
>  
> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
> -                                 Error **errp)
> +/*
> + * qcow2_reopen_bitmaps_rw
> + *
> + * The function is called in bdrv_reopen_multiple after all calls to
> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
> + * write access to child bs, and with current reopening architecture, when
> + * reopen ro -> rw it is possible only after all reopen commits.
> + *
> + * So, we can't fail here.

Why?  Because the current design forbids it?

Then the current design seems flawed to me.

[CC-ing Berto]

Without having looked too close into this, it seems from your commit
message that it is possible to run into unrecoverable fatal errors here.
 We cannot just ignore that on the basis that the current design cannot
deal with that.

It just appears wrong to me to update any flags in the image in the
.commit() part of a transaction.  We should try to do so in .prepare().
 If it works, great, we’re set for .commit().  If it doesn’t, welp, time
for .abort() and pretend the image is still read-only (even though it
kind of may be half-prepared for writing).

If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.

After some chatting with John, I’ve come to the following belief:

When switching a node from RO to R/W, it must be able to write to its
children in .prepare() -- precisely because performing this switch may
need some metadata change.  If we cannot do this change, then we cannot
switch the node to R/W.  So it’s clear to me that this needs to be done
in .prepare().

So I think a node’s children must be R/W before we can .prepare() the
node.  That means we need to .commit() them.  That means we cannot have
a single transaction that switches a whole tree to be R/W, but it must
consist of one transaction per level.

If something fails, we need to roll back all the previous layers.  Well,
it depends.

If we switch to R/W (and something fails), then we need to try to revert
the children we have already made R/W to be RO.  If that doesn’t work,
welp, too bad, but not fatal.

Switching to RO goes the other way around (parents to children), so if
we encounter an error there, we cannot make that node’s children RO,
too.  We could try to revert the whole change and make the whole tree
R/W again, or we just don’t.  I think “just don’t” is reasonable.

Max

>     On the other hand, we may face different kinds of
> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
> + *
> + * Try to handle as many cases as we can.
> + */
> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-29 15:33   ` Max Reitz
@ 2019-05-29 15:58     ` Vladimir Sementsov-Ogievskiy
  2019-05-29 18:18       ` Max Reitz
  0 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-29 15:58 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block
  Cc: fam, kwolf, jsnow, Denis Lunev, Alberto Garcia

29.05.2019 18:33, Max Reitz wrote:
> On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>> Current logic
>> =============
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and release BdrvDirtyBitmap's.
>>
>> Reopen ro -> rw:
>>
>> Load bitmap list
>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>     the worst thing]
>> A kind of fail with error message if we see not readonly bitmap
>>
>> This leads to at least the following bug:
>>
>> Assume we have bitmap0.
>> Create external snapshot
>>    bitmap0 is stored and therefore removed
>> Commit snapshot
>>    now we have no bitmaps
>> Do some writes from guest (*)
>>    they are not marked in bitmap
>> Shutdown
>> Start
>>    bitmap0 is loaded as valid, but it is actually broken! It misses
>>    writes (*)
>> Incremental backup
>>    it will be inconsistent
>>
>> New logic
>> =========
>>
>> Reopen rw -> ro:
>>
>> Store bitmaps and don't remove them from RAM, only mark them readonly
>> (the latter is already done, but didn't work properly because of
>> removing in storing function)
>>
>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>> now in qcow2_reopen_bitmaps_rw)
>>
>> Load bitmap list
>>
>> Carefully consider all possible combinations of bitmaps in RAM and in
>> the image, try to fix corruptions, print corresponding error messages.
>>
>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>> commited, as otherwise we can't write to the qcow2 file child on reopen
>> ro -> rw.
>>
>> Also, add corresponding test-cases to 255.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2.h              |   5 +-
>>   include/block/block_int.h  |   2 +-
>>   block.c                    |  29 ++----
>>   block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>>   block/qcow2.c              |   2 +-
>>   tests/qemu-iotests/255     |   2 +
>>   tests/qemu-iotests/255.out |  35 +++++++
>>   7 files changed, 193 insertions(+), 79 deletions(-)
> 
> [...]
> 
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 2b84bfa007..4e565db11f 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
> 
> [...]
> 
>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>       return list;
>>   }
>>   
>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>> -                                 Error **errp)
>> +/*
>> + * qcow2_reopen_bitmaps_rw
>> + *
>> + * The function is called in bdrv_reopen_multiple after all calls to
>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>> + * write access to child bs, and with current reopening architecture, when
>> + * reopen ro -> rw it is possible only after all reopen commits.
>> + *
>> + * So, we can't fail here.
> 
> Why?  Because the current design forbids it?
> 
> Then the current design seems flawed to me.
> 
> [CC-ing Berto]
> 
> Without having looked too close into this, it seems from your commit
> message that it is possible to run into unrecoverable fatal errors here.
>   We cannot just ignore that on the basis that the current design cannot
> deal with that.
> 
> It just appears wrong to me to update any flags in the image in the
> .commit() part of a transaction.  We should try to do so in .prepare().
>   If it works, great, we’re set for .commit().  If it doesn’t, welp, time
> for .abort() and pretend the image is still read-only (even though it
> kind of may be half-prepared for writing).
> 
> If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.
> 
> After some chatting with John, I’ve come to the following belief:
> 
> When switching a node from RO to R/W, it must be able to write to its
> children in .prepare() -- precisely because performing this switch may
> need some metadata change.  If we cannot do this change, then we cannot
> switch the node to R/W.  So it’s clear to me that this needs to be done
> in .prepare().
> 
> So I think a node’s children must be R/W before we can .prepare() the
> node.  That means we need to .commit() them.  That means we cannot have
> a single transaction that switches a whole tree to be R/W, but it must
> consist of one transaction per level.
> 
> If something fails, we need to roll back all the previous layers.  Well,
> it depends.
> 
> If we switch to R/W (and something fails), then we need to try to revert
> the children we have already made R/W to be RO.  If that doesn’t work,
> welp, too bad, but not fatal.

And, than, why not do full reopen rw in .prepare, and just organize
recursion so that children reopened rw before parent? Than we again have
one transaction for the tree, but abort may fail to rollback it in worst case.
But we cant avoid it anyway, with one transaction or with transactions per level..

So, just move all code from .commit to .prepare for reopen-rw and try to roll-back
it in .abort?

Or do you have an idea for a patch?

> 
> Switching to RO goes the other way around (parents to children), so if
> we encounter an error there, we cannot make that node’s children RO,
> too.  We could try to revert the whole change and make the whole tree
> R/W again, or we just don’t.  I think “just don’t” is reasonable.
> 
> Max
> 
>>      On the other hand, we may face different kinds of
>> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
>> + *
>> + * Try to handle as many cases as we can.
>> + */
>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-29  9:10     ` Vladimir Sementsov-Ogievskiy
@ 2019-05-29 18:08       ` John Snow
  2019-05-30  8:23         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 17+ messages in thread
From: John Snow @ 2019-05-29 18:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, Denis Lunev, mreitz

Max has picked this thread up for block discussion, so I'm going to
stick to slightly more bitmap related discussion here; we'll resume
block discussion in the other tail of this thread.

On 5/29/19 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 2:24, John Snow wrote:
>>
>>
>> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Current logic
>>> =============
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and release BdrvDirtyBitmap's.
>>>
>>> Reopen ro -> rw:
>>>
>>> Load bitmap list
>>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>>     the worst thing]
>>
>> ...ah.
>>
>>> A kind of fail with error message if we see not readonly bitmap
>>>
>>> This leads to at least the following bug:
>>>
>>> Assume we have bitmap0.
>>
>> Presumably a normal, persistent bitmap.
>>
>>> Create external snapshot
>>>    bitmap0 is stored and therefore removed
>>
>> Written out to the backing file and removed from memory, because we've
>> reopened rw->ro; this is because of the migration "safety" clause where
>> we simply drop these bitmaps.
>>
>> ...Ah, and then we don't actually open them readonly again; that entire
>> stanza in reopen_ro never fires off at all because the bitmaps are
>> already gone.
>>
>>> Commit snapshot
>>>    now we have no bitmaps
>>
>> When we reopen the base as rw as you note, we skipped bitmaps for which
>> we had no in-memory bitmap for -- because the readonly logic was really
>> expecting to have these in-memory.
>>
>> I should probably say that for the sake of migration correctness, the
>> way we flush things to disk and remove it from memory on write is really
>> bothersome to keep correct. The migration logic is so particular that it
>> keeps causing issues elsewhere, of which this is another symptom.
>>
>>> Do some writes from guest (*)
>>>    they are not marked in bitmap
>>
>> Yikes, right.
>>
>>> Shutdown
>>> Start
>>>    bitmap0 is loaded as valid, but it is actually broken! It misses
>>>    writes (*)
>>
>> Yikes; because it was consistent on flush and we skipped it on load,
>> it's not marked as IN_USE and we are free to load it up again.
>>
>>> Incremental backup
>>>    it will be inconsistent
>>>
>>
>> Good writeup, thank you.
>>
>>> New logic
>>> =========
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and don't remove them from RAM, only mark them readonly
>>> (the latter is already done, but didn't work properly because of
>>> removing in storing function)
>>>
>>
>> Yes! I like this change.
>>
>>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>>> now in qcow2_reopen_bitmaps_rw)
>>>
>>
>> OK, so we allow rw --> rw without trying to be fussy about it. That
>> seems fine to me.
>>
>>> Load bitmap list
>>>
>>> Carefully consider all possible combinations of bitmaps in RAM and in
>>> the image, try to fix corruptions, print corresponding error messages.
>>>
>>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>>> commited, as otherwise we can't write to the qcow2 file child on reopen
>>> ro -> rw.
>>>
>>
>> I take it this is the change that moved the invocation logic into
>> bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
>> longer check the transition edge which is much simpler.
>>
>> oh, I see why you're doing it there now...
>>
>>> Also, add corresponding test-cases to 255.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2.h              |   5 +-
>>>   include/block/block_int.h  |   2 +-
>>>   block.c                    |  29 ++----
>>>   block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>>>   block/qcow2.c              |   2 +-
>>>   tests/qemu-iotests/255     |   2 +
>>>   tests/qemu-iotests/255.out |  35 +++++++
>>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 8d92ef1fee..5928306e62 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>                                                   Error **errp);
>>>   int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>>                                    Error **errp);
>>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>>>   int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
>>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>> +                                          bool release_stored, Error **errp);
>>>   int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>>>   bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>>>                                         const char *name,
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 1eebc7c8f3..9a694f3da0 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -536,7 +536,7 @@ struct BlockDriver {
>>>        * as rw. This handler should realize it. It also should unset readonly
>>>        * field of BlockDirtyBitmap's in case of success.
>>>        */
>>> -    int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Error **errp);
>>> +    void (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs);
>>>       bool (*bdrv_can_store_new_dirty_bitmap)(BlockDriverState *bs,
>>>                                               const char *name,
>>>                                               uint32_t granularity,
>>> diff --git a/block.c b/block.c
>>> index cb11537029..db1ec0c673 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3334,6 +3334,16 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>>>           bdrv_reopen_commit(&bs_entry->state);
>>>       }
>>>   
>>> +    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
>>> +        BlockDriverState *bs = bs_entry->state.bs;
>>> +
>>> +        if (!bdrv_is_writable(bs) || !bs->drv->bdrv_reopen_bitmaps_rw) {
>>> +            continue;
>>> +        }
>>> +
>>> +        bs->drv->bdrv_reopen_bitmaps_rw(bs);
>>> +    }
>>> +
>>
>> OK, so this has been moved up into the main body of the loop because we
>> cannot trust it to run in bdrv_reopen_commit because of the order in
>> which the nodes are reopened might leave us unable to write to our child
>> nodes to issue the IN_USE flag.
>>
>> I have kept out of these discussions in the past; is there a general
>> solution that allows us to sort the block DAG leaf-up to avoid this
>> scenario?
> 
> In Virtuozzo I have a hacking patch, which reverse order of reopen queue
> before commit for reopen-rw. But here is simpler solution.
> 
> I can't find the discussion now, but if I remember correctly Kevin didn't like
> an idea of reverting the queue. He supposed some generic way, in which during
> we have access to both states of reopening node..
> 
>>
>> Anyway, since the block graph organization isn't my problem I will say
>> that I think this is fine by me; but I'm not the one to impress here.
>>
>>>       ret = 0;
>>>   cleanup_perm:
>>>       QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
>>> @@ -3770,16 +3780,12 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>>       BlockDriver *drv;
>>>       BlockDriverState *bs;
>>>       BdrvChild *child;
>>> -    bool old_can_write, new_can_write;
>>>   
>>>       assert(reopen_state != NULL);
>>>       bs = reopen_state->bs;
>>>       drv = bs->drv;
>>>       assert(drv != NULL);
>>>   
>>> -    old_can_write =
>>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>>> -
>>>       /* If there are any driver level actions to take */
>>>       if (drv->bdrv_reopen_commit) {
>>>           drv->bdrv_reopen_commit(reopen_state);
>>> @@ -3823,21 +3829,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>>       }
>>>   
>>>       bdrv_refresh_limits(bs, NULL);
>>> -
>>> -    new_can_write =
>>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>>> -    if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
>>> -        Error *local_err = NULL;
>>> -        if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
>>> -            /* This is not fatal, bitmaps just left read-only, so all following
>>> -             * writes will fail. User can remove read-only bitmaps to unblock
>>> -             * writes.
>>> -             */
>>> -            error_reportf_err(local_err,
>>> -                              "%s: Failed to make dirty bitmaps writable: ",
>>> -                              bdrv_get_node_name(bs));
>>> -        }
>>> -    }
>>>   }
>>>   
>>
>> Certainly the new code is simpler here, which is good.
>>
>>>   /*
>>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>>> index 2b84bfa007..4e565db11f 100644
>>> --- a/block/qcow2-bitmap.c
>>> +++ b/block/qcow2-bitmap.c
>>> @@ -28,6 +28,7 @@
>>>   #include "qemu/osdep.h"
>>>   #include "qapi/error.h"
>>>   #include "qemu/cutils.h"
>>> +#include "qemu/error-report.h"
>>>   
>>>   #include "block/block_int.h"
>>>   #include "qcow2.h"
>>> @@ -951,6 +952,12 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
>>>       bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
>>>   }
>>>   
>>> +/* for g_slist_foreach for GSList of const char* elements */
>>> +static void error_report_helper(gpointer message, gpointer _unused)
>>> +{
>>> +    error_report("%s", (const char *)message);
>>> +}
>>> +
>>>   /* qcow2_load_dirty_bitmaps()
>>>    * Return value is a hint for caller: true means that the Qcow2 header was
>>>    * updated. (false doesn't mean that the header should be updated by the
>>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>       return list;
>>>   }
>>>   
>>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>> -                                 Error **errp)
>>> +/*
>>> + * qcow2_reopen_bitmaps_rw
>>> + *
>>> + * The function is called in bdrv_reopen_multiple after all calls to
>>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>>> + * write access to child bs, and with current reopening architecture, when
>>> + * reopen ro -> rw it is possible only after all reopen commits.
>>> + *
>>> + * So, we can't fail here. On the other hand, we may face different kinds of
>>> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
>>> + *
>>> + * Try to handle as many cases as we can.
>>
>> Hm, I think you're right, but it does make me really uncomfortable that
>> we lose the ability to report errors back up the stack. I guess we
>> already always did ignore them, so this is no worse, but I don't like
>> the idea of adding new error_report_err calls if we can help it.
>>
>> I guess we can't help it, though.
> 
> It's possible to return error and print it in bdrv_reopen_multiple instead, like
> it was pre-patch, but I have two reasons against:
> 
> 1. I wanted to stress that interface is for calling from commit code, and cannot fail.
> 
> 2. I can't implement here clean logic like SUCCESS or (ERROR, nothing changed), as some
> errors, are critical, some are not, we are trying to fix corruptions, so here is difficult
> failure-handling logic, so, it may be simpler to keep it all here, not reporting an error,
> which can't be handled in commit code anyway.
> 
>>
>>> + */
>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
>>>   {
>>>       BDRVQcow2State *s = bs->opaque;
>>>       Qcow2BitmapList *bm_list;
>>>       Qcow2Bitmap *bm;
>>> -    GSList *ro_dirty_bitmaps = NULL;
>>> +    GSList *ro_dirty_bitmaps = NULL, *corrupted_bitmaps = NULL;
>>> +    Error *local_err = NULL;
>>>       int ret = 0;
>>> -
>>> -    if (header_updated != NULL) {
>>> -        *header_updated = false;
>>> -    }
>>> +    bool need_update = false, updated_ok = false;
>>>   
>>>       if (s->nb_bitmaps == 0) {
>>>           /* No bitmaps - nothing to do */
>>> -        return 0;
>>> -    }
>>> -
>>> -    if (!can_write(bs)) {
>>> -        error_setg(errp, "Can't write to the image on reopening bitmaps rw");
>>> -        return -EINVAL;
>>> +        return;
>>>       }
>>>   
>>>       bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
>>> -                               s->bitmap_directory_size, errp);
>>> +                               s->bitmap_directory_size, &local_err);
>>>       if (bm_list == NULL) {
>>> -        return -EINVAL;
>>> +        error_reportf_err(local_err, "Failed to reopen bitmaps rw: "
>>> +                          "cannot load bitmap list: ");
>>> +        return;
>>>       }
>>>   
>>>       QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>>           BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(bs, bm->name);
>>> -        if (bitmap == NULL) {
>>> -            continue;
>>> -        }
>>> +        const char *corruption = NULL;
>>>   
>>> -        if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>>> -            error_setg(errp, "Bitmap %s was loaded prior to rw-reopen, but was "
>>> -                       "not marked as readonly. This is a bug, something went "
>>> -                       "wrong. All of the bitmaps may be corrupted", bm->name);
>>> -            ret = -EINVAL;
>>> -            goto out;
>>> -        }
>>> +        if (bm->flags & BME_FLAG_IN_USE) {
>>> +            if (bitmap == NULL) {
>>> +                /*
>>> +                 * It's an unexpected inconsistent bitmap,
>>> +                 * it is safe to ignore it.
>>> +                 */
>>> +                continue;
>>> +            }
>>
>> This is supposed to be a reopen but we've found a bitmap we didn't load,
>> and it's IN_USE. Should we make any attempt to load it as an
>> inconsistent bitmap so the user can know about it?
> 
> I thought about this and decided to keep it simpler.. But on the other hand,
> why not? So, I don't have strict opinion on this, I can add this logic.
> 

It just seemed to me that if you're going through such efforts to check
consistency in reopen, we might as well do this one too. Maybe it can
make the loop simpler if you can share the code with the (!bitmap) case
below?

In lieu of any proper error pathways, alerting the user to abnormalities
via the API (in the form of +inconsistent bitmaps) seems like one of the
most helpful things we can do.

>>
>> If we're here, it means we DID reopen the image rw, so we ought to have
>> exclusive ownership of this file; the IN_USE flag here signals an
>> inconsistency, no?
>>
>>>   
>>> -        bm->flags |= BME_FLAG_IN_USE;
>>> -        ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
>>> +            /*
>>> +             * It's either an inconsistent bitmap, or we are reopening rw -> rw,
>>> +             * or we just didn't save bitmap for some buggy reason. Still, no
>>> +             * reason to consider in-RAM bitmap inconsistent, than, mark it rw.
>>> +             */
>>
>> I don't understand lines 2-3 of this comment. As far as I can tell:
>>
>> - We might be seeing a legitimately corrupt bitmap. It's fine to mark it
>> as rw, because we can't write to it anyway. (It was marked inconsistent
>> on open.)
>> - We might be seeing a bitmap that's already properly rw. this call is
>> effectively a no-op.
>>
>> Is that right? If that's true, I'd just simply say:
>>
>> "This is either an inconsistent bitmap or we are reopening rw -> rw. It
>> is safe to mark it as not read only in either case."
>>
>> What's the "or we just didn't save bitmap for some buggy reason" you are
>> alluding to?
> 
> I mean, for example, on previous reopen to ro, we failed to store the bitmap and
> therefore failed to drop IN_USE flag. So, we see it now.
> 

Oh, okay. Understood.

>>
>>> +            bdrv_dirty_bitmap_set_readonly(bitmap, false);
>>> +        } else {
>>> +            /*
>>> +             * In-image bitmap not marked IN_USE
>>> +             *
>>> +             * The only valid case is
>>> +             *     bitmap && bdrv_dirty_bitmap_readonly(bitmap) &&
>>> +             *        !bdrv_dirty_bitmap_inconsistent(bitmap))
>>> +             *
>>> +             * which means reopen ro -> rw of consistent bitmap.
>>> +             *
>>> +             * Other cases a different kinds of corruptions:
>>> +             */
>>> +            if (!bitmap) {
>>> +                corruption =
>>> +                    "Corruption: unexpected valid bitmap '%s' is found in the "
>>> +                    "image '%s' on reopen rw. Will try to set IN_USE flag.";
>>> +
>>
>> In this case, we find a valid bitmap we expected to have a readonly copy
>> of in memory, but didn't. We attempt to load it.
>>
>>> +                bitmap = load_bitmap(bs, bm, NULL);
>>> +                if (!bitmap) {
>>> +                    bitmap = bdrv_create_dirty_bitmap(
>>> +                            bs, bdrv_get_default_bitmap_granularity(bs),
>>> +                            bm->name, NULL);
>>> +                }
>>> +
>>> +                if (bitmap) {
>>> +                    bdrv_dirty_bitmap_set_persistence(bitmap, true);
>>> +                    bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>> +                    bdrv_dirty_bitmap_set_inconsistent(bitmap);
>>
>> And we mark it as inconsistent because we're not sure how we missed it
>> earlier. OK.
>>
>>> +                }
>>> +            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>>> +                corruption =
>>> +                    "Corruption: bitmap '%s' is not marked IN_USE in the "
>>> +                    "image '%s' and not marked readonly in RAM. Will try to "
>>> +                    "set IN_USE flag.";
>>> +
>>
>> And in this case, we find the bitmap but it's not marked readonly for
>> some reason.
>>
>>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>
>> Why set it readonly again?
> 
> It is because inconsistance is not synced to the image. "readonly" exactly
> means, that for some reasons we did not marked bitmap IN_USE in the image and
> therefore must not write to it.
> 

That's one way of looking at what readonly means. Another was: "The
image this bitmap is attached to is read only, and any writes to this
bitmap are a logistical error."

> So, yes, here occurs new thing: readonly-inconsistent bitmap. It blocks guest
> writes until we sync it somehow to the image or remove. And we are going to sync
> it at the end of this function.
> 

Right, we've not really used readonly in this way before. It makes sense
to a point, but it's a bit of a semantic overload -- the disk is
actually RW but the bitmap is RO; the problem that I have with this is
that we guard RO bitmaps with assertions and not errors, so it seems
feasible to me that the reopen-rw will succeed, a guest will write, and
then we'll abort because of these "readonly corrupt" bitmaps.

In other words, we don't *really* support the idea of having readonly
bitmaps on readwrite nodes.

I think given that this is an error state to begin with that simply
marking the bitmap as inconsistent in memory (and trying to write the
IN_USE flag to its header) is sufficient, it will skip any new writes
and prohibit its use for any backup operations.

>>
>>> +                bdrv_dirty_bitmap_set_inconsistent(bitmap);
>>
>> And just in case, we mark it inconsistent. (It's my impression that any
>> such write would have failed earlier, but maybe not.)
> 
> hm, which write?
> 

You know, I was thinking we'd have been in RO mode before this, but I
forgot that's not true anymore. Disregard this comment.

>>
>>> +            } else if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
>>> +                corruption =
>>> +                    "Corruption: bitmap '%s' is inconsistent but is not marked "
>>> +                    "IN_USE in the image. Will try to set IN_USE flag.";
>>> +
>>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>
>> This one is weirder. We have an inconsistent bitmap but it's not IN_USE;
>> so we set it readonly again? Why?
> 
> Same here, as "inconsitance" is not synced to the image. We'll drop readonly flag if
> we sync it successfully.
> 
>>
>>> +            }
>>> +
>>> +            if (bitmap) {
>>> +                ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
>>
>> Oh, all those bitmaps we just set readonly we're going to mark as not
>> readonly again?
> 
> yes, if successfully sync IN_USE flags
> 
>>
>>> +            }
>>> +            bm->flags |= BME_FLAG_IN_USE;
>>> +            need_update = true;
>>> +            if (corruption) {
>>> +                error_report(corruption, bm->name, bs->filename);
>>> +                corrupted_bitmaps = g_slist_append(corrupted_bitmaps, bm->name);
>>> +            }
>>> +        }
>>>       }
>>>   
>>> -    if (ro_dirty_bitmaps != NULL) {
>>> +    if (need_update) {
>>> +        if (!can_write(bs->file->bs)) {
>>
>> I genuinely don't know: is it legitimate to check your child's write
>> permission in this way? will we always have bs->file->bs?
> 
> Hmm.. but we are going to write to it very soon, I think it should exist.
> 

Apparently Max is adding a bdrv_storage_bs() helper for this exact
thing, in an upcoming patch. I just get nervous when I see double
indirections.

>>
>>> +            error_report("Failed to reopen bitmaps rw: cannot write to "
>>> +                         "the protocol file");
>>> +            goto handle_corrupted;
>>> +        }
>>> +
>>>           /* in_use flags must be updated */
>>>           ret = update_ext_header_and_dir_in_place(bs, bm_list);
>>>           if (ret < 0) {
>>> -            error_setg_errno(errp, -ret, "Can't update bitmap directory");
>>> -            goto out;
>>> -        }
>>> -        if (header_updated != NULL) {
>>> -            *header_updated = true;
>>> +            error_report("Cannot update bitmap directory: %s", strerror(-ret));
>>> +            goto handle_corrupted;
>>>           }
>>> +        updated_ok = true;
>>>           g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
>>>       }
>>
>> And then here at the end the bulk of the work: we re-write the header if
>> necessary back out to disk to mark everything as IN_USE.
>>
>> It seems strange that you are trying to maintain the invariant that we
>> won't set readonly to false if we fail, because this function "cannot fail."
> 
> Yes, if we fail, bitmaps remains readonly, and it is preexisting behavior. User
> can try reopen again or try to remove bitmaps.. It shouldn't happen often in
> practice, but we must do everything we can to prevent appearing in the image
> invalid bitmap not marked IN_USE, as it will lead to inconsistent backup.
> 
> Hmm, of-course, best thing is to do it all in commit-prepare, where we can fail
> legally. But then we'll need write-access to child already in prepare, when now
> we don't have it even in commit, I don't know how to achieve it.
> 

I've talked to Max and I think he has some opinions about how we might
be able to do this, but it might rely on some of his pending series.

>>
>>>   
>>> -out:
>>> +handle_corrupted:
>>> +    if (corrupted_bitmaps) {
>>> +        if (updated_ok) {
>>> +            error_report("Corrupted bitmaps in '%s' successfully marked "
>>> +                         "IN_USE", bs->filename);
>>> +        } else {
>>> +            error_report("Failed to mark IN_USE the following corrupted "
>>> +                         "bitmaps in '%s', DO NOT USE THEM:", bs->filename);
>>> +            g_slist_foreach(corrupted_bitmaps, error_report_helper, NULL);
>>> +        }
>>> +    }
>>> +
>>>       g_slist_free(ro_dirty_bitmaps);
>>> +    g_slist_free(corrupted_bitmaps);
>>>       bitmap_list_free(bm_list);
>>> -
>>> -    return ret;
>>> -}
>>> -
>>
>> I just don't know; for how much error checking this function is doing
>> now, it seems wrong to be behind an interface that cannot fail and will
>> actually do different things to the bitmaps depending on what it sees
>> with no return code to help the caller understand the cases.
>>
>> It even has a bit at the end where it tries to print in uppercase to
>> manually scare a user into taking action, which says to me that there is
>> a deeper problem we need to be able to address without intervention from
>> the user.
>>
>> That said, the patch does seem correct otherwise; and it does fix a
>> nasty bug which lets us use bitmaps with snapshots. I want this in for
>> 4.1 if I can help it. I will talk to Kevin and Max and see if there's
>> some opinion here.
> 
> Yes. In short, it was bad, it still bad, but at least one bug is fixed :)
> 

Hahaha! Very good summary. Let's discuss the block implications with
Max, Berto and Kevin. Until then, do you think it's OK to split out the
release_bitmaps boolean as its own patch to "lessen" the severity of the
bug?

It looks like the reopen infrastructural changes might be a bit harder.

>>
>> Thanks!
>>
>>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
>>> -{
>>> -    return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
>>>   }
>>>   
>>>   /* Checks to see if it's safe to resize bitmaps */
>>> @@ -1446,7 +1527,8 @@ fail:
>>>       bitmap_list_free(bm_list);
>>>   }
>>>   
>>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>> +                                          bool release_stored, Error **errp)
>>>   {
>>>       BdrvDirtyBitmap *bitmap;
>>>       BDRVQcow2State *s = bs->opaque;
>>> @@ -1559,20 +1641,23 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>>>           g_free(tb);
>>>       }
>>>   
>>> -    QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>> -        /* For safety, we remove bitmap after storing.
>>> -         * We may be here in two cases:
>>> -         * 1. bdrv_close. It's ok to drop bitmap.
>>> -         * 2. inactivation. It means migration without 'dirty-bitmaps'
>>> -         *    capability, so bitmaps are not marked with
>>> -         *    BdrvDirtyBitmap.migration flags. It's not bad to drop them too,
>>> -         *    and reload on invalidation.
>>> -         */
>>> -        if (bm->dirty_bitmap == NULL) {
>>> -            continue;
>>> -        }
>>> +    if (release_stored) {
>>> +        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>> +            /*
>>> +             * For safety, we remove bitmap after storing.
>>> +             * We may be here in two cases:
>>> +             * 1. bdrv_close. It's ok to drop bitmap.
>>> +             * 2. inactivation. It means migration without 'dirty-bitmaps'
>>> +             *    capability, so bitmaps are not marked with
>>> +             *    BdrvDirtyBitmap.migration flags. It's not bad to drop them
>>> +             *    too, and reload on invalidation.
>>> +             */
>>
>> While we're here, I might touch up the comment.
>>
>> For safety, the BdrvDirtyBitmap can be dropped after storing.
>> We may be here in two cases, both via qcow2_inactivate:
>> 1. bdrv_close: It's correct to remove bitmaps on close.
>> 2. migration: This implies we are migrating without the
>>     'dirty-bitmaps' capability, because bitmap->migration was unset.
> 
> I'm not sure in my understanding of English "because", but bitmap->migration
> is a consequence, not a reason of not enabling this capability.
> 

The fun thing about English is that there are definitely no rules and it
never makes sense. The other fun thing about English is that I am fairly
certain I don't know how to speak it.

In this case, I mean to say: "bitmap->migration is unset. since it is
unset, we can deduce that we are not running in a context where we are
doing a dirty-bitmaps migration."

Or, "migration: The implication (that we are migration without the
'dirty-bitmaps' capability) follows from {bitmap->migration being unset}."

I mean to say that the implication/deduction is what follows from the
boolean being unset. Of course, the boolean being unset is itself the
consequence from the fact that we are not in such a migration.

Clear as mud?

Let's say this:

"2. migration: We know we are migrating without the 'dirty-bitmaps'
capability, since bitmap->migration was found false above."

Hopefully this clarifies that migration was "set false" means not when
it was assigned false, but was "found to be set to false".

Sorry about my language.

(Well, not *my* language, but the one I happen to speak.)

>>     If needed, these bitmaps will be reloaded on invalidation.
>>
>> I just wanted to clarify these points:
>>
>> - The new boolean obviously means we don't /always/ drop them
>> - qcow2_inactivate is the only caller that instructs us to drop them
>> - both cases in the comment are through qcow2_inactivate.
> 
> Agree.
> 
>>
>> you can take any or none of my wording suggestions as you feel is
>> appropriate.
> 
> Thank you! And for reviewing this complicated patch! I know, two words "Also"
> in commit message are never followed by good clean design, and this is not an
> exclusion.
> 

Thanks for tackling it. It's always nice to see clear examples of
limitations of the current infrastructure.

(Well, not "nice" but certainly "helpful".)

>>
>>> +            if (bm->dirty_bitmap == NULL) {
>>> +                continue;
>>> +            }
>>>   
>>> -        bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>>> +            bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>>> +        }
>>>       }
>>>   
>>>   success:
>>> @@ -1600,7 +1685,7 @@ int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
>>>       BdrvDirtyBitmap *bitmap;
>>>       Error *local_err = NULL;
>>>   
>>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>>> +    qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
>>>       if (local_err != NULL) {
>>>           error_propagate(errp, local_err);
>>>           return -EINVAL;
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index d39882785d..f0a8479874 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -2273,7 +2273,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>>>       int ret, result = 0;
>>>       Error *local_err = NULL;
>>>   
>>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>>> +    qcow2_store_persistent_dirty_bitmaps(bs, true, &local_err);
>>>       if (local_err != NULL) {
>>>           result = -EINVAL;
>>>           error_reportf_err(local_err, "Lost persistent bitmaps during "
>>> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
>>> index 36712689d3..e8b0c9d4bb 100755
>>> --- a/tests/qemu-iotests/255
>>> +++ b/tests/qemu-iotests/255
>>> @@ -79,3 +79,5 @@ def test(persistent, restart):
>>>   
>>>   
>>>   test(persistent=False, restart=False)
>>> +test(persistent=True, restart=False)
>>> +test(persistent=True, restart=True)
>>> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
>>> index 2bffb486d2..46e2e3ad4e 100644
>>> --- a/tests/qemu-iotests/255.out
>>> +++ b/tests/qemu-iotests/255.out
>>> @@ -15,3 +15,38 @@ check, that no bitmaps in snapshot: not found
>>>   {"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>   check merged bitmap: name=bitmap0 dirty-clusters=2
>>>   check updated bitmap: name=bitmap0 dirty-clusters=3
>>> +
>>> +Testcase persistent without restart
>>> +
>>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>>> +{"return": {}}
>>> +initial bitmap: name=bitmap0 dirty-clusters=1
>>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>>> +{"return": {}}
>>> +check, that no bitmaps in snapshot: not found
>>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>>> +{"return": {}}
>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>>> +{"return": {}}
>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>>> +
>>> +Testcase persistent with restart
>>> +
>>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>>> +{"return": {}}
>>> +initial bitmap: name=bitmap0 dirty-clusters=1
>>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>>> +{"return": {}}
>>> +check, that no bitmaps in snapshot: not found
>>> +... Restart ...
>>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>>> +{"return": {}}
>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>>> +{"return": {}}
>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-29 15:58     ` Vladimir Sementsov-Ogievskiy
@ 2019-05-29 18:18       ` Max Reitz
  0 siblings, 0 replies; 17+ messages in thread
From: Max Reitz @ 2019-05-29 18:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, jsnow, Denis Lunev, Alberto Garcia

[-- Attachment #1: Type: text/plain, Size: 6528 bytes --]

On 29.05.19 17:58, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 18:33, Max Reitz wrote:
>> On 23.05.19 17:47, Vladimir Sementsov-Ogievskiy wrote:
>>> Current logic
>>> =============
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and release BdrvDirtyBitmap's.
>>>
>>> Reopen ro -> rw:
>>>
>>> Load bitmap list
>>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>>     the worst thing]
>>> A kind of fail with error message if we see not readonly bitmap
>>>
>>> This leads to at least the following bug:
>>>
>>> Assume we have bitmap0.
>>> Create external snapshot
>>>    bitmap0 is stored and therefore removed
>>> Commit snapshot
>>>    now we have no bitmaps
>>> Do some writes from guest (*)
>>>    they are not marked in bitmap
>>> Shutdown
>>> Start
>>>    bitmap0 is loaded as valid, but it is actually broken! It misses
>>>    writes (*)
>>> Incremental backup
>>>    it will be inconsistent
>>>
>>> New logic
>>> =========
>>>
>>> Reopen rw -> ro:
>>>
>>> Store bitmaps and don't remove them from RAM, only mark them readonly
>>> (the latter is already done, but didn't work properly because of
>>> removing in storing function)
>>>
>>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>>> now in qcow2_reopen_bitmaps_rw)
>>>
>>> Load bitmap list
>>>
>>> Carefully consider all possible combinations of bitmaps in RAM and in
>>> the image, try to fix corruptions, print corresponding error messages.
>>>
>>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>>> commited, as otherwise we can't write to the qcow2 file child on reopen
>>> ro -> rw.
>>>
>>> Also, add corresponding test-cases to 255.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2.h              |   5 +-
>>>   include/block/block_int.h  |   2 +-
>>>   block.c                    |  29 ++----
>>>   block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>>>   block/qcow2.c              |   2 +-
>>>   tests/qemu-iotests/255     |   2 +
>>>   tests/qemu-iotests/255.out |  35 +++++++
>>>   7 files changed, 193 insertions(+), 79 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>>> index 2b84bfa007..4e565db11f 100644
>>> --- a/block/qcow2-bitmap.c
>>> +++ b/block/qcow2-bitmap.c
>>
>> [...]
>>
>>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>       return list;
>>>   }
>>>   
>>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>> -                                 Error **errp)
>>> +/*
>>> + * qcow2_reopen_bitmaps_rw
>>> + *
>>> + * The function is called in bdrv_reopen_multiple after all calls to
>>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>>> + * write access to child bs, and with current reopening architecture, when
>>> + * reopen ro -> rw it is possible only after all reopen commits.
>>> + *
>>> + * So, we can't fail here.
>>
>> Why?  Because the current design forbids it?
>>
>> Then the current design seems flawed to me.
>>
>> [CC-ing Berto]
>>
>> Without having looked too close into this, it seems from your commit
>> message that it is possible to run into unrecoverable fatal errors here.
>>   We cannot just ignore that on the basis that the current design cannot
>> deal with that.
>>
>> It just appears wrong to me to update any flags in the image in the
>> .commit() part of a transaction.  We should try to do so in .prepare().
>>   If it works, great, we’re set for .commit().  If it doesn’t, welp, time
>> for .abort() and pretend the image is still read-only (even though it
>> kind of may be half-prepared for writing).
>>
>> If we fail to set IN_USE in .commit(), that’s a fatal error in my opinion.
>>
>> After some chatting with John, I’ve come to the following belief:
>>
>> When switching a node from RO to R/W, it must be able to write to its
>> children in .prepare() -- precisely because performing this switch may
>> need some metadata change.  If we cannot do this change, then we cannot
>> switch the node to R/W.  So it’s clear to me that this needs to be done
>> in .prepare().
>>
>> So I think a node’s children must be R/W before we can .prepare() the
>> node.  That means we need to .commit() them.  That means we cannot have
>> a single transaction that switches a whole tree to be R/W, but it must
>> consist of one transaction per level.
>>
>> If something fails, we need to roll back all the previous layers.  Well,
>> it depends.
>>
>> If we switch to R/W (and something fails), then we need to try to revert
>> the children we have already made R/W to be RO.  If that doesn’t work,
>> welp, too bad, but not fatal.
> 
> And, than, why not do full reopen rw in .prepare, and just organize
> recursion so that children reopened rw before parent?

(1) Because .abort() must always succeed.  What I’m proposing is a
solution where we have to actively roll back, and that may not always work.

(2) Intuitively, I’d think it’d be easier to modify the central block
layer instead of having to check/modify all drivers that (A) their
prepare does everything, and (B) their abort can still roll back.  I may
well be wrong.

Maybe that is the easier solution.  .commit() would then just be a
cleanup, right?

> Than we again have
> one transaction for the tree, but abort may fail to rollback it in worst case.
> But we cant avoid it anyway, with one transaction or with transactions per level..

If abort can roll back.

> So, just move all code from .commit to .prepare for reopen-rw and try to roll-back
> it in .abort?

Well, "just".  It’s not just this one driver.

> Or do you have an idea for a patch?

No, I don’t.

Max

>> Switching to RO goes the other way around (parents to children), so if
>> we encounter an error there, we cannot make that node’s children RO,
>> too.  We could try to revert the whole change and make the whole tree
>> R/W again, or we just don’t.  I think “just don’t” is reasonable.
>>
>> Max
>>
>>>      On the other hand, we may face different kinds of
>>> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
>>> + *
>>> + * Try to handle as many cases as we can.
>>> + */
>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-29 18:08       ` John Snow
@ 2019-05-30  8:23         ` Vladimir Sementsov-Ogievskiy
  2019-05-30 11:54           ` Vladimir Sementsov-Ogievskiy
  2019-05-30 14:20           ` John Snow
  0 siblings, 2 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-30  8:23 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: fam, kwolf, Denis Lunev, mreitz

29.05.2019 21:08, John Snow wrote:
> Max has picked this thread up for block discussion, so I'm going to
> stick to slightly more bitmap related discussion here; we'll resume
> block discussion in the other tail of this thread.
> 
> On 5/29/19 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 29.05.2019 2:24, John Snow wrote:
>>>
>>>
>>> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> Current logic
>>>> =============
>>>>
>>>> Reopen rw -> ro:
>>>>
>>>> Store bitmaps and release BdrvDirtyBitmap's.
>>>>
>>>> Reopen ro -> rw:
>>>>
>>>> Load bitmap list
>>>> Skip bitmaps which for which we don't have BdrvDirtyBitmap [this is
>>>>      the worst thing]
>>>
>>> ...ah.
>>>
>>>> A kind of fail with error message if we see not readonly bitmap
>>>>
>>>> This leads to at least the following bug:
>>>>
>>>> Assume we have bitmap0.
>>>
>>> Presumably a normal, persistent bitmap.
>>>
>>>> Create external snapshot
>>>>     bitmap0 is stored and therefore removed
>>>
>>> Written out to the backing file and removed from memory, because we've
>>> reopened rw->ro; this is because of the migration "safety" clause where
>>> we simply drop these bitmaps.
>>>
>>> ...Ah, and then we don't actually open them readonly again; that entire
>>> stanza in reopen_ro never fires off at all because the bitmaps are
>>> already gone.
>>>
>>>> Commit snapshot
>>>>     now we have no bitmaps
>>>
>>> When we reopen the base as rw as you note, we skipped bitmaps for which
>>> we had no in-memory bitmap for -- because the readonly logic was really
>>> expecting to have these in-memory.
>>>
>>> I should probably say that for the sake of migration correctness, the
>>> way we flush things to disk and remove it from memory on write is really
>>> bothersome to keep correct. The migration logic is so particular that it
>>> keeps causing issues elsewhere, of which this is another symptom.
>>>
>>>> Do some writes from guest (*)
>>>>     they are not marked in bitmap
>>>
>>> Yikes, right.
>>>
>>>> Shutdown
>>>> Start
>>>>     bitmap0 is loaded as valid, but it is actually broken! It misses
>>>>     writes (*)
>>>
>>> Yikes; because it was consistent on flush and we skipped it on load,
>>> it's not marked as IN_USE and we are free to load it up again.
>>>
>>>> Incremental backup
>>>>     it will be inconsistent
>>>>
>>>
>>> Good writeup, thank you.
>>>
>>>> New logic
>>>> =========
>>>>
>>>> Reopen rw -> ro:
>>>>
>>>> Store bitmaps and don't remove them from RAM, only mark them readonly
>>>> (the latter is already done, but didn't work properly because of
>>>> removing in storing function)
>>>>
>>>
>>> Yes! I like this change.
>>>
>>>> Reopen to rw (don't filter case with reopen rw -> rw, it is supported
>>>> now in qcow2_reopen_bitmaps_rw)
>>>>
>>>
>>> OK, so we allow rw --> rw without trying to be fussy about it. That
>>> seems fine to me.
>>>
>>>> Load bitmap list
>>>>
>>>> Carefully consider all possible combinations of bitmaps in RAM and in
>>>> the image, try to fix corruptions, print corresponding error messages.
>>>>
>>>> Also, call qcow2_reopen_bitmaps_rw after the whole reopen queue
>>>> commited, as otherwise we can't write to the qcow2 file child on reopen
>>>> ro -> rw.
>>>>
>>>
>>> I take it this is the change that moved the invocation logic into
>>> bdrv_reopen_multiple instead of bdrv_reopen_commit; also notably we no
>>> longer check the transition edge which is much simpler.
>>>
>>> oh, I see why you're doing it there now...
>>>
>>>> Also, add corresponding test-cases to 255.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/qcow2.h              |   5 +-
>>>>    include/block/block_int.h  |   2 +-
>>>>    block.c                    |  29 ++----
>>>>    block/qcow2-bitmap.c       | 197 ++++++++++++++++++++++++++-----------
>>>>    block/qcow2.c              |   2 +-
>>>>    tests/qemu-iotests/255     |   2 +
>>>>    tests/qemu-iotests/255.out |  35 +++++++
>>>>    7 files changed, 193 insertions(+), 79 deletions(-)
>>>>
>>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>>> index 8d92ef1fee..5928306e62 100644
>>>> --- a/block/qcow2.h
>>>> +++ b/block/qcow2.h
>>>> @@ -725,9 +725,10 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>>                                                    Error **errp);
>>>>    int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>>>                                     Error **errp);
>>>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp);
>>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs);
>>>>    int qcow2_truncate_bitmaps_check(BlockDriverState *bs, Error **errp);
>>>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp);
>>>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>>> +                                          bool release_stored, Error **errp);
>>>>    int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp);
>>>>    bool qcow2_can_store_new_dirty_bitmap(BlockDriverState *bs,
>>>>                                          const char *name,
>>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>>> index 1eebc7c8f3..9a694f3da0 100644
>>>> --- a/include/block/block_int.h
>>>> +++ b/include/block/block_int.h
>>>> @@ -536,7 +536,7 @@ struct BlockDriver {
>>>>         * as rw. This handler should realize it. It also should unset readonly
>>>>         * field of BlockDirtyBitmap's in case of success.
>>>>         */
>>>> -    int (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs, Error **errp);
>>>> +    void (*bdrv_reopen_bitmaps_rw)(BlockDriverState *bs);
>>>>        bool (*bdrv_can_store_new_dirty_bitmap)(BlockDriverState *bs,
>>>>                                                const char *name,
>>>>                                                uint32_t granularity,
>>>> diff --git a/block.c b/block.c
>>>> index cb11537029..db1ec0c673 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -3334,6 +3334,16 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
>>>>            bdrv_reopen_commit(&bs_entry->state);
>>>>        }
>>>>    
>>>> +    QSIMPLEQ_FOREACH(bs_entry, bs_queue, entry) {
>>>> +        BlockDriverState *bs = bs_entry->state.bs;
>>>> +
>>>> +        if (!bdrv_is_writable(bs) || !bs->drv->bdrv_reopen_bitmaps_rw) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        bs->drv->bdrv_reopen_bitmaps_rw(bs);
>>>> +    }
>>>> +
>>>
>>> OK, so this has been moved up into the main body of the loop because we
>>> cannot trust it to run in bdrv_reopen_commit because of the order in
>>> which the nodes are reopened might leave us unable to write to our child
>>> nodes to issue the IN_USE flag.
>>>
>>> I have kept out of these discussions in the past; is there a general
>>> solution that allows us to sort the block DAG leaf-up to avoid this
>>> scenario?
>>
>> In Virtuozzo I have a hacking patch, which reverse order of reopen queue
>> before commit for reopen-rw. But here is simpler solution.
>>
>> I can't find the discussion now, but if I remember correctly Kevin didn't like
>> an idea of reverting the queue. He supposed some generic way, in which during
>> we have access to both states of reopening node..
>>
>>>
>>> Anyway, since the block graph organization isn't my problem I will say
>>> that I think this is fine by me; but I'm not the one to impress here.
>>>
>>>>        ret = 0;
>>>>    cleanup_perm:
>>>>        QSIMPLEQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
>>>> @@ -3770,16 +3780,12 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>>>        BlockDriver *drv;
>>>>        BlockDriverState *bs;
>>>>        BdrvChild *child;
>>>> -    bool old_can_write, new_can_write;
>>>>    
>>>>        assert(reopen_state != NULL);
>>>>        bs = reopen_state->bs;
>>>>        drv = bs->drv;
>>>>        assert(drv != NULL);
>>>>    
>>>> -    old_can_write =
>>>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>>>> -
>>>>        /* If there are any driver level actions to take */
>>>>        if (drv->bdrv_reopen_commit) {
>>>>            drv->bdrv_reopen_commit(reopen_state);
>>>> @@ -3823,21 +3829,6 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
>>>>        }
>>>>    
>>>>        bdrv_refresh_limits(bs, NULL);
>>>> -
>>>> -    new_can_write =
>>>> -        !bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
>>>> -    if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
>>>> -        Error *local_err = NULL;
>>>> -        if (drv->bdrv_reopen_bitmaps_rw(bs, &local_err) < 0) {
>>>> -            /* This is not fatal, bitmaps just left read-only, so all following
>>>> -             * writes will fail. User can remove read-only bitmaps to unblock
>>>> -             * writes.
>>>> -             */
>>>> -            error_reportf_err(local_err,
>>>> -                              "%s: Failed to make dirty bitmaps writable: ",
>>>> -                              bdrv_get_node_name(bs));
>>>> -        }
>>>> -    }
>>>>    }
>>>>    
>>>
>>> Certainly the new code is simpler here, which is good.
>>>
>>>>    /*
>>>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>>>> index 2b84bfa007..4e565db11f 100644
>>>> --- a/block/qcow2-bitmap.c
>>>> +++ b/block/qcow2-bitmap.c
>>>> @@ -28,6 +28,7 @@
>>>>    #include "qemu/osdep.h"
>>>>    #include "qapi/error.h"
>>>>    #include "qemu/cutils.h"
>>>> +#include "qemu/error-report.h"
>>>>    
>>>>    #include "block/block_int.h"
>>>>    #include "qcow2.h"
>>>> @@ -951,6 +952,12 @@ static void set_readonly_helper(gpointer bitmap, gpointer value)
>>>>        bdrv_dirty_bitmap_set_readonly(bitmap, (bool)value);
>>>>    }
>>>>    
>>>> +/* for g_slist_foreach for GSList of const char* elements */
>>>> +static void error_report_helper(gpointer message, gpointer _unused)
>>>> +{
>>>> +    error_report("%s", (const char *)message);
>>>> +}
>>>> +
>>>>    /* qcow2_load_dirty_bitmaps()
>>>>     * Return value is a hint for caller: true means that the Qcow2 header was
>>>>     * updated. (false doesn't mean that the header should be updated by the
>>>> @@ -1103,76 +1110,150 @@ Qcow2BitmapInfoList *qcow2_get_bitmap_info_list(BlockDriverState *bs,
>>>>        return list;
>>>>    }
>>>>    
>>>> -int qcow2_reopen_bitmaps_rw_hint(BlockDriverState *bs, bool *header_updated,
>>>> -                                 Error **errp)
>>>> +/*
>>>> + * qcow2_reopen_bitmaps_rw
>>>> + *
>>>> + * The function is called in bdrv_reopen_multiple after all calls to
>>>> + * bdrv_reopen_commit, when final bs is writable. It is done so, as we need
>>>> + * write access to child bs, and with current reopening architecture, when
>>>> + * reopen ro -> rw it is possible only after all reopen commits.
>>>> + *
>>>> + * So, we can't fail here. On the other hand, we may face different kinds of
>>>> + * corruptions and/or just can't write IN_USE flags to the image due to EIO.
>>>> + *
>>>> + * Try to handle as many cases as we can.
>>>
>>> Hm, I think you're right, but it does make me really uncomfortable that
>>> we lose the ability to report errors back up the stack. I guess we
>>> already always did ignore them, so this is no worse, but I don't like
>>> the idea of adding new error_report_err calls if we can help it.
>>>
>>> I guess we can't help it, though.
>>
>> It's possible to return error and print it in bdrv_reopen_multiple instead, like
>> it was pre-patch, but I have two reasons against:
>>
>> 1. I wanted to stress that interface is for calling from commit code, and cannot fail.
>>
>> 2. I can't implement here clean logic like SUCCESS or (ERROR, nothing changed), as some
>> errors, are critical, some are not, we are trying to fix corruptions, so here is difficult
>> failure-handling logic, so, it may be simpler to keep it all here, not reporting an error,
>> which can't be handled in commit code anyway.
>>
>>>
>>>> + */
>>>> +void qcow2_reopen_bitmaps_rw(BlockDriverState *bs)
>>>>    {
>>>>        BDRVQcow2State *s = bs->opaque;
>>>>        Qcow2BitmapList *bm_list;
>>>>        Qcow2Bitmap *bm;
>>>> -    GSList *ro_dirty_bitmaps = NULL;
>>>> +    GSList *ro_dirty_bitmaps = NULL, *corrupted_bitmaps = NULL;
>>>> +    Error *local_err = NULL;
>>>>        int ret = 0;
>>>> -
>>>> -    if (header_updated != NULL) {
>>>> -        *header_updated = false;
>>>> -    }
>>>> +    bool need_update = false, updated_ok = false;
>>>>    
>>>>        if (s->nb_bitmaps == 0) {
>>>>            /* No bitmaps - nothing to do */
>>>> -        return 0;
>>>> -    }
>>>> -
>>>> -    if (!can_write(bs)) {
>>>> -        error_setg(errp, "Can't write to the image on reopening bitmaps rw");
>>>> -        return -EINVAL;
>>>> +        return;
>>>>        }
>>>>    
>>>>        bm_list = bitmap_list_load(bs, s->bitmap_directory_offset,
>>>> -                               s->bitmap_directory_size, errp);
>>>> +                               s->bitmap_directory_size, &local_err);
>>>>        if (bm_list == NULL) {
>>>> -        return -EINVAL;
>>>> +        error_reportf_err(local_err, "Failed to reopen bitmaps rw: "
>>>> +                          "cannot load bitmap list: ");
>>>> +        return;
>>>>        }
>>>>    
>>>>        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>>>            BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(bs, bm->name);
>>>> -        if (bitmap == NULL) {
>>>> -            continue;
>>>> -        }
>>>> +        const char *corruption = NULL;
>>>>    
>>>> -        if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>>>> -            error_setg(errp, "Bitmap %s was loaded prior to rw-reopen, but was "
>>>> -                       "not marked as readonly. This is a bug, something went "
>>>> -                       "wrong. All of the bitmaps may be corrupted", bm->name);
>>>> -            ret = -EINVAL;
>>>> -            goto out;
>>>> -        }
>>>> +        if (bm->flags & BME_FLAG_IN_USE) {
>>>> +            if (bitmap == NULL) {
>>>> +                /*
>>>> +                 * It's an unexpected inconsistent bitmap,
>>>> +                 * it is safe to ignore it.
>>>> +                 */
>>>> +                continue;
>>>> +            }
>>>
>>> This is supposed to be a reopen but we've found a bitmap we didn't load,
>>> and it's IN_USE. Should we make any attempt to load it as an
>>> inconsistent bitmap so the user can know about it?
>>
>> I thought about this and decided to keep it simpler.. But on the other hand,
>> why not? So, I don't have strict opinion on this, I can add this logic.
>>
> 
> It just seemed to me that if you're going through such efforts to check
> consistency in reopen, we might as well do this one too. Maybe it can
> make the loop simpler if you can share the code with the (!bitmap) case
> below?
> 
> In lieu of any proper error pathways, alerting the user to abnormalities
> via the API (in the form of +inconsistent bitmaps) seems like one of the
> most helpful things we can do.
> 
>>>
>>> If we're here, it means we DID reopen the image rw, so we ought to have
>>> exclusive ownership of this file; the IN_USE flag here signals an
>>> inconsistency, no?
>>>
>>>>    
>>>> -        bm->flags |= BME_FLAG_IN_USE;
>>>> -        ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
>>>> +            /*
>>>> +             * It's either an inconsistent bitmap, or we are reopening rw -> rw,
>>>> +             * or we just didn't save bitmap for some buggy reason. Still, no
>>>> +             * reason to consider in-RAM bitmap inconsistent, than, mark it rw.
>>>> +             */
>>>
>>> I don't understand lines 2-3 of this comment. As far as I can tell:
>>>
>>> - We might be seeing a legitimately corrupt bitmap. It's fine to mark it
>>> as rw, because we can't write to it anyway. (It was marked inconsistent
>>> on open.)
>>> - We might be seeing a bitmap that's already properly rw. this call is
>>> effectively a no-op.
>>>
>>> Is that right? If that's true, I'd just simply say:
>>>
>>> "This is either an inconsistent bitmap or we are reopening rw -> rw. It
>>> is safe to mark it as not read only in either case."
>>>
>>> What's the "or we just didn't save bitmap for some buggy reason" you are
>>> alluding to?
>>
>> I mean, for example, on previous reopen to ro, we failed to store the bitmap and
>> therefore failed to drop IN_USE flag. So, we see it now.
>>
> 
> Oh, okay. Understood.
> 
>>>
>>>> +            bdrv_dirty_bitmap_set_readonly(bitmap, false);
>>>> +        } else {
>>>> +            /*
>>>> +             * In-image bitmap not marked IN_USE
>>>> +             *
>>>> +             * The only valid case is
>>>> +             *     bitmap && bdrv_dirty_bitmap_readonly(bitmap) &&
>>>> +             *        !bdrv_dirty_bitmap_inconsistent(bitmap))
>>>> +             *
>>>> +             * which means reopen ro -> rw of consistent bitmap.
>>>> +             *
>>>> +             * Other cases a different kinds of corruptions:
>>>> +             */
>>>> +            if (!bitmap) {
>>>> +                corruption =
>>>> +                    "Corruption: unexpected valid bitmap '%s' is found in the "
>>>> +                    "image '%s' on reopen rw. Will try to set IN_USE flag.";
>>>> +
>>>
>>> In this case, we find a valid bitmap we expected to have a readonly copy
>>> of in memory, but didn't. We attempt to load it.
>>>
>>>> +                bitmap = load_bitmap(bs, bm, NULL);
>>>> +                if (!bitmap) {
>>>> +                    bitmap = bdrv_create_dirty_bitmap(
>>>> +                            bs, bdrv_get_default_bitmap_granularity(bs),
>>>> +                            bm->name, NULL);
>>>> +                }
>>>> +
>>>> +                if (bitmap) {
>>>> +                    bdrv_dirty_bitmap_set_persistence(bitmap, true);
>>>> +                    bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>>> +                    bdrv_dirty_bitmap_set_inconsistent(bitmap);
>>>
>>> And we mark it as inconsistent because we're not sure how we missed it
>>> earlier. OK.
>>>
>>>> +                }
>>>> +            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>>>> +                corruption =
>>>> +                    "Corruption: bitmap '%s' is not marked IN_USE in the "
>>>> +                    "image '%s' and not marked readonly in RAM. Will try to "
>>>> +                    "set IN_USE flag.";
>>>> +
>>>
>>> And in this case, we find the bitmap but it's not marked readonly for
>>> some reason.
>>>
>>>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>>
>>> Why set it readonly again?
>>
>> It is because inconsistance is not synced to the image. "readonly" exactly
>> means, that for some reasons we did not marked bitmap IN_USE in the image and
>> therefore must not write to it.
>>
> 
> That's one way of looking at what readonly means. Another was: "The
> image this bitmap is attached to is read only, and any writes to this
> bitmap are a logistical error."
> 
>> So, yes, here occurs new thing: readonly-inconsistent bitmap. It blocks guest
>> writes until we sync it somehow to the image or remove. And we are going to sync
>> it at the end of this function.
>>
> 
> Right, we've not really used readonly in this way before. It makes sense
> to a point, but it's a bit of a semantic overload -- the disk is
> actually RW but the bitmap is RO; the problem that I have with this is
> that we guard RO bitmaps with assertions and not errors,

Oops, you are right. I thought we have errors for guest writes in this case.

> so it seems
> feasible to me that the reopen-rw will succeed, a guest will write, and
> then we'll abort because of these "readonly corrupt" bitmaps.
> 
> In other words, we don't *really* support the idea of having readonly
> bitmaps on readwrite nodes.
> 
> I think given that this is an error state to begin with that simply
> marking the bitmap as inconsistent in memory (and trying to write the
> IN_USE flag to its header) is sufficient, it will skip any new writes
> and prohibit its use for any backup operations.

So, you propose not to annoy user too much because of inconsistent bitmaps,
for which we can't sync their inconsistancy to the image.. May be it's OK,
but let's see what we decide in block-discussion. It would be really cool
if we can move this all to .prepare

> 
>>>
>>>> +                bdrv_dirty_bitmap_set_inconsistent(bitmap);
>>>
>>> And just in case, we mark it inconsistent. (It's my impression that any
>>> such write would have failed earlier, but maybe not.)
>>
>> hm, which write?
>>
> 
> You know, I was thinking we'd have been in RO mode before this, but I
> forgot that's not true anymore. Disregard this comment.
> 
>>>
>>>> +            } else if (bdrv_dirty_bitmap_inconsistent(bitmap)) {
>>>> +                corruption =
>>>> +                    "Corruption: bitmap '%s' is inconsistent but is not marked "
>>>> +                    "IN_USE in the image. Will try to set IN_USE flag.";
>>>> +
>>>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>>
>>> This one is weirder. We have an inconsistent bitmap but it's not IN_USE;
>>> so we set it readonly again? Why?
>>
>> Same here, as "inconsitance" is not synced to the image. We'll drop readonly flag if
>> we sync it successfully.
>>
>>>
>>>> +            }
>>>> +
>>>> +            if (bitmap) {
>>>> +                ro_dirty_bitmaps = g_slist_append(ro_dirty_bitmaps, bitmap);
>>>
>>> Oh, all those bitmaps we just set readonly we're going to mark as not
>>> readonly again?
>>
>> yes, if successfully sync IN_USE flags
>>
>>>
>>>> +            }
>>>> +            bm->flags |= BME_FLAG_IN_USE;
>>>> +            need_update = true;
>>>> +            if (corruption) {
>>>> +                error_report(corruption, bm->name, bs->filename);
>>>> +                corrupted_bitmaps = g_slist_append(corrupted_bitmaps, bm->name);
>>>> +            }
>>>> +        }
>>>>        }
>>>>    
>>>> -    if (ro_dirty_bitmaps != NULL) {
>>>> +    if (need_update) {
>>>> +        if (!can_write(bs->file->bs)) {
>>>
>>> I genuinely don't know: is it legitimate to check your child's write
>>> permission in this way? will we always have bs->file->bs?
>>
>> Hmm.. but we are going to write to it very soon, I think it should exist.
>>
> 
> Apparently Max is adding a bdrv_storage_bs() helper for this exact
> thing, in an upcoming patch. I just get nervous when I see double
> indirections >

Hmm... But if we have separate data file for qcow2, bdrv_storage_bs will refer to it,
so here we need exactly bs->file I think.

>>>
>>>> +            error_report("Failed to reopen bitmaps rw: cannot write to "
>>>> +                         "the protocol file");
>>>> +            goto handle_corrupted;
>>>> +        }
>>>> +
>>>>            /* in_use flags must be updated */
>>>>            ret = update_ext_header_and_dir_in_place(bs, bm_list);
>>>>            if (ret < 0) {
>>>> -            error_setg_errno(errp, -ret, "Can't update bitmap directory");
>>>> -            goto out;
>>>> -        }
>>>> -        if (header_updated != NULL) {
>>>> -            *header_updated = true;
>>>> +            error_report("Cannot update bitmap directory: %s", strerror(-ret));
>>>> +            goto handle_corrupted;
>>>>            }
>>>> +        updated_ok = true;
>>>>            g_slist_foreach(ro_dirty_bitmaps, set_readonly_helper, false);
>>>>        }
>>>
>>> And then here at the end the bulk of the work: we re-write the header if
>>> necessary back out to disk to mark everything as IN_USE.
>>>
>>> It seems strange that you are trying to maintain the invariant that we
>>> won't set readonly to false if we fail, because this function "cannot fail."
>>
>> Yes, if we fail, bitmaps remains readonly, and it is preexisting behavior. User
>> can try reopen again or try to remove bitmaps.. It shouldn't happen often in
>> practice, but we must do everything we can to prevent appearing in the image
>> invalid bitmap not marked IN_USE, as it will lead to inconsistent backup.
>>
>> Hmm, of-course, best thing is to do it all in commit-prepare, where we can fail
>> legally. But then we'll need write-access to child already in prepare, when now
>> we don't have it even in commit, I don't know how to achieve it.
>>
> 
> I've talked to Max and I think he has some opinions about how we might
> be able to do this, but it might rely on some of his pending series.
> 
>>>
>>>>    
>>>> -out:
>>>> +handle_corrupted:
>>>> +    if (corrupted_bitmaps) {
>>>> +        if (updated_ok) {
>>>> +            error_report("Corrupted bitmaps in '%s' successfully marked "
>>>> +                         "IN_USE", bs->filename);
>>>> +        } else {
>>>> +            error_report("Failed to mark IN_USE the following corrupted "
>>>> +                         "bitmaps in '%s', DO NOT USE THEM:", bs->filename);
>>>> +            g_slist_foreach(corrupted_bitmaps, error_report_helper, NULL);
>>>> +        }
>>>> +    }
>>>> +
>>>>        g_slist_free(ro_dirty_bitmaps);
>>>> +    g_slist_free(corrupted_bitmaps);
>>>>        bitmap_list_free(bm_list);
>>>> -
>>>> -    return ret;
>>>> -}
>>>> -
>>>
>>> I just don't know; for how much error checking this function is doing
>>> now, it seems wrong to be behind an interface that cannot fail and will
>>> actually do different things to the bitmaps depending on what it sees
>>> with no return code to help the caller understand the cases.
>>>
>>> It even has a bit at the end where it tries to print in uppercase to
>>> manually scare a user into taking action, which says to me that there is
>>> a deeper problem we need to be able to address without intervention from
>>> the user.
>>>
>>> That said, the patch does seem correct otherwise; and it does fix a
>>> nasty bug which lets us use bitmaps with snapshots. I want this in for
>>> 4.1 if I can help it. I will talk to Kevin and Max and see if there's
>>> some opinion here.
>>
>> Yes. In short, it was bad, it still bad, but at least one bug is fixed :)
>>
> 
> Hahaha! Very good summary. Let's discuss the block implications with
> Max, Berto and Kevin. Until then, do you think it's OK to split out the
> release_bitmaps boolean as its own patch to "lessen" the severity of the
> bug?

Bitmap will not be reopend rw anyway, so, we should crash on first guest write, as
you noted..

Maybe, I'll refactor it back, to return normal error, and just ignore it in commit, so that,
we'll move it to .prepare as soon as we able to do and with less pain?

> 
> It looks like the reopen infrastructural changes might be a bit harder.
> 
>>>
>>> Thanks!
>>>
>>>> -int qcow2_reopen_bitmaps_rw(BlockDriverState *bs, Error **errp)
>>>> -{
>>>> -    return qcow2_reopen_bitmaps_rw_hint(bs, NULL, errp);
>>>>    }
>>>>    
>>>>    /* Checks to see if it's safe to resize bitmaps */
>>>> @@ -1446,7 +1527,8 @@ fail:
>>>>        bitmap_list_free(bm_list);
>>>>    }
>>>>    
>>>> -void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>>>> +void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>>> +                                          bool release_stored, Error **errp)
>>>>    {
>>>>        BdrvDirtyBitmap *bitmap;
>>>>        BDRVQcow2State *s = bs->opaque;
>>>> @@ -1559,20 +1641,23 @@ void qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs, Error **errp)
>>>>            g_free(tb);
>>>>        }
>>>>    
>>>> -    QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>>> -        /* For safety, we remove bitmap after storing.
>>>> -         * We may be here in two cases:
>>>> -         * 1. bdrv_close. It's ok to drop bitmap.
>>>> -         * 2. inactivation. It means migration without 'dirty-bitmaps'
>>>> -         *    capability, so bitmaps are not marked with
>>>> -         *    BdrvDirtyBitmap.migration flags. It's not bad to drop them too,
>>>> -         *    and reload on invalidation.
>>>> -         */
>>>> -        if (bm->dirty_bitmap == NULL) {
>>>> -            continue;
>>>> -        }
>>>> +    if (release_stored) {
>>>> +        QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>>>> +            /*
>>>> +             * For safety, we remove bitmap after storing.
>>>> +             * We may be here in two cases:
>>>> +             * 1. bdrv_close. It's ok to drop bitmap.
>>>> +             * 2. inactivation. It means migration without 'dirty-bitmaps'
>>>> +             *    capability, so bitmaps are not marked with
>>>> +             *    BdrvDirtyBitmap.migration flags. It's not bad to drop them
>>>> +             *    too, and reload on invalidation.
>>>> +             */
>>>
>>> While we're here, I might touch up the comment.
>>>
>>> For safety, the BdrvDirtyBitmap can be dropped after storing.
>>> We may be here in two cases, both via qcow2_inactivate:
>>> 1. bdrv_close: It's correct to remove bitmaps on close.
>>> 2. migration: This implies we are migrating without the
>>>      'dirty-bitmaps' capability, because bitmap->migration was unset.
>>
>> I'm not sure in my understanding of English "because", but bitmap->migration
>> is a consequence, not a reason of not enabling this capability.
>>
> 
> The fun thing about English is that there are definitely no rules and it
> never makes sense. The other fun thing about English is that I am fairly
> certain I don't know how to speak it.
> 
> In this case, I mean to say: "bitmap->migration is unset. since it is
> unset, we can deduce that we are not running in a context where we are
> doing a dirty-bitmaps migration."
> 
> Or, "migration: The implication (that we are migration without the
> 'dirty-bitmaps' capability) follows from {bitmap->migration being unset}."
> 
> I mean to say that the implication/deduction is what follows from the
> boolean being unset. Of course, the boolean being unset is itself the
> consequence from the fact that we are not in such a migration.
> 
> Clear as mud?
> 
> Let's say this:
> 
> "2. migration: We know we are migrating without the 'dirty-bitmaps'
> capability, since bitmap->migration was found false above."

Ok)

> 
> Hopefully this clarifies that migration was "set false" means not when
> it was assigned false, but was "found to be set to false".
> 
> Sorry about my language.
> 
> (Well, not *my* language, but the one I happen to speak.)
> 
>>>      If needed, these bitmaps will be reloaded on invalidation.
>>>
>>> I just wanted to clarify these points:
>>>
>>> - The new boolean obviously means we don't /always/ drop them
>>> - qcow2_inactivate is the only caller that instructs us to drop them
>>> - both cases in the comment are through qcow2_inactivate.
>>
>> Agree.
>>
>>>
>>> you can take any or none of my wording suggestions as you feel is
>>> appropriate.
>>
>> Thank you! And for reviewing this complicated patch! I know, two words "Also"
>> in commit message are never followed by good clean design, and this is not an
>> exclusion.
>>
> 
> Thanks for tackling it. It's always nice to see clear examples of
> limitations of the current infrastructure.
> 
> (Well, not "nice" but certainly "helpful".)
> 
>>>
>>>> +            if (bm->dirty_bitmap == NULL) {
>>>> +                continue;
>>>> +            }
>>>>    
>>>> -        bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>>>> +            bdrv_release_dirty_bitmap(bs, bm->dirty_bitmap);
>>>> +        }
>>>>        }
>>>>    
>>>>    success:
>>>> @@ -1600,7 +1685,7 @@ int qcow2_reopen_bitmaps_ro(BlockDriverState *bs, Error **errp)
>>>>        BdrvDirtyBitmap *bitmap;
>>>>        Error *local_err = NULL;
>>>>    
>>>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>>>> +    qcow2_store_persistent_dirty_bitmaps(bs, false, &local_err);
>>>>        if (local_err != NULL) {
>>>>            error_propagate(errp, local_err);
>>>>            return -EINVAL;
>>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>>> index d39882785d..f0a8479874 100644
>>>> --- a/block/qcow2.c
>>>> +++ b/block/qcow2.c
>>>> @@ -2273,7 +2273,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>>>>        int ret, result = 0;
>>>>        Error *local_err = NULL;
>>>>    
>>>> -    qcow2_store_persistent_dirty_bitmaps(bs, &local_err);
>>>> +    qcow2_store_persistent_dirty_bitmaps(bs, true, &local_err);
>>>>        if (local_err != NULL) {
>>>>            result = -EINVAL;
>>>>            error_reportf_err(local_err, "Lost persistent bitmaps during "
>>>> diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
>>>> index 36712689d3..e8b0c9d4bb 100755
>>>> --- a/tests/qemu-iotests/255
>>>> +++ b/tests/qemu-iotests/255
>>>> @@ -79,3 +79,5 @@ def test(persistent, restart):
>>>>    
>>>>    
>>>>    test(persistent=False, restart=False)
>>>> +test(persistent=True, restart=False)
>>>> +test(persistent=True, restart=True)
>>>> diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
>>>> index 2bffb486d2..46e2e3ad4e 100644
>>>> --- a/tests/qemu-iotests/255.out
>>>> +++ b/tests/qemu-iotests/255.out
>>>> @@ -15,3 +15,38 @@ check, that no bitmaps in snapshot: not found
>>>>    {"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>>    check merged bitmap: name=bitmap0 dirty-clusters=2
>>>>    check updated bitmap: name=bitmap0 dirty-clusters=3
>>>> +
>>>> +Testcase persistent without restart
>>>> +
>>>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>>>> +{"return": {}}
>>>> +initial bitmap: name=bitmap0 dirty-clusters=1
>>>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>>>> +{"return": {}}
>>>> +check, that no bitmaps in snapshot: not found
>>>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>>>> +{"return": {}}
>>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>>>> +{"return": {}}
>>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>>>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>>>> +
>>>> +Testcase persistent with restart
>>>> +
>>>> +{"execute": "block-dirty-bitmap-add", "arguments": {"name": "bitmap0", "node": "drive0", "persistent": true}}
>>>> +{"return": {}}
>>>> +initial bitmap: name=bitmap0 dirty-clusters=1
>>>> +{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "format": "qcow2", "snapshot-file": "TEST_DIR/PID-top"}}
>>>> +{"return": {}}
>>>> +check, that no bitmaps in snapshot: not found
>>>> +... Restart ...
>>>> +{"execute": "block-commit", "arguments": {"device": "drive0", "top": "TEST_DIR/PID-top"}}
>>>> +{"return": {}}
>>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>> +{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
>>>> +{"return": {}}
>>>> +{"data": {"device": "drive0", "len": 65536, "offset": 65536, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>>>> +check merged bitmap: name=bitmap0 dirty-clusters=2
>>>> +check updated bitmap: name=bitmap0 dirty-clusters=3
>>>>
>>>
>>
>>


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-30  8:23         ` Vladimir Sementsov-Ogievskiy
@ 2019-05-30 11:54           ` Vladimir Sementsov-Ogievskiy
  2019-05-30 14:20           ` John Snow
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-30 11:54 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: fam, kwolf, Denis Lunev, mreitz

30.05.2019 11:23, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 21:08, John Snow wrote:
>> Max has picked this thread up for block discussion, so I'm going to
>> stick to slightly more bitmap related discussion here; we'll resume
>> block discussion in the other tail of this thread.
>>


[..]

>>>> And we mark it as inconsistent because we're not sure how we missed it
>>>> earlier. OK.
>>>>
>>>>> +                }
>>>>> +            } else if (!bdrv_dirty_bitmap_readonly(bitmap)) {
>>>>> +                corruption =
>>>>> +                    "Corruption: bitmap '%s' is not marked IN_USE in the "
>>>>> +                    "image '%s' and not marked readonly in RAM. Will try to "
>>>>> +                    "set IN_USE flag.";
>>>>> +
>>>>
>>>> And in this case, we find the bitmap but it's not marked readonly for
>>>> some reason.
>>>>
>>>>> +                bdrv_dirty_bitmap_set_readonly(bitmap, true);
>>>>
>>>> Why set it readonly again?
>>>
>>> It is because inconsistance is not synced to the image. "readonly" exactly
>>> means, that for some reasons we did not marked bitmap IN_USE in the image and
>>> therefore must not write to it.
>>>
>>
>> That's one way of looking at what readonly means. Another was: "The
>> image this bitmap is attached to is read only, and any writes to this
>> bitmap are a logistical error."
>>
>>> So, yes, here occurs new thing: readonly-inconsistent bitmap. It blocks guest
>>> writes until we sync it somehow to the image or remove. And we are going to sync
>>> it at the end of this function.
>>>
>>
>> Right, we've not really used readonly in this way before. It makes sense
>> to a point, but it's a bit of a semantic overload -- the disk is
>> actually RW but the bitmap is RO; the problem that I have with this is
>> that we guard RO bitmaps with assertions and not errors,
> 
> Oops, you are right. I thought we have errors for guest writes in this case.

Aha, I was (partially) right, we have in bdrv_aligned_pwritev:
     if (bdrv_has_readonly_bitmaps(bs)) {
           return -EPERM;
       }

but what about discard, ...? seems it's not handled.



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-30  8:23         ` Vladimir Sementsov-Ogievskiy
  2019-05-30 11:54           ` Vladimir Sementsov-Ogievskiy
@ 2019-05-30 14:20           ` John Snow
  2019-05-30 14:39             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: John Snow @ 2019-05-30 14:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: fam, kwolf, Denis Lunev, mreitz


On 5/30/19 4:23 AM, Vladimir Sementsov-Ogievskiy wrote:
> 29.05.2019 21:08, John Snow wrote:
>> On 5/29/19 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.05.2019 2:24, John Snow wrote:
>>>> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:

[...]

>>
>> Right, we've not really used readonly in this way before. It makes sense
>> to a point, but it's a bit of a semantic overload -- the disk is
>> actually RW but the bitmap is RO; the problem that I have with this is
>> that we guard RO bitmaps with assertions and not errors,
> 
> Oops, you are right. I thought we have errors for guest writes in this case.
> 

(I'm looking at your second email) -- Ah, we guard this elsewhere.

Asserts:
bdrv_set_dirty_bitmap_locked
bdrv_reset_dirty_bitmap_locked
bdrv_clear_dirty_bitmap
bdrv_restore_dirty_bitmap
bdrv_set_dirty

Runtime errors (via has_readonly_bitmaps):
bdrv_aligned_pwritev
bdrv_co_pdiscard

I guess it's OK to set readonly in this way then, but I still think it's
a little confusing and, so long as the in-memory inconsistent bit is
set, not strictly necessary. (Unless there's some case I am not aware
of, or it just helps the code to be nicer in some way, etc. I'm not
aiming to be a purist about the way this flag is used.)

>> so it seems
>> feasible to me that the reopen-rw will succeed, a guest will write, and
>> then we'll abort because of these "readonly corrupt" bitmaps.
>>
>> In other words, we don't *really* support the idea of having readonly
>> bitmaps on readwrite nodes.
>>
>> I think given that this is an error state to begin with that simply
>> marking the bitmap as inconsistent in memory (and trying to write the
>> IN_USE flag to its header) is sufficient, it will skip any new writes
>> and prohibit its use for any backup operations.
> 
> So, you propose not to annoy user too much because of inconsistent bitmaps,
> for which we can't sync their inconsistancy to the image.. May be it's OK,
> but let's see what we decide in block-discussion. It would be really cool
> if we can move this all to .prepare
> 

Hm, I wouldn't say it's about not annoying the user too much; we should
still try to write the IN_USE flag to the image and (ideally) report
that we were unable to if we fail.

So I think doing these two things is enough:

1. Set the in-memory inconsistent flag, which will prohibit the user
from doing anything with this bitmap AND ignore all future data writes

2. Attempt to write the IN_USE flag back out to disk to ensure that it
will be loaded inconsistent in the future. If we fail to do so, this
should be considered a fatal error. (Why can't we write to our node?
It's probably EIO or something fatal.)

Of course, actually having #2 be fatal depends on reworking the block
layer a bit.

[...]

>>>>> +    if (need_update) {
>>>>> +        if (!can_write(bs->file->bs)) {
>>>>
>>>> I genuinely don't know: is it legitimate to check your child's write
>>>> permission in this way? will we always have bs->file->bs?
>>>
>>> Hmm.. but we are going to write to it very soon, I think it should exist.
>>>
>>
>> Apparently Max is adding a bdrv_storage_bs() helper for this exact
>> thing, in an upcoming patch. I just get nervous when I see double
>> indirections >
> 
> Hmm... But if we have separate data file for qcow2, bdrv_storage_bs will refer to it,
> so here we need exactly bs->file I think.
> 

Alright, I don't know the particulars. Whatever works is fine by me.

[...]

>>>
>>> Yes. In short, it was bad, it still bad, but at least one bug is fixed :)
>>>
>>
>> Hahaha! Very good summary. Let's discuss the block implications with
>> Max, Berto and Kevin. Until then, do you think it's OK to split out the
>> release_bitmaps boolean as its own patch to "lessen" the severity of the
>> bug?
> 
> Bitmap will not be reopend rw anyway, so, we should crash on first guest write, as
> you noted..
> 

Why not? If the bitmaps are still in memory (because we didn't close
them fully on reopen-ro), then reopen-rw ought to be able to see them
and drop the readonly marker, right?

> Maybe, I'll refactor it back, to return normal error, and just ignore it in commit, so that,
> we'll move it to .prepare as soon as we able to do and with less pain?
> 

I may have misunderstood Max, but at the moment I'm thinking that as
much as you can move into prepare() and have it still work the better. I
assume that writing the IN_USE flags from prepare() won't work, though,
because the nodes aren't technically RW yet, so the write primitives
won't work...

At this point, I'll trust your judgment to come up with something that
seems tidier; I don't think I have suggestions unless I start
prototyping it too.

(Though I might try to come up with something minimally workable as a
bugfix while we try to tackle more systemic issues...)

--js


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic
  2019-05-30 14:20           ` John Snow
@ 2019-05-30 14:39             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-30 14:39 UTC (permalink / raw)
  To: John Snow, qemu-devel, qemu-block; +Cc: fam, kwolf, Denis Lunev, mreitz

30.05.2019 17:20, John Snow wrote:
> 
> On 5/30/19 4:23 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 29.05.2019 21:08, John Snow wrote:
>>> On 5/29/19 5:10 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 29.05.2019 2:24, John Snow wrote:
>>>>> On 5/23/19 11:47 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
>>>
>>> Right, we've not really used readonly in this way before. It makes sense
>>> to a point, but it's a bit of a semantic overload -- the disk is
>>> actually RW but the bitmap is RO; the problem that I have with this is
>>> that we guard RO bitmaps with assertions and not errors,
>>
>> Oops, you are right. I thought we have errors for guest writes in this case.
>>
> 
> (I'm looking at your second email) -- Ah, we guard this elsewhere.
> 
> Asserts:
> bdrv_set_dirty_bitmap_locked
> bdrv_reset_dirty_bitmap_locked
> bdrv_clear_dirty_bitmap
> bdrv_restore_dirty_bitmap
> bdrv_set_dirty
> 
> Runtime errors (via has_readonly_bitmaps):
> bdrv_aligned_pwritev
> bdrv_co_pdiscard
> 
> I guess it's OK to set readonly in this way then, but I still think it's
> a little confusing and, so long as the in-memory inconsistent bit is
> set, not strictly necessary. (Unless there's some case I am not aware
> of, or it just helps the code to be nicer in some way, etc. I'm not
> aiming to be a purist about the way this flag is used.)
> 
>>> so it seems
>>> feasible to me that the reopen-rw will succeed, a guest will write, and
>>> then we'll abort because of these "readonly corrupt" bitmaps.
>>>
>>> In other words, we don't *really* support the idea of having readonly
>>> bitmaps on readwrite nodes.
>>>
>>> I think given that this is an error state to begin with that simply
>>> marking the bitmap as inconsistent in memory (and trying to write the
>>> IN_USE flag to its header) is sufficient, it will skip any new writes
>>> and prohibit its use for any backup operations.
>>
>> So, you propose not to annoy user too much because of inconsistent bitmaps,
>> for which we can't sync their inconsistancy to the image.. May be it's OK,
>> but let's see what we decide in block-discussion. It would be really cool
>> if we can move this all to .prepare
>>
> 
> Hm, I wouldn't say it's about not annoying the user too much; we should
> still try to write the IN_USE flag to the image and (ideally) report
> that we were unable to if we fail.
> 
> So I think doing these two things is enough:
> 
> 1. Set the in-memory inconsistent flag, which will prohibit the user
> from doing anything with this bitmap AND ignore all future data writes
> 
> 2. Attempt to write the IN_USE flag back out to disk to ensure that it
> will be loaded inconsistent in the future. If we fail to do so, this
> should be considered a fatal error. (Why can't we write to our node?
> It's probably EIO or something fatal.)
> 
> Of course, actually having #2 be fatal depends on reworking the block
> layer a bit.
> 
> [...]
> 
>>>>>> +    if (need_update) {
>>>>>> +        if (!can_write(bs->file->bs)) {
>>>>>
>>>>> I genuinely don't know: is it legitimate to check your child's write
>>>>> permission in this way? will we always have bs->file->bs?
>>>>
>>>> Hmm.. but we are going to write to it very soon, I think it should exist.
>>>>
>>>
>>> Apparently Max is adding a bdrv_storage_bs() helper for this exact
>>> thing, in an upcoming patch. I just get nervous when I see double
>>> indirections >
>>
>> Hmm... But if we have separate data file for qcow2, bdrv_storage_bs will refer to it,
>> so here we need exactly bs->file I think.
>>
> 
> Alright, I don't know the particulars. Whatever works is fine by me.
> 
> [...]
> 
>>>>
>>>> Yes. In short, it was bad, it still bad, but at least one bug is fixed :)
>>>>
>>>
>>> Hahaha! Very good summary. Let's discuss the block implications with
>>> Max, Berto and Kevin. Until then, do you think it's OK to split out the
>>> release_bitmaps boolean as its own patch to "lessen" the severity of the
>>> bug?
>>
>> Bitmap will not be reopend rw anyway, so, we should crash on first guest write, as
>> you noted..
>>
> 
> Why not? If the bitmaps are still in memory (because we didn't close
> them fully on reopen-ro), then reopen-rw ought to be able to see them
> and drop the readonly marker, right?
> 
>> Maybe, I'll refactor it back, to return normal error, and just ignore it in commit, so that,
>> we'll move it to .prepare as soon as we able to do and with less pain?
>>
> 
> I may have misunderstood Max, but at the moment I'm thinking that as
> much as you can move into prepare() and have it still work the better. I
> assume that writing the IN_USE flags from prepare() won't work, though,
> because the nodes aren't technically RW yet, so the write primitives
> won't work...
> 
> At this point, I'll trust your judgment to come up with something that
> seems tidier; I don't think I have suggestions unless I start
> prototyping it too.
> 
> (Though I might try to come up with something minimally workable as a
> bugfix while we try to tackle more systemic issues...)
> 
> --js
> 

I'm now preparing v2 which tries to do everything we need in prepare, which includes
changes in generic block layer reopen functionality. Seems like it's not very hard.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-05-30 14:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23 15:47 [Qemu-devel] [PATCH 0/3] qcow2-bitmaps: rewrite reopening logic Vladimir Sementsov-Ogievskiy
2019-05-23 15:47 ` [Qemu-devel] [PATCH 1/3] iotests: add test 255 to check bitmap life after snapshot + commit Vladimir Sementsov-Ogievskiy
2019-05-24 23:15   ` John Snow
2019-05-25  9:16     ` Vladimir Sementsov-Ogievskiy
2019-05-23 15:47 ` [Qemu-devel] [PATCH 2/3] block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps Vladimir Sementsov-Ogievskiy
2019-05-24 23:37   ` John Snow
2019-05-23 15:47 ` [Qemu-devel] [PATCH 3/3] block/qcow2-bitmap: rewrite bitmap reopening logic Vladimir Sementsov-Ogievskiy
2019-05-28 23:24   ` John Snow
2019-05-29  9:10     ` Vladimir Sementsov-Ogievskiy
2019-05-29 18:08       ` John Snow
2019-05-30  8:23         ` Vladimir Sementsov-Ogievskiy
2019-05-30 11:54           ` Vladimir Sementsov-Ogievskiy
2019-05-30 14:20           ` John Snow
2019-05-30 14:39             ` Vladimir Sementsov-Ogievskiy
2019-05-29 15:33   ` Max Reitz
2019-05-29 15:58     ` Vladimir Sementsov-Ogievskiy
2019-05-29 18:18       ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.