All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
@ 2023-01-12 19:14 Kevin Wolf
  2023-01-12 19:14 ` [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path Kevin Wolf
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Kevin Wolf @ 2023-01-12 19:14 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, aesteve, nsoffer, qemu-devel

This series addresses the problem described in these bug reports:
https://gitlab.com/qemu-project/qemu/-/issues/1330
https://bugzilla.redhat.com/show_bug.cgi?id=2147617

qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
However, when the function is called through blk_unref(), in the case of
such errors, while an error message is written to stderr, the callers
never see an error return. Specifically, 'qemu-img bitmap/commit' are
reported to exit with an exit code 0 despite the errors.

The solution taken here is inactivating the images first, which can
still return errors, but already performs all of the write operations.
Only then the images are actually blk_unref()-ed.

If we agree that this is the way to go (as a potential alternative,
allowing blk_unref() to fail would require changes in all kinds of
places, many of which probably wouldn't even know what to do with the
error), then I suppose doing the same for other qemu-img subcommands
would make sense, too.

As a bonus fix, I found an endianness confusion in an error path of
store_bitmap(). The reported error message "qcow2_free_clusters failed:
No space left on device" looked too suspicious to ignore this. Freeing
an actually existing cluster should never run into ENOSPC.

Kevin Wolf (4):
  qcow2: Fix theoretical corruption in store_bitmap() error path
  qemu-img commit: Report errors while closing the image
  qemu-img bitmap: Report errors while closing the image
  qemu-iotests: Test qemu-img bitmap/commit exit code on error

 block/qcow2-bitmap.c                          |  5 +-
 qemu-img.c                                    | 24 +++++
 .../qemu-iotests/tests/qemu-img-close-errors  | 95 +++++++++++++++++++
 .../tests/qemu-img-close-errors.out           | 23 +++++
 4 files changed, 145 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors
 create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out

-- 
2.38.1



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
@ 2023-01-12 19:14 ` Kevin Wolf
  2023-01-13  7:30   ` Philippe Mathieu-Daudé
  2023-01-12 19:14 ` [PATCH 2/4] qemu-img commit: Report errors while closing the image Kevin Wolf
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2023-01-12 19:14 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, aesteve, nsoffer, qemu-devel

In order to write the bitmap table to the image file, it is converted to
big endian. If the write fails, it is passed to clear_bitmap_table() to
free all of the clusters it had allocated before. However, if we don't
convert it back to native endianness first, we'll free things at a wrong
offset.

In practical terms, the offsets will be so high that we won't actually
free any allocated clusters, but just run into an error, but in theory
this can cause image corruption.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-bitmap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index bcad567c0c..3dff99ba06 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -115,7 +115,7 @@ static int update_header_sync(BlockDriverState *bs)
     return bdrv_flush(bs->file->bs);
 }
 
-static inline void bitmap_table_to_be(uint64_t *bitmap_table, size_t size)
+static inline void bitmap_table_bswap_be(uint64_t *bitmap_table, size_t size)
 {
     size_t i;
 
@@ -1401,9 +1401,10 @@ static int store_bitmap(BlockDriverState *bs, Qcow2Bitmap *bm, Error **errp)
         goto fail;
     }
 
-    bitmap_table_to_be(tb, tb_size);
+    bitmap_table_bswap_be(tb, tb_size);
     ret = bdrv_pwrite(bs->file, tb_offset, tb_size * sizeof(tb[0]), tb, 0);
     if (ret < 0) {
+        bitmap_table_bswap_be(tb, tb_size);
         error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
                          bm_name);
         goto fail;
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] qemu-img commit: Report errors while closing the image
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
  2023-01-12 19:14 ` [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path Kevin Wolf
@ 2023-01-12 19:14 ` Kevin Wolf
  2023-01-12 19:14 ` [PATCH 3/4] qemu-img bitmap: " Kevin Wolf
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Kevin Wolf @ 2023-01-12 19:14 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, aesteve, nsoffer, qemu-devel

blk_unref() can't report any errors that happen while closing the image.
For example, if qcow2 hits an -ENOSPC error while writing out dirty
bitmaps when it's closed, it prints error messages to stderr, but
'qemu-img commit' won't see any error return value and will therefore
look successful with exit code 0.

In order to fix this, manually inactivate the image first before calling
blk_unref(). This already performs the operations that would be most
likely to fail while closing the image, but it can still return errors.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 439d8de1e3..22d6ecefd5 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -449,6 +449,11 @@ static BlockBackend *img_open(bool image_opts,
         blk = img_open_file(filename, NULL, fmt, flags, writethrough, quiet,
                             force_share);
     }
+
+    if (blk) {
+        blk_set_force_allow_inactivate(blk);
+    }
+
     return blk;
 }
 
@@ -1119,6 +1124,14 @@ unref_backing:
 done:
     qemu_progress_end();
 
+    /*
+     * Manually inactivate the image first because this way we can know whether
+     * an error occurred. blk_unref() doesn't tell us about failures.
+     */
+    ret = bdrv_inactivate_all();
+    if (ret < 0 && !local_err) {
+        error_setg_errno(&local_err, -ret, "Error while closing the image");
+    }
     blk_unref(blk);
 
     if (local_err) {
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] qemu-img bitmap: Report errors while closing the image
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
  2023-01-12 19:14 ` [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path Kevin Wolf
  2023-01-12 19:14 ` [PATCH 2/4] qemu-img commit: Report errors while closing the image Kevin Wolf
@ 2023-01-12 19:14 ` Kevin Wolf
  2023-01-13  7:32   ` Philippe Mathieu-Daudé
  2023-01-12 19:14 ` [PATCH 4/4] qemu-iotests: Test qemu-img bitmap/commit exit code on error Kevin Wolf
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2023-01-12 19:14 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, aesteve, nsoffer, qemu-devel

blk_unref() can't report any errors that happen while closing the image.
For example, if qcow2 hits an -ENOSPC error while writing out dirty
bitmaps when it's closed, it prints error messages to stderr, but
'qemu-img bitmap' won't see any error return value and will therefore
look successful with exit code 0.

In order to fix this, manually inactivate the image first before calling
blk_unref(). This already performs the operations that would be most
likely to fail while closing the image, but it can still return errors.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1330
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 22d6ecefd5..c3671d4890 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4645,6 +4645,7 @@ static int img_bitmap(int argc, char **argv)
     QSIMPLEQ_HEAD(, ImgBitmapAction) actions;
     ImgBitmapAction *act, *act_next;
     const char *op;
+    int inactivate_ret;
 
     QSIMPLEQ_INIT(&actions);
 
@@ -4829,6 +4830,16 @@ static int img_bitmap(int argc, char **argv)
     ret = 0;
 
  out:
+    /*
+     * Manually inactivate the images first because this way we can know whether
+     * an error occurred. blk_unref() doesn't tell us about failures.
+     */
+    inactivate_ret = bdrv_inactivate_all();
+    if (inactivate_ret < 0) {
+        error_report("Error while closing the image: %s", strerror(-inactivate_ret));
+        ret = 1;
+    }
+
     blk_unref(src);
     blk_unref(blk);
     qemu_opts_del(opts);
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] qemu-iotests: Test qemu-img bitmap/commit exit code on error
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
                   ` (2 preceding siblings ...)
  2023-01-12 19:14 ` [PATCH 3/4] qemu-img bitmap: " Kevin Wolf
@ 2023-01-12 19:14 ` Kevin Wolf
  2023-01-13  7:30 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Markus Armbruster
  2023-01-17 11:24 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Hanna Czenczek
  5 siblings, 0 replies; 17+ messages in thread
From: Kevin Wolf @ 2023-01-12 19:14 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, hreitz, aesteve, nsoffer, qemu-devel

This tests that when an error happens while writing back bitmaps to the
image file in qcow2_inactivate(), 'qemu-img bitmap/commit' actually
return an error value in their exit code instead of making the operation
look successful to scripts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 .../qemu-iotests/tests/qemu-img-close-errors  | 95 +++++++++++++++++++
 .../tests/qemu-img-close-errors.out           | 23 +++++
 2 files changed, 118 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors
 create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out

diff --git a/tests/qemu-iotests/tests/qemu-img-close-errors b/tests/qemu-iotests/tests/qemu-img-close-errors
new file mode 100755
index 0000000000..662f726e5d
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-close-errors
@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Check that errors while closing the image, in particular writing back dirty
+# bitmaps, is correctly reported with a failing qemu-img exit code.
+#
+# Copyright (C) 2023 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kwolf@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto generic
+
+size=1G
+
+# The error we are going to use is ENOSPC. Depending on how many bitmaps we
+# create in the backing file (and therefore increase the used up space), we get
+# failures in different places. With a low number, only merging the bitmap
+# fails, whereas with a higher number, already 'qemu-img commit' fails.
+for max_bitmap in 6 7; do
+    echo
+    echo "=== Test with $max_bitmap bitmaps ==="
+
+    TEST_IMG="$TEST_IMG.base" _make_test_img -q $size
+    for i in $(seq 1 $max_bitmap); do
+        $QEMU_IMG bitmap --add "$TEST_IMG.base" "stale-bitmap-$i"
+    done
+
+    # Simulate a block device of 128 MB by resizing the image file accordingly
+    # and then enforcing the size with the raw driver
+    truncate "$TEST_IMG.base" --size 128M
+    BASE_JSON='json:{
+        "driver": "qcow2",
+        "file": {
+            "driver": "raw",
+            "size": 134217728,
+            "file": {
+                "driver": "file",
+                "filename":"'"$TEST_IMG.base"'"
+            }
+        }
+    }'
+
+    _make_test_img -q -b "$BASE_JSON" -F $IMGFMT
+    $QEMU_IMG bitmap --add "$TEST_IMG" "good-bitmap"
+
+    $QEMU_IO -c 'write 0 126m' "$TEST_IMG" | _filter_qemu_io
+
+    $QEMU_IMG commit -d "$TEST_IMG" 2>&1 | _filter_generated_node_ids
+    echo "qemu-img commit exit code: ${PIPESTATUS[0]}"
+
+    $QEMU_IMG bitmap --add "$BASE_JSON" "good-bitmap"
+    echo "qemu-img bitmap --add exit code: $?"
+
+    $QEMU_IMG bitmap --merge "good-bitmap" -b "$TEST_IMG" "$BASE_JSON" \
+        "good-bitmap" 2>&1 | _filter_generated_node_ids
+    echo "qemu-img bitmap --merge exit code:  ${PIPESTATUS[0]}"
+done
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/tests/qemu-img-close-errors.out b/tests/qemu-iotests/tests/qemu-img-close-errors.out
new file mode 100644
index 0000000000..1bfe88f176
--- /dev/null
+++ b/tests/qemu-iotests/tests/qemu-img-close-errors.out
@@ -0,0 +1,23 @@
+QA output created by qemu-img-close-errors
+
+=== Test with 6 bitmaps ===
+wrote 132120576/132120576 bytes at offset 0
+126 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Image committed.
+qemu-img commit exit code: 0
+qemu-img bitmap --add exit code: 0
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'good-bitmap' to file: No space left on device
+qemu-img: Error while closing the image: Invalid argument
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'good-bitmap' to file: No space left on device
+qemu-img bitmap --merge exit code:  1
+
+=== Test with 7 bitmaps ===
+wrote 132120576/132120576 bytes at offset 0
+126 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
+qemu-img: Lost persistent bitmaps during inactivation of node 'NODE_NAME': Failed to write bitmap 'stale-bitmap-7' to file: No space left on device
+qemu-img: Error while closing the image: Invalid argument
+qemu-img commit exit code: 1
+qemu-img bitmap --add exit code: 0
+qemu-img bitmap --merge exit code:  0
+*** done
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path
  2023-01-12 19:14 ` [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path Kevin Wolf
@ 2023-01-13  7:30   ` Philippe Mathieu-Daudé
  2023-01-13 10:45     ` Kevin Wolf
  0 siblings, 1 reply; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-01-13  7:30 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: hreitz, aesteve, nsoffer, qemu-devel

On 12/1/23 20:14, Kevin Wolf wrote:
> In order to write the bitmap table to the image file, it is converted to
> big endian. If the write fails, it is passed to clear_bitmap_table() to
> free all of the clusters it had allocated before. However, if we don't
> convert it back to native endianness first, we'll free things at a wrong
> offset.
> 
> In practical terms, the offsets will be so high that we won't actually
> free any allocated clusters, but just run into an error, but in theory
> this can cause image corruption.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block/qcow2-bitmap.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> index bcad567c0c..3dff99ba06 100644
> --- a/block/qcow2-bitmap.c
> +++ b/block/qcow2-bitmap.c
> @@ -115,7 +115,7 @@ static int update_header_sync(BlockDriverState *bs)
>       return bdrv_flush(bs->file->bs);
>   }
>   

Maybe add a comment here remembering to bswap back to native endianness?

> -static inline void bitmap_table_to_be(uint64_t *bitmap_table, size_t size)
> +static inline void bitmap_table_bswap_be(uint64_t *bitmap_table, size_t size)
>   {

This function uses cpu_to_be64(), semantically we convert back calling
be64_to_cpu(), but technically both functions end up being the same.

Alternatively:

      for (i = 0; i < size; ++i) {
-        bitmap_table[i] = cpu_to_be64(bitmap_table[i]);
+        bswap64s(&bitmap_table[i]);
      }

> @@ -1401,9 +1401,10 @@ static int store_bitmap(BlockDriverState *bs, Qcow2Bitmap *bm, Error **errp)
>           goto fail;
>       }
>   
> -    bitmap_table_to_be(tb, tb_size);
> +    bitmap_table_bswap_be(tb, tb_size);
>       ret = bdrv_pwrite(bs->file, tb_offset, tb_size * sizeof(tb[0]), tb, 0);
>       if (ret < 0) {
> +        bitmap_table_bswap_be(tb, tb_size);
>           error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
>                            bm_name);
>           goto fail;

Pre-existing, but consider using g_autofree for 'tb'.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
                   ` (3 preceding siblings ...)
  2023-01-12 19:14 ` [PATCH 4/4] qemu-iotests: Test qemu-img bitmap/commit exit code on error Kevin Wolf
@ 2023-01-13  7:30 ` Markus Armbruster
  2023-01-13 11:29   ` Kevin Wolf
  2023-01-17 11:24 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Hanna Czenczek
  5 siblings, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2023-01-13  7:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel

Drive-by comment...

Kevin Wolf <kwolf@redhat.com> writes:

> This series addresses the problem described in these bug reports:
> https://gitlab.com/qemu-project/qemu/-/issues/1330
> https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>
> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> However, when the function is called through blk_unref(), in the case of
> such errors, while an error message is written to stderr, the callers
> never see an error return. Specifically, 'qemu-img bitmap/commit' are
> reported to exit with an exit code 0 despite the errors.

After having tead the "potential alternative" below, I figure this
failure happens within blk_unref().  But I can't see a call chain.  Am I
confused?

> The solution taken here is inactivating the images first, which can
> still return errors, but already performs all of the write operations.
> Only then the images are actually blk_unref()-ed.
>
> If we agree that this is the way to go (as a potential alternative,
> allowing blk_unref() to fail would require changes in all kinds of
> places, many of which probably wouldn't even know what to do with the
> error),

blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
Correct?

We have a bunch of "unref" functions in the tree, and, as far as I can
tell from a quick grep, none of them can fail.  Supports your apparent
preference for not changing blk_unref().

>         then I suppose doing the same for other qemu-img subcommands
> would make sense, too.

I was about to ask whether there could be more silent failures like the
ones in commit and bitmap.  This suggests there are.

Say we do the same for all known such failures.  Would any remaining (or
new) such failures be programming errors?

[...]



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] qemu-img bitmap: Report errors while closing the image
  2023-01-12 19:14 ` [PATCH 3/4] qemu-img bitmap: " Kevin Wolf
@ 2023-01-13  7:32   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-01-13  7:32 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: hreitz, aesteve, nsoffer, qemu-devel

On 12/1/23 20:14, Kevin Wolf wrote:
> blk_unref() can't report any errors that happen while closing the image.
> For example, if qcow2 hits an -ENOSPC error while writing out dirty
> bitmaps when it's closed, it prints error messages to stderr, but
> 'qemu-img bitmap' won't see any error return value and will therefore
> look successful with exit code 0.
> 
> In order to fix this, manually inactivate the image first before calling
> blk_unref(). This already performs the operations that would be most
> likely to fail while closing the image, but it can still return errors.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1330
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qemu-img.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path
  2023-01-13  7:30   ` Philippe Mathieu-Daudé
@ 2023-01-13 10:45     ` Kevin Wolf
  2023-01-13 17:37       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2023-01-13 10:45 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel

Am 13.01.2023 um 08:30 hat Philippe Mathieu-Daudé geschrieben:
> On 12/1/23 20:14, Kevin Wolf wrote:
> > In order to write the bitmap table to the image file, it is converted to
> > big endian. If the write fails, it is passed to clear_bitmap_table() to
> > free all of the clusters it had allocated before. However, if we don't
> > convert it back to native endianness first, we'll free things at a wrong
> > offset.
> > 
> > In practical terms, the offsets will be so high that we won't actually
> > free any allocated clusters, but just run into an error, but in theory
> > this can cause image corruption.
> > 
> > Cc: qemu-stable@nongnu.org
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   block/qcow2-bitmap.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
> > index bcad567c0c..3dff99ba06 100644
> > --- a/block/qcow2-bitmap.c
> > +++ b/block/qcow2-bitmap.c
> > @@ -115,7 +115,7 @@ static int update_header_sync(BlockDriverState *bs)
> >       return bdrv_flush(bs->file->bs);
> >   }
> 
> Maybe add a comment here remembering to bswap back to native endianness?
> 
> > -static inline void bitmap_table_to_be(uint64_t *bitmap_table, size_t size)
> > +static inline void bitmap_table_bswap_be(uint64_t *bitmap_table, size_t size)
> >   {
> 
> This function uses cpu_to_be64(), semantically we convert back calling
> be64_to_cpu(), but technically both functions end up being the same.

Yes, but we don't seem to have any public "neutral" functions, it's
always either from or to.

> Alternatively:
> 
>      for (i = 0; i < size; ++i) {
> -        bitmap_table[i] = cpu_to_be64(bitmap_table[i]);
> +        bswap64s(&bitmap_table[i]);
>      }

Doesn't that swap even on big endian hosts, resulting incorrectly in a
little endian table?

The closest thing we have that I can see is the be_bswap() macro in
bswap.h, but it's undefined again at the end of the header.

> > @@ -1401,9 +1401,10 @@ static int store_bitmap(BlockDriverState *bs, Qcow2Bitmap *bm, Error **errp)
> >           goto fail;
> >       }
> > -    bitmap_table_to_be(tb, tb_size);
> > +    bitmap_table_bswap_be(tb, tb_size);
> >       ret = bdrv_pwrite(bs->file, tb_offset, tb_size * sizeof(tb[0]), tb, 0);
> >       if (ret < 0) {
> > +        bitmap_table_bswap_be(tb, tb_size);
> >           error_setg_errno(errp, -ret, "Failed to write bitmap '%s' to file",
> >                            bm_name);
> >           goto fail;
> 
> Pre-existing, but consider using g_autofree for 'tb'.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Thanks!

Kevin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-01-13  7:30 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Markus Armbruster
@ 2023-01-13 11:29   ` Kevin Wolf
  2023-02-14 20:09     ` Vladimir Sementsov-Ogievskiy
  2023-02-15 13:07     ` Markus Armbruster
  0 siblings, 2 replies; 17+ messages in thread
From: Kevin Wolf @ 2023-01-13 11:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel, vsementsov

Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
> Drive-by comment...
> 
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > This series addresses the problem described in these bug reports:
> > https://gitlab.com/qemu-project/qemu/-/issues/1330
> > https://bugzilla.redhat.com/show_bug.cgi?id=2147617
> >
> > qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> > However, when the function is called through blk_unref(), in the case of
> > such errors, while an error message is written to stderr, the callers
> > never see an error return. Specifically, 'qemu-img bitmap/commit' are
> > reported to exit with an exit code 0 despite the errors.
> 
> After having tead the "potential alternative" below, I figure this
> failure happens within blk_unref().  But I can't see a call chain.  Am I
> confused?

When I put an abort() into the error path:

#0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
#2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
#3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705
#4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741
#5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770
#6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
#7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
#8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
#9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207
#10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895
#11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479
#12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537
#13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820
#14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450

> > The solution taken here is inactivating the images first, which can
> > still return errors, but already performs all of the write operations.
> > Only then the images are actually blk_unref()-ed.
> >
> > If we agree that this is the way to go (as a potential alternative,
> > allowing blk_unref() to fail would require changes in all kinds of
> > places, many of which probably wouldn't even know what to do with the
> > error),
> 
> blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
> Correct?

I think so, yes.

> We have a bunch of "unref" functions in the tree, and, as far as I can
> tell from a quick grep, none of them can fail.  Supports your apparent
> preference for not changing blk_unref().
> 
> >         then I suppose doing the same for other qemu-img subcommands
> > would make sense, too.
> 
> I was about to ask whether there could be more silent failures like the
> ones in commit and bitmap.  This suggests there are.
> 
> Say we do the same for all known such failures.  Would any remaining (or
> new) such failures be programming errors?

Let's be honest: What I'm proposing here is not pretty and not a full
solution, it only covers the simplest part of the problem, which happens
to be the part that has shown up in practice.

If you have a good idea how to solve the general problem, I'm all ears.

I haven't checked other qemu-img subcommands, but I don't see why they
wouldn't be able to run into an error in .bdrv_close. They could be
fixed the same way.

The next level in difficulty might be QMP block-delete. It's still easy
because like in qemu-img, we know that we're freeing the last reference,
and so we could actually do the same here. Well, more or less the same
at least: Obviously not inactivate_all(), but just for a single node. We
also need to do this recursively for children, except only for those
that would actually go away together with our parent node and aren't
referenced elsewhere. Even if we manage to implement this correctly,
what do we do with the error? Would returning a QMP error imply that we
didn't actually close the image and it's still valid (and not
inactivated)?

Too easy? Let's make it a bit harder. Let's say a commit job completes
and we're now removing the intermediate nodes. One of these images could
in theory fail in .bdrv_close. We have successfully committed the data,
the new graph is ready and in good state. Just one of the old images
we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
report that error? We don't even necessarily have a QMP command here, we
could only let the whole block job fail, which is probably not a good
way to let libvirt know what was happening. Also, we can't just
unconditionally inactivate the image beforehand there, it might still be
in use by other references.  Which may actually be dropped while we're
draining the node in bdrv_close().

Not enough headaches yet? There are plenty of places in QEMU that just
want to make sure that the node doesn't go away while they are still
doing something with it. So they use a bdrv_ref/unref pair locally.
These places could end up freeing the last reference if the node would
have gone away otherwise. They are almost certainly a very confusing
place to report the error. They might not even be places that can return
errors at all currently.

So the main reason why I'm not doing this properly by returning the
errors from qcow2_close() (and .bdrv_close in all other drivers) through
bdrv_unref() down to the callers of that is not only that it would be a
major conversion that would touch lots of places, but also that I
wouldn't even know what to do with the error in most callers. And that
I'm not sure what the semantics of an error in a close function should
be.

Another thing that could be tried is making failure in .bdrv_close less
likely by doing things earlier. At least ENOSPC could probably be
avoided if dirty bitmaps clusters were allocated during the write
request that first sets a bit in them (I know too little about the
details how bitmaps are implemented in qcow2, though, maybe Vladimir can
help here). But ultimately, you'll always get some I/O requests in
.bdrv_close and they could fail even if we made it less likely.

Kevin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path
  2023-01-13 10:45     ` Kevin Wolf
@ 2023-01-13 17:37       ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 17+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-01-13 17:37 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel

On 13/1/23 11:45, Kevin Wolf wrote:
> Am 13.01.2023 um 08:30 hat Philippe Mathieu-Daudé geschrieben:
>> On 12/1/23 20:14, Kevin Wolf wrote:
>>> In order to write the bitmap table to the image file, it is converted to
>>> big endian. If the write fails, it is passed to clear_bitmap_table() to
>>> free all of the clusters it had allocated before. However, if we don't
>>> convert it back to native endianness first, we'll free things at a wrong
>>> offset.
>>>
>>> In practical terms, the offsets will be so high that we won't actually
>>> free any allocated clusters, but just run into an error, but in theory
>>> this can cause image corruption.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>    block/qcow2-bitmap.c | 5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)

>> Maybe add a comment here remembering to bswap back to native endianness?
>>
>>> -static inline void bitmap_table_to_be(uint64_t *bitmap_table, size_t size)
>>> +static inline void bitmap_table_bswap_be(uint64_t *bitmap_table, size_t size)
>>>    {
>>
>> This function uses cpu_to_be64(), semantically we convert back calling
>> be64_to_cpu(), but technically both functions end up being the same.
> 
> Yes, but we don't seem to have any public "neutral" functions, it's
> always either from or to.
> 
>> Alternatively:
>>
>>       for (i = 0; i < size; ++i) {
>> -        bitmap_table[i] = cpu_to_be64(bitmap_table[i]);
>> +        bswap64s(&bitmap_table[i]);
>>       }
> 
> Doesn't that swap even on big endian hosts, resulting incorrectly in a
> little endian table?

Oops yes you are right... sorry!

> The closest thing we have that I can see is the be_bswap() macro in
> bswap.h, but it's undefined again at the end of the header.

Indeed.

Regards,

Phil.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
                   ` (4 preceding siblings ...)
  2023-01-13  7:30 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Markus Armbruster
@ 2023-01-17 11:24 ` Hanna Czenczek
  5 siblings, 0 replies; 17+ messages in thread
From: Hanna Czenczek @ 2023-01-17 11:24 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: aesteve, nsoffer, qemu-devel

On 12.01.23 20:14, Kevin Wolf wrote:
> This series addresses the problem described in these bug reports:
> https://gitlab.com/qemu-project/qemu/-/issues/1330
> https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>
> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> However, when the function is called through blk_unref(), in the case of
> such errors, while an error message is written to stderr, the callers
> never see an error return. Specifically, 'qemu-img bitmap/commit' are
> reported to exit with an exit code 0 despite the errors.
>
> The solution taken here is inactivating the images first, which can
> still return errors, but already performs all of the write operations.
> Only then the images are actually blk_unref()-ed.

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-01-13 11:29   ` Kevin Wolf
@ 2023-02-14 20:09     ` Vladimir Sementsov-Ogievskiy
  2023-02-15 13:07     ` Markus Armbruster
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-02-14 20:09 UTC (permalink / raw)
  To: Kevin Wolf, Markus Armbruster
  Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel

On 13.01.23 14:29, Kevin Wolf wrote:
> Another thing that could be tried is making failure in .bdrv_close less
> likely by doing things earlier. At least ENOSPC could probably be
> avoided if dirty bitmaps clusters were allocated during the write
> request that first sets a bit in them (I know too little about the
> details how bitmaps are implemented in qcow2, though, maybe Vladimir can
> help here).

That's possible but not trivial :)

Qcow2 does nothing with dirty bitmaps during normal operation. Only on close, it finds all persistent bitmaps and stores them somehow, mostly allocating new clusters on the fly.

So the simplest way look like:

- add generic handler .bitmap_changed in BlockDriver, to handle bitmap change in qcow2 (that's not only write, but may be bitmap_merge opertion).
- in a new handler allocate some clusters to produce a pool for dirty bitmaps saving (will need clusters for bitmap data and metadata (bitmap table, bitmap directory))
- in block/qcow2-bitmap.c switch qcow2_alloc_cluster() to a wrapper, that first tries to get a cluster from the pool and if it's empty fallback to qcow2_alloc_cluster()

Note also, that this will increase fragmentation.

Or, may be more effective would be to preallocate clusters on bitmap creation (and therefore on image resize).

More difficult would be rework the whole code to bind allocated clusters for each persistent dirty bitmap.

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-01-13 11:29   ` Kevin Wolf
  2023-02-14 20:09     ` Vladimir Sementsov-Ogievskiy
@ 2023-02-15 13:07     ` Markus Armbruster
  2023-02-15 20:50       ` Vladimir Sementsov-Ogievskiy
  2023-02-22 12:08       ` Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image) Markus Armbruster
  1 sibling, 2 replies; 17+ messages in thread
From: Markus Armbruster @ 2023-02-15 13:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel, vsementsov

Kevin Wolf <kwolf@redhat.com> writes:

> Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
>> Drive-by comment...
>> 
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > This series addresses the problem described in these bug reports:
>> > https://gitlab.com/qemu-project/qemu/-/issues/1330
>> > https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>> >
>> > qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
>> > However, when the function is called through blk_unref(), in the case of
>> > such errors, while an error message is written to stderr, the callers
>> > never see an error return. Specifically, 'qemu-img bitmap/commit' are
>> > reported to exit with an exit code 0 despite the errors.
>> 
>> After having tead the "potential alternative" below, I figure this
>> failure happens within blk_unref().  But I can't see a call chain.  Am I
>> confused?
>
> When I put an abort() into the error path:
>
> #0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
> #2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
> #3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705
> #4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741
> #5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770
> #6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
> #7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
> #8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
> #9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207
> #10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895
> #11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479
> #12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537
> #13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820
> #14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450

Thanks!

>> > The solution taken here is inactivating the images first, which can
>> > still return errors, but already performs all of the write operations.
>> > Only then the images are actually blk_unref()-ed.
>> >
>> > If we agree that this is the way to go (as a potential alternative,
>> > allowing blk_unref() to fail would require changes in all kinds of
>> > places, many of which probably wouldn't even know what to do with the
>> > error),
>> 
>> blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
>> Correct?
>
> I think so, yes.

Thanks again!

>> We have a bunch of "unref" functions in the tree, and, as far as I can
>> tell from a quick grep, none of them can fail.  Supports your apparent
>> preference for not changing blk_unref().
>> 
>> >         then I suppose doing the same for other qemu-img subcommands
>> > would make sense, too.
>> 
>> I was about to ask whether there could be more silent failures like the
>> ones in commit and bitmap.  This suggests there are.
>> 
>> Say we do the same for all known such failures.  Would any remaining (or
>> new) such failures be programming errors?
>
> Let's be honest: What I'm proposing here is not pretty and not a full
> solution, it only covers the simplest part of the problem, which happens
> to be the part that has shown up in practice.
>
> If you have a good idea how to solve the general problem, I'm all ears.
>
> I haven't checked other qemu-img subcommands, but I don't see why they
> wouldn't be able to run into an error in .bdrv_close. They could be
> fixed the same way.
>
> The next level in difficulty might be QMP block-delete. It's still easy
> because like in qemu-img, we know that we're freeing the last reference,
> and so we could actually do the same here. Well, more or less the same
> at least: Obviously not inactivate_all(), but just for a single node. We
> also need to do this recursively for children, except only for those
> that would actually go away together with our parent node and aren't
> referenced elsewhere. Even if we manage to implement this correctly,
> what do we do with the error? Would returning a QMP error imply that we
> didn't actually close the image and it's still valid (and not
> inactivated)?
>
> Too easy? Let's make it a bit harder. Let's say a commit job completes
> and we're now removing the intermediate nodes. One of these images could
> in theory fail in .bdrv_close. We have successfully committed the data,
> the new graph is ready and in good state. Just one of the old images
> we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
> report that error? We don't even necessarily have a QMP command here, we
> could only let the whole block job fail, which is probably not a good
> way to let libvirt know what was happening. Also, we can't just
> unconditionally inactivate the image beforehand there, it might still be
> in use by other references.  Which may actually be dropped while we're
> draining the node in bdrv_close().
>
> Not enough headaches yet? There are plenty of places in QEMU that just
> want to make sure that the node doesn't go away while they are still
> doing something with it. So they use a bdrv_ref/unref pair locally.
> These places could end up freeing the last reference if the node would
> have gone away otherwise. They are almost certainly a very confusing
> place to report the error. They might not even be places that can return
> errors at all currently.

Yes.

> So the main reason why I'm not doing this properly by returning the
> errors from qcow2_close() (and .bdrv_close in all other drivers) through
> bdrv_unref() down to the callers of that is not only that it would be a
> major conversion that would touch lots of places, but also that I
> wouldn't even know what to do with the error in most callers. And that
> I'm not sure what the semantics of an error in a close function should
> be.

Understand.

> Another thing that could be tried is making failure in .bdrv_close less
> likely by doing things earlier. At least ENOSPC could probably be
> avoided if dirty bitmaps clusters were allocated during the write
> request that first sets a bit in them (I know too little about the
> details how bitmaps are implemented in qcow2, though, maybe Vladimir can
> help here). But ultimately, you'll always get some I/O requests in
> .bdrv_close and they could fail even if we made it less likely.

Let me try to summarize to make sure I understand.

Closing an image can fail for the same reason close() can fail: flushing
caches can fail, and not caching is not an option.

The close is commonly hidden within a bdrv_unref().  It closes when the
last reference goes away.

Sometimes we know which bdrv_unref() will close.  Sometimes we don't.

Some bdrv_unref() callers can report errors sanely.  Others simply
can't.

Some failures to close can be safely ignored, such as closing a
temporary image that is going away anyway.  But it's hard to tell when
this is the case.

Ideally, things fail cleanly: we either do what's asked and succeed, or
do nothing and fail.  A failure to close is commonly unclean.  So, even
if we can report it, recovery can be hard or impossible.


A common criticism of garbage collection is that finalization is delayed
and runs "out of context".  The above shows that reference counting
isn't all that better.

We could have two variants of bdrv_unref(), one that must not fail, and
one that can fail and must be checked.  But as you explained, ensuring
failure only happens in places where we can handle an error sanely is
somewhere between hard and impossible.

No better ideas, I'm afraid.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
  2023-02-15 13:07     ` Markus Armbruster
@ 2023-02-15 20:50       ` Vladimir Sementsov-Ogievskiy
  2023-02-22 12:08       ` Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image) Markus Armbruster
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-02-15 20:50 UTC (permalink / raw)
  To: Markus Armbruster, Kevin Wolf
  Cc: qemu-block, hreitz, aesteve, nsoffer, qemu-devel

On 15.02.23 16:07, Markus Armbruster wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
>>> Drive-by comment...
>>>
>>> Kevin Wolf <kwolf@redhat.com> writes:
>>>
>>>> This series addresses the problem described in these bug reports:
>>>> https://gitlab.com/qemu-project/qemu/-/issues/1330
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>>>>
>>>> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
>>>> However, when the function is called through blk_unref(), in the case of
>>>> such errors, while an error message is written to stderr, the callers
>>>> never see an error return. Specifically, 'qemu-img bitmap/commit' are
>>>> reported to exit with an exit code 0 despite the errors.
>>>
>>> After having tead the "potential alternative" below, I figure this
>>> failure happens within blk_unref().  But I can't see a call chain.  Am I
>>> confused?
>>
>> When I put an abort() into the error path:
>>
>> #0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
>> #1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
>> #2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
>> #3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705
>> #4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741
>> #5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770
>> #6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
>> #7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
>> #8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
>> #9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207
>> #10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895
>> #11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479
>> #12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537
>> #13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820
>> #14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450
> 
> Thanks!
> 
>>>> The solution taken here is inactivating the images first, which can
>>>> still return errors, but already performs all of the write operations.
>>>> Only then the images are actually blk_unref()-ed.
>>>>
>>>> If we agree that this is the way to go (as a potential alternative,
>>>> allowing blk_unref() to fail would require changes in all kinds of
>>>> places, many of which probably wouldn't even know what to do with the
>>>> error),
>>>
>>> blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
>>> Correct?
>>
>> I think so, yes.
> 
> Thanks again!
> 
>>> We have a bunch of "unref" functions in the tree, and, as far as I can
>>> tell from a quick grep, none of them can fail.  Supports your apparent
>>> preference for not changing blk_unref().
>>>
>>>>          then I suppose doing the same for other qemu-img subcommands
>>>> would make sense, too.
>>>
>>> I was about to ask whether there could be more silent failures like the
>>> ones in commit and bitmap.  This suggests there are.
>>>
>>> Say we do the same for all known such failures.  Would any remaining (or
>>> new) such failures be programming errors?
>>
>> Let's be honest: What I'm proposing here is not pretty and not a full
>> solution, it only covers the simplest part of the problem, which happens
>> to be the part that has shown up in practice.
>>
>> If you have a good idea how to solve the general problem, I'm all ears.
>>
>> I haven't checked other qemu-img subcommands, but I don't see why they
>> wouldn't be able to run into an error in .bdrv_close. They could be
>> fixed the same way.
>>
>> The next level in difficulty might be QMP block-delete. It's still easy
>> because like in qemu-img, we know that we're freeing the last reference,
>> and so we could actually do the same here. Well, more or less the same
>> at least: Obviously not inactivate_all(), but just for a single node. We
>> also need to do this recursively for children, except only for those
>> that would actually go away together with our parent node and aren't
>> referenced elsewhere. Even if we manage to implement this correctly,
>> what do we do with the error? Would returning a QMP error imply that we
>> didn't actually close the image and it's still valid (and not
>> inactivated)?
>>
>> Too easy? Let's make it a bit harder. Let's say a commit job completes
>> and we're now removing the intermediate nodes. One of these images could
>> in theory fail in .bdrv_close. We have successfully committed the data,
>> the new graph is ready and in good state. Just one of the old images
>> we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
>> report that error? We don't even necessarily have a QMP command here, we
>> could only let the whole block job fail, which is probably not a good
>> way to let libvirt know what was happening. Also, we can't just
>> unconditionally inactivate the image beforehand there, it might still be
>> in use by other references.  Which may actually be dropped while we're
>> draining the node in bdrv_close().
>>
>> Not enough headaches yet? There are plenty of places in QEMU that just
>> want to make sure that the node doesn't go away while they are still
>> doing something with it. So they use a bdrv_ref/unref pair locally.
>> These places could end up freeing the last reference if the node would
>> have gone away otherwise. They are almost certainly a very confusing
>> place to report the error. They might not even be places that can return
>> errors at all currently.
> 
> Yes.
> 
>> So the main reason why I'm not doing this properly by returning the
>> errors from qcow2_close() (and .bdrv_close in all other drivers) through
>> bdrv_unref() down to the callers of that is not only that it would be a
>> major conversion that would touch lots of places, but also that I
>> wouldn't even know what to do with the error in most callers. And that
>> I'm not sure what the semantics of an error in a close function should
>> be.
> 
> Understand.
> 
>> Another thing that could be tried is making failure in .bdrv_close less
>> likely by doing things earlier. At least ENOSPC could probably be
>> avoided if dirty bitmaps clusters were allocated during the write
>> request that first sets a bit in them (I know too little about the
>> details how bitmaps are implemented in qcow2, though, maybe Vladimir can
>> help here). But ultimately, you'll always get some I/O requests in
>> .bdrv_close and they could fail even if we made it less likely.
> 
> Let me try to summarize to make sure I understand.
> 
> Closing an image can fail for the same reason close() can fail: flushing
> caches can fail, and not caching is not an option.
> 
> The close is commonly hidden within a bdrv_unref().  It closes when the
> last reference goes away.
> 
> Sometimes we know which bdrv_unref() will close.  Sometimes we don't.
> 
> Some bdrv_unref() callers can report errors sanely.  Others simply
> can't.
> 
> Some failures to close can be safely ignored, such as closing a
> temporary image that is going away anyway.  But it's hard to tell when
> this is the case.
> 
> Ideally, things fail cleanly: we either do what's asked and succeed, or
> do nothing and fail.  A failure to close is commonly unclean.  So, even
> if we can report it, recovery can be hard or impossible.
> 
> 
> A common criticism of garbage collection is that finalization is delayed
> and runs "out of context".  The above shows that reference counting
> isn't all that better.
> 
> We could have two variants of bdrv_unref(), one that must not fail, and
> one that can fail and must be checked.  But as you explained, ensuring
> failure only happens in places where we can handle an error sanely is
> somewhere between hard and impossible.
> 
> No better ideas, I'm afraid.
> 


I just add my thought:

If user worries about correct closing of any block node, user should control when the node is closed. And we have all the instruments:

User just should create each node personally by blockdev-add (or -blockdev). And then before terminating QEMU process do blockdev-del correspondingly for all nodes. This way the moment when the node is finally closed is obvious: it's blockdev-del, where user can get appropriate error message (and retry deletion if needed).

To achieve this we need similar additional bdrv_inactivate() call in qmp_blockdev_del() to report an error (and don't bdrv_unref() in case of failed inactivation). And we can add a boolean "force" argument to blcokdev-del to skip this additional bdrv_inactivate() call.

(yes, this will not work for qemu-img. But qsd may be used instead to create more reliable scenarios)

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image)
  2023-02-15 13:07     ` Markus Armbruster
  2023-02-15 20:50       ` Vladimir Sementsov-Ogievskiy
@ 2023-02-22 12:08       ` Markus Armbruster
  2023-02-22 12:54         ` Daniel P. Berrangé
  1 sibling, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2023-02-22 12:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, qemu-block, hreitz, aesteve, nsoffer, vsementsov

A half-baked thought has been sloshing around in my head.  Perhaps I can
bake it some more by writing it up.

Reference-counting and finalizers that can fail are uneasy partners.

When managing lifetimes manually, you control where finalization
happens.  When finalization can fail, you're as empowered as you could
be to make it fail in a place where you can handle the failure sensibly.

Manual resource management is tedious and error prone, and that's a
serious problem.  Garbage collection takes it off your hands.  Good.
But now finalization happens at some future time, and in garbage
collection context.  Fine when finalization's side effects are
sufficiently harmless.  But what if finalization can fail?  We trade one
serious problem (manual resource management) for another one (handling
finalization failures).

Reference counting is slightly different.  Here, finalization can only
happen at unref, which means you retain more control than with garbage
collection.  However, we do need unrefs in places where we can't
sensibly handle failure.  For instance, when code operates on an object
whose reference count can be dropped concurrently, we need to guard with
a ref/unref bracket to keep the object alive while the code is messing
with it.

The only way out I can see is to systematically avoid finalizers that
can fail, by extracting the part that can fail into a shutdown method,
to be called in a suitable context, and before finalization.

Yes, this takes us back to manual resource management, only we manage
shutdown instead of death.

Finalizing something that has not been shut down would be a programming
error.  A recoverable one, I guess; we can have finalize attempt to shut
down then, and if it fails, just weep into the logs and move on.

We gain a "shut down" state, and new problems may well come with it.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image)
  2023-02-22 12:08       ` Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image) Markus Armbruster
@ 2023-02-22 12:54         ` Daniel P. Berrangé
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel P. Berrangé @ 2023-02-22 12:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Kevin Wolf, qemu-block, hreitz, aesteve, nsoffer, vsementsov

On Wed, Feb 22, 2023 at 01:08:05PM +0100, Markus Armbruster wrote:
> A half-baked thought has been sloshing around in my head.  Perhaps I can
> bake it some more by writing it up.
> 
> Reference-counting and finalizers that can fail are uneasy partners.
> 
> When managing lifetimes manually, you control where finalization
> happens.  When finalization can fail, you're as empowered as you could
> be to make it fail in a place where you can handle the failure sensibly.
> 
> Manual resource management is tedious and error prone, and that's a
> serious problem.  Garbage collection takes it off your hands.  Good.
> But now finalization happens at some future time, and in garbage
> collection context.  Fine when finalization's side effects are
> sufficiently harmless.  But what if finalization can fail?  We trade one
> serious problem (manual resource management) for another one (handling
> finalization failures).
> 
> Reference counting is slightly different.  Here, finalization can only
> happen at unref, which means you retain more control than with garbage
> collection.  However, we do need unrefs in places where we can't
> sensibly handle failure.  For instance, when code operates on an object
> whose reference count can be dropped concurrently, we need to guard with
> a ref/unref bracket to keep the object alive while the code is messing
> with it.
> 
> The only way out I can see is to systematically avoid finalizers that
> can fail, by extracting the part that can fail into a shutdown method,
> to be called in a suitable context, and before finalization.

Yes, I concur with pretty much everything you say above.

Since finalizers can occur in any context the logic that runs in
them needs to be quite clearly defined and free of side effects
or unpredictable failure scenarios.

I would probably go further and say that finalizers need to be
able to execute in finite time, so that callers do not have
execution of their thread blocked arbitrarily if they happen to
be one the one that releases the last reference.

Finalizers should be releasing resources that are already in
a "safe" state. Releasing memory, decrementing references,
unregistering callbacks, are typical safe things.

Performing I/O is a clearly a bad idea / inappropriate for
a finalizer by this standard.

Garbage collection vs reference counts is tangential to this
problem, as you say, they'll both share the same problem we're
facing.

> Yes, this takes us back to manual resource management, only we manage
> shutdown instead of death.
> 
> Finalizing something that has not been shut down would be a programming
> error.  A recoverable one, I guess; we can have finalize attempt to shut
> down then, and if it fails, just weep into the logs and move on.

This is approximately what I did with QIOChannel.  There is a
qio_channel_close() method that is best practice to invoke to
release resources assoociated with the channel, possibly flushing
pending I/O, and reporting failures via an Error **errp.

If this is not called, however, the finalizer will call close
on your behalf, discarding errors. It'll probably be OK much of
the time, and if we find it isn't, then the missing explicit
close call needs to be addressed.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-02-22 12:55 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-12 19:14 [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Kevin Wolf
2023-01-12 19:14 ` [PATCH 1/4] qcow2: Fix theoretical corruption in store_bitmap() error path Kevin Wolf
2023-01-13  7:30   ` Philippe Mathieu-Daudé
2023-01-13 10:45     ` Kevin Wolf
2023-01-13 17:37       ` Philippe Mathieu-Daudé
2023-01-12 19:14 ` [PATCH 2/4] qemu-img commit: Report errors while closing the image Kevin Wolf
2023-01-12 19:14 ` [PATCH 3/4] qemu-img bitmap: " Kevin Wolf
2023-01-13  7:32   ` Philippe Mathieu-Daudé
2023-01-12 19:14 ` [PATCH 4/4] qemu-iotests: Test qemu-img bitmap/commit exit code on error Kevin Wolf
2023-01-13  7:30 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Markus Armbruster
2023-01-13 11:29   ` Kevin Wolf
2023-02-14 20:09     ` Vladimir Sementsov-Ogievskiy
2023-02-15 13:07     ` Markus Armbruster
2023-02-15 20:50       ` Vladimir Sementsov-Ogievskiy
2023-02-22 12:08       ` Reference-counting and finalizers that can fail are uneasy partners (was: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image) Markus Armbruster
2023-02-22 12:54         ` Daniel P. Berrangé
2023-01-17 11:24 ` [PATCH 0/4] qemu-img: Fix exit code for errors closing the image Hanna Czenczek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.