QEMU-Devel Archive on lore.kernel.org
* [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
@ 2021-04-15 15:22 Kevin Wolf
  2021-04-15 15:22 ` [RFC PATCH 1/2] iotests: Test qemu-img convert of zeroed data cluster Kevin Wolf
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Kevin Wolf @ 2021-04-15 15:22 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pl, qemu-devel, mreitz

Peter, three years ago you changed 'qemu-img convert' to sacrifice some
sparsification in order to get aligned requests on the target image. At
the time, I thought the impact would be small, but it turns out that
this can end up wasting gigabytes of storage (like converting a fully
zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).

https://bugzilla.redhat.com/show_bug.cgi?id=1882917

I'm not entirely sure how to attack this best since this is a tradeoff,
but maybe the approach in this series is still good enough for the case
that you wanted to fix back then?

Of course, it would be possible to have a more complete fix like looking
forward a few blocks more before writing data, but that would probably
not be entirely trivial because you would have to merge blocks with ZERO
block status with DATA blocks that contain only zeros. I'm not sure if
it's worth this complication of the code.

Kevin Wolf (2):
  iotests: Test qemu-img convert of zeroed data cluster
  qemu-img convert: Fix sparseness detection

 qemu-img.c                 | 18 ++++--------------
 tests/qemu-iotests/122     |  1 +
 tests/qemu-iotests/122.out |  6 ++++--
 3 files changed, 9 insertions(+), 16 deletions(-)

-- 
2.30.2




* [RFC PATCH 1/2] iotests: Test qemu-img convert of zeroed data cluster
  2021-04-15 15:22 [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection Kevin Wolf
@ 2021-04-15 15:22 ` Kevin Wolf
  2021-04-15 15:22 ` [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection Kevin Wolf
  2021-04-19  8:36 ` [RFC PATCH 0/2] " Peter Lieven
  2 siblings, 0 replies; 12+ messages in thread
From: Kevin Wolf @ 2021-04-15 15:22 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pl, qemu-devel, mreitz

This demonstrates what happens when the block status changes in
sub-min_sparse granularity, but all of the parts are zeroed out. The
alignment logic in is_allocated_sectors() prevents the target image from
remaining fully sparse as expected and instead turns the area into a data
cluster of explicit zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/122     |  1 +
 tests/qemu-iotests/122.out | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122
index 5d550ed13e..7a213a4df9 100755
--- a/tests/qemu-iotests/122
+++ b/tests/qemu-iotests/122
@@ -251,6 +251,7 @@ $QEMU_IO -c "write -P 0 0 64k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_test
 $QEMU_IO -c "write 0 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 $QEMU_IO -c "write 8k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 $QEMU_IO -c "write 17k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -c "write -P 0 65k 1k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir
 
 for min_sparse in 4k 8k; do
     echo
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 3a3e121d57..dcc44a2304 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -192,6 +192,8 @@ wrote 1024/1024 bytes at offset 8192
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 1024/1024 bytes at offset 17408
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 66560
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 convert -S 4k
 [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
@@ -199,7 +201,9 @@ convert -S 4k
 { "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
+{ "start": 20480, "length": 46080, "depth": 0, "zero": true, "data": false},
+{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
@@ -211,7 +215,9 @@ convert -c -S 4k
 
 convert -S 8k
 [{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
+{ "start": 24576, "length": 41984, "depth": 0, "zero": true, "data": false},
+{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
-- 
2.30.2




* [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection
  2021-04-15 15:22 [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection Kevin Wolf
  2021-04-15 15:22 ` [RFC PATCH 1/2] iotests: Test qemu-img convert of zeroed data cluster Kevin Wolf
@ 2021-04-15 15:22 ` Kevin Wolf
  2021-04-20 14:31   ` Vladimir Sementsov-Ogievskiy
  2021-04-19  8:36 ` [RFC PATCH 0/2] " Peter Lieven
  2 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2021-04-15 15:22 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, pl, qemu-devel, mreitz

In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
like non-zero data if the end of the checked area isn't aligned. This
can improve the efficiency of the conversion and was introduced in
commit 8dcd3c9b91a.

However, it comes with a correctness problem: qemu-img convert is
supposed to sparsify areas that contain only zeros, which it doesn't do
any more. It turns out that this even happens when not only the
unaligned area is zeroed, but also the blocks before and after it. In
the bug report, conversion of a fragmented 10G image containing only
zeros resulted in an image consuming 2.82 GiB even though the expected
size is only 4 KiB.

As a tradeoff between both, let's ignore zeroed sectors only after
non-zero data to fix the alignment, but if we're only looking at zeros,
keep them as such, even if it may mean additional RMW cycles.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1882917
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c                 | 18 ++++--------------
 tests/qemu-iotests/122.out | 12 ++++--------
 2 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a5993682aa..ca4eba2dd1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1168,20 +1168,10 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
     }
 
     tail = (sector_num + i) & (alignment - 1);
-    if (tail) {
-        if (is_zero && i <= tail) {
-            /* treat unallocated areas which only consist
-             * of a small tail as allocated. */
-            is_zero = false;
-        }
-        if (!is_zero) {
-            /* align up end offset of allocated areas. */
-            i += alignment - tail;
-            i = MIN(i, n);
-        } else {
-            /* align down end offset of zero areas. */
-            i -= tail;
-        }
+    if (tail && !is_zero) {
+        /* align up end offset of allocated areas. */
+        i += alignment - tail;
+        i = MIN(i, n);
     }
     *pnum = i;
     return !is_zero;
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index dcc44a2304..fe0ea34164 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -199,11 +199,9 @@ convert -S 4k
 [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 20480, "length": 46080, "depth": 0, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
+{ "start": 12288, "length": 5120, "depth": 0, "zero": true, "data": false},
+{ "start": 17408, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
@@ -215,9 +213,7 @@ convert -c -S 4k
 
 convert -S 8k
 [{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 24576, "length": 41984, "depth": 0, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
+{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
-- 
2.30.2




* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-15 15:22 [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection Kevin Wolf
  2021-04-15 15:22 ` [RFC PATCH 1/2] iotests: Test qemu-img convert of zeroed data cluster Kevin Wolf
  2021-04-15 15:22 ` [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection Kevin Wolf
@ 2021-04-19  8:36 ` Peter Lieven
  2021-04-19  9:13   ` Peter Lieven
  2021-04-19 11:22   ` Kevin Wolf
  2 siblings, 2 replies; 12+ messages in thread
From: Peter Lieven @ 2021-04-19  8:36 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz



> On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
> 
> Peter, three years ago you changed 'qemu-img convert' to sacrifice some
> sparsification in order to get aligned requests on the target image. At
> the time, I thought the impact would be small, but it turns out that
> this can end up wasting gigabytes of storage (like converting a fully
> zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> 
> I'm not entirely sure how to attack this best since this is a tradeoff,
> but maybe the approach in this series is still good enough for the case
> that you wanted to fix back then?
> 
> Of course, it would be possible to have a more complete fix like looking
> forward a few blocks more before writing data, but that would probably
> not be entirely trivial because you would have to merge blocks with ZERO
> block status with DATA blocks that contain only zeros. I'm not sure if
> it's worth this complication of the code.

I will try to look into this asap.

Is there any hint as to which filesystem I need in order to set the extent size hint when creating the raw image? I was not able to do that.

Peter





* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-19  8:36 ` [RFC PATCH 0/2] " Peter Lieven
@ 2021-04-19  9:13   ` Peter Lieven
  2021-04-19 12:31     ` Kevin Wolf
  2021-04-19 11:22   ` Kevin Wolf
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Lieven @ 2021-04-19  9:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz





> On 19.04.2021 at 10:36, Peter Lieven <pl@kamp.de> wrote:
> 
> 
> 
>> On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
>> 
>> Peter, three years ago you changed 'qemu-img convert' to sacrifice some
>> sparsification in order to get aligned requests on the target image. At
>> the time, I thought the impact would be small, but it turns out that
>> this can end up wasting gigabytes of storage (like converting a fully
>> zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1882917
>> 
>> I'm not entirely sure how to attack this best since this is a tradeoff,
>> but maybe the approach in this series is still good enough for the case
>> that you wanted to fix back then?
>> 
>> Of course, it would be possible to have a more complete fix like looking
>> forward a few blocks more before writing data, but that would probably
>> not be entirely trivial because you would have to merge blocks with ZERO
>> block status with DATA blocks that contain only zeros. I'm not sure if
>> it's worth this complication of the code.
> 
> I will try to look into this asap.

Besides the reproducer described in the ticket, I reran my old conversion test in our environment:

Before commit 8dcd3c9b91: reads 4608 writes 14959
After commit 8dcd3c9b91: reads 0 writes 14924
With Kevin's patch: reads 110 writes 14924

I think this is a good result if it avoids other issues.

Peter



* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-19  8:36 ` [RFC PATCH 0/2] " Peter Lieven
  2021-04-19  9:13   ` Peter Lieven
@ 2021-04-19 11:22   ` Kevin Wolf
  1 sibling, 0 replies; 12+ messages in thread
From: Kevin Wolf @ 2021-04-19 11:22 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, Max Reitz

On 19.04.2021 at 10:36, Peter Lieven wrote:
> 
> 
> > On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
> > 
> > Peter, three years ago you changed 'qemu-img convert' to sacrifice some
> > sparsification in order to get aligned requests on the target image. At
> > the time, I thought the impact would be small, but it turns out that
> > this can end up wasting gigabytes of storage (like converting a fully
> > zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> > 
> > I'm not entirely sure how to attack this best since this is a tradeoff,
> > but maybe the approach in this series is still good enough for the case
> > that you wanted to fix back then?
> > 
> > Of course, it would be possible to have a more complete fix like looking
> > forward a few blocks more before writing data, but that would probably
> > not be entirely trivial because you would have to merge blocks with ZERO
> > block status with DATA blocks that contain only zeros. I'm not sure if
> > it's worth this complication of the code.
> 
> I will try to look into this asap.
> 
> Is there any hint as to which filesystem I need in order to set the extent
> size hint when creating the raw image? I was not able to do that.

Grepping the current kernel source, it seems extent size hints still
work only on XFS. But I don't think it's necessary for reproducing this
bug. In fact, disabling the extent size hint should cause a lot more
fragmentation, which should make the problem more visible.

Kevin




* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-19  9:13   ` Peter Lieven
@ 2021-04-19 12:31     ` Kevin Wolf
  2021-04-19 17:12       ` Peter Lieven
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2021-04-19 12:31 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, Max Reitz

On 19.04.2021 at 11:13, Peter Lieven wrote:
> 
> 
> > On 19.04.2021 at 10:36, Peter Lieven <pl@kamp.de> wrote:
> > 
> > 
> > 
> >> On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
> >> 
> >> Peter, three years ago you changed 'qemu-img convert' to sacrifice some
> >> sparsification in order to get aligned requests on the target image. At
> >> the time, I thought the impact would be small, but it turns out that
> >> this can end up wasting gigabytes of storage (like converting a fully
> >> zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
> >> 
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> >> 
> >> I'm not entirely sure how to attack this best since this is a tradeoff,
> >> but maybe the approach in this series is still good enough for the case
> >> that you wanted to fix back then?
> >> 
> >> Of course, it would be possible to have a more complete fix like looking
> >> forward a few blocks more before writing data, but that would probably
> >> not be entirely trivial because you would have to merge blocks with ZERO
> >> block status with DATA blocks that contain only zeros. I'm not sure if
> >> it's worth this complication of the code.
> > 
> > I will try to look into this asap.
> 
> Besides the reproducer described in the ticket, I reran my old
> conversion test in our environment:
> 
> Before commit 8dcd3c9b91: reads 4608 writes 14959
> After commit 8dcd3c9b91: reads 0 writes 14924
> With Kevin's patch: reads 110 writes 14924
> 
> I think this is a good result if it avoids other issues.

Sounds like a promising way to make the tradeoff. Thanks for testing!

Kevin




* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-19 12:31     ` Kevin Wolf
@ 2021-04-19 17:12       ` Peter Lieven
  2021-04-20  6:49         ` Kevin Wolf
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Lieven @ 2021-04-19 17:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Max Reitz



Sent from my iPhone

> On 19.04.2021 at 14:31, Kevin Wolf <kwolf@redhat.com> wrote:
> 
> On 19.04.2021 at 11:13, Peter Lieven wrote:
>> 
>> 
>>>> On 19.04.2021 at 10:36, Peter Lieven <pl@kamp.de> wrote:
>>> 
>>> 
>>> 
>>>> On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
>>>> 
>>>> Peter, three years ago you changed 'qemu-img convert' to sacrifice some
>>>> sparsification in order to get aligned requests on the target image. At
>>>> the time, I thought the impact would be small, but it turns out that
>>>> this can end up wasting gigabytes of storage (like converting a fully
>>>> zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
>>>> 
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1882917
>>>> 
>>>> I'm not entirely sure how to attack this best since this is a tradeoff,
>>>> but maybe the approach in this series is still good enough for the case
>>>> that you wanted to fix back then?
>>>> 
>>>> Of course, it would be possible to have a more complete fix like looking
>>>> forward a few blocks more before writing data, but that would probably
>>>> not be entirely trivial because you would have to merge blocks with ZERO
>>>> block status with DATA blocks that contain only zeros. I'm not sure if
>>>> it's worth this complication of the code.
>>> 
>>> I will try to look into this asap.
>> 
>> Besides the reproducer described in the ticket, I reran my old
>> conversion test in our environment:
>> 
>> Before commit 8dcd3c9b91: reads 4608 writes 14959
>> After commit 8dcd3c9b91: reads 0 writes 14924
>> With Kevin's patch: reads 110 writes 14924
>> 
>> I think this is a good result if it avoids other issues.
> 
> Sounds like a promising way to make the tradeoff. Thanks for testing!

Is this something for 6.0-rc4?

Peter






* Re: [RFC PATCH 0/2] qemu-img convert: Fix sparseness detection
  2021-04-19 17:12       ` Peter Lieven
@ 2021-04-20  6:49         ` Kevin Wolf
  0 siblings, 0 replies; 12+ messages in thread
From: Kevin Wolf @ 2021-04-20  6:49 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, Max Reitz

On 19.04.2021 at 19:12, Peter Lieven wrote:
> 
> 
> Sent from my iPhone
> 
> > On 19.04.2021 at 14:31, Kevin Wolf <kwolf@redhat.com> wrote:
> > 
> > On 19.04.2021 at 11:13, Peter Lieven wrote:
> >> 
> >> 
> >>>> On 19.04.2021 at 10:36, Peter Lieven <pl@kamp.de> wrote:
> >>> 
> >>> 
> >>> 
> >>>> On 15.04.2021 at 17:22, Kevin Wolf <kwolf@redhat.com> wrote:
> >>>> 
> >>>> Peter, three years ago you changed 'qemu-img convert' to sacrifice some
> >>>> sparsification in order to get aligned requests on the target image. At
> >>>> the time, I thought the impact would be small, but it turns out that
> >>>> this can end up wasting gigabytes of storage (like converting a fully
> >>>> zeroed 10 GB image taking 2.8 GB instead of a few kilobytes).
> >>>> 
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> >>>> 
> >>>> I'm not entirely sure how to attack this best since this is a tradeoff,
> >>>> but maybe the approach in this series is still good enough for the case
> >>>> that you wanted to fix back then?
> >>>> 
> >>>> Of course, it would be possible to have a more complete fix like looking
> >>>> forward a few blocks more before writing data, but that would probably
> >>>> not be entirely trivial because you would have to merge blocks with ZERO
> >>>> block status with DATA blocks that contain only zeros. I'm not sure if
> >>>> it's worth this complication of the code.
> >>> 
> >>> I will try to look into this asap.
> >> 
> >> Besides the reproducer described in the ticket, I reran my old
> >> conversion test in our environment:
> >> 
> >> Before commit 8dcd3c9b91: reads 4608 writes 14959
> >> After commit 8dcd3c9b91: reads 0 writes 14924
> >> With Kevin's patch: reads 110 writes 14924
> >> 
> >> I think this is a good result if it avoids other issues.
> > 
> > Sounds like a promising way to make the tradeoff. Thanks for testing!
> 
> Is this something for 6.0-rc4?

No, certainly not. It would be for the first 6.1 pull request.

Kevin




* Re: [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection
  2021-04-15 15:22 ` [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection Kevin Wolf
@ 2021-04-20 14:31   ` Vladimir Sementsov-Ogievskiy
  2021-04-20 15:04     ` Kevin Wolf
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-04-20 14:31 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block; +Cc: pl, qemu-devel, mreitz

15.04.2021 18:22, Kevin Wolf wrote:
> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
> like non-zero data if the end of the checked area isn't aligned. This
> can improve the efficiency of the conversion and was introduced in
> commit 8dcd3c9b91a.
> 
> However, it comes with a correctness problem: qemu-img convert is
> supposed to sparsify areas that contain only zeros, which it doesn't do
> any more. It turns out that this even happens when not only the
> unaligned area is zeroed, but also the blocks before and after it. In
> the bug report, conversion of a fragmented 10G image containing only
> zeros resulted in an image consuming 2.82 GiB even though the expected
> size is only 4 KiB.
> 
> As a tradeoff between both, let's ignore zeroed sectors only after
> non-zero data to fix the alignment, but if we're only looking at zeros,
> keep them as such, even if it may mean additional RMW cycles.
> 

Hmm.. If I understand correctly, we are going to do unaligned write-zero. And that helps. Doesn't that mean that alignment is wrongly detected?

> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qemu-img.c                 | 18 ++++--------------
>   tests/qemu-iotests/122.out | 12 ++++--------
>   2 files changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index a5993682aa..ca4eba2dd1 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1168,20 +1168,10 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>       }
>   
>       tail = (sector_num + i) & (alignment - 1);
> -    if (tail) {
> -        if (is_zero && i <= tail) {
> -            /* treat unallocated areas which only consist
> -             * of a small tail as allocated. */
> -            is_zero = false;
> -        }
> -        if (!is_zero) {
> -            /* align up end offset of allocated areas. */
> -            i += alignment - tail;
> -            i = MIN(i, n);
> -        } else {
> -            /* align down end offset of zero areas. */
> -            i -= tail;
> -        }
> +    if (tail && !is_zero) {
> +        /* align up end offset of allocated areas. */
> +        i += alignment - tail;
> +        i = MIN(i, n);
>       }
>       *pnum = i;
>       return !is_zero;
> diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
> index dcc44a2304..fe0ea34164 100644
> --- a/tests/qemu-iotests/122.out
> +++ b/tests/qemu-iotests/122.out
> @@ -199,11 +199,9 @@ convert -S 4k
>   [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
>   { "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
> -{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 20480, "length": 46080, "depth": 0, "zero": true, "data": false},
> -{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
> +{ "start": 12288, "length": 5120, "depth": 0, "zero": true, "data": false},
> +{ "start": 17408, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
>   
>   convert -c -S 4k
>   [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
> @@ -215,9 +213,7 @@ convert -c -S 4k
>   
>   convert -S 8k
>   [{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 24576, "length": 41984, "depth": 0, "zero": true, "data": false},
> -{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
> +{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
>   
>   convert -c -S 8k
>   [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
> 


-- 
Best regards,
Vladimir



* Re: [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection
  2021-04-20 14:31   ` Vladimir Sementsov-Ogievskiy
@ 2021-04-20 15:04     ` Kevin Wolf
  2021-04-20 16:52       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2021-04-20 15:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: pl, qemu-devel, qemu-block, mreitz

On 20.04.2021 at 16:31, Vladimir Sementsov-Ogievskiy wrote:
> 15.04.2021 18:22, Kevin Wolf wrote:
> > In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
> > like non-zero data if the end of the checked area isn't aligned. This
> > can improve the efficiency of the conversion and was introduced in
> > commit 8dcd3c9b91a.
> > 
> > However, it comes with a correctness problem: qemu-img convert is
> > supposed to sparsify areas that contain only zeros, which it doesn't do
> > any more. It turns out that this even happens when not only the
> > unaligned area is zeroed, but also the blocks before and after it. In
> > the bug report, conversion of a fragmented 10G image containing only
> > zeros resulted in an image consuming 2.82 GiB even though the expected
> > size is only 4 KiB.
> > 
> > As a tradeoff between both, let's ignore zeroed sectors only after
> > non-zero data to fix the alignment, but if we're only looking at zeros,
> > keep them as such, even if it may mean additional RMW cycles.
> > 
> 
> Hmm.. If I understand correctly, we are going to do unaligned
> write-zero. And that helps.

This can happen (mostly raw images on block devices, I think?), but
usually it just means skipping the write because we know that the target
image is already zeroed.

What it does mean is that if the next part is data, we'll have an
unaligned data write.

> Doesn't that mean that alignment is wrongly detected?

The problem is that you can have bdrv_block_status_above() return the
same allocation status multiple times in a row, but *pnum can be
unaligned for the conversion.

We only look at a single range returned by it when detecting the
alignment, so it could be that we have zero buffers for both 0-11 and
12-16 and detect two misaligned ranges, when both together are a
perfectly aligned zeroed range.
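
As a rough illustration of the case described above, here is a minimal C
sketch. It is not the actual qemu-img code: it models only the tail handling
of is_allocated_sectors() before and after patch 2/2, assumes the buffer is
entirely zero, ignores min_sparse, and uses a hypothetical loop in place of
the real caller. The alignment of 8 sectors and the names tail_old(),
tail_new() and zero_sectors_written_as_data() are assumptions made for this
example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One decision of the simplified model: how many sectors, and data or not. */
struct run {
    bool is_data;   /* true: written as data; false: kept as zeros/sparse */
    int nb;         /* number of sectors covered by this decision */
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Model of the tail handling that patch 2/2 removes. */
static struct run tail_old(bool is_zero, int i, int n,
                           int64_t sector_num, int alignment)
{
    int tail;

    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail) {
        if (is_zero && i <= tail) {
            /* a short zero tail is treated as allocated */
            is_zero = false;
        }
        if (!is_zero) {
            /* align up end offset of allocated areas */
            i = min_int(i + alignment - tail, n);
        } else {
            /* align down end offset of zero areas */
            i -= tail;
        }
    }
    return (struct run){ .is_data = !is_zero, .nb = i };
}

/* Model of the tail handling that patch 2/2 introduces. */
static struct run tail_new(bool is_zero, int i, int n,
                           int64_t sector_num, int alignment)
{
    int tail;

    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail && !is_zero) {
        /* align up end offset of allocated areas */
        i = min_int(i + alignment - tail, n);
    }
    return (struct run){ .is_data = !is_zero, .nb = i };
}

/*
 * Hypothetical stand-in for the caller: walk an all-zero range of n
 * sectors starting at sector_num and count how many of those zero
 * sectors end up being treated as data.
 */
static int zero_sectors_written_as_data(bool use_old, int64_t sector_num,
                                        int n, int alignment)
{
    int data = 0;

    while (n > 0) {
        /* the buffer is all zeros, so the scan finds i == n zero sectors */
        struct run r = use_old
            ? tail_old(true, n, n, sector_num, alignment)
            : tail_new(true, n, n, sector_num, alignment);

        if (r.is_data) {
            data += r.nb;
        }
        sector_num += r.nb;
        n -= r.nb;
    }
    return data;
}
```

Under these assumptions, a zeroed range covering sectors 0-11 gives
zero_sectors_written_as_data(true, 0, 12, 8) == 4 with the old logic
(sectors 8-11 become explicit zeros), while the same call with use_old set
to false gives 0, so the range can stay sparse.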

In theory we could try to do some lookahead and merge ranges where
possible, which should give us the perfect result, but it would make the
code considerably more complicated. (Whether we want to merge them
doesn't only depend on the block status, but possibly also on the
content of a DATA range.)

Kevin




* Re: [RFC PATCH 2/2] qemu-img convert: Fix sparseness detection
  2021-04-20 15:04     ` Kevin Wolf
@ 2021-04-20 16:52       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2021-04-20 16:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, pl, qemu-devel, mreitz

20.04.2021 18:04, Kevin Wolf wrote:
> On 20.04.2021 at 16:31, Vladimir Sementsov-Ogievskiy wrote:
>> 15.04.2021 18:22, Kevin Wolf wrote:
>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>> like non-zero data if the end of the checked area isn't aligned. This
>>> can improve the efficiency of the conversion and was introduced in
>>> commit 8dcd3c9b91a.
>>>
>>> However, it comes with a correctness problem: qemu-img convert is
>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>> any more. It turns out that this even happens when not only the
>>> unaligned area is zeroed, but also the blocks before and after it. In
>>> the bug report, conversion of a fragmented 10G image containing only
>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>> size is only 4 KiB.
>>>
>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>> keep them as such, even if it may mean additional RMW cycles.
>>>
>>
>> Hmm.. If I understand correctly, we are going to do unaligned
>> write-zero. And that helps.
> 
> This can happen (mostly raw images on block devices, I think?), but
> usually it just means skipping the write because we know that the target
> image is already zeroed.
> 
> What it does mean is that if the next part is data, we'll have an
> unaligned data write.
> 
>> Doesn't that mean that alignment is wrongly detected?
> 
> The problem is that you can have bdrv_block_status_above() return the
> same allocation status multiple times in a row, but *pnum can be
> unaligned for the conversion.
> 
> We only look at a single range returned by it when detecting the
> alignment, so it could be that we have zero buffers for both 0-11 and
> 12-16 and detect two misaligned ranges, when both together are a
> perfectly aligned zeroed range.
> 
> In theory we could try to do some lookahead and merge ranges where
> possible, which should give us the perfect result, but it would make the
> code considerably more complicated. (Whether we want to merge them
> doesn't only depend on the block status, but possibly also on the
> content of a DATA range.)
> 
> Kevin
> 

Oh, now I understand the problem, thanks for the explanation.

Hmm, yes, that means that if the whole buf is zero, is_allocated_sectors() must not align it down, so that it can possibly be "merged" with the next chunk if that is zero too.

But it's still good to align zeroes down if data starts somewhere inside the buf, isn't it?

What about something like this:

diff --git a/qemu-img.c b/qemu-img.c
index babb5573ab..d1704584a0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
          }
      }
  
+    if (i == n) {
+        /*
+         * The whole buf is the same.
+         *
+         * If it's data, just return it. That's the old behavior.
+         *
+         * If it's zero, just return too. This works well if the target is
+         * already zeroed. And if the next chunk is zero too, we'll have no RMW
+         * and no reason to write data.
+         */
+        *pnum = i;
+        return !is_zero;
+    }
+
      tail = (sector_num + i) & (alignment - 1);
      if (tail) {
          if (is_zero && i <= tail) {
-            /* treat unallocated areas which only consist
-             * of a small tail as allocated. */
+            /*
+             * For sure next sector after i is data, and it will rewrite this
+             * tail anyway due to RMW. So, let's just write data now.
+             */
              is_zero = false;
          }
          if (!is_zero) {
-            /* align up end offset of allocated areas. */
+            /* If possible, align up end offset of allocated areas. */
              i += alignment - tail;
              i = MIN(i, n);
          } else {
-            /* align down end offset of zero areas. */
+            /*
+             * The next sector after i is certainly data, and it would rewrite
+             * this tail anyway due to RMW. It is better to avoid RMW and write
+             * zeroes only up to the aligned boundary.
+             */
              i -= tail;
          }
      }
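
To make the difference between the two variants easier to see, here is a
minimal C sketch that compares the tail handling of patch 2/2 with the
proposal above. It is a simplified model and not the real qemu-img code: the
scan over the buffer is replaced by the parameters is_zero and i (the length
of the first uniform run, 1 <= i <= n), and the names tail_patch() and
tail_proposed() as well as the alignment of 8 sectors used below are
assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One decision of the simplified model: how many sectors, and data or not. */
struct run {
    bool is_data;   /* true: written as data; false: kept as zeros/sparse */
    int nb;         /* number of sectors covered by this decision */
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Tail handling as in patch 2/2: only data runs are aligned up. */
static struct run tail_patch(bool is_zero, int i, int n,
                             int64_t sector_num, int alignment)
{
    int tail;

    assert(i >= 1 && i <= n);
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail && !is_zero) {
        i = min_int(i + alignment - tail, n);
    }
    return (struct run){ .is_data = !is_zero, .nb = i };
}

/* Tail handling as in the proposal: early return for a uniform buffer. */
static struct run tail_proposed(bool is_zero, int i, int n,
                                int64_t sector_num, int alignment)
{
    int tail;

    assert(i >= 1 && i <= n);
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    if (i == n) {
        /* the whole buffer is uniform: return it without any alignment */
        return (struct run){ .is_data = !is_zero, .nb = i };
    }

    /* the run ends inside the buffer, so the next sector differs */
    tail = (int)((sector_num + i) & (alignment - 1));
    if (tail) {
        if (is_zero && i <= tail) {
            /* short zero run followed by data: write it as data */
            is_zero = false;
        }
        if (!is_zero) {
            /* align up end offset of allocated areas */
            i = min_int(i + alignment - tail, n);
        } else {
            /* align down end offset of zero areas */
            i -= tail;
        }
    }
    return (struct run){ .is_data = !is_zero, .nb = i };
}
```

With these assumptions, a fully zeroed 12-sector buffer at sector 0 stays
zero in both variants. For 4 zero sectors followed by data in a 12-sector
buffer, tail_proposed() reports 8 data sectors (an aligned data write),
whereas tail_patch() reports 4 zero sectors, so the following data write
would start unaligned.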


-- 
Best regards,
Vladimir

