QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
@ 2019-08-12 18:11 Vladimir Sementsov-Ogievskiy
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 18:11 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den, vsementsov, qemu-devel, mreitz

Hi all!

I'm not sure, is it a bug or a feature, but using qcow2 under raw is
broken. It should be either fixed like I propose (by Max's suggestion)
or somehow forbidden (just forbid backing-file supporting node to be
file child of raw-format node).

Vladimir Sementsov-Ogievskiy (2):
  block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  iotests: test mirroring qcow2 under raw format

 block/raw-format.c         |  2 +-
 tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/263.out | 12 ++++++++++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

-- 
2.18.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-12 18:11 [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Vladimir Sementsov-Ogievskiy
@ 2019-08-12 18:11 ` Vladimir Sementsov-Ogievskiy
  2019-08-13 11:04   ` Kevin Wolf
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format Vladimir Sementsov-Ogievskiy
  2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
  2 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 18:11 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den, vsementsov, qemu-devel, mreitz

BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
returned file. But is it correct behavior at all? If returned file
itself has a backing file, we may report as totally unallocated and
area which actually has data in bottom backing file.

So, mirroring of qcow2 under raw-format is broken. Which is illustrated
by following commit with a test. Let's make raw-format behave more
correctly returning BDRV_BLOCK_DATA.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/raw-format.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index bffd424dd0..a273ee2387 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -275,7 +275,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     *pnum = bytes;
     *file = bs->file->bs;
     *map = offset + s->offset;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
 }
 
 static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
-- 
2.18.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format
  2019-08-12 18:11 [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Vladimir Sementsov-Ogievskiy
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
@ 2019-08-12 18:11 ` Vladimir Sementsov-Ogievskiy
  2019-08-13  9:10   ` Kevin Wolf
  2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
  2 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-12 18:11 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, den, vsementsov, qemu-devel, mreitz

Check that it is fixed by previous commit

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/263.out | 12 ++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 59 insertions(+)
 create mode 100755 tests/qemu-iotests/263
 create mode 100644 tests/qemu-iotests/263.out

diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
new file mode 100755
index 0000000000..dbe38e6c6c
--- /dev/null
+++ b/tests/qemu-iotests/263
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+#
+# Test mirroring qcow2 under raw-format
+#
+# Copyright (c) 2019 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import iotests
+from iotests import qemu_img_create, qemu_io, filter_qemu_io, file_path, log
+
+iotests.verify_image_format(supported_fmts=['qcow2'])
+
+base, top, target = file_path('base', 'top', 'target')
+size = 1024 * 1024
+
+qemu_img_create('-f', iotests.imgfmt, base, str(size))
+log(filter_qemu_io(qemu_io('-f', iotests.imgfmt,
+                           '-c', 'write -P 1 0 1M', base)))
+qemu_img_create('-f', iotests.imgfmt, '-b', base, top, str(size))
+
+vm = iotests.VM().add_drive(None, opts='driver=raw,size=' + str(size) +
+                            ',file.driver=qcow2,file.file.filename=' + top)
+vm.launch()
+
+vm.qmp_log('drive-mirror', device='drive0', target=target, sync='full',
+           filters=[iotests.filter_qmp_testfiles])
+log(vm.event_wait('BLOCK_JOB_READY'), filters=[iotests.filter_qmp_event])
+vm.qmp_log('block-job-complete', device='drive0')
+vm.shutdown()
+
+log(filter_qemu_io(qemu_io('-f', 'raw', '-c', 'read -P 1 0 1M', target)))
+
+log('-- finish --') # avoid extra new-line at the end of .out file
diff --git a/tests/qemu-iotests/263.out b/tests/qemu-iotests/263.out
new file mode 100644
index 0000000000..65e5c812c6
--- /dev/null
+++ b/tests/qemu-iotests/263.out
@@ -0,0 +1,12 @@
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+{"execute": "drive-mirror", "arguments": {"device": "drive0", "sync": "full", "target": "TEST_DIR/PID-target"}}
+{"return": {}}
+{"data": {"device": "drive0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "block-job-complete", "arguments": {"device": "drive0"}}
+{"return": {}}
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+-- finish --
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index f13e5f2e23..186a0220e5 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -271,3 +271,4 @@
 254 rw backing quick
 255 rw quick
 256 rw quick
+263 rw quick
-- 
2.18.0



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-12 18:11 [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Vladimir Sementsov-Ogievskiy
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format Vladimir Sementsov-Ogievskiy
@ 2019-08-12 19:46 ` Max Reitz
  2019-08-12 19:50   ` Max Reitz
  2019-08-13  9:34   ` Kevin Wolf
  2 siblings, 2 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-12 19:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1151 bytes --]

On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
> broken. It should be either fixed like I propose (by Max's suggestion)
> or somehow forbidden (just forbid backing-file supporting node to be
> file child of raw-format node).

I agree, I think only filters should return BDRV_BLOCK_RAW.

(And not even them, they should just be handled transparently by
bdrv_co_block_status().  But that’s something for later.)

> Vladimir Sementsov-Ogievskiy (2):
>   block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>   iotests: test mirroring qcow2 under raw format
> 
>  block/raw-format.c         |  2 +-
>  tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/263.out | 12 ++++++++++
>  tests/qemu-iotests/group   |  1 +
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/263
>  create mode 100644 tests/qemu-iotests/263.out

Thanks, applied to my block-next branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
@ 2019-08-12 19:50   ` Max Reitz
  2019-08-13  8:39     ` Vladimir Sementsov-Ogievskiy
  2019-08-13  9:34   ` Kevin Wolf
  1 sibling, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-08-12 19:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block; +Cc: kwolf, den, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 1272 bytes --]

On 12.08.19 21:46, Max Reitz wrote:
> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>> broken. It should be either fixed like I propose (by Max's suggestion)
>> or somehow forbidden (just forbid backing-file supporting node to be
>> file child of raw-format node).
> 
> I agree, I think only filters should return BDRV_BLOCK_RAW.
> 
> (And not even them, they should just be handled transparently by
> bdrv_co_block_status().  But that’s something for later.)
> 
>> Vladimir Sementsov-Ogievskiy (2):
>>   block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>   iotests: test mirroring qcow2 under raw format
>>
>>  block/raw-format.c         |  2 +-
>>  tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>  tests/qemu-iotests/263.out | 12 ++++++++++
>>  tests/qemu-iotests/group   |  1 +
>>  4 files changed, 60 insertions(+), 1 deletion(-)
>>  create mode 100755 tests/qemu-iotests/263
>>  create mode 100644 tests/qemu-iotests/263.out
> 
> Thanks, applied to my block-next branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Oops, maybe not.  221 needs to be adjusted.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-12 19:50   ` Max Reitz
@ 2019-08-13  8:39     ` Vladimir Sementsov-Ogievskiy
  2019-08-13  9:01       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13  8:39 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, qemu-devel, Denis Lunev

12.08.2019 22:50, Max Reitz wrote:
> On 12.08.19 21:46, Max Reitz wrote:
>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>> or somehow forbidden (just forbid backing-file supporting node to be
>>> file child of raw-format node).
>>
>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>
>> (And not even them, they should just be handled transparently by
>> bdrv_co_block_status().  But that’s something for later.)
>>
>>> Vladimir Sementsov-Ogievskiy (2):
>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>    iotests: test mirroring qcow2 under raw format
>>>
>>>   block/raw-format.c         |  2 +-
>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>>   tests/qemu-iotests/group   |  1 +
>>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>>   create mode 100755 tests/qemu-iotests/263
>>>   create mode 100644 tests/qemu-iotests/263.out
>>
>> Thanks, applied to my block-next branch:
>>
>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
> 
> Oops, maybe not.  221 needs to be adjusted.
> 


Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
don't look good.

So, it's not quite right to report DATA | RECURSE, we actually should report
DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
DATA will be set in final result (generic layer must not drop it, obviously).

ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
resend something new.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13  8:39     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13  9:01       ` Vladimir Sementsov-Ogievskiy
  2019-08-13  9:33         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13  9:01 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, qemu-devel, Denis Lunev

13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
> 12.08.2019 22:50, Max Reitz wrote:
>> On 12.08.19 21:46, Max Reitz wrote:
>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all!
>>>>
>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>> file child of raw-format node).
>>>
>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>
>>> (And not even them, they should just be handled transparently by
>>> bdrv_co_block_status().  But that’s something for later.)
>>>
>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>    iotests: test mirroring qcow2 under raw format
>>>>
>>>>   block/raw-format.c         |  2 +-
>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>>>   tests/qemu-iotests/group   |  1 +
>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>>>   create mode 100755 tests/qemu-iotests/263
>>>>   create mode 100644 tests/qemu-iotests/263.out
>>>
>>> Thanks, applied to my block-next branch:
>>>
>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>
>> Oops, maybe not.  221 needs to be adjusted.
>>
> 
> 
> Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
> don't look good.
> 
> So, it's not quite right to report DATA | RECURSE, we actually should report
> DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
> DATA will be set in final result (generic layer must not drop it, obviously).
> 
> ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
> resend something new.
> 
> 


Hmmm.. So, we have raw node, and assume backing chain under it. who should loop through it,
generic code or raw driver?

Now it all looks like generic code is responsible for looping through filtered chain (backing files
and filters) and driver is responsible for all it's children except for filtered child.

Or, driver may return something that says to generic child to handle the whole backing chain of returned
file at once, as it's another backing chain. And seems even RECURSE don't work correctly as it doesn't handle
the backing chain in this recursion. Why it works better than RAW - just because we return it together
with DATA flags and this DATA flag is kept anyway, independently of finding zeros or not.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format Vladimir Sementsov-Ogievskiy
@ 2019-08-13  9:10   ` Kevin Wolf
  2019-08-13  9:22     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13  9:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, qemu-devel, qemu-block, mreitz

Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Check that it is fixed by previous commit
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/263.out | 12 ++++++++++
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 59 insertions(+)
>  create mode 100755 tests/qemu-iotests/263
>  create mode 100644 tests/qemu-iotests/263.out
> 
> diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
> new file mode 100755
> index 0000000000..dbe38e6c6c
> --- /dev/null
> +++ b/tests/qemu-iotests/263
> @@ -0,0 +1,46 @@
> +#!/usr/bin/env python
> +#
> +# Test mirroring qcow2 under raw-format
> +#
> +# Copyright (c) 2019 Virtuozzo International GmbH.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +import iotests
> +from iotests import qemu_img_create, qemu_io, filter_qemu_io, file_path, log
> +
> +iotests.verify_image_format(supported_fmts=['qcow2'])
> +
> +base, top, target = file_path('base', 'top', 'target')
> +size = 1024 * 1024
> +
> +qemu_img_create('-f', iotests.imgfmt, base, str(size))
> +log(filter_qemu_io(qemu_io('-f', iotests.imgfmt,
> +                           '-c', 'write -P 1 0 1M', base)))
> +qemu_img_create('-f', iotests.imgfmt, '-b', base, top, str(size))

Who deletes these files at the end of the test? (There's a reason why
all the other test cases uses 'with FilePath(...)'.)

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format
  2019-08-13  9:10   ` Kevin Wolf
@ 2019-08-13  9:22     ` Vladimir Sementsov-Ogievskiy
  2019-08-13  9:36       ` Kevin Wolf
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13  9:22 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-devel, qemu-block, mreitz

13.08.2019 12:10, Kevin Wolf wrote:
> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Check that it is fixed by previous commit
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>   tests/qemu-iotests/group   |  1 +
>>   3 files changed, 59 insertions(+)
>>   create mode 100755 tests/qemu-iotests/263
>>   create mode 100644 tests/qemu-iotests/263.out
>>
>> diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
>> new file mode 100755
>> index 0000000000..dbe38e6c6c
>> --- /dev/null
>> +++ b/tests/qemu-iotests/263
>> @@ -0,0 +1,46 @@
>> +#!/usr/bin/env python
>> +#
>> +# Test mirroring qcow2 under raw-format
>> +#
>> +# Copyright (c) 2019 Virtuozzo International GmbH.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 2 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +#
>> +
>> +import iotests
>> +from iotests import qemu_img_create, qemu_io, filter_qemu_io, file_path, log
>> +
>> +iotests.verify_image_format(supported_fmts=['qcow2'])
>> +
>> +base, top, target = file_path('base', 'top', 'target')
>> +size = 1024 * 1024
>> +
>> +qemu_img_create('-f', iotests.imgfmt, base, str(size))
>> +log(filter_qemu_io(qemu_io('-f', iotests.imgfmt,
>> +                           '-c', 'write -P 1 0 1M', base)))
>> +qemu_img_create('-f', iotests.imgfmt, '-b', base, top, str(size))
> 
> Who deletes these files at the end of the test? (There's a reason why
> all the other test cases uses 'with FilePath(...)'.)
> 

file_path sets corresponding handling, it's alternative to FilePath


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13  9:01       ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13  9:33         ` Vladimir Sementsov-Ogievskiy
  2019-08-13 11:14           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13  9:33 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, qemu-devel, Denis Lunev

13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
>> 12.08.2019 22:50, Max Reitz wrote:
>>> On 12.08.19 21:46, Max Reitz wrote:
>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all!
>>>>>
>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>>> file child of raw-format node).
>>>>
>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>>
>>>> (And not even them, they should just be handled transparently by
>>>> bdrv_co_block_status().  But that’s something for later.)
>>>>
>>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>>    iotests: test mirroring qcow2 under raw format
>>>>>
>>>>>   block/raw-format.c         |  2 +-
>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>>>>   tests/qemu-iotests/group   |  1 +
>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>>>>   create mode 100755 tests/qemu-iotests/263
>>>>>   create mode 100644 tests/qemu-iotests/263.out
>>>>
>>>> Thanks, applied to my block-next branch:
>>>>
>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>>
>>> Oops, maybe not.  221 needs to be adjusted.
>>>
>>
>>
>> Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
>> don't look good.
>>
>> So, it's not quite right to report DATA | RECURSE, we actually should report
>> DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
>> DATA will be set in final result (generic layer must not drop it, obviously).
>>
>> ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
>> resend something new.
>>
>>
> 
> 
> Hmmm.. So, we have raw node, and assume backing chain under it. who should loop through it,
> generic code or raw driver?
> 
> Now it all looks like generic code is responsible for looping through filtered chain (backing files
> and filters) and driver is responsible for all it's children except for filtered child.
> 
> Or, driver may return something that says to generic child to handle the whole backing chain of returned
> file at once, as it's another backing chain. And seems even RECURSE don't work correctly as it doesn't handle
> the backing chain in this recursion. Why it works better than RAW - just because we return it together
> with DATA flags and this DATA flag is kept anyway, independently of finding zeros or not.
> 
> 


Hmm, so, is it correct that we return DATA | RECURSE, if we are not really sure that it is data?

If we see at

  * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer

seems like we should report DATA only if there is allocation..

  * DATA ZERO OFFSET_VALID
  *  t    t        t       sectors read as zero, returned file is zero at offset
  *  t    f        t       sectors read as valid from file at offset
  *  f    t        t       sectors preallocated, read as zero, returned file not

so, ZERO alone doesn't guarantee that we may safely read?

So, for qcow2 metadata-preallocated images, what about zero-init? We report DATA, and probably get ZERO from
file and have finally DAYA | ZERO which guarantees that read will return zeros, but is it true?

Finally, what "DATA" mean? That space is allocated and occupies disk space? Or it only  means only ALLOCATED i.e.
"read from this layer, not from backing" otherwise, and adds additional meaning to ZERO when used together, that
read will return zeros for sure?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
  2019-08-12 19:50   ` Max Reitz
@ 2019-08-13  9:34   ` Kevin Wolf
  2019-08-13 14:38     ` Max Reitz
  1 sibling, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13  9:34 UTC (permalink / raw)
  To: Max Reitz; +Cc: den, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2103 bytes --]

Am 12.08.2019 um 21:46 hat Max Reitz geschrieben:
> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> > 
> > I'm not sure, is it a bug or a feature, but using qcow2 under raw is
> > broken. It should be either fixed like I propose (by Max's suggestion)
> > or somehow forbidden (just forbid backing-file supporting node to be
> > file child of raw-format node).
> 
> I agree, I think only filters should return BDRV_BLOCK_RAW.

If BDRV_BLOCK_RAW isn't suitable for raw, something went wrong. :-)

But anyway, raw is essentially a filter, at least if you don't use
the offset option. And BDRV_BLOCK_RAW shouldn't even depend on an
unchanged offset because the .bdrv_co_block_status implementation
returns the right offset.

And indeed, you can replace raw with blkdebug and it still fails (the
testcase from patch 2 fails, too, but it's obvious enough this way):

    $ ./qemu-img map --output=json --image-opts driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2 
    [{ "start": 0, "length": 1048576, "depth": 1, "zero": true, "data": false},
    { "start": 1048576, "length": 1048576, "depth": 1, "zero": false, "data": true, "offset": 327680},
    { "start": 2097152, "length": 65011712, "depth": 1, "zero": true, "data": false}]

    $ ./qemu-img map --output=json --image-opts driver=raw,file.driver=qcow2,file.file.driver=file,file.file.filename=/tmp/test.qcow2 
    [{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": false}]

    $ ./qemu-img map --output=json --image-opts driver=blkdebug,image.driver=qcow2,image.file.driver=file,image.file.filename=/tmp/test.qcow2 
    [{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": false}]

After applying your "deal with filters" series, blkdebug actually prints
the expected result and passes the iotests case, but raw still doesn't.
So you must have fixed something for filters that declare themselves
filters, even though that semantics should probably be coupled to
BDRV_BLOCK_RAW instead so that the raw format can benefit from it, too.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format
  2019-08-13  9:22     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13  9:36       ` Kevin Wolf
  0 siblings, 0 replies; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13  9:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: Denis Lunev, qemu-devel, qemu-block, mreitz

Am 13.08.2019 um 11:22 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 13.08.2019 12:10, Kevin Wolf wrote:
> > Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> Check that it is fixed by previous commit
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >> ---
> >>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
> >>   tests/qemu-iotests/263.out | 12 ++++++++++
> >>   tests/qemu-iotests/group   |  1 +
> >>   3 files changed, 59 insertions(+)
> >>   create mode 100755 tests/qemu-iotests/263
> >>   create mode 100644 tests/qemu-iotests/263.out
> >>
> >> diff --git a/tests/qemu-iotests/263 b/tests/qemu-iotests/263
> >> new file mode 100755
> >> index 0000000000..dbe38e6c6c
> >> --- /dev/null
> >> +++ b/tests/qemu-iotests/263
> >> @@ -0,0 +1,46 @@
> >> +#!/usr/bin/env python
> >> +#
> >> +# Test mirroring qcow2 under raw-format
> >> +#
> >> +# Copyright (c) 2019 Virtuozzo International GmbH.
> >> +#
> >> +# This program is free software; you can redistribute it and/or modify
> >> +# it under the terms of the GNU General Public License as published by
> >> +# the Free Software Foundation; either version 2 of the License, or
> >> +# (at your option) any later version.
> >> +#
> >> +# This program is distributed in the hope that it will be useful,
> >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> +# GNU General Public License for more details.
> >> +#
> >> +# You should have received a copy of the GNU General Public License
> >> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> +#
> >> +
> >> +import iotests
> >> +from iotests import qemu_img_create, qemu_io, filter_qemu_io, file_path, log
> >> +
> >> +iotests.verify_image_format(supported_fmts=['qcow2'])
> >> +
> >> +base, top, target = file_path('base', 'top', 'target')
> >> +size = 1024 * 1024
> >> +
> >> +qemu_img_create('-f', iotests.imgfmt, base, str(size))
> >> +log(filter_qemu_io(qemu_io('-f', iotests.imgfmt,
> >> +                           '-c', 'write -P 1 0 1M', base)))
> >> +qemu_img_create('-f', iotests.imgfmt, '-b', base, top, str(size))
> > 
> > Who deletes these files at the end of the test? (There's a reason why
> > all the other test cases uses 'with FilePath(...)'.)
> > 
> 
> file_path sets corresponding handling, it's alternative to FilePath

Ah, I see it uses atexit. Thanks, I didn't know this feature yet.

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
@ 2019-08-13 11:04   ` Kevin Wolf
  2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
  0 siblings, 2 replies; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 11:04 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: den, qemu-devel, qemu-block, mreitz

Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
> returned file. But is it correct behavior at all? If returned file
> itself has a backing file, we may report as totally unallocated and
> area which actually has data in bottom backing file.
> 
> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
> by following commit with a test. Let's make raw-format behave more
> correctly returning BDRV_BLOCK_DATA.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

After some reading, I think I came to the conclusion that RAW is the
correct thing to do. There is indeed a problem, but this patch is trying
to fix it in the wrong place.

In the case where the backing file contains some data, and we have a
'raw' node above the qcow2 overlay node, the content of the respective
block is not defined by the queried backing file layer, so it is
completely correct that bdrv_is_allocated() returns false, like it would
if you queried the qcow2 layer directly. If it returned true, we would
copy everything, which isn't right either (the test cases should may add
the qemu-img map output of the target so this becomes visible).

The problem is that we try to recurse along the backing chain, but we
fail to make the step from the raw node to the backing file.

Note that just extending Max's "deal with filters" is not enough to fix
this because raw doesn't actually meet all of the criteria for being a
filter in this sense (at least because the 'offset' option can change
offsets between raw and its child).

I think this is essentially a result of special-casing backing files
everywhere instead of treating them like children like any other.
bdrv_co_block_status_above() probably shouldn't recurse along the
backing chain, but along the returned *file pointers, and consider the
returned offset in *map.

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13  9:33         ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 11:14           ` Vladimir Sementsov-Ogievskiy
  2019-08-13 11:51             ` Kevin Wolf
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 11:14 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, qemu-devel, Denis Lunev

13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 22:50, Max Reitz wrote:
>>>> On 12.08.19 21:46, Max Reitz wrote:
>>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Hi all!
>>>>>>
>>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>>>> file child of raw-format node).
>>>>>
>>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>>>
>>>>> (And not even them, they should just be handled transparently by
>>>>> bdrv_co_block_status().  But that’s something for later.)
>>>>>
>>>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>>>    iotests: test mirroring qcow2 under raw format
>>>>>>
>>>>>>   block/raw-format.c         |  2 +-
>>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>>>>>   tests/qemu-iotests/group   |  1 +
>>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>   create mode 100755 tests/qemu-iotests/263
>>>>>>   create mode 100644 tests/qemu-iotests/263.out
>>>>>
>>>>> Thanks, applied to my block-next branch:
>>>>>
>>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>>>
>>>> Oops, maybe not.  221 needs to be adjusted.
>>>>
>>>
>>>
>>> Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
>>> don't look good.
>>>
>>> So, it's not quite right to report DATA | RECURSE, we actually should report
>>> DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
>>> DATA will be set in final result (generic layer must not drop it, obviously).
>>>
>>> ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
>>> resend something new.
>>>
>>>
>>
>>
>> Hmmm.. So, we have raw node, and assume backing chain under it. who should loop through it,
>> generic code or raw driver?
>>
>> Now it all looks like generic code is responsible for looping through filtered chain (backing files
>> and filters) and driver is responsible for all it's children except for filtered child.
>>
>> Or, driver may return something that says to generic child to handle the whole backing chain of returned
>> file at once, as it's another backing chain. And seems even RECURSE don't work correctly as it doesn't handle
>> the backing chain in this recursion. Why it works better than RAW - just because we return it together
>> with DATA flags and this DATA flag is kept anyway, independently of finding zeros or not.
>>
>>
> 
> 
> Hmm, so, is it correct that we return DATA | RECURSE, if we are not really sure that it is data?
> 
> If we see at
> 
>   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> 
> seems like we should report DATA only if there is allocation..
> 
>   * DATA ZERO OFFSET_VALID
>   *  t    t        t       sectors read as zero, returned file is zero at offset
>   *  t    f        t       sectors read as valid from file at offset
>   *  f    t        t       sectors preallocated, read as zero, returned file not
> 
> so, ZERO alone doesn't guarantee that we may safely read?
> 
> So, for qcow2 metadata-preallocated images, what about zero-init? We report DATA, and probably get ZERO from
> file and have finally DAYA | ZERO which guarantees that read will return zeros, but is it true?
> 
> Finally, what "DATA" mean? That space is allocated and occupies disk space? Or it only  means only ALLOCATED i.e.
> "read from this layer, not from backing" otherwise, and adds additional meaning to ZERO when used together, that
> read will return zeros for sure?
> 


Continue self-discussion.

Consider closer the following case:
 >   * DATA ZERO OFFSET_VALID
 >   *  f    t        t       sectors preallocated, read as zero, returned file not

It actually means that we must not read, as read will return wrong data, when clusters are actually zero for guest.

It's OK, when for ex. qcow2 returns this combination and link to its file child: it means that if you read from qcow2
node, you'll see correct zeros, as qcow2 has special metadata which shows that these clusters are zero. But if you read
from file directly at returned offset you'll see garbage, so don't do it.

But what if some node returns this combination with file == itself? It actually means that you must not read, but you
should call block-status to understand that there are zeros. So, if some format can return this combination with
file == itself it means that you must not read it directly, but only after checking block status.

And file-posix is example of such driver. But file-posix holes are guaranteed to read as zero, so we can report DATA | ZERO.
But this will break user expirience which assumes that DATA means occupation of real disk space.

...
me go and re-read what we've documented in NBD protocol about block steus...

"DATA" turns into NBD_STATE_HOLE, which formally means nothing, and just notes that probably there is no disk space occupation
for this region.. So it's about disk space allocation and nothing about correctness of read.
and NBD_STATE_ZERO guarantees that region read as all zeroes.

Look at code in nbd/server.c.. Aha, it calls block_status_above and turns !ALLOCATED into HOLE. Which means that it will never
return HOLE for file-posix..





-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 11:04   ` Kevin Wolf
@ 2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
  2019-08-13 12:01       ` Kevin Wolf
  2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
  1 sibling, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 11:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-devel, qemu-block, mreitz

13.08.2019 14:04, Kevin Wolf wrote:
> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>> returned file. But is it correct behavior at all? If returned file
>> itself has a backing file, we may report as totally unallocated and
>> area which actually has data in bottom backing file.
>>
>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>> by following commit with a test. Let's make raw-format behave more
>> correctly returning BDRV_BLOCK_DATA.
>>
>> Suggested-by: Max Reitz <mreitz@redhat.com>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> After some reading, I think I came to the conclusion that RAW is the
> correct thing to do. There is indeed a problem, but this patch is trying
> to fix it in the wrong place.
> 
> In the case where the backing file contains some data, and we have a
> 'raw' node above the qcow2 overlay node, the content of the respective
> block is not defined by the queried backing file layer, so it is
> completely correct that bdrv_is_allocated() returns false, like it would
> if you queried the qcow2 layer directly. If it returned true, we would
> copy everything, which isn't right either (the test cases should may add
> the qemu-img map output of the target so this becomes visible).
> 
> The problem is that we try to recurse along the backing chain, but we
> fail to make the step from the raw node to the backing file.

I'd say, the problem is that we ignore backing chain of non-backing child

> 
> Note that just extending Max's "deal with filters" is not enough to fix
> this because raw doesn't actually meet all of the criteria for being a
> filter in this sense (at least because the 'offset' option can change
> offsets between raw and its child).
> 
> I think this is essentially a result of special-casing backing files
> everywhere instead of treating them like children like any other.

But we need to special-case them, as we have interfaces operating on backing chain,

> bdrv_co_block_status_above() probably shouldn't recurse along the
> backing chain, but along the returned *file pointers, and consider the
> returned offset in *map.
> 

So, you mean that in case of unallocated, format layer should return it's backing file as file?
Then, maybe bdrv_co_block_status should not recurse at all?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 11:14           ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 11:51             ` Kevin Wolf
  2019-08-13 13:00               ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:31               ` Max Reitz
  0 siblings, 2 replies; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 11:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Denis Lunev, qemu-devel, qemu-block, Max Reitz

Am 13.08.2019 um 13:14 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
> > 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
> >> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
> >>> 12.08.2019 22:50, Max Reitz wrote:
> >>>> On 12.08.19 21:46, Max Reitz wrote:
> >>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>> Hi all!
> >>>>>>
> >>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
> >>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
> >>>>>> or somehow forbidden (just forbid backing-file supporting node to be
> >>>>>> file child of raw-format node).
> >>>>>
> >>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
> >>>>>
> >>>>> (And not even them, they should just be handled transparently by
> >>>>> bdrv_co_block_status().  But that’s something for later.)
> >>>>>
> >>>>>> Vladimir Sementsov-Ogievskiy (2):
> >>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
> >>>>>>    iotests: test mirroring qcow2 under raw format
> >>>>>>
> >>>>>>   block/raw-format.c         |  2 +-
> >>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
> >>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
> >>>>>>   tests/qemu-iotests/group   |  1 +
> >>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
> >>>>>>   create mode 100755 tests/qemu-iotests/263
> >>>>>>   create mode 100644 tests/qemu-iotests/263.out
> >>>>>
> >>>>> Thanks, applied to my block-next branch:
> >>>>>
> >>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
> >>>>
> >>>> Oops, maybe not.  221 needs to be adjusted.
> >>>>
> >>>
> >>>
> >>> Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
> >>> don't look good.
> >>>
> >>> So, it's not quite right to report DATA | RECURSE, we actually should report
> >>> DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
> >>> DATA will be set in final result (generic layer must not drop it, obviously).
> >>>
> >>> ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
> >>> resend something new.
> >>>
> >>>
> >>
> >>
> >> Hmmm.. So, we have raw node, and assume backing chain under it. who should loop through it,
> >> generic code or raw driver?
> >>
> >> Now it all looks like generic code is responsible for looping through filtered chain (backing files
> >> and filters) and driver is responsible for all it's children except for filtered child.
> >>
> >> Or, driver may return something that says to generic child to handle the whole backing chain of returned
> >> file at once, as it's another backing chain. And seems even RECURSE don't work correctly as it doesn't handle
> >> the backing chain in this recursion. Why it works better than RAW - just because we return it together
> >> with DATA flags and this DATA flag is kept anyway, independently of finding zeros or not.
> >>
> >>
> > 
> > 
> > Hmm, so, is it correct that we return DATA | RECURSE, if we are not really sure that it is data?
> > 
> > If we see at
> > 
> >   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> > 
> > seems like we should report DATA only if there is allocation..
> > 
> >   * DATA ZERO OFFSET_VALID
> >   *  t    t        t       sectors read as zero, returned file is zero at offset
> >   *  t    f        t       sectors read as valid from file at offset
> >   *  f    t        t       sectors preallocated, read as zero, returned file not
> > 
> > so, ZERO alone doesn't guarantee that we may safely read?
> > 
> > So, for qcow2 metadata-preallocated images, what about zero-init? We report DATA, and probably get ZERO from
> > file and have finally DAYA | ZERO which guarantees that read will return zeros, but is it true?
> > 
> > Finally, what "DATA" mean? That space is allocated and occupies disk space? Or it only  means only ALLOCATED i.e.
> > "read from this layer, not from backing" otherwise, and adds additional meaning to ZERO when used together, that
> > read will return zeros for sure?

I think DATA means that the data for this block is provided by *file. I
wouldn't necessarily understand it to mean that the data actually takes
up physical disk space there.

> Continue self-discussion.
> 
> Consider closer the following case:
>  >   * DATA ZERO OFFSET_VALID
>  >   *  f    t        t       sectors preallocated, read as zero, returned file not
> 
> It actually means that we must not read, as read will return wrong
> data, when clusters are actually zero for guest.

It means that you need to read from bs itself to get the correct data
(which will be zero). Even though OFFSET_VALID is set, reading from
*file (typically bs->file->bs) at the returned offset might not give the
right result.

> It's OK, when for ex. qcow2 returns this combination and link to its
> file child: it means that if you read from qcow2 node, you'll see
> correct zeros, as qcow2 has special metadata which shows that these
> clusters are zero. But if you read from file directly at returned
> offset you'll see garbage, so don't do it.

Correct.

> But what if some node returns this combination with file == itself? It
> actually means that you must not read, but you should call
> block-status to understand that there are zeros. So, if some format
> can return this combination with file == itself it means that you must
> not read it directly, but only after checking block status.

This doesn't make sense to me. Reading from a node is always correct.

But you're right that DATA seems to mean something slightly different at
the protocol level because *file cannot have a meaningful value for the
lower layer there. In this case, DATA still seems to mean that the data
is fetched from the lower layer (i.e. the block device on which the file
system resides). For holes, this is not the case.

> And file-posix is example of such driver. But file-posix holes are guaranteed to read as zero, so we can report DATA | ZERO.
> But this will break user expirience which assumes that DATA means occupation of real disk space.

With the above explanation, DATA shouldn't be set for holes.

But it's still kind of inconsistent because OFFSET_VALID and the offset
refer to bs itself and not to the lower layer.

> ...
> me go and re-read what we've documented in NBD protocol about block steus...
> 
> "DATA" turns into NBD_STATE_HOLE, which formally means nothing, and just notes that probably there is no disk space occupation
> for this region.. So it's about disk space allocation and nothing about correctness of read.
> and NBD_STATE_ZERO guarantees that region read as all zeroes.
> 
> Look at code in nbd/server.c.. Aha, it calls block_status_above and turns !ALLOCATED into HOLE. Which means that it will never
> return HOLE for file-posix..

Hm... This is a mess. :-)

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 12:01       ` Kevin Wolf
  2019-08-13 13:21         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 12:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: Denis Lunev, qemu-devel, qemu-block, mreitz

Am 13.08.2019 um 13:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 13.08.2019 14:04, Kevin Wolf wrote:
> > Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
> >> returned file. But is it correct behavior at all? If returned file
> >> itself has a backing file, we may report as totally unallocated and
> >> area which actually has data in bottom backing file.
> >>
> >> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
> >> by following commit with a test. Let's make raw-format behave more
> >> correctly returning BDRV_BLOCK_DATA.
> >>
> >> Suggested-by: Max Reitz <mreitz@redhat.com>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > 
> > After some reading, I think I came to the conclusion that RAW is the
> > correct thing to do. There is indeed a problem, but this patch is trying
> > to fix it in the wrong place.
> > 
> > In the case where the backing file contains some data, and we have a
> > 'raw' node above the qcow2 overlay node, the content of the respective
> > block is not defined by the queried backing file layer, so it is
> > completely correct that bdrv_is_allocated() returns false, like it would
> > if you queried the qcow2 layer directly. If it returned true, we would
> > copy everything, which isn't right either (the test cases should may add
> > the qemu-img map output of the target so this becomes visible).
> > 
> > The problem is that we try to recurse along the backing chain, but we
> > fail to make the step from the raw node to the backing file.
> 
> I'd say, the problem is that we ignore backing chain of non-backing
> child

Yes, exactly. And I know even less about what happens if a child is
neither bs->file nor bs->backing. Imagine a qcow2 image with an external
data file that is a qcow2 image with a backing file itself. :-)

Actually, just having two qcow2 layers nested with bs->file probably
already fails.

> > Note that just extending Max's "deal with filters" is not enough to fix
> > this because raw doesn't actually meet all of the criteria for being a
> > filter in this sense (at least because the 'offset' option can change
> > offsets between raw and its child).
> > 
> > I think this is essentially a result of special-casing backing files
> > everywhere instead of treating them like children like any other.
> 
> But we need to special-case them, as we have interfaces operating on
> backing chain,

I'm not sure yet if this means that these interfaces are wrong, but it
might. But in any case, I think we depend on special-casing in more
places than we should.

> > bdrv_co_block_status_above() probably shouldn't recurse along the
> > backing chain, but along the returned *file pointers, and consider the
> > returned offset in *map.
> 
> So, you mean that in case of unallocated, format layer should return
> it's backing file as file?

Yes, because that's where it's reading the data from.

Hm... Now I wonder what this means for DATA... In theory it would have
to be set for backing files, but that would make it completely useless.
We can distinguish the cases by looking at *file, but how does the
generic block layer know which child should be counted as "allocated"
and which shouldn't?

> Then, maybe bdrv_co_block_status should not recurse at all?

Maybe. I think I need to draw a bit on the whiteboard before I can
really say anything. It would probably be a pretty fundamental change to
the model how block_status works.

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 11:51             ` Kevin Wolf
@ 2019-08-13 13:00               ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:31               ` Max Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 13:00 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-devel, qemu-block, Max Reitz

13.08.2019 14:51, Kevin Wolf wrote:
> Am 13.08.2019 um 13:14 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
>>>> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 12.08.2019 22:50, Max Reitz wrote:
>>>>>> On 12.08.19 21:46, Max Reitz wrote:
>>>>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>> Hi all!
>>>>>>>>
>>>>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>>>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>>>>>> file child of raw-format node).
>>>>>>>
>>>>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>>>>>
>>>>>>> (And not even them, they should just be handled transparently by
>>>>>>> bdrv_co_block_status().  But that’s something for later.)
>>>>>>>
>>>>>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>>>>>     block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>>>>>     iotests: test mirroring qcow2 under raw format
>>>>>>>>
>>>>>>>>    block/raw-format.c         |  2 +-
>>>>>>>>    tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>>>>>    tests/qemu-iotests/263.out | 12 ++++++++++
>>>>>>>>    tests/qemu-iotests/group   |  1 +
>>>>>>>>    4 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>>>    create mode 100755 tests/qemu-iotests/263
>>>>>>>>    create mode 100644 tests/qemu-iotests/263.out
>>>>>>>
>>>>>>> Thanks, applied to my block-next branch:
>>>>>>>
>>>>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>>>>>
>>>>>> Oops, maybe not.  221 needs to be adjusted.
>>>>>>
>>>>>
>>>>>
>>>>> Hmm yes, I forget to run tests.. Areas which were zero becomes data|zero, it
>>>>> don't look good.
>>>>>
>>>>> So, it's not quite right to report DATA | RECURSE, we actually should report
>>>>> DATA_OR_ZERO | RECURSE, which is actually ALLOCATED | RECURSE, as otherwise
>>>>> DATA will be set in final result (generic layer must not drop it, obviously).
>>>>>
>>>>> ALLOCATED never returned by drivers but seems it should be. I'll think a bit and
>>>>> resend something new.
>>>>>
>>>>>
>>>>
>>>>
>>>> Hmmm.. So, we have raw node, and assume backing chain under it. who should loop through it,
>>>> generic code or raw driver?
>>>>
>>>> Now it all looks like generic code is responsible for looping through filtered chain (backing files
>>>> and filters) and driver is responsible for all it's children except for filtered child.
>>>>
>>>> Or, driver may return something that says to generic child to handle the whole backing chain of returned
>>>> file at once, as it's another backing chain. And seems even RECURSE don't work correctly as it doesn't handle
>>>> the backing chain in this recursion. Why it works better than RAW - just because we return it together
>>>> with DATA flags and this DATA flag is kept anyway, independently of finding zeros or not.
>>>>
>>>>
>>>
>>>
>>> Hmm, so, is it correct that we return DATA | RECURSE, if we are not really sure that it is data?
>>>
>>> If we see at
>>>
>>>    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>>>
>>> seems like we should report DATA only if there is allocation..
>>>
>>>    * DATA ZERO OFFSET_VALID
>>>    *  t    t        t       sectors read as zero, returned file is zero at offset
>>>    *  t    f        t       sectors read as valid from file at offset
>>>    *  f    t        t       sectors preallocated, read as zero, returned file not
>>>
>>> so, ZERO alone doesn't guarantee that we may safely read?
>>>
>>> So, for qcow2 metadata-preallocated images, what about zero-init? We report DATA, and probably get ZERO from
>>> file and have finally DAYA | ZERO which guarantees that read will return zeros, but is it true?
>>>
>>> Finally, what "DATA" mean? That space is allocated and occupies disk space? Or it only  means only ALLOCATED i.e.
>>> "read from this layer, not from backing" otherwise, and adds additional meaning to ZERO when used together, that
>>> read will return zeros for sure?
> 
> I think DATA means that the data for this block is provided by *file. I
> wouldn't necessarily understand it to mean that the data actually takes
> up physical disk space there.
> 
>> Continue self-discussion.
>>
>> Consider closer the following case:
>>   >   * DATA ZERO OFFSET_VALID
>>   >   *  f    t        t       sectors preallocated, read as zero, returned file not
>>
>> It actually means that we must not read, as read will return wrong
>> data, when clusters are actually zero for guest.
> 
> It means that you need to read from bs itself to get the correct data
> (which will be zero). Even though OFFSET_VALID is set, reading from
> *file (typically bs->file->bs) at the returned offset might not give the
> right result.
> 
>> It's OK, when for ex. qcow2 returns this combination and link to its
>> file child: it means that if you read from qcow2 node, you'll see
>> correct zeros, as qcow2 has special metadata which shows that these
>> clusters are zero. But if you read from file directly at returned
>> offset you'll see garbage, so don't do it.
> 
> Correct.
> 
>> But what if some node returns this combination with file == itself? It
>> actually means that you must not read, but you should call
>> block-status to understand that there are zeros. So, if some format
>> can return this combination with file == itself it means that you must
>> not read it directly, but only after checking block status.
> 
> This doesn't make sense to me. Reading from a node is always correct.
> 
> But you're right that DATA seems to mean something slightly different at
> the protocol level because *file cannot have a meaningful value for the
> lower layer there. In this case, DATA still seems to mean that the data
> is fetched from the lower layer (i.e. the block device on which the file
> system resides). For holes, this is not the case.
> 
>> And file-posix is example of such driver. But file-posix holes are guaranteed to read as zero, so we can report DATA | ZERO.
>> But this will break user expirience which assumes that DATA means occupation of real disk space.
> 
> With the above explanation, DATA shouldn't be set for holes.
> 
> But it's still kind of inconsistent because OFFSET_VALID and the offset
> refer to bs itself and not to the lower layer.
> 
>> ...
>> me go and re-read what we've documented in NBD protocol about block steus...
>>
>> "DATA" turns into NBD_STATE_HOLE, which formally means nothing, and just notes that probably there is no disk space occupation
>> for this region.. So it's about disk space allocation and nothing about correctness of read.
>> and NBD_STATE_ZERO guarantees that region read as all zeroes.
>>
>> Look at code in nbd/server.c.. Aha, it calls block_status_above and turns !ALLOCATED into HOLE. Which means that it will never
>> return HOLE for file-posix..
> 
> Hm... This is a mess. :-)
> 


I'm afraid the these all are consequences of really different usages of BLOCK_STATUS:

1. For backing-chain aware block-jobs, to work with overlays. So backing children are different from the others,
as they are presenting overlays, which may be used in separate, and other children are not overlays and "more owned" by their parents

2. To understand where are zeroes and improve performance of copying loops (not copying zeroes, not sending zeroes through the wire,
  use effective write_zeroes)

3. To show file-mapping (qemu-img map)

4. Mirror use it to do DISCARD if "UNALLOCATED", but seems wrong for me now.. For which driver bdrv_block_status_above(source, NULL,...)
will return UNALLOCATED? Seems neither file-posix nor qcow2.

anything else?

May be we should split these use cases instead of trying to combine them using a lot of returned flags and parameters?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 12:01       ` Kevin Wolf
@ 2019-08-13 13:21         ` " Kevin Wolf
  2019-08-13 14:46           ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 13:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy; +Cc: mreitz, Denis Lunev, qemu-block, qemu-devel

Am 13.08.2019 um 14:01 hat Kevin Wolf geschrieben:
> Am 13.08.2019 um 13:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 13.08.2019 14:04, Kevin Wolf wrote:
> > > Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > >> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
> > >> returned file. But is it correct behavior at all? If returned file
> > >> itself has a backing file, we may report as totally unallocated and
> > >> area which actually has data in bottom backing file.
> > >>
> > >> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
> > >> by following commit with a test. Let's make raw-format behave more
> > >> correctly returning BDRV_BLOCK_DATA.
> > >>
> > >> Suggested-by: Max Reitz <mreitz@redhat.com>
> > >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > 
> > > After some reading, I think I came to the conclusion that RAW is the
> > > correct thing to do. There is indeed a problem, but this patch is trying
> > > to fix it in the wrong place.
> > > 
> > > In the case where the backing file contains some data, and we have a
> > > 'raw' node above the qcow2 overlay node, the content of the respective
> > > block is not defined by the queried backing file layer, so it is
> > > completely correct that bdrv_is_allocated() returns false, like it would
> > > if you queried the qcow2 layer directly. If it returned true, we would
> > > copy everything, which isn't right either (the test cases should may add
> > > the qemu-img map output of the target so this becomes visible).
> > > 
> > > The problem is that we try to recurse along the backing chain, but we
> > > fail to make the step from the raw node to the backing file.
> > 
> > I'd say, the problem is that we ignore backing chain of non-backing
> > child
> 
> Yes, exactly. And I know even less about what happens if a child is
> neither bs->file nor bs->backing. Imagine a qcow2 image with an external
> data file that is a qcow2 image with a backing file itself. :-)
> 
> Actually, just having two qcow2 layers nested with bs->file probably
> already fails.
> 
> > > Note that just extending Max's "deal with filters" is not enough to fix
> > > this because raw doesn't actually meet all of the criteria for being a
> > > filter in this sense (at least because the 'offset' option can change
> > > offsets between raw and its child).
> > > 
> > > I think this is essentially a result of special-casing backing files
> > > everywhere instead of treating them like children like any other.
> > 
> > But we need to special-case them, as we have interfaces operating on
> > backing chain,
> 
> I'm not sure yet if this means that these interfaces are wrong, but it
> might. But in any case, I think we depend on special-casing in more
> places than we should.
> 
> > > bdrv_co_block_status_above() probably shouldn't recurse along the
> > > backing chain, but along the returned *file pointers, and consider the
> > > returned offset in *map.
> > 
> > So, you mean that in case of unallocated, format layer should return
> > it's backing file as file?
> 
> Yes, because that's where it's reading the data from.
> 
> Hm... Now I wonder what this means for DATA... In theory it would have
> to be set for backing files, but that would make it completely useless.
> We can distinguish the cases by looking at *file, but how does the
> generic block layer know which child should be counted as "allocated"
> and which shouldn't?

Possible answer to my own question:

bdrv_is_allocated(bs) isn't even asking a complete question. What we
really need to ask is whether a specific child is where data comes from.

What the current callers of bdrv_is_allocated() are interested in is
whether the data comes from bs->backing or from somewhere else. That is,
if removing bs from the graph (so that all parents of bs would point to
bs->backing instead) would still result in the same data in the given
block.

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 11:51             ` Kevin Wolf
  2019-08-13 13:00               ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:31               ` Max Reitz
  2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:50                 ` Eric Blake
  1 sibling, 2 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 14:31 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 380 bytes --]

On 13.08.19 13:51, Kevin Wolf wrote:

[...]

> Hm... This is a mess. :-)

Just out of curiosity: Why?

Aren’t there only two things we really need from the block_status
infrastructure?

(1) Whether something is allocated in the given layer of the backing chain,

(2) Whether we know that a given range reads as zeroes.

Do we really need anything else?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13  9:34   ` Kevin Wolf
@ 2019-08-13 14:38     ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 14:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1.1: Type: text/plain, Size: 3638 bytes --]

On 13.08.19 11:34, Kevin Wolf wrote:
> Am 12.08.2019 um 21:46 hat Max Reitz geschrieben:
>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>> or somehow forbidden (just forbid backing-file supporting node to be
>>> file child of raw-format node).
>>
>> I agree, I think only filters should return BDRV_BLOCK_RAW.
> 
> If BDRV_BLOCK_RAW isn't suitable for raw, something went wrong. :-)

Yes, its introduction.

> But anyway, raw is essentially a filter, at least if you don't use
> the offset option.

Which is precisely why it isn’t a filter.

Another reason is that it generally appears as a replacement for a
format driver.  You never insert it into some graph because you want it
as a filter.  Which is why behaving like a format driver in block_status
makes sense, in my opinion.

>                    And BDRV_BLOCK_RAW shouldn't even depend on an
> unchanged offset because the .bdrv_co_block_status implementation
> returns the right offset.
> 
> And indeed, you can replace raw with blkdebug and it still fails (the
> testcase from patch 2 fails, too, but it's obvious enough this way):
> 
>     $ ./qemu-img map --output=json --image-opts driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2 
>     [{ "start": 0, "length": 1048576, "depth": 1, "zero": true, "data": false},
>     { "start": 1048576, "length": 1048576, "depth": 1, "zero": false, "data": true, "offset": 327680},
>     { "start": 2097152, "length": 65011712, "depth": 1, "zero": true, "data": false}]
> 
>     $ ./qemu-img map --output=json --image-opts driver=raw,file.driver=qcow2,file.file.driver=file,file.file.filename=/tmp/test.qcow2 
>     [{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": false}]
> 
>     $ ./qemu-img map --output=json --image-opts driver=blkdebug,image.driver=qcow2,image.file.driver=file,image.file.filename=/tmp/test.qcow2 
>     [{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": false}]
> 
> After applying your "deal with filters" series, blkdebug actually prints
> the expected result and passes the iotests case, but raw still doesn't.
> So you must have fixed something for filters that declare themselves
> filters,

I hope to have fixed many things for filters that declare themselves
filters. ;-)

I think the problem is that filters just break backing chains right now.
 My series fixes that.  (So even if you have a blkdebug node on top of
your qcow2 node, you should realize that the node you really care about
is the qcow2 node.  If you look at the filter, you won’t see the backing
file and will thus interpret unallocated areas the wrong way.)

(The most important problem is that bdrv_is_allocated_above() currently
only goes to the backing_bs().  That’s fixed by patch 13, “block: Use
CAFs in block status functions”.)

>          even though that semantics should probably be coupled to
> BDRV_BLOCK_RAW instead so that the raw format can benefit from it, too.

I think BDRV_BLOCK_RAW should just be dropped.  I don’t see its purpose
for non-filters now that we have BDRV_BLOCK_RECURSE; and filters should
just be handled generically in bdrv_co_block_status().

Alternatively, we can decide that calling block_status on a filter node
is a bad idea, because the caller may not be prepared for the fact that
block_status will not return information for this node.  But that would
mean dropping BDRV_BLOCK_RAW, too.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 11:04   ` Kevin Wolf
  2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:43     ` " Max Reitz
  2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
  2019-08-13 15:41       ` Kevin Wolf
  1 sibling, 2 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 14:43 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy; +Cc: den, qemu-devel, qemu-block

[-- Attachment #1.1: Type: text/plain, Size: 1825 bytes --]

On 13.08.19 13:04, Kevin Wolf wrote:
> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>> returned file. But is it correct behavior at all? If returned file
>> itself has a backing file, we may report as totally unallocated and
>> area which actually has data in bottom backing file.
>>
>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>> by following commit with a test. Let's make raw-format behave more
>> correctly returning BDRV_BLOCK_DATA.
>>
>> Suggested-by: Max Reitz <mreitz@redhat.com>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> After some reading, I think I came to the conclusion that RAW is the
> correct thing to do. There is indeed a problem, but this patch is trying
> to fix it in the wrong place.
> 
> In the case where the backing file contains some data, and we have a
> 'raw' node above the qcow2 overlay node, the content of the respective
> block is not defined by the queried backing file layer, so it is
> completely correct that bdrv_is_allocated() returns false,like it would
> if you queried the qcow2 layer directly.

I disagree.  The queried backing file layer is the raw node.  As I said,
in my opinion raw nodes are not filter nodes, neither in behavior (they
have an offset option), nor in how they are generally used (as a format).

The raw format does not support backing files.  Therefore, everything on
a raw node is allocated.

(That is, like, my opinion.)

>                                          If it returned true, we would
> copy everything, which isn't right either (the test cases should may add
> the qemu-img map output of the target so this becomes visible).

It is right.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 13:21         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
@ 2019-08-13 14:46           ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 14:46 UTC (permalink / raw)
  To: Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: Denis Lunev, qemu-block, qemu-devel

[-- Attachment #1.1: Type: text/plain, Size: 4272 bytes --]

On 13.08.19 15:21, Kevin Wolf wrote:
> Am 13.08.2019 um 14:01 hat Kevin Wolf geschrieben:
>> Am 13.08.2019 um 13:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 13.08.2019 14:04, Kevin Wolf wrote:
>>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>> returned file. But is it correct behavior at all? If returned file
>>>>> itself has a backing file, we may report as totally unallocated and
>>>>> area which actually has data in bottom backing file.
>>>>>
>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>> by following commit with a test. Let's make raw-format behave more
>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>
>>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>
>>>> After some reading, I think I came to the conclusion that RAW is the
>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>> to fix it in the wrong place.
>>>>
>>>> In the case where the backing file contains some data, and we have a
>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>> block is not defined by the queried backing file layer, so it is
>>>> completely correct that bdrv_is_allocated() returns false, like it would
>>>> if you queried the qcow2 layer directly. If it returned true, we would
>>>> copy everything, which isn't right either (the test cases should may add
>>>> the qemu-img map output of the target so this becomes visible).
>>>>
>>>> The problem is that we try to recurse along the backing chain, but we
>>>> fail to make the step from the raw node to the backing file.
>>>
>>> I'd say, the problem is that we ignore backing chain of non-backing
>>> child
>>
>> Yes, exactly. And I know even less about what happens if a child is
>> neither bs->file nor bs->backing. Imagine a qcow2 image with an external
>> data file that is a qcow2 image with a backing file itself. :-)
>>
>> Actually, just having two qcow2 layers nested with bs->file probably
>> already fails.
>>
>>>> Note that just extending Max's "deal with filters" is not enough to fix
>>>> this because raw doesn't actually meet all of the criteria for being a
>>>> filter in this sense (at least because the 'offset' option can change
>>>> offsets between raw and its child).
>>>>
>>>> I think this is essentially a result of special-casing backing files
>>>> everywhere instead of treating them like children like any other.
>>>
>>> But we need to special-case them, as we have interfaces operating on
>>> backing chain,
>>
>> I'm not sure yet if this means that these interfaces are wrong, but it
>> might. But in any case, I think we depend on special-casing in more
>> places than we should.
>>
>>>> bdrv_co_block_status_above() probably shouldn't recurse along the
>>>> backing chain, but along the returned *file pointers, and consider the
>>>> returned offset in *map.
>>>
>>> So, you mean that in case of unallocated, format layer should return
>>> it's backing file as file?
>>
>> Yes, because that's where it's reading the data from.
>>
>> Hm... Now I wonder what this means for DATA... In theory it would have
>> to be set for backing files, but that would make it completely useless.
>> We can distinguish the cases by looking at *file, but how does the
>> generic block layer know which child should be counted as "allocated"
>> and which shouldn't?
> 
> Possible answer to my own question:
> 
> bdrv_is_allocated(bs) isn't even asking a complete question. What we
> really need to ask is whether a specific child is where data comes from.
> 
> What the current callers of bdrv_is_allocated() are interested in is
> whether the data comes from bs->backing or from somewhere else. That is,
> if removing bs from the graph (so that all parents of bs would point to
> bs->backing instead) would still result in the same data in the given
> block.

Maybe callers of bdrv_is_allocated() should just ensure that the node
they pass actually has a backing file.

(If it doesn’t, they should skip all filters until it does.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 14:31               ` Max Reitz
@ 2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
  2019-08-13 14:53                   ` Max Reitz
  2019-08-13 14:50                 ` Eric Blake
  1 sibling, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 14:46 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf; +Cc: qemu-devel, qemu-block, Denis Lunev

13.08.2019 17:31, Max Reitz wrote:
> On 13.08.19 13:51, Kevin Wolf wrote:
> 
> [...]
> 
>> Hm... This is a mess. :-)
> 
> Just out of curiosity: Why?
> 
> Aren’t there only two things we really need from the block_status
> infrastructure?
> 
> (1) Whether something is allocated in the given layer of the backing chain,
> 
> (2) Whether we know that a given range reads as zeroes.
> 
> Do we really need anything else?
> 

qemu-img map?


1. We need to fix the bug somehow
2. We need to fix comment about different block-status flags, as it really
lacks information of what actually "DATA" means (together with *file).
And what finally means "allocated", can you define it precisely?
3. Fix nbd-server to be closer to NBD spec about block-status

I made several tries to imagine [1] and [2] but never succeeded..


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 14:31               ` Max Reitz
  2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:50                 ` Eric Blake
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2019-08-13 14:50 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1050 bytes --]

On 8/13/19 9:31 AM, Max Reitz wrote:
> On 13.08.19 13:51, Kevin Wolf wrote:
> 
> [...]
> 
>> Hm... This is a mess. :-)
> 
> Just out of curiosity: Why?
> 
> Aren’t there only two things we really need from the block_status
> infrastructure?
> 
> (1) Whether something is allocated in the given layer of the backing chain,

(1)(a) - is it allocated in this layer
(1)(b) - is it allocated in any backing layer

> 
> (2) Whether we know that a given range reads as zeroes.
> 
> Do we really need anything else?

qemu-img map needs:

(3) What host-relative offset, if any, corresponds to a given
guest-visible offset.

I also need to find time to revisit my proposed patches on block_status
alignment - there are some cases where we want to ensure that one layer
with large granularity does not pick up mid-granularity changes in
status read from a backing layer with smaller granularity.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 14:53                   ` Max Reitz
  2019-08-13 15:03                     ` Kevin Wolf
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-08-13 14:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1816 bytes --]

On 13.08.19 16:46, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:31, Max Reitz wrote:
>> On 13.08.19 13:51, Kevin Wolf wrote:
>>
>> [...]
>>
>>> Hm... This is a mess. :-)
>>
>> Just out of curiosity: Why?
>>
>> Aren’t there only two things we really need from the block_status
>> infrastructure?
>>
>> (1) Whether something is allocated in the given layer of the backing chain,
>>
>> (2) Whether we know that a given range reads as zeroes.
>>
>> Do we really need anything else?
>>
> 
> qemu-img map?

Which is a debugging tool.  So it doesn’t fall under “really” in my
book.  If removing everything but allocation+zero information would make
the code a lot simpler, I think that would be worth it.

> 1. We need to fix the bug somehow
> 2. We need to fix comment about different block-status flags, as it really
> lacks information of what actually "DATA" means (together with *file).
> And what finally means "allocated", can you define it precisely?

As I wrote in my other mails, I think the problem is that it’s just
unexpected that block_status automatically skips through for filters.
It shouldn’t, that’s just black magic that the caller should not rely on.

(We see precisely here that it’s wrong, because the callers are not
prepared for the allocation information returned to be associated with a
different node than what they passed.)

So my definition is just “If the node has a COW backing file and
block_status returns ‘not allocated’, the data will be there.
Otherwise, the data is in the current node.”  Yes, that means that
filters should appear as fully allocated.

Max

> 3. Fix nbd-server to be closer to NBD spec about block-status
> 
> I made several tries to imagine [1] and [2] but never succeeded..
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
@ 2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
  2019-08-13 15:03         ` Max Reitz
  2019-08-13 15:41       ` Kevin Wolf
  1 sibling, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 14:56 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf; +Cc: qemu-devel, qemu-block, Denis Lunev

13.08.2019 17:43, Max Reitz wrote:
> On 13.08.19 13:04, Kevin Wolf wrote:
>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>> returned file. But is it correct behavior at all? If returned file
>>> itself has a backing file, we may report as totally unallocated and
>>> area which actually has data in bottom backing file.
>>>
>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>> by following commit with a test. Let's make raw-format behave more
>>> correctly returning BDRV_BLOCK_DATA.
>>>
>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
>> After some reading, I think I came to the conclusion that RAW is the
>> correct thing to do. There is indeed a problem, but this patch is trying
>> to fix it in the wrong place.
>>
>> In the case where the backing file contains some data, and we have a
>> 'raw' node above the qcow2 overlay node, the content of the respective
>> block is not defined by the queried backing file layer, so it is
>> completely correct that bdrv_is_allocated() returns false,like it would
>> if you queried the qcow2 layer directly.
> 
> I disagree.  The queried backing file layer is the raw node.  As I said,
> in my opinion raw nodes are not filter nodes, neither in behavior (they
> have an offset option), nor in how they are generally used (as a format).
> 
> The raw format does not support backing files.  Therefore, everything on
> a raw node is allocated.
> 

Could you tell me at least, what means "allocated" ?

It's a term that describing a region somehow.. But how? Allocated where?
In raw node, in its child or both? Am I right that if region allocated in
one of non-cow children it is assumed to be allocated in parent too? Or what?

And it's unrelated to real disk allocation which (IMHO) directly shows that
this a bad term.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 14:53                   ` Max Reitz
@ 2019-08-13 15:03                     ` Kevin Wolf
  2019-08-13 15:04                       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 15:03 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1: Type: text/plain, Size: 1983 bytes --]

Am 13.08.2019 um 16:53 hat Max Reitz geschrieben:
> On 13.08.19 16:46, Vladimir Sementsov-Ogievskiy wrote:
> > 13.08.2019 17:31, Max Reitz wrote:
> >> On 13.08.19 13:51, Kevin Wolf wrote:
> >>
> >> [...]
> >>
> >>> Hm... This is a mess. :-)
> >>
> >> Just out of curiosity: Why?
> >>
> >> Aren’t there only two things we really need from the block_status
> >> infrastructure?
> >>
> >> (1) Whether something is allocated in the given layer of the backing chain,
> >>
> >> (2) Whether we know that a given range reads as zeroes.
> >>
> >> Do we really need anything else?
> >>
> > 
> > qemu-img map?
> 
> Which is a debugging tool.  So it doesn’t fall under “really” in my
> book.  If removing everything but allocation+zero information would make
> the code a lot simpler, I think that would be worth it.
> 
> > 1. We need to fix the bug somehow
> > 2. We need to fix comment about different block-status flags, as it really
> > lacks information of what actually "DATA" means (together with *file).
> > And what finally means "allocated", can you define it precisely?
> 
> As I wrote in my other mails, I think the problem is that it’s just
> unexpected that block_status automatically skips through for filters.
> It shouldn’t, that’s just black magic that the caller should not rely on.
> 
> (We see precisely here that it’s wrong, because the callers are not
> prepared for the allocation information returned to be associated with a
> different node than what they passed.)
> 
> So my definition is just “If the node has a COW backing file and
> block_status returns ‘not allocated’, the data will be there.
> Otherwise, the data is in the current node.”  Yes, that means that
> filters should appear as fully allocated.

You can do that, but then the callers need to learn to do the recursion
instead. After all, just copying everything if a filter is in the
subtree isn't the desired behaviour.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 15:03         ` Max Reitz
  2019-08-13 15:22           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2019-08-13 15:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2453 bytes --]

On 13.08.19 16:56, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 17:43, Max Reitz wrote:
>> On 13.08.19 13:04, Kevin Wolf wrote:
>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>> returned file. But is it correct behavior at all? If returned file
>>>> itself has a backing file, we may report as totally unallocated and
>>>> area which actually has data in bottom backing file.
>>>>
>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>> by following commit with a test. Let's make raw-format behave more
>>>> correctly returning BDRV_BLOCK_DATA.
>>>>
>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>> After some reading, I think I came to the conclusion that RAW is the
>>> correct thing to do. There is indeed a problem, but this patch is trying
>>> to fix it in the wrong place.
>>>
>>> In the case where the backing file contains some data, and we have a
>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>> block is not defined by the queried backing file layer, so it is
>>> completely correct that bdrv_is_allocated() returns false,like it would
>>> if you queried the qcow2 layer directly.
>>
>> I disagree.  The queried backing file layer is the raw node.  As I said,
>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>> have an offset option), nor in how they are generally used (as a format).
>>
>> The raw format does not support backing files.  Therefore, everything on
>> a raw node is allocated.
>>
> 
> Could you tell me at least, what means "allocated" ?
> 
> It's a term that describing a region somehow.. But how? Allocated where?
> In raw node, in its child or both? Am I right that if region allocated in
> one of non-cow children it is assumed to be allocated in parent too? Or what?
> 
> And it's unrelated to real disk allocation which (IMHO) directly shows that
> this a bad term.

It’s a term for COW backing chains.  If something is allocated on a
given node in a COW backing chain, it means it is either present in
exactly that node or in one of its storage children (in case the node is
a format node).  If it is not allocated, it is not, and read accesses
will be forwarded to the COW backing child.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
  2019-08-13 15:03                     ` Kevin Wolf
@ 2019-08-13 15:04                       ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 15:04 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2094 bytes --]

On 13.08.19 17:03, Kevin Wolf wrote:
> Am 13.08.2019 um 16:53 hat Max Reitz geschrieben:
>> On 13.08.19 16:46, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:31, Max Reitz wrote:
>>>> On 13.08.19 13:51, Kevin Wolf wrote:
>>>>
>>>> [...]
>>>>
>>>>> Hm... This is a mess. :-)
>>>>
>>>> Just out of curiosity: Why?
>>>>
>>>> Aren’t there only two things we really need from the block_status
>>>> infrastructure?
>>>>
>>>> (1) Whether something is allocated in the given layer of the backing chain,
>>>>
>>>> (2) Whether we know that a given range reads as zeroes.
>>>>
>>>> Do we really need anything else?
>>>>
>>>
>>> qemu-img map?
>>
>> Which is a debugging tool.  So it doesn’t fall under “really” in my
>> book.  If removing everything but allocation+zero information would make
>> the code a lot simpler, I think that would be worth it.
>>
>>> 1. We need to fix the bug somehow
>>> 2. We need to fix comment about different block-status flags, as it really
>>> lacks information of what actually "DATA" means (together with *file).
>>> And what finally means "allocated", can you define it precisely?
>>
>> As I wrote in my other mails, I think the problem is that it’s just
>> unexpected that block_status automatically skips through for filters.
>> It shouldn’t, that’s just black magic that the caller should not rely on.
>>
>> (We see precisely here that it’s wrong, because the callers are not
>> prepared for the allocation information returned to be associated with a
>> different node than what they passed.)
>>
>> So my definition is just “If the node has a COW backing file and
>> block_status returns ‘not allocated’, the data will be there.
>> Otherwise, the data is in the current node.”  Yes, that means that
>> filters should appear as fully allocated.
> 
> You can do that, but then the callers need to learn to do the recursion
> instead. After all, just copying everything if a filter is in the
> subtree isn't the desired behaviour.

Yes, hence the “deal with filters” series.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 15:03         ` Max Reitz
@ 2019-08-13 15:22           ` Vladimir Sementsov-Ogievskiy
  2019-08-13 16:07             ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 15:22 UTC (permalink / raw)
  To: Max Reitz, Kevin Wolf; +Cc: qemu-devel, qemu-block, Denis Lunev

13.08.2019 18:03, Max Reitz wrote:
> On 13.08.19 16:56, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 17:43, Max Reitz wrote:
>>> On 13.08.19 13:04, Kevin Wolf wrote:
>>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>> returned file. But is it correct behavior at all? If returned file
>>>>> itself has a backing file, we may report as totally unallocated and
>>>>> area which actually has data in bottom backing file.
>>>>>
>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>> by following commit with a test. Let's make raw-format behave more
>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>
>>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>
>>>> After some reading, I think I came to the conclusion that RAW is the
>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>> to fix it in the wrong place.
>>>>
>>>> In the case where the backing file contains some data, and we have a
>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>> block is not defined by the queried backing file layer, so it is
>>>> completely correct that bdrv_is_allocated() returns false,like it would
>>>> if you queried the qcow2 layer directly.
>>>
>>> I disagree.  The queried backing file layer is the raw node.  As I said,
>>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>>> have an offset option), nor in how they are generally used (as a format).
>>>
>>> The raw format does not support backing files.  Therefore, everything on
>>> a raw node is allocated.
>>>
>>
>> Could you tell me at least, what means "allocated" ?
>>
>> It's a term that describing a region somehow.. But how? Allocated where?
>> In raw node, in its child or both? Am I right that if region allocated in
>> one of non-cow children it is assumed to be allocated in parent too? Or what?
>>
>> And it's unrelated to real disk allocation which (IMHO) directly shows that
>> this a bad term.
> 
> It’s a term for COW backing chains.  If something is allocated on a
> given node in a COW backing chain, it means it is either present in
> exactly that node or in one of its storage children (in case the node is
> a format node).  If it is not allocated, it is not, and read accesses
> will be forwarded to the COW backing child.
> 

And this definition leads exactly to bug in these series:


[raw]
   |
   |file
   V       file
[qcow2]--------->[file]
   |
   |backing
   V
[base]


Assume something is actually allocated in [base] but not in [qcow2].
So, [qcow2] node reports it as unallocated. So nobdy of [raw]'s storage
children contains this as allocated, so it's unallocated in [raw].


And if you say that logic is different for raw as it don't have COW child,
we may put qcow2 instead of raw here with same problem [yes, qcow2 on qcow2
is a bit strange, but we try to make something generic]




-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
  2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 15:41       ` Kevin Wolf
  2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
  2019-08-13 16:21         ` Max Reitz
  1 sibling, 2 replies; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 15:41 UTC (permalink / raw)
  To: Max Reitz; +Cc: den, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3715 bytes --]

Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
> On 13.08.19 13:04, Kevin Wolf wrote:
> > Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
> >> returned file. But is it correct behavior at all? If returned file
> >> itself has a backing file, we may report as totally unallocated and
> >> area which actually has data in bottom backing file.
> >>
> >> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
> >> by following commit with a test. Let's make raw-format behave more
> >> correctly returning BDRV_BLOCK_DATA.
> >>
> >> Suggested-by: Max Reitz <mreitz@redhat.com>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > 
> > After some reading, I think I came to the conclusion that RAW is the
> > correct thing to do. There is indeed a problem, but this patch is trying
> > to fix it in the wrong place.
> > 
> > In the case where the backing file contains some data, and we have a
> > 'raw' node above the qcow2 overlay node, the content of the respective
> > block is not defined by the queried backing file layer, so it is
> > completely correct that bdrv_is_allocated() returns false,like it would
> > if you queried the qcow2 layer directly.
> 
> I disagree.  The queried backing file layer is the raw node.  As I said,
> in my opinion raw nodes are not filter nodes, neither in behavior (they
> have an offset option), nor in how they are generally used (as a format).
> 
> The raw format does not support backing files.  Therefore, everything on
> a raw node is allocated.
> 
> (That is, like, my opinion.)
> 
> >                                          If it returned true, we would
> > copy everything, which isn't right either (the test cases should may add
> > the qemu-img map output of the target so this becomes visible).
> 
> It is right.

So we don't even agree what mirroring the raw node should even mean.

I can the see your point when you say that the raw node has no backing
file, so everything should be copied. But I can also see the point that
the raw node can really just be used as a filter that limits the data
exposed from the qcow2 layer, and you want to keep the copy a COW
overlay over the same backing file.

Both are valid use cases in principle and there is no single right or
wrong.

We don't currently support the latter use case because we have only
sync=full or sync=top, but if you could specify a base node instead, we
could probably suport the case without all of the special-casing filter
nodes and backing file childs.

You would call bdrv_co_block_status_above() with the right base node and
it would just recurse whereever the data is stored, be it bs->backing,
bs->file or even driver-specific children. This would allow you to find
out whether some block in the top node came from the base node that
we're going to keep. If yes, skip it; if no, copy it.

This way we wouldn't have to decide whether raw is a filter or not,
because it wouldn't make a difference. The behaviour would only depend
on the base node given by the user. If you specified the top-level qcow2
file as the base, you get your full copy; if you specified the backing
qcow2, you get the partial copy where the target still uses the same
backing file.

(Hm... It would only actually work if the offsets stay the same in the
chain, which is true for backing file children, but not necessarily for
other children. Anyway, even if we don't gain much functionality, I
really want a more generic model that avoids different types of nodes
and edges as much as possible.)

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 15:41       ` Kevin Wolf
@ 2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
  2019-08-13 16:08           ` Kevin Wolf
  2019-08-13 16:21         ` Max Reitz
  1 sibling, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 15:54 UTC (permalink / raw)
  To: Kevin Wolf, Max Reitz; +Cc: qemu-devel, qemu-block, Denis Lunev

13.08.2019 18:41, Kevin Wolf wrote:
> Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
>> On 13.08.19 13:04, Kevin Wolf wrote:
>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>> returned file. But is it correct behavior at all? If returned file
>>>> itself has a backing file, we may report as totally unallocated and
>>>> area which actually has data in bottom backing file.
>>>>
>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>> by following commit with a test. Let's make raw-format behave more
>>>> correctly returning BDRV_BLOCK_DATA.
>>>>
>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>> After some reading, I think I came to the conclusion that RAW is the
>>> correct thing to do. There is indeed a problem, but this patch is trying
>>> to fix it in the wrong place.
>>>
>>> In the case where the backing file contains some data, and we have a
>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>> block is not defined by the queried backing file layer, so it is
>>> completely correct that bdrv_is_allocated() returns false,like it would
>>> if you queried the qcow2 layer directly.
>>
>> I disagree.  The queried backing file layer is the raw node.  As I said,
>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>> have an offset option), nor in how they are generally used (as a format).
>>
>> The raw format does not support backing files.  Therefore, everything on
>> a raw node is allocated.
>>
>> (That is, like, my opinion.)
>>
>>>                                           If it returned true, we would
>>> copy everything, which isn't right either (the test cases should may add
>>> the qemu-img map output of the target so this becomes visible).
>>
>> It is right.
> 
> So we don't even agree what mirroring the raw node should even mean.
> 
> I can the see your point when you say that the raw node has no backing
> file, so everything should be copied. But I can also see the point that
> the raw node can really just be used as a filter that limits the data
> exposed from the qcow2 layer, and you want to keep the copy a COW
> overlay over the same backing file.
> 
> Both are valid use cases in principle and there is no single right or
> wrong.
> 
> We don't currently support the latter use case because we have only
> sync=full or sync=top, but if you could specify a base node instead, we
> could probably suport the case without all of the special-casing filter
> nodes and backing file childs.
> 
> You would call bdrv_co_block_status_above() with the right base node and
> it would just recurse whereever the data is stored, be it bs->backing,
> bs->file or even driver-specific children. This would allow you to find
> out whether some block in the top node came from the base node that
> we're going to keep. If yes, skip it; if no, copy it.
> 
> This way we wouldn't have to decide whether raw is a filter or not,
> because it wouldn't make a difference. The behaviour would only depend
> on the base node given by the user. If you specified the top-level qcow2
> file as the base, you get your full copy;

ahm, full-copy = base is NULL..

> if you specified the backing
> qcow2, you get the partial copy where the target still uses the same
> backing file.
> 
> (Hm... It would only actually work if the offsets stay the same in the
> chain, which is true for backing file children, but not necessarily for
> other children.

Don't follow, what you mean by offsets stay the same and what is wrong with it?

> Anyway, even if we don't gain much functionality, I
> really want a more generic model that avoids different types of nodes
> and edges as much as possible.)
> 
> Kevin
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 15:22           ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 16:07             ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 16:07 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: qemu-devel, qemu-block, Denis Lunev

[-- Attachment #1.1: Type: text/plain, Size: 3917 bytes --]

On 13.08.19 17:22, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 18:03, Max Reitz wrote:
>> On 13.08.19 16:56, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.08.2019 17:43, Max Reitz wrote:
>>>> On 13.08.19 13:04, Kevin Wolf wrote:
>>>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>>> returned file. But is it correct behavior at all? If returned file
>>>>>> itself has a backing file, we may report as totally unallocated and
>>>>>> area which actually has data in bottom backing file.
>>>>>>
>>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>>> by following commit with a test. Let's make raw-format behave more
>>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>>
>>>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>
>>>>> After some reading, I think I came to the conclusion that RAW is the
>>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>>> to fix it in the wrong place.
>>>>>
>>>>> In the case where the backing file contains some data, and we have a
>>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>>> block is not defined by the queried backing file layer, so it is
>>>>> completely correct that bdrv_is_allocated() returns false,like it would
>>>>> if you queried the qcow2 layer directly.
>>>>
>>>> I disagree.  The queried backing file layer is the raw node.  As I said,
>>>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>>>> have an offset option), nor in how they are generally used (as a format).
>>>>
>>>> The raw format does not support backing files.  Therefore, everything on
>>>> a raw node is allocated.
>>>>
>>>
>>> Could you tell me at least, what means "allocated" ?
>>>
>>> It's a term that describing a region somehow.. But how? Allocated where?
>>> In raw node, in its child or both? Am I right that if region allocated in
>>> one of non-cow children it is assumed to be allocated in parent too? Or what?
>>>
>>> And it's unrelated to real disk allocation which (IMHO) directly shows that
>>> this a bad term.
>>
>> It’s a term for COW backing chains.  If something is allocated on a
>> given node in a COW backing chain, it means it is either present in
>> exactly that node or in one of its storage children (in case the node is
>> a format node).  If it is not allocated, it is not, and read accesses
>> will be forwarded to the COW backing child.
>>
> 
> And this definition leads exactly to bug in these series:
> 
> 
> [raw]
>    |
>    |file
>    V       file
> [qcow2]--------->[file]
>    |
>    |backing
>    V
> [base]
> 
> 
> Assume something is actually allocated in [base] but not in [qcow2].
> So, [qcow2] node reports it as unallocated. So nobdy of [raw]'s storage
> children contains this as allocated, so it's unallocated in [raw].

Well, I would say it is in raw’s storage child, because the definition
of backing chain allocation only recurses down the COW link.

If it is *anywhere* in the storage subtree, it is to be considered
allocated on this level.  If it is in the backing COW subtree, it is not.

And if it is in neither, it doesn’t matter whether we say it’s allocated
or not:

For example (1), for sparse files, the raw driver may not have the data
in the storage subtree.  But it sure as hell won’t have it in its
backing COW subtree, because it doesn’t have one.  So for simplicity’s
sake, it may just return everything as allocated.

For example (2), for sparse qcow2 files, the qcow2 driver may report
some ranges as unallocated even when it doesn’t have a backing file.  We
don’t need to change it to report such cases as allocated.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 16:08           ` Kevin Wolf
  2019-08-13 16:32             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Kevin Wolf @ 2019-08-13 16:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Denis Lunev, qemu-devel, qemu-block, Max Reitz

Am 13.08.2019 um 17:54 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 13.08.2019 18:41, Kevin Wolf wrote:
> > Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
> >> On 13.08.19 13:04, Kevin Wolf wrote:
> >>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
> >>>> returned file. But is it correct behavior at all? If returned file
> >>>> itself has a backing file, we may report as totally unallocated and
> >>>> area which actually has data in bottom backing file.
> >>>>
> >>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
> >>>> by following commit with a test. Let's make raw-format behave more
> >>>> correctly returning BDRV_BLOCK_DATA.
> >>>>
> >>>> Suggested-by: Max Reitz <mreitz@redhat.com>
> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>
> >>> After some reading, I think I came to the conclusion that RAW is the
> >>> correct thing to do. There is indeed a problem, but this patch is trying
> >>> to fix it in the wrong place.
> >>>
> >>> In the case where the backing file contains some data, and we have a
> >>> 'raw' node above the qcow2 overlay node, the content of the respective
> >>> block is not defined by the queried backing file layer, so it is
> >>> completely correct that bdrv_is_allocated() returns false,like it would
> >>> if you queried the qcow2 layer directly.
> >>
> >> I disagree.  The queried backing file layer is the raw node.  As I said,
> >> in my opinion raw nodes are not filter nodes, neither in behavior (they
> >> have an offset option), nor in how they are generally used (as a format).
> >>
> >> The raw format does not support backing files.  Therefore, everything on
> >> a raw node is allocated.
> >>
> >> (That is, like, my opinion.)
> >>
> >>>                                           If it returned true, we would
> >>> copy everything, which isn't right either (the test cases should may add
> >>> the qemu-img map output of the target so this becomes visible).
> >>
> >> It is right.
> > 
> > So we don't even agree what mirroring the raw node should even mean.
> > 
> > I can the see your point when you say that the raw node has no backing
> > file, so everything should be copied. But I can also see the point that
> > the raw node can really just be used as a filter that limits the data
> > exposed from the qcow2 layer, and you want to keep the copy a COW
> > overlay over the same backing file.
> > 
> > Both are valid use cases in principle and there is no single right or
> > wrong.
> > 
> > We don't currently support the latter use case because we have only
> > sync=full or sync=top, but if you could specify a base node instead, we
> > could probably suport the case without all of the special-casing filter
> > nodes and backing file childs.
> > 
> > You would call bdrv_co_block_status_above() with the right base node and
> > it would just recurse whereever the data is stored, be it bs->backing,
> > bs->file or even driver-specific children. This would allow you to find
> > out whether some block in the top node came from the base node that
> > we're going to keep. If yes, skip it; if no, copy it.
> > 
> > This way we wouldn't have to decide whether raw is a filter or not,
> > because it wouldn't make a difference. The behaviour would only depend
> > on the base node given by the user. If you specified the top-level qcow2
> > file as the base, you get your full copy;
> 
> ahm, full-copy = base is NULL..

Oops, yes, of course. Using the top-level node would create an empty
"copy".

> > if you specified the backing
> > qcow2, you get the partial copy where the target still uses the same
> > backing file.
> > 
> > (Hm... It would only actually work if the offsets stay the same in the
> > chain, which is true for backing file children, but not necessarily for
> > other children.
> 
> Don't follow, what you mean by offsets stay the same and what is wrong
> with it?

Say we have this graph:

raw,offset=65536
    |
    v
  qcow2-----+
    |       |
    v       v
  file     base

Now you can't just mirror the raw node into a target.qcow2 that shares
base as the backing file, because the offsets will be wrong. In order to
use such a copy correctly, you'd have to use a raw node again in the
backing chain:

target.qcow2----+
    |           |
    v           v
  file      raw,offset=65536
                |
                v
              base

So the case where offsets differ between the top and the base node isn't
trivial.

(If this case isn't complicated enough yet, imagine passing file as the
base node instead... It just can't work.)

Kevin


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 15:41       ` Kevin Wolf
  2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
@ 2019-08-13 16:21         ` Max Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Max Reitz @ 2019-08-13 16:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: den, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block

[-- Attachment #1.1: Type: text/plain, Size: 5547 bytes --]

On 13.08.19 17:41, Kevin Wolf wrote:
> Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
>> On 13.08.19 13:04, Kevin Wolf wrote:
>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>> returned file. But is it correct behavior at all? If returned file
>>>> itself has a backing file, we may report as totally unallocated and
>>>> area which actually has data in bottom backing file.
>>>>
>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>> by following commit with a test. Let's make raw-format behave more
>>>> correctly returning BDRV_BLOCK_DATA.
>>>>
>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>
>>> After some reading, I think I came to the conclusion that RAW is the
>>> correct thing to do. There is indeed a problem, but this patch is trying
>>> to fix it in the wrong place.
>>>
>>> In the case where the backing file contains some data, and we have a
>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>> block is not defined by the queried backing file layer, so it is
>>> completely correct that bdrv_is_allocated() returns false,like it would
>>> if you queried the qcow2 layer directly.
>>
>> I disagree.  The queried backing file layer is the raw node.  As I said,
>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>> have an offset option), nor in how they are generally used (as a format).
>>
>> The raw format does not support backing files.  Therefore, everything on
>> a raw node is allocated.
>>
>> (That is, like, my opinion.)
>>
>>>                                          If it returned true, we would
>>> copy everything, which isn't right either (the test cases should may add
>>> the qemu-img map output of the target so this becomes visible).
>>
>> It is right.
> 
> So we don't even agree what mirroring the raw node should even mean.
> 
> I can the see your point when you say that the raw node has no backing
> file, so everything should be copied. But I can also see the point that
> the raw node can really just be used as a filter that limits the data
> exposed from the qcow2 layer, and you want to keep the copy a COW
> overlay over the same backing file.

But it is not a filter.  If you limit the data by using an offset, that
precisely makes it not a filter.

> Both are valid use cases in principle and there is no single right or
> wrong.

I don’t get what you mean because raw is not a filter.

If it were a filter, it’d be clear: Regarding allocation information, we
skip it.  But it isn’t.  Hence we cannot skip it.

I don’t see why you’re trying to make the point with raw, when it simply
is not a filter.


> We don't currently support the latter use case because we have only
> sync=full or sync=top, but if you could specify a base node instead, we
> could probably suport the case without all of the special-casing filter
> nodes and backing file childs.

This sounds like it’d be difficult to special-case filter nodes.  It isn’t.

Whenever the user specifies some node that should refer to a backing
chain element, all you do is call bdrv_skip_rw_filters() and you land at
the corresponding COW node.

> You would call bdrv_co_block_status_above() with the right base node and
> it would just recurse whereever the data is stored, be it bs->backing,
> bs->file or even driver-specific children. This would allow you to find
> out whether some block in the top node came from the base node that
> we're going to keep. If yes, skip it; if no, copy it.
> 
> This way we wouldn't have to decide whether raw is a filter or not,

Well, it isn’t.

> because it wouldn't make a difference. The behaviour would only depend
> on the base node given by the user. If you specified the top-level qcow2
> file as the base, you get your full copy; if you specified the backing
> qcow2, you get the partial copy where the target still uses the same
> backing file.

I simply do not see your point.

For two reasons:

(1) I don’t think anyone should use is_allocate on nodes that are not
COW nodes.  They are likely to be doing something wrong then.

You don’t seem to agree, because you say this makes the callers
complicated.  What I think is that we don’t have that many callers, and
it’s probably a good thing when they have to think about the backing
chain structure.

But anyway.

(2) What I understand is that you propose adding some way to find out
the next element in the COW chain if is_allocated returns false.  Well,
my “Deal with filters” series adds two ways to do that:

- bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs)) will return the first
child behind the COW node, but that may be another filter.

- bdrv_backing_chain_next(bs) will return “the first non-filter backing
image of the first non-filter image.”.  It’s just
bdrv_skip_rw_filters(bdrv_filtered_cow_bs(bdrv_skip_rw_filters(bs))).

So the thing is, we don’t need to modify block_status to return that
information.  If is_allocated returns that something is not allocated,
the caller can just call one of these and thus get the next node where
to look.

When it comes to bdrv_is_allocated_above(), my series fixes that anyway,
no?  (In the sense that it does not choke on filters.)


So I cannot see how it does not come down to the callers.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 16:08           ` Kevin Wolf
@ 2019-08-13 16:32             ` Vladimir Sementsov-Ogievskiy
  2019-08-14  6:27               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-13 16:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-devel, qemu-block, Max Reitz

13.08.2019 19:08, Kevin Wolf wrote:
> Am 13.08.2019 um 17:54 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 13.08.2019 18:41, Kevin Wolf wrote:
>>> Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
>>>> On 13.08.19 13:04, Kevin Wolf wrote:
>>>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>>> returned file. But is it correct behavior at all? If returned file
>>>>>> itself has a backing file, we may report as totally unallocated and
>>>>>> area which actually has data in bottom backing file.
>>>>>>
>>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>>> by following commit with a test. Let's make raw-format behave more
>>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>>
>>>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>
>>>>> After some reading, I think I came to the conclusion that RAW is the
>>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>>> to fix it in the wrong place.
>>>>>
>>>>> In the case where the backing file contains some data, and we have a
>>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>>> block is not defined by the queried backing file layer, so it is
>>>>> completely correct that bdrv_is_allocated() returns false,like it would
>>>>> if you queried the qcow2 layer directly.
>>>>
>>>> I disagree.  The queried backing file layer is the raw node.  As I said,
>>>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>>>> have an offset option), nor in how they are generally used (as a format).
>>>>
>>>> The raw format does not support backing files.  Therefore, everything on
>>>> a raw node is allocated.
>>>>
>>>> (That is, like, my opinion.)
>>>>
>>>>>                                            If it returned true, we would
>>>>> copy everything, which isn't right either (the test cases should may add
>>>>> the qemu-img map output of the target so this becomes visible).
>>>>
>>>> It is right.
>>>
>>> So we don't even agree what mirroring the raw node should even mean.
>>>
>>> I can the see your point when you say that the raw node has no backing
>>> file, so everything should be copied. But I can also see the point that
>>> the raw node can really just be used as a filter that limits the data
>>> exposed from the qcow2 layer, and you want to keep the copy a COW
>>> overlay over the same backing file.
>>>
>>> Both are valid use cases in principle and there is no single right or
>>> wrong.
>>>
>>> We don't currently support the latter use case because we have only
>>> sync=full or sync=top, but if you could specify a base node instead, we
>>> could probably suport the case without all of the special-casing filter
>>> nodes and backing file childs.
>>>
>>> You would call bdrv_co_block_status_above() with the right base node and
>>> it would just recurse whereever the data is stored, be it bs->backing,
>>> bs->file or even driver-specific children. This would allow you to find
>>> out whether some block in the top node came from the base node that
>>> we're going to keep. If yes, skip it; if no, copy it.
>>>
>>> This way we wouldn't have to decide whether raw is a filter or not,
>>> because it wouldn't make a difference. The behaviour would only depend
>>> on the base node given by the user. If you specified the top-level qcow2
>>> file as the base, you get your full copy;
>>
>> ahm, full-copy = base is NULL..
> 
> Oops, yes, of course. Using the top-level node would create an empty
> "copy".
> 
>>> if you specified the backing
>>> qcow2, you get the partial copy where the target still uses the same
>>> backing file.
>>>
>>> (Hm... It would only actually work if the offsets stay the same in the
>>> chain, which is true for backing file children, but not necessarily for
>>> other children.
>>
>> Don't follow, what you mean by offsets stay the same and what is wrong
>> with it?
> 
> Say we have this graph:
> 
> raw,offset=65536
>      |
>      v
>    qcow2-----+
>      |       |
>      v       v
>    file     base
> 
> Now you can't just mirror the raw node into a target.qcow2 that shares
> base as the backing file, because the offsets will be wrong. In order to
> use such a copy correctly, you'd have to use a raw node again in the
> backing chain:
> 
> target.qcow2----+
>      |           |
>      v           v
>    file      raw,offset=65536
>                  |
>                  v
>                base
> 
> So the case where offsets differ between the top and the base node isn't
> trivial.

Understand, but for me it don't look like the thing that behaves in unexpected
for user way, on the contrary, it seems obvious that it will not work, as user
understand what is backing file (offsets are backed by corresponding offsets)

> 
> (If this case isn't complicated enough yet, imagine passing file as the
> base node instead... It just can't work.)
> 

Yes, if we chose the way you proposed we have a possibility to use any node as a base,
but again this does something completely different from usual "top" or "based" mode..
So it's just a new possibility, may be useless, it don't break up the concept.


I like the idea of generic block_status_above that you propose, but still there should
a decision of what exactly DATA, ZERO and ALLOCATED means. Ok, assume we don't need ALLOCATED..
Then we don't need DATA too?

And how block_status_above would work? If node don't report ZERO and don't report DATA but report
*file pointer == base, we should stop, and don't go to *file, it seems obvious..

But what if node reports ZERO together with *file pointer == base? Should consider this region
belonging to base and UNALLOCATED and not copy it, or we should consider it ALLOCATED because
current node reports it ZERO? So, criteria of "ALLOCATED" or "where should we stop in recursion"
is not obvious.. Do you have one?


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
  2019-08-13 16:32             ` Vladimir Sementsov-Ogievskiy
@ 2019-08-14  6:27               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 38+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-14  6:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-devel, qemu-block, Max Reitz



13 авг. 2019 г. 19:32 пользователь Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> написал:

13.08.2019 19:08, Kevin Wolf wrote:
> Am 13.08.2019 um 17:54 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 13.08.2019 18:41, Kevin Wolf wrote:
>>> Am 13.08.2019 um 16:43 hat Max Reitz geschrieben:
>>>> On 13.08.19 13:04, Kevin Wolf wrote:
>>>>> Am 12.08.2019 um 20:11 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> BDRV_BLOCK_RAW makes generic bdrv_co_block_status to fallthrough to
>>>>>> returned file. But is it correct behavior at all? If returned file
>>>>>> itself has a backing file, we may report as totally unallocated and
>>>>>> area which actually has data in bottom backing file.
>>>>>>
>>>>>> So, mirroring of qcow2 under raw-format is broken. Which is illustrated
>>>>>> by following commit with a test. Let's make raw-format behave more
>>>>>> correctly returning BDRV_BLOCK_DATA.
>>>>>>
>>>>>> Suggested-by: Max Reitz <mreitz@redhat.com>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>
>>>>> After some reading, I think I came to the conclusion that RAW is the
>>>>> correct thing to do. There is indeed a problem, but this patch is trying
>>>>> to fix it in the wrong place.
>>>>>
>>>>> In the case where the backing file contains some data, and we have a
>>>>> 'raw' node above the qcow2 overlay node, the content of the respective
>>>>> block is not defined by the queried backing file layer, so it is
>>>>> completely correct that bdrv_is_allocated() returns false,like it would
>>>>> if you queried the qcow2 layer directly.
>>>>
>>>> I disagree.  The queried backing file layer is the raw node.  As I said,
>>>> in my opinion raw nodes are not filter nodes, neither in behavior (they
>>>> have an offset option), nor in how they are generally used (as a format).
>>>>
>>>> The raw format does not support backing files.  Therefore, everything on
>>>> a raw node is allocated.
>>>>
>>>> (That is, like, my opinion.)
>>>>
>>>>>                                            If it returned true, we would
>>>>> copy everything, which isn't right either (the test cases should may add
>>>>> the qemu-img map output of the target so this becomes visible).
>>>>
>>>> It is right.
>>>
>>> So we don't even agree what mirroring the raw node should even mean.
>>>
>>> I can the see your point when you say that the raw node has no backing
>>> file, so everything should be copied. But I can also see the point that
>>> the raw node can really just be used as a filter that limits the data
>>> exposed from the qcow2 layer, and you want to keep the copy a COW
>>> overlay over the same backing file.
>>>
>>> Both are valid use cases in principle and there is no single right or
>>> wrong.
>>>
>>> We don't currently support the latter use case because we have only
>>> sync=full or sync=top, but if you could specify a base node instead, we
>>> could probably suport the case without all of the special-casing filter
>>> nodes and backing file childs.
>>>
>>> You would call bdrv_co_block_status_above() with the right base node and
>>> it would just recurse whereever the data is stored, be it bs->backing,
>>> bs->file or even driver-specific children. This would allow you to find
>>> out whether some block in the top node came from the base node that
>>> we're going to keep. If yes, skip it; if no, copy it.
>>>
>>> This way we wouldn't have to decide whether raw is a filter or not,
>>> because it wouldn't make a difference. The behaviour would only depend
>>> on the base node given by the user. If you specified the top-level qcow2
>>> file as the base, you get your full copy;
>>
>> ahm, full-copy = base is NULL..
>
> Oops, yes, of course. Using the top-level node would create an empty
> "copy".
>
>>> if you specified the backing
>>> qcow2, you get the partial copy where the target still uses the same
>>> backing file.
>>>
>>> (Hm... It would only actually work if the offsets stay the same in the
>>> chain, which is true for backing file children, but not necessarily for
>>> other children.
>>
>> Don't follow, what you mean by offsets stay the same and what is wrong
>> with it?
>
> Say we have this graph:
>
> raw,offset=65536
>      |
>      v
>    qcow2-----+
>      |       |
>      v       v
>    file     base
>
> Now you can't just mirror the raw node into a target.qcow2 that shares
> base as the backing file, because the offsets will be wrong. In order to
> use such a copy correctly, you'd have to use a raw node again in the
> backing chain:
>
> target.qcow2----+
>      |           |
>      v           v
>    file      raw,offset=65536
>                  |
>                  v
>                base
>
> So the case where offsets differ between the top and the base node isn't
> trivial.

Understand, but for me it don't look like the thing that behaves in unexpected
for user way, on the contrary, it seems obvious that it will not work, as user
understand what is backing file (offsets are backed by corresponding offsets)

>
> (If this case isn't complicated enough yet, imagine passing file as the
> base node instead... It just can't work.)
>

Yes, if we chose the way you proposed we have a possibility to use any node as a base,
but again this does something completely different from usual "top" or "based" mode..
So it's just a new possibility, may be useless, it don't break up the concept.


I like the idea of generic block_status_above that you propose, but still there should
a decision of what exactly DATA, ZERO and ALLOCATED means. Ok, assume we don't need ALLOCATED..
Then we don't need DATA too?

And how block_status_above would work? If node don't report ZERO and don't report DATA but report
*file pointer == base, we should stop, and don't go to *file, it seems obvious..

But what if node reports ZERO together with *file pointer == base? Should consider this region
belonging to base and UNALLOCATED and not copy it, or we should consider it ALLOCATED because
current node reports it ZERO? So, criteria of "ALLOCATED" or "where should we stop in recursion"
is not obvious.. Do you have one?

And now I see a drawback of full recursion as Kevin proposes: it's slower when we are interested in allocation. We don't need to recurse through file* backing chain, if *file is not backing, as we already know that data is allocated.

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, back to index

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-12 18:11 [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Vladimir Sementsov-Ogievskiy
2019-08-12 18:11 ` [Qemu-devel] [PATCH 1/2] block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE Vladimir Sementsov-Ogievskiy
2019-08-13 11:04   ` Kevin Wolf
2019-08-13 11:28     ` Vladimir Sementsov-Ogievskiy
2019-08-13 12:01       ` Kevin Wolf
2019-08-13 13:21         ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2019-08-13 14:46           ` Max Reitz
2019-08-13 14:43     ` [Qemu-devel] " Max Reitz
2019-08-13 14:56       ` Vladimir Sementsov-Ogievskiy
2019-08-13 15:03         ` Max Reitz
2019-08-13 15:22           ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:07             ` Max Reitz
2019-08-13 15:41       ` Kevin Wolf
2019-08-13 15:54         ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:08           ` Kevin Wolf
2019-08-13 16:32             ` Vladimir Sementsov-Ogievskiy
2019-08-14  6:27               ` Vladimir Sementsov-Ogievskiy
2019-08-13 16:21         ` Max Reitz
2019-08-12 18:11 ` [Qemu-devel] [PATCH 2/2] iotests: test mirroring qcow2 under raw format Vladimir Sementsov-Ogievskiy
2019-08-13  9:10   ` Kevin Wolf
2019-08-13  9:22     ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:36       ` Kevin Wolf
2019-08-12 19:46 ` [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW Max Reitz
2019-08-12 19:50   ` Max Reitz
2019-08-13  8:39     ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:01       ` Vladimir Sementsov-Ogievskiy
2019-08-13  9:33         ` Vladimir Sementsov-Ogievskiy
2019-08-13 11:14           ` Vladimir Sementsov-Ogievskiy
2019-08-13 11:51             ` Kevin Wolf
2019-08-13 13:00               ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:31               ` Max Reitz
2019-08-13 14:46                 ` Vladimir Sementsov-Ogievskiy
2019-08-13 14:53                   ` Max Reitz
2019-08-13 15:03                     ` Kevin Wolf
2019-08-13 15:04                       ` Max Reitz
2019-08-13 14:50                 ` Eric Blake
2019-08-13  9:34   ` Kevin Wolf
2019-08-13 14:38     ` Max Reitz

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org qemu-devel@archiver.kernel.org
	public-inbox-index qemu-devel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox