qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* backing chain & block status & filters
@ 2020-04-28  8:55 Vladimir Sementsov-Ogievskiy
  2020-04-28 11:08 ` Max Reitz
  0 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-28  8:55 UTC (permalink / raw)
  To: qemu block; +Cc: Kevin Wolf, Andrey Shinkevich, qemu-devel, Max Reitz

Hi!

I wanted to resend my "[PATCH 0/4] fix & merge block_status_above and is_allocated_above", and returned to all the inconsistencies about block-status. I keep in mind Max's series about child-access functions, and Andrey's work about using COR filter in block-stream, which depends on Max's series (because, without them COR fitler with file child breaks backing chains).. And, it seems that it's better to discuss some questions before resending.

First, problems about block-status:

1. We consider ALLOCATED = ZERO | DATA, and documented as follows:

    * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
    * BDRV_BLOCK_ZERO: offset reads as zero
    * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
    * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    *                       layer rather than any backing, set by block layer

This actually means, that we should always have BDRV_BLOCK_ALLOCATED for formats which doesn't support backing. So, all such format drivers must return ZERO or DATA (or both?), yes?. Seems file-posix does so, but, for example, iscsi - doesn't.

2. ZERO. The meaning differs a bit for generic block_status and for drivers.. I think, we at least should document it like this:

BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
BDRV_BLOCK_ZERO: if driver return ZERO, than the region is allocated at this layer and read as ZERO. If generic block_status returns ZERO, it only mean that it reads as zero, but the region may be allocated on underlying level.

3. bdi.unallocated_blocks_are_zero

I think it's very bad, that we have formats, that supports backing, but doesn't report bdi.unallocated_blocks_are_zero as true. It means that UNALLOCATED region reads as zero if we have short backing file, and not as zero if we remove this short backing file. I can live with it but this is weird logic. These bad drivers are qcow (not qcow2), parallels and vmdk. I hope, they actually just forget to set unallocated_blocks_are_zero to true.

Next. But what about drivers which doesn't support backing? As we considered above, they should always return ZERO or DATA, as everything is allocated in this backing-chain level (last level, of course).. So again unallocated_blocks_are_zero is meaningless. So, I think, that driver which doesn't support backings, should be fixed to return always ZERO or DATA, than we don't need this unallocated_blocks_are_zero at all.

3. Short backing files in allocated_above: we must consider space after EOF as ALLOCATED, if short backing file is inside requested backing-chain part, as it produced exactly because of this short file (and we never go to backing). (current realization of allocated_above is buggy, see my outdated series "[PATCH 0/4] fix & merge block_status_above and is_allocated_above")

4. Long ago we've discussed problems about BDRV_BLOCK_RAW, when we have a backing chain of non-backing child.. I just remember that we didn't reach the consensus.

5. Filters.. OK we have two functions for them: bdrv_co_block_status_from_file and bdrv_co_block_status_from_backing. I think both are wrong:

bdrv_co_block_status_from_file leads to problem [4], when we can report UNALLOCATED, which refers not to the current backing chain, but to sub backing chain of file child, which is inconsistent with block_status_above and is_allocated_above iteration.

bdrv_co_block_status_from_backing is also is not consistent with block_status_above iteration.. At least at leads to querying the same node twice.

=================

So, about filters and backing chains. Keeping (OK, just, trying to keep) all these things in mind, I think that it's better to keep backing chains exactly *backing* chains, so that "backing" child is the only "not own" child of the node. So, its close to current behavior and we don't need child-access functions. Then how filters should work:

Filter with backing child, should always return UNALLOCATED (i.e. no DATA, no ZERO), it is honest: everything is on the other level of backing chain.

Filter with file child should always return BDRV_BLOCK_DATA | BDRV_BLOCK_RECURSE, to show that:
1. everything is allocated in *this* level of backing chain
2. filter is too lazy to dig in it's file child (and, maybe the whole sub-tree of it) and asks generic layer to do it by itself, if it wants zeroes.

Then, of course, if we want some filter to be inside backing chain, it should have not "file" child but "backing". For this, we may support in current public filter both variants: backing or file, as user prefer. I.e., filter is opened either with file option or with backing and operate correspondingly. And newer filters (like backup-top) may support only backing variants.

=================

So, I propose to complicate a bit code of old file-based filters, to support backing child (which may be done on demand, when needed. For example, Virtuozzo now need only COR filter (and only to satisfy iotests, actually)), and keep generic code about block-status and handling backing-chains simpler (and simpler to document).

Another benefit is that block-graph is already visible through QAPI, and it would be good to keep backing link meaning, not adding the thing that for filters "file" link behaves as "backing".

So again, I'm for limiting backing-behavior to bs->backing child only (as it currently behaves). What do you think?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-05-07 19:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28  8:55 backing chain & block status & filters Vladimir Sementsov-Ogievskiy
2020-04-28 11:08 ` Max Reitz
2020-04-28 11:28   ` Kevin Wolf
2020-04-28 15:13     ` Vladimir Sementsov-Ogievskiy
2020-04-28 16:18       ` Eric Blake
2020-04-28 16:46         ` Vladimir Sementsov-Ogievskiy
2020-04-28 18:37           ` Kevin Wolf
2020-04-28 19:44           ` Vladimir Sementsov-Ogievskiy
2020-04-29  9:15             ` Vladimir Sementsov-Ogievskiy
2020-04-29 10:50               ` Vladimir Sementsov-Ogievskiy
2020-04-28 14:51   ` Vladimir Sementsov-Ogievskiy
2020-04-30 19:12     ` Vladimir Sementsov-Ogievskiy
2020-05-01  3:04       ` Andrey Shinkevich
2020-05-06  5:56         ` Vladimir Sementsov-Ogievskiy
2020-05-07 12:58     ` Max Reitz
2020-05-07 19:34       ` Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).